CS 220 Databases Relational Model
78 Slides2.05 MB
CS 220 Databases Relational Model
START RECORDING 1.2
Attendance Quiz: Cardinality Ratios Specify the cardinality ratios for the following entities: Entity 1 Entity 2 M N Student Social Security Number 1 1 Student Teacher ? ? Classroom Wall ? ? Country Current President ? ? Course Textbook ? ? Item (that can be found in an order) Order ? ? Student Class ? ? Class Instructor ? ? Instructor Office ? ? eBay Auction Item eBay Bid ? ? Entity 1 1.3 M N Entity 2
Chapter 5 Outline Relational Model Concepts Relational Model Constraints and Relational Database Schemas Update Operations, Transactions, and Dealing with Constraint Violations 1.4
Review-the big picture Data Modelling E-R Relational Query Languages SQL Relational Algebra Relational Calculus Data retrieval Storing Data Data Organization File Indexes Buffer Pool Management Query Optimization External Sorting Join Algorithms Query Plans, Cost Estimation Data Integrity 1.5
Steps in Database Design Requirements Analysis user needs; what must database do? Conceptual Design high level descr (often done w/ER model) Logical Design translate ER into DBMS data model Schema Refinement consistency, normalization Physical Design indexes, disk layout Security Design who accesses what, and how 1.6
Review E/R Model: since name ssn did lot Employees dname Works In budget Departments Entities, relationships, attributes Cardinalities: 1:1, 1:n, m:1, m:n Keys: superkeys, candidate keys, primary keys 1.7
Review Weak Entity sets, identifying relationship Discriminator (AKA, partial key), total participation, one- to-many Loan lno 1 Loan Pmt N pno lamt 1.8 Payment pdate pamt
Review Generalization-specialization a2 a1 E1 superclass Isa S1 S2 b1 c1 subclasses Aggregation E1 R1 R2 E3 1.9 E2
Review Data models: framework for organizing and interpreting data E/R Model OO, Object relational, XML Relational Model Intro E/R to relational SQL preview 1.10
Relational Data Model Introduced by Ted Codd (early 70’) (Turing Award, ‘81) Before “Network Data Model” (COBOL ) Relational data model contributes: 1. Separation of logical and physical data models (data independence) 2. Declarative query languages 3. Formal semantics 4. Query optimization (key to commercial success) First prototypes: Ingres System R 1.11
Relational Model Concepts Represents data as a collection of relations Table of values Row Represents a collection of related data values Fact that typically corresponds to a real-world entity or relationship Tuple Table name and column names Interpret the meaning of the values in each row attribute 1.12
Relational Model Concepts (cont’d.) 1.13
Domains, Attributes, Tuples, and Relations Domain D Set of atomic values Atomic Each value indivisible Specifying a domain Data type specified for each domain 1.14
Domains, Attributes, Tuples, and Relations (cont’d.) Relation schema R Denoted by R(A1, A2, .,An) Made up of a relation name R and a list of attributes, A1, A2, ., An Attribute Ai Name of a role played by some domain D in the relation schema R Degree (or arity) of a relation Number of attributes n of its relation schema 1.15
Domains, Attributes, Tuples, and Relations (cont’d.) Relation (or relation state) Set of n-tuples r {t1, t2, ., tm} Each n-tuple t Ordered list of n values t v1, v2, ., vn Each value vi, 1 i n, is an element of dom(Ai) or is a special NULL value 1.16
Domains, Attributes, Tuples, and Relations (cont’d.) Relation (or relation state) r(R) Mathematical relation of degree n on the domains dom(A1), dom(A2), ., dom(An) Subset of the Cartesian product of the domains that define R: r(R) (dom(A1) dom(A2) . dom(An)) 1.17
Domains, Attributes, Tuples, and Relations (cont’d.) Cardinality Total number of values in domain Current relation state Relation state at a given time Reflects only the valid tuples that represent a particular state of the real world Attribute names Indicate different roles, or interpretations, for the domain 1.18
Relations account bname Worcester Brighton Brookline acct no balance A-101 500 A-202 450 A312 600 Rows (tuples, records) Columns (attributes) Tables (relations) Why relations? 1.19
Relations Mathematical relations (from set theory): Given 2 sets R { 1, 2, 3, 5}, S {3, 4} R x S {(1,3), (1, 4), (2, 3), (2,4), (3,3), (3,4), (5,3), (5,4)} A relation between R and S is any subset of R x S e.g., {(1,3), (2,4), (5,3)} See Also: http://en.wikipedia.org/wiki/Cartesian product Database relations: Given attribute domains: bname {Worcester, Brighton, .} acct no { A-101, A-102, A-203, } balance { , 400, 500, } account subset of bname x acct no x balance 1.20 { (Worcester, A-101, 500), (Brighton, A-202, 450), (Brookline, A-312, 600), (Downtown, A-415, 200)}
Characteristics of Relations Ordering of tuples in a relation Relation defined as a set of tuples Elements have no order among them Ordering of values within a tuple and an alternative definition of a relation Order of attributes and values is not that important, so long as correspondence between attributes and values is maintained 1.21
Characteristics of Relations (cont’d.) 1.22
Characteristics of Relations (cont’d.) Alternative definition of a relation Tuple considered as a set of ( attribute , value ) pairs Each pair gives the value of the mapping from an attribute Ai to a value vi from dom(Ai) Better: Use the first definition of relation Attributes and the values within tuples are ordered Simpler notation 1.23
Characteristics of Relations (cont’d.) Values and NULLs in tuples Each value in a tuple is atomic Flat relational model Composite and multivalued attributes not allowed “First normal form” assumption Multivalued attributes must be represented by separate relations Composite attributes are represented by simple component attributes in basic relational model 1.24
Characteristics of Relations (cont’d.) NULL values Represent the values of attributes that may be unknown or may not apply to a tuple Meanings for NULL values Value unknown Value exists but is not available Attribute does not apply to this tuple (also known as value undefined) 1.25
Characteristics of Relations (cont’d.) Interpretation (meaning) of a relation, 2 approaches: Assertion Each tuple in the relation is a fact or a particular instance of the assertion Predicate Values in each tuple interpreted as values that satisfy predicate 1.26
Relational Model Notation Relation schema R of degree n Denoted by R(A1, A2, ., An) Uppercase letters Q, R, S Denote relation names Lowercase letters q, r, s Denote relation states Letters t, u, v Denote tuples 1.27
Relational Model Notation Name of a relation schema: STUDENT Indicates the current set of tuples in that relation Notation: STUDENT(Name, Ssn, .) Refers only to relation schema Attribute A can be qualified with the relation name R to which it belongs Using the dot notation R.A 1.28
Relational Model Notation n-tuple t in a relation r(R) Denoted by t v1, v2, ., vn vi is the value corresponding to attribute Ai Component values of tuples: t[Ai] and t.Ai refer to the value vi in t for attribute Ai t[Au, Aw, ., Az] and t.(Au, Aw, ., Az) refer to the subtuple of values vu, vw, ., vz from t corresponding to the attributes specified in the list 1.29
Storing Data in a Table sid 53666 53688 53650 name login Jones jones@cs Smith smith@eecs Smith smith@math Data about individual students One row per student How to represent course enrollment? 1.30 age 18 18 19 gpa 3.4 3.2 3.8
Storing More Data in Tables Students may enroll in more that one course Most efficient: keep enrollment in separate table Enrolled cid Carnatic101 Reggae203 Topology112 History105 grade C B A B sid 53666 53666 53650 53666 1.31
Linking Data from Multiple Tables How to connect student data to enrollment? Need a Key Enrolled cid Carnatic101 Reggae203 Topology112 History105 grade C B A B sid 53666 53666 53650 53666 Students sid 53666 53688 53650 1.32 name login Jones jones@cs Smith smith@eecs Smith smith@math age gpa 18 3.4 18 3.2 19 3.8
Relational Data Model: Formal Definitions Relational database: a set of relations. Relation: made up of 2 parts: Instance : a table, with rows and columns. #rows cardinality Schema : specifies name of relation, plus name and type of each column. E.g. Students(sid: string, name: string, login: string, age: integer, gpa: real) #fields degree / arity Can think of a relation as a set of rows or tuples. i.e., all rows are distinct 1.33
In other words. Data Model – a way to organize information Schema – one particular organization, i.e., a set of fields/columns, each of a given type Relation a name a schema a set of tuples/rows, each following organization specified in schema 1.34
Example Instance of Students Relation sid 53666 53688 53650 name login Jones jones@cs Smith smith@eecs Smith smith@math Cardinality 3, arity (degree) 5 , all rows distinct 1.35 age gpa 18 3.4 18 3.2 19 3.8
Relational Model Constraints Constraints Restrictions on the actual values in a database state Derived from the rules in the miniworld that the database represents Inherent model-based constraints or implicit constraints Inherent in the data model 1.36
Relational Model Constraints (cont’d.) Schema-based constraints or explicit constraints Can be directly expressed in schemas of the data model Application-based or semantic constraints or business rules Cannot be directly expressed in schemas Expressed and enforced by application program Database design as part of a software engineering process. 1.37
Schema-based: Domain Constraints Typically include: Numeric data types for integers and real numbers Characters Booleans Fixed-length strings Variable-length strings Date, time, timestamp Money Other special data types 1.38
Schema-based: Key Constraints and Constraints on NULL Values No two tuples can have the same combination of values for all their attributes. Superkey No two distinct tuples in any state r of R can have the same value for SK Key Superkey of R Removing any attribute A from K leaves a set of attributes K that is not a superkey of R any more See also: http://en.wikipedia.org/wiki/Superkey 1.39
Schema-based: Key Constraints and Constraints on NULL Values (cont’d.) Key satisfies two properties: Two distinct tuples in any state of relation cannot have identical values for (all) attributes in key Minimal superkey Cannot remove any attributes and still have uniqueness constraint in above condition hold 1.40
Schema-based: Key Constraints and Constraints on NULL Values (cont’d.) Candidate key Relation schema may have more than one key Primary key of the relation Designated among candidate keys Underline attribute in the schema Other candidate keys are designated as unique keys NOT NULL constraint Enforce that a particular attribute cannot be a NULL value. 1.41
Schema-based: Key Constraints and Constraints on NULL Values (cont’d.) 1.42
Relational Databases and Relational Database Schemas Relational database schema S Set of relation schemas S {R1, R2, ., Rm} Set of integrity constraints IC Relational database state Set of relation states DB {r1, r2, ., rm} Each ri is a state of Ri and such that the ri relation states satisfy integrity constraints specified in IC 1.43
Relational Databases and Relational Database Schemas (cont’d.) Invalid state Does not obey all the integrity constraints Valid state Satisfies all the constraints in the defined set of integrity constraints IC 1.44
Schema-based: Integrity, Referential Integrity, and Foreign Keys Entity integrity constraint No primary key value can be NULL Referential integrity constraint Specified between two relations Maintains consistency among tuples in two relations 1.45
Schema-based: Integrity, Referential Integrity, and Foreign Keys (cont’d.) Foreign key rules: The attributes in FK have the same domain(s) as the primary key attributes PK Value of FK in a tuple t1 of the current state r1(R1) either occurs as a value of PK for some tuple t2 in the current state r2(R2) or is NULL 1.46
Schema-based: Integrity, Referential Integrity, and Foreign Keys (cont’d.) Diagrammatically display referential integrity constraints Directed arc from each foreign key to the relation it references All integrity constraints should be specified on relational database schema 1.47
Semantic Integrity Constraints May have to be specified and enforced on a relational database Use triggers and assertions (we’ll see these in SQL) More common to check for these types of constraints within the application programs 1.48
Other Types of Constraints State constraints What we’ve talked about so far (i.e., keeping the DB in a valid state) Functional dependency constraints Used when “normalizing” the database to avoid redundancies, etc. Transition constraints Define to deal with state changes in the database E.g. Salary can only be increased Typically enforced by applications 1.49
SQL - A language for Relational DBs SQL: standard language Based on SEQUEL in System R (IBM now DB2) Data Definition Language (DDL) create, modify, delete relations specify constraints administer users, security, etc. Data Manipulation Language (DML) Specify queries to find tuples that satisfy criteria add, modify, remove tuples 1.50
SQL Overview CREATE TABLE name ( field domain , ) INSERT INTO name ( field names ) VALUES ( field values ) DELETE FROM name WHERE condition UPDATE name SET field name value WHERE condition SELECT fields FROM name WHERE condition 1.51
Creating Relations in SQL Creates the Students relation. Note: the type (domain) of each field is specified, and enforced by the DBMS whenever tuples are added or modified. CREATE TABLE Students (sid CHAR(20), name CHAR(20), login CHAR(10), age INTEGER, gpa REAL) Another example: the Enrolled table holds information about courses students take. CREATE TABLE Enrolled (sid CHAR(20), cid CHAR(20), grade CHAR(2)) 1.52
Adding and Deleting Tuples Can insert a single tuple using: INSERT INTO VALUES Students (sid, name, login, age, gpa) (‘53688’, ‘Smith’, ‘smith@ee’, 18, 3.2) Can delete all tuples satisfying some condition (e.g., name Smith): DELETE FROM WHERE Students S S.name ‘Smith’ Powerful variants of these commands are available; more later! 1.53
Updating Tuples Can update all tuples, or just those matching some criterion UPDATE Students SET sid sid 100000 UPDATE Enrolled SET grade ‘E’ WHERE grade ‘F’ 1.54
Keys Integrity Constraints (IC): conditions that restrict the data that can be stored in the database Keys are a way to associate tuples in different relations Keys are one form of integrity constraint (IC) Enrolled cid Carnatic101 Reggae203 Topology112 History105 grade C B A B sid 53666 53666 53650 53666 Students sid 53666 53688 53650 1.55 name login Jones jones@cs Smith smith@eecs Smith smith@math age gpa 18 3.4 18 3.2 19 3.8
Primary Keys - Definitions Key: A minimal set of attributes that uniquely identify a tuple A set of fields is a superkey if: A set of fields is a candidate key for a relation if : No two distinct tuples can have same values in all key fields It is a superkey No subset of the fields is a superkey (i.e., it is a minimal superkey) 1 candidate keys for a relation? one of the keys is chosen (by DBA) to be the primary key. Students E.g. sid is a key for Students. What about name? The set {sid, gpa} is a superkey. sid 53666 53688 53650 1.56 name login Jones jones@cs Smith smith@eecs Smith smith@math age gpa 18 3.4 18 3.2 19 3.8
Primary and Candidate Keys in SQL Possibly many candidate keys (specified using UNIQUE), one of which is chosen as the primary key. CREATE TABLE Enrolled “For a given student and (sid CHAR(20) course, there is a single cid CHAR(20), grade CHAR(2), grade.” PRIMARY KEY (sid,cid)) vs. “Students can take only one course, and receive a single CREATE TABLE Enrolled (sid CHAR(20) grade for that course; further, cid CHAR(20), no two students in a course grade CHAR(2), PRIMARY KEY (sid), receive the same grade.” UNIQUE (cid, grade)) Used carelessly, an IC can prevent storage of database 1.57
Foreign Keys A Foreign Key is a field whose values are keys in another relation. Enrolled cid Carnatic101 Reggae203 Topology112 History105 grade C B A B sid 53666 53666 53650 53666 Students sid 53666 53688 53650 1.58 name login Jones jones@cs Smith smith@eecs Smith smith@math age gpa 18 3.4 18 3.2 19 3.8
Foreign Keys, Referential Integrity Foreign key : Set of fields in one relation used to refer’ to tuples in another relation. Must correspond to primary key of the second relation. Like a logical pointer’. E.g. sid in Enrolled is a foreign key referring to Students: Enrolled(sid: string, cid: string, grade: string) If all foreign key constraints are enforced, referential integrity is achieved (i.e., no dangling references.) 1.59
Foreign Keys in SQL Only students listed in the Students relation should be allowed to enroll for courses. CREATE TABLE Enrolled (sid CHAR(20), cid CHAR(20), grade CHAR(2), PRIMARY KEY (sid,cid), FOREIGN KEY (sid) REFERENCES Students ) Enrolled sid 53666 53666 53650 53666 cid grade Carnatic101 C Reggae203 B Topology112 A History105 B Students sid 53666 53688 53650 1.60 name login Jones jones@cs Smith smith@eecs Smith smith@math age gpa 18 3.4 18 3.2 19 3.8
Integrity Constraints (ICs) IC: condition that must be true for any instance of the database; e.g., domain constraints. ICs are specified when schema is defined. ICs are checked when relations are modified. A legal instance of a relation is one that satisfies all specified ICs. DBMS should not allow illegal instances. If the DBMS checks ICs, stored data is more faithful to real-world meaning. Avoids data entry errors, too! 1.61
Update Operations, Transactions, and Dealing with Constraint Violations Operations of the relational model can be categorized into retrievals and updates Basic operations that change the states of relations in the database: Insert Delete Update (or Modify) 1.62
1.63
1.64
The Insert Operation Provides a list of attribute values for a new tuple t that is to be inserted into a relation R Can violate any of the four types of constraints If an insertion violates one or more constraints Default option is to reject the insertion 1.65
The Insert Operation 1.66
The Delete Operation Can violate only referential integrity If tuple being deleted is referenced by foreign keys from other tuples Restrict Reject the deletion Cascade Propagate the deletion by deleting tuples that reference the tuple that is being deleted Set null or set default Modify the referencing attribute values that cause the violation 1.67
The Delete Operation 1.68
The Update Operation: RESUMED HERE Necessary to specify a condition on attributes of relation Select the tuple (or tuples) to be modified If attribute not part of a primary key nor of a foreign key Usually causes no problems Updating a primary/foreign key Similar issues as with Insert/Delete 1.69
The Update Operation 1.70
The Transaction Concept Transaction Executing program Includes some database operations Must leave the database in a valid or consistent state Online transaction processing (OLTP) systems Execute transactions at rates that reach several hundred per second 1.71
Relational Query Languages A major strength of the relational model: supports simple, powerful querying of data. Declarative, not Procedural Specify what you want, not how to get it Queries can be written intuitively, and the DBMS is responsible for efficient evaluation. The key: precise semantics for relational queries. Allows the optimizer to extensively re-order operations, and still ensure that the answer does not change. 1.72
The SQL Query Language The most widely used relational query language. Current std is SQL:2016 To find all 18 year old students, we can write: SELECT * FROM Students S WHERE S.age 18 sid name 53666 Jones login age gpa jones@cs 18 3.4 53688 Smith smith@ee 18 3.2 To find just names and logins, replace the first line: SELECT S.name, S.login FROM Students S WHERE S.age 18 1.73
Querying Multiple Relations What does the following query compute? SELECT S.name, E.cid FROM Students S, Enrolled E WHERE S.sid E.sid AND E.grade 'A' sid 53831 53831 53650 53666 Given the following instances: sid name login age gpa 18 3.4 53650 Smith smith@math 19 3.8 53831 Smith smith@ee 3.2 53666 Jones jones@cs 18 we get: cid grade Carnatic101 C Reggae203 B Topology112 A History105 B S.name E.cid Smith Topology112 1.74
Semantics of a Query SELECT S.name, E.cid FROM Students S, Enrolled E WHERE S.sid E.sid AND E.grade 'A' A conceptual evaluation method for the query: 1. FROM clause: compute cross-product of Students, Enrolled 2. WHERE clause: Check conditions, discard tuples that fail 3. SELECT clause: Delete unwanted fields Remember, this is conceptual. Actual evaluation will be much more efficient, (but must produce the same answers) 1.75
Cross-product of Students and Enrolled Instances sid name 53666 Jones login age gpa jones@cs 18 3.4 sid 53831 53831 53650 53666 53650 Smith smith@math 19 3.8 53831 Smith smith@ee S.sid 53666 53666 53666 53666 53688 53688 53688 53688 53650 53650 53650 53650 S.name Jones Jones Jones Jones Smith Smith Smith Smith Smith Smith Smith Smith 18 3.2 S.login jones@cs jones@cs jones@cs jones@cs smith@ee smith@ee smith@ee smith@ee smith@math smith@math smith@math smith@math S.age 18 18 18 18 18 18 18 18 19 19 19 19 S.gpa 3.4 3.4 3.4 3.4 3.2 3.2 3.2 3.2 3.8 3.8 3.8 3.8 1.76 E.sid 53831 53832 53650 53666 53831 53831 53650 53666 53831 53831 53650 53666 cid Carnatic101 Reggae203 Topology112 History105 grade C B A B E.cid E.grade Carnatic101 C Reggae203 B Topology112 A History105 B Carnatic101 C Reggae203 B Topology112 A History105 B Carnatic101 C Reggae203 B Topology112 A History105 B
Summary Constraints differentiate relations from ordinary tables or files Classify database constraints into: Inherent model-based constraints Explicit schema-based constraints Application-based constraints Modification operations on the relational model: Insert, Delete, and Update 1.77
Relational Model: Summary A tabular representation of data Integrity constraints can be specified by the DBA, based on application semantics. DBMS checks for violations. Two important ICs: primary and foreign keys In addition, we always have domain constraints. Powerful and natural query languages exist. 1.78