Lecture 4: DBMS Architecture Sept. 6 2006 ChengXiang Zhai Most
33 Slides277.00 KB
Lecture 4: DBMS Architecture Sept. 6 2006 ChengXiang Zhai Most slides are adapted from Kevin Chang’s lecture slides CS511 Advanced Database Management Systems 1
DBMS Mission Statement Simply: maintenance and computation of data Data Operations Results But how to do it? CS511 Advanced Database Management Systems 2
DBMS Architecture User/Web Forms/Applications/DBA query transaction Query Parser Transaction Manager Query Rewriter Query Optimizer Lock Manager Logging & Recovery Query Executor Files & Access Methods Buffer Manager Storage Manager Buffers Lock Tables Main Memory Storage CS511 Advanced Database Management Systems 3
A Design Dilemma To what extent should we reuse OS services? Reuse as much as we can – Performance problem (inefficient) – Lack of control (incorrect crash recovery) Replicating some OS functions (“mini OS”) – Have its own buffer pool – Directly manage record structures with files – CS511 Advanced Database Management Systems 4
OS vs. DBMS Conjecture: Perhaps pretty close Proof: – There exists someone who can write popular textbooks in both OS and DBMS! [Operating Database] System Concepts – Jim Gray is from OS background! CS511 Advanced Database Management Systems 5
OS vs. DBMS Similarities? What do they manage? What do they provide? CS511 Advanced Database Management Systems 6
OS vs. DBMS: Similarities Purpose of an OS: – managing hardware – presenting interface abstraction to applications DBMS is in some sense an OS? – DBMS manages data – presenting interface abstraction to applications Both as API for application development! CS511 Advanced Database Management Systems 7
Applications built upon DBMS ERP: Enterprise Resource Planning – SAP, Baan, PeopleSoft, Oracle, IBM,. CRM: Customer Relationship Management – E.phiphany, Siebel, Vantive, Oracle, IBM, . SCM: Supply Chain Management – Trilogy, i2, Oracle, IBM, . A lot more in the Info Tech era: – e-business software – scientific data – multimedia – data analysis and decision support CS511 Advanced Database Management Systems 8
OS vs. DBMS: Related Concepts Process Management What DB concepts? – process synchronization – deadlock handling Storage management What DB concepts? – virtual memory – file system Protection and security What DB concepts? – authentication – access control CS511 Advanced Database Management Systems 9
OS vs. DBMS: Differences? CS511 Advanced Database Management Systems 10
OS vs. DBMS: Differences DBMS: Top-down to encapsulate high-level semantics! – Data data with particular logical structures – Queries query language with well defined operations – Transactions transactions with ACID properties OS: Bottom-up to present low-level hardware CS511 Advanced Database Management Systems 11
DBMS on top of OS: Relations vs. File system Data object abstraction – file: array of characters – relation: set of tuples Physical contiguity: – large DB files want clustering of blocks – extent: larger granularity allocation unit sol1: managing raw disks by DBMS sol2: simulate by managing free spaces in DBMS Multiple trees (access methods) – file access: directory hierarchy (user access method) – block access: inodes – tuple access: DBMS indexes CS511 Advanced Database Management Systems 12
DBMS on top of OS: BM vs. VM Query-aware replacement needed for performance – not always LRU Examples? – how about sort-merge join? – how about nested-loop join? CS511 Advanced Database Management Systems 13
DBMS on top of OS: BM vs. VM System-controlled replacement needed for correctness – not always LRU Examples? CS511 Advanced Database Management Systems 14
Not Really OS Problems: Deferred Update Semantics Update emp.sal 0.8*emp.sal if emp.sal mgr.sal empname sal manager Smith 10k Jones 9k Brown 11k Brown Jones – what are the possible semantics? INGRES solution: deferred updates – buffer updates in intentions list for actual updates (also serve as redo log) – an example of “needing buffer knowledge in DBMS”, so perhaps not sensible to do BM totally in OS CS511 Advanced Database Management Systems 15
As the data model and application context change, so does the DBMS architecture CS511 Advanced Database Management Systems 16
Post-Relational DB Projects Motivation: – RDBMS not powerful enough for non-administrative dataintensive applications such as: CAD/CAM, GIS Buzz terms: object-oriented, extensible Sample projects – Postgres: U.C. Berkeley – Starburst: IBM Almaden – “highly extensible” after System R (relational), R* (distributed) ultimately finding its way into IBM DB2 UDB – Exodus: U. Wisconsin not a complete DB; an OO-style storage manager toolkit followed by Shore at Wisconsin, Predator at Cornell CS511 Advanced Database Management Systems 17
POSTGRES: Post INGRES Stonebraker, U.C. Berkeley 1977-1985: INGRES – among the first relational DB implementation Ingres Inc. -- . acquired by Computer Associates 1986-1994: POSTGRES – among the first object-relational DB implementation Illustra acauqired by Informix – PostgreSQL (the SQL version) CS511 Advanced Database Management Systems 18
RDBMS: the Relational Root Data model: (Codd, 1970’s) – a database is a set of relations – relation of n attributes: a set of n-tuples – n-tuple: (v1, , vn), where vi is in domain Si CS511 Advanced Database Management Systems 19
Relational Model: Normal Forms Basic: 1NF (First Normal Form) – implicitly required in the relation model – definition: only simple domains of atomic elements (Codd) simple domains represent the base (built-in) types – ? why? “Stronger” normal forms: – 4NF, Boyce-Codd Normal Form, 3NF, 2NF, – ? why? CS511 Advanced Database Management Systems 20
Normalizing Relations: Example Unnormalized relation of book “objects”: Books: title authors date great future {smith, jones} 4/01/01 career {jones} 7/12/00 Normalized relations: by decomposition Books: title day great future 4 career 7 Books: title great future great future career month 1 12 year 01 00 authors smith jones jones ? Problems of the relational model? CS511 Advanced Database Management Systems 21
Relational Model Problems “A relational DB is like a garage that forces you to take your car apart and store the pieces in little drawers.” (some researcher) “Object” notion lost by decomposition Base types are too restrictive – non-intuitive: object is decomposed into several relations – inefficient: a lot of online assembling by joins – integers and strings are very primitive – data “types” are typically application specific Relational algebra is the only allowed operation – simple, declarative, but also restrictive – application host language embedded SQL ? How to remedy these problems? CS511 Advanced Database Management Systems 22
Quest for a Richer Model? Object-oriented data model Extensible ADTs Programming-language constructs CS511 Advanced Database Management Systems 23
ORDBMS vs. OODBMS Question: How important is the relation? ORDBMS: – RDBMS OO features # – query-based OODBMS: – OO PL database features (persistent objects) – programming-based Meeting in the middle CS511 Advanced Database Management Systems 24
Stonebraker’s Matrix Simple Data Complex Data Query RDBMS ORDBMS No Query File System OODBMS Prediction: ORDBMS will dominate – evidence: big DB players are all on this side CS511 Advanced Database Management Systems 25
Object Orientation Concepts Classes: – classes as types – encapsulation: interface implementation – inheritance: building class hierarchies Objects: – complex objects: built from constructors, e.g., set-of, array, nested objs – object identity (OID): system generated as unique object reference enables (efficient) object linking and navigation CS511 Advanced Database Management Systems 26
POSTGRES Data Model POSTGRES data model: OO constructs – classes as relations object (class instance) tuple object-id tuple-id method attribute or function of attributes – inheritance (multiple parents) ADT constructs: – types – functions CS511 Advanced Database Management Systems 27
POSTGRES Functions Arbitrary C functions – e.g.: overpaid(Employee) – arbitrary semantics-- not optimized – no fancy access methods-- typically sequential scan Binary operators – “hints” to provide semantics – extensible access methods extensible B tree or user-defined index PostQuel procedures – parameterized queries as functions – e.g.: sal-lookup(name): retrieve Emp.salary where Emp.name name CS511 Advanced Database Management Systems 28
POSTGRES Storage System We were guided by a missionary zeal to do something different No-overwrite system Logging: – old values are not overwritten-- no value logging necessary – log only needs to keep transaction state (commit/abort/going) – ? crash recovery-- how? Vacuum-cleaner daemon to archive historical data Advantages: – recovery is cheap – time travel is easy CS511 Advanced Database Management Systems 29
Storage System: Problems Problems – flushing differential data (why?) by commit time can be costly unless “stable” main memory more costly than sequentially writing out logs – why ? – reads have to stitch together current picture And, yes, there are lots details unexplored or unexplained CS511 Advanced Database Management Systems 30
Questing for the Right Models Speaking about knowledge representation– The simple relational model is by far the only successful KR paradigm. When the relational model came along, the network guys resisted and their companies went under. When the OO model came along, the relational guys absorb its best, and their companies prospered again! -- Jeffery Ullman CS511 Advanced Database Management Systems 31
What You Should Know What are some major limitations of services provided by an OS in supporting a DBMS? In response to such limitations, what does a DBMS do? As the data model and task environment change, the architecture will also need to change CS511 Advanced Database Management Systems 32
Carry Away Messages One usually doesn’t fit all! – An OS is designed to serve all kinds of applications, so it’s not optimal for supporting a DBMS – Other examples: a search engine is designed to serve all kinds of people, so it’s not optimal for a particular person (personalized search) When a problem is recognized, there are often opportunities for breakthroughs in multiple areas – DBMS could take over OS functions – OS could provide more opportunities for customization From “day 1”, high efficiency has been the primary challenge/concern in designing and implementing a DBMS; reliability may be the second major concern – In contrast, “accuracy of answers” is at least as important as efficiency for a Web search engine – In the future, accuracy of answers will likely become more important CS511 Advanced Database Management Systems 33