1 SCIENCE PASSION TECHNOLOGY Data Management 05 Query Languages (SQL)
42 Slides1.75 MB
1 SCIENCE PASSION TECHNOLOGY Data Management 05 Query Languages (SQL) Matthias Boehm Graz University of Technology, Austria Computer Science and Biomedical Engineering Institute of Interactive Systems and Data Science BMK endowed chair for Data Management Last update: Nov 08, 2021
2 Announcements/Org #1 Video Recording Link in TeachCenter & TUbe (lectures will be public) Currently via https://tugraz.webex.com/meet/m.boehm #2 Exercise 1 Deadline: Nov 02 7 late days in TeachCenter Grading starts Nov 09 #3 Exercise 2 Task description published last weekend, discussed today Remaining data cleaning until Wednesday simplification Deadline: Nov 30 7 late days in TeachCenter INF.01017UF Data Management / 706.010 Databases – 05 Query Languages (SQL) Matthias Boehm, Graz University of Technology, WS 2021/22 Q&A
3 Agenda Structured Query Language (SQL) Other Query Languages (XML, JSON) Exercise 2: Query Languages and APIs INF.01017UF Data Management / 706.010 Databases – 05 Query Languages (SQL) Matthias Boehm, Graz University of Technology, WS 2021/22
4 Structured Query Language (SQL) INF.01017UF Data Management / 706.010 Databases – 05 Query Languages (SQL) Matthias Boehm, Graz University of Technology, WS 2021/22
5 What is a(n) SQL Query? SELECT Firstname, Lastname, Affiliation, Location FROM Participant AS R, Locale AS S #1 Declarative: what not how WHERE R.LID S.LID AND Location LIKE '%, GER' Firstname Lastname Affiliation Location Volker Markl TU Berlin Berlin, GER Thomas Neumann TU Munich Munich, GER #2 Flexibility: closed composability #3 Automatic Optimization #4 Physical Data Independence INF.01017UF Data Management / 706.010 Databases – 05 Query Languages (SQL) Matthias Boehm, Graz University of Technology, WS 2021/22
6 Why should I care? SQL as a Standard Standards ensure interoperability, avoid vendor lock-in, and protect application investments Mature standard with heavy industry support for decades Rich eco system (existing apps, BI tools, services, frameworks, drivers, design tools, systems) [https://xkcd.com/927/] SQL is here to stay Foundation of mobile/server application data management Adoption of existing standard by new systems (e.g., SQL on Hadoop, cloud DBaaS) Complemented by NoSQL abstractions, see lecture 10 NoSQL (key-value, document, graph) INF.01017UF Data Management / 706.010 Databases – 05 Query Languages (SQL) Matthias Boehm, Graz University of Technology, WS 2021/22 Microsoft
Structured Query Language (SQL) 7 Overview SQL Structured Query Language (SQL) Current Standard: ISO/IEC 9075:2016 (SQL:2016) Data Definition Language (DDL) Manipulate the database schema Data Manipulation Language (DML) Update and query database Data Control Language (DCL) Modify permissions Dialects Spectrum of system-specific dialects for non-core features Data types and size constraints Catalog, builtin functions, and tools Support for new/optional features Case-sensitive identifiers Name Examples T-SQL Microsoft, Sybase PL/SQL Oracle, (IBM) PL/pgSQL PostgreSQL, derived Unnamed Most systems INF.01017UF Data Management / 706.010 Databases – 05 Query Languages (SQL) Matthias Boehm, Graz University of Technology, WS 2021/22
Structured Query Language (SQL) 8 The History of the SQL Standard [C. J. Date: A Critique of the SQL Database Language. SIGMOD Record 1984] SQL:1986 Database Language SQL, ANSI X3.135-1986, ISO-9075-1987(E) ‘87 international edition SQL:1989 (120 pages) Database Language SQL with Integrity Enhancements, ANSI X3.135-1989, ISO-9075-1989(E) SQL:1992 (580 pages) Database Language SQL, ANSI X3-1992, ISO/IEC-9075 1992, DIN 66315 ‘95 SQL/CLI (part 3), ‘96 SQL/PSM (part 4) SQL:1999 (2000 pages) Information Technology – Database Language – SQL, ANSI/ISO/IEC-9075 1999 Complete reorg, ’00 OLAP, ’01 SQL/MED, ’01 SQL/OLB, ‘02 SQL/JRT SQL:2003 (3764 pages) Information Technology – Database Language – SQL, ANSI/ISO/IEC-9075 2003 INF.01017UF Data Management / 706.010 Databases – 05 Query Languages (SQL) Matthias Boehm, Graz University of Technology, WS 2021/22
Structured Query Language (SQL) 9 The History of the SQL Standard, cont. Overview SQL:2003 x: . a part (x) . a package 1: Framework 4: PSM 3: CLI Call Level Interface 9: MED Persistent Stored Modules Management of External Data 10: OLB Object Language Bindings 13: JRT Java Routines and Types 11: Schemata 2: Foundation optional features mandatory features (1) Enhanced Date/Time Fac. (2) Enhanced Integrity Management (7) Enhanced Objects (8) Active Databases (6) Basic Objects (10) OLAP Core SQL (all SQL:92 entry, some extended SQL:92/SQL:99) INF.01017UF Data Management / 706.010 Databases – 05 Query Languages (SQL) Matthias Boehm, Graz University of Technology, WS 2021/22 14: XML Extensible Markup Language
Structured Query Language (SQL) 10 The History of the SQL Standard, cont. Since SQL:2003 overall structure remained unchanged . SQL:2008 (? pages) Information Technology – Database Language – SQL, ANSI/ISO/IEC-9075 2003 E.g., XML XQuery extensions, case/trigger extension SQL:2011 (4079 pages) Information Technology – Database Language – SQL, ANSI/ISO/IEC-9075 2011 E.g., time periods, temporal constraints, time travel queries SQL:2016 (? pages) Information Technology – Database Language – SQL, ANSI/ISO/IEC-9075 2016 E.g., JSON documents and functions (optional) Note: We can only discuss common primitives [Working Draft SQL:2011: https://www.wiscorp.com/ SQLStandards.html] INF.01017UF Data Management / 706.010 Databases – 05 Query Languages (SQL) Matthias Boehm, Graz University of Technology, WS 2021/22
Structured Query Language (SQL) 11 Data Types in SQL:2003 Large Variety of Types With support for multiple spellings Added in SQL:1999 / SQL:2003 SQL data types Composite Data Types Exact Predefined Data Types Interval Numeric Deleted in SQL:2003 Approximate String Bit Blob User-defined Types (UDT) Boolean Character Datetime Date Time NUMERIC DECIMAL REAL SMALLINT FLOAT INTEGER DOUBLE PRECISION BIGINT Fixed Varying Fixed Varying Clob Implicit casts among numeric types and among character types INF.01017UF Data Management / 706.010 Databases – 05 Query Languages (SQL) Matthias Boehm, Graz University of Technology, WS 2021/22 Timestamp
Structured Query Language (SQL) 12 Data Types in PostgreSQL Appropriate, Brief, Complete Strings CHAR(n) VARCHAR(n) TEXT fixed-length character sequence (padded to n) variable-length character sequence (n max) variable-length character sequence Numeric SMALLINT INT/INTEGER SERIAL NUMERIC(p, s) 2 byte integer (signed short) 4 byte integer (signed int) INTEGER w/ auto increment exact real with p digits and s after decimal point Time DATE date TIMESTAMP/TIMESTAMPTZ date and time, timezone-aware if needed JSON JSON INF.01017UF Data Management / 706.010 Databases – 05 Query Languages (SQL) textofJSON representation (requires Matthias Boehm, Graz University Technology, WS 2021/22 reparsing)
Structured Query Language (SQL) 13 Create, Alter, and Delete Tables Create Table Templates in SQL Examples in PostgreSQL CREATE TABLE Students ( SID INTEGER PRIMARY KEY, Typed attributes Fname VARCHAR(128) NOT NULL, Primary and foreign keys Lname VARCHAR(128) NOT NULL, NOT NULL, UNIQUE constraints Mtime DATE DEFAULT DEFAULT values CURRENT DATE CHECK constraints ); CREATE TABLE Students AS SELECT ; Alter Table ALTER TABLE Students ADD DoB ADD/DROP columns DATE; ALTER TABLE Students ADD ALTER data type, defaults, CONSTRAINT constraints, etc PKStudent PRIMARY KEY(SID); DROP TABLE Students; -- sorry Delete Table DROP TABLE Students CASCADE; Delete table DROP TABLE IF EXISTS Countries, Cities, Airports, Airlines, Note: order of tables matters Routes, Planes, due to referential integrity Routes Planes; INF.01017UF Data Management / 706.010 Databases – 05 Query Languages (SQL) Matthias Boehm, Graz University of Technology, WS 2021/22
Structured Query Language (SQL) 14 Create and Delete Indexes Create Index CREATE INDEX ixStudLname ON Students USING btree Create a secondary (nonclustered) index on a set of attributes (Lname ASC NULLS Clustered: tuples sorted by index Non-clustered: sorted attribute with tupleFIRST); references Can specify uniqueness, order, and indexing method ix PostgreSQL methods: btree, hash, gist, and gin table data see lecture 07 Physical Design and Tuning Delete Index Drop indexes by name DROP INDEX ixStudLname; Tradeoffs Indexes often automatically created for primary keys / unique attributes Lookup/scan performance vs insert performance INF.01017UF Data Management / 706.010 Databases – 05 Query Languages (SQL) Matthias Boehm, Graz University of Technology, WS 2021/22
Structured Query Language (SQL) 15 Database Catalog [Meikel Poess: TPC-H. Encyclopedia of Big Data Technologies 2019] Catalog Overview Meta data of all database objects (tables, constraints, indexes) mostly read-only Accessible through SQL Organized by schemas (CREATE SCHEMA tpch;) SQL Information Schema Schema with tables for all tables, views, constraints, etc Example: check for existence of accessible table SELECT 1 FROM information schema.tables WHERE table schema ‘tpch’ AND table name ‘customer’ (defined as views over PostgreSQL catalog tables) INF.01017UF Data Management / 706.010 Databases – 05 Query Languages (SQL) Matthias Boehm, Graz University of Technology, WS 2021/22 pgAdmin graphical representation
Structured Query Language (SQL) 16 Insert Insert Tuple Insert a single tuple with implicit or explicit attribute assignment INSERT INTO Students (SID, Lname, Fname, MTime, DoB) VALUES (7,'Boehm','Matthias','2002-10-01','1982-0625'); Insert attribute key-value pairs to use auto increment, defaults, NULLs, etc INSERT INTO Students (Lname, Fname, DoB) SERIAL SID, DEFAULT MTime VALUES ('Boehm','Matthias','1982-0625'), (.), (.); INSERT INTO Students SELECT * FROM NewStudents; Insert Table Analogy Linux redirect (append): Redirect query result into INSERT (append semantics) cat NewStudents.txt Students.txt INF.01017UF Data Management / 706.010 Databases – 05 Query Languages (SQL) Matthias Boehm, Graz University of Technology, WS 2021/22
Structured Query Language (SQL) 17 Update and Delete Update Tuple/Table Set-oriented update of attributes Update single tuple via predicate on primary key Delete Tuple/Table Set-oriented delete of tuples Delete single tuple via predicate on primary key UPDATE Students SET MTime ‘2002-1002’ WHERE LName ‘Boehm’; DELETE FROM Students WHERE extract(year FROM mtime) 2010; Note: Time travel and multi-version concurrency control Deleted tuples might be just marked as inactive See lecture 09 Transaction Processing and Concurrency INF.01017UF Data Management / 706.010 Databases – 05 Query Languages (SQL) Matthias Boehm, Graz University of Technology, WS 2021/22
Structured Query Language (SQL) 18 Basic Queries Basic Query Template SELECT [DISTINCT] column list Select-From-Where FROM [ table list table1 [RIGHT LEFT FULL] Grouping and Aggregation JOIN Having and ordering table2 ON condition ] Duplicate elimination [WHERE predicate ] [GROUP BY column list ] [HAVING grouping predicate ] [ORDER BY column list [ASC DESC]] Example πFname,Affil,Location SELECT Fname, Affil, Location FROM Participant AS P, Locale AS L σP.LID L.LID WHERE P.LID L.LID; Participant Location INF.01017UF Data Management / 706.010 Databases – 05 Query Languages (SQL) Matthias Boehm, Graz University of Technology, WS 2021/22
Structured Query Language (SQL) 19 Basic Queries, cont. Distinct and All Distinct and all alternatives Projection w/ bag semantics by default SELECT DISTINCT Lname, Fname FROM Students; Sorting SELECT * FROM Convert a bag into a sorted list of Students tuples; order lost if used in other ops ORDER BY Lname Single order: (Lname, Fname) DESC DESC, Fname Evaluated last in a query tree DESC; (SELECT Firstname, Set Operations Lastname See 04 Relational Algebra and Calculus FROM Participant2018) UNION, INTERSECT, EXCEPT UNION DISTINCT Set operations set semantics by default (SELECT Firstname, Lastname DISTINCT (set) vs ALL (bag) FROM Participant2013) INF.01017UF Data Management / 706.010 Databases – 05 Query Languages (SQL) Matthias Boehm, Graz University of Technology, WS 2021/22
Structured Query Language (SQL) 20 Grouping and Aggregation Grouping and Aggregation Grouping: determines the distinct groups Aggregation: compute aggregate f(B) per group Column list can only contain grouping columns, aggregates, or literals Having: selection predicate on groups and aggregates Example Sales (Customer, Location, Product, Quantity, Price) Q: Compute number of sales sumQ and revenue per product sumQP SELECT Product, sum(Quantity) AS SumQ, sum(Quantity*Price) AS SumQP FROM Sales GROUP BY Product Product SumQ A 3 B 4 Product Quantity A 1 B 3 A 2 B 1 INF.01017UF Data Management / 706.010 Databases – 05 Query Languages (SQL) Matthias Boehm, Graz University of Technology, WS 2021/22 SumQP 30 80 Price 10 20 10 20
Structured Query Language (SQL) 21 BREAK (and Test Yourself) Task: SQL queries for the following query trees. δ πCustomer,Date σO.PID P.PID Customer Date A C D ‘2019-06-22’ ‘2019-06-23’ ‘2019-06-23’ Orders σName {Y,Z} O Products P γCustomer, sum(O.Quantity*P.Price) O.PID P.PID Customer Sum A B C D 120 120 130 75 Orders Products P O SELECT Customer, SELECT DISTINCT Customer, sum(O.Quantity * Date P.Price) FROM Orders O, Products P FROM Orders O, Products INF.01017UF Data Management / 706.010 Databases – 05 Query Languages (SQL) WHERE O.PID Matthias P.PID P Boehm, Graz University of Technology, WS 2021/22
Structured Query Language (SQL) 22 Subqueries Subqueries in Table List Use a subquery result like a base table Modularization with WITH C AS (SELECT ) Subqueries w/ IN Check containment of values in result set of sub query SELECT S.Fname, S.Lname, C.Name FROM Students AS S, (SELECT CID, Name FROM Country WHERE ) AS C WHERE S.CID C.CID; SELECT Product, Quantity, Price FROM Sales WHERE Product NOT IN( SELECT Product FROM Sales GROUP BY Product HAVING sum(Quantity*Price) 1e6) Other subqueries EXISTS: existential quantifier x for correlated subqueries ALL: comparison (w/ universal quantifier x) SOME/ANY: comparison (w/ existential quantifier x) INF.01017UF Data Management / 706.010 Databases – 05 Query Languages (SQL) Matthias Boehm, Graz University of Technology, WS 2021/22
Structured Query Language (SQL) 23 Correlated and Uncorrelated Subqueries Correlated Subquery Evaluated subquery for every tuple of outer query Use of attribute from table bound in outer query inside subquery Uncorrelated Subquery Evaluate subquery just once No attribute correlations between subquery and outer query Query Unnesting (de-correlation) Rewrite during query compilation See lecture 08 Query Processing SELECT P.Fname, P.Lname FROM Professors P, WHERE NOT EXISTS( SELECT * FROM Courses C WHERE C.PID P.PID); SELECT P.Fname, P.Lname FROM Professors P, WHERE P.PID NOT IN( SELECT PID FROM Courses); [Thomas Neumann, Alfons Kemper: Unnesting Arbitrary Queries. BTW 2015] INF.01017UF Data Management / 706.010 Databases – 05 Query Languages (SQL) Matthias Boehm, Graz University of Technology, WS 2021/22
Structured Query Language (SQL) 24 Recursive Queries Approach WITH RECURSIVE name ( arguments ) Compose recursive table from non-recursive term, union all/distinct, and recursive term Terminates when recursive term yields empty result [https://xkcd.com/1739/] Example WITH RECURSIVE rPrereq(p,s) Courses(CID, Name), AS( Precond(pre REF CID, suc REF CID) (SELECT pre, suc Dependency graph (pre suc) FROM Precond WHERE 4 suc 5) 1 3 UNION DISTINCT (SELECT B.pre, B.suc 3 2 1 FROM Precond B, rPrereq R 4 2 WHERE B.suc R.p) 5 ) SELECT DISTINCT p FROM INF.01017UF Data Management / 706.010 Databases – 05 Query Languages (SQL) Matthias Boehm, Graz UniversityrPrereq of Technology, WS 2021/22
Structured Query Language (SQL) 25 Procedures and Functions Overview Procedures and Functions Stored programs, written in PL/pgSQL or other languages Control flow (loops, branches) and SQL queries CREATE PROCEDURE prepStud(a INT) (Stored) Procedures LANGUAGE PLPGSQL AS Can be called standalone via BEGIN CALL proc name ( args ); DELETE FROM Students; Procedures return no outputs INSERT INTO Students SELECT * FROM NewStudents; Functions END; ; CREATE FUNCTION Can be called standalone or sampleProp(FLOAT) inside queries RETURNS FLOAT Functions are value mappings AS 'SELECT 1 * (1 - 1);' Table functions can return sets LANGUAGE SQL; of records with multiple attributes INF.01017UF Data Management / 706.010 Databases – 05 Query Languages (SQL) Matthias Boehm, Graz University of Technology, WS 2021/22
Structured Query Language (SQL) 26 Triggers Overview Trigger Similar to stored procedure but register ON INSERT, DELETE, or UPDATE Allows complex check constraints and active behavior such as replication, auditing, etc (good and bad) Trigger CREATE TRIGGER triggername BEFORE AFTER INSTEAD OF Template INSERT DELETE (UPDATE OF column list ) ON tablename [REFERENCING old new alias list ] [FOR EACH {ROW STATEMENT}] [WHEN ( search condition )] SQL procedure statement Not supported in BEGIN ATOMIC PostgreSQL { SQL Procedure statement ;}. (need single UDF) END INF.01017UF Data Management / 706.010 Databases – 05 Query Languages (SQL) Matthias Boehm, Graz University of Technology, WS 2021/22 Event Condition Action
Structured Query Language (SQL) 27 Views and Authorization Creating Views CREATE VIEW TeamDM AS Create a logical table from a query SELECT * FROM Employee E, Employee M Inserts can be propagated back to WHERE E.MgrID M.EID base relations only in special cases AND M.login ‘mboehm’; Allows authorization for subset of tuples Access Permissions Tables/Views Grant query/modification rights on database objects for specific users, roles Revoke access rights from users, roles (recursively revoke permissions of dependent views via CASCADE) GRANT SELECT ON TABLE TeamDM TO mboehm; REVOKE SELECT ON TABLE TeamDM FROM mboehm; INF.01017UF Data Management / 706.010 Databases – 05 Query Languages (SQL) Matthias Boehm, Graz University of Technology, WS 2021/22
Structured Query Language (SQL) 28 Beware of SQL Injection Problematic SQL String Concatenation INSERT INTO Students (Lname, Fname) VALUES (‘“ @lname ”‘,’“ @fname ”’);”; Possible SQL-Injection Attack [https://xkcd.com/327/] INF.01017UF Data Management / 706.010 Databases – 05 Query Languages (SQL) Matthias Boehm, Graz University of Technology, WS 2021/22
29 Other Query Languages (XML, JSON) INF.01017UF Data Management / 706.010 Databases – 05 Query Languages (SQL) Matthias Boehm, Graz University of Technology, WS 2021/22
Other Query Languages (XML, JSON) 30 No really, why should I care? Semi-structured XML and JSON Self-contained documents for representing nested data Common data exchange formats without redundancy of flat files Human-readable formats often used for SW configuration Goals Awareness of XML and JSON as data models Query languages and embedded querying in SQL INF.01017UF Data Management / 706.010 Databases – 05 Query Languages (SQL) Matthias Boehm, Graz University of Technology, WS 2021/22
Other Query Languages (XML, JSON) 31 XML (Extensible Markup Language) XML Data Model Meta language to define specific exchange formats Document format for semi-structured data Well formedness XML schema / DTD ?xml version “1.0“ encoding “UTF-8“? data student id “1” course id “INF.01017UF” name “DM”/ course id “706.550” name “AMLS”/ /student student id “5” course id “706.520” name “DIA”/ /student /data XPath (XML Path Language) /data/student[@id ‘1’]/course/@name Query language for accessing collections of nodes of an XML document Axis specifies for ancestors, descendants, siblings, etc XSLT (XML Stylesheet Language Transformations) Schema mapping (transformation) language for XML documents XQuery INF.01017UF Data Management / 706.010 Databases – 05 Query Languages (SQL) Matthias Boehm, Graz University of Technology, WS 2021/22 “DM” “AMLS”
Other Query Languages (XML, JSON) 32 XML in PostgreSQL, cont. Overview XML in PostgreSQL Data types TEXT or XML (well-formed, type-safe operations) ISO/IEC 9075-14 XML-related specifications (SQL/XML) Creating XML INSERT INTO Students (Fname,Lname,Doc) VALUES(‘John’,’Smith’, Various built-in functions to parse documents, and create elements/attributes XMLPARSE( xml document ) XML type XMLELEMENT / XMLATTRIBUTES xmlparse( source doc )); Processing XML Execute XPath expressions on XML types XMLEXIST with XPath instead of XQuery XPATH with optional namespace handling SELECT Fname, Lname, xpath(‘/student/@id’,Doc) FROM Students INF.01017UF Data Management / 706.010 Databases – 05 Query Languages (SQL) Matthias Boehm, Graz University of Technology, WS 2021/22
Other Query Languages (XML, JSON) 33 JSON (JavaScript Object Notation) JSON Data Model Data exchange format for semi-structured data Not as verbose as XML (especially for arrays) Popular format (e.g., Twitter) Query Languages {“students:”[ {“id”: 1, “courses”:[ {“id“:“INF.01017UF”, “name“:“DM”}, {“id“:“706.550”, “name“:“AMLS”}]}, {“id”: 5, “courses”:[ {“id“:“706.520”, “name“:“DIA”}]}, ]} JSONiq Example: Most common: libraries for declare option jsoniq-version tree traversal and data extraction “ ”; JSONiq: XQuery-like query language for x in collection(“students”) where x.id lt 10 JSONPath: XPath-like query language let c : count( x.courses) return {“sid”: x.id, “count”: c} [http://www.jsoniq.org/docs/JSONiq/html-single/index.html] INF.01017UF Data Management / 706.010 Databases – 05 Query Languages (SQL) Matthias Boehm, Graz University of Technology, WS 2021/22
Other Query Languages (XML, JSON) 34 JSON in PostgreSQL, cont. Overview JSON in PostgreSQL Alternative data types: JSON (text), JSONB (binary, with restrictions) Implements RFC 7159, built-ins for conversion and access Creating JSON Built-in functions for creating JSON from tables and tables from JSON input Processing JSON SELECT row to json(t) FROM (SELECT Fname, Lname FROM Students) t SELECT Fname, Lname, Doc- students- id FROM Students Specialized operators for tree traversal and data extraction - operator: get JSON array element/object - operator: get JSON array element/object as text Built-in functions for extracting json (e.g., json each) INF.01017UF Data Management / 706.010 Databases – 05 Query Languages (SQL) Matthias Boehm, Graz University of Technology, WS 2021/22
35 Exercise 2: Query Languages and APIs https://mboehm7.github.io/teaching/ws2122 dbs/ 02 ExerciseQueriesAPIs.pdf Published: Nov 06, 2021 (data cleaning until Nov 09 publish date) Deadline: Nov 30, 2021 INF.01017UF Data Management / 706.010 Databases – 05 Query Languages (SQL) Matthias Boehm, Graz University of Technology, WS 2021/22
Course Organization 36 Exercises: Austrian National Elections Dataset Austrian National Elections 2017 / 2019 with results over time and Graz districts (still being cleaned/prepared Ex 02) Clone or download your copy from https://github.com/tugraz-isds/datasets.git Find CSV files in datasets /elections at www.offenewahlen.at/ www.data.gv.at Exercises 01 Data modeling (relational schema) 02 Data ingestion and SQL query processing 03 Physical design tuning, query processing, and transaction processing 04 Large-scale data analysis (distributed query processing and ML model training) INF.01017UF Data Management / 706.010 Databases – 05 Query Languages (SQL) Matthias Boehm, Graz University of Technology, WS 2021/22 New
Exercise 2: Query Languages and APIs 37 Task 2.1: Schema Creation via SQL (3/25 points) Schema creation via SQL Relies on lectures 04 Relational Algebra and 05 Query Languages (SQL) Setup DBMS PostgreSQL, and start pgAdmin (UI), or psql (terminal) Docker container w/ basic setup in next days Create database db studentID and setup relational schema, including primary keys, foreign keys, NOT NULL, UNIQUE Recommended Schema Feel free to use and submit the provided schema https://mboehm7.github.io/teaching/ws2122 dbs/CreateSchema.sql Partial Results CreateSchema.sql CREATE TABLE Locations( LKey INT PRIMARY KEY, LocationID VARCHAR(32) UNIQUE NOT NULL, Name VARCHAR(128) NOT NULL, ParentLKey INT REFERENCES Locations ); INF.01017UF Data Management / 706.010 Databases – 05 Query Languages (SQL) Matthias Boehm, Graz University of Technology, WS 2021/22
Exercise 2: Query Languages and APIs 38 Task 2.2 Data Ingestion via CLI (10/25 points) Data Ingestion Program via ODBC/JDBC Relies on lectures 05 Query Languages (SQL) and 06 APIs (ODBC, JDBC) Write a program that performs deduplication and data ingestion Programming language of your choosing (Python, Java, C#, C recommended) Data Ingestion Process Data: https://github.com/tugraz-isds/datasets/tree/master/elections at Invoke your ingestion program as follows script to compile and run ./runIngestData.sh ./Locations.csv ./Parties.csv ./Votes.csv \ ./Elections.csv host port database user password (e.g., localhost 5432 db1234567 postgres postgres) Partial Results Source code IngestData.*, and Data Management / 706.010 Databases – 05 Query Languages (SQL) Script INF.01017UF runIngestData.sh Matthias Boehm, Graz University of Technology, WS 2021/22
Exercise 2: Query Languages and APIs 39 Task 2.3: SQL Query Processing (10/25 points) SQL Query Processing Relies on lecture 05 Query Languages (SQL) Expected results: https://mboehm7.github.io/teaching/ws2122 dbs/Results.zip List of Queries Q01: What is the ID of location Graz(Stadt)? (return LocationID) Q02: Select all parties of the election NR2017. (return ShortName, LongName, Ballot Position, sorted ascending by Ballot Position with NULLs last) Q03: Compute the voter turnout rate (total-votes/eligible) for all districts of Graz(Stadt) in election NR2019. (return location name, turnout rate, sorted descending by turnout) Q04: Compute the top 10 locations of election NR2019 by voter turnout rate. (return name, turnout, sorted descending by turnout) Q05: Which parties from the election NR2019 did not participate in NR2017? (return ShortName, LongName, sorted ascending by ShortName) INF.01017UF Data Management / 706.010 Databases – 05 Query Languages (SQL) Matthias Boehm, Graz University of Technology, WS 2021/22
Exercise 2: Query Languages and APIs 40 Task 2.3: SQL Query Processing (10/25 points) List of Queries, cont. Q06: Compute the support (fraction of received votes) in NR2019 of all parties that received more than 4% of votes. (return ShortName, support; sorted descending by support) Q07: Find the parties that won (with highest support rate) at least one location in NR2019. (return ShortName, count of won locations, total Austrian support rate; sorted descending by won locations) Q08: Compare for each state of Österreich (e.g., Steiermark) the total number of votes with the sum of votes in all last-level child locations. (return the state name, total votes, sum of votes in child locations, difference in votes, sorted ascending by state name) Partial Results SQL Script for each query: Q01.sql, Q02.sql, , Q08.sql INF.01017UF Data Management / 706.010 Databases – 05 Query Languages (SQL) Matthias Boehm, Graz University of Technology, WS 2021/22
Exercise 2: Query Languages and APIs 41 Task 2.4: Query Plans (2/25 points) Explain Query Plans Relies on lecture 04 Relational Algebra and 05 Query Languages (SQL) Obtain and analyze execution plans of Q05 Example EXPLAIN VERBOSE SELECT L.location, count(*) FROM Participant P, Locale L WHERE P.lid L.lid GROUP BY L.location HAVING count(*) 1 "HashAggregate (.)" // grouping " Output: l.location, count(*)" " Group Key: l.location" " Filter: (count(*) 1)" // selection " - Hash Join (.)" // join " Output: l.location" // projection " Hash Cond: (l.lid p.lid)" " - Seq Scan on Locale l (.)" " Output: l.lid, l.location" " - Hash (.)" " Output: p.lid" // projection " - Seq Scan on Participant p (.)" " Output: p.lid" INF.01017UF Data Management / 706.010 Databases – 05 Query Languages (SQL) Matthias Boehm, Graz University of Technology, WS 2021/22
42 Conclusions and Q&A Summary History and fundamentals of the Structured Query Language (SQL) Awareness of XML and JSON (data model and querying) Exercise Submissions Exercise 1: Nov 02 7 late days, grading in progress Exercise 2: Nov 30, published Nov 09 (preview Nov 06) Next Lectures (Part A) 06 APIs (ODBC, JDBC, OR frameworks) [Nov 15] 07 Physical Design and Tuning [Nov 22] 08 Query Processing [Nov 29] 09 Transaction Processing and Concurrency [Dec 06] INF.01017UF Data Management / 706.010 Databases – 05 Query Languages (SQL) Matthias Boehm, Graz University of Technology, WS 2021/22