Current Trends in Data Security
Dan Suciu
Joint work with Gerome Miklau

Data Security
Dorothy Denning, 1982:
“Data Security is the science and study of methods of protecting data (...) from unauthorized disclosure and modification”
Data Security = Confidentiality + Integrity

Data Security
Distinct from systems and network security
– Assumes these are already secure
Tools:
– Cryptography, information theory, statistics, ...
Applications:
– An enabling technology

Outline
Traditional data security
Two attacks
Data security research today
Conclusions

Traditional Data Security
Security in SQL
Access control
Views
Security in statistical databases
Theory

Access Control in SQL [Griffiths&Wade'76, Fagin'78]
  GRANT privileges ON object TO users [WITH GRANT OPTION]
    privileges = SELECT | INSERT | DELETE | ...
    object = table | attribute
  REVOKE privileges ON object FROM users [CASCADE]

Views in SQL
A SQL View = (almost) any SQL query
Typically used as:
  CREATE VIEW pmpStudents AS
    SELECT * FROM Students WHERE ...
  GRANT SELECT ON pmpStudents TO DavidRispoli

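A minimal sketch of the view idiom above, using Python's built-in sqlite3 module. The columns of Students and the predicate program = 'pmp' are assumptions made for illustration, since the slide elides the WHERE condition. SQLite has no GRANT statement, so only the view itself is shown; in a multi-user DBMS the GRANT on the view would follow.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical schema and rows; the slide does not give them.
    CREATE TABLE Students (name TEXT, program TEXT, gpa REAL);
    INSERT INTO Students VALUES
        ('Ann', 'pmp', 3.7),
        ('Bob', 'phd', 3.9),
        ('Cy',  'pmp', 3.2);

    -- The view exposes only the rows matching the (assumed) predicate.
    CREATE VIEW pmpStudents AS
        SELECT * FROM Students WHERE program = 'pmp';
""")

# A user restricted to the view never sees Bob's row.
pmp_rows = conn.execute("SELECT name FROM pmpStudents ORDER BY name").fetchall()
print(pmp_rows)
```

Granting SELECT on the view rather than on Students is what turns the view into an access-control device.
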
Summary of SQL Security
Limitations:
– No row-level access control
– Table creator owns the data: that’s unfair!
Access control: great success story of the DB community...
... or spectacular failure:
– Only 30% assign privileges to users/roles
– And then to protect entire tables, not columns

Summary (cont) [Rosenthal&Winslett’2004]
Most policies in middleware: slow, error prone:
– SAP has 10**4 tables
– GTE over 10**5 attributes
– A brokerage house has 80,000 applications
– A US government entity thinks that it has 350K
Today the database is not at the center of the policy administration universe

Security in Statistical DBs [Adam&Wortmann’89]
Goal:
– Allow arbitrary aggregate SQL queries
– Hide confidential data
Not OK:
  SELECT name
  FROM Patients
  WHERE age = 42 and sex = 'M' and diagnostic = 'schizophrenia'
OK:
  SELECT count(*)
  FROM Patients
  WHERE age = 42 and sex = 'M' and diagnostic = 'schizophrenia'

Security in Statistical DBs [Adam&Wortmann’89]
What has been tried:
Query restriction
– Query-size control, query-set overlap control, query monitoring
– None is practical
Data perturbation
– Most popular: cell combination, cell suppression
– Other methods, for continuous attributes: may introduce bias
Output perturbation
– For continuous attributes only

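One way to see why query-size control alone falls short is a differencing attack: two permitted aggregate queries whose difference isolates one individual. The sketch below uses sqlite3; the rows, the threshold MIN_QUERY_SET_SIZE, and the helper guarded_count are all hypothetical, and the WHERE clause is interpolated as text only because this is a toy statistical interface.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical patients; column names follow the slide.
    CREATE TABLE Patients (name TEXT, age INTEGER, sex TEXT, diagnostic TEXT);
    INSERT INTO Patients VALUES
        ('p1', 42, 'M', 'schizophrenia'),
        ('p2', 42, 'M', 'flu'),
        ('p3', 35, 'F', 'flu'),
        ('p4', 51, 'M', 'asthma'),
        ('p5', 42, 'F', 'flu');
""")

MIN_QUERY_SET_SIZE = 2  # assumed threshold for query-size control


def guarded_count(where_clause):
    """Answer count(*) only if enough rows match (query-size control)."""
    sql = f"SELECT count(*) FROM Patients WHERE {where_clause}"
    n = conn.execute(sql).fetchone()[0]
    if n < MIN_QUERY_SET_SIZE:
        raise PermissionError("query set too small")
    return n


# The attacker knows that this predicate identifies exactly one patient.
target = "age = 51 AND sex = 'M'"

# Asking directly is refused because only one row matches.
try:
    guarded_count(f"{target} AND diagnostic = 'asthma'")
    direct_allowed = True
except PermissionError:
    direct_allowed = False

# Two large-enough queries whose difference isolates the target.
all_men = guarded_count("sex = 'M'")
men_except = guarded_count(
    f"sex = 'M' AND NOT ({target} AND diagnostic = 'asthma')"
)
leaked = all_men - men_except  # 1 means the target has the diagnosis

print(direct_allowed, leaked)
```

Query-set overlap control and query monitoring are attempts to block this kind of combination, at the cost the slide notes.
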
Summary on Security in Statistical DB
Original goal seems impossible to achieve
Cell combination/suppression are popular, but do not allow arbitrary queries

SQL Injection [Chris Anley, Advanced SQL Injection In SQL]
Your health insurance company lets you see the claims online:
First login:
  User: fred
  Password: ********
Now search through the claims:
  Search claims by: Dr. Lee
  SELECT ... FROM ... WHERE doctor = 'Dr. Lee' and patientID = 'fred'

SQL Injection
Now try this:
  Search claims by: Dr. Lee' OR patientID = 'suciu'; --
  ... WHERE doctor = 'Dr. Lee' OR patientID = 'suciu'; --' and patientID = 'fred'
Better:
  Search claims by: Dr. Lee' OR 1 = 1; --

SQL Injection
When you’re done, do this:
  Search claims by: Dr. Lee'; DROP TABLE Patients; --

SQL Injection
The DBMS works perfectly. So why is SQL injection possible so often?
Quick answer:
– Poor programming: use stored procedures!
Deeper answer:
– Move policy implementation from apps to DB

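The attack from the preceding slides can be reproduced in a few lines with sqlite3. This is a sketch under assumptions: the Claims table, its rows, and both search functions are hypothetical. The fix shown is a parameterized query, which is a different technique from the stored procedures the slide recommends, though both keep user input out of the SQL text. The semicolon in the slide's attack string is dropped here because sqlite3's execute() accepts only a single statement.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical claims data.
    CREATE TABLE Claims (doctor TEXT, patientID TEXT, amount REAL);
    INSERT INTO Claims VALUES
        ('Dr. Lee', 'fred',  120.0),
        ('Dr. Lee', 'suciu', 300.0),
        ('Dr. Kim', 'suciu',  80.0);
""")


def search_vulnerable(doctor, patient_id):
    """Builds the query by string concatenation: injectable."""
    sql = ("SELECT doctor, patientID, amount FROM Claims "
           f"WHERE doctor = '{doctor}' AND patientID = '{patient_id}'")
    return conn.execute(sql).fetchall()


def search_parameterized(doctor, patient_id):
    """Passes user input as bound parameters: treated as data, not SQL."""
    sql = ("SELECT doctor, patientID, amount FROM Claims "
           "WHERE doctor = ? AND patientID = ?")
    return conn.execute(sql, (doctor, patient_id)).fetchall()


# The search-box input from the slide (without the semicolon).
attack = "Dr. Lee' OR patientID = 'suciu' --"

leaked = search_vulnerable(attack, "fred")   # the comment cuts off the fred check
safe = search_parameterized(attack, "fred")  # no doctor has that literal name

print(len(leaked), safe)
```

In the vulnerable version the trailing -- comments out the patientID = 'fred' condition, so fred reads another patient's claims.
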
Latanya Sweeney’s Finding
In Massachusetts, the Group Insurance Commission (GIC) is responsible for purchasing health insurance for state employees
GIC has to publish the data:
  GIC(zip, dob, sex, diagnosis, procedure, ...)

Latanya Sweeney’s Finding
Sweeney paid $20 and bought the voter registration list for Cambridge, Massachusetts:
  GIC(zip, dob, sex, diagnosis, procedure, ...)
  VOTER(name, party, ..., zip, dob, sex)

Latanya Sweeney’s Finding
GIC and VOTER share the attributes zip, dob, sex
William Weld (former governor) lives in Cambridge, hence is in VOTER
6 people in VOTER share his dob
Only 3 of them were men (same sex)
Weld was the only one in that zip
Sweeney learned Weld’s medical records!

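The linkage itself is just a join on the shared attributes (zip, dob, sex). A minimal sketch follows; every record, name, and value in it is made up for illustration, and the function link is a hypothetical helper, not Sweeney's procedure.

```python
from collections import defaultdict

QUASI_ID = ("zip", "dob", "sex")  # attributes shared by both tables

# Entirely fabricated toy records.
voter = [
    {"name": "Voter A", "zip": "00001", "dob": "1950-01-01", "sex": "M"},
    {"name": "Voter B", "zip": "00001", "dob": "1950-01-01", "sex": "F"},
    {"name": "Voter C", "zip": "00002", "dob": "1950-01-01", "sex": "M"},
    {"name": "Voter D", "zip": "00003", "dob": "1960-02-02", "sex": "F"},
    {"name": "Voter E", "zip": "00003", "dob": "1960-02-02", "sex": "F"},
]
gic = [
    {"zip": "00001", "dob": "1950-01-01", "sex": "M", "diagnosis": "diagnosis-1"},
    {"zip": "00003", "dob": "1960-02-02", "sex": "F", "diagnosis": "diagnosis-2"},
]


def link(gic_rows, voter_rows):
    """Return {name: diagnosis} for medical rows matching exactly one voter."""
    names_by_key = defaultdict(list)
    for v in voter_rows:
        names_by_key[tuple(v[a] for a in QUASI_ID)].append(v["name"])

    reidentified = {}
    for rec in gic_rows:
        names = names_by_key.get(tuple(rec[a] for a in QUASI_ID), [])
        if len(names) == 1:  # a unique match pins the record to one person
            reidentified[names[0]] = rec["diagnosis"]
    return reidentified


reidentified = link(gic, voter)
print(reidentified)
```

The second medical record matches two voters and stays ambiguous, which is the intuition behind requiring every combination of these attributes to be shared by several people.
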
Latanya Sweeney’s Finding
All systems worked as specified, yet important data has leaked
How do we protect against that?
Some of today’s research in data security addresses breaches that happen even if all systems work correctly

Summary on Attacks
SQL injection:
A correctness problem:
– Security policy implemented poorly in the application
Sweeney’s finding:
Beyond correctness:
– Leakage occurred when all systems work as specified

Research Topics in Data Security
Rest of the talk:
Information Leakage
Privacy
Fine-grained access control
Data encryption
Secure shared computation

Information Leakage: k-Anonymity [Samarati&Sweeney’98, Meyerson&Williams’04]
Definition: each tuple is equal to at least k-1 others
Anonymizing: through suppression and generalization
  First            Last            Age            Race
  Harry -> *       Stone           34 -> 30-50    Afr-Am
  John             Reyser -> R*    36 -> 20-40    Cauc -> *
  Beatrice -> *    Stone           47 -> 30-50    Afr-Am
  John             Ramos -> R*     22 -> 20-40    Hisp -> *
Hard: NP-complete for suppression only
Approximations exist

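The definition translates directly into a check: group identical tuples and require every group to have at least k members. The sketch below is an assumption-laden illustration; the rows are one reading of the slide's example table before and after generalization, and is_k_anonymous is a hypothetical helper name.

```python
from collections import Counter


def is_k_anonymous(rows, k):
    """True if every tuple is equal to at least k-1 other tuples."""
    counts = Counter(tuple(r) for r in rows)
    return all(c >= k for c in counts.values())


# Assumed reading of the slide's table: (First, Last, Age, Race).
original = [
    ("Harry",    "Stone",  "34", "Afr-Am"),
    ("John",     "Reyser", "36", "Cauc"),
    ("Beatrice", "Stone",  "47", "Afr-Am"),
    ("John",     "Ramos",  "22", "Hisp"),
]

# After suppression (*) and generalization (R*, age ranges).
generalized = [
    ("*",    "Stone", "30-50", "Afr-Am"),
    ("John", "R*",    "20-40", "*"),
    ("*",    "Stone", "30-50", "Afr-Am"),
    ("John", "R*",    "20-40", "*"),
]

print(is_k_anonymous(original, 2), is_k_anonymous(generalized, 2))
```

Checking k-anonymity is cheap; the NP-completeness mentioned on the slide concerns finding an anonymization that suppresses as little as possible.
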
Information Leakage: Query-view Security [Miklau&S’04, Miklau&Dalvi&S’05, Yang&Li’04]
Have data:
  TABLE Employee(name, dept, phone)
  Secret Query                   View(s)                          Disclosure?
  S(name)                        V(name,phone)                    total
  S(name,phone)                  V1(name,dept), V2(dept,phone)    big
  S(name)                        V(dept)                          tiny
  S(name) where dept = 'HR'      V(name) where dept = 'RD'        none

Summary on Information Disclosure
The theoretical research:
– Exciting new connections between databases and information theory, probability theory, cryptography [Abadi&Warinschi’05]
The applications:
– many years away

Privacy
“Is the right of individuals to determine for themselves when, how and to what extent information about them is communicated to others” [Agrawal’03]
More complex than confidentiality

Privacy
Involves:
– Data
– Owner
– Requester
– Purpose
– Consent
Example: Alice gives her email to a web service
Privacy policy: P3P

Hippocratic Databases [Agrawal’03, LeFevre’04]
DB support for implementing privacy policies:
– Purpose specification
– Consent
– Limited use
– Limited retention
Privacy policy: P3P
Protection against:
– Sloppy organizations
– Malicious organizations

Privacy for Paranoids [Aggarwal’04]
Idea: rely on trusted agents
[Figure: the user's email address reaches the organization only through an agent]
Protection against:
– Sloppy organizations
– Malicious attackers
Open question: foreign keys?

Summary on Privacy
Major concern in industry
– Legislation
– Consumer demand
Challenge:
– How to enforce an organization’s stated policies

Fine-grained Access Control
Control access at the tuple level.
Policy specification languages
Implementation

Policy Specification Language
No standard, but usually based on parameterized views.
  CREATE AUTHORIZATION VIEW PatientsForDoctors AS
    SELECT Patient.*
    FROM Patient, Doctor
    WHERE Patient.doctorID = Doctor.ID
      and Doctor.login = %currentUser
%currentUser is a context parameter

Implementation
The user's query:
  SELECT Patient.name, Patient.age
  FROM Patient
  WHERE Patient.disease = 'flu'
is rewritten against the authorization view:
  SELECT Patient.name, Patient.age
  FROM Patient, Doctor
  WHERE Patient.disease = 'flu'
    and Patient.doctorID = Doctor.ID
    and Doctor.login = %currentUser
e.g. Oracle

Two Semantics [Rizvi’04]
The Truman model = filter semantics
– transform reality
– ACCEPT all queries
– REWRITE queries
– Sometimes misleading results
    SELECT count(*)
    FROM Patients
    WHERE disease = 'flu'
The non-Truman model = deny semantics
– reject queries
– ACCEPT or REJECT queries
– Execute query UNCHANGED
– May define multiple security views for a user

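A rough sketch of filter (Truman) semantics with sqlite3: every query is accepted, but the base table is silently replaced by the slice the current user is authorized to see. The schema, rows, and the helper run_as are hypothetical, and the textual replace() is a deliberately naive stand-in for a real query rewriter.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical doctors and patients.
    CREATE TABLE Doctor (ID INTEGER, login TEXT);
    CREATE TABLE Patient (name TEXT, age INTEGER, disease TEXT, doctorID INTEGER);
    INSERT INTO Doctor VALUES (1, 'lee'), (2, 'kim');
    INSERT INTO Patient VALUES
        ('a', 30, 'flu',  1),
        ('b', 40, 'flu',  2),
        ('c', 50, 'flu',  2),
        ('d', 60, 'cold', 1);
""")

# The authorization view from the slides, with %currentUser as a bound parameter.
AUTHORIZED_PATIENT = (
    "(SELECT Patient.* FROM Patient, Doctor "
    "WHERE Patient.doctorID = Doctor.ID AND Doctor.login = :currentUser) AS Patient"
)


def run_as(user, query):
    """Truman-style execution: rewrite the query, then run it for `user`."""
    rewritten = query.replace("FROM Patient", "FROM " + AUTHORIZED_PATIENT)
    return conn.execute(rewritten, {"currentUser": user}).fetchall()


flu_count = "SELECT count(*) FROM Patient WHERE disease = 'flu'"

true_count = conn.execute(flu_count).fetchone()[0]  # unrestricted answer
lee_count = run_as("lee", flu_count)[0][0]          # what doctor lee is told
kim_count = run_as("kim", flu_count)[0][0]          # what doctor kim is told

print(true_count, lee_count, kim_count)
```

Each doctor gets a different answer to the same count(*) query with no indication that rows were filtered, which is the misleading-result problem the slide attributes to the Truman model. Under deny semantics the query would instead be rejected unless it could be answered from the user's views.
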
Summary of Fine Grained Access Control [Rosenthal&Winslett’2004]
Trend in industry: label-based security
Killer app: application hosting
– Independent franchises share a single table at headquarters (e.g., Holiday Inn)
– Application runs under requester’s label, cannot see other labels
– Headquarters runs Read queries over them
Oracle’s Virtual Private Database

Data Encryption for Publishing
Scientist wants to publish medical research data on the Web
Users and their keys:
– All authorized users: Kuser
– Patient: Kpat
– Doctor: Kdr
– Nurse: Knu
– Administrator: Kadmin
Complex policies:
– Doctor researchers may access trials
– Nurses may access diagnostic
– Etc.
What is the encryption granularity?

Data Encryption for Publishing [Miklau&S.’03]
An XML tree protection:
– Doctor: Kuser, Kdr
– Nurse: Kuser, Knu
– Nurse admin: Kuser, Knu, Kadm
[Figure: XML tree for a patient record (name: JoeDoe, age: 28, address: Seattle, privateData, diagnostic: flu, drug: Tylenol, trial, placebo: Candy); subtrees are guarded by key expressions over Kuser, Kpat, Kdr, Knu, Kadm, Kmaster]

Summary on Data Encryption
Industry:
– Supported by all vendors: Oracle, DB2, SQL-Server
– Efficiency issues still largely unresolved
Research:
– Hard theoretical security analysis [Abadi&Warinschi’05]

Secure Shared Processing
Alice has a database DB_A
Bob has a database DB_B
How can they compute Q(DB_A, DB_B) without revealing their data?
Long history in cryptography
Some database queries are easier than the general case

Secure Shared Processing [Agrawal’03]
Task: find intersection without revealing the rest
  Alice: a b c d
  Bob: c d e
Compute one-way hash:
  Alice: h(a) h(b) h(c) h(d)
  Bob: h(c) h(d) h(e)
Exchange:
  Alice receives: h(c) h(d) h(e)
  Bob receives: h(a) h(b) h(c) h(d)
What’s wrong?

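One well-known weakness of the plain-hash protocol is a dictionary attack: when the values come from a small or guessable domain, the receiver can hash every candidate and recover the other party's whole set, not only the intersection. A sketch with hashlib follows; SHA-256 and the 26-letter domain are assumptions chosen for illustration.

```python
import hashlib


def h(x):
    """One-way hash of a string value (SHA-256 chosen for illustration)."""
    return hashlib.sha256(x.encode()).hexdigest()


alice = {"a", "b", "c", "d"}
bob = {"c", "d", "e"}

# Each party publishes the hashes of its own values.
alice_hashes = {h(x) for x in alice}
bob_hashes = {h(x) for x in bob}

# Intended use: Alice finds the intersection by matching hashes.
common = {x for x in alice if h(x) in bob_hashes}

# The problem: Alice can also hash every value in a small domain
# and learn everything Bob holds, including 'e'.
domain = "abcdefghijklmnopqrstuvwxyz"
recovered = {x for x in domain if h(x) in bob_hashes}

print(sorted(common), sorted(recovered))
```

Nothing secret goes into h, so anyone who can enumerate the domain can invert it by brute force.
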
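Secure Shared Processing [Agrawal’03]
Commutative encryption: h(x) = EA(EB(x)) = EB(EA(x))
  Alice: a b c d
  Bob: c d e
Each party encrypts its own values:
  Alice: EA(a) EA(b) EA(c) EA(d)
  Bob: EB(c) EB(d) EB(e)
Exchange:
  Alice receives: EB(c) EB(d) EB(e)
  Bob receives: EA(a) EA(b) EA(c) EA(d)
Each party encrypts what it received:
  Alice applies EA: h(c) h(d) h(e)
  Bob applies EB: h(a) h(b) h(c) h(d)
Exchange again and compare:
  h(a) h(b) h(c) h(d) against h(c) h(d) h(e)
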
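A toy sketch of the commutative-encryption protocol, assuming modular exponentiation as the commutative cipher: E_k(x) = x^k mod P, so applying two keys in either order gives the same result. The prime P, the helper names, and the SHA-256 mapping of values into the group are all assumptions for illustration, and the parameters are not meant to be secure.

```python
import hashlib
import math
import secrets

P = 2**127 - 1  # a Mersenne prime, used here only as a toy modulus


def keygen():
    """Pick a secret exponent coprime to P-1 so that encryption is a bijection."""
    while True:
        k = secrets.randbelow(P - 3) + 2
        if math.gcd(k, P - 1) == 1:
            return k


def to_group(x):
    """Map a string value to an integer in 1..P-1 (assumed encoding)."""
    digest = hashlib.sha256(x.encode()).digest()
    return int.from_bytes(digest, "big") % (P - 1) + 1


def enc(key, value):
    """Commutative cipher: enc(a, enc(b, v)) == enc(b, enc(a, v))."""
    return pow(value, key, P)


alice, bob = ["a", "b", "c", "d"], ["c", "d", "e"]
ka, kb = keygen(), keygen()  # each key stays with its owner

# Each party encrypts its own values and sends them over.
alice_once = [enc(ka, to_group(x)) for x in alice]
bob_once = [enc(kb, to_group(y)) for y in bob]

# Each party encrypts what it received with its own key.
alice_twice = [enc(kb, v) for v in alice_once]  # computed by Bob, returned in order
bob_twice = {enc(ka, v) for v in bob_once}      # computed by Alice

# Alice matches doubly encrypted values; she never sees Bob's plaintexts.
intersection = {x for x, v in zip(alice, alice_twice) if v in bob_twice}

print(sorted(intersection))
```

Unlike the plain hash, the doubly encrypted value depends on both secret keys, so neither party can run a dictionary attack on its own.
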
Summary on Secure Shared Processing
Secure intersection, joins, data mining
But are there other examples?

Conclusions
Traditional data security confined to one server
– Security in SQL
– Security in statistical databases
Attacks possible due to:
– Poor implementation of security policies: SQL injection
– Unintended information leakage in published data

Conclusions
State of the industry:
– Data security policies: scattered throughout applications
– Database no longer center of the security universe
– Needed: automatic means to translate complex policies into physical implementations
State of research: data security in global data sharing
– Information leakage, privacy, secure computations, etc.
– Database research community has an increased appetite for cryptographic techniques

Questions?