COE426: Data Privacy Lecture 13: Secure Data Outsourcing

Outline
- Cloud computing challenges
- Data outsourcing
- Distributed machine learning
- Secure/private platforms for distributed machine learning

What is Cloud Computing?
- A new computing paradigm that relies on sharing computing resources rather than having local servers or personal devices handle applications
- Cloud computing involves data and/or computation outsourcing, with:
  - Seemingly infinite and elastic resource scalability
  - The ability to quickly scale a service in or out
  - On-demand, "just-in-time" provisioning
  - No upfront cost: pay-as-you-go
- Provides different "X-as-a-service" models: IaaS, PaaS, SaaS, FaaS, MLaaS

Cloud Challenges
- Clouds are still subject to traditional data confidentiality, integrity, availability, and privacy issues, plus some additional attacks

Anatomy of Fear
- Confidentiality
  - Will the sensitive data stored on a cloud remain confidential?
  - Will cloud compromises leak confidential client data (i.e., fear of loss of control over data)?
  - Will the cloud provider itself be honest and not peek into the data?
- Integrity
  - How do I know that the cloud provider is doing the computations correctly?
  - How do I ensure that the cloud provider really stored my data without tampering with it?
- Availability
  - Will critical systems go down at the client if the provider is hit by a Denial of Service attack?
  - What happens if the cloud provider goes out of business?
- Privacy issues raised via massive machine learning
  - The cloud now stores data from a lot of clients, and can run machine learning algorithms to get large amounts of information on clients

Ensuring Confidentiality/Privacy of Outsourced Data
- Most types of computation require decrypting the data before anything can be computed on it
- If the cloud provider is not trusted, this may result in a breach of confidentiality
- How can we ensure the confidentiality of data and computations in a cloud?
- Existing approaches: homomorphic encryption, trusted cloud computing platforms, secure multiparty computation
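
To make the homomorphic encryption approach concrete, here is a minimal sketch of an additively homomorphic scheme (Paillier) in Python. It is an illustration under stated assumptions, not something taken from the slides: the choice of Paillier, the function names, and the tiny hard-coded primes are all hypothetical and far too small to be secure. The point is only that an untrusted server can add, or scale by a constant, values it cannot read.

```python
import math
import secrets

def keygen(p=1789, q=1861):
    """Toy Paillier key pair from two tiny primes (insecure; illustration only)."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p - 1, q - 1)
    mu = pow(lam, -1, n)  # modular inverse (Python 3.8+); valid because g = n + 1
    return n, (lam, mu)   # public key n, private key (lam, mu)

def encrypt(n, m):
    """Encrypt an integer 0 <= m < n under the public key n."""
    n2 = n * n
    while True:
        r = secrets.randbelow(n - 1) + 1  # random r in [1, n - 1]
        if math.gcd(r, n) == 1:
            break
    return (pow(n + 1, m, n2) * pow(r, n, n2)) % n2

def decrypt(n, priv, c):
    """Recover the plaintext from ciphertext c using the private key."""
    lam, mu = priv
    u = pow(c, lam, n * n)
    return ((u - 1) // n * mu) % n

def add_encrypted(n, c1, c2):
    """Multiplying ciphertexts yields an encryption of the plaintext sum (mod n)."""
    return (c1 * c2) % (n * n)

def scale_encrypted(n, c, k):
    """Raising a ciphertext to k yields an encryption of k times the plaintext (mod n)."""
    return pow(c, k, n * n)

# The data owner encrypts; the cloud combines ciphertexts without the private key.
n, priv = keygen()
c_sum = add_encrypted(n, encrypt(n, 20), encrypt(n, 22))
c_scaled = scale_encrypted(n, encrypt(n, 7), 6)
print(decrypt(n, priv, c_sum), decrypt(n, priv, c_scaled))
```

A scheme like this supports only additions and multiplication by known constants; arbitrary computation on ciphertexts needs fully homomorphic encryption, which is considerably more expensive.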

Data Outsourcing
- The data owner outsources its data and processing functionality to a cloud in order to reduce management cost and data storage overhead
- The data owner may be a third party (e.g., hospitals, government bodies) or the individual him or herself
- Data outsourcing is necessary to make full use of huge amounts of data
- Data outsourcing is required in applications like:
  - Distributed data mining
  - Federated machine learning
[Figure: inputs x1, x2, x3, x4, ..., xn feeding a function f(x1, x2, ..., xn)]

Distributed Data Mining
- Government / public agencies. Example:
  - The Centers for Disease Control want to identify disease outbreaks
  - Insurance companies have data on disease incidents, seriousness, patient background, etc.
  - But can/should they release this information?
- Industry collaborations / trade groups. Example:
  - An industry trade group may want to identify best practices to help members
  - But some practices are trade secrets
  - How do we provide "commodity" results to all (manufacturing using chemical supplies from supplier X has high failure rates), while still preserving secrets (manufacturing process Y gives low failure rates)?

Approaches to Preserve Privacy
- Restrict access to data (protect individual records)
- Protect both the data and its source:
  - Secure multi-party computation (SMC)
  - Input data randomization
- There is no single solution that fits all purposes
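
As one possible illustration of input data randomization, the sketch below uses randomized response on a single yes/no attribute. The slides do not name a specific randomization method, so this technique, the function names, and the parameter `p_truth` are assumptions made for the example. Each individual perturbs their own answer before releasing it, yet the collector can still estimate the overall proportion.

```python
import random

def randomize(true_bit, p_truth, rng):
    """Report the true bit with probability p_truth, otherwise a fair coin flip."""
    if rng.random() < p_truth:
        return true_bit
    return 1 if rng.random() < 0.5 else 0

def estimate_proportion(reports, p_truth):
    """Invert the known noise: E[observed] = p_truth * pi + (1 - p_truth) * 0.5."""
    observed = sum(reports) / len(reports)
    return (observed - (1 - p_truth) * 0.5) / p_truth

rng = random.Random(0)  # fixed seed so the demo is repeatable
truth = [1 if rng.random() < 0.3 else 0 for _ in range(100_000)]
reports = [randomize(bit, 0.5, rng) for bit in truth]
print(round(estimate_proportion(reports, 0.5), 3))  # close to the true rate of about 0.3
```

Lowering `p_truth` gives each individual more deniability but increases the variance of the estimate, which is one reason no single setting suits every purpose.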

Application of SMC to Private Machine Learning
- Setting
  - Data is distributed at different sites
  - These sites may be third parties (e.g., hospitals, government bodies) or individuals
- Aim
  - Compute the machine learning algorithm on the data so that nothing but the output is learned
  - That is, carry out a secure computation

Privacy Preserving Machine Learning Toolkit
- Many different machine learning techniques perform similar computations at various stages (e.g., computing a sum, counting the number of items)
- Toolkit
  - Simple computations: sum, union, intersection
  - Assemble them to solve specific mining tasks, e.g., association rule mining, Bayes classifier
- The protocols may not be fully secure, but they are more efficient than traditional SMC methods
- Source: Tools for Privacy Preserving Data Mining, Clifton, 2002
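
To show how a toolkit primitive can be assembled into a mining step, the sketch below checks whether an itemset is globally frequent across several sites without pooling their raw counts. It rests on assumptions not stated in the slides: the sum is computed with additive secret sharing, the names `share`, `secret_shared_sum`, and `globally_frequent` are hypothetical, and the final comparison is done in the clear (a complete protocol would use a secure comparison so that the total itself stays hidden).

```python
import secrets

MODULUS = 2**61 - 1  # public modulus, assumed far larger than any possible total

def share(value, n_parties):
    """Split value into n_parties random shares that add up to it modulo MODULUS."""
    shares = [secrets.randbelow(MODULUS) for _ in range(n_parties - 1)]
    shares.append((value - sum(shares)) % MODULUS)
    return shares

def secret_shared_sum(values):
    """Simulate every site sharing its value and the sites combining partial sums."""
    n = len(values)
    all_shares = [share(v, n) for v in values]  # row i holds the shares made by site i
    # Site j adds up the j-th share from every site and publishes only that partial sum.
    partials = [sum(all_shares[i][j] for i in range(n)) % MODULUS for j in range(n)]
    total = sum(partials) % MODULUS
    # Map back to a signed integer so negative totals are represented correctly.
    return total - MODULUS if total > MODULUS // 2 else total

def globally_frequent(local_counts, local_sizes, min_support_pct):
    """An itemset is globally frequent if the summed excess support is non-negative."""
    excess = [100 * c - min_support_pct * s for c, s in zip(local_counts, local_sizes)]
    return secret_shared_sum(excess) >= 0

# Three sites: the itemset appears in 75 of 600 transactions overall (12.5%).
print(globally_frequent([30, 5, 40], [100, 200, 300], 10))  # True
print(globally_frequent([30, 5, 40], [100, 200, 300], 15))  # False
```

Each site contributes only its excess support (local count minus the threshold times its database size), so a site where the itemset is rare is not exposed as long as the global total clears the threshold.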

Primitive Protocols
Secure functions:
- Secure sum: sum two or more values without revealing each value
- Secure multiplication: multiply two or more values without revealing each value
- Secure comparison: compare two integers without revealing the integer values
- Secure set intersection: party $A$ has set $S_A$ and party $B$ has set $S_B$; the goal is to calculate $S_A \cap S_B$ without revealing anything else
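
The secure sum primitive can be sketched as a ring protocol: the first site hides its value behind a random mask, each site adds its own value to the running total it receives, and the first site removes the mask at the end. The simulation below runs all sites in one process; the function name and the returned message list are hypothetical, and it assumes semi-honest, non-colluding sites and a true sum smaller than `modulus`.

```python
import secrets

def secure_sum_ring(values, modulus):
    """Simulate a ring-based secure sum; the true sum must lie in [0, modulus)."""
    mask = secrets.randbelow(modulus)       # random mask known only to site 0
    running = (mask + values[0]) % modulus  # site 0 sends this masked value to site 1
    messages = [running]                    # what travels on the wire, hop by hop
    for v in values[1:]:
        running = (running + v) % modulus   # each site adds its value and forwards
        messages.append(running)
    total = (running - mask) % modulus      # site 0 strips its mask from the final message
    return total, messages

total, messages = secure_sum_ring([10, 20, 30], modulus=1000)
print(total)  # 60
```

Every message a site sees is uniformly distributed modulo `modulus`, so it reveals nothing on its own. The protocol is not collusion-resistant, though: the neighbours on either side of a site can subtract what they sent and received to recover that site's value.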

Primitive Protocols
Secure functions:
- Secure set union: party $A$ has set $S_A$ and party $B$ has set $S_B$; the goal is to calculate $S_A \cup S_B$ without revealing anything else
- Secure dot product: party $A$ has a vector $\mathbf{x}$ and party $B$ has a vector $\mathbf{y}$; the goal is to calculate $\mathbf{x} \cdot \mathbf{y}$ without revealing anything else
- Secure polynomial evaluation: party $A$ has a polynomial $P$ and party $B$ has a value $z$; the goal is to calculate $P(z)$ without revealing $P$ or $z$
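
One common way to build the set primitives is commutative encryption: if both parties encrypt every item under both keys, equal items collide regardless of the order of encryption, so the doubly encrypted values can be compared without exposing the items. The sketch below is a single-process simulation under assumptions not taken from the slides: it uses Pohlig-Hellman style exponentiation modulo the Mersenne prime 2^127 - 1, hashes items into the group with SHA-256, and all function names are hypothetical. It is not hardened for real use.

```python
import hashlib
import math
import secrets

P = 2**127 - 1  # public prime modulus shared by both parties (toy parameter)

def keygen():
    """Pick a secret exponent coprime to P - 1 so that encryption is a bijection."""
    while True:
        k = secrets.randbelow(P - 3) + 2  # k in [2, P - 2]
        if math.gcd(k, P - 1) == 1:
            return k

def to_group(item):
    """Hash a string item to an integer in [1, P - 1]."""
    digest = hashlib.sha256(item.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (P - 1) + 1

def enc(x, key):
    """Commutative encryption: enc(enc(x, a), b) == enc(enc(x, b), a)."""
    return pow(x, key, P)

def private_set_ops(items_a, items_b):
    """Return (intersection as seen by A, size of the union)."""
    key_a, key_b = keygen(), keygen()
    # A encrypts its items and sends them to B, who adds a second layer.
    a_double = {enc(enc(to_group(x), key_a), key_b): x for x in items_a}
    # B encrypts its items and sends them to A, who adds a second layer.
    b_double = {enc(enc(to_group(y), key_b), key_a) for y in items_b}
    # A matches its own items against B's doubly encrypted values.
    intersection = {x for c, x in a_double.items() if c in b_double}
    union_size = len(set(a_double) | b_double)
    return intersection, union_size

common, union_size = private_set_ops({"flu", "measles", "mumps"}, {"mumps", "flu", "polio"})
print(sorted(common), union_size)  # ['flu', 'mumps'] 4
```

In this simulation the set sizes are visible to both parties, and recovering the actual union items would additionally require each party to peel off its encryption layer in turn.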

Private Machine Learning and Specific Secure Tools
- Private machine learning tasks:
  - Association rule mining
  - Decision trees
  - K-means clustering
  - Naïve Bayes classifier
  - Outlier detection
- Specific secure tools:
  - Secure comparison
  - Secure set intersection
  - Secure dot product
  - Secure logarithm
  - Secure polynomial evaluation

Drawbacks of SMC-Based Machine Learning
- Still not efficient enough for very large datasets
- The semi-honest model may not be realistic
- The malicious model is even slower

Summary of SMC-Based PPDDM
- Mainly used for distributed machine learning
- Learned models are accurate
- Efficient, task-specific cryptographic solutions have been developed for many distributed machine learning problems
- Mainly relies on the semi-honest assumption (i.e., parties follow the protocol)
- The malicious model has also been explored recently