Securing the Hadoop Ecosystem ATM (Cloudera) & Shreepadma (Cloudera)
Securing the Hadoop Ecosystem ATM (Cloudera) & Shreepadma (Cloudera) Strata/Hadoop World, Oct 2013
Agenda
- Hadoop Ecosystem Interactions
- Security Concepts
- Authentication
- Authorization Overview
- Confidentiality
- Auditing
- IT Infrastructure Integration
- Deployment Recommendations
- Advanced Authorization (Apache Sentry (Incubating))
Hadoop on its Own
[Diagram. Clients: HDFS client, WebHdfs client, MR client, used by end users. Cluster: HttpFS, NN, SNN, JT, and DN/TT nodes running Map and Reduce tasks, run by the hdfs, httpfs & mapred users. Protocols: RPC, data transfer, HTTP.]
Hadoop and Friends
[Diagram. Actors: end users, service users, clients, services. Components shown: HBase, Zookeeper, Oozie, Hue, Pig, Crunch, Cascading, Flume, Sqoop, Impala, Hive, Hive Metastore, browser, Hadoop (WebHdfs, MapRed). Connection labels: HBase RPC, Zookeeper RPC, Hadoop RPC, MapRed RPC, Oozie HTTP, HTTP, Thrift, Avro RPC. Protocols: RPCs/data/HTTP/Thrift/Avro-RPC.]
Security Concepts: Authentication / Authorization
- Authentication
  - End users to services, as a user: user credentials
  - Services to services, as a service: service credentials
  - Services to services, on behalf of a user: service credentials plus trusted service
  - Job tasks to services, on behalf of a user: job delegation token
- Authorization
  - Data: HDFS, HBase, Hive Metastore, Zookeeper
  - Jobs: who can submit, view or manage jobs (MR, Pig, Oozie, Hue, ...)
  - Queries: who can run queries (Impala, Hive)
Security Concepts: Confidentiality / Auditing
- Confidentiality
  - Data at rest (on disk)
  - Data in transit (on the network)
- Auditing
  - Who accessed (read/write) data
  - Who submitted, managed or viewed a job or a query
Authentication Details
- End users to services, as a user
  - CLI & libraries: Kerberos (kinit or keytab)
  - Web UIs: Kerberos SPNEGO & pluggable HTTP auth
- Services to services, as a service
  - Credentials: Kerberos (keytab)
- Services to services, on behalf of a user
  - Proxy-user (after Kerberos for service)
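The proxy-user case can be illustrated with a small model. This is only a sketch of the idea, not Hadoop's implementation: a service that has already authenticated with its own Kerberos credentials asks to act on behalf of an end user, and the request is accepted only if the service is a trusted proxy, calls from an allowed host, and the end user belongs to an allowed group. In Hadoop these limits are set per service with the hadoop.proxyuser.<service>.hosts and hadoop.proxyuser.<service>.groups properties; the class, function, host and group names below are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ProxyUserRule:
    """Where a trusted service may call from, and whose users it may impersonate."""
    allowed_hosts: set = field(default_factory=set)
    allowed_groups: set = field(default_factory=set)


def may_impersonate(rules, service, caller_host, end_user_groups):
    """Return True if `service` (assumed already authenticated, e.g. via
    Kerberos) may act on behalf of a user who belongs to `end_user_groups`."""
    rule = rules.get(service)
    if rule is None:
        return False  # the service is not a trusted proxy at all
    host_ok = "*" in rule.allowed_hosts or caller_host in rule.allowed_hosts
    group_ok = "*" in rule.allowed_groups or bool(
        rule.allowed_groups & set(end_user_groups)
    )
    return host_ok and group_ok


# Hypothetical rule set: only Oozie is trusted, from one host, for analysts.
RULES = {
    "oozie": ProxyUserRule({"oozie1.example.com"}, {"analysts"}),
}
```

With these rules, Oozie calling from oozie1.example.com may submit work as a member of 'analysts', while the same request from another host, for a user outside that group, or from an unlisted service is refused.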
Authorization Details
- HDFS data
  - File system permissions (Unix-like user/group permissions)
- HBase data
  - Read/write Access Control Lists (ACLs) at table level
- Hive Server 2 and Impala
  - Fine-grained authorization through Apache Sentry (Incubating)
- Jobs (Hadoop, Oozie)
  - Job ACLs for Hadoop scheduler queues, manage & view jobs
- Zookeeper
  - ACLs at znodes, authenticated & read/write
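As a rough illustration of Unix-like user/group permissions, the sketch below evaluates a permission mode the POSIX way: exactly one class (owner, then group, then other) is selected for the caller, and only that class's bits decide. It is a simplified model that ignores the superuser and is not taken from HDFS code; the function name and the users and groups in the examples are hypothetical.

```python
import stat

# Permission bits per access type, ordered (owner, group, other).
_BITS = {
    "r": (stat.S_IRUSR, stat.S_IRGRP, stat.S_IROTH),
    "w": (stat.S_IWUSR, stat.S_IWGRP, stat.S_IWOTH),
    "x": (stat.S_IXUSR, stat.S_IXGRP, stat.S_IXOTH),
}


def is_allowed(mode, owner, group, user, user_groups, access):
    """Check `access` ('r', 'w' or 'x') for `user` against a Unix-style mode.

    Only one class applies: the owner is judged by the owner bits even if
    the group or other bits would be more permissive.
    """
    owner_bit, group_bit, other_bit = _BITS[access]
    if user == owner:
        return bool(mode & owner_bit)
    if group in user_groups:
        return bool(mode & group_bit)
    return bool(mode & other_bit)
```

For example, with mode 0o770 on a directory owned by hive:hive, a user in the 'hive' group can write, while a user who only belongs to 'analysts' can neither read nor write.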
Confidentiality Details
- Data in transit
  - RPC: using SASL
  - HDFS data: using SASL
  - HTTP: using SSL (web UIs, shuffle). Requires SSL certs
  - Thrift: not available (Hive Metastore, Impala)
  - Avro-RPC: not available (Flume)
- Data at rest
  - Nothing out of the box
  - Doable by: custom ‘compression’ codec or local file system encryption
Auditing Details
- Who accessed (read/write) FS data
  - NN audit log contains all file opens, creates
  - NN audit log contains all metadata ops, e.g. rename, listdir
- Who submitted, managed, or viewed a job or a query
  - JT, RM, and Job History Server logs contain history of all jobs run on a cluster
- Who submitted, managed, or viewed a workflow
  - Oozie audit logs contain history of all user requests
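A short sketch of how "who accessed which file" might be pulled out of NN audit log lines. It assumes the common layout in which tab-separated key=value fields (allowed, ugi, ip, cmd, src, dst, perm) follow an "FSNamesystem.audit:" marker; the exact layout varies by version and logging configuration, so treat the sample lines, paths and user names as invented for illustration.

```python
from collections import defaultdict

# Invented sample lines in the assumed layout.
SAMPLE_LINES = [
    "2013-10-28 10:15:02,123 INFO FSNamesystem.audit: allowed=true\tugi=atm (auth:KERBEROS)\tip=/10.0.0.12\tcmd=open\tsrc=/data/sales/part-00000\tdst=null\tperm=null",
    "2013-10-28 10:16:40,007 INFO FSNamesystem.audit: allowed=true\tugi=bob (auth:KERBEROS)\tip=/10.0.0.13\tcmd=create\tsrc=/data/sales/new\tdst=null\tperm=bob:analysts:rw-r-----",
    "2013-10-28 10:17:11,950 INFO FSNamesystem.audit: allowed=true\tugi=atm (auth:KERBEROS)\tip=/10.0.0.12\tcmd=rename\tsrc=/data/tmp/a\tdst=/data/tmp/b\tperm=atm:eng:rw-r-----",
]


def parse_audit_line(line):
    """Split one audit line into a dict of its key=value fields.

    Returns an empty dict when the audit marker is absent.
    """
    _, _, payload = line.partition("FSNamesystem.audit: ")
    fields = {}
    for item in payload.strip().split("\t"):
        key, sep, value = item.partition("=")
        if sep:
            fields[key] = value
    return fields


def accesses_by_path(lines, commands=("open", "create")):
    """Map each path to the set of users who successfully opened or created it."""
    result = defaultdict(set)
    for line in lines:
        fields = parse_audit_line(line)
        ugi = fields.get("ugi", "")
        if fields.get("cmd") in commands and fields.get("allowed") == "true" and ugi:
            user = ugi.split()[0]  # drop the trailing "(auth:...)" part
            result[fields["src"]].add(user)
    return dict(result)
```

Calling accesses_by_path(SAMPLE_LINES) reports the open and the create but leaves out the rename, which is a metadata operation rather than a data access.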
Auditing Gaps
- Not all projects have explicit audit logs
  - Audit-like information can be extracted by processing logs
  - E.g.: Impala query logs are distributed across all nodes
- It is difficult to correlate jobs & data access
  - E.g.: Map-Reduce jobs launched by a Pig job
  - E.g.: HDFS data accessed by a Map-Reduce job
  - Tools written on top of Hadoop can do this well, e.g. Cloudera Navigator
IT Integration: Kerberos
- Users don’t want Yet Another Credential
- Corp IT doesn’t want to provision thousands of service principals
- Solution: local KDC plus one-way trust
  - Run a KDC (usually MIT Kerberos) in the cluster
  - Put all service principals here
  - Set up one-way trust of central corporate realm by local KDC
  - Normal user credentials can be used to access Hadoop
IT Integration: Groups
- Much of Hadoop authorization uses “groups”
  - User ‘atm’ might belong to groups ‘analysts’, ‘eng’, etc.
- Users’ groups are not stored in Hadoop anywhere
  - Refers to external system to determine group membership
  - NN/JT/Oozie/Hive servers all must perform group mapping
- Default plugins for user/group mapping:
  - ShellBasedUnixGroupsMapping – forks/runs ‘/bin/id’
  - JniBasedUnixGroupsMapping – makes a system call
  - LdapGroupsMapping – talks directly to an LDAP server
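To make the shell-based approach concrete, here is a minimal sketch of resolving a user's groups by forking the `id` command and parsing its output. It is an illustration of the technique, not the code of ShellBasedUnixGroupsMapping; the function names are hypothetical, and it assumes a Unix host where `id -Gn <user>` prints space-separated group names that themselves contain no spaces.

```python
import subprocess


def parse_id_output(output):
    """Turn the output of `id -Gn`, e.g. "analysts eng\n", into a list of names."""
    return output.split()


def groups_for_user(user):
    """Fork `id -Gn <user>` and return the user's group names.

    Raises subprocess.CalledProcessError if the user is unknown to the host.
    """
    result = subprocess.run(
        ["id", "-Gn", user],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_id_output(result.stdout)


# Example (output depends on the host's user database):
#   groups_for_user("atm")
```

Because every lookup forks a process, a server that resolves groups this way would normally cache results; that is left out here to keep the sketch short.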
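IT Integration: Kerberos + LDAP
[Diagram. Central Active Directory (user principal shown as [email protected]) supplies LDAP group mapping to the Hadoop cluster (NN, JT). The cluster's local KDC holds the service principals (shown as hdfs/[email protected] and yarn/[email protected]) and is linked to Active Directory by cross-realm trust.]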
IT Integration: Web Interfaces
- Most web interfaces authenticate using SPNEGO
  - Standard HTTP authentication protocol
  - Used internally by services which communicate over HTTP
  - Most browsers support Kerberos SPNEGO authentication
- Hadoop components which use servlets for web interfaces can plug in a custom filter
  - Integrate with intranet SSO HTTP solution
Deployment Recommendations
- Security configuration is a PITA
  - Do only what you really need
- Enable cluster security (Kerberos) only if un-trusted groups of users are sharing the cluster
  - Otherwise use edge-security to keep outsiders out
- Only enable wire encryption if required
- Only enable web interface authentication if required
Deployment Recommendations
- Secure Hadoop bring-up order
  1. HDFS RPC (including SNN check-pointing)
  2. JobTracker RPC
  3. TaskTrackers RPC & LinuxTaskController
  4. Hadoop web UI
  5. Configure monitoring to work with security
  6. Other services (HBase, Oozie, Hive Metastore, etc.)
  7. Continue with authorization and network encryption if needed
- Recommended: use an admin/management tool
  - Several inter-related configuration knobs
  - To manage principals/keytabs creation and distribution
  - Automatically configures monitoring for security
Apache Sentry (Incubating)
Authorization Concepts: What is Authorization?
- Privilege
  - Right to perform a particular action, or an action on an object of a particular type
  - E.g., query table FOO
- Role
  - Collection of privileges
  - Benefit: ease of privilege administration
- Group
  - Collection of users
  - Benefit: ease of user administration
Authorization Requirements
- Secure authorization
  - Reliably enforce privileges to control access to data and resources for authenticated users
- Fine-grained authorization
  - Ability to control access to a subset of data
  - E.g., specific rows and columns in a table
- Role-based authorization
  - Ability to group and administer privileges through roles
- Multi-tenant administration
  - Allow a global administrator to delegate management of security for subsets of data to other administrators
  - E.g., a global server admin may delegate management of security for individual databases to database admins
State of Security
- Support for strong authentication
  - Kerberos
  - LDAP/AD
  - Custom authentication (Hive)
- Two sub-optimal choices for authorization
  - Coarse-grained: HDFS file permissions (Hive)
    - Achieved through HS2 impersonation
    - Controls permissions at file level
    - Insufficient for controlling access to chunks of data in a file
    - No authorization for metadata
  - Insecure: advisory authorization (Hive)
    - Self-service system that allows users to grant themselves privileges
    - Prevents accidental deletion but doesn’t stop malicious use
Introducing Apache Sentry (Incubating)
- Authorization system for various components of the Hadoop ecosystem
  - Currently supports Hive and Impala
  - Support for Solr underway
- Secure, fine-grained, role-based and multi-tenant
- Open source
  - Currently undergoing incubation at ASF
Sentry Architecture
[Architecture diagram; no text recoverable.]
Sentry Policy File
- Contains sections for roles, groups, users
  - Users section maps users to groups
  - Roles section maps privileges to roles
  - Groups section maps roles to groups
- Global policy file can also contain a databases section to point to a DB-specific policy file
    [databases]
    customers = hdfs://ha-nn-uri/usr/config/sentry/customers.ini
- Policy file is protected by file permissions
- Policy file can be on localFS/HDFS
Fine-Grained Authorization
- For Hive and Impala, ability to specify privileges on
  - SERVER
  - DATABASE
  - TABLE
  - VIEW (row/column level authorization)
  - URI
- Privilege granularity
  - SELECT
  - INSERT
  - ALL
Role-Based Authorization
- Roles provide a mechanism to group privileges
- Used commonly by organizations to restrict access based on an employee’s role
- Example: the manager role allows INSERT on table EMPLOYEE and SELECT on view DIRECT REPORTS on table EMPLOYEE
    manager = server=server1->db=hr_db->table=employee->action=INSERT, \
              server=server1->db=hr_db->table=direct_reports->action=SELECT
Multi-Tenant Administration
- Support for DB-specific policy file
  - Allows the global admin to delegate security administration of databases to database admins
  - DB policy file can specify privileges for a DB
  - Global policy file contains the location of the DB policy file
  - Privileges in the global file supersede the privileges in the DB-specific policy file
User Management
- Sentry doesn’t perform user management
  - Reuses Kerberos/LDAP/AD users
- Groups provide a container for a set of users
  - Roles can be assigned to groups
  - Example: analyst = sales_reporting, audit_reports
- User to group mapping
  - Reuse Hadoop groups
  - Specify locally in policy file using the users section
Granting/Revoking Privileges
- Specified in the policy file
  - Example: grant INSERT on table CUSTOMERS in database SALES:
      server=server1->db=sales->table=customer->action=INSERT
- Privileges are represented by a hierarchy (mirrors the hierarchy in Hive’s data model)
  - A privilege granted on an object also applies to the objects it contains
  - Example: ALL on DB implies SELECT, INSERT on all tables within the DB
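The hierarchy rule can be sketched as an "implies" check between two privilege strings: a grant covers a request when every scope level the grant names matches the request, and the granted action is ALL or equals the requested action. This is a simplified model under stated assumptions, not Sentry's matching code: it covers only the server, db and table levels (no URI or view handling), compares names case-insensitively, treats a missing action as ALL, and uses hypothetical function names.

```python
SCOPE_LEVELS = ("server", "db", "table")


def parse_privilege(text):
    """Turn 'server=server1->db=sales->action=ALL' into a dict of its parts."""
    parts = {}
    for segment in text.split("->"):
        key, _, value = segment.partition("=")
        parts[key.strip().lower()] = value.strip()
    return parts


def implies(granted, requested):
    """Return True if the `granted` privilege covers the `requested` one."""
    grant = parse_privilege(granted)
    request = parse_privilege(requested)
    for level in SCOPE_LEVELS:
        if level not in grant:
            break  # the grant stops here, so it covers everything beneath
        wanted = request.get(level, "")
        if grant[level] != "*" and grant[level].lower() != wanted.lower():
            return False
    granted_action = grant.get("action", "ALL").upper()
    requested_action = request.get("action", "ALL").upper()
    return granted_action in ("ALL", "*") or granted_action == requested_action
```

Under this model, ALL on database sales implies INSERT on table customer in that database, whereas INSERT on the table does not imply SELECT on it, and a grant on one database never covers another.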
Privilege Hierarchy
[Hierarchy diagram; no text recoverable.]
Configuring Sentry
- Old Hive CLI is not supported; HS2/Impala is required
- Warehouse directory must be owned by the user running HS2/Impala
- Secure the warehouse directory, including sub-directories, using 770 permissions
- In the case of Hive, the user HS2 runs as must be able to run MR jobs
- Turn off HS2 impersonation (strongly recommended)
- Configure sentry-site.xml and hive-site.xml appropriately
Q&A
Thanks ATM (Cloudera) & Shreepadma (Cloudera) Strata/Hadoop World, Oct 2013
Appendix: Security Capabilities
Client | Protocol | Authentication | Proxy User | Authorization | Confidentiality | Auditing
Hadoop HDFS | RPC | Kerberos | Yes | FS permissions | SASL | Yes
Hadoop HDFS | Data Transfer | SASL | No | FS permissions | SASL | No
Hadoop WebHDFS | HTTP | Kerberos SPNEGO plus pluggable | Yes | FS permissions | N/A | Yes
Hadoop MapReduce (Pig, Hive, Sqoop, Crunch, Cascading) | RPC | Kerberos | Yes (requires job config work) | Job & Queue ACLs | SASL | No
Oozie | HTTP | Kerberos SPNEGO plus pluggable | Yes | Job & Queue ACLs and FS permissions | SSL (HTTPS) | Yes
HBase | RPC/Thrift/HTTP | Kerberos | Yes | table ACLs | SASL | No
HiveServer2 | Thrift | Kerberos/LDAP | Yes | Sentry | In the works | Yes
Zookeeper | RPC | Kerberos | No | znode ACLs | N/A | No
Impala | Thrift | Kerberos | No | Sentry | N/A | No
Hue | HTTP | pluggable | No | Job & Queue ACLs and FS permissions | HTTPS | No
Flume | Avro RPC | N/A | No | N/A | N/A | No