MIS 5208 Week 9: Big Data & Splunk
Ed Ferrara, MSIA, CISSP
Agenda
Chapter 1
- Introduction / Splunk & Big Data
- What is Big Data?
- Alternate Data Processing Techniques
- Machine Data
- What is Splunk?
Chapter 2
- Variety of Data
- Dealing with Data
- File & Directories
Fox School of Business
What is Big Data? The Three Vs
Big Data are high-volume, high-velocity, high-variety information assets that require new forms of processing to enable:
- Enhanced decision making
- Insight discovery
- Process optimization
Volume: data measured in petabytes
- Highway sensors
- Data processing logs
- Amazon purchase data
Velocity: speed of data generation and frequency of delivery
Variety: the number of different data types
Big Data
- Facebook had more than 1B users, with more than 618M active on a daily basis
- LinkedIn had more than 200M members, with the service adding 2 new members every second
- Instagram members upload 40M photos per day
- Twitter has 500M users, with the service adding 150K per day
- WordPress has more than 40M new posts per day
- Pandora music streaming service has more than 13,700 years of music
- Etc.
Splunk and the Kill Chain
There are four classes of data that security teams need to leverage for a complete view:
- Log data
- Binary data (flow and PCAP)
- Threat intelligence data
- Contextual data
If any of these data types are missing, there's a higher risk that an attack will go unnoticed. These data types are the building blocks for knowing what's normal and what's not in your environment. This single question lies at the intersection of both system availability (IT operations and application) and security use cases.
Splunk and the Kill Chain
Effective data-driven security decisions require a platform that:
- Handles tens of terabytes of data per day without normalization
- Accesses data anywhere in the environment, including:
  - Traditional security data sources
  - Personnel time management systems
  - HR databases
  - Industrial control systems
  - Hadoop data stores and custom enterprise applications that run the business
- Delivers fast time-to-answer for forensic analysis and can be quickly operationalized for security operations teams
- Makes data more available for analysis and helps staff view events in context
https://www.splunk.com/web_assets/pdfs/secure/Splunk_for_Security.pdf
Machine Data
Machine data contains a definitive record of all the activity and behavior of your customers, users, transactions, applications, servers, networks and mobile devices.
Machine data includes:
- Configurations, API data
- Message queues
- Change events, diagnostic command output
- Call detail records
- Sensor data from industrial systems
Machine data comes in an array of unpredictable formats, and the traditional set of monitoring and analysis tools was not designed for the variety, velocity, volume or variability of this data. A new approach, one specifically architected for this unique class of data, is required to quickly diagnose service problems, detect sophisticated security threats, understand the health and performance of remote equipment and demonstrate compliance.
Splunk Data Sources
Machine Data
Data Type | Where | What
Application Logs | Local log files, log4j, log4net, Weblogic, WebSphere, JBoss, .NET, PHP | User activity, fraud detection, application performance
Business Process Logs | Business process management logs | Customer activity across channels, purchases, account changes, trouble reports
Call Detail Records | Call detail records (CDRs), charging data records, event data records logged by telecoms and network switches | Billing, revenue assurance, customer assurance, partner settlements, marketing intelligence
Clickstream Data | Web server, routers, proxy servers, ad servers | Usability analysis, digital marketing and general research
Configuration Files | System configuration files | How an infrastructure has been set up, debugging failures, backdoor attacks, time bombs
Database Audit Logs | Database log files, audit tables | How database data was modified over time and who made the changes
Filesystem Audit Logs | Sensitive data stored in shared filesystems | Monitoring and auditing read access to sensitive data
Management and Logging APIs | Checkpoint firewalls log via the OPSEC Log Export API (OPSEC LEA) and other vendor specific APIs from VMware and Citrix | Management data and log events
Machine Data
Data Type | Where | What
Message Queues | JMS, RabbitMQ, and AquaLogic | Debug problems in complex applications and as the backbone of logging architectures for applications
SCADA Data | Supervisory Control and Data Acquisition (SCADA) | Identify trends, patterns, anomalies in the SCADA infrastructure and used to drive customer value
Operating System Metrics, Status and Diagnostic Commands | CPU and memory utilization and status information using command-line utilities like ps and iostat on Unix and Linux and performance monitor on Windows | Troubleshooting, analyzing trends to discover latent issues and investigating security incidents
Packet/Flow Data | tcpdump and tcpflow, which generate pcap or flow data and other useful packet-level and session-level information | Performance degradation, timeouts, bottlenecks or suspicious activity that indicates that the network may be compromised or the object of a remote attack
Module Quiz
Machine data is always structured.
- True
- False
Answer: False
Machine data makes up more than ___% of the data accumulated by organizations.
- 10%
- 25%
- 50%
- 90%
Answer: 90%
Module Quiz
Machine data can give you insights into:
- Application performance
- Security
- Hardware monitoring
- Sales
- User behavior
- All of the above
Machine data is only log files on web servers.
- True
- False
Answer: False
Splunk Components
Index Data
- Collects data from any source
- As data enters, inspectors decide how to process it into a consistent format
- When the indexer finds a match, Splunk tags the data type for future use
- Events are then stored in the Splunk index
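The indexing steps on this slide can be sketched in a few lines of Python. This is a toy illustration, not Splunk's implementation: the rule table `SOURCETYPE_RULES`, the sourcetype names, the sample log lines and the `index_event` helper are all assumptions made up for the example.

```python
import re
from datetime import datetime

# Hypothetical sourcetype rules: (name, pattern, timestamp format).
# These are illustrative and are not Splunk's built-in definitions.
SOURCETYPE_RULES = [
    ("access_log",
     re.compile(r"\[(?P<ts>\d{2}/\w{3}/\d{4}:\d{2}:\d{2}:\d{2})"),
     "%d/%b/%Y:%H:%M:%S"),
    ("app_log",
     re.compile(r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})"),
     "%Y-%m-%d %H:%M:%S"),
]


def index_event(raw, index):
    """Tag a raw line with a sourcetype, normalize its timestamp, store it."""
    sourcetype, timestamp = "unknown", None
    for name, pattern, ts_format in SOURCETYPE_RULES:
        match = pattern.search(raw)
        if match:
            # First matching rule wins; its timestamp becomes the event time.
            sourcetype = name
            timestamp = datetime.strptime(match.group("ts"), ts_format)
            break
    event = {
        "_time": timestamp.isoformat() if timestamp else None,
        "sourcetype": sourcetype,
        "_raw": raw,
    }
    index.append(event)
    return event


index = []
index_event('10.0.0.5 - - [12/Mar/2014:10:15:32] "GET /cart HTTP/1.1" 404 512', index)
index_event("2014-03-12 10:15:40 ERROR payment service timeout", index)
index_event("free-form text with no recognizable timestamp", index)

for event in index:
    print(event["sourcetype"], event["_time"])
```

Whatever format the line arrived in, each stored event ends up with the same three keys, which is the "consistent format" idea the slide describes.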
Splunk Event Processing
Search & Investigate
- Enter a query into the Splunk search bar
- Run statistics using the Splunk search language
- Collects and indexes log and machine data from any source
- Powerful search, analysis and visualization capabilities
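As a rough analogy for searching and then running statistics, the Python sketch below filters events by a keyword and counts them by a field. It is not the Splunk search language; the `EVENTS` sample data and the `search` and `count_by` helpers are hypothetical names used only for illustration.

```python
from collections import Counter

# Hypothetical, already-indexed events; field names are illustrative only.
EVENTS = [
    {"host": "web01", "status": "200", "_raw": "GET /index.html 200"},
    {"host": "web01", "status": "404", "_raw": "GET /missing.html 404"},
    {"host": "web02", "status": "404", "_raw": "GET /old-page.html 404"},
    {"host": "web02", "status": "500", "_raw": "POST /checkout 500"},
]


def search(events, term):
    """Keep events whose raw text contains the search term."""
    return [event for event in events if term in event["_raw"]]


def count_by(events, field):
    """Count events per distinct value of a field."""
    return Counter(event[field] for event in events)


get_requests = search(EVENTS, "GET")
status_counts = count_by(get_requests, "status")
print(dict(status_counts))
```

The two steps mirror the slide: first narrow the events with a query, then summarize what is left.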
Add Knowledge
Knowledge Objects
- Event Types
- Transactions
- Tags
- Saved Searches
- Lookups

Data interpretation: Fields and field extractions
Fields and field extractions make up the first order of Splunk Enterprise knowledge. The fields that Splunk Enterprise automatically extracts from your IT data help bring meaning to your raw data, clarifying what can at first glance seem incomprehensible. The fields that you extract manually expand and improve upon this layer of meaning.

Data classification: Event types and transactions
Event types and transactions group together interesting sets of similar events. Event types group together sets of events discovered through searches, while transactions are collections of conceptually related events that span time.

Data models
Data models are representations of one or more datasets, and they drive the Pivot tool, enabling quick generation of useful tables, complex visualizations, and reports without needing to interact with the Splunk Enterprise search language. Data models are designed by knowledge managers who fully understand the format and semantics of their indexed data.
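To make field extraction and transactions more concrete, here is a minimal Python sketch. It assumes a simple `key=value` log format and a hypothetical `session` field; Splunk's own extraction and transaction logic is far more general, so treat this only as an illustration of the two ideas.

```python
import re
from collections import defaultdict

# Assumed key=value log format; real machine data varies widely.
KEY_VALUE = re.compile(r"(\w+)=(\S+)")


def extract_fields(raw):
    """Turn 'k1=v1 k2=v2' text into a dict of named fields."""
    return dict(KEY_VALUE.findall(raw))


def build_transactions(events, key):
    """Group related events by a shared field and measure their time span."""
    grouped = defaultdict(list)
    for event in events:
        grouped[event[key]].append(event)
    transactions = {}
    for value, members in grouped.items():
        times = [int(member["time"]) for member in members]
        transactions[value] = {
            "event_count": len(members),
            "duration": max(times) - min(times),
        }
    return transactions


RAW_EVENTS = [
    "time=100 session=a1 action=login",
    "time=105 session=b2 action=login",
    "time=160 session=a1 action=purchase",
]

events = [extract_fields(raw) for raw in RAW_EVENTS]
transactions = build_transactions(events, "session")
print(transactions["a1"])
```

Extraction gives each raw line named fields; the transaction step then ties together events that belong to the same session even though they are separated in time.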
Add Knowledge
Data models (cont.)
A typical data model makes use of other knowledge object types discussed in this manual, including lookups, transactions, search-time field extractions, and calculated fields.

Data enrichment: Lookups and workflow actions
Lookups and workflow actions are categories of knowledge objects that extend the usefulness of your data in various ways. Field lookups enable you to add fields to your data from external data sources such as static tables (CSV files) or Python-based commands. Workflow actions enable interactions between fields in your data and other applications or web resources, such as a WHOIS lookup on a field containing an IP address.

Data normalization: Tags and aliases
Tags and aliases are used to manage and normalize sets of field information. You can use tags and aliases to group sets of related field values together, and to give extracted fields tags that reflect different aspects of their identity. For example, you can group events from a set of hosts in a particular location (such as a building or city) together: just give each host the same tag. Or, if two different sources use different field names to refer to the same data, you can normalize your data by using aliases (by aliasing client_ip to ip_address, for example).
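The enrichment and normalization ideas above can be illustrated with a short Python sketch. The CSV contents, the `HOST_TAGS` and `FIELD_ALIASES` tables, and the `enrich` helper are hypothetical; in Splunk these would be configured as knowledge objects rather than written as code.

```python
import csv
import io

# Hypothetical static lookup table (CSV), host tags, and field aliases.
LOOKUP_CSV = "status,status_description\n200,OK\n404,Not Found\n"
HOST_TAGS = {"web01": "building_a", "web02": "building_a"}
FIELD_ALIASES = {"client_ip": "ip_address"}


def load_lookup(text, key_field):
    """Read a CSV lookup table into a dict keyed by one of its columns."""
    return {row[key_field]: row for row in csv.DictReader(io.StringIO(text))}


def enrich(event, lookup, key_field):
    """Return a copy of the event with aliases, lookup fields, and a tag."""
    enriched = dict(event)
    # Alias: expose the same value under a normalized field name.
    for source_name, alias in FIELD_ALIASES.items():
        if source_name in enriched:
            enriched[alias] = enriched[source_name]
    # Lookup: add the extra columns from the matching CSV row, if any.
    row = lookup.get(enriched.get(key_field))
    if row:
        for column, value in row.items():
            if column != key_field:
                enriched[column] = value
    # Tag: group hosts that share a location.
    if enriched.get("host") in HOST_TAGS:
        enriched["tag"] = HOST_TAGS[enriched["host"]]
    return enriched


status_lookup = load_lookup(LOOKUP_CSV, "status")
event = {"host": "web01", "client_ip": "10.0.0.5", "status": "404"}
enriched_event = enrich(event, status_lookup, "status")
print(enriched_event)
```

Note that the original event is left untouched; the lookup, alias, and tag are layered on at read time, which loosely mirrors how search-time knowledge is applied without rewriting indexed data.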
Monitor & Alert
Type of alert: Alerts based on real-time searches that trigger every time the base search returns a result.
Base search is a: Real-time search (runs over all time)
Description: Use this alert type if you need to know the moment a matching result comes in. This type is also useful if you need to design an alert for machine consumption (such as a workflow-oriented application). You can throttle these alerts to ensure that they don't trigger too frequently. Referred to as a "per-result alert."
Alert examples:
- Trigger an alert for every failed login attempt, but alert at most once an hour for any given username.
- Trigger an alert when a "file system full" error occurs on any host, but only send notifications for any given host once per 30 minutes.
- Trigger an alert when a CPU on a host sustains 100% utilization for an extended period of time, but only alert once every 5 minutes.
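The first example (alert on every failed login, at most once an hour per username) can be sketched as follows. This is a simplified model of per-result alerting with throttling, not Splunk's alerting engine; the `PerResultAlert` class, the event fields, and the use of integer seconds for time are assumptions for the example.

```python
class PerResultAlert:
    """Fire on every matching event, throttled per value of a key field."""

    def __init__(self, throttle_seconds, key_field):
        self.throttle_seconds = throttle_seconds
        self.key_field = key_field
        self._last_fired = {}  # key value -> time of the last alert

    def matches(self, event):
        # Hypothetical base search: any failed login.
        return event["action"] == "login_failed"

    def process(self, event):
        """Return True if this event should trigger a notification."""
        if not self.matches(event):
            return False
        key = event[self.key_field]
        last = self._last_fired.get(key)
        if last is not None and event["time"] - last < self.throttle_seconds:
            return False  # suppressed by throttling
        self._last_fired[key] = event["time"]
        return True


alert = PerResultAlert(throttle_seconds=3600, key_field="user")
stream = [
    {"time": 0, "user": "alice", "action": "login_failed"},
    {"time": 30, "user": "bob", "action": "login_failed"},
    {"time": 600, "user": "alice", "action": "login_failed"},
    {"time": 900, "user": "alice", "action": "login_ok"},
    {"time": 3600, "user": "alice", "action": "login_failed"},
]
fired = [alert.process(event) for event in stream]
print(fired)
```

Throttling is tracked per username, so a burst of failures for one account does not hide failures for another.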
Monitor & Alert
Type of alert: Alerts based on historical searches that run on a regular schedule.
Base search is a: Historical search
Description: This alert type triggers whenever a scheduled run of a historical search returns results that meet a particular condition that you have configured in the alert definition. Best for cases where immediate reaction to an alert is not a priority. You can use throttling to reduce the frequency of redundant alerts. Referred to as a "scheduled alert."
Alert examples:
- Trigger an alert whenever the number of items sold in the previous day is less than 500.
- Trigger an alert when the number of 404 errors in any 1 hour interval exceeds 100.
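A scheduled alert boils down to running a search over stored events and checking a condition on the result. The sketch below models the 404 example under a simplifying assumption: "any 1 hour interval" is approximated with fixed clock-hour buckets. The function name, event fields, and sample data are hypothetical.

```python
from collections import Counter

SECONDS_PER_HOUR = 3600


def hours_exceeding_404_threshold(events, threshold=100):
    """Return the hour buckets whose count of 404 events exceeds the threshold."""
    counts = Counter(
        event["time"] // SECONDS_PER_HOUR
        for event in events
        if event["status"] == 404
    )
    return sorted(hour for hour, count in counts.items() if count > threshold)


# Hypothetical history: 101 errors in hour 0, exactly 100 in hour 1,
# plus some successful requests that must be ignored.
history = (
    [{"time": second, "status": 404} for second in range(101)]
    + [{"time": SECONDS_PER_HOUR + second, "status": 404} for second in range(100)]
    + [{"time": second, "status": 200} for second in range(500)]
)

alert_hours = hours_exceeding_404_threshold(history)
print(alert_hours)
```

Because the condition is "exceeds 100", an hour with exactly 100 errors does not trigger; only the hour with 101 does.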
Monitor & Alert
Type of alert: Alerts based on real-time searches that monitor events within a rolling time "window".
Base search is a: Real-time search
Description: Use this alert type to monitor events in real time within a rolling time window of a width that you define, such as a minute, 10 minutes, or an hour. The alert triggers when its conditions are met by events as they pass through this window in real time. You can throttle these alerts to ensure that they don't trigger too frequently. Referred to as a "rolling-window alert."
Alert examples:
- Trigger an alert whenever there are three consecutive failed logins for a user between now and 10 minutes ago, but don't alert for any given user more than once an hour.
- Trigger an alert when a host is unable to complete an hourly file transfer to another host within the last hour, but don't alert more than once an hour for any particular host.
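The first rolling-window example can be sketched like this. It is an illustrative model only, assuming events arrive in time order with times in seconds; the `RollingWindowAlert` class and its parameters are hypothetical names, not part of Splunk.

```python
from collections import defaultdict, deque


class RollingWindowAlert:
    """Alert on N consecutive failed logins per user inside a rolling window."""

    def __init__(self, window=600, required=3, throttle=3600):
        self.window = window        # window width in seconds (10 minutes)
        self.required = required    # consecutive failures needed
        self.throttle = throttle    # minimum seconds between alerts per user
        self._failures = defaultdict(deque)
        self._last_alert = {}

    def process(self, time, user, success):
        """Feed one login event (in time order); return True if an alert fires."""
        recent = self._failures[user]
        if success:
            recent.clear()  # a successful login breaks the consecutive run
            return False
        recent.append(time)
        # Drop failures that have slid out of the rolling window.
        while recent and time - recent[0] > self.window:
            recent.popleft()
        if len(recent) < self.required:
            return False
        last = self._last_alert.get(user)
        if last is not None and time - last < self.throttle:
            return False  # throttled
        self._last_alert[user] = time
        return True


alert = RollingWindowAlert()
alice = [alert.process(t, "alice", False) for t in (0, 100, 200, 300)]
bob = [
    alert.process(0, "bob", False),
    alert.process(100, "bob", False),
    alert.process(150, "bob", True),
    alert.process(200, "bob", False),
]
carol = [alert.process(t, "carol", False) for t in (0, 500, 1000)]
print(alice, bob, carol)
```

Three behaviors are visible here: alice's third failure fires and her fourth is throttled, bob's successful login resets his run, and carol's failures are too spread out to fit in one 10-minute window.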
Report & Analyze
When you create a search or a pivot that you would like to run again or share with others, you can save it as a report. This means that you can create reports from both the Search and the Pivot sides of Splunk Enterprise.
After you create a report you can:
- Run the report on an ad hoc basis to review the results it returns on the report viewing page. You can get to the viewing page for a report by clicking the report's name on the Reports listing page.
- Open the report and edit it so that it returns different data or displays its data in a different manner. Your report will open in either Pivot or Search, depending on how it was created. This topic explains how you can create and edit reports.
In addition, if your permissions enable you to do so, you can:
- Change the report permissions to share it with other Splunk Enterprise users.
- Schedule the report so that it runs on a regular interval. Scheduled reports can be set up to perform actions each time they're run, such as sending the results of each report run to a set of stakeholders.
- Accelerate slow-completing reports built in Search.
- Add the report to a dashboard as a dashboard panel.
For more information about scheduling reports, see "Schedule reports," in this manual.
http://docs.splunk.com/Documentation/Splunk/6.0.2/Report/Createandeditreports
Splunk User Roles
Module 2 Quiz
Which of these is not a main component of Splunk?
- Collect and Index the data
- Search and Investigate
- Add knowledge
- Compress and Archive
Answer: Compress and Archive
The index does not play a major role in Splunk.
- True
- False
Answer: False
Module 2 Quiz
Data is broken into single events by:
- Sourcetype
- Host
- Number of files
- The "-" character
Answer: Sourcetype
Time stamps are stored ___.
- In a consistent format
- Differently for each indexed item
- Differently for each year
- As image files
Answer: In a consistent format
Module 2 Quiz
Which role defines what apps a user will see by default?
- Admin
- Power
- User
Answer: Admin
Which two apps ship with Splunk Enterprise?
- DB Connect
- Search & Reporting
- Sideview Utils
- Home App
Answer: Search & Reporting, Home App
Installing Splunk Demonstration
Thank you