Flexible Web Visualization for Alert-Based Network Security Analytics
Lihua Hao (1), Christopher G. Healey (1), Steve E. Hutchinson (2)
(1) North Carolina State University, (2) U.S. Army Research Laboratory
ARO MURI Meeting, ASU, October 29, 2013
Introduction
Building a visualization tool for Army Research Laboratory (ARL) network security analysts
Driven by analysts
  - “Do not fit our problem to your tool, but build a tool to fit our problem.”
  - Our approach does not focus explicitly on network security data, but rather on network security analysts
Balance
  - Meeting the needs of the analysts
  - Applying knowledge and best practices from visualization
A web-based visualization tool to support flexible network data analysis
Looking for comments and advice about an idea
  - Will the ongoing ensemble visualization research be useful in the network security domain?
  - How should the techniques be adjusted to better fit the requirements of the network security domain?
Design Constraints
1. Mental models
  - “Fit” the mental models the analysts use to investigate problems
2. Working environment
  - Integrate into the analyst’s current working environment (e.g., web browser for ARL analysts)
3. Configurability
  - Static, pre-defined presentations of the data are typically NOT useful
4. Accessibility
  - The visualizations should be familiar to analysts (avoid steep learning curve)
5. Scalability
  - Support query and retrieval from multiple data sources
6. Integration
  - Augment the analyst’s current problem-solving strategies with useful support
Existing Visualization Techniques
Node-link graphs
  - Portall, HoNe, LinkRank
Treemaps
  - NetVis, NFlowVis
Timelines and Event Plots
  - An aggregate value over all events
  - The patterns of individual events
Basic Charts
  - Snorby, NVisionIP
Zooming, Multivariate
  - NVisionIP: galaxy, small multiple, and machine views
  - VisFlowConnect: global, domain, internal, and host statistics views
Data Management
MySQL & PHP running on a remote server
  - Provide reasonable scalability
  - Efficient data filtering and projection
No pre-defined table format
  - The analyst chooses columns to visualize
  - Sets table correlations and data filtering
  - Flexibility and configurability
Only cache results of the current query in memory
  - Generate queries to retrieve new data on demand
Full SQL is available on demand
  - Analysts provide visualization requirements
  - System generates whole queries automatically
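The query generation described on this slide can be illustrated with a minimal sketch. It assumes a request made of chosen columns, tables, table correlations, and filters; the `VisRequest` structure, the `build_query` function, and the table and column names are hypothetical and are not taken from the tool itself, which runs on MySQL and PHP.

```python
from dataclasses import dataclass, field

# Comparison operators this sketch accepts in filters (an assumption).
ALLOWED_OPS = {"=", "<", ">", "<=", ">=", "<>"}


@dataclass
class VisRequest:
    """Hypothetical description of what an analyst wants to visualize."""
    columns: list                                     # columns or aggregates to project
    tables: list                                      # tables to read from
    correlations: list = field(default_factory=list)  # (left_col, right_col) join pairs
    filters: list = field(default_factory=list)       # (column, operator, value) triples
    group_by: list = field(default_factory=list)


def build_query(req):
    """Return (sql, params); filter values are passed as parameters, not inlined.

    Column and table names are interpolated directly, so this sketch assumes
    they were already validated against the database schema.
    """
    conditions = [f"{left} = {right}" for left, right in req.correlations]
    params = []
    for column, op, value in req.filters:
        if op not in ALLOWED_OPS:
            raise ValueError(f"unsupported operator: {op}")
        conditions.append(f"{column} {op} %s")
        params.append(value)

    sql = f"SELECT {', '.join(req.columns)} FROM {', '.join(req.tables)}"
    if conditions:
        sql += " WHERE " + " AND ".join(conditions)
    if req.group_by:
        sql += " GROUP BY " + ", ".join(req.group_by)
    return sql, params


request = VisRequest(
    columns=["alerts.dst_ip", "COUNT(*)"],
    tables=["alerts", "flows"],
    correlations=[("alerts.flow_id", "flows.id")],
    filters=[("flows.start_time", ">=", 1383004800)],
    group_by=["alerts.dst_ip"],
)
sql, params = build_query(request)
print(sql)
print(params)
```

Keeping filter values out of the SQL string is a design choice of the sketch: the analyst only supplies the request, and the generated query stays safe to run with arbitrary filter values.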
Web-Based Visualization
ARL analysts work in a browser
  - Mental models & working environment
HTML5’s canvas element
  - No external plug-ins required
  - Run in any modern web browser
  - Accessibility
Use 2D charts
  - Common in other security visualization systems
  - Effective for presenting values, trends, patterns and relationships our analysts want to explore
  - Accessibility
Analyst-Driven Charts
RGraph for basic chart visualizations
  - General information visualization with 2D charts
  - Only choose types of charts commonly used in network data visualization
Assisted chart selection based on data and task (capability)
  - Proportion and frequency comparison (pie)
  - Value comparison over a secondary attribute (bar)
  - Trends of change of a value over time (line)
  - Correlation between two attributes (scatterplots)
  - Range related correlation (gantt)
Initialize chart properties
  - E.g., background grids, glyph size, color and type
Free to change the initial choices
[Figure: example charts; axis labels include number of alerts, dest ip, src ip, port, and time]
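As a rough illustration of assisted chart selection, the sketch below maps a task to one of the five chart types listed on the slide and fills in initial properties that the analyst can override. The task labels, the default property values, and the `suggest_chart` function are assumptions made for this example; they are not RGraph's API or the tool's actual rules.

```python
# Hypothetical task labels mapped to the chart types named on the slide.
CHART_FOR_TASK = {
    "proportion": "pie",            # proportion and frequency comparison
    "value_comparison": "bar",      # value comparison over a secondary attribute
    "trend_over_time": "line",      # trends of change of a value over time
    "correlation": "scatter",       # correlation between two attributes
    "range_correlation": "gantt",   # range related correlation
}

# Assumed defaults; the slide only names grids, glyph size, color and type.
DEFAULT_PROPERTIES = {
    "background_grid": True,
    "glyph_size": 4,
    "glyph_color": "#336699",
    "glyph_type": "circle",
}


def suggest_chart(task, overrides=None):
    """Suggest a chart type for a task and initialize its properties.

    The analyst stays free to change the initial choices via `overrides`.
    """
    if task not in CHART_FOR_TASK:
        raise ValueError(f"no chart suggestion for task: {task}")
    properties = {**DEFAULT_PROPERTIES, **(overrides or {})}
    return {"chart": CHART_FOR_TASK[task], "properties": properties}


print(suggest_chart("trend_over_time"))
print(suggest_chart("correlation", overrides={"glyph_size": 8}))
```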
Interactive Visualization
Intelligent zoom
  - Redraw chart to include only the selected chart elements
  - Rescale the visual attributes of chart elements
Tooltips for value query
  - Data-driven notes attached to chart elements
  - Access to quantitative data on demand
Toolbars
  - Customize glyph color and size
  - Change chart title, size, label width, and so on
  - Zooming, correlated views, spreadsheets
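The zoom behaviour described above (keep only the selected elements, then rescale them to fill the chart) can be sketched as follows. This is an assumed, simplified version for scatter-style points on a canvas whose origin is at the top left; the `zoom` function and its parameters are hypothetical and do not reflect how the tool or RGraph implements zooming.

```python
def zoom(points, x_range, y_range, width=600, height=400):
    """Keep points inside the selected region and rescale them to the canvas.

    points:  list of (x, y) data values
    x_range: (low, high) bounds of the selection on the x axis
    y_range: (low, high) bounds of the selection on the y axis
    Returns canvas coordinates for the selected points only.
    """
    (x_lo, x_hi), (y_lo, y_hi) = x_range, y_range
    selected = [(x, y) for x, y in points
                if x_lo <= x <= x_hi and y_lo <= y <= y_hi]
    if not selected:
        return []

    # New axis limits come from the selected elements, not the whole dataset.
    xs = [x for x, _ in selected]
    ys = [y for _, y in selected]
    x_min, x_max, y_min, y_max = min(xs), max(xs), min(ys), max(ys)

    def scale(value, low, high, size):
        # A single distinct value is centred rather than dividing by zero.
        return size / 2 if high == low else (value - low) / (high - low) * size

    # Canvas y grows downward, so the data y axis is flipped.
    return [(scale(x, x_min, x_max, width),
             height - scale(y, y_min, y_max, height))
            for x, y in selected]


data = [(0, 0), (5, 5), (10, 10), (20, 20)]
print(zoom(data, x_range=(5, 10), y_range=(0, 100)))
```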
Correlated Views
A sequence of visualizations to track an ongoing investigation
  - Correlate multiple data sources
  - Explore data at multiple levels of detail
Correlated charts
  - Select sub-regions of a chart
  - Filter corresponding rows
  - Add additional constraints, tables, attributes
  - Generate a follow-on, correlated chart
Raw data spreadsheets
  - Text-based value examination
  - A conventional approach
  - Working environment and mental models
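One way to picture the correlated-chart steps is as a chain of requests, where each follow-on chart inherits the constraints of its parent and adds the current selection. The sketch below assumes that model; `ChartRequest`, `follow_on`, and the table and column names are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ChartRequest:
    """Hypothetical record of what one chart in the sequence shows."""
    tables: tuple
    columns: tuple
    filters: tuple = ()   # (column, operator, value) triples


def _extend(base, extra):
    """Append the items of `extra` that are not already in `base`."""
    return base + tuple(item for item in extra if item not in base)


def follow_on(parent, selection, add_tables=(), add_columns=(), add_filters=()):
    """Derive a correlated chart request from a selection on the parent chart.

    `selection` holds the filters implied by the selected sub-region; the
    parent's own filters are inherited so the investigation narrows step by step.
    """
    return ChartRequest(
        tables=_extend(parent.tables, add_tables),
        columns=_extend(parent.columns, add_columns),
        filters=_extend(parent.filters, tuple(selection) + tuple(add_filters)),
    )


# Step 1: alerts per destination IP. Step 2: flows behind one selected IP.
first = ChartRequest(tables=("alerts",), columns=("dst_ip", "alert_count"))
second = follow_on(
    first,
    selection=[("dst_ip", "=", "10.0.0.5")],
    add_tables=["flows"],
    add_columns=["src_ip", "start_time"],
)
print(second)
```

Because each request is immutable and derived from its parent, the full sequence can be kept as a history of the investigation.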
Track Visualization Requests
Record the visualization requests made at each step
When a new request is issued, list all previous requests, actions and charts
Improve an analyst’s “working memory” capacity
Trap Data
Need real-world data to test the system
For security reasons, it is not possible to use data from ARL for testing
The trap server
  - Data from network security researchers at NCSU
  - Real-world network traffic in the Computer Science building
  - Transmitted to a Snort sensor to perform: (1) intrusion detection and (2) extraction of network packets
  - Stores two types of data: (1) NetFlow data and (2) Snort alerts
An example file for 24 hours of data
  - 17.4GB of packet headers
  - 938K unique source IPs, 168K unique destination IPs
  - 1.6M flows with 615K alerts
Summary of Our Web-Based Visualization
MySQL & PHP based database management
  - Scalability, data filtering and projection
  - No predefined table format
Web-based visualization & analyst-driven 2D charts
  - Mental model & working environment
  - Avoid steep learning curve
  - Select chart based on data and task
  - RGraph
Interactive Visualization
  - Intelligent zoom, tooltips, toolbar
Correlated Views
  - A sequence of visualizations
  - Track an ongoing investigation
  - Raw data spreadsheets
Ensemble Visualization
Scientific ensemble analysis & visualization
  - A collection of related datasets (members), from runs of a simulation or an experiment, with slightly varying initial conditions or parameters
  - Focus on scalability (data attribute, data element, member)
  - Relationships between members (comparison, aggregation, pattern mining)
Apply to network security data
  - Scalability is also critical
  - Relationships between network traffic flows
  - Opportunity to apply ongoing research from ensembles to the network security domain
How is a network security dataset an ensemble?
  - E.g., NetFlow ensemble (member: a NetFlow)
  - Distributions of alerts within and between NetFlows
Are ensemble techniques useful in the network security domain?
  - Determine the value added of this analysis
Two Stages of Ensemble Analysis
1. Structure the members into sets based on their similarities
  - Level of detail clustering
  - Visualize the cluster hierarchy as a tree
  - Analysts choose members to visualize from the cluster tree (configurability)
2. Visualize member sets
  - Use chart visualizations
  - Working environment, accessibility
NetFlow Similarity Measurement
1. Time duration
2. Density of alerts
3. Distributions of alerts
4. Types of alerts within NetFlow
Analysts decide
  - Which factors to measure
  - Weights of each factor
  - Configurability
[Figure: three example NetFlows, each 46 secs long, with 1, 7, and 7 alerts]
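The slide names four factors and lets analysts pick which to measure and how to weight them, but it does not define the individual measures. The sketch below fills that gap with assumed, illustrative measures: a min/max ratio for duration and alert density, histogram overlap for the distribution of alerts over time, and Jaccard overlap for alert types. The `NetFlow` fields and all function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class NetFlow:
    duration: float      # seconds
    alert_times: list    # alert offsets in seconds from the start of the flow
    alert_types: list    # one type label per alert


def ratio(a, b):
    """Similarity of two non-negative numbers as min/max, in [0, 1]."""
    high = max(a, b)
    return 1.0 if high == 0 else min(a, b) / high


def density(flow):
    """Alerts per second; zero-length flows are treated as density 0."""
    return len(flow.alert_times) / flow.duration if flow.duration > 0 else 0.0


def histogram(flow, bins=4):
    """Share of alerts falling in each equal-width slice of the flow."""
    counts = [0] * bins
    for t in flow.alert_times:
        index = int(t / flow.duration * bins) if flow.duration > 0 else 0
        counts[min(index, bins - 1)] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * bins


def distribution_similarity(a, b):
    """1 minus half the L1 distance between the two alert histograms."""
    return 1.0 - 0.5 * sum(abs(x - y) for x, y in zip(histogram(a), histogram(b)))


def type_similarity(a, b):
    """Jaccard overlap of the sets of alert types."""
    set_a, set_b = set(a.alert_types), set(b.alert_types)
    union = set_a | set_b
    return 1.0 if not union else len(set_a & set_b) / len(union)


def similarity(a, b, weights):
    """Weighted average over only the factors the analyst chose to weight."""
    measures = {
        "duration": lambda: ratio(a.duration, b.duration),
        "density": lambda: ratio(density(a), density(b)),
        "distribution": lambda: distribution_similarity(a, b),
        "types": lambda: type_similarity(a, b),
    }
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("at least one factor needs a positive weight")
    return sum(w * measures[name]() for name, w in weights.items()) / total


# Two flows like those in the figure: 46 secs each, with 1 and 7 alerts.
one_alert = NetFlow(46.0, [10.0], ["scan"])
seven_alerts = NetFlow(46.0, [1.0, 5.0, 12.0, 20.0, 30.0, 40.0, 45.0], ["scan"] * 7)
print(similarity(one_alert, seven_alerts, {"duration": 1.0, "density": 1.0}))
```

Leaving a factor out of `weights` means it is not measured at all, which mirrors the analyst's choice of which factors matter for a given investigation.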
NetFlow Cluster Tree
Clustering at varying thresholds of similarity
Analysts choose tree nodes to visualize
Trade-off: similarity vs. number of members
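The trade-off between similarity and cluster size can be seen in a small sketch that groups members at several thresholds. It assumes single-linkage grouping (two members share a cluster if a chain of pairs at or above the threshold connects them), which is a choice made for this example and is not stated on the slide; `clusters_at`, `cluster_levels`, and the toy similarity function are hypothetical.

```python
def clusters_at(n, sim, threshold):
    """Group members 0..n-1 so that linked pairs have sim(i, j) >= threshold."""
    parent = list(range(n))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if sim(i, j) >= threshold:
                parent[find(i)] = find(j)

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


def cluster_levels(n, sim, thresholds):
    """Clusters per threshold, from strictest to loosest.

    Lowering the threshold only merges clusters, so the levels nest and can
    be read as the levels of a cluster tree.
    """
    return {t: clusters_at(n, sim, t) for t in sorted(thresholds, reverse=True)}


# Toy members described by one number each; closer values are more similar.
values = [1.0, 1.1, 5.0, 5.2, 9.0]


def toy_similarity(i, j):
    return 1.0 / (1.0 + abs(values[i] - values[j]))


for threshold, clusters in cluster_levels(len(values), toy_similarity, [1.0, 0.8, 0.2]).items():
    print(threshold, clusters)
```

A high threshold yields small, tightly similar clusters, while a low threshold yields fewer clusters with more members each, which is the choice analysts make when picking a tree node.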
NetFlows Ensemble – 123 Members
Analysts define members to form an ensemble
A Cluster of NetFlows
Currently all NetFlows are visualized individually in a gantt chart
Developing methods to aggregate NetFlows into a composite visualization
[Figure: gantt chart of NetFlows; axes: source IP and port, time]
Feedback for Further Adjustment
Ensemble analysis and visualization is flexible
  - Techniques vary based on the requirements of applications
Different perspectives to define a network ensemble (member)?
Useful ways to measure correlations between ensemble members?
Useful ways to structure ensemble members?
Special requirements for the composite visualization?
Other recommendations?
Future Work
Analysis Sandbox
  - Individual analyses can be performed, stored, reviewed and compared
  - Improve an analyst’s “working memory” capacity
Analysis Preferences
  - Track an analyst’s actions to better anticipate their strategies for specific types of tasks
  - Use preference elicitation algorithms to track an analyst’s interest within a visualization session
Real-world Integration
  - Not allowed to speak directly with the analysts
  - Coordinate with the IT staff who support the analysts
Ensemble Visualization
  - Further adjust existing techniques to meet the requirements of the network security domain
  - Integrate into the web-based network security visualization tool
Progress Summary
Papers
  - Flexible Web Visualization for Alert-Based Network Security Analytics. Hao, Healey, and Hutchinson. In Proceedings VizSec 2013 (Atlanta, GA), 2013.
Students supported
  - Lihua Hao, PhD candidate, NC State University
Projects supported
  - Web-based visualization for network security analytics
  - Ensemble visualization for network security analytics
FY 2014 Research Plan
Validation of web-based tool with ARL collaborators
  - Finalize web-based visualization tool
  - Present tool to ARL IT staff
  - Integrate feedback into tool’s design, iterate on requested changes and improvements
Investigation of scalability support through ensemble visualization
  - Confirm interest in pursuing scalability support
  - Integrate ensemble visualization research into web-based visualization tool
  - Update visualizations to support intelligent summarization and aggregation