BigSnarf blog

Infosec FTW

Big Data Infosec – Bigsnarf Open Source Solution

To start the conversation on my Big Data Infosec journey, I created this placemat to consider where Infosec might end up. It is an example of my first experiments with visualizing data with Hadoop and Hive, along the same lines as packetpig (Link: PDF). I think Big Data gives Information Security a second chance to “do it better”.

Last year, while building my POC, I found Wayne’s SherpaSurfing solution (Link: Slideshare) and thought that Big Data and Infosec could be better. Ben’s post (Link: Blog) is a sobering discussion of whether we are winning or losing. Scott Crawford has been discussing data-driven security in his blog series (Link: blog series). George followed up with his post (Link: Website), which I interpreted as “I think we’re in a precarious spot.” Here’s the RSA panel’s position on the topic (Link: Website). Andrew suggests that SIEMs aren’t providing value by rebranding themselves with Big Data “buzzword” stickers (Link: Website). Packetloop presented their interpretation of Big Data Infosec at Black Hat EU 2012 (Link: PDF). Raffael followed up recently with his post (Link: Website). Moar visual analytics! In this post, Ed suggested that Infosec stop using “stoplight reports” and use different metrics to gain situational awareness (Link: Website).

Level 1 – Data Collection

  • Organizations at this level have data silos spitting out a variety of log and machine data collected from various sources and “Enterprise Security” systems. Most organizations need humans to connect the silo’d data and interpret the results. This is not a good position to be in, because it relies heavily on humans to process the data.

Level 2 – Big Data Aggregation

  • Organizations at this level have some semblance of a plan and a big data strategy. The organization is focused on integration activities to get the silo’d data into some sort of Data Warehouse/Hadoop/(name your flavour of big data technology here). A POC system provides some of the insight and reporting that has traditionally required an Infosec analyst to produce.
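To make the Level 2 integration work a little more concrete, here is a minimal sketch of what “getting the silo’d data into one place” usually starts with: normalizing records from different sources into one common schema. The two log formats, the `Event` fields, and the `normalize_*` helpers are all hypothetical, invented for illustration; they are not the format of any real firewall or proxy product.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Event:
    """Hypothetical common schema every silo is mapped into before loading."""
    ts: datetime
    source: str
    src_ip: str
    dst: str
    action: str


def normalize_firewall(line: str) -> Event:
    # Hypothetical firewall format: "<epoch> <src_ip> <dst_ip> <ALLOW|DENY>"
    epoch, src, dst, action = line.split()
    ts = datetime.fromtimestamp(int(epoch), tz=timezone.utc)
    return Event(ts, "firewall", src, dst, action.lower())


def normalize_proxy(line: str) -> Event:
    # Hypothetical proxy format: "<ISO-8601 time>,<client_ip>,<url>,<http_status>"
    iso, client, url, status = line.split(",")
    action = "allow" if int(status) < 400 else "deny"
    return Event(datetime.fromisoformat(iso), "proxy", client, url, action)


def aggregate(firewall_lines, proxy_lines):
    """Merge both silos into a single time-ordered list of Events."""
    events = [normalize_firewall(line) for line in firewall_lines]
    events += [normalize_proxy(line) for line in proxy_lines]
    return sorted(events, key=lambda e: e.ts)
```

For example, `aggregate(["1333136400 10.0.0.5 203.0.113.7 DENY"], ["2012-03-30T19:00:00+00:00,10.0.0.5,http://example.com/,200"])` returns the proxy event first and the firewall event second. In a real Level 2 deployment the same mapping would run inside whatever loader feeds your Hadoop or warehouse cluster; the point is that one schema is what lets later levels query across silos at all.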

Level 3 – Basic Tools of Analysis

  • Organizations have managed to stockpile months or years’ worth of data, and a “data scientist” and an “Infosec analyst” spend their time producing standard charts and reports that other silo’d systems already produce. The difference is that the data mining and pattern matching now run against the complete Infosec dataset. Ad hoc reports come out of this group, but there is still quite a bit of hand-holding needed to generate a report. Hundreds of jobs run nightly to produce the first Big Data Infosec KPIs and metric reports.
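Each of those nightly Level 3 jobs is, at heart, a batch aggregation over the full dataset. Here is a minimal in-memory sketch of one such job, assuming events are already normalized into dicts with hypothetical `day`, `src_ip`, and `action` keys. The three KPIs (event total, deny ratio, top talker) are illustrative choices, not a standard metric set; on a cluster this logic would live in a Hive query or MapReduce job rather than plain Python.

```python
from collections import Counter, defaultdict


def nightly_kpis(events):
    """Compute per-day KPIs from normalized events.

    Each event is a dict with hypothetical keys 'day', 'src_ip', 'action'.
    Returns {day: {'total': int, 'deny_ratio': float, 'top_talker': str}}.
    """
    # Group events by day (the "map/shuffle" step of the batch job).
    by_day = defaultdict(list)
    for ev in events:
        by_day[ev["day"]].append(ev)

    # Reduce each day's events to a small set of KPIs.
    report = {}
    for day, evs in sorted(by_day.items()):
        denies = sum(1 for ev in evs if ev["action"] == "deny")
        talkers = Counter(ev["src_ip"] for ev in evs)
        top_ip, _count = talkers.most_common(1)[0]
        report[day] = {
            "total": len(evs),
            "deny_ratio": denies / len(evs),
            "top_talker": top_ip,
        }
    return report
```

Multiply this by every data source and every metric someone asks for, and you get the “hundreds of jobs run nightly” described above.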

Level 4 – Data Enrichment / ETL / Real-time Queries

  • Organizations have managed to get several big data and statistics experts on staff. A mature system is in place. Migration plans for ever-greening are being executed, along with formal training plans for users and power users of the system. At this level, it’s time to look at another system that takes the best of the Hadoop Gen1 system and prepares data specifically for real-time queries, visual analytics, drill-down analysis, exploration, and serious BI digging. Dedicated teams squeeze every ounce of efficiency out of this system. DFIR, SOC, NOC, the CISO tower, e-discovery, audit, and compliance all routinely use it as a “private Google search engine” to answer on-demand questions and ad hoc queries. Routine answers come at the touch of a submit button.
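The enrichment and ETL step at Level 4 can be pictured as joining raw events against reference data before loading them into a store indexed for fast ad hoc queries. Here is a minimal sketch of that idea that uses Python’s built-in `sqlite3` purely as a stand-in for whatever real-time query engine you actually run. The `asset_owner` lookup (a pretend CMDB extract), the table layout, and both function names are hypothetical.

```python
import sqlite3


def build_store(events, asset_owner):
    """ETL: enrich each event with its owning team, then load it into an indexed table.

    events: iterable of (ts, src_ip, action) tuples.
    asset_owner: dict mapping IP -> team name (hypothetical CMDB extract).
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE events (ts TEXT, src_ip TEXT, action TEXT, owner TEXT)"
    )
    # Index the columns analysts filter on most, so ad hoc queries stay fast.
    conn.execute("CREATE INDEX idx_owner_action ON events (owner, action)")
    # Enrichment: unknown IPs are tagged rather than dropped.
    enriched = (
        (ts, ip, action, asset_owner.get(ip, "unknown"))
        for ts, ip, action in events
    )
    conn.executemany("INSERT INTO events VALUES (?, ?, ?, ?)", enriched)
    conn.commit()
    return conn


def denies_by_owner(conn):
    """An example 'submit button' query: deny counts per owning team."""
    rows = conn.execute(
        "SELECT owner, COUNT(*) FROM events WHERE action = 'deny' "
        "GROUP BY owner ORDER BY COUNT(*) DESC, owner"
    )
    return rows.fetchall()
```

Doing the join once at load time, rather than at query time, is what lets DFIR or audit ask “which team’s assets are generating denies?” without waiting on a batch job.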

Level 5 – Business Intelligence

  • Organizations have decided to open the data gates and formally allow users to make limited, self-serve requests to the system. Real BI technologies provide historical and current views of business operations. Reporting, dynamic reporting, online analytical processing, analytics, data mining, process mining, complex event processing, business performance management, benchmarking, and text mining results are typically “published” to internal company websites. Business intelligence aims to support better business decision-making.

Level 6 – Predictive Model – Attack Simulation – War Games

  • Joshua: “Shall we play a game?” Organizations have created thousands of models and have a solid understanding of the business and its priorities. The organization plans to use predictive analytics: statistical techniques from modeling, machine learning, data mining, and game theory that analyze current and historical facts to make predictions about future events. These reports require heavy interaction among the BI, visualization, and Infosec teams to produce real, validated results.
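Real Level 6 work means many validated models. As the smallest possible illustration of “analyzing historical facts to make predictions about future events,” here is an ordinary least-squares trend line fitted to hypothetical daily alert counts and extrapolated forward. It is a deliberately simple stand-in, not a recommended model: seasonality, bursts, and noise in real security data would quickly overwhelm a straight line.

```python
def linear_forecast(counts, steps_ahead=1):
    """Fit y = a + b*t by ordinary least squares and extrapolate.

    counts: historical daily counts for t = 0..n-1 (needs n >= 2).
    Returns the predicted count at t = n - 1 + steps_ahead.
    """
    n = len(counts)
    if n < 2:
        raise ValueError("need at least two observations")
    t_mean = (n - 1) / 2
    y_mean = sum(counts) / n
    # Slope b = cov(t, y) / var(t); intercept a = mean(y) - b * mean(t).
    cov = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(counts))
    var = sum((t - t_mean) ** 2 for t in range(n))
    slope = cov / var
    intercept = y_mean - slope * t_mean
    return intercept + slope * (n - 1 + steps_ahead)
```

For counts that rise by exactly two a day, `linear_forecast([10, 12, 14, 16])` returns `18.0`. The “validated results” part of this level is the hard bit: backtesting predictions like this against what actually happened.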

Level 7 – What-If Scenarios by the CISO

  • Organizations are confident enough to give the CISO query access to the system without a team present. This really is the “dream” dashboard that every CISO wants but never gets. S/he can self-serve plausible outcomes to any question. The true value of the Information Security organization can be shared as easily as “Tweets from an iPhone.” At this level, the CISO has a cape under his/her suit.

Level 8 – Data Democracy

  • This is where you set your data free. Free as in letting any user query the system. Users, administrators, outside institutions, and the Interwebs are publicly able to query your Infosec Big Data. At this level, your Infosec organization reaches a new level of transparency. Why is Infosec always guessing or investigating what’s normal or abnormal? Why is one person trying to understand everyone else’s data and patterns? At this level, every user is empowered to participate in their own security discovery. Social Collective Security Intelligence (SCSI?). Infosec could be considered “social” at this level, as opposed to secretive. Users routinely self-serve vanity queries to “pump their own egos” because they have access to “life statistics”.

Where does your organization fall on the Big Data Infosec Maturity scale? Wanna read more? Check out the blog series on building your own Predictive Analytics Engine.

One response to “Big Data Infosec – Bigsnarf Open Source Solution”

  1. Martin March 30, 2012 at 7:40 pm

    Nice post! I saw it linked from your comment on Anton’s blog. We’re at level 8 if you toss level 6 (which I’m skeptical of, any examples?). Not only can our CISO tweet the data, he can find out who else has tweeted from an iPhone from his iPhone. I think I would replace level 6 with something regarding integrating with existing enterprise tools and non-security data repositories. That may be implied by some of the other levels, but I think it’s worth making it explicit. For example, I just added generic SQL transforms to ELSA, which will perform subqueries against arbitrary fields in arbitrary databases to enhance or filter map/reduce results. The result is direct security to CMDB (or billing, org charts, etc.) integration. In addition, cloud data transforms are crucial. We have transforms for things like whois lookups, DNSDB (passive DNS), and others which leverage other web services.
