About Bigsnarf – Open Source Big Data Security Analytics
Do you know which machines on your network are compromised? Are you missing data? How confident are you? The amount of data in Information Security is growing so fast that traditional database systems will have difficulty scaling. As of March 2012, there are only a few open source big data solutions and recipes published for the security industry. As of February 2013, there are a couple of commercial options for big data analytics geared towards the Information Security industry.
This blog is a collection of ideas, tools, and frameworks for applying open source solutions, Hadoop MapReduce, and big data techniques to Information Security. You will also find stream processing, in-memory processing, parallel processing with clusters, and predictive analytics. I also look at instrumentation of applications and log processing.
With the information in this blog, organizations can analyze mountains of data. The ability to analyze large InfoSec datasets will become a key basis of advantage for Information Security groups. Bigsnarf is open source Security Investigation Analytics.
Closing the gap between compromise and resolution underpins new methods of innovation and investigation for DFIR and Infosec. Every organization will have to grapple with the implications of big data. BigSnarf can:
- help organizations understand steps required to set up a security data analytics team
- help organizations manage the increase in the volume and detail of data captured by enterprises
- index and store full context network PCAPs
- index and store network log data
- index and store individual log data
- index and store RAM images of resident memory snapshots on individual machines
- index and store MD5 hashes of all files on HDD of individual machines
- index and store snapshots of all running processes of individual machines
- build analytical models of trusted user traffic and behaviour
- build analytical models of untrusted user and machine traffic and behaviour
- use machine learning to detect and identify anomalous behaviour of untrusted traffic
- use machine learning to cluster malware, identify who handled stolen data, and identify the graph of connected systems
- use machine learning, fuzzy hashing, and a massive datastore for “finding needles in haystacks”
Essentially: play, record, pause, forward, rewind and review full context history of everything going to, from, and running on individual machines in the network, and outside the network. Collection of data. Indexing, storage and processing of data. Real-time search of data. Data analytics. Predictive analytics.
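To make one item from the list above concrete, here is a minimal sketch of collecting MD5 hashes for every file under a directory on a single machine. It is an illustration only, not Bigsnarf's actual implementation: the function names `md5_of_file` and `index_md5` are hypothetical, and the result is kept in an in-memory dictionary, where a real deployment would presumably ship the hashes to a cluster datastore.

```python
import hashlib
import os


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of one file, read in chunks to bound memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read fixed-size chunks until read() returns an empty bytes object.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def index_md5(root):
    """Map each readable regular file under root to its MD5 digest (hypothetical helper)."""
    index = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Skip sockets, FIFOs, and broken links, which are not regular files.
            if not os.path.isfile(path):
                continue
            try:
                index[path] = md5_of_file(path)
            except OSError:
                # Unreadable files (permissions, races) are skipped, not fatal.
                continue
    return index
```

Calling `index_md5("/some/directory")` returns a dictionary from file path to digest, which can then be compared against an earlier snapshot or a list of known hashes. MD5 is fine for matching known files but is not collision-resistant, so it should not be the only evidence of a file's integrity.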