BigSnarf blog

Infosec FTW

Monthly Archives: January 2013

The emergence of a new security role – Security Data Analytics Engineer

The Security Data Analytics Engineer shall be responsible for carrying out engineering tasks to deliver a clustered computing environment. The engineer shall design and build large-scale security data analytics platforms using open source software and tools, cloud-based tools, and COTS technologies. The engineer shall establish a security data analytics system that produces manageable, actionable intelligence from massive streams of structured and semi-structured security data.

This is a broad engineering role that requires years of experience in defensive security and in automating data feeds from different sources. It encompasses building the core frameworks and platforms that deal with the complexities of ingesting, storing, and manipulating masses of data in real time.

The engineer will research and analyse large volumes of data by applying advanced analytical tools and methodologies, build data analytics and data processing pipelines, and deliver analytical reports to security analysts and investigators for situational awareness.

The reports and analytics dashboards give analysts and investigators the ability to identify, process, and comprehend critical elements of information about what is happening.

Job Qualifications:

Software engineering, machine learning, data mining, modelling users, modelling attackers, data visualization, big data, data analytics, investigations, ETL, data munging, data wrangling, pipeline automation, Information Security, DFIR. Equal parts Infosec, DFIR, business knowledge, analytics expertise, technological capabilities, and visualization.

Potential Example of a Big Data Security Data Analytics system:

Screen Shot 2013-01-30 at 9.37.00 PM

Cleaning a tweetstream or Twitter archive

import string

# Common English stop words to drop from tweet text.
DEFAULT_STOP_WORDS = set([
  'a', 'about', 'also', 'am', 'an', 'and', 'any', 'are', 'as', 'at', 'be',
  'but', 'by', 'can', 'com', 'did', 'do', 'does', 'for', 'from', 'had',
  'has', 'have', 'he', "he'd", "he'll", "he's", 'her', 'here', 'hers',
  'him', 'his', 'i', "i'd", "i'll", "i'm", "i've", 'if', 'in', 'into', 'is',
  'it', "it's", 'its', 'just', 'me', 'mine', 'my', 'of', 'on', 'or', 'org',
  'our', 'ours', 'she', "she'd", "she'll", "she's", 'some', 'than', 'that',
  'the', 'their', 'them', 'then', 'there', 'these', 'they', "they'd",
  "they'll", "they're", 'this', 'those', 'to', 'us', 'was', 'we', "we'd",
  "we'll", "we're", 'were', 'what', 'where', 'which', 'who', 'will', 'with',
  'would', 'you', 'your', 'yours',
])

def clean_data(data):
  # Remove every ASCII punctuation character from the text.
  for char in string.punctuation:
    data = data.replace(char, "")
  return data

def clean_stop_words(data):
  # Compare whole words only: str.replace would also delete letters inside
  # other words (for example, every 'a'). Call this before clean_data so
  # contractions such as "he'd" still match the list.
  words = [w for w in data.split() if w.lower() not in DEFAULT_STOP_WORDS]
  return " ".join(words)
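As a rough illustration of where this kind of cleaning can lead, the sketch below counts the most frequent words left after punctuation and stop words are removed. It is an assumption about how the pieces might be combined, not the code behind the original post: the shortened `STOP_WORDS` set, the helper names `tokenize` and `top_terms`, and the sample tweets are all made up for the example.

```python
import string
from collections import Counter

# Hypothetical, shortened stop-word set for illustration only.
STOP_WORDS = {"a", "and", "the", "to", "of", "in", "is", "on", "for"}

# Translation table that deletes every ASCII punctuation character.
_PUNCT_TABLE = str.maketrans("", "", string.punctuation)

def tokenize(tweet):
    """Lowercase, strip punctuation, and drop stop words as whole tokens."""
    words = tweet.lower().translate(_PUNCT_TABLE).split()
    return [w for w in words if w not in STOP_WORDS]

def top_terms(tweets, n=25):
    """Return the n most common remaining words across all tweets."""
    counts = Counter()
    for tweet in tweets:
        counts.update(tokenize(tweet))
    return counts.most_common(n)

# Made-up sample tweets, not real archive data.
SAMPLE_TWEETS = [
    "Parsing the firewall logs, again!",
    "Logs, logs and more logs in the SIEM.",
    "A quick look at the firewall rules.",
]

if __name__ == "__main__":
    for word, count in top_terms(SAMPLE_TWEETS, n=5):
        print(word, count)
```

Splitting into tokens before filtering keeps the stop-word check on whole words, so a short entry such as 'a' cannot eat letters out of longer words.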

Hello World Disco MapReduce Framework – Infosec Style

Disco MapReduce Framework – getting started video

Counting Words: The “Hello World” of MapReduce

Counting IP Addresses: The “Hello World” of MapReduce for Infosec

egrep '[[:digit:]]{1,3}\.[[:digit:]]{1,3}\.[[:digit:]]{1,3}\.[[:digit:]]{1,3}'
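The IP-counting idea can be sketched in plain Python to show the map and reduce steps without a cluster. This is not the Disco API and not the code from the original screenshots: the function names `map_line`, `reduce_counts`, and `count_ips` are hypothetical, and the log lines are invented using documentation-range addresses. The regular expression mirrors the egrep pattern above and, like it, does not check that each octet is 255 or less.

```python
import re
from itertools import groupby
from operator import itemgetter

# Same shape as the egrep pattern: four dot-separated runs of 1-3 digits.
IP_PATTERN = re.compile(r"\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}")

def map_line(line):
    """Map step: emit (ip, 1) for every address-like token in a log line."""
    for ip in IP_PATTERN.findall(line):
        yield ip, 1

def reduce_counts(pairs):
    """Reduce step: group the sorted pairs by key and sum the counts."""
    for ip, group in groupby(sorted(pairs), key=itemgetter(0)):
        yield ip, sum(count for _, count in group)

def count_ips(lines):
    """Run the map step over every line, then reduce the combined output."""
    pairs = [pair for line in lines for pair in map_line(line)]
    return dict(reduce_counts(pairs))

# Made-up log lines; the addresses come from documentation ranges.
SAMPLE_LOG = [
    "Jan 29 19:08:12 sshd: Failed password for root from 192.0.2.10 port 4242",
    "Jan 29 19:08:15 sshd: Failed password for root from 192.0.2.10 port 4243",
    "Jan 29 19:09:56 sshd: Accepted password for bob from 198.51.100.7 port 5151",
]

if __name__ == "__main__":
    for ip, count in sorted(count_ips(SAMPLE_LOG).items()):
        print(ip, count)
```

Sorting before grouping stands in for the shuffle phase of a real MapReduce job, where the framework delivers all values for one key to the same reducer.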

1) Install Inferno:
pip install inferno


2) Read the docs:

Machine Learning – sklearn decision flowchart – Awesome!

Matplotlib images in Twitter Archive index.html

Blur of tweets – Looking at Twitter Archive DataSet

Top 25 in my Tweetstream




IPython Notebook and d3.js Mashup

Animated GIF – Key unlocking door


Gaming big data analytics for fun and profit – Dr. Evil Data Scientist


Building the Human Detection Algorithm – There is a history of gaming, foretelling the world of blackhat data science. 

4chan is an awesome example of early wins for blackhats

Identify the voting ring challenge

HackerNews HoneyPots