Solutions for Big Data ingest of network traffic – Analyzing PCAP traffic with Hadoop
March 28, 2012Posted by on
Building a system that can do full context PCAP for a single machine is trivial, IMHO compared to creating predictive algorithms for analyzing PCAP traffic. There are log data search solutions like Elasticsearch, GreyLog2, ELSA, Splunk and Logstash that can help you archive and dig through the data.
My favorite network traffic big data solution (2012) is PacketPig. I found this nice setup by Alienvault. Security Onion is a great network security IDS etc distro. I have seen one called xtractr, MR for forensics. Several solutions exist and PCAP files can be fed to the engines for analysis. I think ARGUS has a place here too, but I haven’t tackled it yet.
I started using PCAP to CSV conversion perl program, and written my own sniffer to csv in scapy. Super Timelines are being done in python too. Once I get a PCAP file converted to csv, I load it up to HDFS via HUE. I also found this PCAP visualization blog entry by Raffael Marty.
I’ve stored a bunch of csv network traces and did analysis using HIVE and PIG queries. It was very simple. Name the columns and query each column looking for specific entries. Very labour intensive. Binary analysis on Hadoop.
I’m working on a MapReduce library that uses machine learning to classify attackers and their network patterns. As of 2013, there are a few commercial venders like IBM and RSA which have added Hadoop capability to their SIEM product lines. Here is Twitters logging setup.
There are a few examples of PCAP ingestion with open source tools like Hadoop:
The second presentation I found was Wayne Wheelers – SherpaSurfing and https://github.com/sherpasurfing/SHERPASURFING:
The third I found was https://github.com/RIPE-NCC/hadoop-pcap: