BigSnarf blog

Infosec FTW

Category Archives: Tools

Process logs with Kinesis, S3, Apache Spark on EMR, Amazon RDS

Apache Spark Streaming and AWS Kinesis integration in version 1.1.0

OpenSOC Machine Learning


Self Hosted Maven repo on S3

s3cmd mb s3://www.example.mavenrepo
s3cmd ws-create s3://www.example.mavenrepo
mkdir -p /home/ubuntu/com/amazonaws/amazon-kinesis-connector/1.0.0
cd /home/ubuntu/com/amazonaws/amazon-kinesis-connector/1.0.0/
# place the connector artifacts in this directory before syncing
s3cmd -P sync /home/ubuntu/com/amazonaws/amazon-kinesis-connector/1.0.0/ s3://www.example.mavenrepo/snapshots/com/amazonaws/amazon-kinesis-connector/1.0.0/

"AWS Snapshots" at ""


Monitoring JVM


Scala REPL in Notebook


Simple Apache Auth Log Processing with Spark job




A simple Spark job that processes an Apache auth.log and counts the lines containing "Invalid user" login attempts and "Failed password" entries. Run it with:
./bin/spark-submit --class "SimpleApp" --master local[4] target/scala-2.10/simple-project_2.10-1.0.jar

import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._
import org.apache.spark.SparkConf

object SimpleApp {
  def main(args: Array[String]) {
    // Path to the sanitized auth.log on the local filesystem
    val logFile = "/Users/antigen/Downloads/sanitized_log/auth.log"
    val conf = new SparkConf().setAppName("SimpleApacheLogProcessing Application")
    val sc = new SparkContext(conf)
    // Cache the lines because the log is scanned twice below
    val logData = sc.textFile(logFile, 2).cache()
    // Count lines that record a login attempt for a non-existent user
    val numInvalidUser = logData.filter(line => line.contains("Invalid user")).count()
    // Count lines that record a failed password
    val numFailedPassword = logData.filter(line => line.contains("Failed password")).count()
    println("Lines with INVALID USER: %s, Lines with FAILED PASSWORD: %s".format(numInvalidUser, numFailedPassword))
    sc.stop()
  }
}

Code, Folder Structure, simple.sbt, and packaged jar files here:
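
For orientation, a minimal simple.sbt that would produce the jar name used in the spark-submit command above (simple-project_2.10-1.0.jar) might look like the sketch below. The project name and version are inferred from that jar name; the Scala patch version and the Spark version are assumptions and may differ from the author's actual build file.

```scala
// Hypothetical simple.sbt; inferred from the jar name, not copied from the author's repo
name := "Simple Project"

version := "1.0"

// Assumption: any 2.10.x release matches the _2.10 suffix in the jar name
scalaVersion := "2.10.4"

// Assumption: Spark 1.0.1, the version mentioned elsewhere on this page
libraryDependencies += "org.apache.spark" %% "spark-core" % "1.0.1"
```

With a file like this at the project root and SimpleApp.scala under src/main/scala, running sbt package should write the jar to target/scala-2.10/.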

Data Science Stack


Finally got Algebird and Apache Log Parsing libraries into my Apache Spark REPL


Apache Spark 1.0.1 – SQL – Load from files, JSON and arrays

