BigSnarf blog

Infosec FTW

Category Archives: Tools

JSON ETL to Parquet using Apache Spark

Process logs with Kinesis, S3, Apache Spark on EMR, Amazon RDS

Apache Spark Streaming and AWS Kinesis integration in version 1.1.0

OpenSOC Machine Learning


Self Hosted Maven repo on S3

s3cmd mb s3://www.example.mavenrepo
s3cmd ws-create s3://www.example.mavenrepo
# Create the local Maven directory layout (groupId/artifactId/version)
mkdir -p /home/ubuntu/com/amazonaws/amazon-kinesis-connector/1.0.0
# Place the artifact files in that directory, then upload them with a public-read ACL (-P).
# The trailing slash on the source syncs the directory's contents rather than nesting a second 1.0.0/ folder.
s3cmd -P sync /home/ubuntu/com/amazonaws/amazon-kinesis-connector/1.0.0/ s3://www.example.mavenrepo/snapshots/com/amazonaws/amazon-kinesis-connector/1.0.0/

resolvers += "AWS Snapshots" at "http://www.example.mavenrepo.s3.amazonaws.com/snapshots"

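With the resolver line above in a project's build definition, the uploaded artifact can be pulled in like any other dependency. A minimal build.sbt sketch follows. It assumes the jar and a matching POM were uploaded under the path synced above, and the coordinates are read off the directory layout in this post rather than taken from a published artifact.

```scala
// build.sbt (sketch): coordinates inferred from the S3 path used in this post
resolvers += "AWS Snapshots" at "http://www.example.mavenrepo.s3.amazonaws.com/snapshots"

// groupId % artifactId % version, mirroring com/amazonaws/amazon-kinesis-connector/1.0.0
libraryDependencies += "com.amazonaws" % "amazon-kinesis-connector" % "1.0.0"
```

A single % is used because the artifact path carries no Scala version suffix.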
			

Monitoring JVM


Scala REPL in Notebook


Simple Apache Auth Log Processing with Spark job


/*
 * SimpleApp.scala
 *
 * Simple Spark job that processes Apache auth.log and counts
 * "Invalid user" login attempts and "Failed password" lines.
 *
 * ./bin/spark-submit --class "SimpleApp" --master local[4] target/scala-2.10/simple-project_2.10-1.0.jar
 */
import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._
import org.apache.spark.SparkConf

object SimpleApp {
  def main(args: Array[String]): Unit = {
    // Path to the sanitized log file; adjust for your machine
    val logFile = "/Users/antigen/Downloads/sanitized_log/auth.log"
    val conf = new SparkConf().setAppName("SimpleApacheLogProcessing Application")
    val sc = new SparkContext(conf)
    // Cache the RDD because it is scanned twice below
    val logData = sc.textFile(logFile, 2).cache()
    val numInvalidUser = logData.filter(line => line.contains("Invalid user")).count()
    val numFailedPassword = logData.filter(line => line.contains("Failed password")).count()
    println("Lines with INVALID USER: %s, Lines with FAILED PASSWORD: %s".format(numInvalidUser, numFailedPassword))
    // Shut down the context cleanly once the counts are printed
    sc.stop()
  }
}


The code, folder structure, simple.sbt, and packaged jar files are here:

https://github.com/bigsnarfdude/SimpleApp
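For readers who do not want to open the repository, a build definition for this job might look like the sketch below. The project name, version, and Scala version are inferred from the jar path in the spark-submit command above (simple-project_2.10-1.0.jar). The Spark version is an assumption, so match it to your installation, and the simple.sbt in the repository may differ.

```scala
// simple.sbt (sketch, not copied from the repository)
name := "Simple Project"

version := "1.0"

// Any 2.10.x release yields the _2.10 suffix seen in the jar name
scalaVersion := "2.10.4"

// Assumed Spark version; use the one that matches your installation
libraryDependencies += "org.apache.spark" %% "spark-core" % "1.0.2"
```

Running sbt package from the project root should then produce the jar under target/scala-2.10/.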

Data Science Stack


Finally got Algebird and Apache Log Parsing libraries into my Apache Spark REPL

