Process logs with Kinesis, S3, Apache Spark on EMR, Amazon RDS

Apache Spark now on AWS-EMR from S3

Joining firewall and geolocation log data with Apache Spark

val format = new java.text.SimpleDateFormat("yyyy-MM-dd")
case class Register (d: java.util.Date, uuid: String, cust_id: String, lat: Float, lng: Float)
case class Click (d: java.util.Date, uuid: String, landing_page: Int)

// Key each registration by uuid (column 1) so it can be joined with clicks.
val reg = sc.textFile("geoLocation.tsv").map(_.split("\t")).map(
 r => (r(1), Register(format.parse(r(0)), r(1), r(2), r(3).toFloat, r(4).toFloat))
)

// Key each click by the same uuid column.
val clk = sc.textFile("dnsEntry.tsv").map(_.split("\t")).map(
 c => (c(1), Click(format.parse(c(0)), c(1), c(2).trim.toInt))
)
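
The excerpt builds two pair RDDs keyed by uuid but stops before the join the title refers to. On pair RDDs the join itself would be `reg.join(clk)`. As a rough illustration of what that inner join produces, here is a minimal sketch on plain Scala collections, so it runs without a cluster. The `joinByKey` helper and every sample row are hypothetical and not taken from the original post.

```scala
import java.text.SimpleDateFormat
import java.util.Date

case class Register(d: Date, uuid: String, cust_id: String, lat: Float, lng: Float)
case class Click(d: Date, uuid: String, landing_page: Int)

val format = new SimpleDateFormat("yyyy-MM-dd")

// Hypothetical sample rows, keyed by uuid like the RDDs in the excerpt.
val regRows: Seq[(String, Register)] = Seq(
  ("u1", Register(format.parse("2014-03-02"), "u1", "c1", 33.65f, -117.95f)),
  ("u2", Register(format.parse("2014-03-04"), "u2", "c2", 38.57f, -121.49f))
)
val clkRows: Seq[(String, Click)] = Seq(
  ("u1", Click(format.parse("2014-03-06"), "u1", 1)),
  ("u1", Click(format.parse("2014-03-07"), "u1", 2)),
  ("u3", Click(format.parse("2014-03-08"), "u3", 1))
)

// Inner join by key: one output row per (left, right) pair sharing a key.
// Keys present on only one side are dropped.
def joinByKey[K, A, B](left: Seq[(K, A)], right: Seq[(K, B)]): Seq[(K, (A, B))] = {
  val rightByKey: Map[K, Seq[B]] =
    right.groupBy(_._1).map { case (k, vs) => (k, vs.map(_._2)) }
  for {
    (k, a) <- left
    b <- rightByKey.getOrElse(k, Seq.empty)
  } yield (k, (a, b))
}

val joined = joinByKey(regRows, clkRows)
```

With these rows only `u1` appears on both sides, so `joined` holds two entries, one per click, each paired with the same registration.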



Metrics vs Analytics vs Data Mining vs Machine Learning

Metrics    ->    Analytics    ->    Data Mining    ->    Machine Learning

1. Metrics are simple measurements of performance: aggregates and counts of raw data.

2. Analytics is the ability to slice and dice metrics; it aids in the discovery and communication of meaningful patterns in data. It includes manual segmentation and filtering of data to discover patterns.

3. Data mining uses a computer to automatically discover patterns in datasets larger than the average human can manage. It aids in the discovery of meaningful patterns in data.

4. Machine learning uses a computer to automatically discover patterns in larger datasets. ML can also learn new patterns and discover previously unknown ones, so it aids in the discovery of both meaningful and unknown patterns.
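
To make the first two steps of the list concrete, here is a small sketch in plain Scala. The `PageView` type and the sample events are hypothetical and only illustrate the distinction: a metric is a single aggregate over the raw data, while analytics slices and filters that same data by hand.

```scala
// Hypothetical raw event type and sample data.
case class PageView(country: String, landingPage: Int)

val events = Seq(
  PageView("US", 1), PageView("US", 2), PageView("DE", 1), PageView("US", 1)
)

// Metric: one aggregate count over all raw events.
val totalViews: Int = events.size

// Analytics: the same count segmented by a dimension...
val viewsByCountry: Map[String, Int] =
  events.groupBy(_.country).map { case (country, vs) => (country, vs.size) }

// ...and filtered manually to look at one segment.
val usViewsOfPage1: Int =
  events.count(e => e.country == "US" && e.landingPage == 1)
```

Data mining and machine learning would replace the hand-written grouping and filter with an algorithm that searches for such segments itself.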


Apache Spark Streaming and AWS Kinesis integration in version 1.1.0

Me learning Scala, Akka, Spark, JVM, IntelliJ, sbt, Java, Maven, Ant, functional programming – Meat grinding knob turner

OpenSOC Machine Learning

Self Hosted Maven repo on S3

s3cmd mb s3://www.example.mavenrepo
s3cmd ws-create s3://www.example.mavenrepo
mkdir -p com/amazonaws/amazon-kinesis-connector/1.0.0
cd com/amazonaws/amazon-kinesis-connector/1.0.0/
s3cmd -P sync /home/ubuntu/com/amazonaws/amazon-kinesis-connector/1.0.0/ s3://www.example.mavenrepo/snapshots/com/amazonaws/amazon-kinesis-connector/1.0.0/

"AWS Snapshots" at ""


Finding attackers with Neo4j


