BigSnarf blog

Infosec FTW

Using Akka for backpressure to Spark


  • incident response checklists
  • memory capture
  • routine live memory analysis reporting
  • elasticsearch netflows
  • asset listing
  • users listing
  • users login histories
  • remote login histories
  • password changes histories
  • patch management
  • last contact firewall
  • last contact SIEM
  • software inventories
  • graph analysis netflows
  • ports listing
  • connection listing
  • stats. history
  • anomaly detection
  • dns resolution
  • asn
  • malware
  • blacklists
  • whitelists
  • google
  • long running history
  • zip functionality and hosting
  • ssdeep
  • cuckoo sandbox submission
  • md5
  • google safebrowsing
  • Carbon Black
  • NetWitness Pivot Query
  • RSA NetWitness
  • RSA Security Analytics graphs
  • url void
  • safe search
  • malware domain search
  • centralops
  • bit9 md5
  • virustotal
  • dns
  • asn
  • netflow
  • internal search history
  • betweenness and centrality measures
  • mathy anomaly detection
  • ML AD

JSON ETL to Parquet using Apache Spark

Process logs with Kinesis, S3, Apache Spark on EMR, Amazon RDS

Apache Spark now on AWS-EMR from S3

Joining firewall and geolocation log data with Apache Spark

// Dates in both files are expected as yyyy-MM-dd
val format = new java.text.SimpleDateFormat("yyyy-MM-dd")
case class Register (d: java.util.Date, uuid: String, cust_id: String, lat: Float, lng: Float)
case class Click (d: java.util.Date, uuid: String, landing_page: Int)

// Key each record by uuid (column 1) so the two datasets can be joined on it
val reg = sc.textFile("geoLocation.tsv").map(_.split("\t")).map(
 r => (r(1), Register(format.parse(r(0)), r(1), r(2), r(3).toFloat, r(4).toFloat)))

val clk = sc.textFile("dnsEntry.tsv").map(_.split("\t")).map(
 c => (c(1), Click(format.parse(c(0)), c(1), c(2).trim.toInt)))
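
The snippet above only builds the two keyed datasets; the join itself is not shown. As a rough sketch of what an inner join on uuid does, here is the same parsing and join on plain Scala collections, so it runs without a Spark cluster. The object name JoinSketch, the helper names, and the sample rows in the usage note are my own assumptions, not part of the original post.

```scala
import java.text.SimpleDateFormat
import java.util.Date

object JoinSketch {
  case class Register(d: Date, uuid: String, cust_id: String, lat: Float, lng: Float)
  case class Click(d: Date, uuid: String, landing_page: Int)

  // SimpleDateFormat is not thread-safe; acceptable for a single-threaded sketch.
  private val format = new SimpleDateFormat("yyyy-MM-dd")

  // Parse one tab-separated line into a (uuid, Register) pair.
  def parseRegister(line: String): (String, Register) = {
    val r = line.split("\t")
    (r(1), Register(format.parse(r(0)), r(1), r(2), r(3).toFloat, r(4).toFloat))
  }

  // Parse one tab-separated line into a (uuid, Click) pair.
  def parseClick(line: String): (String, Click) = {
    val c = line.split("\t")
    (c(1), Click(format.parse(c(0)), c(1), c(2).trim.toInt))
  }

  // Inner join on the key: one output row per matching (Register, Click) pair.
  def join(reg: Seq[(String, Register)],
           clk: Seq[(String, Click)]): Seq[(String, (Register, Click))] = {
    val clicksByUuid: Map[String, Seq[Click]] =
      clk.groupBy(_._1).map { case (k, vs) => k -> vs.map(_._2) }
    for {
      (uuid, r) <- reg
      c <- clicksByUuid.getOrElse(uuid, Seq.empty)
    } yield (uuid, (r, c))
  }
}
```

With made-up input such as Seq("2014-03-02\tu1\tc1\t33.6\t-117.9").map(JoinSketch.parseRegister) and Seq("2014-03-04\tu1\t1").map(JoinSketch.parseClick), JoinSketch.join pairs the two records that share the key u1. In the Spark shell, with reg and clk as the pair RDDs defined above, the equivalent call would be reg.join(clk), which yields an RDD[(String, (Register, Click))].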



Metrics vs Analytics vs Data Mining vs Machine Learning

Metrics    ->    Analytics    ->    Data Mining    ->    Machine Learning

1. Metrics are simple measurements of performance: aggregates and counts of raw data.

2. Analytics is the ability to slice and dice metrics. It aids in the discovery and communication of meaningful patterns in data, and includes manual segmentation and filtering of data to discover those patterns.

3. Data mining uses a computer to automatically discover patterns in datasets larger than the average human can manage. It aids in the discovery of meaningful patterns in data.

4. Machine learning uses a computer to automatically discover patterns in larger datasets. ML can also learn new patterns and discover previously unknown ones, so it aids in the discovery of both meaningful and unknown patterns.
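
To make the first two steps concrete, here is a minimal sketch of the difference between a metric (one aggregate over raw data) and analytics (the same aggregate sliced by a dimension you choose by hand). The LoginEvent record and its fields are hypothetical, picked only for illustration.

```scala
object MetricsVsAnalytics {
  // Hypothetical raw event; the field names are assumptions for this sketch.
  case class LoginEvent(user: String, country: String, success: Boolean)

  // Metric: a single count over the raw data.
  def failedLogins(events: Seq[LoginEvent]): Int =
    events.count(!_.success)

  // Analytics: the same count, manually segmented by a chosen dimension.
  def failedLoginsBy[K](events: Seq[LoginEvent])(dim: LoginEvent => K): Map[K, Int] =
    events.filter(!_.success).groupBy(dim).map { case (k, vs) => k -> vs.size }
}
```

For example, MetricsVsAnalytics.failedLoginsBy(events)(_.country) breaks the failure count down per country, while failedLogins(events) gives only the overall total. Data mining and machine learning would go a step further and pick the interesting segments automatically rather than relying on a hand-chosen dimension.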


Apache Spark Streaming and AWS Kinesis integration in version 1.1.0

Me learning Scala, Akka, Spark, JVM, Intellij, sbt, Java, maven, ant, Functional programming – Meat grinding knob turner


I thought this photo was really cool, then I found a better one



