BigSnarf blog

Infosec FTW

Category Archives: Thoughts

Joining firewall and geolocation log data with Apache Spark

Metrics vs Analytics vs Data Mining vs Machine Learning

Metrics    ->    Analytics    ->    Data Mining    ->    Machine Learning

1. Metrics are simple measurements of performance: aggregates and counts of raw data.

2. Analytics is the ability to slice and dice metrics. It aids in the discovery and communication of meaningful patterns in data, and includes manual segmentation and filtering of data to discover those patterns.

3. Data mining uses a computer to automatically discover patterns in datasets larger than the average human can manage. It aids in the discovery of meaningful patterns in data.

4. Machine learning uses a computer to automatically discover patterns in larger datasets. ML can also learn new patterns and discover previously unknown ones, so it aids in the discovery of both meaningful and unknown patterns.
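To make the first two steps concrete, here is a minimal sketch in plain Scala. Everything in it is an assumption for illustration: the `FirewallEvent` shape, the field names, and the sample rows (which use documentation-only IP ranges) are made up and not taken from any real log. The metric is a plain count or sum; the analytics step is a hand-chosen filter and grouping over the same data.

```scala
object LogLevels {
  // Hypothetical record shape for one firewall log line.
  final case class FirewallEvent(srcIp: String, action: String, bytes: Long)

  // Made-up sample rows using documentation-only address ranges.
  val events: Seq[FirewallEvent] = Seq(
    FirewallEvent("192.0.2.10", "allow", 1200L),
    FirewallEvent("192.0.2.10", "deny", 300L),
    FirewallEvent("198.51.100.7", "deny", 450L),
    FirewallEvent("198.51.100.7", "deny", 500L),
    FirewallEvent("203.0.113.5", "allow", 800L)
  )

  // 1. Metrics: aggregates and counts of the raw data.
  val totalEvents: Int = events.size
  val totalBytes: Long = events.map(_.bytes).sum

  // 2. Analytics: manually slice the data (denied traffic only)
  //    and segment it by source address.
  val deniedBySource: Map[String, Int] =
    events
      .filter(_.action == "deny")
      .groupBy(_.srcIp)
      .map { case (ip, es) => ip -> es.size }
}
```

Data mining and machine learning would replace the hand-picked `filter` and `groupBy` with a search that the computer runs for you.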


Me learning Scala, Akka, Spark, JVM, IntelliJ, sbt, Java, Maven, Ant, functional programming – Meat grinding knob turner


I thought this photo was really cool, then I found a better one



Finding attackers with Neo4J

Algebird Monoids for IP Addresses and counts in Scala

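import com.twitter.algebird.Max
import com.twitter.algebird.Operators._

// An IP address paired with a count.
// Records are ordered by count, with ties broken by the address string.
case class IPRecord(ipAddress: String, number: Int) extends Ordered[IPRecord] {
  def compare(that: IPRecord): Int = {
    // Integer.compare avoids the overflow that plain subtraction can cause.
    val c = Integer.compare(this.number, that.number)
    if (c == 0) this.ipAddress.compareTo(that.ipAddress) else c
  }
}

val oneOneOneOne = IPRecord("", 67391)
val twoTwoTwoTwo = IPRecord("", 48013573)
val threeThreeThreeThree = IPRecord("", 6470)
val fourFourFourFour = IPRecord("", 731)

// Adding Max values keeps the larger one, so the sum is the record with the highest count.
val topIPAddress: Max[IPRecord] = Max(oneOneOneOne) + Max(twoTwoTwoTwo) + Max(threeThreeThreeThree) + Max(fourFourFourFour)
assert(topIPAddress.get == twoTwoTwoTwo)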
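Why bother wrapping counts in `Max` instead of just calling `.max`? Because "keep the larger value" is associative, partial results can be combined in any grouping, which is what makes it handy for data split across partitions. The sketch below shows that idea with the standard library only. `Semigroup`, `MaxOf`, and `sumAll` here are hypothetical stand-ins written for this example and are not Algebird's actual classes; the counts are the ones from the example above, split as if they arrived from two partitions.

```scala
object MaxSketch {
  // Hypothetical stand-in for a semigroup: one associative combine operation.
  trait Semigroup[T] { def plus(a: T, b: T): T }

  // Hypothetical wrapper that marks a value as "combine by keeping the max".
  final case class MaxOf[T](get: T)

  // Any type with an Ordering gets a MaxOf semigroup.
  implicit def maxSemigroup[T](implicit ord: Ordering[T]): Semigroup[MaxOf[T]] =
    new Semigroup[MaxOf[T]] {
      def plus(a: MaxOf[T], b: MaxOf[T]): MaxOf[T] =
        if (ord.gteq(a.get, b.get)) a else b
    }

  // Combine a non-empty sequence with whatever semigroup is in scope.
  def sumAll[T](xs: Seq[T])(implicit sg: Semigroup[T]): T = xs.reduce(sg.plus)

  // The four counts from the example, split across two pretend partitions.
  val partitionA: Seq[MaxOf[Int]] = Seq(MaxOf(67391), MaxOf(48013573))
  val partitionB: Seq[MaxOf[Int]] = Seq(MaxOf(6470), MaxOf(731))

  // Reduce each partition on its own, then merge the partial results.
  val combined: MaxOf[Int] = sumAll(Seq(sumAll(partitionA), sumAll(partitionB)))

  // Reducing everything in one pass gives the same answer.
  val allAtOnce: MaxOf[Int] = sumAll(partitionA ++ partitionB)
}
```

Both `combined` and `allAtOnce` end up holding the same top count, which is the property that lets each partition be reduced independently before merging.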






Using machine learning for anomaly detection is not new but …

Algebird for Infosec Analytics

