BigSnarf blog

Infosec FTW

Category Archives: Thoughts

Table Flip


Intrusion Detection approaches for Anomaly Detection still rely on the Analyst, not the Software

Typical approaches for Anomaly Detection

  1. Statistical anomaly detection using the T-Digest algorithm (90th and 99th percentiles), Time Series Analysis, Heavy Hitters, and TopK
  2. Distance-based methods like SimHash and LSH on features
  3. Rule-based detection using Data Mining (geolocation, login behaviors per day, workstation, time)
  4. Signature-based detection using Snort and Bro
  5. Model-based anomaly detection built on many features for DNS traffic, users, and servers
  6. Change Detection
  7. Machine Learning
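The first approach above can be sketched in a few lines. This is a minimal stand-in only: an exact percentile over an in-memory series takes the place of a streaming sketch like T-Digest, and the data and threshold are illustrative.

```scala
// Sketch of percentile-based statistical anomaly detection.
// An exact in-memory percentile stands in for a streaming T-Digest sketch.
object PercentileAnomaly {
  // Value at percentile p (0.0 to 1.0) of a sorted copy of xs.
  def percentile(xs: Seq[Double], p: Double): Double = {
    val sorted = xs.sorted
    sorted(math.min(sorted.size - 1, (p * sorted.size).toInt))
  }

  // Flag every point above the 99th percentile as anomalous.
  def anomalies(xs: Seq[Double]): Seq[Double] = {
    val cutoff = percentile(xs, 0.99)
    xs.filter(_ > cutoff)
  }
}
```

In a real pipeline the percentile would be maintained incrementally per metric rather than recomputed over the full series.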

Typical approaches for Analyst ad-hoc query detection

  1. Visual Analysis
  2. Alert investigation
  3. Correlation Analysis
  4. Search
  5. SQL
  6. Time Series Analysis
  7. Graph Processing Queries
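The SQL/search style of analyst query above boils down to a group-and-count over events. A hedged sketch in plain Scala collections, with made-up field names, standing in for what would normally be a SQL or Spark query:

```scala
// Sketch of an analyst-style ad-hoc aggregation: logins per (user, hour),
// the shape behind a "login behaviors per day/time" rule. Fields are made up.
case class Login(user: String, workstation: String, hour: Int)

object AdHoc {
  def loginsPerUserHour(events: Seq[Login]): Map[(String, Int), Int] =
    events.groupBy(e => (e.user, e.hour)).map { case (k, v) => (k, v.size) }
}
```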

Malware Detection with Algebird LSH

Detect polymorphic malware variants by extracting features from static/dynamic analysis and using a Locality-Sensitive Hashing (LSH) data structure for comparisons. Open questions: enrich the features? Geo? Host?

Couple papers?

Brute-force comparison: return distinct pairs of malware IDs whose similarity is above the threshold.

.flatMap { case (_, malwareIdSet) =>
  for {
    (malwareId1, sig1) <- malwareIdSet
    (malwareId2, sig2) <- malwareIdSet
    sim = minHasher.similarity(sig1, sig2)
    if malwareId1 != malwareId2 && sim >= targetThreshold
  } yield (malwareId1, malwareId2)
}
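The `minHasher.similarity` call above is Algebird's. A self-contained stand-in sketches the same idea: k cheap hash functions of the form (a·x + b) mod p, with the estimated Jaccard similarity being the fraction of matching minima. The parameters below are illustrative, not Algebird's defaults.

```scala
// Minimal MinHash sketch (stand-in for Algebird's MinHasher).
// similarity(sig1, sig2) estimates Jaccard similarity of the feature sets.
object MiniMinHash {
  val p = 2147483647L // Mersenne prime used as the hash modulus
  val seeds: Seq[(Long, Long)] =
    (1 to 64).map(i => (2L * i + 1, 3L * i + 7))

  // One minimum per hash function over the feature set.
  def signature(features: Set[Int]): Array[Long] =
    seeds.map { case (a, b) =>
      features.map(x => (a * x + b) % p).min
    }.toArray

  // Fraction of positions where the two signatures agree.
  def similarity(s1: Array[Long], s2: Array[Long]): Double =
    s1.zip(s2).count { case (x, y) => x == y }.toDouble / s1.length
}
```

With LSH on top, signatures are banded into buckets so only candidates sharing a bucket are compared, instead of the brute-force all-pairs loop above.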

When my Scala code compiles with no errors


How I learned to program

Sorry honey, not tonight I’ve gotta get this Apache Spark Fat jar compiled and shipped

Streaming Prototype

Apache Spark Use Cases

Our specific use case


Kinesis gets raw logs


Spark Streaming does the counting


Two tables are created: one for the Kinesis log position (the checkpoint) and a second for the aggregates


DynamoDB stores the aggregations




Building Custom Queries, Grouping, Aggregators and Filters for Apache Spark

Query Metrics

Returns a list of metric values based on a set of criteria. Also returns a set of all tag names and values that are found across the data points.

The time range can be specified with absolute or relative time values. Absolute time values are in milliseconds. Relative time values are specified as an integer duration and a unit. Possible unit values are “milliseconds”, “seconds”, “minutes”, “hours”, “days”, “weeks”, “months”, and “years”. For example, “5 hours” means that metric values submitted within the last 5 hours will be returned. The end time is optional; if no end time is specified, it is assumed to be now (the current date and time).
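Resolving a relative time value to an absolute start time is just a subtraction. A hedged sketch (the unit table mirrors the list above; months and years are calendar-dependent and omitted here):

```scala
// Sketch: convert a relative duration like (5, "hours") into an absolute
// start time in epoch milliseconds. Months/years are omitted because their
// length depends on the calendar.
object RelativeTime {
  val unitMillis: Map[String, Long] = Map(
    "milliseconds" -> 1L,
    "seconds"      -> 1000L,
    "minutes"      -> 60L * 1000,
    "hours"        -> 60L * 60 * 1000,
    "days"         -> 24L * 60 * 60 * 1000,
    "weeks"        -> 7L * 24 * 60 * 60 * 1000)

  // start = now - value * unit; the end time defaults to now when omitted.
  def startMillis(value: Long, unit: String, nowMillis: Long): Long =
    nowMillis - value * unitMillis(unit)
}
```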


The results of the query can be grouped together. There are three ways to group the data: by tags, by a time range, and by value. Grouping is done with the groupBy (or groupByKey) property, an array of one or more groupers.
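The tag grouper is the simplest of the three. A minimal sketch, assuming a made-up `DataPoint` shape with a free-form tag map:

```scala
// Sketch of grouping data points by the value of one tag, e.g. by "host".
// DataPoint is an assumed shape, not the query engine's actual type.
case class DataPoint(timestamp: Long, value: Double, tags: Map[String, String])

object Grouping {
  def groupByTag(points: Seq[DataPoint], tag: String): Map[String, Seq[DataPoint]] =
    points.groupBy(_.tags.getOrElse(tag, "unknown"))
}
```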


Aggregators perform an operation on data points and downsample them. For example, you could sum all data points that fall within 5-minute periods.

Aggregators can be chained together. For example, you could sum all data points in 5-minute periods and then average those sums over a week.
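Chaining two aggregators can be sketched as a bucketed sum followed by an average over the bucket sums. Timestamps are epoch milliseconds and the function shapes are assumptions for illustration:

```scala
// Sketch of chained aggregators: sum into fixed-width time buckets,
// then average the bucket sums. Shapes are illustrative assumptions.
object Aggregators {
  // Sum values within each bucket of width bucketMillis.
  def bucketSum(points: Seq[(Long, Double)], bucketMillis: Long): Map[Long, Double] =
    points.groupBy { case (ts, _) => ts / bucketMillis }
          .map { case (bucket, vs) => (bucket, vs.map(_._2).sum) }

  // Average of the per-bucket sums (the second aggregator in the chain).
  def avgOfBucketSums(points: Seq[(Long, Double)], bucketMillis: Long): Double = {
    val sums = bucketSum(points, bucketMillis).values
    sums.sum / sums.size
  }
}
```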


It is possible to filter the data returned by specifying a tag. The data returned will only contain data points associated with the specified tag. Filtering is done using the “tags” property.
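The tag filter keeps only points matching every requested name/value pair. A minimal sketch, again with an assumed point shape:

```scala
// Sketch of the "tags" filter: keep only points whose tag map contains
// every requested name/value pair. Point is an assumed shape.
case class Point(timestamp: Long, value: Double, tags: Map[String, String])

object TagFilter {
  def filterByTags(points: Seq[Point], wanted: Map[String, String]): Seq[Point] =
    points.filter(p => wanted.forall { case (k, v) => p.tags.get(k).contains(v) })
}
```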


Amazon introduces ML service

Face Detection Raspberry Pi 2 Day

