BigSnarf blog

Infosec FTW

Category Archives: Thoughts

RF, SVM, KNN ensembles training


Spark OLAP

STL for anomaly detection

Table Flip


Intrusion Detection Approaches for Anomaly Detection Still Rely on the Analyst, Not the Software

Typical approaches for Anomaly Detection

  1. Statistical anomaly detection using 90th- and 99th-percentile thresholds (T-Digest algorithm), time series analysis, Heavy Hitters, and Top-K
  2. Distance-based methods such as SimHash and LSH over features
  3. Rule-based detection using data mining (geolocation, login behaviors per day, workstation, time)
  4. Signature-based detection using Snort and Bro
  5. Model-based anomaly detection built on many features for DNS traffic, users, and servers
  6. Change Detection
  7. Machine Learning
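The first approach above can be sketched in plain Scala. A T-Digest approximates quantiles over a stream at scale; this sketch (names are illustrative, not from any library) computes an exact nearest-rank percentile over an in-memory baseline and flags observed values above it:

```scala
// Minimal sketch of percentile-threshold anomaly detection.
// A T-Digest would approximate the quantile over a stream; here we
// compute it exactly over a small in-memory baseline for illustration.
object PercentileAnomaly {
  // nearest-rank percentile (p in 0..100) over a sample
  def percentile(sample: Seq[Double], p: Double): Double = {
    val sorted = sample.sorted
    val rank = math.ceil(p / 100.0 * sorted.size).toInt.max(1)
    sorted(rank - 1)
  }

  // flag observed values above the given percentile of the baseline
  def anomalies(baseline: Seq[Double], observed: Seq[Double],
                p: Double = 99.0): Seq[Double] = {
    val cutoff = percentile(baseline, p)
    observed.filter(_ > cutoff)
  }
}
```

With a baseline of 1.0 through 100.0, the 99th-percentile cutoff is 99.0, so an observed value of 250.0 is flagged while 5.0 is not.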

Typical approaches for analyst ad-hoc query detection

  1. Visual Analysis
  2. Alert investigation
  3. Correlation Analysis
  4. Search
  5. SQL
  6. Time Series Analysis
  7. Graph Processing Queries

Malware Detection with Algebird LSH

Detect polymorphic malware variants by extracting features from static/dynamic analysis and using a locality-sensitive hashing (LSH) data structure for comparisons. Enrich? Geo? Host?
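As a rough sketch of the signature side, here is a minimal MinHash in plain Scala; MiniMinHash is a hypothetical stand-in for Algebird's MinHasher, estimating Jaccard similarity between feature sets as the fraction of matching signature slots:

```scala
import scala.util.hashing.MurmurHash3

// Hypothetical stand-in for Algebird's MinHasher. Each of the k seeds
// acts as one hash function; a set's signature is the element-wise
// minimum hash value under each seed.
object MiniMinHash {
  val numHashes = 64

  // signature of a non-empty feature set
  def signature(features: Set[String]): Array[Int] =
    Array.tabulate(numHashes) { seed =>
      features.map(f => MurmurHash3.stringHash(f, seed)).min
    }

  // fraction of matching slots estimates Jaccard similarity
  def similarity(a: Array[Int], b: Array[Int]): Double =
    a.zip(b).count { case (x, y) => x == y }.toDouble / numHashes
}
```

Identical feature sets produce identical signatures (similarity 1.0), and mostly-overlapping polymorphic variants land close together, which is what lets LSH bucket near-duplicates before any pairwise comparison.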

A couple of papers:

http://link.springer.com/chapter/10.1007/978-3-319-23461-8_6

http://link.springer.com/chapter/10.1007/978-3-319-23461-8_8

Brute-force comparison: return distinct pairs with similarity above the threshold.

// Pairwise-compare MinHash signatures within each candidate bucket,
// keeping distinct pairs whose similarity meets the threshold.
.flatMap { case (_, malwareIdSet) =>
  for {
    (malwareId1, sig1) <- malwareIdSet
    (malwareId2, sig2) <- malwareIdSet
    sim = minHasher.similarity(sig1, sig2)
    if malwareId1 != malwareId2 && sim >= targetThreshold
  } yield (malwareId1, malwareId2)
}
.distinct

When my Scala code compiles with no errors


How I learned to program

Sorry honey, not tonight. I’ve gotta get this Apache Spark fat jar compiled and shipped.

Streaming Prototype

Apache Spark Use Cases

Our specific use case


Kinesis gets raw logs


Spark Streaming does the counting
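The counting step can be sketched in plain Scala. This is an illustrative stand-in (names assumed) for what the Spark Streaming job does on each micro-batch before the results are written out as aggregates:

```scala
// Illustrative stand-in for the per-micro-batch counting the Spark
// Streaming job performs: bucket raw events by (eventType, minute)
// and count, producing rows shaped like the aggregate table.
object EventCounter {
  case class RawEvent(eventType: String, timestamp: String) // e.g. "2015-05-21T17:14"

  def countByTypeAndMinute(batch: Seq[RawEvent]): Map[(String, String), Int] =
    batch.groupBy(e => (e.eventType, e.timestamp))
         .map { case (key, events) => key -> events.size }
}
```

In the real job the same grouping runs over a DStream and the counts are incremented in DynamoDB rather than returned as a Map.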


Two tables are created: one for the Kinesis log position and one for the aggregates


DynamoDB stores the aggregations


 


 

https://github.com/snowplow/spark-streaming-example-project
