BigSnarf blog

Infosec FTW

Monthly Archives: May 2015

Compile Apache Spark with Kinesis Support

Creating Kinesis Stream in pictures












How I learned to program

Sorry honey, not tonight I’ve gotta get this Apache Spark Fat jar compiled and shipped

Streaming Prototype

Apache Spark Use Casesstreaming-arch

Our specific use case

Screen Shot 2015-05-23 at 11.18.22 AM

Kinesis gets raw logs

Screen Shot 2015-05-21 at 5.14.41 PM

Spark Streaming does the counting

Screen Shot 2015-05-21 at 5.14.04 PM

Two Tables Created, One for Kinesis Log Position and the Second for Aggregates

Screen Shot 2015-05-23 at 10.40.15 AM

DynamoDB stores the aggregations

Screen Shot 2015-05-21 at 5.12.11 PM



Building Custom Queries, Grouping, Aggregators and Filters for Apache Spark

Query Metrics

Returns a list of metric values based on a set of criteria. Also returns a set of all tag names and values that are found across the data points.

The time range can be specified with absolute or relative time values. Absolute time values are in milliseconds. Relative time values are specified as an integer duration and a unit. Possible unit values are “milliseconds”, “seconds”, “minutes”, “hours”, “days”, “weeks”, “months”, and “years”. For example, “5 hours” means that metric values submitted 5 hours ago will be returned. The end time is optional. If no end time is specified, the end time is assumed to be now (the current date and time).


The results of the query can be grouped together.There are three ways to group the data; by tags, by a time range, and by value. Grouping is done with the groupBy or groupByKey which is an array of one or more groupers.


Aggregators perform an operation on data points and down samples. For example, you could sum all data points that exist in 5 minute periods.

Aggregators can be combined together. For example, you could sum all data points in 5 minute periods then average them for a week period.


It is possible to filter the data returned by specifying a tag. The data returned will only contain data points associated with the specified tag. Filtering is done using the “tags” property.


Netflix Security tool – FIDO


FIDO is an orchestration layer that automates the incident response process by evaluating, assessing and responding to malware and other detected threats.

Tracking attackers using heatmap visualization with Google Maps v3 and Heatmap Layer

Screen Shot 2015-05-07 at 9.02.12 AM