BigSnarf blog

Infosec FTW

Category Archives: Thoughts

How I learned to program

Sorry honey, not tonight. I’ve gotta get this Apache Spark fat JAR compiled and shipped

Streaming Prototype

Apache Spark Use Cases

Our specific use case

Kinesis gets raw logs

Spark Streaming does the counting

Two tables are created: one for the Kinesis log position and one for the aggregates

DynamoDB stores the aggregations
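A minimal in-memory sketch of the counting step, assuming each Kinesis record has already been reduced to a (sequence number, timestamp in ms, event type) tuple and that counts are kept per minute. Two plain Python containers stand in for the two tables (log position and aggregates). `process_batch` and the record layout are hypothetical, and this does not use the Spark, Kinesis, or DynamoDB APIs.

```python
from collections import Counter

MINUTE_MS = 60 * 1000


def process_batch(shard_id, records, checkpoints, aggregates):
    """Count one micro-batch of records from a single shard.

    records:     list of (sequence_number, timestamp_ms, event_type) tuples
                 (hypothetical layout; real Kinesis sequence numbers are strings)
    checkpoints: dict standing in for the log-position table (shard -> last sequence)
    aggregates:  Counter standing in for the aggregates table ((minute, type) -> count)
    """
    last_seq = checkpoints.get(shard_id, -1)
    # Skip anything at or before the stored position, e.g. records replayed after a restart.
    fresh = [r for r in records if r[0] > last_seq]
    # Count events per (minute bucket, event type) within this batch.
    batch_counts = Counter((ts - ts % MINUTE_MS, event_type) for _, ts, event_type in fresh)
    # Add the batch counts to the running totals, then advance the stored position.
    aggregates.update(batch_counts)
    if fresh:
        checkpoints[shard_id] = max(seq for seq, _, _ in fresh)
    return batch_counts


checkpoints, aggregates = {}, Counter()
process_batch("shard-0", [(1, 0, "click"), (2, 1_000, "click"), (3, 61_000, "view")],
              checkpoints, aggregates)
# A replayed batch overlapping the previous one: only sequence 4 is new.
process_batch("shard-0", [(3, 61_000, "view"), (4, 62_000, "view")], checkpoints, aggregates)
print(dict(aggregates))  # {(0, 'click'): 2, (60000, 'view'): 2}
print(checkpoints)       # {'shard-0': 4}
```

Updating the aggregates before advancing the stored position means a crash between the two steps would count that batch twice on restart (at-least-once); the sketch does not try to handle that case.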

 

mySparkStreaming

 

https://github.com/snowplow/spark-streaming-example-project

Building Custom Queries, Grouping, Aggregators and Filters for Apache Spark

Query Metrics

Returns a list of metric values based on a set of criteria. Also returns a set of all tag names and values that are found across the data points.
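A plain-Python sketch of that contract, assuming a data point is a dict with a millisecond "timestamp", a "value", and a "tags" map. `query_metric` and the point layout are hypothetical, and the tag set here is collected from the points that matched the time range, which is one reading of "across the data points".

```python
from collections import defaultdict


def query_metric(points, start_ms, end_ms):
    """Return matching (timestamp, value) pairs plus every tag name -> values seen.

    points: list of dicts with "timestamp" (ms), "value" and "tags" (name -> value).
    """
    selected = [p for p in points if start_ms <= p["timestamp"] <= end_ms]
    tags = defaultdict(set)
    for p in selected:
        for name, value in p["tags"].items():
            tags[name].add(value)
    return {
        "values": [(p["timestamp"], p["value"]) for p in selected],
        # Sort tag values so the output is deterministic.
        "tags": {name: sorted(values) for name, values in tags.items()},
    }


points = [
    {"timestamp": 1_000, "value": 3, "tags": {"host": "web1", "dc": "east"}},
    {"timestamp": 2_000, "value": 5, "tags": {"host": "web2", "dc": "east"}},
    {"timestamp": 9_000, "value": 7, "tags": {"host": "web3", "dc": "west"}},
]
result = query_metric(points, 0, 5_000)
print(result["values"])  # [(1000, 3), (2000, 5)]
print(result["tags"])    # {'host': ['web1', 'web2'], 'dc': ['east']}
```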

The time range can be specified with absolute or relative time values. Absolute time values are in milliseconds. Relative time values are specified as an integer duration and a unit. Possible unit values are “milliseconds”, “seconds”, “minutes”, “hours”, “days”, “weeks”, “months”, and “years”. For example, a relative start time of “5 hours” returns the metric values submitted from 5 hours ago up to the end time. The end time is optional; if it is omitted, it defaults to now (the current date and time).
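A sketch of resolving such a time range, assuming absolute times arrive as integer milliseconds and relative times as (amount, unit) pairs; `resolve_range` and its argument layout are hypothetical. Fixed-length units are subtracted directly, while months and years step back by calendar month, which is one reasonable choice since their length varies.

```python
import calendar
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

# Units with a fixed length, in milliseconds.
FIXED_UNIT_MS = {
    "milliseconds": 1,
    "seconds": 1_000,
    "minutes": 60_000,
    "hours": 3_600_000,
    "days": 86_400_000,
    "weeks": 604_800_000,
}


def months_before(dt, months):
    """Step back whole calendar months, clamping the day (e.g. Mar 31 -> Feb 28)."""
    index = dt.year * 12 + (dt.month - 1) - months
    year, month = divmod(index, 12)
    month += 1
    day = min(dt.day, calendar.monthrange(year, month)[1])
    return dt.replace(year=year, month=month, day=day)


def to_absolute_ms(time_value, now_ms):
    """Absolute times (ints, in ms) pass through; (amount, unit) pairs count back from now."""
    if isinstance(time_value, int):
        return time_value
    amount, unit = time_value
    if unit in FIXED_UNIT_MS:
        return now_ms - amount * FIXED_UNIT_MS[unit]
    if unit in ("months", "years"):
        months = amount if unit == "months" else amount * 12
        now = EPOCH + timedelta(milliseconds=now_ms)
        # Integer division of timedeltas gives an exact millisecond count.
        return (months_before(now, months) - EPOCH) // timedelta(milliseconds=1)
    raise ValueError(f"unknown unit: {unit}")


def resolve_range(start, end=None, now_ms=None):
    """Return (start_ms, end_ms); a missing end time defaults to now."""
    if now_ms is None:
        now_ms = int(datetime.now(timezone.utc).timestamp() * 1000)
    end_ms = now_ms if end is None else to_absolute_ms(end, now_ms)
    return to_absolute_ms(start, now_ms), end_ms


now = 1_432_400_000_000  # 2015-05-23 16:53:20 UTC
print(resolve_range((5, "hours"), now_ms=now))  # (1432382000000, 1432400000000)
```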

Grouping

The results of the query can be grouped. There are three ways to group the data: by tags, by time range, and by value. Grouping is done with groupBy or groupByKey, given as an array of one or more groupers.
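A plain-Python sketch of the three kinds of grouper, assuming each data point is a dict with a millisecond "timestamp", a numeric "value", and a "tags" map. The helper names are hypothetical. Each grouper is simply a function from a point to a key, and an array of groupers yields a tuple key; in Spark the same composite key could be emitted in a map step ahead of `groupByKey`, but that wiring is not shown here.

```python
from collections import defaultdict


def tag_grouper(*names):
    """Key a data point by the values of the named tags."""
    return lambda p: tuple(p["tags"].get(name) for name in names)


def time_grouper(range_ms):
    """Key a data point by the fixed-width time window its timestamp falls in."""
    return lambda p: p["timestamp"] // range_ms


def value_grouper(range_size):
    """Key a data point by the value range its value falls in."""
    return lambda p: int(p["value"] // range_size)


def group_by(points, groupers):
    """Apply every grouper and use the tuple of their keys as the group key."""
    groups = defaultdict(list)
    for p in points:
        groups[tuple(g(p) for g in groupers)].append(p)
    return dict(groups)


points = [
    {"timestamp": 0, "value": 3, "tags": {"host": "a"}},
    {"timestamp": 60_000, "value": 12, "tags": {"host": "a"}},
    {"timestamp": 400_000, "value": 7, "tags": {"host": "b"}},
]
by_host_and_5min = group_by(points, [tag_grouper("host"), time_grouper(300_000)])
print({key: len(group) for key, group in by_host_and_5min.items()})
# {(('a',), 0): 2, (('b',), 1): 1}
```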

Aggregators

Aggregators perform an operation on data points and downsample them. For example, you could sum all data points that fall within each 5-minute period.

Aggregators can be chained. For example, you could sum all data points in 5-minute periods, then average those sums over a one-week period.
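A minimal sketch of downsampling and chaining, assuming data points are (timestamp in ms, value) pairs; `downsample` and `weekly_average_of_5min_sums` are hypothetical helpers. Because an aggregator's output has the same shape as its input, chaining just feeds the 5-minute sums back in with a longer period and a different reducer.

```python
from collections import defaultdict
from statistics import mean

FIVE_MINUTES_MS = 5 * 60 * 1000
ONE_WEEK_MS = 7 * 24 * 60 * 60 * 1000


def downsample(points, period_ms, reducer):
    """Reduce (timestamp_ms, value) pairs to one value per period.

    Periods are aligned to the Unix epoch (so weeks start on Thursday, UTC),
    and only periods that contain data appear in the output.
    """
    buckets = defaultdict(list)
    for ts, value in points:
        buckets[ts - ts % period_ms].append(value)
    return [(start, reducer(values)) for start, values in sorted(buckets.items())]


def weekly_average_of_5min_sums(points):
    # First aggregator: sum each 5-minute period.
    five_minute_sums = downsample(points, FIVE_MINUTES_MS, sum)
    # Second aggregator: average those sums over each week.
    return downsample(five_minute_sums, ONE_WEEK_MS, mean)


points = [(0, 1), (1_000, 2), (FIVE_MINUTES_MS, 4), (FIVE_MINUTES_MS + 1, 6)]
print(downsample(points, FIVE_MINUTES_MS, sum))  # [(0, 3), (300000, 10)]
print(weekly_average_of_5min_sums(points))       # [(0, 6.5)]
```

The weekly average here is taken over the 5-minute periods that actually contain data; filling empty periods with zero would give a lower average.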

Filtering

You can filter the returned data by specifying a tag; the results then contain only data points associated with that tag. Filtering is done with the “tags” property.
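A small sketch of tag filtering, assuming the “tags” property maps each tag name to a list of accepted values, with names combined by AND and the values for one name combined by OR. That structure and `filter_by_tags` are assumptions for illustration.

```python
def filter_by_tags(points, tags):
    """Keep only points whose tags match the filter.

    tags maps a tag name to the list of accepted values. A point must match every
    listed tag name, and for each name any one of the accepted values will do.
    """
    return [
        p for p in points
        if all(p["tags"].get(name) in accepted for name, accepted in tags.items())
    ]


points = [
    {"timestamp": 1_000, "value": 3, "tags": {"host": "web1", "dc": "east"}},
    {"timestamp": 2_000, "value": 5, "tags": {"host": "web2", "dc": "west"}},
    {"timestamp": 3_000, "value": 7, "tags": {"host": "web3", "dc": "east"}},
]
east = filter_by_tags(points, {"dc": ["east"]})
print([p["tags"]["host"] for p in east])  # ['web1', 'web3']
```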

Links

Amazon introduces ML service

Face Detection Raspberry Pi 2 Day

DataFrames meet Apache Spark 1.3

Spark Scala Notebook incubating Apache video

Norvig on Machine Learning

Happy Pancake Stack
