BigSnarf blog

Infosec FTW

Monthly Archives: May 2016

Who is inside your bot?

Hors d’oeuvre Machine Learning Algorithms


A Tour of Machine Learning Algorithms

Text Feature Extraction in Frequency Matrix


Uses of SyntaxNet a.k.a. Parsey McParseface

Google’s SyntaxNet and Sentiment Analysis


Data Engineering



Security ChatBots

NLP Use Cases


Identify compromised logins

A collection of data models for real-time analysis, behaviour analysis, and artificial intelligence (AI) to quickly distinguish between valid and malicious user activity.

CloudTrail provides an audit trail of the API activity in your AWS Environment. In order to maintain compliance with one of the many auditing standards, you need to implement continuous monitoring and demonstrate the ability to provide evidence when needed.



Use behavioural patterns and build an identity profile for each user

1. Sign-in from unknown locations algorithm – Approximate and compare each login against the reachability score of the population's logins, somewhat like graph random walks.

2. Impossible travel or logins too close together algorithm – Estimate the likelihood of a pair of logins by comparing the two places. Another way is to use geo-distance algorithms:

  • Login location 1 east coast
  • Login location 2 west coast
  • Score the pair as high, medium, or low risk and output the prediction


3. Credential leak algorithm – check whether AWS credentials have been leaked

4. Sign-in from anonymous IP or Tor algorithm – check whether login IP addresses match an IP address blacklist

5. Malware alerts correlated with antivirus alerts or opened phishing campaigns algorithm – check whether the person's machine has recently appeared on a compromised-host blacklist or a DNS blacklist


6. Rolling 47-day window of IP address and location history, using InfluxDB and continuous queries (CQ)

7. Rolling 47-day window of browser and OS history, using a simple rolling window

8. Keep a history of login and failure counters for each user, using a t-digest or a list of counts per user

9. Keep IP address history for each user and correlate different user logins from untrusted sources. Flag on two logins and fail on three logins from untrusted sources

10. Location, time, browser, and OS for password resets – correlated to

11. "Was this you?" email/Slack message – use a feedback loop with a security Slack bot to validate admin and superuser logins, reducing second-factor flags/emails and updating the models with data to build a trusted user data profile
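
The geo-distance idea in item 2 can be sketched in a few lines of Python. This is only a minimal illustration, not the method used in the post: the haversine formula stands in for "geo-distance algos", and the function names, the tuple layout of a login, and the speed thresholds (500 and 1000 km/h) are all assumptions chosen for the example.

```python
import math
from datetime import datetime

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two lat/lon points, in kilometres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def travel_risk(login_a, login_b, medium_kmh=500.0, high_kmh=1000.0):
    """Classify a pair of logins as 'low', 'medium' or 'high' risk.

    Each login is a (timestamp, lat, lon) tuple. The risk is based on the
    speed a person would need to travel between the two locations.
    The thresholds are illustrative assumptions, not tuned values.
    """
    (t1, lat1, lon1), (t2, lat2, lon2) = login_a, login_b
    distance = haversine_km(lat1, lon1, lat2, lon2)
    hours = abs((t2 - t1).total_seconds()) / 3600.0
    if hours == 0:
        # Simultaneous logins: only suspicious if the places differ.
        return "high" if distance > 0 else "low"
    speed = distance / hours
    if speed >= high_kmh:
        return "high"
    if speed >= medium_kmh:
        return "medium"
    return "low"


# Login location 1 on the east coast, login location 2 on the west coast,
# one hour apart.
east = (datetime(2016, 5, 11, 9, 0), 40.71, -74.01)     # New York
west = (datetime(2016, 5, 11, 10, 0), 37.77, -122.42)   # San Francisco
print(travel_risk(east, west))
```

In practice the coordinates would come from an IP geolocation lookup, which is itself noisy, so the thresholds would need tuning against real login data.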


Identity is the perimeter


Most anomaly detection techniques I’ve seen simply record login data:

  • Alert on a threshold
  • Alert on a threshold over time
  • Maintain a count for logins
  • Maintain a count for logins
  • Maintain a count for failures
  • Track IP Addresses
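
The bookkeeping in the list above can be sketched as a small per-user profile. This is a hypothetical example rather than any particular product's implementation: the `LoginProfile` class, its field names, the 10-minute window, and the three-failure threshold are all assumptions made for illustration.

```python
import ipaddress
from collections import defaultdict, deque
from datetime import datetime, timedelta


class LoginProfile:
    """Per-user login bookkeeping (hypothetical structure)."""

    def __init__(self, window=timedelta(minutes=10), max_failures=3):
        self.window = window              # "threshold over time" window
        self.max_failures = max_failures  # alert threshold
        self.logins = 0                   # count of all login attempts
        self.failures = 0                 # count of failed attempts
        self.recent_failures = deque()    # timestamps inside the window
        self.ips = set()                  # IP addresses seen for this user

    def record(self, when, ip, success):
        """Record one attempt; return True if it should raise an alert."""
        self.logins += 1
        self.ips.add(ipaddress.ip_address(ip))
        if success:
            return False
        self.failures += 1
        self.recent_failures.append(when)
        # Drop failures that have fallen out of the time window.
        while when - self.recent_failures[0] > self.window:
            self.recent_failures.popleft()
        return len(self.recent_failures) >= self.max_failures


# One profile per user name, created on first use.
profiles = defaultdict(LoginProfile)

start = datetime(2016, 5, 11, 9, 0)
for minute in range(3):
    alert = profiles["alice"].record(
        start + timedelta(minutes=minute), "203.0.113.7", success=False)
print(alert, profiles["alice"].failures)
```

A sketch like this only counts events. It says nothing about whether a given login is normal for that user, which is the gap the behavioural signals listed earlier are meant to fill.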

Ensemble ML Diagrams




Small set labelled data – Bayesian Convolutional Neural Networks