BigSnarf blog

Infosec FTW

Text Feature Extraction in Frequency Matrix

Data Engineering


Security ChatBots

NLP Use Cases


Identify compromised logins

A collection of data models for real-time analysis, behaviour analysis, and artificial intelligence (AI) to quickly distinguish between valid and malicious user activity

CloudTrail provides an audit trail of the API activity in your AWS Environment. In order to maintain compliance with one of the many auditing standards, you need to implement continuous monitoring and demonstrate the ability to provide evidence when needed.
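As a starting point for the login analysis, here is a minimal sketch of pulling console sign-in events out of a CloudTrail log file. It assumes the usual log-file shape (a top-level `Records` list, with `eventName`, `eventTime`, `sourceIPAddress`, `userIdentity`, and `responseElements` on each record). The sample records and the `console_logins` helper are made up for illustration.

```python
import json

# Hand-written sample in the CloudTrail log-file shape.
# Every value here is invented for illustration.
SAMPLE = json.dumps({
    "Records": [
        {"eventTime": "2016-05-11T16:27:11Z", "eventName": "ConsoleLogin",
         "sourceIPAddress": "198.51.100.7",
         "userIdentity": {"type": "IAMUser", "userName": "alice"},
         "responseElements": {"ConsoleLogin": "Success"}},
        {"eventTime": "2016-05-11T16:29:40Z", "eventName": "ConsoleLogin",
         "sourceIPAddress": "203.0.113.9",
         "userIdentity": {"type": "IAMUser", "userName": "alice"},
         "responseElements": {"ConsoleLogin": "Failure"}},
        {"eventTime": "2016-05-11T16:30:02Z", "eventName": "DescribeInstances",
         "sourceIPAddress": "198.51.100.7",
         "userIdentity": {"type": "IAMUser", "userName": "alice"},
         "responseElements": None},
    ]
})


def console_logins(log_text):
    """Return one flat dict per ConsoleLogin record in a CloudTrail log file."""
    logins = []
    for record in json.loads(log_text).get("Records", []):
        if record.get("eventName") != "ConsoleLogin":
            continue  # skip ordinary API calls
        identity = record.get("userIdentity") or {}
        response = record.get("responseElements") or {}
        logins.append({
            "time": record.get("eventTime"),
            # userName can be missing on some records, so fall back to a marker
            "user": identity.get("userName", "unknown"),
            "ip": record.get("sourceIPAddress"),
            "success": response.get("ConsoleLogin") == "Success",
        })
    return logins


for login in console_logins(SAMPLE):
    print(login)
```

The flat dicts (time, user, IP, success) are the kind of input the per-user checks in this post would work from.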



Use behavioural patterns and build an identity profile for each user

1. Sign-in from unknown locations algorithm – Approximate and compare each login against a reachability score computed over the population of logins. Kinda like graph random walks:

2. Impossible travel, or logins too close together, algorithm – Compute the likelihood of a pair of logins by comparing the two places. Another way is geo-distance algorithms:

  • Login location 1 east coast
  • Login location 2 west coast
  • Create high, medium, low and output prediction
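A minimal sketch of the geo-distance variant: compute the great-circle (haversine) distance between two login locations, divide by the time between them, and bucket the implied travel speed into high, medium, or low. The speed thresholds, the 50 km "same area" cutoff, and the `travel_risk` name are assumptions for illustration, not tuned values.

```python
import math
from datetime import datetime, timedelta

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two lat/lon points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def travel_risk(login_a, login_b, medium_kmh=500.0, high_kmh=1000.0, same_area_km=50.0):
    """Bucket two logins, each a (datetime, lat, lon) tuple, into high/medium/low.

    All thresholds are illustrative assumptions.
    """
    (t1, lat1, lon1), (t2, lat2, lon2) = login_a, login_b
    distance = haversine_km(lat1, lon1, lat2, lon2)
    if distance < same_area_km:
        return "low"  # effectively the same place
    hours = abs((t2 - t1).total_seconds()) / 3600.0
    if hours == 0:
        return "high"  # far apart at the same instant
    speed = distance / hours
    if speed >= high_kmh:
        return "high"  # faster than a commercial flight
    if speed >= medium_kmh:
        return "medium"
    return "low"


# East coast login followed by a west coast login 30 minutes later.
new_york = (40.7128, -74.0060)
san_francisco = (37.7749, -122.4194)
start = datetime(2016, 5, 11, 9, 0)
print(travel_risk((start, *new_york), (start + timedelta(minutes=30), *san_francisco)))
```

IP geolocation is coarse, so in practice the buckets would need slack for VPNs and mobile carriers. That is one reason to output a graded prediction rather than a hard block.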


3. Credential leak algorithm – check whether AWS credentials have leaked

4. Anonymous IP or Tor sign-in algorithm – check whether login IP addresses match an IP address blacklist

5. Malware alerts correlated to antivirus alerts or opened phishing campaigns algorithm – check whether the person's machine recently appeared on a compromised-machine blacklist or a DNS blacklist


6. Rolling 47-day window of IP address and location history using InfluxDB and continuous queries (CQ)

7. Rolling 47-day window of browser and OS history, using a simple rolling window

8. Keep a history of login and failure counters for each user – t-digest or a list of counts per user

9. Keep IP address history for each user and correlate different user logins from untrusted sources. Alert or flag on two logins from untrusted sources, fail on three

10. Location, time, browser, and OS for password resets – correlated to

11. "Was this you?" email/Slack message – use a feedback loop with a security Slackbot to validate admin and superuser logins, reducing second-factor flags/emails and updating the models with the data to build a trusted user data profile
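Items 6 to 9 can be sketched with an in-memory structure: a per-user rolling 47-day window of login events, with login/failure counts and the "flag on two, fail on three" rule for untrusted IPs. This swaps in a plain Python deque for InfluxDB continuous queries and exact counts for t-digest. It also reads item 9 as counting untrusted logins per user, which is an assumption. The `LoginHistory` class and its method names are hypothetical.

```python
from collections import defaultdict, deque
from datetime import datetime, timedelta

WINDOW = timedelta(days=47)


class LoginHistory:
    """Per-user rolling window of (time, ip, success) login events."""

    def __init__(self, untrusted_ips=()):
        self.untrusted_ips = set(untrusted_ips)
        self.events = defaultdict(deque)  # user -> deque of (time, ip, success)

    def record(self, user, when, ip, success=True):
        """Store one login and return the current verdict for the user.

        Assumes events for a user arrive in time order.
        """
        events = self.events[user]
        events.append((when, ip, success))
        # Expire anything that has fallen out of the 47-day window.
        while events and events[0][0] < when - WINDOW:
            events.popleft()
        return self.verdict(user)

    def seen_ips(self, user):
        """IP addresses the user has logged in from inside the window."""
        return {ip for _, ip, _ in self.events[user]}

    def counts(self, user):
        """Successful-login and failure counters inside the window."""
        ok = sum(1 for _, _, success in self.events[user] if success)
        return {"logins": ok, "failures": len(self.events[user]) - ok}

    def verdict(self, user):
        """'flag' on two logins from untrusted IPs, 'fail' on three or more."""
        untrusted = sum(1 for _, ip, _ in self.events[user] if ip in self.untrusted_ips)
        if untrusted >= 3:
            return "fail"
        if untrusted == 2:
            return "flag"
        return "ok"


history = LoginHistory(untrusted_ips={"203.0.113.9"})
day = datetime(2016, 5, 1)
for offset, ip in enumerate(["198.51.100.7", "203.0.113.9", "203.0.113.9", "203.0.113.9"]):
    print(ip, history.record("alice", day + timedelta(days=offset), ip))
```

The same shape works for the browser and OS history in item 7: store the user-agent fields in the event tuple and compare each new login against what the window has already seen.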


Identity is the perimeter


Most anomaly detection techniques I’ve seen record login data and then:

  • Alert on threshold
  • Alert threshold over time
  • Maintain a count for logins
  • Maintain a count for failures
  • Track IP Addresses
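For reference, the "alert threshold over time" approach from the list above usually amounts to something like the following sketch: count failures inside a sliding time window and alert when the count exceeds a limit. The limit of 5 failures in 10 minutes and the `FailureRateAlert` name are assumptions for illustration.

```python
from collections import deque
from datetime import datetime, timedelta


class FailureRateAlert:
    """Alert when more than max_failures failures fall inside the time window."""

    def __init__(self, max_failures=5, window=timedelta(minutes=10)):
        self.max_failures = max_failures
        self.window = window
        self.failures = deque()  # timestamps of recent failed logins

    def observe(self, when, success):
        """Feed one login attempt (in time order); return True if an alert fires."""
        if not success:
            self.failures.append(when)
        # Forget failures that are older than the window.
        while self.failures and self.failures[0] <= when - self.window:
            self.failures.popleft()
        return len(self.failures) > self.max_failures


alert = FailureRateAlert()
start = datetime(2016, 5, 9, 8, 0)
for i in range(6):
    fired = alert.observe(start + timedelta(seconds=30 * i), success=False)
    print(i + 1, fired)
```

A fixed limit like this is easy to run but knows nothing about the individual user, which is the gap the per-user identity profile is meant to fill.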

Ensemble ML Diagrams




Small set labelled data – Bayesian Convolutional Neural Networks

Good Sources of Labelled Security Attack Data – The ongoing challenge

Screen Shot 2016-05-09 at 8.15.43 AM

We all need to watch for compromised account credentials.

  • password brute forcing/password guessing
  • password reset
  • phishing/whaling
  • credential leaks/harvesting
  • drive by compromise

How do you watch this stuff in the cloud? Workstations? Users? Account breaches increase risk and give a “bad guy” anywhere, anytime access.

Also, regarding the interesting slide above from the RSA Conference, I would add:

  • Crawl – Public Data
  • Walk – HoneyPot Data
  • Jog – Red Team Data
  • Run – Shared Normalized Breach Data and Attack Methodology for PP rules (IMHO)

Retraining Inception 3 Tensorflow to recognize new task

