BigSnarf blog

Infosec FTW

Monthly Archives: December 2012

Mike’s awesome d3py Python library in IPython Notebook

Multi-Dimensional charting built to work natively with crossfilter rendered using d3.js

This is a follow-up to my original post from when I first came across this library for cross-filtering and rendering data with d3.js. In this post I will visually walk the reader through the installation (and, hopefully soon, through getting a sample dataset up and working with this library). Here is the original post.

Step 1 – Download node.js

Screen Shot 2012-12-28 at 10.09.38 AM

Step 2 – Find the package in your Downloads folder

Screen Shot 2012-12-28 at 10.10.39 AM

Step 3 – Open the .pkg file

Screen Shot 2012-12-28 at 10.11.32 AM

Step 4 – Install the package

Screen Shot 2012-12-28 at 10.11.59 AM

Step 5 – The installer needs your administrator credentials to proceed

Screen Shot 2012-12-28 at 10.12.25 AM

Step 6 – Open your favorite editor and create a JavaScript file for testing

Screen Shot 2012-12-28 at 10.14.48 AM

Step 7 – Start node with your example.js (node example.js)

Screen Shot 2012-12-28 at 10.15.59 AM

Step 8 – Use your favorite browser to view the page served by example.js

Screen Shot 2012-12-28 at 10.17.47 AM   

Step 9 – Download npm

Screen Shot 2012-12-28 at 10.19.23 AM

Step 10 – Install npm with sudo make install

Screen Shot 2012-12-28 at 10.20.24 AM

Step 11 – Install dc with npm install dc

Screen Shot 2012-12-28 at 10.47.20 AM

Step 12 – Obtain data!/data

Screen Shot 2012-12-28 at 11.12.44 AM

Step 13 – Install dc.js into node

Screen Shot 2012-12-28 at 11.58.35 AM

Screen Shot 2012-12-28 at 12.01.14 PM

Step 14 – ./make dc.js

Screen Shot 2012-12-28 at 12.02.27 PM

Step 15 – npm install -g express

Screen Shot 2012-12-28 at 4.04.06 PM

Loading PCAP DNS traffic into ElasticSearch for RESTful queries

ElasticSearch instance populated with a custom Python loader
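The loader itself isn't shown, but the usual way to push many records into ElasticSearch is the `_bulk` endpoint, which takes newline-delimited JSON: one action line followed by one document line per record. The sketch below covers only that last step. It assumes the PCAP has already been parsed into Python dicts, and the index name `dns`, the field names, the sample records and the localhost URL are all hypothetical. It uses the current bulk format; older releases also expected a `_type` in each action line.

```python
import json
import urllib.request

def bulk_body(records, index="dns"):
    """Build an ElasticSearch _bulk request body (newline-delimited JSON).

    Each record becomes an action line followed by the document itself,
    and the whole body must end with a newline.
    """
    lines = []
    for record in records:
        lines.append(json.dumps({"index": {"_index": index}}))
        lines.append(json.dumps(record))
    return "\n".join(lines) + "\n"

def post_bulk(body, url="http://localhost:9200/_bulk"):
    """POST a bulk body to a (hypothetical) local node and return the parsed reply."""
    request = urllib.request.Request(
        url,
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/x-ndjson"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)

# Hypothetical DNS records, as a PCAP parser might emit them.
records = [
    {"timestamp": "2012-12-24T12:12:05", "port": 53, "query": "www.google.com", "answer": "192.0.2.10"},
    {"timestamp": "2012-12-24T12:12:07", "port": 53, "query": "mail.example.com", "answer": "192.0.2.20"},
]
body = bulk_body(records)
print(body)
```

Calling `post_bulk(body)` against a running node would index both records in one request. Batching many records per request, rather than one request per packet, is the whole point of the bulk API.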

Screen Shot 2012-12-24 at 12.25.15 PM

Searchable by port or answer (potentially 24 billion records queried in less than 1 second)

Screen Shot 2012-12-24 at 1.12.39 PM

Searchable by time

Screen Shot 2012-12-24 at 1.16.06 PM


Wildcard search for *google*, sliced by time “12:12”

Screen Shot 2012-12-24 at 1.24.45 PM
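A search like this one can be written in ElasticSearch's query DSL as a `wildcard` clause plus a `range` filter on the timestamp. This is a sketch in the current `bool` query syntax (releases from 2012 used a `filtered` query for the same job). The field names `query` and `timestamp`, the date and the one-minute window are assumptions, not what the screenshots used.

```python
import json

def wildcard_time_query(field, pattern, time_field, start, end, size=100):
    """Match `pattern` (with * and ? wildcards) on `field`, limited to [start, end) on `time_field`."""
    return {
        "size": size,
        "query": {
            "bool": {
                # Scored clause: the wildcard match itself.
                "must": [{"wildcard": {field: {"value": pattern}}}],
                # Non-scoring clause: keep only hits inside the time slice.
                "filter": [{"range": {time_field: {"gte": start, "lt": end}}}],
            }
        },
    }

# Everything mentioning google during the 12:12 minute (date is hypothetical).
q = wildcard_time_query("query", "*google*", "timestamp",
                        "2012-12-24T12:12:00", "2012-12-24T12:13:00")
print(json.dumps(q, indent=2))
```

The body would be sent to the index's `_search` endpoint. A leading `*` forces ElasticSearch to walk the term dictionary, so on very large indices this kind of query can be far slower than an exact match on a port or answer.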


Tool selection is fun

Building a Twitter classifier using K-Means in Python


Feature Selection

  • Tweet text
  • Tweeted URL
  • Tweeted hashtag
  • Following
  • Retweeting
  • Twitter Bio
  • URLs
  • Photos
  • Location

K-means clustering with Pycluster
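Pycluster's own k-means isn't reproduced here. As a stand-in, this is a minimal pure-Python sketch of Lloyd's k-means with a deterministic farthest-point start, run on hypothetical per-account feature vectors (share of tweets containing a URL, hashtags per tweet) that could be derived from the features listed above. Real features would need scaling first, because k-means uses Euclidean distance and a large-valued feature would dominate.

```python
import math

def kmeans(points, k, max_iter=100):
    """Lloyd's k-means with a deterministic farthest-point initialisation."""
    # Seed with the first point, then keep adding whichever point lies
    # farthest from all centroids chosen so far.
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(math.dist(p, c) for c in centroids)))
    labels = None
    for _ in range(max_iter):
        # Assignment step: every point joins its nearest centroid.
        new_labels = [min(range(k), key=lambda i: math.dist(p, centroids[i])) for p in points]
        if new_labels == labels:
            break  # converged: no point changed cluster
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for i in range(k):
            members = [p for p, label in zip(points, labels) if label == i]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[i] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return labels, centroids

# Hypothetical per-account features: (share of tweets with a URL, hashtags per tweet).
accounts = [(0.05, 0.1), (0.10, 0.2), (0.08, 0.0), (0.90, 3.0), (0.95, 2.5), (0.85, 3.2)]
labels, centroids = kmeans(accounts, k=2)
print(labels)  # [0, 0, 0, 1, 1, 1]
```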

Watching Events or Disasters Unfold in Real Time from Twitter

  • Pinterest-style layout fed from MongoDB and Tweetstream

Account security on Facebook and machine learning opportunities

Screen Shot 2012-12-21 at 8.33.26 PM

Identity is the new perimeter

Successful identity management depends on good authentication, and good authentication needs user feedback. Most applications still rely on a username and password for access. Machine learning can help identify misuse.

Using machine learning to identify password hacking attempts


Adversarial machine learning – “the study of effective machine learning techniques against an adversarial opponent”

Failed login attempts are just one feature for identifying password-hacking attempts. Above is a traditional dashboard that helps humans spot password abuse. Finding “badguys” in the sea of authenticated users is not trivial, and building an algorithm to catch them is a tough problem.
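Counting failures gets more useful with a time window and a second dimension. Here is a small sketch, assuming events arrive in timestamp order, that tracks failed logins per account over a sliding window and also reports how many distinct source IPs were involved. Many failures from many IPs against one account looks more like distributed guessing than a user who forgot a password. The class name, the 300-second window and the addresses (from a documentation range) are illustrative.

```python
from collections import defaultdict, deque

class FailedLoginWindow:
    """Failed logins per account over a sliding window; events must arrive in time order."""

    def __init__(self, window_seconds=300):
        self.window = window_seconds
        self.events = defaultdict(deque)  # account -> deque of (timestamp, source_ip)

    def record(self, account, timestamp, source_ip):
        """Add one failed login and return the account's current window features."""
        events = self.events[account]
        events.append((timestamp, source_ip))
        # Evict failures that have slid out of the window.
        while events and events[0][0] <= timestamp - self.window:
            events.popleft()
        return {"failures": len(events), "distinct_ips": len({ip for _, ip in events})}

window = FailedLoginWindow(window_seconds=300)
window.record("alice", 0, "198.51.100.1")
window.record("alice", 10, "198.51.100.2")
burst = window.record("alice", 20, "198.51.100.3")
later = window.record("alice", 400, "198.51.100.1")
print(burst, later)
```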

Machine learning can help identify potential abuses. Below are a few login features I can think of right now:

  • Source IP address
  • Browser information
  • User agent string
  • Cookies in the browser
  • Time of the login
  • Location of login
  • Incorrect password guesses
  • Origin-bound certificates
  • One-time SMS codes
  • One-time email code requests
  • One-time website code requests
  • Typing dynamics and history
  • Additional authentication from unknown devices
  • Secondary authentication on top of password
  • Usage behaviour
  • Password change behaviour
  • New user login locations
  • New user login IP addresses
  • New user login devices
  • Time-of-day probability of login
  • Location probability of login
  • Device probability of login
  • Destination IP probability of login
  • Login attempts from small botnets
  • Login attempts from large botnets (a million+ hosts)
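Several features above (time-of-day, location and device probability of login) come down to asking how unusual a login is for this particular user. A minimal sketch, assuming a per-user history of (hour, country) pairs and treating the two as independent: estimate each probability with add-one (Laplace) smoothing so unseen values never get probability zero, then add the negative log-probabilities into a surprise score. The function names and the 250-country vocabulary size are assumptions.

```python
import math
from collections import Counter

def smoothed_prob(observed, value, n_categories, alpha=1.0):
    """Add-alpha (Laplace) estimate of P(value) from a list of past observations."""
    counts = Counter(observed)
    return (counts[value] + alpha) / (len(observed) + alpha * n_categories)

def login_surprise(history, hour, country, n_countries=250):
    """Negative log-likelihood of a login under a naive per-user model.

    history: list of (hour, country) pairs from this user's past logins.
    Hour of day and country are assumed independent (a simplification).
    n_countries is a rough, assumed size for the country vocabulary.
    """
    hours = [h for h, _ in history]
    countries = [c for _, c in history]
    p_hour = smoothed_prob(hours, hour, n_categories=24)
    p_country = smoothed_prob(countries, country, n_categories=n_countries)
    return -math.log(p_hour) - math.log(p_country)

# Hypothetical user who logs in from Canada in the morning.
history = [(9, "CA")] * 20 + [(10, "CA")] * 10
usual = login_surprise(history, 9, "CA")
odd = login_surprise(history, 3, "RU")
print(f"usual={usual:.2f} odd={odd:.2f}")
```

A 3 a.m. login from a country the user has never logged in from scores well above their usual pattern; where to put the alert threshold is a tuning question for real data.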

Potential Machine Learning Algorithm

  1. A variation of Google’s PageRank that leverages prior knowledge of malicious, benign and newly seen domains to assign rank to entities encountered in user logins, locations, etc., based on the features above.
  2. A Personalized PageRank variation that focuses on what happens just before and just after logins, combined with the features learned by the algorithm above.
  3. A fast-flux detection algorithm for domains and pop-up IP addresses that appear to be new or are only valid for a limited period of time. Combined with information from spam nets, botnets, Google Safe Browsing, malware feeds, blacklists and whitelists, it can raise alerts when the probability of risk is high.
  4. A password-history analysis algorithm that scores password changes using a combination of the algorithms above and patterns in past changes. Time, location, device, passwords, password history, password-change history, source and destination are all strong features.
  5. User modeling: age, sex, group, location
  6. User recommendation: cosine similarity, collaborative filtering, ARL for ranking users; perhaps even features based on login histories to identify genuine logins.
  7. User reputation: PageRank or weighted PageRank for user reputations
  8. Analysis of pairwise user interactions, with a logistic regression model to predict the strength of user ties
  9. Analytics and metrics by country, language and user login history, using cohort and session analysis to identify anomalies
  10. Techniques built on hashtags, geotags, entities or conversation threads can be adapted by substituting logins, servers contacted, geolocation, etc. for those features.
  11. Top users and user rankings based on contact, combined with recency metrics and calendar metrics
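Items 1 and 2 lean on PageRank seeded with prior knowledge. One way to sketch that (an assumption, not a worked-out design): build an undirected graph linking accounts to the IP addresses they logged in from, then run personalized PageRank with a random walk that restarts only at IPs already known to be bad. Accounts close to those IPs pick up more score. The graph, seed IP and account names are hypothetical, and this is plain power iteration, not Google's implementation.

```python
from collections import defaultdict

def build_graph(edges):
    """Undirected adjacency lists from (account, ip) login pairs."""
    graph = defaultdict(list)
    for account, ip in edges:
        graph[account].append(ip)
        graph[ip].append(account)
    return dict(graph)

def personalized_pagerank(graph, seeds, damping=0.85, iterations=50):
    """Power iteration for PageRank whose restart distribution covers only `seeds`.

    Every node produced by build_graph has at least one neighbour,
    so there are no dangling nodes to handle.
    """
    seeds = [s for s in seeds if s in graph]
    if not seeds:
        raise ValueError("need at least one seed node present in the graph")
    restart = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in graph}
    rank = dict(restart)
    for _ in range(iterations):
        new_rank = {n: (1 - damping) * restart[n] for n in graph}
        for node, neighbours in graph.items():
            share = damping * rank[node] / len(neighbours)
            for m in neighbours:
                new_rank[m] += share
        rank = new_rank
    return rank

edges = [("alice", "203.0.113.7"), ("bob", "203.0.113.7"),
         ("bob", "198.51.100.66"), ("carol", "192.0.2.9")]
graph = build_graph(edges)
scores = personalized_pagerank(graph, seeds={"198.51.100.66"})  # hypothetical known-bad IP
for account in ("alice", "bob", "carol"):
    print(account, round(scores[account], 3))
```

Here bob, who logged in from the seed IP, scores highest; alice, who only shares an IP with bob, scores lower; and carol, who has no path to the seed, scores zero.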

In 2016, I saw this:

Why don’t we look at password logins with machine learning?


Passwords and machine learning

“Large providers already use many other input signals including the source IP address, browser information and user agent string, cookies cached on the browser, the time of the login and the number of incorrect password guesses. More factors can be added over time: more complex behavioral profiles of users, cryptographic means to identify browsers like origin-bound certificates, one-time codes sent by SMS or generated by a mobile device, or perhaps lightweight biometrics like typing dynamics.”

We use passwords for everything: logging into our workstations and, hopefully, our iDevices. We log into Facebook, LinkedIn, Google, Gmail, Hotmail, Yahoo, etc. Either you use one password for everything or you juggle a bunch of them. It’s all based on secrets you have to remember.

I’m more concerned with catching the “badguys” hiding in the endless stream of authentication logs, login credentials, identities and stolen passwords. We have built baselines and anomaly detection and tuned their sensitivity, but the “badguys” still seem to be winning. How do we tackle the deluge of data from authentication systems? Just watching for spikes in traffic is not enough.

Machine learning and classifiers have been deployed to detect malicious behavior ranging from spam to terrorism. “Badguys” are getting better at flying under the radar. Why don’t we look at password logins with machine learning? Why don’t we start authenticating the human?


Visualizing the size of 1,000 DNS packets

Screen Shot 2012-12-18 at 2.59.24 PM

Screen Shot 2012-12-18 at 5.18.48 PM

Dataset yet to be analyzed
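A first pass over a dataset like this could be as simple as bucketing packet sizes into a text histogram, which makes unusual sizes stand out. The sizes below are made-up placeholders, not the 1,000 packets from the screenshots, and the 32-byte bin width is an arbitrary choice.

```python
from collections import Counter

def size_histogram(sizes, bin_width=32):
    """Count packets per size bucket, keyed by each bucket's lower bound in bytes."""
    counts = Counter((size // bin_width) * bin_width for size in sizes)
    return dict(sorted(counts.items()))

def print_histogram(hist, bin_width=32):
    """Print one row per non-empty bucket, with a '#' per packet."""
    for lower, count in hist.items():
        print(f"{lower:5d}-{lower + bin_width - 1:<5d} {'#' * count} ({count})")

# Placeholder DNS packet sizes in bytes (not real capture data).
sizes = [60, 62, 75, 90, 91, 93, 120, 512]
hist = size_histogram(sizes)
print_histogram(hist)
```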

Assessing Outbound DNS Traffic to Uncover Advanced Persistent Threats (APT)


Advanced persistent threat (APT)

APT usually refers to a group, such as a foreign government, with both the capability and the intent to persistently and effectively target a specific entity. The term is commonly used to refer to cyber threats, in particular that of Internet-enabled espionage using a variety of intelligence gathering techniques to access sensitive information, but applies equally to other threats such as that of traditional espionage or attack.

Other recognized attack vectors include infected media, supply chain compromise, and social engineering. Individuals, such as an individual hacker, are not usually referred to as an APT as they rarely have the resources to be both advanced and persistent even if they are intent on gaining access to, or attacking, a specific target.

13 Signs that bad guys are using DNS Exfiltration to steal your data

UDP 53 Indicators of Exfiltration

  • encrypted payloads
  • MD5, SHA1, SHA256 hashed subdomains
  • lots of requests to restricted or suspicious domains
  • lots of requests to one domain
  • lots of requests to fast flux domains
  • plain text requests of subdomains
  • DNS replies have private addresses
  • DNS replies have single IP address
  • lots of DNS traffic going to “bad guy” countries
  • DNS replies have patterned encoding
  • Packet sizes outside the normal distribution
  • Many requests to specific domains in a round-robin pattern
  • Spike in DNS byte count above normal traffic patterns
  • Multiple pointer (PTR) records for a single query
  • PTR records that do not appear to come from the organization’s ASN
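Some of these indicators (hashed subdomains, encoded or unusually long subdomains) can be checked per query name with no other context. A rough sketch, assuming you already know the registered domain for each query (splitting names on the public suffix list is its own problem): flag labels shaped like MD5, SHA-1 or SHA-256 hex digests, overly long subdomain strings, and high Shannon entropy. Both thresholds are illustrative guesses and would need tuning against your own baseline traffic.

```python
import math
import re
from collections import Counter

# Lowercase hex strings with the lengths of MD5 (32), SHA-1 (40) and SHA-256 (64) digests.
HEX_DIGEST = re.compile(r"^(?:[0-9a-f]{32}|[0-9a-f]{40}|[0-9a-f]{64})$")

def shannon_entropy(text):
    """Shannon entropy of the characters in `text`, in bits per character."""
    if not text:
        return 0.0
    n = len(text)
    return -sum(c / n * math.log2(c / n) for c in Counter(text).values())

def subdomain_flags(qname, registered_domain, entropy_threshold=3.5, length_threshold=40):
    """Heuristic exfiltration flags for the labels left of `registered_domain` in `qname`."""
    qname = qname.lower().rstrip(".")
    registered_domain = registered_domain.lower().rstrip(".")
    if qname == registered_domain:
        labels = []
    elif qname.endswith("." + registered_domain):
        labels = qname[: -len(registered_domain) - 1].split(".")
    else:
        labels = qname.split(".")  # domain mismatch: inspect the whole name
    flags = []
    if any(HEX_DIGEST.match(label) for label in labels):
        flags.append("hash-like label")
    joined = "".join(labels)
    if len(joined) > length_threshold:
        flags.append("long subdomain")
    if shannon_entropy(joined) > entropy_threshold:
        flags.append("high entropy")
    return flags

print(subdomain_flags("www.example.com", "example.com"))
print(subdomain_flags("5d41402abc4b2a76b9719d911017c592.example.com", "example.com"))
```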

I should also note that there are “DNS firewalls” that inspect traffic and block lookups based on blacklists. I also saw a DNSSEC course.