BigSnarf blog
Infosec FTW
Monthly Archives: December 2012
Mike’s awesome d3py python library in iPython Notebook
Posted on December 30, 2012
Multi-Dimensional charting built to work natively with crossfilter rendered using d3.js
Posted on December 28, 2012
This is a follow-up to my original post, written when I first came across this library for cross-filtering and rendering data with d3.js. In this post I walk the reader through the installation visually, and I hope to add a working sample dataset soon. Here is the original post.
Step 1 – Download node.js
Step 2 – Find package in your download folder
Step 3 – Open pkg
Step 4 – Install pkg
Step 5 – Installer will need your authentication to install
Step 6 – Open favorite editor and create JavaScript file for testing
Step 7 – Start node with your example.js
Step 8 – Use favorite browser to render JavaScript file
Step 9 – Download npm
Step 10 – Install npm with sudo make install
Step 11 – Install dc with npm install dc
Step 12 – Obtain data
http://buzzdata.com/azad2002/the-united-states-of-venture-capital-2011#!/data
Step 13 – Install dc.js to node
Step 14 – ./make dc.js
Step 15 – npm install -g express
Loading PCAP DNS traffic into ElasticSearch for RESTful queries
Posted on December 24, 2012
ElasticSearch Instance populated with custom python loader
https://github.com/bigsnarfdude/machineLearning/blob/master/esloader.py
Searchable by port or answer (potentially 24 billion records queried in less than 1 second)
Searchable by time
Wildcard search for *google*, sliced by time “12:12”
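The searches listed above (by port, by time, and by wildcard) could be expressed as a single Elasticsearch query body. The sketch below only builds that body as a Python dict; it does not contact a cluster. The field names `query_name`, `dst_port` and `timestamp` are assumptions for illustration and are not taken from esloader.py, and the body uses the present-day `bool` query form rather than whatever the 2012 instance accepted.

```python
import json


def build_dns_query(name_pattern=None, port=None, start=None, end=None):
    """Build an Elasticsearch query body as a plain dict.

    The field names used here ("query_name", "dst_port", "timestamp")
    are hypothetical; substitute whatever the loader actually indexes.
    """
    must = []
    filters = []
    if name_pattern is not None:
        # Wildcard match on the queried name, e.g. "*google*"
        must.append({"wildcard": {"query_name": name_pattern}})
    if port is not None:
        # Exact match on the port
        filters.append({"term": {"dst_port": port}})
    if start is not None or end is not None:
        # Slice by time with an inclusive range
        bounds = {}
        if start is not None:
            bounds["gte"] = start
        if end is not None:
            bounds["lte"] = end
        filters.append({"range": {"timestamp": bounds}})
    if not must:
        must.append({"match_all": {}})
    return {"query": {"bool": {"must": must, "filter": filters}}}


# Wildcard search for *google* within the minute 12:12 (the date is made up)
body = build_dns_query(
    "*google*",
    start="2012-12-24T12:12:00",
    end="2012-12-24T12:12:59",
)
print(json.dumps(body, indent=2))
```

The resulting dict would be sent as the JSON body of a search request against the index holding the DNS records.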
Building a Twitter classifier using K-Means in python
Posted on December 23, 2012
Account security on Facebook and machine learning opportunities
Posted on December 22, 2012
Using machine learning to identify password hacking attempts
Posted on December 20, 2012
Adversarial machine learning - “the study of effective machine learning techniques against an adversarial opponent”
Looking at failed attempts is just one feature in trying to identify password hacking attempts. Above is a traditional dashboard for helping humans identify password abuse. Finding “badguys” in the sea of authenticated users is not trivial, and building an algorithm to catch them seems like a tough problem.
Machine learning can help identify potential abuse. Below are a few login features that I can think of right now:
- Source IP address
- Browser information
- User agent string
- Cookies in the browser
- Time of the login
- Location of login
- Incorrect password guesses
- Origin-bound certificates
- One-time SMS codes
- One-time email code requests
- One-time website code requests
- Typing dynamics and history
- Additional authentication from unknown devices
- Secondary authentication on top of password
- Usage behaviour
- Password change behaviour
- New User login locations
- New User login IP
- New User login devices
- Time of day probability of login
- Location probability of login
- Device probability of login
- Destination IP probability of login
- Login attempts from small sized botnets
- Login attempts from large sized botnets – Million+
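A few of the features above can be turned into numbers a classifier could consume. The following is a minimal sketch, assuming each login is a dict with `ip`, `device` and `time` keys (and `failed_guesses` on the attempt being scored); those key names and the smoothing choice are illustrative assumptions, not a description of any real provider's pipeline.

```python
from collections import Counter
from datetime import datetime


def login_features(history, attempt):
    """Derive a few login features for one user.

    history: past successful logins, each a dict with the hypothetical
             keys "ip", "device" and "time" (a datetime).
    attempt: the login being scored, with the same keys plus
             "failed_guesses".
    """
    hours = Counter(event["time"].hour for event in history)
    known_ips = {event["ip"] for event in history}
    known_devices = {event["device"] for event in history}
    # Time-of-day probability with add-one smoothing over 24 hours,
    # so an hour never seen before still gets a small non-zero value.
    hour_probability = (hours[attempt["time"].hour] + 1) / (len(history) + 24)
    return {
        "new_ip": attempt["ip"] not in known_ips,
        "new_device": attempt["device"] not in known_devices,
        "hour_probability": hour_probability,
        "failed_guesses": attempt["failed_guesses"],
    }


history = [
    {"ip": "198.51.100.7", "device": "laptop", "time": datetime(2012, 12, 17, 9, 5)},
    {"ip": "198.51.100.7", "device": "laptop", "time": datetime(2012, 12, 18, 9, 40)},
    {"ip": "198.51.100.7", "device": "phone", "time": datetime(2012, 12, 19, 10, 15)},
]
attempt = {
    "ip": "203.0.113.50",
    "device": "unknown",
    "time": datetime(2012, 12, 20, 3, 30),
    "failed_guesses": 4,
}
features = login_features(history, attempt)
print(features)
```

A login at 3 a.m. from an unseen IP and device after several wrong guesses scores as unusual on every feature here, which is the kind of combined signal a single failed-attempt counter would miss.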
Potential Machine Learning Algorithms
- A variation of Google’s PageRank algorithm that uses prior knowledge of both malicious and benign domains to assign a rank to each new domain encountered in user logins, locations, etc., based on the above features.
- A Personalized PageRank variation that focuses on what happens just before and just after logins, combined with the features learned by the above algorithm.
- A fast-flux detection algorithm for domains and pop-up IP addresses that appear to be new or are only valid for a limited period of time. Combined with information from spam nets, botnets, Google Safe Browsing, malware feeds, blacklists and whitelists, it can alert on a high probability of risk.
- A password history analysis algorithm that looks at password changes, combining the above algorithms with patterns in those changes. Time, location, device, passwords, password history, password change history, source and destination are all strong features.
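To make the Personalized PageRank idea concrete, here is a small sketch using plain power iteration. It assumes logins are modelled as a graph linking accounts and IP addresses, with the random walk restarting at nodes already known to be malicious; the graph, node names and parameter values are invented for illustration and this is not a tuned detection algorithm.

```python
def personalized_pagerank(graph, seeds, damping=0.85, iterations=50):
    """Personalized PageRank by power iteration.

    graph: dict mapping each node to a list of nodes it links to.
    seeds: set of nodes the walk restarts at (e.g. known-bad IPs).
    Returns a dict of node -> score; the scores sum to 1.
    """
    nodes = set(graph) | set(seeds)
    for targets in graph.values():
        nodes.update(targets)
    # Restart distribution: uniform over the seeds, zero elsewhere
    restart = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    rank = dict(restart)
    for _ in range(iterations):
        nxt = {n: (1.0 - damping) * restart[n] for n in nodes}
        for n in nodes:
            out = graph.get(n, [])
            if out:
                share = damping * rank[n] / len(out)
                for target in out:
                    nxt[target] += share
            else:
                # Dangling node: hand its mass back to the seeds
                for s in seeds:
                    nxt[s] += damping * rank[n] / len(seeds)
        rank = nxt
    return rank


# Hypothetical login graph: edges run both ways between accounts and IPs
graph = {
    "bad_ip": ["alice"],
    "alice": ["bad_ip", "home_ip"],
    "home_ip": ["alice", "bob"],
    "bob": ["home_ip", "office_ip"],
    "office_ip": ["bob"],
}
scores = personalized_pagerank(graph, seeds={"bad_ip"})
for node, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(node, round(score, 3))
```

Nodes close to the known-bad seed end up with higher scores than nodes far from it, so an account that recently logged in from a flagged address ranks above one that only shares a second-hand connection.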
Why don’t we look at password logins with machine learning?
Posted on December 19, 2012
Passwords and machine learning
“Large providers already use many other input signals including the source IP address, browser information and user agent string, cookies cached on the browser, the time of the login and the number of incorrect password guesses. More factors can be added over time: more complex behavioral profiles of users, cryptographic means to identify browsers like origin-bound certificates, one-time codes sent by SMS or generated by a mobile device, or perhaps lightweight biometrics like typing dynamics.”
We use passwords for everything: logging into our workstations and, hopefully, our iDevices. We log into Facebook, LinkedIn, Google, Gmail, Hotmail, Yahoo, etc. You either use one password for everything or have a bunch of passwords. It’s all based on secrets that you have to remember.
I’m more concerned about catching the “badguys” hiding in the endless stream of authentication logs, login credentials, identities and stolen passwords. We have created baselines, deployed anomaly detection and tuned the sensitivity of these technologies, but the “badguys” still seem to be winning. How do we tackle the deluge of data from authentication systems? Just watching for spikes in traffic is not enough.
Machine learning and classifiers have been deployed to detect malicious behavior ranging from spam to terrorism. “Badguys” are getting better at flying under the radar. Why don’t we look at password logins with machine learning? Why don’t we start authenticating the human?
Read more
http://www.lightbluetouchpaper.org/2012/12/14/authentication-is-machine-learning/
http://blaine-nelson.com/research/pubs/Huang-Joseph-AISec-2011
Visualizing size of 1000 DNS packets
Posted on December 18, 2012
Assessing Outbound DNS Traffic to Uncover Advanced Persistent Threat (APT)
Posted on December 15, 2012
Advanced persistent threat (APT)
APT usually refers to a group, such as a foreign government, with both the capability and the intent to persistently and effectively target a specific entity. The term is commonly used to refer to cyber threats, in particular that of Internet-enabled espionage using a variety of intelligence gathering techniques to access sensitive information, but applies equally to other threats such as that of traditional espionage or attack.
Other recognized attack vectors include infected media, supply chain compromise, and social engineering. Individuals, such as an individual hacker, are not usually referred to as an APT as they rarely have the resources to be both advanced and persistent even if they are intent on gaining access to, or attacking, a specific target.
http://en.wikipedia.org/wiki/Advanced_persistent_threat
13 Signs that bad guys are using DNS Exfiltration to steal your data
UDP 53 Indicators of Exfiltration
- encrypted payloads
- MD5, SHA1, SHA256 hashed subdomains
- lots of requests to restricted or suspicious domains
- lots of requests to one domain
- lots of requests to fast flux domains
- plain text requests of subdomains
- DNS replies have private addresses
- DNS replies have single IP address
- lots of DNS traffic going to bad guy country
- DNS replies have patterned encoding
- Packet size outside the normal distribution
- Pattern of many requests to specific domains in round robin pattern
- Spike in DNS byte count across normal traffic patterns
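Two of the indicators above, hashed subdomains and packet sizes outside the normal distribution, are easy to sketch in code. The example below is a rough illustration only: it assumes the registered domain is the last two labels of the name (which is wrong for suffixes such as co.uk), uses a z-score threshold of 3 picked for the example, and works on query names and sizes you have already extracted from a capture.

```python
import math
import re
from collections import Counter
from statistics import mean, stdev

# Hex digest lengths for MD5, SHA1 and SHA256
HASH_HEX_LENGTHS = {32, 40, 64}


def shannon_entropy(text):
    """Bits of entropy per character; high values hint at encoded payloads."""
    counts = Counter(text)
    total = len(text)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def looks_like_hash(qname, base_labels=2):
    """True if the subdomain part is a hex string of a common hash length.

    Labels are joined before checking because a DNS label is limited to
    63 characters, so a 64-character SHA256 digest has to span labels.
    base_labels=2 is a naive assumption about the registered domain.
    """
    labels = qname.rstrip(".").lower().split(".")
    sub = "".join(labels[:-base_labels])
    return len(sub) in HASH_HEX_LENGTHS and re.fullmatch(r"[0-9a-f]+", sub) is not None


def size_outliers(sizes, threshold=3.0):
    """Return packet sizes more than `threshold` standard deviations from the mean."""
    mu = mean(sizes)
    sigma = stdev(sizes)
    if sigma == 0:
        return []
    return [s for s in sizes if abs(s - mu) / sigma > threshold]


print(looks_like_hash("d41d8cd98f00b204e9800998ecf8427e.example.com"))
print(looks_like_hash("www.example.com"))
print(size_outliers([60] * 30 + [512]))
```

Neither check is conclusive on its own; content delivery networks, for instance, also use long hexadecimal labels, so these are better treated as features to combine with the other indicators than as alerts.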
http://code.google.com/p/dnscapy/
https://github.com/bigsnarfdude/DFTP




























