BigSnarf blog

Infosec FTW

Monthly Archives: December 2012

Mike’s awesome d3py python library in iPython Notebook

Posted by Security Dude on December 30, 2012

Thoughts

Multi-Dimensional charting built to work natively with crossfilter rendered using d3.js

Posted by Security Dude on December 28, 2012

This is a follow-up to my original post, written when I first came across this library for cross-filtering and rendering data with d3.js. In this post I will visually walk the reader through installing the library (and hopefully soon, getting a sample dataset up and working with it). Here is the original post.

https://bigsnarf.wordpress.com/2012/07/14/great-visualization-and-drill-down-capability-with-this-d3-js-based-library/

Step 1 – Download node.js

Step 2 – Find package in your download folder

Step 3 – Open pkg

Step 4 – Install pkg

Step 5 – Installer will need your authentication to install

Step 6 – Open favorite editor and create JavaScript file for testing

Step 7 – Start node with your example.js

Step 8 – Use favorite browser to render JavaScript file

Step 9 – Download npm

Step 10 – Install npm with sudo make install

Step 11 – Install dc.js with npm install dc

Step 12 – Obtain data

http://buzzdata.com/azad2002/the-united-states-of-venture-capital-2011#!/data

Step 13 – Install dc.js into node

Step 14 – ./make dc.js

Step 15 – npm install -g express

Thoughts

Loading PCAP DNS traffic into ElasticSearch for RESTful queries

Posted by Security Dude on December 24, 2012

ElasticSearch Instance populated with custom python loader

https://github.com/bigsnarfdude/machineLearning/blob/master/esloader.py

Searchable by port or answer (potentially 24 billion records queried in less than 1 second)

Searchable by time

Wildcard search for *google*, sliced by time “12:12”
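
The searches captioned above (by port or answer, by time, and by wildcard) can be sketched as Elasticsearch query bodies. This is only a rough sketch: the field names (dst_port, query_name, timestamp) and the date are made up for illustration and are not taken from esloader.py, and the bool/filter form assumes a reasonably recent Elasticsearch release. Nothing here talks to a server; it only builds the JSON you would POST to an index's _search endpoint.

```python
import json

# Field names below (dst_port, query_name, timestamp) are hypothetical;
# substitute whatever the loader actually indexes.


def term_query(field, value):
    """Exact-match query, e.g. on a port number or a DNS answer."""
    return {"query": {"term": {field: value}}}


def wildcard_in_window(field, pattern, start, end, time_field="timestamp"):
    """Wildcard match on one field, restricted to a half-open time window."""
    return {
        "query": {
            "bool": {
                "must": [{"wildcard": {field: {"value": pattern}}}],
                "filter": [{"range": {time_field: {"gte": start, "lt": end}}}],
            }
        }
    }


# Search by port.
by_port = term_query("dst_port", 53)

# Wildcard search for *google*, sliced to the minute starting at 12:12.
# The date is an arbitrary example value.
google_at_1212 = wildcard_in_window(
    "query_name", "*google*", "2012-12-24T12:12:00", "2012-12-24T12:13:00"
)

print(json.dumps(google_at_1212, indent=2))
```

Leading wildcards such as *google* tend to be slow on large indexes, so pairing them with a narrow time filter like this is one way to keep the scan small.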

Thoughts

Tool selection is fun

Posted by Security Dude on December 23, 2012

http://selection.datavisualization.ch/

Tools

Building a Twitter classifier using K-Means in python

Posted by Security Dude on December 23, 2012

Feature Selection

Tweet text
Tweeted URL
Tweeted hashtag
Following
Retweeting
Twitter Bio
URLs
Photos
Location

K-means Clustering with pycluster
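
To make the clustering step concrete, here is a minimal k-means sketch in plain Python. It does not use the pycluster API. The two-number feature vectors (say, URLs per tweet and hashtags per tweet) are invented for illustration, and seeding the centroids from the first k points is a simplification that makes the result depend on input order.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iterations=20):
    """Plain Lloyd's algorithm: assign points, then recompute centroids."""
    # Simplistic seeding: the first k points become the initial centroids.
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Hypothetical per-account features: (URLs per tweet, hashtags per tweet).
accounts = [(0, 1), (5, 6), (1, 0), (6, 5), (0, 0), (6, 6)]
labels, centroids = kmeans(accounts, k=2)
print(labels)
print(centroids)
```

In practice the text features listed above (tweet text, bio, hashtags) would need to be turned into numeric vectors first, for example with word counts.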

Watching Events or Disasters Unfold in realtime from Twitter

Pinterest style layout fed from MongoDB and Tweetstream

Thoughts

Account security on Facebook and machine learning opportunities

Posted by Security Dude on December 22, 2012

Identity is the new perimeter

Successful identity management is good authentication. Good authentication needs user feedback. Most applications still rely on username and password for access. Machine Learning can help identify misuse.

Thoughts

Using machine learning to identify password hacking attempts

Posted by Security Dude on December 20, 2012

Adversarial machine learning – “the study of effective machine learning techniques against an adversarial opponent”

Looking at failed attempts is just one feature in trying to identify password hacking attempts. Above is a traditional dashboard for helping humans identify password usage abuse. Finding “badguys” in the sea of authenticated users is not trivial. Building an algorithm to catch “badguys” seems like a tough problem.

Machine learning can help identify potential abuses. Below are a few features of password logins that I can think of right now:

Source IP address
Browser information
User agent string
Cookies in the browser
Time of the login
Location of login
Incorrect password guesses
Origin-bound certificates
One-time SMS codes
One-time email code requests
One-time website code requests
Typing dynamics and history
Additional authentication from unknown devices
Secondary authentication on top of password
Usage behaviour
Password change behaviour
New User login locations
New User login IP
New User login devices
Time of day probability of login
Location probability of login
Device probability of login
Destination IP probability of login
Login attempts from small sized botnets
Login attempts from large sized botnets – Million+
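
A few of the features above (new IP, new device, time-of-day probability, incorrect guesses) could be turned into numbers along these lines. This is a sketch under assumptions: the event fields (ip, device, hour, failed_guesses) are a made-up schema, and the time-of-day probability is just a smoothed frequency count over the user's past login hours.

```python
from collections import Counter


def hour_probability(history_hours, hour, smoothing=1.0):
    """Smoothed estimate of how likely a login at this hour is for this user."""
    counts = Counter(history_hours)
    # Additive smoothing over 24 hourly buckets so unseen hours are not zero.
    total = len(history_hours) + smoothing * 24
    return (counts[hour] + smoothing) / total


def login_features(event, history):
    """Build a small feature dict for one login event (hypothetical schema)."""
    known_ips = {e["ip"] for e in history}
    known_devices = {e["device"] for e in history}
    hours = [e["hour"] for e in history]
    return {
        "new_ip": event["ip"] not in known_ips,
        "new_device": event["device"] not in known_devices,
        "hour_probability": hour_probability(hours, event["hour"]),
        "failed_guesses": event["failed_guesses"],
    }


# Example history: a user who normally logs in mid-morning from one laptop.
history = [
    {"ip": "10.0.0.5", "device": "laptop", "hour": 9, "failed_guesses": 0},
    {"ip": "10.0.0.5", "device": "laptop", "hour": 9, "failed_guesses": 0},
    {"ip": "10.0.0.5", "device": "laptop", "hour": 10, "failed_guesses": 1},
]
suspicious = {"ip": "203.0.113.7", "device": "unknown", "hour": 3, "failed_guesses": 6}
features = login_features(suspicious, history)
print(features)
```

The same counting approach would extend to the location, device and destination IP probabilities in the list.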

Potential Machine Learning Algorithm

A variation of Google’s PageRank algorithm that leverages prior knowledge of malicious and benign domains to assign a rank to new domains encountered from user logins, locations, etc., based on the above features.
A personalized PageRank variation that focuses on what happens just before and just after logins, combined with the features learned in the above algorithm.
A fast-flux domain algorithm that flags pop-up IP addresses that appear to be new or are only valid for a limited period of time. Combined with information from spam nets, botnets, Google Safe Browsing, malware feeds, blacklists and whitelists, it can be leveraged to alert on a high probability of risk.
A password history analysis algorithm that looks at password changes, combining the above algorithms with patterns in password changes. Time, location, device, passwords, password history, password change history, source and destination are all strong features.
User modeling: age, sex, group, location
User recommendation: cosine similarity, collaborative filtering, ARL ranking users. Maybe even features based on login histories to identify genuine logins.
User reputation: PageRank or weighted PageRank for user reputations
Analysis of pairwise user interactions, and a logistic regression model to predict the strength of user ties
Analytics and metrics by country, language and user login history, using cohort and session analysis to identify anomalies
Instead of hashtags, geotags, entities or conversation threads, the features can be logins, servers contacted, geo, etc.
Top users and user rankings based on contact, combined with recency metrics and calendar metrics
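
Several of the ideas above lean on PageRank or a personalized variant. Here is a minimal power-iteration sketch, assuming a tiny hand-made graph. The node names and edges are hypothetical, every link target must also appear as a key in the graph, and the personalization argument simply biases the teleport step toward chosen nodes (for example, known-good logins).

```python
def pagerank(graph, damping=0.85, iterations=100, personalization=None):
    """Power-iteration PageRank over a dict of node -> list of linked nodes.

    Assumes every link target is also a key in graph.
    """
    nodes = sorted(graph)
    # Teleport distribution: uniform unless a personalization dict is given.
    raw = personalization or {node: 1.0 for node in nodes}
    total = sum(raw.get(node, 0.0) for node in nodes)
    teleport = {node: raw.get(node, 0.0) / total for node in nodes}

    rank = dict(teleport)
    for _ in range(iterations):
        new_rank = {node: (1.0 - damping) * teleport[node] for node in nodes}
        for node in nodes:
            targets = graph[node]
            if targets:
                # Split this node's rank evenly across its outgoing links.
                share = damping * rank[node] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Dangling node: hand its rank back via the teleport vector.
                for target in nodes:
                    new_rank[target] += damping * rank[node] * teleport[target]
        rank = new_rank
    return rank


# Hypothetical graph: two client addresses both contact one server,
# and the server links back to one of them.
graph = {"ip_a": ["server"], "ip_b": ["server"], "server": ["ip_a"]}
ranks = pagerank(graph)
print(ranks)

# Personalized variant: teleport only to a node we already trust.
trusted = pagerank(graph, personalization={"ip_a": 1.0})
print(trusted)
```

Seeding the personalization vector with known-benign or known-malicious nodes is one way to fold prior knowledge into the ranking.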

In 2016: I saw this https://twitter.com/ram_ssk/status/707712375726563328

Thoughts

Why don’t we look at password logins with machine learning?

Posted by Security Dude on December 19, 2012

Passwords and machine learning

“Large providers already use many other input signals including the source IP address, browser information and user agent string, cookies cached on the browser, the time of the login and the number of incorrect password guesses. More factors can be added over time: more complex behavioral profiles of users, cryptographic means to identify browsers like origin-bound certificates, one-time codes sent by SMS or generated by a mobile device, or perhaps lightweight biometrics like typing dynamics.”

We use passwords for everything: logging into our workstations and, hopefully, our iDevices. We log into Facebook, LinkedIn, Google, Gmail, Hotmail, Yahoo, etc. You either use just one password for everything or have a bunch of passwords. It’s all based on secrets that you have to remember.

I’m more concerned about trying to catch the “badguys” hiding in the endless stream of authentication logs, login credentials, identities and stolen passwords. We have created baselines, deployed anomaly detection and tuned the sensitivity of these technologies, but the “badguys” still seem to be winning. How do we tackle the deluge of data from authentication systems? Just watching for spikes in traffic is not enough.

Machine learning and classifiers have been deployed to detect malicious behavior ranging from spam to terrorism. “Badguys” are getting better at flying under the radar. Why don’t we look at password logins with machine learning? Why don’t we start authenticating the human?

http://blaine-nelson.com/research/pubs/Huang-Joseph-AISec-2011

http://engineering.foursquare.com/2011/10/25/understanding-human-mobility-with-machine-learning-and-a-billion-check-ins/

https://www.facebook.com/note.php?note_id=10150172618258920

Thoughts

Visualizing size of 1000 DNS packets

Posted by Security Dude on December 18, 2012

Dataset yet to analyze

http://mawi.wide.ad.jp/mawi/

Thoughts

Assessing Outbound DNS Traffic to Uncover (APT) Advanced Persistent Threat

Posted by Security Dude on December 15, 2012

Advanced persistent threat (APT)

APT usually refers to a group, such as a foreign government, with both the capability and the intent to persistently and effectively target a specific entity. The term is commonly used to refer to cyber threats, in particular that of Internet-enabled espionage using a variety of intelligence gathering techniques to access sensitive information, but applies equally to other threats such as that of traditional espionage or attack.

Other recognized attack vectors include infected media, supply chain compromise, and social engineering. Individuals, such as an individual hacker, are not usually referred to as an APT as they rarely have the resources to be both advanced and persistent even if they are intent on gaining access to, or attacking, a specific target.

http://en.wikipedia.org/wiki/Advanced_persistent_threat

13 Signs that bad guys are using DNS Exfiltration to steal your data

UDP 53 Indicators of Exfiltration

encrypted payloads
MD5, SHA1, SHA256 hashed subdomains
lots of requests to restricted or suspicious domains
lots of requests to one domain
lots of requests to fast flux domains
plain text requests of subdomains
DNS replies have private addresses
DNS replies have single IP address
lots of DNS traffic going to bad guy country
DNS replies have patterned encoding
Packet size outside the normal distribution
Pattern of many requests to specific domains in round robin pattern
Spike in DNS byte count across normal traffic patterns
Multiple Pointer Records for single query
PTR records do not appear to be coming from org ASN
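
Two of the indicators above, hashed subdomains and packet sizes outside the normal distribution, are easy to sketch in code. This is a rough illustration rather than a detector: the threshold of 3 standard deviations and the sample sizes are arbitrary assumptions. SHA256 is left out of the label check because its 64 hex characters do not fit in a single DNS label (the limit is 63 octets).

```python
import re
import statistics

# A label that is exactly 32 (MD5) or 40 (SHA1) hex characters long.
HASH_LABEL = re.compile(r"^(?:[0-9a-f]{32}|[0-9a-f]{40})$")


def has_hashed_label(qname):
    """True if any label of the query name looks like an MD5 or SHA1 hex digest."""
    labels = qname.lower().rstrip(".").split(".")
    return any(HASH_LABEL.match(label) for label in labels)


def size_outliers(sizes, threshold=3.0):
    """Return packet sizes more than `threshold` standard deviations from the mean."""
    mean = statistics.mean(sizes)
    stdev = statistics.pstdev(sizes)
    if stdev == 0:
        return []
    return [s for s in sizes if abs(s - mean) / stdev > threshold]


print(has_hashed_label("d41d8cd98f00b204e9800998ecf8427e.example.com"))
print(has_hashed_label("www.google.com"))

# Made-up packet sizes in bytes: twenty ordinary queries and one large one.
sizes = [60] * 20 + [512]
print(size_outliers(sizes))
```

A real baseline would be built from much more traffic and probably per query type, since TXT or DNSSEC responses are legitimately larger.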

http://theworldsoldestintern.wordpress.com/2012/11/30/dns-exfiltration-udp-53-indicators-of-exfiltration-udp53ioe/

http://code.google.com/p/dnscapy/

https://github.com/bigsnarfdude/DFTP

http://blog.strategiccyber.com/2012/12/19/hacking-like-apt/

https://bigsnarf.wordpress.com/2013/02/20/classifying-malicious-dns-with-22-features-using-random-forests-presentation/

I should also note that there are “DNS Firewalls” that inspect traffic and also use blacklists to block it. I also saw a DNSSEC course.

Thoughts
