BigSnarf blog

Infosec FTW

Using machine learning to identify password hacking attempts


Adversarial machine learning – “the study of effective machine learning techniques against an adversarial opponent”

Looking at failed attempts is just one feature in trying to identify password hacking attempts. Above is a traditional dashboard for helping humans identify password usage abuse. Finding “badguys” in the sea of authenticated users is not trivial, and building an algorithm to catch them seems like a tough problem.

Machine learning can help identify potential abuses. Below are a few features of password logins that I can think of right now:

  • Source IP address
  • Browser information
  • User agent string
  • Cookies in the browser
  • Time of the login
  • Location of login
  • Incorrect password guesses
  • Origin-bound certificates
  • One-time SMS codes
  • One-time email code requests
  • One-time website code requests
  • Typing dynamics and history
  • Additional authentication from unknown devices
  • Secondary authentication on top of password
  • Usage behaviour
  • Password change behaviour
  • New User login locations
  • New User login IP
  • New User login devices
  • Time of day probability of login
  • Location probability of login
  • Device probability of login
  • Destination IP probability of login
  • Login attempts from small sized botnets
  • Login attempts from large sized botnets – Million+
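
As a rough illustration of one feature from the list, "time of day probability of login", here is a minimal sketch. It assumes a per-user history of login hours is available; the function names, the sample history, the smoothing constant and the 0.03 threshold are all made up for the example and are not tuned values.

```python
from collections import Counter

def hour_probabilities(login_hours, smoothing=1.0):
    """Estimate P(hour) for one user from past login hours (0-23).

    Additive smoothing keeps hours that were never seen from getting
    a probability of exactly zero.
    """
    counts = Counter(login_hours)
    total = len(login_hours) + smoothing * 24
    return [(counts.get(h, 0) + smoothing) / total for h in range(24)]

def is_unusual_hour(login_hours, hour, threshold=0.03):
    """Flag a login whose hour has a low estimated probability for this user."""
    return hour_probabilities(login_hours)[hour] < threshold

# Hypothetical history: a user who normally logs in during the morning.
history = [9, 9, 10, 9, 8, 10, 9, 11, 9, 10]

print(is_unusual_hour(history, 3))   # a 3 AM login
print(is_unusual_hour(history, 9))   # a 9 AM login
```

The same counting approach could be reused for the location and device probabilities, with the hour replaced by a location or device identifier. On its own a low probability is only a weak signal, so it would normally be combined with the other features.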

Potential Machine Learning Algorithms

  1. A variation of Google’s PageRank algorithm that leverages prior knowledge of both malicious and benign domains to assign a rank to new domains encountered from user logins, locations, etc., based on the above features.
  2. A Personalized PageRank variation that focuses on what happens just before and just after logins, combined with the features learned in the above algorithm.
  3. A fast-flux domain algorithm that looks for pop-up IP addresses that appear to be new or are only valid for a limited period of time. Combined with information from spam nets, botnets, Google Safe Browsing, malware, blacklists and whitelists, it can be leveraged to alert on a high probability of risk.
  4. A password history analysis algorithm that looks at password changes, using a combination of the above algorithms and patterns in password changes. Time, location, device, passwords, password history, password change history, source and destination are all strong features.
  5. User modeling: age, sex, group, location
  6. User recommendation: cosine similarity, collaborative filtering, ARL ranking users. Maybe even features based on login histories to identify genuine logins.
  7. User reputation: PageRank or weighted PageRank for user reputations
  8. Analysis of pairwise user interactions, with a logistic regression model to predict the strength of user ties
  9. Analytics and metrics by country, language and user login history, using cohort and session analysis to identify anomalies
  10. Instead of hashtags, geotags, entities or conversation threads, the features can be logins, servers contacted, geo, etc.
  11. Top users and user rankings based on contact, combined with recency metrics and calendar metrics
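
To make the Personalized PageRank idea from the list a bit more concrete, here is a minimal sketch using plain power iteration. It assumes logins can be modeled as an undirected graph linking users to the IP addresses they logged in from, and that a small set of known-bad IPs is available as seeds. The graph, the names and the parameter values are hypothetical, and this is not the exact variation described in the list.

```python
def personalized_pagerank(edges, seeds, damping=0.85, iterations=50):
    """Power-iteration PageRank whose restart mass goes only to `seeds`."""
    nodes = sorted({n for edge in edges for n in edge} | set(seeds))
    neighbors = {n: [] for n in nodes}
    for a, b in edges:  # treat the login graph as undirected
        neighbors[a].append(b)
        neighbors[b].append(a)

    # Restart distribution: uniform over the seed nodes, zero elsewhere.
    restart = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    rank = dict(restart)

    for _ in range(iterations):
        nxt = {n: (1.0 - damping) * restart[n] for n in nodes}
        for n in nodes:
            if neighbors[n]:
                share = damping * rank[n] / len(neighbors[n])
                for m in neighbors[n]:
                    nxt[m] += share
            else:
                # Isolated node: hand its mass back to the seeds.
                for m in nodes:
                    nxt[m] += damping * rank[n] * restart[m]
        rank = nxt
    return rank

# Hypothetical login graph: (user, source IP) pairs.
edges = [
    ("alice", "ip_home"), ("alice", "ip_office"),
    ("bob", "ip_office"), ("carol", "ip_office"),
    ("carol", "ip_bad"), ("mallory", "ip_bad"),
]
seeds = {"ip_bad"}  # assumed to come from a blacklist

scores = personalized_pagerank(edges, seeds)
for node, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{node:10s} {score:.3f}")
```

Nodes that sit close to the bad seed in the login graph end up with higher scores than nodes that are several hops away, so the score can be read as a rough "proximity to known-bad" feature. Seeding with benign nodes instead would give the opposite kind of reputation score.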

In 2016: I saw this
