BigSnarf blog

Infosec FTW

Creating my first algorithm from scratch – Euclidean distance and Pearson correlation

Posted by Security Dude on March 25, 2012

For this part of the exercise, I take two IP addresses and calculate how similar people are using Euclidean distance and Pearson correlation. I created a small dataset as a nested dictionary: each person maps to the IP addresses they talked to, with a numeric score for each. I did the calculations by hand, but Python's pandas can work the numbers easily. To calculate the distance of Lisa from Kirk, I isolate their scores for 1.1.1.1 and 2.2.2.2 and plot them as points on a graph. I repeat this for each pair of people and each pair of IP addresses. That shows me which people are very similar and which are not. A model like this can help identify clusters and establish a baseline of conversations between people and the IP addresses they visit. Somehow it all makes sense to me.
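To make the Lisa and Kirk comparison concrete, here is a minimal sketch of the two-IP case. The two pairs of scores are copied from the dataset in this post; the helper name euclidean_distance is made up for illustration and is not part of any library.

```python
from math import sqrt

# Scores for two IP addresses only, copied from the post's dataset.
lisa = {'1.1.1.1': 2.5, '2.2.2.2': 3.5}
kirk = {'1.1.1.1': 3.0, '2.2.2.2': 3.5}

def euclidean_distance(a, b):
    """Straight-line distance between two people over the IPs they share."""
    shared = [ip for ip in a if ip in b]
    return sqrt(sum((a[ip] - b[ip]) ** 2 for ip in shared))

distance = euclidean_distance(lisa, kirk)
print(distance)  # 0.5
```

On a graph with 1.1.1.1 on one axis and 2.2.2.2 on the other, Lisa sits at (2.5, 3.5) and Kirk at (3.0, 3.5), so the two points are half a unit apart.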

talkers = {
    'Lisa': {'1.1.1.1': 2.5, '2.2.2.2': 3.5, '3.3.3.3': 3.0,
             '4.4.4.4': 3.5, '5.5.5.5': 2.5, '6.6.6.6': 3.0},
    'Kirk': {'1.1.1.1': 3.0, '2.2.2.2': 3.5, '3.3.3.3': 1.5,
             '4.4.4.4': 5.0, '6.6.6.6': 3.0, '5.5.5.5': 3.5},
    'Phillip': {'1.1.1.1': 2.5, '2.2.2.2': 3.0,
                '4.4.4.4': 3.5, '6.6.6.6': 4.0},
    'Dan': {'2.2.2.2': 3.5, '3.3.3.3': 3.0, '6.6.6.6': 4.5,
            '4.4.4.4': 4.0, '5.5.5.5': 2.5},
    'James': {'1.1.1.1': 3.0, '2.2.2.2': 4.0, '3.3.3.3': 2.0,
              '4.4.4.4': 3.0, '6.6.6.6': 3.0, '5.5.5.5': 2.0},
    'Britney': {'1.1.1.1': 3.0, '2.2.2.2': 4.0, '6.6.6.6': 3.0,
                '4.4.4.4': 5.0, '5.5.5.5': 3.5},
    'Toby': {'2.2.2.2': 4.5, '5.5.5.5': 1.0, '4.4.4.4': 4.0},
}

from math import sqrt

# Returns a distance-based similarity score for person1 and person2.
# The score is 1 for identical scores and approaches 0 as they diverge.
def sim_distance(prefs, person1, person2):
    # Collect the items both people have scored
    shared_items = [item for item in prefs[person1] if item in prefs[person2]]
    # If they have no ratings in common, return 0
    if not shared_items:
        return 0
    # Add up the squares of all the differences
    sum_of_squares = sum((prefs[person1][item] - prefs[person2][item]) ** 2
                         for item in shared_items)
    # Note: the score is built on the squared distance; sqrt is not applied here
    return 1.0 / (1 + sum_of_squares)
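The function above covers the Euclidean side. For the Pearson side, here is a minimal sketch of how a correlation score could be computed over the same kind of nested dictionary. The name sim_pearson and the choice to return 0 when there is no overlap or no variance are assumptions for illustration, and only Lisa and Kirk are copied in so the block runs on its own.

```python
from math import sqrt

# Two entries copied from the post's dataset so this block is self-contained.
talkers = {
    'Lisa': {'1.1.1.1': 2.5, '2.2.2.2': 3.5, '3.3.3.3': 3.0,
             '4.4.4.4': 3.5, '5.5.5.5': 2.5, '6.6.6.6': 3.0},
    'Kirk': {'1.1.1.1': 3.0, '2.2.2.2': 3.5, '3.3.3.3': 1.5,
             '4.4.4.4': 5.0, '6.6.6.6': 3.0, '5.5.5.5': 3.5},
}

def sim_pearson(prefs, person1, person2):
    """Pearson correlation over the items both people have scored."""
    shared = [item for item in prefs[person1] if item in prefs[person2]]
    n = len(shared)
    if n == 0:
        return 0.0  # assumption: no overlap is treated as no similarity
    x = [prefs[person1][item] for item in shared]
    y = [prefs[person2][item] for item in shared]
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Covariance and variance terms (the common 1/n factor cancels out)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    denominator = sqrt(var_x * var_y)
    if denominator == 0:
        return 0.0  # assumption: constant scores give an undefined correlation
    return cov / denominator

score = sim_pearson(talkers, 'Lisa', 'Kirk')
print(round(score, 3))  # 0.396
```

Unlike the distance-based score, a Pearson score near 1 means two people tend to score the same IP addresses high and low together, even if one of them scores consistently higher overall.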
