BigSnarf blog

Infosec FTW

Category Archives: Framework

Vincent Vega d3.js in python charts are super simple for pandas dataframes

Graphing different website user experiences


User experience (UX) involves a person’s emotions about using a particular product, system or service. User experience highlights the experiential, affective, meaningful and valuable aspects of human-computer interaction and product ownership. Additionally, it includes a person’s perceptions of practical aspects such as the utility, ease of use and efficiency of the system. User experience is subjective in nature because it is about individual perception and thought with respect to the system. User experience is also dynamic: it changes over time as circumstances change and new innovations appear.

http://en.wikipedia.org/wiki/User_experience

 

Metrics platitudes or just the Fogg behaviour grid applied to startups

d3.js mixedtape tutorials – creators gotta create

Bulk processing memory, network traces and HDD using fuzzy hashing and sdhash

Cloudera Impala for Real Time Queries in Hadoop

Machine Learning – LinkedIn profile matcher based on Skills tags

LinkedIn profiles 4, 2, and 1 matched to ‘jQuery’ etc. tags.

LinkedIn profiles 5 and 4 matched to ‘Data Analysis’ etc. tags.

https://github.com/bigsnarfdude/machineLearning/tree/master/linkedin

This will definitely be part of the BigSnarf technology stack.

 

iPython Notebook pandas data analysis of web logs and auth logs

Get code here:

https://github.com/dgleebits/PythonSystemAdminTools/blob/master/pandasAuthLogAnalysis.ipynb

Get sample attack data set here:

http://honeynet.org/files/sanitized_log.zip

Thanks to Vincent for testing the code and helping out with the screenshots.

Influences

http://pixlcloud.com/

Using pandas to report on apache web logs

So I got this new book:

Step 1 – Start with this Forensic Challenge dataset:

http://honeynet.org/files/sanitized_log.zip

Step 2 – Build program without pandas:

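#!/usr/bin/python
'''
This program takes in an Apache www-media.log and provides a basic report
'''
from collections import Counter

ipAddressList = []
methodList = []
requestedList = []
referalList = []

# Read the whole log into memory, one string per line
with open('www-media.log') as logfile:
    data = logfile.readlines()

# Split each line on whitespace and pick fields by position
for line in data:
    fields = line.split()
    ipAddressList.append(fields[0])    # client IP address
    methodList.append(fields[5])       # HTTP method (keeps its leading quote)
    requestedList.append(fields[6])    # requested path
    referalList.append(fields[10])     # referrer (keeps its quotes)

# Count how often each value occurs
count_ip = Counter(ipAddressList)
count_requested = Counter(requestedList)
count_method = Counter(methodList)
count_referal = Counter(referalList)

# Print each tally, most frequent first
print(count_ip.most_common())
print(count_requested.most_common())
print(count_method.most_common())
print(count_referal.most_common())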
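
To make the field positions easier to follow, here is a small sketch that applies the same whitespace split to a few log lines. The sample lines are made up for illustration (they are not taken from the honeynet dataset) and assume the Apache combined log format; parse_fields is a hypothetical helper, not part of the original script.

```python
from collections import Counter

# Made-up sample lines in Apache combined log format (NOT from the dataset)
sample_lines = [
    '10.0.0.1 - - [19/Apr/2010:07:00:01 -0700] "GET /index.html HTTP/1.1" 200 512 "http://example.com/" "Mozilla/5.0"',
    '10.0.0.2 - - [19/Apr/2010:07:00:05 -0700] "POST /login.php HTTP/1.1" 302 0 "-" "Mozilla/5.0"',
    '10.0.0.1 - - [19/Apr/2010:07:00:09 -0700] "GET /index.html HTTP/1.1" 200 512 "-" "curl/7.19"',
]


def parse_fields(line):
    """Hypothetical helper: return (ip, method, path, referrer).

    Uses the same positions as the script (0, 5, 6, 10) and strips the
    double quotes that the whitespace split leaves attached.
    """
    fields = line.split()
    return fields[0], fields[5].lstrip('"'), fields[6], fields[10].strip('"')


parsed = [parse_fields(line) for line in sample_lines]
ip_counts = Counter(ip for ip, _, _, _ in parsed)
method_counts = Counter(method for _, method, _, _ in parsed)

print(ip_counts.most_common())
print(method_counts.most_common())
```

Note that splitting on whitespace only works while the fields before the referrer contain no spaces; a line with fewer than 11 fields would raise an IndexError, so real logs may need a length check first.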

Step 3 – Build program with pandas … code is very simple and easy once you figure out how the DataFrame works

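import pandas

# Read the whole log into memory, one string per line
with open('www-media.log') as logfile:
    data = logfile.readlines()

# One row per log line, one column per whitespace-separated field
frame = pandas.DataFrame([x.split() for x in data])

# value_counts() tallies each distinct value, most frequent first
countIP = frame[0].value_counts()
countRequested = frame[6].value_counts()
countReferal = frame[10].value_counts()
print(countIP)
print(countRequested)
print(countReferal)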
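
Once the DataFrame idea clicks, a natural next step is to give the numbered columns readable names and keep only the top entries. The sketch below does that on made-up sample lines (not from the dataset); the column names 'ip', 'requested' and 'referrer' are my own labels, not anything pandas or Apache defines.

```python
import pandas

# Made-up sample lines in Apache combined log format (NOT from the dataset)
sample_lines = [
    '10.0.0.1 - - [19/Apr/2010:07:00:01 -0700] "GET /index.html HTTP/1.1" 200 512 "http://example.com/" "Mozilla/5.0"',
    '10.0.0.2 - - [19/Apr/2010:07:00:05 -0700] "POST /login.php HTTP/1.1" 302 0 "-" "Mozilla/5.0"',
    '10.0.0.1 - - [19/Apr/2010:07:00:09 -0700] "GET /index.html HTTP/1.1" 200 512 "-" "curl/7.19"',
]

# Same construction as above: one row per line, one column per field
frame = pandas.DataFrame([line.split() for line in sample_lines])

# Rename the positional columns we care about (names are illustrative)
frame = frame.rename(columns={0: 'ip', 6: 'requested', 10: 'referrer'})

# Keep only the ten most frequent values of each column
top_ips = frame['ip'].value_counts().head(10)
top_requested = frame['requested'].value_counts().head(10)

print(top_ips)
print(top_requested)
```

On a large log, head(10) keeps the report readable while value_counts() still does all the counting.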

Step 4 – Enjoy Responsibly

Step 5 – Get code here

 https://github.com/dgleebits/PythonSystemAdminTools/blob/master/weblogAnalysis.py
