BigSnarf blog

Infosec FTW

Using pandas to report on apache web logs

So I got this new book:

Step 1 – Start with this Forensic Challenge dataset:

http://honeynet.org/files/sanitized_log.zip

Step 2 – Build program without pandas:

#!/usr/bin/python
'''
This program takes in an Apache www-media.log and prints a basic report
'''
from collections import Counter

ipAddressList = []
methodList = []
requestedList = []
referalList = []

with open('www-media.log') as logfile:
    for line in logfile:
        fields = line.split()
        if len(fields) < 11:
            continue  # skip short or malformed lines
        ipAddressList.append(fields[0])
        methodList.append(fields[5].lstrip('"'))  # field 5 looks like '"GET', so drop the quote
        requestedList.append(fields[6])
        referalList.append(fields[10])

# Count how often each value occurs
count_ip = Counter(ipAddressList)
count_requested = Counter(requestedList)
count_method = Counter(methodList)
count_referal = Counter(referalList)

# Print each tally, most frequent first
print(count_ip.most_common())
print(count_requested.most_common())
print(count_method.most_common())
print(count_referal.most_common())
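
Splitting on whitespace works as long as the fields you want come before anything that contains spaces (the user agent, for example). A minimal sketch of a stricter alternative, assuming the log is in Apache "combined" format, uses a regular expression with named groups instead of split(). The pattern, the parse_line helper and the sample lines are all made up for illustration and are not part of the original program:

```python
import re
from collections import Counter

# Pattern for the Apache "combined" log format. This is an assumption about
# how www-media.log is laid out; adjust it if your log uses another format.
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<request>\S+)[^"]*" '
    r'(?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def parse_line(line):
    """Return a dict of named fields, or None if the line does not match."""
    match = LOG_PATTERN.match(line)
    return match.groupdict() if match else None


# Made-up sample lines for illustration only (not taken from the dataset).
sample_lines = [
    '10.0.0.1 - - [19/Apr/2010:06:36:15 -0700] "GET /feed/ HTTP/1.1" 200 16605 "-" "Mozilla/5.0 (X11; Linux)"',
    '10.0.0.2 - - [19/Apr/2010:06:40:02 -0700] "POST /login.php HTTP/1.1" 302 0 "http://example.com/" "curl/7.19"',
    '10.0.0.1 - - [19/Apr/2010:06:41:10 -0700] "GET /index.html HTTP/1.1" 200 512 "-" "Mozilla/5.0 (X11; Linux)"',
    'not a log line',
]

# Keep only the lines that matched the pattern
records = [r for r in (parse_line(l) for l in sample_lines) if r is not None]

count_ip = Counter(r['ip'] for r in records)
count_method = Counter(r['method'] for r in records)
print(count_ip.most_common())
print(count_method.most_common())
```

The named groups mean the user agent comes back as one string even though it contains spaces, and lines that do not match are skipped rather than raising an IndexError.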

Step 3 – Build program with pandas. The code is short and simple once you figure out how the DataFrame works:

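import pandas

# Read the log and split every line on whitespace; each token becomes a column
data = open('www-media.log').readlines()
frame = pandas.DataFrame([x.split() for x in data])

# value_counts() tallies how often each value appears, most frequent first
countIP = frame[0].value_counts()
countRequested = frame[6].value_counts()
countReferal = frame[10].value_counts()
print(countIP)
print(countRequested)
print(countReferal)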
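
To make the DataFrame part a little less magic, here is a small sketch of what is going on with those numbered columns. It assumes the same whitespace-split layout as above; the sample lines and the column names (ip, method, request, status) are my own choices for illustration, not something pandas or the dataset provides:

```python
import pandas

# Made-up sample lines for illustration only (not taken from the dataset).
sample_lines = [
    '10.0.0.1 - - [19/Apr/2010:06:36:15 -0700] "GET /feed/ HTTP/1.1" 200 16605 "-" "curl/7.19"',
    '10.0.0.2 - - [19/Apr/2010:06:40:02 -0700] "POST /login.php HTTP/1.1" 302 0 "-" "Wget/1.12"',
    '10.0.0.1 - - [19/Apr/2010:06:41:10 -0700] "GET /index.html HTTP/1.1" 200 512 "-" "curl/7.19"',
]

# Each whitespace-separated token becomes a column labelled 0, 1, 2, ...
frame = pandas.DataFrame([x.split() for x in sample_lines])

# Pick out the columns of interest and give them readable names
# (the names are this sketch's choice).
report = frame[[0, 5, 6, 8]].copy()
report.columns = ['ip', 'method', 'request', 'status']

# Column 5 looks like '"GET', so drop the leading quote
report['method'] = report['method'].str.lstrip('"')

count_ip = report['ip'].value_counts()
count_status = report['status'].value_counts()
print(count_ip)
print(count_status)

# Number of distinct requests made by each IP address
print(report.groupby('ip')['request'].nunique())
```

Once the columns have names, the rest of the report reads like plain English, and groupby gives you per-IP breakdowns that would take a few more loops with Counter.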

Step 4 – Enjoy Responsibly

Step 5 – Get code here

 https://github.com/dgleebits/PythonSystemAdminTools/blob/master/weblogAnalysis.py
