BigSnarf blog

Infosec FTW

Using WEKA for data mining and predictive learning

What is data mining? I didn’t know, and it sounded like secret magic stuff. After digging (pardon the pun), I found out it’s a lot of advanced math beyond simple addition and subtraction. The math is built around algorithms that do different things. I found that there is a deep science to working in the data mining and predictive learning field. It is different from Artificial Intelligence and Statistics. In the past few months I have learned from online resources like the STATS202 course. I learned about Machine Learning from Andrew Ng’s class. I learned about AI from Introduction to Artificial Intelligence. I learned about munging data with Pandas in Python. I learned how to rip Twitter and weblogs with R. I learned about using Pentaho PDI and Hadoop.

I’m now learning about WEKA, data mining, and munging data into the ARFF format for predictive analysis. The secret magic stuff is gone, and I realize that getting the data into the tool of your choice is the hard part. Getting accurate results from your tools is hard too. Finding case studies and examples for Infosec is also hard to do. The goal of data mining is to create a model that can help you interpret your data. Add visual analytics and you have the recipe to process your data and turn insight into action.
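To make the "getting the data into ARFF" step concrete, here is a minimal sketch in Python that renders all-numeric rows as ARFF text (a relation header, one attribute declaration per column, then the data rows). The function name `to_arff`, the relation name `house`, and the two sample rows are made up for illustration; they are placeholders, not the actual sample dataset.

```python
def to_arff(relation, attributes, rows):
    """Render all-numeric rows as ARFF text.

    relation   -- name for the @RELATION line
    attributes -- list of column names, each declared as NUMERIC
    rows       -- list of rows, each with one value per attribute
    """
    lines = [f"@RELATION {relation}", ""]
    for name in attributes:
        lines.append(f"@ATTRIBUTE {name} NUMERIC")
    lines.append("")
    lines.append("@DATA")
    for row in rows:
        if len(row) != len(attributes):
            raise ValueError("row length does not match attribute count")
        lines.append(",".join(str(value) for value in row))
    return "\n".join(lines) + "\n"


# Placeholder values only, NOT the real sample data.
columns = ["houseSize", "lotSize", "bedrooms", "bathroom", "sellingPrice"]
sample_rows = [
    [1000, 5000, 3, 1, 100000],
    [2000, 8000, 4, 2, 200000],
]
arff_text = to_arff("house", columns, sample_rows)
```

Writing `arff_text` to a file with an `.arff` extension should give something WEKA can open. Real data with nominal or string columns would need extra attribute types that this sketch leaves out.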


Using ARFF sample data set for simple linear regression

Running linear regression on the sample dataset gives us a model for predicting housing prices:

Linear Regression Model

sellingPrice =

-26.6882 * houseSize +
7.0551 * lotSize +
43166.0767 * bedrooms +
42292.0901 * bathroom +

Basically, plugging the four variables houseSize, lotSize, bedrooms and bathroom into the model lets us calculate sellingPrice.
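As a rough sketch of what "plugging in" looks like, the model can be written as a small Python function. The four coefficients are copied from the model above. The model as printed ends with a trailing `+`, so its constant term is not shown here; the sketch takes it as a required `intercept` argument, which is an assumption of this example, along with the function name.

```python
# Coefficients copied from the linear regression model above.
COEFFICIENTS = {
    "houseSize": -26.6882,
    "lotSize": 7.0551,
    "bedrooms": 43166.0767,
    "bathroom": 42292.0901,
}


def predict_selling_price(house_size, lot_size, bedrooms, bathroom, *, intercept):
    """Apply the linear model to one house.

    intercept -- the model's constant term; it is not shown in the
                 printed model, so the caller must supply it (assumption).
    """
    return (
        COEFFICIENTS["houseSize"] * house_size
        + COEFFICIENTS["lotSize"] * lot_size
        + COEFFICIENTS["bedrooms"] * bedrooms
        + COEFFICIENTS["bathroom"] * bathroom
        + intercept
    )
```

Calling `predict_selling_price(3198, 9669, 5, 1, intercept=...)` with the model's real constant term would return the predicted price for a house with those (made-up) measurements.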
