What is data mining? I didn’t know, and it sounded like secret magic stuff. After digging, pardon the pun, I found out it’s a lot of advanced math beyond simple addition and subtraction. The math centers on algorithms that each do different things. I found that there is a deep science to working in the data mining and predictive learning field. It is distinct from Artificial Intelligence and Statistics. In the past few months I have learned from online resources like the STATS202 course. I learned about Machine Learning from Andrew Ng’s class. I learned about AI from Introduction to Artificial Intelligence. I learned about munging data with pandas in Python. I learned how to scrape Twitter and weblogs with R. I learned about using Pentaho PDI and Hadoop.
I’m now learning about WEKA, data mining, and munging data into the ARFF format for predictive analysis. The secret magic stuff is gone, and I realize that getting the data into the tool of your choice is the hard part. Getting accurate results from your tools is hard too. Finding case studies and examples for Infosec is also hard. The goal of data mining is to create a model that can help you interpret your data. Add visual analytics and you have the recipe to process your data and gain insight you can act on.
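Since getting data into ARFF is the hard part, here is a minimal sketch of what that munging can look like in Python. It only handles numeric attributes, the function name `to_arff` is my own, and the rows are made-up values for illustration, not the real sample dataset.

```python
def to_arff(relation, attributes, rows):
    """Render rows of numbers as ARFF text.

    relation:   name for the @RELATION line
    attributes: list of attribute names (all treated as NUMERIC in this sketch)
    rows:       list of sequences, one value per attribute
    """
    lines = [f"@RELATION {relation}", ""]
    for name in attributes:
        lines.append(f"@ATTRIBUTE {name} NUMERIC")
    lines.append("")
    lines.append("@DATA")
    for row in rows:
        if len(row) != len(attributes):
            raise ValueError("row length does not match attribute count")
        lines.append(",".join(str(value) for value in row))
    return "\n".join(lines) + "\n"


# Hypothetical rows, for illustration only
attributes = ["houseSize", "lotSize", "bedrooms", "bathroom", "sellingPrice"]
rows = [
    [3000, 10000, 4, 1, 200000],
    [2500, 8000, 3, 1, 180000],
]
arff_text = to_arff("house", attributes, rows)
print(arff_text)
```

Nominal or string attributes, missing values, and quoting of names with spaces would all need extra handling, so treat this as a starting point rather than a full converter.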
Using ARFF sample data set for simple linear regression
We have built a model for predicting housing prices based on the sample dataset:
Linear Regression Model
-26.6882 * houseSize +
7.0551 * lotSize +
43166.0767 * bedrooms +
42292.0901 * bathroom +
Basically, by plugging in the four variables (houseSize, lotSize, bedrooms and bathroom), we can calculate sellingPrice.
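To make the plugging-in concrete, here is a small Python sketch that uses the four coefficients from the model output above. The output as pasted ends on a trailing `+`, so the constant term is not shown; the `intercept` parameter below is a placeholder (defaulting to 0.0) that you would fill in from your own WEKA output. The function name and the example house are my own, made up for illustration.

```python
# Coefficients copied from the Linear Regression Model output above
COEFFICIENTS = {
    "houseSize": -26.6882,
    "lotSize": 7.0551,
    "bedrooms": 43166.0767,
    "bathroom": 42292.0901,
}


def predict_selling_price(house_size, lot_size, bedrooms, bathroom, intercept=0.0):
    """Evaluate the linear model: weighted sum of the four inputs plus intercept.

    intercept is an assumption: the constant term is missing from the pasted
    output, so 0.0 here is only a placeholder, not the real value.
    """
    return (
        COEFFICIENTS["houseSize"] * house_size
        + COEFFICIENTS["lotSize"] * lot_size
        + COEFFICIENTS["bedrooms"] * bedrooms
        + COEFFICIENTS["bathroom"] * bathroom
        + intercept
    )


# Hypothetical house, for illustration only
price = predict_selling_price(house_size=3000, lot_size=10000, bedrooms=4, bathroom=1)
print(f"Predicted sellingPrice (without the real intercept): {price:.2f}")
```

Without the real constant term the number printed is not a usable price estimate, but it shows how each variable pushes the prediction up or down.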
Read more: http://www.ibm.com/developerworks/opensource/library/os-weka1/index.html