BigSnarf blog

Infosec FTW

Running the Mahout Hadoop Taste Recommender algorithm on the example GroupLens dataset

Here are the links I found while working through my CDH3 tutorial setup. The second is the Mahout recommender guide: https://mahout.apache.org/users/recommender/recommender-documentation.html

The org.apache.mahout.cf.taste.hadoop.item.RecommenderJob is a fully distributed, item-based recommender. It expects a .csv file of preference data as input, with one userID,itemID,preference triple per line. Here is a sample of the CSV file I used as input:

1,1193,5
1,661,3
1,914,3
1,3408,4
1,2355,5
1,1197,3
1,1287,5
1,2804,5
1,594,4
1,919,4
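To make "item-based" a little more concrete, here is a minimal in-memory Python sketch of the co-occurrence idea behind an item-based recommender, using the same userID,itemID,preference format. It is not Mahout's implementation: RecommenderJob spreads this work over several Hadoop jobs, lets you choose the similarity measure, and uses its own scoring. This sketch simply counts how often two items are rated by the same user, then scores each item a user has not rated by summing those counts weighted by the user's own ratings. Users 2 and 3 and the function names (load_prefs, cooccurrence, recommend) are hypothetical, added only for illustration.

```python
from collections import defaultdict
from itertools import combinations


def load_prefs(lines):
    """Parse userID,itemID,preference lines into {user: {item: preference}}."""
    prefs = defaultdict(dict)
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        user, item, pref = line.split(",")
        prefs[int(user)][int(item)] = float(pref)
    return prefs


def cooccurrence(prefs):
    """Count, for every pair of items, how many users rated both."""
    counts = defaultdict(lambda: defaultdict(int))
    for items in prefs.values():
        for a, b in combinations(items, 2):
            counts[a][b] += 1
            counts[b][a] += 1
    return counts


def recommend(prefs, user, n=3):
    """Score unseen items by co-occurrence count times the user's own rating."""
    counts = cooccurrence(prefs)
    seen = prefs.get(user, {})
    scores = defaultdict(float)
    for item, pref in seen.items():
        for other, count in counts[item].items():
            if other not in seen:
                scores[other] += count * pref
    # Highest score first; ties broken by item ID so the output is stable.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:n]


# First three rows of the sample above (user 1) plus two hypothetical users.
sample = """\
1,1193,5
1,661,3
1,914,3
2,1193,4
2,661,5
2,2355,4
3,914,2
3,2355,5
3,1287,4
"""

prefs = load_prefs(sample.splitlines())
print(recommend(prefs, 1))  # [(2355, 11.0), (1287, 3.0)]
```

Item 2355 ranks first for user 1 because it was co-rated with all three of user 1's items, including the 5-star 1193.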

At the end of the processing, I end up with the recommendations written to an output file. Impressions: Mahout is for developers who need large-scale data processing, though its selection of algorithms is more limited. WEKA is aimed mainly at data mining analysts and learners; its GUI and “autoAlgorithm” feature make it easier for beginners to process data. WEKA will have trouble scaling to very large datasets because of memory limitations.
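To inspect the results outside Hadoop, a small Python parser can help. This is a sketch that assumes the job's default text output, where each line holds a user ID, a tab, and a bracketed list of itemID:score pairs; check that against your own output files before relying on it. The example lines and scores below are made up for illustration, not output from my run, and parse_recommendations is a hypothetical helper name.

```python
def parse_recommendations(lines):
    """Parse lines like '1<TAB>[2355:4.5,1287:4.25]' into {user: [(item, score), ...]}."""
    recs = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        user, items = line.split("\t", 1)
        items = items.strip()[1:-1]  # drop the surrounding [ and ]
        pairs = []
        for entry in items.split(","):
            if entry:  # an empty list "[]" yields a single empty entry
                item, score = entry.split(":")
                pairs.append((int(item), float(score)))
        recs[int(user)] = pairs
    return recs


# Hypothetical output lines in the assumed format.
example = ["1\t[2355:4.5,1287:4.25]", "2\t[914:3.75]"]
print(parse_recommendations(example))
# {1: [(2355, 4.5), (1287, 4.25)], 2: [(914, 3.75)]}
```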
