BigSnarf blog

Infosec FTW

The importance of cleaning your data – Word Cloud of 3000 Tweets

Here is a Wordle visualization of the raw tweet dump, with no cleaning applied.

The visualization below has month names and days of the week removed. Mixed case still dominates, because the same word in different capitalizations is counted separately.

Final version, fully cleaned of dates, punctuation, and mixed-case effects.

It’s clear what I like to tweet about: the bigger the word, the more often it was tweeted. I did the preprocessing with this script: https://github.com/dgleebits/Twitter-Friend-or-Foe/blob/master/tweetClean.py

Data file: https://github.com/dgleebits/Twitter-Friend-or-Foe/blob/master/TweetBackup15June2012
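
The cleaning steps described above (dropping month and weekday names, stripping punctuation, and folding everything to lower case) can be sketched in a few lines of Python. This is a minimal illustration and not the linked tweetClean.py. The names `DATE_WORDS`, `clean_tweet`, and `word_counts` are hypothetical, the stop list is an assumed one, and dropping purely numeric tokens is my own guess at what "cleaned of dates" should cover.

```python
import string
from collections import Counter

# Assumed stop list: English month and weekday names plus common abbreviations.
# Note that some of these ("may", "mar", "sat", "sun", "wed") are also ordinary words.
DATE_WORDS = {
    "january", "february", "march", "april", "may", "june", "july",
    "august", "september", "october", "november", "december",
    "jan", "feb", "mar", "apr", "jun", "jul", "aug", "sep", "sept",
    "oct", "nov", "dec",
    "monday", "tuesday", "wednesday", "thursday", "friday", "saturday",
    "sunday", "mon", "tue", "tues", "wed", "thu", "thur", "thurs",
    "fri", "sat", "sun",
}

# Translation table that maps every ASCII punctuation character to a space.
PUNCT_TO_SPACE = str.maketrans(string.punctuation, " " * len(string.punctuation))


def clean_tweet(text):
    """Return the lower-cased words of one tweet, minus punctuation and date words."""
    text = text.lower()                    # fold mixed case
    text = text.translate(PUNCT_TO_SPACE)  # strip punctuation
    return [
        word
        for word in text.split()
        # Assumption: purely numeric tokens (day numbers, years) are date residue.
        if word not in DATE_WORDS and not word.isdigit()
    ]


def word_counts(tweets):
    """Count how often each cleaned word appears across an iterable of tweets."""
    counts = Counter()
    for tweet in tweets:
        counts.update(clean_tweet(tweet))
    return counts


if __name__ == "__main__":
    sample = [
        "Mon Jun 15 2012: Infosec FTW!",
        "infosec, Python & data on Friday",
    ]
    for word, count in word_counts(sample).most_common(3):
        print(word, count)
```

Because punctuation is replaced with spaces, links are split into fragments such as "http" and "com", so a real run over a tweet backup would probably also want to remove URLs before counting. The resulting counts are what a word cloud tool scales its font sizes by.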
