Twitter Sentiment Analysis With Python+Sklearn


My next step in learning machine learning was to implement Twitter sentiment analysis. I wanted a 3-class classification with “negative”, “neutral” and “positive”. I therefore chose a RandomForestClassifier, which handles multiple classes natively, over an SVM, which is at its core a 2-class classifier and only supports more classes by combining several binary classifiers.

Before you can start you need the following data:
a) List of stop words (e.g. “and”, “because”, “it”; words that carry little meaning on their own)
b) List of slang words
c) List of affinity words
d) List of emoticons (e.g. “:-)”)
e) Set of negative, neutral and positive tweets (for training the model)
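
As a rough illustration of how the lists above might be used for pre-processing, here is a minimal sketch. The tiny word lists, the affinity scores and the function names preprocess and affinity_score are made up for this example and are not taken from the project files; real lists would be loaded from files.

```python
import re

# Hypothetical, tiny stand-ins for the real lists (assumptions for illustration).
STOP_WORDS = {"and", "because", "it", "the", "a", "is"}
SLANG = {"gr8": "great", "luv": "love", "u": "you"}
AFFINITY = {"great": 3, "love": 3, "bad": -3, "hate": -3}
EMOTICONS = {":-)": "positive", ":)": "positive", ":-(": "negative", ":(": "negative"}


def preprocess(tweet):
    """Return a list of normalized tokens for one tweet."""
    tokens = []
    for raw in tweet.lower().split():
        # Turn emoticons into marker tokens before punctuation is stripped.
        if raw in EMOTICONS:
            tokens.append("emo_" + EMOTICONS[raw])
            continue
        # Keep only letters and digits, then replace slang with its plain form.
        word = re.sub(r"[^a-z0-9]", "", raw)
        word = SLANG.get(word, word)
        # Skip empty strings and stop words.
        if word and word not in STOP_WORDS:
            tokens.append(word)
    return tokens


def affinity_score(tokens):
    """Sum the affinity values of all known tokens (unknown words count as 0)."""
    return sum(AFFINITY.get(token, 0) for token in tokens)


tokens = preprocess("I luv it because it is gr8 :-)")
print(tokens, affinity_score(tokens))
```

The affinity score is only one possible way to use the affinity list, for example as an extra feature next to the bag of words.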

Then you can implement the sentiment analysis like this:
1) Download unclassified tweets
2) Pre-process tweets (remove stop words, replace slang words, …)
3) Create bag of words set (from the negative, neutral and positive tweets)
3a) Optional: you can do a k-fold cross-validation to find the best attributes for training the model
4) Train the model
5) Classify the downloaded tweets
5a) Optional: store the classified tweets
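
Steps 3 to 5 could be sketched with Scikit-Learn roughly as follows. This is not the code from the project files: the training tweets are invented and assumed to be pre-processed already, and the parameter grid and the 2-fold split are arbitrary choices to keep the example small.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

# Made-up, already pre-processed training tweets (illustration only).
train_texts = [
    "love great phone", "great day happy", "awesome love movie", "happy great service",
    "hate bad phone", "terrible bad day", "awful hate movie", "bad terrible service",
    "phone arrived today", "watching movie now", "service opens monday", "day at office",
]
train_labels = ["positive"] * 4 + ["negative"] * 4 + ["neutral"] * 4

# Step 3: CountVectorizer builds the bag of words; the forest is the classifier.
pipeline = Pipeline([
    ("bow", CountVectorizer()),
    ("forest", RandomForestClassifier(random_state=0)),
])

# Step 3a: cross-validation over a small, arbitrary parameter grid.
search = GridSearchCV(pipeline, {"forest__n_estimators": [10, 50]}, cv=2)

# Step 4: fit runs the search and refits the best model on all training tweets.
search.fit(train_texts, train_labels)

# Step 5: classify new (pre-processed) tweets.
new_tweets = ["love happy day", "bad awful phone"]
predictions = search.predict(new_tweets)
for tweet, label in zip(new_tweets, predictions):
    print(label, "<-", tweet)
```

With a data set this small the predictions are not meaningful; a real training set needs far more tweets per class.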

You can download my project files here: Twitter Sentiment

You need Scikit-Learn and NLTK to run the code. I use Eclipse+PyDev+Anaconda3 as my development environment.

Next steps: parallelize downloading the tweets and processing/classifying them.
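
One possible way to approach this is the standard library's concurrent.futures module. The sketch below only shows the idea: download_tweets and classify are hypothetical placeholders that return fake data, not real Twitter or model calls.

```python
from concurrent.futures import ThreadPoolExecutor


def download_tweets(query):
    """Hypothetical stand-in for a real download call; returns fake tweets."""
    return [f"{query} tweet {i}" for i in range(3)]


def classify(tweet):
    """Hypothetical stand-in for pre-processing plus model prediction."""
    return "neutral"


queries = ["python", "sklearn", "nltk"]

# Downloads are I/O-bound, so threads can overlap the waiting time.
with ThreadPoolExecutor(max_workers=3) as pool:
    # map returns the results in the same order as the queries.
    batches = list(pool.map(download_tweets, queries))

# Flatten the per-query batches and classify every tweet.
tweets = [tweet for batch in batches for tweet in batch]
labels = [classify(tweet) for tweet in tweets]
print(len(tweets), "tweets classified")
```

Classification is CPU-bound rather than I/O-bound, so threads help less there; the n_jobs parameter of RandomForestClassifier or a ProcessPoolExecutor might be a better fit for that part.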