Amazing analysis of the Brexit with machine learning

MonkeyLearn analyzed thousands of tweets concerning the Brexit and the results are fascinating ... and you can do it, too!


So the UK has just given itself a national headache. Whether you think the Brexit was the right decision or a dangerous and unmitigated screw-up (as I do), the consequences of the referendum will be non-trivial and will take years to play out. But the mechanics of the UK exiting the European Union aside, the question of how people now feel about the Brexit is interesting. Are they awash in jubilation or has buyer’s remorse set in? An intriguing post by MonkeyLearn attempts to answer this question by analyzing tweets and, as a bonus, provides tools that you might well find useful for similar exercises.

First, let me explain what MonkeyLearn is: The service defines itself as a “[highly] scalable Machine Learning API to automate text classification.” To use MonkeyLearn, you assemble your text data, train and test a machine learning model with that data, and then have your application code call a custom API for your model to analyze and classify new data. You can also provide your data to MonkeyLearn by pasting it into their Web interface or uploading CSV files or Excel spreadsheets.

The beauty of MonkeyLearn’s service is that you don’t need to know much at all about the mechanics of machine learning, although there’s still some technology to master to get the best out of the service. Interestingly, you don’t even need to have training data available to create classifiers, as MonkeyLearn has more than 100 pre-built classifiers for functions such as classifying retail products from their descriptions, sentiment analysis of English tweets and product reviews, and keyword and entity extraction.


MonkeyLearn's Web interface showing the details of the English Tweets Sentiment Analysis module

In MonkeyLearn’s post, “The Divided Kingdom: a machine learning analysis on the Brexit result,” the company explains that they wanted to determine how people felt about Brexit and what they were saying, so they turned to Twitter:

First, we used a python library called tweepy to connect to the Twitter stream and get more than 450,000 tweets that used the hashtag #Brexit.

Afterwards, we filtered these tweets by language using our language classifier and kept only those that were in English (around 250,000 tweets). Then, we analyzed these tweets using MonkeyLearn with some public, pre-trained and ready-to-use machine learning models. We performed sentiment analysis on these tweets to understand if people talking were talking positively, negatively or neutrally about the brexit.

Finally, we wanted to go a step deeper and better understand the different point of views, so we performed keyword extraction on the tweets of the different sentiments we analyzed to know the words or phrases people were using to get a better picture and more context.
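The filtering and sentiment steps quoted above can be sketched in a few lines of Python. This is a minimal illustration rather than MonkeyLearn's published code: the `analyze_tweets` function and the toy `toy_language` and `toy_sentiment` stand-ins are hypothetical placeholders for whatever language and sentiment classifier calls you actually use, and keyword extraction is left out.

```python
from collections import Counter
from typing import Callable, Iterable


def analyze_tweets(
    tweets: Iterable[str],
    detect_language: Callable[[str], str],
    classify_sentiment: Callable[[str], str],
    keep_language: str = "en",
) -> dict:
    """Filter tweets by language, then tally the sentiment label of each one."""
    kept = [t for t in tweets if detect_language(t) == keep_language]
    sentiments = Counter(classify_sentiment(t) for t in kept)
    return {"kept": len(kept), "sentiments": dict(sentiments)}


# Toy stand-ins (assumptions for illustration only); swap in real classifier calls.
def toy_language(text: str) -> str:
    return "fr" if "le " in text.lower() else "en"


def toy_sentiment(text: str) -> str:
    lowered = text.lower()
    if "great" in lowered:
        return "positive"
    if "disaster" in lowered:
        return "negative"
    return "neutral"


sample = [
    "Great day for Britain #Brexit",
    "What a disaster #Brexit",
    "Le Brexit, et maintenant? #Brexit",
    "Markets open at 8am #Brexit",
]
result = analyze_tweets(sample, toy_language, toy_sentiment)
print(result)
```

Passing the classifiers in as plain functions keeps the pipeline independent of any one service, so the same loop works whether the labels come from a hosted API or a local model.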

It's important to note that the tweets collected were a random sample expressing the sentiments of the Twitter universe rather than just those of people in the UK, but the results were interesting all the same. MonkeyLearn found that from a final sample of 133,605 tweets, 47% were classified as positive with (natch) 53% negative, which is extremely close (within 5%) to the actual final UK voting results of 48% against and 52% for. If you wanted to specifically measure UK sentiment, you'd have to restrict the analysis to tweets with attached geolocation data.
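As a rough sketch of both ideas in that paragraph, the snippet below computes the positive/negative split among polarized tweets and filters a collection down to tweets tagged with a UK location. It assumes each tweet is a dict shaped like Twitter's tweet JSON, with an optional `place` object carrying a `country_code`; the function names and the sample data are made up for illustration.

```python
def polarity_shares(positive: int, negative: int) -> tuple:
    """Return (positive %, negative %) among polarized tweets, ignoring neutral ones."""
    total = positive + negative
    if total == 0:
        raise ValueError("no positive or negative tweets to compare")
    return (100.0 * positive / total, 100.0 * negative / total)


def is_from_country(tweet: dict, country_code: str = "GB") -> bool:
    """True if the tweet carries a place tag with the given country code."""
    # Assumption: untagged tweets have a missing or null "place" field.
    place = tweet.get("place") or {}
    return place.get("country_code") == country_code


# Made-up sample data, for illustration only.
tweets = [
    {"text": "Leave won #Brexit", "place": {"country_code": "GB"}},
    {"text": "Watching from afar #Brexit", "place": {"country_code": "US"}},
    {"text": "No location here #Brexit", "place": None},
    {"text": "No place key #Brexit"},
]
uk_tweets = [t for t in tweets if is_from_country(t)]
print(len(uk_tweets))
print(polarity_shares(positive=47, negative=53))
```

Bear in mind that only a small fraction of tweets carry location tags, so a filter like this shrinks the sample considerably.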

The MonkeyLearn post has some interesting findings on sentiment regarding specific topics. For example, David Cameron, who was mentioned in 8% of the tweets, was referred to positively by 17%, neutrally by 58%, and negatively by 25%. Surprisingly, Donald Trump, who you might have thought to be irrelevant to the Brexit, was mentioned in only slightly fewer tweets (7%) than Cameron and referred to positively by 32%, neutrally by 32%, and negatively by 36%.
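A per-topic breakdown like the one above can be approximated once every tweet has a sentiment label. The sketch below is an assumption-laden simplification: it matches topics with a plain case-insensitive substring search rather than the keyword extraction MonkeyLearn used, and the `topic_breakdown` function and sample tweets are hypothetical.

```python
from collections import Counter


def topic_breakdown(labeled_tweets, keyword):
    """Return (mention rate %, {sentiment: % of mentions}) for tweets containing keyword.

    labeled_tweets is a list of (text, sentiment_label) pairs.
    """
    needle = keyword.lower()
    mentions = [(text, label) for text, label in labeled_tweets if needle in text.lower()]
    if not labeled_tweets or not mentions:
        return 0.0, {}
    rate = 100.0 * len(mentions) / len(labeled_tweets)
    counts = Counter(label for _, label in mentions)
    shares = {label: 100.0 * n / len(mentions) for label, n in counts.items()}
    return rate, shares


# Made-up labeled tweets, for illustration only.
labeled = [
    ("Cameron should never have called this vote #Brexit", "negative"),
    ("David Cameron to speak at 8am #Brexit", "neutral"),
    ("Independence day! #Brexit", "positive"),
    ("Pound falls overnight #Brexit", "negative"),
]
rate, shares = topic_breakdown(labeled, "cameron")
print(rate, shares)
```

The first number corresponds to the "mentioned in N% of the tweets" figure, and the dictionary corresponds to the positive, neutral, and negative percentages among those mentions.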

The big bonus of MonkeyLearn’s post is that they’ve made the Python code they used available so you can run your own analysis. I wish I’d thought of doing this kind of exercise over the lead-up to the Brexit vote but you can bet I’ll be running a similar project for the U.S. election in November. Eat your heart out, Nate Silver, you’re being replaced by code.

Comments? Thoughts? Send me your analysis via email or comment below then follow me on Twitter and Facebook.
