Research published in the International Journal of Cloud Computing looks at how machine learning might allow us to analyse the nature and characteristics of social media updates and detect which of those updates are adding grist to the rumour mill rather than being factual.
Fake news has been with us ever since the first gossip passed on a rumour back in the day. But, with the advent of social media, it is now so much easier to spread fake news, disinformation, and propaganda to a vast global audience with little constraint. A rumour can make or break a reputation. These days, that might happen the world over through the amplifying echo chamber of social media.
Mohammed Al-Sarem, Muna Al-Harby, Faisal Saeed, and Essa Abdullah Hezzam of Taibah University in Medina, Saudi Arabia have surveyed the different text pre-processing approaches for handling the vast quantities of data that pour from social media on a daily basis. How well these approaches work in the subsequent rumour detection analysis is critical to how well fake news can be spotted and stopped. The team has tested various approaches on a dataset of political news-related tweets from Saudi Arabia.
Before the text analysis is carried out, pre-processing can look at the three most relevant characteristics of an update and silo the different updates accordingly. First, it can look at the content itself: the use of question marks and exclamation marks, and the word count. Second, it can look at whether an account is verified or has properties more often associated with a fake or bot account, such as tweet count, replies, and retweets. Third, it can look at user-based features, such as the user name and the user’s logo or profile picture.
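As a rough illustration of the first group of characteristics, the content-level cues could be counted as in the sketch below. This is not the researchers' code. The function name `content_features` and the dictionary keys are hypothetical, and counting the Arabic question mark alongside the Latin one is an assumption made here because the dataset consists of tweets from Saudi Arabia.

```python
def content_features(text: str) -> dict:
    """Count simple surface cues in a single update (illustrative only)."""
    return {
        # Assumption: count both the Latin "?" and the Arabic question mark (U+061F).
        "question_marks": text.count("?") + text.count("\u061f"),
        "exclamation_marks": text.count("!"),
        # Whitespace-separated chunks serve as a rough word count.
        "word_count": len(text.split()),
    }


print(content_features("Is this true?! Share now!!"))
```

Account-level and user-level signals, such as verification status or retweet counts, would come from the platform's metadata rather than from the text, so they are left out of this sketch.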
The researchers found that pre-processing can improve analysis significantly when the output is fed to any of support vector machine (SVM), multinomial naïve Bayes (MNB), and K-nearest neighbour (KNN) classifiers. However, those classifiers do react differently depending on what combination of pre-processing techniques is used. Those techniques include removing stop words, cleaning out coding tags such as HTML, stemming, and tokenization.
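The pre-processing techniques named above can be sketched as a small pipeline in which each step can be switched on or off, which is one way different combinations might be compared. This is a minimal sketch under stated assumptions, not the method used in the paper: the stop-word list is a tiny English sample, `toy_stem` is a crude suffix stripper standing in for a real stemmer, and all function and parameter names are hypothetical.

```python
import re

# Assumption: a tiny English stop-word sample; a real list would be much larger
# and specific to the language of the tweets.
STOP_WORDS = {"a", "an", "the", "is", "are", "of", "to", "and", "in"}

TAG_RE = re.compile(r"<[^>]+>")   # matches HTML-style tags such as <b> or </b>
TOKEN_RE = re.compile(r"\w+")     # runs of letters, digits, and underscores


def toy_stem(token: str) -> str:
    """Strip a few common English suffixes (a stand-in for a real stemmer)."""
    for suffix in ("ing", "ed", "s"):
        # Only strip when at least three characters would remain.
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text, strip_tags=True, remove_stop_words=True, stem=True):
    """Return a list of tokens after the selected pre-processing steps."""
    if strip_tags:
        text = TAG_RE.sub(" ", text)
    # Tokenization: lower-case the text and split it into word tokens.
    tokens = TOKEN_RE.findall(text.lower())
    if remove_stop_words:
        tokens = [t for t in tokens if t not in STOP_WORDS]
    if stem:
        tokens = [toy_stem(t) for t in tokens]
    return tokens


print(preprocess("<b>Officials</b> are denying the leaked reports"))
# prints ['official', 'deny', 'leak', 'report']
```

Calling `preprocess` with, say, `stem=False` yields a different token list from the same text, and the resulting tokens are what a classifier such as SVM, MNB, or KNN would then be trained on.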
Al-Sarem, M., Al-Harby, M., Saeed, F. and Hezzam, E.A. (2022) ‘Machine learning classifiers with pre-processing techniques for rumour detection on social media: an empirical study’, Int. J. Cloud Computing, Vol. 11, No. 4, pp.330–344.