Bagging and boosting your way to a spam-free inbox

Research in the International Journal of Advanced Intelligence Paradigms discusses the potential of bagging and boosting machine learning classifiers for the accurate detection of email spam. Bagging and boosting are two popular methods for improving the performance of machine learning algorithms such as decision trees, logistic regression, and support vector machines.

Bagging, or bootstrap aggregating, is a technique used to reduce the variance of a machine learning model's results. The approach works by training several models independently on different random subsets of the training data, sampled with replacement. The predictions from those models are then averaged, or combined by majority vote in the case of classification. This smooths out the different mistakes made by each individual model so that the overall error in the final output is typically lower than it would be for a single model.
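The idea can be illustrated with a minimal Python sketch. It is not the researchers' implementation: the toy data, the single "suspicious word count" feature, the simple threshold rule used as the base model, and all function names are assumptions made purely for illustration.

```python
import random
from collections import Counter

# Hypothetical toy data: (number of suspicious words, label); 1 = spam, 0 = legitimate.
EMAILS = [(0, 0), (1, 0), (1, 0), (2, 0), (3, 0),
          (5, 1), (6, 1), (7, 1), (8, 1), (9, 1)]


def train_stump(samples):
    """Fit a threshold rule: predict spam (1) when the feature exceeds the threshold."""
    best_threshold, best_errors = None, None
    for threshold in sorted({x for x, _ in samples}):
        errors = sum((x > threshold) != bool(y) for x, y in samples)
        if best_errors is None or errors < best_errors:
            best_threshold, best_errors = threshold, errors
    return best_threshold


def bagging_fit(samples, n_models=25, seed=0):
    """Train each model independently on its own bootstrap sample."""
    rng = random.Random(seed)
    models = []
    for _ in range(n_models):
        # Bootstrap: draw as many examples as the original set, with replacement.
        bootstrap = [rng.choice(samples) for _ in samples]
        models.append(train_stump(bootstrap))
    return models


def bagging_predict(models, x):
    """Combine the individual predictions by majority vote."""
    votes = Counter(int(x > threshold) for threshold in models)
    return votes.most_common(1)[0][0]


models = bagging_fit(EMAILS)
print(bagging_predict(models, 9), bagging_predict(models, 0))
```

Because each threshold rule sees a slightly different sample, the rules disagree near the class boundary, and the vote averages out those individual quirks.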

Boosting, on the other hand, involves training a series of models one after the other so that each model can attempt to correct the mistakes made by the previous model in the sequence. It does this by giving more weight to examples in the training set that were misclassified by previous models. The predictions of these models are then combined so that, once again, the overall error of the combined model is lower than that of any one model within it.
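The reweighting step is easiest to see in code. The following is a small sketch of the classic discrete AdaBoost procedure with threshold rules as weak learners. The toy data and function names are hypothetical, and the sketch makes no claim to match the configuration used in the paper.

```python
import math

# Hypothetical toy data: one feature value per email; +1 = spam, -1 = legitimate.
# No single threshold separates these labels, so one weak rule is not enough.
XS = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
YS = [1, 1, 1, -1, -1, -1, 1, 1, 1, -1]


def stump_predict(x, threshold, polarity):
    """Weak rule: predict `polarity` above the threshold, the opposite otherwise."""
    return polarity if x > threshold else -polarity


def fit_weighted_stump(xs, ys, weights):
    """Pick the threshold and polarity with the lowest weighted error."""
    best = None
    for threshold in sorted(set(xs)):
        for polarity in (1, -1):
            error = sum(w for x, y, w in zip(xs, ys, weights)
                        if stump_predict(x, threshold, polarity) != y)
            if best is None or error < best[0]:
                best = (error, threshold, polarity)
    return best


def adaboost_fit(xs, ys, n_rounds=3):
    weights = [1.0 / len(xs)] * len(xs)  # start with equal weights
    ensemble = []
    for _ in range(n_rounds):
        error, threshold, polarity = fit_weighted_stump(xs, ys, weights)
        error = min(max(error, 1e-10), 1 - 1e-10)  # guard against log(0)
        alpha = 0.5 * math.log((1 - error) / error)  # this model's say in the vote
        ensemble.append((alpha, threshold, polarity))
        # Misclassified examples gain weight, correctly classified ones lose it.
        weights = [w * math.exp(-alpha * y * stump_predict(x, threshold, polarity))
                   for x, y, w in zip(xs, ys, weights)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble


def adaboost_predict(ensemble, x):
    """Weighted vote of all weak rules."""
    score = sum(alpha * stump_predict(x, threshold, polarity)
                for alpha, threshold, polarity in ensemble)
    return 1 if score > 0 else -1


ensemble = adaboost_fit(XS, YS, n_rounds=3)
predictions = [adaboost_predict(ensemble, x) for x in XS]
```

On this toy set the best single threshold rule still misclassifies three of the ten examples, whereas the weighted vote of three rules classifies all ten correctly.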

Uma Bhardwaj and Priti Sharma of the Department of Computer Science and Applications at Maharshi Dayanand University in Rohtak, Haryana, India, have used both bagging and boosting to demonstrate how email spam might be more effectively detected to improve the lot of users. Their approach detects email spam by first “bagging” the machine learning-based multinomial Naïve Bayes (MNB) and J48 decision tree classifiers and then “boosting” the weak classifiers using the AdaBoost algorithm to make them strong.

The team ran experiments to compare their approach with the results obtained using individual classifiers, the bagging approach alone, and the boosting approach alone. They were able to demonstrate an evaluation accuracy of 98.79% with a precision of 100% and a recall of 92.78%. The researchers explain that this indicates that no legitimate email was wrongly flagged as spam, while the error rate on spam emails, the proportion that slipped through undetected, was just 7.22%.
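For readers unfamiliar with these metrics, the short Python sketch below shows how accuracy, precision, and recall are computed from a confusion matrix, treating spam as the positive class. The counts are invented for illustration and are not the paper's data. Only the final line uses a reported figure, to show that the 7.22% error rate is simply one minus the recall.

```python
def precision(tp, fp):
    """Share of emails flagged as spam that really were spam."""
    return tp / (tp + fp)


def recall(tp, fn):
    """Share of actual spam emails that were caught."""
    return tp / (tp + fn)


def accuracy(tp, tn, fp, fn):
    """Share of all emails that were classified correctly."""
    return (tp + tn) / (tp + tn + fp + fn)


# Hypothetical counts, NOT taken from the paper:
# 90 spam caught, 10 spam missed, 0 legitimate flagged, 400 legitimate passed.
tp, fn, fp, tn = 90, 10, 0, 400

p = precision(tp, fp)          # no false positives, so precision is 100%
r = recall(tp, fn)             # 90 of 100 spam emails caught
a = accuracy(tp, tn, fp, fn)   # 490 of 500 emails classified correctly

# Using the recall reported in the article: missed spam = 1 - recall.
miss_rate = round(1 - 0.9278, 4)
```

A precision of 100% means no false positives, which matters for spam filtering because a legitimate email lost to the spam folder is usually costlier than a spam message that reaches the inbox.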

In terms of future development, bagging and boosting might also be used to detect fake news, reveal suspicious activity on social media, and spot unsubstantiated rumours.

Bhardwaj, U. and Sharma, P. (2023) ‘Email spam detection using bagging and boosting of machine learning classifiers’, Int. J. Advanced Intelligence Paradigms, Vol. 24, Nos. 1/2, pp.229–253.