It is perhaps a significant concern that internet users willingly, and sometimes unwittingly, share their personal and private information through online social networks without a second thought for how that information might be used. There is an ongoing risk of identity theft and of users falling victim to other cybercrimes such as scams and phishing attacks. The flip side of all this shared information is that, for researchers hoping to understand trends within society, it offers a vast seam of data, opinions, and behaviour that could be mined to extract nuggets of information about humanity. It might even be used to predict how behaviour online and offline might change.
For researchers hoping to dig into this motherlode of data, however, there is a significant obstacle. Many users have accounts on several different social networks and do not necessarily maintain consistency in biography, demographic data, or identity per se across the different platforms. For instance, data obtained from a Facebook or LinkedIn profile can reveal demographic information, such as age, gender, sexuality, relationship status and relatives, race, education, and occupation. Updates on Facebook and Twitter can reveal psychographic information, such as attitude towards a product, online behaviour, and politics.
New research published in the International Journal of Enterprise Network Management demonstrates an accurate way in which user profiles across different online social networks can be matched. Once profiles are matched, it is possible to couple the demographic information obtained from one platform with the behavioural information from another. One would hope that such information might then be anonymised for the purposes of legitimate research. However, there is always the spectre of nefarious uses once such data mining tools are available.
Nevertheless, Deepesh Kumar Srivastava of the Institute of Management Technology Dubai in the UAE and Basav Roychoudhury of the Indian Institute of Management Shillong in Meghalaya, India, have demonstrated a way to match profiles on different platforms. Their approach relies on extracting user-generated content and user-shared updates across the different platforms and analysing them to find the overlap where a user is active on multiple platforms. Their text mining techniques extract high-frequency words and words commonly used in the users’ updates on social media platforms. They have tested the current iteration of their approach on publicly available data sets and demonstrated 72.5 per cent accuracy in matching a user’s profiles on different platforms.
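To give a flavour of how matching on high-frequency words might work in practice, the following is a minimal Python sketch. It is not the authors' implementation: the cosine similarity measure, the tiny stop-word list, the cut-off `k`, the function names, and the sample profiles are all assumptions made purely for illustration.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list; a real study would use a fuller one.
STOP_WORDS = {"the", "a", "an", "and", "or", "to", "of", "in", "on",
              "is", "it", "my", "for", "at", "i"}

def top_words(posts, k=20):
    """Return a Counter holding the k most frequent non-stop-words in a user's posts."""
    tokens = []
    for post in posts:
        words = re.findall(r"[a-z']+", post.lower())
        tokens.extend(w for w in words if w not in STOP_WORDS)
    return Counter(dict(Counter(tokens).most_common(k)))

def cosine(a, b):
    """Cosine similarity between two word-frequency Counters (0.0 if either is empty)."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def match_profiles(platform_a, platform_b, k=20):
    """For each profile on platform A, pick the most similar profile on platform B."""
    vectors_b = {name: top_words(posts, k) for name, posts in platform_b.items()}
    matches = {}
    for name_a, posts in platform_a.items():
        vec_a = top_words(posts, k)
        best = max(vectors_b, key=lambda name_b: cosine(vec_a, vectors_b[name_b]))
        matches[name_a] = (best, cosine(vec_a, vectors_b[best]))
    return matches

# Invented sample data: two users, each with a profile on two platforms.
platform_a = {
    "alice_fb": ["Loved the marathon training run today",
                 "Marathon training is going well, new running shoes"],
    "bob_fb": ["Baking sourdough bread again",
               "My sourdough starter is thriving, bread for everyone"],
}
platform_b = {
    "runner_al": ["Training for the marathon in new shoes",
                  "Long training run done"],
    "breadbob": ["Sourdough bread fresh from the oven",
                 "Feeding the sourdough starter"],
}

matches = match_profiles(platform_a, platform_b)
for name_a, (name_b, score) in matches.items():
    print(f"{name_a} -> {name_b} (similarity {score:.2f})")
```

In this toy example each profile shares vocabulary with only one candidate on the other platform, so the pairing is unambiguous. On real data the scores would be far noisier, which is one reason a vocabulary-based match is likely to work best alongside other signals.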
Such a level of accuracy would be useful when coupled with other techniques, such as basic name and location matching and other relatively mundane data mining approaches. Even as a baseline from which to improve the approach, it offers an excellent starting point. Future work will home in on overlapping characteristics in user chronology at the timeline level. This should improve matching in cases where a user duplicates the sentiment or content of a post on more than one platform and so reveals a match.
Srivastava, D.K. and Roychoudhury, B. (2022) ‘Profile matching of online users across multiple social networks: a text mining approach’, Int. J. Enterprise Network Management, Vol. 13, No. 1, pp.19–36.