A survey of privacy-preserving data-mining techniques published in the International Journal of Business Intelligence and Data Mining assesses the pros and cons of each approach and offers guidance to potential users.
G. Sathish Kumar of the Department of Computer Science and Engineering at the Sri Krishna College of Engineering and Technology in Coimbatore and K. Premalatha of the Department of Computer Science and Engineering at the Bannari Amman Institute of Technology in Erode, both in Tamil Nadu, India, explain how data mining has come to the fore as a powerful way to find patterns and correlations in big data.
However, as with any useful tool, it can be mishandled or abused. In the case of big data, there are risks associated with breaches of private and personal information. This is particularly important given that data mining is so widely used with disparate data sets including criminal records, consumer shopping habits, bank transactions, medical information, and much more. Third parties might learn the identity of individuals represented in a database and so see the personal and private information associated with them. A total breach would represent the worst-case scenario, in which all information about all individuals in a database is revealed to that third party.
There is therefore a pressing need to have full control of the data being mined so that third parties, malicious or otherwise, cannot compromise that data. The team has reviewed the various approaches and describes the benefits and disadvantages of each, including randomisation, anonymisation, condensation, cryptographic, fuzzy, and statistical methods of privacy preservation in data mining.
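As a rough illustration of the first of those methods, randomisation can be sketched as adding random noise to each sensitive value before the data is handed over for mining. The snippet below is a minimal sketch and is not taken from the survey: the function name `randomise`, the sample balances, and the noise level are all hypothetical, and real schemes choose the noise distribution far more carefully.

```python
import random
import statistics


def randomise(values, noise_sd, seed=0):
    """Return a copy of values with independent zero-mean Gaussian noise added.

    Hypothetical helper for illustration only; not the survey authors' method.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    return [v + rng.gauss(0.0, noise_sd) for v in values]


# Invented sensitive attribute, e.g. account balances (assumption, not real data).
balances = [1200.0, 560.0, 9800.0, 430.0, 2750.0, 3100.0, 150.0, 4700.0]

# Each individual value is masked by noise before release.
masked = randomise(balances, noise_sd=500.0)

# Aggregate statistics can still be estimated from the masked data,
# because the added noise has zero mean and tends to average out.
true_mean = statistics.mean(balances)
masked_mean = statistics.mean(masked)
print(f"true mean: {true_mean:.1f}, mean of masked data: {masked_mean:.1f}")
```

Increasing `noise_sd` hides individual values more thoroughly but makes the estimated mean less reliable, which is the kind of trade-off between privacy and usefulness that the survey weighs for each method.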
Every approach inevitably involves some compromise. Indeed, the team has found that no technique outperforms all the others on every measure. Some work better than others in a given situation, but each comes with trade-offs, the team writes. As such, despite recent advances in this area, there is still a need to develop a system that can robustly preserve privacy while allowing data mining to be carried out.
Kumar, G.S. and Premalatha, K. (2022) ‘Privacy preserving data mining – past and present’, Int. J. Business Intelligence and Data Mining, Vol. 21, No. 2, pp.149–170.