Analysing big data

We live in the information age, you might say. More than 2.5 quintillion bytes* of data are generated around the globe every day. Managing all of that data is impossible, and yet we make use of huge chunks of it in many disparate and sometimes unimaginable ways. Extracting knowledge from such repositories and databases, the so-called big data, can lead to a better understanding of natural and human-made phenomena in climate change, economics, medicine, and beyond.

Predictive analysis is key to making intelligent decisions based on such big data, according to researchers writing in the International Journal of Engineering Systems Modelling and Simulation. However, there are problems that must be addressed, especially when that data resides in the cloud.

Krishna Kumar Mohbey and Sunil Kumar of the Central University of Rajasthan in Ajmer, India, consider the impact of big data in this context. They point out that one of the biggest issues facing those who would work with big data is that while some of it may well be structured, much of it is only semi-structured, and vast amounts are entirely unstructured.
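To make that distinction concrete, here is a minimal sketch (not drawn from the paper itself; the records are invented purely for illustration) of how the three categories typically look in Python:

    import json

    # Structured: a fixed schema, like a row in a relational table.
    structured_row = {"station_id": 42, "temp_c": 19.3, "date": "2022-01-01"}

    # Semi-structured: self-describing but irregular, like JSON logs in
    # which fields may be missing or nested differently per record.
    semi_structured = json.loads(
        '{"user": "a1", "events": [{"type": "click"}, {"type": "view", "ms": 120}]}'
    )

    # Unstructured: no schema at all, such as free text, images, or audio.
    unstructured = "Storm expected over Ajmer this evening."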

The storage, management, and analysis of all of this data are among the greatest challenges facing computing today. While cloud computing provides many of the necessary tools in a distributed way and has, to some extent, revolutionized information and communications technology (ICT), there remains a long road ahead before we can fully cope with big data.

However, distributed storage and massively parallel processing of big data in the cloud could provide the foundations on which the future of big data and predictive analysis might be built. The team reviews many of the current approaches that use historical data and machine learning to make predictions about the outcomes of future scenarios based on contemporary big data sources. The team also points to where research might take us next in the realm of big data and warns of possible dead ends.
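As a flavour of what such prediction looks like in practice, the sketch below fits a simple linear model to invented historical values and extrapolates one step ahead using scikit-learn. This is a minimal illustration, not one of the specific approaches the team reviews:

    import numpy as np
    from sklearn.linear_model import LinearRegression

    # Hypothetical historical data: year versus some observed quantity.
    years = np.array([[2017], [2018], [2019], [2020], [2021]])
    values = np.array([10.2, 11.1, 12.5, 13.0, 14.2])

    # Learn the trend from history, then extrapolate to a future year.
    model = LinearRegression().fit(years, values)
    forecast = model.predict([[2022]])
    print(f"Predicted value for 2022: {forecast[0]:.1f}")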

“The key aim is to transform the cloud into a scalable data analytics tool, rather than just a data storage and technology platform,” the team writes. They add that now is the time to develop appropriate standards and application programming interfaces (APIs) that enable users to easily migrate between solutions and so take advantage of the elasticity of cloud infrastructure.
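One way to picture such an API is a thin, provider-neutral storage interface that lets an application swap cloud back-ends without touching its own logic. The names below (ObjectStore, InMemoryStore, archive) are hypothetical, sketched only to illustrate the idea:

    from abc import ABC, abstractmethod

    class ObjectStore(ABC):
        """Hypothetical provider-neutral interface for object storage."""

        @abstractmethod
        def put(self, key: str, data: bytes) -> None: ...

        @abstractmethod
        def get(self, key: str) -> bytes: ...

    class InMemoryStore(ObjectStore):
        """Stand-in back-end; a real one would wrap a cloud provider's SDK."""

        def __init__(self):
            self._blobs = {}

        def put(self, key, data):
            self._blobs[key] = data

        def get(self, key):
            return self._blobs[key]

    def archive(store: ObjectStore, key: str, payload: bytes) -> None:
        # Application code depends only on the interface, so migrating
        # between providers needs no changes here.
        store.put(key, payload)

    archive(InMemoryStore(), "readings/2022-01-01", b"19.3C")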

Mohbey, K.K. and Kumar, S. (2022) ‘The impact of big data in predictive analytics towards technological development in cloud computing’, Int. J. Engineering Systems Modelling and Simulation, Vol. 13, No. 1, pp.61–75.

*2.5 quintillion bytes is about 2.5 million terabytes (2.5 × 10^18 bytes divided by 10^12 bytes per terabyte). A typical household computer might have a 1 terabyte hard drive these days, so that's enough data to max out the storage capacity of about 2,500,000 such computers every day.