Science, like other fields, is not free of buzzwords, and one doing the rounds is “Big Data”. Just how big is “big”? And is there any point in India spending money to produce large amounts of data when the West is already pumping so much data into public databases?
In the context of science, big data refers to the explosion of data now available from modern, large-scale scientific experiments, which far exceeds what was traditionally available. For example, the amount of data produced from analysing the networks of genes, or genomes, of bacteria, plants, viruses, vegetables and animals in the last five years exceeds all such data from life sciences and biomedical research in the last five decades. It is estimated that by 2025, exabytes (10^18 bytes) of genomics data, or the equivalent of about 300 million full-length ‘Star Wars’ films, will be produced globally, far exceeding the output of Twitter and YouTube.
Moreover, the volume of genomics data being produced roughly doubles every year and will require new solutions for precise and accurate storage, analysis, sharing and security. All this is relevant to every citizen in India, not just computer geeks, because it is such data that will help find cures for vexing human diseases.
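To get a feel for what yearly doubling means, a rough back-of-the-envelope sketch helps. The function names below are illustrative, and the one-year doubling time is simply the rough figure quoted above, not a measured constant.

```python
import math


def growth_factor(years: float, doubling_time_years: float = 1.0) -> float:
    """How many times larger the yearly data output is after `years`.

    Assumes steady doubling every `doubling_time_years` (an assumption
    taken from the rough "doubles every year" figure, not a measurement).
    """
    return 2 ** (years / doubling_time_years)


def years_to_grow(factor: float, doubling_time_years: float = 1.0) -> float:
    """Years needed for output to grow by `factor` under steady doubling."""
    return doubling_time_years * math.log2(factor)


# Ten years of yearly doubling multiplies output by 2**10.
print(growth_factor(10))      # 1024.0
# A thousand-fold increase takes a little under ten years.
print(round(years_to_grow(1000), 2))
```

In other words, if the trend holds, storage and analysis capacity planned for today's volumes would be short by a factor of about a thousand within a decade.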
Disease study
A study published last week in the journal Science Translational Medicine reported the use of big data to untangle a lethal class of diseases called prion diseases. These were known to be caused by errant variants of a gene, PRNP, and it was thought that having even one of the 63 known variants would lead to the fatal disease. Thanks to the cooperation of scientists from multiple countries, who shared data on prion disease genetics, and of patients and study participants from a large database, who agreed to share their data, researchers combed through the genomes of nearly 63,000 people and found that only four variants were pathogenic, three completely harmless and no more than 10 per cent likely to cause disease. A killer disease had suddenly become much less fearsome and, with this finding, new ideas have opened up for finding a cure.
There are lessons in this for India. It can begin by predicting global outbreaks of infectious diseases such as dengue fever and malaria using customised models and an open-source framework. Big data can place cases of dengue fever or malaria on a map and predict the spread of disease by overlaying that map with one showing the movement of people. India will soon have the second-largest smartphone market in the world. Therefore, by analysing mobile phone data and the real-time movement of infected people, it is possible to pinpoint sources of infection and predict areas of transmission.
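The idea of overlaying a disease map with a movement map can be sketched in a few lines. This is only a toy illustration under stated assumptions: the districts, case counts, populations and trip counts are all made up, and the scoring rule (infection rate at the origin multiplied by the number of trips) is one simple choice, not the method of any particular outbreak model.

```python
def imported_risk(cases, population, trips):
    """Estimate infected arrivals per district.

    cases:      district -> reported cases (hypothetical numbers)
    population: district -> residents (hypothetical numbers)
    trips:      (origin, destination) -> daily trips, e.g. derived from
                anonymised mobile phone data (hypothetical numbers)
    """
    risk = {district: 0.0 for district in population}
    for (origin, destination), count in trips.items():
        if origin == destination:
            continue  # movement within a district imports nothing
        infection_rate = cases.get(origin, 0) / population[origin]
        risk[destination] += infection_rate * count
    return risk


# Entirely made-up example data for three districts.
cases = {"A": 100, "B": 0, "C": 10}
population = {"A": 10_000, "B": 20_000, "C": 5_000}
trips = {("A", "B"): 500, ("A", "C"): 100, ("C", "B"): 1000, ("B", "A"): 300}

risk = imported_risk(cases, population, trips)
# Districts ordered from most to least exposed to imported infection.
ranking = sorted(risk, key=risk.get, reverse=True)
print(ranking)
```

Here district B has no reported cases of its own, yet it ranks first because it receives the most travellers from affected districts, which is the kind of signal a case map alone would miss.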
Before doing this and moving on to more ambitious targets such as curing diseases, a crucial step is to learn from international efforts in large-scale data generation and sharing by building proper infrastructure at multiple levels. To find cures for rare diseases and for others such as cancer and heart disease, we need to generate a large amount of genomics data, which is best done by building consortia covering both healthy individuals and different diseases, and by making the data openly available without compromising patient privacy. In this, we can leverage best practices that already exist in the country in sectors such as information technology, high-energy physics and astronomy to build smart analysis, visualisation and interpretation platforms for big data. Generating big data is necessary but will not be sufficient to solve societal problems unless the data is openly available without compromising patient privacy and ethical standards. Therefore, a comprehensive national big data policy framework is needed, covering analytical and other solutions for data storage, analysis, interpretation, archiving, sharing, distribution and collaboration.
Making all scientific data open, free and readily accessible without compromising privacy and ethics will be the right step. Placing data in the hands of more than 500 million young Indians will help usher in the next phase of science-driven innovation in India.
(Binay Panda is at Ganit Labs, Bengaluru.)