Coronavirus (COVID-19), big data and silver linings

There is no denying that the COVID-19 pandemic will have a tremendous practical and economic impact on research. Many labs have had to close, and organisations that fund essential research will have to deal with unprecedented strains on their resources (see e.g. the AMRC website for details of ways in which medical research charities are tackling the challenge of supporting their researchers and redirecting resources to support the response to the COVID-19 crisis). It can be hard in times like these to see how anything good for the research community can come out of the crisis.

A couple of weeks ago, I discussed how computational biology can help us tackle the COVID-19 epidemic. Many computational biologists, bioinformaticians, machine learning/AI specialists and data scientists working in the field of life sciences will be in the fortunate position of still being able to produce at least some of their invaluable research output. This means that many researchers will be able to turn the challenges that we are facing into opportunities, not only to contribute to tackling the pandemic, but also to do new groundbreaking research. In these difficult times, we are in dire need of silver linings. Here are a few of mine…

Structural biology and the power of crowds in the fight against coronavirus

Here is the challenge: predicting the 3D conformation of a protein is hard. Proteins are very complex molecules and the number of possible configurations that can be adopted by even small proteins is mind-boggling. As a result, simulating protein folding is necessarily a computationally intensive process (though we have got much better at it in the last couple of decades). It is a very worthwhile process however, since a protein's function, and our ability to interfere with that function, ultimately depend on the protein's structure. This is the case for every protein, but in the context of the present pandemic, understanding how SARS-CoV-2 (the virus that causes COVID-19) targets human cells, and how it can in turn be targeted by drugs, would be tremendously easier with a better knowledge of the conformation of key virus proteins.
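To give a feel for just how mind-boggling that number is, here is a rough back-of-the-envelope sketch in the spirit of the classic Levinthal argument. The figures are illustrative assumptions of mine rather than measured values: 3 possible conformations per residue, a small protein of 100 residues, and a hypothetical sampling rate of 10^13 conformations per second. The function names are made up for this example.

```python
# Back-of-the-envelope estimate of a protein's conformational space.
# All default numbers are illustrative assumptions, not measurements.

SECONDS_PER_YEAR = 365.25 * 24 * 3600


def conformation_count(n_residues: int, states_per_residue: int = 3) -> int:
    """Number of conformations if each residue independently adopts one of a few states."""
    return states_per_residue ** n_residues


def years_to_enumerate(
    n_residues: int,
    states_per_residue: int = 3,
    samples_per_second: float = 1e13,
) -> float:
    """Years needed to visit every conformation once at the assumed sampling rate."""
    total = conformation_count(n_residues, states_per_residue)
    return total / samples_per_second / SECONDS_PER_YEAR


if __name__ == "__main__":
    n = 100  # assumed length of a small protein, in residues
    print(f"Conformations for {n} residues: {conformation_count(n):.2e}")
    print(f"Years to enumerate them all: {years_to_enumerate(n):.2e}")
```

Under these assumptions, exhaustively trying every configuration is hopeless even for a small protein, which is why folding simulations rely on physics-based sampling rather than brute-force enumeration, and why they still need a lot of computing power.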

Here is the opportunity: more people than ever have a computer at home, and that means that they can contribute to volunteer computing projects. (It also means that far more people are able to work from home more or less normally than would have been possible even 10 years ago, which I for one am very grateful for.)

Here is the reason to be hopeful: one such volunteer computing project is Folding@home, a distributed computing project that aims to simulate protein dynamics by running a simple piece of software on volunteers’ computers. The project recently reported a significant spike in support, enabling the combined power of this distributed supercomputer to reach 2.4 exaFLOPS a few weeks ago (that’s 2.4 × 10^18 floating point operations per second). This has resulted in encouraging progress in studying the mechanism of action of the SARS-CoV-2 spike protein, which is a key mediator of the virus binding to human cells (additional scientific progress on this was also recently published in Science).

Big data science meets big data sharing

Here is the challenge: collaborating and sharing data are hard at the best of times. Creating, maintaining, and sharing biological data resources is not trivial, not cheap, and requires complex coordination efforts. Now that we are all isolated and governments and organisations are having to function in crisis-management mode, it seems likely to be even more challenging. However, the power of big data can only be truly unlocked if the data is available for researchers to analyse, integrate, re-analyse, and cross-reference.

Here is the opportunity: many researchers have re-oriented their research towards COVID-19-related projects (see e.g. efforts at the Crick Institute, University of Oxford and University of Cambridge, to name a few). That means that we have lots of people working at the same time, on similar and complementary questions. Some of these researchers will be generating invaluable data, such as sequences of the virus as it spreads, and some will be sat at their computers at home, analysing this data.

Here is the reason to be hopeful: the European Commission and the EMBL European Bioinformatics Institute have recently launched the European COVID-19 Data Portal. This is a platform for researchers to upload, access and analyse COVID-19-related data. This could boost researchers’ ability to do their work, just as other biological databases have changed the face of biological research in the last few decades (except at unprecedented speed).

Here is another reason to be hopeful: the combination of unprecedented willingness to collaborate and unprecedented brainpower being brought to bear on the same challenges has led to the incredibly fast creation of collaborations between organisations that have data and organisations that have the skills to analyse it in new and interesting ways. This symbiosis is particularly essential in fields such as AI-powered drug discovery. I mentioned the work of Oxford-based company Exscientia (in collaboration with the Scripps Research Institute) in my previous post. Recently, Cambridge-based company Healx announced that it had repurposed its AI platform for rare disease drug discovery (which uses natural language processing to generate a knowledge graph that is mined to produce drug repurposing and drug combination candidates) to identify combinations of existing drugs that may be used to fight coronavirus. Similarly, on the other side of the pond, a partnership between the Barabasi lab (Northeastern University), Harvard Medical School, Stanford Network Science Institute and the precision medicine start-up Scipher Medicine was put together in record time to study how SARS-CoV-2 affects humans.