Bioinformatics - science's big secret

Now deeply woven into the fabric of biological research, bioinformatics has become the secret sauce of modern science. But what is it? Matt Packer explains.

What if I told you that there's a collection of technologies that underpins everything from the food you eat to the products on your bathroom shelves and the healthcare you receive - yet few people know its name?

In fact, 'bioinformatics' has a footprint so large that it's frankly incredible it hasn't been accepted into common parlance in the same way as the now-ubiquitous (yet terribly technical) blockchain. But perhaps this 'short' definition from Abel Ureta-Vidal, founder and chief product officer of Eagle Genomics, provides a clue as to why. He describes it as: 'The process of organising, annotating, analysing and visualising biological data with the aid of hardware and software tools.' The fact of the matter is that bioinformatics covers a huge area of activity, happens behind the scenes and is linked to complex scientific work that most of us never get to see up close.

Evolutionary tree

Ureta-Vidal dates the origins of bioinformatics 'back to when sequencing started or, in other words, when biological data took digital form and had to be analysed by software'.

One of the applications of DNA sequencing is to develop phylogenies - evolutionary trees - of biological subjects. Ureta-Vidal explains: 'In 1980, genomics expert Joseph Felsenstein launched a free, open-source software platform called PHYLIP (PHYLogeny Inference Package), which is still maintained today. Once you've sequenced DNA from, say, different virus isolates, your next step is to compare them and build a kind of genealogy tree to see how they relate to each other. That's where PHYLIP came in and, over time, that became a more and more widely accepted way of working.

'At that point, the term bioinformatics didn't really exist. It began to coalesce in the mid-90s when the Human Genome Project got going. As well as assisting genomics - mapping the DNA of specific organisms - the approach could be applied to any kind of biological data. And by the time the Project concluded in 2003, the term bioinformatics had properly surfaced.'
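The compare-then-build-a-tree step that Ureta-Vidal describes can be illustrated with a minimal Python sketch. It is not PHYLIP's implementation: it assumes the sequences are already aligned, measures how different each pair is by the proportion of mismatched positions, and groups them with UPGMA, one of the simplest distance-based tree-building methods. The four isolate sequences and the function names are invented for illustration.

```python
from itertools import combinations


def p_distance(a, b):
    """Proportion of aligned positions at which two sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def upgma(seqs):
    """Cluster aligned sequences with UPGMA; return a Newick-style topology."""
    # Each cluster is keyed by its Newick label and stores how many
    # isolates it contains.
    sizes = {name: 1 for name in seqs}
    dist = {
        frozenset(pair): p_distance(seqs[pair[0]], seqs[pair[1]])
        for pair in combinations(seqs, 2)
    }
    while len(sizes) > 1:
        # Pick the closest pair of clusters (sorted labels keep ties stable).
        a, b = min(combinations(sorted(sizes), 2),
                   key=lambda p: dist[frozenset(p)])
        merged = f"({a},{b})"
        na, nb = sizes.pop(a), sizes.pop(b)
        # Distance from the merged cluster to every other cluster is the
        # size-weighted average of the two old distances.
        for other in sizes:
            da = dist.pop(frozenset((a, other)))
            db = dist.pop(frozenset((b, other)))
            dist[frozenset((merged, other))] = (na * da + nb * db) / (na + nb)
        del dist[frozenset((a, b))]
        sizes[merged] = na + nb
    return next(iter(sizes)) + ";"


# Hypothetical aligned fragments from four virus isolates.
isolates = {
    "A": "ACGTACGTAC",
    "B": "ACGTACGTAT",
    "C": "TTGTACGAAC",
    "D": "TTGTACGAGC",
}

tree = upgma(isolates)
print(tree)  # ((A,B),(C,D));
```

The output groups A with B and C with D, mirroring the 'genealogy' idea: isolates that differ at fewer positions sit closer together on the tree. Real analyses use statistical models of sequence evolution and proper alignment tools rather than a raw mismatch count.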

Viral technology

The discipline now runs through every aspect of the research undertaken within the life sciences field, pulling findings out of the wet lab, where physical samples are analysed, into the dry lab, where that biological data is crunched. 'Any outcome of any experiment, large or small, that biologists or life scientists conduct will require them in every circumstance to use bioinformatics techniques and tools,' says Ureta-Vidal. 'Students taking biology or biochemistry degrees will have to work with databases, and that will require them to use bioinformatics software. Simply put, it's everywhere.'

That ubiquity ensures that bioinformatics has secured a prominent role across a wide range of sectors and domains. For example, Ureta-Vidal says: 'At Eagle, we work with personal care companies to study the microbiome. We will examine, say, the microbiome in the mouth to inform the development of toothpaste or mouthwash; in the armpits for deodorant; in the scalp for shampoo. All the DNA sequences from the bacteria and skin will be analysed with bioinformatics.'

Cells of knowledge

In human health, one of the most vital activities that bioinformatics is helping with is honing scientists' understanding of a disease that will affect roughly half of us: cancer. Dr Florian Markowetz, head of the Markowetz Lab at the Cancer Research UK Cambridge Institute, is well versed in how the relevant technologies are assisting work in this field.

'I'm a mathematician by training,' Markowetz says, 'and all the questions I address relate to cancer biology and oncology. The goal of my team is to develop technologies that enable doctors to make better decisions. So it's a matter of integrating different types of data to make judgement calls about patients, ensuring they receive the best treatment. We figure out whether they are likely to develop resistance to treatment, and we use different types of data to understand how they are doing and what the cancer is doing.' That research, Markowetz explains, plays out across three bioinformatics-fuelled work streams.

The first is genomics. 'We examine the genomes of tumours, which are based on the patients' own genomes - but then, within the tumours, lots of additional mutations take place,' says Markowetz. 'Those mutations partly drive cancer growth and are also targeted by drugs. So we try to understand the processes by which those mutations are accumulated, with the goal of being able to predict what comes next in cancer evolution.'

Next is the 'very hip field' of image analysis, he explains, which uses computer-based visualisation tools that draw on artificial intelligence (AI). 'We take images harvested from radiology and histopathology - the microscopic study of disease in different tissues - and process them through AI and deep-learning software, so they can tell us something about what's happening in the tissues and cells. We combine this with the genomics to determine which therapies will benefit which patients.'

Last is experimental modelling. 'We look at basic biological processes, which, for our purposes, revolve around the oestrogen-receptor complex - the main player in three-quarters of all breast cancers. We try to understand how that complex works, what its composition is and how it regulates genes downstream. To do that, we use CRISPR perturbations, externally influencing the sample so we can work out different co-factors and see what kinds of effects they have on cells.'

Tools of the trade

Dry-lab bioinformatics set-ups will feature different types of hardware and software depending upon the nature of the research they are designed for. 'We have a computing cluster in our building,' says Markowetz, 'and we invest mostly in graphical processing units, or GPUs, which are very good for image analysis. That's our essential requirement - we need lots of processors.' On the software side, he notes that 'most of what we do is in pretty high-level programming languages like R, which is a free statistics language, plus Python and MATLAB. The reason we use those is that large communities have grown up around them. If you want to analyse, say, genomics data, the key packages are either in R or Python, and everyone uses them. So it's a way of standardising analyses.'

However, Ureta-Vidal points out that having all the kit on-site is beginning to look less necessary. 'Storage used to be all on the local servers of an organisation, but not anymore. Large amounts of data produced in the dry lab can be taken off and kept in the cloud. That has become very commoditised. And the interesting task for bioinformaticians is to look at all the data solutions, hardware and pieces of commercial and open-source software, and then think about how to combine those elements to suit the type of data they want to extract, and the use case they want to apply it to.'

Modelling the future

So how does Markowetz see bioinformatics developing within his critical field of research over the next five to ten years? 'Sequencing will get cheaper and cheaper,' he says, 'and will be used much more in the clinic. During the diagnosis stage, there will be a much tighter integration of genomics, image analysis and predictive tools in standard care. I think that's where it's going.'

He explains: 'Last year, Health Education England published a report by Eric Topol on the use of AI in the NHS. In Topol's assessment, AI will completely change the way doctors work - particularly pathologists and radiologists. They will effectively become "data brokers", translating the outcomes of work carried out by AI algorithms and explaining them to the patient. That's very different to what they do right now.'

'So the work changes, because instead of doing these diagnoses themselves, they will sense-check the algorithms' outputs, then explain them. If, indeed, that becomes the norm, it will mark a big shift in medicine.'

Bioinformatics in practice

Bioinformatics now has a prominent role across a wide range of sectors. Abel Ureta-Vidal of Eagle Genomics offers a few everyday examples:

Human and animal health

'Bioinformatics helps us understand disease threats ... it covers research not just around animals with agricultural benefits, such as cows or chickens, but also those we keep as pets.'

Nutrition and food

'The technology can examine anything to do with what we eat and its impact on the gut. Where it is necessary to determine the origin of a meat, for instance for quality control, that involves genetic sequencing - and, again, bioinformatics software.'

Agriculture

'Understanding the benefits of nutrients that are going to be used for crops is an important task for bioinformatics. There's also the matter of biological selection: how do you get the most resilient corn or wheat that will resist drought and/or disease? The technology will help you refine your cultivation programmes.'

Environment

'Bioinformatics can be used to analyse problems such as water contamination. If you want to measure the spread of bacteria on beaches or in rivers, for example, or analyse biodiversity to evaluate the impact of an oil spill, then you're collecting data and will need to run it through the relevant tools.'

Energy

'When you're trying to pinpoint a fracking site, you will need to measure the volume of bacteria eating, digesting or degrading the hydrocarbons to make the oil you want to extract. So bioinformatics will help you correlate the presence of those micro-organisms to rich oil sources.'

Towards a data-driven future

We live in the data age. Our ability to collect, process and store data has changed almost every aspect of the way we live, and biological sciences are no exception. Bioinformatics is changing the way we study, teach, research and use life sciences. Bringing together biology, medicine, mathematics and computer science comes with many challenges, but also enormous possibilities. Unlocking these possibilities will require scientists, companies, regulators and service providers to embrace change. Mewburn Ellis could not be more excited about these possibilities and is ready to support clients in their journey towards a data-driven future.

Author - Matt Packer for Mewburn Ellis Forward

This article was originally published in the third edition of Mewburn Ellis Forward — a biannual publication that celebrates the best of innovation and exploration.

Camille Terfve

Camille is a Partner and Patent Attorney at Mewburn Ellis. She does patent work in the life sciences sector, with a particular focus on bioinformatics/computational biology, precision medicine, medical devices and bioengineering. Camille has a PhD from the University of Cambridge and the EMBL-European Bioinformatics Institute. Her PhD research focused on the combined analysis of various sources of high-content data to reverse engineer healthy and diseased cellular signalling networks, and the effects of drugs on these networks. Prior to that, she completed a Master’s degree in Bioengineering at the University of Brussels and a Master’s degree in Computational Biology at the University of Cambridge.

Email: camille.terfve@mewburn.com