
Professor Andreas Bender of the University of Cambridge reveals how cheminformatics is revolutionising the way new molecules and compounds are discovered.
Forward: features are independent pieces written for Mewburn Ellis discussing and celebrating the best of innovation and exploration from the scientific and entrepreneurial worlds.
Cheminformatics is a vital and growing part of pharmaceutical research. It’s a discipline that uses chemistry, computer science and data analysis to gather, store and analyse chemical data.
Researchers run virtual screening, using computational techniques to search through large libraries of chemical compounds for those that hold the most promise.
The blood pressure tablet you took this morning or the statins your parents have been prescribed may well have been developed with the help of cheminformatics-based virtual screening.
‘Every pharma company does that – it’s an oldie but goldie,’ says Andreas Bender, Professor of Life Science Informatics at the University of Cambridge, and entrepreneur.
Chemistry supplies information about chemical structures, and computer science provides the algorithms, software and models that allow the chemistry to be analysed using techniques that are typically drawn from statistics and machine learning.
The goal could be to develop a new drug, a novel material or a safer pesticide.
At the heart of it all are vast databases of chemicals. These don’t just contain details of a compound’s structure, which determines its properties, but also relevant information from other fields such as biology, physics, medicine and law.
This could be how strongly the compound binds to a particular receptor in the body, how it smells, the results of toxicity tests or which patents are associated with it.
The largest free chemical database, PubChem, holds information on more than 300 million substances.
These databases can be searched, using cheminformatics, to find compounds with particular properties. A drug company might, for example, search for compounds that bind a particular receptor in the body in a particular way, or that have a structure similar to that of an existing drug it would like to develop its own version of.
Such virtual screens cut the time and cost associated with developing new drugs and, ultimately, should increase the odds of success.
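The structure-similarity search described above can be sketched in a few lines. This is a minimal illustration rather than how any particular company or database does it: it assumes each compound has already been reduced to a set of substructure feature IDs (a "fingerprint"), and it ranks a library against a query using the Tanimoto coefficient, a common similarity measure in cheminformatics. The compound names, feature IDs and the 0.5 threshold are all invented for the example.

```python
def tanimoto(a: frozenset, b: frozenset) -> float:
    """Tanimoto (Jaccard) similarity: shared features / all features."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def screen(query: frozenset, library: dict, threshold: float = 0.5) -> list:
    """Return (name, similarity) pairs at or above the threshold, best first."""
    scored = [(name, tanimoto(query, fp)) for name, fp in library.items()]
    hits = [pair for pair in scored if pair[1] >= threshold]
    return sorted(hits, key=lambda pair: pair[1], reverse=True)


# Hypothetical library: names and feature IDs are made up for illustration.
library = {
    "cmpd_A": frozenset({1, 2, 3, 4}),
    "cmpd_B": frozenset({1, 2, 3, 9}),
    "cmpd_C": frozenset({7, 8, 9}),
}

# Hypothetical fingerprint of the existing drug used as the query.
query = frozenset({1, 2, 3, 4, 5})

hits = screen(query, library)
for name, similarity in hits:
    print(f"{name}: {similarity:.2f}")
```

In practice the fingerprints would be computed from real chemical structures by a cheminformatics toolkit, and the library would hold millions of compounds rather than three.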
Another important use of cheminformatics, says Bender, is in the prediction of in vivo relevant properties – that is, how a compound will act in the human body.
‘If you want to discover a drug, you don’t only want to predict how a ligand will bind to a protein. The ligand, or drug-to-be, also needs to have efficacy and safety in vivo,’ says the professor, who has worked with GSK and AstraZeneca on predicting the liver safety of compounds.
One way to predict the safety of a compound is to analyse data on thousands of existing drugs (pharma companies need to supply data on organ toxicity when applying for a new drug to be approved).
Bender explains: ‘We know that there is information on liver toxicity for about 3,000 approved drugs and we can train a machine learning model on these measurements to tell us which types of chemical structures are more likely to be toxic and which are more likely to be benign.
‘It’s an empirical model – it’s based on data. We don’t necessarily need to understand the toxicity behaviour; we just model the data as it is and use it for predictions. Of course, if the model is also interpretable, that’s always a plus, since it can tell us how to modify a compound next.’
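To make the idea of an empirical, interpretable model concrete, here is a deliberately simple sketch. It is not the model Bender or his collaborators use. It assumes each drug is described by a set of structural fragments plus a toxic/benign label, and it learns one smoothed log-odds weight per fragment (a simplified, presence-only variant of naive Bayes). The fragment names and the four training examples are invented placeholders, not real toxicity data.

```python
import math
from collections import Counter

# Toy training set, invented for illustration: (fragments, label),
# where label 1 = toxic and 0 = benign. Not real measurements.
training = [
    (frozenset({"frag_1", "frag_2"}), 1),
    (frozenset({"frag_1", "frag_3"}), 1),
    (frozenset({"frag_3", "frag_4"}), 0),
    (frozenset({"frag_2", "frag_4"}), 0),
]


def fit(examples):
    """Learn a Laplace-smoothed log-odds weight for each fragment."""
    n_toxic = sum(label for _, label in examples)
    n_benign = len(examples) - n_toxic
    toxic_counts, benign_counts = Counter(), Counter()
    for fragments, label in examples:
        (toxic_counts if label else benign_counts).update(fragments)
    weights = {}
    for frag in set(toxic_counts) | set(benign_counts):
        p_toxic = (toxic_counts[frag] + 1) / (n_toxic + 2)
        p_benign = (benign_counts[frag] + 1) / (n_benign + 2)
        weights[frag] = math.log(p_toxic / p_benign)
    prior = math.log((n_toxic + 1) / (n_benign + 1))
    return weights, prior


def score(fragments, weights, prior):
    """Positive score: leans toxic. Negative score: leans benign."""
    return prior + sum(weights.get(frag, 0.0) for frag in fragments)


weights, prior = fit(training)

# The per-fragment weights are what make the model interpretable.
for frag, weight in sorted(weights.items()):
    print(f"{frag}: {weight:+.2f}")

print(score(frozenset({"frag_1", "frag_3"}), weights, prior))
```

The model never explains why a fragment is associated with toxicity; it only records that it is in the data. The weights do, however, show which parts of a structure push the prediction up or down, which is the kind of hint that can suggest how to modify a compound next.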
Armed with this information, a drug company can decide if a compound is worth developing further.
Another application of cheminformatics that Bender highlights is drug repurposing: finding a new use for an ‘old’ drug.
Healx, a biotech company co-founded by the professor in 2014, has a database with information from scientific research papers, clinical trials, disease symptoms and drug targets. AI analyses the information to find existing drugs that could treat rare diseases, cutting the cost and time involved in bringing new treatments to the market.
Starting with a compound that has already passed safety assessments in humans also reduces the risks associated with drug discovery.
The last few years have seen many changes in cheminformatics. Among the biggest are the advent of public databases, such as PubChem, and the (often) open-source software that’s used to search them.
The data that’s held is also often better annotated, thanks to advances in biology, as well as more efforts to standardise and annotate data. The chemical description of a drug may, for instance, be annotated in databases with information about its solubility, bioactivity and various other properties such as organ-based toxicity, all of which are valuable to know for drug discovery.
Investigating different ways to structure data, enabling us to see the links and interdependencies between different properties, is also ‘a valuable tool in the toolbox’ for the analysis of large data sets.
Data access and sharing are, however, not ideal, says the professor. This is partly because a lot of the information is proprietary. Data quality, and predictability, is another issue.
‘Some of the platforms don’t enforce minimum standards properly,’ says Bender. ‘As a researcher, you can access the data, but you don’t know how good it is, which makes any analysis of it less worthwhile. That can be a real problem.’
Another challenge is that it isn’t always clear which data will be the most useful. Should researchers be generating data on gene expression or proteomics or something else entirely when trying to predict how a compound will act in the human body? Given that we often don’t know which data tells us what, this is difficult to know from the outset.
So, what might the future bring? In vivo relevance is the one to watch here – and not just when selecting compounds to develop into drugs.
In future, cheminformatics could help end animal testing in drug development, says Bender, but most likely in combination with advanced experimental setups.
The aim is to find in vitro models, such as cell lines or organoids, which produce data that’s relevant to the human body. This data could then be analysed to determine various properties of a drug, including safety.
He explains: ‘Historically, the safety of chemicals was assessed in animal studies but these systems aren’t very good scientifically – they’re not very predictive in many cases.
‘They are also quite costly and, morally, they are not good to do.
‘But if we have the right experimental setups, the right data and the right machine learning models to predict safety computationally, we can do that in a better and safer way.’
The number of animal experiments in pharma is already falling. For example, Roche, one of the largest drug companies, has reduced the number of animals it uses by half over the past 14 years.
Meanwhile, official data shows that the number of animal experiments carried out in the UK, including those in drug research, has fallen by a third over the past decade.
However, we’re still some years away – perhaps a decade or two, although no one can say for sure – from animal experiments being phased out completely in drug development.
In some cases, modelling is already largely able to replace experiments. In others, however, we don’t know which data will be most useful in predicting how a drug will act in vivo. Without that knowledge, experimental scientists can’t run the right experiments to obtain the data, and computer scientists can’t build models to analyse it.
‘But that will come,’ says Bender. ‘The important thing is to not just have any data, but to have data that predicts the endpoint that matters. And, in drug discovery, that’s the safety and efficacy of the drug in a living system, most often in humans.’
Matthew Smith, Partner and Patent Attorney at Mewburn Ellis, comments:
It was fascinating to hear Professor Andreas Bender’s reflections on how cheminformatics is transforming the way molecules are discovered and developed. His insights highlight just how integral data science has become to innovation in chemistry: not as a supporting tool, but as a driver of progress in its own right. As IP professionals, we increasingly see the impact of this in the inventions our clients are looking to protect, particularly those combining chemistry and machine learning. Protecting these innovations requires a nuanced understanding of both disciplines, which is something we’re excited to be helping clients navigate as this space continues to evolve.
Written by Fiona MacRae
Matthew is a Partner and Patent Attorney at Mewburn Ellis. Working primarily in the chemical and materials science fields, he has significant experience of the intricacies of the EPO. Matthew advises and assists clients with all stages of drafting, prosecution, opposition and appeal before the EPO. Many of his clients are Japanese and Chinese businesses that are seeking European patent protection. These include multinational corporations in the fields of high-performance ceramics and carbon fibre technologies, as well as pharmaceutical and cosmetic companies. Matthew also works with several research institutions and university technology transfer departments across Europe.
Email: matthew.smith@mewburn.com