Deep learning cracks the protein folding problem?

Deep learning has just made another headline-grabbing achievement: DeepMind’s latest AI-powered system, AlphaFold, recently demonstrated a major leap in accuracy in predicting protein structure from sequence (“the protein folding problem”) at the Critical Assessment of protein Structure Prediction (CASP) challenge.

The protein folding problem

Predicting the 3D conformation of a protein is hard. Proteins are very complex molecules, and the number of possible configurations that even a small protein can adopt is mind-boggling. It is a worthwhile challenge to tackle, though, since a protein’s function, and how we might interfere with it, ultimately depend on its structure. If you think that this does not concern you, think again: the vast majority of the drugs we use work by interfering with protein function.

Experimentally determining protein structure is no small feat either (and has in fact proved impossible so far for a great many proteins). As a result, much effort has gone into in silico prediction of protein structure, fostered by initiatives such as CASP, which benchmarks candidate approaches by setting a series of protein folding challenges for teams to solve.

Deep learning changes the protein folding game

DeepMind first entered CASP in 2018, where it already made quite an entrance by topping the leaderboard. DeepMind’s contribution, named AlphaFold, was later published in Nature, and the code was made freely available for research and non-commercial use.

This year, the new version of AlphaFold not only outperformed all other candidates (and its own previous version), it also achieved unprecedented levels of accuracy, in many cases comparable to those achievable by experimental approaches. Of note, AlphaFold was not the only AI-powered approach at CASP this year. It was, however, by far the most successful (on average[1]).

The new AlphaFold model has not yet been published, but DeepMind has indicated that it is an attention-based neural network system. The system draws on evolutionarily related sequences, in the form of a multiple sequence alignment (MSA), together with a representation of amino acid residue pairs, to predict protein structure modelled as a spatial graph in which residues are nodes connected by distance-dependent edges.
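To make the “spatial graph” idea a little more concrete, here is a minimal Python sketch of how a set of residue coordinates could be turned into such a graph. It is purely illustrative and is not DeepMind’s method. The function name residue_graph, the use of a single 3D point per residue and the 8 Å cutoff for connecting two residues are all assumptions chosen for this example. Note also that AlphaFold predicts the structure, whereas this sketch simply reads a graph off coordinates that are already known.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical representation: one (x, y, z) point per residue, in ångströms.
Coord = Tuple[float, float, float]


def residue_graph(coords: List[Coord], cutoff: float = 8.0) -> Dict[Tuple[int, int], float]:
    """Build a toy spatial graph: residues are nodes, and an edge (i, j)
    carrying the inter-residue distance is kept only if that distance is
    within `cutoff` (an assumed threshold, not AlphaFold's)."""
    edges: Dict[Tuple[int, int], float] = {}
    for i in range(len(coords)):
        for j in range(i + 1, len(coords)):
            d = math.dist(coords[i], coords[j])  # Euclidean distance
            if d <= cutoff:
                edges[(i, j)] = d  # distance-dependent edge
    return edges


if __name__ == "__main__":
    # Hypothetical toy chain: four residues 3.8 Å apart on a straight line.
    chain = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
    print(sorted(residue_graph(chain)))  # [(0, 1), (0, 2), (1, 2), (1, 3), (2, 3)]
```

In this toy chain every pair of residues within 8 Å is connected, so only the two end residues (11.4 Å apart) are left without a direct edge.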

Reaching near-experimental accuracy is a very significant milestone, and one that might open the door to structure determination using data that is easier to collect, or for proteins where experimental data simply cannot be collected. The potential applications, in both fundamental and applied research, are enormous. Repercussions will even reach the IP world: the UKIPO is currently exploring (as other IP offices, such as the USPTO, have done) the many ways in which AI is changing most fields of technology. A publication from the DeepMind team is set to follow, and I for one cannot wait to read it.

References

  1. It seems that AlphaFold does have blind spots, with lower accuracy for proteins that form part of complexes. Nevertheless, the majority of AlphaFold’s CASP14 predictions (reportedly about two-thirds) had accuracies comparable to experimental structures.