Database Of Human Protein Structures Released
The most important database since the mapping of the human genome has been released freely and openly, revolutionising the life sciences industry overnight.
The collaboration between the European Molecular Biology Laboratory (EMBL) and Google-owned artificial intelligence company DeepMind has led to the release of a database of over 20,000 3D structures that are believed to represent nearly every protein expressed by the human genome.
The AlphaFold Protein Structure Database, which is freely available for the scientific community to view and use, is expected to double the number of high-accuracy structures available and significantly expand our knowledge of the human genome.
This is a revolution for life science consulting. Proteins are the building blocks of every biological process in every living creature, and understanding their structures will allow vital biological research to be considerably accelerated.
The AI system was the subject of two papers in the journal Nature, which provide more detailed information on the system and how it functions.
For each protein, the database provides the sequence and a predicted 3D model, colour-coded according to AlphaFold's per-residue confidence score (pLDDT).
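For readers who download the model files, the colour coding can be reproduced locally. The sketch below is a minimal illustration, not the database's own code. It assumes that each model file stores a residue's pLDDT in the B-factor column of its ATOM records, and that the four bands follow the thresholds shown in the database legend (above 90, 70 to 90, 50 to 70, below 50). The function names are hypothetical, and the handling of exact boundary values is an assumption.

```python
# Minimal sketch: read per-residue pLDDT from an AlphaFold DB-style PDB file
# and bin it into four confidence bands for colour coding.
# Assumption: pLDDT sits in the B-factor field (columns 61-66) of each ATOM
# record; one value per residue is taken from its C-alpha ("CA") atom.


def confidence_band(plddt: float) -> str:
    """Map a pLDDT score (0-100) to a confidence band label."""
    # How exact boundary values (90, 70, 50) are assigned is an assumption.
    if plddt > 90:
        return "very high"
    if plddt > 70:
        return "confident"
    if plddt > 50:
        return "low"
    return "very low"


def per_residue_plddt(pdb_text: str) -> dict[tuple[str, int], float]:
    """Return {(chain_id, residue_number): pLDDT} from C-alpha ATOM records."""
    scores = {}
    for line in pdb_text.splitlines():
        # Fixed-width PDB layout: atom name in columns 13-16, chain in 22,
        # residue number in 23-26, B-factor in 61-66 (1-based).
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            chain_id = line[21]
            res_num = int(line[22:26])
            scores[(chain_id, res_num)] = float(line[60:66])
    return scores


if __name__ == "__main__":
    # Build two hypothetical C-alpha records in the fixed-width PDB layout.
    def atom_line(serial: int, res_name: str, res_num: int, plddt: float) -> str:
        return (f"ATOM  {serial:>5} {' CA':<4} {res_name:>3} A{res_num:>4}    "
                f"{0.0:8.3f}{0.0:8.3f}{0.0:8.3f}{1.0:6.2f}{plddt:6.2f}")

    pdb_text = "\n".join([
        atom_line(1, "MET", 1, 95.2),
        atom_line(2, "ALA", 2, 62.0),
    ])
    for (chain, num), score in per_residue_plddt(pdb_text).items():
        print(chain, num, score, confidence_band(score))
    # A 1 95.2 very high
    # A 2 62.0 low
```

Taking one value per residue from the C-alpha atom keeps the sketch simple; a real viewer would typically colour every atom of a residue with that residue's band.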
The database is built on AlphaFold, a sophisticated AI system that uses a protein's amino acid sequence to predict the 3D shape it will fold into. DeepMind CEO Demis Hassabis, PhD, has heralded it as one of the biggest contributions AI has made to scientific knowledge.
Before the use of AI, protein structures had to be determined through years of painstaking and incredibly expensive experimentation.
This work was not in vain, however, as it formed the knowledge base on which AlphaFold was trained.
Machine learning is a relatively simple concept that allows incredibly complex tasks to be automated, reducing the time spent testing and producing predicted protein shapes from years to mere months.
The process starts by feeding the system correct, known examples so that it learns what to look for before it is applied to the task at hand.
AlphaFold has already been used to help develop treatments for neglected diseases, as well as to increase our understanding of the biology of SARS-CoV-2.
Beyond medicine, the Centre for Enzyme Innovation has used AlphaFold in its work on recycling the single-use plastics that pollute our environment.
The database currently includes over 350,000 structures, covering not only humans but also organisms such as the fruit fly, mouse, malaria parasite and E. coli.
DeepMind plans to continually update the database and the AlphaFold system as it improves and upgrades both the software and the hardware that powers them.
The system has limitations, generally stemming from the complex interactions between different proteins and the dynamic structural changes that proteins often undergo. It is also not designed to predict the effects of mutations.
The ultimate aim over the next few years is to expand the database to cover almost every protein known to science, a total of over 100 million entries in the UniProt database.
DeepMind claims this would eventually mean that every protein known to science has a high-quality predicted 3D model available.
Artificial intelligence and machine learning have been major talking points in the life sciences industry, with stories of a bread-recognition system that could detect cancer cells making headlines earlier this year.