Basecamp Research Unveils BaseFold: Advancing 3D Protein Structure Prediction for Large, Complex Proteins

Basecamp Research, a leader in the design of proteins and biological systems using artificial intelligence (AI), today released BaseFold, a deep learning model that the company reports predicts 3D protein structures more accurately than other AI-powered tools, including AlphaFold2. The results were recently posted as a preprint on bioRxiv.

BaseFold predicts a protein’s 3D structure from its amino acid sequence and was created by augmenting the AlphaFold2 model with BaseGraph. BaseGraph is Basecamp Research’s purpose-built foundational dataset for biological AI, assembled through access and benefit-sharing partnerships with over 25 biodiversity-rich countries.

As Basecamp Research scales its global network of biodiversity partnerships, BaseFold continues to improve week over week, and the company describes the published accuracy gains as just a starting point. Basecamp Research will also work with NVIDIA to optimize and productionize BaseFold for NVIDIA BioNeMo, a generative AI platform for drug discovery.

Today, protein structures can be determined experimentally only by slow methods such as X-ray crystallography. The release of AlphaFold2 in 2020, however, gave scientists confidence in AI-based structure prediction for biotechnology. A variety of structure prediction models have followed, including ColabFold, ESMFold, OpenFold, and RoseTTAFold.

Nevertheless, the performance of these models depends heavily on their training data. All of them are trained on public protein databases, which are widely viewed as unsuitable for biotech’s AI era: public training datasets are small, unreliable, and heavily biased toward proteins from laboratory model organisms.

Less than 0.000001% of life on Earth is represented in these public databases. Existing AI tools work well for predicting the structures of smaller, simpler proteins that are well-represented in public datasets, but struggle beyond that, creating major issues when developing complex new medicines using AI.

For larger proteins, AlphaFold2 relies heavily on the public MGnify database, whose incomplete sequences can degrade the quality of structure predictions. BaseFold tackles the next big computational challenge: crystallography-level accuracy for larger, more complex proteins, especially those underrepresented in existing protein sequence databases.

By drawing on more than 6 billion relationships in BaseGraph, BaseFold extracts evolutionary information that is orders of magnitude more meaningful. A variety of biological AI models, including AlphaFold2, have been shown to perform significantly better when trained on BaseGraph, which includes extensive genomic context and comprehensive metadata.

Basecamp Research scientists evaluated BaseFold’s performance in the CASP15 (Critical Assessment of Structure Prediction) competition and the CAMEO (Continuous Automated Model Evaluation) community project.

Using Basecamp Research’s purpose-built foundational dataset, BaseFold improved the accuracy of AlphaFold2’s predicted structures by up to six-fold.

For interactions between small molecules and protein targets, the team demonstrated an up to three-fold improvement in modelling accuracy.

With BaseFold, you can predict 3D structures and dock small molecules with greater accuracy for larger and more complex proteins, especially those underrepresented in public datasets.

Basecamp Research is the leader in mapping biodiversity for AI-based biological system design. Using BaseGraph™, the first high-resolution map of global genetic biodiversity, we match and refine novel proteins for our partners’ specific industrial, therapeutic or diagnostic applications.

By understanding the full genetic, evolutionary, and environmental context of each protein, Basecamp Research can design tailored proteins without requiring expensive and time-consuming directed evolution campaigns. We’re explorers, scientists, and policy experts who want to protect nature’s diversity, learn from it, and deliver life-changing breakthroughs.
