Rocklin Lab Unveils Extensive Protein Stability Dataset to Enhance Biomolecular AI
Rocklin Lab Releases Megascale Open Protein Stability Dataset to Advance Biomolecular AI

The Rocklin Lab at Northwestern University has released the MGnify Stability Dataset, which contains folding stability measurements for 1.8 million protein domains. This dataset aims to improve machine learning models for protein stability prediction and is supported by the OpenFold Consortium.
- The MGnify Stability Dataset includes stability measurements for 1.8 million diverse protein domains, significantly expanding the available data for protein stability research.
- This dataset provides crucial negative data on unstable proteins, essential for training machine learning models to distinguish between stable and unstable sequences.
- The study was led by Gabriel Rocklin and Sergey Ovchinnikov, with contributions from co-lead researchers Kotaro Tsuboyama and Yehlin Cho.
- The predictive models developed, SaProtΔG and ESM3ΔG, demonstrate improved accuracy in predicting stability for small protein domains compared to previous models.
- OpenFold aims to support the development of open, high-quality experimental datasets to advance biomolecular AI for drug discovery and biological research.
The Rocklin Lab at Northwestern University has announced the release of the MGnify Stability Dataset, a resource containing folding stability measurements for 1.8 million diverse protein domains. The initiative, supported by the OpenFold Consortium, addresses a gap in existing biological datasets, which often lack data on unstable proteins alongside stable ones. The dataset was created using advanced experimental techniques and is designed to improve the accuracy of machine learning models that predict protein stability.
The research was led by Gabriel Rocklin and Sergey Ovchinnikov, and the team included co-lead researchers Kotaro Tsuboyama and Yehlin Cho. The team developed two predictive models, SaProtΔG and ESM3ΔG, that draw on the dataset. Beyond predicting stability, these models recover trends associated with thermophilic organisms and better differentiate stable from unstable proteins. The dataset supplies foundational data for the ongoing development of open biomolecular AI, with applications in protein engineering and drug discovery.
The MGnify Stability Dataset will significantly enhance the ability of researchers to predict protein stability, which is vital for drug discovery and biotechnology applications.




