Date: 2026-03-13
A new AI system, Evo 2, trained on trillions of DNA base pairs, can predict the effects of genetic mutations and design biological sequences, opening new paths for precision medicine.
- A powerful AI genome model has been trained on DNA sequences from all domains of life
- Evo 2 can predict the functional impact of genetic mutations without task-specific training
- Researchers have made Evo 2 fully open-source to accelerate global biological research
Built by Arc Institute and Stanford University researchers in collaboration with NVIDIA, Evo 2 is a genomic foundation model that can analyze and generate DNA, RNA and protein sequences across species.
The model was described in the journal Nature and represents one of the most ambitious attempts yet to teach AI the rules of biology.
Unlike earlier systems designed for specific organisms or tasks, Evo 2 was trained to understand the genetic code across species. By analyzing patterns shaped by millions of years of evolution, the model can predict how changes in DNA might affect biological function.
Researchers say the goal was to build a generalist system that learns the deep structure of genomes, rather than a narrow tool designed for a single experiment.
Training Data Spanning the Diversity Of Life
The researchers trained Evo 2 using a vast genomic dataset covering organisms from across the biological world.
The dataset, OpenGenome2, included more than 8.8 trillion nucleotides drawn from bacteria, archaea, eukaryotes and bacteriophages. The larger model version was trained on 9.3 trillion DNA tokens drawn from this dataset.
Two versions of the system were created. One contains 7 billion parameters, while the larger version contains 40 billion parameters designed to capture complex genomic relationships.
This enormous training effort allowed the model to identify patterns in DNA that reflect evolutionary constraints, which often signal biologically important regions.
Predicting Harmful Mutations In Human Genes
One of the most promising abilities of Evo 2 is predicting the impact of genetic mutations without needing specialized retraining.
By estimating how likely a DNA sequence is to occur in nature, the system can flag mutations that disrupt important biological functions. These predictions extend across multiple layers of biology, including DNA, RNA and protein sequences.
In practical terms, this means the model can evaluate whether a genetic change might damage a gene’s function.
The study notes that Evo 2 can analyze variants in clinically relevant genes such as BRCA1, which is linked to hereditary breast and ovarian cancer.
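The likelihood-based scoring described in this section can be illustrated with a minimal sketch. It does not use Evo 2 or its API. A toy second-order Markov model, fitted to a handful of made-up sequences, stands in for the genome model, and the function names (`train_markov`, `log_likelihood`, `variant_score`) are hypothetical. The idea it shows is the same in spirit: score a mutation by how much it lowers the likelihood of the sequence relative to the reference.

```python
import math
from collections import defaultdict

ALPHABET = "ACGT"


def train_markov(sequences, order=2, pseudocount=1.0):
    """Fit a toy Markov model: P(next base | previous `order` bases).

    This is only a stand-in for a real genome model such as Evo 2.
    """
    counts = defaultdict(lambda: defaultdict(float))
    for seq in sequences:
        for i in range(order, len(seq)):
            counts[seq[i - order:i]][seq[i]] += 1.0

    def prob(context, base):
        ctx = counts.get(context, {})
        total = sum(ctx.values()) + pseudocount * len(ALPHABET)
        # Pseudocounts keep unseen bases and contexts at a nonzero probability.
        return (ctx.get(base, 0.0) + pseudocount) / total

    return prob


def log_likelihood(prob, seq, order=2):
    """Sum of log-probabilities of each base given its preceding context."""
    return sum(
        math.log(prob(seq[i - order:i], seq[i])) for i in range(order, len(seq))
    )


def variant_score(prob, reference, position, alt_base, order=2):
    """Log-likelihood of the mutated sequence minus that of the reference.

    Negative values mean the model finds the variant less natural.
    """
    variant = reference[:position] + alt_base + reference[position + 1:]
    return log_likelihood(prob, variant, order) - log_likelihood(prob, reference, order)


# Made-up training data with a strongly repeated pattern (illustrative only).
model = train_markov(["ACGTACGTACGTACGTACGT"] * 5, order=2)
reference = "ACGTACGTACGT"

# Substituting C -> T at index 5 breaks the pattern the toy model has learned.
print("delta log-likelihood:", variant_score(model, reference, 5, "T"))
```

In this toy setting, a substitution that breaks the learned pattern gets a negative score, while replacing a base with itself scores exactly zero. A real genome model would replace the Markov model with its own sequence likelihoods. How scores are thresholded or calibrated is not specified here.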
AI Discovering Hidden Patterns In The Genome
Beyond mutation prediction, Evo 2 also revealed how AI systems can discover meaningful biological signals without being explicitly taught.
Through a technique called mechanistic interpretability, researchers examined patterns inside the model’s internal representations. They found that Evo 2 had learned features linked to real biological structures.
These include:
- Exon-intron boundaries (gene cut-and-join points)
- Transcription factor binding sites (DNA switches that control genes)
- Protein structural elements (shapes that help proteins function)
- Prophage genomic regions (viral DNA hidden inside bacteria)
Generating Entire Genomes with Artificial Intelligence
Evo 2 is not only capable of analyzing DNA. It can also generate long genomic sequences that resemble naturally occurring DNA.
The system can create sequences from multiple biological domains, including mitochondrial genomes, prokaryotic genomes and eukaryotic DNA. Researchers report that these generated sequences show higher coherence and biological realism than earlier models.
In one experimental test, the model successfully retrieved a short DNA sequence hidden within one million base pairs of random DNA, demonstrating its ability to reason across extremely long genomic contexts.
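To make the long-context test above concrete, here is a minimal sketch of how such an input might be constructed: a short "needle" sequence placed at a random position inside one million base pairs of random DNA. The helper names (`random_dna`, `build_haystack`), the 100-base needle length and the seeds are assumptions for illustration, not details from the study. The final lookup uses plain string search only to confirm the construction. It is not how a model retrieves the needle.

```python
import random


def random_dna(length, seed):
    """Return a reproducible random DNA string of the given length."""
    rng = random.Random(seed)
    return "".join(rng.choices("ACGT", k=length))


def build_haystack(length, needle, seed=0):
    """Embed `needle` at a random position in `length` bases of random DNA.

    The needle overwrites background bases, so the total length is unchanged.
    Returns the haystack and the position where the needle starts.
    """
    rng = random.Random(seed)
    position = rng.randrange(0, length - len(needle) + 1)
    background = "".join(rng.choices("ACGT", k=length))
    haystack = background[:position] + needle + background[position + len(needle):]
    return haystack, position


# Needle length of 100 bases is an assumed value for this sketch.
needle = random_dna(100, seed=1)
haystack, position = build_haystack(1_000_000, needle, seed=0)

print("haystack length:", len(haystack))
print("needle found at planted position:", haystack.find(needle) == position)
```

A model-based version of the test would ask whether the model's predictions at the end of the sequence depend on the planted needle, rather than searching for it directly.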
Why This Matters for Future Medicine
Decoding the meaning of DNA changes remains one of the biggest challenges in modern genetics. Many genetic variants discovered in medical testing still have uncertain effects.
AI systems like Evo 2 may help scientists interpret these changes faster by learning the deep rules that govern how genomes function.
In everyday life, this could mean quicker insights when a genetic test reveals a rare mutation. Instead of waiting years for research to clarify its significance, predictive models might help researchers estimate whether it is harmful or harmless.
The technology may also support future efforts in precision medicine, gene therapy design and genome engineering.
A Foundation for the Next Generation Of Biological AI
The creators describe Evo 2 as a biological foundation model, similar to how large language models support many different applications.
Because the system is fully open-source, researchers around the world can explore its capabilities, refine its predictions and develop new tools on top of it.
The authors believe this approach could eventually enable AI systems that simulate complex biological behavior, helping scientists better understand health, disease and the design of new therapies.
Scientific progress often begins with learning how to read nature’s hidden patterns. Supporting research that unlocks the language of life today may help build healthier futures for generations to come.
Frequently Asked Questions
Q: What is the Evo 2 AI genome model?
A: Evo 2 is a biological foundation model trained on trillions of DNA base pairs to analyze and generate genomic sequences across many species.
Q: How does Evo 2 predict harmful genetic mutations?
A: Evo 2 estimates the probability of DNA sequences occurring in nature. Mutations that significantly reduce this probability are predicted to disrupt biological function.
Q: What dataset was used to train the Evo 2 genomic AI model?
A: The Evo 2 system was trained on the OpenGenome2 dataset containing more than 8.8 trillion nucleotides from organisms spanning bacteria, archaea, eukaryotes and bacteriophages.
Q: Can Evo 2 generate new DNA sequences?
A: Yes. Evo 2 can produce long DNA sequences, including mitochondrial, prokaryotic and eukaryotic genomic segments with biologically realistic patterns.
Q: Why is Evo 2 important for precision medicine?
A: Evo 2 may help researchers interpret genetic variants more quickly, allowing scientists to better understand how mutations influence disease risk and treatment strategies.
Reference:
- Genome modelling and design across all domains of life with Evo 2 - (https://www.nature.com/articles/s41586-026-10176-5#Sec8)