Imagine a group of researchers venturing into a pitch-black room armed with only a flashlight, their vision limited to what falls within that narrow beam. Scientists studying microbial communities have faced a similar challenge: they could see only what existing knowledge illuminated, with little sense of how vast the rest of the microbial world might be.
But now, a study published in Nature has shed light on the functional diversity of microbes. Led by scientists at the U.S. Department of Energy Joint Genome Institute (JGI), working with research centers worldwide, the team examined protein function within microbial communities to explore this “dark” functional realm.
Using the Integrated Microbial Genomes & Microbiomes (IMG/M) database, which houses over 26,000 microbiome datasets, the scientists created the Novel Metagenome Protein Families (NMPF) Catalog. This catalog allows researchers to analyze new datasets by comparing them against these protein families, opening up new possibilities for predicting novel functions.
Microbial communities, found in environments as diverse as soils, stomachs, and the deep sea, play distinctive roles in energy cycles. They can convert biomass into valuable resources such as ethanol or hydrogen, or harness solar energy for their own needs. Studying these communities, however, is very challenging: many microbes cannot be cultivated in a lab, and each community has its own distinct composition and functions, making it impossible to replicate artificially.
To overcome these obstacles, researchers rely on metagenomic sequencing, which reads the genetic material of an entire microbial community at once. Because it is difficult to determine the function of individual genes within a community, scientists compare them against existing reference genome sequences. Some genes match reference genes with known functions; others match reference genes whose functions are still unknown. A significant portion, however, match nothing previously defined. These are the “unknown unknowns.”
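The three categories described above can be sketched in a few lines of Python. This is only an illustration: the function name, the identifiers, and the toy annotation table are hypothetical, and the article does not say how the study's own pipeline decides what counts as a match.

```python
from typing import Optional

def classify_gene(best_hit: Optional[str],
                  annotations: dict[str, Optional[str]]) -> str:
    """Place a gene in one of the three categories from the text.

    best_hit    -- ID of the closest reference gene, or None if nothing matched
                   (how a "match" is decided is outside this sketch)
    annotations -- hypothetical lookup: reference gene ID -> known function,
                   or None when the reference gene has no known function
    """
    if best_hit is None:
        return "unknown unknown"   # matches nothing previously defined
    if annotations.get(best_hit):
        return "known"             # matches a gene with a known function
    return "known unknown"         # matches a gene whose function is unknown

# Toy reference table (made-up IDs and functions, for illustration only)
reference = {"ref_001": "cellulase", "ref_002": None}

print(classify_gene("ref_001", reference))  # known
print(classify_gene("ref_002", reference))  # known unknown
print(classify_gene(None, reference))       # unknown unknown
```

The last category is the one the study focuses on: genes for which the reference lookup returns nothing at all.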
In recent years, artificial intelligence has been used to decode the language of protein sequences and predict their functions, but these efforts have been limited to known protein sequences. This study instead ventures into uncharted territory, surveying the functional diversity of proteins that match nothing known and applying AI methods to infer their roles. The results expand the range of potential functions across several categories of proteins.
The discovery of new protein families had reached a plateau in recent years, suggesting that scientists had captured much of the diversity present. However, the team’s analysis revealed that protein family diversity within the metagenomic space was far greater than that of reference genomes. By clustering novel genes into families, they found that known protein family diversity had at least doubled, with the potential for even greater diversity as more samples are sequenced in the future.
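To give a sense of what "clustering genes into families" means, here is a toy sketch. It is not the method used in the study, which the article does not describe. It assumes a stand-in similarity measure (the overlap of short sequence fragments, or k-mers) and groups any sequences linked by a chain of sufficiently similar pairs. The sequences, names, and threshold are made up.

```python
from itertools import combinations

def kmers(seq: str, k: int = 3) -> set[str]:
    """All overlapping fragments of length k in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def jaccard(a: set[str], b: set[str]) -> float:
    """Shared fragments divided by total distinct fragments."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def cluster_families(seqs: dict[str, str],
                     threshold: float = 0.5,
                     k: int = 3) -> list[set[str]]:
    """Single-linkage clustering: similar pairs end up in the same family."""
    parent = {name: name for name in seqs}

    def find(x: str) -> str:
        # Follow parent links to the family representative (union-find)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    profiles = {name: kmers(s, k) for name, s in seqs.items()}
    for a, b in combinations(seqs, 2):
        if jaccard(profiles[a], profiles[b]) >= threshold:
            parent[find(a)] = find(b)   # merge the two families

    families: dict[str, set[str]] = {}
    for name in seqs:
        families.setdefault(find(name), set()).add(name)
    return list(families.values())

# Made-up sequences: p1 and p2 differ by one residue, p3 is unrelated
toy = {"p1": "MKTAYIAKQR", "p2": "MKTAYIAKQK", "p3": "GGSSPLWWVH"}
for family in cluster_families(toy):
    print(sorted(family))
```

Comparing every pair like this would not scale to the volumes of sequence data involved in a real metagenomic survey; the sketch is meant only to show the idea of grouping by similarity.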
While the study did not delve into specific functions, it did characterize these protein families based on the environments in which they were found. Only a small percentage of protein families were shared across all environmental categories, suggesting that most families play specialized roles in specific habitats. Bacteria and viruses accounted for the majority of these protein families, but a significant number of sequences remained unclassified.
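The "shared across all environmental categories" comparison amounts to a simple set calculation, sketched below. The family names, environment labels, and resulting fraction are invented for illustration and are not figures from the study. The sketch also assumes that the full list of categories is just the set of environments seen in the input.

```python
def shared_across_all(family_envs: dict[str, set[str]]) -> tuple[list[str], float]:
    """Return the families found in every environment, and their share of the total.

    family_envs -- hypothetical mapping: family ID -> environments where it was seen
    """
    # Assumption: the set of "all categories" is whatever appears in the data
    all_envs = set().union(*family_envs.values())
    shared = sorted(f for f, envs in family_envs.items() if envs == all_envs)
    fraction = len(shared) / len(family_envs) if family_envs else 0.0
    return shared, fraction

# Invented example data
toy = {
    "F1": {"soil", "marine", "host"},
    "F2": {"soil"},
    "F3": {"marine", "host"},
    "F4": {"soil", "marine", "host"},
}
shared, fraction = shared_across_all(toy)
print(shared, fraction)
```

Families that fall outside the shared list are the habitat-specific ones the paragraph above refers to.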
To gain insights into the functions of these genes, the researchers employed 3D modeling and compared the structures of unknown proteins to those of known proteins. They also identified protein families with completely novel structures, expanding our understanding of microbial dark matter.
This study represents a significant step forward in our understanding of microbial communities and their functional diversity. With the vast majority of microbial diversity still awaiting genomic capture, there are undoubtedly many more secrets to be uncovered. The future holds exciting possibilities for exploring the hidden world of microbes and harnessing their potential for various applications in biotechnology.