Genome resource expands known diversity of bacteria and archaea by 44% – ScienceDaily




Despite advances in sequencing technologies and computational methods over the past decade, researchers have discovered genomes for only a small fraction of the Earth’s microbial diversity. Since most microbes cannot be grown under laboratory conditions, their genomes cannot be sequenced using traditional approaches. Identifying and characterizing the planet’s microbial diversity is key to understanding the role of microorganisms in regulating nutrient cycles, as well as gaining insight into the potential applications they may have in a wide range of research fields.

A public archive of 52,515 microbial genomes generated from environmental samples around the world, expanding the known diversity of bacteria and archaea by 44%, is now available and was described on November 9, 2020, in Nature Biotechnology. Known as the GEM (Genomes from Earth’s Microbiomes) catalog, this work is the result of a collaboration among more than 200 scientists, researchers from the United States Department of Energy (DOE) Joint Genome Institute (JGI), a DOE Office of Science User Facility located at Lawrence Berkeley National Laboratory (Berkeley Lab), and the DOE Systems Biology Knowledgebase (KBase).

Metagenomics is the study of microbial communities in environmental samples without the need to isolate individual organisms, using various methods of processing, sequencing and analysis. “Using a technique called metagenome binning, we were able to reconstruct thousands of metagenome-assembled genomes (MAGs) directly from sequenced environmental samples without the need to grow the microbes in the laboratory,” noted Stephen Nayfach, the study’s first author and a researcher in Nikos Kyrpides’ Microbiome Data Science group. “What sets this study apart from previous efforts is the remarkable environmental diversity of the samples we analyzed.”

Emiley Eloe-Fadrosh, head of the JGI Metagenome Program and senior author of the study, elaborated on Nayfach’s comments. “This study was designed to encompass the widest and most diverse range of samples and environments, including natural and agricultural soils, human- and animal-host-associated environments, and oceans and other aquatic environments – it’s truly remarkable.”

Adding value beyond genome sequences

Most of the data was generated from environmental samples sequenced by JGI through the Community Science Program and was already available on JGI’s Integrated Microbial Genomes & Microbiomes (IMG/M) platform. Eloe-Fadrosh noted that this was a fine example of mining “big data” to gain a deeper understanding of the data, and of increasing its value by making it publicly available.

To acknowledge the efforts of the investigators who carried out the sampling, Eloe-Fadrosh contacted more than 200 researchers worldwide in accordance with the JGI’s data use policy. “I felt it was important to recognize the significant efforts to collect and extract DNA from these samples, many of which come from unique and difficult-to-access environments, and I invited these researchers to be co-authors as part of the IMG data consortium,” she said.

Using this huge dataset, Nayfach grouped the MAGs into 18,000 candidate species groups, 70% of which were new compared to the more than 500,000 existing genomes available at the time. “Looking through the tree of life, it’s amazing how many uncultivated lineages are represented only by MAGs,” he said. “Although these draft genomes are imperfect, they can still reveal much about the biology and diversity of wild microbes.”
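For a sense of scale, the rounded figures in the paragraph above can be turned into approximate counts. The short Python sketch below is illustrative only: the variable names are made up for this example, and the inputs are the article's rounded numbers, not exact values from the study.

```python
# Rounded figures as quoted in the article (assumed inputs, not exact study values).
species_groups = 18_000   # candidate species groups formed from the MAGs
novel_percent = 70        # percent of those groups described as new

# Integer arithmetic keeps the result free of floating-point rounding noise.
novel_groups = species_groups * novel_percent // 100
matched_groups = species_groups - novel_groups

print(f"roughly {novel_groups:,} new groups; "
      f"roughly {matched_groups:,} matching existing genomes")
```

On these rounded inputs, the new groups outnumber those already matched by existing genomes by more than two to one.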

Research teams worked on multiple analyses leveraging the genome repository, and the IMG/M team developed several updates and features for mining the GEM catalog. (Watch this IMG webinar on Metagenome Bins to learn more.) One group mined the dataset for new secondary-metabolite biosynthetic gene clusters (BGCs), increasing the number of BGCs in IMG/ABC (Atlas of Biosynthetic Gene Clusters) by 31%. (Listen to this episode of JGI Natural Prodcast about genome mining.) Nayfach also worked with another team to predict host-virus connections between all viruses in IMG/VR (Virus) and the GEM catalog, associating 81,000 viruses (70% of which had not previously been associated with a host) with 23,000 MAGs.

Shaping a new path for metagenomics researchers

Building on these resources, KBase, a multi-institutional collaborative knowledge creation and discovery environment designed for biologists and bioinformaticians, has developed metabolic models for thousands of MAGs. The models are now available in a public narrative, providing shareable and reproducible workflows. “Metabolic modeling is a routine analysis for isolated genomes, but it has not been done on a large scale for uncultivated microbes,” said Eloe-Fadrosh, “and we felt that the collaboration with KBase would add value beyond the clustering and analysis of these MAGs.”

“Just putting this dataset into KBase is of immediate value because people can find high-quality MAGs and use them to inform future analyses,” said José P. Faria, KBase computational biologist at Argonne National Laboratory. “The process of building a metabolic model is simple: just select a genome or MAG and press a button to build a model from our database of mappings between biochemical reactions and annotations. We look at what has been annotated in the genome and at the resulting model to evaluate the metabolic capacity of the organism.” (Watch this KBase webinar on metabolic modeling.)

Elisha Wood-Charlson, head of KBase User Engagement, added that by demonstrating the ease with which metabolic models were generated from the GEM dataset, metagenomics researchers could consider branching into this space. “Most metagenomics researchers may not be willing to dive into a whole new field of research [metabolic modeling], but they may be interested in how biochemistry affects what they work on. The genomic community can now explore metabolism using KBase’s easy path from genomes or MAGs to modeling, which they may not have considered,” Wood-Charlson said.

A community resource to facilitate research

Kostas Konstantinidis of the Georgia Institute of Technology, one of the co-authors whose data was part of the catalog, said: “I don’t think there are many institutions that can do this kind of large-scale metagenomics and that have the ability to do it at this scale, which individual laboratories cannot do, and it gives us new insights into microbial diversity and function.”

He is already finding ways to use the catalog in his research on how microbes respond to climate change. “With this dataset I can see where each microbe is and how abundant it is. This is very useful for my work and for others doing similar research.” Furthermore, he is interested in expanding the diversity of the reference database he is developing called the Microbial Genomes Atlas to allow for more robust analysis by adding MAGs.

“This is a great resource for the community,” added Konstantinidis. “It’s a dataset that will facilitate many more studies later on. And I hope the JGI and other institutions continue to do these kinds of projects.”
