In the field of molecular epidemiology, the global scientific community has sought to solve the mystery of the early history of SARS-CoV-2.
Since the first SARS-CoV-2 infection was detected in December 2019, tens of thousands of its genomes have been sequenced around the world, showing that the coronavirus is mutating, albeit slowly, at a rate of 25 mutations per genome per year.
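To put that rate in per-nucleotide terms, here is a small Python calculation. It assumes a genome length of 29,903 nucleotides (the Wuhan-Hu-1 reference sequence, GenBank NC_045512.2). That figure is not given in this article, and the function names are illustrative.

```python
# Convert a per-genome mutation rate into per-site and per-period figures.
# Assumption: genome length of 29,903 nt (Wuhan-Hu-1 reference, NC_045512.2);
# this length is not stated in the article.
REFERENCE_GENOME_LENGTH = 29_903


def per_site_rate(mutations_per_genome_per_year: float,
                  genome_length: int = REFERENCE_GENOME_LENGTH) -> float:
    """Return substitutions per site per year."""
    return mutations_per_genome_per_year / genome_length


def expected_mutations(mutations_per_genome_per_year: float, days: float) -> float:
    """Expected number of mutations one lineage accumulates over `days` days."""
    return mutations_per_genome_per_year * days / 365.0


if __name__ == "__main__":
    print(f"{per_site_rate(25):.2e} substitutions/site/year")  # 8.36e-04
    print(f"{expected_mutations(25, 30):.1f} mutations per month")  # 2.1
```

At this rate a single lineage picks up only about two mutations a month, which is why the virus is described as mutating slowly.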
Despite increased efforts, no one has yet identified the first case of human transmission, or “patient zero,” of the COVID-19 pandemic. Finding such a case is necessary to better understand how the virus may first have passed from its animal host to humans, as well as how the SARS-CoV-2 genome has mutated over time as it spread globally.
“The SARS-CoV-2 virus, which carries an RNA genome, has infected more than 35 million people around the world,” said Sudhir Kumar, director of Temple University’s Institute of Genomics and Evolutionary Medicine. “We need to find their common ancestor, which we call the precursor genome.”
This precursor genome is the mother of all SARS-CoV-2 coronaviruses that now infect humans.
In the absence of Patient Zero, Kumar and his research team at Temple University may have found the next best resource to help the work of molecular epidemiology detectives around the world. “We set out to reconstruct the precursor genome using a large dataset of coronavirus genomes obtained from infected people,” said Sayaka Miura, lead author of the study.
They identified the “mother” of all SARS-CoV-2 genomes and the early progeny strains that mutated and then spread to dominate the global pandemic. “We have now reconstructed the precursor genome and mapped where and when the first mutations occurred,” said Kumar, the corresponding author of a preprint study available on the bioRxiv server.
In this way, their work has provided new insight into the history of SARS-CoV-2’s early mutations. For example, their study reports that a mutation in the SARS-CoV-2 spike protein (D614G), which is often associated with increased infectivity and spread, arose weeks after the start of the pandemic, after several other mutations had already appeared. “It is almost always found alongside many other protein mutations, making it difficult to determine its role in increasing infectivity,” said Sergei Pond, one of the study’s lead co-authors.
In addition to their insights into the early history of SARS-CoV-2, Kumar’s group developed mutation fingerprints to quickly identify strains and sub-strains that infect an individual or colonize a global region.
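The article does not describe how the fingerprints are encoded. As a purely hypothetical illustration, a fingerprint can be treated as a set of (position, base) pairs, and a genome assigned to the most specific strain whose fingerprint it fully matches. The positions and strain names below are invented, not the study’s actual fingerprints.

```python
# Hypothetical mutation fingerprints: each strain is marked by a set of
# (1-based position, base) pairs. Positions and names are invented examples.
from typing import Dict, FrozenSet, Optional, Tuple

Fingerprint = FrozenSet[Tuple[int, str]]

FINGERPRINTS: Dict[str, Fingerprint] = {
    "strain_A": frozenset({(100, "T")}),
    "strain_A1": frozenset({(100, "T"), (250, "G")}),  # sub-strain of strain_A
    "strain_B": frozenset({(400, "A")}),
}


def classify(genome: str, fingerprints: Dict[str, Fingerprint]) -> Optional[str]:
    """Return the most specific strain whose fingerprint the genome fully matches."""
    best: Optional[str] = None
    best_size = 0
    for name, fp in fingerprints.items():
        matches = all(
            pos <= len(genome) and genome[pos - 1] == base for pos, base in fp
        )
        # Prefer larger fingerprints, so sub-strains win over their parent strain.
        if matches and len(fp) > best_size:
            best, best_size = name, len(fp)
    return best


if __name__ == "__main__":
    g = list("C" * 500)
    g[99], g[249] = "T", "G"  # set positions 100 and 250
    print(classify("".join(g), FINGERPRINTS))  # strain_A1
```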
Pandemic order
To identify the precursor genome, they used a mutation order analysis technique based on clonal analysis of mutant strains and on how often pairs of mutations occur together in SARS-CoV-2 genomes.
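The core intuition behind ordering mutations from co-occurrence can be sketched in a few lines of Python. This is a simplified illustration, not the authors’ method: it assumes each mutation arose only once and was never lost, so if every genome carrying mutation B also carries the more widespread mutation A, A probably arose first. Mutation labels are placeholders.

```python
# Simplified ordering of mutations from pairwise co-occurrence counts.
# Assumes no recurrent mutation or reversion; labels are placeholders.
from collections import Counter
from itertools import combinations
from typing import FrozenSet, List, Tuple


def pair_counts(genomes: List[FrozenSet[str]]) -> Counter:
    """Count how often each unordered pair of mutations occurs in the same genome."""
    counts: Counter = Counter()
    for muts in genomes:
        for a, b in combinations(sorted(muts), 2):
            counts[(a, b)] += 1
    return counts


def infer_order(genomes: List[FrozenSet[str]]) -> List[Tuple[str, str]]:
    """Return (earlier, later) mutation pairs implied by nested co-occurrence."""
    singles = Counter(m for muts in genomes for m in muts)
    ordered = []
    for (a, b), together in pair_counts(genomes).items():
        # Every carrier of b also carries a, and a is more widespread: a came first.
        if together == singles[b] and singles[a] > singles[b]:
            ordered.append((a, b))
        elif together == singles[a] and singles[b] > singles[a]:
            ordered.append((b, a))
        # Other pairs (equal counts or partial overlap) are left unordered.
    return sorted(ordered)


if __name__ == "__main__":
    genomes = [
        frozenset({"m1"}),
        frozenset({"m1", "m2"}),
        frozenset({"m1", "m2", "m3"}),
        frozenset({"m4"}),
    ]
    print(infer_order(genomes))  # [('m1', 'm2'), ('m1', 'm3'), ('m2', 'm3')]
```

Mutations that never co-occur (such as `m4` here) sit on separate branches, so no order is inferred between them.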
First, Kumar’s team collected nearly 30,000 complete genomes of SARS-CoV-2, the virus that causes COVID-19. In total, they analyzed 29,681 SARS-CoV-2 genomes, each containing at least 28,000 sequenced bases. These genomes were sampled between December 24, 2019 and July 7, 2020 in 97 countries and regions around the world.
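The two inclusion criteria above (at least 28,000 bases and a collection date inside the sampling window) could be applied as a simple filter. This Python sketch assumes a hypothetical record layout and counts only unambiguous A/C/G/T calls as bases; both choices are assumptions, not details from the study.

```python
# Sketch of the dataset filters: >= 28,000 called bases and a collection
# date in the sampling window. The GenomeRecord layout is hypothetical.
from dataclasses import dataclass
from datetime import date
from typing import List

MIN_CALLED_BASES = 28_000
WINDOW_START = date(2019, 12, 24)
WINDOW_END = date(2020, 7, 7)


@dataclass
class GenomeRecord:
    accession: str
    sequence: str
    collected: date


def called_bases(sequence: str) -> int:
    """Count unambiguous nucleotides; 'N' and other ambiguity codes are excluded."""
    seq = sequence.upper()
    return sum(seq.count(b) for b in "ACGT")


def filter_genomes(records: List[GenomeRecord]) -> List[GenomeRecord]:
    """Keep records with enough called bases that were sampled inside the window."""
    return [
        r for r in records
        if called_bases(r.sequence) >= MIN_CALLED_BASES
        and WINDOW_START <= r.collected <= WINDOW_END
    ]
```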
Many previous attempts to build an evolutionary tree of SARS-CoV-2 from such large datasets have not been successful, Kumar said. “This coronavirus is evolving too slowly, the number of genomes to be analyzed is too large, and the quality of the genome data varies widely. I immediately saw parallels between the properties of this coronavirus genetic data and the genetic data of another clonally spreading disease: cancer.”
Kumar’s group has developed and studied many techniques for analyzing the genetic data of tumors in cancer patients. They adapted and extended these techniques into a mutation order approach that automatically traces mutations back to the ancestor. “Basically, the genome before the first mutation was that of the ancestor,” Kumar said. “The mutation order approach is beautiful and predicts the phylogeny of the ‘major strains’ of SARS-CoV-2. It’s a great example of how big data coupled with biologically sound data mining can reveal important patterns.”
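Kumar’s point that “the genome before the first mutation was that of the ancestor” can be illustrated directly: once the post-ancestral mutations are known, undoing them in any descendant yields the ancestral sequence. This Python sketch assumes a hypothetical (position, ancestral base, derived base) format and uses toy sequences.

```python
# Recover an ancestral sequence by reverting known derived mutations.
# The mutation tuple format and the toy sequences are hypothetical.
from typing import Iterable, Tuple

Mutation = Tuple[int, str, str]  # (1-based position, ancestral base, derived base)


def revert_to_ancestor(genome: str, mutations: Iterable[Mutation]) -> str:
    """Undo the derived mutations a genome carries to recover its ancestor."""
    bases = list(genome)
    for pos, ancestral, derived in mutations:
        current = bases[pos - 1]
        if current == derived:
            bases[pos - 1] = ancestral  # undo a mutation this genome carries
        elif current != ancestral:
            raise ValueError(f"position {pos}: unexpected base {current!r}")
        # If current == ancestral, this lineage never acquired the mutation.
    return "".join(bases)


if __name__ == "__main__":
    muts = [(2, "A", "C"), (5, "G", "T")]
    print(revert_to_ancestor("ACGTT", muts))  # AAGTG
    print(revert_to_ancestor("AAGTT", muts))  # AAGTG
```

Two descendants carrying different subsets of the mutations converge on the same ancestor, which is the property the reconstruction relies on.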
Precursor genome
Kumar’s team reconstructed a predicted sequence of the precursor genome (the parent genome) of all SARS-CoV-2 genomes, which they call proCoV2. Relative to the genome of RaTG13, a closely related coronavirus found in the bat Rhinolophus affinis, the proCoV2 genome carries 170 nonsynonymous substitutions (mutations that change an amino acid in a protein) and 958 synonymous substitutions. The intermediate animal host between bats and humans is still unknown, but the proCoV2 and RaTG13 sequences share 96.12% sequence similarity.
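To show what the synonymous/nonsynonymous distinction means in practice, here is a simplified Python sketch that compares two aligned, in-frame coding sequences under the standard genetic code. It classifies each differing site by changing only that site in the first sequence’s codon; it assumes ungapped A/C/G/T input, and real analyses treat codons with multiple differences more carefully. It is not the study’s pipeline.

```python
# Count synonymous vs nonsynonymous site differences and percent identity
# for two aligned, in-frame, ungapped coding sequences (standard genetic code).
from itertools import product
from typing import Tuple

BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def count_substitutions(seq_a: str, seq_b: str) -> Tuple[int, int]:
    """Return (synonymous, nonsynonymous) site differences between two sequences."""
    if len(seq_a) != len(seq_b) or len(seq_a) % 3:
        raise ValueError("sequences must be aligned and a multiple of 3 long")
    syn = nonsyn = 0
    for i in range(0, len(seq_a), 3):
        codon_a, codon_b = seq_a[i:i + 3].upper(), seq_b[i:i + 3].upper()
        for j in range(3):
            if codon_a[j] != codon_b[j]:
                # Apply this single change to codon_a and see if the amino acid changes.
                mutant = codon_a[:j] + codon_b[j] + codon_a[j + 1:]
                if CODON_TABLE[mutant] == CODON_TABLE[codon_a]:
                    syn += 1
                else:
                    nonsyn += 1
    return syn, nonsyn


def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percentage of aligned positions carrying the same base."""
    same = sum(a == b for a, b in zip(seq_a.upper(), seq_b.upper()))
    return 100.0 * same / len(seq_a)


if __name__ == "__main__":
    a, b = "ATGAAATTT", "ATGAAGCTT"  # AAA->AAG (Lys->Lys), TTT->CTT (Phe->Leu)
    print(count_substitutions(a, b))  # (1, 1)
    print(round(percent_identity(a, b), 2))  # 77.78
```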
Next, they identified 49 single-nucleotide variants (SNVs) in their dataset with a variant frequency greater than 1%. These were studied further to examine their mutation patterns and global distribution.
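Selecting SNVs above a frequency cutoff amounts to counting, at each position, how many genomes differ from a reference. This Python sketch assumes the genomes are already aligned to a reference of equal length and ignores ambiguous calls; the toy data are invented for illustration.

```python
# Find SNVs whose frequency across genomes exceeds a threshold (default 1%).
# Assumes genomes are pre-aligned to the reference; 'N' calls are ignored.
from collections import Counter
from typing import Dict, List, Tuple

CALLED = {"A", "C", "G", "T"}


def snv_frequencies(reference: str, genomes: List[str]) -> Dict[Tuple[int, str], float]:
    """Map (1-based position, alternate base) to the fraction of genomes carrying it."""
    counts: Counter = Counter()
    for g in genomes:
        for pos, (ref_base, base) in enumerate(zip(reference, g), start=1):
            if base != ref_base and base in CALLED:
                counts[(pos, base)] += 1
    return {key: n / len(genomes) for key, n in counts.items()}


def common_snvs(reference: str, genomes: List[str],
                min_freq: float = 0.01) -> List[Tuple[int, str]]:
    """Return SNVs with frequency strictly greater than `min_freq`."""
    freqs = snv_frequencies(reference, genomes)
    return sorted(k for k, f in freqs.items() if f > min_freq)
```

With 200 toy genomes, a variant seen in 3 of them (1.5%) passes the 1% cutoff, while one seen only once (0.5%) does not.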
“The tree of mutations p