COVID-19 Patient Zero: Data analysis identifies the “mother” of all SARS-CoV-2 genomes.




Temple University researchers have reconstructed the ancestral genome that gave rise to the coronavirus now spreading among humans.

In the field of molecular epidemiology, the global scientific community has sought to solve the mystery of the early history of SARS-CoV-2.

Since the first SARS-CoV-2 infection was detected in December 2019, tens of thousands of its genomes have been sequenced around the world. They show that the coronavirus is mutating, albeit slowly, at a rate of 25 mutations per genome per year.
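To put "slowly" in perspective, the per-genome rate can be converted into a per-site rate. The sketch below assumes a genome length of 29,903 nucleotides (the length of the Wuhan-Hu-1 reference sequence, which the article does not state); the constant and function names are illustrative.

```python
# Convert a genome-wide mutation rate into a per-site rate.
# ASSUMPTION: genome length of 29,903 nt (Wuhan-Hu-1 reference); the article
# only reports the per-genome figure of 25 mutations per year.
MUTATIONS_PER_GENOME_PER_YEAR = 25
GENOME_LENGTH_NT = 29_903


def per_site_rate(per_genome_per_year: float, genome_length: int) -> float:
    """Substitutions per nucleotide site per year."""
    return per_genome_per_year / genome_length


def per_month(per_genome_per_year: float) -> float:
    """Average number of new mutations per genome per month."""
    return per_genome_per_year / 12


if __name__ == "__main__":
    rate = per_site_rate(MUTATIONS_PER_GENOME_PER_YEAR, GENOME_LENGTH_NT)
    print(f"per-site rate: {rate:.2e} substitutions/site/year")
    print(f"per-genome rate: {per_month(MUTATIONS_PER_GENOME_PER_YEAR):.1f} mutations/month")
```

Under that assumption, 25 mutations per genome per year works out to roughly 8.4 × 10⁻⁴ substitutions per site per year, or about two new mutations per genome each month.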

Despite intensive efforts, no one has yet identified the first case of human transmission, or "patient zero," of the COVID-19 pandemic. Finding such a case is necessary to better understand how the virus first jumped from its animal host to humans, and how the SARS-CoV-2 genome has changed over time and spread globally.

“The SARS-CoV-2 virus, which carries an RNA genome, has infected more than 35 million people around the world,” said Sudhir Kumar, director of Temple University’s Institute of Genomics and Evolutionary Medicine. “We need to find this common ancestor, which we call the precursor genome.”

This precursor genome is the mother of all SARS-CoV-2 coronaviruses that now infect humans.

In the absence of Patient Zero, Kumar and his research team at Temple University may have found the next best resource to help the work of molecular epidemiology detectives around the world. “We set out to reconstruct the precursor genome using a large dataset of coronavirus genomes obtained from infected people,” said Sayaka Miura, lead author of the study.

What they found was the “mother” of all SARS-CoV-2 genomes, along with its first offspring strains, which mutated and then spread to dominate the global pandemic. “We have now reconstructed the precursor genome and mapped where and when the first mutations occurred,” said Kumar, the corresponding author of a preprint study.

In this way, their work has provided new insight into the early mutational history of SARS-CoV-2. For example, the study reports that a mutation in the SARS-CoV-2 spike protein (D614G), which is often associated with increased infectivity and spread, arose weeks after COVID-19 first emerged, following several other mutations. “It is almost always found together with many other protein mutations, so its role in increasing infectivity remains difficult to determine,” said Sergei Pond, one of the study’s lead co-authors.

In addition to these insights into the early history of SARS-CoV-2, Kumar’s group developed mutation fingerprints to quickly identify the strains and sub-strains that infect an individual or have colonized a geographic region.
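One simple way to use such fingerprints is to treat each strain as the set of mutations that defines it, then assign a genome to the most specific strain whose entire fingerprint it carries. This is a minimal sketch under that assumption; the strain names and mutation labels are hypothetical placeholders, not the study's actual fingerprints.

```python
from typing import Dict, FrozenSet, Optional, Set

# HYPOTHETICAL fingerprints: each strain is defined by the set of mutations it
# carries. Names are placeholders, not the study's strain definitions.
FINGERPRINTS: Dict[str, FrozenSet[str]] = {
    "strain_A": frozenset({"m1"}),
    "strain_B": frozenset({"m1", "m2"}),
    "strain_B1": frozenset({"m1", "m2", "m3"}),
}


def assign_strain(genome_mutations: Set[str],
                  fingerprints: Dict[str, FrozenSet[str]] = FINGERPRINTS) -> Optional[str]:
    """Return the most specific strain whose full fingerprint the genome carries.

    Extra mutations in the genome are allowed; returns None if no fingerprint
    is fully contained in the genome's mutation set.
    """
    best: Optional[str] = None
    best_size = -1
    for name, fingerprint in fingerprints.items():
        # Subset test: every defining mutation of the strain must be present.
        if fingerprint <= genome_mutations and len(fingerprint) > best_size:
            best, best_size = name, len(fingerprint)
    return best
```

For example, a genome carrying `m1`, `m2` and an unrelated `m9` would be assigned to `strain_B`, while one carrying only `m3` matches no fingerprint and returns `None`.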

Pandemic Order

To identify the precursor genome, they used a mutation order analysis technique based on clonal analysis of the mutant strains and how often mutation pairs occur together in SARS-CoV-2 genomes.
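The idea behind ordering mutations from co-occurrence can be sketched with a simple heuristic: if nearly every genome that carries mutation b also carries mutation a, and a is the more common of the two, then b most likely arose on a background that already had a. The code below is a simplified illustration of that reasoning, not the authors' published mutation order analysis; the function names and the `min_support` threshold (which tolerates some sequencing error) are assumptions.

```python
from itertools import permutations
from typing import Dict, List, Set, Tuple


def mutation_frequencies(genomes: List[Set[str]]) -> Dict[str, int]:
    """Number of genomes carrying each mutation."""
    counts: Dict[str, int] = {}
    for genome in genomes:
        for mutation in genome:
            counts[mutation] = counts.get(mutation, 0) + 1
    return counts


def infer_precedence(genomes: List[Set[str]],
                     min_support: float = 0.95) -> Set[Tuple[str, str]]:
    """Return pairs (a, b) read as 'a probably arose before b'.

    ASSUMED heuristic: a precedes b when a is more frequent than b and at
    least `min_support` of the genomes carrying b also carry a.
    """
    freq = mutation_frequencies(genomes)
    precedes: Set[Tuple[str, str]] = set()
    for a, b in permutations(freq, 2):
        together = sum(1 for g in genomes if a in g and b in g)
        if freq[a] > freq[b] and together / freq[b] >= min_support:
            precedes.add((a, b))
    return precedes


def earliest_first(genomes: List[Set[str]],
                   min_support: float = 0.95) -> List[str]:
    """Order mutations by number of inferred predecessors, then by frequency."""
    freq = mutation_frequencies(genomes)
    n_before = {m: 0 for m in freq}
    for _, later in infer_precedence(genomes, min_support):
        n_before[later] += 1
    return sorted(freq, key=lambda m: (n_before[m], -freq[m], m))
```

Mutations with no inferred predecessors come first; a genome lacking all of them would be the candidate ancestral state.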

First, Kumar’s team mined data on nearly 30,000 complete genomes of SARS-CoV-2, the virus that causes COVID-19. In total, they analyzed 29,681 SARS-CoV-2 genomes, each with at least 28,000 bases sequenced. These genomes were sampled between December 24, 2019 and July 7, 2020 in 97 countries and regions around the world.
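The dataset criteria described above, a minimum number of sequenced bases and a sampling window, could be applied as a filter along these lines. This is a sketch under stated assumptions: the `GenomeRecord` fields are hypothetical, and counting only unambiguous A/C/G/T bases toward the 28,000 threshold is one possible reading of "sequenced bases."

```python
from dataclasses import dataclass
from datetime import date
from typing import Iterable, List

# Study criteria as stated in the article.
MIN_SEQUENCED_BASES = 28_000
WINDOW_START = date(2019, 12, 24)
WINDOW_END = date(2020, 7, 7)


@dataclass
class GenomeRecord:
    """HYPOTHETICAL metadata record; real exports use different field names."""
    accession: str
    collection_date: date
    country: str
    sequence: str


def sequenced_bases(sequence: str) -> int:
    """Count unambiguous A/C/G/T calls (ASSUMPTION: N and other IUPAC codes don't count)."""
    upper = sequence.upper()
    return sum(upper.count(base) for base in "ACGT")


def passes_filters(record: GenomeRecord) -> bool:
    """True if the record falls in the sampling window and is complete enough."""
    return (WINDOW_START <= record.collection_date <= WINDOW_END
            and sequenced_bases(record.sequence) >= MIN_SEQUENCED_BASES)


def filter_genomes(records: Iterable[GenomeRecord]) -> List[GenomeRecord]:
    """Keep only records that pass both criteria."""
    return [r for r in records if passes_filters(r)]
```

After filtering, the number of countries represented can be counted with `len({r.country for r in kept})`.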

Many previous attempts to analyze such large datasets had been unsuccessful because the usual goal was to build an evolutionary tree of SARS-CoV-2, Kumar said. “This coronavirus is evolving too slowly, the number of genomes to be analyzed is too large, and the quality of the genome data varies greatly. I immediately saw parallels between the properties of this coronavirus genetic data and the genetic data from the clonal spread of another nefarious disease, cancer.”

Kumar’s group has developed and studied many techniques for analyzing the genetic data of tumors in cancer patients. They adapted and extended these techniques to create a trail of mutations that automatically traces back to the ancestor. “Basically, the genome before the first mutation was that of the ancestor,” Kumar said. “The mutation-tracing approach is beautiful and predicts a phylogeny of the ‘major strains’ of SARS-CoV-2. It’s a great example of how big data coupled with biologically sound data mining reveals important patterns.”
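Kumar's summary, that the genome before the first mutation is the ancestor, suggests a simple reconstruction step: once the early mutations and their order are known, undo them on a sampled genome. The sketch below assumes each mutation is recorded as a 1-based alignment position with its ancestral and derived base; the `Mutation` type and `revert_mutations` function are hypothetical names for illustration.

```python
from typing import List, NamedTuple


class Mutation(NamedTuple):
    """HYPOTHETICAL record of one inferred early mutation."""
    pos: int        # 1-based position in the alignment
    ancestral: str  # base before the mutation
    derived: str    # base after the mutation


def revert_mutations(genome: str, mutations: List[Mutation]) -> str:
    """Undo mutations, latest first, to recover the pre-mutation genome.

    Reverting in reverse order matters only if one site mutated more than
    once; the check guards against applying a mutation the genome lacks.
    """
    seq = list(genome)
    for m in reversed(mutations):
        i = m.pos - 1
        if seq[i] != m.derived:
            raise ValueError(f"position {m.pos}: expected {m.derived}, found {seq[i]}")
        seq[i] = m.ancestral
    return "".join(seq)
```

Passing the mutations in their inferred order (earliest first) returns the sequence as it would have looked before any of them occurred.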

Precursor genome

Kumar’s team arrived at a predicted sequence of the precursor genome (the parent genome) of all SARS-CoV-2 genomes, which they call proCoV2. Relative to the genome of RaTG13, a closely related coronavirus found in the bat Rhinolophus affinis, the proCoV2 genome carries 170 non-synonymous substitutions (mutations that change an amino acid in a protein) and 958 synonymous substitutions. Although the intermediate animal host between bats and humans is still unknown, the proCoV2 and RaTG13 sequences share 96.12% sequence similarity.
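The two comparisons in this paragraph, percent sequence similarity and the split into synonymous and non-synonymous substitutions, can be illustrated on short aligned sequences. This sketch uses the standard genetic code, computes identity over ungapped alignment columns, and classifies each differing base by changing only that base within its codon; codons with several differences are handled approximately, and the study's exact procedure may differ.

```python
from itertools import product
from typing import Dict, Tuple

# Standard genetic code (NCBI translation table 1). Codons are enumerated with
# the first base varying slowest over T, C, A, G; '*' marks stop codons.
_BASES = "TCAG"
_AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE: Dict[str, str] = {
    "".join(codon): aa
    for codon, aa in zip(product(_BASES, repeat=3), _AMINO_ACIDS)
}


def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percent identical bases over aligned columns where neither sequence has a gap."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to equal length")
    columns = [(x, y) for x, y in zip(seq_a.upper(), seq_b.upper())
               if x != "-" and y != "-"]
    if not columns:
        raise ValueError("no ungapped columns to compare")
    matches = sum(1 for x, y in columns if x == y)
    return 100.0 * matches / len(columns)


def count_substitutions(cds_a: str, cds_b: str) -> Tuple[int, int]:
    """Return (synonymous, non_synonymous) base substitutions from cds_a to cds_b.

    Inputs must be aligned, in-frame coding sequences. Each differing base is
    judged by changing only that base in cds_a's codon (an approximation when
    a codon differs at more than one position). Codons containing gaps or
    ambiguous bases are skipped.
    """
    if len(cds_a) != len(cds_b):
        raise ValueError("coding sequences must be aligned to equal length")
    synonymous = non_synonymous = 0
    for i in range(0, len(cds_a) - 2, 3):
        codon_a, codon_b = cds_a[i:i + 3].upper(), cds_b[i:i + 3].upper()
        if codon_a not in CODON_TABLE or codon_b not in CODON_TABLE:
            continue
        for j in range(3):
            if codon_a[j] != codon_b[j]:
                mutant = codon_a[:j] + codon_b[j] + codon_a[j + 1:]
                if CODON_TABLE[mutant] == CODON_TABLE[codon_a]:
                    synonymous += 1
                else:
                    non_synonymous += 1
    return synonymous, non_synonymous
```

For instance, `AAA`→`AAG` keeps lysine and counts as synonymous, while `CTT`→`CCT` turns leucine into proline and counts as non-synonymous.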

Next, they identified 49 single-nucleotide variants (SNVs) in their dataset with a variant frequency greater than 1%. These were further studied to examine their mutation patterns as well
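A variant-frequency filter like the 1% cutoff described above might look like the following. This is a sketch assuming genomes already aligned to a common reference of the same length; computing each frequency among genomes with an unambiguous base at that site is an assumption rather than the study's stated method, and the function names are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple

SNV = Tuple[int, str, str]  # (1-based position, reference base, alternate base)


def snv_frequencies(aligned: Sequence[str], reference: str) -> Dict[SNV, float]:
    """Frequency of each SNV among genomes with an unambiguous base at that site."""
    length = len(reference)
    called = [0] * length          # genomes with an A/C/G/T call at each site
    alt_counts: Counter = Counter()
    for genome in aligned:
        if len(genome) != length:
            raise ValueError("all genomes must be aligned to the reference length")
        for i, (ref_base, base) in enumerate(zip(reference.upper(), genome.upper())):
            if base not in ("A", "C", "G", "T"):
                continue  # skip N, gaps and other ambiguous calls
            called[i] += 1
            if base != ref_base:
                alt_counts[(i + 1, ref_base, base)] += 1
    return {snv: n / called[snv[0] - 1] for snv, n in alt_counts.items()}


def common_snvs(aligned: Sequence[str], reference: str,
                min_freq: float = 0.01) -> List[SNV]:
    """SNVs whose frequency is strictly greater than min_freq (1% by default)."""
    freqs = snv_frequencies(aligned, reference)
    return sorted(snv for snv, f in freqs.items() if f > min_freq)
```

The strict `>` comparison mirrors the "greater than 1%" wording, so a variant at exactly 1% would be excluded.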
