In a world starved for any fresh data to help clarify the origin of the COVID-19 pandemic, a study claiming to have unearthed early sequences of SARS-CoV-2 that were deliberately hidden was bound to ignite a sizzling debate.
The unreviewed paper, by evolutionary biologist Jesse Bloom of the Fred Hutchinson Cancer Research Center, asserts that a team of Chinese researchers sampled viruses from some of the earliest COVID-19 patients in Wuhan, China, posted the viral sequences to a widely used U.S. database, and then a few months later had the genetic information removed to “obscure their existence.”
To some scientists, the claims reinforce suspicions that China has something to hide about the origins of the pandemic. But critics of the preprint, posted yesterday on bioRxiv, say Bloom’s detective work is much ado about nothing, because the Chinese scientists later published the viral information in a different form, and the recovered sequences add little to what’s known about SARS-CoV-2’s origins.
The sequences, Bloom says, do support other evidence that the pandemic did not originate in Wuhan’s Huanan Seafood Market, where SARS-CoV-2 initially came to light.
Chinese health officials on 31 December 2019 tied the market to an outbreak of an “unexplained pneumonia,” but a month later, it had become clear that many of the earliest cases had no link to the location. The paper highlights three mutations found in SARS-CoV-2 collected from patients linked to the market that are not in the unearthed sequences of the coronavirus or its closest relative, which researchers from the Wuhan Institute of Virology discovered in bats in 2013.
Bloom’s more explosive assertion, that the Chinese researchers deleted data, is bound to intensify the debate about whether the virus originally jumped to humans from an unknown animal or somehow leaked from a laboratory.
Bloom says he has no bias toward a particular origin hypothesis for SARS-CoV-2, and he agrees that the viral sequences he highlighted are a small piece of a large unfinished puzzle. “I don’t think this bolsters either the lab origin or zoonosis hypothesis,” he says. “I think it provides additional evidence that this virus was probably circulating in Wuhan before December, certainly, and that probably, we have a less than complete picture of the sequences of the early viruses.”
Bloom, who studies viral evolution, launched his study after a controversial report on the pandemic’s origin was issued in March by a joint commission of Chinese and foreign researchers organized by the World Health Organization (WHO). Bloom helped organize a much-discussed letter, co-signed by 17 other scientists, that criticized the WHO report for deeming it “extremely unlikely” that SARS-CoV-2 escaped from a laboratory. In the letter, published on 14 May in Science, the authors argued for “a dispassionate science-based discourse on this difficult but important issue.”
The WHO report relied heavily on sequences of SARS-CoV-2 found in COVID-19 patients tied to the market, Bloom notes. “I was just going through and trying to repeat a number of the analyses in the joint WHO-China report,” Bloom says.
This led him to a study that listed all SARS-CoV-2 sequences submitted before 31 March 2020 to the Sequence Read Archive (SRA), a database overseen by the National Center for Biotechnology Information, a division of the U.S. National Institutes of Health (NIH). But when he checked SRA for one of the listed projects, he couldn’t find its sequences.
Googling some of the project’s information, he found another study, led by Ming Wang from Wuhan University’s Renmin Hospital, that was posted as a preprint on 6 March on medRxiv and later published, on 24 June, in Small, a journal more focused on materials and chemistry than virology.
That paper lists some of the earliest Wuhan COVID-19 patients and the specific mutations in their viruses, but doesn’t give the full sequence data. Further internet sleuthing led Bloom to discover that SRA backs up its information in Google’s Cloud platform, and a search there turned up files containing some of the Wang team’s earlier data submissions.
The paper in Small makes no mention of any corrections to viral sequences that might explain why they were removed from SRA, which led Bloom to conclude in his preprint that “the trusting structures of science have been abused to obscure sequences relevant to the early spread of SARS-CoV-2 in Wuhan.”
Bloom asserts that because the deleted sequences lack the three mutations seen in the SARS-CoV-2 from the seafood market, the viruses Wang’s team found more likely represent a progenitor.
But the sequence of that bat virus found in 2013 differs from SARS-CoV-2 by about 1100 nucleotides, which means decades must have passed before it evolved into the pandemic coronavirus—and other species may well have been infected with the bat virus before it made the final jump into people.
This great difference in sequences, says evolutionary biologist Andrew Rambaut at the University of Edinburgh, means researchers cannot use a few mutations like the ones Bloom highlights to look back in time to see the “roots” of the SARS-CoV-2 family tree.