Insertion of SARS-CoV-2 sequences into human cell genomes

Updated 31/05/2021 – see end.

RE-updated 10/06/2021 – see end

A group of researchers who claimed in a preprint a while ago that they could show integration of SARS-CoV-2 genomic sequences into the genome of cultured human cells has now doubled down, with a Proc Natl Acad Sci paper (!!) claiming to prove both that the virus can insert into the genome of cultured cells and that insertion occurs in patient tissue.

The authors were investigating their hypothesis that non-infectious fragments of viral genomes, inserted into host DNA, were responsible for the phenomenon of prolonged positive PCR tests in patients who had completely recovered from COVID-19 and who did not shed infectious virions. They tested this by transfecting HEK293 cells with plasmids encoding the human LINE1 transposable element, then infecting them with SARS-CoV-2. The addition of LINE1 was “To increase the likelihood of detecting rare integration events”. They isolated DNA from cells 2 days post-infection, and report successful PCR amplification of the N gene from gel-purified “large fragment DNA”. While they present this as proof of reverse transcription and integration of the SARS2 N gene into genomic DNA, they went further and subjected extracted cell DNA to Nanopore long-read sequencing. This yielded what they describe as 63 instances of integration of the whole or part of the N gene, which lies at the 3′ end of the genome, in a variety of chromosomal locations. The viral sequences were flanked by host DNA on both sides in 2 cases and on one side in 61, and the two whole-sequence instances had a 20 bp direct repeat with “a consensus recognition sequence of the LINE1 endonuclease”. There appeared to be preferential insertion into exon-associated sites. The integrated DNA was mainly from the 3′ end of the SARS2 genome.
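The distinction between reads flanked by host DNA on both sides and reads with only one host-virus junction matters a great deal, because a single junction can also arise artefactually when unrelated molecules get joined during library preparation. Purely as an illustration, here is a minimal Python sketch that classifies a read in this way. It assumes the read has already been aligned and split into segments labelled "host" or "viral" with positions on the read; the `Segment` type and `classify_flanking` function are hypothetical and are not the authors' pipeline.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Segment:
    source: str  # "host" or "viral" (hypothetical labels from an upstream aligner)
    start: int   # start position on the read
    end: int     # end position on the read (exclusive)


def classify_flanking(segments: List[Segment]) -> str:
    """Classify a read by whether its viral part is flanked by host sequence.

    Returns "both" (host on both sides), "one" (host on one side only),
    "none" (viral sequence with no host flank) or "no_viral".
    """
    ordered = sorted(segments, key=lambda s: s.start)
    viral_idx = [i for i, s in enumerate(ordered) if s.source == "viral"]
    if not viral_idx:
        return "no_viral"
    first, last = viral_idx[0], viral_idx[-1]
    # Look for host segments before the first and after the last viral segment.
    host_left = any(s.source == "host" for s in ordered[:first])
    host_right = any(s.source == "host" for s in ordered[last + 1:])
    if host_left and host_right:
        return "both"
    if host_left or host_right:
        return "one"
    return "none"


example_read = [Segment("host", 0, 1200), Segment("viral", 1200, 2460), Segment("host", 2460, 4000)]
print(classify_flanking(example_read))  # both
```

In the paper's terms, only 2 of the 63 reported events would fall into the "both" category; the other 61 would be "one".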

Figure by Ed Rybicki, copyright 2021

Repeating this analysis with SARS2-infected HEK293T and Calu3 cells that had not been transfected with LINE1 DNA gave 7 integrations, again characteristic of a LINE1-type mechanism, and again preferentially associated with exons.

Another claim they make is that integrated sequences can be expressed. They tested this by looking at published RNA-seq data for SARS2-infected cells and organoids from a variety of human tissues, and “found” a number (0.004 – 0.14% of all SARS2-specific reads) of “chimaeric reads”, or virus-human gene fusions in RNA. The abundance of these reads correlated with the level (=concentration?) of viral RNAs, and most mapped to the SARS2 N gene, which gives rise to the most abundant viral mRNA. An important observation was the following:

“Single-cell analysis of patient lung bronchoalveolar lavage fluid (BALF) cells from patients with severe COVID … showed that up to 40% of all viral reads were derived from the negative-strand SARS-CoV-2 RNA …. Fractions of negative-strand RNA in tissues from some patients were orders of magnitude higher than those in acutely infected cells or organoids”,

because they go on to say (after admitting that they found no chimaeric sequences in patient BALF samples) that:

“in some patient-derived tissues, where the total number of SARS-CoV-2 sequence-positive cells may be small, a large fraction of the viral transcripts could have been transcribed from SARS-CoV-2 sequences integrated into the host genome”.

Yes. Well. Ummmm…no. Seriously, no. Aside from the objections that others have raised, such as the way they analysed published data as well as their own, and the undue weight they gave to what could very well all be artefactual chimaeras, they do not appear to have a very deep understanding of how ssRNA+ viruses replicate, or of the fact that there may be circumstances (such as in dead or dying cells, or bits of cells resulting from processes such as apoptosis) where there is NOT a superabundance of ssRNA+ compared to RNA-. For example, in the “acutely infected cells” (presumably in culture) virus is replicating vigorously, and one would expect a lot of immature progeny virions in addition to the double-membrane-enveloped replication complexes, which is where the RNA- is, engaged in making more ssRNA+. In quiescent, dying or dead cells, on the other hand, one would imagine that all the assembling virions had budded, that replication would probably have stopped due to depletion of resources, and that only the replication centres, safe and protected from RNases by their vesicle membranes, would be left. These might also form stable exosome-like structures, which would be a good thing to look for. Moreover, replication complexes are largely dsRNA, that is, essentially equal amounts of + and – strand RNA, which would account for their observations with no integration of viral RNA being required.
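The dsRNA argument can be put as simple arithmetic. The sketch below is a deliberately crude model, not taken from the paper: it assumes each dsRNA replication intermediate contributes one (+) and one (–) strand, free genomes and mRNAs contribute only (+) strands, and every strand is equally likely to end up as a sequencing read. The function names are hypothetical.

```python
def negative_strand_fraction(duplexes: float, free_positive: float) -> float:
    """Expected fraction of viral reads coming from the (-) strand.

    Assumption (hypothetical model): each dsRNA duplex yields one (+) and one (-)
    strand, free ssRNA contributes (+) strands only, and all strands are
    sequenced with equal probability.
    """
    total_strands = 2 * duplexes + free_positive
    if total_strands <= 0:
        raise ValueError("need some viral RNA")
    return duplexes / total_strands


def free_positive_per_duplex(fraction_negative: float) -> float:
    """Invert the model: free (+) strands per duplex implied by an observed (-) fraction."""
    if not 0 < fraction_negative <= 0.5:
        raise ValueError("under this model the (-) fraction must lie in (0, 0.5]")
    return (1 - 2 * fraction_negative) / fraction_negative


# Vigorous replication: free (+) RNA vastly outnumbers replication intermediates.
print(round(negative_strand_fraction(1, 1000), 4))  # 0.001
# Only replication centres left: the fraction approaches its 50% ceiling.
print(negative_strand_fraction(1, 0))  # 0.5
# A 40% (-) fraction needs only about half a free (+) strand per duplex.
print(round(free_positive_per_duplex(0.4), 3))  # 0.5
```

Under these assumptions the negative-strand fraction can never exceed 50%, and it climbs towards that ceiling as free (+) RNA disappears, which is what residual replication centres in dying cells would be expected to give.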

My main objections, however, are to the model system they used. The use of cultured cells in the first place, and then their transfection with LINE1 elements to over-express reverse transcriptase, is pretty much guaranteed to “force” outcomes that are highly unusual in natural infections. This is akin to saying “See, if I force-feed mice with 100x the recommended dose of X in the presence of known mutagens, it causes cancer!!” It is a TOTALLY artificial situation, in a transformed human cell line, that has VERY little relevance to the real world.

Of course, they also did the experiment in two cell lines without LINE1 transfection, and found fewer integrations. There is ALWAYS a chance (albeit very small) that a nucleic acid, RNA or DNA, could be integrated into the genome of a somatic cell via illegitimate recombination or LINE1 element-mediated insertion. HOWEVER: integration of a random piece of SARS2 genome would almost certainly do nothing in that cell; moreover, even if the whole genome were inserted, the cell would be killed by T-cells in the same way as an infected cell is. And they did not find much more than whole or partial N genes integrated, which is a tiny fraction of the relatively huge genome. It could be that the 3′ end of the virus genome has some unusual properties (it is an origin of replication for the virus genome, after all) that favour mRNAs deriving from it interacting with the LINE1 transposition machinery, and being (occasionally) integrated.

While they hypothesised that integrated sequences were responsible for positive PCR tests long after “recovery” from infection, their evidence does not support this, because they have not shown that all of the sequences targeted by PCR primers are present in the genomes of patients, or even of cells in their experiments. Presence of a product for just one viral gene does not constitute a positive diagnosis. Moreover, there is evidence for SARS2 reactivation months after initial infection, which could be explained far more easily by viral persistence in immune-privileged sites, as has also been demonstrated for Ebola virus disease. This persistence, or the survival of dsRNA forms of the genome, or of fragments of it, in dormant replication centres, would be a far more likely reason for persistent PCR positivity.

However, and this is the important point I wanted to make, the ONLY way an insertion from SARS2 (or anything else) could cause any sort of a problem is if that insertion results in runaway malignant transformation (far less likely than the insertion event itself), or if it inserts into germline cells (egg or sperm precursors) AND is passed on to progeny. There, the probabilities start getting very, very small indeed.
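Written out as a rough probability chain (the notation is illustrative and not from any of the papers discussed), with $I$ an insertion event, $T$ malignant transformation, $G$ insertion into a germline cell and $H$ transmission to progeny, the chance of harm is bounded by:

```latex
P(\text{harm}) \;\le\; P(I)\,\Big[\, P(T \mid I) \;+\; P(G \mid I)\, P(H \mid I \cap G) \,\Big]
```

Each term in the bracket is itself a small conditional probability multiplying an already small $P(I)$, which is why the product shrinks so quickly.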

So: a fuss about nothing is what this “result” is. I bet you they could have shown the same for ANY RNA under the same set of conditions, and it would still mean nothing. You are a LOT more likely to have bits of nucleic acid from lettuce or tomatoes insert into gut cells, given that you eat them FAR more often, and in quantities FAR greater than you are exposed to from a virus – and has anyone ever reported a problem with those?


So don’t worry about this much-hyped “discovery”.

Added 31/05/2021:

Aaaaaaaand…here’s someone who disliked the paper enough to refute it thoroughly, by experiment, no less! Nathan Smits et al. used nanopore long-read sequencing and found NO evidence of SARS2 sequences flanked by human DNA, in a context where they COULD find single integrated HBV genomes and multiple LINE1 insertions.

Human genome integration of SARS-CoV-2 contradicted by long-read sequencing


A recent study proposed severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) hijacks the LINE-1 (L1) retrotransposition machinery to integrate into the DNA of infected cells. If confirmed, this finding could have significant clinical implications. Here, we applied deep (>50x) long-read Oxford Nanopore Technologies (ONT) sequencing to HEK293T cells infected with SARS-CoV-2, and did not find any evidence of the virus existing as DNA. By examining ONT data from separate HEK293T cultivars, we resolved the complete sequences of 78 L1 insertions arising in vitro in the absence of L1 overexpression systems. ONT sequencing applied to hepatitis B virus (HBV) positive liver cancer tissues located a single HBV insertion. These experiments demonstrate reliable resolution of retrotransposon and exogenous virus insertions via ONT sequencing. That we found no evidence of SARS-CoV-2 integration suggests such events in vivo are highly unlikely to drive later oncogenesis or explain post-recovery detection of the virus.

Added 09/06/2021:

…and then someone else actually went and found SARS2 RNA in degraded lung tissue!

Persistence of SARS-CoV-2 RNA in lung tissue after mild COVID-19

On Dec 1, 2020, we reported a successful case of double-lung transplantation from a SARS-CoV-2 seropositive donor 105 days after the onset of mild COVID-19.¹ Although repeated quantitative (q)RT-PCR analyses of donor nasopharyngeal swabs were negative, this technique detected RNA of the SARS-CoV-2 N gene (delta Ct 35) from a biopsy of the right lung taken during organ procurement. Viral culture of this biopsy was negative and donor-to-recipient transmission did not occur. Complementary orthogonal methods were needed to corroborate and interpret the qRT-PCR results. Therefore, we did ultrasensitive single-molecule fluorescence RNA in-situ hybridisation with RNAscope technology on formalin-fixed paraffin-embedded sections of the same lung biopsy (appendix p 1), and compared the results with those of a lung biopsy from a deceased patient with acute COVID-19 (figure A and B; appendix p 2). We stained 14 slides of the donor lung biopsy, each containing one 5 μm section, as follows: five slides with a probe for the N gene; five slides with a probe for the S gene; and four slides with probes for N and S. A probe for the basigin gene, which has been proposed to encode an alternative host recipient for SARS-CoV-2, served as a positive control on the ten slides stained for N or S only.² We identified characteristic RNAscope puncta in three out of nine slides for the N probe, and in six out of nine slides for the S probe (figure C and D). These puncta appeared to be located in clumps of sloughed-off material, and no cells or cell nuclei could be discerned in this debris-like tissue. [my emphasis]
