{"id":31922,"date":"2023-08-05T08:58:36","date_gmt":"2023-08-05T08:58:36","guid":{"rendered":"https:\/\/evaggelatos.com\/?p=31922"},"modified":"2023-08-05T08:58:36","modified_gmt":"2023-08-05T08:58:36","slug":"reverse-transcribed-sars-cov-2-rna-can-integrate-into-the-genome-of-cultured-human-cells-and-can-be-expressed-in-patient-derived-tissues-2","status":"publish","type":"post","link":"https:\/\/evaggelatos.com\/?p=31922","title":{"rendered":"Reverse-transcribed SARS-CoV-2 RNA can integrate into the genome of cultured human cells and can be expressed in patient-derived tissues"},"content":{"rendered":"<p>Liguo Zhang<sup>a<\/sup>, Alexsia Richards<sup>a<\/sup>, M. Inmaculada Barrasa<sup>a<\/sup>, Stephen H. Hughes<sup>b<\/sup>, Richard A. Young<sup>a,c<\/sup>, and Rudolf Jaenisch<sup>a,c,1<\/sup><\/p>\n<p><sup>a<\/sup>Whitehead Institute for Biomedical Research, Cambridge, MA 02142; <sup>b<\/sup>HIV Dynamics and Replication Program, Center for Cancer Research, National Cancer Institute, Frederick, MD 21702; and <sup>c<\/sup>Department of Biology, Massachusetts Institute of Technology, Cambridge, MA 02142<\/p>\n<p>Contributed by Rudolf Jaenisch, April 19, 2021 (sent for review March 29, 2021; reviewed by Anton Berns and Anna Marie Skalka)<\/p>\n<p>Prolonged detection of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) RNA and recurrence of PCR-positive tests have been widely reported in patients after recovery from COVID-19, but some of these patients do not appear to shed infectious virus. We investigated the possibility that SARS-CoV-2 RNAs can be reverse-transcribed and integrated into the DNA of human cells in culture and that transcription of the integrated sequences might account for some of the positive PCR tests seen in patients. In support of this hypothesis, we found that DNA copies of SARS-CoV-2 sequences can be integrated into the genome of infected human cells. 
We found target site duplications flanking the viral sequences and consensus LINE1 endonuclease recognition sequences at the integration sites, consistent with a LINE1 retrotransposon-mediated, target-primed reverse transcription and retroposition mechanism. We also found, in some patient-derived tissues, evidence suggesting that a large fraction of the viral sequences is transcribed from integrated DNA copies of viral sequences, generating viral\u2013host chimeric transcripts. The integration and transcription of viral sequences may thus contribute to the detection of viral RNA by PCR in patients after infection and clinical recovery. Because we have detected only subgenomic sequences, derived mainly from the 3\u2032 end of the viral genome, integrated into the DNA of the host cell, infectious virus cannot be produced from the integrated subgenomic SARS-CoV-2 sequences.<\/p>\n<p>SARS-CoV-2 | reverse transcription | LINE1 | genomic integration | chimeric RNAs<\/p>\n<p>Continuous or recurrent positive severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) PCR tests have been reported in samples taken from patients weeks or months after recovery from an initial infection (1\u201317). Although bona fide reinfection with SARS-CoV-2 after recovery has recently been reported (18), cohort-based studies with subjects held in strict quarantine after they recovered from COVID-19 suggested that at least some \u201cre-positive\u201d cases were not caused by reinfection (19, 20). Furthermore, no replication-competent virus was isolated from or spread by these PCR-positive patients (1\u20133, 5, 6, 12, 16), and the cause of the prolonged and recurrent production of viral RNA remains unknown. SARS-CoV-2 is a positive-stranded RNA virus. 
Like other beta-coronaviruses (SARS-CoV-1 and Middle East respiratory syndrome-related coronavirus), SARS-CoV-2 employs an RNA-dependent RNA polymerase to replicate its genomic RNA and transcribe subgenomic RNAs (21\u201324). One possible explanation for the continued detection of SARS-CoV-2 viral RNA in the absence of virus reproduction is that, in some cases, DNA copies of viral subgenomic RNAs may integrate into the DNA of the host cell by a reverse transcription mechanism. Transcription of the integrated DNA copies could be responsible for positive PCR tests long after the initial infection was cleared. Indeed, nonretroviral RNA virus sequences have been detected in the genomes of many vertebrate species (25, 26), with several integrations exhibiting signals consistent with the integration of DNA copies of viral mRNAs into the germline via ancient long interspersed nuclear element (LINE) retrotransposons (reviewed in ref. 27). Furthermore, nonretroviral RNA viruses such as vesicular stomatitis virus or lymphocytic choriomeningitis virus (LCMV) can be reverse transcribed into DNA copies by an endogenous reverse transcriptase (RT), and DNA copies of the viral sequences have been shown to integrate into the DNA of host cells (28\u201330). In addition, cellular RNAs, for example the human APP transcripts, have been shown to be reverse-transcribed by endogenous RT in neurons, with the resultant APP fragments integrated into the genome and expressed (31). Human LINE1 elements (\u223c17% of the human genome), autonomous retrotransposons that can retrotranspose themselves and other nonautonomous elements such as Alu, are a source of cellular endogenous RT (32\u201334). 
Endogenous LINE1 elements have been shown to be expressed in aged human tissues (35), and LINE1-mediated somatic retrotransposition is common in cancer patients (36, 37). Moreover, expression of endogenous LINE1 and other retrotransposons in host cells is commonly up-regulated upon viral infection, including SARS-CoV-2 infection (38\u201340).<\/p>\n<p>Significance<\/p>\n<p>An unresolved issue of SARS-CoV-2 disease is that patients often remain positive for viral RNA as detected by PCR many weeks after the initial infection in the absence of evidence for viral replication. We show here that SARS-CoV-2 RNA can be reverse-transcribed and integrated into the genome of the infected cell and be expressed as chimeric transcripts fusing viral with cellular sequences. Importantly, such chimeric transcripts are detected in patient-derived tissues. Our data suggest that, in some patient tissues, the majority of all viral transcripts are derived from integrated sequences. Our data provide insight into a consequence of SARS-CoV-2 infections that may help to explain why patients can continue to produce viral RNA after recovery.<\/p>\n<p>Author contributions: L.Z., R.A.Y., and R.J. designed research; L.Z. and A.R. performed experiments; L.Z., A.R., M.I.B., S.H.H., R.A.Y., and R.J. analyzed data; and L.Z. and R.J. wrote the paper with input from all authors.<\/p>\n<p>Reviewers: A.B., Netherlands Cancer Institute; and A.M.S., Fox Chase Cancer Center.<\/p>\n<p>Competing interest statement: R.J. is an advisor\/co-founder of Fate Therapeutics, Fulcrum Therapeutics, Omega Therapeutics, and Dewpoint Therapeutics. R.A.Y. 
is a founder and shareholder of Syros Pharmaceuticals, Camp4 Therapeutics, Omega Therapeutics, and Dewpoint Therapeutics.<\/p>\n<p>This open access article is distributed under Creative Commons Attribution License 4.0 (CC BY).<\/p>\n<p><sup>1<\/sup>To whom correspondence may be addressed. Email: jaenisch@wi.mit.edu.<\/p>\n<p>This article contains supporting information online at https:\/\/www.pnas.org\/lookup\/suppl\/doi:10.1073\/pnas.2105968118\/-\/DCSupplemental.<\/p>\n<p>Published May 6, 2021.<\/p>\n<p>PNAS 2021 Vol. 118 No. 21 e2105968118 https:\/\/doi.org\/10.1073\/pnas.2105968118 | 1 of 10<\/p>\n<p>In this study, we show that SARS-CoV-2 sequences can integrate into the host cell genome by a LINE1-mediated retroposition mechanism. We provide evidence that the integrated viral sequences can be transcribed and that, in some patient samples, the majority of viral transcripts appear to be derived from integrated viral sequences.<\/p>\n<p>Results<\/p>\n<p>Integration of SARS-CoV-2 Sequences into the DNA of Host Cells in Culture. We used three different approaches to detect SARS-CoV-2 sequences integrated into the genome of infected cells: Nanopore long-read sequencing, Illumina paired-end whole-genome sequencing, and Tn5 tagmentation-based DNA integration site enrichment sequencing. All three methods provided evidence that SARS-CoV-2 sequences can be integrated into the genome of the host cell.<\/p>\n<p>To increase the likelihood of detecting rare integration events, we transfected HEK293T cells with LINE1 expression plasmids prior to infection with SARS-CoV-2 and isolated DNA from the cells 2 d after infection (SI Appendix, Fig. S1A). We detected DNA copies of SARS-CoV-2 nucleocapsid (NC) sequences in the infected cells by PCR (SI Appendix, Fig. 
S1B) and cloned the complete NC gene (SI Appendix, Fig. S1D) from large-fragment cell genomic DNA that had been gel-purified (SI Appendix, Fig. S1C). The viral DNA sequence (NC) was confirmed by Sanger sequencing (Dataset S1). These results suggest that SARS-CoV-2 RNA can be reverse-transcribed and that the resulting DNA could be integrated into the genome of the host cell.<\/p>\n<p>To demonstrate directly that the SARS-CoV-2 sequences were integrated into the host cell genome, we used DNA isolated from infected LINE1-overexpressing HEK293T cells for Nanopore long-read sequencing (Fig. 1A). Fig. 1 B\u2013D shows an example of a full-length viral NC subgenomic RNA sequence (1,662 bp) integrated into chromosome X of the host cell and flanked on both sides by host DNA sequences. Importantly, the flanking sequences included a 20-bp direct repeat. This target site duplication is a signature of LINE1-mediated retro-integration (41, 42). Another viral integrant, comprising a partial NC subgenomic RNA sequence flanked by a duplicated host cell DNA target sequence, is shown in SI Appendix, Fig. S2 A\u2013C. In both cases, the flanking sequences contained a consensus recognition sequence of the LINE1 endonuclease (43). These results indicate that SARS-CoV-2 sequences can be integrated into the genomes of cultured human cells by a LINE1-mediated retroposition mechanism. Table 1 summarizes all of the linked SARS-CoV-2\u2013host sequences that were recovered. DNA copies of portions of the viral genome were found in almost all human chromosomes (20 of the 23 chromosomes listed in Table 1). In addition to the two examples given in Fig. 1 and SI Appendix, Fig. 
S2, we also recovered cellular sequences for 61 integrants for which only one of the two host\u2013viral junctions was retrieved (SI Appendix, Fig. S2 D\u2013F and Table 1; Nanopore reads containing the chimeric sequences are summarized in Dataset S2). Importantly, about 67% (42 of 63) of the flanking human sequences included either a consensus or a variant LINE1 endonuclease recognition sequence (such as TTTT\/A) (SI Appendix, Fig. S2 D\u2013F and Table 1). These LINE1 recognition sequences were either at the chimeric junctions directly linked to the 3\u2032 end (poly-A tail) of viral sequences or within 8\u201327 bp of the junctions linked to the 5\u2032 end of viral sequences, that is, within the potential target site duplication. Both results are consistent with a model in which LINE1-mediated retroposition provides a mechanism to integrate DNA copies of SARS-CoV-2 subgenomic fragments into host genomic DNA. About 71% of the viral sequences were flanked by intronic or intergenic cellular sequences and 29% by exons (Fig. 1F and Table 1). Thus, the association of the viral sequences with exons is much higher than would be expected for random integration into the genome [human genome: 1.1% exons, 24% introns, and 75% intergenic DNA (44)], suggestive of preferential integration into exon-associated target sites. While previous studies showed no preference for LINE1 retroposition into exons (45, 46), our finding suggests that LINE1-mediated retroposition of some other RNAs may be different. 
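<\/p>\n<p>The two junction features used above, a target site duplication flanking the insert and a LINE1 endonuclease recognition sequence near the junction, can be checked with short string routines. The sketch below is illustrative only and rests on several assumptions. It assumes the host flanks have already been cut out of a chimeric read. It treats any 5-mer within one mismatch of the consensus TTTT\/A as a variant; this cutoff was chosen because it covers every variant shown in the figures, such as TTCT\/A, TTTT\/G, and TTTA\/A. For simplicity it also scans both strands. The function names are hypothetical and do not describe the authors' pipeline. The last helper gives an exact binomial tail probability for the exon enrichment. It treats junctions as independent draws and uses the 1.1% exon fraction quoted above as the null expectation.<\/p>
```python
from math import comb

# Hypothetical helpers for annotating host-viral junctions (illustrative sketch).

COMPLEMENT = str.maketrans('ACGT', 'TGCA')
L1_CONSENSUS = 'TTTTA'  # LINE1 endonuclease consensus 5'-TTTT/A-3'


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def target_site_duplication(left_flank, right_flank, min_len=5):
    # A target site duplication is a direct repeat: the host bases at the end of
    # the left flank reappear at the start of the right flank. Return the
    # longest such repeat with at least min_len bases, or an empty string.
    longest = min(len(left_flank), len(right_flank))
    for k in range(longest, min_len - 1, -1):
        candidate = right_flank[:k]
        if candidate and left_flank.endswith(candidate):
            return candidate
    return ''


def line1_sites(seq, max_mismatches=1):
    # Report 5-mers within max_mismatches of TTTTA on either strand as
    # (top-strand position, strand, motif written with the cut site as '/').
    k = len(L1_CONSENSUS)
    hits = []
    for strand, s in (('+', seq), ('-', reverse_complement(seq))):
        for i in range(len(s) - k + 1):
            kmer = s[i:i + k]
            mismatches = sum(a != b for a, b in zip(kmer, L1_CONSENSUS))
            if mismatches <= max_mismatches:
                pos = i if strand == '+' else len(s) - k - i
                hits.append((pos, strand, kmer[:4] + '/' + kmer[4]))
    return hits


def binomial_upper_tail(k, n, p):
    # Exact P(X >= k) for X ~ Binomial(n, p).
    return sum(comb(n, j) * p**j * (1 - p)**(n - j) for j in range(k, n + 1))


# Usage with toy sequences and the Table 1 counts
print(target_site_duplication('GGATCCAAGT', 'CAAGTTTGA'))  # CAAGT
print(line1_sites('GCTTCTAG'))  # [(2, '+', 'TTCT/A')]
print(binomial_upper_tail(18, 63, 0.011))  # 18 of 63 junctions in exons/UTRs
```
<p>With real data, the flanks would come from the host segments of a chimeric read. The 5-bp minimum duplication length is an arbitrary choice in this sketch; the duplication in Fig. 1B spans 20 bp.<\/p>\n<p>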
We noted that viral\u2013cellular boundaries were frequently close to the 5\u2032 or 3\u2032 untranslated regions (UTRs) of the cellular genes, suggesting a preference for integration close to promoters or poly(A) sites in our experimental system.<\/p>\n<p>To confirm the integration of SARS-CoV-2 sequences into genomic DNA by another method, we subjected DNA isolated from LINE1-transfected and SARS-CoV-2\u2013infected HEK293T cells to Illumina paired-end whole-genome sequencing, using a Tn5-based library construction method (Illumina Nextera) to avoid ligation artifacts. Viral DNA reads were concentrated at the 3\u2032 end of the SARS-CoV-2 genome (SI Appendix, Fig. S3). By mapping human\u2013viral chimeric DNA sequences, we recovered 17 viral integrants (sum of two replicates; Fig. 1E and Table 2; chimeric sequences are summarized in Dataset S3). Of these junctions, 7 (41%) contained either a consensus or a variant LINE1 recognition sequence in the cellular sequences near the junction (Fig. 1E and Table 2), consistent with a LINE1-mediated retroposition mechanism. Similar to the results obtained from Nanopore sequencing, about 76% of the viral sequences were flanked by intronic or intergenic cellular sequences and 24% by exons (Fig. 
1F and Table 2).<\/p>\n<p>Of the integrants without evidence for a LINE1 recognition site, about 32% (6 of 21 in the Nanopore data and 4 of 10 in the Illumina data) were integrated at LINEs, short interspersed nuclear elements, or long terminal repeat elements. This suggests that there may be an alternative reverse transcription\/integration mechanism. Such a mechanism could resemble the one reported for cells acutely infected with LCMV, which resulted in integrated LCMV sequences fused to intracisternal A-type particle (IAP) sequences (29).<\/p>\n<p>To assess whether genomic integration of SARS-CoV-2 sequences could also occur in infected cells that did not overexpress RT, we isolated DNA from virus-infected HEK293T and Calu3 cells that were not transfected with an RT expression plasmid (Fig. 2A). Tn5 tagmentation-mediated DNA integration site enrichment sequencing (47, 48) (Fig. 2B and SI Appendix, Fig. S4A) detected a total of seven SARS-CoV-2 sequences fused to cellular sequences in these cells (sum of three independent infections of two cell lines). All of these showed LINE1 recognition sequences close to the human\u2013SARS-CoV-2 sequence junctions (Fig. 2 C\u2013F and SI Appendix, Fig. S4 B\u2013D; chimeric sequences are summarized in Dataset S4).<\/p>\n<p>Expression of Viral\u2013Cellular Chimeric Transcripts in Infected Cultured Cells and Patient-Derived Tissues. To investigate the possibility that SARS-CoV-2 sequences integrated into the genome can be expressed, we analyzed published RNA-seq data from SARS-CoV-2\u2013infected cells for evidence of chimeric transcripts (49). Examination of these datasets (50\u201355) (SI Appendix, Fig. S5) revealed a number of human\u2013viral chimeric reads (SI Appendix, Fig. S6 A and B). 
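<\/p>\n<p>Calling a human\u2013viral junction, whether from a Nanopore read, an Illumina read pair, or a chimeric RNA-seq read, amounts to finding adjacent alignment segments of one read that map to different genomes. The sketch below makes two assumptions. First, the read has already been aligned to a combined reference of hg38 plus the SARS-CoV-2 genome NC_045512.2. Second, the aligner reports each locally aligned part of the read as a separate segment. The Segment record and the function names are hypothetical simplifications and do not describe the tools used in the study. A positive overlap marks bases that align to both genomes, like the sequences shown in purple in Fig. 1B.<\/p>
```python
from dataclasses import dataclass

VIRAL_REF = 'NC_045512.2'  # SARS-CoV-2 reference sequence name (assumed)


@dataclass
class Segment:
    # One locally aligned part of a read (hypothetical, simplified record).
    ref: str      # reference name, e.g. 'chrX' or VIRAL_REF
    q_start: int  # 0-based start of the segment on the read
    q_end: int    # end of the segment on the read (exclusive)


def is_viral(seg):
    return seg.ref == VIRAL_REF


def chimeric_junctions(segments):
    # Walk the segments in read order and report every adjacent pair that
    # switches between the human and viral references as
    # (left reference, right reference, read position, overlap).
    # overlap > 0: bases assignable to both genomes (microhomology);
    # overlap < 0: unaligned bases between the two segments.
    ordered = sorted(segments, key=lambda s: s.q_start)
    junctions = []
    for left, right in zip(ordered, ordered[1:]):
        if is_viral(left) != is_viral(right):
            overlap = left.q_end - right.q_start
            junctions.append((left.ref, right.ref, left.q_end, overlap))
    return junctions


# Illustrative human-viral-human read shaped like Fig. 1B (made-up coordinates:
# 43 bp human, 1,662 bp viral, 450 bp human)
read = [
    Segment('NC_045512.2', 40, 1702),
    Segment('chrX', 0, 43),
    Segment('chrX', 1700, 2150),
]
for junction in chimeric_junctions(read):
    print(junction)
```
<p>A read with only one host\u2013viral junction, like the 61 one-sided integrants in Table 1, simply yields a single tuple.<\/p>\n<p>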
These occurred in multiple sample types, including cultured cells and organoids from lung\/heart\/brain\/stomach tissues (SI Appendix, Fig. S6B). The abundance of the chimeric reads correlated positively with viral RNA levels across the sample types (SI Appendix, Fig. S6B). Chimeric reads generally accounted for 0.004\u20130.14% of the total SARS-CoV-2 reads in the samples. A majority of the chimeric junctions mapped to the sequence of the SARS-CoV-2 NC gene (SI Appendix, Fig. S6 C and D). This is consistent with the finding that NC RNA is the most abundant SARS-CoV-2 subgenomic RNA (56), making it the most likely target for reverse transcription and integration. However, recent data showed that up to 1% of RNA-seq reads from SARS-CoV-2\u2013infected cells can be artifactually chimeric as a result of RT switching between RNA templates, which can occur during the cDNA synthesis step in the preparation of an RNA-seq library (57).<\/p>\n<p>[Figure 1 graphics not reproduced. Recoverable panel details: (A) HEK293T cells + LINE1 expression, \u00b1 SARS-CoV-2 infection, cell DNA extraction (+RNase), then whole-genome sequencing (Nanopore; Illumina paired-end). (B\u2013D) \u201cHuman\u2013CoV2\u2013human\u201d chimeric Nanopore read, human (43 bp), SARS-CoV-2 (1,662 bp), human (450 bp), aligned to human chrX (around 137,084,400\u2013137,084,800) and to the SARS-CoV-2 genome (NC_045512.2); target site duplication and LINE1 endonuclease recognition sequence (TTCT|A). (E) \u201cHuman\u2013CoV2\u201d chimeric read pair (Illumina 2 \u00d7 150-bp paired-end) aligned to human chr12 (hg38) and to the 3\u2032 end of the SARS-CoV-2 genome; LINE1 endonuclease recognition sequence (TTTT|C). (F) Human\u2013CoV2 chimeric junction distribution in the human genome: Nanopore, intergenic n = 18 (28.6%), intron n = 27 (42.9%), exon\/UTR n = 18 (28.6%); Illumina, intergenic n = 4 (23.5%), intron n = 9 (53%), exon\/UTR n = 4 (23.5%).]<\/p>\n<p>Fig. 1. SARS-CoV-2 RNA can be reverse transcribed and integrated into the host cell genome. (A) Experimental workflow. (B) Chimeric sequence from a Nanopore sequencing read showing integration of a full-length SARS-CoV-2 NC subgenomic RNA sequence (magenta), with human genomic sequences (blue) flanking both sides of the integrated viral sequence. Features indicative of LINE1-mediated \u201ctarget-primed reverse transcription\u201d include the target site duplication (yellow highlight) and the LINE1 endonuclease recognition sequence (underlined). Sequences that could be mapped to both genomes are shown in purple, with mismatches to the human genomic sequences in italics. The arrows indicate sequence orientation with regard to the human and SARS-CoV-2 genomes as shown in C and D. (C) Alignment of the Nanopore read in B with the human genome (chromosome X) showing the integration site. The human sequences at the junction region show the target site that was duplicated when the SARS-CoV-2 cDNA was integrated (yellow highlight), and the LINE1 endonuclease recognition sequence (underlined). (D) Alignment of the Nanopore read in B with the SARS-CoV-2 genome showing that the integrated viral DNA is a copy of the full-length NC subgenomic RNA. The light blue highlighted regions are enlarged to show the TRS-L (I) and TRS-B (II) sequences (underlined; these are the sequences where the viral polymerase jumps to generate the subgenomic RNA) and the end of the viral sequence at the poly(A) tail (III). These viral sequence features (I\u2013III) show that a DNA copy of the full-length NC subgenomic RNA was retro-integrated. (E) A human\u2013viral chimeric read pair from Illumina paired-end whole-genome sequencing. The read pair is shown with alignment to the human (blue) and SARS-CoV-2 (magenta) genomes. The arrows indicate the read orientations relative to the human and SARS-CoV-2 genomes. The highlighted (light blue) region of the human read mapping is enlarged to show the LINE1 recognition sequence (underlined). (F) Distributions of human\u2013CoV2 chimeric junctions from Nanopore (Left) and Illumina (Right) sequencing with regard to features of the human genome.<\/p>\n<p>MEDICAL SCIENCES<\/p>\n<p>Downloaded at University of Patras on May 19, 2021<\/p>\n<p>
Thus, because infected cells contain a mixture of host mRNAs and positive-strand viral mRNAs, the generation of artifactual chimeras in the assays compromises the identification of genuine chimeric viral\u2013cellular RNA transcripts.<\/p>\n<p>We reasoned that the orientation of an integrated DNA copy of SARS-CoV-2 RNA should be random with respect to the orientation of the targeted host gene. This predicts that about half of the viral DNAs integrated into an expressed host gene should be in the orientation opposite to the direction of the host gene\u2019s transcription (Fig. 3A). As predicted, \u223c50% (13 of 28) of viral integrants in human genes were in the opposite orientation relative to the host gene in our Nanopore dataset (integration at human genes with LINE1 recognition sequences; Fig. 3B). Thus, for chimeric transcripts derived from integrated viral sequences, we would expect \u223c50% of the chimeric transcripts to contain negative-strand viral sequences linked to positive-strand host RNA sequences. We therefore determined the fraction of the viral and human\u2013viral chimeric transcripts containing negative-strand viral RNA sequences in infected cultured cells\/organoids and in patient-derived tissues.<\/p>\n<p>The replication of SARS-CoV-2 RNA requires the synthesis of negative-strand viral RNA, which serves as the template for replication of viral genomic RNA and for transcription of viral subgenomic positive-strand RNA (21). 
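<\/p>\n<p>The orientation argument can be made quantitative in two small steps. First, an exact binomial test asks whether the Nanopore split of 15 same-orientation versus 13 opposite-orientation integrants (Fig. 3B) is consistent with the 50:50 expectation for random orientation. Second, a simplified two-component mixture model converts an observed negative-strand fraction into a rough share of viral reads coming from integrated copies. The model rests on two assumptions. Reads from free viral RNA are negative-strand at a small background rate; the default of 0.001 mirrors the 0.1% upper bound reported below for acutely infected cells. Reads from integrants, being randomly oriented, are negative-strand half of the time. This model is an illustrative assumption, not the authors' analysis. It ignores, for example, unequal expression of integrants and possible strand bias in library preparation.<\/p>
```python
from math import comb


def two_sided_binomial_p(k, n, p=0.5):
    # Exact two-sided binomial test: add up the probabilities of all outcomes
    # that are no more likely than the observed count k.
    probs = [comb(n, j) * p**j * (1 - p)**(n - j) for j in range(n + 1)]
    cutoff = probs[k] * (1 + 1e-9)  # small tolerance for floating-point ties
    return min(1.0, sum(q for q in probs if q <= cutoff))


def integrated_fraction(neg_fraction, background=0.001):
    # Mixture model (assumption): neg = (1 - phi) * background + phi * 0.5,
    # where phi is the share of viral reads derived from integrated copies.
    # Solve for phi and clip to the valid range [0, 1].
    phi = (neg_fraction - background) / (0.5 - background)
    return min(1.0, max(0.0, phi))


# 13 of 28 integrants in the opposite orientation (Fig. 3B)
print(two_sided_binomial_p(13, 28))  # about 0.85: no departure from 50:50

# Negative-strand fractions of the kind reported for acute infection and patients
for observed in (0.001, 0.2, 0.51):
    print(observed, round(integrated_fraction(observed), 2))
```
<p>Under these assumptions, negative-strand fractions of about 20% to 51%, as reported for some patient samples, would correspond to roughly 40% to essentially all viral reads arising from integrated copies. These are rough, model-dependent figures rather than measurements.<\/p>\n<p>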
To assess the prevalence of negative-strand viral RNA in acutely infected cells, we determined the ratio of total positive- to negative-strand viral RNA. Between 0 and 0.1% of total viral reads were derived from negative-strand RNA in acutely infected Calu3 cells or lung organoids [our data and published data (50, 58)] (Fig. 3C and SI Appendix, Table S1), similar to what has been reported in clinical samples taken early after infection (59). These results argue that the level of negative-strand viral RNA is at least 1,000-fold lower than that of positive-strand viral RNA in acutely infected cells. This is due at least in part to the massive production of positive-strand subgenomic RNA during viral replication. It greatly reduces the likelihood that random template switching during the reverse transcription step of RNA-seq library construction would generate a large fraction of artifactual chimeric reads containing viral negative-strand RNA fused to cellular positive-strand RNA sequences. We determined that between 0 and 1% of human\u2013viral chimeric reads contained negative-strand viral sequences in the acutely infected cells\/organoids (Fig. 3D and SI Appendix, Table S1), consistent with a small fraction of viral reads being derived from integrated SARS-CoV-2 sequences.<\/p>\n<p>Table 1. Summary of the human\u2013CoV2 chimeric sequences obtained by Nanopore DNA sequencing of infected LINE1-overexpressing HEK293T cells<\/p>\n<table><thead><tr><th>Chromosome<\/th><th>Sequences with human\u2013CoV2 junction<\/th><th>With LINE1 recognition sequence at\/near junction (e.g., TTTT\/A)<\/th><th>Junction in human intergenic region<\/th><th>Junction in human intron<\/th><th>Junction in human exon\/UTR<\/th><\/tr><\/thead><tbody><tr><td>chr1<\/td><td>10<\/td><td>6<\/td><td>0<\/td><td>6<\/td><td>4<\/td><\/tr><tr><td>chr2<\/td><td>2<\/td><td>2<\/td><td>0<\/td><td>2<\/td><td>0<\/td><\/tr><tr><td>chr3<\/td><td>3<\/td><td>3<\/td><td>0<\/td><td>3<\/td><td>0<\/td><\/tr><tr><td>chr4<\/td><td>2<\/td><td>2<\/td><td>0<\/td><td>1<\/td><td>1<\/td><\/tr><tr><td>chr5<\/td><td>1<\/td><td>1<\/td><td>0<\/td><td>1<\/td><td>0<\/td><\/tr><tr><td>chr6<\/td><td>4<\/td><td>2<\/td><td>3<\/td><td>0<\/td><td>1<\/td><\/tr><tr><td>chr7<\/td><td>2<\/td><td>2<\/td><td>1<\/td><td>1<\/td><td>0<\/td><\/tr><tr><td>chr8<\/td><td>0<\/td><td>0<\/td><td>0<\/td><td>0<\/td><td>0<\/td><\/tr><tr><td>chr9<\/td><td>4<\/td><td>2<\/td><td>0<\/td><td>2<\/td><td>2<\/td><\/tr><tr><td>chr10<\/td><td>5<\/td><td>1<\/td><td>2<\/td><td>1<\/td><td>2<\/td><\/tr><tr><td>chr11<\/td><td>3<\/td><td>2<\/td><td>1<\/td><td>1<\/td><td>1<\/td><\/tr><tr><td>chr12<\/td><td>6<\/td><td>4<\/td><td>2<\/td><td>2<\/td><td>2<\/td><\/tr><tr><td>chr13<\/td><td>3<\/td><td>3<\/td><td>3<\/td><td>0<\/td><td>0<\/td><\/tr><tr><td>chr14<\/td><td>2<\/td><td>2<\/td><td>1<\/td><td>1<\/td><td>0<\/td><\/tr><tr><td>chr15<\/td><td>0<\/td><td>0<\/td><td>0<\/td><td>0<\/td><td>0<\/td><\/tr><tr><td>chr16<\/td><td>2<\/td><td>1<\/td><td>1<\/td><td>1<\/td><td>0<\/td><\/tr><tr><td>chr17<\/td><td>2<\/td><td>0<\/td><td>1<\/td><td>0<\/td><td>1<\/td><\/tr><tr><td>chr18<\/td><td>2<\/td><td>1<\/td><td>0<\/td><td>2<\/td><td>0<\/td><\/tr><tr><td>chr19<\/td><td>1<\/td><td>1<\/td><td>0<\/td><td>0<\/td><td>1<\/td><\/tr><tr><td>chr20<\/td><td>0<\/td><td>0<\/td><td>0<\/td><td>0<\/td><td>0<\/td><\/tr><tr><td>chr21<\/td><td>2<\/td><td>1<\/td><td>1<\/td><td>1<\/td><td>0<\/td><\/tr><tr><td>chr22<\/td><td>1<\/td><td>1<\/td><td>0<\/td><td>1<\/td><td>0<\/td><\/tr><tr><td>chrX<\/td><td>6<\/td><td>5<\/td><td>2<\/td><td>1<\/td><td>3<\/td><\/tr><tr><td>Total<\/td><td>63<\/td><td>42<\/td><td>18<\/td><td>27<\/td><td>18<\/td><\/tr><tr><td>Fraction<\/td><td><\/td><td>66.7%<\/td><td>28.6%<\/td><td>42.9%<\/td><td>28.6%<\/td><\/tr><\/tbody><\/table>\n<p>Table 2. Summary of the human\u2013CoV2 chimeric sequences obtained by Illumina paired-end whole-genome DNA sequencing of infected LINE1-overexpressing HEK293T cells<\/p>\n<table><thead><tr><th>Human region feature<\/th><th>Intergenic<\/th><th>Intron<\/th><th>Exon\/UTR<\/th><\/tr><\/thead><tbody><tr><td>Number of junctions<\/td><td>4<\/td><td>9<\/td><td>4<\/td><\/tr><tr><td>With LINE1 recognition sequence at\/near junction<\/td><td>2<\/td><td>3<\/td><td>2<\/td><\/tr><\/tbody><\/table>\n<p>[Figure 2 graphics not reproduced. Recoverable panel details: (A) HEK293T or Calu3 cells infected with SARS-CoV-2, followed by cell DNA extraction (+RNase), PCR enrichment, and Illumina paired-end sequencing. (B) Random adapter tagmentation by Tn5, then PCR enrichment with viral primers and Illumina paired-end sequencing. (C\u2013E) A human\u2013CoV2 chimeric read pair from HEK293T cells (read 1 and read 2, 151 nt each) aligned to human chr15 (around 68,544,500\u201368,545,000, hg38) and to the 3\u2032 end of the SARS-CoV-2 genome (NC_045512.2, N and ORF10 region), with a LINE1 endonuclease recognition sequence (TTTT|G). (F) Summary of human\u2013CoV2 chimeric sequences:]<\/p>\n<table><thead><tr><th>Cell line<\/th><th>Viral primer target<\/th><th>Human chromosome<\/th><th>LINE1 recognition sequence<\/th><th>Human genomic feature<\/th><\/tr><\/thead><tbody><tr><td>HEK293T<\/td><td>Near 3\u2032 end of viral genome<\/td><td>Chr15<\/td><td>TTTT|G<\/td><td>Intergenic<\/td><\/tr><tr><td>HEK293T<\/td><td>Near 3\u2032 end of viral genome<\/td><td>Chr1<\/td><td>TTTT|G<\/td><td>Intergenic<\/td><\/tr><tr><td>Calu3 (rep1)<\/td><td>Near 5\u2032 end of NC gene<\/td><td>Chr18<\/td><td>CTTT|A<\/td><td>Intergenic<\/td><\/tr><tr><td>Calu3 (rep1)<\/td><td>Near 5\u2032 end of NC gene<\/td><td>Chr2<\/td><td>TTTA|A<\/td><td>UTR<\/td><\/tr><tr><td>Calu3 (rep1)<\/td><td>Near 5\u2032 end of NC gene<\/td><td>Chr12<\/td><td>TTTA|A<\/td><td>Intron<\/td><\/tr><tr><td>Calu3 (rep2)<\/td><td>Near 5\u2032 end of NC gene<\/td><td>Chr12<\/td><td>TTTT|C<\/td><td>Intron<\/td><\/tr><tr><td>Calu3 (rep2)<\/td><td>Near 5\u2032 end of NC gene<\/td><td>Chr4<\/td><td>TTTT|A<\/td><td>Intron<\/td><\/tr><\/tbody><\/table>\n<p>Fig. 2. Evidence for integration of SARS-CoV-2 cDNA in cultured cells that do not overexpress a reverse transcriptase. (A) Experimental workflow. (B) Experimental design for the Tn5 tagmentation-mediated enrichment sequencing method used to map integration sites in the host cell genome. (C) A human\u2013viral chimeric read pair supporting viral integration. The reads are aligned with the human (blue) and SARS-CoV-2 (magenta) genomic sequences. The arrows indicate the read orientations relative to the human and SARS-CoV-2 genomes as shown in D and E. The sequence of the viral primer used for enrichment is shown with green highlight in the read (corresponding to the green arrow illustrated in B). Sequences that could be mapped to both genomes are shown in purple. (D) Alignment of the read pair in C with the human genome (chromosome 15, blue arrow). The highlighted (light blue) region of the human sequence is enlarged to show the LINE1 recognition sequence (underlined) with a 19-base poly-dT sequence (purple highlight) to which the viral poly-A tail could anneal for \u201ctarget-primed reverse transcription.\u201d An additional 5-bp human sequence (GAATG, blue) was captured in read 2 (C), supporting a bona fide integration site. (E) Alignment of the read pair in C with the SARS-CoV-2 genome (magenta). The viral primer sequence is shown with green highlight. (F) Summary of seven human\u2013viral chimeric sequences identified by the enrichment sequencing method in the two cell lines. For each sequence, the panel lists the human chromosome carrying the integration, the LINE1 recognition sequence close to the chimeric junction, and the human genomic feature at the read junction.<\/p>\n<p>In contrast to the results obtained with acutely infected Calu3 cells or lung organoids, up to 51% of all viral reads, and up to 42.5% of human\u2013viral chimeric reads, were derived from the negative-strand SARS-CoV-2 RNA in some patient-derived tissues [published data (60, 61); patient clinical background available in the original publications] (Fig. 3 E\u2013G and SI Appendix, Tables S2 and S3). Single-cell analysis of bronchoalveolar lavage fluid (BALF) cells from patients with severe COVID [published data (61)] showed that up to 40% of all viral reads were derived from the negative-strand SARS-CoV-2 RNA (SI Appendix, Fig. S7). Fractions of negative-strand RNA in tissues from some patients were orders of magnitude higher than those in acutely infected cells or organoids (Fig. 3 C\u2013G). At least \u223c20% of the viral reads were derived from negative-strand viral RNA in formalin-fixed, paraffin-embedded (FFPE) autopsy samples from 4 of 14 patients (Fig. 3E and SI Appendix, Table S2) and in BALF samples from 4 of 6 patients (Fig. 3G and SI Appendix, Table S3). In contrast to acutely infected cells (Fig. 
3 C and D and SI Appendix, Table S1), there was little or no evidence of virus replication in these autopsy samples (60). As summarized in SI Appendix, Table S2, negative-strand viral sequences were present in a large fraction (up to \u223c40%) of the human\u2013viral chimeric reads in samples from one patient, and different samples from the same patient showed a similarly high fraction of negative-strand viral\u2013human RNA reads. Samples from several other patients showed a lower fraction of negative-strand viral RNA\u2013human RNA chimeras, which was nevertheless still significantly higher than that found in acutely infected cells (Fig. 3 D and F and SI Appendix, Tables S1 and S2). Because the ability to identify viral\u2013human chimeric reads with short-read RNA-seq is limited, our analysis did not detect significant numbers of chimeric reads in patient BALF samples (SI Appendix, Table S3).<\/p>\n<p>[Figure 3 panels: (A) schema: viral (sub)genomic RNA gives \u223c100% positive-strand CoV2 reads, whereas random integration and transcription of a CoV2 integrant within a human gene gives \u223c50% positive-strand reads (same orientation) and \u223c50% negative-strand reads (opposite orientation); (B) relative orientation of human gene vs. CoV2 integrant: same, 15 (54%); opposite, 13 (46%); (C and D) fraction of CoV2 reads (C) and of CoV2 chimeric reads (D) from the negative strand in Calu3 cells and lung organoids; (E and F) the same fractions in FFPE autopsy samples from patient cases 1\u20133, 5\u201311, and A\u2013D (E) and 1, 8, 9, 11, C, and D (F); (G) fraction of CoV2 reads from the negative strand in BALF samples from patients C143, C145, C146, C148, C149, and C152.]<\/p>\n<p>Fig. 3. Negative-strand viral RNA-seq reads suggest that integrated SARS-CoV-2 sequences are expressed. (A) Schema predicting the fractions of positive- or negative-strand SARS-CoV-2 RNA-seq reads that are derived from viral (sub)genomic RNAs or from transcripts of integrated viral sequences. The arrows (Right) show the orientation of an integrated SARS-CoV-2 (magenta) positive strand relative to the orientation of the host cellular gene (blue). (B) Fractions of SARS-CoV-2 sequences integrated into human genes with the same (n = 15) or opposite (n = 13) orientation of the viral positive strand relative to the positive strand of the human gene. A total of 28 integration events at human genes with LINE1 endonuclease recognition sequences were identified from our Nanopore DNA sequencing of infected LINE1-overexpressing HEK293T cells (Fig. 1A). (C) Fraction of total viral reads that are derived from negative-strand viral RNA in acutely infected cells or organoids (see SI Appendix, Table S1 for details). (D) Fraction of human\u2013viral chimeric reads that contain viral sequences derived from negative-strand viral RNA in acutely infected cells or organoids (see SI Appendix, Table S1 for details). (E) Fraction of total viral reads that are derived from negative-strand viral RNA in published patient RNA-seq data (autopsy FFPE samples, GSE150316; samples with no viral reads or with low library strandedness quality are not included; see SI Appendix, Table S2 for details; reanalysis results are consistent with the original publication). (F) Fraction of human\u2013viral chimeric reads that contain viral sequences derived from negative-strand viral RNA in published patient RNA-seq data (autopsy FFPE samples, GSE150316; see SI Appendix, Table S2 for details). (G) Fraction of total viral reads that are derived from negative-strand viral RNA in published patient RNA-seq data (BALF samples, GSE145926; see SI Appendix, Table S3 for details). The red dashed lines in E\u2013G indicate the level at which 50% of all viral reads (E and G), or of the viral sequences in human\u2013viral chimeric reads (F), were from negative-strand viral RNAs, the level expected if all the viral sequences were derived from integrated sequences.<\/p>\n<p>
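<\/p>\n<p>The random-orientation expectation behind Fig. 3A can be checked against the counts in Fig. 3B (15 same-orientation and 13 opposite-orientation integrants). The sketch below applies an exact two-sided binomial test with p = 0.5. The choice of test is ours; the text does not report such an analysis.<\/p>

```python
from math import comb


def binomial_two_sided_p(k: int, n: int, p: float = 0.5) -> float:
    # Exact two-sided binomial test: add up the probabilities of every
    # outcome that is no more likely than the observed count k.
    probs = [comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n + 1)]
    observed = probs[k]
    # The small relative tolerance guards against floating-point ties.
    total = sum(q for q in probs if q <= observed * (1 + 1e-9))
    return min(1.0, total)


same, opposite = 15, 13  # integration events from Fig. 3B
n_events = same + opposite
print(f'same-orientation fraction: {same / n_events:.2f}')
print(f'two-sided p (H0: random orientation): {binomial_two_sided_p(same, n_events):.2f}')
```

<p>With the Fig. 3B counts, this prints a same-orientation fraction of 0.54 and a p-value of about 0.85, so the 15:13 split is consistent with random orientation. With only 28 events, the test has little power to detect a modest bias, so it supports the assumption rather than proving it.<\/p>\n<p>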
In summary, our data suggest that in some patient-derived tissues, where the total number of SARS-CoV-2 sequence-positive cells may be small, a large fraction of the viral transcripts could have been transcribed from SARS-CoV-2 sequences integrated into the host genome.<\/p>\n<p>Discussion<\/p>\n<p>We present here evidence that SARS-CoV-2 sequences can be reverse-transcribed and integrated into the DNA of infected human cells in culture. For two of the integrants, we recovered \u201chuman\u2013viral\u2013human\u201d chimeric reads encompassing a direct target site repeat (20 or 13 bp), and a consensus recognition site of the LINE1 endonuclease was present on both ends of the host DNA that flanked the viral sequences. These and other data are consistent with a target-primed reverse transcription and retroposition integration mechanism (41, 42) and suggest that endogenous LINE1 RT can be involved in the reverse transcription and integration of SARS-CoV-2 sequences in the genomes of infected cells.<\/p>\n<p>Approximately 30% of the viral integrants analyzed in cultured cells lacked a recognizable nearby LINE1 endonuclease recognition site. Thus, integration may also occur by another mechanism. Indeed, there is evidence that chimeric cDNAs can be produced in cells acutely infected with LCMV by copy choice with endogenous IAP elements during reverse transcription. This mechanism is expected to create a chimeric cDNA complementary to both LCMV and IAP. In some cases, the resulting chimeric cDNAs were integrated without the generation of a target site duplication (29). 
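<\/p>\n<p>The two retroposition signatures discussed above can be screened for computationally once a human\u2013viral\u2013human read has been split into its left and right human flanks. These signatures are a target site duplication (TSD) bracketing the insert and a LINE1 endonuclease recognition sequence at the junction. The sketch below is a hypothetical illustration. It takes the TSD as the longest direct repeat that ends the left flank and starts the right flank. The motif rule is a TTTT-like 4-mer with at most one mismatch, immediately 5\u2032 of a candidate nick, on either strand. This rule is a loose rendering of the TTTT|A-type sites listed in Fig. 2F, not the criterion the authors used.<\/p>

```python
COMPLEMENT = str.maketrans('ACGTN', 'TGCAN')


def revcomp(seq: str) -> str:
    # Reverse complement of a DNA string (A/C/G/T/N only).
    return seq.upper().translate(COMPLEMENT)[::-1]


def target_site_duplication(left_flank: str, right_flank: str, min_len: int = 5) -> str:
    # Longest direct repeat that ends the left host flank and starts the
    # right host flank, i.e. host sequence duplicated on both sides of the
    # insert. Returns '' if no repeat of at least min_len is found.
    left, right = left_flank.upper(), right_flank.upper()
    for k in range(min(len(left), len(right)), min_len - 1, -1):
        if left[-k:] == right[:k]:
            return left[-k:]
    return ''


def l1_like_sites(seq: str, nick: int, window: int = 5, max_mismatch: int = 1) -> list:
    # Hypothetical motif rule: a nick at position cut (between cut-1 and cut)
    # is reported when the 4 bases 5' of it differ from TTTT at no more than
    # max_mismatch positions. Both strands are scanned within +/- window bp
    # of the candidate junction nick (0-based, + strand coordinates).
    seq = seq.upper()
    n = len(seq)
    hits = []
    for strand, s in (('+', seq), ('-', revcomp(seq))):
        center = nick if strand == '+' else n - nick
        for cut in range(max(4, center - window), min(n - 1, center + window) + 1):
            four = s[cut - 4:cut]
            if sum(base != 'T' for base in four) <= max_mismatch:
                hits.append((strand, cut, f'{four}|{s[cut]}'))
    return hits
```

<p>The 20- and 13-bp repeats reported above would appear here as TSDs of those lengths. The motif scan deliberately over-reports nearby shifted matches, so its hits still need the kind of manual inspection described in Materials and Methods.<\/p>\n<p>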
A recent study has also suggested that interactions between coronavirus sequences and endogenous retrotransposons could be a potential viral integration mechanism (40).<\/p>\n<p>It will be important, in follow-up studies, to demonstrate the presence of SARS-CoV-2 sequences integrated into the host genome in patient tissues. However, this will be technically challenging because only a small fraction of cells in any patient tissue are expected to be positive for viral sequences (61). Consistent with this notion, it has been estimated that only between 1 in 1,000 and 1 in 100,000 mouse cells infected with LCMV, either in culture or in the animal, carried viral DNA copies integrated into the genome (30). In addition, only a fraction of patients may carry SARS-CoV-2 sequences integrated in the DNA of some cells. However, with more than 140 million humans infected with SARS-CoV-2 worldwide (as of April 2021), even a rare event could be of significant clinical relevance.<\/p>\n<p>It is also challenging to estimate the frequency of retrointegration events in cell culture assays, because infected cells usually die and are lost before sample collection. For the same reason, no clonal expansion of cells carrying integrants is expected in acute infection experiments. 
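<\/p>\n<p>The rarity argument above can be made concrete. Suppose each infected cell independently carries an integrant with probability p. A sample containing n infected cells then includes at least one integrant with probability 1 - (1 - p)^n. The sketch below evaluates this for illustrative rates spanning the LCMV estimate (1 in 1,000 to 1 in 100,000). Whether SARS-CoV-2 integrates at similar rates is an assumption, not a measured value.<\/p>

```python
import math


def prob_at_least_one(p: float, n: int) -> float:
    # P(at least one integrant-carrying cell among n infected cells), assuming
    # each cell carries an integrant independently with probability p.
    if not 0.0 <= p <= 1.0 or n < 0:
        raise ValueError('need 0 <= p <= 1 and n >= 0')
    if p == 1.0:
        return 1.0 if n > 0 else 0.0
    # log1p/expm1 keep the result accurate when p is tiny.
    return -math.expm1(n * math.log1p(-p))


def cells_needed(p: float, target: float = 0.95) -> int:
    # Smallest n for which prob_at_least_one(p, n) reaches target.
    if not 0.0 < p < 1.0 or not 0.0 < target < 1.0:
        raise ValueError('need 0 < p < 1 and 0 < target < 1')
    return math.ceil(math.log1p(-target) / math.log1p(-p))


# Illustrative per-cell integration rates (assumed, spanning the LCMV range).
for rate in (1e-3, 1e-5):
    print(f'rate {rate:g}: P(>=1 in 1,000 cells) = {prob_at_least_one(rate, 1000):.3f}, '
          f'cells for 95% = {cells_needed(rate)}')
```

<p>At 1 in 1,000, sampling 1,000 infected cells gives about a 63% chance of capturing an integrant. At 1 in 100,000, the chance is only about 1%. Reaching 95% requires roughly 3,000 and 300,000 infected cells, respectively.<\/p>\n<p>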
Moreover, because integration appears to be random, the chance of integration at the same genomic locus in different patients or tissues may be low.<\/p>\n<p>The presence of chimeric virus\u2013host RNAs in cells cannot alone be taken as strong evidence for transcription of integrated viral sequences, because template switching can happen during the reverse transcription step of cDNA library preparation. However, we found that only a very small fraction (0\u20131%) of chimeric reads from acutely infected cells contained negative-strand viral RNA sequences. In contrast, in the RNA-seq libraries prepared from some patients, both the fraction of total viral reads and the fraction of human\u2013viral chimeric reads derived from negative-strand SARS-CoV-2 RNAs were substantially higher. For retrotransposon-mediated integration events, the orientation of the reverse-transcribed SARS-CoV-2 RNA should be random with respect to the orientation of a host gene. Thus, for chimeric RNAs derived from integrated viral sequences, about half of the chimeric reads will link positive-strand host RNA sequences to negative-strand viral sequences. In some patient samples, negative-strand viral reads accounted for 40\u201350% of the total viral RNA sequences, and a similar fraction of the chimeric reads contained negative-strand viral RNA sequences. This suggests that the majority, if not all, of the viral RNAs in these samples were derived from integrated viral sequences.<\/p>\n<p>It is important to note that, because we have detected only subgenomic sequences derived mainly from the 3\u2032 end of the viral genome integrated into the DNA of the host cell, infectious virus cannot be produced from such integrated subgenomic SARS-CoV-2 sequences. 
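<\/p>\n<p>The argument above, that most viral RNA in some patient samples could derive from integrants, can be written as a two-source mixture. This back-of-envelope sketch is our simplification, not the authors' analysis. A fraction f of viral reads comes from integrated copies, which yield negative-strand reads with probability of about 0.5. The remainder comes from viral (sub)genomic RNA, which yields negative-strand reads at a small baseline rate b (0 to 1% in acutely infected cells). The observed negative-strand fraction is then x = 0.5f + b(1 - f), so f = (x - b) \/ (0.5 - b).<\/p>

```python
def integrated_fraction(observed_neg: float, baseline_neg: float = 0.01,
                        integrant_neg: float = 0.5) -> float:
    # Solve observed = integrant_neg * f + baseline_neg * (1 - f) for f,
    # the fraction of viral reads assumed to come from integrated copies,
    # and clip to [0, 1] because sampling noise can push it outside.
    if not baseline_neg < integrant_neg:
        raise ValueError('baseline must be below the integrant expectation')
    f = (observed_neg - baseline_neg) / (integrant_neg - baseline_neg)
    return min(1.0, max(0.0, f))


# Illustrative observed negative-strand fractions of viral reads.
for x in (0.2, 0.4, 0.5):
    print(x, round(integrated_fraction(x), 2))
```

<p>With b = 0.01, observed negative-strand fractions of 0.2, 0.4, and 0.5 give f of about 0.39, 0.80, and 1.0. The estimate assumes random orientation and equal capture of integrant-derived and virus-derived transcripts; both assumptions are simplifications.<\/p>\n<p>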
The possibility that SARS-CoV-2 sequences can be integrated into the human genome and expressed in the form of chimeric RNAs raises several questions for future studies. Do integrated SARS-CoV-2 sequences express viral antigens in patients, and might these influence the clinical course of the disease? The available clinical evidence suggests that, at most, only a small fraction of the cells in patient tissues express viral proteins at a level that is detectable by immunohistochemistry. However, if a cell with an integrated and expressed SARS-CoV-2 sequence survives and presents a viral antigen or neoantigen after the infection is cleared, this might engender continuous stimulation of immunity without producing infectious virus. This could trigger a protective response or conditions such as autoimmunity, as has been observed in some patients (62, 63).<\/p>\n<p>The presence of LCMV sequences integrated in the genomes of acutely infected cells in mice led the authors of that study to speculate that expression of such sequences \u201cpotentially represents a naturally produced form of DNA vaccine\u201d (30). It is not known how many antigen-presenting cells are needed to elicit an antigen response, but derepressed LINE1 expression, induced by viral infection or by exposure to cytokines (38\u201340), may stimulate SARS-CoV-2 integration into the genome of infected cells in patients. More generally, our results suggest that integration of viral DNA in somatic cells may be a consequence of natural infection. Such integration could play a role in the effects of other common disease-causing RNA viruses such as dengue, Zika, or influenza virus.<\/p>\n<p>Our results may also be relevant for current clinical trials of antiviral therapies (64). 
If integration and expression of viral RNA are fairly common, extremely sensitive PCR tests may not always reflect the ability of a treatment to fully suppress viral replication. These assays, used to determine the effect of treatments on viral replication and viral load, may detect viral transcripts derived from viral DNA sequences stably integrated into the genome rather than from infectious virus.<\/p>\n<p>Materials and Methods<\/p>\n<p>Cell Culture and Plasmid Transfection. HEK293T cells were obtained from ATCC (CRL-3216) and cultured in DMEM supplemented with 10% heat-inactivated FBS (HyClone; SH30396.03) and 2 mM L-glutamine (MP Biomedicals; IC10180683) following ATCC\u2019s method. Calu3 cells were obtained from ATCC (HTB-55) and cultured in EMEM (ATCC; 30-2003) supplemented with 10% heat-inactivated FBS (HyClone; SH30396.03) following ATCC\u2019s method. Plasmids for human LINE1 expression: pBS-L1PA1-CH-mneo (CMV-LINE-1) was a gift from Astrid Roy-Engel, Tulane University Health Sciences Center, New Orleans, LA (Addgene plasmid #51288; http:\/\/addgene.org\/51288; RRID:Addgene_51288) (65); EF06R (5\u2032UTR-LINE-1) was a gift from Eline Luning Prak, University of Pennsylvania, Philadelphia, PA (Addgene plasmid #42940; http:\/\/addgene.org\/42940; RRID:Addgene_42940) (66). Transfection was done with Lipofectamine 3000 (Invitrogen; L3000001) following the manufacturer\u2019s protocol.<\/p>\n<p>SARS-CoV-2 Infection. 
SARS-CoV-2 USA-WA1\/2020 (GenBank: MN985325.1) was obtained from BEI Resources, expanded, and titered on Vero cells. Cells were infected in DMEM plus 2% FBS for 48 h at a multiplicity of infection (MOI) of 0.5 for HEK293T cells and an MOI of 1 or 2 for Calu3 cells. All sample processing and harvesting involving infectious virus were done in the BSL3 facility at the Ragon Institute.<\/p>\n<p>Nucleic Acids Extraction and PCR Assay. Cellular DNA extraction was done using a published method (31). For purification of genomic DNA, total cellular DNA was fractionated on a 0.4% (wt\/vol) agarose\/1\u00d7 TAE gel for 1.5 h at 3 V\/cm, with \u03bb DNA-HindIII Digest (NEB; N3012S) as size markers. Large fragments (&gt;23.13 kb) were cut out, frozen at \u221280 \u00b0C, and then crushed with a pipette tip. Three volumes (vol\/wt) of high T-E buffer (10 mM Tris\u201310 mM EDTA, pH 8.0) were added, and then NaCl was added to a final concentration of 200 mM. The gel solution was heated at 70 \u00b0C for 15 min with constant mixing. It was then extracted with phenol:chloroform:isoamyl alcohol (25:24:1, vol\/vol\/vol) (Life Technologies; 15593031) and chloroform:isoamyl alcohol 24:1 (Sigma; C0549-1PT). DNA was precipitated by the addition of sodium acetate and isopropyl alcohol. For samples with low DNA concentration, glycogen (Life Technologies; 10814010) was added as a carrier to aid precipitation.<\/p>\n<p>RNA extraction was done with the RNeasy Plus Micro Kit (Qiagen; 74034) following the manufacturer\u2019s protocol.<\/p>\n<p>To detect DNA copies of SARS-CoV-2 sequences, we chose four NC gene-targeting PCR primer sets that are used in COVID-19 tests [SI Appendix, Fig. S1A; primer sources from the World Health Organization (67), modified to match the genome version of NC_045512.2]. 
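<\/p>\n<p>The infection conditions above are given as MOIs, the ratio of infectious units to cells. The sketch below shows the two standard calculations behind such a protocol. The first is the inoculum volume for a target MOI. The second is the Poisson estimate of the fraction of cells receiving at least one infectious unit, 1 - e^(-MOI). The cell number and stock titer are hypothetical; neither is reported in the text.<\/p>

```python
import math


def inoculum_volume_ml(cells: float, moi: float, titer_pfu_per_ml: float) -> float:
    # Volume of virus stock that delivers `moi` infectious units per cell.
    if cells <= 0 or moi <= 0 or titer_pfu_per_ml <= 0:
        raise ValueError('cells, moi and titer must be positive')
    return cells * moi / titer_pfu_per_ml


def fraction_infected(moi: float) -> float:
    # Poisson approximation: P(at least one infectious unit) = 1 - exp(-moi).
    return -math.expm1(-moi)


# Hypothetical example: 1e6 cells and a stock titer of 1e7 PFU/mL.
for moi in (0.5, 1, 2):
    vol_ml = inoculum_volume_ml(1e6, moi, 1e7)
    print(f'MOI {moi}: {vol_ml * 1000:.0f} uL stock, ~{fraction_infected(moi):.0%} of cells infected')
```

<p>Under the Poisson approximation, an MOI of 0.5 reaches about 39% of cells, an MOI of 1 about 63%, and an MOI of 2 about 86%.<\/p>\n<p>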
See SI Appendix, Table S4 for the PCR primer sequences used in this study. PCR was done using AccuPrime Taq DNA Polymerase, high fidelity (Life Technologies; 12346094). PCR products were run on 1% or 2% (wt\/vol) agarose gels to visualize amplification.<\/p>\n<p>Nanopore DNA Sequencing and Analysis. A total of 1.6 \u03bcg of DNA was extracted from HEK293T cells transfected with the pBS-L1PA1-CH-mneo (CMV-LINE-1) plasmid and infected with SARS-CoV-2. This DNA was used to make a sequencing library with the SQK-LSK109 kit (Oxford Nanopore Technologies) and sequenced on one R9 PromethION flowcell (FLO-PRO002) for 3 d and 5 min. The sequencing data were base-called using Guppy 4.0.11 (Oxford Nanopore Technologies) with the high-accuracy model.<\/p>\n<p>Nanopore reads were mapped using minimap2 (68) (version 2.15) with parameters \u201c-p 0.3 -ax map-ont\u201d. The reference was a fasta file containing the human genome sequence from ENSEMBL release 93 (ftp:\/\/ftp.ensembl.org\/pub\/release-93\/fasta\/homo_sapiens\/dna\/Homo_sapiens.GRCh38.dna.primary_assembly.fa.gz) concatenated to the SARS-CoV-2 sequence, GenBank ID: MN988713.1, \u201cSevere acute respiratory syndrome coronavirus 2 isolate SARS-CoV-2\/human\/USA\/IL-CDC-IL1\/2020, complete genome.\u201d From the SAM file, we selected all the sequences that mapped to the viral genome and divided them into groups based on the human chromosomes they mapped to. We searched the selected sequences with blastn against a BLAST database made from the human and viral sequences described above. We parsed the BLAST output into a text file containing one row per high-scoring segment pair (HSP) with a custom Perl script. We further filtered that file, for each sequence, by selecting all the viral HSPs and the top three human HSPs. 
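<\/p>\n<p>The HSP filtering step above was done with a custom Perl script that is not provided. Below is a minimal Python sketch of the same selection. It assumes BLAST+ tabular output (-outfmt 6 with the default 12 columns) and that the viral subject ID in the database is MN988713.1; both are assumptions about the setup. For each read, it keeps every viral HSP plus the three best-scoring human HSPs, ordered along the read.<\/p>

```python
from collections import defaultdict

# Default column order of BLAST+ tabular output (-outfmt 6).
FIELDS = ['qseqid', 'sseqid', 'pident', 'length', 'mismatch', 'gapopen',
          'qstart', 'qend', 'sstart', 'send', 'evalue', 'bitscore']


def parse_hsps(lines):
    # Yield one dict per HSP row, skipping blank and comment lines.
    for line in lines:
        if not line.strip() or line.startswith('#'):
            continue
        hsp = dict(zip(FIELDS, line.split()))
        hsp['qstart'], hsp['qend'] = int(hsp['qstart']), int(hsp['qend'])
        hsp['bitscore'] = float(hsp['bitscore'])
        yield hsp


def select_hsps(lines, viral_id='MN988713.1', top_human=3):
    # Per read: keep all viral HSPs plus the top_human best human HSPs by
    # bitscore, then order the kept HSPs by their start position on the read.
    by_read = defaultdict(list)
    for hsp in parse_hsps(lines):
        by_read[hsp['qseqid']].append(hsp)
    selected = {}
    for read, hsps in by_read.items():
        viral = [h for h in hsps if h['sseqid'] == viral_id]
        human = sorted((h for h in hsps if h['sseqid'] != viral_id),
                       key=lambda h: h['bitscore'], reverse=True)
        kept = viral + human[:top_human]
        kept.sort(key=lambda h: h['qstart'])
        selected[read] = kept
    return selected
```

<p>Sorting the kept HSPs by query start lays each read out as an ordered series of human and viral segments. This makes human\u2013viral\u2013human and human\u2013viral junctions easy to spot.<\/p>\n<p>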
We inspected those files visually to identify sequences containing human\u2013viral\u2013human or human\u2013viral junctions. For a few sequences longer than 30 kb, we inspected the top 15 human HSPs. Additionally, we visually inspected all the identified reads containing human and viral sequences with the University of California, Santa Cruz (UCSC) BLAT (69) tool. Due to errors in Nanopore sequencing and\/or base-calling, artifactual \u201chybrid sequences\u201d exist in a subset of these reads, sometimes with the Watson and Crick strands of the same DNA fragment present in the same read. Therefore, we focused only on chimeric sequences showing clear human\u2013viral junctions. In these, we looked for known features of LINE1-mediated retroposition, such as target-site duplications and LINE1 endonuclease recognition sequences, as evidence of integration.<\/p>\n<p>Tn5 Tagmentation-Mediated Integration Site Enrichment. We used a tagmentation-based method to enrich for viral integration sites (47, 48). Briefly, we used Tn5 transposase (Diagenode; C01070010) to randomly tagment the cellular DNA with adapters (adapter A, the Illumina Nextera system). Tagmentation was done using 100 ng of DNA for 10 min at 55 \u00b0C, after which the Tn5 transposase was stripped from the DNA with SDS. We then linearly amplified (PCR0, 45 cycles) the tagmented DNA fragments containing viral sequences. This used either a reverse primer targeting the near-5\u2032 end of the SARS-CoV-2 NC gene (CCAAGACGCAGTATTATTGGGTAAA) or a forward primer targeting the near-3\u2032 end of the SARS-CoV-2 genome (CTTGTGCAGAATGAATTCTCGTAACT). 
We took the product of PCR0 and amplified the DNA fragments containing adapter and viral sequences (potential integration sites) using 15\u201320 cycles of PCR1. PCR1 used a barcoded (i5) Nextera primer (AATGATACGGCGACCACCGAGATCTACACNNNNNNNNTCGTCGGCAGCGTC, where NNNNNNNN indicates the barcode) against the adapter sequence, together with a viral primer. The viral primer was designed to target either the near-5\u2032 end of the SARS-CoV-2 NC gene or the near-3\u2032 end of the SARS-CoV-2 genome:<\/p>\n<ul>\n<li>Near-5\u2032 end of the NC gene: GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGGCCGACGTTGTTTTGATCG; the 3\u2032 viral sequence is GCCGACGTTGTTTTGATCG.<\/li>\n<li>Near-3\u2032 end of the genome: GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGCGCGGAGTACGATCGAGTG; the 3\u2032 viral sequence is CGCGGAGTACGATCGAGTG.<\/li>\n<\/ul>\n<p>The 5\u2032 portion of each viral primer is an adapter sequence used for further PCR amplification. We amplified the PCR1 product by 15\u201320 cycles of PCR2. PCR2 used a short primer (AATGATACGGCGACCACCGA) against the i5 Nextera primer sequence and a barcoded (i7) Nextera primer (CAAGCAGAAGACGGCATACGAGATNNNNNNNNGTCTCGTGGGCTCGG, where NNNNNNNN indicates the barcode) against the adapter sequence introduced by the viral primer in PCR1. The final product of the PCR2 amplification was fractionated on a 1.5% agarose gel (Sage Science; HTC1510) with a PippinHT (Sage Science; HTP0001), and 500- to 1,000-bp fragments were selected for Illumina paired-end sequencing. All three PCR steps (PCR0\u2013PCR2) were done with KAPA HiFi HotStart ReadyMix (KAPA; KK2602).<\/p>\n<p>Illumina DNA Sequencing and Analysis. We constructed libraries for HEK293T whole-genome sequencing using the Tn5-based Illumina DNA Prep kit (Illumina; 20018704). The whole-genome sequencing libraries, or the libraries from Tn5-mediated integration site enrichment after sizing (described above), were subjected to Illumina sequencing. 
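<\/p>\n<p>The primer descriptions above combine fixed adapter sequences with an 8-nt barcode placeholder (NNNNNNNN). The PCR1 viral primers also carry a 19-nt viral 3\u2032 segment. The helper below fills the placeholder, using a hypothetical barcode, and splits a PCR1 primer into its adapter and viral parts. The 34-nt adapter constant is the prefix shared by both viral primers as written in the text.<\/p>

```python
ADAPTER = 'GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAG'  # 5' part shared by both PCR1 viral primers
I5_TEMPLATE = 'AATGATACGGCGACCACCGAGATCTACACNNNNNNNNTCGTCGGCAGCGTC'
I7_TEMPLATE = 'CAAGCAGAAGACGGCATACGAGATNNNNNNNNGTCTCGTGGGCTCGG'
NC_PRIMER = 'GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGGCCGACGTTGTTTTGATCG'
END3_PRIMER = 'GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGCGCGGAGTACGATCGAGTG'
PLACEHOLDER = 'N' * 8


def fill_barcode(template: str, barcode: str) -> str:
    # Replace the single NNNNNNNN placeholder with a concrete 8-nt barcode.
    if template.count(PLACEHOLDER) != 1:
        raise ValueError('template must contain exactly one 8-N placeholder')
    barcode = barcode.upper()
    if len(barcode) != 8 or set(barcode) - set('ACGT'):
        raise ValueError('barcode must be 8 nt of A, C, G or T')
    return template.replace(PLACEHOLDER, barcode)


def split_viral_primer(primer: str) -> tuple:
    # Split a PCR1 viral primer into (adapter, viral 3' segment).
    if not primer.startswith(ADAPTER):
        raise ValueError('primer does not start with the expected adapter')
    return ADAPTER, primer[len(ADAPTER):]


print(split_viral_primer(NC_PRIMER)[1])    # viral part, near-5' end of NC gene
print(split_viral_primer(END3_PRIMER)[1])  # viral part, near-3' end of genome
print(fill_barcode(I5_TEMPLATE, 'ACGTACGT'))  # hypothetical barcode
# The 3' end of the i7 primer matches the 5' end of the PCR1 adapter.
print(I7_TEMPLATE.endswith(ADAPTER[:15]))
```

<p>The final check reflects the design described above: the 3\u2032 end of the i7 primer used in PCR2 matches the 5\u2032 end of the adapter introduced by the viral primer in PCR1.<\/p>\n<p>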
qPCR was used to measure the concentration of each library using the KAPA qPCR library quant kit according to the manufacturer\u2019s protocol. Libraries were then pooled at equimolar concentrations, for each lane, based on the qPCR concentrations. The pooled libraries were denatured using the Illumina protocol. The denatured libraries were loaded onto an SP flowcell on an Illumina NovaSeq 6000 and run for 2 \u00d7 150 cycles according to the manufacturer\u2019s instructions. Fastq files were generated and demultiplexed with the bcl2fastq Conversion Software (Illumina).<\/p>\n<p>To identify human\u2013SARS-CoV-2 chimeric DNA reads, raw sequencing reads were aligned with STAR (70) (version 2.7.1a) to a combined human plus SARS-CoV-2 genome. This genome was built from a fasta file containing the human genome sequence version hg38, with no alternative chromosomes, concatenated to the SARS-CoV-2 sequence from the National Center for Biotechnology Information (NCBI) reference sequence NC_045512.2. The following STAR parameters were used to call chimeric reads: --alignIntronMax 1 --chimOutType Junctions SeparateSAMold WithinBAM HardClip --chimScoreJunctionNonGTAG 0 --alignSJstitchMismatchNmax -1 -1 -1 -1 --chimSegmentMin 25 --chimJunctionOverhangMin 25 --outSAMtype BAM SortedByCoordinate. We extracted viral reads from the generated BAM file with samtools (71) (version 1.11) using the command: samtools view -b Aligned.sortedByCoord.out.bam NC_045512v2 &gt; NC_Aligned.sortedByCoord.out.bam. 
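<\/p>\n<p>The Picard step that follows needs a list of read names (hv-Chimeric.out.junction.ids) for human\u2013viral junctions, and the text does not say how that list was built. Below is a minimal sketch under two assumptions. The first is STAR's Chimeric.out.junction layout: donor chromosome in column 1, acceptor chromosome in column 4, and read name in column 10. The second is the viral contig name NC_045512v2 used in the samtools command above.<\/p>

```python
def human_viral_read_ids(lines, viral_contig='NC_045512v2'):
    # Collect read names whose two chimeric segments fall on one viral and
    # one non-viral (human) contig, preserving first-seen order.
    ids, seen = [], set()
    for line in lines:
        # Skip blanks, comment lines and a column-header line if present.
        if not line.strip() or line.startswith('#') or line.startswith('chr_donorA'):
            continue
        cols = line.split()
        donor_chrom, acceptor_chrom, read_name = cols[0], cols[3], cols[9]
        if (donor_chrom == viral_contig) != (acceptor_chrom == viral_contig):
            if read_name not in seen:
                seen.add(read_name)
                ids.append(read_name)
    return ids


def write_id_list(ids, path):
    # One read name per line, the format Picard reads from READ_LIST_FILE.
    with open(path, 'w') as fh:
        for name in ids:
            print(name, file=fh)
```

<p>Passing an open Chimeric.out.junction file to human_viral_read_ids and writing the result with write_id_list to hv-Chimeric.out.junction.ids would produce the READ_LIST_FILE used in the FilterSamReads command. Both function names are hypothetical helpers, not part of STAR or Picard.<\/p>\n<p>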
We extracted human\u2013viral chimeric reads with Picard (http:\/\/broadinstitute.github.io\/picard). The read names from the STAR-generated Chimeric.out.junction file were used to retrieve the corresponding alignments from the STAR-generated Chimeric.out.sam file, with the command: java -jar picard.jar FilterSamReads I=Chimeric.out.sam O=hv-Chimeric.out.sam READ_LIST_FILE=hv-Chimeric.out.junction.ids FILTER=includeReadList. We further confirmed each of the chimeric reads by visual inspection with the UCSC BLAT (69) tool and filtered out any unconvincing reads (too short or aligned to multiple sites of the human genome). We also loaded the STAR-generated Aligned.sortedByCoord.out.bam file, or the NC_Aligned.sortedByCoord.out.bam file containing the extracted viral reads, onto the UCSC browser SARS-CoV-2 genome (NC_045512.2). This allowed us to search for additional chimeric reads that were missed by the STAR chimeric calling method. To generate a genome coverage file, we used bamCoverage from the deepTools suite (72) (version 3.5.0) to convert the STAR-generated Aligned.sortedByCoord.out.bam file to a bigwig file binned at 10 bp, using the command: bamCoverage -b Aligned.sortedByCoord.out.bam -o Aligned.sortedByCoord.out.bw --binSize 10.<\/p>\n<p>RNA-Seq and Analysis. To identify human\u2013SARS-CoV-2 chimeric reads, published RNA-seq data were downloaded from Gene Expression Omnibus (GEO) with the accession numbers GSE147507 (50), GSE153277 (51), GSE156754 (52), GSE157852 (53), GSE153684 (54), and GSE154998 (55) (summarized in SI Appendix, Fig. S5C). 
Raw sequencing reads were aligned with STAR (70) (version 2.7.1a) to a combined human plus SARS-CoV-2 genome and transcriptome. The genome was built from a fasta file containing the human genome sequence version hg38, with no alternative chromosomes, concatenated to the SARS-CoV-2 sequence from NCBI reference sequence NC_045512.2. The transcriptome used a gtf file containing the human gene annotations from ENSEMBL version GRCh38.97 concatenated to the SARS-CoV-2 gene annotations from NCBI (http:\/\/hgdownload.soe.ucsc.edu\/goldenPath\/wuhCor1\/bigZips\/genes\/). The following STAR parameters (56) were used to call chimeric reads unless otherwise specified (SI Appendix, Fig. S5C): --chimOutType Junctions SeparateSAMold WithinBAM HardClip --chimScoreJunctionNonGTAG 0 --alignSJstitchMismatchNmax -1 -1 -1 -1 --chimSegmentMin 50 --chimJunctionOverhangMin 50.<\/p>\n<p>For RNA-seq strandedness analysis, we generated RNA-seq data using RNA from SARS-CoV-2\u2013infected Calu3 cells. Stranded libraries were constructed with the Kapa mRNA HyperPrep kit (Roche; 08098115702). Library concentrations were measured using a KAPA qPCR library quant kit according to the manufacturer\u2019s protocol. Libraries were then pooled at equimolar concentrations, for each lane, based on the qPCR concentrations. The pooled libraries were denatured using the Illumina protocol. 
The denatured libraries were loaded onto a HiSeq 2500 (Illumina) and sequenced for 120 cycles from one end of the fragments. Basecalls were performed using the Illumina offline basecaller (OLB), and reads were then demultiplexed. We downloaded published RNA-seq data (stranded libraries) from GEO with the accession numbers GSE147507 (50) (Calu3, SI Appendix, Table S1), GSE148697 (58) (lung organoids, SI Appendix, Table S1), and GSE150316 (60) (patient FFPE tissues, SI Appendix, Table S2). Raw RNA-seq reads were aligned as described above, using the parameters --chimSegmentMin 30 --chimJunctionOverhangMin 30 to call chimeric reads. We extracted total viral reads and human\u2013viral chimeric reads as described above. We converted the viral read BAM files into BED files using the bamToBed utility in BEDTools (73). We then counted the total and strand-specific read numbers in the converted BED files.<\/p>\n<p>Published single-cell RNA-seq data were downloaded from GEO with the accession number GSE145926 (61) (patient BALF samples, SI Appendix, Table S3). For bulk analysis, duplicate reads with the same read1 (UMI) and read2 sequences in the raw fastq files were removed with dedup_hash (https:\/\/github.com\/mvdbeek\/dedup_hash). The pooled read2 sequences were then aligned as described above, using the parameters --chimSegmentMin 30 --chimJunctionOverhangMin 30 to call chimeric reads. Read strandedness was analyzed as described above. 
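<\/p>\n<p>The strand counting described above can be sketched from bamToBed output, whose sixth column holds the alignment strand. The code labels two assumptions. First, reads are single-end, as in the 120-cycle Calu3 runs. Second, for dUTP-type stranded kits each read is antisense to the RNA it came from, so a read aligned to the plus strand of the viral genome reports a negative-strand viral RNA. Whether the second assumption holds for a given dataset depends on its library chemistry. The default contig name is likewise an assumption and must match the BAM file.<\/p>

```python
from collections import Counter


def negative_strand_fraction(bed_lines, viral_contig='NC_045512v2', reads_antisense=True):
    # Fraction of viral reads that report negative-strand viral RNA, from
    # bamToBed output (BED6: chrom, start, end, name, score, strand).
    # Assumes single-end reads. With dUTP-type stranded libraries each read is
    # antisense to its RNA, so a read aligned to '+' reports negative-strand RNA.
    counts = Counter()
    for line in bed_lines:
        cols = line.split()
        if len(cols) >= 6 and cols[0] == viral_contig:
            counts[cols[5]] += 1
    total = counts['+'] + counts['-']
    if total == 0:
        raise ValueError('no viral reads with strand information')
    negative_rna = counts['+'] if reads_antisense else counts['-']
    return negative_rna / total, total
```

<p>Setting reads_antisense correctly is the crux. With the wrong setting, a sample dominated by positive-strand viral RNA would appear to be almost entirely negative-strand.<\/p>\n<p>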
For single-cell analysis, we generated a custom genome with Cell Ranger (10\u00d7 Genomics Cell Ranger 3.0.2) (74) mkref. The inputs were a gtf file containing human and viral annotations and a fasta file containing the human genome sequence from ENSEMBL release 93 (ftp:\/\/ftp.ensembl.org\/pub\/release-93\/fasta\/homo_sapiens\/dna\/Homo_sapiens.GRCh38.dna.primary_assembly.fa.gz) concatenated to the SARS-CoV-2 sequence, GenBank ID: MN988713.1. Read mapping, assignment of reads to cell barcodes, and removal of PCR duplicates were done with Cell Ranger (10\u00d7 Genomics Cell Ranger 4.0.0) (74) count, using the custom genome described above. We processed the counts using Seurat (version 3.2.2) (75). We removed cells that had fewer than 200 genes detected or more than 20% of transcript counts derived from mitochondrial genes. For each cell, we counted the number of reads mapping to either the positive or negative viral strand.<\/p>\n<p>Data Availability. All data supporting the findings of this study are available within the article and supporting information. All sequencing data generated in this study have been deposited in the Sequence Read Archive, https:\/\/www.ncbi.nlm.nih.gov\/sra (accession no. PRJNA721333). All published data analyzed in this study are cited in this article, with accession numbers provided in Materials and Methods.<\/p>\n<p>ACKNOWLEDGMENTS. We thank members of the laboratories of R.J. and R.A.Y. and other colleagues from the Whitehead Institute and Massachusetts Institute of Technology (MIT) for helpful discussions and resources. We thank Thomas Volkert and staff from the Whitehead genomics core, and Stuart Levine from the MIT\/Koch Institute BioMicro Center, for sequencing support. 
We thank Lorenzo Bombardelli for sharing a protocol and advice for Tn5 tagmentation-mediated integration enrichment sequencing. We thank Jerold Chun, Inder Verma, Joseph Ecker, and Daniel W. Bellott for discussion and suggestions. This work was supported by grants from the NIH to R.J. (1U19AI131135-01; 5R01MH104610-21) and by a generous gift from Dewpoint Therapeutics and from Jim Stone. S.H.H. was supported by the Intramural Research Program of the Center for Cancer Research of the National Cancer Institute. Finally, we thank Nathans Island for inspiration.<\/p>\n<p>1. Korean Disease Control and Prevention Agency, Findings from investigation and analysis of re-positive cases. https:\/\/www.kdca.go.kr\/board\/board.es?mid=a30402000000&amp;bid=0030. Accessed 12 June 2020.<br \/>\n2. J. Bullard et al., Predicting infectious severe acute respiratory syndrome coronavirus 2 from diagnostic samples. Clin. Infect. Dis. 71, 2663\u20132666 (2020).<br \/>\n3. X. He et al., Temporal dynamics in viral shedding and transmissibility of COVID-19. Nat. Med. 26, 672\u2013675 (2020).<br \/>\n4. N. Li, X. Wang, T. Lv, Prolonged SARS-CoV-2 RNA shedding: Not a rare phenomenon. J. Med. Virol. 92, 2286\u20132287 (2020).<\/p>\n<p>5. M. J. Mina, R. Parker, D. B. Larremore, Rethinking COVID-19 test sensitivity\u2014a strategy for containment. N. Engl. J. Med. 383, e120 (2020).<\/p>\n<p>6. N. Sethuraman, S. S. Jeremiah, A. Ryo, Interpreting diagnostic tests for SARS-CoV-2. JAMA 323, 2249\u20132251 (2020).<br \/>\n7. J.-R. Yang et al., Persistent viral RNA positivity during the recovery period of a patient with SARS-CoV-2 infection. J. Med. Virol. 92, 1681\u20131683 (2020).<br \/>\n8. J. An et al., Clinical characteristics of recovered COVID-19 patients with re-detectable positive RNA test. Ann. Transl. Med. 8, 1084 (2020).<br \/>\n9. D.
Chen et al., Recurrence of positive SARS-CoV-2 RNA in COVID-19: A case report. Int. J. Infect. Dis. 93, 297\u2013299 (2020).<br \/>\n10. L. Lan et al., Positive RT-PCR test results in patients recovered from COVID-19. JAMA 323, 1502\u20131503 (2020).<br \/>\n11. D. Loconsole et al., Recurrence of COVID-19 after recovery: A case report from Italy. Infection 48, 965\u2013967 (2020).<\/p>\n<p>12. J. Lu et al., Clinical, immunological and virological characterization of COVID-19 patients that test re-positive for SARS-CoV-2 by RT-PCR. EBioMedicine 59, 102960 (2020).<\/p>\n<p>13. S. Luo, Y. Guo, X. Zhang, H. Xu, A follow-up study of recovered patients with COVID-19 in Wuhan, China. Int. J. Infect. Dis. 99, 408\u2013409 (2020).<\/p>\n<p>14. G. Ye et al., Clinical characteristics of severe acute respiratory syndrome coronavirus 2 reactivation. J. Infect. 80, e14\u2013e17 (2020).<br \/>\n15. R. W\u00f6lfel et al., Virological assessment of hospitalized patients with COVID-2019. Nature 581, 465\u2013469 (2020).<br \/>\n16. M. Cevik et al., SARS-CoV-2, SARS-CoV, and MERS-CoV viral load dynamics, duration of viral shedding, and infectiousness: A systematic review and meta-analysis. Lancet Microbe 2, e13\u2013e22 (2021).<br \/>\n17. A. L. Rasmussen, S. V. Popescu, SARS-CoV-2 transmission without symptoms. Science 371, 1206\u20131207 (2021).<br \/>\n18. K. K. To et al., COVID-19 re-infection by a phylogenetically distinct SARS-coronavirus-2 strain confirmed by whole genome sequencing. Clin. Infect. Dis., 10.1093\/cid\/ciaa1275 (2020).<br \/>\n19. J. Huang et al., Recurrence of SARS-CoV-2 PCR positivity in COVID-19 patients: A single center experience and potential implications. medRxiv [Preprint] (2020). https:\/\/doi.org\/10.1101\/2020.05.06.20089573 (Accessed 6 June 2020).<br \/>\n20. B.
Yuan et al., Recurrence of positive SARS-CoV-2 viral RNA in recovered COVID-19 patients during medical isolation observation. Sci. Rep. 10, 11887 (2020).<\/p>\n<p>21. P. V\u2019Kovski, A. Kratzel, S. Steiner, H. Stalder, V. Thiel, Coronavirus biology and replication: Implications for SARS-CoV-2. Nat. Rev. Microbiol. 19, 155\u2013170 (2021).<\/p>\n<p>22. L. Alanagreh, F. Alzoughool, M. Atoum, The human coronavirus disease COVID-19: Its origin, characteristics, and insights into potential drugs and its mechanisms. Pathogens 9, 331 (2020).<\/p>\n<p>23. E. de Wit, N. van Doremalen, D. Falzarano, V. J. Munster, SARS and MERS: Recent insights into emerging coronaviruses. Nat. Rev. Microbiol. 14, 523\u2013534 (2016).<\/p>\n<p>24. A. R. Fehr, S. Perlman, Coronaviruses: An overview of their replication and pathogenesis. Methods Mol. Biol. 1282, 1\u201323 (2015).<\/p>\n<p>25. V. A. Belyi, A. J. Levine, A. M. Skalka, Unexpected inheritance: Multiple integrations of ancient bornavirus and ebolavirus\/marburgvirus sequences in vertebrate genomes. PLoS Pathog. 6, e1001030 (2010).<\/p>\n<p>26. M. Horie et al., Endogenous non-retroviral RNA virus elements in mammalian genomes. Nature 463, 84\u201387 (2010).<\/p>\n<p>27. M. Horie, K. Tomonaga, Non-retroviral fossils in vertebrate genomes. Viruses 3, 1836\u20131848 (2011).<\/p>\n<p>28. A. Shimizu et al., Characterisation of cytoplasmic DNA complementary to non-retroviral RNA viruses in human cells. Sci. Rep. 4, 5074 (2014).<\/p>\n<p>29. M. B. Geuking et al., Recombination of retrotransposon and exogenous RNA virus results in nonretroviral cDNA integration. Science 323, 393\u2013396 (2009).<br \/>\n30. P. Klenerman, H. Hengartner, R. M. Zinkernagel, A non-retroviral RNA virus persists in DNA form. Nature 390, 298\u2013301 (1997).<br \/>\n31. M. H.
Lee et al., Somatic APP gene recombination in Alzheimer\u2019s disease and normal neurons. Nature 563, 639\u2013645 (2018).<br \/>\n32. C. R. Huang, K. H. Burns, J. D. Boeke, Active transposition in genomes. Annu. Rev. Genet. 46, 651\u2013675 (2012).<br \/>\n33. H. H. Kazazian Jr, J. V. Moran, Mobile DNA in health and disease. N. Engl. J. Med. 377, 361\u2013370 (2017).<br \/>\n34. J. M. Coffin, H. Fan, The discovery of reverse transcriptase. Annu. Rev. Virol. 3, 29\u201351 (2016).<\/p>\n<p>35. M. De Cecco et al., L1 drives IFN in senescent cells and promotes age-associated inflammation. Nature 566, 73\u201378 (2019).<\/p>\n<p>36. B. Rodriguez-Martin et al.; PCAWG Structural Variation Working Group; PCAWG Consortium, Pan-cancer analysis of whole genomes identifies driver rearrangements promoted by LINE-1 retrotransposition. Nat. Genet. 52, 306\u2013319 (2020).<br \/>\n37. E. C. Scott et al., A hot L1 retrotransposon evades somatic repression and initiates human colorectal cancer. Genome Res. 26, 745\u2013755 (2016).<\/p>\n<p>38. R. B. Jones et al., LINE-1 retrotransposable element DNA accumulates in HIV-1-infected cells. J. Virol. 87, 13307\u201313320 (2013).<\/p>\n<p>39. M. G. Macchietto, R. A. Langlois, S. S. Shen, Virus-induced transposable element expression up-regulation in human and mouse host cells. Life Sci. Alliance 3, e201900536 (2020).<br \/>\n40. Y. Yin, X. Z. Liu, X. He, L. Q. Zhou, Exogenous coronavirus interacts with endogenous retrotransposon in human cells. Front. Cell. Infect. Microbiol. 11, 609160 (2021).<br \/>\n41. H. Kaessmann, N. Vinckenbosch, M. Long, RNA-based gene duplication: Mechanistic and evolutionary insights. Nat. Rev. Genet. 10, 19\u201331 (2009).<\/p>\n<p>42. S. Lanciano, G. Cristofari, Measuring and interpreting transposable element expression. Nat. Rev. Genet. 21, 721\u2013736 (2020).<\/p>\n<p>43. T. A.
Morrish et al., DNA repair mediated by endonuclease-independent LINE-1 retrotransposition. Nat. Genet. 31, 159\u2013165 (2002).<\/p>\n<p>44. J. C. Venter et al., The sequence of the human genome. Science 291, 1304\u20131351 (2001).<\/p>\n<p>Zhang et al. PNAS | 9 of 10<br \/>\nReverse-transcribed SARS-CoV-2 RNA can integrate into the genome of cultured human<br \/>\ncells and can be expressed in patient-derived tissues<\/p>\n<p>https:\/\/doi.org\/10.1073\/pnas.2105968118<br \/>\nMEDICAL SCIENCES<\/p>\n<p>Downloaded at University of Patras on May 19, 2021<\/p>\n<p>45. T. Sultana et al., The landscape of L1 retrotransposons in the human genome is shaped by pre-insertion sequence biases and post-insertion selection. Mol. Cell 74, 555\u2013570.e7 (2019).<\/p>\n<p>46. D. A. Flasch et al., Genome-wide de novo L1 retrotransposition connects endonuclease activity with replication. Cell 177, 837\u2013851.e28 (2019).<\/p>\n<p>47. D. L. Stern, Tagmentation-based mapping (TagMap) of mobile DNA genomic insertion sites. bioRxiv [Preprint] (2017). https:\/\/doi.org\/10.1101\/037762 (Accessed 16 February 2021).<\/p>\n<p>48. S. Picelli et al., Tn5 transposase and tagmentation procedures for massively scaled sequencing projects. Genome Res. 24, 2033\u20132040 (2014).<br \/>\n49. L. Zhang et al., SARS-CoV-2 RNA reverse-transcribed and integrated into the human genome. bioRxiv [Preprint] (2020). https:\/\/doi.org\/10.1101\/2020.12.12.422516 (Accessed 16 March 2021).<\/p>\n<p>50. D. Blanco-Melo et al., Imbalanced host response to SARS-CoV-2 drives development of COVID-19. Cell 181, 1036\u20131045.e9 (2020).<br \/>\n51. J. Huang et al., SARS-CoV-2 infection of pluripotent stem cell-derived human lung alveolar type 2 cells elicits a rapid epithelial-intrinsic inflammatory response. Cell Stem Cell 27, 962\u2013973.e7 (2020).<br \/>\n52. J. A.
Perez-Bermejo et al., SARS-CoV-2 infection of human iPSC-derived cardiac cells reflects cytopathic features in hearts of patients with COVID-19. Sci. Transl. Med., 10.1126\/scitranslmed.abf7872 (2021).<br \/>\n53. F. Jacob et al., Human pluripotent stem cell-derived neural cells and brain organoids reveal SARS-CoV-2 neurotropism predominates in choroid plexus epithelium. Cell Stem Cell 27, 937\u2013950.e9 (2020).<br \/>\n54. G. G. Giobbe et al., SARS-CoV-2 infection and replication in human fetal and pediatric gastric organoids. bioRxiv [Preprint] (2020). https:\/\/doi.org\/10.1101\/2020.06.24.167049 (Accessed 28 October 2020).<br \/>\n55. S. E. Gill et al., Transcriptional profiling of leukocytes in critically ill COVID19 patients: Implications for interferon response and coagulation. Intensive Care Med. Exp. 8, 75 (2020).<br \/>\n56. D. Kim et al., The architecture of SARS-CoV-2 transcriptome. Cell 181, 914\u2013921.e10 (2020).<br \/>\n57. B. Yan et al., Host-virus chimeric events in SARS-CoV2 infected cells are infrequent and artifactual. bioRxiv [Preprint] (2021). https:\/\/doi.org\/10.1101\/2021.02.17.431704 (Accessed 20 February 2021).<br \/>\n58. Y. Han et al., Identification of candidate COVID-19 therapeutics using hPSC-derived lung organoids. bioRxiv [Preprint] (2020). https:\/\/doi.org\/10.1101\/2020.05.05.079095 (Accessed 10 March 2021).<\/p>\n<p>59. S. Alexandersen, A. Chamings, T. R. Bhatta, SARS-CoV-2 genomic and subgenomic RNAs in diagnostic samples are not an indicator of active replication. Nat. Commun. 11, 6059 (2020).<br \/>\n60. N. Desai et al., Temporal and spatial heterogeneity of host response to SARS-CoV-2 pulmonary infection. Nat. Commun. 11, 6319 (2020).<br \/>\n61. M. Liao et al., Single-cell landscape of bronchoalveolar immune cells in patients with COVID-19. Nat. Med.
26, 842\u2013844 (2020).<br \/>\n62. M. C. Dalakas, Guillain-Barr\u00e9 syndrome: The first documented COVID-19-triggered autoimmune neurologic disease: More to come with myositis in the offing. Neurol. Neuroimmunol. Neuroinflamm. 7, e781 (2020).<\/p>\n<p>63. S. Pfeuffer et al., Autoimmunity complicating SARS-CoV-2 infection in selective IgA-deficiency. Neurol. Neuroimmunol. Neuroinflamm. 7, e881 (2020).<\/p>\n<p>64. A. Baum et al., REGN-COV2 antibodies prevent and treat SARS-CoV-2 infection in rhesus macaques and hamsters. Science 370, 1110\u20131115 (2020).<br \/>\n65. B. J. Wagstaff, M. Barnerssoi, A. M. Roy-Engel, Evolutionary conservation of the functional modularity of primate and murine LINE-1 elements. PLoS One 6, e19672 (2011).<br \/>\n66. E. A. Farkash, G. D. Kao, S. R. Horman, E. T. Prak, Gamma radiation increases endonuclease-dependent L1 retrotransposition in a cultured cell assay. Nucleic Acids Res. 34, 1196\u20131204 (2006).<br \/>\n67. WHO, World Health Organization (WHO) resource of in-house\u2013developed molecular assays. https:\/\/www.who.int\/docs\/default-source\/coronaviruse\/whoinhouseassays.pdf?sfvrsn=de3a76aa_2. Accessed 6 June 2020.<br \/>\n68. H. Li, Minimap2: Pairwise alignment for nucleotide sequences. Bioinformatics 34, 3094\u20133100 (2018).<br \/>\n69. W. J. Kent, BLAT\u2013the BLAST-like alignment tool. Genome Res. 12, 656\u2013664 (2002).<br \/>\n70. A. Dobin et al., STAR: Ultrafast universal RNA-seq aligner. Bioinformatics 29, 15\u201321 (2013).<br \/>\n71. H. Li et al.; 1000 Genome Project Data Processing Subgroup, The sequence alignment\/map format and SAMtools. Bioinformatics 25, 2078\u20132079 (2009).<br \/>\n72. F. Ram\u00edrez et al., deepTools2: A next generation web server for deep-sequencing data analysis. Nucleic Acids Res. 44, W160\u2013W165 (2016).<br \/>\n73. A. R. Quinlan, I. M.
Hall, BEDTools: A flexible suite of utilities for comparing genomic features. Bioinformatics 26, 841\u2013842 (2010).<br \/>\n74. G. X. Zheng et al., Massively parallel digital transcriptional profiling of single cells. Nat. Commun. 8, 14049 (2017).<br \/>\n75. T. Stuart et al., Comprehensive integration of single-cell data. Cell 177, 1888\u20131902.e21 (2019).<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Liguo Zhanga, Alexsia Richardsa, M. Inmaculada Barrasaa, Stephen H. Hughesb, Richard A. Younga,c, and Rudolf Jaenischa,c,1 a Whitehead Institute for Biomedical Research, Cambridge, MA 02142; b HIV Dynamics and Replication Program, Center for Cancer Research, National Cancer Institute, Frederick, MD 21702; and c Department of Biology, Massachusetts Institute of Technology, Cambridge, MA 02142 Contributed by &hellip; <\/p>\n<p><a class=\"more-link btn\" href=\"https:\/\/evaggelatos.com\/?p=31922\">Continue reading<\/a><\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[85],"tags":[],"class_list":["post-31922","post","type-post","status-publish","format-standard","hentry","category-85","item-wrap"],"_links":{"self":[{"href":"https:\/\/evaggelatos.com\/index.php?rest_route=\/wp\/v2\/posts\/31922","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/evaggelatos.com\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/evaggelatos.com\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/evaggelatos.com\/index.php?rest_route=\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/evaggelatos.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=31922"}],"version-history":[{"count":1,"href":"https:\/\/evaggelatos.com\/index.php?rest_rout
e=\/wp\/v2\/posts\/31922\/revisions"}],"predecessor-version":[{"id":31923,"href":"https:\/\/evaggelatos.com\/index.php?rest_route=\/wp\/v2\/posts\/31922\/revisions\/31923"}],"wp:attachment":[{"href":"https:\/\/evaggelatos.com\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=31922"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/evaggelatos.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=31922"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/evaggelatos.com\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=31922"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}