Comparing the abundance of one RNA molecule to another is crucial for understanding cellular functions, but most sequencing techniques can target only specific subsets of RNA. In this study, we used a new fragmented ribodepleted TGIRT sequencing method (FRT), which uses a thermostable group II intron reverse transcriptase (TGIRT), to generate a portrait of the human transcriptome depicting the quantitative relationship of all classes of nonribosomal RNA longer than 60 nt. Comparison between different sequencing methods indicated that FRT is more accurate in ranking both mRNA and noncoding RNA than viral reverse transcriptase-based sequencing methods, even those that specifically target these species. Measurements of RNA abundance in different cell lines using this method correlate with biochemical estimates, confirming tRNA as the most abundant nonribosomal RNA biotype. However, the single most abundant transcript is 7SL RNA, a component of the signal recognition particle. Structured noncoding RNAs (sncRNAs) associated with the same biological process are expressed at similar levels, with the exception of RNAs with multiple functions like U1 snRNA. In general, sncRNAs forming RNPs are hundreds to thousands of times more abundant than their mRNA counterparts. Surprisingly, only 50 sncRNA genes produce half of the non-rRNA transcripts detected in two different cell lines. Together, the results indicate that the human transcriptome is dominated by a small number of highly expressed sncRNAs specializing in functions related to translation and splicing.
The thermostable Geobacillus stearothermophilus GsI-IIC intron is among the few bacterial group II introns found to proliferate to high copy number in its host genome. Here, we developed a bacterial genetic assay for retrohoming and biochemical assays for protein-dependent and self-splicing of GsI-IIC. We found that GsI-IIC, like other group IIC introns, retrohomes into sites having a 5'-exon DNA hairpin, typically from a bacterial transcription terminator, followed by short intron-binding sequences (IBSs) recognized by base pairing of exon-binding sequences (EBSs) in the intron RNA. Intron RNA insertion occurs preferentially but not exclusively into the parental lagging strand at DNA replication forks, using a nascent lagging strand DNA as a primer for reverse transcription. In vivo mobility assays, selections, and mutagenesis indicated that a variety of GC-rich DNA hairpins of 7-19 bp with continuous base pairs or internal elbow regions support efficient intron mobility and identified a critically recognized nucleotide (T-5) between the hairpin and IBS1, a feature not reported previously for group IIC introns. Neither the hairpin nor T-5 is required for intron excision or lariat formation during RNA splicing, but the 5'-exon sequence can affect the efficiency of exon ligation. Structural modeling suggests that the 5'-exon DNA hairpin and T-5 bind to the thumb and DNA-binding domains of GsI-IIC reverse transcriptase. This mode of DNA target site recognition enables the intron to proliferate to high copy number by recognizing numerous transcription terminators and then finding the best match for the EBS/IBS interactions within a short distance downstream.
Alignment-free RNA quantification tools have significantly increased the speed of RNA-seq analysis. However, it is unclear whether these state-of-the-art RNA-seq analysis pipelines can quantify small RNAs as accurately as they do with long RNAs in the context of total RNA quantification.
We comprehensively tested and compared four RNA-seq pipelines for accuracy of gene quantification and fold-change estimation. We used a novel total RNA benchmarking dataset in which small non-coding RNAs are highly represented along with other long RNAs. The four RNA-seq pipelines consisted of two commonly used alignment-free pipelines and two variants of alignment-based pipelines. We found that all pipelines showed high accuracy for quantifying the expression of long and highly abundant genes. However, alignment-free pipelines showed systematically poorer performance in quantifying low-abundance and small RNAs.
We have shown that alignment-free and traditional alignment-based quantification methods perform similarly for common gene targets, such as protein-coding genes. However, we have identified a potential pitfall in analyzing and quantifying low-expression genes and small RNAs with alignment-free pipelines, especially when these small RNAs contain biological variations.
Bacterial group II intron reverse transcriptases (RTs) function in both intron mobility and RNA splicing and are evolutionary predecessors of retrotransposon, telomerase, and retroviral RTs as well as the spliceosomal protein Prp8 in eukaryotes. Here we determined a crystal structure of a full-length thermostable group II intron RT in complex with an RNA template-DNA primer duplex and incoming deoxynucleotide triphosphate (dNTP) at 3.0-Å resolution. We find that the binding of template-primer and key aspects of the RT active site are surprisingly different from retroviral RTs but remarkably similar to viral RNA-dependent RNA polymerases. The structure reveals a host of features not seen previously in RTs that may contribute to distinctive biochemical properties of group II intron RTs, and it provides a prototype for many related bacterial and eukaryotic non-LTR retroelement RTs. It also reveals how protein structural features used for reverse transcription evolved to promote the splicing of both group II and spliceosomal introns.
Cellular accumulation of repetitive RNA occurs in several dominantly-inherited genetic disorders. Expanded CUG, CCUG or GGGGCC repeats are expressed in myotonic dystrophy type 1 (DM1), myotonic dystrophy type 2 (DM2), or familial amyotrophic lateral sclerosis, respectively. Expanded repeat RNAs (ER-RNAs) exert a toxic gain-of-function and are prime therapeutic targets in these diseases. However, efforts to quantify ER-RNA levels or monitor knockdown are confounded by stable structure and heterogeneity of the ER-RNA tract and background signal from non-expanded repeats. Here, we used a thermostable group II intron reverse transcriptase (TGIRT-III) to convert ER-RNA to cDNA, followed by quantification on slot blots. We found that TGIRT-III was capable of reverse transcription (RTn) on enzymatically synthesized ER-RNAs. By using conditions that limit cDNA synthesis from off-target sequences, we observed hybridization signals on cDNA slot blots from DM1 and DM2 muscle samples but not from healthy controls. In transgenic mouse models of DM1 the cDNA slot blots accurately reflected the differences of ER-RNA expression across different transgenic lines, and showed therapeutic reductions in skeletal and cardiac muscle, accompanied by improvements of the DM1-associated splicing defects. TGIRT-III was also active on CCCCGG- and GGGGCC-repeats, suggesting that ER-RNA analysis is feasible for several repeat expansion disorders.
RNA is secreted from cells enclosed within extracellular vesicles (EVs). Defining the RNA composition of EVs is challenging due to their coisolation with contaminants, lack of knowledge of the mechanisms of RNA sorting into EVs, and limitations of conventional RNA-sequencing methods. Here we present our observations using thermostable group II intron reverse transcriptase sequencing (TGIRT-seq) to characterize the RNA extracted from HEK293T cell EVs isolated by flotation gradient ultracentrifugation and from exosomes containing the tetraspanin CD63 further purified from the gradient fractions by immunoisolation. We found that EV-associated transcripts are dominated by full-length, mature transfer RNAs (tRNAs) and other small noncoding RNAs (ncRNAs) encapsulated within vesicles. A substantial proportion of the reads mapping to protein-coding genes, long ncRNAs, and antisense RNAs were due to DNA contamination on the surface of vesicles. Nevertheless, sequences mapping to spliced mRNAs were identified within HEK293T cell EVs and exosomes, among the most abundant being transcripts containing a 5′ terminal oligopyrimidine (5′ TOP) motif. Our results indicate that the RNA-binding protein YBX1, which is required for the sorting of selected miRNAs into exosomes, plays a role in the sorting of highly abundant small ncRNA species, including tRNAs, Y RNAs, and Vault RNAs. Finally, we obtained evidence for an EV-specific tRNA modification, perhaps indicating a role for posttranscriptional modification in the sorting of some RNA species into EVs. Our results suggest that EVs and exosomes could play a role in the purging and intercellular transfer of excess free RNAs, including full-length tRNAs and other small ncRNAs.
Cas1 integrase is the key enzyme of the clustered regularly interspaced short palindromic repeat (CRISPR)-Cas adaptation module that mediates acquisition of spacers derived from foreign DNA by CRISPR arrays. In diverse bacteria, the cas1 gene is fused (or adjacent) to a gene encoding a reverse transcriptase (RT) related to group II intron RTs. An RT-Cas1 fusion protein has been recently shown to enable acquisition of CRISPR spacers from RNA. Phylogenetic analysis of the CRISPR-associated RTs demonstrates monophyly of the RT-Cas1 fusion, and coevolution of the RT and Cas1 domains. Nearly all such RTs are present within type III CRISPR-Cas loci, but their phylogeny does not parallel the CRISPR-Cas type classification, indicating that RT-Cas1 is an autonomous functional module that is disseminated by horizontal gene transfer and can function with diverse type III systems. To compare the sequence pools sampled by RT-Cas1-associated and RT-lacking CRISPR-Cas systems, we obtained samples of a commercially grown cyanobacterium, Arthrospira platensis. Sequencing of the CRISPR arrays uncovered a highly diverse population of spacers. Spacer diversity was particularly striking for the RT-Cas1-containing type III-B system, where no saturation was evident even with millions of sequences analyzed. In contrast, analysis of the RT-lacking type III-D system yielded a highly diverse pool but reached a point where fewer novel spacers were recovered as sequencing depth was increased. Matches could be identified for a small fraction of the non-RT-Cas1-associated spacers, and for only a single RT-Cas1-associated spacer. Thus, the principal source(s) of the spacers, particularly the hypervariable spacer repertoire of the RT-associated arrays, remains unknown.
High-throughput single-stranded DNA sequencing (ssDNA-seq) of cell-free DNA from plasma and other bodily fluids is a powerful method for non-invasive prenatal testing, and diagnosis of cancers and other diseases. Here, we developed a facile ssDNA-seq method, which exploits a novel template-switching activity of thermostable group II intron reverse transcriptases (TGIRTs) for DNA-seq library construction. This activity enables TGIRT enzymes to initiate DNA synthesis directly at the 3′ end of a DNA strand while simultaneously attaching a DNA-seq adapter without end repair, tailing, or ligation. Initial experiments using this method to sequence E. coli genomic DNA showed that the TGIRT enzyme has surprisingly robust DNA polymerase activity. Further experiments showed that TGIRT-seq of plasma DNA from a healthy individual enables analysis of nucleosome positioning, transcription factor-binding sites, DNA methylation sites, and tissues-of-origin comparably to established methods, but with a simpler workflow that captures precise DNA ends.
Coupling of structure-specific in vivo chemical modification to next-generation sequencing is transforming RNA secondary structure studies in living cells. The dominant strategy for detecting in vivo chemical modifications uses reverse transcriptase truncation products, which introduce biases and necessitate population-average assessments of RNA structure. Here we present dimethyl sulfate (DMS) mutational profiling with sequencing (DMS-MaPseq), which encodes DMS modifications as mismatches using a thermostable group II intron reverse transcriptase. DMS-MaPseq yields a high signal-to-noise ratio, can report multiple structural features per molecule, and allows both genome-wide studies and focused in vivo investigations of even low-abundance RNAs. We apply DMS-MaPseq for the first analysis of RNA structure within an animal tissue and to identify a functional structure involved in noncanonical translation initiation. Additionally, we use DMS-MaPseq to compare the in vivo structure of pre-mRNAs with their mature isoforms. These applications illustrate DMS-MaPseq's capacity to dramatically expand in vivo analysis of RNA structure.
RNA silencing is a conserved eukaryotic gene expression regulatory mechanism mediated by small RNAs. In Caenorhabditis elegans, the accumulation of a distinct class of siRNAs synthesized by an RNA-dependent RNA polymerase (RdRP) requires the PIR-1 phosphatase. However, the function of PIR-1 in RNAi has remained unclear. Since mammals lack an analogous siRNA biogenesis pathway, an RNA silencing role for the mammalian PIR-1 homolog (dual specificity phosphatase 11 [DUSP11]) was unexpected. Here, we show that the RNA triphosphatase activity of DUSP11 promotes the RNA silencing activity of viral microRNAs (miRNAs) derived from RNA polymerase III (RNAP III) transcribed precursors. Our results demonstrate that DUSP11 converts the 5' triphosphate of miRNA precursors to a 5' monophosphate, promoting loading of derivative 5p miRNAs into Argonaute proteins via a Dicer-coupled 5' monophosphate-dependent strand selection mechanism. This mechanistic insight supports a likely shared function for PIR-1 in C. elegans. Furthermore, we show that DUSP11 modulates the 5' end phosphate group and/or steady-state level of several host RNAP III transcripts, including vault RNAs and Alu transcripts. This study shows that steady-state levels of select noncoding RNAs are regulated by DUSP11 and defines a previously unknown portal for small RNA-mediated silencing in mammals, revealing that DUSP11-dependent RNA silencing activities are shared among diverse metazoans.
The mitochondrial tyrosyl-tRNA synthetases (mtTyrRSs) of Pezizomycotina fungi, a subphylum that includes many pathogenic species, are bifunctional proteins that both charge mitochondrial tRNA(Tyr) and act as splicing cofactors for autocatalytic group I introns. Previous studies showed that one of these proteins, Neurospora crassa CYT-18, binds group I introns by using both its N-terminal catalytic and C-terminal anticodon binding domains and that the catalytic domain uses a newly evolved group I intron binding surface that includes an N-terminal extension and two small insertions (insertions 1 and 2) with distinctive features not found in non-splicing mtTyrRSs. To explore how this RNA binding surface diverged to accommodate different group I introns in other Pezizomycotina fungi, we determined x-ray crystal structures of C-terminally truncated Aspergillus nidulans and Coccidioides posadasii mtTyrRSs. Comparisons with previous N. crassa CYT-18 structures and a structural model of the Aspergillus fumigatus mtTyrRS showed that the overall topology of the group I intron binding surface is conserved but with variations in key intron binding regions, particularly the Pezizomycotina-specific insertions. These insertions, which arose by expansion of flexible termini or internal loops, show greater variation in structure and amino acids potentially involved in group I intron binding than do neighboring protein core regions, which also function in intron binding but may be more constrained to preserve mtTyrRS activity. Our results suggest a structural basis for the intron specificity of different Pezizomycotina mtTyrRSs, highlight flexible terminal and loop regions as major sites for enzyme diversification, and identify targets for therapeutic intervention by disrupting an essential RNA-protein interaction in pathogenic fungi.
Next-generation RNA sequencing (RNA-seq) has revolutionized our ability to analyze transcriptomes. Current RNA-seq methods are highly reproducible, but each has biases resulting from different modes of RNA sample preparation, reverse transcription, and adapter addition, leading to variability between methods. Moreover, the transcriptome cannot be profiled comprehensively because highly structured RNAs, such as tRNAs and snoRNAs, are refractory to conventional RNA-seq methods. Recently, we developed a new method for strand-specific RNA-seq using thermostable group II intron reverse transcriptases (TGIRTs). TGIRT enzymes have higher processivity and fidelity than conventional retroviral reverse transcriptases plus a novel template-switching activity that enables RNA-seq adapter addition during cDNA synthesis without using RNA ligase. Here, we obtained TGIRT-seq data sets for well-characterized human RNA reference samples and compared them to previous data sets obtained for these RNAs by the Illumina TruSeq v2 and v3 methods. We find that TGIRT-seq recapitulates the relative abundance of human transcripts and RNA spike-ins in ribo-depleted, fragmented RNA samples comparably to non-strand-specific TruSeq v2 and better than strand-specific TruSeq v3. Moreover, TGIRT-seq is more strand specific than TruSeq v3 and eliminates sampling biases from random hexamer priming, which are inherent to TruSeq. The TGIRT-seq data sets also show more uniform 5' to 3' gene coverage and identify more splice junctions, particularly near the 5' ends of mRNAs, than do the TruSeq data sets. Finally, TGIRT-seq enables the simultaneous profiling of mRNAs and lncRNAs in the same RNA-seq experiment as structured small ncRNAs, including tRNAs, which are essentially absent with TruSeq.
Next-generation RNA-sequencing (RNA-seq) has revolutionized transcriptome profiling, gene expression analysis, and RNA-based diagnostics. Here, we developed a new RNA-seq method that exploits thermostable group II intron reverse transcriptases (TGIRTs) and used it to profile human plasma RNAs. TGIRTs have higher thermostability, processivity, and fidelity than conventional reverse transcriptases, plus a novel template-switching activity that can efficiently attach RNA-seq adapters to target RNA sequences without RNA ligation. The new TGIRT-seq method enabled construction of RNA-seq libraries from <1 ng of plasma RNA in <5 h. TGIRT-seq of RNA in 1-mL plasma samples from a healthy individual revealed RNA fragments mapping to a diverse population of protein-coding genes and long ncRNAs, which are enriched in intron and antisense sequences, as well as nearly all known classes of small ncRNAs, some of which have never before been seen in plasma. Surprisingly, many of the small ncRNA species were present as full-length transcripts, suggesting that they are protected from plasma RNases in ribonucleoprotein (RNP) complexes and/or exosomes. This TGIRT-seq method is readily adaptable for profiling of whole-cell, exosomal, and miRNAs, and for related procedures, such as HITS-CLIP and ribosome profiling.
CRISPR systems mediate adaptive immunity in diverse prokaryotes. CRISPR-associated Cas1 and Cas2 proteins have been shown to enable adaptation to new threats in type I and II CRISPR systems by the acquisition of short segments of DNA (spacers) from invasive elements. In several type III CRISPR systems, Cas1 is naturally fused to a reverse transcriptase (RT). In the marine bacterium Marinomonas mediterranea (MMB-1), we showed that an RT-Cas1 fusion protein enables the acquisition of RNA spacers in vivo in an RT-dependent manner. In vitro, the MMB-1 RT-Cas1 and Cas2 proteins catalyze the ligation of RNA segments into the CRISPR array, which is followed by reverse transcription. These observations outline a host-mediated mechanism for reverse information flow from RNA to DNA.
Despite its biological importance, tRNA has not been adequately sequenced by standard methods because of its abundant post-transcriptional modifications and stable structure, which interfere with cDNA synthesis. We achieved efficient and quantitative tRNA sequencing in HEK293T cells by using engineered demethylases to remove base methylations and a highly processive thermostable group II intron reverse transcriptase to overcome these obstacles. Our method, DM-tRNA-seq, should be applicable to investigations of tRNA in all organisms.
Mobile bacterial group II introns are evolutionary ancestors of spliceosomal introns and retroelements in eukaryotes. They consist of an autocatalytic intron RNA (a "ribozyme") and an intron-encoded reverse transcriptase, which function together to promote intron integration into new DNA sites by a mechanism termed "retrohoming". Although mobile group II introns splice and retrohome efficiently in bacteria, all examined thus far function inefficiently in eukaryotes, where their ribozyme activity is limited by low Mg2+ concentrations, and intron-containing transcripts are subject to nonsense-mediated decay (NMD) and translational repression. Here, by using RNA polymerase II to express a humanized group II intron reverse transcriptase and T7 RNA polymerase to express intron transcripts resistant to NMD, we find that simply supplementing culture medium with Mg2+ induces the Lactococcus lactis Ll.LtrB intron to retrohome into plasmid and chromosomal sites, the latter at frequencies up to ~0.1%, in viable HEK-293 cells. Surprisingly, under these conditions, the Ll.LtrB intron reverse transcriptase is required for retrohoming but not for RNA splicing as in bacteria. By using a genetic assay for in vivo selections combined with deep sequencing, we identified intron RNA mutations that enhance retrohoming in human cells, but <4-fold and not without added Mg2+. Further, the selected mutations lie outside the ribozyme catalytic core, which appears not readily modified to function efficiently at low Mg2+ concentrations. Our results reveal differences between group II intron retrohoming in human cells and bacteria and suggest constraints on critical nucleotide residues of the ribozyme core that limit how much group II intron retrohoming in eukaryotes can be enhanced. 
These findings have implications for group II intron use for gene targeting in eukaryotes and suggest how differences in intracellular Mg2+ concentrations between bacteria and eukarya may have impacted the evolution of introns and gene expression mechanisms.
This review focuses on recent developments in our understanding of group II intron function, the relationships of these introns to retrotransposons and spliceosomes, and how their common features have informed thinking about bacterial group II introns as key elements in eukaryotic evolution. Reverse transcriptase-mediated and host factor-aided intron retrohoming pathways are considered along with retrotransposition mechanisms to novel sites in bacteria, where group II introns are thought to have originated. DNA target recognition and movement by target-primed reverse transcription imply an evolutionary relationship among group II introns, non-LTR retrotransposons, such as LINE elements, and telomerase. Additionally, group II introns are almost certainly the progenitors of spliceosomal introns. Their profound similarities include splicing chemistry extending to RNA catalysis, reaction stereochemistry, and the position of two divalent metals that perform catalysis at the RNA active site. There are also sequence and structural similarities between group II introns and the spliceosome's small nuclear RNAs (snRNAs) and between a highly conserved core spliceosomal protein Prp8 and a group II intron-like reverse transcriptase. It has been proposed that group II introns entered eukaryotes during bacterial endosymbiosis or bacterial-archaeal fusion, proliferated within the nuclear genome, necessitating evolution of the nuclear envelope, and fragmented giving rise to spliceosomal introns. Thus, these bacterial self-splicing mobile elements have fundamentally impacted the composition of extant eukaryotic genomes, including the human genome, most of which is derived from close relatives of mobile group II introns.
In Eukarya, stalled translation induces 40S dissociation and recruitment of the ribosome quality control complex (RQC) to the 60S subunit, which mediates nascent chain degradation. Here we report cryo–electron microscopy structures revealing that the RQC components Rqc2p (YPL009C/Tae2) and Ltn1p (YMR247C/Rkr1) bind to the 60S subunit at sites exposed after 40S dissociation, placing the Ltn1p RING (Really Interesting New Gene) domain near the exit channel and Rqc2p over the P-site transfer RNA (tRNA). We further demonstrate that Rqc2p recruits alanine- and threonine-charged tRNA to the A site and directs the elongation of nascent chains independently of mRNA or 40S subunits. Our work uncovers an unexpected mechanism of protein synthesis, in which a protein, not an mRNA, determines tRNA recruitment and the tagging of nascent chains with carboxy-terminal Ala and Thr extensions (“CAT tails”).
Mobile group II introns are bacterial retrotransposons that combine the activities of an autocatalytic intron RNA (a ribozyme) and an intron-encoded reverse transcriptase to insert site-specifically into DNA. They recognize DNA target sites largely by base pairing of sequences within the intron RNA and achieve high DNA target specificity by using the ribozyme active site to couple correct base pairing to RNA-catalyzed intron integration. Algorithms have been developed to program the DNA target site specificity of several mobile group II introns, allowing them to be made into ‘targetrons.’ Targetrons function for gene targeting in a wide variety of bacteria and typically integrate at efficiencies high enough to be screened easily by colony PCR, without the need for selectable markers. Targetrons have found wide application in microbiological research, enabling gene targeting and genetic engineering of bacteria that had been intractable to other methods. Recently, a thermostable targetron has been developed for use in bacterial thermophiles, and new methods have been developed for using targetrons to position recombinase recognition sites, enabling large-scale genome-editing operations, such as deletions, inversions, insertions, and ‘cut-and-pastes’ (that is, translocation of large DNA segments), in a wide range of bacteria at high efficiency. Using targetrons in eukaryotes presents challenges due to the difficulties of nuclear localization and sub-optimal magnesium concentrations, although supplementation with magnesium can increase integration efficiency, and directed evolution is being employed to overcome these barriers. Finally, spurred by new methods for expressing group II intron reverse transcriptases that yield large amounts of highly active protein, thermostable group II intron reverse transcriptases from bacterial thermophiles are being used as research tools for a variety of applications, including qRT-PCR and next-generation RNA sequencing (RNA-seq). 
The high processivity and fidelity of group II intron reverse transcriptases along with their novel template-switching activity, which can directly link RNA-seq adaptor sequences to cDNAs during reverse transcription, open new approaches for RNA-seq and the identification and profiling of non-coding RNAs, with potentially wide applications in research and biotechnology.
Interferon (IFN) responses play key roles in cellular defense against pathogens. Highly expressed IFN-induced proteins with tetratricopeptide repeats (IFITs) are proposed to function as RNA binding proteins, but the RNA binding and discrimination specificities of IFIT proteins remain unclear. Here we show that human IFIT5 has comparable affinity for RNAs with diverse phosphate-containing 5′-ends, excluding the higher eukaryotic mRNA cap. Systematic mutagenesis revealed that sequence substitutions in IFIT5 can alternatively expand or introduce bias in protein binding to RNAs with 5′ monophosphate, triphosphate, cap0 (triphosphate-bridged N7-methylguanosine), or cap1 (cap0 with RNA 2′-O-methylation). We defined the breadth of cellular ligands for IFIT5 by using a thermostable group II intron reverse transcriptase for RNA sequencing. We show that IFIT5 binds precursor and processed tRNAs, as well as other RNA polymerase III transcripts. Our findings establish the RNA recognition specificity of the human innate immune response protein IFIT5.