Coupling of structure-specific in vivo chemical modification to next-generation sequencing is transforming RNA secondary structure studies in living cells. The dominant strategy for detecting in vivo chemical modifications uses reverse transcriptase truncation products, which introduce biases and necessitate population-average assessments of RNA structure. Here we present dimethyl sulfate (DMS) mutational profiling with sequencing (DMS-MaPseq), which encodes DMS modifications as mismatches using a thermostable group II intron reverse transcriptase. DMS-MaPseq yields a high signal-to-noise ratio, can report multiple structural features per molecule, and allows both genome-wide studies and focused in vivo investigations of even low-abundance RNAs. We apply DMS-MaPseq for the first analysis of RNA structure within an animal tissue and to identify a functional structure involved in noncanonical translation initiation. Additionally, we use DMS-MaPseq to compare the in vivo structure of pre-mRNAs with their mature isoforms. These applications illustrate DMS-MaPseq's capacity to dramatically expand in vivo analysis of RNA structure.
RNA silencing is a conserved eukaryotic gene expression regulatory mechanism mediated by small RNAs. In Caenorhabditis elegans, the accumulation of a distinct class of siRNAs synthesized by an RNA-dependent RNA polymerase (RdRP) requires the PIR-1 phosphatase. However, the function of PIR-1 in RNAi has remained unclear. Since mammals lack an analogous siRNA biogenesis pathway, an RNA silencing role for the mammalian PIR-1 homolog (dual specificity phosphatase 11 [DUSP11]) was unexpected. Here, we show that the RNA triphosphatase activity of DUSP11 promotes the RNA silencing activity of viral microRNAs (miRNAs) derived from RNA polymerase III (RNAP III) transcribed precursors. Our results demonstrate that DUSP11 converts the 5' triphosphate of miRNA precursors to a 5' monophosphate, promoting loading of derivative 5p miRNAs into Argonaute proteins via a Dicer-coupled 5' monophosphate-dependent strand selection mechanism. This mechanistic insight supports a likely shared function for PIR-1 in C. elegans. Furthermore, we show that DUSP11 modulates the 5' end phosphate group and/or steady-state level of several host RNAP III transcripts, including vault RNAs and Alu transcripts. This study shows that steady-state levels of select noncoding RNAs are regulated by DUSP11 and defines a previously unknown portal for small RNA-mediated silencing in mammals, revealing that DUSP11-dependent RNA silencing activities are shared among diverse metazoans.
The mitochondrial tyrosyl-tRNA synthetases (mtTyrRSs) of Pezizomycotina fungi, a subphylum that includes many pathogenic species, are bifunctional proteins that both charge mitochondrial tRNA(Tyr) and act as splicing cofactors for autocatalytic group I introns. Previous studies showed that one of these proteins, Neurospora crassa CYT-18, binds group I introns by using both its N-terminal catalytic and C-terminal anticodon binding domains and that the catalytic domain uses a newly evolved group I intron binding surface that includes an N-terminal extension and two small insertions (insertions 1 and 2) with distinctive features not found in non-splicing mtTyrRSs. To explore how this RNA binding surface diverged to accommodate different group I introns in other Pezizomycotina fungi, we determined x-ray crystal structures of C-terminally truncated Aspergillus nidulans and Coccidioides posadasii mtTyrRSs. Comparisons with previous N. crassa CYT-18 structures and a structural model of the Aspergillus fumigatus mtTyrRS showed that the overall topology of the group I intron binding surface is conserved but with variations in key intron binding regions, particularly the Pezizomycotina-specific insertions. These insertions, which arose by expansion of flexible termini or internal loops, show greater variation in structure and amino acids potentially involved in group I intron binding than do neighboring protein core regions, which also function in intron binding but may be more constrained to preserve mtTyrRS activity. Our results suggest a structural basis for the intron specificity of different Pezizomycotina mtTyrRSs, highlight flexible terminal and loop regions as major sites for enzyme diversification, and identify targets for therapeutic intervention by disrupting an essential RNA-protein interaction in pathogenic fungi.
Next-generation RNA-sequencing (RNA-seq) has revolutionized transcriptome profiling, gene expression analysis, and RNA-based diagnostics. Here, we developed a new RNA-seq method that exploits thermostable group II intron reverse transcriptases (TGIRTs) and used it to profile human plasma RNAs. TGIRTs have higher thermostability, processivity, and fidelity than conventional reverse transcriptases, plus a novel template-switching activity that can efficiently attach RNA-seq adapters to target RNA sequences without RNA ligation. The new TGIRT-seq method enabled construction of RNA-seq libraries from <1 ng of plasma RNA in <5 h. TGIRT-seq of RNA in 1-mL plasma samples from a healthy individual revealed RNA fragments mapping to a diverse population of protein-coding gene and long ncRNAs, which are enriched in intron and antisense sequences, as well as nearly all known classes of small ncRNAs, some of which have never before been seen in plasma. Surprisingly, many of the small ncRNA species were present as full-length transcripts, suggesting that they are protected from plasma RNases in ribonucleoprotein (RNP) complexes and/or exosomes. This TGIRT-seq method is readily adaptable for profiling of whole-cell, exosomal, and miRNAs, and for related procedures, such as HITS-CLIP and ribosome profiling.
CRISPR systems mediate adaptive immunity in diverse prokaryotes. CRISPR-associated Cas1 and Cas2 proteins have been shown to enable adaptation to new threats in type I and II CRISPR systems by the acquisition of short segments of DNA (spacers) from invasive elements. In several type III CRISPR systems, Cas1 is naturally fused to a reverse transcriptase (RT). In the marine bacterium Marinomonas mediterranea (MMB-1), we showed that an RT-Cas1 fusion protein enables the acquisition of RNA spacers in vivo in an RT-dependent manner. In vitro, the MMB-1 RT-Cas1 and Cas2 proteins catalyze the ligation of RNA segments into the CRISPR array, which is followed by reverse transcription. These observations outline a host-mediated mechanism for reverse information flow from RNA to DNA.
Next-generation RNA sequencing (RNA-seq) has revolutionized our ability to analyze transcriptomes. Current RNA-seq methods are highly reproducible, but each has biases resulting from different modes of RNA sample preparation, reverse transcription, and adapter addition, leading to variability between methods. Moreover, the transcriptome cannot be profiled comprehensively because highly structured RNAs, such as tRNAs and snoRNAs, are refractory to conventional RNA-seq methods. Recently, we developed a new method for strand-specific RNA-seq using thermostable group II intron reverse transcriptases (TGIRTs). TGIRT enzymes have higher processivity and fidelity than conventional retroviral reverse transcriptases plus a novel template-switching activity that enables RNA-seq adapter addition during cDNA synthesis without using RNA ligase. Here, we obtained TGIRT-seq data sets for well-characterized human RNA reference samples and compared them to previous data sets obtained for these RNAs by the Illumina TruSeq v2 and v3 methods. We find that TGIRT-seq recapitulates the relative abundance of human transcripts and RNA spike-ins in ribo-depleted, fragmented RNA samples comparably to non-strand-specific TruSeq v2 and better than strand-specific TruSeq v3. Moreover, TGIRT-seq is more strand specific than TruSeq v3 and eliminates sampling biases from random hexamer priming, which are inherent to TruSeq. The TGIRT-seq data sets also show more uniform 5' to 3' gene coverage and identify more splice junctions, particularly near the 5' ends of mRNAs, than do the TruSeq data sets. Finally, TGIRT-seq enables the simultaneous profiling of mRNAs and lncRNAs in the same RNA-seq experiment as structured small ncRNAs, including tRNAs, which are essentially absent with TruSeq.
Despite its biological importance, tRNA has not been adequately sequenced by standard methods because of its abundant post-transcriptional modifications and stable structure, which interfere with cDNA synthesis. We achieved efficient and quantitative tRNA sequencing in HEK293T cells by using engineered demethylases to remove base methylations and a highly processive thermostable group II intron reverse transcriptase to overcome these obstacles. Our method, DM-tRNA-seq, should be applicable to investigations of tRNA in all organisms.
In Eukarya, stalled translation induces 40S dissociation and recruitment of the ribosome quality control complex (RQC) to the 60S subunit, which mediates nascent chain degradation. Here we report cryo–electron microscopy structures revealing that the RQC components Rqc2p (YPL009C/Tae2) and Ltn1p (YMR247C/Rkr1) bind to the 60S subunit at sites exposed after 40S dissociation, placing the Ltn1p RING (Really Interesting New Gene) domain near the exit channel and Rqc2p over the P-site transfer RNA (tRNA). We further demonstrate that Rqc2p recruits alanine- and threonine-charged tRNA to the A site and directs the elongation of nascent chains independently of mRNA or 40S subunits. Our work uncovers an unexpected mechanism of protein synthesis, in which a protein, not an mRNA, determines tRNA recruitment and the tagging of nascent chains with carboxy-terminal Ala and Thr extensions (“CAT tails”).
This review focuses on recent developments in our understanding of group II intron function, the relationships of these introns to retrotransposons and spliceosomes, and how their common features have informed thinking about bacterial group II introns as key elements in eukaryotic evolution. Reverse transcriptase-mediated and host factor-aided intron retrohoming pathways are considered along with retrotransposition mechanisms to novel sites in bacteria, where group II introns are thought to have originated. DNA target recognition and movement by target-primed reverse transcription imply an evolutionary relationship among group II introns, non-LTR retrotransposons, such as LINE elements, and telomerase. Additionally, group II introns are almost certainly the progenitors of spliceosomal introns. Their profound similarities include splicing chemistry extending to RNA catalysis, reaction stereochemistry, and the position of two divalent metals that perform catalysis at the RNA active site. There are also sequence and structural similarities between group II introns and the spliceosome's small nuclear RNAs (snRNAs) and between a highly conserved core spliceosomal protein Prp8 and a group II intron-like reverse transcriptase. It has been proposed that group II introns entered eukaryotes during bacterial endosymbiosis or bacterial-archaeal fusion, proliferated within the nuclear genome, necessitating evolution of the nuclear envelope, and fragmented giving rise to spliceosomal introns. Thus, these bacterial self-splicing mobile elements have fundamentally impacted the composition of extant eukaryotic genomes, including the human genome, most of which is derived from close relatives of mobile group II introns.
Mobile bacterial group II introns are evolutionary ancestors of spliceosomal introns and retroelements in eukaryotes. They consist of an autocatalytic intron RNA (a "ribozyme") and an intron-encoded reverse transcriptase, which function together to promote intron integration into new DNA sites by a mechanism termed "retrohoming". Although mobile group II introns splice and retrohome efficiently in bacteria, all examined thus far function inefficiently in eukaryotes, where their ribozyme activity is limited by low Mg2+ concentrations, and intron-containing transcripts are subject to nonsense-mediated decay (NMD) and translational repression. Here, by using RNA polymerase II to express a humanized group II intron reverse transcriptase and T7 RNA polymerase to express intron transcripts resistant to NMD, we find that simply supplementing culture medium with Mg2+ induces the Lactococcus lactis Ll.LtrB intron to retrohome into plasmid and chromosomal sites, the latter at frequencies up to ~0.1%, in viable HEK-293 cells. Surprisingly, under these conditions, the Ll.LtrB intron reverse transcriptase is required for retrohoming but not for RNA splicing as in bacteria. By using a genetic assay for in vivo selections combined with deep sequencing, we identified intron RNA mutations that enhance retrohoming in human cells, but <4-fold and not without added Mg2+. Further, the selected mutations lie outside the ribozyme catalytic core, which appears not readily modified to function efficiently at low Mg2+ concentrations. Our results reveal differences between group II intron retrohoming in human cells and bacteria and suggest constraints on critical nucleotide residues of the ribozyme core that limit how much group II intron retrohoming in eukaryotes can be enhanced. 
These findings have implications for group II intron use for gene targeting in eukaryotes and suggest how differences in intracellular Mg2+ concentrations between bacteria and eukarya may have impacted the evolution of introns and gene expression mechanisms.
The Neurospora crassa mitochondrial tyrosyl-tRNA synthetase (mtTyrRS; CYT-18 protein) evolved a new function as a group I intron splicing factor by acquiring the ability to bind group I intron RNAs and stabilize their catalytically active RNA structure. Previous studies showed: (i) CYT-18 binds group I introns by using both its N-terminal catalytic domain and flexibly attached C-terminal anticodon-binding domain (CTD); and (ii) the catalytic domain binds group I introns specifically via multiple structural adaptations that occurred during or after the divergence of Pezizomycotina and Saccharomycotina. However, the function of the CTD and how it contributed to the evolution of splicing activity have been unclear. Here, small angle X-ray scattering analysis of CYT-18 shows that both CTDs of the homodimeric protein extend outward from the catalytic domain, but move inward to bind opposite ends of a group I intron RNA. Biochemical assays show that the isolated CTD of CYT-18 binds RNAs non-specifically, possibly contributing to its interaction with the structurally different ends of the intron RNA. Finally, we find that the yeast mtTyrRS, which diverged from Pezizomycotina fungal mtTyrRSs prior to the evolution of splicing activity, binds group I intron and other RNAs non-specifically via its CTD, but lacks further adaptations needed for group I intron splicing. Our results suggest a scenario of constructive neutral (i.e., pre-adaptive) evolution in which an initial non-specific interaction between the CTD of an ancestral fungal mtTyrRS and a self-splicing group I intron was “fixed” by an intron RNA mutation that resulted in protein-dependent splicing. Once fixed, this interaction could be elaborated by further adaptive mutations in both the catalytic domain and CTD that enabled specific binding of group I introns. Our results highlight a role for non-specific RNA binding in the evolution of RNA-binding proteins.
Interferon (IFN) responses play key roles in cellular defense against pathogens. Highly expressed IFN-induced proteins with tetratricopeptide repeats (IFITs) are proposed to function as RNA binding proteins, but the RNA binding and discrimination specificities of IFIT proteins remain unclear. Here we show that human IFIT5 has comparable affinity for RNAs with diverse phosphate-containing 5′-ends, excluding the higher eukaryotic mRNA cap. Systematic mutagenesis revealed that sequence substitutions in IFIT5 can alternatively expand or introduce bias in protein binding to RNAs with 5′ monophosphate, triphosphate, cap0 (triphosphate-bridged N7-methylguanosine), or cap1 (cap0 with RNA 2′-O-methylation). We defined the breadth of cellular ligands for IFIT5 by using a thermostable group II intron reverse transcriptase for RNA sequencing. We show that IFIT5 binds precursor and processed tRNAs, as well as other RNA polymerase III transcripts. Our findings establish the RNA recognition specificity of the human innate immune response protein IFIT5.
Mobile group II introns are bacterial retrotransposons that combine the activities of an autocatalytic intron RNA (a ribozyme) and an intron-encoded reverse transcriptase to insert site-specifically into DNA. They recognize DNA target sites largely by base pairing of sequences within the intron RNA and achieve high DNA target specificity by using the ribozyme active site to couple correct base pairing to RNA-catalyzed intron integration. Algorithms have been developed to program the DNA target site specificity of several mobile group II introns, allowing them to be made into ‘targetrons.’ Targetrons function for gene targeting in a wide variety of bacteria and typically integrate at efficiencies high enough to be screened easily by colony PCR, without the need for selectable markers. Targetrons have found wide application in microbiological research, enabling gene targeting and genetic engineering of bacteria that had been intractable to other methods. Recently, a thermostable targetron has been developed for use in bacterial thermophiles, and new methods have been developed for using targetrons to position recombinase recognition sites, enabling large-scale genome-editing operations, such as deletions, inversions, insertions, and ‘cut-and-pastes’ (that is, translocation of large DNA segments), in a wide range of bacteria at high efficiency. Using targetrons in eukaryotes presents challenges due to the difficulties of nuclear localization and sub-optimal magnesium concentrations, although supplementation with magnesium can increase integration efficiency, and directed evolution is being employed to overcome these barriers. Finally, spurred by new methods for expressing group II intron reverse transcriptases that yield large amounts of highly active protein, thermostable group II intron reverse transcriptases from bacterial thermophiles are being used as research tools for a variety of applications, including qRT-PCR and next-generation RNA sequencing (RNA-seq). 
The high processivity and fidelity of group II intron reverse transcriptases along with their novel template-switching activity, which can directly link RNA-seq adaptor sequences to cDNAs during reverse transcription, open new approaches for RNA-seq and the identification and profiling of non-coding RNAs, with potentially wide applications in research and biotechnology.
Clostridium thermocellum is a thermophilic anaerobic bacterium that degrades cellulose by using a highly effective cellulosome, a macromolecular complex consisting of multiple cellulose degrading enzymes organized and attached to the cell surface by non-catalytic scaffoldins. However, due largely to lack of efficient methods for genetic manipulation of C. thermocellum, it is still unclear how the different scaffoldins and their functional modules contribute to cellulose hydrolysis.
How different helicase families with a conserved catalytic ‘helicase core’ evolved to function on varied RNA and DNA substrates by diverse mechanisms remains unclear. In this study, we used Mss116, a yeast DEAD-box protein that utilizes ATP to locally unwind dsRNA, to investigate helicase specificity and mechanism. Our results define the molecular basis for the substrate specificity of a DEAD-box protein. Additionally, they show that Mss116 has ambiguous substrate-binding properties and interacts with all four NTPs and both RNA and DNA. The efficiency of unwinding correlates with the stability of the ‘closed-state’ helicase core, a complex with nucleotide and nucleic acid that forms as duplexes are unwound. Crystal structures reveal that core stability is modulated by family-specific interactions that favor certain substrates. This suggests how present-day helicases diversified from an ancestral core with broad specificity by retaining core closure as a common catalytic mechanism while optimizing substrate-binding interactions for different cellular functions.
Mobile group II introns are bacterial retrotransposons thought to be evolutionary ancestors of spliceosomal introns and retroelements in eukaryotes. They consist of a catalytically active intron RNA ("ribozyme") and an intron-encoded reverse transcriptase, which function together to promote RNA splicing and intron mobility via reverse splicing of the intron RNA into new DNA sites ("retrohoming"). Although group II introns are active in bacteria, their natural hosts, they function inefficiently in eukaryotes, where lower free Mg(2+) concentrations decrease their ribozyme activity and constitute a natural barrier to group II intron proliferation within nuclear genomes. Here, we show that retrohoming of the Ll.LtrB group II intron is strongly inhibited in an Escherichia coli mutant lacking the Mg(2+) transporter MgtA, and we use this system to select mutations in catalytic core domain V (DV) that partially rescue retrohoming at low Mg(2+) concentrations. We thus identified mutations in the distal stem of DV that increase retrohoming efficiency in the MgtA mutant up to 22-fold. Biochemical assays of splicing and reverse splicing indicate that the mutations increase the fraction of intron RNA that folds into an active conformation at low Mg(2+) concentrations, and terbium-cleavage assays suggest that this increase is due to enhanced Mg(2+) binding to the distal stem of DV. Our findings indicate that DV is involved in a critical Mg(2+)-dependent RNA folding step in group II introns and demonstrate the feasibility of selecting intron variants that function more efficiently at low Mg(2+) concentrations, with implications for evolution and potential applications in gene targeting.
Mobile group II introns encode reverse transcriptases (RTs) that function in intron mobility ("retrohoming") by a process that requires reverse transcription of a highly structured, 2-2.5-kb intron RNA with high processivity and fidelity. Although the latter properties are potentially useful for applications in cDNA synthesis and next-generation RNA sequencing (RNA-seq), group II intron RTs have been difficult to purify free of the intron RNA, and their utility as research tools has not been investigated systematically. Here, we developed general methods for the high-level expression and purification of group II intron-encoded RTs as fusion proteins with a rigidly linked, noncleavable solubility tag, and we applied them to group II intron RTs from bacterial thermophiles. We thus obtained thermostable group II intron RT fusion proteins that have higher processivity, fidelity, and thermostability than retroviral RTs, synthesize cDNAs at temperatures up to 81°C, and have significant advantages for qRT-PCR, capillary electrophoresis for RNA-structure mapping, and next-generation RNA sequencing. Further, we find that group II intron RTs differ from the retroviral enzymes in template switching with minimal base-pairing to the 3' ends of new RNA templates, making it possible to efficiently and seamlessly link adaptors containing PCR-primer binding sites to cDNA ends without an RNA ligase step. This novel template-switching activity enables facile and less biased cloning of nonpolyadenylated RNAs, such as miRNAs or protein-bound RNA fragments. Our findings demonstrate novel biochemical activities and inherent advantages of group II intron RTs for research, biotechnological, and diagnostic methods, with potentially wide applications.
DEAD-box proteins are superfamily 2 helicases that function in all aspects of RNA metabolism. They employ ATP binding and hydrolysis to generate tight, yet regulated RNA binding, which is used to unwind short RNA helices non-processively and promote structural transitions of RNA and RNA-protein substrates. In the last few years, substantial progress has been made toward a detailed, quantitative understanding of the structural and biochemical properties of DEAD-box proteins. Concurrently, progress has been made toward a physical understanding of the RNA rearrangements and folding steps that are accelerated by DEAD-box proteins in model systems. Here, we review the recent progress on both of these fronts, focusing on the mitochondrial DEAD-box proteins Mss116 and CYT-19 and their mechanisms in promoting the splicing of group I and group II introns.
Mobile group II introns retrohome by an RNP-based mechanism in which the intron RNA reverse splices into a DNA site and is reverse transcribed by the associated intron-encoded protein. The resulting intron cDNA is then integrated into the genome by cellular mechanisms that have remained unclear. Here, we used an Escherichia coli genetic screen and Taqman qPCR assay that mitigate indirect effects to identify host factors that function in retrohoming. We then analyzed mutants identified in these and previous genetic screens by using a new biochemical assay that combines group II intron RNPs with cellular extracts to reconstitute the complete retrohoming reaction in vitro. The genetic and biochemical analyses indicate a retrohoming pathway involving degradation of the intron RNA template by a host RNase H and second-strand DNA synthesis by the host replicative DNA polymerase. Our results reveal ATP-dependent steps in both cDNA and second-strand synthesis and a surprising role for replication restart proteins in initiating second-strand synthesis in the absence of DNA replication. We also find an unsuspected requirement for host factors in initiating reverse transcription and a new RNA degradation pathway that suppresses retrohoming. Key features of the retrohoming mechanism may be used by human LINEs and other non-LTR-retrotransposons, which are related evolutionarily to mobile group II introns. Our findings highlight a new role for replication restart proteins, which function not only to repair DNA damage caused by mobile element insertion, but have also been co-opted to become an integral part of the group II intron retrohoming mechanism.
Efficient bacterial genetic engineering approaches with broad-host applicability are rare. We combine two systems, mobile group II introns ('targetrons') and Cre/lox, which function efficiently in many different organisms, into a versatile platform we call GETR (Genome Editing via Targetrons and Recombinases). The introns deliver lox sites to specific genomic loci, enabling genomic manipulations. Efficiency is enhanced by adding flexibility to the RNA hairpins formed by the lox sites. We use the system for insertions, deletions, inversions, and one-step cut-and-paste operations. We demonstrate insertion of a 12-kb polyketide synthase operon into the lacZ gene of Escherichia coli, multiple simultaneous and sequential deletions of up to 120 kb in E. coli and Staphylococcus aureus, inversions of up to 1.2 Mb in E. coli and Bacillus subtilis, and one-step cut-and-pastes for translocating 120 kb of genomic sequence to a site 1.5 Mb away. We also demonstrate the simultaneous delivery of lox sites into multiple loci in the Shewanella oneidensis genome. No selectable markers need to be placed in the genome, and the efficiency of Cre-mediated manipulations typically approaches 100%.