Publications

2023
Wylie D, Wang X, Yao J, Xu H, Ferrick-Kiddie EA, Iwase T, Krishnamurthy S, Ueno NT, Lambowitz AM. Inflammatory breast cancer biomarker identification by simultaneous TGIRT-seq profiling of coding and non-coding RNAs in tumors and blood. medRxiv [Internet].
Inflammatory breast cancer (IBC) is the most aggressive and lethal breast cancer subtype, but lags in biomarker identification. Here, we used an improved Thermostable Group II Intron Reverse Transcriptase RNA sequencing (TGIRT-seq) method to simultaneously profile coding and non-coding RNAs from tumors, PBMCs, and plasma of IBC and non-IBC patients and healthy donors. Besides RNAs from known IBC-relevant genes, we identified hundreds of other overexpressed coding and non-coding RNAs (p≤0.001) in IBC tumors and PBMCs, including higher proportions with elevated intron-exon depth ratios (IDRs), likely reflecting enhanced transcription resulting in accumulation of intronic RNAs. As a consequence, differentially represented protein-coding gene RNAs in IBC plasma were largely intron RNA fragments, whereas those in healthy donor and non-IBC plasma were largely fragmented mRNAs. Potential IBC biomarkers in plasma included T-cell receptor pre-mRNA fragments traced to IBC tumors and PBMCs; intron RNA fragments correlated with high IDR genes; and LINE-1 and other retroelement RNAs that we found globally up-regulated in IBC and preferentially enriched in plasma. Our findings provide new insights into IBC and demonstrate advantages of broadly analyzing transcriptomes for biomarker identification. The RNA-seq and data analysis methods developed for this study may be broadly applicable to other diseases.
Cui H, Diedrich JK, Wu DC, Lim JJ, Nottingham RM, Moresco JJ, Yates JR, Blencowe BJ, Lambowitz AM, Schimmel P. Arg-tRNA synthetase links inflammatory metabolism to RNA splicing and nuclear trafficking via SRRM2. Nature Cell Biology [Internet]. 25 :592–603.
Cells respond to perturbations such as inflammation by sensing changes in metabolite levels. Especially prominent is arginine, which has known connections to the inflammatory response. Aminoacyl-tRNA synthetases, enzymes that catalyse the first step of protein synthesis, can also mediate cell signalling. Here we show that depletion of arginine during inflammation decreased levels of nuclear-localized arginyl-tRNA synthetase (ArgRS). Surprisingly, we found that nuclear ArgRS interacts and co-localizes with serine/arginine repetitive matrix protein 2 (SRRM2), a spliceosomal and nuclear speckle protein, and that decreased levels of nuclear ArgRS correlated with changes in condensate-like nuclear trafficking of SRRM2 and splice-site usage in certain genes. These splice-site usage changes cumulated in the synthesis of different protein isoforms that altered cellular metabolism and peptide presentation to immune cells. Our findings uncover a mechanism whereby an aminoacyl-tRNA synthetase cognate to a key amino acid that is metabolically controlled during inflammation modulates the splicing machinery.
2022
Park SK, Mohr G, Yao J, Russell R, Lambowitz AM. Group II-like Reverse Transcriptases Function in Double Strand Break Repair. Cell [Internet]. 185 (20) :3671-3688.
Bacteria encode free-standing reverse transcriptases (RTs) of unknown function that are closely related to group II intron-encoded RTs. Here, we found that a Pseudomonas aeruginosa group II intron-like RT (G2L4 RT) with YIDD instead of YADD at its active site functions in DNA repair in its native host and when transferred into Escherichia coli. G2L4 RT has biochemical activities strikingly similar to those of human DNA repair polymerase θ and uses them for translesion DNA synthesis and double-strand break repair (DSBR) via microhomology-mediated end-joining (MMEJ) in vitro and in vivo. We also found that a group II intron RT can function similarly to G2L4 RT in DNA repair, with reciprocal substitutions at the active site showing an I residue favors MMEJ and an A residue favors primer extension in both enzymes. The DNA repair functions of these enzymes utilize conserved structural features of non-LTR-retroelement RTs, including human LINE-1 and other eukaryotic non-LTR-retrotransposon RTs, suggesting such enzymes may have an inherent ability to function in DSBR in a wide range of organisms.
Wylie DC, Wang X, Yao J, Xu H, Iwase T, Krishnamurthy S, Ueno NT, Lambowitz AM. Abstract P5-07-03: Disease classification modeling of inflammatory breast cancer based on simultaneous profiling of coding and non-coding RNAs in tumor and blood samples by TGIRT-sequencing. Cancer Res [Internet]. 82.
Background: Inflammatory breast cancer (IBC) is the most aggressive and lethal breast cancer subtype but lags in disease-specific RNA biomarkers due in part to its paucity of large discrete tumors. A strategy to overcome this challenge is to identify blood-based RNA biomarkers that are minimally invasive and reflect the state of both the diseased breast tissue and the patient's immune response. Here, we identified IBC-specific RNA biomarkers by thermostable group II intron reverse transcriptase sequencing (TGIRT-seq), a recently developed comprehensive RNA-seq technology that enables simultaneous profiling of all RNA biotypes from small amounts of starting material. We used these biomarkers to develop novel disease classification models for IBC based on coding and non-coding RNAs from FFPE tumor slices, PBMCs, and plasma. Methods: We obtained biological samples including FFPE, PBMC, and plasma from a cohort of ten patients with IBC and compared them to samples from six patients with non-IBC and sixteen healthy donors using TGIRT-seq technology. Results: TGIRT-seq of FFPE tumor slices identified differentially expressed mRNAs and miRNAs found previously to distinguish IBC from non-IBC tumors, as well as numerous additional differentially expressed mRNAs and small non-coding RNAs characteristic of IBC. Surprisingly, TGIRT-seq revealed that the differentially expressed protein-coding gene transcripts fall into two categories: mature mRNAs with reads confined to exons, and pre-mRNAs-derived transcripts with reads distributed across exons and introns, to our knowledge, a distinction not made previously for any cancer type. Differentially expressed miRNAs included both mature miRNAs and other transcripts of miRNA loci. 
IBC PBMCs showed a characteristic inflammatory response not seen in PBMCs from non-IBC patients, as well as differentially expressed tRNAs, snoRNAs, and other sncRNAs, while plasma samples, although of variable quality, included coding and non-coding RNAs distinctive of IBC. Classification models using panels consisting of sets of 50 selected biomarkers profiled by TGIRT-seq achieved a high degree of accuracy under cross-validation, with models based on PBMCs and plasma RNAs correlating with those based on tumor RNAs, and models using both coding and non-coding RNA biomarkers outperforming those based on either alone. Conclusions: Our findings are the first to define a distinct IBC profile across three different tissue types and advance TGIRT-seq as a promising method for high-resolution RNA biomarker profiling of both primary tumors and liquid biopsies with potentially broad utility for diagnosing and defining treatment response in IBC and other cancers. COI: Thermostable group II intron reverse transcriptase (TGIRT) enzymes and methods for their use are the subject of patents and patent applications that have been licensed by the University of Texas to InGex, LLC. A.M.L., some former and present members of the Lambowitz laboratory, and the University of Texas are minority equity holders in InGex, and receive royalty payments from the sale of TGIRT enzymes and kits and from sublicensing of intellectual property to other companies.
Yao J, Winans S, Xu H, Ferrick-Kiddie EA, Ares M Jr, Lambowitz AM. Human cells contain myriad excised linear intron RNAs with links to gene regulation and potential utility as biomarkers. bioRxiv [Internet].
By using TGIRT-seq, we identified >8,500 short full-length excised linear intron (FLEXI) RNAs in human cells. Subsets of FLEXIs accumulated in a cell-type specific manner, and ∼200 corresponded to agotrons or mirtrons or encoded snoRNAs. Analysis of CLIP-seq datasets identified potential interactions between FLEXIs and >100 different RNA-binding proteins (RBPs), 53 of which had binding sites in ≥30 different FLEXIs. In addition to proteins that function in RNA splicing, these 53 RBPs included transcription factors, chromatin remodeling proteins, and cellular growth regulators that impacted FLEXI host gene alternative splicing and/or mRNA levels in knockdown datasets. We computationally identified six groups of RBPs whose binding sites were enriched in different subsets of FLEXIs: AGO1-4 and DICER associated with agotrons and mirtrons; AATF, DKC1, NOLC1, and SMNDC1 associated with snoRNA-encoding FLEXIs; two different combinations of alternative splicing factors found in stress granules; and two novel RBP-intron combinations, one including LARP4 and PABPC4, which function together in the cytoplasm to regulate ribosomal protein translation. Our results suggest a model in which proteins involved in transcriptional regulation, alternative splicing, or post-splicing secondary functions bind and stabilize cell-type specific subsets of FLEXIs that perform different biological functions and have potential utility as biomarkers.
Faucher-Giguère L, Roy A, Deschamps-Francoeur G, Couture S, Nottingham RM, Lambowitz AM, Scott MS, Abou Elela S. High-grade ovarian cancer associated H/ACA snoRNAs promote cancer cell proliferation and survival. NAR Cancer [Internet]. 4 (1).
Small nucleolar RNAs (snoRNAs) are an omnipresent class of non-coding RNAs involved in the modification and processing of ribosomal RNA (rRNA). As snoRNAs are required for ribosome production, the increase of which is a hallmark of cancer development, their expression would be expected to increase in proliferating cancer cells. However, assessing the nature and extent of snoRNAs' contribution to cancer biology has been largely limited by difficulties in detecting highly structured RNA. In this study, we used a dedicated midsize non-coding RNA (mncRNA) sensitive sequencing technique to accurately survey the snoRNA abundance in independently verified high-grade serous ovarian carcinoma (HGSC) and serous borderline tumour (SBT) tissues. The results identified SNORA81, SNORA19 and SNORA56 as an H/ACA snoRNA signature capable of discriminating between independent sets of HGSC, SBT and normal tissues. The expression of the signature SNORA81 correlates with the level of ribosomal RNA (rRNA) modification and its knockdown inhibits 28S rRNA pseudouridylation and accumulation leading to reduced cell proliferation and migration. Together our data indicate that specific subsets of H/ACA snoRNAs may promote tumour aggressiveness by inducing rRNA modification and synthesis.
Park SK, Mohr G, Yao J, Russell R, Lambowitz AM. Group II Intron-Like Reverse Transcriptases Function in Double-Strand Break Repair by Microhomology-Mediated End Joining. bioRxiv [Internet]. 484287.
Bacteria encode free-standing reverse transcriptases (RTs) of unknown function that are closely related to group II intron-encoded RTs. Here, we found that a Pseudomonas aeruginosa group II intron-like RT (G2L4 RT) with YIDD instead of YADD at its active site functions in DNA repair in its native host and when transferred into Escherichia coli. G2L4 RT has biochemical activities strikingly similar to those of human DNA repair polymerase θ and uses them for translesion DNA synthesis and double-strand break repair (DSBR) via microhomology-mediated end-joining (MMEJ) in vitro and in vivo. We also found that a group II intron RT can function similarly to G2L4 RT in DNA repair, with reciprocal substitutions at the active site showing an I residue favors MMEJ and an A residue favors primer extension in both enzymes. The DNA repair functions of these enzymes utilize conserved structural features of non-LTR-retroelement RTs, including human LINE-1 and other eukaryotic non-LTR-retrotransposon RTs, suggesting such enzymes may have an inherent ability to function in DSBR in a wide range of organisms.
2021
Cui H, Diedrich JK, Wu DC, Lim JJ, Nottingham RM, Moresco JJ, Yates JR III, Blencowe BJ, Lambowitz AM, Schimmel P. Arg-tRNA synthetase links inflammatory metabolism to RNA splicing and nuclear trafficking via SRRM2. bioRxiv [Internet].
Cells respond to perturbations like inflammation by sensing changes in metabolite levels. Especially prominent is arginine, which has known connections to the inflammatory response. Here, we found that depletion of arginine during inflammation decreased levels of a nuclear form of arginyl-tRNA synthetase (ArgRS). Surprisingly, we found that nuclear ArgRS interacts with serine/arginine repetitive matrix protein 2 (SRRM2), a spliceosomal protein and nuclear speckle component and that arginine depletion impacted both condensate-like nuclear trafficking of SRRM2 and splice-site usage in certain genes. These splice-site usage changes cumulated in synthesis of different protein isoforms that altered cellular metabolism and peptide presentation to immune cells. Our findings uncover a novel mechanism whereby a tRNA synthetase cognate to a key amino acid that is metabolically controlled during inflammation modulates the splicing machinery.
Lentzsch AM, Stamos JL, Yao J, Russell R, Lambowitz AM. Structural basis for template switching by a group II intron–encoded non-LTR-retroelement reverse transcriptase. J. Biol. Chem. [Internet]. 297 (2) :100971.
Reverse transcriptases (RTs) can switch template strands during complementary DNA synthesis, enabling them to join discontinuous nucleic acid sequences. Template switching (TS) plays crucial roles in retroviral replication and recombination, is used for adapter addition in RNA-Seq, and may contribute to retroelement fitness by increasing evolutionary diversity and enabling continuous complementary DNA synthesis on damaged templates. Here, we determined an X-ray crystal structure of a TS complex of a group II intron RT bound simultaneously to an acceptor RNA and donor RNA template–DNA primer heteroduplex with a 1-nt 3′-DNA overhang. The structure showed that the 3′ end of the acceptor RNA binds in a pocket formed by an N-terminal extension present in non–long terminal repeat–retroelement RTs and the RT fingertips loop, with the 3′ nucleotide of the acceptor base paired to the 1-nt 3′-DNA overhang and its penultimate nucleotide base paired to the incoming dNTP at the RT active site. Analysis of structure-guided mutations identified amino acids that contribute to acceptor RNA binding and a phenylalanine residue near the RT active site that mediates nontemplated nucleotide addition. Mutation of the latter residue decreased multiple sequential template switches in RNA-Seq. Our results provide new insights into the mechanisms of TS and nontemplated nucleotide addition by RTs, suggest how these reactions could be improved for RNA-Seq, and reveal common structural features for TS by non–long terminal repeat–retroelement RTs and viral RNA–dependent RNA polymerases.
Xu H, Nottingham RM, Lambowitz AM. TGIRT-seq Protocol for the Comprehensive Profiling of Coding and Non-coding RNA Biotypes in Cellular, Extracellular Vesicle, and Plasma RNAs. Bio-protocol. 11 (23).
High-throughput RNA sequencing (RNA-seq) has extraordinarily advanced our understanding of gene expression and disease etiology, and is a powerful tool for the identification of biomarkers in a wide range of organisms. However, most RNA-seq methods rely on retroviral reverse transcriptases (RTs), enzymes that have inherently low fidelity and processivity, to convert RNAs into cDNAs for sequencing. Here, we describe an RNA-seq protocol using Thermostable Group II Intron Reverse Transcriptases (TGIRTs), which have high fidelity, processivity, and strand-displacement activity, as well as a proficient template-switching activity that enables efficient and seamless RNA-seq adapter addition. By combining these activities, TGIRT-seq enables the simultaneous profiling of all RNA biotypes from small amounts of starting material, with superior RNA-seq metrics, and unprecedented ability to sequence structured RNAs. The TGIRT-seq protocol for Illumina sequencing consists of three steps: (i) addition of a 3' RNA-seq adapter, coupled to the initiation of cDNA synthesis at the 3' end of a target RNA, via template switching from a synthetic adapter RNA/DNA starter duplex; (ii) addition of a 5' RNA-seq adapter, by using thermostable 5' App DNA/RNA ligase to ligate an adapter oligonucleotide to the 3' end of the completed cDNA; (iii) minimal PCR amplification, to add capture sites and indices for Illumina sequencing. TGIRT-seq for the Illumina sequencing platform has been used for comprehensive profiling of coding and non-coding RNAs in ribodepleted, chemically fragmented cellular RNAs, and for the analysis of intact (non-chemically fragmented) cellular, extracellular vesicle (EV), and plasma RNAs, where it yields continuous full-length end-to-end sequences of structured small noncoding RNAs (sncRNAs), including tRNAs, snoRNAs, snRNAs, pre-miRNAs, and full-length excised linear intron (FLEXI) RNAs.
2020
Yao J, Wu DC, Nottingham RM, Lambowitz AM. Identification of protein-protected mRNA fragments and structured excised intron RNAs in human plasma by TGIRT-seq peak calling. eLife [Internet].
Human plasma contains >40,000 different coding and non-coding RNAs that are potential biomarkers for human diseases. Here, we used thermostable group II intron reverse transcriptase sequencing (TGIRT-seq) combined with peak calling to simultaneously profile all RNA biotypes in apheresis-prepared human plasma pooled from healthy individuals. Extending previous TGIRT-seq analysis, we found that human plasma contains largely fragmented mRNAs from >19,000 protein-coding genes, abundant full-length, mature tRNAs and other structured small non-coding RNAs, and less abundant tRNA fragments and mature and pre-miRNAs. Many of the mRNA fragments identified by peak calling correspond to annotated protein-binding sites and/or have stable predicted secondary structures that could afford protection from plasma nucleases. Peak calling also identified novel repeat RNAs, miRNA-sized RNAs, and putatively structured intron RNAs of potential biological, evolutionary, and biomarker significance, including a family of full-length excised intron RNAs, subsets of which correspond to mirtron pre-miRNAs or agotrons.
2019
Lentzsch AM, Yao J, Russell R, Lambowitz AM. Template switching mechanism of a group II intron-encoded reverse transcriptase and its implications for biological function and RNA-Seq. Journal of Biological Chemistry [Internet].

The reverse transcriptases (RTs) encoded by mobile group II introns and other non-LTR retroelements differ from retroviral RTs in being able to template-switch efficiently from the 5′ end of one template to the 3′ end of another with little or no complementarity between the donor and acceptor templates. Here, to establish a complete kinetic framework for the reaction and to identify conditions that more efficiently capture acceptor RNAs or DNAs, we used a thermostable group II intron RT (TGIRT; GsI–IIC RT) that can template switch directly from synthetic RNA template/DNA primer duplexes having either a blunt end or a 3′-DNA overhang end. We found that the rate and amplitude of template switching are optimal from starter duplexes with a single nucleotide 3′-DNA overhang complementary to the 3′ nucleotide of the acceptor RNA, suggesting a role for nontemplated nucleotide addition of a complementary nucleotide to the 3′ end of cDNAs synthesized from natural templates. Longer 3′-DNA overhangs progressively decreased the template-switching rate, even when complementary to the 3′ end of the acceptor template. The reliance on only a single bp with the 3′ nucleotide of the acceptor together with discrimination against mismatches and the high processivity of group II intron RTs enable synthesis of full-length DNA copies of nucleic acids beginning directly at their 3′ end. We discuss the possible biological functions of the template-switching activity of group II intron- and other non-LTR retroelement–encoded RTs, as well as the optimization of this activity for adapter addition in RNA- and DNA-Seq protocols.

Temoche-Diaz MM, Shurtleff MJ, Nottingham RM, Yao J, Fadadu RP, Lambowitz AM, Schekman R. Distinct mechanisms of microRNA sorting into cancer cell-derived extracellular vesicle subtypes. eLife. 8.
Extracellular vesicles (EVs) encompass a variety of vesicles secreted into the extracellular space. EVs have been implicated in promoting tumor metastasis, but the molecular composition of tumor-derived EV sub-types and the mechanisms by which molecules are sorted into EVs remain mostly unknown. We report the separation of two small EV sub-populations from a metastatic breast cancer cell line, with biochemical features consistent with different sub-cellular origins. These EV sub-types use different mechanisms of miRNA sorting (selective and non-selective), suggesting that sorting occurs via fundamentally distinct processes, possibly dependent on EV origin. Using biochemical and genetic tools, we identified the Lupus La protein as mediating sorting of selectively packaged miRNAs. We found that two motifs embedded in miR-122 are responsible for high-affinity binding to Lupus La and sorting into vesicles formed in a cell-free reaction. Thus, tumor cells can simultaneously deploy multiple EV species using distinct sorting mechanisms that may enable diverse functions in normal and cancer biology.
Reinsborough CW, Ipas H, Abell NS, Nottingham RM, Yao J, Devanathan SK, Shelton SB, Lambowitz AM, Xhemalce B. BCDIN3D regulates tRNAHis 3’ fragment processing. PLoS Genetics. 15 (7).
5’ ends are important for determining the fate of RNA molecules. BCDIN3D is an RNA phospho-methyltransferase that methylates the 5’ monophosphate of specific RNAs. In order to gain new insights into the molecular function of BCDIN3D, we performed an unbiased analysis of its interacting RNAs by Thermostable Group II Intron Reverse Transcriptase coupled to next generation sequencing (TGIRT-seq). Our analyses showed that BCDIN3D interacts with full-length phospho-methylated tRNAHis and miR-4454. Interestingly, we found that miR-4454 is not synthesized from its annotated genomic locus, which is a primer-binding site for an endogenous retrovirus, but rather by Dicer cleavage of mature tRNAHis. Sequence analysis revealed that miR-4454 is identical to the 3’ end of tRNAHis. Moreover, we were able to generate this ‘miRNA’ in vitro through incubation of mature tRNAHis with Dicer. As found previously for several pre-miRNAs, a 5’P-tRNAHis appears to be a better substrate for Dicer cleavage than a phospho-methylated tRNAHis. Moreover, tRNAHis 3’-fragment/‘miR-4454’ levels increase in cells depleted for BCDIN3D. Altogether, our results show that in addition to microRNAs, BCDIN3D regulates tRNAHis 3’-fragment processing without negatively affecting tRNAHis’s canonical function of aminoacylation.
Belfort M, Lambowitz AM. Group II Intron RNPs and Reverse Transcriptases: From Retroelements to Research Tools. Cold Spring Harbor Perspectives in Biology [Internet].
Group II introns, self-splicing retrotransposons, serve as both targets of investigation into their structure, splicing, and retromobility and a source of tools for genome editing and RNA analysis. Here, we describe the first cryo-electron microscopy (cryo-EM) structure determination, at 3.8–4.5 Å, of a group II intron ribozyme complexed with its encoded protein, containing a reverse transcriptase (RT), required for RNA splicing and retromobility. We also describe a method called RIG-seq using a retrotransposon indicator gene for high-throughput integration profiling of group II introns and other retrotransposons. Targetrons, RNA-guided gene targeting agents widely used for bacterial genome engineering, are described next. Finally, we detail thermostable group II intron RTs, which synthesize cDNAs with high accuracy and processivity, for use in various RNA-seq applications and relate their properties to a 3.0-Å crystal structure of the protein poised for reverse transcription. Biological insights from these group II intron revelations are discussed.
Xu H, Yao J, Wu DC, Lambowitz AM. Improved TGIRT-seq methods for comprehensive transcriptome profiling with decreased adapter dimer-formation and bias correction. Scientific Reports [Internet]. 9 (1).
Thermostable group II intron reverse transcriptases (TGIRTs) with high fidelity and processivity have been used for a variety of RNA sequencing (RNA-seq) applications, including comprehensive profiling of whole-cell, exosomal, and human plasma RNAs; quantitative tRNA-seq based on the ability of TGIRT enzymes to give full-length reads of tRNAs and other structured small ncRNAs; high-throughput mapping of post-transcriptional modifications; and RNA structure mapping. Here, we improved TGIRT-seq methods for comprehensive transcriptome profiling by rationally designing RNA-seq adapters that minimize adapter dimer formation. Additionally, we developed biochemical and computational methods for remediating 5′- and 3′-end biases, the latter based on a random forest regression model that provides insight into the contribution of different factors to these biases. These improvements, some of which may be applicable to other RNA-seq methods, increase the efficiency of TGIRT-seq library construction and improve coverage of very small RNAs, such as miRNAs. Our findings provide insight into the biochemical basis of 5′- and 3′-end biases in RNA-seq and suggest general approaches for remediating biases and decreasing adapter dimer formation.
2018
Mohr G, Silas S, Stamos J, Makarova KS, Markham LM, Yao J, Lucas-Elio P, Sanchez-Amat A, Fire AZ, Koonin EV, et al. A Reverse Transcriptase-Cas1 Fusion Protein Contains a Cas6 Domain Required for Both CRISPR RNA Biogenesis and RNA Spacer Acquisition. Mol. Cell [Internet]. 72 :700-714.
Prokaryotic CRISPR-Cas systems provide adaptive immunity by integrating portions of foreign nucleic acids (spacers) into genomic CRISPR arrays. Cas6 proteins then process CRISPR array transcripts into spacer-derived RNAs (CRISPR RNAs; crRNAs) that target Cas nucleases to matching invaders. We find that a Marinomonas mediterranea fusion protein combines three enzymatic domains (Cas6, reverse transcriptase [RT], and Cas1), which function in both crRNA biogenesis and spacer acquisition from RNA and DNA. We report a crystal structure of this divergent Cas6, identify amino acids required for Cas6 activity, show that the Cas6 domain is required for RT activity and RNA spacer acquisition, and demonstrate that CRISPR-repeat binding to Cas6 regulates RT activity. Co-evolution of putative interacting surfaces suggests a specific structural interaction between the Cas6 and RT domains, and phylogenetic analysis reveals repeated, stable association of free-standing Cas6s with CRISPR RTs in multiple microbial lineages, indicating that a functional interaction between these proteins preceded evolution of the fusion.
Boivin V, Deschamps-Francoeur G, Couture S, Nottingham RM, Bouchard-Bourelle P, Lambowitz AM. Simultaneous sequencing of coding and noncoding RNA reveals a human transcriptome dominated by a small number of highly expressed noncoding genes. RNA [Internet]. 24 (7) :950-965.
Comparing the abundance of one RNA molecule to another is crucial for understanding cellular functions but most sequencing techniques can target only specific subsets of RNA. In this study, we used a new fragmented ribodepleted TGIRT sequencing (FRT) method that uses a thermostable group II intron reverse transcriptase (TGIRT) to generate a portrait of the human transcriptome depicting the quantitative relationship of all classes of nonribosomal RNA longer than 60 nt. Comparison between different sequencing methods indicated that FRT is more accurate in ranking both mRNA and noncoding RNA than viral reverse transcriptase-based sequencing methods, even those that specifically target these species. Measurements of RNA abundance in different cell lines using this method correlate with biochemical estimates, confirming tRNA as the most abundant nonribosomal RNA biotype. However, the single most abundant transcript is 7SL RNA, a component of the signal recognition particle. Structured noncoding RNAs (sncRNAs) associated with the same biological process are expressed at similar levels, with the exception of RNAs with multiple functions like U1 snRNA. In general, sncRNAs forming RNPs are hundreds to thousands of times more abundant than their mRNA counterparts. Surprisingly, only 50 sncRNA genes produce half of the non-rRNA transcripts detected in two different cell lines. Together the results indicate that the human transcriptome is dominated by a small number of highly expressed sncRNAs specializing in functions related to translation and splicing.
Mohr G, Kang SY, Park SK, Qin Y, Grohman J, Yao J, Stamos JL, Lambowitz AM. A Highly Proliferative Group IIC Intron from Geobacillus stearothermophilus Reveals New Features of Group II Intron Mobility and Splicing. Journal of Molecular Biology [Internet]. 430 (17) :2760-2783.
The thermostable Geobacillus stearothermophilus GsI-IIC intron is among the few bacterial group II introns found to proliferate to high copy number in its host genome. Here, we developed a bacterial genetic assay for retrohoming and biochemical assays for protein-dependent and self-splicing of GsI-IIC. We found that GsI-IIC, like other group IIC introns, retrohomes into sites having a 5'-exon DNA hairpin, typically from a bacterial transcription terminator, followed by short intron-binding sequences (IBSs) recognized by base pairing of exon-binding sequences (EBSs) in the intron RNA. Intron RNA insertion occurs preferentially but not exclusively into the parental lagging strand at DNA replication forks, using a nascent lagging strand DNA as a primer for reverse transcription. In vivo mobility assays, selections, and mutagenesis indicated that a variety of GC-rich DNA hairpins of 7-19 bp with continuous base pairs or internal elbow regions support efficient intron mobility and identified a critically recognized nucleotide (T-5) between the hairpin and IBS1, a feature not reported previously for group IIC introns. Neither the hairpin nor T-5 is required for intron excision or lariat formation during RNA splicing, but the 5'-exon sequence can affect the efficiency of exon ligation. Structural modeling suggests that the 5'-exon DNA hairpin and T-5 bind to the thumb and DNA-binding domains of GsI-IIC reverse transcriptase. This mode of DNA target site recognition enables the intron to proliferate to high copy number by recognizing numerous transcription terminators and then finding the best match for the EBS/IBS interactions within a short distance downstream.
Wu DC, Yao J, Ho KS, Lambowitz AM, Wilke CO. Limitations of alignment-free tools in total RNA-seq quantification. BMC Genomics [Internet]. 19 (1) :510.

BACKGROUND:

Alignment-free RNA quantification tools have significantly increased the speed of RNA-seq analysis. However, it is unclear whether these state-of-the-art RNA-seq analysis pipelines can quantify small RNAs as accurately as they do with long RNAs in the context of total RNA quantification.

RESULT:

We comprehensively tested and compared four RNA-seq pipelines for accuracy of gene quantification and fold-change estimation. We used a novel total RNA benchmarking dataset in which small non-coding RNAs are highly represented along with other long RNAs. The four RNA-seq pipelines consisted of two commonly-used alignment-free pipelines and two variants of alignment-based pipelines. We found that all pipelines showed high accuracy for quantifying the expression of long and highly-abundant genes. However, alignment-free pipelines showed systematically poorer performance in quantifying lowly-abundant and small RNAs.

CONCLUSION:

We have shown that alignment-free and traditional alignment-based quantification methods perform similarly for common gene targets, such as protein-coding genes. However, we have identified a potential pitfall in analyzing and quantifying lowly-expressed genes and small RNAs with alignment-free pipelines, especially when these small RNAs contain biological variations.

