BACKGROUND: Although various quality controls have been applied at different stages of sample preparation and data analysis to ensure the reproducibility and reliability of RNA-seq results, limitations and biases remain in the detectability of certain differentially expressed genes (DEGs). Whether the transcriptional dynamics of a gene are captured accurately depends on the experimental design and execution and on the subsequent data analysis. The downstream processing workflow can influence the accuracy and sensitivity of DEG analysis and produce a certain number of false positives or false negatives. This workflow includes read alignment, transcript quantification, normalization, and the statistical methods used for the final identification of DEGs. Machine learning (ML) is a multidisciplinary field that draws on computer science, artificial intelligence, computational statistics and information theory to construct algorithms that learn from existing data sets and make predictions on new data sets. ML-based differential network analysis has been applied to predict stress-responsive genes by learning the patterns of 32 expression characteristics of known stress-related genes. In addition, epigenetic regulation plays critical roles in gene expression. Accordingly, DNA and histone methylation data have been shown to be powerful inputs for ML-based models that predict gene expression in many systems, including lung cancer cells. ML-based methods are therefore a promising way to identify DEGs that traditional RNA-seq analysis fails to detect.
RESULTS: We identified the 23 most informative features by assessing the performance of three feature selection algorithms combined with five classification methods on training and testing data sets. A comprehensive comparison showed that the model based on InfoGain feature selection and Logistic Regression classification is powerful for DEG prediction. Moreover, the power and performance of the ML-based prediction were validated by predicting ethylene-regulated gene expression and by subsequent qRT-PCR.
CONCLUSIONS: Our study shows that combining an ML-based method with RNA-seq greatly improves the sensitivity of DEG identification.
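The abstract above pairs InfoGain feature ranking with a Logistic Regression classifier. The following is a minimal sketch of that pairing. It rests on two assumptions: that "InfoGain" means the standard entropy-based information gain (as in, for example, Weka's InfoGainAttributeEval), and that the expression features have already been discretized. The function names, toy data, learning rate and epoch count are hypothetical. This is not the authors' pipeline, and it does not reproduce their 23 features.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def info_gain(feature_values, labels):
    """Information gain of one discretized feature with respect to the labels."""
    n = len(labels)
    conditional = 0.0
    for v in set(feature_values):
        subset = [y for x, y in zip(feature_values, labels) if x == v]
        conditional += len(subset) / n * entropy(subset)
    return entropy(labels) - conditional

def rank_features(X, y, k):
    """Indices of the k features with the highest information gain (ties -> lower index)."""
    scores = [(info_gain([row[j] for row in X], y), j) for j in range(len(X[0]))]
    scores.sort(key=lambda s: (-s[0], s[1]))
    return [j for _, j in scores[:k]]

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_logistic(X, y, lr=0.1, epochs=500):
    """Unregularized logistic regression fitted by batch gradient descent."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def predict(w, b, x):
    """Return 1 (predicted DEG) if P(DEG | x) >= 0.5, else 0."""
    return int(sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) >= 0.5)

# Hypothetical toy data: rows are genes, columns are binarized features,
# y = 1 marks a known DEG. Feature 0 separates the classes perfectly.
X = [[1, 0, 1, 0], [1, 1, 0, 0], [1, 0, 0, 1],
     [0, 1, 1, 0], [0, 0, 1, 1], [0, 1, 0, 1]]
y = [1, 1, 1, 0, 0, 0]

selected = rank_features(X, y, k=1)
X_sel = [[row[j] for j in selected] for row in X]
w, b = fit_logistic(X_sel, y)
print(selected, [predict(w, b, x) for x in X_sel])  # prints: [0] [1, 1, 1, 0, 0, 0]
```

In a real analysis, k would be chosen by comparing performance on held-out testing data across different feature counts. A library classifier such as scikit-learn's LogisticRegression would normally replace the hand-written fitter shown here.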
Vernalization is a response to winter cold through which plants acquire the competence to flower the following spring. VERNALIZATION INSENSITIVE 3 (VIN3) is a PHD-finger protein that binds to modified histones in vitro. VIN3 is induced by long-term cold and is necessary for Polycomb Repressive Complex 2 (PRC2)-mediated trimethylation of Histone H3 Lysine 27 (H3K27me3) at the FLC locus in Arabidopsis. An alteration in the PHD-finger domain of VIN3 changes its binding specificity in vitro and results in an accelerated vernalization response in vivo. The accelerated response is achieved through increased enrichment of VIN3 and H3K27me3 at the FLC locus without increased enrichment of PRC2. This result indicates that the binding specificity of the PHD-finger domain of VIN3 plays a role in mediating a proper vernalization response in Arabidopsis. Furthermore, this work suggests that altering PHD-finger domains could be used to modify various developmental processes in plants.
Vernalization is a response to winter cold that enables flowering in spring. VERNALIZATION INSENSITIVE3 (VIN3) is induced by winter cold and is essential for the vernalization response in Arabidopsis (Arabidopsis thaliana). VIN3 encodes a protein with a PHD-finger domain that binds to modified histones in vitro. An alteration in the binding specificity of the PHD-finger domain of VIN3 results in a hypervernalization response. The hypervernalization response is achieved through increased enrichment of VIN3 and of trimethylation of Histone H3 Lys 27 at the FLC locus, without increased enrichment of Polycomb Repressive Complex 2. Our results show that the binding specificity of the PHD-finger domain of VIN3 plays a role in mediating a proper vernalization response in Arabidopsis.
The long noncoding RNA COLDAIR is necessary for the repression of the floral repressor FLOWERING LOCUS C (FLC) during vernalization in Arabidopsis thaliana. The repression of FLC is mediated by increased enrichment of Polycomb Repressive Complex 2 (PRC2) and subsequent trimethylation of Histone H3 Lysine 27 (H3K27me3) at FLC chromatin. In this study, we found that COLDAIR associates with chromatin only at the FLC locus and that the central region of the COLDAIR transcript is critical for this interaction. A modular motif in COLDAIR is responsible for its association with PRC2 in vitro. Mutations within this motif that reduced the association of COLDAIR with PRC2 resulted in vernalization insensitivity. The vernalization insensitivity caused by mutant COLDAIR was rescued by ectopic expression of wild-type COLDAIR. Our study reveals the molecular framework by which the COLDAIR lncRNA mediates PRC2-dependent repression of FLC during vernalization.
The maize endosperm consists of three major compartmentalized cell types: the starchy endosperm (SE), the basal endosperm transfer cell layer (BETL), and the aleurone cell layer (AL). Distinct genetic programs are activated in each cell type to construct functionally and structurally distinct cells. To compare gene expression patterns involved in maize endosperm cell differentiation, we isolated transcripts from cryo-dissected endosperm specimens enriched for BETL, AL, or SE at 8, 12, and 16 days after pollination (DAP). We profiled the coding and long noncoding transcriptomes of the three cell types during differentiation and identified clusters of transcripts exhibiting spatio-temporal specificity. Our analysis revealed that the BETL at 12 DAP undergoes the most dynamic transcriptional regulation for both coding and long noncoding transcripts. In addition, our transcriptome analysis revealed spatio-temporal regulatory networks of transcription factors, imprinted genes, and loci marked with histone H3 trimethylated at lysine 27. Our study suggests that various regulatory mechanisms contribute to the genetic networks specific to the functions and structures of the endosperm cell types.
Long noncoding RNAs (lncRNAs) affect gene regulation through structural and regulatory interactions with associated proteins. The Polycomb complex often binds to lncRNAs in eukaryotes, and one lncRNA, COLDAIR, associates with Polycomb to mediate silencing of the floral repressor FLOWERING LOCUS C (FLC) during vernalization in Arabidopsis. Here, we identified an additional Polycomb-binding lncRNA, COLDWRAP. COLDWRAP is derived from the repressed promoter of FLC and is necessary for establishing the stable, vernalization-induced repressed state of FLC. Both COLDAIR and COLDWRAP are required for the vernalization-induced formation of a repressive intragenic chromatin loop at the FLC locus. Our results indicate that lncRNAs act cooperatively to coordinate vernalization-mediated Polycomb silencing and form a stable repressive chromatin structure.
Flowering in plants is a dynamic and synchronized process in which various cues, including age, day length, temperature and endogenous hormones, fine-tune the timing of flowering for reproductive success. Arabidopsis thaliana is a facultative long-day (LD) plant in which the LD photoperiod promotes flowering. Arabidopsis still flowers under short-day (SD) conditions, albeit much later than under LD conditions. Although factors regulating the inductive LD pathway have been extensively investigated, the non-inductive SD pathway is much less understood. Here, we identified a key basic helix-loop-helix transcription factor, NFL (NO FLOWERING IN SHORT DAY), that is essential for inducing flowering specifically under SD conditions in Arabidopsis. nfl mutants do not flower under SD conditions but flower similarly to the wild type under LD conditions. The no-flowering phenotype in SD is rescued either by exogenous application of gibberellin (GA) or by introducing della quadruple mutations into the nfl background, suggesting that NFL acts upstream of GA to promote flowering. NFL is expressed in meristematic regions and is localized to the nucleus. Quantitative RT-PCR assays of apical tissues showed that GA biosynthetic genes are downregulated and GA catabolic and receptor genes are upregulated in the nfl mutant compared with the wild type. This is consistent with the perturbed levels of endogenous GA biosynthetic and catabolic intermediates in the mutant. Taken together, these data suggest that NFL is a key transcription factor necessary for promoting flowering under non-inductive SD conditions through the GA signaling pathway.
Ethylene is one of the most important hormones for plant developmental processes and stress responses. However, phosphorylation-based regulation in the ethylene signaling pathway is largely unknown. Here we report that phosphorylation of cap binding protein 20 (CBP20) at Ser245 is regulated by ethylene and that this phosphorylation is involved in root growth. The constitutive phosphomimetic forms of CBP20 (CBP20S245E or CBP20S245D) rescue the ethylene-responsive root phenotype of cbp20. The constitutively non-phosphorylatable form (CBP20S245A) does not. We performed a genome-wide study of ethylene-regulated gene and microRNA (miRNA) expression in the roots and shoots of Col-0 and cbp20. This study showed that ethylene treatment upregulates miR319b in roots but not in shoots, and specifically downregulates its target MYB33 in roots. We describe both the phenotypic and molecular consequences of transgenic overexpression of miR319b. Increased levels of miR319b (miR319bOE) lead to an enhanced ethylene-responsive root phenotype and reduced MYB33 transcript levels in roots. Overexpression of a MYB33 variant carrying a mutated miR319b target site (mMYB33) in miR319bOE restores both the root phenotype and MYB33 expression. Taken together, we propose that ethylene-regulated phosphorylation of CBP20 is involved in root growth and that one pathway acts through the regulation of miR319b and its target MYB33 in roots.
BACKGROUND: The Maternally expressed gene (Meg) family is a locally duplicated gene family in maize that encodes cysteine-rich proteins (CRPs). The founding member of the family, Meg1, is required for normal development of the basal endosperm transfer cell layer (BETL) and is involved in the allocation of maternal nutrients to growing seeds. Despite the important roles of Meg1 in maize seed development, neither the evolutionary history of the Meg cluster nor the activities of the duplicate genes are understood.
RESULTS: In maize, the Meg gene cluster resides in a 2.3-Mb genomic region that exhibits many features of non-centromeric heterochromatin. Using phylogenetic reconstruction and syntenic alignments, we identified the pedigree of the Meg family, in which 11 of its 13 members arose in maize after allotetraploidization ~4.8 mya. Phylogenetic and population-genetic analyses identified possible signatures of recent positive selection in Meg homologs. Structural analyses of the Meg proteins indicated potentially adaptive changes in secondary structure from α-helix to β-strand during the expansion. Transcriptomic analysis of the maize endosperm indicated that 6 Meg genes are selectively activated in the BETL and that younger Meg genes are more active than older ones. In endosperms from B73 by Mo17 reciprocal crosses, most Meg genes did not display parent-specific expression patterns.
CONCLUSIONS: Recently duplicated Meg genes have different protein secondary structures, and their expression in the BETL dominates that of older members. Together with the signs of positive selection in the young Meg genes, these results suggest that the expansion of the Meg family involved potentially adaptive transitions in which new members with novel functions prevailed over older members.