Predicting mRNA Abundance Directly from Genomic Sequence Using Deep Convolutional Neural Networks
Chinese title: 使用深度卷积神经网络直接从基因组序列预测mRNA丰度
Journal: Cell Reports
Published: May 19, 2020
The submission process took a full two years!!!
All I can say is that I'm impressed.
SUMMARY
Algorithms that accurately predict gene structure from primary sequence alone were transformative for annotating the human genome. Can we also predict the expression levels of genes based solely on genome sequence? Here, we sought to apply deep convolutional neural networks toward that goal. Surprisingly, a model that includes only promoter sequences and features associated with mRNA stability explains 59% and 71% of variation in steady-state mRNA levels in human and mouse, respectively. This model, termed Xpresso, more than doubles the accuracy of alternative sequence-based models and isolates rules as predictive as models relying on chromatin immunoprecipitation sequencing (ChIP-seq) data. Xpresso recapitulates genome-wide patterns of transcriptional activity, and its residuals can be used to quantify the influence of enhancers, heterochromatic domains, and microRNAs. Model interpretation reveals that promoter-proximal CpG dinucleotides strongly predict transcriptional activity. Looking forward, we propose cell-type-specific gene-expression predictions based solely on primary sequences as a grand challenge for the field.
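To make the "explains 59% of variation" and "residuals" wording concrete, here is a minimal sketch in Python. It assumes that "variation explained" is measured as the squared Pearson correlation between predicted and observed expression levels, and that a residual is simply observed minus predicted. The function names and the toy numbers are hypothetical illustrations and are not taken from the paper or from the Xpresso code.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length sequences of numbers."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


def variance_explained(observed, predicted):
    """Fraction of variation explained, assumed here to be squared Pearson r."""
    return pearson_r(observed, predicted) ** 2


def residuals(observed, predicted):
    """Per-gene residuals: observed minus predicted expression."""
    return [o - p for o, p in zip(observed, predicted)]


# Hypothetical toy data: log-scale expression for four genes.
observed = [0.0, 1.0, 2.0, 3.0]
predicted = [0.5, 0.5, 2.5, 2.5]

r2 = variance_explained(observed, predicted)
res = residuals(observed, predicted)
print(f"r^2 = {r2:.2f}")
print("residuals:", res)
```

Under this reading, a gene with a large positive residual is expressed more strongly than its promoter sequence alone would suggest, which is the kind of signal the abstract describes using to quantify the influence of enhancers, heterochromatic domains, and microRNAs.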
P.S. In the author's own words:
“Five years ago, I embarked on a journey as a bright-eyed postdoc to ask a theoretical question: how predictable are gene expression levels from DNA sequence alone? After ~2 years in review (incl. a world record >10mo to receive 1 round in CellReports)”
That really is a world record.
Corresponding author: Vikram Agarwal