RSEM produced an FPKM matrix, so how could the authors run a DESeq2 differential analysis?

The dataset is: https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE180169

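This is clearly Illumina HiSeq 4000 (Mus musculus) sequencing data, with a very standard two-group design (5 controls vs 4 PAH), which makes the downstream differential analysis convenient: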

GSM5454559 RNA-seq of TdTomato+ cells from Cont1 mouse
GSM5454560 RNA-seq of TdTomato+ cells from Cont2 mouse
GSM5454561 RNA-seq of TdTomato+ cells from Cont3 mouse
GSM5454562 RNA-seq of TdTomato+ cells from Cont4 mouse
GSM5454563 RNA-seq of TdTomato+ cells from Cont5 mouse
GSM5454564 RNA-seq of TdTomato+ cells from PAH1 mouse
GSM5454565 RNA-seq of TdTomato+ cells from PAH2 mouse
GSM5454566 RNA-seq of TdTomato+ cells from PAH3 mouse
GSM5454567 RNA-seq of TdTomato+ cells from PAH4 mouse

However, the expression matrix the authors provide is GSE180169_table_FPKM_FC_mousePAH.txt.gz, and opening it shows that the values have decimal places:

Figure: expression matrix with decimal values
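Rather than eyeballing the file, you can check programmatically whether every sample value is a non-negative whole number. The sketch below assumes tab-separated text with a header row; since the real file also carries fold-change and adjusted p-value columns, only the sample columns you list are checked. The function name `looks_like_counts` and the toy data are hypothetical.

```python
import csv
import io


def looks_like_counts(text: str, sample_columns: list[str]) -> bool:
    """Return True if every value in the listed columns is a non-negative integer.

    Assumes tab-separated text with a header row; columns not listed
    (e.g. log fold change, adjusted p-value) are ignored.
    """
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    for row in reader:
        for col in sample_columns:
            value = float(row[col])
            # Raw read counts are whole numbers; FPKM/TPM values usually are not.
            if value < 0 or not value.is_integer():
                return False
    return True


# Hypothetical toy data, not taken from the GEO file.
fpkm_like = "gene\tCont1\tPAH1\nGeneA\t12.53\t8.10\nGeneB\t0\t3.2\n"
count_like = "gene\tCont1\tPAH1\nGeneA\t125\t81\nGeneB\t0\t32\n"

print(looks_like_counts(fpkm_like, ["Cont1", "PAH1"]))   # False
print(looks_like_counts(count_like, ["Cont1", "PAH1"]))  # True
```

Note that RSEM's own expected_count values can also be fractional, so a failed check only shows the matrix is not raw integer counts; the column names and the GEO description are what tell you it is FPKM.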

So it is certainly not a raw count matrix. Here is how the paper describes the processing:

Quality and adapter read trimming was performed using Trim Galore version 0.5.0.
Gene quantification (read count and normalized expression value as Fragments Per Kilobase Million - FPKM) was obtained using RSEM version 1.3.0 (options: -bowtie2 --p 20 --paired-end), based on GENCODE GRCm38 genome primary assembly with annotation version M16.
The differential expression was assessed using DESeq2 version 1.26.0 by comparing PAH to Control samples.
Genome_build: GRCm38
Supplementary_files_format_and_content: txt file with gene expression (FPKM) and differential gene expression Log fold change and adjusted p-value

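I have never run the RSEM pipeline myself, but it can output both forms of quantification at once (read count and normalized expression value as Fragments Per Kilobase Million - FPKM), which is quite convenient. I had been puzzled why the authors could run DESeq2 when RSEM seemingly gave an FPKM matrix, and this answers it: DESeq2 was presumably run on the read counts, while only the FPKM values were deposited in GEO.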
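RSEM writes both kinds of values into the same per-sample `*.genes.results` table, whose standard columns are `gene_id`, `transcript_id(s)`, `length`, `effective_length`, `expected_count`, `TPM` and `FPKM`. The sketch below, assuming that layout, pulls the `expected_count` column from several samples into one gene-by-sample matrix and rounds it to integers, a common (if approximate) way to feed RSEM output to DESeq2. The helper name `build_count_matrix` and all numbers are hypothetical; in R, the tximport package with `type = "rsem"` is a commonly used alternative to manual rounding.

```python
import csv
import io


def build_count_matrix(samples: dict[str, str]) -> dict[str, dict[str, int]]:
    """Merge RSEM *.genes.results tables into {gene_id: {sample: count}}.

    `samples` maps a sample name to the text of that sample's results file.
    Only `expected_count` is used; `TPM` and `FPKM` are ignored because
    DESeq2 models raw counts, not length/depth-normalized values.
    """
    matrix: dict[str, dict[str, int]] = {}
    for sample, text in samples.items():
        reader = csv.DictReader(io.StringIO(text), delimiter="\t")
        for row in reader:
            # expected_count can be fractional (multi-mapping reads are shared
            # between genes), so round it to an integer for DESeq2.
            count = round(float(row["expected_count"]))
            matrix.setdefault(row["gene_id"], {})[sample] = count
    return matrix


HEADER = "gene_id\ttranscript_id(s)\tlength\teffective_length\texpected_count\tTPM\tFPKM\n"

# Hypothetical toy values, not real RSEM output for GSE180169.
samples = {
    "Cont1": HEADER
    + "GeneA\ttxA\t1500\t1350.00\t120.40\t50.10\t40.20\n"
    + "GeneB\ttxB\t800\t650.00\t10.00\t8.50\t6.80\n",
    "PAH1": HEADER
    + "GeneA\ttxA\t1500\t1350.00\t300.70\t95.30\t76.20\n"
    + "GeneB\ttxB\t800\t650.00\t0.00\t0.00\t0.00\n",
}

matrix = build_count_matrix(samples)
for gene, counts in matrix.items():
    print(gene, counts)
# GeneA {'Cont1': 120, 'PAH1': 301}
# GeneB {'Cont1': 10, 'PAH1': 0}
```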

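We keep stressing that FPKM and RPKM are outdated quantification units, yet some people still like such matrices. And since the authors' supplementary file is described as "Supplementary_files_format_and_content: txt file with gene expression (FPKM)", it cannot be used for a standard differential analysis with packages like DESeq2, which expect raw counts as input.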
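The underlying reason is that DESeq2 models the counts themselves (its negative binomial variance depends on how many reads a gene actually received), whereas FPKM divides that count by feature length and sequencing depth. Using the usual definition FPKM = count × 10^9 / (length × total mapped fragments), the toy sketch below shows the same 1 kb gene in a shallow and a deep library ending up with identical FPKM despite a hundredfold difference in reads, so the count-level precision DESeq2 relies on cannot be recovered from FPKM alone. RSEM itself uses effective length; the plain-length formula and all numbers here are simplifying assumptions.

```python
import math


def fpkm(count: float, length_bp: int, total_fragments: int) -> float:
    """FPKM = count * 1e9 / (length_bp * total_fragments).

    Simplified: uses plain gene length, whereas RSEM uses effective length.
    """
    return count * 10**9 / (length_bp * total_fragments)


# Hypothetical example: the same 1 kb gene in a shallow and a deep library.
shallow = {"count": 10, "length_bp": 1_000, "total_fragments": 1_000_000}
deep = {"count": 1_000, "length_bp": 1_000, "total_fragments": 100_000_000}

for name, lib in [("shallow", shallow), ("deep", deep)]:
    value = fpkm(lib["count"], lib["length_bp"], lib["total_fragments"])
    # Under a Poisson model the relative noise is about 1/sqrt(count);
    # a negative binomial variance (mu + alpha * mu^2) is at least this large.
    noise = 1 / math.sqrt(lib["count"])
    print(f"{name}: count={lib['count']}, FPKM={value:.1f}, rel. noise~{noise:.3f}")
# shallow: count=10, FPKM=10.0, rel. noise~0.316
# deep: count=1000, FPKM=10.0, rel. noise~0.032
```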

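At this point you can either email the authors to request the read count matrix, or download the fq files and run the quantification pipeline yourself; for that pipeline, see the table of contents of the RNA-seq data analysis hands-on series that our trainees recently worked through.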
