直接从TCGA
官网下载的突变maf数据可以提供给maftools
直接使用,画出突变瀑布图。
但是不能直接提供给ComplexHeatmap
使用,需要提前处理一下,今天演示下如何处理。对TCGA的MAF数据使用ComplexHeatmap绘制突变全景图。
关于这个TCGA的MAF数据和突变瀑布图,我们之前也介绍过很多次:
首先是从TCGA
官网下载突变数据的MAF文件,我们直接使用easyTCGA
包下载,1行代码搞定:
# COAD
library(easyTCGA)
getsnvmaf("TCGA-COAD")
下载后直接加载使用即可:
load(file = "G:/easyTCGA_test/output_snv/TCGA-COAD_maf_clin.rdata")
这个数据可以直接提供给maftools
使用:
library(maftools)
# 直接读取
coad_maf <- read.maf(snv,clin_snv,isTCGA = T)
## -Validating
## --Removed 37645 duplicated variants
## -Silent variants: 54696
## -Summarizing
## --Mutiple centers found
## BCM;WUGSC;BCM;BI;BCM;WUGSC--Possible FLAGS among top ten genes:
## TTN
## SYNE1
## MUC16
## -Processing clinical data
## -Finished in 8.250s elapsed (5.900s cpu)
然后就可以画出突变全景图了:
maftools::oncoplot(coad_maf)
过程非常简单,我们一共用了5行代码就画出了这个图,而且还包含了加载R包的代码。
这个方法是基于ComplexHeatmap
的,并且ComplexHeatmap
也提供了一个oncoprint
函数可以画出这样的图,虽然更加复杂,但是自定义程度也更高,只需要更改一下数据格式。
我们先看一下ComplexHeatmap
包内置的数据格式。
library(ComplexHeatmap)
# 读取内置数据
mat = read.table(system.file("extdata", package = "ComplexHeatmap", "tcga_lung_adenocarcinoma_provisional_ras_raf_mek_jnk_signalling.txt"), header = TRUE, stringsAsFactors = FALSE, sep = "\t")
mat[is.na(mat)] = ""
rownames(mat) = mat[, 1]
mat = mat[, -1]
mat= mat[, -ncol(mat)]
mat = t(as.matrix(mat))
mat[1:5, 1:4]
## TCGA-05-4384-01 TCGA-05-4390-01 TCGA-05-4425-01 TCGA-38-4631-01
## KRAS " " "MUT;" " " " "
## HRAS " " " " " " " "
## BRAF " " " " " " " "
## RAF1 " " " " " " " "
## MAP3K1 " " " " " " " "
可以看到这个数据和默认的MAF数据是不一样的,它的行是基因,列是样本。
画图如下,我们之前也介绍过:超详细的R语言热图之complexheatmap系列07
col = c("HOMDEL" = "blue", "AMP" = "red", "MUT" = "#008000")
alter_fun = list(
background = alter_graphic("rect", fill = "#CCCCCC"),
HOMDEL = alter_graphic("rect", fill = col["HOMDEL"]),
AMP = alter_graphic("rect", fill = col["AMP"]),
MUT = alter_graphic("rect", height = 0.33, fill = col["MUT"])
)
heatmap_legend_param = list(title = "Alternations", at = c("HOMDEL", "AMP", "MUT"),
labels = c("Deep deletion", "Amplification", "Mutation"))
oncoPrint(mat,
alter_fun = alter_fun,
col = col,
heatmap_legend_param = heatmap_legend_param
)
## All mutation types: MUT, AMP, HOMDEL.
## `alter_fun` is assumed vectorizable. If it does not generate correct
## plot, please set `alter_fun_is_vectorized = FALSE` in `oncoPrint()`.
但是默认从TCGA下载的MAF数据长这样:
# 只展示部分
snv[1:4,c(2,7,8,10)]
## Hugo_Symbol Start_Position End_Position Variant_Classification
## 1 AGRN 1054486 1054486 Missense_Mutation
## 2 B3GALT6 1234004 1234004 3'UTR
## 3 UBE2J2 1257244 1257244 Missense_Mutation
## 4 TNFRSF14 2561721 2561721 Silent
ComplexHeatmap
画图需要的都是宽数据,你看到的数据是什么样,画出来的热图就是什么样的,所以我们先把原始的MAF数据变为宽数据,再画图即可。
你是不是以为这个操作很难?
其实也就是1行代码即可,借助oncoplot
函数即可:
maftools::oncoplot(coad_maf,
top = 30,
writeMatrix = T)#添加这个参数即可
这样会在当前工作目录下自动生成一个onco_matrix.txt
文件,这个文件的格式默认就是ComplexHeatmap
包需要的格式!
coad_mat <- read.table(file = "E:/R/r-learning/热图相关图网络图/onco_matrix.txt",
header=T,sep="\t",check.names = F
)
coad_mat[1:4,1:4]
## TCGA-CA-6718 TCGA-AA-3977 TCGA-AA-A02E TCGA-D5-6931
## APC Multi_Hit Multi_Hit Frame_Shift_Del Nonsense_Mutation
## TP53 Multi_Hit Multi_Hit Missense_Mutation Missense_Mutation
## TTN Multi_Hit Multi_Hit Missense_Mutation Multi_Hit
## KRAS Multi_Hit Missense_Mutation Missense_Mutation Missense_Mutation
maftools
包非常强大,有非常多实用的函数,比如我们之前还介绍过1行代码计算肿瘤突变负荷:1行代码计算肿瘤突变负荷TMB。大家可以多去学习下。
下面直接画图就可以了,不过需要进行很多自定义处理,适合对ComplexHeatmap
有一定了解的人使用,并不是很简单哦~
# 定义颜色
col = c("Missense_Mutation" = "#666111", "Nonsense_Mutation" = "#A65628",
"Splice_Site" = "#BEBADA","Frame_Shift_Del"="#FB8072",
"Frame_Shift_Ins"="#4DAF4A","In_Frame_Del"="#E41A1C",
"In_Frame_Ins"="#FF7F00","Nonstop_Mutation"="#FCCDE5",
"Multi_Hit"="#8DA0CB"
)
alter_fun = list(
background = alter_graphic("rect", fill = "grey90"), ##CCCCCC
Missense_Mutation = alter_graphic("rect", fill = col["Missense_Mutation"]),
Nonsense_Mutation = alter_graphic("rect", fill = col["Nonsense_Mutation"]),
Splice_Site = alter_graphic("rect", height = 0.33, fill = col["Splice_Site"]),
Frame_Shift_Del = alter_graphic("rect", fill = col["Frame_Shift_Del"]),
Frame_Shift_Ins = alter_graphic("rect", fill = col["Frame_Shift_Ins"]),
In_Frame_Del = alter_graphic("rect", fill = col["In_Frame_Del"]),
In_Frame_Ins = alter_graphic("rect", fill = col["In_Frame_Ins"]),
Nonstop_Mutation = alter_graphic("rect", fill = col["Nonstop_Mutation"]),
Multi_Hit = alter_graphic("rect", fill = col["Multi_Hit"])
)
# 定义图例
heatmap_legend_param = list(title = "Alternations",
at = c("Missense_Mutation", "Nonsense_Mutation",
"Splice_Site","Frame_Shift_Del",
"Frame_Shift_Ins","In_Frame_Del",
"In_Frame_Ins","Nonstop_Mutation",
"Multi_Hit"
),
labels = c("Missense_Mutation", "Nonsense_Mutation",
"Splice_Site","Frame_Shift_Del",
"Frame_Shift_Ins","In_Frame_Del",
"In_Frame_Ins","Nonstop_Mutation",
"Multi_Hit"
)
)
# 添加临床信息
columnanno <- HeatmapAnnotation(stage=clin_snv$ajcc_pathologic_stage,
gender=clin_snv$gender,
race=clin_snv$race,
tissue_orign=clin_snv$tissue_or_organ_of_origin,
show_annotation_name = T
)
下面就画图就行了:
oncoPrint(coad_mat,
alter_fun = alter_fun,
col = col,
heatmap_legend_param = heatmap_legend_param,
remove_empty_columns = T,
remove_empty_rows = T,
bottom_annotation = columnanno
)
## All mutation types: Multi_Hit, Missense_Mutation, Nonsense_Mutation,
## Frame_Shift_Del, Splice_Site, Frame_Shift_Ins, Nonstop_Mutation,
## In_Frame_Del, In_Frame_Ins.
## `alter_fun` is assumed vectorizable. If it does not generate correct
## plot, please set `alter_fun_is_vectorized = FALSE` in `oncoPrint()`.
这样就搞定了,如果你还想添加各种元素,就是常规的ComplexHeatmap
操作了,我们在之前都介绍过了,这里就不多说了。
通过翻看maftools
的代码,我们也可以自己制作oncoPrint
需要的输入数据。
进行复杂操作的前提是,你得知道MAF文件中各个列都是什么意思,比如Variant_Classification
这一列就是突变类型,也就是上面突变全景图的图例部分。
table(snv$Variant_Classification)
##
## 3'Flank 3'UTR 5'Flank
## 366 1922 365
## 5'UTR Frame_Shift_Del Frame_Shift_Ins
## 527 19020 5019
## IGR In_Frame_Del In_Frame_Ins
## 3 1052 83
## Intron Missense_Mutation Nonsense_Mutation
## 2920 148620 12105
## Nonstop_Mutation RNA Silent
## 125 1469 54352
## Splice_Region Splice_Site Translation_Start_Site
## 1713 2936 176
我们只选择一部分突变类型(以下几个类型是从maftools
的源码中翻的):
vc.nonSilent <- mus <- c("Missense_Mutation","Nonsense_Mutation","Splice_Site",
"Frame_Shift_Del","Frame_Shift_Ins","In_Frame_Del",
"In_Frame_Ins","Nonstop_Mutation",
"Translation_Start_Site")
maf <- snv
maf = maf[maf$Variant_Classification %in% vc.nonSilent,]
aa <- as.data.frame(maf)
下面就是长宽转换了:
library(dplyr)
library(tidyr)
library(tibble)
# tibble不支持行名,我感觉是一个缺点
plot_df <- aa %>%
select(Hugo_Symbol,Variant_Classification,Tumor_Sample_Barcode) %>%
#filter(Variant_Classification %in% mus) %>%
pivot_wider(names_from = Tumor_Sample_Barcode,values_from = Variant_Classification)
dim(plot_df)
## [1] 17754 429
plot_df[1:4,1:4]
## # A tibble: 4 × 4
## Hugo_Symbol `TCGA-D5-6540` `TCGA-AA-3815` `TCGA-G4-6322`
## <chr> <list> <list> <list>
## 1 AGRN <chr [1]> <NULL> <NULL>
## 2 UBE2J2 <chr [1]> <NULL> <NULL>
## 3 RPL22 <chr [1]> <NULL> <NULL>
## 4 ZBTB48 <chr [1]> <NULL> <NULL>
此时的值都是列表,每个列表可能是一个字符,表示某种突变,也可能是多个字符,表示同时存在多个突变,还需要继续转换。
以下代码参考https://github.com/jokergoo/ComplexHeatmap/issues/663:
coad_mat <- apply(plot_df, 2, function(x) {
sapply(x, paste, collapse = ",")
})
rownames(coad_mat) <- coad_mat[,1]
coad_mat <- coad_mat[,-1]
coad_mat[1:4,1:4]
## TCGA-D5-6540 TCGA-AA-3815 TCGA-G4-6322 TCGA-AA-3855
## AGRN "Missense_Mutation" "" "" ""
## UBE2J2 "Missense_Mutation" "" "" ""
## RPL22 "Frame_Shift_Del" "" "" ""
## ZBTB48 "Frame_Shift_Del" "" "" ""
这样就转换成功了,这个矩阵就可以用来画图了。
我们选择突变频率最高的前30个基因,还是借助maftools
包帮我们选择:
genes <- getGeneSummary(coad_maf) %>% slice_head(n=30) %>% pull(Hugo_Symbol)
plot_coad_mat <- coad_mat[genes,]
plot_coad_mat[1:4,1:4]
## TCGA-D5-6540
## APC "Frame_Shift_Del"
## TP53 "Missense_Mutation"
## TTN "Missense_Mutation,Missense_Mutation,Missense_Mutation,Splice_Site,Missense_Mutation,Missense_Mutation"
## KRAS ""
## TCGA-AA-3815
## APC "Frame_Shift_Ins"
## TP53 "Missense_Mutation,Missense_Mutation"
## TTN "Missense_Mutation"
## KRAS ""
## TCGA-G4-6322 TCGA-AA-3855
## APC "Nonsense_Mutation,Nonsense_Mutation" "Frame_Shift_Del"
## TP53 "" ""
## TTN "" ""
## KRAS "" ""
大功告成,画图即可:
# 定义颜色
col = c("Missense_Mutation" = "#666111", "Nonsense_Mutation" = "#A65628",
"Splice_Site" = "#BEBADA","Frame_Shift_Del"="#FB8072",
"Frame_Shift_Ins"="#4DAF4A","In_Frame_Del"="#E41A1C",
"In_Frame_Ins"="#FF7F00","Nonstop_Mutation"="#FCCDE5",
"Multi_Hit"="#8DA0CB"
)
alter_fun = list(
background = alter_graphic("rect", fill = "grey90"), ##CCCCCC
Missense_Mutation = alter_graphic("rect", fill = col["Missense_Mutation"]),
Nonsense_Mutation = alter_graphic("rect", fill = col["Nonsense_Mutation"]),
Splice_Site = alter_graphic("rect", height = 0.33, fill = col["Splice_Site"]),
Frame_Shift_Del = alter_graphic("rect", fill = col["Frame_Shift_Del"]),
Frame_Shift_Ins = alter_graphic("rect", fill = col["Frame_Shift_Ins"]),
In_Frame_Del = alter_graphic("rect", fill = col["In_Frame_Del"]),
In_Frame_Ins = alter_graphic("rect", fill = col["In_Frame_Ins"]),
Nonstop_Mutation = alter_graphic("rect", fill = col["Nonstop_Mutation"]),
Multi_Hit = alter_graphic("rect", fill = col["Multi_Hit"])
)
# 定义图例
heatmap_legend_param = list(title = "Alternations",
at = c("Missense_Mutation", "Nonsense_Mutation",
"Splice_Site","Frame_Shift_Del",
"Frame_Shift_Ins","In_Frame_Del",
"In_Frame_Ins","Nonstop_Mutation",
"Multi_Hit"
),
labels = c("Missense_Mutation", "Nonsense_Mutation",
"Splice_Site","Frame_Shift_Del",
"Frame_Shift_Ins","In_Frame_Del",
"In_Frame_Ins","Nonstop_Mutation",
"Multi_Hit"
)
)
# 添加临床信息
columnanno <- HeatmapAnnotation(stage=clin_snv$ajcc_pathologic_stage,
gender=clin_snv$gender,
race=clin_snv$race,
tissue_orign=clin_snv$tissue_or_organ_of_origin,
show_annotation_name = T
)
oncoPrint(plot_coad_mat,
alter_fun = alter_fun,
col = col,
heatmap_legend_param = heatmap_legend_param,
remove_empty_columns = T,
remove_empty_rows = T,
bottom_annotation = columnanno
)
## All mutation types: Frame_Shift_Del, Missense_Mutation, Splice_Site,
## Nonsense_Mutation, Frame_Shift_Ins, In_Frame_Del, In_Frame_Ins,
## Nonstop_Mutation.
## `alter_fun` is assumed vectorizable. If it does not generate correct
## plot, please set `alter_fun_is_vectorized = FALSE` in `oncoPrint()`.
搞定!
complexheatmap
系列教程:
联系客服