公共数据库挖掘必备-QTL分析

现如今，由于二代测序的普及化、公共数据库的便利化，越来越多的科研工作者可以将不同的组学数据进行大数据的整合分析。今天，小编就为大家讲解如何进行QTL的分析流程。

首先，我们要理解什么是QTL。数量性状基因座（QuantitativeTrait Loci, QTL）是指染色体上一些能特定调控mRNA（eQTL）、甲基化水平(mQTL)的SNP位点，其mRNA、甲基化的表达水平量与数量性状成比例关系。

有很多文章都用到了QTL的分析，比如小编上一期讲解的“Modulation of long noncoding RNAs by risk SNPs underlying geneticpredispositions to prostate cancer”，用到了eQTL的分析；以及“Associationand cis-mQTL Analysis of Variants in CHRNA3-A5, CHRNA7, CHRNB2, and CHRNB4 inRelation to Nicotine Dependence in a Chinese Han Population”这篇文章中结合了表达和甲基化数据，做了meQTL的分析。他们做这些分析的目的，是结合DNA、RNA、甲基化的数据，探索一些和疾病显著关联的intron上的SNP的潜在生物学机制。比如在“Association andcis-mQTL Analysis of Variants in CHRNA3-A5, CHRNA7, CHRNB2, and CHRNB4 in Relationto Nicotine Dependence in a Chinese Han Population”文章中，作者发现rs3743075和尼古丁成瘾显著相关，并且，这个位点不仅是cis-eQTL，还可以调节临近位点的甲基化水平。那么，这个位点可能通过增加了附近的甲基化水平，抑制了基因的表达，从而导致了疾病的发生。

这里正式开始给大家如何做QTL的分析。我们用到的软件是MatrixEQTLpackage (v 2.1.1)，具体所需要的文件格式，大家请查看我们之前的推文“meQTL分析，你分析过吗？”。

library(MatrixEQTL)

# Genotype filename

SNP_file_name =paste("/data/liuqiang/mQTL/data_set/wgs/mqtl/HHJ/rs1176744.txt",sep="\t");

snps_location_file_name= paste("/data/liuqiang/mQTL/data_set/wgs/mqtl/HHJ/rs1176744loc.txt",sep="\t");

# Gene expressionfile name

expression_file_name= paste("/data/liuqiang/mQTL/data_set/wgs/mqtl/HHJ/GE_chr11.txt",sep="\t");

gene_location_file_name=paste("/data/liuqiang/mQTL/data_set/wgs/mqtl/HHJ/geneloc_chr11.txt",sep="\t");

# Covariates filename

covariates_file_name= paste("/data/liuqiang/mQTL/data_set/wgs/mqtl/HHJ/Covariates.txt",sep="\t");

此处一共输入5个文件，分别是SNP文件（注意转换为dosage格式）、SNP位置信息、表达\甲基化文件、表达\甲基化位置信息、协变量文件，需要注意的是，所有矩阵的样本顺序要保持一致。

# Output file name

output_file_name_cis= tempfile();

output_file_name_tra= tempfile();

# Onlyassociations significant at this level will be saved

pvOutputThreshold_cis= 5e-2;

pvOutputThreshold_tra= 1e-20;

此处是设定大家对cis-eQTL和trans-eQTL阈值的设定；

# Error covariancematrix

# Set to numeric()for identity.

errorCovariance =numeric();

# errorCovariance= read.table("Sample_Data/errorCovariance.txt");

# Distance forlocal gene-SNP pairs

cisDist = 2.5e5;

这里是设定cis-eQTL的作用范围，即以targetSNP为中心，上下游250kb范围，查看该范围内哪些基因的表达水平和target SNP显著相关。

## Load genotypedata

snps = SlicedData$new();

snps$fileDelimiter= "\t"; # the TABcharacter

snps$fileOmitCharacters= "NA"; # denote missing values;

snps$fileSkipRows= 1; # one row of column labels

snps$fileSkipColumns= 1; # one column of row labels

snps$fileSliceSize= 1000; # read file in slices of2,000 rows

snps$LoadFile(SNP_file_name);

## Load geneexpression data

gene =SlicedData$new();

gene$fileDelimiter= "\t"; # the TABcharacter

gene$fileOmitCharacters= "NA"; # denote missing values;

gene$fileSkipRows= 1; # one row of column labels

gene$fileSkipColumns= 1; # one column of row labels

gene$fileSliceSize= 2000; # read file in slices of2,000 rows

gene$LoadFile(expression_file_name);

## Load covariates

cvrt =SlicedData$new();

cvrt$fileDelimiter= "\t"; # the TABcharacter

cvrt$fileOmitCharacters= "NA"; # denote missing values;

cvrt$fileSkipRows= 1; # one row of column labels

cvrt$fileSkipColumns= 1; # one column of row labels

if(length(covariates_file_name)>0){

cvrt$LoadFile(covariates_file_name);

}

## Run theanalysis

snpspos =read.table(snps_location_file_name, header = TRUE, stringsAsFactors = FALSE);

genepos =read.table(gene_location_file_name, header = TRUE, stringsAsFactors = FALSE);

me = Matrix_eQTL_main(

snps = snps,

gene = gene,

cvrt = cvrt,

output_file_name = output_file_name_tra,

pvOutputThreshold = pvOutputThreshold_tra,

useModel = useModel,

errorCovariance = errorCovariance,

verbose = TRUE,

output_file_name.cis = output_file_name_cis,

pvOutputThreshold.cis =pvOutputThreshold_cis,

snpspos = snpspos,

genepos = genepos,

cisDist = cisDist,

pvalue.hist = "qqplot",

min.pv.by.genesnp = FALSE,

noFDRsaveMemory = FALSE);

unlink(output_file_name_tra);

unlink(output_file_name_cis);

## Results:

cat('Analysis donein: ', me$time.in.sec, ' seconds', '\n');

cat('Detectedlocal eQTLs:', '\n');

show(me$cis$eqtls)

cat('Detecteddistant eQTLs:', '\n');

show(me$trans$eqtls)

## Plot the Q-Qplot of local and distant p-values

plot(me)

write.csv(me$cis$eqtls,file='mQTL_250k.csv')

本站仅提供存储服务，所有内容均由用户发布，如发现有害或侵权内容，请点击举报。