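A VCF file stores the variant information of samples. If the analysis is not run with joint calling, you end up with a separate set of variant data for each sample. To make differential SNP analysis across samples in the same group easier, these files need to be merged. Many tools can merge VCF files; GATK, vcftools, and bcftools are the three most commonly used, but the right merging method depends on the information contained in the VCF files.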
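As a rough illustration of "depends on the information in the VCF files", the sketch below compares the sample columns of two uncompressed VCFs and suggests concatenation when the samples are identical (same samples, different sites) and merging otherwise (different samples). The helper names `vcf_samples` and `suggest_mode` are hypothetical, and the sketch assumes a tab-separated `#CHROM` header line as required by the VCF format; it is a simplification, not a complete decision rule.

```shell
# Print the sample names of an uncompressed VCF, one per line.
# Samples are columns 10 and onward of the tab-separated "#CHROM" line.
vcf_samples() {
    awk -F'\t' '/^#CHROM/ { for (i = 10; i <= NF; i++) print $i; exit }' "$1"
}

# Hypothetical helper: suggest "concat" when both files carry the same
# samples in the same order, and "merge" when the sample lists differ.
suggest_mode() {
    if [ "$(vcf_samples "$1")" = "$(vcf_samples "$2")" ]; then
        echo concat
    else
        echo merge
    fi
}
```

For example, `suggest_mode concat-a.vcf concat-b.vcf` would print `concat` if both files contain only sample A.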
# GatherVcfs
gatk GatherVcfs -I concat-a.vcf -I concat-b.vcf -O combine_a_b_samesample_diffsites.vcf
# MergeVcfs
gatk MergeVcfs -I concat-a.vcf -I concat-b.vcf -O combine_a_b_diffsample_allsites_gatk.vcf
##fileformat=VCFv4.2
##FILTER=<ID=q10,Description="Quality below 10">
##FORMAT=<ID=DP,Number=1,Type=Integer,Description="Read Depth">
##FORMAT=<ID=GQ,Number=1,Type=Integer,Description="Genotype Quality">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
##INFO=<ID=DP,Number=1,Type=Integer,Description="Total Depth">
##contig=<ID=1,length=17540695>
##contig=<ID=2,length=14896646>
##contig=<ID=3,length=12399606>
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT A
# Sites in concat-a.vcf
1 100 . GTTT G 1806 q10 DP=35 GT:GQ:DP 0/1:409:35
1 110 . C T,G 1792 PASS DP=32 GT:GQ:DP 0/1:245:32
1 120 . GA G 628 q10 DP=21 GT:GQ:DP 1/1:21:21
1 130 . GAA GG 1016 PASS DP=22 GT:GQ:DP 0/1:212:22
1 140 . GT G 727 PASS DP=30 GT:GQ:DP 0/1:150:30
1 150 . TAAAA TA,T 246 PASS DP=10 GT:GQ:DP 1/2:12:10
2 100 . GTTT G 1806 q10 DP=35 GT:GQ:DP 0/1:409:35
2 110 . CAAA C 1792 PASS DP=32 GT:GQ:DP 0/1:245:32
2 120 . GA G 628 q10 DP=21 GT:GQ:DP 1/1:21:21
2 130 . GAA G 1016 PASS DP=22 GT:GQ:DP 0/1:212:22
2 140 . GT G 727 PASS DP=30 GT:GQ:DP 0/1:150:30
2 150 . TAAAA TA,T 246 PASS DP=10 GT:GQ:DP 1/2:12:10
2 160 . TAAAA TA,TC,T 246 PASS DP=10 GT:GQ:DP 0/2:12:10
# Sites in concat-b.vcf
3 241 . GTTT G 1806 q10 DP=35 GT:GQ:DP 0/1:409:35
3 251 . CAAA C 1792 PASS DP=32 GT:GQ:DP 0/1:245:32
3 261 . GA G 628 q10 DP=21 GT:GQ:DP 1/1:21:21
3 271 . GAA G 1016 PASS DP=22 GT:GQ:DP 0/1:212:22
/vcftools/bin/vcf-concat concat-a.vcf concat-b.vcf > combine_a_b_samesample_diffsites_vcftools.vcf
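A list of input files can also be specified with -f before carrying out the subsequent steps. The VCF produced by this command matches the GATK result; only the order of the header lines changes, which does not affect use of the data.
bcftools concat concat-a.vcf concat-b.vcf -o combine_a_b_samesample_diffsites_bcftools.vcf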
##bcftools_concatVersion=1.3.1+htslib-1.3.1
##bcftools_concatCommand=concat -o combine_a_b_samesample_diffsites_bcftools.vcf concat-a.vcf concat-b.vcf
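When there are many inputs, `bcftools concat` accepts `-f` (`--file-list`) with a text file that names one VCF per line. The sketch below writes such a list and passes it on; `concat_from_list` is a hypothetical helper name, and it assumes `bcftools` is on the `PATH` and that the inputs are given in the order in which they should be concatenated.

```shell
# Hypothetical helper: concat_from_list OUTPUT INPUT1 [INPUT2 ...]
# Writes the input paths to a temporary list file (one per line) and
# hands that list to "bcftools concat" via -f.
concat_from_list() {
    out=$1
    shift
    list=$(mktemp)
    printf '%s\n' "$@" > "$list"
    bcftools concat -f "$list" -o "$out"
    status=$?
    rm -f "$list"          # clean up the temporary list file
    return $status
}
```

A call such as `concat_from_list combined.vcf concat-a.vcf concat-b.vcf` would then correspond to the two-file command shown above; `combined.vcf` is a placeholder output name.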
bgzip -c merge-test-a.vcf > merge-test-a.vcf.gz && tabix -p vcf merge-test-a.vcf.gz
bgzip -c merge-test-b.vcf > merge-test-b.vcf.gz && tabix -p vcf merge-test-b.vcf.gz
/vcftools/bin/vcf-merge merge-test-a.vcf.gz merge-test-b.vcf.gz > combine_a_b_diffsamples_allsites_vcftools.vcf
bcftools merge merge-test-a.vcf.gz merge-test-b.vcf.gz -o combine_a_b_diffsamples_allsites_bcftools.vcf
This method also requires every VCF file to be compressed and indexed beforehand; otherwise the program will not run.
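The compress-and-index step can be wrapped in a small loop when there are more than a couple of files. This is a minimal sketch, assuming `bgzip` and `tabix` from HTSlib are on the `PATH`; `prepare_vcfs` is a hypothetical helper name. It uses `bgzip -c` so that the uncompressed files are kept.

```shell
# Hypothetical helper: prepare_vcfs FILE1.vcf [FILE2.vcf ...]
# For each input, write FILE.vcf.gz next to it and build a tabix index
# (FILE.vcf.gz.tbi). Stops at the first file that fails.
prepare_vcfs() {
    for vcf in "$@"; do
        bgzip -c "$vcf" > "$vcf.gz" || return 1   # -c: write to stdout, keep the original
        tabix -p vcf "$vcf.gz" || return 1        # -p vcf: index using the VCF preset
    done
}
```

For the two test files this would be called as `prepare_vcfs merge-test-a.vcf merge-test-b.vcf`.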
java -jar /GenomeAnalysisTK-3.8/GenomeAnalysisTK.jar -T CombineVariants -V merge-test-a.vcf -V merge-test-b.vcf -o combine_a_b_diffsample_allsites_gatk.vcf -R ref.fna
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT A B
1 3062915 . GTTT G,GT 1806 q10;q20 AC=1,1;AF=0.250,0.250;AN=4;DP=49;set=FilteredInAll GT:DP:GQ 0/1:35:99 0/2:14:99
1 3106154 . CAAAA CA,C 1792 PASS AC=1,1;AF=0.250,0.250;AN=4;DP=47;set=Intersection GT:DP:GQ 0/1:32:99 0/2:15:99
1 3157410 . GA G 628 PASS AC=3;AF=0.750;AN=4;DP=32;set=filterInvariant-variant2 GT:DP:GQ 1/1:21:21 0/1:11:49
1 3162006 . GAA G 1016 PASS AC=2;AF=0.500;AN=4;DP=41;set=Intersection GT:DP:GQ 0/1:22:99 0/1:19:99
1 3177144 . GT G 727 PASS AC=2;AF=0.500;AN=4;DP=54;set=Intersection GT:DP:GQ 0/1:30:99 0/1:24:99
1 3184885 . TAAAA TA,T 246 PASS AC=2,1;AF=0.500,0.250;AN=4;DP=26;set=Intersection GT:DP:GQ 1/2:10:12 0/1:16:99
2 3188209 . GA G 162 . AC=1;AF=0.500;AN=2;DP=15;set=variant2 GT:DP:GQ ./. 0/1:15:99
2 3199812 . G GTT,GT 481 PASS AC=1,1;AF=0.500,0.500;AN=2;DP=26;set=variant GT:DP:GQ 1/2:26:99 ./.
3 3199812 . G GTT,GT 353 PASS AC=1,1;AF=0.500,0.500;AN=2;DP=19;set=variant2 GT:DP:GQ ./. 1/2:19:99
3 3199815 . G A 353 PASS AC=1;AF=0.500;AN=2;DP=19;set=variant2 GT:DP:GQ ./. 0/1:19:99
3 3212016 . CTT C,CT 565 PASS AC=1,1;AF=0.500,0.500;AN=2;DP=26;set=variant GT:DP:GQ 1/2:26:91 ./.
4 3212016 . CTT C 677 q20 AC=1;AF=0.500;AN=2;DP=15;set=FilteredInAll GT:DP:GQ ./. 0/1:15:99
4 3258448 . TACACACAC T 325 PASS AC=1;AF=0.500;AN=2;DP=31;set=variant GT:DP:GQ 0/1:31:99 ./.
# merge-test-a.vcf
1 3184885 . TAAAA TA,T 246 PASS DP=10 GT:GQ:DP 1/2:12:10
# merge-test-b.vcf
1 3184885 . TAAA T 598 PASS DP=16 GT:GQ:DP 0/1:435:16
# combine_a_b_diffsamples_allsites_vcftools.vcf
1 3184885 . TAAAA TA,T 422.00 PASS AC=2,1;AN=4;DP=26;SF=0,1 GT:DP:GQ 1/2:10:12 0/1:16:435
# combine_a_b_diffsamples_allsites_bcftools.vcf
1 3184885 . TAAAA TA,T 598 PASS DP=26 GT:GQ:DP 1/2:12:10 0/1:435:16
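A side-by-side listing like the one above can be produced for any site with a few lines of awk. The sketch below prints, for each uncompressed VCF given, the file name followed by the records at one chromosome and position; `show_site` is a hypothetical helper name, and the files are assumed to be tab-separated as the VCF format requires.

```shell
# Hypothetical helper: show_site CHROM POS FILE1.vcf [FILE2.vcf ...]
# For every file, print a "# <file name>" line followed by each
# non-header record whose CHROM and POS columns match.
show_site() {
    chrom=$1
    pos=$2
    shift 2
    for f in "$@"; do
        awk -F'\t' -v c="$chrom" -v p="$pos" \
            '!/^#/ && $1 == c && $2 == p { print "# " FILENAME; print }' "$f"
    done
}
```

For the site compared here, the call would be `show_site 1 3184885 merge-test-a.vcf merge-test-b.vcf combine_a_b_diffsamples_allsites_vcftools.vcf combine_a_b_diffsamples_allsites_bcftools.vcf`.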
# If you spot any errors, corrections are welcome #