split_libraries.py
-m
-f
-q
-r
-l
-L
-t
-s
-k
-B
-b
-e
-c
-a
-H
-M
-o
-n
--retain_unassigned_reads
-w
Enable sliding window test of quality scores. If the average score of a continuous set of w nucleotides falls below the threshold (see -s for default), the sequence is discarded. A good value would be 50. 0 (zero) means no filtering. Must pass a .qual file (see -q parameter) if this functionality is enabled. Default behavior for this function is to truncate the sequence at the beginning of the poor quality window, and test for minimal length (-l parameter) of the resulting sequence. [default: 0]
-g, --discard_bad_windows
If the qual_score_window option (-w) is enabled, this will override the default truncation behavior and discard any sequences where a bad window is found. [default: False]
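The interaction between -w, -l and -g can be easier to follow as code. The following is a minimal Python sketch of the behavior described above, not QIIME's actual implementation. The function names are hypothetical, the threshold is passed in explicitly, and leaving sequences shorter than the window untouched is an assumption of this sketch.

```python
def first_bad_window(quals, window, min_avg):
    """Return the start index of the first window whose mean quality is
    below min_avg, or None if no such window exists.

    Assumption: window <= 0 disables the test, and sequences shorter than
    the window are not tested at all.
    """
    if window <= 0 or len(quals) < window:
        return None
    total = sum(quals[:window])
    for start in range(len(quals) - window + 1):
        if start > 0:
            # Slide the window one base to the right.
            total += quals[start + window - 1] - quals[start - 1]
        if total / window < min_avg:
            return start
    return None


def apply_window_filter(seq, quals, window, min_avg, min_len,
                        discard_bad_windows=False):
    """Return the (possibly truncated) sequence, or None if it is discarded."""
    start = first_bad_window(quals, window, min_avg)
    if start is None:
        return seq                      # no poor-quality window found
    if discard_bad_windows:
        return None                     # -g style: drop the whole read
    truncated = seq[:start]             # default: cut at the window start
    return truncated if len(truncated) >= min_len else None
```

With discard_bad_windows left at False the read is shortened and then checked against the minimum length. Setting it to True mirrors the -g override, where any bad window removes the read entirely.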
-p, --disable_primers
Disable primer usage when demultiplexing. Should be enabled for unusual circumstances, such as analyzing Sanger sequence data generated with different primers. [default: False]
-z, --reverse_primers
Enable removal of the reverse primer and any subsequent sequence from the end of each read. To enable this, there has to be a “ReversePrimer” column in the mapping file. Primers are required to be in IUPAC format and written in the 5’ to 3’ direction. Valid options are ‘disable’, ‘truncate_only’, and ‘truncate_remove’. ‘truncate_only’ will remove the primer and subsequent sequence data from the output read and will not alter output of sequences where the primer cannot be found. ‘truncate_remove’ will flag sequences where the primer cannot be found to not be written and will record the quantity of such failed sequences in the log file. [default: disable]
--reverse_primer_mismatches
Set number of allowed mismatches for reverse primers (option -z). [default: 0]
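The three -z modes and the mismatch allowance can be illustrated with a short Python sketch. This is an illustration under stated assumptions rather than QIIME's own code: the function names are hypothetical, and the primer passed in is assumed to be already oriented the way it appears in the read (any reverse-complementing of the mapping file entry is left out).

```python
# IUPAC nucleotide codes mapped to the bases they stand for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def find_primer(seq, primer, max_mismatches=0):
    """Return the first position where primer matches seq with at most
    max_mismatches mismatches, or None if there is no such position."""
    for start in range(len(seq) - len(primer) + 1):
        mismatches = 0
        for base, code in zip(seq[start:start + len(primer)], primer):
            if base not in IUPAC.get(code, ""):
                mismatches += 1
                if mismatches > max_mismatches:
                    break
        else:
            return start                # inner loop finished without break
    return None


def remove_reverse_primer(seq, primer, mode="disable", max_mismatches=0):
    """Return (sequence, keep) for the given -z style mode."""
    if mode == "disable":
        return seq, True
    pos = find_primer(seq, primer, max_mismatches)
    if pos is not None:
        return seq[:pos], True          # drop the primer and what follows
    # Primer not found: truncate_only keeps the read unchanged,
    # truncate_remove flags it so that it is not written.
    return seq, mode != "truncate_remove"
```

The only difference between ‘truncate_only’ and ‘truncate_remove’ in this sketch is the keep flag returned when the primer is not found.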
-d, --record_qual_scores
Enables recording of quality scores for all sequences that are recorded. If this option is enabled, a file named seqs_filtered.qual will be created in the output directory, and will contain the same sequence IDs as the seqs.fna file and sequence quality scores matching the bases present in the seqs.fna file. [default: False]
-i, --median_length_filtering
Disables minimum and maximum sequence length filtering, and instead calculates the median sequence length and filters the sequences based upon the number of median absolute deviations specified by this parameter. Any sequences with lengths outside the number of deviations will be removed. [default: None]
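As a rough illustration of filtering by median absolute deviation (MAD), here is a minimal Python sketch. The function name is hypothetical, and treating the boundaries as inclusive is an assumption of the sketch, not something the option description states.

```python
from statistics import median


def median_length_filter(seqs, num_deviations):
    """Keep sequences whose length lies within num_deviations median
    absolute deviations of the median length."""
    lengths = [len(s) for s in seqs]
    med = median(lengths)
    # MAD: the median of the absolute differences from the median length.
    mad = median(abs(n - med) for n in lengths)
    low = med - num_deviations * mad
    high = med + num_deviations * mad
    return [s for s in seqs if low <= len(s) <= high]
```

For lengths 10, 10, 11, 12 and 30, the median is 11 and the MAD is 1, so a value of 2 keeps lengths from 9 to 13 and removes the 30-base outlier.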
-j, --added_demultiplex_field
Use -j to add a field to use in the mapping file as an additional demultiplexing option to the barcode. All combinations of barcodes and the values in these fields must be unique. The fields must contain values that can be parsed from the fasta labels such as “plate=R_2008_12_09”. In this case, “plate” would be the column header and “R_2008_12_09” would be the field data (minus quotes) in the mapping file. To use the run prefix from the fasta label, such as “>FLP3FBN01ELBSX”, where “FLP3FBN01” is generated from the run ID, use “-j run_prefix” and set the run prefix to be used as the data under the column header “run_prefix”. [default: None]
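A small Python sketch may help show how a value such as “plate=R_2008_12_09” or a run prefix could be pulled from a fasta label and combined with the barcode. This is an illustration only: the function name, the known_values argument (standing in for the values listed in the mapping file column) and the example barcodes and sample names are all hypothetical.

```python
def added_field_value(label, field, known_values=()):
    """Extract the value of an added demultiplexing field from a fasta label.

    For field == "run_prefix", the label is matched against the run prefixes
    listed in the mapping file (known_values). Otherwise the label is
    searched for a "field=value" token.
    """
    label = label.lstrip(">")
    if field == "run_prefix":
        for value in known_values:
            if label.startswith(value):
                return value
        return None
    for token in label.split():
        key, sep, value = token.partition("=")
        if sep and key == field:
            return value
    return None


# Hypothetical lookup table: each (barcode, field value) pair must map to
# exactly one sample, which is why all combinations have to be unique.
sample_by_key = {
    ("AACC", "R_2008_12_09"): "SampleA",
    ("AACC", "R_2008_12_10"): "SampleB",
}
```

Here the same barcode is shared by two samples, and the plate value is what tells them apart.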
-x, --truncate_ambi_bases
Enable to truncate at the first “N” character encountered in the sequences. This will disable testing for ambiguous bases (-a option). [default: False]
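Output files:
.fna
histograms.txt: contains the number of sequences at each length
split_library_log.txt
1. If there are several samples, as long as their barcodes in the mapping file differ, they can be processed in a single call: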
split_libraries.py -m Mapping_File.txt -f 1.TCA.454Reads.fna,2.TCA.454Reads.fna -q 1.TCA.454Reads.qual,2.TCA.454Reads.qual -o Split_Library_Output_comma_separated/
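Alternatively, merge all the sequences into one file first and then process them.
2. If the data come from two separate sequencing runs (for example, paired-end data sequenced as two runs) that reuse the same barcodes, sequences in each run are numbered the same way, so fragments from different runs end up with the same ID. Process each run separately, offset the numbering of the second run, and then combine the results: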
split_libraries.py -m Mapping_File.txt -f 1.TCA.454Reads.fna -q 1.TCA.454Reads.qual -o Split_Library_Run1_Output/
split_libraries.py -m Mapping_File.txt -f 2.TCA.454Reads.fna -q 2.TCA.454Reads.qual -o Split_Library_Run2_Output/ -n 2000000
cat Split_Library_Run1_Output/seqs.fna Split_Library_Run2_Output/seqs.fna > Combined_seqs.fna
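-n is followed by the starting sequence number. This value should be greater than the total number of sequences produced by the first run.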
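One way to choose a safe -n value is to count the records in the first run's seqs.fna. The Python sketch below does that. Both function names and the margin parameter are hypothetical helpers, not part of QIIME, and the sketch assumes that numbering in the first run started at 1.

```python
def count_fasta_records(path):
    """Count fasta records by counting header lines that start with '>'."""
    with open(path) as handle:
        return sum(1 for line in handle if line.startswith(">"))


def suggest_start_index(path, margin=1):
    """Return a start index greater than the number of records in path.

    Assumption: the first run was numbered from 1, so count + 1 is the
    first unused number. A larger margin leaves a visible gap.
    """
    return count_fasta_records(path) + margin


# Example (commented out because it needs the first run's output):
# start = suggest_start_index("Split_Library_Run1_Output/seqs.fna")
# then pass that value to the second run with -n
```

A round number well above the count, such as the 2000000 used in the command above, works just as well and makes the two runs easy to tell apart by ID.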
References:
http://qiime.org/scripts/split_libraries.html