Shulou (Shulou.com), updated 2025-01-17
This article explains how to use Trimmomatic to quality-filter NGS data.
Trimmomatic filters NGS sequencing data by quality. Its adapter removal targets Illumina sequences only: it identifies the adapter sequence at the 3' end of reads and removes it, which makes it less flexible than cutadapt. For low-quality sequence, however, it uses a sliding window algorithm: given a window length and a quality threshold, if the average quality of the bases in a window falls below the threshold, that window and all bases after it are removed. For large read sets, the sliding window algorithm runs faster than cutadapt's algorithm. The official website is:
http://www.usadellab.org/cms/?page=trimmomatic
The software is written in Java and is distributed as a packaged jar file. At the time of writing, the latest version is v0.38, and the official website provides a compressed package of binaries.
After downloading, unzip the package. The software can perform the following operations on reads:
1. Remove adapter sequences
Adapter removal requires a file in FASTA format that contains the adapter sequences. The software ships with files for several common Illumina adapters:
NexteraPE-PE.fa
TruSeq2-PE.fa
TruSeq2-SE.fa
TruSeq3-PE.fa
TruSeq3-PE-2.fa
TruSeq3-SE.fa
You can also supply a custom adapter file. The adapter file is passed through the ILLUMINACLIP parameter, written as follows:
ILLUMINACLIP:TruSeq2-PE.fa:2:30:10
TruSeq2-PE.fa is the file whose adapter sequences are searched for. The search begins with a seed match: Trimmomatic first looks for only a short stretch of the adapter in the read, and if that seed is not found there is no need to check the rest of the adapter. Seed matching speeds up the search, and 2 is the maximum number of mismatches allowed in the seed.
When a seed matches, Trimmomatic aligns the full adapter sequence against the read to decide whether an adapter is present. There are two modes at this stage. Palindrome clip mode also accounts for reverse complementarity: in paired-end sequencing, for example, a read that runs past a short insert ends in adapter sequence, and the R1 and R2 reads are then reverse complements of each other over the insert. 30 is the threshold that an alignment must reach in this mode. Simple clip mode only aligns the provided adapter sequences against each read and does not consider reverse complementarity; 10 is the threshold in this mode.
Both thresholds are alignment scores rather than base counts: according to the Trimmomatic manual, each matching base adds about 0.6 to the score, so a higher threshold requires a longer match.
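As a rough check of what the two ILLUMINACLIP thresholds mean in practice, the sketch below converts a score threshold into an approximate minimum number of perfectly matching bases. It assumes, following the Trimmomatic manual, that each matching base adds log10(4) (about 0.6) to the alignment score and that there are no mismatches. The function name `min_matching_bases` is hypothetical and is not part of Trimmomatic.

```python
import math

# Assumption (taken from the Trimmomatic manual): each matching base
# adds log10(4), roughly 0.6, to the alignment score.
SCORE_PER_MATCH = math.log10(4)


def min_matching_bases(threshold):
    """Smallest number of perfect matches whose score reaches the threshold."""
    return math.ceil(threshold / SCORE_PER_MATCH)


print(min_matching_bases(10))  # simple clip threshold from the example
print(min_matching_bases(30))  # palindrome clip threshold from the example
```

Under this assumption, a simple clip threshold of 10 corresponds to about 17 matching bases and a palindrome clip threshold of 30 to about 50.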
2. Remove low-quality sequences
Trimmomatic uses a sliding window to remove low-quality sequence. The window size and the average quality threshold are both set through the SLIDINGWINDOW parameter, written as follows:
SLIDINGWINDOW:4:15
The first number, 4, sets the window size to 4 bp, and the second, 15, is the average quality threshold. The window slides along the read from the 5' end; once the average quality of the four bases in the window drops below 15, that window and everything after it are removed.
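The sliding window rule described above can be sketched in a few lines of Python. This is a simplified illustration of the idea, not Trimmomatic's actual implementation: the function names are hypothetical, and the handling of reads shorter than one window (returned unchanged here) is an assumption of the sketch.

```python
def phred33_to_quals(qual_string):
    """Decode a FASTQ quality string that uses the Phred+33 offset."""
    return [ord(c) - 33 for c in qual_string]


def sliding_window_keep(quals, window=4, threshold=15):
    """Return how many bases to keep from the 5' end of a read.

    quals is a list of per-base Phred quality scores (integers).
    """
    for start in range(len(quals) - window + 1):
        avg = sum(quals[start:start + window]) / window
        if avg < threshold:
            # Cut at the start of the first window that fails.
            return start
    # No window failed, or the read is shorter than one window
    # (sketch assumption: such reads are left untouched).
    return len(quals)


# 'I' encodes Q40 and '+' encodes Q10 under Phred+33.
quals = phred33_to_quals("IIIII++++")
print(sliding_window_keep(quals))  # number of bases kept with SLIDINGWINDOW:4:15
```

In this example the first five bases are kept: the window that still contains one Q40 base averages 17.5, and only the window made entirely of Q10 bases falls below 15.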
3. Remove low-quality bases at the 5' end of reads
Specify the threshold through the LEADING parameter, written as follows:
LEADING:3
Bases are removed from the start of the read for as long as their quality is below 3.
4. Remove low-quality bases at the 3' end of reads
Specify the threshold through the TRAILING parameter, written as follows:
TRAILING:3
Bases are removed from the end of the read for as long as their quality is below 3.
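The LEADING and TRAILING steps can be pictured as trimming from each end until a base meets the threshold. The following is a minimal sketch of that behavior, not Trimmomatic's code; the function name `trim_leading_trailing` is hypothetical.

```python
def trim_leading_trailing(quals, leading=3, trailing=3):
    """Return (start, end) so that quals[start:end] is the kept region.

    quals is a list of per-base Phred quality scores (integers).
    """
    start, end = 0, len(quals)
    # LEADING: drop bases from the 5' end while they are below the threshold.
    while start < end and quals[start] < leading:
        start += 1
    # TRAILING: drop bases from the 3' end while they are below the threshold.
    while end > start and quals[end - 1] < trailing:
        end -= 1
    return start, end


quals = [2, 2, 30, 30, 2, 30, 2]
start, end = trim_leading_trailing(quals)
print(quals[start:end])  # the low-quality base in the middle is kept
```

Only the ends are affected: a low-quality base surrounded by good bases stays in the read, which is why these steps are usually combined with SLIDINGWINDOW.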
5. Remove a fixed number of bases from the start of each read
Specify the number of bases through the HEADCROP parameter, written as follows:
HEADCROP:5
This cuts five bases from the start of every read.
6. Cut reads to a specified length
Specify the length through the CROP parameter, written as follows:
CROP:120
Reads longer than 120 bp are truncated to 120 bp by removing bases from the 3' end; shorter reads are left unchanged.
7. Remove reads that are too short
The minimum length is specified by the MINLEN parameter, written as follows:
MINLEN:36
Reads shorter than 36 bp are discarded.
Apply only the steps you need. The steps run in the order in which they are given on the command line.
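Because the steps run in the order given, the same parameters can produce different results when rearranged. The sketch below illustrates this with toy versions of HEADCROP, CROP and MINLEN operating on plain strings; the helper names and the `run_steps` function are hypothetical and only mimic the behavior described above.

```python
def headcrop(read, n):
    """Remove n bases from the start of the read."""
    return read[n:]


def crop(read, n):
    """Keep at most the first n bases of the read."""
    return read[:n]


def minlen(read, n):
    """Return the read if it is at least n bases long, else None (dropped)."""
    return read if len(read) >= n else None


def run_steps(read, steps):
    """Apply (function, argument) steps in order; stop if the read is dropped."""
    for func, arg in steps:
        read = func(read, arg)
        if read is None:
            return None
    return read


read = "A" * 150
a = run_steps(read, [(headcrop, 5), (crop, 120)])
b = run_steps(read, [(crop, 120), (headcrop, 5)])
print(len(a), len(b))  # the two orderings give different lengths
```

Cropping the head first leaves a 120 bp read, while cropping to 120 bp first and then removing five bases leaves 115 bp. For the same reason, MINLEN is normally placed last so that it sees the final read length.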
For single-end sequencing data, the basic usage is as follows:
java -jar trimmomatic-0.38.jar SE -phred33 input.fq.gz output.fq.gz ILLUMINACLIP:TruSeq3-SE.fa:2:30:10 LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:36
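When Trimmomatic is called from a script, it can help to build the argument list programmatically so that each step stays a separate argument. The following is a minimal sketch for single-end mode; the function name `build_se_command` is hypothetical, and the example assumes that the jar and the adapter file TruSeq3-SE.fa are in the working directory.

```python
import shlex


def build_se_command(jar, input_fq, output_fq, steps, phred="-phred33"):
    """Assemble the argument list for a single-end Trimmomatic run."""
    return ["java", "-jar", jar, "SE", phred, input_fq, output_fq, *steps]


cmd = build_se_command(
    "trimmomatic-0.38.jar",
    "input.fq.gz",
    "output.fq.gz",
    [
        "ILLUMINACLIP:TruSeq3-SE.fa:2:30:10",  # assumed adapter file location
        "LEADING:3",
        "TRAILING:3",
        "SLIDINGWINDOW:4:15",
        "MINLEN:36",
    ],
)

# Print the command as a single shell-style line for inspection.
print(shlex.join(cmd))

# To actually run it (requires Java and the jar to be present):
# import subprocess
# subprocess.run(cmd, check=True)
```

Passing the list to `subprocess.run` avoids shell quoting problems, and the order of the `steps` list is the order in which Trimmomatic applies them.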
For paired-end sequencing data, the basic usage is as follows:
java -jar trimmomatic-0.38.jar PE -phred33 input_forward.fq.gz input_reverse.fq.gz output_forward_paired.fq.gz output_forward_unpaired.fq.gz output_reverse_paired.fq.gz output_reverse_unpaired.fq.gz ILLUMINACLIP:TruSeq3-PE.fa:2:30:10 LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:36
That covers how to use Trimmomatic to quality-filter NGS data.