How to use trim_galore for quality filtering of NGS data

2025-01-19 Update From: SLTechnology News&Howtos


Shulou(Shulou.com)06/02 Report--

Many people who are new to NGS analysis are unsure how to use trim_galore for quality filtering. This article summarizes how the tool processes data and how to install and run it.

Cutadapt trims adapters and low-quality bases from NGS reads, and FastQC reports the quality distribution of NGS data. trim_galore wraps the two tools together to make them more convenient to use.

The software processes the data in the following four steps:

1. Remove low-quality bases at the 3' end of reads

Sequencing data from the Illumina platform is usually of poorer quality at the 3' end. trim_galore first trims the low-quality bases from the 3' end, essentially invoking cutadapt's quality-trimming algorithm.

Comparing the per-base quality distribution before and after filtering shows that removing the low-quality bases significantly improves the overall quality of the sequences.
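The quality-trimming step can be sketched in a few lines of Python. This is a minimal illustration of the partial-sum approach described in the cutadapt documentation, not cutadapt's actual code. The function names `quality_trim_index` and `phred33_to_scores`, the example read, and its quality values are assumptions made for illustration.

```python
def quality_trim_index(qualities, cutoff=20):
    """Return the position at which to cut a read to drop its low-quality 3' end.

    qualities: list of Phred scores ordered from the 5' end to the 3' end.
    """
    running_sum = 0
    lowest_sum = 0
    cut_at = len(qualities)  # default: keep the whole read
    # Walk from the 3' end towards the 5' end, summing (quality - cutoff).
    for i in reversed(range(len(qualities))):
        running_sum += qualities[i] - cutoff
        if running_sum > 0:
            break  # quality has recovered; stop scanning
        if running_sum < lowest_sum:
            lowest_sum = running_sum
            cut_at = i  # the sum is minimal here, so cut at this index
    return cut_at


def phred33_to_scores(quality_string):
    """Decode a FASTQ quality string that uses the Phred+33 encoding."""
    return [ord(ch) - 33 for ch in quality_string]


# Hypothetical 10-base read whose quality drops towards the 3' end.
seq = "ACGTACGTAC"
scores = [42, 40, 26, 27, 8, 7, 11, 4, 2, 3]
cut = quality_trim_index(scores, cutoff=10)
print(seq[:cut])
```

With a cutoff of 10, the sketch keeps the first four bases of this read. Note that the isolated quality of 11 near the 3' end does not rescue the bases around it, because the cut is placed where the running sum is lowest.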

2. Remove adapter sequences

After trimming the low-quality bases, trim_galore calls cutadapt to look for the adapter sequence at the 3' end of the reads and remove it. Normally the adapter sequence should be specified explicitly. If no adapter is specified, trim_galore automatically checks for the following three adapter types:

Illumina: AGATCGGAAGAGC
Small RNA: TGGAATTCTCGG
Nextera: CTGTCTCTTATA

By default, the first 1 million sequences are read and used to determine which of the three adapters is present, and that adapter is then removed. If you do not want the software to decide automatically, you can select the adapter type with the --illumina, --nextera, or --small_rna parameter.
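The auto-detection idea can be illustrated with a short Python sketch. It simply counts exact occurrences of each of the three adapter sequences in the first reads and reports the most frequent one. The function name `detect_adapter`, the example reads, and the tie-breaking behaviour (the first adapter listed wins) are assumptions of this sketch, not a description of trim_galore's internals.

```python
from collections import Counter
from itertools import islice

# Adapter sequences as listed in the article.
ADAPTERS = {
    "illumina": "AGATCGGAAGAGC",
    "small_rna": "TGGAATTCTCGG",
    "nextera": "CTGTCTCTTATA",
}


def detect_adapter(sequences, max_reads=1_000_000):
    """Count exact adapter matches in the first max_reads sequences.

    Returns the name of the most frequent adapter and the per-adapter counts.
    """
    counts = Counter({name: 0 for name in ADAPTERS})
    for seq in islice(sequences, max_reads):
        for name, adapter in ADAPTERS.items():
            if adapter in seq:
                counts[name] += 1
    # On a tie, max() returns the first adapter in dictionary order.
    best = max(counts, key=counts.get)
    return best, dict(counts)


# Hypothetical reads: two contain the Nextera adapter, one the Illumina adapter.
reads = [
    "ACGTCTGTCTCTTATACACA",
    "TTTTCTGTCTCTTATA",
    "GGGGAGATCGGAAGAGCAAA",
    "ACGTACGT",
]
best, counts = detect_adapter(reads)
print(best, counts)
```

Exact substring matching is used here only to keep the example short; real adapter trimming tolerates sequencing errors and partial adapter matches at the read end.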

3. Remove sequences that are too short

After the two steps above, some of the remaining sequences may be very short, and these short sequences are also removed. By default, a sequence shorter than 20 bp is discarded.
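As a minimal sketch of the length filter, the following Python generator keeps only reads that are at least 20 bp long after trimming. The function name `filter_short_reads` and the `(name, sequence)` record layout are assumptions chosen for illustration.

```python
def filter_short_reads(reads, min_length=20):
    """Yield only the (name, sequence) records with at least min_length bases."""
    for name, seq in reads:
        if len(seq) >= min_length:
            yield name, seq


# Hypothetical trimmed reads of length 19, 20 and 25.
trimmed_reads = [
    ("read1", "A" * 19),
    ("read2", "C" * 20),
    ("read3", "G" * 25),
]
kept = list(filter_short_reads(trimmed_reads))
print([name for name, _ in kept])
```

Only the 19 bp read is dropped, since the default threshold discards sequences shorter than 20 bp.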

4. Other filtering

The three steps above are always performed on all input sequences. In addition, trim_galore supports a number of other filtering options for more specific needs.

The --hardtrim5 parameter truncates each sequence to a fixed length by keeping the given number of bases at the 5' end and removing the rest from the 3' end, as shown below:

Before:         CCTAAGGAAACAAGTACACTCCACACATGCATA
--hardtrim5 20: CCTAAGGAAACAAGTACACT

Correspondingly, the --hardtrim3 parameter keeps the given number of bases at the 3' end and removes the rest from the 5' end, as shown below:

Before:         CAAATGTTATTTTTAAGAAAATGGAAAAT
--hardtrim3 20: TTTTTAAGAAAATGGAAAAT
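The effect of the two hard-trimming options on a single sequence can be reproduced with plain string slicing. The sketch below is only an illustration of the examples above; the Python function names `hardtrim5` and `hardtrim3` mirror the parameter names but are otherwise hypothetical helpers.

```python
def hardtrim5(seq, n):
    """Keep the first n bases (5' end); bases beyond that are removed from the 3' end."""
    return seq[:n]


def hardtrim3(seq, n):
    """Keep the last n bases (3' end); bases before that are removed from the 5' end."""
    return seq[-n:] if n > 0 else ""


# The two example sequences from the article.
print(hardtrim5("CCTAAGGAAACAAGTACACTCCACACATGCATA", 20))
print(hardtrim3("CAAATGTTATTTTTAAGAAAATGGAAAAT", 20))
```

Both calls return 20-base sequences that match the "after" lines of the examples.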

Installation is straightforward. First, make sure that cutadapt and fastqc are installed and that their executables are in a directory listed in the PATH environment variable. Then download the trim_galore source package and extract it:

wget https://github.com/FelixKrueger/TrimGalore/archive/0.5.0.tar.gz
tar xzvf 0.5.0.tar.gz

The extracted directory contains an executable file called trim_galore.
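Before running trim_galore, it can be useful to confirm that its two dependencies are actually reachable through PATH. The following Python sketch uses the standard library's `shutil.which` for that check; the helper name `find_tools` is an assumption made for illustration.

```python
import shutil


def find_tools(names=("cutadapt", "fastqc")):
    """Map each tool name to its full path on PATH, or None if it is not found."""
    return {name: shutil.which(name) for name in names}


tools = find_tools()
missing = [name for name, path in tools.items() if path is None]
if missing:
    print("Not found on PATH:", ", ".join(missing))
else:
    print("All required tools were found:", tools)
```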

For single-end sequencing data, the basic usage is as follows:

trim_galore --quality 20 -a AGATCGGAAGAGC --length 20 -o out_dir input.fq

For paired-end sequencing data, the basic usage is as follows:

trim_galore --paired --quality 20 -a AGATCGGAAGAGC -a2 AGATCGGAAGAGC --length 20 -o out_dir R1.fq.gz R2.fq.gz
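When trim_galore is called from a script, the command line can be assembled programmatically. The sketch below builds the argument list for the single-end and paired-end cases using only the options shown in this article. The function name `build_trim_galore_command` and its parameters are hypothetical, and the sketch only prints the commands rather than running them.

```python
import shlex


def build_trim_galore_command(inputs, out_dir, quality=20, length=20,
                              adapter=None, adapter2=None):
    """Build a trim_galore argument list for one (single-end) or two (paired-end) files."""
    if len(inputs) not in (1, 2):
        raise ValueError("expected one input file (single-end) or two (paired-end)")
    paired = len(inputs) == 2
    cmd = ["trim_galore"]
    if paired:
        cmd.append("--paired")
    cmd += ["--quality", str(quality)]
    if adapter:
        cmd += ["-a", adapter]
    if adapter2 and paired:
        cmd += ["-a2", adapter2]  # adapter for read 2, paired-end only
    cmd += ["--length", str(length), "-o", out_dir, *inputs]
    return cmd


single = build_trim_galore_command(["input.fq"], "out_dir", adapter="AGATCGGAAGAGC")
paired = build_trim_galore_command(
    ["R1.fq.gz", "R2.fq.gz"], "out_dir",
    adapter="AGATCGGAAGAGC", adapter2="AGATCGGAAGAGC",
)
print(shlex.join(single))
print(shlex.join(paired))
```

To actually run one of the commands, the list can be passed to `subprocess.run(cmd, check=True)`, assuming trim_galore and its dependencies are installed and on PATH.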
