
How to use EXCAVATOR2 to detect CNVs from WES data

From: SLTechnology News&Howtos (Shulou.com). Reported 06/02; updated 2025-04-18.

This article explains how to use EXCAVATOR2 to detect copy number variants (CNVs) from whole-exome sequencing (WES) data. It covers how the method works and each step needed to run it.

EXCAVATOR2 is a tool for detecting CNVs from WES data. Similar tools usually look only at the captured exonic regions. EXCAVATOR2 also uses the reads that fall outside those regions, so it works with two kinds of regions: exonic (in-target) and non-exonic (off-target). The two kinds are processed separately when the distribution of sequencing depth is corrected. The method is described in a paper published in Nucleic Acids Research:

https://academic.oup.com/nar/article/44/20/e154/2607979

The source code is hosted on SourceForge:

https://sourceforge.net/projects/excavator2tool/

When calculating sequencing depth, EXCAVATOR2 divides reads into two classes:

In-target reads

Off-target reads

In-target reads are those that map to the captured exons. Off-target reads map to intergenic or intronic regions. Like other tools, EXCAVATOR2 measures sequencing depth over windows, but with a slight change: the depth in each window is summarized as the window mean read count (WMRC).
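The toy awk sketch below illustrates the in-target/off-target split. It labels each read according to whether its position falls inside a capture interval, then counts both classes. It is not EXCAVATOR2's implementation. It treats each read as a single 0-based position, whereas real tools work from BAM alignments. The files targets.bed and reads.txt, and their contents, are hypothetical.

```shell
# Toy illustration only (not EXCAVATOR2 code).
# targets.bed: capture intervals, BED style (0-based, half-open).
# reads.txt:   one read per line as "chrom position" (hypothetical format).
printf 'chr1\t1000\t1200\nchr1\t5000\t5150\n' > targets.bed
printf 'chr1 1100\nchr1 3000\nchr1 5100\nchr2 1100\n' > reads.txt

classify_reads() {
  awk 'NR == FNR {              # first file: load the target intervals
         n++; tchr[n] = $1; tbeg[n] = $2 + 0; tend[n] = $3 + 0
         next
       }
       {                        # second file: label each read
         label = "off-target"
         for (i = 1; i <= n; i++)
           if ($1 == tchr[i] && $2 + 0 >= tbeg[i] && $2 + 0 < tend[i]) {
             label = "in-target"
             break
           }
         print $1, $2, label
         count[label]++
       }
       END {
         printf "in-target: %d, off-target: %d\n", count["in-target"], count["off-target"]
       }' "$1" "$2"
}

classify_reads targets.bed reads.txt
```

The linear scan over all targets keeps the sketch short. A real pipeline would use an indexed interval lookup instead.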

Each exon is used directly as a window, while non-exonic regions are divided into fixed-length windows. Depth is computed and corrected separately for the two kinds of windows, taking into account GC content, mappability, exon size and other factors.
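The fixed-length windowing of non-exonic regions can be sketched in awk. This is an illustration under our own assumptions, not the logic inside TargetPerla.pl. The input is assumed to already contain the off-target intervals in BED coordinates, and the shorter leftover window at the end of each interval is simply kept. The file name offtarget.bed and its contents are hypothetical.

```shell
# Illustrative sketch only, not EXCAVATOR2's own code.
# Split each BED interval (chrom, start, end; 0-based, half-open) into
# consecutive windows of at most WIN bp; the last window of an interval
# may be shorter than WIN.
WIN=50000

split_windows() {
  awk -v win="$WIN" 'BEGIN { OFS = "\t" }
    {
      for (s = $2 + 0; s < $3 + 0; s += win) {
        e = s + win
        if (e > $3 + 0) e = $3 + 0
        print $1, s, e
      }
    }' "$@"
}

# Two hypothetical off-target intervals on chr1.
printf 'chr1\t0\t120000\nchr1\t200000\t230000\n' > offtarget.bed
split_windows offtarget.bed
```

With WIN=50000, the first interval yields two full windows plus a 20 kb remainder. The second interval fits in a single 30 kb window.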

From the normalized depths, EXCAVATOR2 calculates the log2 ratio between each experimental sample and its control. It then segments these ratios with the HSLM algorithm. Finally, the FastCall algorithm assigns each segment to one of five copy-number states:

Two-copy deletion

One-copy deletion

Normal

One-copy duplication

Multiple-copy duplication
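As a worked illustration, consider an idealized case: a pure experimental sample, a diploid control and no noise. In the formulas below, the ratio uses the normalized WMRC of window i in the experimental (T) and control (C) samples, and n_i is the copy number in the experimental sample (this notation is ours). The expected log2 ratio for each of the five states is:

```latex
\[
\mathrm{LR}_i = \log_2 \frac{\mathrm{WMRC}^{T}_i}{\mathrm{WMRC}^{C}_i} \approx \log_2 \frac{n_i}{2}
\]
\[
\begin{aligned}
n_i = 0 &:\quad \log_2 \tfrac{0}{2} \to -\infty && \text{(two-copy deletion)}\\
n_i = 1 &:\quad \log_2 \tfrac{1}{2} = -1 && \text{(one-copy deletion)}\\
n_i = 2 &:\quad \log_2 1 = 0 && \text{(normal)}\\
n_i = 3 &:\quad \log_2 \tfrac{3}{2} \approx 0.585 && \text{(one-copy duplication)}\\
n_i = 4 &:\quad \log_2 2 = 1 && \text{(multiple-copy duplication, e.g.)}
\end{aligned}
\]
```

Observed values deviate from these because of noise and, in tumor samples, because of sample purity.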

EXCAVATOR2 supports both hg19 and hg38 and ships with the corresponding built-in data files.

The software consists of three modules, each implemented as a script. They are run in the following order.

1. TargetPerla.pl

Given a BED file of the capture regions, this script calculates the GC content and mappability of the in-target and off-target regions. These values are used in the later normalization step. Usage:

perl TargetPerla.pl \
    SourceTarget.txt \
    MyTarget.bed \
    MyTarget_w50000 \
    50000 \
    hg19

The first parameter, SourceTarget.txt, gives the paths of the genome's bigWig (.bw) mappability file and its FASTA file, separated by a space:

/data/ucsc.hg19.bw /data/hg19.fasta

The first column is the path of the bigWig file. This file ships with the software in its installation directory and is used to compute the mappability of each genomic region. The second column is the path of the FASTA file, which is used to compute GC content.
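The one-line SourceTarget.txt can be written with printf. The two paths below are placeholders taken from the example line; replace them with the real locations on your system. The existence check at the end is an optional addition of ours, not part of EXCAVATOR2.

```shell
# Mappability bigWig (bundled with EXCAVATOR2) and reference FASTA.
# Placeholder paths: adjust to your installation.
BW=/data/ucsc.hg19.bw
FASTA=/data/hg19.fasta

# One line, two space-separated columns: bigWig path, FASTA path.
printf '%s %s\n' "$BW" "$FASTA" > SourceTarget.txt
cat SourceTarget.txt

# Warn (without aborting) if either file is missing on this machine.
for f in "$BW" "$FASTA"; do
  [ -e "$f" ] || echo "warning: $f not found" >&2
done
```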

The second parameter is the BED file of the capture regions. The third is the name (prefix) of the output target, the fourth is the length of the fixed-length windows in bp, and the fifth specifies the genome assembly.

This step is similar to indexing the reference genome before alignment: it only needs to be run once for a given capture kit. A successful run creates a folder named after the output prefix, here MyTarget_w50000.

2. EXCAVATORDataPrepare.pl

This script calculates and normalizes the sequencing depth of each sample. Usage:

perl EXCAVATORDataPrepare.pl \
    ExperimentalFile.txt \
    --processors 6 \
    --target MyTarget_w50000 \
    --assembly hg19

The first parameter is a space-separated text file with one sample per line. Each line gives the sample's BAM file, the output directory and the sample name.

--processors sets the number of parallel threads, --target names the target created in step 1, and --assembly specifies the reference genome version.
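A sample file for EXCAVATORDataPrepare.pl can be written with a heredoc. The BAM paths, output directories and sample names are hypothetical. The field check at the end is our own addition, based on the three-column, space-separated layout described above.

```shell
# One sample per line: BAM file, output directory, sample name.
# All paths and names are hypothetical examples.
cat > ExperimentalFile.txt <<'EOF'
/data/bam/Control1.bam /data/OutEXCAVATOR2/Control1 Control1
/data/bam/Sample1.bam /data/OutEXCAVATOR2/Sample1 Sample1
/data/bam/Sample2.bam /data/OutEXCAVATOR2/Sample2 Sample2
EOF

# Sanity check: every non-empty line must have exactly three fields,
# which also means paths must not contain spaces.
check_three_fields() {
  awk 'NF && NF != 3 { printf "line %d has %d fields\n", NR, NF; bad = 1 }
       END { exit (bad ? 1 : 0) }' "$1"
}

check_three_fields ExperimentalFile.txt && echo "ExperimentalFile.txt looks OK"
```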

3. EXCAVATORDataAnalysis.pl

This script runs the HSLM segmentation and FastCall calling algorithms to perform the CNV analysis. Usage:

perl EXCAVATORDataAnalysis.pl \
    ExperimentalFileAnalysis.txt \
    --processors 6 \
    --target MyTarget_w50000 \
    --assembly hg19 \
    --output Results_MyProject_w50K \
    --mode pooling

The --mode parameter sets how samples are compared and accepts either pooling or paired. In pooling mode, all experimental samples are compared with the control sample(s). In paired mode, each experimental sample is compared with its own matched control, such as a tumor and its adjacent normal tissue, to calculate the log2 ratio.

The first parameter, ExperimentalFileAnalysis.txt, is a space-delimited text file that specifies which samples are compared. Its contents differ between pooling and paired mode. In this file, T stands for treated (experimental) samples and C for controls, and the number that follows each letter distinguishes individual samples.
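A minimal sketch of how these files might look, written with shell heredocs. Two points are assumptions: the column layout (label, DataPrepare output directory, sample name), and the rule that in paired mode T<n> is matched with C<n>. Check the README shipped with your EXCAVATOR2 version. All paths and names are hypothetical, and the pairing check is our own helper.

```shell
# Assumed layout: label, DataPrepare output directory, sample name.
# C<n> = control, T<n> = experimental (treated) sample.

# Pooling mode: each T sample is compared with the control sample(s).
cat > ExperimentalFileAnalysis_pooling.txt <<'EOF'
C1 /data/OutEXCAVATOR2/Control1 Control1
C2 /data/OutEXCAVATOR2/Control2 Control2
T1 /data/OutEXCAVATOR2/Sample1 Sample1
T2 /data/OutEXCAVATOR2/Sample2 Sample2
EOF

# Paired mode: assumed that T<n> is matched with the control C<n>.
cat > ExperimentalFileAnalysis_paired.txt <<'EOF'
C1 /data/OutEXCAVATOR2/Normal1 Normal1
C2 /data/OutEXCAVATOR2/Normal2 Normal2
T1 /data/OutEXCAVATOR2/Tumor1 Tumor1
T2 /data/OutEXCAVATOR2/Tumor2 Tumor2
EOF

# Report any T<n> without a matching C<n>; exit status 1 if one is found.
check_pairs() {
  awk '$1 ~ /^C[0-9]+$/ { c[substr($1, 2)] = 1 }
       $1 ~ /^T[0-9]+$/ { t[substr($1, 2)] = 1 }
       END {
         for (k in t)
           if (!(k in c)) { print "T" k " has no matching C" k; bad = 1 }
         exit (bad ? 1 : 0)
       }' "$1"
}

check_pairs ExperimentalFileAnalysis_paired.txt && echo "all T samples have a matching control"
```

In practice you pass a single file, named ExperimentalFileAnalysis.txt in the command above, whose layout matches the chosen --mode.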

--output specifies the output directory. It contains the CNV regions as text and VCF files, among others, together with plots of the results.
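The three steps can be chained in one script. This way the shared settings (assembly, window length, target name, thread count) are defined once, and the --target value always matches the prefix created in step 1. This is a hypothetical helper of ours, not part of EXCAVATOR2. By default (DRY_RUN=1) it only prints the commands. It assumes the EXCAVATOR2 scripts and the input files are in the current directory.

```shell
#!/bin/sh
# Hypothetical driver chaining the three EXCAVATOR2 steps.
# With DRY_RUN=1 (the default here) it only prints the commands.
set -eu

DRY_RUN=${DRY_RUN:-1}
ASSEMBLY=hg19
WIN=50000
TARGET=MyTarget_w${WIN}      # must match the prefix created in step 1
THREADS=6
OUTDIR=Results_MyProject_w50K
MODE=pooling                 # or: paired

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = 1 ]; then
    echo "$*"
  else
    "$@"
  fi
}

run perl TargetPerla.pl SourceTarget.txt MyTarget.bed "$TARGET" "$WIN" "$ASSEMBLY"
run perl EXCAVATORDataPrepare.pl ExperimentalFile.txt \
  --processors "$THREADS" --target "$TARGET" --assembly "$ASSEMBLY"
run perl EXCAVATORDataAnalysis.pl ExperimentalFileAnalysis.txt \
  --processors "$THREADS" --target "$TARGET" --assembly "$ASSEMBLY" \
  --output "$OUTDIR" --mode "$MODE"
```

Set DRY_RUN=0 to actually run the pipeline. Step 1 only needs to be run once per capture kit, so you can comment out that line on later runs.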
