How to use conifer for CNV Analysis of WES

2025-04-09 Update From: SLTechnology News&Howtos

This article explains how to use conifer for CNV analysis of WES data, walking through each step from preparing the target region file to visualizing the calls.

Like xhmm, conifer is a tool that detects CNVs from WES data. The difference is that xhmm uses PCA for noise reduction, while conifer uses singular value decomposition (SVD). The corresponding article is available at the following link

https://genome.cshlp.org/content/early/2012/05/14/gr.138115.112.full.pdf

The official website is as follows

http://conifer.sourceforge.net/index.html

The processing steps of the software are as follows

First, reads are aligned to the reference genome to obtain the sequencing depth of each target region. The depth is not used directly: borrowing the quantification approach from RNA-seq, an RPKM value is calculated for each target region, which gives a matrix of RPKM values for every target region across all samples. The matrix is then standardized into z-scores, and the transformed values are called ZRPKM.
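The two transformations above can be sketched in a few lines of Python. This is a minimal illustration, not conifer's own code: the function names are hypothetical, and centring each target on its median across samples (rather than the mean) is an assumption about how the z-score is defined.

```python
from statistics import median, pstdev


def rpkm(read_count, target_length_bp, total_mapped_reads):
    """Reads per kilobase of target per million mapped reads."""
    return read_count * 1e9 / (target_length_bp * total_mapped_reads)


def zrpkm_matrix(rpkm_rows):
    """Standardize each target across samples.

    rpkm_rows[i][j] is the RPKM of target i in sample j.
    Assumption: centre on the per-target median and scale by the
    per-target standard deviation.
    """
    result = []
    for row in rpkm_rows:
        center = median(row)
        spread = pstdev(row)
        if spread == 0:
            # A target with identical coverage in every sample carries no signal.
            result.append([0.0] * len(row))
        else:
            result.append([(value - center) / spread for value in row])
    return result
```

For example, a 1,000 bp target with 100 reads in a sample of 1,000,000 mapped reads has an RPKM of 100.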

The ZRPKM matrix is decomposed by SVD, and the components with the largest singular values are treated as systematic noise. After those components are removed, the matrix is reconstructed as the SVD-ZRPKM matrix, and CNV regions are then predicted with a threshold-based calling algorithm.
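To make the noise-removal idea concrete, here is a dependency-free sketch that strips the strongest singular components from a small matrix. It estimates each component by power iteration and subtracts it (deflation), which is equivalent to zeroing the largest singular values when they are well separated. The function names are hypothetical, and in practice a library routine such as numpy.linalg.svd would be the usual choice; this is not how conifer itself is implemented.

```python
import math
import random


def top_singular_triplet(a, iterations=200, seed=0):
    """Estimate the largest singular value and its vectors by power iteration."""
    n_rows, n_cols = len(a), len(a[0])
    rng = random.Random(seed)
    v = [rng.random() + 0.1 for _ in range(n_cols)]  # arbitrary non-zero start
    for _ in range(iterations):
        u = [sum(a[i][j] * v[j] for j in range(n_cols)) for i in range(n_rows)]
        w = [sum(a[i][j] * u[i] for i in range(n_rows)) for j in range(n_cols)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            return 0.0, [0.0] * n_rows, [0.0] * n_cols
        v = [x / norm for x in w]
    u = [sum(a[i][j] * v[j] for j in range(n_cols)) for i in range(n_rows)]
    sigma = math.sqrt(sum(x * x for x in u))
    if sigma == 0.0:
        return 0.0, [0.0] * n_rows, v
    return sigma, [x / sigma for x in u], v


def remove_top_components(matrix, n_remove):
    """Return a copy of matrix with its n_remove strongest components subtracted."""
    residual = [list(row) for row in matrix]
    for _ in range(n_remove):
        sigma, u, v = top_singular_triplet(residual)
        for i, row in enumerate(residual):
            for j in range(len(row)):
                row[j] -= sigma * u[i] * v[j]
    return residual
```

Applied to a matrix that consists of a single shared pattern (rank 1), removing one component leaves values that are essentially zero, which is the behaviour expected for pure batch noise.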

In the accompanying figure, the green lines indicate the thresholds. Values greater than 1.5 are considered duplications, and values less than -1.5 are considered deletions. The specific steps are as follows

1. Create a target region file

First, create a file that describes the target regions of the capture kit.

Each row represents one captured target region. The first three columns give the chromosomal position of the region, and the fourth column gives the corresponding gene name, left empty if there is none. Different capture kits have different target regions, so you can write your own script to convert them into this format. The file is called probes.txt in the rest of this article.
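As one possible version of such a script, the sketch below converts BED-style lines from a capture kit into the four-column layout described above. The function name is hypothetical, and the header line chr/start/stop/name is an assumption that should be checked against the conifer documentation; pass header=None to omit it. Coordinates are copied unchanged.

```python
def bed_to_probes(bed_lines, header="chr\tstart\tstop\tname"):
    """Convert BED-style lines into tab-separated probe rows.

    Assumption: the probes file starts with the given header line.
    """
    rows = [header] if header else []
    for line in bed_lines:
        line = line.rstrip("\n")
        # Skip blank lines, comments and BED track/browser lines.
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split("\t")
        chrom, start, stop = fields[0], fields[1], fields[2]
        # The gene name is left empty when the BED line has no fourth column.
        name = fields[3] if len(fields) > 3 else ""
        rows.append("\t".join([chrom, start, stop, name]))
    return rows
```

The returned rows can be written to probes.txt with "\n".join(rows).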

2. Calculate the RPKM values of the target regions

For each sample, the RPKM values of the target regions are calculated as follows

python conifer.py rpkm \
    --probes probes.txt \
    --input sample1.bam \
    --output RPKM/sample1.rpkm.txt

The input file is the BAM file produced by alignment. The output files of all samples, including biological replicates, should be saved in the same directory, because the next step reads the whole directory.
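Since the command has to be run once per sample, a small wrapper can help. The following is a sketch under a few assumptions: conifer.py is in the working directory, python is on the PATH, and each output file is named after its BAM file. The helper names are hypothetical; only the flags shown in the command above are used.

```python
import subprocess
from pathlib import Path


def build_rpkm_command(bam_path, probes="probes.txt", out_dir="RPKM",
                       conifer="conifer.py"):
    """Build the argument list for one 'conifer.py rpkm' run."""
    sample = Path(bam_path).stem  # 'bams/sample1.bam' -> 'sample1'
    output = Path(out_dir) / f"{sample}.rpkm.txt"
    return ["python", conifer, "rpkm",
            "--probes", probes,
            "--input", str(bam_path),
            "--output", str(output)]


def run_all(bam_paths, out_dir="RPKM", **kwargs):
    """Run the rpkm step for every BAM file, stopping on the first failure."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    for bam in bam_paths:
        subprocess.run(build_rpkm_command(bam, out_dir=out_dir, **kwargs),
                       check=True)
```

Calling run_all(["sample1.bam", "sample2.bam"]) would then write RPKM/sample1.rpkm.txt and RPKM/sample2.rpkm.txt.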

3. Calculate the SVD-ZRPKM matrix

Read the RPKM values of all samples, perform the SVD, and construct the SVD-ZRPKM matrix as follows

python conifer.py analyze \
    --probes probes.txt \
    --rpkm_dir ./RPKM/ \
    --output analysis.hdf5 \
    --svd 6 \
    --write_svals singular_values.txt \
    --plot_scree screeplot.png \
    --write_sd sd_values.txt

The output is a file in HDF5 format. The --svd parameter specifies how many of the components with the largest singular values are removed. It needs to be tuned for each dataset, for example by inspecting the scree plot.
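When tuning --svd, it can help to have a rough starting point before looking at the scree plot. The sketch below applies a generic elbow heuristic: it picks the point on the sorted singular value curve that lies farthest below the straight line joining the first and last values. This heuristic and the function name are illustrative assumptions, not part of conifer, and the result should always be checked against the scree plot. The function takes a plain list of numbers, since the layout of singular_values.txt is not described here.

```python
def suggest_svd_cutoff(singular_values):
    """Suggest how many leading components to remove, using an elbow heuristic."""
    s = sorted(singular_values, reverse=True)
    n = len(s)
    if n < 3 or s[0] == s[-1]:
        return 0  # too few points, or a flat curve with no elbow
    best_index, best_gap = 0, 0.0
    for i, value in enumerate(s):
        x = i / (n - 1)                           # position, scaled to [0, 1]
        y = (value - s[-1]) / (s[0] - s[-1])      # height, scaled to [0, 1]
        gap = 1.0 - x - y                         # distance below the chord
        if gap > best_gap:
            best_index, best_gap = i, gap
    # Components before the elbow are the candidates for removal.
    return best_index
```

For the values 50, 30, 20, 5, 4.5, 4, 3.8, 3.5 the elbow falls on the fourth value, so the suggestion is to remove the first three components.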

4. CNV calling

CNV calling is run as follows

python conifer.py call \
    --input analysis.hdf5 \
    --threshold 1.5 \
    --output calls.txt

The contents of the output file are shown below

5. Visualization

Visualize the CNV regions of interest as follows

python conifer.py plot \
    --input analysis.hdf5 \
    --region chr1:878657-889417 \
    --output image.png \
    --sample sampleA.rpkm

The visualization results are as follows

Conifer is easy to use and is suitable for detecting CNVs larger than 1 kb. The software requires a CNV to span at least 3 exons, so very short CNVs cannot be detected.
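The interplay between the threshold and the 3-exon requirement can be illustrated with a simplified caller. This sketch assumes a list of SVD-ZRPKM values for consecutive targets in one sample, treats values above +threshold as duplications and values below -threshold as deletions, and keeps only runs of at least min_targets targets. The function name is hypothetical, and conifer's real algorithm includes further steps that are omitted here.

```python
def call_segments(values, threshold=1.5, min_targets=3):
    """Return (first_index, last_index, state) for each qualifying run."""

    def state_of(value):
        if value > threshold:
            return "dup"
        if value < -threshold:
            return "del"
        return None

    # The trailing None is a sentinel that closes a run ending at the last target.
    states = [state_of(v) for v in values] + [None]
    calls = []
    run_start, run_state = 0, None
    for i, state in enumerate(states):
        if state != run_state:
            if run_state is not None and i - run_start >= min_targets:
                calls.append((run_start, i - 1, run_state))
            run_start, run_state = i, state
    return calls
```

A run of only two targets beyond the threshold is discarded, which is why very short CNVs go undetected.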
