
How to carry out CNV Analysis of whole Genome data

2026-02-13 Update From: SLTechnology News&Howtos


This article explains how to carry out copy number variation (CNV) analysis of whole-genome sequencing data and outlines the main detection strategies.

In addition to aCGH and SNP arrays, NGS data such as whole-genome and whole-exome sequencing can also be used to detect CNV. For whole-genome CNV detection there is also a dedicated sequencing strategy called CNV_seq: low-depth whole-genome sequencing, which needs only 5X sequencing depth to detect CNV effectively.

By underlying principle, the software for NGS-based CNV detection can be divided into the following four categories.

1. Read-Pair (RP)

RP is the earliest algorithm. It detects CNV from the insert-size distribution of paired-end sequencing and is also known as paired-end mapping (PEM). The insert-size distribution of paired-end sequencing is shown in the following figure.

An insert that is much longer or much shorter than expected indicates a structural variation in the genome. Two thresholds on the distribution in the figure above are used to flag such inserts, as shown below.

The above two figures are from Jan O. Korbel et al., Science 318, 420 (2007).

The insert length is calculated from the positions at which the two reads of a pair map to the reference. When the calculated insert length is less than cutoff I, the sample carries an insertion in the corresponding region relative to the reference: the extra bases are absent from the reference, so the two reads map closer together than expected. Conversely, when the calculated insert length is greater than cutoff D, the sample carries a deletion in the corresponding region relative to the reference.
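The cutoff rule can be sketched in a few lines of Python. This is only an illustration, not the procedure of any tool listed in this article: the function names are hypothetical, and deriving cutoff I and cutoff D as the library mean minus or plus three standard deviations is an assumption made for the example.

```python
from statistics import mean, stdev


def insert_size_cutoffs(insert_sizes, n_sd=3.0):
    """Return (cutoff_i, cutoff_d) as mean -/+ n_sd sample standard deviations.

    Assumption: the thresholds are derived from the library's own
    insert-size distribution; real tools may define them differently.
    """
    mu = mean(insert_sizes)
    sd = stdev(insert_sizes)
    return mu - n_sd * sd, mu + n_sd * sd


def classify_pair(insert_size, cutoff_i, cutoff_d):
    """Classify one read pair by the insert size calculated on the reference."""
    if insert_size < cutoff_i:
        return "insertion"   # reads map closer together than expected
    if insert_size > cutoff_d:
        return "deletion"    # reads map farther apart than expected
    return "concordant"


# Hypothetical library with inserts around 300 bp.
library = [290, 295, 300, 305, 310]
cutoff_i, cutoff_d = insert_size_cutoffs(library)
for size in (250, 300, 400):
    print(size, classify_pair(size, cutoff_i, cutoff_d))
```

In practice a single discordant pair is weak evidence; callers usually require several pairs supporting the same event before reporting it.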

Limited by read length, this method is suited to detecting medium-sized insertions and deletions and is insensitive to small insertions. It also depends on alignment accuracy, so it cannot analyze low-complexity regions or segmental duplications.

Some of the software using this strategy is listed below.

BreakDancer

PEMer

Ulysses

2. Split-read (SR)

The SR method identifies CNV from read pairs in which one read aligns to the reference and the other fails to align or aligns only partially. The read that does not align may span a CNV. Splitting that single read into parts that each align correctly to the reference genome locates the CNV breakpoint at the split point.

Because the signal comes from within a single read, detection is further limited by read length, so this method is only suitable for detecting small-scale insertions and deletions. Some of the software using this strategy is listed below.

Pindel

PRISM

SVseq2

Gustaf
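As a rough illustration of the split-read idea, the sketch below reads candidate breakpoints off the soft-clipped ends of an alignment, using the 1-based position and CIGAR string of a SAM record. It is a simplification and not how any of the tools above work internally: a real split-read caller would also realign the clipped part of the read. The function name is hypothetical.

```python
import re

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
REF_CONSUMING = set("MDN=X")  # CIGAR operations that advance along the reference


def soft_clip_breakpoints(pos, cigar):
    """Return candidate breakpoint positions (1-based, on the reference).

    pos is the 1-based leftmost aligned position, as in the SAM POS field.
    A leading soft clip puts the breakpoint at the first aligned base;
    a trailing soft clip puts it at the last aligned base.
    """
    ops = [(int(n), op) for n, op in CIGAR_OP.findall(cigar)]
    core = [(n, op) for n, op in ops if op != "H"]  # ignore hard clips
    if len(core) < 2:
        return []
    breakpoints = []
    if core[0][1] == "S":
        breakpoints.append(pos)
    if core[-1][1] == "S":
        ref_len = sum(n for n, op in core if op in REF_CONSUMING)
        breakpoints.append(pos + ref_len - 1)
    return breakpoints


print(soft_clip_breakpoints(1000, "60M40S"))  # clip at the right end
print(soft_clip_breakpoints(2000, "30S70M"))  # clip at the left end
print(soft_clip_breakpoints(3000, "100M"))    # fully aligned read
```

Breakpoints supported by many reads clipped at the same position are the ones worth following up.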

3. Read-Depth (RD)

The RD method relies on the correlation between copy number and the sequencing depth of the corresponding region. The basic model is that sequencing depth is relatively low in a deleted region and relatively high in a duplicated region. The algorithm uses a sliding window to compute the sequencing depth in each window, and then predicts CNV regions from the depth distribution across windows, as shown below.

The above figure is from Genome Res. 2011. 21: 974-984.

Similar to the log ratio value on arrays, the RD method determines the copy number of a region from its sequencing depth. In this kind of method, the size of the sliding window has a great influence on the result: when the window is very large, the signal of short CNVs is masked.
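The window-based model can be sketched as follows. This is a minimal illustration, not the algorithm of any particular tool: the function names are hypothetical, the baseline is assumed to be the median window depth (that is, most of the region is assumed to be at normal copy number), and a diploid genome is assumed.

```python
from math import log2
from statistics import median


def window_means(depth, window, step=None):
    """Mean depth of each window; step defaults to the window size (no overlap)."""
    step = step or window
    return [
        sum(depth[i:i + window]) / window
        for i in range(0, len(depth) - window + 1, step)
    ]


def call_copy_number(depth, window, ploidy=2, step=None):
    """Return (window_start, copy_number, log2_ratio) for each window.

    Assumption: the median window depth represents normal copy number.
    """
    step = step or window
    means = window_means(depth, window, step)
    baseline = median(means)
    if baseline <= 0:
        raise ValueError("baseline depth must be positive")
    calls = []
    for index, mean_depth in enumerate(means):
        ratio = mean_depth / baseline
        log_ratio = log2(ratio) if ratio > 0 else float("-inf")
        calls.append((index * step, round(ploidy * ratio), log_ratio))
    return calls


# Hypothetical per-base depth: a 10 bp one-copy loss and a 10 bp one-copy gain.
depth = [30] * 40 + [15] * 10 + [30] * 30 + [45] * 10 + [30] * 10
print([cn for _, cn, _ in call_copy_number(depth, window=10)])
print([cn for _, cn, _ in call_copy_number(depth, window=50)])
```

With 10 bp windows the loss and the gain each stand out as a separate window; with 50 bp windows both events are averaged into the surrounding depth and every window is called at copy number 2, which is the masking effect described above.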

Unlike RP and SR, which can only locate breakpoints, RD can genotype a CNV, that is, estimate its copy number, and it can detect large-scale CNV. It is currently the mainstream algorithm. Some of the software using this strategy is listed below.

CNVnator

ERDS

ReadDepth

CNVrd2

4. Assembly (AS)

The AS method assembles the short reads obtained by sequencing and compares the assembled contigs with the reference genome to identify regions of structural variation. Assembly accuracy depends on read length and on the assembly algorithm, and assembly consumes a lot of hardware resources, so it is not an ideal algorithm for CNV detection.

These four are the basic algorithmic concepts. Many programs combine several of them to detect CNV; for example, the lumpy software integrated in speedseq makes combined use of RP, SR, and RD.

Accurate alignment is the prerequisite for accurate results from any NGS-based detection strategy: both mapping accuracy and the genome coverage of second-generation sequencing affect CNV detection. In addition, when sequencing depth is calculated, the PCR amplification bias caused by differences in GC content needs to be corrected. Including control samples effectively reduces the interference of systematic errors and improves CNV detection.
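One common way to apply both corrections to window depths is sketched below. It is an illustration under stated assumptions rather than the method of any specific tool: windows are grouped into GC bins of a chosen width and each depth is rescaled by the ratio of the overall median to the median of its GC bin, and the control sample is assumed to have been processed with the same windows. The function names and the bin width are hypothetical.

```python
from collections import defaultdict
from statistics import median


def gc_correct(depths, gc_fractions, bin_width=0.05):
    """Rescale each window depth by overall_median / median of its GC bin."""
    overall = median(depths)
    by_bin = defaultdict(list)
    for d, gc in zip(depths, gc_fractions):
        by_bin[int(gc / bin_width)].append(d)
    bin_median = {b: median(values) for b, values in by_bin.items()}
    corrected = []
    for d, gc in zip(depths, gc_fractions):
        m = bin_median[int(gc / bin_width)]
        corrected.append(d * overall / m if m > 0 else 0.0)
    return corrected


def ratio_to_control(sample, control):
    """Per-window depth ratio of sample to control; None where control is zero."""
    return [s / c if c > 0 else None for s, c in zip(sample, control)]


# Hypothetical windows: the GC-rich windows are sequenced about twice as deep.
depths = [20, 22, 40, 44]
gc = [0.32, 0.33, 0.62, 0.63]
print(gc_correct(depths, gc))
print(ratio_to_control([30, 15], [30, 30]))
```

After correction, the GC-poor and GC-rich windows sit on the same scale, so the remaining differences in depth are more likely to reflect copy number than amplification bias.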

To sum up, each algorithm has its own advantages and disadvantages. Combining multiple strategies helps improve the accuracy and sensitivity of detection, and including control samples makes the analysis of copy number changes more effective.

