How to use CNVkit for CNV Analysis

2025-04-04 Update From: SLTechnology News&Howtos

This article explains how to use CNVkit for copy number variation (CNV) analysis. The method is simple, fast, and practical, and the steps below walk through the pipeline one subcommand at a time.

CNVkit is a CNV detection tool suitable for whole-exome sequencing, targeted capture sequencing, and similar data. The official documentation is at:

https://cnvkit.readthedocs.io/en/stable/

CNVkit has been cited by many articles in high-impact journals. The method itself was published in PLOS Computational Biology:

https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1004873

Like Excavator, CNVkit divides the genome into in-target and off-target regions. The workflow is as follows.

[Figure: CNVkit workflow flowchart]

Both the in-target and off-target regions are divided into small bins, and the sequencing depth in each bin is calculated. The raw depth is then corrected for factors such as GC content, the size and spacing of the target regions, and repetitive elements. Next, the log2 ratio relative to a control sample is calculated, and the bins are grouped into segments by a segmentation algorithm. Several segmentation algorithms are supported, including cbs, haar, and flasso.
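To make the binning step concrete, here is a minimal Python sketch that splits one region into near-equal bins no larger than a chosen size. The function name split_into_bins and the bin size of 300 are hypothetical choices for illustration; this is not CNVkit's own implementation, which also takes the other factors listed above into account.

```python
def split_into_bins(start, end, max_bin_size):
    """Split the half-open interval [start, end) into near-equal bins."""
    length = end - start
    if length <= 0:
        return []
    # Ceiling division: the fewest bins such that none exceeds max_bin_size.
    n_bins = -(-length // max_bin_size)
    bins = []
    for i in range(n_bins):
        # Integer arithmetic spreads the remainder evenly across the bins.
        bin_start = start + (length * i) // n_bins
        bin_end = start + (length * (i + 1)) // n_bins
        bins.append((bin_start, bin_end))
    return bins


# A 700 bp region split into bins of at most 300 bp gives three bins.
print(split_into_bins(1000, 1700, 300))
```

Splitting into near-equal bins, rather than cutting fixed-size bins and leaving a short remainder, keeps the depth estimates of neighboring bins comparable.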

CNVkit is written in Python and is easy to use. It has built-in visualization for displaying analysis results, can export results in multiple formats, and works well with downstream software. It follows a modular design in which each function is implemented as an independent module.

[Figure: CNVkit modules]

Each module corresponds to a subcommand. For convenience, the complete pipeline is also wrapped in the batch subcommand, so the whole pipeline can be run with a single command. The analysis can be divided into the following steps.

1. Prepare the input region files

For targeted sequencing, a BED file of the target regions is required. The target subcommand processes this BED file and can add information such as gene annotations. The usage is as follows:

cnvkit.py target \
    my_baits.bed \
    --annotate refFlat.txt \
    -o my_targets.bed

In addition to the in-target regions, we also need the off-target regions, also known as antitargets. The in-target and off-target regions together make up all the sequencing-accessible regions of the genome. Next-generation sequencing cannot reach 100% coverage: highly repetitive regions, telomeres, centromeres, and similar regions cannot be covered. CNVkit therefore computes the accessible regions of the genome with the access subcommand. The command is as follows:

cnvkit.py access \
    hg19.fa \
    -x excludes.bed \
    -o access.hg19.bed
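The relationship between accessible, in-target, and off-target regions can be pictured as interval subtraction on each chromosome. The following Python sketch is a simplified illustration under that assumption. The function subtract_intervals and the coordinates are hypothetical, and CNVkit's own antitarget step does more than this (for example, it also divides the result into bins).

```python
def subtract_intervals(accessible, targets):
    """Return the parts of `accessible` not covered by `targets`.

    Both inputs are lists of half-open (start, end) tuples on one chromosome.
    """
    result = []
    targets = sorted(targets)
    for acc_start, acc_end in sorted(accessible):
        cursor = acc_start
        for t_start, t_end in targets:
            if t_end <= cursor:
                continue  # target lies entirely before the current position
            if t_start >= acc_end:
                break  # remaining targets lie after this accessible region
            if t_start > cursor:
                result.append((cursor, t_start))  # gap before this target
            cursor = max(cursor, t_end)
        if cursor < acc_end:
            result.append((cursor, acc_end))  # tail after the last target
    return result


accessible = [(0, 1000), (2000, 3000)]
targets = [(100, 200), (900, 2100)]
print(subtract_intervals(accessible, targets))
```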

After computing the accessible regions, subtract the in-target regions to obtain the off-target regions. This is done by the antitarget subcommand. The command is as follows:

cnvkit.py antitarget \
    my_targets.bed \
    -g access.hg19.bed \
    -o my_antitargets.bed

2. Calculate the sequencing depth of the sample

The coverage subcommand calculates the sequencing depth in each bin. (The autobin subcommand also estimates depth, but it does so in order to choose suitable bin sizes.) Take coverage as an example:

cnvkit.py coverage \
    Sample.bam \
    my_targets.bed \
    -o Sample.targetcoverage.cnn

cnvkit.py coverage \
    Sample.bam \
    my_antitargets.bed \
    -o Sample.antitargetcoverage.cnn

These two commands compute the sequencing depth of the target and antitarget regions separately. The output files have the .cnn suffix, a format defined by CNVkit specifically for storing sequencing depth information.
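A .cnn file is a tab-delimited text table, so it can be inspected with a few lines of Python. The sketch below assumes the column names chromosome, start, end, gene, depth, and log2. Check them against the header of your own files and the file format documentation. The rows and gene names are made-up example data.

```python
import csv
import io

# Hypothetical excerpt of a target coverage table (assumed column layout).
EXAMPLE_CNN = (
    "chromosome\tstart\tend\tgene\tdepth\tlog2\n"
    "chr1\t1000\t1250\tGENE_A\t120.0\t6.9069\n"
    "chr1\t1250\t1500\tGENE_A\t80.0\t6.3219\n"
    "chr1\t1500\t1750\tGENE_B\t100.0\t6.6439\n"
)


def read_bins(handle):
    """Parse a tab-delimited coverage table into a list of dicts."""
    rows = []
    for row in csv.DictReader(handle, delimiter="\t"):
        rows.append({
            "chromosome": row["chromosome"],
            "start": int(row["start"]),
            "end": int(row["end"]),
            "gene": row["gene"],
            "depth": float(row["depth"]),
        })
    return rows


bins = read_bins(io.StringIO(EXAMPLE_CNN))
mean_depth = sum(b["depth"] for b in bins) / len(bins)
print(len(bins), mean_depth)
```

To read a real file, replace the io.StringIO(...) argument with an open file handle, for example open("Sample.targetcoverage.cnn", newline="").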

3. Build the sequencing depth model of the normal genome

The reference subcommand builds a model of the expected sequencing depth of a normal genome from the depth of the control samples, correcting for systematic biases such as GC content. When there are multiple control samples, they can all be pooled to build the reference. The usage is as follows:

cnvkit.py reference \
    *coverage.cnn \
    -f hg19.fa \
    -o Reference.cnn

When no control sample is available, CNVkit can generate a "flat" reference that stands in for the normal sequencing depth distribution. The usage is as follows:

cnvkit.py reference \
    -o FlatReference.cnn \
    -f hg19.fa \
    -t my_targets.bed \
    -a my_antitargets.bed

4. Calculate log2 ratio of experimental sample to normal control

Calculate the log2 ratio using the fix subcommand as follows

cnvkit.py fix \
    Sample.targetcoverage.cnn \
    Sample.antitargetcoverage.cnn \
    Reference.cnn \
    -o Sample.cnr

The output file has the .cnr suffix, a format defined by CNVkit for storing log2 ratio information.
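The core idea of the log2 ratio can be sketched in a few lines of Python. This is a simplified illustration, not what the fix subcommand actually runs: the bias corrections are omitted, and the depth floor and the median-centering step are assumptions made here for the example.

```python
import math
from statistics import median


def log2_ratios(sample_depths, reference_depths, floor=1e-3):
    """Per-bin log2(sample / reference), centered so the median bin is 0."""
    raw = [
        # The floor (an assumption) avoids taking the log of zero depth.
        math.log2(max(s, floor) / max(r, floor))
        for s, r in zip(sample_depths, reference_depths)
    ]
    center = median(raw)
    return [value - center for value in raw]


# Hypothetical depths: the third bin has half the expected depth
# and the fifth bin has twice the expected depth.
sample = [100, 100, 50, 100, 200]
reference = [100, 100, 100, 100, 100]
print(log2_ratios(sample, reference))
```

A bin with half the reference depth gets a log2 ratio of -1, and a bin with twice the reference depth gets +1.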

5. Segment and calculate copy number

Segmentation is performed by the segment subcommand. The usage is as follows:

cnvkit.py segment \
    Sample.cnr \
    -o Sample.cns

The output file has the .cns suffix, a format defined by CNVkit that is similar to the SEG format and stores the CNV segmentation results. Next, the call subcommand can be used to estimate the absolute copy number of each segment:

cnvkit.py call \
    Sample.cns \
    -o Sample.call.cns
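The link between a segment's log2 ratio and its absolute copy number can be illustrated with a simple rounding rule. The sketch below assumes a pure diploid sample and is not the call subcommand's implementation, which offers its own calling methods and options. The function name log2_to_copy_number is hypothetical.

```python
def log2_to_copy_number(log2_ratio, ploidy=2):
    """Estimate integer copy number by rounding ploidy * 2**log2_ratio."""
    return max(0, round(ploidy * 2 ** log2_ratio))


# Hypothetical segment log2 ratios: neutral, single-copy loss,
# single-copy gain, and a four-copy amplification.
for value in (0.0, -1.0, 0.58, 1.0):
    print(value, log2_to_copy_number(value))
```

In real tumor samples, normal-cell contamination pulls log2 ratios toward zero, so a plain rounding rule like this tends to underestimate gains and losses.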

For detailed explanations of the various file formats, see the following link:

https://cnvkit.readthedocs.io/en/stable/fileformats.html

6. Visualize the results

Three visualization subcommands are provided:

diagram

scatter

heatmap

The diagram subcommand shows the distribution of CNVs across the chromosomes of a single sample. It is used as follows:

cnvkit.py diagram \
    -s Sample.cns \
    Sample.cnr

[Figure: diagram output, CNVs drawn on the chromosomes of a single sample]

The scatter subcommand plots the log2 ratio values along the chromosomes, or along a chromosomal region, of a single sample. It is used as follows:

cnvkit.py scatter \
    -s Sample.cns \
    Sample.cnr

[Figure: scatter output, log2 ratios along the chromosomes of a single sample]

The heatmap subcommand displays CNV distributions for multiple samples, as follows

cnvkit.py heatmap *.cns

[Figure: heatmap output, CNV distributions across multiple samples]

Running each step separately is cumbersome. All of the steps above can be run with the single batch subcommand. The usage is as follows:

cnvkit.py batch \
    *Tumor.bam \
    --normal *Normal.bam \
    --targets my_baits.bed \
    --annotate refFlat.txt \
    --fasta hg19.fasta \
    --access access.hg19.bed \
    --output-reference my_reference.cnn \
    --output-dir results/ \
    --diagram --scatter
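When the batch command is launched from a Python script rather than a shell, the shell does not expand wildcards such as *Tumor.bam, so the file lists must be passed explicitly. The sketch below only assembles the argument list, using the same options as the command above. The function name build_batch_command and the sample file names are hypothetical.

```python
def build_batch_command(tumor_bams, normal_bams, targets, annotate, fasta,
                        access, output_reference, output_dir):
    """Assemble the argument list for `cnvkit.py batch` (not executed here)."""
    return [
        "cnvkit.py", "batch", *tumor_bams,
        "--normal", *normal_bams,
        "--targets", targets,
        "--annotate", annotate,
        "--fasta", fasta,
        "--access", access,
        "--output-reference", output_reference,
        "--output-dir", output_dir,
        "--diagram", "--scatter",
    ]


cmd = build_batch_command(
    tumor_bams=["S1_Tumor.bam", "S2_Tumor.bam"],  # hypothetical file names
    normal_bams=["S1_Normal.bam"],
    targets="my_baits.bed",
    annotate="refFlat.txt",
    fasta="hg19.fasta",
    access="access.hg19.bed",
    output_reference="my_reference.cnn",
    output_dir="results/",
)
print(" ".join(cmd))
```

On a machine where CNVkit is installed, the list can be passed to subprocess.run(cmd, check=True), and glob.glob("*Tumor.bam") can supply the file lists.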
