2025-04-04 Update From: SLTechnology News&Howtos
Shulou(Shulou.com)06/02 Report--
This article explains how to use CNVkit for CNV analysis. The method introduced here is simple, fast, and practical.
CNVkit is a copy number variation (CNV) detection tool, suitable for whole-exome sequencing, targeted-capture sequencing, and similar data. The official documentation is at
https://cnvkit.readthedocs.io/en/stable/
CNVkit has been cited by many high-impact articles.
The CNVkit paper was published in PLOS Computational Biology, link below
https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1004873
Like EXCAVATOR, CNVkit divides the genome into in-target and off-target regions. The workflow is as follows.
The in-target and off-target regions are divided into small bins, and the sequencing depth in each bin is calculated. The raw depth is then corrected for factors such as GC content, the size and spacing of the target regions, and repetitive elements. Next, the log2 ratio relative to the control sample is computed, and the bins are grouped into segments by a segmentation algorithm. Several segmentation algorithms are supported, including cbs, haar, and flasso.
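The binning idea described above can be illustrated with a small shell sketch. It is only a conceptual illustration: the function name split_bins and the maximum bin size are hypothetical, and CNVkit's own subcommands use their own binning logic and options, which this sketch does not reproduce. It splits one interval into equally sized bins no larger than a given size and prints them as BED-style lines.

```shell
# Hypothetical helper, not part of CNVkit.
# usage: split_bins CHROM START END MAX_BIN_SIZE
split_bins() {
    awk -v c="$1" -v s="$2" -v e="$3" -v m="$4" 'BEGIN {
        len = e - s
        n = int((len + m - 1) / m)      # number of bins, rounded up
        for (i = 0; i < n; i++) {
            bs = s + int(len * i / n)           # bin start
            be = s + int(len * (i + 1) / n)     # bin end
            printf "%s\t%d\t%d\n", c, bs, be
        }
    }'
}

# Example: a 1000 bp region with bins of at most 400 bp gives three bins.
split_bins chr1 0 1000 400
```

The bins cover the whole interval without gaps or overlaps, which is the property the depth calculation relies on.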
CNVkit is developed in Python and is easy to use. It includes built-in visualization of the analysis results, can export results in multiple file formats, and integrates well with downstream software. It follows a modular design, with each function implemented as an independent module.
Each functional module corresponds to a subcommand. For convenience, the complete pipeline is wrapped in the batch subcommand, so the whole pipeline can be run with that single command. The analysis can be divided into the following parts
1. Input region files
For targeted sequencing, the target regions must be supplied as a BED file. The target subcommand processes this BED file and can add information such as gene annotations. The usage is as follows
cnvkit.py target \
my_baits.bed \
--annotate refFlat.txt \
-o my_targets.bed
In addition to the in-target regions, the off-target regions, also known as antitargets, are needed. Together, the in-target and off-target regions make up all sequencing-accessible regions of the genome. Next-generation sequencing cannot reach 100% coverage: highly repetitive regions, telomeres, centromeres, and similar regions of the genome cannot be covered. CNVkit therefore calculates the accessible regions of the genome with the access subcommand. The command is as follows
cnvkit.py access \
hg19.fa \
-x excludes.bed \
-o access.hg19.bed
After the accessible regions are calculated, the in-target regions are subtracted from them to obtain the off-target regions. This is done by the antitarget subcommand, as follows
cnvkit.py antitarget \
my_targets.bed \
-g access.hg19.bed \
-o my_antitargets.bed
2. Calculate the sequencing depth of the sample
Sequencing depth is calculated with the coverage subcommand. The autobin subcommand also estimates depth from BAM files, but its purpose is to suggest bin sizes. Take coverage as an example.
cnvkit.py coverage \
Sample.bam \
my_targets.bed \
-o Sample.targetcoverage.cnn
cnvkit.py coverage \
Sample.bam \
my_antitargets.bed \
-o Sample.antitargetcoverage.cnn
The sequencing depth of the target and antitarget regions is computed separately. The output files have the suffix cnn, a format defined by CNVkit specifically for storing sequencing depth information.
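As a quick sanity check on a coverage file, the mean depth across bins can be summarized with awk. This is a minimal sketch under stated assumptions: it assumes the .cnn file is tab-separated with a header row that contains a column named depth, so check the header of your own file first. The function name mean_depth is hypothetical and not part of CNVkit.

```shell
# Hypothetical helper, not part of CNVkit.
# Assumes a tab-separated file whose header row has a "depth" column.
# usage: mean_depth FILE.cnn
mean_depth() {
    awk -F'\t' '
        NR == 1 { for (i = 1; i <= NF; i++) if ($i == "depth") col = i; next }
        col     { sum += $col; n++ }            # accumulate depth per bin
        END     { if (n) printf "%.2f\n", sum / n }
    ' "$1"
}
```

For example, mean_depth Sample.targetcoverage.cnn prints one number. If the depth column is not found, nothing is printed.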
3. Build the sequencing depth model of a normal genome
The reference subcommand builds the sequencing depth distribution model of a normal genome, using the sequencing depth of control samples to correct systematic biases such as GC content. When there are multiple control samples, all of them can be combined into one reference, as follows
cnvkit.py reference \
*coverage.cnn \
-f hg19.fa \
-o Reference.cnn
When there is no control sample, the software can generate a flat reference that models a normal sequencing depth distribution, as follows
cnvkit.py reference \
-o FlatReference.cnn \
-f hg19.fa \
-t my_targets.bed \
-a my_antitargets.bed
4. Calculate the log2 ratio of the experimental sample to the normal control
The log2 ratio is calculated with the fix subcommand, as follows
cnvkit.py fix \
Sample.targetcoverage.cnn \
Sample.antitargetcoverage.cnn \
Reference.cnn \
-o Sample.cnr
The output suffix cnr denotes a format defined by CNVkit for storing log2 ratio information.
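The quantity behind this step can be illustrated with plain arithmetic. The sketch below only computes log2(sample depth / reference depth) for a single bin. It leaves out the bias corrections that the fix subcommand applies, and the function name log2_ratio is hypothetical.

```shell
# Hypothetical helper, not part of CNVkit.
# usage: log2_ratio SAMPLE_DEPTH REFERENCE_DEPTH
log2_ratio() {
    awk -v s="$1" -v r="$2" 'BEGIN {
        if (s <= 0 || r <= 0) { print "NA"; exit }   # log is undefined here
        printf "%.3f\n", log(s / r) / log(2)
    }'
}

log2_ratio 200 100   # depth doubled relative to the reference
log2_ratio 50 100    # depth halved relative to the reference
```

A value near 0 means the sample depth matches the reference, positive values suggest a gain, and negative values suggest a loss.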
5. Segmentation and copy number calling
Segmentation is performed by the segment subcommand, as follows
cnvkit.py segment \
Sample.cnr \
-o Sample.cns
The output suffix cns denotes a format defined by CNVkit, similar to the SEG format, for storing CNV analysis results. Next, the absolute copy number of each segment can be calculated with the call subcommand, as follows
cnvkit.py call \
Sample.cns \
-o Sample.call.cns
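To show how a log2 ratio relates to an absolute copy number, here is a simplified sketch. It assumes a pure sample with a diploid baseline and simply rounds ploidy * 2^log2 to the nearest integer. The call subcommand has its own calling methods and options, which this sketch does not attempt to reproduce, and the function name copy_number is hypothetical.

```shell
# Hypothetical helper, not part of CNVkit.
# Simplification: pure sample, copy number = round(ploidy * 2^log2).
# usage: copy_number LOG2_RATIO [PLOIDY]   (ploidy defaults to 2)
copy_number() {
    awk -v l="$1" -v p="${2:-2}" 'BEGIN {
        cn = p * exp(l * log(2))        # ploidy * 2^log2
        printf "%d\n", int(cn + 0.5)    # round to the nearest integer
    }'
}

copy_number 0      # neutral segment
copy_number -1     # single-copy loss
copy_number 0.585  # single-copy gain
```

In real tumor samples, purity and subclonality shift the observed log2 values toward 0, so rounding alone is rarely enough.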
For detailed explanations of the file formats, see the following link
https://cnvkit.readthedocs.io/en/stable/fileformats.html
6. Visualize the results
The following three visualization subcommands are provided
diagram
scatter
heatmap
The diagram subcommand shows the distribution of CNVs along the chromosomes of a single sample, as follows
cnvkit.py diagram \
-s Sample.cns \
Sample.cnr
The scatter subcommand plots the log2 ratio values across the chromosomal regions of a single sample, as follows
cnvkit.py scatter \
-s Sample.cns \
Sample.cnr
The heatmap subcommand displays the CNV distribution across multiple samples, as follows
cnvkit.py heatmap *.cns
Running the steps one by one is tedious. All of the above can be done with the single batch subcommand, as follows
cnvkit.py batch \
*Tumor.bam \
--normal *Normal.bam \
--targets my_baits.bed \
--annotate refFlat.txt \
--fasta hg19.fasta \
--access access.hg19.bed \
--output-reference my_reference.cnn \
--output-dir results/ \
--diagram --scatter
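If you prefer to keep the steps separate, for example to rerun only one of them, a small dry-run helper can print the per-sample commands from the earlier sections for review. This is a sketch only: the function name print_steps is hypothetical, the file names follow the examples used in this article, and the commands are printed instead of executed.

```shell
# Hypothetical helper, not part of CNVkit.
# Prints (does not run) the per-sample commands shown earlier in this article.
# usage: print_steps SAMPLE_NAME
print_steps() {
    s="$1"
    echo "cnvkit.py coverage ${s}.bam my_targets.bed -o ${s}.targetcoverage.cnn"
    echo "cnvkit.py coverage ${s}.bam my_antitargets.bed -o ${s}.antitargetcoverage.cnn"
    echo "cnvkit.py fix ${s}.targetcoverage.cnn ${s}.antitargetcoverage.cnn Reference.cnn -o ${s}.cnr"
    echo "cnvkit.py segment ${s}.cnr -o ${s}.cns"
    echo "cnvkit.py call ${s}.cns -o ${s}.call.cns"
}

print_steps Sample
```

Once the printed commands look right, they can be pasted into a terminal or piped to sh.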