Shulou (Shulou.com) 06/02 report. Updated 2025-01-16. From: SLTechnology News&Howtos
This article explains how to use LUMPY for CNV detection. The method is simple, fast and practical.
There are four classic strategies for detecting CNVs from whole-genome sequencing data:
Read-pair
Split-read
Read-depth
Assembly
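To make the read-depth idea concrete: in a diploid genome, a window's copy number can be roughly estimated as 2 × (window depth / genome-wide median depth). The bash/awk sketch below applies that rule to a small, made-up table of window depths. The input format (chrom, start, end, mean depth), the function name `call_depth_cn`, the 30x median and the loss/gain cut-offs (CN below 1.5 or above 2.5) are all illustrative assumptions, not LUMPY's own read-depth model.

```shell
#!/usr/bin/env bash
# Toy read-depth caller (hypothetical helper, not part of LUMPY).
# Copy number per window: CN = 2 * depth / median_depth (diploid assumption).
# Input columns (assumed format): chrom start end mean_depth
call_depth_cn() {
  local median=$1
  awk -v med="$median" '{
    cn = 2 * $4 / med
    state = "normal"
    if (cn < 1.5) state = "loss"        # e.g. heterozygous deletion (CN ~ 1)
    else if (cn > 2.5) state = "gain"   # e.g. duplication (CN ~ 3)
    printf "%s\t%s\t%s\t%.1f\t%s\n", $1, $2, $3, cn, state
  }'
}

# Made-up windows, genome-wide median depth assumed to be 30x
printf 'chr1\t0\t1000\t31\nchr1\t1000\t2000\t14\nchr1\t2000\t3000\t46\n' \
  | call_depth_cn 30
```

Real read-depth callers also correct for GC content and mappability and segment adjacent windows; this sketch skips all of that to show only the core ratio.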
Each strategy has its own strengths and weaknesses, and combining several of them improves detection sensitivity. LUMPY is one such tool: it integrates read-pair, split-read, read-depth and other signals to predict CNVs. The paper is available at:
https://genomebiology.biomedcentral.com/articles/10.1186/gb-2014-15-6-r84
The source code is hosted on GitHub:
https://github.com/arq5x/lumpy-sv
The analysis pipeline is as follows:
As shown in figure A, for a single sample, four signals (read-pair, split-read, read-depth and known CNV loci) are integrated to predict CNVs; as shown in figure B, for multiple samples, the signals from all samples are combined to predict CNVs.
LUMPY's framework is flexible and extensible: it can take the output of other tools as input, for example using CNVnator calls as the known-CNV signal. The paper compares LUMPY with other tools, with the following results:
At each sequencing depth tested, LUMPY was more sensitive than the other tools and had the lowest false-positive rate.
The steps for CNV detection with LUMPY are as follows:
1. Mapping
The bwa mem algorithm is recommended for aligning the paired-end reads to the reference genome. To save time, duplicates are handled by samblaster, which processes the alignment stream as it comes out of bwa (with --excludeDups, reads marked as duplicates are dropped from the output). Usage:
bwa mem \
    -R "@RG\tID:id\tSM:sample\tLB:lib" \
    hg19_bwa_index \
    sample_R1.fastq.gz sample_R2.fastq.gz \
  | samblaster --excludeDups \
      --addMateTags \
      --maxSplitCount 2 \
      --minNonOverlap 20 \
  | samtools view -Sb - \
  > sample.bam
To save disk space, the output is written directly as a BAM file.
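A quick sanity check after mapping is to confirm that the read group passed with -R made it into the BAM header, which `samtools view -H sample.bam` prints. The bash/awk sketch below reads SAM header text on stdin and prints the ID and SM values of every @RG line; the function name `list_read_groups` is hypothetical, and the demo header is hand-written.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list ID and SM for every @RG line in SAM header text.
# Typical use: samtools view -H sample.bam | list_read_groups
list_read_groups() {
  awk -F '\t' '$1 == "@RG" {
    id = ""; sm = ""
    for (i = 2; i <= NF; i++) {
      if ($i ~ /^ID:/) id = substr($i, 4)   # strip the "ID:" prefix
      if ($i ~ /^SM:/) sm = substr($i, 4)   # strip the "SM:" prefix
    }
    print "ID=" id, "SM=" sm
  }'
}

# Demo on a hand-written header; in the real BAM, bwa mem has already
# turned the "\t" escapes from -R into tab characters.
printf '@HD\tVN:1.6\tSO:unsorted\n@RG\tID:id\tSM:sample\tLB:lib\n' | list_read_groups
```

An empty result means no @RG line was written, which usually points to a missing or mistyped -R argument.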
2. Extract discordant paired-end alignments
Discordant reads are read pairs whose R1 and R2 alignments are farther apart than the expected insert size, or whose alignment orientation does not match what the library should produce. For more information, see:
https://www.biostars.org/p/278412/
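The insert-size part of this definition can be sketched as a simple rule: a pair is discordant if the mate maps to another chromosome, or if |TLEN| exceeds mean + 3 × SD of the library's insert sizes. The bash/awk function below (hypothetical name `classify_pairs`) applies that rule to SAM records. The 3 × SD cut-off and the example numbers are assumptions for illustration, and orientation checks are deliberately omitted; the samtools command in this step instead relies on the aligner's own proper-pair flag.

```shell
#!/usr/bin/env bash
# Simplified discordance test on SAM records (read orientation is ignored).
# Discordant if the mate is on another chromosome (RNEXT, column 7, is not "=")
# or if |TLEN| (column 9) > mean + k * sd. Any RNEXT other than "=" counts.
classify_pairs() {
  local mean=$1 sd=$2 k=${3:-3}
  awk -F '\t' -v mean="$mean" -v sd="$sd" -v k="$k" '
    /^@/ { next }                         # skip header lines
    {
      tlen = ($9 < 0) ? -$9 : $9          # absolute template length
      if ($7 != "=" || tlen > mean + k * sd) status = "discordant"
      else status = "concordant"
      print $1, status
    }'
}

# Made-up library: mean insert 400 bp, SD 50 bp, so the cut-off is 550 bp
{
  printf 'pairA\t99\tchr1\t1000\t60\t100M\t=\t1300\t400\n'
  printf 'pairB\t97\tchr1\t5000\t60\t100M\t=\t7000\t2100\n'
  printf 'pairC\t65\tchr1\t9000\t60\t100M\tchr5\t200\t0\n'
} | classify_pairs 400 50
```

Here pairA (400 bp) is concordant, pairB (2100 bp) exceeds the cut-off, and pairC has its mate on chr5.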
Discordant alignments can result from structural variation in the genome, so many structural-variant callers analyze these reads. They can be extracted as follows:
samtools view -b -F 1294 \
    sample.bam \
    > sample.discordants.unsorted.bam
The result is a subset of the original BAM file that keeps only the discordant alignments.
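The value 1294 is a bit mask. Decoding it shows what `-F 1294` filters out: reads that are properly paired (0x2), unmapped (0x4), whose mate is unmapped (0x8), secondary alignments (0x100) and duplicates (0x400). What remains are mapped, primary, non-duplicate reads whose pair the aligner did not mark as proper, i.e. the discordant candidates. `samtools flags 1294` gives the same breakdown; the bash function below (hypothetical name `decode_sam_flag`) does it without samtools, using the bit names that samtools uses.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print each set bit of a SAM FLAG as "hex<TAB>name".
decode_sam_flag() {
  local flag=$1 bit
  # Bit i of the FLAG field corresponds to names[i]
  local -a names=(PAIRED PROPER_PAIR UNMAP MUNMAP REVERSE MREVERSE
                  READ1 READ2 SECONDARY QCFAIL DUP SUPPLEMENTARY)
  for bit in "${!names[@]}"; do
    if (( flag & (1 << bit) )); then
      printf '0x%x\t%s\n' "$((1 << bit))" "${names[bit]}"
    fi
  done
}

decode_sam_flag 1294
```

For 1294 this prints the five bits PROPER_PAIR, UNMAP, MUNMAP, SECONDARY and DUP, matching the breakdown above.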
3. Extract split-read alignments
Split reads are individual reads that span a breakpoint: each such read can be split at the breakpoint into sub-reads that align to different locations in the reference genome. The scripts directory of the LUMPY installation contains a script, extractSplitReads_BwaMem, for extracting split reads. Usage:
samtools view -h sample.bam \
  | scripts/extractSplitReads_BwaMem -i stdin \
  | samtools view -Sb - \
  > sample.splitters.unsorted.bam
4. Sort BAMs
LUMPY requires sorted BAM input, so sort the two extracted BAM subsets:
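# samtools 1.x syntax; older samtools releases used "samtools sort in.bam out_prefix"
samtools sort -o sample.discordants.bam \
    sample.discordants.unsorted.bam
samtools sort -o sample.splitters.bam \
    sample.splitters.unsorted.bam
5. Run LUMPY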
The lumpyexpress script is a wrapper around lumpy that is more convenient to use. Basic usage:
lumpyexpress \
    -B sample.bam \
    -S sample.splitters.bam \
    -D sample.discordants.bam \
    -o sample.vcf
6. Genotype
The detected CNVs can then be genotyped in the sample with svtyper:
svtyper \
    -B sample.bam \
    -S sample.splitters.bam \
    -i sample.vcf \
    > sample.gt.vcf
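After genotyping, a quick overview of the calls is often useful. The bash/awk sketch below (hypothetical name `summarize_sv_vcf`) counts VCF records by the SVTYPE value in the INFO column and by the genotype of the first sample. It relies on the VCF specification's rule that GT, when present, is the first FORMAT key; the demo records are made up, and in practice you would run it as `summarize_sv_vcf < sample.gt.vcf`.

```shell
#!/usr/bin/env bash
# Hypothetical helper: tally VCF records by SVTYPE (INFO) and first-sample GT.
summarize_sv_vcf() {
  awk -F '\t' '
    /^#/ { next }                                   # skip meta and header lines
    {
      svtype = "NA"
      n = split($8, info, ";")
      for (i = 1; i <= n; i++)
        if (info[i] ~ /^SVTYPE=/) svtype = substr(info[i], 8)
      gt = "NA"
      if (NF >= 10 && $9 ~ /^GT(:|$)/) { split($10, s, ":"); gt = s[1] }
      count[svtype "\t" gt]++
    }
    END { for (key in count) print key "\t" count[key] }' | LC_ALL=C sort
}

# Made-up single-sample records
{
  printf '##fileformat=VCFv4.2\n'
  printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tsample\n'
  printf '1\t1000\t1\tN\t<DEL>\t.\t.\tSVTYPE=DEL;END=1500\tGT\t0/1\n'
  printf '1\t9000\t2\tN\t<DUP>\t.\t.\tSVTYPE=DUP;END=9800\tGT\t1/1\n'
  printf '2\t500\t3\tN\t<DEL>\t.\t.\tSVTYPE=DEL;END=800\tGT\t0/1\n'
} | summarize_sv_vcf
```

The output has one line per SVTYPE/genotype pair with its count, here two heterozygous deletions and one homozygous duplication.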
LUMPY is highly sensitive and performs well even on low-depth whole-genome data. Only the basic usage is shown here; see the official documentation for more options.