How to use lumpy for CNV Detection

Updated 2025-07-12. Source: SLTechnology News&Howtos (Shulou.com), reported 06/02.

This article walks through how to use lumpy for CNV detection. The method described here is simple, fast, and practical.

There are four classic strategies for detecting CNVs from whole-genome sequencing data:

Read-pair

Split-read

Read-depth

Assembly

Each strategy has its own advantages and disadvantages, and combining several of them helps improve detection sensitivity. lumpy is one such tool: it integrates read-pair, split-read, read-depth, and other signals to predict CNVs. The paper is available at the following link:

https://genomebiology.biomedcentral.com/articles/10.1186/gb-2014-15-6-r84

The source code is hosted on GitHub at the following URL:

https://github.com/arq5x/lumpy-sv

The analysis pipeline is as follows.

As shown in figure A, for a single sample, four signals (read-pair, split-read, read-depth, and known CNV loci) are integrated to predict CNVs. As shown in figure B, for multiple samples, the signals from all samples are combined to predict CNVs.

lumpy's framework is flexible and extensible, and it can take the results of other analysis tools as input, for example the output of cnvnator as the known-CNV signal. The paper compares lumpy with other tools, with the following results.

Across the sequencing depths tested in the paper, lumpy has higher sensitivity than the other tools compared, and the lowest false positive rate.

The steps for CNV detection using lumpy are as follows

1. Mapping

The bwa mem algorithm is recommended for aligning the paired-end reads to the reference genome. To speed up processing, samblaster is used to handle duplicate reads. The usage is as follows:

    bwa mem \
        -R "@RG\tID:id\tSM:sample\tLB:lib" \
        hg19_bwa_index \
        sample_R1.fastq.gz sample_R2.fastq.gz \
        | samblaster --excludeDups \
        --addMateTags \
        --maxSplitCount 2 \
        --minNonOverlap 20 \
        | samtools view -Sb - > sample.bam

To save disk space, the final output is written in BAM format.
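The read group string passed to -R is easy to get wrong, because bwa mem expects the two-character sequence backslash-t between fields rather than a real tab. The following is a minimal sketch of a helper that builds that string; the function name make_read_group is hypothetical and not part of bwa or lumpy.

```shell
# Hypothetical helper: build a read group header line for bwa mem -R.
# The output contains a literal backslash followed by "t" between fields,
# which bwa mem converts to tab characters itself.
make_read_group() {
    # $1 = read group ID, $2 = sample name, $3 = library name
    printf '@RG\\tID:%s\\tSM:%s\\tLB:%s\n' "$1" "$2" "$3"
}

# Print the string used in the mapping command of this article.
make_read_group id sample lib
```

The result can then be passed as -R "$(make_read_group id sample lib)" in the bwa mem call.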

2. Extract discordant paired-end alignments

Discordant reads are read pairs in which R1 and R2 map farther apart than the expected insert size, or in which the two mates align with an unexpected strand orientation. For more information, see the following link:

https://www.biostars.org/p/278412/

Such alignments may be caused by structural variation in the genome, so many structural variant callers analyze this subset of reads. The extraction command is as follows:

    samtools view -b -F 1294 \
        sample.bam \
        > sample.discordants.unsorted.bam

This amounts to extracting a subset of the original BAM file.
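The value 1294 given to -F is a sum of SAM flag bits, and samtools view -F drops any read that has at least one of those bits set. As a rough illustration, the sketch below decodes a flag value into bit names. The function name decode_sam_flag and the short labels are made up for this example; the bit positions follow the SAM specification.

```shell
# Hypothetical helper: list the SAM FLAG bits set in an integer.
# Labels are informal names for the bits 0x1, 0x2, 0x4, ... 0x800.
decode_sam_flag() {
    flag=$1
    out=""
    bit=1
    for name in paired proper_pair unmapped mate_unmapped reverse mate_reverse \
                read1 read2 secondary qc_fail duplicate supplementary; do
        # Append the label when this bit is set in the flag value.
        if [ $(( flag & bit )) -ne 0 ]; then
            out="${out:+$out,}$name"
        fi
        bit=$(( bit * 2 ))
    done
    printf '%s\n' "$out"
}

decode_sam_flag 1294
# prints: proper_pair,unmapped,mate_unmapped,secondary,duplicate
```

In other words, the command keeps only alignments that are not properly paired, not unmapped (for either mate), not secondary, and not duplicates.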

3. Extract split-reads alignments

Split reads are individual reads that span a breakpoint; such a read can be split at the breakpoint into sub-reads that each align correctly to the reference genome. The installation directory of the software contains a script called extractSplitReads_BwaMem that is used to extract split reads. The usage is as follows:

    samtools view -h sample.bam \
        | scripts/extractSplitReads_BwaMem -i stdin \
        | samtools view -Sb - \
        > sample.splitters.unsorted.bam

4. Sort bams

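lumpy requires sorted BAM files as input, so sort the two extracted BAM files as follows. The commands use the legacy samtools syntax (input file followed by an output prefix); samtools 1.3 and later removed that form and expect the output file to be given with -o instead.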

    samtools sort \
        sample.discordants.unsorted.bam \
        sample.discordants

    samtools sort \
        sample.splitters.unsorted.bam \
        sample.splitters

5. Run lumpy

lumpyexpress is a wrapper script for lumpy that is more convenient to use. The basic usage is as follows:

    lumpyexpress \
        -B sample.bam \
        -S sample.splitters.bam \
        -D sample.discordants.bam \
        -o sample.vcf

6. Genotype

For the detected CNVs, svtyper can be used to genotype each variant in the sample, as follows:

    svtyper \
        -B sample.bam \
        -S sample.splitters.bam \
        -i sample.vcf \
        > sample.gt.vcf
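Once a VCF has been produced, a quick tally of variant types is a useful sanity check. The sketch below counts records per SVTYPE value found in the INFO column (column 8) of a VCF read from stdin. The function name count_svtypes and the two toy records are made up for illustration; real lumpy output has more INFO fields and sample columns.

```shell
# Hypothetical helper: count VCF records per SVTYPE value.
count_svtypes() {
    awk -F '\t' '
        /^#/ { next }                      # skip header lines
        {
            n = split($8, info, ";")       # INFO is the 8th column
            for (i = 1; i <= n; i++) {
                if (info[i] ~ /^SVTYPE=/) {
                    # "SVTYPE=" is 7 characters, so the value starts at 8
                    count[substr(info[i], 8)]++
                }
            }
        }
        END { for (t in count) printf "%s\t%d\n", t, count[t] }
    ' | sort
}

# Toy input: one header line and two made-up records.
printf '##fileformat=VCFv4.2\n1\t100\t1\tN\t<DEL>\t.\t.\tSVTYPE=DEL;END=500\n1\t900\t2\tN\t<DUP>\t.\t.\tSVTYPE=DUP;END=1500\n' \
    | count_svtypes
```

On real data the same function would be run as count_svtypes < sample.gt.vcf. For CNV analysis, the DEL and DUP rows are the ones of interest.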

lumpy is highly sensitive and performs well even on low-depth whole-genome data. This article shows only the basic usage; for more options, refer to the official documentation.
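All of the commands in this article assume that the tools are available on PATH. As a small pre-flight check, the sketch below reports which of a list of commands cannot be found. The function name missing_tools is hypothetical, and the tool list is simply the set of programs invoked in the steps above.

```shell
# Hypothetical helper: print the names of commands not found on PATH,
# separated by spaces (prints an empty line when none are missing).
missing_tools() {
    missing=""
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            missing="${missing:+$missing }$tool"
        fi
    done
    printf '%s\n' "$missing"
}

# Tool names taken from the commands used in this article.
missing_tools bwa samblaster samtools lumpyexpress svtyper
```

Note that extractSplitReads_BwaMem is called by relative path from the lumpy scripts directory in step 3, so it is not covered by this check.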
