What is the principle of XHMM analysis?

Updated 2025-03-26. Source: SLTechnology News&Howtos

Shulou(Shulou.com)06/01 Report--

This article explains the principle behind XHMM analysis, which many readers may not be familiar with. The editor has summarized the key points below and hopes you will find them useful.

XHMM is a software tool that uses whole-exome sequencing (WES) data to call copy number variants (CNVs). It uses PCA-based dimensionality reduction to normalize the sequencing depth of exons and then a hidden Markov model (HMM) to predict CNVs. The pipeline of the software is as follows.

It can be divided into four main steps.

1. Align reads to the reference genome

The sequencing reads are aligned to the reference genome, and the raw sequencing depth of each exon region is calculated. CNV prediction works by modeling the relationship between sequencing depth and copy number, so the sequencing depth measured here must reflect the true DNA copy number. PCR duplicates therefore need to be removed.

The official recommendation is to follow the preprocessing workflow in the GATK Best Practices. You can also add a MAPQ filter that keeps only alignments meeting a MAPQ threshold of 20, which yields BAM files that can be used for downstream analysis.
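The filtering described above can be sketched in a few lines of Python that work directly on SAM text records. This is only an illustration, not XHMM's or GATK's implementation: the function names are hypothetical, and treating the threshold as "MAPQ >= 20" is an assumption. The field positions and flag bits come from the SAM format.

```python
DUPLICATE_FLAG = 0x400  # SAM FLAG bit: PCR or optical duplicate
UNMAPPED_FLAG = 0x4     # SAM FLAG bit: read is unmapped


def keep_alignment(sam_line, min_mapq=20):
    """Return True if the record is mapped, not a duplicate, and MAPQ >= min_mapq.

    The ">=" boundary is an assumption made for this sketch.
    """
    fields = sam_line.rstrip("\n").split("\t")
    flag = int(fields[1])  # column 2: bitwise FLAG
    mapq = int(fields[4])  # column 5: mapping quality (MAPQ)
    if flag & (DUPLICATE_FLAG | UNMAPPED_FLAG):
        return False
    return mapq >= min_mapq


def filter_sam(lines, min_mapq=20):
    """Keep header lines (starting with '@') and alignments that pass the filter."""
    return [ln for ln in lines if ln.startswith("@") or keep_alignment(ln, min_mapq)]
```

In practice this step is normally done on BAM files with established tools. The sketch only makes the two criteria (duplicate removal and a MAPQ threshold) concrete.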

2. Normalize sequencing depth

Calculate the mean sequencing depth of each exon in each sample to obtain a matrix of mean exon sequencing depths, laid out as follows.

Each row is a sample, each column is an exon region, and each value is the mean sequencing depth of that exon region in that sample.

Before normalization, the matrix can be preprocessed by filtering samples or target regions. For target regions, remove those with GC content below 0.1 or above 0.9, remove those in which low-complexity sequence makes up more than 10%, and remove those whose sequencing depth is too low or too high, for example those with a sequencing depth below 5x. For samples, examine the distribution of sequencing depth and remove outlier samples.

The purpose of preprocessing is to keep the sequencing depth distribution uniform across the samples used in subsequent analysis and to reduce deviation between samples. After preprocessing, normalization can be carried out. To account for systematic errors such as PCR bias caused by GC content, capture-chip efficiency, and mapping accuracy, the PCA algorithm is used to remove systematic noise and obtain the normalized sequencing depth.
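As a rough sketch of the target-region filters listed above, the following Python function applies the GC, low-complexity, and depth thresholds. The dictionary keys, the function name, and the `max_mean_depth` parameter are hypothetical and chosen for illustration. The text gives no upper depth limit, so none is applied by default.

```python
def filter_targets(targets, min_gc=0.1, max_gc=0.9, max_low_complexity=0.10,
                   min_mean_depth=5.0, max_mean_depth=float("inf")):
    """Return the names of target regions that pass all filters.

    `targets` is a list of dicts with the (hypothetical) keys
    'name', 'gc', 'low_complexity', and 'mean_depth'.
    """
    kept = []
    for t in targets:
        if t["gc"] < min_gc or t["gc"] > max_gc:
            continue  # extreme GC content
        if t["low_complexity"] > max_low_complexity:
            continue  # more than 10% low-complexity sequence
        if t["mean_depth"] < min_mean_depth or t["mean_depth"] > max_mean_depth:
            continue  # depth too low or too high
        kept.append(t["name"])
    return kept
```

The columns of the depth matrix would then be restricted to the returned target names before normalization.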
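The idea of removing systematic noise with PCA can be sketched with NumPy as below. This is a minimal illustration, not XHMM's actual code: the function name and the `n_remove` parameter (how many of the strongest components to discard) are assumptions, and the rule for choosing that number is not covered here.

```python
import numpy as np


def pca_normalize(depth, n_remove=1):
    """Remove the strongest principal components from a samples x targets matrix.

    `n_remove` is a hypothetical parameter for this sketch.
    """
    depth = np.asarray(depth, dtype=float)
    centered = depth - depth.mean(axis=0)  # center each target (column)
    u, s, vt = np.linalg.svd(centered, full_matrices=False)
    s_kept = s.copy()
    s_kept[:n_remove] = 0.0                # drop the components with the most variance
    return (u * s_kept) @ vt               # rebuild the matrix without them
```

The assumption behind this design is that the components explaining the most variance across samples capture technical effects shared by many exons, whereas a real CNV affects few samples and few exons and so survives the removal.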

[Figure: sequencing depth before and after normalization]

In the figure, the left panel shows the raw sequencing depth and the right panel shows the normalized sequencing depth. Each line represents the sequencing depth values of one sample, the gray area represents normal diploid regions, and the green area indicates a copy number gain. After normalization, the two are more clearly separated.

3. Construct the hidden Markov model

Taking into account factors such as the genome-wide proportion of CNVs, CNV length, and the distance between exons, a hidden Markov model is constructed, and chromosome regions are assigned to one of the following three states.

Diploid

Deletion

Duplication

Diploid represents the normal copy number of 2, and the corresponding sequencing depth is the average, that is, the baseline. Deletion represents fewer than 2 copies, with a sequencing depth below the average. Duplication represents more than 2 copies, with a sequencing depth above the average.

[Figure: transition probability matrix between the three states of the hidden Markov model]
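To make the idea of a three-state transition matrix concrete, here is a deliberately simplified Python sketch. It is not XHMM's actual matrix: the parameters `p` (probability of entering a CNV from the diploid state) and `q` (probability of leaving a CNV at the next exon) are hypothetical, the distance between exons is ignored, and direct jumps between deletion and duplication are disallowed purely for simplicity.

```python
def transition_matrix(p, q):
    """Build an illustrative 3-state transition matrix as nested dicts.

    p: hypothetical probability of moving from diploid into a deletion
       (and, separately, into a duplication) at the next exon.
    q: hypothetical probability of returning from a CNV state to diploid.
    Each row gives P(next state | current state) and sums to 1.
    """
    return {
        "diploid":     {"deletion": p,       "diploid": 1.0 - 2.0 * p, "duplication": p},
        "deletion":    {"deletion": 1.0 - q, "diploid": q,             "duplication": 0.0},
        "duplication": {"deletion": 0.0,     "diploid": q,             "duplication": 1.0 - q},
    }
```

With a small `p` and a moderate `q`, the model stays in the diploid state almost everywhere and, once in a CNV state, tends to span several consecutive exons, which matches the intuition that CNVs are rare but contiguous.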

4. CNV calling

After the model is trained, the Viterbi algorithm is used to infer the copy number state of each chromosome region in each sample, and CNVs are detected from the resulting state sequence.
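The Viterbi step can be sketched in plain Python as follows. This is a generic log-space Viterbi decoder, not XHMM's implementation. The Gaussian emission model on normalized depths and all numeric values (`MEANS`, `P`, `Q`, `START`) are assumptions chosen only to make the example run.

```python
import math

STATES = ("deletion", "diploid", "duplication")


def safe_log(p):
    """log(p), with log(0) mapped to negative infinity."""
    return math.log(p) if p > 0 else float("-inf")


def log_gauss(x, mean, sd=1.0):
    """Log density of a normal distribution, used here as the emission model."""
    return -0.5 * math.log(2 * math.pi * sd * sd) - (x - mean) ** 2 / (2 * sd * sd)


def viterbi(depths, means, trans, start):
    """Return the most likely state path for one sample's normalized depths."""
    if not depths:
        return []
    # score[s]: best log-probability of any path that ends in state s
    score = {s: safe_log(start[s]) + log_gauss(depths[0], means[s]) for s in STATES}
    back = []  # back[i][s]: best predecessor of state s at position i + 1
    for x in depths[1:]:
        new_score, pointers = {}, {}
        for s in STATES:
            prev = max(STATES, key=lambda r: score[r] + safe_log(trans[r][s]))
            new_score[s] = score[prev] + safe_log(trans[prev][s]) + log_gauss(x, means[s])
            pointers[s] = prev
        score = new_score
        back.append(pointers)
    # Trace back from the best final state
    state = max(STATES, key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path


# Illustrative parameters only (assumptions, not values taken from this article)
MEANS = {"deletion": -3.0, "diploid": 0.0, "duplication": 3.0}
P, Q = 1e-4, 1.0 / 6.0
TRANS = {
    "diploid":     {"deletion": P,       "diploid": 1 - 2 * P, "duplication": P},
    "deletion":    {"deletion": 1 - Q,   "diploid": Q,         "duplication": 0.0},
    "duplication": {"deletion": 0.0,     "diploid": Q,         "duplication": 1 - Q},
}
START = {"deletion": P, "diploid": 1 - 2 * P, "duplication": P}
```

Calling `viterbi(depths, MEANS, TRANS, START)` on one sample's row of normalized depths returns one state label per exon. Consecutive runs of "deletion" or "duplication" would then be reported as candidate CNVs.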
