How to Use PCA to Correct Population Stratification in GWAS Analysis

2025-01-19 Update From: SLTechnology News&Howtos

Shulou (Shulou.com) 06/02

This article explains how to use PCA to correct population stratification in GWAS analysis. Many readers may not know much about the topic, so it is shared here for reference; we hope you find it useful.

GWAS looks for SNP loci associated with a disease by analyzing genotype differences between case and control groups. However, the two groups may also differ in ways unrelated to the disease, and such differences can distort the association results.

Population stratification is the most common source of such differences: the case and control samples come from different ancestral populations, so their genotypes naturally differ. The purpose of GWAS is to find the differences caused by the disease; all other differences are systematic errors that must be corrected in the analysis.

Principal component analysis (PCA) is the usual method for correcting population stratification. The corresponding paper was published in Nature Genetics:

https://www.nature.com/articles/ng1847

The core procedure is shown in the figure below.

PCA is applied to the genotype matrix, in which each row is a SNP locus and each column is a sample. Genotypes are coded as 0, 1, or 2: 0 means no mutation, 1 a heterozygous mutation, and 2 a homozygous mutation. After PCA, each sample's coordinates on the principal component axes (PC1, PC2, and so on) are obtained.
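A minimal NumPy sketch of this step follows. It assumes a dense genotype matrix `G` with SNPs as rows and samples as columns, coded 0/1/2 as described above. Each SNP is centered and scaled by the binomial standard deviation sqrt(p(1-p)), in the spirit of the EIGENSTRAT paper; the exact allele-frequency estimate used there may differ, and `genotype_pca` is a hypothetical helper name, not a function from any of the tools discussed here.

```python
import numpy as np


def genotype_pca(G, n_components=2):
    """Sketch of PCA on a genotype matrix.

    G: array of shape (n_snps, n_samples), genotypes coded 0/1/2
       (rows = SNP loci, columns = samples).
    Returns (coords, eigvals): coords has shape (n_samples, n_components)
    and holds each sample's position on PC1, PC2, ...; eigvals are the
    matching eigenvalues of the sample-by-sample matrix, largest first.
    """
    G = np.asarray(G, dtype=float)
    mu = G.mean(axis=1, keepdims=True)          # mean genotype per SNP
    p = mu / 2.0                                # estimated allele frequency per SNP
    keep = ((p > 0) & (p < 1)).ravel()          # monomorphic SNPs have zero variance
    # Center each SNP and scale by its binomial standard deviation (assumed normalization)
    Z = (G[keep] - mu[keep]) / np.sqrt(p[keep] * (1.0 - p[keep]))
    # Sample-by-sample relationship matrix averaged over SNPs
    X = Z.T @ Z / Z.shape[0]
    eigvals, eigvecs = np.linalg.eigh(X)        # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1][:n_components]
    return eigvecs[:, order], eigvals[order]
```

Plotting `coords[:, 0]` against `coords[:, 1]` gives the kind of PC1/PC2 scatter plot described next. Using `eigh` on the sample-by-sample matrix is convenient when there are far fewer samples than SNPs; at real GWAS scale, dedicated tools use faster or approximate methods.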

PCA is essentially an ordination method: samples that lie close to each other have similar attributes. Using the coordinates obtained from PCA, you can draw a scatter plot like the one shown below.

Each point in the plot above represents a sample, positioned by its coordinates on the PC1 and PC2 axes. Such a scatter plot shows the stratification of the samples directly, and samples that clearly deviate from the main population can be removed before rerunning the analysis. In the subsequent GWAS analysis, the PC coordinates can be included as covariates in the regression model to correct for stratification.
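As a sketch of the covariate correction, the snippet below fits a logistic regression of case/control status on one SNP's genotype, optionally adding covariates such as each sample's PC coordinates, and returns the SNP's coefficient and Wald z statistic. The function name `logistic_wald_test` is hypothetical, and the model is fit with plain Newton-Raphson in NumPy rather than with any particular GWAS tool; real tools also handle missing genotypes, convergence failures, and millions of SNPs.

```python
import numpy as np


def logistic_wald_test(y, snp, covariates=None, n_iter=25):
    """Sketch: logistic regression of case/control status on one SNP,
    optionally adjusted for covariates such as PC coordinates.

    y: 0/1 phenotype (1 = case), shape (n,)
    snp: genotypes coded 0/1/2, shape (n,)
    covariates: optional array of shape (n, k), e.g. PC1..PCk per sample
    Returns (beta, z) for the SNP term, where z is the Wald statistic.
    """
    y = np.asarray(y, dtype=float)
    cols = [np.ones_like(y), np.asarray(snp, dtype=float)]   # intercept, SNP
    if covariates is not None:
        cols.extend(np.asarray(covariates, dtype=float).T)   # one column per covariate
    X = np.column_stack(cols)
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):                                  # Newton-Raphson updates
        mu = 1.0 / (1.0 + np.exp(-X @ beta))                 # fitted case probabilities
        W = mu * (1.0 - mu)
        H = X.T @ (X * W[:, None])                           # information matrix
        step = np.linalg.solve(H, X.T @ (y - mu))
        beta += step
        if np.max(np.abs(step)) < 1e-8:
            break
    mu = 1.0 / (1.0 + np.exp(-X @ beta))
    H = X.T @ (X * (mu * (1.0 - mu))[:, None])
    se = np.sqrt(np.linalg.inv(H)[1, 1])                     # standard error of the SNP term
    return beta[1], beta[1] / se
```

With `pcs` holding the PC coordinates as an (n, k) array, `logistic_wald_test(y, snp, pcs)` gives the stratification-adjusted statistic, while `logistic_wald_test(y, snp)` gives the unadjusted one; comparing the two shows how much of an apparent association is driven by ancestry.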

The authors of that paper packaged PCA of genotype data into a program called EIGENSTRAT, available on GitHub at:

https://github.com/chrchang/eigensoft/tree/master/EIGENSTRAT

The software can automatically remove outlier samples, reports the proportion of variance explained by each principal component, and offers many other functions, but it runs relatively slowly. For PCA in GWAS, the essential output is each sample's coordinates on each principal component axis, which is what the subsequent correction needs.
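Outlier removal of the kind EIGENSTRAT performs can be approximated with an iterative rule: compute the top PCs, drop samples lying too many standard deviations from the mean on any of them, and repeat on the remaining samples. The sketch below implements that idea; `top_pcs` and `remove_outliers` are hypothetical helpers, and the defaults (top 2 PCs, a 6-standard-deviation threshold, at most 5 rounds) are illustrative assumptions, so check the EIGENSOFT documentation for the tool's actual options and defaults.

```python
import numpy as np


def top_pcs(G, k):
    """Sample coordinates on the top-k PCs of a 0/1/2 genotype matrix
    (rows = SNPs, columns = samples)."""
    G = np.asarray(G, dtype=float)
    mu = G.mean(axis=1, keepdims=True)
    p = mu / 2.0
    keep = ((p > 0) & (p < 1)).ravel()                  # skip monomorphic SNPs
    Z = (G[keep] - mu[keep]) / np.sqrt(p[keep] * (1.0 - p[keep]))
    _, _, Vt = np.linalg.svd(Z, full_matrices=False)    # rows of Vt index samples
    return Vt[:k].T


def remove_outliers(G, k=2, sigma=6.0, max_iter=5):
    """Iteratively drop samples more than `sigma` standard deviations from
    the mean on any of the top-k PCs. Returns the retained column indices."""
    G = np.asarray(G)
    idx = np.arange(G.shape[1])
    for _ in range(max_iter):
        pcs = top_pcs(G[:, idx], k)                     # recompute PCs on remaining samples
        dev = np.abs(pcs - pcs.mean(axis=0)) / pcs.std(axis=0)
        bad = (dev > sigma).any(axis=1)
        if not bad.any():
            break
        idx = idx[~bad]
    return idx
```

Recomputing the PCs after each round matters because a strong outlier can dominate a component and hide milder outliers until it has been removed.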

For genotyping data at GWAS scale, running speed is a very important factor. For this reason, the following two tools are often used in practice.

1. PLINK

Usage:

plink \
  --bfile sample \
  --pca \
  --out pca

2. GCTA

Usage (first build the genetic relationship matrix, then run PCA on it):

gcta64 \
  --bfile sample \
  --make-grm \
  --thread-num 5 \
  --out gcta

gcta64 \
  --grm gcta \
  --pca 20 \
  --thread-num 5 \
  --out pca

Although the outputs of the two tools are not identical, the overall distribution of the samples is the same. The main difference is speed: GCTA runs multithreaded (the thread count is set with --thread-num) and tends to be faster. Each tool writes several output files; the key one has the suffix .eigenvec and stores each sample's coordinates on each principal component axis, which can be used for the subsequent correction.
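To use the .eigenvec output as covariates, it first has to be read and matched to samples. The sketch below assumes the whitespace-delimited layout written by GCTA and PLINK 1.9 (family ID, individual ID, then one column per PC) and skips a header line if one is present; `read_eigenvec` is a hypothetical helper, and other layouts would need adjusting.

```python
from typing import Dict, List, Tuple


def read_eigenvec(path: str, n_pcs: int = 10) -> Dict[Tuple[str, str], List[float]]:
    """Read a whitespace-delimited .eigenvec file into {(FID, IID): [PC1, ...]}.

    Assumes one sample per line: FID, IID, then one column per PC.
    Only the first `n_pcs` PC columns are kept. A header line whose first
    field is 'FID' or starts with '#' is skipped.
    """
    pcs: Dict[Tuple[str, str], List[float]] = {}
    with open(path) as fh:
        for line in fh:
            fields = line.split()
            if not fields or fields[0] == "FID" or fields[0].startswith("#"):
                continue                          # blank line or header
            fid, iid = fields[0], fields[1]
            pcs[(fid, iid)] = [float(v) for v in fields[2:2 + n_pcs]]
    return pcs
```

Keying by (FID, IID) lets the PCs be joined to phenotype records by sample ID rather than relying on row order, which is easy to break when samples are filtered.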

Both tools are fast, but they share one drawback: they do not report the proportion of variance explained by each principal component (they write the eigenvalues of the computed PCs to an .eigenval file, but not each PC's share of the total). If you need this information, consider R packages with similar functionality, such as vcfR, SNPRelate, or bigsnpr.
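The missing proportion is simple arithmetic once the total variance is known: the total is the trace of the matrix being decomposed, which equals the sum of all its eigenvalues, so PC i explains eigenvalue_i / trace(K). The sketch below computes this from a sample-by-sample matrix `K`, such as a genetic relationship matrix. It assumes `K` fits in memory as a NumPy array, and `variance_proportions` is a hypothetical helper, not part of any of the packages named above.

```python
import numpy as np


def variance_proportions(K, k=None):
    """Proportion of total variance explained by each PC of a symmetric
    sample-by-sample matrix K (e.g. a genetic relationship matrix).

    The total variance is trace(K), which equals the sum of all
    eigenvalues, so each eigenvalue divided by the trace is that PC's share.
    Returns proportions in descending order, optionally only the top k.
    """
    K = np.asarray(K, dtype=float)
    eigvals = np.linalg.eigvalsh(K)[::-1]   # eigvalsh is ascending; reverse it
    props = eigvals / np.trace(K)
    return props if k is None else props[:k]
```

Because only the trace is needed for the denominator, the top-k eigenvalues from a fast tool could in principle be converted the same way, provided they come from the same matrix whose trace you compute.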

That concludes "How to Use PCA to Correct Population Stratification in GWAS Analysis". Thank you for reading! We hope it helps; to learn more, follow the industry information channel.
