2025-04-04 Update From: SLTechnology News&Howtos
Shulou(Shulou.com)06/02 Report--
Many newcomers are unclear about how to carry out GWAS model analysis. To help with this, the following explains the main GWAS models in detail.
Introduction to GWAS
A genome-wide association study (GWAS) detects polymorphisms of genetic variants (markers) across the whole genome in many individuals to obtain their genotypes, then statistically tests the association between genotypes and observable traits (phenotypes) at the population level. The variants most likely to affect the trait are screened out according to the test statistic or the significance of the p-value, and the genes related to trait variation are then mined.
GWAS is an alternative to mapping in traditional biparental populations, and is widely used in plants, animals, model species and humans. Compared with traditional QTL mapping, GWAS offers higher resolution, a wider range of research materials, and more abundant captured variation, and it saves time because no genetic population has to be constructed.
Introduction to GWAS Analysis Models
GWAS analysis generally builds a regression model to test whether markers are associated with phenotypes. The null hypothesis (H0) in GWAS is that the regression coefficient of the marker is zero, that is, the marker has no effect on the phenotype. The alternative hypothesis (H1) is that the regression coefficient of the marker is not zero, that is, the SNP is associated with the phenotype. There are two main types of models in GWAS:
General linear model (GLM): y = Xα + Zβ + e
Mixed linear model (MLM): y = Xα + Zβ + Wμ + e
y: the phenotypic trait under study
Xα: fixed effects, that is, other factors affecting y, mainly population structure
Zβ: marker (SNP) effect
Wμ: random effects, here generally the kinship among individuals
e: residual error
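The hypothesis test described above can be sketched for a single marker with no covariates. This is a minimal illustration using only the Python standard library, not the procedure of any particular GWAS package: the function name `single_marker_test`, the 0/1/2 genotype coding and the toy data are assumptions, and the p-value uses a normal approximation to the t distribution, which is only reasonable for large samples.

```python
import math
from statistics import NormalDist, mean

def single_marker_test(genotypes, phenotypes):
    """Regress the phenotype on one marker (coded 0/1/2) and test H0: beta = 0."""
    n = len(genotypes)
    g_mean, y_mean = mean(genotypes), mean(phenotypes)
    sxx = sum((g - g_mean) ** 2 for g in genotypes)
    sxy = sum((g - g_mean) * (y - y_mean) for g, y in zip(genotypes, phenotypes))
    beta = sxy / sxx                       # estimated marker effect
    intercept = y_mean - beta * g_mean
    rss = sum((y - intercept - beta * g) ** 2
              for g, y in zip(genotypes, phenotypes))
    se = math.sqrt(rss / (n - 2) / sxx)    # standard error of beta
    t_stat = beta / se
    # Assumption: normal approximation instead of the exact t distribution.
    p_value = 2 * (1 - NormalDist().cdf(abs(t_stat)))
    return beta, t_stat, p_value

# Hypothetical toy data: one marker with an effect, one without.
marker = [0, 1, 2, 0, 1, 2]
trait_with_effect = [1.0, 2.1, 2.9, 1.2, 1.9, 3.1]
trait_without_effect = [1, 2, 1, 2, 1, 2]

effect_result = single_marker_test(marker, trait_with_effect)
null_result = single_marker_test(marker, trait_without_effect)
```

In a real scan the same test is repeated for every marker, and the resulting p-values are compared against a genome-wide significance threshold.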
GWAS analysis has two problems to solve. The first is computing speed, which has become an important constraint as the amount of sequencing data grows. The second is whether statistical accuracy can be improved. As a result, many other models have been developed. See the figure below, where the river represents the continuous development of GWAS analysis methods, from the Q model in the upper corner to Blink at the bottom.
The GWAS models in detail:
General linear model (GLM): genotype x and phenotype y are fitted directly by regression. Population structure results can also be added as covariates to control false positives.
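To see why adding population structure as a covariate controls false positives, the following sketch fits the same marker with and without a subpopulation indicator. It is a hedged toy example: the helper names `solve` and `ols`, the two-subpopulation design and all data values are assumptions made for illustration, and the least-squares fit uses plain normal equations rather than the numerically safer routines a real tool would use.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x

def ols(columns, y):
    """Least-squares coefficients of y on an intercept plus the given columns."""
    design = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    k = len(design[0])
    xtx = [[sum(r[a] * r[b] for r in design) for b in range(k)] for a in range(k)]
    xty = [sum(r[a] * yi for r, yi in zip(design, y)) for a in range(k)]
    return solve(xtx, xty)

# Hypothetical data: the trait depends only on subpopulation membership,
# but the marker allele is more common in the second subpopulation.
subpop = [0, 0, 0, 0, 1, 1, 1, 1]
marker = [0, 0, 0, 1, 1, 2, 2, 2]
trait = [10.0 + 5.0 * q for q in subpop]

naive = ols([marker], trait)             # [intercept, marker effect]
adjusted = ols([marker, subpop], trait)  # [intercept, marker effect, subpop effect]
```

Without the covariate the marker appears to have an effect purely because of stratification; once subpopulation membership is included, the marker coefficient collapses to zero.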
Mixed linear model (MLM): In the GLM model, if two groups of individuals differ greatly in phenotype but the population also contains other genetic differences (for example, regional origin), those genetic differences unrelated to the phenotype will still appear associated with it. The MLM model can include population structure as a covariate to correct for this. In addition, shared ancestry between materials also produces associations that are not due to linkage, which can be corrected by adding the kinship matrix as a random effect.
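As an illustration of the kinship matrix that enters the MLM as a random effect, the sketch below computes one common estimator from a genotype matrix. The text does not say which estimator is used, so the VanRaden-style formula here (centered genotypes, scaled by 2Σp(1−p)) is an assumption, as are the function name `vanraden_kinship` and the toy genotypes.

```python
def vanraden_kinship(genotypes):
    """Kinship from genotypes[i][j], the allele count (0/1/2) of individual i
    at marker j. Assumption: VanRaden-style centering and scaling."""
    n, m = len(genotypes), len(genotypes[0])
    # Allele frequency of each marker across all individuals.
    freqs = [sum(row[j] for row in genotypes) / (2 * n) for j in range(m)]
    # Center each genotype by its expected value 2p.
    centered = [[row[j] - 2 * freqs[j] for j in range(m)] for row in genotypes]
    scale = 2 * sum(p * (1 - p) for p in freqs)
    return [[sum(a * b for a, b in zip(ri, rj)) / scale for rj in centered]
            for ri in centered]

# Hypothetical genotypes: individuals 0 and 1 are identical, individual 2 differs.
toy_genotypes = [
    [0, 1, 2, 0],
    [0, 1, 2, 0],
    [2, 1, 0, 2],
]
kinship = vanraden_kinship(toy_genotypes)
```

Identical individuals receive the same, positive kinship value, while the dissimilar individual receives a negative value relative to them; the MLM uses this matrix as the covariance structure of the random effect.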
With the development of next-generation sequencing, genotyping has become easier and easier, and the number of samples and markers used for association analysis keeps increasing. The time spent solving the original MLM model is proportional to mpn^3, where m is the number of markers, p is the number of iterations in the solution process, and n is the number of samples. At each iteration step, the computing time therefore grows with the cube of the sample size, which makes the calculation very long. To solve this problem, Zhang et al. proposed P3D (population parameters previously determined) and the compressed mixed linear model (compressed MLM, CMLM), and integrated both methods into the TASSEL software, which greatly improved computational efficiency and detection power. P3D reduces the number of times the variance components are re-estimated, and CMLM reduces the number of samples actually involved in the calculation by clustering them into groups. Because different combinations of the eight clustering methods and three between-group kinship algorithms may give different results, the enriched compressed mixed linear model (enriched CMLM, ECMLM), which searches for the optimal combination, was proposed and integrated into the GAPIT software.
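The mpn^3 scaling can be made concrete with a little arithmetic. The function below is only a relative cost model taken from the expression in the text, with hypothetical sizes; it ignores constants and does not model the additional savings from P3D.

```python
def mlm_cost(markers, iterations, samples):
    """Relative cost of the original MLM: m * p * n^3 (arbitrary units)."""
    return markers * iterations * samples ** 3

# Hypothetical study sizes.
base = mlm_cost(markers=100_000, iterations=10, samples=1_000)

# Doubling the sample size multiplies the cost by 2^3 = 8.
doubled = mlm_cost(markers=100_000, iterations=10, samples=2_000)
growth = doubled / base

# Compressing 1,000 individuals into 100 groups, as CMLM does by clustering,
# shrinks the n^3 term by a factor of 10^3.
compressed = mlm_cost(markers=100_000, iterations=10, samples=100)
saving = base / compressed
```

This is why reducing the effective sample size through clustering pays off much more than reducing the number of markers by the same factor.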
Compressed mixed linear model (CMLM): the correction applied by MLM is too strict and filters out some truly associated SNP markers, so the purpose of the CMLM model is to recover those false-negative SNP markers.
SUPER: Which SNPs should CMLM use to calculate the kinship matrix? The answer is that it is best to build the kinship matrix from all phenotype-associated SNPs, excluding the SNP being tested. This is SUPER (Settlement of MLM Under Progressively Exclusive Relationship).
FarmCPU: the bottlenecks of GWAS are computing speed and statistical accuracy, and FarmCPU improves both. It first converts the random-effect kinship matrix (Kinship) into a fixed-effect matrix of associated SNPs (the S matrix, or QTN matrix), which greatly speeds up the calculation. It then uses the QTN matrix as a covariate and repeats the association analysis to improve accuracy.
Blink: Blink is an advanced version of FarmCPU that likewise aims to improve speed and accuracy. It first uses the GLM described above to obtain candidate QTNs, then uses a second GLM (the one on the right of the original figure) with the QTNs as covariates to test the SNPs. The SNPs obtained are filtered into QTNs according to LD information (the bin size is chosen according to the actual position on the chromosome). The first GLM (on the left of the figure) then checks the QTNs with a BIC (Bayesian information criterion) strategy, discarding those that fail and keeping the real QTNs. This process is repeated until all associated SNPs (that is, QTNs) have been detected.
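The BIC-based check on candidate QTNs can be illustrated in isolation. The sketch below is a simplification and not Blink's actual implementation: it assumes Gaussian linear models, uses the form BIC = n ln(RSS/n) + k ln(n), and compares hypothetical residual sums of squares for models with an increasing number of QTN covariates. The names `bic` and `best_model` and all numbers are illustrative.

```python
import math

def bic(n, rss, k):
    """BIC of a Gaussian linear model with n samples, residual sum of
    squares rss and k fitted parameters (constant terms dropped)."""
    return n * math.log(rss / n) + k * math.log(n)

def best_model(n, fits):
    """fits: list of (label, rss, k). Return the label with the lowest BIC."""
    return min(fits, key=lambda fit: bic(n, fit[1], fit[2]))[0]

# Hypothetical fits on 200 samples: the first QTN explains a lot of variance,
# the second barely improves the fit and is penalized for the extra parameter.
candidate_fits = [
    ("no QTN", 400.0, 1),
    ("1 QTN", 300.0, 2),
    ("2 QTNs", 298.0, 3),
]
selected = best_model(200, candidate_fits)
```

A candidate QTN is kept only when the drop in residual variance outweighs the ln(n) penalty for the extra parameter, which is how weakly supported candidates get excluded.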
Other models:
Kang et al. proposed the EMMA model, which reduces the number of variance components to be estimated and simplifies the matrix inversion. Building on this, the EMMAX algorithm avoids repeated estimation of the polygenic variance and the error variance for every marker, and the EMMAX software was developed to further improve computing speed. Because the ratio of polygenic variance to error variance is held fixed across markers, EMMAX is an approximate algorithm. The GEMMA algorithm proposed by Zhou is an exact algorithm that gives the same results as EMMA more efficiently.