2025-01-28 Update From: SLTechnology News&Howtos
Shulou(Shulou.com)06/02 Report--
This article looks at the factors that affect the quality of GWAS association analysis.
Factors Influencing GWAS Association Analysis
In 1996, the human genome had not yet been completed, second-generation high-throughput sequencing did not yet exist, and sequencing was still expensive, yet a paper in Science had already predicted that GWAS could be used to study complex human diseases [1]. Later, as high-throughput sequencing developed, GWAS became widely used for mapping genes underlying quantitative traits in animals and plants.
To date, a number of GWAS papers on plants and animals have been published, mostly in the CNNS series of journals. Species studied by GWAS include maize [2-10], rice [11-16], soybean [17-20], cattle [21], dog [22], sorghum [23, 24], tomato [25], Arabidopsis thaliana [26], sesame [27], Populus tomentosa [28], and others.
However, several factors cannot be ignored in a GWAS. Let's go through them.
What is GWAS?
A genome-wide association study (GWAS) identifies sequence variation across the whole genome, not limited to single nucleotide polymorphisms (SNPs) but also including InDels, CNVs and other variant types, and screens for the variant sites associated with a target trait. Association analysis is a method for identifying relationships between molecular markers, or between candidate genes and traits, based on linkage disequilibrium.
What is linkage disequilibrium? Two loci A (alleles A and a) and B (alleles B and b) are considered genetically linked when they are located on the same chromosome or linkage group. The degree of linkage between loci is measured by the recombination rate r, the probability of a crossover between the two linked loci in one meiosis. Linkage equilibrium means that the frequency of each gamete type equals the product of the frequencies of the alleles it carries. In a random-mating population, the deviation of the observed gamete frequency from its equilibrium expectation is taken as the measure of linkage disequilibrium, denoted D; for example, D_AB = f_AB - f_A × f_B. How tightly the loci are linked determines the magnitude of linkage disequilibrium: the tighter the linkage, the more slowly recombination breaks the association down, and the higher the linkage disequilibrium.
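The definition of D above can be written out in a few lines. The following is a minimal sketch, not code from the original article: the function names are hypothetical, r² is a commonly used normalization of D that the text does not mention, and the decay rule D_t = D_0 × (1 - r)^t is the standard expectation under random mating, added here to illustrate why tighter linkage keeps LD higher.

```python
def ld_coefficient(f_ab, f_a, f_b):
    """D_AB = f_AB - f_A * f_B, as defined in the text."""
    return f_ab - f_a * f_b


def ld_r_squared(f_ab, f_a, f_b):
    """Squared correlation between the two loci.

    Assumption: r^2 is an extra, commonly used measure; the text only defines D.
    """
    d = ld_coefficient(f_ab, f_a, f_b)
    denom = f_a * (1.0 - f_a) * f_b * (1.0 - f_b)
    if denom == 0.0:
        raise ValueError("both loci must be polymorphic")
    return d * d / denom


def ld_after_generations(d0, r, generations):
    """Expected D after t generations of random mating: D_t = D_0 * (1 - r)^t."""
    return d0 * (1.0 - r) ** generations


# Hypothetical frequencies: gamete AB at 0.4, alleles A and B at 0.5 each.
d0 = ld_coefficient(f_ab=0.4, f_a=0.5, f_b=0.5)

# Tightly linked loci (r = 0.01) keep far more LD than unlinked loci (r = 0.5).
tight = ld_after_generations(d0, r=0.01, generations=50)
loose = ld_after_generations(d0, r=0.5, generations=50)
print(d0, ld_r_squared(0.4, 0.5, 0.5), tight, loose)
```

Here the recombination rate r in `ld_after_generations` is the same r used in the text, so the sketch makes the last sentence of the paragraph concrete: with a small r, D shrinks slowly from one generation to the next.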
1 Rare variants and minor-effect genes
GWAS can detect common variants or candidate genes for a target trait. However, a trait may be controlled by rare variants of large effect or by many common genes of minor effect, and both cases are difficult to study with GWAS [29, 30]. This is because the mapping power of GWAS depends on how much of the phenotypic variation the corresponding marker explains (Figure 1a) [31], and that in turn depends on the size of the allelic effect and on the allele's frequency in the sample. Detecting minor-effect genes with a given power therefore requires a relatively large sample.
Figure 1. a. Simulated power and FDR at different population sizes, assuming a SNP explains 5%, 10% or 20% of the phenotypic variation; b. The simulated causal SNP (red square) is not the most significant detection result. [31]
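To see why both rare large-effect variants and common minor-effect genes are hard to detect, it helps to compute how much phenotypic variance a single locus explains. The sketch below assumes a purely additive biallelic locus in Hardy-Weinberg equilibrium, for which the variance contributed is 2p(1 - p)a², with p the allele frequency and a the effect per allele copy. The function name and the example numbers are hypothetical and not taken from the article.

```python
def variance_explained(allele_freq, effect, phenotypic_variance=1.0):
    """Fraction of phenotypic variance explained by one additive locus.

    Assumptions: biallelic locus, Hardy-Weinberg proportions, purely additive
    effect `effect` per allele copy (in the same units as the phenotype).
    """
    if not 0.0 <= allele_freq <= 1.0:
        raise ValueError("allele_freq must be between 0 and 1")
    locus_variance = 2.0 * allele_freq * (1.0 - allele_freq) * effect ** 2
    return locus_variance / phenotypic_variance


# Hypothetical loci, effects given in phenotypic standard deviations.
examples = {
    "rare, large effect": variance_explained(0.01, 1.0),
    "common, minor effect": variance_explained(0.40, 0.1),
    "common, moderate effect": variance_explained(0.40, 0.5),
}
for label, fraction in examples.items():
    print(f"{label}: {fraction:.4f}")
```

Under these assumptions the rare large-effect locus explains about 2% of the variance and the common minor-effect locus about 0.5%, both far below the 5% to 20% range simulated in the figure, while the common moderate-effect locus explains about 12%.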
So how can the detection of rare variants or minor-effect genes be improved? Options include increasing the sample size, restricting the genetic analysis to target regions, increasing genetic diversity, and reducing genetic background noise. Increasing the sample size alone, however, may not fully solve the rare-variant problem. A better approach is to treat several consecutive markers as a single composite marker, and using haplotypes as markers may become a trend in future GWAS research. For rare variants, QTL mapping in family-based populations may work better.
2 Sample size
Some traits are controlled by large-effect loci, and for these GWAS can detect significant loci with small samples, even fewer than 100 [26]. For complex traits controlled by several minor-effect genes, sample sizes of at least several thousand are required [32, 33]. In published plant and animal studies, GWAS sample sizes range from 100 to 5000 [2-28]. As the previous section showed, GWAS performs better when the population is large. A larger population also tends to show lower LD, so when marker density is sufficient the mapping interval is small, which helps gene cloning.
If the goal is not to map genes with small effects or low frequencies, GWASpower/QT software can help choose the population size. Given parameters such as heritability and the number of markers, it calculates the population size required to reach the expected detection power. As a rule of thumb, populations larger than 300 are generally recommended. Genetic variation among samples can be maximized by selecting lines with different geographical origins and phenotypes, though this may also introduce genetic heterogeneity.
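The relationship between variance explained, number of markers, and required population size can be approximated with a short calculation. The following is a rough sketch under stated assumptions, not the algorithm used by GWASpower/QT: it treats a single-marker test as a 1-degree-of-freedom test with noncentrality n × q / (1 - q), where q is the fraction of phenotypic variance the marker explains, and it uses a Bonferroni threshold of 0.05 divided by the number of markers. All function names and example values are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def single_marker_power(n, variance_explained, alpha):
    """Approximate power of a two-sided 1-df association test.

    Assumption: the test statistic is (Z + shift)^2 with Z standard normal and
    shift^2 = n * q / (1 - q), where q is the variance explained by the marker.
    """
    z_crit = _STD_NORMAL.inv_cdf(1.0 - alpha / 2.0)
    shift = sqrt(n * variance_explained / (1.0 - variance_explained))
    upper_tail = 1.0 - _STD_NORMAL.cdf(z_crit - shift)
    lower_tail = _STD_NORMAL.cdf(-z_crit - shift)
    return upper_tail + lower_tail


def required_sample_size(variance_explained, alpha, target_power=0.8,
                         n_max=1_000_000):
    """Smallest n (up to n_max) reaching target_power, or None if unreachable."""
    if single_marker_power(n_max, variance_explained, alpha) < target_power:
        return None
    low, high = 1, n_max  # power increases with n, so binary search applies
    while low < high:
        mid = (low + high) // 2
        if single_marker_power(mid, variance_explained, alpha) >= target_power:
            high = mid
        else:
            low = mid + 1
    return low


# Hypothetical scenario: 100,000 markers, Bonferroni-corrected 5% threshold.
alpha = 0.05 / 100_000
for q in (0.05, 0.10, 0.20):
    print(q, required_sample_size(q, alpha))
```

Because the approximation ignores population structure, LD between the marker and the causal variant, and polygenic background, it should be read as an optimistic lower bound rather than a design recommendation.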
3 Genetic heterogeneity
Genetic heterogeneity means that the same phenotype can be caused by different alleles or by mutations at different loci. It can be divided into allelic heterogeneity and locus heterogeneity.
Genetic heterogeneity reduces the power to detect variants because it weakens the association between the phenotype and any single variant, and it can make non-causal markers appear more strongly associated with the phenotype than the causal ones. One solution is to include competing variants as cofactors in the mixed model; another is to increase the sample size in areas with high phenotypic diversity.
a. Evolutionary tree; asterisks mark recent mutations that cause the phenotypic change (red fruit); b. The earlier mutation (blue) does not cause the change in pericarp color but is associated with it. [31]
4 Population structure
Population structure refers to significant differences in the frequency of the same allele among subgroups. Mixing populations with different genetic structures can also produce linkage disequilibrium.
If the sample comes from subgroups with different genetic structures, the mixed population will show linkage disequilibrium between two loci that is caused purely by population structure. For the genes that GWAS maps to the target trait, associations arising this way are false positives.
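The way mixing creates disequilibrium can be checked numerically. The sketch below assumes two subpopulations that are each in perfect linkage equilibrium and computes D in the pooled sample; the function names and allele frequencies are hypothetical. Under that assumption the pooled D equals m(1 - m)(pA1 - pA2)(pB1 - pB2), where m is the proportion of the sample drawn from subpopulation 1.

```python
def mixed_population_d(m, p_a1, p_b1, p_a2, p_b2):
    """D in a mixture of two subpopulations that are each in linkage equilibrium.

    m is the fraction of the sample from subpopulation 1; p_a*, p_b* are the
    frequencies of alleles A and B in each subpopulation.
    """
    # Within each subpopulation the AB gamete frequency is the product p_A * p_B.
    f_ab = m * p_a1 * p_b1 + (1.0 - m) * p_a2 * p_b2
    f_a = m * p_a1 + (1.0 - m) * p_a2
    f_b = m * p_b1 + (1.0 - m) * p_b2
    return f_ab - f_a * f_b


def mixed_population_d_closed_form(m, p_a1, p_b1, p_a2, p_b2):
    """Closed form of the same quantity: m(1 - m)(pA1 - pA2)(pB1 - pB2)."""
    return m * (1.0 - m) * (p_a1 - p_a2) * (p_b1 - p_b2)


# Hypothetical case: equal mixture, both alleles common in one group, rare in the other.
print(mixed_population_d(0.5, 0.9, 0.9, 0.1, 0.1))
# If allele frequencies do not differ at one of the loci, no spurious LD appears.
print(mixed_population_d(0.5, 0.9, 0.5, 0.1, 0.5))
```

The second call shows the condition for the artifact: spurious LD needs allele frequencies to differ between subgroups at both loci, which is exactly what the definition of population structure above describes.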
Some existing algorithms (such as mixed models) include population structure and kinship as covariates to reduce the effect of population structure on GWAS mapping results [35], which effectively reduces false-positive associations.
Accounting for population structure improves GWAS mapping results. The five dashed vertical lines mark the causal loci preset in the simulated data, each of which accounts for up to 10% of the phenotypic variation. a. General linear model results; b. Mixed linear model results. The former has more false positives and the latter performs better, but it still shows one false positive and one false negative. [31]
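A toy simulation illustrates why adding population structure as a covariate removes false positives. This is a deliberately simplified sketch, not a mixed linear model: subgroup membership is assumed to be known, it enters as a fixed effect only (implemented by centering genotype and trait within each subgroup), and no kinship matrix is used. The marker has no effect on the trait; all names and parameters are hypothetical.

```python
import random
from math import sqrt
from statistics import mean


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def simulate(n_per_group=2000, seed=1):
    """Two subgroups differing in allele frequency and trait mean.

    The marker is non-causal: the trait depends only on the subgroup.
    """
    rng = random.Random(seed)
    genotypes, traits, groups = [], [], []
    for group, (freq, trait_mean) in enumerate([(0.1, 0.0), (0.9, 2.0)]):
        for _ in range(n_per_group):
            # Diploid genotype coded as the number of copies of the allele (0, 1, 2).
            genotypes.append(sum(rng.random() < freq for _ in range(2)))
            traits.append(rng.gauss(trait_mean, 1.0))
            groups.append(group)
    return genotypes, traits, groups


def center_within_groups(values, groups):
    """Subtract each subgroup's mean, i.e. fit subgroup as a fixed covariate."""
    group_means = {
        k: mean(v for v, g in zip(values, groups) if g == k) for k in set(groups)
    }
    return [v - group_means[g] for v, g in zip(values, groups)]


genotypes, traits, groups = simulate()
naive_r = pearson(genotypes, traits)
adjusted_r = pearson(center_within_groups(genotypes, groups),
                     center_within_groups(traits, groups))
print(f"naive r = {naive_r:.3f}, structure-adjusted r = {adjusted_r:.3f}")
```

The naive correlation is large even though the marker is not causal, while the adjusted correlation is close to zero. The same adjustment would also shrink a true signal if the causal allele tracked the subgroups, which is the loss of power for structure-linked traits discussed in the text.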
Population structure greatly affects the results of GWAS analysis. Although several algorithms have been developed to remove its influence, some traits, such as flowering time in plants, are closely tied to population structure [6]. If population structure is controlled for, the power to detect such traits is reduced. Population structure can also be controlled during sample selection. For example, in [11] the population genetic characteristics of both indica and japonica rice were analyzed, but because the two differ significantly, only indica rice was used in the GWAS analysis.
Populations derived from multiple parents are a good choice. Researchers at Cornell University constructed a multi-parent NAM (Nested Association Mapping) population by crossing multiple parents to one common parent and then selfing the progeny for successive generations. Because the lines share a common parent as genetic background, the influence of population structure is broken [6]. A multi-parent derived population combines the advantages of linkage analysis and association analysis while overcoming their disadvantages, and is among the best population types for QTL mapping.
5 Insufficient marker density
For most phenotypes, PCR-based molecular markers such as SSR and SFLP, as well as existing SNP genotyping chips, may not include all causal variants, so the marker density may be too low to detect the causal loci in a GWAS.
However, because of linkage, a GWAS is still feasible with a modest number of markers as long as every LD block carries at least one marker. Moreover, with advances in sequencing technology, genome-wide data can be obtained for each sample, so marker density and marker type are no longer a limitation. Genome-wide SNPs, InDels and CNVs can all be used as markers for GWAS research.
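The condition that every LD block carries a marker can be checked with a simple coverage test. The sketch below makes a strong simplifying assumption that the text does not state: LD blocks are modeled as fixed-length, non-overlapping windows along a chromosome, whereas real block boundaries vary and would be estimated from the data. The function name, positions and block size are hypothetical.

```python
def uncovered_blocks(marker_positions, chromosome_length, block_size):
    """Return indices of fixed-size blocks that contain no marker.

    Assumption: LD blocks are approximated by consecutive windows of
    `block_size` base pairs; positions are 0-based base-pair coordinates.
    """
    if block_size <= 0 or chromosome_length <= 0:
        raise ValueError("block_size and chromosome_length must be positive")
    n_blocks = -(-chromosome_length // block_size)  # ceiling division
    covered = {
        pos // block_size
        for pos in marker_positions
        if 0 <= pos < chromosome_length
    }
    return [block for block in range(n_blocks) if block not in covered]


# Hypothetical 1 Mb chromosome, 100 kb blocks, five markers.
markers = [5_000, 150_000, 160_000, 420_000, 990_000]
gaps = uncovered_blocks(markers, chromosome_length=1_000_000, block_size=100_000)
print(gaps)  # blocks with no marker would need denser genotyping
```

In a species where LD decays quickly the blocks are short, so the same marker set leaves more blocks uncovered, which is why low-density marker systems can miss causal loci.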