Shulou (Shulou.com) 06/01 report. Updated 2025-04-07, from SLTechnology News&Howtos.
Many newcomers to GWAS are unsure what Manhattan plots and QQ plots actually tell them. This article explains both in detail.
In GWAS studies, the Manhattan plot and the QQ plot are the two most common types of figure. They clearly show the loci that are significantly associated with the trait under study (for example, height). Many readers probably know how to draw these plots, but I suspect not everyone really understands what they show.
The Manhattan plot is relatively simple. After a GWAS analysis, the p-values of all SNPs are plotted against their positions across the whole genome, chromosome by chromosome from left to right. To make the results more intuitive, each p-value is usually converted to -log10(p-value). The height of a locus on the Y axis then reflects the strength of its association with the phenotypic trait or disease: the stronger the association (i.e., the lower the p-value), the higher the point. In addition, because of linkage disequilibrium (LD), SNPs near a strongly associated site usually show similar signals that weaken gradually on both sides. This is why we see neat signal peaks on a Manhattan plot (such as the red part of the figure below), and the locations of these peaks are usually what the whole study really cares about.
Manhattan plot in a GWAS study
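The transformation described above can be sketched in a few lines of Python. This is a minimal illustration, not the article's own code: it assumes each SNP is a (chromosome, position, p-value) tuple with integer chromosome labels, approximates each chromosome's length by its largest SNP position (real tools usually use the actual chromosome lengths), and leaves the drawing itself to whatever plotting library you use. The function name manhattan_coords is hypothetical.

```python
import math
from collections import defaultdict


def manhattan_coords(snps):
    """Return (x, y) points for a Manhattan plot.

    snps: iterable of (chromosome, position, p_value) tuples, with integer
    chromosome labels and 0 < p_value <= 1.
    x is the SNP's position along the concatenated genome; y is -log10(p).
    """
    snps = list(snps)

    # Approximate each chromosome's length by the largest position seen on it.
    chrom_len = defaultdict(int)
    for chrom, pos, _ in snps:
        chrom_len[chrom] = max(chrom_len[chrom], pos)

    # Lay chromosomes end to end, left to right, in numerical order.
    offsets, running = {}, 0
    for chrom in sorted(chrom_len):
        offsets[chrom] = running
        running += chrom_len[chrom]

    # Smaller p-value -> larger -log10(p) -> taller point.
    return [(offsets[chrom] + pos, -math.log10(p)) for chrom, pos, p in snps]


# Hypothetical toy input: three SNPs on two chromosomes.
example = [(1, 100, 0.5), (1, 300, 1e-9), (2, 50, 0.01)]
for x, y in manhattan_coords(example):
    print(x, round(y, 2))
```

Coloring the points by chromosome and drawing a horizontal line at the chosen significance threshold gives the familiar look.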
In GWAS studies, the p-value threshold is generally 10^-6 or even 10^-8, which means that SNPs whose Y value on the Manhattan plot exceeds 6, or even 8, are worth studying further. (The widely used genome-wide significance level is 5×10^-8, about 7.3 on the Y axis.) There is nothing absolute about these cut-offs, however, and sometimes they depend on how your actual data behave.
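The Y-axis cut-offs follow directly from the -log10 transform. The 5×10^-8 level is commonly explained as a Bonferroni correction: a family-wise error rate of 0.05 divided by roughly one million independent tests across the genome. The sketch below simply assumes that figure of one million; the function names are hypothetical.

```python
import math


def bonferroni_threshold(alpha, n_tests):
    """Per-test p-value threshold that keeps the family-wise error rate at alpha."""
    return alpha / n_tests


def to_y_axis(p):
    """Convert a p-value threshold to the -log10 scale used on the Manhattan plot."""
    return -math.log10(p)


# Assumption: about one million independent tests, the figure usually quoted
# for the 5e-8 genome-wide threshold (not a property of any particular dataset).
genome_wide = bonferroni_threshold(0.05, 1_000_000)

for p in (1e-6, 1e-8, genome_wide):
    print(f"p < {p:.0e}  ->  y > {to_y_axis(p):.2f}")
```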
Incidentally, the name "Manhattan plot" comes from its resemblance to the high-rise skyline of Manhattan, New York, lit up at night and reflected in the river (pictured below).
Night view of Manhattan
The QQ plot uses the same data as the Manhattan plot above, but it conveys much more information. Of the two, the QQ plot is the one that better reflects the quality of GWAS results, which makes it the more important quality-control plot in GWAS research. It is the main subject of this article.
The QQ plot has in fact long been a common graph in statistical analysis. A 1968 Biometrika paper by M. B. Wilk and R. Gnanadesikan (doi:10.1093/biomet/55.1.1) described how to construct such plots and what they can be used for.
QQ plot is short for quantile-quantile plot. It is a probability-plotting method, widely used in statistics, that compares two probability distributions by plotting their quantiles against each other. The idea is that if the two distributions are the same, their quantiles are equal, so the points fall on the same straight line (the diagonal y = x).
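To make the idea concrete, here is a small sketch that pairs sorted sample values with theoretical quantiles, using Python's standard-library statistics.NormalDist as the reference distribution. The function name qq_pairs and the (i + 0.5)/n plotting position are choices made for this illustration; other plotting positions are also in common use.

```python
from statistics import NormalDist


def qq_pairs(sample, dist):
    """Pair each sorted sample value with the matching theoretical quantile of dist.

    Uses the plotting position (i + 0.5) / n, one common convention among several.
    Returns a list of (theoretical, observed) tuples.
    """
    n = len(sample)
    observed = sorted(sample)
    theoretical = [dist.inv_cdf((i + 0.5) / n) for i in range(n)]
    return list(zip(theoretical, observed))


std_normal = NormalDist(0, 1)

# A sample that matches the standard normal exactly: its points lie on y = x.
same = [std_normal.inv_cdf((i + 0.5) / 9) for i in range(9)]
# The same sample shifted by +2: its points lie on the parallel line y = x + 2.
shifted = [v + 2 for v in same]

for (t, o1), (_, o2) in zip(qq_pairs(same, std_normal), qq_pairs(shifted, std_normal)):
    print(f"theoretical {t:+.2f}   same {o1:+.2f}   shifted {o2:+.2f}")
```

A distribution that differs in shape, not just location, would bend the points away from any straight line rather than shifting them.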
In a GWAS analysis, even when the Manhattan plot shows strong association signals between some SNPs and the trait (or disease), for example p-value < 10^-6 or even < 10^-8, we still cannot directly conclude that these loci are truly associated with the phenotype. This is because genetic variation in the genome is shaped mainly by two forces:
The first is natural selection. Here natural selection means not only the process Darwin described in his theory of evolution, but every "force" that affects how well a species is adapted, such as high-radiation environments, diseases, viruses and so on. Variants shaped in this way are what we really care about in GWAS research.
The second is genetic drift, i.e. random changes in allele frequencies, which affect a large number of variants. Drift is also an important force in evolution, but because it acts at random, the changes it causes are not considered to be necessarily related to changes in the environment, although a variant that happens by chance to confer a survival advantage can still come to play a role in the population. In the vast majority of cases, however, such variants are not considered to contribute significantly to traits that are already stably present in the population, so GWAS studies are not interested in them and we want to exclude them. If you find that all of your results are variants of this kind, you should reconsider the design of the analysis: whether to increase the sample size, how to eliminate technical errors and confounding factors, or whether the genotypes and the trait are simply unrelated.
Strong association signals that arise from genetic drift pose a problem for GWAS: they cannot be identified and ruled out directly (they are usually hard to spot on the Manhattan plot), and you cannot even tell whether your study is full of this kind of uninformative signal. So the question is: how can we effectively determine whether the associations found in a study are genuinely related to the phenotypic trait or disease?
This is where the QQ plot comes in. In a GWAS QQ plot, the vertical axis shows the p-values of the SNPs (the actual, observed results), expressed as -log10(p-value) as in the Manhattan plot, while the horizontal axis shows probabilities from a uniform distribution (the expected results), also converted to -log10. How are these horizontal-axis values calculated? They are simply the quantiles of a uniform distribution; why the uniform distribution is used rather than some other distribution is explained below. The number of quantiles equals the number of SNPs tested in the GWAS. For example, if the study tests 5 million loci, there are also 5 million quantiles: 1/5,000,000, 2/5,000,000, 3/5,000,000, and so on all the way up to 5,000,000/5,000,000. The observed p-values are sorted from smallest to largest and paired with these quantiles in the same order, so the smallest p-value is plotted against 1/5,000,000.
QQ plot in a GWAS study
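A minimal sketch of how the QQ coordinates described above can be computed, using the i/N uniform quantiles from the text. The function name gwas_qq_points and the four p-values are hypothetical, and every p-value is assumed to be greater than 0.

```python
import math


def gwas_qq_points(pvalues):
    """Return (expected, observed) pairs on the -log10 scale for a GWAS QQ plot.

    Expected values are the uniform quantiles i/N (i = 1..N); the smallest
    observed p-value is paired with 1/N, the next with 2/N, and so on.
    """
    n = len(pvalues)
    observed = sorted(pvalues)  # smallest p-value first
    # log10(n / i) equals -log10(i / n); writing it this way avoids -0.0 at i = n.
    return [(math.log10(n / i), -math.log10(p))
            for i, p in enumerate(observed, start=1)]


# Hypothetical p-values for four SNPs.
for exp, obs in gwas_qq_points([0.5, 0.01, 0.2, 0.001]):
    print(f"expected {exp:.3f}   observed {obs:.3f}")
```

Many tools use slightly different plotting positions, such as (i - 0.5)/N or i/(N + 1), which avoid an expected value of exactly 1; the differences are visible only at the extremes of the plot.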
Once we have the QQ plot, how do we use it to judge whether our GWAS results are good or bad?
Strictly speaking, the question is not whether the results are good or bad, but whether they reflect a real association with the phenotypic trait.
The key to the judgment lies in why the horizontal axis uses a uniform distribution rather than some other distribution. The null hypothesis of a GWAS test is that a SNP has no association with the trait, that is, it is no different from a random variant, and when the null hypothesis holds, p-values are uniformly distributed between 0 and 1. The uniform distribution can therefore be used to approximate the effect of random drift across the genome. If the trait is not really influenced by selection at any of the tested loci, the GWAS p-values will follow the uniform distribution and the points will lie along a straight line. If it is, the two distributions should visibly separate, with the separation growing as the p-value gets smaller, so that the QQ plot curves upward.
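The uniformity claim can be checked with a small simulation. This sketch assumes each SNP's association test yields a standard-normal z statistic (a simplification; real GWAS tests vary) and draws every statistic from N(0, 1), so the null hypothesis holds for all SNPs. The seed, the SNP count and the function names are hypothetical.

```python
import math
import random


def two_sided_p(z):
    """Two-sided p-value for a standard-normal test statistic: 2 * (1 - Phi(|z|))."""
    return math.erfc(abs(z) / math.sqrt(2))


def simulate_null_pvalues(n_snps, seed=2024):
    """Hypothetical null GWAS: every SNP's test statistic is pure noise, N(0, 1)."""
    rng = random.Random(seed)
    return [two_sided_p(rng.gauss(0.0, 1.0)) for _ in range(n_snps)]


null_p = simulate_null_pvalues(20_000)

# Under a uniform distribution, a fraction t of the p-values falls below t.
for t in (0.5, 0.1, 0.05, 0.01):
    frac = sum(p < t for p in null_p) / len(null_p)
    print(f"P(p < {t}) ~ {frac:.4f}   (uniform expectation {t})")
```

Because the observed fractions track the thresholds, the sorted null p-values line up with the uniform quantiles, which is exactly what makes the QQ points fall on the diagonal.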
Moreover, we know that there is always random drift in the genome, so many loci will reflect nothing but drift. In particular, sites with larger p-values should behave like random drift: in the first (lower-left) part of the QQ plot, their points should overlap the uniform distribution. A better result is one where, once the p-value drops below 10^-3, the GWAS results begin to separate rapidly from the uniform distribution. The effect of natural selection then shows clearly, the results quickly break away from population-level randomness, and the QQ plot ends in a sharply rising tail. At that point we can conclude that there is a significant association between genotype and phenotype. Conversely, if the points leave the line right from the start, the whole p-value distribution is inflated, which should prompt a check for technical errors or confounding factors such as population structure.
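A rough way to express this check in code: split the QQ points at an expected p-value of 10^-3 (the cut-off used in the paragraph above, treated here as an adjustable parameter) and measure how far the observed values move away from the diagonal on each side. The simulated data, the signal strengths and the helper names are hypothetical; this is an illustration, not a standard GWAS quality-control statistic.

```python
import math
import random


def two_sided_p(z):
    """Two-sided p-value for a standard-normal test statistic."""
    return math.erfc(abs(z) / math.sqrt(2))


def qq_deviation(pvalues, split_p=1e-3):
    """Gap between observed and expected -log10(p) on either side of split_p.

    Returns (bulk, tail): bulk is the largest absolute gap where the expected
    p-value exceeds split_p; tail is the largest upward gap (0 if none) where
    the expected p-value is at or below split_p.
    """
    n = len(pvalues)
    bulk = tail = 0.0
    for i, p in enumerate(sorted(pvalues), start=1):
        gap = -math.log10(p) - math.log10(n / i)  # observed minus expected
        if i / n > split_p:
            bulk = max(bulk, abs(gap))
        else:
            tail = max(tail, gap)
    return bulk, tail


rng = random.Random(7)
null_z = [rng.gauss(0.0, 1.0) for _ in range(20_000)]
# Hypothetical true associations with large test statistics.
signal_z = [5.5, 6.0, 7.0, 8.0]
pvals = [two_sided_p(z) for z in null_z + signal_z]

bulk_gap, tail_gap = qq_deviation(pvals)
print(f"bulk max |gap| = {bulk_gap:.2f}, tail max gap = {tail_gap:.2f}")
```

A small bulk gap with a large tail gap corresponds to the "good" QQ plot described above; a large bulk gap would correspond to points leaving the line right from the start.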