2025-03-28 Update From: SLTechnology News&Howtos
Shulou(Shulou.com)06/01 Report--
This article explains in detail how to interpret the results of a GSEA analysis, in the hope of giving readers who face this problem a simple and practical approach.
When interpreting the results of a traditional enrichment analysis, one question comes up often: an enriched pathway contains both up-regulated and down-regulated differential genes, so how is the pathway behaving overall? Is it inhibited or activated? Put more bluntly, did gene expression in this pathway rise or fall after the experimental treatment?
Here is my view. Traditional enrichment analysis needs only a list of differential genes and does not care whether those genes are up-regulated or down-regulated. The algorithm ignores the direction of change entirely; at its core it only asks whether the number of differential genes that fall in a pathway is consistent with what random sampling would give. Even when we later mark the up- and down-regulated genes in different colors on the pathway map, that is only visualization. Because there is no statistical method here for analyzing the overall trend of all the differential genes in the pathway, the results of traditional enrichment analysis cannot answer the question above.
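To make the point concrete, here is a minimal sketch of the kind of over-representation test that traditional enrichment analysis relies on, written as a one-sided hypergeometric tail (the same quantity as a one-sided Fisher's exact test). The function name and the numbers in the usage lines are illustrative assumptions, not taken from any particular tool. Notice that the direction of expression change never appears among the inputs.

```python
from math import comb

def hypergeom_enrichment_p(n_background, n_pathway, n_deg, n_overlap):
    """Probability of seeing at least n_overlap pathway genes when n_deg genes
    are drawn at random from n_background genes, n_pathway of which belong
    to the pathway."""
    total = comb(n_background, n_deg)
    upper = min(n_pathway, n_deg)
    tail = sum(
        comb(n_pathway, k) * comb(n_background - n_pathway, n_deg - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total

# Illustrative numbers only: 20000 background genes, a pathway of 100 genes,
# 500 differential genes, 10 of which fall in the pathway.
p_value = hypergeom_enrichment_p(20000, 100, 500, 10)
print(p_value)
```

The only inputs are four counts, so an up-regulated gene and a down-regulated gene contribute to the result in exactly the same way.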
Of course, some people quickly came up with a workaround: run the traditional enrichment analysis on only the up-regulated or only the down-regulated differential genes. Because the genes have been screened by direction of change in advance, the question above seems to go away. In my opinion, this is biased. Fisher's exact test is used to show that the list of differential genes is not what random sampling would give; filtering the list by direction beforehand interferes with that randomness, and the reliability of the final conclusion drops considerably.
Imagine that the up-regulated and down-regulated genes are analyzed separately and both turn out to be enriched in the same pathway. How would you explain that? So in my view, traditional enrichment analysis can only locate function, that is, tell us which functions the differential genes are related to; it cannot answer the initial question. To answer it, we need the results of the GSEA enrichment method.
Consider the schematic diagram again. The input of GSEA is a gene expression matrix in which the samples are divided into two groups, A and B. First, all genes are ranked. The ranking metric was covered in the previous article; it can be understood simply as a fold change that expresses the trend of each gene's expression between the two groups. Genes at the top of the ranked list can be seen as up-regulated differential genes, and genes at the bottom as down-regulated differential genes.
GSEA analyzes whether the genes of a gene set are concentrated at the top or at the bottom of the ranked list. If they are concentrated at the top, we can say that the gene set as a whole trends upward; if they are concentrated at the bottom, it trends downward.
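The idea of "concentrated at the top or bottom" can be sketched as a running sum over the ranked list: the sum goes up at every gene that belongs to the gene set and down at every gene that does not, and the enrichment score is the largest deviation from zero. The code below is a simplified illustration under stated assumptions (the function name, the `weight` parameter and the toy genes are mine), not the GSEA software's implementation, and it omits the permutation step used for significance.

```python
def running_enrichment_score(ranked_genes, scores, gene_set, weight=1.0):
    """Return (running_es, es) for one gene set.

    ranked_genes: genes sorted from most up-regulated to most down-regulated
    scores: ranking metric of each gene, in the same order
    gene_set: set of gene names
    weight: exponent applied to |score| for genes in the set (assumed default 1)
    """
    in_set = [gene in gene_set for gene in ranked_genes]
    hit_weights = [abs(s) ** weight if hit else 0.0 for s, hit in zip(scores, in_set)]
    total_hit = sum(hit_weights)
    n_miss = len(ranked_genes) - sum(in_set)
    if total_hit == 0 or n_miss == 0:
        raise ValueError("gene set must overlap the ranked list without covering all of it")

    running_es = []
    value = 0.0
    for w, hit in zip(hit_weights, in_set):
        # Step up in proportion to the gene's weight on a hit, step down evenly on a miss.
        value += w / total_hit if hit else -1.0 / n_miss
        running_es.append(value)

    # The enrichment score is the running value farthest from zero, sign included.
    es = max(running_es, key=abs)
    return running_es, es

# Toy ranked list (hypothetical genes and scores).
genes = ["g1", "g2", "g3", "g4", "g5", "g6"]
metric = [3.0, 2.0, 1.0, -1.0, -2.0, -3.0]
_, es_top = running_enrichment_score(genes, metric, {"g1", "g2"})
_, es_bottom = running_enrichment_score(genes, metric, {"g5", "g6"})
print(es_top, es_bottom)
```

A gene set sitting at the top of the list yields a positive score and one sitting at the bottom yields a negative score, which is exactly the direction information that traditional enrichment analysis lacks.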
With this in mind, let's look at the results of a GSEA enrichment analysis. Because there are so many results, a summary html page is provided. The enrichment results are split into two parts according to whether the gene sets are up-regulated or down-regulated, corresponding to the two groups. An example is as follows:
Overall, the gene sets enriched in a group are highly expressed in that group. Click enrichment results in html to view the enrichment results as a web page. An example is as follows:
GS is the name of the gene set and SIZE is the total number of genes in the gene set. ES is the enrichment score and NES is the normalized enrichment score. NOM p-val is the p-value, which indicates the credibility of the enrichment result, and FDR q-val is the q-value, that is, the p-value after correction for multiple hypothesis testing. Note that GSEA uses p-value < 5% and q-value < 25% to filter the results.
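If the result table is loaded into a script, the two thresholds can be applied as in the sketch below. The rows, gene set names and values are made up for illustration, and representing each row as a dictionary keyed by the column titles is my assumption about how the table was parsed.

```python
def significant_gene_sets(rows, max_p=0.05, max_fdr=0.25):
    """Return the names of gene sets passing both the p-value and q-value cutoffs."""
    return [
        row["GS"]
        for row in rows
        if row["NOM p-val"] < max_p and row["FDR q-val"] < max_fdr
    ]

# Hypothetical rows, not real GSEA output.
rows = [
    {"GS": "SET_A", "SIZE": 40, "NOM p-val": 0.001, "FDR q-val": 0.05},
    {"GS": "SET_B", "SIZE": 25, "NOM p-val": 0.030, "FDR q-val": 0.40},
    {"GS": "SET_C", "SIZE": 90, "NOM p-val": 0.200, "FDR q-val": 0.20},
]
print(significant_gene_sets(rows))
```

Only the first row passes here: the second fails the q-value cutoff and the third fails the p-value cutoff.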
Click GS DESC to jump to the detailed results page of each gene set. The example is as follows
First comes a summary. Upregulated in class shows that the gene set is highly expressed in the MUT group; the other fields are the same as before. There is also a detailed table. An example is as follows:
Detailed statistics are given for each gene in the gene set. RANK IN GENE LIST is the position of the gene in the ranked list; RANK METRIC SCORE is the value of the ranking metric for the gene, such as the fold change; RUNNING ES is the cumulative enrichment score; and CORE ENRICHMENT indicates whether the gene is a core gene, that is, a gene that contributes most to the enrichment score of the gene set. The data in this table corresponds to the following figure:
The figure has three parts. The first part is the line chart of the enrichment score: the horizontal axis is each gene in the ranked list and the vertical axis is the corresponding running ES. The line has a peak; the peak value is the enrichment score of this gene set, and the members of the gene set that come before the peak are its core genes.
The second part is the hits: each gene in the gene set is marked with a line. The third part is the distribution of the ranking metric across all genes; the metric is Signal2Noise by default, which matches the title of the vertical axis.
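As a rough illustration of a signal-to-noise ranking, the sketch below computes the difference of the two group means divided by the sum of the two group standard deviations, then sorts genes by that value. This is a simplified form: the GSEA software applies further adjustments to the standard deviations that are omitted here, and the function names, gene names and expression values are assumptions for the example.

```python
from statistics import mean, stdev

def signal_to_noise(group_a, group_b):
    """(mean_A - mean_B) / (sd_A + sd_B) for one gene.

    Needs at least two samples per group and a non-zero combined deviation.
    """
    return (mean(group_a) - mean(group_b)) / (stdev(group_a) + stdev(group_b))

def rank_genes(expression):
    """expression maps gene -> (values in group A, values in group B).
    Returns (gene, score) pairs sorted from highest to lowest score."""
    scored = [(gene, signal_to_noise(a, b)) for gene, (a, b) in expression.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# Hypothetical expression values for three genes.
expression = {
    "GENE_FLAT": ([5, 6, 7], [5, 6, 7]),
    "GENE_DOWN": ([4, 5, 6], [10, 12, 14]),
    "GENE_UP": ([10, 12, 14], [4, 5, 6]),
}
ranked = rank_genes(expression)
print([gene for gene, _ in ranked])
```

Genes higher in group A end up at the top of the list and genes higher in group B end up at the bottom, which is the ordering the third part of the figure displays.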
As the figure shows, this gene set is highly expressed in the MUT group. Here is an example of a gene set that is highly expressed in the other group.
Here the enrichment score values are all negative, and the members of the gene set to the right of the peak are its core genes. In addition, there is a heat map, as shown below:
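The two cases, core genes before a positive peak and core genes after a negative peak, can be written as one small function. This is a sketch under my own naming; it takes a precomputed running ES as input, and the toy values are illustrative rather than real output.

```python
def core_enrichment_genes(ranked_genes, in_set, running_es):
    """Pick the gene-set members on the leading side of the peak.

    ranked_genes: genes in ranked order
    in_set: True/False per position, whether the gene belongs to the gene set
    running_es: running enrichment score per position
    """
    # Position where the running score is farthest from zero.
    peak = max(range(len(running_es)), key=lambda i: abs(running_es[i]))
    if running_es[peak] >= 0:
        # Positive score: members up to and including the peak.
        positions = range(0, peak + 1)
    else:
        # Negative score: members after the peak.
        positions = range(peak + 1, len(ranked_genes))
    return [ranked_genes[i] for i in positions if in_set[i]]

genes = ["g1", "g2", "g3", "g4", "g5", "g6"]

# Toy case with a positive peak at the second position.
core_up = core_enrichment_genes(
    genes, [True, True, False, False, False, True], [0.4, 0.8, 0.5, 0.2, -0.1, 0.0]
)
# Toy case with a negative peak at the fourth position.
core_down = core_enrichment_genes(
    genes, [True, False, False, False, True, True], [0.1, -0.2, -0.5, -0.8, -0.4, 0.0]
)
print(core_up, core_down)
```

In both toy cases one member of the gene set lies on the far side of the peak and is therefore left out of the core genes.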
This heat map shows the expression of the genes in the gene set across all samples. Each column is a sample and each row is a gene; as expression goes from low to high, the color goes from blue to red.
In the overall html page, the following information is also given
Dataset details gives the total number of genes, and Gene Set details gives information about the gene sets. Note that the software first filters the gene sets by the number of genes they contain, with a minimum of 15 and a maximum of 500; the remaining 168 gene sets are used for the analysis.
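The size filter can be sketched as follows. I assume here that a gene set's size is counted after dropping genes that are absent from the expression dataset; the function name and the toy gene sets are hypothetical.

```python
def filter_gene_sets(gene_sets, dataset_genes, min_size=15, max_size=500):
    """Keep gene sets whose overlap with the dataset has min_size..max_size genes.

    gene_sets: dict mapping gene set name -> iterable of gene names
    dataset_genes: set of gene names present in the expression dataset
    """
    kept = {}
    for name, members in gene_sets.items():
        present = set(members) & dataset_genes
        if min_size <= len(present) <= max_size:
            kept[name] = present
    return kept

# Hypothetical dataset of 1000 genes and three gene sets of different sizes.
dataset_genes = {f"gene{i}" for i in range(1000)}
gene_sets = {
    "TOO_SMALL": [f"gene{i}" for i in range(10)],
    "JUST_RIGHT": [f"gene{i}" for i in range(100)],
    "TOO_LARGE": [f"gene{i}" for i in range(600)],
}
kept = filter_gene_sets(gene_sets, dataset_genes)
print(sorted(kept))
```

With the default limits only the 100-gene set survives; the 10-gene and 600-gene sets are dropped before any enrichment score is computed.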
Gene markers gives the ranked gene list together with the corresponding ranking statistic (rank ordered gene list). Based on the ranking statistic, the genes are divided into two parts, each highly expressed in one of the groups.
Heatmap and gene list contains a heat map of the expression of all genes and a plot of the distribution of the ranking values, as shown below:
Because there are too many genes, the heat map shows only part of them. The distribution plot of the ranking values is the same as the third part of each gene set's enrichment plot.
That is how to interpret the results of a GSEA analysis in detail. I hope the above is of some help to you.