What is the method of using GSEA software?

2025-01-16 Update From: SLTechnology News&Howtos

Shulou(Shulou.com)06/01 Report--

How do you use the GSEA software? This article walks through the answer in detail, in the hope of helping readers who face this question find a simpler way to solve it.

Gene Set Enrichment Analysis (GSEA) is an enrichment algorithm proposed by scientists at the Broad Institute. The core of the algorithm is as follows.

Two inputs are required. The first is a ranked gene list, sorted by a measure of the difference between the two groups, such as fold change. The second is an annotated collection of gene sets. GSEA then runs a Kolmogorov-Smirnov (KS) style test to calculate an Enrichment Score (ES) and uses a permutation test to assess the reliability of the ES.
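The running-sum idea behind the ES can be sketched in a few lines of Python. This is a simplified, unweighted illustration, not the GSEA implementation: the function names `enrichment_score` and `permutation_p_value` are hypothetical, the statistic ignores the weighting by ranking metric, and the permutation step shuffles gene order rather than sample labels.

```python
import random


def enrichment_score(ranked_genes, gene_set):
    """Unweighted KS-style running sum over a ranked gene list.

    Walk down the list, stepping up at each gene in the set and down
    otherwise; the ES is the largest deviation from zero.
    """
    members = set(gene_set) & set(ranked_genes)
    n_hit = len(members)
    n_miss = len(ranked_genes) - n_hit
    if n_hit == 0 or n_miss == 0:
        return 0.0
    running = 0.0
    best = 0.0
    for gene in ranked_genes:
        if gene in members:
            running += 1.0 / n_hit
        else:
            running -= 1.0 / n_miss
        if abs(running) > abs(best):
            best = running
    return best


def permutation_p_value(ranked_genes, gene_set, n_perm=1000, seed=0):
    """Estimate how often a shuffled ranking gives an ES at least as extreme."""
    rng = random.Random(seed)
    observed = enrichment_score(ranked_genes, gene_set)
    shuffled = list(ranked_genes)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if abs(enrichment_score(shuffled, gene_set)) >= abs(observed):
            extreme += 1
    # Add-one correction keeps the estimate strictly above zero.
    return observed, (extreme + 1) / (n_perm + 1)


if __name__ == "__main__":
    ranked = [f"g{i}" for i in range(1, 11)]  # hypothetical gene IDs
    es, p = permutation_p_value(ranked, {"g1", "g2", "g3"}, n_perm=200)
    print(es, p)
```

A gene set concentrated at the top of the ranking yields an ES near 1, and one concentrated at the bottom yields an ES near -1.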

Scientists at the Broad Institute also provide the corresponding analysis software, GSEA, a graphical application written in Java. It is easy to use and can be downloaded at the following address.

http://software.broadinstitute.org/gsea/downloads.jsp

The official website offers several download options. Downloading the jar file directly is recommended.

To run a GSEA analysis, you need two basic inputs. The first is expression profile data, which can be microarray data or RNA-seq quantification results. The second is a gene set database. The official website provides the MSigDB database for human, and you can also define gene sets yourself.

In practice, the first step is to import data. Four types of files can be imported. As on the Windows platform, each file format is identified by a specific file extension.

1. Expression datasets

The expression file, which can hold microarray data or RNA-seq quantification results, has the extension gct.

The file is tab-separated plain text. The first line always gives the format version. The second line gives the dimensions of the expression matrix: the first value is the number of probes or genes, and the second is the number of samples. The third line is the header of the expression matrix. The first two columns are always NAME and Description: NAME is the gene ID or probe ID and must be unique, and Description holds descriptive information, filled with na if there is none. Each subsequent column corresponds to one sample.
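As an illustration of this layout, the following Python sketch writes a small gct file. The helper name `write_gct`, the sample names, and the gene IDs are hypothetical, and the version string `#1.2` is assumed to be the GCT version in use.

```python
import csv
import io


def write_gct(handle, sample_names, rows):
    """Write rows of (name, description, values) as tab-separated GCT text."""
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    writer.writerow(["#1.2"])                         # line 1: format version (assumed 1.2)
    writer.writerow([len(rows), len(sample_names)])   # line 2: genes/probes, samples
    writer.writerow(["NAME", "Description", *sample_names])  # line 3: header
    seen = set()
    for name, description, values in rows:
        if name in seen:
            raise ValueError(f"duplicate NAME: {name}")
        if len(values) != len(sample_names):
            raise ValueError(f"{name}: expected {len(sample_names)} values")
        seen.add(name)
        # An empty description is filled with "na".
        writer.writerow([name, description or "na", *values])


if __name__ == "__main__":
    buffer = io.StringIO()
    write_gct(
        buffer,
        ["S1", "S2"],
        [("GENE_A", "na", [1.5, 2.0]), ("GENE_B", "", [3.0, 4.5])],
    )
    print(buffer.getvalue(), end="")
```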

2. Phenotype labels

The sample grouping file has the extension cls.

The first line contains three values separated by spaces or tabs. The first value is the total number of samples, the second is the number of groups the samples fall into, and the third is always 1.

The second line starts with # and lists the names of the groups. Each field in the third line represents one sample, in the same order as the samples in the expression file, except that each sample is represented by the name of its group.
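The three lines of a cls file can be generated from a list of per-sample group names. This is a minimal sketch: `format_cls` is a hypothetical helper, the group names are made up, and it assumes group names contain no spaces.

```python
def format_cls(sample_groups):
    """Build cls text from group names listed in expression-file sample order."""
    # Unique group names in order of first appearance.
    group_names = list(dict.fromkeys(sample_groups))
    header = f"{len(sample_groups)} {len(group_names)} 1"  # third value is always 1
    names_line = "# " + " ".join(group_names)
    labels_line = " ".join(sample_groups)
    return "\n".join([header, names_line, labels_line]) + "\n"


if __name__ == "__main__":
    print(format_cls(["ctrl", "ctrl", "treat", "treat"]), end="")
```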

3. Gene sets

Gene set files are available in several formats, including gmt and gmx. The gmt format is described first.

Each row represents one gene set. The first column is the name of the gene set, which must be unique. The second column is the description, filled with na if there is none. The remaining columns list the genes in the set. Columns are separated by tabs.
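A row-oriented file like this is straightforward to read. The sketch below parses gmt lines into a dictionary. `parse_gmt` is a hypothetical helper and the set names are invented for the example.

```python
def parse_gmt(lines):
    """Parse gmt lines into {set_name: (description, [genes])}."""
    gene_sets = {}
    for line in lines:
        line = line.rstrip("\r\n")
        if not line.strip():
            continue  # skip blank lines
        fields = line.split("\t")
        if len(fields) < 2:
            raise ValueError(f"expected name and description: {line!r}")
        name, description = fields[0], fields[1]
        genes = [gene for gene in fields[2:] if gene]
        if name in gene_sets:
            raise ValueError(f"duplicate gene set name: {name}")
        gene_sets[name] = (description, genes)
    return gene_sets


if __name__ == "__main__":
    example = ["SET_A\tna\tTP53\tEGFR\n", "SET_B\tna\tMYC\n"]
    print(parse_gmt(example))
```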

In contrast to gmt, each column in gmx represents one gene set. The first row holds the gene set name, which must be unique. The second row holds the description, filled with na if there is none. The remaining rows list the genes in the set.
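Since the column-oriented layout is the transpose of the row-oriented one, converting between them amounts to transposing with padding. A minimal sketch, where `gmt_rows_to_gmx_rows` is a hypothetical helper and shorter gene sets are assumed to be padded with empty cells:

```python
from itertools import zip_longest


def gmt_rows_to_gmx_rows(gmt_rows):
    """Transpose gmt rows (one set per row) into gmx rows (one set per column)."""
    return [list(column) for column in zip_longest(*gmt_rows, fillvalue="")]


if __name__ == "__main__":
    gmt_rows = [
        ["SET_A", "na", "TP53", "EGFR"],
        ["SET_B", "na", "MYC"],
    ]
    for row in gmt_rows_to_gmx_rows(gmt_rows):
        print("\t".join(row))
```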

4. Chip annotation

When microarray data is provided, you can import a chip file, which stores the mapping between probes and genes and has the extension chip.

The first column is the probe ID, with the header Probe_Set_ID. The second is the gene corresponding to the probe, with the header Gene Symbol. The third is a description of the probe, filled with na if there is none.
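The probe-to-gene mapping can be loaded with a few lines of Python. This sketch reads columns by position rather than by header name, and assumes a tab-separated file. `read_chip`, the probe IDs, and the gene names are hypothetical.

```python
import csv
import io


def read_chip(handle):
    """Return {probe_id: gene_symbol} from tab-separated chip annotation text."""
    reader = csv.reader(handle, delimiter="\t")
    next(reader, None)  # skip the header row
    probe_to_gene = {}
    for row in reader:
        if len(row) < 2:
            continue  # ignore blank or incomplete rows
        probe_to_gene[row[0]] = row[1]
    return probe_to_gene


if __name__ == "__main__":
    text = (
        "Probe_Set_ID\tGene Symbol\tDescription\n"
        "probe_1\tGENE_A\tna\n"
        "probe_2\tGENE_B\tna\n"
    )
    print(read_chip(io.StringIO(text)))
```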

Use Load Data to import the files above into the software, then click the Run GSEA menu and select the corresponding files.

The Phenotype labels setting specifies the direction of the comparison between groups, that is, which group is the control group.

As mentioned above, GSEA needs two inputs: a ranked gene list and a collection of gene sets. Once the expression data and grouping information are imported, GSEA automatically calculates a difference metric between the groups and ranks the genes by it. The following metrics are supported.

1. Signal2Noise

2. tTest

3. Ratio_of_Classes

4. Diff_of_Classes

5. log2_Ratio_of_Classes

The default metric is Signal2Noise, which can be changed in Basic fields.
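For intuition, the five metrics can be written out for a single gene using textbook formulas. This is a sketch under stated assumptions, not the GSEA code: `rank_metrics` is a hypothetical helper, the t statistic uses the unequal-variance form, and any adjustments the software applies to the standard deviations are omitted.

```python
import math
from statistics import mean, stdev


def rank_metrics(group_a, group_b):
    """Compute simplified per-gene ranking metrics for two groups of values.

    Assumes at least two samples per group, a nonzero spread, and
    positive means (needed for the ratio and log2 ratio).
    """
    mu_a, mu_b = mean(group_a), mean(group_b)
    sd_a, sd_b = stdev(group_a), stdev(group_b)
    n_a, n_b = len(group_a), len(group_b)
    return {
        "signal2noise": (mu_a - mu_b) / (sd_a + sd_b),
        "t_test": (mu_a - mu_b) / math.sqrt(sd_a**2 / n_a + sd_b**2 / n_b),
        "ratio_of_classes": mu_a / mu_b,
        "diff_of_classes": mu_a - mu_b,
        "log2_ratio_of_classes": math.log2(mu_a / mu_b),
    }


def rank_genes(scores):
    """Sort gene IDs by score, highest first, to form the ranked list."""
    return sorted(scores, key=scores.get, reverse=True)


if __name__ == "__main__":
    metrics = rank_metrics([8.0, 10.0, 12.0], [4.0, 5.0, 6.0])
    for name, value in metrics.items():
        print(name, round(value, 3))
```

Computing one of these values for every gene and sorting with `rank_genes` gives a ranked list of the kind the enrichment step consumes.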

When all the parameters are set, click the Run button below to run.

That covers the basic usage of the GSEA software. I hope the content above is of some help.
