How to use limma method of R language to analyze the differential expression of chip data

Updated: 2025-04-03. Source: SLTechnology News&Howtos (shulou)

This article explains how to use the limma package in R to analyze differential expression in microarray (chip) data. It walks through the procedure on a real dataset; the steps are short and practical.

Data introduction and setup

For demonstration, six samples from GSE2600 (the human promyelocytic leukemia cell line NB4) are used, and the input for the analysis is the downloaded expression matrix file. Before starting, make sure limma is installed and loads correctly, and set the working directory.

library(limma)
workdir = "F:/GEO/20180520"
setwd(workdir)

Data processing

1. Expression matrix

The data set contains six samples. After reading the file, use head() to take a quick look at the data.

# read.csv2() expects semicolon-separated fields and decimal commas; use read.csv() for a comma-separated file
> exprSet = read.csv2("GSE2600expressionMatrix.csv", header = TRUE, row.names = 1, check.names = FALSE)
> head(exprSet, 3)
          GSM49939 GSM49940 GSM49941 GSM49942 GSM49943 GSM49944
1007_s_at     23.0     13.8     26.5     75.9     94.9     84.6
1053_at     1449.9   1826.7   2242.8   1508.8   1523.0   2355.5
117_at       109.2     71.5    106.7    128.8     84.1     79.6
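As a small self-contained illustration of the reading step, the sketch below writes a three-probe excerpt of the matrix to a temporary file and reads it back. The temporary file, the names toy and toy_mat, and the choice of read.csv() (comma-separated input) are assumptions made for the demo, not part of the original workflow.

```r
# Toy excerpt of the expression matrix (values copied from the article's head() output).
csv_lines <- c(
  ",GSM49939,GSM49940,GSM49941,GSM49942,GSM49943,GSM49944",
  "1007_s_at,23.0,13.8,26.5,75.9,94.9,84.6",
  "1053_at,1449.9,1826.7,2242.8,1508.8,1523.0,2355.5",
  "117_at,109.2,71.5,106.7,128.8,84.1,79.6"
)

# Hypothetical stand-in for the downloaded file.
tmp <- tempfile(fileext = ".csv")
writeLines(csv_lines, tmp)

# row.names = 1 turns the probe IDs into row names;
# check.names = FALSE keeps the sample IDs exactly as written in the header.
toy <- read.csv(tmp, header = TRUE, row.names = 1, check.names = FALSE)

# Downstream modelling works on a numeric matrix (probes in rows, samples in columns).
toy_mat <- as.matrix(toy)
all_numeric <- is.numeric(toy_mat)

unlink(tmp)
```

Checking is.numeric() after the conversion is a cheap way to catch a wrong separator or decimal mark, because a misread file typically yields a character matrix.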

Next, check the overall distribution of the expression matrix, for example by drawing a box plot with boxplot(). Expression matrices downloaded from GEO have usually already been normalized. The box plot shows that the distributions of these samples (median, upper and lower quartiles, and so on) are broadly similar. Note that limma assumes log-scale expression values, whereas the values printed above (for example 1449.9) appear to be on the original intensity scale; such data are normally log2-transformed before fitting. The following code writes the box plot to a PDF file:

n.sample = ncol(exprSet)
cols = rainbow(n.sample)
pdf(file = paste(workdir, "/", "Probe_expressionDistribution.pdf", sep = ""), width = 24, height = 18)
par(cex = 0.7)
if (n.sample > 40) par(cex = 0.5)
boxplot(exprSet, col = cols, main = "expression", las = 2)
dev.off()
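The distribution check can also be done numerically. The sketch below is a minimal illustration on three probes copied from the head() output. The threshold of 100 used to guess whether the data are still on the raw intensity scale is a hypothetical rule of thumb, and the names toy_expr, looks_raw and toy_log2 are made up for this example.

```r
# Three probes copied from the article's head() output.
toy_expr <- matrix(
  c(23.0, 13.8, 26.5, 75.9, 94.9, 84.6,
    1449.9, 1826.7, 2242.8, 1508.8, 1523.0, 2355.5,
    109.2, 71.5, 106.7, 128.8, 84.1, 79.6),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("1007_s_at", "1053_at", "117_at"),
                  paste0("GSM", 49939:49944))
)

# Assumed heuristic: log2 intensities rarely exceed about 20, so a much
# larger maximum suggests the values have not been log-transformed.
looks_raw <- max(toy_expr, na.rm = TRUE) > 100

# Apply log2 only when the data look untransformed (all toy values are positive).
toy_log2 <- if (looks_raw) log2(toy_expr) else toy_expr

# Per-sample medians, the same summary the box plot displays as the centre line.
sample_medians <- apply(toy_log2, 2, median)
```

If the medians differ strongly between samples after the transformation, a between-array normalization step may be worth considering before fitting the model.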

2. Grouping matrix

Once the expression matrix has been checked, group the samples using the sample treatment information saved at download time; here the groups are CONTROL and INFECTED. After tidying, the grouping information looks roughly as follows, and the grouping (design) matrix is built from it:

> group
         Treatment
GSM49939   CONTROL
GSM49940   CONTROL
GSM49941   CONTROL
GSM49942  INFECTED
GSM49943  INFECTED
GSM49944  INFECTED
> design = model.matrix(~ Treatment + 0, group)
> colnames(design) = levels(as.factor(c("CONTROL", "INFECTED")))
> design
         CONTROL INFECTED
GSM49939       1        0
GSM49940       1        0
GSM49941       1        0
GSM49942       0        1
GSM49943       0        1
GSM49944       0        1
attr(,"assign")
[1] 1 1
attr(,"contrasts")
attr(,"contrasts")$Treatment
[1] "contr.treatment"
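The grouping step can be reproduced without any input file. The sketch below rebuilds the group table by hand and derives the design matrix from it using only base R. Setting the factor levels explicitly and comparing the row names against the sample IDs are additions for illustration; sample_ids is a hypothetical stand-in for colnames(exprSet).

```r
# Grouping information typed in by hand, matching the table in the article.
group <- data.frame(
  Treatment = c("CONTROL", "CONTROL", "CONTROL",
                "INFECTED", "INFECTED", "INFECTED"),
  row.names = paste0("GSM", 49939:49944),
  stringsAsFactors = FALSE
)

# Explicit levels fix the column order of the design matrix.
group$Treatment <- factor(group$Treatment, levels = c("CONTROL", "INFECTED"))

# "~ 0 + Treatment" drops the intercept, giving one indicator column per group.
design <- model.matrix(~ 0 + Treatment, data = group)
colnames(design) <- levels(group$Treatment)

# Assumed sanity check: the design rows must line up with the expression columns.
sample_ids <- paste0("GSM", 49939:49944)  # stand-in for colnames(exprSet)
rows_match <- identical(rownames(design), sample_ids)
```

Taking the column names from the factor levels, rather than from a separately typed vector, keeps the labels tied to the order that model.matrix() actually used.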

3. Difference comparison matrix

The contrast matrix (cont.matrix) is built from the design matrix. As its printed form shows, the comparison is INFECTED versus CONTROL, that is, INFECTED minus CONTROL.

> cont.matrix = makeContrasts(INFECTED-CONTROL, levels = design)
> cont.matrix
          Contrasts
Levels     INFECTED-CONTROL
  CONTROL                -1
  INFECTED                1
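To see what the contrast coefficients (-1, 1) compute, the sketch below applies them by hand to a single probe using base R only. It is not how limma is implemented internally; it only illustrates that, with a no-intercept design, the least-squares coefficients are the group means and the contrast is their difference. The names y, contrast and diff_estimate are made up for this example.

```r
# Expression values of probe 1007_s_at across the six samples (from the article).
y <- c(23.0, 13.8, 26.5, 75.9, 94.9, 84.6)

# Design matrix without intercept: one indicator column per group.
design <- cbind(CONTROL  = c(1, 1, 1, 0, 0, 0),
                INFECTED = c(0, 0, 0, 1, 1, 1))

# Contrast coefficients as printed in cont.matrix.
contrast <- c(CONTROL = -1, INFECTED = 1)

# Least-squares fit; with this design the coefficients are the two group means.
group_means <- lm.fit(design, y)$coefficients

# Applying the contrast gives mean(INFECTED) - mean(CONTROL) for this probe.
diff_estimate <- sum(contrast * group_means)
```

For this probe the difference is about 64.03 on the scale of the input values, which is a difference of raw intensities here rather than a log2 fold change.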

Differential expression analysis

Differential expression analysis relies mainly on lmFit(), contrasts.fit(), eBayes(), and topTable(); the main results are then extracted into tT.

> fit = lmFit(exprSet, design)
> fit2 = contrasts.fit(fit, cont.matrix)
> fit2 = eBayes(fit2, 0.01)
> tT = topTable(fit2, adjust = "fdr", sort.by = "logFC", resort.by = "P", n = Inf)
> tT = subset(tT, select = c("adj.P.Val", "P.Value", "logFC"))
> head(tT, 15)
             adj.P.Val      P.Value      logFC
223020_at      0.99964 2.196175e-05  746.10000
1555758_a_at   0.99964 6.467722e-05 -540.53333
218676_s_at    0.99964 1.352768e-04 -280.86667
237249_at      0.99964 2.669173e-04  -93.53333
225100_at      0.99964 2.836527e-04 -124.96667
217825_s_at    0.99964 2.903446e-04 -143.73333
222099_s_at    0.99964 3.425427e-04  493.13333
212634_at      0.99964 4.221452e-04 -166.06667
211499_s_at    0.99964 4.391776e-04 -129.56667
221098_x_at    0.99964 4.805746e-04   95.16667
208974_x_at    0.99964 5.060448e-04  947.76667
209670_at      0.99964 5.113338e-04  374.20000
202088_at      0.99964 5.262646e-04 -594.40000
219394_at      0.99964 5.307063e-04 -117.56667
212221_x_at    0.99964 5.393084e-04  347.43333
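A results table like tT is usually filtered next. The sketch below shows, on made-up numbers, how the "fdr" adjustment relates to raw p-values and how a typical filter could be written in base R. The p-values, the logFC values, and the cut-offs (adjusted p below 0.05, absolute logFC above 1) are all assumptions for illustration, and a logFC cut-off of this kind only makes sense for log2-scale data.

```r
# Hypothetical raw p-values for five probes (not taken from the article).
p_raw <- c(0.0001, 0.004, 0.02, 0.03, 0.5)

# "fdr" is an alias for the Benjamini-Hochberg method in p.adjust().
p_bh  <- p.adjust(p_raw, method = "BH")
p_fdr <- p.adjust(p_raw, method = "fdr")

# Toy results table with the same three columns as tT.
tT_toy <- data.frame(
  adj.P.Val = p_bh,
  P.Value   = p_raw,
  logFC     = c(2.1, -1.7, 0.4, -3.0, 0.2),  # made-up log2 fold changes
  row.names = paste0("probe", 1:5)
)

# Assumed cut-offs: adjusted p-value below 0.05 and at least a two-fold change.
sig <- subset(tT_toy, adj.P.Val < 0.05 & abs(logFC) > 1)
```

Applied to the table above, where every listed probe has an adjusted p-value of 0.99964, a filter on adj.P.Val below 0.05 would return no rows.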
