
How to Use GEOquery and Biobase to Download Multiple Types of Data from the GEO Database

Updated 2025-01-18. Source: SLTechnology News&Howtos (Shulou), Development


Shulou (Shulou.com), 06/01 report:

This article shows how to use the R package GEOquery, together with Biobase, to download the different kinds of data stored in the GEO database. It is shared as a reference for readers who are not yet familiar with the workflow.

GEO (Gene Expression Omnibus) is a gene expression database maintained by NCBI. It mainly accepts gene expression data generated by high-throughput sequencing and by microarrays (gene chips), which makes it convenient to reuse other researchers' data in your own publications.

The first step of GEO data mining is downloading the data. Downloading through the website involves a lot of querying and searching, and the downloaded files can be hard to interpret. The R package GEOquery solves this problem. The steps below focus on microarray data and show how to complete the download with GEOquery.

GEO data

Before downloading, you need to understand the four types of records stored in the GEO database: GSE, GDS, GSM, and GPL.

A GSE Accession corresponds to the series of data from an entire research project, which may involve different platforms.

A GDS Accession corresponds to a dataset from a single platform.

A GSM Accession corresponds to the data of a single sample, which always comes from a single platform. A GSE or GDS usually contains multiple GSM records.

A GPL Accession corresponds to the information for one platform.
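The four prefixes can be told apart mechanically. The sketch below is only an illustration in base R: `geo_accession_type()` is a hypothetical helper, not a GEOquery function, and it assumes an accession is a three-letter prefix followed by digits.

```r
# Hypothetical helper (not part of GEOquery): classify GEO accessions by prefix.
geo_accession_type <- function(acc) {
  acc <- toupper(trimws(acc))
  types <- c(GSE = "series", GDS = "dataset", GSM = "sample", GPL = "platform")
  # Assumption: a valid accession is one of the four prefixes followed by digits.
  ok <- grepl("^(GSE|GDS|GSM|GPL)[0-9]+$", acc)
  out <- rep(NA_character_, length(acc))
  out[ok] <- unname(types[substr(acc[ok], 1, 3)])
  names(out) <- acc
  out
}

# Accessions used later in this article, plus one invalid string.
geo_accession_type(c("GSE70213", "GDS5881", "GSM1720833", "GPL6246", "XYZ1"))
```

Anything that does not match the assumed pattern comes back as NA, which makes it easy to catch typos before calling getGEO().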

R package installation and loading

GEOquery

## biocLite() was retired with Bioconductor 3.8; BiocManager is the current installer
if (!requireNamespace("BiocManager", quietly = TRUE)) install.packages("BiocManager")
BiocManager::install("GEOquery")

Biobase

## biocLite() was retired with Bioconductor 3.8; BiocManager is the current installer
if (!requireNamespace("BiocManager", quietly = TRUE)) install.packages("BiocManager")
BiocManager::install("Biobase")

Loading the packages

library(Biobase)
library(GEOquery)
setwd("F:/GEO") # set the working directory if necessary
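If loading fails, a quick check of which packages are actually installed can save some guessing. This is a minimal sketch using only base R; the variable names are illustrative.

```r
# Report which of the two packages are not installed, without attaching them.
pkgs <- c("GEOquery", "Biobase")
installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
missing_pkgs <- pkgs[!installed]

if (length(missing_pkgs) > 0) {
  message("Not installed: ", paste(missing_pkgs, collapse = ", "))
} else {
  message("GEOquery and Biobase are both available.")
}
```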

Using GSE Accession

After finding a GSE Accession of interest in the literature, use the getGEO() function in GEOquery to download its series_matrix.txt file, which holds the expression data, together with the platform information. For example, GSE70213:

> gse = getGEO("GSE70213", GSEMatrix = TRUE, destdir = ".", getGPL = TRUE, AnnotGPL = TRUE) # destdir = "." saves to the current directory; getGPL = TRUE and AnnotGPL = TRUE also download the platform annotation file

The returned gse is a list with one ExpressionSet per platform; because all samples in this series come from a single platform, its length is 1. You can then use exprs(), pData(), and fData() from the Biobase package to obtain the expression data, the sample treatment and grouping information, and the probe annotation of the chip platform, respectively. The annotation() function returns the corresponding GPL Accession. For example, with exprs():

> exprSet = exprs(gse[[1]])
> head(exprSet, 2)
         GSM1720833 GSM1720834 GSM1720835 GSM1720836 GSM1720837 GSM1720838 GSM1720839 GSM1720840 GSM1720841 GSM1720842
10338001 2041.40800 2200.86100  2323.7600 3216.26300 2362.77500 2195.31800 2013.35900 2146.25800  1785.9460 2067.04100
10338002   63.78059   65.08438    58.3082   75.86145   66.95605   43.81526   49.11361   51.29279    48.9604   42.14286
         GSM1720843 GSM1720844 GSM1720845 GSM1720846 GSM1720847 GSM1720848 GSM1720849 GSM1720850 GSM1720851 GSM1720852
10338001  1769.1150 1720.77400 1847.42900 2214.69800 2279.51500 2530.45600 2303.26400 2358.83400 1701.40000 1970.92400
10338002    42.5472   43.48373   64.34628   59.75188   57.48852   60.26423   54.81179   53.70885   57.86877   57.02808
         GSM1720853 GSM1720854 GSM1720855 GSM1720856
10338001 1822.78600 2014.26000 1737.84200 2001.73400
10338002   59.26121   55.27306   54.36722   49.43959

The annotation information maps each probe to its gene, which is convenient for follow-up analysis. The data obtained through exprs(), pData(), and fData() can be saved to files with functions such as write.table().
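As a rough illustration of the saving step, the sketch below writes a small matrix to a tab-separated file and reads it back. The matrix is a toy stand-in for exprs(gse[[1]]) built from the first values shown in this section, and the file name is hypothetical.

```r
# Toy stand-in for exprs(gse[[1]]): probes in rows, samples in columns.
expr_toy <- matrix(
  c(2041.408, 63.78059, 2200.861, 65.08438),
  nrow = 2,
  dimnames = list(c("10338001", "10338002"), c("GSM1720833", "GSM1720834"))
)

# Hypothetical output file in a temporary directory.
out_file <- file.path(tempdir(), "GSE70213_expr.txt")

# col.names = NA writes an empty header cell above the row names,
# so the file can be read back with row.names = 1.
write.table(expr_toy, file = out_file, sep = "\t", quote = FALSE, col.names = NA)

# Read the file back and convert it to a numeric matrix again.
expr_back <- as.matrix(read.table(out_file, header = TRUE, sep = "\t",
                                  row.names = 1, check.names = FALSE))
```

The same write.table() call should work for the data frames returned by pData() and fData(), although quote = TRUE may be safer there because annotation fields can contain tabs or quotes.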

Using GDS Accession

GDS data can also be downloaded, as SOFT files, with the getGEO() function. For example, GDS5881:

> gds = getGEO("GDS5881", GSEMatrix = TRUE, destdir = ".", getGPL = TRUE, AnnotGPL = TRUE) # destdir = "." saves to the current directory

For the returned gds object, Table() from the GEOquery package returns the expression data and Meta() returns the descriptive metadata; Meta(gds)$platform gives the GPL Accession.

> exprSet = Table(gds)
> head(exprSet, 1)
    ID_REF IDENTIFIER GSM1720845 GSM1720846 GSM1720847 GSM1720848 GSM1720849 GSM1720850 GSM1720851 GSM1720852 GSM1720853
1 10344614     Gm2889    48.4971     47.252    39.3331    49.9048    36.8313    41.9501    37.5569    38.1924    46.0668
  GSM1720854 GSM1720855 GSM1720856
1     34.689    38.5762    32.2618
> Meta(gds)$platform
[1] "GPL6246"

The GDS object returned by getGEO() can be converted to an ExpressionSet with GDS2eSet() or to a limma MAList with GDS2MA(), both from the GEOquery package.

> gds2eSet = GDS2eSet(gds)
> MA = GDS2MA(gds)

For the returned gds2eSet, the expression data, the sample treatment and grouping information, and the probe annotation of the chip platform can again be obtained with exprs(), pData(), and fData(). The returned MA holds a large amount of descriptive information; among its components, MA$targets also contains the sample treatment information.

Using GSM Accession

Downloading with a GSM Accession retrieves the expression data of a single sample, for example GSM1720833:

> gsm = getGEO("GSM1720833", GSEMatrix = TRUE, destdir = ".", getGPL = TRUE, AnnotGPL = TRUE)

For gsm, the expression data is again obtained with Table() from the GEOquery package and the descriptive metadata with Meta(). The corresponding GSE Accession and GPL Accession are available as Meta(gsm)$series_id and Meta(gsm)$platform_id.
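When several samples are downloaded one by one, their tables usually need to be combined into a single matrix. The sketch below shows one way to do that in base R. `combine_gsm_tables()` is a hypothetical helper, the toy tables reuse values shown earlier in this article, and the column names ID_REF and VALUE are an assumption about what Table(gsm) returns for this kind of sample.

```r
# Hypothetical helper: merge per-sample tables (assumed columns ID_REF and VALUE)
# into one probe-by-sample matrix, keeping only probes present in every sample.
combine_gsm_tables <- function(tables) {
  ids <- Reduce(intersect, lapply(tables, function(tb) as.character(tb$ID_REF)))
  vals <- vapply(
    tables,
    function(tb) as.numeric(tb$VALUE[match(ids, as.character(tb$ID_REF))]),
    numeric(length(ids))
  )
  matrix(vals, nrow = length(ids), dimnames = list(ids, names(tables)))
}

# Toy stand-ins for Table(gsm); note the different probe order in the second sample.
toy <- list(
  GSM1720833 = data.frame(ID_REF = c(10338001, 10338002),
                          VALUE = c(2041.408, 63.78059)),
  GSM1720834 = data.frame(ID_REF = c(10338002, 10338001),
                          VALUE = c(65.08438, 2200.861))
)
combined <- combine_gsm_tables(toy)

# With network access, the input list could be built along these lines:
# accs <- c("GSM1720833", "GSM1720834")
# tables <- lapply(setNames(nm = accs), function(a) Table(getGEO(a, destdir = ".")))
```

Matching on ID_REF rather than on row position guards against samples whose tables list the probes in a different order.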

Using GPL Accession

For a chip platform, the data downloaded with a GPL Accession is the design and annotation information of the chip, from which the correspondence between probe sets and genes can be obtained. Use the Table() function to display the annotation, for example for GPL6246:

> gpl = getGEO("GPL6246", GSEMatrix = TRUE, destdir = ".", getGPL = TRUE, AnnotGPL = TRUE)
> ann = Table(gpl)
> head(ann, 2)
        ID          Gene title Gene symbol   Gene ID UniGene title UniGene symbol UniGene ID
1 10344614 predicted gene 2889      Gm2889 100040658
2 10344616
  Nucleotide Title
1 Mus musculus blastocyst blastocyst cDNA RIKEN full-length enriched library, clone:I1C0009C06 product:hypothetical DeoxyUTP pyrophosphatase/Aspartyl protease, retroviral-type family profile/Retrovirus capsid, C-terminal/Peptidase aspartic/Peptidase aspartic, active site containing protein, full insert sequence///Mus musculus blastocyst blastocyst cDNA, RIKEN full-length enriched library, clone:I1C0042P10 product:hypothetical protein Full insert sequence
2
                   GI   GenBank Accession Platform_CLONEID Platform_ORF      Platform_SPOTID Chromosome location
1 74211482///74217103 AK145513///AK145782                               chr1:3054233-3054733                  18
2                                                                       chr1:3102016-3102125
  Chromosome annotation GO:Function GO:Process GO:Component GO:Function ID GO:Process ID GO:Component ID
1         Chromosome 18
2
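The probe-to-gene correspondence can be turned into a simple lookup vector. The sketch below is an illustration in base R: `probe_to_symbol()` is a hypothetical helper, the toy table only mimics the "ID" and "Gene symbol" columns of Table(gpl), and its third row (probe 10344620 with symbols GeneA///GeneB) is made up to show how a multi-gene entry might be handled.

```r
# Toy stand-in for Table(gpl); the third row is invented for illustration.
ann_toy <- data.frame(
  ID = c("10344614", "10344616", "10344620"),
  `Gene symbol` = c("Gm2889", "", "GeneA///GeneB"),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# Hypothetical helper: named vector mapping probe ID -> gene symbol.
probe_to_symbol <- function(ann, id_col = "ID", symbol_col = "Gene symbol") {
  sym <- as.character(ann[[symbol_col]])
  # Design choice for this sketch: keep only the first symbol when a probe
  # lists several separated by "///".
  sym <- sub("///.*$", "", sym)
  # Probes without annotation become NA so they can be filtered out later.
  sym[!nzchar(sym)] <- NA_character_
  setNames(sym, as.character(ann[[id_col]]))
}

lookup <- probe_to_symbol(ann_toy)
annotated <- lookup[!is.na(lookup)]
```

Keeping only the first symbol is one of several reasonable choices; dropping multi-gene probes entirely is another, and which one suits an analysis depends on the downstream method.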
