What Are the Principles of the GEO Database Architecture

2025-04-03 Update From: SLTechnology News&Howtos shulou

Shulou(Shulou.com)06/01 Report--

In this issue, the editor explains the principles behind the GEO database architecture, covering each record type in turn. I hope you find it useful.

GEO (Gene Expression Omnibus) is an international public repository, hosted by NCBI, that lets researchers submit their own data and share it publicly with the rest of the world.

At first, the database was mainly used to share microarray (chip) data. Later, as NGS technology developed, it also began accepting high-throughput sequencing data.

In this database, all records are divided into the following categories.

1. Platform

A Platform record describes a microarray (chip) platform or a sequencing platform. Every platform, whether chip or high-throughput sequencing, has a unique number that begins with GPL.

Different sequencing platforms are defined by the combination of sequencer and species. Chip platforms are described next.

A chip platform record gives probe-related information, such as the gene each probe corresponds to and the probe sequence.
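As a minimal sketch of how such a probe annotation table can be used, the snippet below builds a probe-to-gene lookup from a tab-delimited table. The table excerpt, the probe IDs, and the column names `ID`, `Gene Symbol`, and `SEQUENCE` are illustrative assumptions; the actual columns differ from platform to platform.

```python
import csv
import io

# Hypothetical excerpt of a platform (GPL) annotation table.
# Real tables have many more columns, and column names vary by platform.
PLATFORM_TABLE = (
    "ID\tGene Symbol\tSEQUENCE\n"
    "probe_001\tGeneA\tACGTACGTAC\n"
    "probe_002\tGeneB\tTTGACCGTAA\n"
    "probe_003\tGeneA\tGGCATTCAGT\n"
)


def load_probe_annotation(text):
    """Return {probe_id: gene_symbol} from a tab-delimited annotation table."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    return {row["ID"]: row["Gene Symbol"] for row in reader}


probe_to_gene = load_probe_annotation(PLATFORM_TABLE)
print(probe_to_gene["probe_002"])
```

Note that several probes can map to the same gene, which is why the lookup is keyed by probe ID rather than by gene symbol.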

2. Sample

A Sample record holds the data for one sample, which can come from any platform, and has a unique number that begins with GSM. For chip data, the record gives the expression value of each probe.
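A chip sample's data table can be read as a simple probe-to-value mapping. The sketch below assumes a two-column, tab-delimited table with the headers `ID_REF` and `VALUE`; the probe IDs and numbers are made up for illustration, and real sample tables may carry extra columns.

```python
import csv
import io

# Hypothetical excerpt of a sample (GSM) data table; values are invented.
SAMPLE_TABLE = (
    "ID_REF\tVALUE\n"
    "probe_001\t7.25\n"
    "probe_002\t10.5\n"
    "probe_003\t\n"
)


def load_sample_values(text):
    """Return {probe_id: expression value}, using None for empty values."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    values = {}
    for row in reader:
        raw = (row["VALUE"] or "").strip()
        values[row["ID_REF"]] = float(raw) if raw else None
    return values


sample_values = load_sample_values(SAMPLE_TABLE)
print(sample_values)
```

Empty cells are kept as `None` rather than dropped, so that a probe with a missing measurement is still distinguishable from a probe that is absent from the table.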

For high-throughput sequencing data, the files provided depend on the data type. If the raw sequencing data was uploaded to the SRA database, the corresponding SRA number is also given.

3. Series

A Series record represents a group of samples that belong to the same experimental design and has a unique number that begins with GSE. It usually provides a compressed archive of the supplementary files for all samples in the series.
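For readers who want to locate those archives, here is a hedged sketch that builds the supplementary-file directory URL for a series. It assumes the GEO FTP layout in which series are grouped into buckets named after the accession with its last three digits replaced by `nnn`; the root URL and the helper name `series_suppl_url` are assumptions to check against the GEO documentation, and not every series necessarily has a `suppl` directory.

```python
import re

# Assumed root of the GEO series tree; verify against the GEO documentation.
GEO_FTP_ROOT = "https://ftp.ncbi.nlm.nih.gov/geo/series"


def series_suppl_url(accession):
    """Build the assumed supplementary-file directory URL for a GSE accession."""
    match = re.fullmatch(r"GSE(\d+)", accession)
    if match is None:
        raise ValueError(f"not a series accession: {accession!r}")
    digits = match.group(1)
    # Bucket name: drop the last three digits and append 'nnn',
    # e.g. GSE3541 -> GSE3nnn, GSE12 -> GSEnnn.
    bucket = "GSE" + digits[:-3] + "nnn"
    return f"{GEO_FTP_ROOT}/{bucket}/{accession}/suppl/"


print(series_suppl_url("GSE3541"))
```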

The three kinds of records above are provided by the data submitter. For the original data within a series, GEO itself performs some simple mining, such as cluster analysis based on expression levels. The results of these analyses are stored as a DataSet record, which has a unique number that begins with GDS. GDS2225 is one example.
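The four accession prefixes described so far (GPL, GSM, GSE, GDS) are enough to tell record types apart. The helper below is a small sketch of that idea; the function name `classify_accession` is hypothetical and not part of any GEO tool.

```python
import re

# Prefix-to-record-type mapping taken from the categories described above.
RECORD_TYPES = {
    "GPL": "Platform",
    "GSM": "Sample",
    "GSE": "Series",
    "GDS": "DataSet",
}


def classify_accession(accession):
    """Return (record_type, number) for a GEO accession such as 'GDS2225'."""
    match = re.fullmatch(r"(GPL|GSM|GSE|GDS)(\d+)", accession.strip().upper())
    if match is None:
        raise ValueError(f"unrecognized GEO accession: {accession!r}")
    prefix, number = match.groups()
    return RECORD_TYPES[prefix], int(number)


for acc in ("GDS2225", "GSE3541"):
    print(acc, classify_accession(acc))
```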

GDS2225 is based on the GSE3541 series, a set of rat chip data. The samples are divided into two groups, case and control, with 3 replicates each, and GEO provides clustering results for them based on expression.

From the expression data in a DataSet, GEO also derives Profile records, each of which shows the expression level of a single probe or gene across all samples in the DataSet.
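Conceptually, a profile is one row of the expression matrix read across all samples. The sketch below illustrates this with a made-up matrix that mirrors a two-group design with three replicates each; the sample names, probe IDs, and values are invented and do not come from GDS2225.

```python
from statistics import mean

# Hypothetical design: 3 case and 3 control samples (all values invented).
SAMPLES = ["case_1", "case_2", "case_3", "control_1", "control_2", "control_3"]
GROUPS = {name: name.split("_")[0] for name in SAMPLES}

# Expression matrix: one row per probe, one column per sample.
EXPRESSION = {
    "probe_001": [8.0, 8.5, 9.0, 5.0, 5.5, 6.0],
    "probe_002": [4.0, 4.0, 4.5, 4.5, 4.0, 4.0],
}


def profile(probe_id):
    """Return the expression of one probe across all samples."""
    return dict(zip(SAMPLES, EXPRESSION[probe_id]))


def group_means(probe_id):
    """Average a probe's profile within each experimental group."""
    by_group = {}
    for sample, value in profile(probe_id).items():
        by_group.setdefault(GROUPS[sample], []).append(value)
    return {group: mean(values) for group, values in by_group.items()}


print(profile("probe_001"))
print(group_means("probe_001"))
```

Comparing group means is only a quick way to read a profile; it is not a substitute for a proper differential expression test.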

Data sharing makes data mining on public databases possible. By analyzing existing data of the same type, we can also cross-check the results from our own sequencing data.

These are the principles of the GEO database architecture shared by the editor. If you have had similar questions, the analysis above may help you understand them.
