Shulou, 2025-04-03 update. From: SLTechnology News&Howtos
Shulou(Shulou.com)06/01 Report--
In this issue, the editor explains the principles of the GEO database architecture. The article is rich in content and analyzes the topic from a professional point of view. I hope you get something out of it.
GEO (Gene Expression Omnibus) is an international public repository, hosted by NCBI, that allows researchers to submit their own data to the database and share it publicly with the world.
Initially, the database was used mainly to share chip (microarray) data; later, with the development of NGS (next-generation sequencing) technology, it also began accepting high-throughput sequencing data.
All information in this database is organized into the following record types, as shown below.
1. Platform
A platform is either a chip platform or a sequencing platform. Each platform has a unique accession number beginning with GPL. A high-throughput sequencing platform is shown below.
Sequencing platforms are defined by the combination of sequencer and species, so the same sequencer used on different species yields different platforms. A chip platform is shown below.
A chip platform record provides probe-related information, such as the corresponding gene and the probe sequence, as shown below.
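Conceptually, a chip platform's annotation acts as a lookup table from probe ID to probe information. The minimal sketch below assumes a simplified in-memory table; the `ProbeAnnotation` class, probe IDs, gene symbols and sequences are hypothetical placeholders, not real GEO records or GEO's actual platform table columns.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProbeAnnotation:
    """One row of a simplified, hypothetical chip platform table."""
    probe_id: str
    gene_symbol: str
    sequence: str


def build_platform_index(rows):
    """Index platform rows by probe ID so each probe can be looked up directly."""
    return {row.probe_id: row for row in rows}


# Hypothetical placeholder rows; real GPL tables carry many more columns.
platform_rows = [
    ProbeAnnotation("probe_1", "GENE_A", "ACGTACGTAC"),
    ProbeAnnotation("probe_2", "GENE_B", "TTGACCATGA"),
]

index = build_platform_index(platform_rows)
print(index["probe_2"].gene_symbol)  # GENE_B
```

With such an index, a probe-level expression value from a sample can be attached to its gene by looking up the probe ID.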
2. Sample
A Sample record holds the data for a single sample, which can be generated on any platform; each Sample has a unique accession number beginning with GSM. For chip data, the record gives the expression value of each probe, as shown below.
For high-throughput sequencing data, different kinds of files are provided depending on the data type. If the raw sequencing data were uploaded to the SRA database, the corresponding SRA accession is also given, as shown below.
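The two kinds of Sample content described above can be sketched as one record with optional fields. This is a simplified assumption for illustration, not GEO's schema: the field names and the `_EXAMPLE` accessions are hypothetical, and the SRA link is modelled as an optional string that is present only when raw reads were deposited.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Sample:
    """Simplified, hypothetical model of a GEO Sample (GSM) record."""
    accession: str  # GSM accession
    platform: str   # GPL accession of the platform the sample was run on
    # Chip data: probe ID -> expression value.
    probe_values: Dict[str, float] = field(default_factory=dict)
    # Sequencing data: supplementary file names (hypothetical).
    files: List[str] = field(default_factory=list)
    # SRA accession, present only if raw reads were deposited in SRA.
    sra_accession: Optional[str] = None

    def has_raw_reads(self) -> bool:
        """True if the sample links to raw sequencing data in SRA."""
        return self.sra_accession is not None


chip = Sample("GSM_EXAMPLE_1", "GPL_EXAMPLE_1",
              probe_values={"probe_1": 7.2, "probe_2": 5.9})
seq = Sample("GSM_EXAMPLE_2", "GPL_EXAMPLE_2",
             files=["sample_counts.txt.gz"], sra_accession="SRX_EXAMPLE")

print(chip.has_raw_reads())  # False
print(seq.has_raw_reads())   # True
```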
3. Series
A Series groups samples that belong to the same experimental design and has a unique accession number beginning with GSE. It usually provides compressed archives of the supplementary files of all samples in the series, as shown below.
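The relationships among Platform, Sample and Series can be sketched as a small object model. This is an illustrative assumption, not GEO's internal schema; the class names and `_EXAMPLE` accessions are hypothetical. It assumes each Sample references one Platform, while a Series may combine samples from more than one platform, so the helper returns a set.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Platform:
    accession: str  # begins with GPL


@dataclass
class Sample:
    accession: str      # begins with GSM
    platform: Platform  # the platform this sample was measured on


@dataclass
class Series:
    accession: str  # begins with GSE
    samples: List[Sample] = field(default_factory=list)

    def platforms(self) -> Set[str]:
        """Return the GPL accessions used by the samples in this series."""
        return {s.platform.accession for s in self.samples}


gpl = Platform("GPL_EXAMPLE")
series = Series("GSE_EXAMPLE", [Sample("GSM_EXAMPLE_1", gpl),
                                Sample("GSM_EXAMPLE_2", gpl)])
print(series.platforms())  # {'GPL_EXAMPLE'}
```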
These three record types are supplied by the data submitter. For the original data within a series, GEO performs some simple mining, such as clustering analysis based on expression levels. The results of these analyses are stored as a DataSet record, which has a unique accession number beginning with GDS. GDS2225 is shown below as an example.
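Because every record type has its own accession prefix (GPL, GSM, GSE, GDS), the type of a record can be read from its accession alone. The sketch below assumes an accession is exactly a prefix followed by digits; the `record_type` helper is hypothetical and not part of any GEO tool.

```python
import re

# Accession prefix -> record type.
RECORD_TYPES = {
    "GPL": "Platform",
    "GSM": "Sample",
    "GSE": "Series",
    "GDS": "DataSet",
}

# Assumed shape: one of the four prefixes followed by one or more digits.
_ACCESSION = re.compile(r"^(GPL|GSM|GSE|GDS)(\d+)$")


def record_type(accession: str) -> str:
    """Return the GEO record type implied by an accession's prefix."""
    match = _ACCESSION.match(accession.strip().upper())
    if match is None:
        raise ValueError(f"not a recognised GEO accession: {accession!r}")
    return RECORD_TYPES[match.group(1)]


print(record_type("GDS2225"))  # DataSet
print(record_type("GSE3541"))  # Series
```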
GDS2225 is based on the GSE3541 data, a set of rat chip data. The samples are divided into two groups, case and control, each with 3 replicates. The clustering results based on expression are shown below.
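Expression-based clustering of samples like this can be sketched with a generic average-linkage agglomerative clustering over Euclidean distances. This is a stand-in technique chosen for illustration; the source does not say which distance or linkage GEO uses, and the sample names and values below are hypothetical, not taken from GDS2225.

```python
import math
from itertools import combinations


def euclidean(a, b):
    """Euclidean distance between two expression vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def average_linkage(c1, c2, data):
    """Mean pairwise distance between the samples of two clusters."""
    total = sum(euclidean(data[i], data[j]) for i in c1 for j in c2)
    return total / (len(c1) * len(c2))


def cluster_samples(data, n_clusters=2):
    """Repeatedly merge the closest pair of clusters until n_clusters remain."""
    clusters = [[name] for name in data]
    while len(clusters) > n_clusters:
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda p: average_linkage(clusters[p[0]], clusters[p[1]], data),
        )
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so index i is unaffected
    return [sorted(c) for c in clusters]


# Hypothetical expression values for four probes in 3 case and 3 control samples.
data = {
    "case_1": [8.0, 2.0, 5.0, 7.5],
    "case_2": [8.2, 2.1, 5.1, 7.4],
    "case_3": [7.9, 1.9, 4.9, 7.6],
    "control_1": [4.0, 6.0, 5.0, 3.0],
    "control_2": [4.1, 6.2, 5.1, 3.1],
    "control_3": [3.9, 5.9, 4.8, 2.9],
}
print(cluster_samples(data))
# [['case_1', 'case_2', 'case_3'], ['control_1', 'control_2', 'control_3']]
```

When replicates within a group are more similar to each other than to the other group, the samples separate cleanly into case and control clusters.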
From the expression data in a DataSet, GEO derives Profile data, which show the expression level of each individual probe or gene across all samples, as shown below.
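A Profile is essentially one row of a DataSet's expression matrix, read across all samples. The sketch below assumes a simplified in-memory layout (a hypothetical sample list, group labels and probe-to-values mapping with made-up numbers); it is not GEO's actual file format, and `profile` and `group_means` are illustrative helpers.

```python
from statistics import mean

# Hypothetical DataSet layout: sample order, group labels, probe -> values.
samples = ["case_1", "case_2", "case_3", "control_1", "control_2", "control_3"]
groups = ["case", "case", "case", "control", "control", "control"]
expression = {
    "probe_1": [8.0, 8.2, 7.9, 4.0, 4.1, 3.9],
    "probe_2": [2.0, 2.1, 1.9, 6.0, 6.2, 5.9],
}


def profile(probe_id):
    """Return one probe's expression across all samples, with group labels."""
    return list(zip(samples, groups, expression[probe_id]))


def group_means(probe_id):
    """Average the probe's profile within each sample group."""
    by_group = {}
    for _, group, value in profile(probe_id):
        by_group.setdefault(group, []).append(value)
    return {g: mean(values) for g, values in by_group.items()}


print(profile("probe_2")[3])  # ('control_1', 'control', 6.0)
print(group_means("probe_1"))
```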
Data sharing makes data mining based on public databases possible. By analyzing existing public data of the same type, we can also validate the results obtained from our own sequencing data.
These are the principles of the GEO database architecture shared by the editor. If you have similar questions, the analysis above may help you understand them. To learn more, you are welcome to follow the industry information channel.
© 2024 shulou.com SLNews company. All rights reserved.