Shulou (Shulou.com), updated 2025-02-24
This article describes how to download raw TCGA data through the GDC Legacy Archive.
Before 2016, TCGA result files were stored in CGHub and in the TCGA Data Portal run by the TCGA Data Coordinating Center (DCC), and the results of that period were based on the hg19 or hg18 reference genome.
In the DCC, data are divided into three levels. Level 1 is raw, unprocessed data, such as the raw output of an array or sequencer; Level 2 is intermediate processing results, such as WIG files of sequencing depth; and Level 3 is the final processed results, such as gene-level quantification.
After 2016, CGHub and the DCC were shut down, all data were migrated to the current GDC database, and the original results were reprocessed against the hg38 reference genome by GDC's pipelines. All results retrieved from GDC today are produced by GDC pipelines, which shows that moving to hg38 is the clear trend.
Many people still work with hg19, and TCGA data based on hg19 can also be found in GDC. The data in GDC fall into two parts:
GDC harmonized data
GDC legacy archive
The documentation of the R package TCGAbiolinks explains the difference between the two, as shown in the following figure.
The first part (harmonized data) is based on hg38 and is what GDC uses by default; the second part (legacy archive) stores the original TCGA results. The entrance to the GDC Legacy Archive is under GDC Apps on the GDC home page, at the following link:
https://portal.gdc.cancer.gov/legacy-archive
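Besides the web portal, GDC data can be searched programmatically through the GDC API. The Python sketch below only builds such a query. It assumes the base URL https://api.gdc.cancer.gov, the /legacy/files endpoint that served Legacy Archive metadata (and /files for harmonized data), and the filter fields cases.project.project_id and data_type. The function names are hypothetical, and since the legacy endpoint may no longer be served, check the current GDC API documentation before relying on it.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: GDC API base URL; /legacy/files served the Legacy Archive
# (hg19/hg18), /files serves harmonized (hg38) data. Verify before use.
GDC_API = "https://api.gdc.cancer.gov"


def build_files_params(project_id, data_type, size=100):
    """Query parameters selecting files of one data type in one project."""
    filters = {
        "op": "and",
        "content": [
            {"op": "in", "content": {"field": "cases.project.project_id", "value": [project_id]}},
            {"op": "in", "content": {"field": "data_type", "value": [data_type]}},
        ],
    }
    return {
        "filters": json.dumps(filters),  # the filter is sent as a JSON string
        "fields": "file_id,file_name,data_type,file_size",
        "format": "JSON",
        "size": str(size),
    }


def build_files_url(project_id, data_type, legacy=True, size=100):
    """Full GET URL for the legacy or the harmonized files endpoint."""
    endpoint = "/legacy/files" if legacy else "/files"
    query = urlencode(build_files_params(project_id, data_type, size))
    return f"{GDC_API}{endpoint}?{query}"


def fetch_file_hits(url):
    """Send the request (needs network access) and return the file records."""
    with urlopen(url) as resp:
        return json.load(resp)["data"]["hits"]
```

For example, build_files_url("TCGA-BRCA", "Raw intensities") selects the Level 1 array files of one project (TCGA-BRCA is only an example project ID); passing that URL to fetch_file_hits returns the matching records, whose file_id values identify the files to download.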
The panel on the left filters Cases and Files by their properties. The Case properties are as follows:
The File properties are as follows:
Data are downloaded the same way as described in the previous article, so the details are omitted here. The data level of each file can be read from its file name. Files of each level are shown below.
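Since the level can be read from the file name, a downloaded file list can also be sorted by level automatically. A minimal sketch, assuming the names carry the TCGA DCC tag Level_1, Level_2 or Level_3 (the example name in the comment is hypothetical); names without such a tag are reported as None instead of being guessed.

```python
import re

# Assumption: names embed the DCC level tag, e.g. the hypothetical archive
# name "center_BRCA.Platform.Level_3.1.0.0". Matching is case-insensitive.
_LEVEL_RE = re.compile(r"Level_([123])\b", re.IGNORECASE)


def level_from_name(name):
    """Return the TCGA data level (1, 2 or 3) encoded in a name, or None."""
    match = _LEVEL_RE.search(name)
    return int(match.group(1)) if match else None


def group_by_level(names):
    """Group names by detected level; names without a level go under None."""
    groups = {}
    for name in names:
        groups.setdefault(level_from_name(name), []).append(name)
    return groups
```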
1. Level 1
Filter by Data Type = Raw intensities to obtain the raw array (chip) data, as shown below.
2. Level 2
Filter by Data Type = Coverage WIG to obtain the sequencing depth (coverage) data from the alignments, as shown below.
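Coverage WIG files are plain text and can be read with a few lines of code. The sketch below assumes the files follow the standard UCSC WIG format (variableStep and fixedStep blocks); it has not been checked against actual TCGA files, it ignores the optional span parameter, and the function names are illustrative.

```python
def parse_wig(lines):
    """Yield (chrom, start, value) tuples from variableStep/fixedStep WIG text.

    Positions are 1-based as in the WIG format; the optional span is ignored.
    """
    chrom = None
    mode = None
    pos = step = 0
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith(("#", "track", "browser")):
            continue  # skip blank, comment and header lines
        fields = line.split()
        if fields[0] in ("variableStep", "fixedStep"):
            mode = fields[0]
            opts = dict(f.split("=", 1) for f in fields[1:])
            chrom = opts["chrom"]
            if mode == "fixedStep":
                pos = int(opts["start"])
                step = int(opts.get("step", "1"))
            continue
        if mode == "variableStep":
            yield chrom, int(fields[0]), float(fields[1])  # "position value"
        elif mode == "fixedStep":
            yield chrom, pos, float(fields[0])  # value only; position implied
            pos += step
        else:
            raise ValueError(f"data line before a step declaration: {line!r}")


def mean_depth(records):
    """Average value over all records (0.0 if there are none)."""
    values = [value for _, _, value in records]
    return sum(values) / len(values) if values else 0.0
```

Typical use is mean_depth(parse_wig(open(path))), where path is a downloaded coverage file.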
3. Level 3
Filter by Data Type = miRNA gene quantification to obtain the miRNA expression quantification data, as shown below.
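Level 3 miRNA quantification files are tables and can be loaded directly. This sketch assumes a tab-separated file with a header containing miRNA_ID, read_count and reads_per_million_miRNA_mapped, which is how the BCGSC miRNA quantification files are commonly described; confirm the header of your own downloads first. Filling miRNAs absent from a sample with 0.0 in merge_samples is a simplifying choice, not something stated by the source data.

```python
import csv
import io


def read_mirna_quantification(text, value_column="reads_per_million_miRNA_mapped"):
    """Map miRNA_ID -> float value of the chosen column (assumed header names)."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    return {row["miRNA_ID"]: float(row[value_column]) for row in reader}


def merge_samples(samples):
    """Combine {sample: {miRNA: value}} into (sorted miRNA IDs, {sample: values}).

    miRNAs missing from a sample are filled with 0.0.
    """
    mirnas = sorted({m for values in samples.values() for m in values})
    matrix = {s: [values.get(m, 0.0) for m in mirnas] for s, values in samples.items()}
    return mirnas, matrix
```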
The GDC Legacy Archive provides result files based on hg19, but because the original websites have been shut down, details such as the analysis pipeline behind these data can no longer be verified, so use them with caution.