How to download TCGA Raw data through GDC Legacy Archive

Updated 2025-02-24. From: SLTechnology News&Howtos (shulou)

This article explains how to download TCGA raw data through the GDC Legacy Archive. The method is simple, fast and practical.

Before 2016, the result files of the TCGA project were stored in CGHub and in the TCGA Data Portal provided by the TCGA Data Coordinating Center (DCC). The results at that time were generated against the hg19 or hg18 reference genome.

In DCC, the data were divided into three levels. Level 1 represents raw, unprocessed data, such as the raw output of a microarray; Level 2 represents intermediate processing results, such as WIG files of sequencing depth; and Level 3 represents the final processed results, such as gene quantification.

After 2016, CGHub and DCC were shut down and all of the data were migrated to the current GDC database, where the original results were regenerated against the hg38 reference genome by GDC's pipeline. The results retrieved from GDC by default have all been processed by the GDC pipeline, which shows that migration to hg38 is the prevailing trend.

There are still many hg19 users, however, and TCGA data based on hg19 can also be found in GDC. The data in GDC are in fact divided into two parts:

GDC harmonized data

GDC legacy archive
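The split between the two parts can be summarized as a small lookup. This is only an illustrative sketch in Python: the names `ARCHIVE_BY_BUILD` and `archive_for_build` are hypothetical and are not part of any GDC tool. It encodes nothing more than what the article states, namely that hg38 results are in the harmonized data and hg19 or hg18 results are in the legacy archive.

```python
# Reference builds named in the article, mapped to the part of GDC that holds them.
# Both names below are hypothetical helpers, not part of any GDC tool.
ARCHIVE_BY_BUILD = {
    "hg38": "GDC harmonized data",
    "hg19": "GDC legacy archive",
    "hg18": "GDC legacy archive",
}


def archive_for_build(build: str) -> str:
    """Return which part of GDC to search for a given reference build."""
    key = build.strip().lower()
    if key not in ARCHIVE_BY_BUILD:
        raise ValueError(f"unknown reference build: {build!r}")
    return ARCHIVE_BY_BUILD[key]


print(archive_for_build("hg19"))
```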

The documentation of the R package TCGAbiolinks describes the difference between the two.

The first part is the hg38-based data that is used by default, and the second part is an archive of the original TCGA results. The entrance to the GDC Legacy Archive can be found under GDC Apps on the GDC home page. The link is as follows.

https://portal.gdc.cancer.gov/legacy-archive

The panel on the left filters Cases and Files by their properties, with one group of properties for Cases and another for Files.
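The same kind of filtering by a Cases property and a Files property can also be written down as a query, which may help when a selection needs to be repeated. The sketch below only builds the request parameters in the nested `op`/`content` filter syntax of the GDC API and sends nothing. The helper names, the example project `TCGA-BRCA`, and the chosen fields are illustrative assumptions, and whether a legacy endpoint is still available to receive such a query should be checked against the current GDC documentation.

```python
import json
from typing import Dict, List


def in_filter(field: str, values: List[str]) -> dict:
    """One 'field is any of these values' clause in GDC filter syntax."""
    return {"op": "in", "content": {"field": field, "value": values}}


def build_files_query(project_id: str, data_type: str, size: int = 10) -> Dict[str, str]:
    """Combine a Cases property and a Files property into one query.

    Hypothetical helper: it only assembles parameters and makes no request.
    """
    filters = {
        "op": "and",
        "content": [
            in_filter("cases.project.project_id", [project_id]),  # Cases property
            in_filter("data_type", [data_type]),                   # Files property
        ],
    }
    return {
        "filters": json.dumps(filters),
        "fields": "file_id,file_name,data_type",
        "format": "JSON",
        "size": str(size),
    }


# Example project ID chosen for illustration only.
params = build_files_query("TCGA-BRCA", "miRNA gene quantification")
print(params["filters"])
```

Keeping the query as plain data makes it easy to review the selection before anything is downloaded.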

The data are downloaded in the same way as described in the previous article, so the details are not repeated here. The level of each file can be read from its file name. Files of the different levels are described below.

1. Level 1

Filter Data Type by Raw intensities to get the raw microarray data.

2. Level 2

Filter Data Type by Coverage WIG to get the sequencing depth data from the alignment.

3. Level 3

Filter Data Type by miRNA gene quantification to get the miRNA expression quantification data.
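The three examples above, together with the idea of reading the level from a file name, can be sketched in Python as follows. This assumes that the level appears in the name as a token such as `Level_3`; that naming pattern, the helper `level_from_name`, and the sample file name are assumptions made for illustration and may not hold for every file in the archive.

```python
import re
from typing import Optional

# Data Type values used in the three examples above, keyed by level.
DATA_TYPE_BY_LEVEL = {
    1: "Raw intensities",
    2: "Coverage WIG",
    3: "miRNA gene quantification",
}

# Assumption: the level is embedded in the name as "Level_1", "Level_2" or "Level_3".
LEVEL_PATTERN = re.compile(r"Level_([123])", re.IGNORECASE)


def level_from_name(file_name: str) -> Optional[int]:
    """Return the data level found in a file name, or None if there is none."""
    match = LEVEL_PATTERN.search(file_name)
    return int(match.group(1)) if match else None


# Hypothetical file name, used only to exercise the pattern.
example = "hypothetical_center_PROJECT.platform.Level_3.1.0.0.tar.gz"
level = level_from_name(example)
print(level, DATA_TYPE_BY_LEVEL.get(level))
```

Returning `None` for names without a level token keeps unexpected file names visible instead of silently assigning them a level.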

Through the GDC Legacy Archive, result files based on hg19 can be found. However, because the original websites have been shut down, details such as the analysis pipeline cannot be confirmed, so these data should be used with caution.
