
What is the best way to download NCBI SRA data

2025-04-07 Update From: SLTechnology News&Howtos


Shulou (Shulou.com) 06/02

What is the best way to download NCBI SRA data? This article works through the question in detail, in the hope of giving anyone facing the same problem a simpler and more reliable approach.

Raw high-throughput sequencing data is usually deposited in NCBI's SRA (Sequence Read Archive) database. To use this data, we first need a suitable way to download it. Common download methods:

Aspera (ascp)
wget or curl
NCBI's official SRA Toolkit

Many tutorials recommend Aspera for high-speed downloads. In practice, however, the download is often unstable or fails outright because of port restrictions or a network proxy, with an error such as:

ascp: Failed to open TCP connection for SSH, exiting.
Session Stop (Error: Failed to open TCP connection for SSH)

NCBI has also made the following statement:

As of early 2019, the SRA is starting to make use of additional forms of storage media, which are less useful over Aspera's fasp protocol. Files stored in these media may not be accessible via ascp and have triggered creation of some issues to report the problem.

In other words, the way the SRA database stores its data has changed since 2019, and downloading with ascp may run into additional problems.

Commands such as wget are also convenient download tools. They are well suited to small files, but for high-throughput data that easily reaches gigabytes or even terabytes, wget offers little advantage: if the program is interrupted or the network drops, the download has to be started over unless it is explicitly resumed (for example with wget -c).

NCBI likewise points out that wget may not retrieve all of the data:

There are several reasons why direct use of ascp (or curl, wget, etc) is not recommended. The main reason is that they are likely to only retrieve a portion of the data required.

Therefore, the most stable and dependable way is to download with prefetch from the SRA Toolkit.

Download and install SRA Toolkit:

https://trace.ncbi.nlm.nih.gov/Traces/sra/sra.cgi?view=software

Download the prebuilt binary version; it can be used right after unpacking, with no compilation or installation step.

After unpacking, add the toolkit's bin directory to the PATH environment variable.
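As a minimal sketch of the PATH step, assuming the archive was unpacked to $HOME/sratoolkit (a hypothetical location; the real directory name depends on the downloaded version and platform), something like the following could be used:

```shell
# Hypothetical install location; change it to wherever the archive was unpacked.
SRATOOLKIT_BIN="${SRATOOLKIT_BIN:-$HOME/sratoolkit/bin}"

# Prepend the toolkit's bin directory to PATH unless it is already listed.
case ":$PATH:" in
  *":$SRATOOLKIT_BIN:"*) ;;                 # already on PATH, nothing to do
  *) PATH="$SRATOOLKIT_BIN:$PATH" ;;
esac
export PATH
```

Placing these lines in a shell startup file such as ~/.bashrc makes the change persistent; running command -v prefetch afterwards is a quick way to check that the shell can find the tool.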

Use prefetch to download data:

Method 1:

Specify the Run accession directly, for example SRR1482462:

prefetch SRR1482462

Method 2:

Download all the Runs/Samples of a Project in bulk:

First open one of the run pages and click "All run".

Then click "Accession List" to download a file called "SRR_Acc_List.txt" that contains all the run accessions.

Use the following command to download them in a batch (running it in the background as nohup cmd & keeps the download from being interrupted when the terminal session ends):

nohup prefetch -O . $(
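The command above is cut off in the source, so its exact form cannot be recovered here. As a hedged alternative sketch (not the author's original command), the loop below reads one accession per line from a list file and calls prefetch with the -O output-directory option for each. The function name prefetch_list and the PREFETCH override variable are hypothetical names introduced only for this example.

```shell
# Download every run listed in an accession file (one accession per line).
# PREFETCH is a hypothetical override so the loop can be dry-run with a stub.
prefetch_list() {
  list_file="$1"
  out_dir="${2:-.}"
  failed=0
  while IFS= read -r acc; do
    [ -z "$acc" ] && continue               # skip blank lines
    # Detach stdin so the download command cannot consume the accession list.
    "${PREFETCH:-prefetch}" -O "$out_dir" "$acc" < /dev/null || failed=$((failed + 1))
  done <<EOF
$(tr -d '\r' < "$list_file")
EOF
  # The tr call above drops carriage returns from lists saved with Windows line endings.
  [ "$failed" -eq 0 ]                       # non-zero status if any run failed
}
```

With the function saved in a file (say prefetch_list.sh, also a hypothetical name), a background run could look like: nohup sh -c '. ./prefetch_list.sh; prefetch_list SRR_Acc_List.txt .' > prefetch.log 2>&1 &. Looping one accession at a time means a single failed run is counted without stopping the remaining downloads.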
