How to download virus genomes from the KEGG database

2025-01-19 Update From: SLTechnology News&Howtos

Shulou(Shulou.com)06/01 Report--

This article explains how to download virus genome data from the KEGG database. The usual download method runs into trouble with viruses, so the problems and their workarounds are summarized below.

The usual method of downloading protein sequence data from the KEGG database has two problems when applied to viruses:

1. Unlike cellular organisms, virus species in the KEGG database are not named with a lowercase-letter abbreviation, so a bulk download by species abbreviation reports an error and nothing is downloaded.

2. When protein sequences are downloaded by ID, some downloads are incomplete, so the final merged file contains errors.

Both problems can be solved. For the first problem: although virus species have no standard abbreviation in the KEGG database, all viruses can be addressed with the abbreviation "vg" (that is, viral genome). The list is downloaded as follows:

wget -c http://rest.kegg.jp/list/vg

This gives a list of proteins for all viruses, as follows:

vg:23892186 CP, DU23_s2gp1; Arhar cryptic virus-II; Coat Protein
vg:24271495 LAT, HHV2s01; Human alphaherpesvirus 2; LAT
vg:1487286 RL1, HHV2p77; Human alphaherpesvirus 2; neurovirulence protein ICP34.5
vg:1487288 RL2, HHV2p76; Human alphaherpesvirus 2; ubiquitin E3 ligase ICP0
vg:1487292 UL1, HHV2p75; Human alphaherpesvirus 2; envelope glycoprotein L
vg:1487303 UL2, HHV2p74; Human alphaherpesvirus 2; uracil-DNA glycosylase
vg:24271453 UL3, HHV2p73; Human alphaherpesvirus 2; nuclear protein UL3
vg:1487326 UL4, HHV2p71; Human alphaherpesvirus 2; nuclear protein UL4
vg:1487338 UL5, HHV2p72; Human alphaherpesvirus 2; helicase-primase helicase subunit
vg:1487346 UL6, HHV2p70; Human alphaherpesvirus 2; capsid portal protein

The first column is the ID of each viral protein. The protein sequences can then be downloaded by iterating over these IDs.
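
The iteration over IDs could look like the following sketch. It rests on several assumptions that are not in the original: that the list was saved to a file named vg, that each sequence is available through the KEGG REST get operation with the aaseq option, and that the output goes to a hypothetical directory named seqs. The function names are illustrative only.

```shell
#!/bin/sh
# Sketch: fetch one protein sequence per ID listed in the "vg" list file.
# Assumptions (not from the article): the list file name, the output
# directory "seqs", and the /get/<id>/aaseq URL form.

# Print the first column (the vg:<number> ID) of each non-empty line.
extract_ids() {
    awk 'NF { print $1 }' "$1"
}

# Build the URL assumed to return the amino acid sequence for one ID.
aaseq_url() {
    printf 'http://rest.kegg.jp/get/%s/aaseq\n' "$1"
}

# Download every sequence into its own file, e.g. seqs/vg_1487286.fa.
download_all() {
    list_file=$1
    out_dir=$2
    mkdir -p "$out_dir"
    extract_ids "$list_file" | while read -r id; do
        out="$out_dir/$(printf '%s' "$id" | tr ':' '_').fa"
        wget -q -O "$out" "$(aaseq_url "$id")"
        sleep 1   # pause between requests to go easy on the server
    done
}

# Usage (performs network requests):
#   download_all vg seqs
```

Writing each sequence to its own file keeps a failed request from affecting the others, and lets each file be checked individually before merging.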

The second problem arises when wget does not fetch a file completely. Whether a file was downloaded completely can be determined by checking whether it ends with a newline character, where FILE stands for the downloaded file:

tail -n1 FILE | wc -l

If the file was downloaded completely, its last character is a newline and the result is 1; otherwise the result is 0.
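
Applied to a whole directory of downloads, the check could be scripted roughly as follows. This is a sketch under assumptions: the directory name seqs, the .fa suffix, the merged file name and the function names are all hypothetical. Only the tail -n1 | wc -l test itself comes from the text above.

```shell
#!/bin/sh
# Sketch: find downloads that do not end in a newline, then merge the rest.
# Directory name, file suffix and function names are illustrative.

# Succeeds (exit status 0) only if the file's last line ends with a newline.
is_complete() {
    n=$(tail -n1 "$1" | wc -l | tr -d '[:space:]')
    [ "$n" -eq 1 ]
}

# Print the name of every incomplete *.fa file in a directory.
list_incomplete() {
    for f in "$1"/*.fa; do
        [ -e "$f" ] || continue      # directory has no matching files
        is_complete "$f" || printf '%s\n' "$f"
    done
}

# Concatenate only the complete files into one merged file.
# The merged file should live outside the input directory.
merge_complete() {
    dir=$1
    merged=$2
    : > "$merged"
    for f in "$dir"/*.fa; do
        [ -e "$f" ] || continue
        if is_complete "$f"; then
            cat "$f" >> "$merged"
        fi
    done
}

# Usage:
#   list_incomplete seqs           # files to download again
#   merge_complete seqs all_vg.fa  # merge once the list above is empty
```

The newline test is a heuristic: a file cut off exactly at a line break would still pass, and an empty file counts as incomplete.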

