How to use Python to process GFF3-format files

2025-04-06 Update From: SLTechnology News&Howtos

Shulou (Shulou.com) 06/01

This article shows how to use Python to process GFF3-format files, walking through a real case step by step. The method is simple, quick and practical, and I hope it helps you solve your own problem.

1. A typical downloaded plant genome annotation file, using the GFF3 format as an example

Take a maize genome annotation file (GFF3 format) downloaded from JGI as the example. The first column gives the location of the feature: 1 means it lies on maize chromosome 1. The second column gives the source, here the annotation version. The third column gives the feature type, usually gene, mRNA, CDS and so on; a single gene may correspond to several mRNAs. As anyone with some biology background knows, each mRNA represents one transcript, and the mRNAs correspond one-to-one with the records in the matching annotated sequence file. The fourth and fifth columns are the start and end positions on the chromosome of the feature named in the third column. The sixth column is the score and is not used here. The seventh column indicates whether the feature is on the positive (+) or negative (-) strand. The eighth column is the phase. The ninth column holds the feature's attributes, including its ID information. The aim of this processing is to extract the first, third, fourth, fifth and ninth columns.
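A minimal sketch of how one feature line breaks into the columns described above, assuming tab-separated columns and a ninth column of `key=value` pairs separated by `;`, as the GFF3 specification defines. The function name `parse_gff3_line` is hypothetical, and the sketch does not decode percent-encoded characters in attribute values.

```python
def parse_gff3_line(line):
    """Split one GFF3 feature line into its nine tab-separated columns.

    Returns None for comment/directive lines (starting with '#') and blank lines.
    Hypothetical helper for illustration; it does not decode %XX escapes.
    """
    line = line.rstrip("\n")
    if not line or line.startswith("#"):
        return None
    cols = line.split("\t")
    if len(cols) != 9:
        raise ValueError(f"expected 9 columns, got {len(cols)}")
    seqid, source, ftype, start, end, score, strand, phase, attributes = cols
    # Column 9: semicolon-separated key=value pairs, e.g. ID=...;Parent=...
    attrs = dict(
        item.split("=", 1) for item in attributes.split(";") if "=" in item
    )
    return {
        "seqid": seqid,        # column 1: chromosome / scaffold
        "type": ftype,         # column 3: gene, mRNA, CDS, ...
        "start": int(start),   # column 4: start position
        "end": int(end),       # column 5: end position
        "strand": strand,      # column 7: '+' or '-'
        "attributes": attrs,   # column 9 as a dict
    }
```

Only the columns this article needs are kept in the returned dict; the source, score and phase columns are unpacked but dropped.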

2. Output produced by the program

The processed file has five columns. The first column is the chromosome the gene lies on. The second column is a new ID generated from the order of the genes' start and end positions along that chromosome. The third and fourth columns are the gene's start and end positions. The fifth column is extracted from the ninth column of the original annotation, and it must correspond one-to-one with the records in the sequence file. Let's go straight to the code.
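The renumbering rule (genes sorted by position, numbered from 1 again on each chromosome, written as `Zm` + two-digit chromosome + `G` + five-digit counter) can be sketched on its own. This hedged version swaps the script's manual chromosome counter for `itertools.groupby`; the function name `assign_new_ids` and its `prefix` parameter are hypothetical, and the records are assumed to be (chromosome, start, end, original ID) tuples with integer chromosome numbers.

```python
from itertools import groupby
from operator import itemgetter


def assign_new_ids(records, prefix="Zm"):
    """Yield (chrom_label, new_id, start, end, original_id) rows.

    records: iterable of (chrom: int, start: int, end: int, original_id) tuples.
    Genes are sorted by chromosome, then start, then end, and numbered
    1, 2, 3, ... separately within each chromosome.
    """
    ordered = sorted(records, key=itemgetter(0, 1, 2))
    for chrom, group in groupby(ordered, key=itemgetter(0)):
        for number, (_, start, end, original_id) in enumerate(group, start=1):
            new_id = f"{prefix}{chrom:02d}G{number:05d}"  # e.g. Zm01G00001
            yield (f"{prefix}{chrom}", new_id, start, end, original_id)
```

For example, the records `[(2, 50, 90, 7), (1, 300, 400, 5), (1, 100, 200, 3)]` produce the IDs `Zm01G00001`, `Zm01G00002` and `Zm02G00001`, in that order. Grouping after sorting removes the need to track which chromosome comes next.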

3. Code

#!/usr/bin/env python3
import re
from operator import itemgetter

# Gene annotation in GFF3 format
in_path = 'Zmays_284_Ensembl-18_2010-01murMaizeSequence.gene.gff3'
out_path = 'Zm.newid.gff'  # output file name

records = []  # (chromosome, start, end, pacid) for every mRNA line
skip_prefixes = ('#', 'M')  # comment/directive lines and lines starting with 'M'

with open(in_path, encoding='UTF-8') as input_file:
    for line in input_file:
        if line.startswith(skip_prefixes) or not line.strip():
            continue
        cols = line.rstrip('\n').split('\t')  # GFF3 columns are tab-separated
        if cols[2] != 'mRNA':
            continue
        # Get the ID (pacid) from column 9; adjust this pattern for other files
        match = re.search(r'pacid=(.+?);longest', cols[8])
        chrom = re.sub(r'\D', '', cols[0])  # keep only the digits of the chromosome name
        if match is None or not chrom:
            continue  # no pacid, or a chromosome name without digits
        records.append((int(chrom), int(cols[3]), int(cols[4]), int(match.group(1))))

# Sort by chromosome, then by start position
records.sort(key=itemgetter(0, 1))

with open(out_path, 'w', encoding='UTF-8') as out_file:
    current_chr = None
    number = 0
    for chrom, start, end, pacid in records:
        if chrom == current_chr:
            number += 1
        else:  # first gene on a new chromosome: restart the counter
            current_chr = chrom
            number = 1
        # New ID such as Zm01G00001; adjust the prefix for other species
        newid = 'Zm' + str(chrom).zfill(2) + 'G' + str(number).zfill(5)
        print(newid)
        out_file.write('\t'.join(['Zm' + str(chrom), newid, str(start), str(end), str(pacid)]) + '\n')

# made by ligaojie from North China University of Technology
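Because the fifth column has to match the sequence file one-to-one, a quick check on the output can catch duplicated original IDs before the new IDs are used further. This is a sketch under stated assumptions: the output is the five tab-separated columns written by the script above, and the function name `load_id_map` is hypothetical.

```python
import csv


def load_id_map(lines):
    """Build {original_id: new_id} from the five-column output.

    `lines` can be an open file or any iterable of tab-separated lines
    (chromosome, new ID, start, end, original ID).
    Raises ValueError if an original ID (column 5) appears more than once.
    """
    mapping = {}
    for row in csv.reader(lines, delimiter="\t"):
        if not row:
            continue  # skip blank lines
        _chrom, new_id, _start, _end, original_id = row
        if original_id in mapping:
            raise ValueError(f"ID {original_id} appears more than once")
        mapping[original_id] = new_id
    return mapping
```

Usage: `with open('Zm.newid.gff', encoding='UTF-8') as fh: id_map = load_id_map(fh)`. Failing loudly on a duplicate is deliberate, since a silent overwrite would break the one-to-one correspondence.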

© 2024 shulou.com SLNews company. All rights reserved.