How merged.gtf merges exon locations of the same transcript

2025-11-21 Update From: SLTechnology News&Howtos

Shulou(Shulou.com)06/01 Report--

This article shows how to merge the exon locations of each transcript in a merged.gtf file, so that every transcript ends up on a single line listing all of its exons.

The merged.gtf file holds one line per exon. The script below reads this file and collects, for each transcript, the locations of all of its exons.

Example merged.gtf file (fields are tab-separated):

Chr00	Cufflinks	exon	37990	38333	.	+	.	gene_id "XLOC_000001"; transcript_id "TCONS_00000001"; exon_number "1"; gene_name "MD00G1000200"; oId "CUFF.2.1"; nearest_ref "mRNA:MD00G1000200"; class_code "j"; tss_id "TSS1";
Chr00	Cufflinks	exon	38607	38710	.	+	.	gene_id "XLOC_000001"; transcript_id "TCONS_00000001"; exon_number "2"; gene_name "MD00G1000200"; oId "CUFF.2.1"; nearest_ref "mRNA:MD00G1000200"; class_code "j"; tss_id "TSS1";
Chr00	Cufflinks	exon	38814	38898	.	+	.	gene_id "XLOC_000001"; transcript_id "TCONS_00000001"; exon_number "3"; gene_name "MD00G1000200"; oId "CUFF.2.1"; nearest_ref "mRNA:MD00G1000200"; class_code "j"; tss_id "TSS1";
Chr00	Cufflinks	exon	42611	42713	.	+	.	gene_id "XLOC_000001"; transcript_id "TCONS_00000001"; exon_number "4"; gene_name "MD00G1000200"; oId "CUFF.2.1"; nearest_ref "mRNA:MD00G1000200"; class_code "j"; tss_id "TSS1";
Chr00	Cufflinks	exon	42906	43203	.	+	.	gene_id "XLOC_000001"; transcript_id "TCONS_00000001"; exon_number "5"; gene_name "MD00G1000200"; oId "CUFF.2.1"; nearest_ref "mRNA:MD00G1000200"; class_code "j"; tss_id "TSS1";
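To make the layout of these records concrete, here is a minimal Perl sketch that splits one line of the example above into its nine tab-separated columns and parses the attribute column into key/value pairs. The helper name parse_gtf_line is hypothetical and exists only for this illustration; it is not part of the script in this article.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the article's script): split one GTF
# line into its nine columns and parse the attribute column into a hash.
sub parse_gtf_line {
    my ($line) = @_;
    chomp $line;
    my ($chr, $source, $feature, $start, $end, $score, $strand, $frame, $attr)
        = split /\t/, $line, 9;
    my %attr;
    # Attributes look like: key "value"; key "value"; ...
    while ($attr =~ /(\w+) "([^"]*)";?/g) {
        $attr{$1} = $2;
    }
    return {
        chr => $chr, feature => $feature, start => $start,
        end => $end, strand => $strand, attr => \%attr,
    };
}

# First exon line from the example, with its fields joined by tabs.
my $line = join "\t", 'Chr00', 'Cufflinks', 'exon', 37990, 38333, '.', '+', '.',
    'gene_id "XLOC_000001"; transcript_id "TCONS_00000001"; exon_number "1"; '
  . 'gene_name "MD00G1000200"; oId "CUFF.2.1";';

my $rec = parse_gtf_line($line);

# Print chromosome, strand, gene_id, transcript_id and the exon range.
print join("\t", $rec->{chr}, $rec->{strand}, $rec->{attr}{gene_id},
           $rec->{attr}{transcript_id}, "$rec->{start}-$rec->{end}"), "\n";
```

Parsing every attribute into a hash is more than this task strictly needs, but it avoids depending on the order in which the attributes appear.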

Example of output file:

Chr00	+	XLOC_000001	MD00G1000200	TCONS_00000001	exon	37990-38333	38607-38710	38814-38898	42611-42713	42906-43203
Chr00	+	XLOC_000001	MD00G1000200	TCONS_00000002	exon	38005-38333	38607-38710	38814-38898	42611-42726	42906-43167
Chr00	+	XLOC_000002	MD00G1000400	TCONS_00000003	exon	50386-50877
Chr00	+	XLOC_000003	MD00G1000500	TCONS_00000004	exon	76659-76991	77468-77544	77649-77715	77889-77970	78355-78424
Chr00	+	XLOC_000004	MD00G1000600	TCONS_00000005	exon	101951-102138	102228-102398	102957-103004	103099-103138	103227-103327
Chr00	+	XLOC_000004	MD00G1000600	TCONS_00000006	exon	102003-102138	102228-102398	102957-103004	103099-103138	103227-103327
Chr00	+	XLOC_000005	MD00G1000700	TCONS_00000007	exon	105542-105626	105926-106541	108356-108832
Chr00	+	XLOC_000005	MD00G1000700	TCONS_00000009	exon	105542-105626	105926-106541	108902-109696
Chr00	+	XLOC_000005	MD00G1000700	TCONS_00000008	exon	105542-105626	105926-106541	108949-109696
Chr00	+	XLOC_000006	MD00G1001100	TCONS_00000010	exon	276592-277221	280928-280975

The first column is the chromosome; the second is the strand (+ or -); the third is the gene_id; the fourth is the gene_name; the fifth is the transcript ID; the sixth is the feature type (exon); and each remaining column gives the start-end position of one exon of that transcript.
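Before the full script, the grouping idea behind this layout can be sketched in a few lines. This is a minimal sketch and not the article's script: it collects exon ranges in a hash keyed by transcript ID, so it does not require all exons of a transcript to sit on consecutive lines. The sub name merge_exons and the record layout it takes are assumptions made for this example.

```perl
use strict;
use warnings;

# Hypothetical helper: take records of the form
# [chr, strand, gene_id, gene_name, transcript_id, start, end]
# and return one tab-separated line per transcript.
sub merge_exons {
    my @records = @_;
    my (%head, %ranges, @order);
    for my $r (@records) {
        my ($chr, $strand, $gene_id, $gene_name, $trans_id, $start, $end) = @$r;
        unless (exists $head{$trans_id}) {
            push @order, $trans_id;    # remember first-seen order
            $head{$trans_id} = join "\t", $chr, $strand, $gene_id,
                                          $gene_name, $trans_id, 'exon';
        }
        push @{ $ranges{$trans_id} }, "$start-$end";
    }
    return map { join "\t", $head{$_}, @{ $ranges{$_} } } @order;
}

# Exon records taken from the output example above.
my @records = (
    [qw(Chr00 + XLOC_000002 MD00G1000400 TCONS_00000003 50386 50877)],
    [qw(Chr00 + XLOC_000006 MD00G1001100 TCONS_00000010 276592 277221)],
    [qw(Chr00 + XLOC_000006 MD00G1001100 TCONS_00000010 280928 280975)],
);
print "$_\n" for merge_exons(@records);
```

The hash-based version keeps every exon range in memory. For a sorted file where the exons of a transcript are adjacent, a single streaming pass that remembers only the current transcript uses less memory.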

The code is as follows:

#!/usr/bin/perl
use strict;
use warnings;
use Getopt::Long;
use Cwd qw(abs_path);

my $version = "1.2";

# Parse command-line options: -gtf <merged.gtf> -od <output dir> [-h]
my %opts;
GetOptions(\%opts, "gtf=s", "od=s", "h");
die "Usage: perl $0 -gtf merged.gtf -od out_dir\n"
    if $opts{h} || !defined $opts{gtf} || !defined $opts{od};

# Create the output directory if needed, then resolve its absolute path.
# Choose a directory other than the one holding the input file, because
# the result is also written to a file named merged.gtf.
my $od = $opts{od};
mkdir $od unless (-d $od);
$od = abs_path($od);

# Step 1: write one simplified, tab-separated line per exon to merged.tpm.
open(my $in,  "<", $opts{gtf})       or die "open file $opts{gtf} failed: $!\n";
open(my $out, ">", "$od/merged.tpm") or die "open file $od/merged.tpm failed: $!\n";
while (<$in>) {
    next if (/^#/);
    chomp;
    my ($chr, $source, $exon, $start, $end, $score, $strand, $frame, $attr) = split("\t", $_);
    next unless defined $attr;
    # Capture in list context so a failed match cannot reuse an old $1.
    my ($trans_id)  = $attr =~ /\btranscript_id "([^"]*)"/;
    my ($gene_id)   = $attr =~ /\bgene_id "([^"]*)"/;
    my ($gene_name) = $attr =~ /\bgene_name "([^"]*)"/;
    next unless defined $trans_id && defined $gene_id;
    $gene_name = "." unless defined $gene_name;    # gene_name may be absent
    print $out join("\t", $chr, $exon, $start, $end, $strand, $gene_id, $gene_name, $trans_id) . "\n";
}
close($in);
close($out);

# Step 2: merge consecutive exon lines that share a transcript ID.
# This assumes all exons of a transcript are on adjacent lines.
open($in,  "<", "$od/merged.tpm") or die "open file $od/merged.tpm failed: $!\n";
open($out, ">", "$od/merged.gtf") or die "open file $od/merged.gtf failed: $!\n";
my $cmd = "";
my $key = "";
while (<$in>) {
    next if (/^#/);
    chomp;
    my ($chr, $exon, $start, $end, $strand, $gene_id, $gene_name, $trans_id) = split("\t", $_);
    if ($key eq $trans_id) {
        # Same transcript: append this exon's range to the current line.
        $cmd .= "\t" . $start . "-" . $end;
    } else {
        # New transcript: print the finished line and start a new one.
        $key = $trans_id;
        print $out $cmd . "\n" if ($cmd ne "");
        $cmd = join("\t", $chr, $strand, $gene_id, $gene_name, $trans_id, $exon, $start . "-" . $end);
    }
}
# Print the last transcript, which the loop itself never writes out.
print $out $cmd . "\n" if ($cmd ne "");
close($in);
close($out);
