How to extract the promoter sequences of all genes in a genome with Perl


2025-07-06 update. From: SLTechnology News&Howtos, Shulou (Shulou.com), 06/01 report.

Many people are unsure how to extract the promoter sequence of every gene in a genome with Perl. This article summarizes one approach: a script that reads the genome FASTA file and its GFF annotation, then writes the sequence immediately upstream of each gene to a FASTA file.

Command to run the script:

perl gene_promoter.pl -fa Donkey_Hic_genome.20180408.fa -gff Donkey_Hic_genome.20180408.gff3 -out gene_promoter.fa -n 2000

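-fa is followed by the genome FASTA file containing the chromosome sequences; -gff is followed by the genome annotation in GFF format; -out is followed by the output FASTA file for the promoter sequences; and -n is followed by the number of bp upstream of each gene to extract (2000 in the command above).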
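To make the meaning of -n concrete, here is a minimal sketch of the upstream-window arithmetic on a toy sequence. It assumes GFF3 coordinates (1-based, inclusive) and that the upstream region of a minus-strand gene is the stretch after its end coordinate, reverse-complemented. The helper name upstream_region and the toy data are hypothetical and are not part of gene_promoter.pl.

```perl
use strict;
use warnings;

# Hypothetical helper: return the $n bases upstream of a gene.
# $start and $end are 1-based inclusive coordinates, as in GFF3.
sub upstream_region {
    my ($seq, $start, $end, $strand, $n) = @_;
    if ($strand eq "+") {
        # Upstream bases occupy 1-based positions ($start - $n) .. ($start - 1).
        my $from = $start - 1 - $n;      # 0-based offset of the window
        $from = 0 if $from < 0;          # clamp at the start of the sequence
        return substr($seq, $from, $start - 1 - $from);
    }
    # Minus strand: the window follows the end coordinate on the forward
    # strand and is then reverse-complemented.
    my $from = $end;                     # 0-based offset of the base after $end
    return "" if $from >= length($seq);
    my $region = reverse(substr($seq, $from, $n));
    $region =~ tr/ACGTacgt/TGCAtgca/;
    return $region;
}

my $chrom = "AAACCCGGGTTT";                          # toy 12 bp sequence
print upstream_region($chrom, 7, 9, "+", 3), "\n";   # plus-strand gene at 7..9
print upstream_region($chrom, 4, 6, "-", 3), "\n";   # minus-strand gene at 4..6
print upstream_region($chrom, 2, 6, "+", 3), "\n";   # window clamped at position 1
```

With -n 3 the first two calls both return CCC, and the third returns only A because just one base lies upstream of position 2. Clamping avoids a negative substr offset, which Perl would otherwise count from the end of the chromosome.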

Script code:

#!/usr/bin/perl
use strict;
use warnings;
use Getopt::Long;
use Bio::SeqIO;

my $version = "1.3";

# Parse command-line options
my %opts;
GetOptions(\%opts, "gff=s", "fa=s", "out=s", "n=i", "h");
if (!defined($opts{out}) || !defined($opts{gff}) || !defined($opts{fa}) || defined($opts{h})) {
    print "Usage: perl $0 -fa genome.fa -gff genome.gff3 -out gene_promoter.fa [-n 2000]\n";
    exit;
}
# Upstream length in bp; the default of 2000 is an assumption taken from the example command
my $n = defined($opts{n}) ? $opts{n} : 2000;

# Load every genome sequence into memory, keyed by sequence ID
my $in = Bio::SeqIO->new(-file => $opts{fa}, -format => 'Fasta');
my %fasta;
while (my $seq = $in->next_seq()) {
    my ($id, $sequence) = ($seq->id, $seq->seq);
    $fasta{$id} = $sequence;
}

open(IN, "<", $opts{gff}) || die "open file $opts{gff} failed.\n";
open(OUT, ">", $opts{out}) || die "open file $opts{out} failed.\n";
while (<IN>) {
    next if (/^#/);
    chomp;
    my @line = split("\t", $_);
    # Only gene features on sequences present in the FASTA file are used
    next unless (@line >= 9 && $line[2] eq "gene" && exists $fasta{$line[0]});
    next unless ($line[8] =~ /ID=([^;]*)/);
    my $name = $1;
    if ($line[6] eq "+") {
        # GFF coordinates are 1-based: take the $n bases ending just before the gene start
        my $from = $line[3] - 1 - $n;
        $from = 0 if ($from < 0);    # clamp at the start of the sequence
        my $gene = substr($fasta{$line[0]}, $from, $line[3] - 1 - $from);
        print OUT ">$name\n$gene\n";
    }
    elsif ($line[6] eq "-") {
        # Upstream of a minus-strand gene lies after its end coordinate
        my $gene = substr($fasta{$line[0]}, $line[4], $n);
        $gene = reverse_complement_IUPAC($gene);
        print OUT ">$name\n$gene\n";
    }
}
close(OUT);
close(IN);

sub reverse_complement_IUPAC {
    my $dna = shift;
    # reverse the DNA sequence
    my $revcomp = reverse($dna);
    # complement the reversed DNA sequence, including IUPAC ambiguity codes (K <-> M added)
    $revcomp =~ tr/ABCDGHKMNRSTUVWXYabcdghkmnrstuvwxy/TVGHCDMKNYSAABWXRtvghcdmknysaabwxr/;
    return $revcomp;
}

sub reverse_complement {
    my $dna = shift;
    # reverse the DNA sequence
    my $revcomp = reverse($dna);
    # complement the reversed DNA sequence
    $revcomp =~ tr/ACGTacgt/TGCAtgca/;
    return $revcomp;
}

That is the complete script for extracting the promoter sequences of all genes in a genome with Perl.
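The script loads the genome through Bio::SeqIO. Where BioPerl is not installed, the loading step could be swapped for a core-Perl reader along the following lines. This is an alternative sketch, not the original author's method: read_fasta is a hypothetical name, and it assumes the sequence ID is the first whitespace-delimited token of the header line, which is intended to match the IDs in the first column of the GFF file.

```perl
use strict;
use warnings;

# Hypothetical core-Perl replacement for the Bio::SeqIO loading step.
# Reads FASTA records from a filehandle and returns a hash reference
# mapping sequence ID => sequence string.
sub read_fasta {
    my ($fh) = @_;
    my (%fasta, $id);
    while (my $line = <$fh>) {
        $line =~ s/\s+\z//;               # strip newline and trailing whitespace
        next if $line eq "";
        if ($line =~ /^>\s*(\S*)/) {
            $id = $1;                     # ID = first token after '>'
            $fasta{$id} = "";
        }
        elsif (defined $id) {
            $fasta{$id} .= $line;         # join wrapped sequence lines
        }
    }
    return \%fasta;
}

# Demo on an in-memory FASTA string instead of a real genome file.
my $demo = ">chr1 toy sequence\nAAACCC\nGGGTTT\n>chr2\nACGT\n";
open(my $fh, "<", \$demo) or die "cannot open in-memory FASTA: $!";
my $fasta = read_fasta($fh);
close($fh);
print "$_\t", length($fasta->{$_}), "\n" for sort keys %$fasta;
```

The returned hash has the same shape as the %fasta hash built in the script, so the GFF loop would only need $fasta->{$line[0]} in place of $fasta{$line[0]}. Like the original, this keeps the whole genome in memory.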
