How to extract the sequences of multiple pairs of genes in batches when calculating Kaks 07/15 Update SLTechnology News&Howtos


2025-07-15 Update From: SLTechnology News&Howtos

Shulou (Shulou.com) 06/01 report

This article shares how to extract the sequences of multiple gene pairs in batch when calculating Ka/Ks. I hope you find it useful.

Batch extraction of multiple gene-pair sequences for Ka/Ks calculation

In gene family analysis, calculating Ka/Ks (the ratio of the nonsynonymous substitution rate to the synonymous substitution rate) requires extracting the sequences of each group of tandem repeat genes and then performing a multiple sequence alignment.

An earlier script we shared extracts sequences only one pair at a time. That works, but it becomes tedious when there are many tandem repeat gene pairs. The script here extracts tandem repeat gene sequences in batch and writes each pair to a separate file.

First, prepare three inputs:

1. A file of tandem repeat gene pairs. Each line holds one pair of tandem repeat gene IDs, separated by a Tab.

2. A FASTA file containing the CDS sequences of all genes.

3. An empty output directory.
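To make the pair-file format concrete, here is a minimal pure-Perl sketch that reads it. It assumes strictly Tab-separated IDs and treats lines starting with '#' as comments; the helper name read_pairs is hypothetical and is not part of the original script.

```perl
use strict;
use warnings;

# Hypothetical helper: read tab-separated gene pairs from an open filehandle.
# Returns a list of [gene1, gene2] array refs; skips '#' comments and blank lines.
sub read_pairs {
    my ($fh) = @_;
    my @pairs;
    while (my $line = <$fh>) {
        $line =~ s/\r?\n\z//;                      # strip Unix or Windows line endings
        next if $line =~ /^#/ || $line !~ /\S/;    # skip comments and blank lines
        my @f = split /\t/, $line;
        if (@f < 2 || $f[0] eq '' || $f[1] eq '') {
            warn "Skipping malformed line $.: $line\n";
            next;
        }
        push @pairs, [ @f[0, 1] ];                 # keep only the first two columns
    }
    return @pairs;
}
```

For example, open the pair file with open my $fh, '<', 'WRKY.tandem' and call read_pairs($fh). A line with fewer than two fields is reported with warn and skipped instead of silently producing an incomplete pair.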

Run the script with the following command:

perl get_fa_by_id_kaks.pl WRKY.tandem ../data/Sspon.v20180123.cds.fa test/

WRKY.tandem: the tandem repeat gene pair file

../data/Sspon.v20180123.cds.fa: the CDS sequences of all genes

test/: the empty output directory
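The output directory should be empty because the script creates its output files in '>' (overwrite) mode: a file from an earlier run is replaced only when the names collide, while any extra files (for example, from a run with more pairs) stay behind and get mixed into the results. A small pre-flight check can catch this. The sketch below assumes a hypothetical helper named is_empty_dir that you would call on the third argument before any work is done.

```perl
use strict;
use warnings;

# Hypothetical pre-flight check: returns 1 if $dir exists and contains no
# entries other than '.' and '..'; returns 0 otherwise (including when missing).
sub is_empty_dir {
    my ($dir) = @_;
    return 0 unless -d $dir;
    opendir(my $dh, $dir) or return 0;
    my @entries = grep { $_ ne '.' && $_ ne '..' } readdir($dh);
    closedir($dh);
    return @entries ? 0 : 1;
}
```

A typical call would be die "$ARGV[2] is not an empty directory\n" unless is_empty_dir($ARGV[2]); placed right after the argument-count check.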

The script, get_fa_by_id_kaks.pl, is as follows:

    # email: huangls@biomics.com.cn
    use strict;
    use warnings;
    use Bio::SeqIO;

    die "Usage: perl $0 <gene_pairs_file> <cds.fa> <out_dir>\n" unless @ARGV == 3;
    my ($pair_file, $cds_file, $out_dir) = @ARGV;

    # Load every CDS sequence into a hash keyed by sequence ID
    my $in = Bio::SeqIO->new(-file => $cds_file, -format => 'Fasta');
    my %gene;
    while (my $seq = $in->next_seq()) {
        $gene{ $seq->id } = $seq;
    }
    $in->close();

    # Write each gene pair to its own file: dup_gene_paired1.fa, dup_gene_paired2.fa, ...
    open my $pairs, '<', $pair_file or die "Cannot open $pair_file: $!";
    my $n = 1;
    while (my $line = <$pairs>) {
        chomp $line;
        next if $line =~ /^#/ || $line !~ /\S/;   # skip comment and blank lines
        my @a = split ' ', $line;                 # split on Tab (or any whitespace)
        my $out = Bio::SeqIO->new(-file => ">$out_dir/dup_gene_paired$n.fa", -format => 'Fasta');
        for my $id (grep { defined } @a[0, 1]) {
            # IDs absent from the CDS file are skipped without a warning
            $out->write_seq($gene{$id}) if exists $gene{$id};
        }
        $out->close();
        $n++;                                     # next pair goes to the next numbered file
    }
    close $pairs;
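One thing to watch: the script writes a gene only if its ID exists in the CDS file, and it skips missing IDs silently. If an ID in the pair file does not match a FASTA header (a typo or a naming mismatch), the corresponding output file holds only one sequence, and that pair cannot be aligned for the Ka/Ks step. The following pure-Perl sketch, with hypothetical helper names fasta_ids and pairs_with_missing_ids, lists such pairs before you run the extraction. It assumes the ID is the first whitespace-delimited word after '>', which is how BioPerl's FASTA parser normally assigns the sequence ID.

```perl
use strict;
use warnings;

# Hypothetical helper: collect sequence IDs from a FASTA filehandle.
# The ID is taken as the first non-whitespace word after '>'.
sub fasta_ids {
    my ($fh) = @_;
    my %ids;
    while (my $line = <$fh>) {
        $ids{$1} = 1 if $line =~ /^>(\S+)/;
    }
    return \%ids;
}

# Hypothetical helper: return "id1<TAB>id2" strings for the pairs in which at
# least one ID is absent from the FASTA, i.e. pairs whose output file would
# contain fewer than two sequences. Pairs are passed as [id1, id2] array refs.
sub pairs_with_missing_ids {
    my ($ids, @pairs) = @_;
    return map  { join("\t", $_->[0], $_->[1]) }
           grep { !$ids->{ $_->[0] } || !$ids->{ $_->[1] } } @pairs;
}
```

An empty result means every ID in the pair file was found in the CDS file; otherwise, fix the listed IDs before extracting.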

