How to output fasta sequences in a specified format

2025-02-23 Update From: SLTechnology News&Howtos

Shulou(Shulou.com)06/01 Report--

This article shows how to output FASTA sequences in a specified format. It may be a useful reference for anyone who works with FASTA files.

When working with FASTA files, we sometimes need to lay out the sequences in a prescribed format.

Many people will have needed to put each sequence on a single line, or to wrap it at a fixed number of bp per line. I also often run into FASTA files in which records wrapped at different widths, such as 60 bp and 70 bp, coexist. To keep the differing line widths from affecting later processing, it is generally best to unify the format.
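A quick way to find out whether a file mixes line widths is to collect the widths of all sequence lines except the last line of each record (which may legitimately be shorter). The following is a minimal sketch using only the standard library; the function name wrap_widths and the toy records are illustrative assumptions, not part of the original workflow.

```python
def wrap_widths(lines):
    """Return the set of widths used by non-final sequence lines."""
    widths = set()
    previous = None  # width of the previous sequence line in the current record
    for raw in lines:
        line = raw.strip()
        if line.startswith(">"):
            # A new record starts; the pending line was the final line of the
            # previous record and may be shorter, so it is not counted.
            previous = None
        elif line:
            if previous is not None:
                widths.add(previous)
            previous = len(line)
    return widths


# Toy input (hypothetical): one record wrapped at 6, another at 7.
mixed = [
    ">seq_a", "ACGTAC", "GTACGT", "AC",
    ">seq_b", "ACGTACG", "TACGTAC", "GT",
]
print(sorted(wrap_widths(mixed)))
```

If the returned set holds more than one value, the file mixes wrap widths and is a candidate for re-wrapping. For a real file, pass an open file handle instead of the list.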

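FASTA file format: each record starts with a header line beginning with ">", followed by one or more lines of sequence.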

Although this is a small problem, there are many ways to solve it, so here are a few examples that produce the two layouts mentioned above.

1. The test file contains two FASTA sequences, each 158 bp long, wrapped at 60 bp per line with a final line of 38 bp.

test.fa:

$ cat test.fa
>chr_test1
GATCACAGGTCTATCACCCTATTAACCACTCACGGGAGCTCTCCATGCATTTGGTATTGC
AGTATCTGTCTTTGATTCCTGCCTCATCCTATTATTTATCGCACCTACGTTCAATATTTT
TTAACAGTCACCCCCCAACTAACACATTCCAACTAACC
>chr_test2
GATCACAGGTCTATCACCCTATTAACCACTCACGGGAGCTCTCCATGCATTTGGTATTGC
AGTATCTGTCTTTGATTCCTGCCTCATCCTATTATTTATCGCACCTACGTTCAATATTTT
TTAACAGTCACCCCCCAACTAACACATTCCAACTAACC

2. First, put each sequence on a single line using awk:

$ awk '/^>/ {if (NR > 1) print ""; printf("%s\n", $0); next;} {printf("%s", $0);} END {printf("\n");}' test.fa
>chr_test1
GATCACAGGTCTATCACCCTATTAACCACTCACGGGAGCTCTCCATGCATTTGGTATTGCAGTATCTGTCTTTGATTCCTGCCTCATCCTATTATTTATCGCACCTACGTTCAATATTTTTTAACAGTCACCCCCCAACTAACACATTCCAACTAACC
>chr_test2
GATCACAGGTCTATCACCCTATTAACCACTCACGGGAGCTCTCCATGCATTTGGTATTGCAGTATCTGTCTTTGATTCCTGCCTCATCCTATTATTTATCGCACCTACGTTCAATATTTTTTAACAGTCACCCCCCAACTAACACATTCCAACTAACC
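For readers who prefer Python to awk, the same joining logic can be sketched with the standard library alone: print each header as it appears, buffer the sequence lines that follow, and flush them as one line when the next header or the end of input is reached. The function name unwrap_fasta and the shortened toy records are assumptions made for illustration.

```python
def unwrap_fasta(lines):
    """Yield FASTA lines with each record's sequence joined onto one line."""
    chunks = []          # sequence pieces of the record being read
    seen_header = False  # lines before the first header are ignored
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line.startswith(">"):
            if seen_header:
                yield "".join(chunks)  # flush the previous record
            chunks = []
            seen_header = True
            yield line
        elif line:
            chunks.append(line)
    if seen_header:
        yield "".join(chunks)  # flush the last record


# Shortened toy records (hypothetical), wrapped at 10 characters.
wrapped = [
    ">chr_test1", "GATCACAGGT", "CTATCACCCT", "ATTAA",
    ">chr_test2", "GATCACAGGT", "CTA",
]
for out_line in unwrap_fasta(wrapped):
    print(out_line)
```

To process a file, pass the open handle directly, for example unwrap_fasta(open("test.fa")), and write the yielded lines to the output file.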

3. Biopython is also very convenient for handling FASTA and FASTQ files, and it offers a solution here as well.

By default, Biopython writes FASTA output wrapped at 60 bp per line. Its documentation shows that FastaWriter takes a wrap argument that sets the number of characters per line in the output file.

I wrote a Biopython script that handles both cases through its nwrap parameter; when nwrap is set to 0, each sequence is written on a single line.

wrap_xbp.py:

import argparse
from Bio import SeqIO
from Bio.SeqIO.FastaIO import FastaWriter

# usage description
describe = argparse.ArgumentParser(description="Make Fasta Sequence in a Single Line or Wrap N bp One Line")
describe.add_argument("-nwrap", help="n base per line; default=0 means seq in one line", default=0, type=int)
describe.add_argument("orgf", help="Original fasta")  # original fasta file
describe.add_argument("optf", help="Output fasta")  # modified output file
args = describe.parse_args()

# handle to output and FastaWriter to make normalized output
output_fasta = open(args.optf, "w")  # open the file handle for the output file
writer = FastaWriter(output_fasta, wrap=args.nwrap)  # set the output line width
writer.write_file(SeqIO.parse(args.orgf, "fasta"))  # read the original file and write it out in the required format
output_fasta.close()  # close the file handle

Run the script to produce test_50wrap.fa, wrapped at 50 bp per line:

$ python3 wrap_xbp.py -nwrap 50 test.fa test_50wrap.fa

$ cat test_50wrap.fa
>chr_test1
GATCACAGGTCTATCACCCTATTAACCACTCACGGGAGCTCTCCATGCAT
TTGGTATTGCAGTATCTGTCTTTGATTCCTGCCTCATCCTATTATTTATC
GCACCTACGTTCAATATTTTTTAACAGTCACCCCCCAACTAACACATTCC
AACTAACC
>chr_test2
GATCACAGGTCTATCACCCTATTAACCACTCACGGGAGCTCTCCATGCAT
TTGGTATTGCAGTATCTGTCTTTGATTCCTGCCTCATCCTATTATTTATC
GCACCTACGTTCAATATTTTTTAACAGTCACCCCCCAACTAACACATTCC
AACTAACC
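Where Biopython is not installed, the behavior of the nwrap parameter can be approximated with the standard library. The sketch below is not the script above and does not use FastaWriter; wrap_fasta is a hypothetical helper that mirrors the same convention, with width=0 meaning one line per sequence.

```python
def wrap_fasta(lines, width=0):
    """Yield FASTA lines re-wrapped to `width` bases per line.

    width=0 keeps each sequence on a single line.
    """
    if width < 0:
        raise ValueError("width must be 0 or a positive integer")

    def emit(header, chunks):
        yield header
        seq = "".join(chunks)
        # With width=0 the step equals the full length, giving one line.
        step = width or max(len(seq), 1)
        for start in range(0, len(seq), step):
            yield seq[start:start + step]

    header, chunks = None, []
    for raw in lines:
        line = raw.strip()
        if line.startswith(">"):
            if header is not None:
                yield from emit(header, chunks)
            header, chunks = line, []
        elif line and header is not None:
            chunks.append(line)
    if header is not None:
        yield from emit(header, chunks)


# Toy record (hypothetical): 8 bases wrapped at 6, re-wrapped at 3 and at 0.
record = [">demo", "ACGTAC", "GT"]
print(list(wrap_fasta(record, 3)))
print(list(wrap_fasta(record, 0)))
```

Joining the pieces before slicing means the input may be wrapped at any width, or at mixed widths, and the output is still uniform.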

4. BBMap also provides reformat.sh, a quick and easy-to-use script that does the same thing.

Wrap at 50 bp per line:

$ ~/tool/bbmap/reformat.sh in=test.fa out=test_out2.fa fastawrap=50

The sequences in the result file are wrapped at 50 bp per line:

$ cat test_out2.fa
>chr_test1
GATCACAGGTCTATCACCCTATTAACCACTCACGGGAGCTCTCCATGCAT
TTGGTATTGCAGTATCTGTCTTTGATTCCTGCCTCATCCTATTATTTATC
GCACCTACGTTCAATATTTTTTAACAGTCACCCCCCAACTAACACATTCC
AACTAACC
>chr_test2
GATCACAGGTCTATCACCCTATTAACCACTCACGGGAGCTCTCCATGCAT
TTGGTATTGCAGTATCTGTCTTTGATTCCTGCCTCATCCTATTATTTATC
GCACCTACGTTCAATATTTTTTAACAGTCACCCCCCAACTAACACATTCC
AACTAACC

Of course, you can also put each sequence on a single line by setting the wrap width larger than any sequence in the file, for example fastawrap=50000000000.
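Whichever tool does the re-wrapping, it can be worth confirming that only the line breaks changed. A minimal sketch, assuming both files fit in memory: read each file into (header, sequence) pairs with the line breaks removed and compare the two lists. The names read_records and same_sequences are hypothetical.

```python
def read_records(lines):
    """Return a list of (header, sequence) pairs with line breaks removed."""
    records = []
    for raw in lines:
        line = raw.strip()
        if line.startswith(">"):
            records.append([line[1:], []])
        elif line and records:
            records[-1][1].append(line)
    return [(header, "".join(parts)) for header, parts in records]


def same_sequences(lines_a, lines_b):
    """True when both inputs hold the same records in the same order."""
    return read_records(lines_a) == read_records(lines_b)


# Toy inputs (hypothetical): the same record wrapped at 4 and at 3.
wrapped_4 = [">demo", "ACGT", "ACG"]
wrapped_3 = [">demo", "ACG", "TAC", "G"]
print(same_sequences(wrapped_4, wrapped_3))
```

With real files the call would look like same_sequences(open("test.fa"), open("test_out2.fa")). The comparison is strict about header text, so a tool that rewrites headers would be reported as a difference.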
