What is the usage of bwa software?

2025-07-06 Update From: SLTechnology News&Howtos

Shulou(Shulou.com)06/01 Report--

What is the usage of bwa software? Many newcomers are unsure where to start, so this article summarizes what bwa does and how to run it. I hope it helps you get started.

bwa is a software package that aligns sequences to a reference genome. It includes the following three algorithms:

BWA-backtrack

BWA-SW

BWA-MEM

BWA-backtrack is suited to reads up to 100 bp; BWA-SW and BWA-MEM are suited to longer sequences, from 70 bp to 1 Mbp. BWA-MEM is the most recently developed of the three and is faster and more accurate on high-quality sequencing data. It also performs better than BWA-backtrack on 70-100 bp reads. In general, just choose the BWA-MEM algorithm.
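The rule of thumb above can be sketched as a small shell function. The name pick_bwa_algorithm is hypothetical and the 70 bp cutoff is only this article's guideline, not a setting inside bwa.

```shell
# Hypothetical helper: suggest a bwa subcommand from the read length in bp.
# The 70 bp cutoff follows the rule of thumb in the text; bwa itself does not enforce it.
pick_bwa_algorithm() {
    read_len=$1
    if [ "$read_len" -lt 70 ]; then
        echo "aln"    # BWA-backtrack (followed by samse or sampe)
    else
        echo "mem"    # BWA-MEM, the usual choice for reads of 70 bp and longer
    fi
}

pick_bwa_algorithm 36     # prints: aln
pick_bwa_algorithm 150    # prints: mem
```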

The source code of bwa is hosted on GitHub.

The installation process is as follows:

git clone https://github.com/lh3/bwa.git

cd bwa

make

After the build finishes, an executable file named bwa appears in the current directory. Enter the following command to view the help information.

./bwa

Program: bwa (alignment via Burrows-Wheeler transformation)
Version: 0.7.17-r1188
Contact: Heng Li

Usage:   bwa <command> [options]

Command: index         index sequences in the FASTA format
         mem           BWA-MEM algorithm
         fastmap       identify super-maximal exact matches
         pemerge       merge overlapping paired ends (EXPERIMENTAL)
         aln           gapped/ungapped alignment
         samse         generate alignment (single ended)
         sampe         generate alignment (paired ended)
         bwasw         BWA-SW for long queries

         shm           manage indices in shared memory
         fa2pac        convert FASTA to PAC format
         pac2bwt       generate BWT from PAC
         pac2bwtgen    alternative algorithm for generating BWT
         bwtupdate     update .bwt to the new format
         bwt2sa        generate SA from BWT and Occ

Note: To use BWA, you need to first index the genome with `bwa index'.
      There are three alignment algorithms in BWA: `mem', `bwasw', and
      `aln/samse/sampe'. If you are not sure which to use, try `bwa mem'
      first. Please `man ./bwa.1' for the manual.

You can see that it is made up of a lot of subcommands.

The function of the bwa software is to align the sequence to the reference genome. Before alignment, the reference genome needs to be indexed. The command is as follows:

bwa index in.fasta

After the index is built, five files are generated, with the following suffixes:

bwt

pac

ann

amb

sa

Here is an example of indexing the mouse genome:

bwa index mm10.fasta

├── mm10.fasta

├── mm10.fasta.amb

├── mm10.fasta.ann

├── mm10.fasta.bwt

├── mm10.fasta.pac

└── mm10.fasta.sa
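Because indexing a large genome takes a while, it can help to check whether the five index files are already present before building them again. The following is a minimal sketch; index_is_complete is a hypothetical helper, not part of bwa, and it only checks that the files exist, not that they are valid.

```shell
# Hypothetical helper: succeed only if all five index files exist for a prefix.
index_is_complete() {
    prefix=$1
    for suffix in amb ann bwt pac sa; do
        if [ ! -f "$prefix.$suffix" ]; then
            echo "missing: $prefix.$suffix" >&2
            return 1
        fi
    done
    return 0
}

# Build the index only when the FASTA file is there and the index is not
# (this branch needs bwa on the PATH).
if [ -f mm10.fasta ] && ! index_is_complete mm10.fasta 2>/dev/null; then
    bwa index mm10.fasta
fi
```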

Once the reference genome has been indexed, reads can be aligned against it. Each alignment algorithm has its own commands.

1. BWA-backtrack algorithm

The corresponding subcommands are aln, samse, and sampe.

The usage for single-end data is as follows:

bwa aln ref.fa reads.fq > aln_sa.sai

bwa samse ref.fa aln_sa.sai reads.fq > aln-se.sam

The usage for paired-end data is as follows:

bwa aln ref.fa read1.fq > aln_sa1.sai

bwa aln ref.fa read2.fq > aln_sa2.sai

bwa sampe ref.fa aln_sa1.sai aln_sa2.sai read1.fq read2.fq > aln-pe.sam
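The paired-end workflow has three steps, and the .sai file names written by the two aln steps must match the ones passed to sampe. One way to keep them in step is to derive every name from a single output prefix. The sketch below only prints the commands; backtrack_pe_commands and the _1.sai/_2.sai naming are assumptions made for illustration.

```shell
# Hypothetical helper: print the three BWA-backtrack commands for one read pair.
# All intermediate and output names are derived from the output prefix ($4).
backtrack_pe_commands() {
    ref=$1; r1=$2; r2=$3; out=$4
    echo "bwa aln $ref $r1 > ${out}_1.sai"
    echo "bwa aln $ref $r2 > ${out}_2.sai"
    echo "bwa sampe $ref ${out}_1.sai ${out}_2.sai $r1 $r2 > $out.sam"
}

# Dry run: inspect the commands before executing anything.
backtrack_pe_commands ref.fa read1.fq read2.fq aln-pe
```

Once the printed commands look right, they can be piped to sh to run them, provided the file names contain no spaces.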

2. BWA-SW algorithm

The corresponding subcommand is bwasw, and the basic usage is as follows:

bwa bwasw ref.fa reads.fq > aln-se.sam

bwa bwasw ref.fa read1.fq read2.fq > aln-pe.sam

3. BWA-MEM algorithm

The corresponding subcommand is mem, and the basic usage is as follows:

bwa mem ref.fa reads.fq > aln-se.sam

bwa mem ref.fa read1.fq read2.fq > aln-pe.sam

For very long reads, such as those produced by PacBio and Nanopore sequencers, the usage is as follows:

bwa mem -x pacbio ref.fa reads.fq > aln.sam

bwa mem -x ont2d ref.fa reads.fq > aln.sam
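In a script that handles data from several platforms, the choice of -x preset can be kept in one place. The following is a minimal sketch; mem_preset and the platform labels (pacbio, nanopore, illumina) are hypothetical names, while the -x values pacbio and ont2d are the ones shown above.

```shell
# Hypothetical helper: map a platform label to a `bwa mem -x` preset.
# Prints an empty line for short-read data, where the defaults apply.
mem_preset() {
    case $1 in
        pacbio)   echo "-x pacbio" ;;
        nanopore) echo "-x ont2d" ;;
        illumina) echo "" ;;
        *)        echo "unknown platform: $1" >&2; return 1 ;;
    esac
}

platform=nanopore
# The substitution is left unquoted on purpose so "-x ont2d" splits into two arguments.
echo bwa mem $(mem_preset "$platform") ref.fa reads.fq
# prints: bwa mem -x ont2d ref.fa reads.fq
```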

In the commands above, ref.fa is the prefix of the reference genome index, that is, the name of the FASTA file that was passed to bwa index. For the mouse example above, the prefix is mm10.fasta; do not append an index suffix such as .bwt or .sa.

The default usage is very simple, but sometimes the parameters need adjusting. Taking the mem subcommand as an example, the commonly used parameters include the following:

-t specifies the number of threads. The default is 1. Increasing the number of threads reduces the running time.

-p enables smart pairing and ignores the second input file. By default, one input file is treated as single-end data and two input files as paired-end data. With this parameter, the first file is assumed to contain interleaved paired-end reads, and any second input file is ignored.

-Y uses soft clipping for supplementary alignments. By default, BWA-MEM uses soft clipping for the primary alignment and hard clipping for supplementary alignments. Soft-clipped bases are left out of the alignment itself but remain in the sequence written to the output, which is why it is called soft clipping.
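As a sketch of how these parameters combine on the command line, the snippet below picks a thread count from the machine and prints two mem commands without running them. Using getconf to count CPUs is an assumption about the system, and interleaved.fq is a hypothetical file name.

```shell
# Use the number of online CPUs if getconf reports it, otherwise fall back to 1.
threads=$(getconf _NPROCESSORS_ONLN 2>/dev/null || echo 1)

# -t sets the thread count; -Y asks for soft clipping on supplementary alignments.
cmd="bwa mem -t $threads -Y ref.fa read1.fq read2.fq"
echo "$cmd > aln-pe.sam"

# Interleaved paired-end reads in a single file: add -p and pass only that file.
echo "bwa mem -t $threads -p ref.fa interleaved.fq > aln-pe.sam"
```

Printing the commands first makes it easy to check the thread count before starting a long alignment.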

That covers the basic usage of the bwa software. Thank you for reading!
