2025-01-17 Update From: SLTechnology News&Howtos
Shulou(Shulou.com)06/01 Report--
How is the bwa software used? Many newcomers are unsure where to start, so this article summarizes how to install bwa and run its main commands.
BWA is a software package that aligns sequences to a reference genome. It includes the following three algorithms:
BWA-backtrack
BWA-SW
BWA-MEM
BWA-backtrack is designed for reads up to 100 bp, while BWA-SW and BWA-MEM are designed for sequences from 70 bp to 1 Mbp. BWA-MEM is the most recently developed algorithm and is faster and more accurate on high-quality sequencing data. Even for 70-100 bp reads, BWA-MEM performs better than BWA-backtrack. In short, BWA-MEM is usually the right choice.
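The rule of thumb above can be captured in a small shell helper. This is a minimal sketch under stated assumptions: the function name recommend_bwa_subcommand is hypothetical, and the 70 bp cut-off is one reading of the guidance in this section (BWA-backtrack for short reads, BWA-MEM from 70 bp upward), not an official bwa rule.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the bwa subcommand suggested for a read length in bp.
# Cut-off (assumption): below 70 bp suggest BWA-backtrack (aln), otherwise BWA-MEM (mem),
# since BWA-MEM covers 70 bp to 1 Mbp and outperforms BWA-backtrack on 70-100 bp reads.
recommend_bwa_subcommand() {
  local len=$1
  # Reject empty input and anything containing a non-digit character.
  case $len in
    ''|*[!0-9]*)
      echo "usage: recommend_bwa_subcommand <read_length_bp>" >&2
      return 2
      ;;
  esac
  if [ "$len" -lt 70 ]; then
    echo aln   # BWA-backtrack: follow with samse (single-end) or sampe (paired-end)
  else
    echo mem   # BWA-MEM: the general-purpose choice
  fi
}
```

For example, recommend_bwa_subcommand 50 prints aln, and recommend_bwa_subcommand 150 prints mem.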
The source code of bwa is hosted on GitHub. The installation process is as follows:
git clone https://github.com/lh3/bwa.git
cd bwa
make
After compilation, an executable named bwa appears in the bwa directory. Run it without arguments to view the help information:
./bwa
Program: bwa (alignment via Burrows-Wheeler transformation)
Version: 0.7.17-r1188
Contact: Heng Li
Usage:   bwa <command> [options]
Command: index         index sequences in the FASTA format
         mem           BWA-MEM algorithm
         fastmap       identify super-maximal exact matches
         pemerge       merge overlapping paired ends (EXPERIMENTAL)
         aln           gapped/ungapped alignment
         samse         generate alignment (single ended)
         sampe         generate alignment (paired ended)
         bwasw         BWA-SW for long queries
         shm           manage indices in shared memory
         fa2pac        convert FASTA to PAC format
         pac2bwt       generate BWT from PAC
         pac2bwtgen    alternative algorithm for generating BWT
         bwtupdate     update .bwt to the new format
         bwt2sa        generate SA from BWT and Occ
Note: To use BWA, you need to first index the genome with `bwa index'.
      There are three alignment algorithms in BWA: `mem', `bwasw', and
      `aln/samse/sampe'. If you are not sure which to use, try `bwa mem'
      first. Please `man ./bwa.1' for the manual.
As the output shows, bwa is organized into many subcommands.
bwa aligns sequences to a reference genome. Before alignment, the reference genome must be indexed with the following command:
bwa index in.fasta
Indexing generates five files with the following suffixes:
.bwt
.pac
.ann
.amb
.sa
Here is an example of indexing the mouse genome:
bwa index mm10.fasta
├── mm10.fasta
├── mm10.fasta.amb
├── mm10.fasta.ann
├── mm10.fasta.bwt
├── mm10.fasta.pac
└── mm10.fasta.sa
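Because a missing index file only surfaces as an error at alignment time, it can help to check for all five files first. A minimal sketch, assuming bash; the function name bwa_index_exists is hypothetical, and the check only tests that each file exists and is non-empty, not that its contents are valid.

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeed only if all five BWA index files exist for a prefix.
# The prefix is the name given to `bwa index` (e.g. mm10.fasta), without any suffix.
bwa_index_exists() {
  local prefix=$1 ext
  for ext in amb ann bwt pac sa; do
    # -s: the file exists and is not empty
    if [ ! -s "$prefix.$ext" ]; then
      echo "missing or empty index file: $prefix.$ext" >&2
      return 1
    fi
  done
  return 0
}
```

One way to use it is to build the index only when needed: bwa_index_exists mm10.fasta || bwa index mm10.fasta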
Once the reference genome has been indexed, reads can be aligned to it. Each alignment algorithm has its own subcommands.
1. BWA-backtrack algorithm
The corresponding subcommands are aln/samse/sampe.
For single-end data:
bwa aln ref.fa reads.fq > aln_sa.sai
bwa samse ref.fa aln_sa.sai reads.fq > aln-se.sam
For paired-end data:
bwa aln ref.fa read1.fq > aln_sa1.sai
bwa aln ref.fa read2.fq > aln_sa2.sai
bwa sampe ref.fa aln_sa1.sai aln_sa2.sai read1.fq read2.fq > aln-pe.sam
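The paired-end steps above are easy to get wrong when the .sai file names drift between commands. As a sketch, they can be wrapped in one shell function that derives every intermediate name from the output file. The function name bwa_backtrack_pe and the <output>_1.sai / <output>_2.sai naming scheme are hypothetical; the sketch assumes bwa is on PATH and the reference has already been indexed.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper for paired-end BWA-backtrack: two `bwa aln` runs, then `bwa sampe`.
# Assumes bwa is on PATH and $1 is an index prefix built with `bwa index`.
bwa_backtrack_pe() {
  local ref=$1 r1=$2 r2=$3 out=$4
  local base=${out%.sam}   # e.g. aln-pe.sam -> aln-pe
  bwa aln "$ref" "$r1" > "${base}_1.sai" || return 1
  bwa aln "$ref" "$r2" > "${base}_2.sai" || return 1
  # sampe takes the .sai files and the FASTQ files in the same mate order.
  bwa sampe "$ref" "${base}_1.sai" "${base}_2.sai" "$r1" "$r2" > "$out"
}

# Example (requires bwa and real input files):
# bwa_backtrack_pe ref.fa read1.fq read2.fq aln-pe.sam
```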
2. BWA-SW algorithm
The corresponding subcommand is bwasw. Basic usage:
bwa bwasw ref.fa reads.fq > aln-se.sam
bwa bwasw ref.fa read1.fq read2.fq > aln-pe.sam
3. BWA-MEM algorithm
The corresponding subcommand is mem. Basic usage:
bwa mem ref.fa reads.fq > aln-se.sam
bwa mem ref.fa read1.fq read2.fq > aln-pe.sam
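When several paired-end samples need aligning, the paired-end mem command can be run in a loop. This is a sketch under assumptions: the function name align_all_pe is hypothetical, as is the file naming convention <sample>_1.fq / <sample>_2.fq; bwa must be on PATH and the reference indexed. Samples whose second mate file is missing are skipped with a warning rather than aligned as single-end data.

```bash
#!/usr/bin/env bash
# Hypothetical batch loop: align every paired-end sample in a directory with bwa mem.
# Assumes mates are named <sample>_1.fq and <sample>_2.fq (an illustrative convention).
align_all_pe() {
  local ref=$1 dir=$2 r1 sample
  for r1 in "$dir"/*_1.fq; do
    [ -e "$r1" ] || continue            # the glob matched nothing
    sample=${r1%_1.fq}
    if [ ! -e "${sample}_2.fq" ]; then
      echo "skipping $r1: no mate file" >&2
      continue
    fi
    # One SAM file per sample, written next to the input reads.
    bwa mem "$ref" "$r1" "${sample}_2.fq" > "${sample}.sam" || return 1
  done
}
```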
For long reads, such as those generated by PacBio and Nanopore sequencers, add the matching -x preset:
bwa mem -x pacbio ref.fa reads.fq > aln.sam
bwa mem -x ont2d ref.fa reads.fq > aln.sam
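To avoid mixing up the preset names, the platform-to-preset mapping can be kept in one place. A minimal sketch: the function name bwa_mem_long and the accepted platform labels (pacbio, nanopore, ont) are hypothetical; only the two -x presets shown above are mapped, and SAM output goes to standard output as in the commands above.

```bash
#!/usr/bin/env bash
# Hypothetical helper: run bwa mem with the -x preset matching a long-read platform.
# Only the pacbio and ont2d presets from this article are mapped.
bwa_mem_long() {
  local platform=$1 ref=$2 reads=$3 preset
  case $platform in
    pacbio)       preset=pacbio ;;
    nanopore|ont) preset=ont2d ;;
    *) echo "unknown platform: $platform" >&2; return 2 ;;
  esac
  bwa mem -x "$preset" "$ref" "$reads"
}

# Example: bwa_mem_long nanopore ref.fa reads.fq > aln.sam
```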
In the commands above, ref.fa stands for the index prefix, that is, the name that was passed to bwa index. For the mouse example above, the prefix is mm10.fasta, without any of the index suffixes (.amb, .ann, .bwt, .pac, .sa).
The default usage is simple, but sometimes the parameters need adjusting. Taking the mem subcommand as an example, commonly used parameters include:
-t sets the number of threads (default 1). More threads shorten the running time.
-p enables smart pairing: the first input file is treated as interleaved paired-end reads, and the second input file, if any, is ignored. Without it, one input file is treated as single-end data and two input files as paired-end data.
-Y uses soft clipping for supplementary alignments; by default, bwa mem hard-clips them. Clipping excludes read ends that do not align well, for example because of too many mismatches or gaps. With soft clipping, the clipped bases are left out of the alignment but still kept in the read sequence of the output record, hence the name; with hard clipping they are removed from the record.
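The -t and -p options can be combined, for instance when paired-end reads are stored interleaved in a single FASTQ file. A sketch, assuming bwa is on PATH: the function name bwa_mem_interleaved is hypothetical, and the thread count comes from nproc (GNU coreutils), falling back to 1 if nproc is unavailable.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper: align interleaved paired-end reads with bwa mem.
# -t: number of worker threads (here, one per available CPU core).
# -p: smart pairing, i.e. pair adjacent reads in one file that share a name.
bwa_mem_interleaved() {
  local ref=$1 reads=$2 threads
  # nproc is part of GNU coreutils; fall back to a single thread without it.
  threads=$(nproc 2>/dev/null || echo 1)
  bwa mem -t "$threads" -p "$ref" "$reads"
}

# Example: bwa_mem_interleaved mm10.fasta interleaved.fq > aln-pe.sam
```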
© 2024 shulou.com SLNews company. All rights reserved.