2025-04-07 Update From: SLTechnology News&Howtos
Shulou(Shulou.com)06/01 Report--
This article explains how to extract genomic sequences with Samtools running in Docker. The steps are short and easy to follow.
Samtools is a toolkit for manipulating SAM and BAM files. It can view binary BAM files, convert between formats, and sort and merge files. Using fields such as the flag and tags in the SAM format, it can also produce summary statistics of alignment results. It is an indispensable tool for working with SAM and BAM files: for example, the transcriptome aligner TopHat writes its alignments to a .bam file, while resequencing aligners such as BWA and bowtie mainly output .sam files.
Download the image and run it
After installing Docker, search for the Docker images we provide; one of them has samtools installed.
/rnaseq              RNA-seq analysis docker image build by omics...   0
/samtools            samtools v1.10 build by                           0
/biocontainer-base   Biocontainers base Image centos7                  0
/blast-plus          blast+ v2.9.0                                     0
/blastall            legacy blastall v2.2.26                           0
/sratoolkit          SRAtoolkit v2.10.3 and aspera v3.9.9.177872       0
Download the image.
$ docker pull /samtools
Using default tag: latest
latest: Pulling from /samtools
ab5ef0e58194: Already exists
417469905807: Already exists
ed09842cc19f: Already exists
f860268ff83f: Already exists
f87dd41136a6: Already exists
90091b4f5d91: Already exists
6485f44fc594: Pull complete
Digest: sha256:e641dd5b9f60d8f9d01f0d109eff72d15836d0d59a753e3e35677b1200adc4a1
Status: Downloaded newer image for /samtools:latest
You can check it after the download is complete.
$ docker images
REPOSITORY    TAG      IMAGE ID       CREATED      SIZE
/samtools     latest   9373e18781bf   8 days ago   2.04GB
/blast-plus   latest   0220cac51a6e   8 days ago   2.55GB
Next, create a samtools folder on your computer's D: drive and copy the chromosome FASTA file you need into it. If you use the Docker Toolbox version for Windows, you must first mount the D: drive into the virtual machine; for details, see the detailed Docker installation guide (Windows Toolbox version).
Start the container and enter it, mounting the samtools folder as /work.
$ docker run --rm -v /d/samtools:/work -it /samtools:latest   # the Docker used here is the Windows Toolbox version
# Welcome to the docker image provided by the Group Lecture Hall #
# For question exchange, please visit: www..com
# Linux novice recommended course: --> https://www..com/article/702
# Building a lab bioinformatics analysis platform and using docker, see course: --> https://www..com/article/1181
[root@0e9f42f25cc1 10:16:46 /work]#
View the working directory.
samtools faidx builds an index file with the .fai suffix for a FASTA file. Using the .fai file together with the original FASTA file, it can quickly extract the sequence of any region. Usage:
samtools faidx Arabidopsis_thaliana.TAIR10.31.dna.toplevel.fa
This command places one requirement on the input FASTA file: within each sequence, every line except the last must have the same length. For example:
>one
ATGCATGCATGCATGCATGCATGCATGCAT
GCATGCATGCATGCATGCATGCATGCATGC
ATGCAT
>two another chromosome
ATGCATGCATGCATGCATGCATGCATGC
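The line-length requirement above can be checked before indexing. The following is a minimal sketch, not part of samtools: the function name `uneven_sequences` is hypothetical, and it assumes the FASTA content fits in memory as text.

```python
def uneven_sequences(fasta_text):
    """Return names of sequences whose non-final lines differ in length."""
    bad = []
    name, lengths = None, []

    def flush():
        # Only the lines before the last one have to share a length.
        if name is not None and len(set(lengths[:-1])) > 1:
            bad.append(name)

    for line in fasta_text.splitlines():
        if line.startswith(">"):
            flush()
            # Keep only the text after ">" and before the first blank.
            name = (line[1:].split() or [""])[0]
            lengths = []
        elif name is not None:
            lengths.append(len(line))
    flush()
    return bad
```

An empty list means every sequence satisfies the requirement; any returned name points to a sequence that would need to be re-wrapped before indexing.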
The resulting .fai file looks like this, with 5 columns separated by tabs (\t):
Pt	154478	4	60	61
Mt	366924	157061	60	61
4	18585056	530104	60	61
2	19698289	19424914	60	61
3	23459830	39451511	60	61
5	26975502	63302342	60	61
1	30427671	90727439	60	61
The first column, NAME: the name of the sequence, that is, the text after ">" and before the first blank.
The second column, LENGTH: the length of the sequence, in bp.
The third column, OFFSET: the byte offset of the first base in the file, counted from 0, with newline characters included in the count.
The fourth column, LINEBASES: the number of bases on each line of the sequence (every line except the last), in bp.
The fifth column, LINEWIDTH: the length of each line of the sequence (every line except the last), including the newline character. On Windows the newline is \r\n, so LINEWIDTH is LINEBASES plus 2.
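To make the five columns concrete, here is a rough sketch that derives them from raw FASTA bytes. It is an illustration of the column definitions, not the samtools implementation; the function name `build_fai` is hypothetical, and it assumes the input already satisfies the uniform line-length requirement and that every header has a name.

```python
def build_fai(data):
    """Return (NAME, LENGTH, OFFSET, LINEBASES, LINEWIDTH) rows for FASTA bytes."""
    rows = []
    pos = 0          # byte position of the current line start
    current = None   # mutable record for the sequence being scanned
    for raw in data.splitlines(keepends=True):
        line = raw.rstrip(b"\r\n")
        if line.startswith(b">"):
            if current:
                rows.append(tuple(current))
            name = line[1:].split()[0].decode()
            # OFFSET is the byte position right after the header line.
            current = [name, 0, pos + len(raw), 0, 0]
        elif current is not None:
            if current[1] == 0:
                # The first sequence line fixes LINEBASES and LINEWIDTH.
                current[3], current[4] = len(line), len(raw)
            current[1] += len(line)
        pos += len(raw)
    if current:
        rows.append(tuple(current))
    return rows
```

Because LINEWIDTH counts the raw line including its newline, the same sequence saved with \r\n line endings yields a LINEWIDTH one larger than with \n, while LINEBASES stays the same.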
Finally, extract the desired sequence by giving the sequence name, or the sequence name plus a position range on the chromosome:
samtools faidx Arabidopsis_thaliana.TAIR10.31.dna.toplevel.fa Pt > Pt.fa            # extract the chloroplast sequence
samtools faidx Arabidopsis_thaliana.TAIR10.31.dna.toplevel.fa Pt:100-1000 > Pt.fa   # extract bases 100 to 1000 of the chloroplast sequence
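Region extraction is fast because the .fai values allow jumping straight to the right bytes. The sketch below shows the arithmetic under stated assumptions: `byte_offset` and `fetch` are hypothetical helper names, positions are 1-based and inclusive as in Pt:100-1000, and the FASTA is held in memory as bytes (with a real file one would seek to `begin` and read `stop - begin` bytes instead).

```python
def byte_offset(offset, linebases, linewidth, pos):
    """Byte position in the file of base number pos (1-based)."""
    index = pos - 1
    # Each full line holds linebases bases but occupies linewidth bytes.
    return offset + (index // linebases) * linewidth + index % linebases


def fetch(data, row, start, end):
    """Return bases start..end (1-based, inclusive) of one indexed sequence."""
    name, length, offset, linebases, linewidth = row
    end = min(end, length)  # clip the range to the sequence length
    begin = byte_offset(offset, linebases, linewidth, start)
    stop = byte_offset(offset, linebases, linewidth, end) + 1
    chunk = data[begin:stop]
    # Drop the newline bytes that fall inside the requested range.
    return chunk.replace(b"\r", b"").replace(b"\n", b"").decode()
```

With the Pt row of the index shown earlier (OFFSET 4, LINEBASES 60, LINEWIDTH 61), base 100 sits at byte 4 + 1 * 61 + 39 = 104 of the FASTA file.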