2025-03-29 Update From: SLTechnology News&Howtos
Shulou(Shulou.com)06/01 Report--
This article explains how to install and use ALLPATHS-LG. The content is simple and clear, and should be easy to follow.
ALLPATHS-LG is a genome assembler developed at the Broad Institute. It can assemble small genomes, such as those of bacteria and fungi, as well as large genomes, such as those of animals and plants.
Unlike many other assemblers, ALLPATHS-LG requires at least two types of library:
The first library must have an insert size of less than twice the sequencing read length, so that the two reads of each pair overlap. This type of library is called a fragment library.
The second library usually has an insert size larger than 3 kb; longer inserts are beneficial for genome assembly. This type of library is called a jumping library.
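To make the fragment-library rule concrete: the two reads of a pair start at opposite ends of the insert, so they overlap by 2 × read length − insert size bases. The Python sketch below computes this. The helper names are hypothetical, the 180 bp insert comes from the in_libs.csv example in this article, and the 100 bp read length is an assumption for illustration.

```python
def fragment_overlap(insert_size: int, read_length: int) -> int:
    """Bases shared by the two reads of a pair (0 if they do not meet).

    Each read is read_length bases long and starts at opposite ends of
    an insert of insert_size bases.
    """
    return max(0, 2 * read_length - insert_size)


def is_fragment_library(insert_size: int, read_length: int) -> bool:
    """True if the insert is shorter than twice the read length."""
    return insert_size < 2 * read_length


if __name__ == "__main__":
    # 180 bp insert from the in_libs.csv example; 100 bp reads assumed.
    print(fragment_overlap(180, 100))     # 20
    print(is_fragment_library(180, 100))  # True
    print(is_fragment_library(250, 100))  # False
```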
Besides these insert-size requirements, ALLPATHS-LG also has requirements for sequencing depth: a coverage of at least 100x is recommended.
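Sequencing depth is the total number of sequenced bases divided by the genome size. A rough Python sketch for checking whether a data set reaches the recommended 100x; it assumes a 200,000 bp genome (the GENOME_SIZE used in the test run in this article) and hypothetical 100 bp reads, and the function names are illustrative only.

```python
import math


def coverage(num_reads: int, read_length: int, genome_size: int) -> float:
    """Sequencing depth = total sequenced bases / genome size."""
    return num_reads * read_length / genome_size


def reads_needed(target_depth: float, read_length: int, genome_size: int) -> int:
    """Minimum number of reads that reaches target_depth."""
    return math.ceil(target_depth * genome_size / read_length)


if __name__ == "__main__":
    genome_size = 200_000  # GENOME_SIZE from the test run
    read_length = 100      # hypothetical read length
    print(reads_needed(100, read_length, genome_size))  # 200000
    print(coverage(200_000, read_length, genome_size))  # 100.0
```

With paired-end data, each pair contributes two reads, so 200,000 reads correspond to 100,000 pairs.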
Assembly also places demands on hardware resources: about 512 GB of memory is recommended for mammalian genomes, and 32 GB for small genomes.
The installation process is as follows:
wget ftp://ftp.broadinstitute.org/pub/crd/ALLPATHS/Release-LG/latest_source_code/allpathslg-52488.tar.gz
tar xzvf allpathslg-52488.tar.gz
cd allpathslg-52488/
./configure --prefix=$(pwd)
make
make install
After installation, the program's executables are in the bin directory. For convenience, you can add the bin directory to the PATH environment variable. The official site also provides a small test data set, which helps illustrate how the software is used:
wget ftp://ftp.broadinstitute.org/pub/crd/ALLPATHS/Release-LG/test.genome.tar.gz
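After adding bin to PATH, one way to confirm that the two programs used in this article can be found is Python's shutil.which. A minimal sketch; the helper name find_allpaths_programs is hypothetical.

```python
import shutil

# Program names as used in this article.
ALLPATHS_PROGRAMS = ["PrepareAllPathsInputs.pl", "RunAllPathsLG"]


def find_allpaths_programs(names=ALLPATHS_PROGRAMS):
    """Map each program name to its full path on PATH, or None if missing."""
    return {name: shutil.which(name) for name in names}


if __name__ == "__main__":
    for name, path in find_allpaths_programs().items():
        print(f"{name}: {path or 'not found; is bin/ on PATH?'}")
```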
Running ALLPATHS-LG involves the following two steps.
1. Prepare the input files
The bin directory contains a script called PrepareAllPathsInputs.pl, which prepares the input files. It reads the following two files.
in_groups.csv, for example:
file_name, library_name, group_name
seq/frags.?.fastq, Solexa-25396, frags
seq/jumps.?.fastq, Solexa-11542, jumps
This is a comma-separated, three-column file: group_name is a unique ID for each group, library_name is the name of the library, and file_name gives the sequence files. For paired-end data, a wildcard can be used to match both the R1 and R2 files.
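If you generate in_groups.csv from a script, a minimal Python sketch might look like this. It reproduces the example rows above, writes ", " between fields to match the example's layout, and checks that every group_name is unique. The function names are hypothetical and not part of ALLPATHS-LG.

```python
IN_GROUPS_COLUMNS = ["file_name", "library_name", "group_name"]

# Example rows from this article; replace with your own libraries.
IN_GROUPS = [
    {"file_name": "seq/frags.?.fastq", "library_name": "Solexa-25396",
     "group_name": "frags"},
    {"file_name": "seq/jumps.?.fastq", "library_name": "Solexa-11542",
     "group_name": "jumps"},
]


def format_in_groups(rows):
    """Return in_groups.csv text, checking that group_name is unique."""
    names = [row["group_name"] for row in rows]
    if len(names) != len(set(names)):
        raise ValueError("group_name must be unique")
    lines = [", ".join(IN_GROUPS_COLUMNS)]
    lines += [", ".join(row[c] for c in IN_GROUPS_COLUMNS) for row in rows]
    return "\n".join(lines) + "\n"


def write_in_groups(path, rows):
    """Write the formatted table to path (e.g. "in_groups.csv")."""
    with open(path, "w") as fh:
        fh.write(format_in_groups(rows))


if __name__ == "__main__":
    print(format_in_groups(IN_GROUPS), end="")
```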
in_libs.csv, for example:
library_name, project_name, organism_name, type, paired, frag_size, frag_stddev, insert_size, insert_stddev, read_orientation, genomic_start, genomic_end
Solexa-25396, test, test.genome, fragment, 1, 180, 10, , , inward, 0, 0
Solexa-11542, test, test.genome, jumping, 1, , , 3000, 500, outward, 0, 0
This is a comma-separated, 12-column file:
library_name: the library name, which must match the library_name in in_groups.csv.
project_name: the project name.
organism_name: the name of the species being assembled.
type: the library type; fragment denotes a short-insert library whose read pairs overlap, and jumping denotes a library with very long inserts.
paired: the sequencing type; 0 means single-end and 1 means paired-end.
frag_size and frag_stddev: fragment libraries only; the mean and standard deviation of the insert size.
insert_size and insert_stddev: jumping libraries only; the mean and standard deviation of the insert size.
read_orientation: the orientation of the two reads of a pair.
genomic_start and genomic_end: trimming positions within each read; if non-zero, bases before genomic_start and after genomic_end are trimmed off. In practice, just fill in 0.
For fragment and jumping libraries, read_orientation is inward and outward, respectively.
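Because the size columns that do not apply to a library type are left empty, in_libs.csv is easy to get wrong by hand. The following Python sketch (hypothetical helper names) builds rows from a few values, fills frag_size/frag_stddev or insert_size/insert_stddev depending on type, sets read_orientation from the inward/outward rule above, and writes genomic_start and genomic_end as 0.

```python
IN_LIBS_COLUMNS = [
    "library_name", "project_name", "organism_name", "type", "paired",
    "frag_size", "frag_stddev", "insert_size", "insert_stddev",
    "read_orientation", "genomic_start", "genomic_end",
]

# read_orientation for each library type, as described above.
ORIENTATION = {"fragment": "inward", "jumping": "outward"}


def in_libs_row(library_name, project_name, organism_name, lib_type,
                mean_size, stddev, paired=1):
    """Build one in_libs.csv row; columns that do not apply stay empty."""
    if lib_type not in ORIENTATION:
        raise ValueError(f"unknown library type: {lib_type}")
    row = dict.fromkeys(IN_LIBS_COLUMNS, "")
    row.update(library_name=library_name, project_name=project_name,
               organism_name=organism_name, type=lib_type,
               paired=str(paired), read_orientation=ORIENTATION[lib_type],
               genomic_start="0", genomic_end="0")
    if lib_type == "fragment":
        row["frag_size"], row["frag_stddev"] = str(mean_size), str(stddev)
    else:
        row["insert_size"], row["insert_stddev"] = str(mean_size), str(stddev)
    return row


def format_in_libs(rows):
    """Return in_libs.csv text with ", " between fields."""
    lines = [", ".join(IN_LIBS_COLUMNS)]
    lines += [", ".join(row[c] for c in IN_LIBS_COLUMNS) for row in rows]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    rows = [
        in_libs_row("Solexa-25396", "test", "test.genome", "fragment", 180, 10),
        in_libs_row("Solexa-11542", "test", "test.genome", "jumping", 3000, 500),
    ]
    print(format_in_libs(rows), end="")
```

Running it prints the same three lines as the in_libs.csv example above.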
After filling in the two files for your own data, run the following command:
PrepareAllPathsInputs.pl \
    DATA_DIR=$PWD/test.genome/data \
    PLOIDY=1 \
    IN_GROUPS_CSV=in_groups.csv \
    IN_LIBS_CSV=in_libs.csv \
    GENOME_SIZE=200000 \
    OVERWRITE=True
DATA_DIR is the directory where the data is stored and must be an absolute path; test.genome must be the same as the organism_name in the CSV file. PLOIDY is the ploidy of the genome; ALLPATHS-LG currently supports only haploid (1) and diploid (2) genomes. After the script runs, the following files are generated in the output directory:
├── frag_reads_orig.fastb
├── frag_reads_orig.pairs
├── frag_reads_orig.qualb
├── frag_reads_orig.source.txt
├── jump_reads_orig.fastb
├── jump_reads_orig.pairs
├── jump_reads_orig.qualb
├── jump_reads_orig.source.txt
├── ploidy
└── read_cache
The reads of each library type produce corresponding .fastb (sequence), .qualb (quality) and .pairs (pairing) files; ploidy records the genome ploidy; read_cache is a temporary directory.
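Checks before and after PrepareAllPathsInputs.pl can save a failed run. The Python sketch below builds the argument list while enforcing the two constraints stated above (an absolute DATA_DIR and a PLOIDY of 1 or 2), optionally checks the article's rule that a component of DATA_DIR matches organism_name, and then lists any expected output entries that are missing. Helper names are hypothetical, and the expected file list assumes one fragment and one jumping library as in the example.

```python
import os
import shlex

# Entries listed above for one fragment and one jumping library.
EXPECTED_OUTPUTS = [
    f"{prefix}_reads_orig.{ext}"
    for prefix in ("frag", "jump")
    for ext in ("fastb", "pairs", "qualb", "source.txt")
] + ["ploidy", "read_cache"]


def prepare_inputs_cmd(data_dir, ploidy, in_groups_csv, in_libs_csv,
                       genome_size, organism_name=None, overwrite=True):
    """Build the PrepareAllPathsInputs.pl argument list used above."""
    if not os.path.isabs(data_dir):
        raise ValueError("DATA_DIR must be an absolute path")
    if ploidy not in (1, 2):
        raise ValueError("PLOIDY must be 1 (haploid) or 2 (diploid)")
    # Optional check of the article's rule that a component of DATA_DIR
    # (test.genome in the example) matches organism_name in in_libs.csv.
    if organism_name is not None and organism_name not in data_dir.split(os.sep):
        raise ValueError(f"{organism_name!r} not found in DATA_DIR")
    return [
        "PrepareAllPathsInputs.pl",
        f"DATA_DIR={data_dir}",
        f"PLOIDY={ploidy}",
        f"IN_GROUPS_CSV={in_groups_csv}",
        f"IN_LIBS_CSV={in_libs_csv}",
        f"GENOME_SIZE={genome_size}",
        f"OVERWRITE={overwrite}",
    ]


def missing_outputs(data_dir):
    """Return the expected entries that do not exist in data_dir."""
    return [name for name in EXPECTED_OUTPUTS
            if not os.path.exists(os.path.join(data_dir, name))]


if __name__ == "__main__":
    data_dir = os.path.join(os.getcwd(), "test.genome", "data")
    cmd = prepare_inputs_cmd(data_dir, 1, "in_groups.csv", "in_libs.csv",
                             200000, organism_name="test.genome")
    print(shlex.join(cmd))
    print("missing:", missing_outputs(data_dir))
```

Printing the command rather than running it keeps the sketch free of side effects; once bin is on PATH, the same list can be passed to subprocess.run(cmd, check=True).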
2. Assembly
Once the input files are ready, run the assembly with the following command:
RunAllPathsLG \
    PRE=$PWD \
    REFERENCE_NAME=test.genome \
    DATA_SUBDIR=data \
    RUN=run \
    SUBDIR=test \
    TARGETS=standard \
    OVERWRITE=True
Five of the parameters in this command (PRE, REFERENCE_NAME, DATA_SUBDIR, RUN and SUBDIR) define the following directory structure:
PRE/REFERENCE_NAME/DATA_SUBDIR/RUN/SUBDIR
ALLPATHS-LG uses this directory hierarchy so that the results of multiple assemblies can be kept side by side.
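Comparing the two commands shows that the DATA_DIR passed to PrepareAllPathsInputs.pl ($PWD/test.genome/data) is exactly PRE/REFERENCE_NAME/DATA_SUBDIR. The Python sketch below makes that relationship explicit and builds the RunAllPathsLG argument list. Helper names are hypothetical, and assembly_dir follows the layout as this article describes it, so check your own run directory if the results are not there.

```python
import os


def data_dir(pre, reference_name, data_subdir):
    """DATA_DIR for PrepareAllPathsInputs.pl = PRE/REFERENCE_NAME/DATA_SUBDIR."""
    return os.path.join(pre, reference_name, data_subdir)


def assembly_dir(pre, reference_name, data_subdir, run, subdir):
    """Result directory as this article describes it (verify on your run)."""
    return os.path.join(pre, reference_name, data_subdir, run, subdir)


def run_allpaths_cmd(pre, reference_name, data_subdir="data", run="run",
                     subdir="test", targets="standard", overwrite=True):
    """Build the RunAllPathsLG argument list used above."""
    return [
        "RunAllPathsLG",
        f"PRE={pre}",
        f"REFERENCE_NAME={reference_name}",
        f"DATA_SUBDIR={data_subdir}",
        f"RUN={run}",
        f"SUBDIR={subdir}",
        f"TARGETS={targets}",
        f"OVERWRITE={overwrite}",
    ]


if __name__ == "__main__":
    pre = os.getcwd()
    print(data_dir(pre, "test.genome", "data"))
    print(" ".join(run_allpaths_cmd(pre, "test.genome")))
```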
The assembly results are stored in the SUBDIR directory: final.contigs.fasta contains the contigs and final.assembly.fasta contains the scaffolds.
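To compare the contig and scaffold results, a small Python sketch can report the number of sequences, total length, longest sequence and N50 for each FASTA file. N50, a common assembly-contiguity statistic, is added here for illustration; the function names are hypothetical.

```python
def fasta_lengths(lines):
    """Return the length of each sequence in an iterable of FASTA lines."""
    lengths, current = [], None
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if current is not None:
                lengths.append(current)
            current = 0
        elif line and current is not None:
            current += len(line)
    if current is not None:
        lengths.append(current)
    return lengths


def n50(lengths):
    """Length of the shortest sequence among the longest ones that
    together cover at least half of the total length."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0


def summarize_fasta(path):
    """Count, total length, longest sequence and N50 of a FASTA file."""
    with open(path) as fh:
        lengths = fasta_lengths(fh)
    return {"count": len(lengths), "total": sum(lengths),
            "longest": max(lengths, default=0), "n50": n50(lengths)}
```

For example, calling summarize_fasta("final.contigs.fasta") and summarize_fasta("final.assembly.fasta") from inside SUBDIR; because scaffolds join contigs, the scaffold file typically has fewer, longer sequences.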
© 2024 shulou.com SLNews company. All rights reserved.