How to install and use ALLPATHS-LG

2025-04-20 Update From: SLTechnology News&Howtos

Shulou(Shulou.com)06/01 Report--

This article explains how to install and use ALLPATHS-LG.

ALLPATHS-LG is a genome assembler developed by the Broad Institute. It can assemble small genomes, such as those of bacteria and fungi, as well as large genomes, such as those of animals and plants.

Unlike other assembly software, ALLPATHS-LG requires at least two libraries:

The insert size of the first library must not exceed twice the sequencing read length, which ensures that the two reads of each pair overlap. This type of library is called fragment.

The insert size of the second library is usually larger than 3 kb; the long span between the paired reads helps the assembly of the genome. This type of library is called jumping.
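The fragment-library rule above can be checked with a little shell arithmetic. This is only a sketch: pairs_overlap is a hypothetical helper (not part of ALLPATHS-LG), and the 100 bp read length is an illustrative value.

```shell
# pairs_overlap is a hypothetical helper, not part of ALLPATHS-LG.
# It succeeds when the insert size is at most twice the read length,
# i.e. when the two reads of a pair can overlap.
pairs_overlap() {
    insert_size=$1
    read_len=$2
    [ "$insert_size" -le $(( 2 * read_len )) ]
}

# Illustrative values: 180 bp inserts sequenced with 100 bp reads.
if pairs_overlap 180 100; then
    echo "suitable as a fragment library"
else
    echo "reads would not overlap"
fi
```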

In addition to insert sizes, ALLPATHS-LG also has requirements for sequencing depth: at least 100x coverage is recommended.
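Depth can be estimated roughly as (number of reads x read length) / genome size. The sketch below assumes this simple formula; estimate_coverage is a hypothetical helper and the read counts are made-up numbers for illustration.

```shell
# estimate_coverage is a hypothetical helper, not part of ALLPATHS-LG.
# Depth = number of reads * read length / genome size (integer division).
estimate_coverage() {
    n_reads=$1
    read_len=$2
    genome_size=$3
    echo $(( n_reads * read_len / genome_size ))
}

# Illustrative numbers only: 240000 reads of 100 bp on a 200 kb genome.
cov=$(estimate_coverage 240000 100 200000)
echo "estimated coverage: ${cov}x"
if [ "$cov" -lt 100 ]; then
    echo "warning: below the recommended 100x" >&2
fi
```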

Assembly also places demands on hardware. For mammalian genomes, the recommended memory size is 512 GB; for small genomes, it is 32 GB.

The installation process is as follows:

    wget ftp://ftp.broadinstitute.org/pub/crd/ALLPATHS/Release-LG/latest_source_code/allpathslg-52488.tar.gz
    tar xzvf allpathslg-52488.tar.gz
    cd allpathslg-52488/
    ./configure --prefix=$(pwd)
    make
    make install

After installation, the executables are in the bin directory. For convenience, you can add the bin directory to the PATH environment variable. The developers provide a small test dataset that helps illustrate how the software is used.

    wget ftp://ftp.broadinstitute.org/pub/crd/ALLPATHS/Release-LG/test.genome.tar.gz
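To call the programs by name, the bin directory can be put on PATH as suggested above. A minimal sketch, assuming the tarball was unpacked and installed in the current working directory with the prefix set to the source tree itself; ALLPATHS_BIN is a variable name introduced here for illustration and should point to your own install location.

```shell
# Assumption: the source tree lives in the current directory and was
# installed with --prefix set to the source tree itself.
ALLPATHS_BIN="$PWD/allpathslg-52488/bin"
export PATH="$ALLPATHS_BIN:$PATH"

# Report whether the two programs used in this article are now visible.
for tool in PrepareAllPathsInputs.pl RunAllPathsLG; do
    command -v "$tool" >/dev/null 2>&1 || echo "not found on PATH: $tool" >&2
done
```

To make the change permanent, the export line can be added to your shell startup file.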

Running ALLPATHS-LG takes the following two steps:

1. Prepare the input file

The bin directory contains a script called PrepareAllPathsInputs.pl, which prepares the input files. The script reads the following two files:

in_groups.csv, for example:

    file_name, library_name, group_name
    seq/frags.?.fastq, Solexa-25396, frags
    seq/jumps.?.fastq, Solexa-11542, jumps

This is a comma-separated file with three columns. group_name is the unique ID of each group, library_name is the name of the library, and file_name is the path to the sequence files. For paired-end data, a wildcard can be used to match both the R1 and R2 files.
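The file can be written directly from the shell. The sketch below reproduces the example rows and adds a simple field-count check; the awk check is an illustration added here, not something ALLPATHS-LG requires.

```shell
# Write in_groups.csv with the two example libraries.
# The "?" wildcard matches the single character that distinguishes
# the two read files of a pair.
cat > in_groups.csv <<'EOF'
file_name, library_name, group_name
seq/frags.?.fastq, Solexa-25396, frags
seq/jumps.?.fastq, Solexa-11542, jumps
EOF

# Sanity check: every line should have exactly three comma-separated fields.
awk -F',' 'NF != 3 { printf "line %d has %d fields\n", NR, NF; bad = 1 }
           END { exit bad }' in_groups.csv \
    && echo "in_groups.csv looks well-formed"
```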

in_libs.csv, for example:

    library_name, project_name, organism_name, type, paired, frag_size, frag_stddev, insert_size, insert_stddev, read_orientation, genomic_start, genomic_end
    Solexa-25396, test, test.genome, fragment, 1, 180, 10, , , inward, 0, 0
    Solexa-11542, test, test.genome, jumping, 1, , , 3000, 500, outward, 0, 0

This is a comma-separated file with 12 columns; columns that do not apply to a library are left empty.

library_name: the library name, matching library_name in in_groups.csv.
project_name: the project name.
organism_name: the name of the species being assembled.
type: the library type. fragment is a library with short inserts whose paired reads overlap; jumping is a library with very long inserts.
paired: the sequencing type. 0 means single-end sequencing, 1 means paired-end sequencing.
frag_size, frag_stddev: for fragment libraries only, the mean and standard deviation of the insert size.
insert_size, insert_stddev: for jumping libraries only, the mean and standard deviation of the insert size.
read_orientation: the read orientation.
genomic_start, genomic_end: used to trim reads. Bases before genomic_start and after genomic_end are removed; in practice, just fill in 0.

For fragment and jumping libraries, the read orientations are inward and outward, respectively.
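A sketch of writing in_libs.csv from the shell, assuming that columns which do not apply to a library (insert_size and insert_stddev for the fragment library, frag_size and frag_stddev for the jumping library) are left empty so that every row keeps 12 columns. The awk check is an illustration added here.

```shell
# Write in_libs.csv; unused size columns are left empty (assumption stated above).
cat > in_libs.csv <<'EOF'
library_name, project_name, organism_name, type, paired, frag_size, frag_stddev, insert_size, insert_stddev, read_orientation, genomic_start, genomic_end
Solexa-25396, test, test.genome, fragment, 1, 180, 10, , , inward, 0, 0
Solexa-11542, test, test.genome, jumping, 1, , , 3000, 500, outward, 0, 0
EOF

# Sanity check: header and data rows should all have 12 comma-separated fields.
awk -F',' 'NF != 12 { printf "line %d has %d fields\n", NR, NF; bad = 1 }
           END { exit bad }' in_libs.csv \
    && echo "in_libs.csv has 12 columns on every line"
```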

After filling in the two files above according to your own data, run the following command:

    PrepareAllPathsInputs.pl \
        DATA_DIR=$PWD/test.genome/data \
        PLOIDY=1 \
        IN_GROUPS_CSV=in_groups.csv \
        IN_LIBS_CSV=in_libs.csv \
        GENOME_SIZE=200000 \
        OVERWRITE=True

DATA_DIR is the directory where the data is stored and must be an absolute path; test.genome must match the organism_name in the csv file. PLOIDY is the ploidy of the genome; ALLPATHS-LG currently supports only haploid and diploid genomes. After the run, the following files are generated in the output directory:

    ├── frag_reads_orig.fastb
    ├── frag_reads_orig.pairs
    ├── frag_reads_orig.qualb
    ├── frag_reads_orig.source.txt
    ├── jump_reads_orig.fastb
    ├── jump_reads_orig.pairs
    ├── jump_reads_orig.qualb
    ├── jump_reads_orig.source.txt
    ├── ploidy
    └── read_cache

The reads of each library type produce the corresponding .fastb, .pairs, and .qualb files; ploidy records the ploidy; read_cache is a temporary directory.
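Before starting the assembly, it can be useful to confirm that these files exist. The sketch below checks for the files listed above; check_prepared_inputs is a hypothetical helper, and the data directory is the one used in this article's example.

```shell
# check_prepared_inputs is a hypothetical helper, not part of ALLPATHS-LG.
# It returns 0 when all expected files are present in the data directory.
check_prepared_inputs() {
    data_dir=$1
    missing=0
    for f in frag_reads_orig.fastb frag_reads_orig.pairs frag_reads_orig.qualb \
             jump_reads_orig.fastb jump_reads_orig.pairs jump_reads_orig.qualb \
             ploidy; do
        if [ ! -e "$data_dir/$f" ]; then
            echo "missing: $data_dir/$f" >&2
            missing=1
        fi
    done
    return "$missing"
}

check_prepared_inputs "$PWD/test.genome/data" \
    || echo "input preparation looks incomplete"
```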

2. Assembly

Once the input files are ready, you can run the assembly with the following command:

    RunAllPathsLG \
        PRE=$PWD \
        REFERENCE_NAME=test.genome \
        DATA_SUBDIR=data \
        RUN=run \
        SUBDIR=test \
        TARGETS=standard \
        OVERWRITE=True

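The five path parameters in the command above (PRE, REFERENCE_NAME, DATA_SUBDIR, RUN, and SUBDIR), together with a fixed ASSEMBLIES directory, form the following directory structure:

    PRE/REFERENCE_NAME/DATA_SUBDIR/RUN/ASSEMBLIES/SUBDIR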

ALLPATHS-LG uses this directory structure to store the results of multiple genome assemblies.

The assembly results are stored in the SUBDIR directory: final.contigs.fasta contains the contigs and final.assembly.fasta contains the scaffolds.
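One way to locate the two result files without typing the full path is to search below the reference directory. This is a sketch: find_assembly_outputs and count_sequences are hypothetical helpers, and the reference directory is the one from this article's example.

```shell
# Hypothetical helpers, not part of ALLPATHS-LG.

# Print the paths of the final contig and scaffold files below a directory.
find_assembly_outputs() {
    find "$1" -type f \( -name 'final.contigs.fasta' -o -name 'final.assembly.fasta' \)
}

# Count the sequences in a FASTA file (one header line starting with ">" each).
count_sequences() {
    grep -c '^>' "$1"
}

ref_dir="$PWD/test.genome"
if [ -d "$ref_dir" ]; then
    find_assembly_outputs "$ref_dir" | while IFS= read -r fasta; do
        echo "$fasta: $(count_sequences "$fasta") sequences"
    done
fi
```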
