
How to use kallisto

2025-04-28 Update From: SLTechnology News&Howtos shulou


This article introduces how to use kallisto, working through the steps you are likely to need in practice: the idea behind the tool, installation, indexing, and quantification.

kallisto is an alignment-free transcript quantification tool released in 2016, built on an algorithm called pseudo-alignment. Traditional quantification determines which transcript or gene a read belongs to from the position at which the read aligns. Pseudo-alignment does not care about the exact position of a read; it determines which transcripts the read is compatible with from the read's k-mers, as shown in the following diagram

First, each transcript sequence is split into k-mers, and the k-mers of all transcripts are used to build a de Bruijn graph, referred to as the T-DBG. In this graph each node is a k-mer and each transcript corresponds to a path. Because transcript sequences are redundant, a k-mer can lie on several paths, that is, belong to several transcripts. The sequenced reads are then also split into k-mers and mapped onto the T-DBG.

In the final quantification, the transcript sets of all the k-mers in a read are intersected, which yields the transcripts the read may have come from.

The project provides precompiled executables, which only need to be downloaded and unpacked. The commands are as follows

wget https://github.com/pachterlab/kallisto/releases/download/v0.44.0/kallisto_linux-v0.44.0.tar.gz
tar xzvf kallisto_linux-v0.44.0.tar.gz

After unpacking, the folder contains an executable named kallisto. As the algorithm suggests, running the software takes two steps: the first splits the transcript sequences into k-mers and builds the T-DBG, which is called indexing; the second quantifies the reads.

1. Index the transcript sequence

kallisto can read gzip-compressed transcript sequences. Usage is as follows

kallisto index -k 31 -i hg19.idx hg19.refMrna.fa

You only need to provide the transcript sequences in FASTA format. The -k parameter specifies the k-mer length, and the -i parameter specifies the name of the output index. Note that the index kallisto builds is a single file.
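When scripting the indexing step it can help to check the k-mer length before starting a long run. The sketch below assumes the constraint given in the kallisto index help text for this release, namely that k must be odd and at most 31 (31 being the default). The helper name check_k is hypothetical and not part of kallisto.

```shell
# check_k K: succeed only if K is an odd integer between 1 and 31.
# Assumption: this mirrors the limit stated in "kallisto index" help output.
check_k() {
  case "$1" in
    ''|*[!0-9]*) return 1 ;;   # reject empty or non-numeric input
  esac
  [ "$1" -ge 1 ] && [ "$1" -le 31 ] && [ $(( $1 % 2 )) -eq 1 ]
}

K=31   # hypothetical choice for this example
if check_k "$K"; then
  echo "k=$K is acceptable"
else
  echo "k=$K is not an odd value in 1..31" >&2
fi
```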

2. Quantification

kallisto supports quantification of both single-end and paired-end data. Usage for paired-end data is as follows

kallisto quant \
  -i hg19.idx \
  -o out_dir \
  -t 20 \
  R1.fastq.gz R2.fastq.gz

The -i parameter specifies the transcript index file, the -o parameter specifies the output directory, and the -t parameter specifies the number of threads. kallisto accepts gzip-compressed sequence files.
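With several samples, the same command is usually repeated once per sample. The sketch below only prints the commands (a dry run) so they can be reviewed first. It assumes a hypothetical naming scheme in which each sample has files named SAMPLE_R1.fastq.gz and SAMPLE_R2.fastq.gz; the function name quant_cmd, the sample names, and the output directory names are likewise made up for illustration.

```shell
INDEX=hg19.idx   # index built in step 1
THREADS=20

# quant_cmd SAMPLE: print (not run) the paired-end kallisto command for SAMPLE.
# Assumes hypothetical inputs SAMPLE_R1.fastq.gz and SAMPLE_R2.fastq.gz.
quant_cmd() {
  printf 'kallisto quant -i %s -o %s -t %s %s %s\n' \
    "$INDEX" "${1}_out" "$THREADS" "${1}_R1.fastq.gz" "${1}_R2.fastq.gz"
}

for sample in sampleA sampleB; do
  quant_cmd "$sample"   # pipe the loop output to sh to actually execute
done
```

Giving each sample its own output directory matters because every run writes files with the same names.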

Usage for single-end data is as follows

kallisto quant \
  -i hg19.idx \
  -o output \
  --single \
  -l 180 \
  -s 20 \
  -t 20 \
  reads.fastq.gz

For single-end data you must specify the mean and the standard deviation of the fragment length, with the -l and -s parameters respectively.
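If fragment lengths are available from some other source, the two values can be computed rather than guessed. The sketch below assumes a list with one fragment length per line (for example exported from a library QC report, which is an assumption and not something kallisto produces) and prints the mean and the population standard deviation. The function name frag_stats and the three input values are hypothetical.

```shell
# frag_stats: read one fragment length per line on stdin and print
# "mean sd" (population standard deviation), candidates for -l and -s.
frag_stats() {
  awk '{ n++; sum += $1; sumsq += $1 * $1 }
       END {
         if (n == 0) exit 1
         mean = sum / n
         var = sumsq / n - mean * mean
         if (var < 0) var = 0          # guard against rounding error
         printf "%.1f %.1f\n", mean, sqrt(var)
       }'
}

# Hypothetical lengths, for illustration only.
printf '%s\n' 160 180 200 | frag_stats   # prints: 180.0 16.3
```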

The following three files are generated in the output directory

├── abundance.h5
├── abundance.tsv
└── run_info.json

The run_info.json file is in JSON format and records the command that was run and its parameters.

The files prefixed with abundance hold the transcript quantification results. The h5 file is in HDF5 format; when there are many transcripts, a file in this format is much smaller than the plain-text equivalent. The tsv file is plain text, with contents like the following

target_id       length  eff_length  est_counts  tpm
NR_103451       865     664.449     9           0.493026
NM_001243523    577     376.636     31          2.99591
NR_038931       2432    2231.4      36.9964     0.603491
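Because abundance.tsv is tab-separated text with a header line, it can be filtered with standard tools. The sketch below restates the example values in a temporary file and lists the transcripts whose tpm exceeds a threshold. The function name tpm_above and the threshold 0.5 are hypothetical; on real output, pass the path to abundance.tsv instead of the temporary file.

```shell
# Recreate the example rows as a tab-separated file (demo only).
tsv=$(mktemp)
printf 'target_id\tlength\teff_length\test_counts\ttpm\n' > "$tsv"
printf 'NR_103451\t865\t664.449\t9\t0.493026\n' >> "$tsv"
printf 'NM_001243523\t577\t376.636\t31\t2.99591\n' >> "$tsv"
printf 'NR_038931\t2432\t2231.4\t36.9964\t0.603491\n' >> "$tsv"

# tpm_above FILE MIN: print "target_id tpm" for rows whose tpm exceeds MIN,
# skipping the header line.
tpm_above() {
  awk -F '\t' -v min="$2" 'NR > 1 && $5 + 0 > min + 0 { print $1, $5 }' "$1"
}

tpm_above "$tsv" 0.5
```

With the three example rows this prints NM_001243523 and NR_038931 together with their tpm values, since NR_103451 falls below the threshold.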

An HDF5 file can be converted to a tsv file with the following command

kallisto h5dump -o out_dir abundance.h5

The -o parameter specifies the output directory, and the resulting file is named abundance.tsv.
