How to Run Fusion Gene Detection with STAR-fusion

2025-04-27 Update From: SLTechnology News&Howtos

Shulou(Shulou.com)06/02 Report--

This article explains how to run fusion gene detection with STAR-fusion, a workflow many readers may not be familiar with. The key points are summarized below.

1. Running time

The running-time comparison figure (not reproduced here) shows that STAR-fusion has a significant advantage in running time.

2. ROC curve

The ROC curve is used to evaluate the quality of a tool's analysis results. The x-axis is the false positive rate (FPR), and the y-axis is the true positive rate (TPR), also known as sensitivity. For an ideal result, the false positive rate should be as low as possible and the sensitivity as high as possible.

The area under the ROC curve is called the AUC. The larger a tool's AUC, the better its overall performance.

The ROC figure (not reproduced here) shows that STAR-fusion gives better results on the test data used in the article.
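As a minimal sketch of how an AUC value can be obtained from ROC points, the snippet below applies the trapezoidal rule to FPR/TPR pairs sorted by increasing FPR. The three points are invented for illustration and are not taken from the benchmark, and the trapezoidal rule is one common choice, not necessarily the method used in the article.

```shell
# Illustrative ROC points (FPR TPR), sorted by increasing FPR.
# These values are invented for the example; they are not benchmark data.
roc_points='0.0 0.0
0.5 1.0
1.0 1.0'

# Trapezoidal rule: sum of (x_i - x_{i-1}) * (y_i + y_{i-1}) / 2.
auc=$(printf '%s\n' "$roc_points" | LC_ALL=C awk '
    NR > 1 { area += ($1 - prev_fpr) * ($2 + prev_tpr) / 2 }
    { prev_fpr = $1; prev_tpr = $2 }
    END { printf "%.2f\n", area }
')

echo "AUC = $auc"
```

A curve that rises to full sensitivity at a lower false positive rate encloses more area, which is why a larger AUC indicates better overall performance.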

Installing the software is relatively simple: download the release archive and decompress it. The workflow for running it is described below.

Note that STAR-fusion relies on STAR to align the reads. STAR runs very fast, but its memory consumption is high. For the human genome, aligning one sample requires about 30 GB of memory, and when STAR is used for fusion gene detection the memory use rises to about 40 GB, which puts real pressure on computing resources. In practice, the number of samples run in parallel should be set according to the available hardware.

The specific steps for running STAR-fusion are as follows
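A rough way to choose the number of parallel samples is to divide the available memory by the per-sample requirement. The sketch below assumes the approximate 40 GB per-sample figure quoted above; the function name `max_parallel_samples` and the 128 GB machine are hypothetical.

```shell
# Estimate how many STAR-fusion samples can run in parallel.
# Usage: max_parallel_samples TOTAL_MEM_GB [PER_SAMPLE_GB]
# PER_SAMPLE_GB defaults to 40, the approximate figure quoted in the text
# for fusion detection on the human genome.
max_parallel_samples() {
    total_mem_gb=$1
    per_sample_gb=${2:-40}
    # Integer division rounds down; 0 means a single sample does not fit.
    echo $((total_mem_gb / per_sample_gb))
}

# Hypothetical machine with 128 GB of RAM.
max_parallel_samples 128
```

The estimate ignores memory used by other processes, so leaving some headroom below the computed value is a sensible precaution.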

1. Establish reference lib

First, build a reference lib for the reference genome. At minimum this requires the genome's FASTA file and GTF annotation file; an annotation of known fusion genes can also be supplied.

For human and mouse, prebuilt files are available at the following link

https://data.broadinstitute.org/Trinity/CTAT_RESOURCE_LIB/

The plug-n-play package is the prebuilt reference lib, and the source package contains the required raw files. The command to build the reference lib from the raw files is as follows

FusionFilter/prep_genome_lib.pl \
    --genome_fa ref_genome.fa \
    --gtf ref_annot.gtf \
    --fusion_annot_lib CTAT_HumanFusionLib.dat.gz \
    --annot_filter_rule AnnotFilterRule.pm \
    --pfam_db PFAM.domtblout.dat.gz

The Perl script is bundled in the STAR-fusion installation directory. The pfam_db and annot_filter_rule files can be obtained from the source package at the link above, while fusion_annot_lib holds the annotation of known fusion genes. For human and mouse, this annotation file is provided at the link above; if you do not have one, the option can be omitted.

By default, a directory called ctat_genome_lib_build_dir is generated in the current directory, where all the result files are saved.

2. Run STAR-fusion

STAR-fusion supports two modes. The first starts directly from FASTQ files; the second runs the STAR alignment yourself and then passes the result to STAR-fusion. The first mode is used as follows

Paired-end sequencing

STAR-Fusion --genome_lib_dir CTAT_resource_lib \
    --left_fq reads_1.fq \
    --right_fq reads_2.fq \
    --output_dir star_fusion_outdir

Single-end sequencing

STAR-Fusion --genome_lib_dir CTAT_resource_lib \
    --left_fq reads_1.fq \
    --output_dir star_fusion_outdir

CTAT_resource_lib is the directory containing the reference lib built in the first step. The mode that works directly from existing STAR alignment results is called Kickstart mode, and its usage is as follows

1. STAR alignment

STAR --genomeDir ${star_index_dir} \
     --readFilesIn ${left_fq_filename} ${right_fq_filename} \
     --twopassMode Basic \
     --outReadsUnmapped None \
     --chimSegmentMin 12 \
     --chimJunctionOverhangMin 12 \
     --alignSJDBoverhangMin 10 \
     --alignMatesGapMax 100000 \
     --alignIntronMax 100000 \
     --chimSegmentReadGapMax 3 \
     --alignSJstitchMismatchNmax 5 -1 5 5 \
     --runThreadN ${THREAD_COUNT} \
     --outSAMstrandField intronMotif \
     --chimOutJunctionFormat 1

2. Run STAR-fusion

STAR-Fusion --genome_lib_dir CTAT_resource_lib \
    -J Chimeric.out.junction \
    --output_dir star_fusion_outdir
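When several samples have to be processed from FASTQ files, the paired-end and single-end invocations can be wrapped in one small helper. The sketch below is a dry run that only prints the commands; the function name `star_fusion_cmd`, the sample names and the `<sample>_1.fq` / `<sample>_2.fq` naming scheme are assumptions made for illustration, and paths containing spaces are not handled.

```shell
# Print (dry run) the STAR-Fusion command for one sample.
# Usage: star_fusion_cmd LIB_DIR OUT_DIR LEFT_FQ [RIGHT_FQ]
# The function only echoes the command; nothing is executed.
star_fusion_cmd() {
    lib_dir=$1; out_dir=$2; left_fq=$3; right_fq=${4:-}
    cmd="STAR-Fusion --genome_lib_dir $lib_dir --left_fq $left_fq"
    # Add the second mate only for paired-end data.
    if [ -n "$right_fq" ]; then
        cmd="$cmd --right_fq $right_fq"
    fi
    echo "$cmd --output_dir $out_dir"
}

# Hypothetical sample names; files are assumed to be named
# <sample>_1.fq and <sample>_2.fq.
for sample in sampleA sampleB; do
    star_fusion_cmd CTAT_resource_lib "${sample}_outdir" \
        "${sample}_1.fq" "${sample}_2.fq"
done
```

Printing the commands first makes it easy to review them, and to limit how many run at once, before anything memory-hungry is started.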

The output file for STAR-fusion is named

star-fusion.fusion_predictions.tsv

The file contains many columns; the original screenshot of part of it is not reproduced here.

Among these columns, JunctionReadCount and SpanningFragCount give the numbers of junction reads and spanning fragments described in the previous article; the more such reads support a fusion, the more likely it is to be a true fusion gene. SpliceType indicates whether the breakpoint lies at an exon boundary. For a more detailed interpretation of the results, please refer to the official documentation.
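Because read support is the main evidence for a fusion, a simple first pass over the predictions is to keep only rows above chosen support thresholds. The sketch below assumes the header names the two columns JunctionReadCount and SpanningFragCount; the thresholds, the helper names `filter_fusions` and `demo_rows`, and the demo rows (with a reduced set of columns) are invented for illustration.

```shell
# Keep fusions supported by at least MIN_JUNCTION junction reads and
# MIN_SPANNING spanning fragments. Reads TSV on stdin, writes TSV on stdout.
# Usage: filter_fusions MIN_JUNCTION MIN_SPANNING < predictions.tsv
filter_fusions() {
    min_junction=$1; min_spanning=$2
    LC_ALL=C awk -F '\t' -v mj="$min_junction" -v ms="$min_spanning" '
        NR == 1 {
            # Locate columns by header name instead of fixed position.
            for (i = 1; i <= NF; i++) col[$i] = i
            print; next
        }
        $col["JunctionReadCount"] + 0 >= mj && $col["SpanningFragCount"] + 0 >= ms
    '
}

# Invented rows with a reduced set of columns, for illustration only.
demo_rows() {
    printf '#FusionName\tJunctionReadCount\tSpanningFragCount\tSpliceType\n'
    printf 'GENEA--GENEB\t25\t10\tONLY_REF_SPLICE\n'
    printf 'GENEC--GENED\t1\t0\tINCL_NON_REF_SPLICE\n'
}

# Arbitrary example thresholds: at least 2 junction reads and 1 spanning fragment.
demo_rows | filter_fusions 2 1
```

For a real run, the input would be the star-fusion.fusion_predictions.tsv file in the output directory, and suitable thresholds depend on sequencing depth.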
