Shulou (Shulou.com) 06/02 report. Updated 2025-02-03. Source: SLTechnology News&Howtos.
Today I would like to talk about how to detect fusion genes with STAR-Fusion, a topic many people may not know much about. To make it easier to follow, I have summarized the key points below. I hope you find this article useful.
1. Running time
As the figure above shows, STAR-Fusion has a clear advantage in running time.
2. ROC curve
The ROC curve is used to evaluate the quality of a tool's results. The x-axis is the false positive rate (FPR) of the results, and the y-axis is the true positive rate (TPR), also called sensitivity. An ideal result has the lowest possible false positive rate and the highest possible sensitivity.
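For reference, the two axes are defined from the usual confusion-matrix counts, where TP, FP, TN and FN are the numbers of true positives, false positives, true negatives and false negatives among the predicted fusions (standard definitions, not specific to STAR-Fusion):

```latex
\mathrm{FPR} = \frac{FP}{FP + TN}, \qquad
\mathrm{TPR} = \text{sensitivity} = \frac{TP}{TP + FN}
```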
The area under the ROC curve is called the AUC. The larger a tool's AUC, the better its overall performance.
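To make the AUC idea concrete, here is a minimal sketch of one common way to compute it: sort predictions by score, sweep the threshold from high to low to collect (FPR, TPR) points, then integrate with the trapezoid rule. The scores and labels are hypothetical inputs; this is not the procedure used to produce the figure in the article.

```python
def roc_points(scores, labels):
    """Return (FPR, TPR) points, sweeping the threshold from high to low.

    labels: 1 (or True) for a true fusion, 0 (or False) for a false one.
    """
    pos = sum(1 for label in labels if label)
    neg = len(labels) - pos
    if pos == 0 or neg == 0:
        raise ValueError("need at least one positive and one negative label")
    pairs = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)
    tp = fp = 0
    points = [(0.0, 0.0)]
    i = 0
    while i < len(pairs):
        threshold = pairs[i][0]
        # Consume every prediction tied at this score before emitting a point.
        while i < len(pairs) and pairs[i][0] == threshold:
            if pairs[i][1]:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / neg, tp / pos))
    return points


def auc(points):
    """Area under a curve given as (x, y) points, using the trapezoid rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


if __name__ == "__main__":
    # Hypothetical scores: one negative outranks one positive.
    pts = roc_points([0.9, 0.6, 0.7, 0.2], [1, 1, 0, 0])
    print(auc(pts))
```

With these toy inputs, three of the four positive/negative pairs are ranked correctly, so the AUC is 0.75; a perfect ranking gives 1.0.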
As the figure above shows, STAR-Fusion gives better results on the test data used in the article.
Installing the software is relatively simple: download the release archive and decompress it.
Note that STAR-Fusion relies on STAR to align reads. STAR is very fast, but its memory consumption is high: aligning one human sample takes about 30 GB of RAM, and for fusion gene detection this rises to about 40 GB, which puts real pressure on computing resources. In practice, set the number of samples processed in parallel according to the hardware you have.
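The sizing rule above is simple division. Here is a minimal sketch, assuming the roughly 40 GB per fusion-detection run quoted above and an optional amount of memory you choose to reserve for the operating system and other jobs (the reserve is a hypothetical parameter, not a STAR setting):

```python
def max_parallel_samples(total_mem_gb, mem_per_sample_gb=40.0, reserve_gb=0.0):
    """How many STAR-Fusion runs fit in memory at once.

    mem_per_sample_gb defaults to the ~40 GB figure for human fusion detection;
    reserve_gb is memory held back for everything else on the machine.
    """
    if mem_per_sample_gb <= 0:
        raise ValueError("mem_per_sample_gb must be positive")
    usable = total_mem_gb - reserve_gb
    if usable < mem_per_sample_gb:
        return 0  # not even one sample fits
    return int(usable // mem_per_sample_gb)


if __name__ == "__main__":
    print(max_parallel_samples(128))                 # 128 GB node, fusion mode
    print(max_parallel_samples(128, reserve_gb=16))  # keep 16 GB free
```

On a 128 GB machine this gives 3 concurrent samples, or 2 if 16 GB is held back.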
The STAR-Fusion workflow is as follows.
1. Build the reference library
First, build a reference library for the reference genome. At a minimum this requires the genome's FASTA file and its GTF annotation file; annotations of known fusion genes can also be supplied.
Prebuilt libraries for human and mouse are available at the following link:
https://data.broadinstitute.org/Trinity/CTAT_RESOURCE_LIB/
The plug-n-play archives contain ready-built reference libraries, and the source archives contain the raw files needed to build one. The command to build a reference library from the raw files is as follows:
FusionFilter/prep_genome_lib.pl \
    --genome_fa ref_genome.fa \
    --gtf ref_annot.gtf \
    --fusion_annot_lib CTAT_HumanFusionLib.dat.gz \
    --annot_filter_rule AnnotFilterRule.pm \
    --pfam_db PFAM.domtblout.dat.gz
This Perl script ships inside the STAR-Fusion installation directory. The files for --pfam_db and --annot_filter_rule can be taken from the source archive shown above, while --fusion_annot_lib is the fusion gene annotation. For human and mouse this annotation file is provided in the archives shown above; for other species, where no such file exists, it can be left out.
By default, a directory named ctat_genome_lib_build_dir is created in the current working directory, and all output files are saved there.
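If you script library builds for several genomes, the optional fusion annotation described above maps naturally onto an optional argument. A minimal sketch that only assembles the argument list, using exactly the flags from the command above; the function name and the default script path are hypothetical and should point at your own installation:

```python
from typing import List, Optional


def prep_genome_lib_cmd(genome_fa: str, gtf: str, pfam_db: str,
                        annot_filter_rule: str,
                        fusion_annot_lib: Optional[str] = None,
                        script: str = "FusionFilter/prep_genome_lib.pl") -> List[str]:
    """Build the prep_genome_lib.pl command line as a list of arguments.

    fusion_annot_lib is only added when given, e.g. for species other than
    human or mouse where no fusion annotation file is available.
    """
    cmd = [script, "--genome_fa", genome_fa, "--gtf", gtf]
    if fusion_annot_lib is not None:
        cmd += ["--fusion_annot_lib", fusion_annot_lib]
    cmd += ["--annot_filter_rule", annot_filter_rule, "--pfam_db", pfam_db]
    return cmd
```

To run it, pass the list to `subprocess.run(cmd, check=True)`; `shlex.join(cmd)` prints the equivalent shell command for logging. Using a list rather than one string avoids shell-quoting problems with unusual file names.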
2. Run STAR-Fusion
STAR-Fusion supports two modes: the first starts directly from FASTQ files, and in the second you run the STAR alignment yourself and then run STAR-Fusion on its output. The first mode is used as follows.
Paired-end reads:
STAR-Fusion --genome_lib_dir CTAT_resource_lib \
    --left_fq reads_1.fq \
    --right_fq reads_2.fq \
    --output_dir star_fusion_outdir
Single-end reads:
STAR-Fusion --genome_lib_dir CTAT_resource_lib \
    --left_fq reads_1.fq \
    --output_dir star_fusion_outdir
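The only difference between the paired-end and single-end invocations is whether --right_fq is present. A minimal sketch of a helper (hypothetical name) that builds either form from the flags shown above, for use in a batch script:

```python
from typing import List, Optional


def star_fusion_fastq_cmd(genome_lib_dir: str, left_fq: str,
                          right_fq: Optional[str] = None,
                          output_dir: str = "star_fusion_outdir") -> List[str]:
    """STAR-Fusion command for the start-from-FASTQ mode.

    Omit right_fq for single-end data; supply it for paired-end data.
    """
    cmd = ["STAR-Fusion", "--genome_lib_dir", genome_lib_dir, "--left_fq", left_fq]
    if right_fq is not None:
        cmd += ["--right_fq", right_fq]  # mate 2 of a paired-end library
    cmd += ["--output_dir", output_dir]
    return cmd
```

For example, `star_fusion_fastq_cmd("CTAT_resource_lib", "reads_1.fq", "reads_2.fq")` reproduces the paired-end command, and dropping the third argument gives the single-end one. Give each sample its own output_dir when running several in parallel.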
Here CTAT_resource_lib is the directory containing the reference library built in step 1. The mode that starts from existing STAR alignment results is called Kickstart mode, and it is used as follows:
1. Align with STAR
STAR --genomeDir ${star_index_dir} \
    --readFilesIn ${left_fq_filename} ${right_fq_filename} \
    --twopassMode Basic \
    --outReadsUnmapped None \
    --chimSegmentMin 12 \
    --chimJunctionOverhangMin 12 \
    --alignSJDBoverhangMin 10 \
    --alignMatesGapMax 100000 \
    --alignIntronMax 100000 \
    --chimSegmentReadGapMax 3 \
    --alignSJstitchMismatchNmax 5 -1 5 5 \
    --runThreadN ${THREAD_COUNT} \
    --outSAMstrandField intronMotif \
    --chimOutJunctionFormat 1
2. Run STAR-Fusion
STAR-Fusion --genome_lib_dir CTAT_resource_lib \
    -J Chimeric.out.junction \
    --output_dir star_fusion_outdir
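In Kickstart mode the key link between the two steps is the Chimeric.out.junction file that STAR writes and STAR-Fusion reads via -J. The sketch below builds both commands with the STAR options listed above and keeps that path consistent. One assumption goes beyond the article: it adds STAR's --outFileNamePrefix so each sample writes into its own directory (the directory name and the default thread count are hypothetical choices).

```python
import os
from typing import List, Tuple

# STAR options copied from the Kickstart example above.
STAR_CHIMERIC_OPTS = [
    "--twopassMode", "Basic",
    "--outReadsUnmapped", "None",
    "--chimSegmentMin", "12",
    "--chimJunctionOverhangMin", "12",
    "--alignSJDBoverhangMin", "10",
    "--alignMatesGapMax", "100000",
    "--alignIntronMax", "100000",
    "--chimSegmentReadGapMax", "3",
    "--alignSJstitchMismatchNmax", "5", "-1", "5", "5",
    "--outSAMstrandField", "intronMotif",
    "--chimOutJunctionFormat", "1",
]


def kickstart_cmds(star_index_dir: str, left_fq: str, right_fq: str,
                   genome_lib_dir: str, threads: int = 8,
                   star_outdir: str = "star_out",
                   output_dir: str = "star_fusion_outdir") -> Tuple[List[str], List[str]]:
    """Return (STAR command, STAR-Fusion command) for Kickstart mode."""
    prefix = os.path.join(star_outdir, "")  # trailing separator: files go inside the dir
    star = (["STAR", "--genomeDir", star_index_dir,
             "--readFilesIn", left_fq, right_fq,
             "--outFileNamePrefix", prefix]  # assumption: not in the article
            + STAR_CHIMERIC_OPTS
            + ["--runThreadN", str(threads)])
    junction = os.path.join(star_outdir, "Chimeric.out.junction")
    fusion = ["STAR-Fusion", "--genome_lib_dir", genome_lib_dir,
              "-J", junction, "--output_dir", output_dir]
    return star, fusion
```

To run the pair, create star_outdir first (`os.makedirs(star_outdir, exist_ok=True)`), then call `subprocess.run(cmd, check=True)` on the STAR command followed by the STAR-Fusion command, so a failed alignment stops the pipeline before fusion calling.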
STAR-Fusion writes its predictions, inside the output directory, to a file named
star-fusion.fusion_predictions.tsv
The file has many columns; part of it is shown in the screenshot below.
Of these, JunctionReadCount and SpanningFragCount are the numbers of junction reads and spanning fragments described in the previous article: the more supporting reads, the more likely the fusion is real. SpliceType indicates whether the breakpoint lies at an exon boundary. For a more detailed explanation of the results, please refer to the official documentation.
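The rule of thumb above (more supporting reads means a more credible fusion) can be turned into a simple filter. A minimal sketch using only the standard library, assuming the header line begins with "#FusionName" and that SpliceType uses the value ONLY_REF_SPLICE for breakpoints at annotated exon boundaries, as in the STAR-Fusion versions I have seen; check both against your own file. The support threshold of 3 is a hypothetical default, not an official cutoff.

```python
import csv
from typing import Dict, Iterable, List


def read_predictions(lines: Iterable[str]) -> List[Dict[str, str]]:
    """Parse star-fusion.fusion_predictions.tsv into one dict per fusion."""
    reader = csv.reader(lines, delimiter="\t")
    header = next(reader)
    header[0] = header[0].lstrip("#")  # "#FusionName" -> "FusionName"
    return [dict(zip(header, row)) for row in reader if row]


def filter_fusions(rows: List[Dict[str, str]], min_support: int = 3,
                   require_ref_splice: bool = False) -> List[Dict[str, str]]:
    """Keep fusions with enough junction reads plus spanning fragments."""
    kept = []
    for row in rows:
        support = int(row["JunctionReadCount"]) + int(row["SpanningFragCount"])
        if support < min_support:
            continue
        if require_ref_splice and row["SpliceType"] != "ONLY_REF_SPLICE":
            continue  # breakpoint not at an annotated exon boundary
        kept.append(row)
    return kept
```

Usage: `with open("star_fusion_outdir/star-fusion.fusion_predictions.tsv") as fh: hits = filter_fusions(read_predictions(fh))`. Raising min_support or setting require_ref_splice=True trades sensitivity for fewer false positives.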
After reading the above, do you have a better understanding of how to detect fusion genes with STAR-Fusion? If you want to learn more, please follow the industry information channel. Thank you for your support.
© 2024 shulou.com SLNews company. All rights reserved.