How to use OncodriveCLUST to identify driver genes


2025-04-26 Update From: SLTechnology News&Howtos

Shulou(Shulou.com)06/02 Report--

This article shows how to use OncodriveCLUST to identify driver genes.

Welcome to "Shengxin training Manual"!

OncodriveCLUST is a tool for identifying driver genes. It focuses on gain-of-function mutations, which tend to cluster in specific regions of a protein. Such clustering may be a signal of positive selection: the mutations give tumor cells a growth advantage and are retained during clonal evolution. By analyzing these mutation clusters, potential driver genes can be predicted.

The corresponding paper was published in Bioinformatics and is available at the following link:

http://bioinformatics.oxfordjournals.org/content/29/18/2238.full

The software analyzes mutations gene by gene. The analysis consists of five main steps, described below.

The first step counts how often gain-of-function mutations occur at each position of the protein. In panel I of the figure, the x-axis is the position along the protein and the y-axis is the mutation frequency at that position. The second step screens for non-random mutation positions using the cumulative binomial distribution. In panel II, the dotted line marks the threshold; positions above it are non-random mutation positions and therefore of potential biological significance.
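Steps one and two can be sketched in a few lines of Python. This is only an illustration of the idea, not the tool's own code: the uniform null model (each mutation lands on any position with probability 1/protein_length), the cut-off `alpha=0.01`, and the function names are all assumptions made for this example.

```python
from collections import Counter
from math import comb


def binomial_sf(k, n, p):
    """Return P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def nonrandom_positions(positions, protein_length, alpha=0.01):
    """Return positions whose mutation count is unlikely under a uniform null.

    positions: protein positions of a gene's mutations, one entry per mutation.
    alpha: significance cut-off (an assumed value, not taken from the article).
    """
    counts = Counter(positions)   # step 1: mutation frequency per position
    n = len(positions)            # total number of mutations in the gene
    p = 1.0 / protein_length      # assumed chance of hitting one given position
    # step 2: keep positions whose count would be rare under the binomial model
    return sorted(pos for pos, k in counts.items() if binomial_sf(k, n, p) < alpha)


# Made-up example: two recurrent positions plus four isolated mutations.
mutations = [12] * 6 + [14] * 4 + [50, 120, 233, 301]
print(nonrandom_positions(mutations, protein_length=400))  # [12, 14]
```

In this toy input the isolated mutations fall below the threshold, while positions 12 and 14 are kept as non-random.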

The third step clusters these non-random positions. Each cluster corresponds to a gray area in panel III, and within a cluster the distance between two adjacent positions is less than 5 amino acids. The fourth step extends each original cluster to include mutated positions in the adjacent region; this corresponds to the gray areas in panel IV, which are wider than those in panel III. The fifth step scores each gene using the clusters found on it.
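The grouping in step three can be illustrated with a short Python sketch. It follows the wording above (adjacent positions less than 5 amino acids apart share a cluster); the function name, the `max_distance` parameter, and the strict `<` comparison are assumptions for illustration. The extension and scoring rules of steps four and five are not spelled out here, so they are left out.

```python
def cluster_positions(positions, max_distance=5):
    """Group positions so that neighbours in a cluster are < max_distance apart."""
    clusters = []
    for pos in sorted(set(positions)):
        # Extend the current cluster if this position is close to its last member,
        # otherwise start a new cluster.
        if clusters and pos - clusters[-1][-1] < max_distance:
            clusters[-1].append(pos)
        else:
            clusters.append([pos])
    return clusters


# Made-up non-random positions on one protein.
print(cluster_positions([12, 14, 17, 40, 43, 90]))
# [[12, 14, 17], [40, 43], [90]]
```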

The same scoring procedure is applied to the synonymous mutations of each gene to build a background model. Each gene's score from non-synonymous mutations is then compared with this background, and genes whose scores differ from the background model are selected as driver genes.
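One simple way to picture the comparison with the background is a z-score. The sketch below assumes the background scores are roughly normally distributed and uses a one-sided test; both choices, and the function name, are assumptions for illustration rather than a description of how OncodriveCLUST computes its statistics.

```python
from math import erfc, sqrt
from statistics import mean, stdev


def background_zscore(gene_score, background_scores):
    """Compare one gene score with scores obtained from synonymous mutations.

    Returns (z, p_value), where p_value is the one-sided upper-tail probability
    of a standard normal distribution (a normal approximation is assumed).
    """
    mu = mean(background_scores)
    sigma = stdev(background_scores)  # sample standard deviation
    z = (gene_score - mu) / sigma
    p_value = 0.5 * erfc(z / sqrt(2))
    return z, p_value


# Made-up numbers: a gene scoring well above the background.
z, p = background_zscore(4.0, [1.0, 2.0, 3.0])
print(round(z, 2), round(p, 4))
```

A gene whose score sits far in the upper tail of the background gets a small p-value and becomes a driver candidate.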

The software is written in Python 3. The installation process is as follows:

yum install -y epel-release
yum install -y gcc
yum install -y gcc-c++
yum install -y python34
yum install -y python34-devel
yum install -y python34-pip
pip3 install oncodriveclust

The official website provides a test data set, which can be downloaded as follows

curl -o oncodriveclust.tar.gz https://bitbucket.org/bbglab/oncodriveclust/get/0.3.tar.gz
tar xzvf oncodriveclust.tar.gz

The basic usage is as follows

oncodriveclust \
    -m 3 \
    --cgc data/CGC_phenotype.tsv \
    examples/tcga.BRCA.nonsyn.txt \
    examples/tcga.BRCA.syn.txt \
    data/gene_transcripts.tsv

At least three input files are required. The first two are txt files listing the non-synonymous mutations and the synonymous mutations.

In these files the most important columns are the first and the last: the first column gives the gene, and the last column gives the protein position of the mutation. The third input file is gene_transcripts.tsv.
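For the two mutation files, the first-column/last-column layout can be read with a few lines of Python. This is a sketch under assumptions: the file is taken to be tab-separated with one header line, and the sample rows, the middle column, and the gene names are made up for illustration.

```python
import csv
import io
from collections import defaultdict


def positions_by_gene(handle, has_header=True):
    """Collect mutation positions per gene from a tab-separated mutation file.

    Only the first column (gene) and the last column (protein position) are used.
    """
    reader = csv.reader(handle, delimiter="\t")
    if has_header:
        next(reader, None)  # skip the header line (assumed to be present)
    result = defaultdict(list)
    for row in reader:
        if not row:
            continue  # ignore blank lines
        result[row[0]].append(int(row[-1]))
    return dict(result)


# Hypothetical content; a real file would be opened with open(path, newline="").
sample = io.StringIO(
    "symbol\tsample\tposition\n"
    "GENE_A\tS1\t12\n"
    "GENE_A\tS2\t14\n"
    "GENE_B\tS1\t230\n"
)
print(positions_by_gene(sample))
```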

The --cgc parameter specifies a file with the gene annotations from the CGC (Cancer Gene Census) database. It is optional.

When this file is specified, the annotation of each gene is included in the second column of the output. After a successful run, the default output file is oncodriveclust-results.tsv.

Filter the results by p-value and q-value to select the significant driver genes.
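Such a filter could look like the Python sketch below. The column names `GENE`, `PVALUE` and `QVALUE`, the 0.05 thresholds, and the sample rows are assumptions for illustration; check the header of your own oncodriveclust-results.tsv and adjust the arguments accordingly.

```python
import csv
import io


def significant_genes(handle, gene_col="GENE", p_col="PVALUE", q_col="QVALUE",
                      p_max=0.05, q_max=0.05):
    """Return (gene, p, q) tuples passing both thresholds, sorted by q-value."""
    hits = []
    for row in csv.DictReader(handle, delimiter="\t"):
        try:
            p, q = float(row[p_col]), float(row[q_col])
        except (TypeError, ValueError):
            continue  # skip rows without numeric p- or q-values
        if p <= p_max and q <= q_max:
            hits.append((row[gene_col], p, q))
    return sorted(hits, key=lambda item: item[2])


# Hypothetical results; a real run would use
# open("oncodriveclust-results.tsv", newline="") instead.
results = io.StringIO(
    "GENE\tPVALUE\tQVALUE\n"
    "GENE_A\t0.0001\t0.002\n"
    "GENE_B\t0.04\t0.2\n"
    "GENE_C\tNA\tNA\n"
)
print(significant_genes(results))  # [('GENE_A', 0.0001, 0.002)]
```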

That is how to use OncodriveCLUST to identify driver genes.
