2025-01-19 Update From: SLTechnology News&Howtos
Shulou (Shulou.com) 06/01 Report
This article explains how to use the CPAT software. I hope you find it useful.
As high-throughput sequencing has been applied to lncRNA research, more and more lncRNAs have been discovered. For transcriptome sequencing data, the first step after assembling the transcripts is to distinguish protein-coding RNA from non-coding RNA.
There are currently many solutions to this problem, and they fall into the following two categories.
Alignment-based
Alignment-free
The first category is based on sequence alignment and is better at identifying conserved protein-coding genes; it includes software such as CPC and PhyloCSF. The second category does not need alignment and instead separates the two classes by the sequence features of coding and noncoding transcripts; it includes CNCI, CPAT, PLEK and others.
lncRNAs are poorly conserved across species, and some lncRNAs overlap protein-coding genes in their chromosomal location, so sequence alignment can easily misclassify them. In addition, alignment-based software runs relatively slowly, so overall the alignment-free tools perform better.
This article focuses on how to use CPAT. The website is as follows.
http://lilab.research.bcm.edu/cpat/
Deciding whether a transcript is coding or noncoding is essentially a binary classification problem, so the developers of CPAT chose to solve it with logistic regression. The software builds a logistic regression model on the following four features to distinguish coding from noncoding transcripts.
Open reading frame size
Open reading frame coverage
Fickett TESTCODE statistic
Hexamer usage bias
The first two features describe the open reading frame (ORF): the first is the size of the ORF, and the second is the proportion of the total transcript length that the ORF covers. The third feature is defined from the base composition and codon distribution of the sequence, and the fourth from the frequency of hexamers in the sequence.
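The first two features can be illustrated with a short Python sketch. This is not CPAT's own code: the function names `longest_orf` and `orf_features` are hypothetical, and the sketch assumes an ORF runs from an ATG to the first in-frame stop codon on the forward strand, with the stop codon counted in the ORF size. CPAT's exact conventions may differ.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_orf(seq):
    """Return (start, end) of the longest ATG-to-stop ORF on the forward strand.

    Assumption for illustration: the stop codon is included in the ORF,
    and (0, 0) is returned when no complete ORF is found.
    """
    seq = seq.upper().replace("U", "T")
    best = (0, 0)
    for frame in range(3):
        start = None
        # Walk the sequence codon by codon in this reading frame.
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) > (best[1] - best[0]):
                    best = (start, i + 3)
                start = None
    return best


def orf_features(seq):
    """Return (ORF size in nt, ORF coverage as a fraction of transcript length)."""
    start, end = longest_orf(seq)
    size = end - start
    coverage = size / len(seq) if seq else 0.0
    return size, coverage


size, coverage = orf_features("CCATGAAATTTTAGGG")
print(size, coverage)  # prints: 12 0.75
```

In the toy sequence, the ORF `ATGAAATTTTAG` is 12 nt long in a 16 nt transcript, so the coverage is 0.75. A long ORF that covers most of the transcript is what one would expect from a protein-coding transcript.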
The CPAT paper first evaluates how these four features are distributed in coding and noncoding transcripts.
Coding and noncoding transcripts form two distinct peaks for each feature, which shows that all four features differ between the two classes.
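To make the logistic regression idea concrete, here is a minimal sketch of how four feature values could be combined into a coding probability. The weights, the intercept and the feature values below are made up for illustration only; they are not the coefficients of CPAT's trained model, and the name `coding_probability` is hypothetical.

```python
import math

FEATURES = ("orf_size", "orf_coverage", "fickett_score", "hexamer_score")


def coding_probability(features, weights, intercept):
    """Logistic model: p = 1 / (1 + exp(-(intercept + sum(w_i * x_i))))."""
    z = intercept + sum(weights[name] * features[name] for name in FEATURES)
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical coefficients, chosen only to show the mechanics.
toy_weights = {
    "orf_size": 0.005,
    "orf_coverage": 2.0,
    "fickett_score": 1.5,
    "hexamer_score": 3.0,
}
toy_intercept = -6.0

# Hypothetical feature values for two transcripts.
long_orf_tx = {"orf_size": 1200, "orf_coverage": 0.7,
               "fickett_score": 1.1, "hexamer_score": 0.4}
short_orf_tx = {"orf_size": 90, "orf_coverage": 0.05,
                "fickett_score": 0.6, "hexamer_score": -0.3}

p_long = coding_probability(long_orf_tx, toy_weights, toy_intercept)
p_short = coding_probability(short_orf_tx, toy_weights, toy_intercept)
print(f"{p_long:.3f} {p_short:.3f}")
```

With these toy numbers, the transcript with the long ORF gets a probability close to 1 and the one with the short ORF a probability close to 0. A cutoff on this probability then turns the score into a coding or noncoding call.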
The paper also evaluates the performance of different software using ROC curves. In that comparison, CPAT and CPC perform best. CPAT is written in Python and is very easy to install. The command is as follows
pip install CPAT
The software can be run locally as well as online.
1. Online version
The URL of the online version is as follows
http://lilab.research.bcm.edu/cpat/
You can paste sequences in FASTA format directly or upload a file in BED format. For BED input, you also need to specify the corresponding genome version.
2. Local version
The local version can also be used in two ways. With a BED file as input, the usage is as follows
cpat.py -r /database/hg19.fa \
    -g mRNA_hg19.bed \
    -d dat/Human_logitModel.RData \
    -x dat/Human_Hexamer.tsv \
    -o output.txt
With a FASTA file as input, the usage is as follows
cpat.py -g transcript.fa \
    -d dat/Human_logitModel.RData \
    -x dat/Human_Hexamer.tsv \
    -o output.txt
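The two invocations differ only in whether a reference genome is passed with -r. If you call CPAT from a Python pipeline, a small helper can assemble either form. The sketch below is an assumption-laden illustration: `build_cpat_command` is a hypothetical helper, it uses only the flags that appear in the commands in this article, and it only builds the argument list without running anything.

```python
import shlex


def build_cpat_command(gene_file, logit_model, hexamer_table, out_file,
                       ref_genome=None):
    """Build the cpat.py argument list (hypothetical helper, not part of CPAT).

    ref_genome is only needed when gene_file is a BED file; for FASTA
    input it is left out.
    """
    cmd = ["cpat.py"]
    if ref_genome is not None:
        cmd += ["-r", ref_genome]
    cmd += ["-g", gene_file,
            "-d", logit_model,
            "-x", hexamer_table,
            "-o", out_file]
    return cmd


fasta_cmd = build_cpat_command(
    "transcript.fa",
    "dat/Human_logitModel.RData",
    "dat/Human_Hexamer.tsv",
    "output.txt",
)
print(shlex.join(fasta_cmd))

# To actually run it (requires CPAT to be installed and the files to exist):
#   import subprocess
#   subprocess.run(fasta_cmd, check=True)
```

Passing the command as a list rather than a single string avoids shell quoting problems when file paths contain spaces.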
The files passed to the -d and -x parameters are the prebuilt logistic regression model and hexamer table that ship with the software; they are located in its installation directory.
In the output, the last column shows the protein-coding call for each transcript: yes means the transcript is protein-coding, and no means it is noncoding.
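Once you have the result table, you will usually want to count how many transcripts fall into each class. The following sketch assumes a tab-separated file with one header row and the yes/no call in the last column, as described above. The column names and the sample rows are invented for illustration and are not real CPAT output; check the header of your own output file before relying on this layout.

```python
import csv
import io
from collections import Counter


def count_coding_labels(handle):
    """Count the values in the last column of a tab-separated table.

    Assumes the first row is a header and the last column holds the
    yes/no coding call.
    """
    reader = csv.reader(handle, delimiter="\t")
    next(reader, None)  # skip the header row
    counts = Counter()
    for row in reader:
        if row:  # ignore blank lines
            counts[row[-1].strip().lower()] += 1
    return counts


# Hypothetical sample data, for illustration only.
SAMPLE = (
    "transcript_id\tcoding_prob\tcoding_label\n"
    "tx1\t0.98\tyes\n"
    "tx2\t0.02\tno\n"
    "tx3\t0.91\tyes\n"
)

counts = count_coding_labels(io.StringIO(SAMPLE))
print(counts["yes"], counts["no"])  # prints: 2 1
```

For a real run, replace the `io.StringIO(SAMPLE)` handle with `open("output.txt", newline="")`.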