How to use ballgown for transcript-level differential expression analysis

Updated 2025-04-05, from SLTechnology News&Howtos (Shulou.com)


This article explains how to use ballgown for differential expression analysis at the transcript level.

There are two common strategies for transcriptome differential expression analysis: methods based on raw counts, such as DESeq2 and edgeR, and methods based on FPKM/RPKM values, such as cuffdiff.

As mentioned in previous articles, the FPKM-based pipeline has been upgraded from TopHat + Cufflinks + Cuffdiff to HISAT + StringTie + ballgown. The ballgown R package also performs differential expression analysis on FPKM values. There are two ways to obtain FPKM values at the transcript level.

1. StringTie

To simplify the downstream ballgown analysis, StringTie can write the ballgown input files directly when the -b option is added. The basic usage is as follows:

stringtie -p 10 \
  -G hg19.gtf \
  -o output.gtf \
  -b ballgown_out_dir \
  -e \
  align.sorted.bam

2. Tablemaker

Tablemaker can also generate input files for ballgown; it does so by calling Cufflinks. It can be downloaded from the following link:

https://figshare.com/articles/Tablemaker_Linux_Binary/1053137

The basic usage is as follows:

tablemaker -p 4 \
  -q -W \
  -G hg19.gtf \
  -o out_dir \
  align.sorted.bam

For each sample, a folder is generated containing the following five files:

e_data.ctab
e2t.ctab
i2t.ctab
i_data.ctab
t_data.ctab

Here e stands for exon, i for intron, and t for transcript. The _data files hold the expression measurements at each of these levels. i2t.ctab records the correspondence between introns and transcripts, and e2t.ctab records the correspondence between exons and transcripts.
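Before loading the folders into R, it can help to confirm that every sample folder really contains all five files. The following is a minimal sketch in base R; `missing_ctab` is a hypothetical helper written for this illustration, not a ballgown function, and the directory names in the usage comment are placeholders.

```r
# Hypothetical helper (not part of ballgown): list which of the five
# expected .ctab files are missing from each sample directory.
expected_ctab <- c("e_data.ctab", "e2t.ctab", "i_data.ctab",
                   "i2t.ctab", "t_data.ctab")

missing_ctab <- function(sample_dirs) {
  # For each directory, keep the expected file names that do not exist in it
  result <- lapply(sample_dirs, function(d) {
    expected_ctab[!file.exists(file.path(d, expected_ctab))]
  })
  names(result) <- basename(sample_dirs)
  result
}

# Usage with placeholder directory names:
# missing_ctab(c("sampleA.dir", "sampleB.dir"))
```

An empty character vector for a sample means that folder is complete; any names listed are files that StringTie or Tablemaker did not write for it.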

Once the input files are ready, you can run the differential analysis. Modern R packages are highly encapsulated, so a few function calls cover the whole analysis. The first step is to read the input files for all samples, as follows:

library(ballgown)
bg = ballgown(samples = c("sampleA.dir", "sampleB.dir"), meas = 'all')

The samples argument lists the ballgown input folders of all samples. After a successful import, you can inspect expression at different levels in R through the *expr functions, where * is one of i, e, t, or g, standing for intron, exon, transcript, and gene respectively.

An example that retrieves expression at the transcript level:

transcript_fpkm = texpr(bg, 'FPKM')

Note that intron, exon, and transcript expression levels are all available directly in the ctab files, whereas gene-level expression has to be computed from the expression of each gene's transcripts, so it takes longer to retrieve.
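The accessors for the four levels can be tried side by side. The sketch below assumes the small example dataset that ships with the ballgown package (located with `system.file`) rather than real StringTie or Tablemaker output, and it loads the folders through `dataDir` and `samplePattern` instead of listing them in `samples`.

```r
library(ballgown)

# Assumption: use the example dataset bundled with the ballgown package
# in place of real StringTie/Tablemaker output folders.
data_directory <- system.file("extdata", package = "ballgown")

# dataDir + samplePattern loads every sub-folder whose name matches the pattern
bg <- ballgown(dataDir = data_directory, samplePattern = "sample", meas = "all")

# Transcript-, exon- and intron-level values come straight from the ctab files
transcript_fpkm <- texpr(bg, "FPKM")    # one row per transcript, one column per sample
exon_rcount     <- eexpr(bg, "rcount")  # read counts per exon
intron_rcount   <- iexpr(bg, "rcount")  # read counts per intron

# Gene-level FPKM is derived from the transcripts belonging to each gene
gene_fpkm <- gexpr(bg)

dim(transcript_fpkm)
dim(gene_fpkm)
```

Each accessor returns a matrix with one column per sample, so the results can be compared or filtered with ordinary matrix operations.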

After reading the data, you need to set the sample grouping as follows:

pData(bg)
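One way the grouping step might continue is sketched below. It assumes the example dataset bundled with the ballgown package and an arbitrary two-group split invented for illustration; the column names `id` and `group` are also assumptions, not values from this article. In practice each group needs replicate samples, so the two-folder example above is too small to test as it stands.

```r
library(ballgown)

# Assumption: the example dataset bundled with the ballgown package
data_directory <- system.file("extdata", package = "ballgown")
bg <- ballgown(dataDir = data_directory, samplePattern = "sample", meas = "all")

# Hypothetical grouping: first half of the samples in group 1, second half in group 0.
# The rows must follow the same order as sampleNames(bg).
n <- length(sampleNames(bg))
pData(bg) <- data.frame(
  id    = sampleNames(bg),
  group = rep(c(1, 0), each = n / 2)
)

# Test every transcript for a difference in FPKM between the two groups
results <- stattest(bg, feature = "transcript", meas = "FPKM",
                    covariate = "group", getFC = TRUE)

# Transcripts ordered by p-value
head(results[order(results$pval), ])
```

The returned data frame has one row per transcript, with the fold change in `fc`, the p-value in `pval`, and the multiple-testing-adjusted value in `qval`.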
