How to use MISO to analyze alternative splicing

2025-04-06 Update From: SLTechnology News&Howtos

Shulou(Shulou.com)06/02 Report--

Many newcomers are unsure how to use MISO for alternative splicing analysis. This article walks through the workflow in detail.

MISO (Mixture of Isoforms) is a classic alternative splicing analysis tool that, like rMATS, supports both quantification and differential analysis of alternative splicing events.

MISO supports alternative splicing analysis at both the exon (event) level and the transcript (isoform) level. As mentioned in the article on rMATS, rMATS reports alternative splicing results at the exon level. Because second-generation sequencing reads are short, full-length transcripts cannot be reliably reconstructed. Exon-level results are therefore more accurate, and positive results are easier to validate by RT-PCR; however, they cannot resolve in detail how the different isoforms of a gene change. Transcript-level analysis directly quantifies and compares the different isoforms, which makes it possible to explore isoform-level changes, but its results are less accurate.
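At the exon level, the quantity of interest for an event such as exon skipping is PSI (Ψ, "percent spliced in"): the fraction of the gene's transcripts that include the exon. MISO estimates Ψ with a Bayesian model; the sketch below is not MISO's method but a naive junction-count estimate for intuition, assuming only junction-spanning reads are counted. It halves the inclusion count because inclusion is supported by two junctions while exclusion is supported by one. The name `junction_psi` is hypothetical.

```python
def junction_psi(inclusion_reads: int, exclusion_reads: int) -> float:
    """Naive PSI for a skipped-exon event from junction read counts.

    Assumption: inclusion_reads counts reads on the two inclusion junctions
    (upstream->skipped and skipped->downstream); exclusion_reads counts reads
    on the single skipping junction (upstream->downstream).
    """
    # Put both isoforms on a per-junction scale before comparing them.
    inclusion = inclusion_reads / 2
    exclusion = exclusion_reads
    total = inclusion + exclusion
    if total == 0:
        raise ValueError("no junction reads; PSI is undefined")
    return inclusion / total


# 20 inclusion-junction reads vs 10 skipping-junction reads
print(junction_psi(20, 10))
```

With few reads such a point estimate is unstable, which is one reason a probabilistic estimate that comes with a confidence interval, as MISO reports, is preferable.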

MISO is a Python package and can be installed directly with pip. The analysis pipeline is as follows.

1. Index the GFF annotation of the reference genome

For transcript-level analysis, you only need a GFF file of transcripts: download the GTF file of the reference genome from Ensembl or another database and convert it to GFF3 yourself. For exon-level analysis, you need a GFF file of known alternative splicing events, as shown below:

chr1	SE	gene	4772649	4775821	.	-	.	ID=chr1:4775654:4775821:-@chr1:4774032:4774186:-@chr1:4772649:4772814:-;Name=chr1:4775654:4775821:-@chr1:4774032:4774186:-@chr1:4772649:4772814:-
chr1	SE	mRNA	4772649	4775821	.	-	.	ID=chr1:4775654:4775821:-@chr1:4774032:4774186:-@chr1:4772649:4772814:-.A;Parent=chr1:4775654:4775821:-@chr1:4774032:4774186:-@chr1:4772649:4772814:-
chr1	SE	mRNA	4772649	4775821	.	-	.	ID=chr1:4775654:4775821:-@chr1:4774032:4774186:-@chr1:4772649:4772814:-.B;Parent=chr1:4775654:4775821:-@chr1:4774032:4774186:-@chr1:4772649:4772814:-
chr1	SE	exon	4775654	4775821	.	-	.	ID=chr1:4775654:4775821:-@chr1:4774032:4774186:-@chr1:4772649:4772814:-.A.up;Parent=chr1:4775654:4775821:-@chr1:4774032:4774186:-@chr1:4772649:4772814:-.A
chr1	SE	exon	4774032	4774186	.	-	.	ID=chr1:4775654:4775821:-@chr1:4774032:4774186:-@chr1:4772649:4772814:-.A.se;Parent=chr1:4775654:4775821:-@chr1:4774032:4774186:-@chr1:4772649:4772814:-.A
chr1	SE	exon	4772649	4772814	.	-	.	ID=chr1:4775654:4775821:-@chr1:4774032:4774186:-@chr1:4772649:4772814:-.A.dn;Parent=chr1:4775654:4775821:-@chr1:4774032:4774186:-@chr1:4772649:4772814:-.A
chr1	SE	exon	4775654	4775821	.	-	.	ID=chr1:4775654:4775821:-@chr1:4774032:4774186:-@chr1:4772649:4772814:-.B.up;Parent=chr1:4775654:4775821:-@chr1:4774032:4774186:-@chr1:4772649:4772814:-.B
chr1	SE	exon	4772649	4772814	.	-	.	ID=chr1:4775654:4775821:-@chr1:4774032:4774186:-@chr1:4772649:4772814:-.B.dn;Parent=chr1:4775654:4775821:-@chr1:4774032:4774186:-@chr1:4772649:4772814:-.B

The second column gives the type of alternative splicing event; this example is SE (skipped exon, i.e. exon skipping). The ID has the following format:

chr1:4775654:4775821:-@chr1:4774032:4774186:-@chr1:4772649:4772814:-

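It lists three exons separated by the @ symbol: the first is the upstream exon, the middle one is the skipped exon, and the third is the downstream exon (on the minus strand, the upstream exon therefore has the largest coordinates). They correspond to the three exons in the following schematic diagram.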
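Because the ID encodes the event structure, it is easy to parse. The sketch below, assuming each exon is written as chrom:start:end:strand exactly as in the example ID, splits an SE event ID into its upstream, skipped and downstream exons. `parse_se_event` and `Exon` are hypothetical names, not part of MISO.

```python
from typing import NamedTuple, Tuple


class Exon(NamedTuple):
    chrom: str
    start: int
    end: int
    strand: str


def parse_se_event(event_id: str) -> Tuple[Exon, Exon, Exon]:
    """Split an SE event ID into its (upstream, skipped, downstream) exons."""
    parts = event_id.split("@")
    if len(parts) != 3:
        raise ValueError(f"expected 3 exons separated by '@', got {len(parts)}")
    exons = []
    for part in parts:
        # rsplit keeps any ':' inside the chromosome name intact
        chrom, start, end, strand = part.rsplit(":", 3)
        exons.append(Exon(chrom, int(start), int(end), strand))
    upstream, skipped, downstream = exons
    return upstream, skipped, downstream


event = "chr1:4775654:4775821:-@chr1:4774032:4774186:-@chr1:4772649:4772814:-"
up, se, dn = parse_se_event(event)
print(se, se.end - se.start + 1)  # skipped exon and its length in bp (1-based, inclusive)
```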

Transcript-level GFF files can be obtained from public databases, whereas exon-level GFF files require first identifying the alternative splicing isoforms and then collating them into this format. For common species such as human and mouse, the official MISO website provides ready-made exon-level GFF files at the following link:

https://miso.readthedocs.io/en/fastmiso/annotation.html
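For species without ready-made annotations, the collation step means writing records like the example above for every event. A minimal sketch, assuming the event ID has already been determined and reusing the same layout as the example (one gene, isoform A with the exon included, isoform B with it skipped), is the hypothetical `se_event_gff` below; it does not discover events itself.

```python
from typing import List


def se_event_gff(event_id: str, source: str = "SE") -> List[str]:
    """Build exon-level GFF3 records for one skipped-exon event ID."""
    exons = []
    for part in event_id.split("@"):
        chrom, start, end, strand = part.rsplit(":", 3)
        exons.append((chrom, int(start), int(end), strand))
    if len(exons) != 3:
        raise ValueError("an SE event ID must contain exactly three exons")
    chrom, strand = exons[0][0], exons[0][3]
    gene_start = min(e[1] for e in exons)
    gene_end = max(e[2] for e in exons)

    def row(ftype: str, start: int, end: int, attrs: str) -> str:
        # GFF3 columns: seqid, source, type, start, end, score, strand, phase, attributes
        return "\t".join([chrom, source, ftype, str(start), str(end), ".", strand, ".", attrs])

    up, se, dn = exons
    records = [row("gene", gene_start, gene_end, f"ID={event_id};Name={event_id}")]
    for iso in ("A", "B"):
        records.append(row("mRNA", gene_start, gene_end, f"ID={event_id}.{iso};Parent={event_id}"))
    # Isoform A includes the middle exon; isoform B skips it.
    layout = (("A", (("up", up), ("se", se), ("dn", dn))),
              ("B", (("up", up), ("dn", dn))))
    for iso, members in layout:
        for label, (_, start, end, _) in members:
            records.append(row("exon", start, end,
                               f"ID={event_id}.{iso}.{label};Parent={event_id}.{iso}"))
    return records


eid = "chr1:4775654:4775821:-@chr1:4774032:4774186:-@chr1:4772649:4772814:-"
print("\n".join(se_event_gff(eid)))
```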

Once the GFF file is ready, build the index with the following command:

index_gff --index ensGene.gff3 index_db

Here index_db is the directory in which the index is saved.
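For transcript-level analysis, step 1 leaves the GTF-to-GFF3 conversion to you. As a minimal sketch of what that conversion involves, the hypothetical `gtf_to_gff3` below keeps only `exon` records, derives gene and mRNA spans from their exons, and links the three levels with GFF3 `ID`/`Parent` attributes. It assumes Ensembl-style attributes with `gene_id` and `transcript_id` on every exon line, and that a gene/mRNA/exon hierarchy is all the index needs; treat it as illustration rather than a replacement for a dedicated converter.

```python
import re
from typing import Dict, Iterable, List

# Matches key "value" pairs in a GTF attribute column
ATTR_RE = re.compile(r'(\S+) "([^"]*)"')


def gtf_to_gff3(gtf_lines: Iterable[str]) -> List[str]:
    """Convert GTF exon lines to gene/mRNA/exon GFF3 lines (minimal sketch)."""
    genes: Dict[str, dict] = {}
    transcripts: Dict[str, dict] = {}
    exons = []
    for line in gtf_lines:
        fields = line.rstrip("\n").split("\t")
        if line.startswith("#") or len(fields) != 9 or fields[2] != "exon":
            continue
        chrom, source, _, start, end, _, strand, _, attrs = fields
        attr = dict(ATTR_RE.findall(attrs))
        gene_id, tx_id = attr["gene_id"], attr["transcript_id"]
        start, end = int(start), int(end)
        # Grow the gene and transcript spans to cover every exon seen so far
        for store, key, parent in ((genes, gene_id, None), (transcripts, tx_id, gene_id)):
            rec = store.setdefault(key, {"chrom": chrom, "source": source, "start": start,
                                         "end": end, "strand": strand, "parent": parent})
            rec["start"] = min(rec["start"], start)
            rec["end"] = max(rec["end"], end)
        exons.append((tx_id, chrom, source, start, end, strand))

    def row(chrom, source, ftype, start, end, strand, attributes):
        return "\t".join([chrom, source, ftype, str(start), str(end), ".", strand, ".", attributes])

    out = ["##gff-version 3"]
    for gid, g in genes.items():
        out.append(row(g["chrom"], g["source"], "gene", g["start"], g["end"], g["strand"],
                       f"ID={gid};Name={gid}"))
    for tid, t in transcripts.items():
        out.append(row(t["chrom"], t["source"], "mRNA", t["start"], t["end"], t["strand"],
                       f"ID={tid};Parent={t['parent']}"))
    exon_number: Dict[str, int] = {}
    for tid, chrom, source, start, end, strand in exons:
        # Exons are numbered in file order within each transcript
        exon_number[tid] = exon_number.get(tid, 0) + 1
        out.append(row(chrom, source, "exon", start, end, strand,
                       f"ID={tid}.exon{exon_number[tid]};Parent={tid}"))
    return out
```

The returned lines would be written to a .gff3 file and passed to index_gff --index as in the command above.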

2. Run MISO

Running MISO requires the index built in step 1 and the BAM file for each sample. The BAM file must be sorted and have a corresponding .bai index. For paired-end data, the usage is as follows:

miso --run index_db/ align.sorted.bam \
    --output-dir out_dir \
    --read-len 150 \
    --paired-end 250 15 \
    --settings-filename miso_settings.txt

--read-len is the read length, --paired-end gives the mean and standard deviation of the insert length, and miso_settings.txt is the configuration file, whose contents are as follows:

[data]
filter_results = True
min_event_reads = 20
strand = fr-unstranded

[sampler]
burn_in = 500
lag = 10
num_iters = 5000
num_processors = 4
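The settings file is a standard INI file, so it can be generated with Python's `configparser`. The sketch below builds the same values shown above; `build_miso_settings` and `retained_samples` are hypothetical helpers. The second one assumes the usual MCMC convention of discarding the first `burn_in` iterations and then keeping every `lag`-th draw, which for these values gives (5000 - 500) / 10 = 450 samples; whether MISO counts its samples exactly this way is an assumption.

```python
import configparser


def build_miso_settings(burn_in=500, lag=10, num_iters=5000, num_processors=4):
    """Build a settings object with the same sections and keys as the example."""
    cfg = configparser.ConfigParser()
    cfg["data"] = {
        "filter_results": "True",
        "min_event_reads": "20",
        "strand": "fr-unstranded",
    }
    cfg["sampler"] = {
        "burn_in": str(burn_in),
        "lag": str(lag),
        "num_iters": str(num_iters),
        "num_processors": str(num_processors),
    }
    return cfg


def retained_samples(cfg):
    """Draws kept after burn-in and thinning (assumed MCMC convention)."""
    sampler = cfg["sampler"]
    kept = sampler.getint("num_iters") - sampler.getint("burn_in")
    return kept // sampler.getint("lag")


if __name__ == "__main__":
    settings = build_miso_settings()
    with open("miso_settings.txt", "w") as fh:
        settings.write(fh)  # writes "key = value" lines under [data] and [sampler]
    print(retained_samples(settings))
```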

There are many parameters in the configuration file, so I won't explain them one by one. Please refer to the official documentation for the meaning of each parameter.
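The two numbers after --paired-end (250 and 15 above) are the mean and standard deviation of the insert length. A rough way to estimate them, sketched below, is to take the template length (TLEN, column 9 of SAM) of properly paired reads and compute their mean and standard deviation. This is a shortcut under assumptions: for RNA-seq, TLEN measured on the genome includes any intron between the mates, so the hypothetical `max_insert` cap only crudely removes such pairs; MISO's documentation describes a more careful estimate based on reads mapped to long constitutive exons.

```python
import statistics
from typing import Iterable, Tuple


def insert_size_stats(sam_lines: Iterable[str], max_insert: int = 1000) -> Tuple[float, float]:
    """Mean and standard deviation of insert size from SAM text lines."""
    sizes = []
    for line in sam_lines:
        if line.startswith("@"):  # skip header lines
            continue
        fields = line.rstrip("\n").split("\t")
        flag, tlen = int(fields[1]), int(fields[8])
        # 0x2 = read mapped in proper pair; requiring TLEN > 0 counts each pair once
        if flag & 0x2 and 0 < tlen <= max_insert:
            sizes.append(tlen)
    if len(sizes) < 2:
        raise ValueError("need at least two properly paired fragments")
    return float(statistics.mean(sizes)), statistics.stdev(sizes)
```

The input lines could come, for example, from the text output of samtools view on a subset of the sorted BAM file.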

The output of the step above can be used directly for the subsequent differential analysis, but it is hard to review by eye, so MISO provides a summary program. Its usage is as follows:

summarize_miso --summarize-samples raw_out/ summary_out/

3. Analysis of differences between samples

The command for the differential analysis between samples is as follows:

compare_miso --compare-samples control case/ comparisons/

This generates a file with the suffix .miso_bf (Bayes factors for each event) in the output directory.
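The .miso_bf file is a tab-separated table, so it can also be inspected directly. The sketch below assumes a header row containing the columns `event_name`, `diff` (the difference in Ψ between the two samples) and `bayes_factor`, with a single diff value per event as for two-isoform events such as SE; check these column names against your own file. `significant_events` is a hypothetical helper: its default thresholds mirror the --delta-psi 0.20 and --bayes-factor 10 values used with filter_events in this article, but it does not apply filter_events' read-count filters.

```python
import csv
import sys
from typing import Iterable, List, Tuple


def significant_events(
    lines: Iterable[str], min_abs_dpsi: float = 0.20, min_bf: float = 10.0
) -> List[Tuple[str, float, float]]:
    """Return (event, delta_psi, bayes_factor) rows passing both thresholds."""
    hits = []
    for row in csv.DictReader(lines, delimiter="\t"):
        diff = float(row["diff"])
        bf = float(row["bayes_factor"])
        if abs(diff) >= min_abs_dpsi and bf >= min_bf:
            hits.append((row["event_name"], diff, bf))
    # Strongest evidence first
    return sorted(hits, key=lambda hit: hit[2], reverse=True)


if __name__ == "__main__":
    with open(sys.argv[1]) as fh:  # path to a .miso_bf file
        for name, dpsi, bf in significant_events(fh):
            print(f"{name}\t{dpsi:+.2f}\t{bf:.1f}")
```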

4. Filter the results

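Filter the comparison output with filter_events. The options set the minimum read support (--num-inc for inclusion reads, --num-exc for exclusion reads, --num-sum-inc-exc for their combined total), the minimum absolute change in PSI (--delta-psi), and the minimum Bayes factor (--bayes-factor). The usage is as follows: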

filter_events --filter case_vs_control.miso_bf \
    --num-inc 1 \
    --num-exc 1 \
    --num-sum-inc-exc 10 \
    --delta-psi 0.20 \
    --bayes-factor 10 \
    --output-dir filter_dir/

5. Visualization

The usage is as follows:

sashimi_plot --plot-event "chr1:7778:7924:-@chr1:7096:7605:-@chr1:6717:6918:-" \
    index_db/ \
    sashimi_plot_settings.txt \
    --output-dir out_dir

sashimi_plot_settings.txt is a configuration file that specifies each sample's BAM file and the corresponding MISO output. An example is as follows:

[data]
# directory where BAM files are
bam_prefix = ./test-data/bam-data/
# directory where MISO output is
miso_prefix = ./test-data/miso-data/

bam_files = [
    "heartWT1.sorted.bam",
    "heartWT2.sorted.bam",
    "heartKOa.sorted.bam",
    "heartKOb.sorted.bam"]

miso_files = [
    "heartWT1",
    "heartWT2",
    "heartKOa",
    "heartKOb"]

[plotting]
# Dimensions of figure to be plotted (in inches)
fig_width = 7
fig_height = 5
# Factor to scale down introns and exons by
intron_scale = 30
exon_scale = 4
# Whether to use a log scale or not when plotting
logged = False
font_size = 6

# Max y-axis
ymax = 150

# Whether to plot posterior distributions inferred by MISO
show_posteriors = True

# Whether to show posterior distributions as bar summaries
bar_posteriors = False

# Whether to plot the number of reads in each junction
number_junctions = True

resolution = .5
posterior_bins = 40
gene_posterior_ratio = 5

# List of colors for read densities of each sample
colors = [
    "#CC0011",
    "#CC0011",
    "#FF8800",
    "#FF8800"]

# Number of mapped reads in each sample
# (Used to normalize the read density for RPKM calculation)
coverages = [
    6830944,
    14039751,
    4449737,
    6720151]

# Bar color for Bayes factor distribution
# plots (--plot-bf-dist)
# Paint them blue
bar_color = "b"

# Bayes factors thresholds to use for --plot-bf-dist
bf_thresholds = [0, 1, 2, 5, 10, 20]

The final result is as follows

This plot is called a sashimi plot, a plot type dedicated to visualizing alternative splicing. The schematic above shows an exon skipping event across different samples. The lower left shows the exon structure from the GFF file. The upper left shows the read density over the exons in each sample, expressed in RPKM; splice junctions are drawn as arcs connecting the exons, and each arc is labeled with the number of reads supporting that junction. Samples from different groups are shown in different colors. The panels on the right show MISO's estimate of the event's inclusion level (PSI) in each sample.

From this plot you can see at a glance whether splicing of the event differs between the two groups of samples. In the figure above, the inclusion level is higher in the heartWT group than in the heartKO group.
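The coverages setting supplies each sample's number of mapped reads so that read density can be expressed in RPKM (reads per kilobase per million mapped reads). As a sketch of the standard formula only (the hypothetical `rpkm` helper is not how sashimi_plot is implemented internally), the normalization is reads × 10^9 / (length in bp × total mapped reads):

```python
def rpkm(reads: float, length_bp: int, total_mapped_reads: int) -> float:
    """Reads per kilobase of feature per million mapped reads."""
    if length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("feature length and library size must be positive")
    return reads * 1e9 / (length_bp * total_mapped_reads)


# The same 100 reads over a 155 bp exon in two libraries from the settings above
wt1 = rpkm(100, 155, 6830944)    # heartWT1
wt2 = rpkm(100, 155, 14039751)   # heartWT2
print(f"{wt1:.1f} vs {wt2:.1f}")
```

Because heartWT2 was sequenced roughly twice as deeply as heartWT1, the same raw count corresponds to roughly half the RPKM, which is why the coverages must be set for the sample tracks to be comparable.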

In practice, MISO is cumbersome to use because the GFF files describing the alternative splicing isoforms have to be collated manually, but its visualization function is well worth learning.

© 2024 shulou.com SLNews company. All rights reserved.
