
How to use the R package SomaticSignatures for de novo signature inference

Updated 2025-03-04. From: SLTechnology News&Howtos, Shulou (Shulou.com), 06/02 report.

This article shows how to use the R package SomaticSignatures to infer mutational signatures de novo. Many people are unsure how to do this in practice, so the steps below are kept simple and practical. Let's get started!

First, read the documentation for the SomaticSignatures package

The original vignette is at: http://bioconductor.org/packages/release/bioc/vignettes/SomaticSignatures/inst/doc/SomaticSignatures-vignette.html

library(SomaticSignatures)
library(SomaticCancerAlterations)
library(BSgenome.Hsapiens.1000genomes.hs37d5)
# overview of the cancer datasets bundled with SomaticCancerAlterations
sca_metadata = scaMetadata()
sca_metadata
# load and merge all datasets, then label each mutation with its study name
sca_data = unlist(scaLoadDatasets())
sca_data$study = factor(gsub("(.*)_(.*)", "\\1", toupper(names(sca_data))))
# keep single-nucleotide variants on the autosomes only
sca_data = unname(subset(sca_data, Variant_Type %in% "SNP"))
sca_data = keepSeqlevels(sca_data, hsAutosomes(), pruning.mode = "coarse")
# store the variants as a VRanges object
sca_vr = VRanges(
  seqnames = seqnames(sca_data),
  ranges = ranges(sca_data),
  ref = sca_data$Reference_Allele,
  alt = sca_data$Tumor_Seq_Allele2,
  sampleNames = sca_data$Patient_ID,
  seqinfo = seqinfo(sca_data),
  study = sca_data$study)
sca_vr

As the vignette shows, the VRanges object is built from only a few columns of sca_data: sample, chromosome, position, reference allele and alternate allele. Our own somatic mutation calls therefore also need to be reduced to these five columns, c("Sample", "chr", "pos", "ref", "alt").
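If a mutation table follows the standard MAF column naming, a minimal sketch of this reduction can select the columns by name instead of by position. The helper maf_to_signature_input and the toy rows are hypothetical, and the names Tumor_Sample_Barcode, Chromosome and Start_Position are assumptions taken from the MAF convention; the ESCC file used below may be laid out differently, which is why its code selects columns by position.

```r
# Hypothetical helper: keep SNPs from a MAF-style data frame and reduce it to the
# five columns used in this article, plus a "study" column (one study per sample).
# The column names follow the standard MAF convention, which is an assumption here.
maf_to_signature_input <- function(maf) {
  needed <- c("Tumor_Sample_Barcode", "Chromosome", "Start_Position",
              "Variant_Type", "Reference_Allele", "Tumor_Seq_Allele2")
  missing_cols <- setdiff(needed, colnames(maf))
  if (length(missing_cols) > 0) {
    stop("missing MAF columns: ", paste(missing_cols, collapse = ", "))
  }
  snv <- maf[which(maf[["Variant_Type"]] == "SNP"), , drop = FALSE]
  out <- data.frame(
    Sample = as.character(snv[["Tumor_Sample_Barcode"]]),
    chr    = as.character(snv[["Chromosome"]]),
    pos    = as.integer(snv[["Start_Position"]]),
    ref    = as.character(snv[["Reference_Allele"]]),
    alt    = as.character(snv[["Tumor_Seq_Allele2"]]),
    stringsAsFactors = FALSE
  )
  out$study <- out$Sample
  out
}

# Toy input (made-up rows): the DEL row is dropped, two SNPs remain
maf <- data.frame(
  Tumor_Sample_Barcode = c("P1", "P1", "P2"),
  Chromosome           = c("chr1", "chr1", "chr2"),
  Start_Position       = c(4870770, 5111686, 1000),
  Variant_Type         = c("SNP", "DEL", "SNP"),
  Reference_Allele     = c("G", "C", "A"),
  Tumor_Seq_Allele2    = c("T", "-", "G"),
  stringsAsFactors = FALSE
)
maf_to_signature_input(maf)
```

The explicit check stops with a clear message when a required column is absent, whereas a positional index keeps running and silently picks the wrong column if the file layout changes.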

Converting the somatic mutations from WGS data of 508 ESCC samples into SomaticSignatures input

The data can be downloaded from the article's supplementary files: https://static-content.springer.com/esm/art%3A10.1038%2Fs41422-020-0333-6/MediaObjects/41422_2020_333_MOESM23_ESM.csv

This CSV file is larger than 500 MB. After downloading, rename it to maf.csv (the name used in the code), read it into R, and build the SomaticSignatures input data as follows:

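library(data.table)
# fread loads the large CSV quickly; data.table = FALSE returns a plain data.frame
b = fread('../maf.csv', data.table = FALSE)
b[1:4, 1:3]
colnames(b)
mut = b
# count the variant types, then keep single-nucleotide variants only
table(mut$Variant_Type)
mut = mut[mut$Variant_Type == 'SNP', ]
# in this file, columns 10, 2, 3, 8 and 9 hold sample ID, chromosome, position, ref and alt
a = mut[, c(10, 2, 3, 8, 9)]
colnames(a) = c("Sample", "chr", "pos", "ref", "alt")
alls = as.character(unique(a$Sample))
# treat each sample as its own "study" so that motifMatrix later groups by sample
a$study = a$Sample
head(a)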

The fread function from the data.table package reads CSV files much faster than base R, but a file larger than 500 MB still takes a while to load.

The resulting data frame a looks like this:

> head(a)
            Sample  chr     pos ref alt            study
2 FP1705100059DN01 chr1 4870770   G   T FP1705100059DN01
3 FP1705100059DN01 chr1 5111686   C   T FP1705100059DN01
4 FP1705100059DN01 chr1 5116099   C   T FP1705100059DN01
5 FP1705100059DN01 chr1 5151401   C   T FP1705100059DN01
6 FP1705100059DN01 chr1 5151403   G   C FP1705100059DN01
7 FP1705100059DN01 chr1 5217189   G   A FP1705100059DN01

This is an ordinary data frame, not the object type that the SomaticSignatures documentation uses for sca_data, but it holds the same five pieces of information, so the VRanges object can be built from it directly:

sca_vr = VRanges(
  seqnames = a$chr,
  # an SNV covers a single base, so start and end are the same position
  ranges = IRanges(start = a$pos, end = a$pos),
  ref = a$ref,
  alt = a$alt,
  sampleNames = as.character(a$Sample),
  study = as.character(a$study))
sca_vr

Extracting the mutation context and calculating the proportions of the 96 mutation types

SomaticSignatures wraps these steps in ready-made functions, so this is easy and very fast. The code is as follows:

# the mutation coordinates are on hg19 (UCSC-style "chr1" names), so the flanking bases are taken from the hg19 genome
library(BSgenome.Hsapiens.UCSC.hg19)
sca_motifs = mutationContext(sca_vr, BSgenome.Hsapiens.UCSC.hg19)
head(sca_motifs)
# calculate, for each sample, the proportions of the 96 mutation types
escc_sca_mm = motifMatrix(sca_motifs, group = "study", normalize = TRUE)
dim(escc_sca_mm)
# with normalize = TRUE every column should sum to 1
table(colSums(escc_sca_mm))
head(escc_sca_mm[, 1:4])
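mutationContext and motifMatrix hide the bookkeeping behind the 96 types. As a rough base-R illustration, assuming each SNV already comes with its trinucleotide context (reference base in the middle): mutations with a purine reference (A or G) are reverse-complemented so every substitution has a C or T reference, which leaves 6 substitution classes × 4 five-prime bases × 4 three-prime bases = 96 categories. Counting these per sample and dividing by each column's total is why every column sums to 1 after normalization. The helpers classify_snv and motif_matrix, the toy contexts and the "5'[REF>ALT]3'" labels are hypothetical; SomaticSignatures uses its own motif labels.

```r
# Hypothetical sketch of what mutationContext + motifMatrix(normalize = TRUE) compute.
bases <- c("A", "C", "G", "T")
subs  <- c("C>A", "C>G", "C>T", "T>A", "T>C", "T>G")  # pyrimidine reference only
grid  <- expand.grid(three = bases, five = bases, sub = subs, stringsAsFactors = FALSE)
categories <- sprintf("%s[%s]%s", grid$five, grid$sub, grid$three)  # 6 * 4 * 4 = 96 labels

# Reverse complement of a DNA string
revcomp <- function(s) {
  comp <- c(A = "T", C = "G", G = "C", T = "A")
  paste(rev(comp[strsplit(s, "")[[1]]]), collapse = "")
}

# Label one SNV from its trinucleotide context (reference base in the middle) and alt base;
# purine references are moved to the opposite strand so the reference is C or T
classify_snv <- function(context, alt) {
  if (substr(context, 2, 2) %in% c("A", "G")) {
    context <- revcomp(context)
    alt <- revcomp(alt)
  }
  sprintf("%s[%s>%s]%s", substr(context, 1, 1), substr(context, 2, 2),
          alt, substr(context, 3, 3))
}

# Count the 96 categories per sample, optionally turning counts into proportions
motif_matrix <- function(snvs, normalize = TRUE) {
  cats <- mapply(classify_snv, snvs$context, snvs$alt, USE.NAMES = FALSE)
  tab <- table(factor(cats, levels = categories), snvs$sample)
  counts <- matrix(tab, nrow = length(categories),
                   dimnames = list(categories, colnames(tab)))
  if (normalize) counts <- sweep(counts, 2, colSums(counts), "/")
  counts
}

snvs <- data.frame(
  sample  = c("s1", "s1", "s1", "s2"),
  context = c("ACT", "AGT", "TCA", "GTC"),  # made-up contexts
  alt     = c("T", "A", "G", "G"),
  stringsAsFactors = FALSE
)
mm <- motif_matrix(snvs)
dim(mm)                 # 96 rows (categories) x 2 columns (samples)
colSums(mm)             # each column sums to 1
mm["A[C>T]T", "s1"]     # ACT>T and AGT>A (same event, opposite strand): 2 of 3 mutations
```

Folding the two strands together is what keeps the table at 96 rows rather than 192: G>A on AGT and C>T on ACT describe the same base-pair change.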

Using NMF to determine the number of de novo signatures

Researchers at the Sanger Institute [1] introduced the concept of mutational signatures for tumor somatic mutations: non-negative matrix factorization of the 96-type mutation spectrum yielded 30 signatures, which are catalogued in the COSMIC database. Different signatures carry different biological meanings; for example, article [3] uses these signatures to stratify survival. Such analyses are possible largely because the R package deconstructSigs can map a sample's own 96-type spectrum onto the 30 COSMIC signatures.

[1] https://software.broadinstitute.org/cancer/cga/msp
[2] https://en.wikipedia.org/wiki/Mutational_signatures
[3] https://www.nature.com/articles/s41586-019-1056-z
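The idea behind mapping a spectrum onto known signatures, writing one sample's spectrum as a non-negative combination of fixed signatures, can be sketched with multiplicative updates that keep the signature matrix fixed. This is not deconstructSigs' own procedure; the helper fit_exposures and the three toy signatures over 4 categories (standing in for 96 categories and 30 COSMIC signatures) are hypothetical.

```r
# Hypothetical helper: estimate non-negative exposures h so that spectrum ~ S %*% h,
# with the signature matrix S (categories x signatures) held fixed.
fit_exposures <- function(spectrum, S, n_iter = 2000) {
  h <- rep(1 / ncol(S), ncol(S))  # strictly positive start keeps the updates well defined
  for (i in seq_len(n_iter)) {
    # h <- h * (S^T y) / (S^T S h): does not increase ||y - S h||^2 and keeps h >= 0
    h <- h * drop(crossprod(S, spectrum)) / (drop(crossprod(S, S %*% h)) + 1e-12)
  }
  setNames(h, colnames(S))
}

# Toy signatures (each column sums to 1) and a spectrum mixed 50/30/20 from them
S <- cbind(sigA = c(0.7, 0.1, 0.1, 0.1),
           sigB = c(0.1, 0.7, 0.1, 0.1),
           sigC = c(0.1, 0.1, 0.4, 0.4))
spectrum <- drop(S %*% c(0.5, 0.3, 0.2))
round(fit_exposures(spectrum, S), 3)
```

Because the toy signatures are linearly independent and the spectrum is an exact mixture, the recovered exposures should match the 0.5/0.3/0.2 weights used to build it.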

Here, however, we want to infer the signatures de novo ourselves, so we use the assessNumberSignatures and identifySignatures functions of the SomaticSignatures package. The code is as follows:

# explore a range of signature numbers in advance; 11 was selected in the end
# assessNumberSignatures is very time-consuming, so it is wrapped in if (FALSE)
# and the cached result is loaded on later runs
if (FALSE) {
  n_sigs = 5:15
  gof_nmf = assessNumberSignatures(escc_sca_mm, n_sigs, nReplicates = 5)
  save(gof_nmf, file = 'gof_nmf.Rdata')
}
load(file = 'gof_nmf.Rdata')
plotNumberSignatures(gof_nmf)
# based on this plot, choose 11 signatures
sigs_nmf = identifySignatures(escc_sca_mm, 11, nmfDecomposition)
save(escc_sca_mm, sigs_nmf, file = 'escc_denovo_results.Rata')
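identifySignatures with nmfDecomposition hands the factorization to the NMF package. To make concrete what is being computed, here is a base-R sketch of one classic scheme, Lee–Seung multiplicative updates for squared error, which factor a non-negative matrix V (mutation types × samples) into signatures W and exposures H. It is not necessarily the algorithm, initialization or stopping rule the NMF package uses, and simple_nmf plus the toy data are hypothetical. The loop over k mimics the idea of choosing the number of signatures by comparing fit quality across ranks, here the residual sum of squares (RSS) and an explained variance defined as 1 - RSS / sum(V^2).

```r
# Hypothetical sketch: NMF by Lee-Seung multiplicative updates (squared error).
# V: non-negative matrix (mutation types x samples); k: number of signatures.
simple_nmf <- function(V, k, n_iter = 2000, seed = 1) {
  set.seed(seed)  # random non-negative starting point (resets the global RNG)
  W <- matrix(runif(nrow(V) * k), nrow(V), k)   # signatures
  H <- matrix(runif(k * ncol(V)), k, ncol(V))   # exposures
  eps <- 1e-12                                  # guards against division by zero
  for (i in seq_len(n_iter)) {
    H <- H * crossprod(W, V) / (crossprod(W) %*% H + eps)
    W <- W * tcrossprod(V, H) / (W %*% tcrossprod(H) + eps)
  }
  rss <- sum((V - W %*% H)^2)
  list(W = W, H = H, rss = rss, evar = 1 - rss / sum(V^2))
}

# Toy data with exactly two underlying non-negative components
set.seed(42)
W_true <- matrix(runif(6 * 2, 0.1, 1), 6, 2)
H_true <- matrix(runif(2 * 5, 0.1, 1), 2, 5)
V <- W_true %*% H_true

# Compare goodness of fit across candidate numbers of signatures
for (k in 1:3) {
  fit <- simple_nmf(V, k)
  cat(sprintf("k = %d  RSS = %.3g  explained variance = %.4f\n", k, fit$rss, fit$evar))
}
```

On this toy matrix k = 2 should already explain almost all of the variance, and a third component adds little; with real spectra one looks for a similar point where RSS stops dropping sharply. NMF is usually run several times from different starting points because it can end in different local optima.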

Plotting the 96-type mutation spectra of the 11 de novo signatures identified by NMF

The code is as follows:

load(file = 'escc_denovo_results.Rata')
str(sigs_nmf)
library(ggplot2)
# heatmap of the signature matrix
plotSignatureMap(sigs_nmf) + ggtitle("Somatic Signatures: NMF-Heatmap")
# bar chart of each signature's 96-type spectrum, one row per signature
plotSignatures(sigs_nmf, normalize = TRUE) +
  ggtitle("Somatic Signatures: NMF-Barchart") +
  facet_grid(signature ~ alteration, scales = "free_y")

The figure is as follows:

This concludes the walkthrough of de novo signature inference with the R package SomaticSignatures. Theory works best alongside practice, so try it on your own data!
