Shulou (Shulou.com), 06/02 report, from SLTechnology News&Howtos (updated 2025-03-04)
This article shows how to use the R package SomaticSignatures for de novo mutational signature inference, step by step.
First, read the vignette of the SomaticSignatures package, available at: http://bioconductor.org/packages/release/bioc/vignettes/SomaticSignatures/inst/doc/SomaticSignatures-vignette.html
library(SomaticSignatures)
library(SomaticCancerAlterations)
library(BSgenome.Hsapiens.1000genomes.hs37d5)
sca_metadata = scaMetadata()
sca_metadata
# Load all example datasets shipped with SomaticCancerAlterations and combine them
sca_data = unlist(scaLoadDatasets())
# Derive the study name from the dataset name (the part before the "_")
sca_data$study = factor(gsub("(.*)_(.*)", "\\1", toupper(names(sca_data))))
# Keep SNVs on autosomes only
sca_data = unname(subset(sca_data, Variant_Type %in% "SNP"))
sca_data = keepSeqlevels(sca_data, hsAutosomes(), pruning.mode = "coarse")
sca_vr = VRanges(
  seqnames = seqnames(sca_data),
  ranges = ranges(sca_data),
  ref = sca_data$Reference_Allele,
  alt = sca_data$Tumor_Seq_Allele2,
  sampleNames = sca_data$Patient_ID,
  seqinfo = seqinfo(sca_data),
  study = sca_data$study)
sca_vr
As you can see, the VRanges object is built from five pieces of information per mutation: sample, chromosome, position, reference allele and alternate allele. Our own somatic mutations therefore need to be arranged into the five columns c("Sample", "chr", "pos", "ref", "alt").
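Before converting your own data, it can help to check that the table really has these five columns and contains only single-base substitutions. The helper check_snv_table below is hypothetical (it is not part of SomaticSignatures); it is a minimal base-R sketch that assumes the column names above and upper-case A/C/G/T alleles.

```r
# Hypothetical helper (not part of SomaticSignatures): sanity-check a
# mutation table before it is turned into a VRanges object.
check_snv_table <- function(df) {
  required <- c("Sample", "chr", "pos", "ref", "alt")
  missing_cols <- setdiff(required, colnames(df))
  if (length(missing_cols) > 0) {
    stop("missing columns: ", paste(missing_cols, collapse = ", "))
  }
  bases <- c("A", "C", "G", "T")
  # A valid SNV has a single-base ref and alt that differ from each other
  is_snv <- df$ref %in% bases & df$alt %in% bases & df$ref != df$alt
  df[is_snv, required]
}

# Toy table with made-up values: row 2 is an indel, row 3 has ref == alt
toy <- data.frame(
  Sample = c("s1", "s1", "s2"),
  chr    = c("chr1", "chr1", "chr2"),
  pos    = c(100L, 200L, 300L),
  ref    = c("G", "C", "A"),
  alt    = c("T", "-", "A"),
  stringsAsFactors = FALSE
)
clean <- check_snv_table(toy)
print(clean)
```

Only the first toy row survives; dropping such rows up front avoids confusing errors later when the VRanges object or the mutation context is built.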
Next, we convert the somatic mutations from the WGS data of 508 ESCC (esophageal squamous cell carcinoma) samples into input data for the SomaticSignatures package.
Download the file from the article's web page: https://static-content.springer.com/esm/art%3A10.1038%2Fs41422-020-0333-6/MediaObjects/41422_2020_333_MOESM23_ESM.csv
This is a CSV file larger than 500 MB. Rename it after downloading (here to maf.csv), read it into R, and build the input data for SomaticSignatures as follows:
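library(data.table)
# Read the ~500 MB supplementary CSV (renamed to maf.csv after download)
b = fread('../maf.csv', data.table = FALSE)
b[1:4, 1:3]
colnames(b)
mut = b
table(mut$Variant_Type)
# Keep single-nucleotide substitutions only
mut = mut[mut$Variant_Type == 'SNP', ]
# In this file, columns 10, 2, 3, 8 and 9 hold sample, chromosome, position, ref and alt
a = mut[, c(10, 2, 3, 8, 9)]
colnames(a) = c("Sample", "chr", "pos", "ref", "alt")
alls = as.character(unique(a$Sample))
# Treat each sample as its own group ("study") for motifMatrix later
a$study = a$Sample
head(a)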
The fread function of the data.table package reads large CSV files much faster than base R's read.csv, but loading a file of more than 500 MB still takes some time.
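Selecting columns by position (c(10, 2, 3, 8, 9)) silently breaks if the column order of the file differs. A minimal alternative is to select by name. The sketch below assumes the file uses the standard MAF column names Tumor_Sample_Barcode, Chromosome, Start_Position, Reference_Allele and Tumor_Seq_Allele2; these names are an assumption about the ESCC file, so check colnames(b) first. maf_to_input is a hypothetical helper and the toy table is made up.

```r
# Hypothetical helper: build the five-column input (plus "study") by column
# name instead of position. Default names are ASSUMED standard MAF columns.
maf_to_input <- function(maf,
                         sample_col = "Tumor_Sample_Barcode",
                         chr_col    = "Chromosome",
                         pos_col    = "Start_Position",
                         ref_col    = "Reference_Allele",
                         alt_col    = "Tumor_Seq_Allele2") {
  # Keep single-nucleotide substitutions only
  snp <- maf[maf$Variant_Type == "SNP", , drop = FALSE]
  out <- data.frame(
    Sample = as.character(snp[[sample_col]]),
    chr    = as.character(snp[[chr_col]]),
    pos    = as.integer(snp[[pos_col]]),
    ref    = as.character(snp[[ref_col]]),
    alt    = as.character(snp[[alt_col]]),
    stringsAsFactors = FALSE
  )
  out$study <- out$Sample  # one group per sample, as in the article
  out
}

# Toy MAF-like table (made-up values, for illustration only)
toy_maf <- data.frame(
  Tumor_Sample_Barcode = c("s1", "s1", "s2"),
  Chromosome           = c("chr1", "chr1", "chr2"),
  Start_Position       = c(100, 150, 300),
  Reference_Allele     = c("G", "A", "C"),
  Tumor_Seq_Allele2    = c("T", "-", "T"),
  Variant_Type         = c("SNP", "DEL", "SNP"),
  stringsAsFactors = FALSE
)
a_toy <- maf_to_input(toy_maf)
print(a_toy)
```

If the real file uses different names, pass them through the *_col arguments rather than editing the function.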
The first rows of the data frame a look like this:
> head(a)
Sample chr pos ref alt study
2 FP1705100059DN01 chr1 4870770 G T FP1705100059DN01
3 FP1705100059DN01 chr1 5111686 C T FP1705100059DN01
4 FP1705100059DN01 chr1 5116099 C T FP1705100059DN01
5 FP1705100059DN01 chr1 5151401 C T FP1705100059DN01
6 FP1705100059DN01 chr1 5151403 G C FP1705100059DN01
7 FP1705100059DN01 chr1 5217189 G A FP1705100059DN01
This is an ordinary data frame rather than the object type of sca_data used in the SomaticSignatures documentation, but it contains the same five pieces of information, which is all that is needed to build the VRanges object:
# Build the VRanges object from the data frame.
# An SNV spans a single base, so start = end = pos.
sca_vr = VRanges(
  seqnames = a$chr,
  ranges = IRanges(start = a$pos, end = a$pos),
  ref = a$ref,
  alt = a$alt,
  sampleNames = as.character(a$Sample),
  study = as.character(a$study))
sca_vr
Extracting the mutation context and computing the proportions of the 96 mutation types
The SomaticSignatures package wraps these steps in ready-made functions, so they are easy to run and very fast. The code is as follows:
# The mutation coordinates are based on hg19 (with "chr" prefixes),
# so the flanking bases are taken from the UCSC hg19 genome.
library(BSgenome.Hsapiens.UCSC.hg19)
sca_motifs = mutationContext(sca_vr, BSgenome.Hsapiens.UCSC.hg19)
head(sca_motifs)
# Calculate the proportional distribution of the 96 mutation types for each sample
escc_sca_mm = motifMatrix(sca_motifs, group = "study", normalize = TRUE)
dim(escc_sca_mm)
# With normalize = TRUE, every column (sample) should sum to 1
table(colSums(escc_sca_mm))
head(escc_sca_mm[, 1:4])
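To make the "96 mutation types" concrete: each SNV is written relative to a pyrimidine reference base (C or T), giving 6 substitution types, and each is combined with one of 4 bases on the 5' side and 4 on the 3' side, so 6 × 4 × 4 = 96 classes. The base-R sketch below only illustrates this idea; it is not SomaticSignatures' own code, and the COSMIC-style labels such as "A[C>A]G" are an assumption for readability (SomaticSignatures uses its own label format). classify_snv, all_classes and motif_matrix are hypothetical helpers.

```r
complement <- c(A = "T", C = "G", G = "C", T = "A")

# Label SNVs; 'five' and 'three' are the bases 5' and 3' of the mutated base.
classify_snv <- function(ref, alt, five, three) {
  # Purine references (A/G) are reported on the opposite strand:
  # complement every base and swap the 5' and 3' flanks.
  flip <- ref %in% c("A", "G")
  r  <- ifelse(flip, unname(complement[ref]), ref)
  al <- ifelse(flip, unname(complement[alt]), alt)
  f5 <- ifelse(flip, unname(complement[three]), five)
  f3 <- ifelse(flip, unname(complement[five]), three)
  paste0(f5, "[", r, ">", al, "]", f3)
}

# All 96 classes: substitution type outermost, then 5' base, then 3' base
all_classes <- function() {
  subs  <- c("C>A", "C>G", "C>T", "T>A", "T>C", "T>G")
  bases <- c("A", "C", "G", "T")
  grid  <- expand.grid(three = bases, five = bases, sub = subs,
                       stringsAsFactors = FALSE)
  paste0(grid$five, "[", grid$sub, "]", grid$three)
}

# 96 x samples matrix of counts, optionally scaled so each column sums to 1
motif_matrix <- function(classes, samples, normalize = TRUE) {
  m <- unclass(table(factor(classes, levels = all_classes()), samples))
  storage.mode(m) <- "double"
  if (normalize) m <- sweep(m, 2, colSums(m), "/")
  m
}

# Toy example with made-up mutations
ref   <- c("C", "G", "T", "C")
alt   <- c("T", "A", "G", "A")
five  <- c("A", "T", "C", "G")
three <- c("G", "C", "A", "T")
samp  <- c("s1", "s1", "s2", "s2")
cls <- classify_snv(ref, alt, five, three)
print(cls)
mm <- motif_matrix(cls, samp)
print(colSums(mm))
```

The second toy mutation, T[G>A]C, becomes G[C>T]A after strand flipping, which is why a G>A change is counted together with C>T changes.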
Using NMF to determine the number of de novo signatures
Researchers at the Sanger Institute [1] introduced the concept of mutational signatures of tumor somatic mutations [2]: non-negative matrix factorization (NMF) of the 96-channel mutation spectrum yielded 30 signatures, which are documented in the COSMIC database. Different signatures have different biological meanings; for example, article [3] uses these signatures to distinguish patient survival. Such analyses mainly rely on the R package deconstructSigs, which maps a sample's own 96-channel mutation spectrum onto the 30 signatures in the COSMIC database.
[1] https://software.broadinstitute.org/cancer/cga/msp
[2] https://en.wikipedia.org/wiki/Mutational_signatures
[3] https://www.nature.com/articles/s41586-019-1056-z
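The idea behind NMF is to approximate the non-negative 96 × samples matrix M as M ≈ W H, where the columns of W are signatures and the columns of H are each sample's exposures. The sketch below uses the classic Lee-Seung multiplicative updates for the squared-error loss; this is a swapped-in textbook algorithm for illustration and not necessarily what SomaticSignatures' nmfDecomposition does internally. nmf_mu is a hypothetical helper, and the toy matrix is synthetic.

```r
# Minimal NMF sketch: Lee-Seung multiplicative updates (Frobenius loss).
# M: non-negative features x samples matrix (96 x samples in the article).
nmf_mu <- function(M, k, n_iter = 500, seed = 1, eps = 1e-10) {
  set.seed(seed)                       # reproducible random initialisation
  n <- nrow(M); m <- ncol(M)
  W <- matrix(runif(n * k), n, k)      # signatures: features x k
  H <- matrix(runif(k * m), k, m)      # exposures:  k x samples
  for (i in seq_len(n_iter)) {
    # Multiplicative updates keep every entry non-negative;
    # eps guards against division by zero.
    H <- H * (t(W) %*% M) / (t(W) %*% W %*% H + eps)
    W <- W * (M %*% t(H)) / (W %*% H %*% t(H) + eps)
  }
  list(W = W, H = H, rss = sum((M - W %*% H)^2))
}

# Synthetic data that truly has 2 non-negative components
set.seed(42)
W_true <- matrix(runif(6 * 2), 6, 2)
H_true <- matrix(runif(2 * 5), 2, 5)
M <- W_true %*% H_true
fit <- nmf_mu(M, k = 2, n_iter = 2000)
print(fit$rss / sum(M^2))  # relative residual, should be small
```

Because NMF only finds a local optimum from a random start, real tools run several replicates and keep the best fit, which is what the nReplicates argument in the next step is for.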
Here, however, we want to infer signatures de novo ourselves, so we use the assessNumberSignatures and identifySignatures functions of the SomaticSignatures package. The code is as follows:
# Explore a range of signature numbers in advance (11 was selected in the end).
# assessNumberSignatures is very time-consuming, so it is run once and the
# result saved; change FALSE to TRUE to rerun it.
if (FALSE) {
  n_sigs = 5:15
  gof_nmf = assessNumberSignatures(escc_sca_mm, n_sigs, nReplicates = 5)
  save(gof_nmf, file = 'gof_nmf.Rdata')
}
load(file = 'gof_nmf.Rdata')
plotNumberSignatures(gof_nmf)
# Based on this plot, select 11 signatures
sigs_nmf = identifySignatures(escc_sca_mm, 11, nmfDecomposition)
save(escc_sca_mm, sigs_nmf, file = 'escc_denovo_results.Rdata')
Plotting the 96-channel mutation spectra of the 11 de novo signatures inferred by NMF
The code is as follows:
load(file = 'escc_denovo_results.Rdata')
str(sigs_nmf)
library(ggplot2)
plotSignatureMap(sigs_nmf) + ggtitle("Somatic Signatures: NMF-Heatmap")
plotSignatures(sigs_nmf, normalize = TRUE) +
  ggtitle("Somatic Signatures: NMF-Barchart") +
  facet_grid(signature ~ alteration, scales = "free_y")
[Figure: "Somatic Signatures: NMF-Heatmap" and "Somatic Signatures: NMF-Barchart" for the 11 de novo signatures]
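The choice of 11 signatures rests on the goodness-of-fit curve from assessNumberSignatures. As I understand it, that function fits the decomposition for each candidate number of signatures and reports the residual sum of squares (RSS) and explained variance; the base-R sketch below reproduces that idea on synthetic data, using the common definition explained variance = 1 − RSS / ΣM². The helpers nmf_rss and assess_k are hypothetical and are not the package's implementation.

```r
# Best-of-restarts RSS of a Lee-Seung NMF fit with k components (sketch).
nmf_rss <- function(M, k, n_iter = 1000, seed = 1, eps = 1e-10) {
  set.seed(seed)
  W <- matrix(runif(nrow(M) * k), nrow(M), k)
  H <- matrix(runif(k * ncol(M)), k, ncol(M))
  for (i in seq_len(n_iter)) {
    H <- H * (t(W) %*% M) / (t(W) %*% W %*% H + eps)
    W <- W * (M %*% t(H)) / (W %*% H %*% t(H) + eps)
  }
  sum((M - W %*% H)^2)  # residual sum of squares
}

# For each k, keep the best of n_rep random restarts
assess_k <- function(M, ks, n_rep = 3) {
  rows <- lapply(ks, function(k) {
    rss <- min(sapply(seq_len(n_rep), function(r) nmf_rss(M, k, seed = r)))
    data.frame(k = k, rss = rss, evar = 1 - rss / sum(M^2))
  })
  do.call(rbind, rows)
}

# Synthetic matrix with 3 true components
set.seed(7)
M <- matrix(runif(20 * 3), 20, 3) %*% matrix(runif(3 * 10), 3, 10)
res <- assess_k(M, ks = 1:4)
print(res)
```

RSS keeps shrinking as k grows, so the usual heuristic is to pick the k after which the curve flattens (the "elbow"), rather than the k with the smallest RSS.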