2025-04-06 Update. From: SLTechnology News&Howtos (shulou)
Shulou(Shulou.com)06/02 Report--
This article shows how to use SnpSift to annotate the variant sites in a VCF file against the ClinVar database. It covers downloading ClinVar, running the annotation, and interpreting the results.
First, get familiar with the ClinVar database.
ClinVar is an NCBI-hosted database of human genomic variants and their relationship to disease. Its strength is that it integrates genetic variation and clinical phenotype data from dbSNP, dbVar, PubMed, OMIM and other databases into a standardized, credible resource that links variants to clinical findings.
Annotating against ClinVar retrieves, for each matching site, the variant and gene information, frequency, phenotype, clinical significance, review status and chromosomal location.
Go to the ClinVar FTP site, find the database files and download the latest release. I use these shell commands:
## ftp://ftp.ncbi.nlm.nih.gov/pub/clinvar/
mkdir -p ~/annotation/variation/human/clinvar
cd ~/annotation/variation/human/clinvar
wget ftp://ftp.ncbi.nlm.nih.gov/pub/clinvar/disease_names
# mkdir vcf_GRCh37 && cd vcf_GRCh37
mkdir vcf_GRCh38 && cd vcf_GRCh38
wget ftp://ftp.ncbi.nlm.nih.gov/pub/clinvar/vcf_GRCh38/clinvar_20200706.vcf.gz
wget ftp://ftp.ncbi.nlm.nih.gov/pub/clinvar/vcf_GRCh38/clinvar_20200706.vcf.gz.tbi
The variant records in ClinVar are updated frequently, so check the FTP directory for the current release before downloading.
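Because releases change often, it helps to confirm which release and genome build a downloaded file actually is. The sketch below reads `##key=value` lines from the VCF header. The helper name `vcf_header_value` is made up for illustration, and the header keys `fileDate` and `reference` are assumed to be present, as they are in the ClinVar VCF releases I have seen.

```shell
# Print the value of a "##key=value" header line from a VCF read on stdin.
# vcf_header_value is a hypothetical helper, not part of any tool.
vcf_header_value() {
    # $1 is the header key, for example fileDate or reference (assumed keys)
    awk -v key="$1" '
        index($0, "##" key "=") == 1 { print substr($0, length(key) + 4); exit }
        /^#CHROM/ { exit }   # stop at the column header, before the records
    '
}

# With the downloaded file:
#   gzip -dc clinvar_20200706.vcf.gz | vcf_header_value fileDate
# Self-contained demonstration on a toy header:
printf '##fileformat=VCFv4.1\n##fileDate=2020-07-06\n##reference=GRCh38\n#CHROM\tPOS\n' |
    vcf_header_value reference   # prints GRCh38
```

The build reported here should match the reference your own VCF was called against, otherwise positions will not line up during annotation.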
Next, get familiar with the SnpSift software.
SnpSift is very powerful, and it is worth reading its manual carefully: http://snpeff.sourceforge.net/protocol.html
Example 1: Coding variants
Example 2: Software Integration
Example 3: Non-Coding variants
Example 4: Sequencing data analysis
Example 5: Filter variants (dbSnp)
Example 6: Custom annotations
To annotate the variant sites of a VCF file against ClinVar with SnpSift, the command looks like this:
# Adjust the database path if the ClinVar VCF was saved in a build-specific subdirectory
java -Xmx1g -jar ~/biosoft/snpEff/snpEff/SnpSift.jar \
    annotate \
    -v ~/annotation/variation/human/clinvar/clinvar_20200706.vcf.gz \
    new.filter.sort.vcf \
    > new.clinvar.vcf
The annotation rate is usually not high, because ClinVar records only a limited number of sites, as follows:
Total annotated entries: 6231
Total entries: 54972
Percent: 11.33%
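The same kind of summary can be recomputed from the annotated VCF itself. This is a minimal sketch: the function name `annotation_ratio` is made up, and it assumes that an annotated record can be recognized by a `CLNSIG=` entry in the INFO column, which need not match SnpSift's own counting exactly.

```shell
# Count records and records carrying a given INFO key in a VCF read on stdin.
# annotation_ratio is a hypothetical helper; CLNSIG as the marker is an assumption.
annotation_ratio() {
    awk -F'\t' -v key="${1:-CLNSIG}" '
        /^#/ { next }                       # skip header lines
        {
            total++
            n = split($8, info, ";")        # column 8 is INFO
            for (i = 1; i <= n; i++)
                if (index(info[i], key "=") == 1) { annotated++; break }
        }
        END {
            printf "Total annotated entries: %d\n", annotated
            printf "Total entries: %d\n", total
            if (total > 0) printf "Percent: %.2f%%\n", 100 * annotated / total
        }
    '
}

# With the real output:  annotation_ratio < new.clinvar.vcf
# Toy input: four records, one of them annotated.
printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n1\t10\t.\tA\tG\t.\t.\tDP=5\n1\t20\t.\tC\tT\t.\t.\tDP=7;CLNSIG=Benign\n2\t30\t.\tG\tA\t.\t.\tDP=9\n2\t40\t.\tT\tC\t.\t.\tDP=3\n' |
    annotation_ratio CLNSIG

# Check of the figures quoted above: 6231 of 54972 entries.
awk 'BEGIN { printf "%.2f%%\n", 100 * 6231 / 54972 }'   # prints 11.33%
```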
With other databases, such as dbSNP, ExAC or gnomAD, the annotation rate is much higher.
Interpreting the ClinVar annotation results
There is no simple recipe for interpreting them. As a rough guide, a WES analysis yields about 100,000 variant sites, of which about 20,000 lie in exonic regions, and about 2,000 of those are annotated by ClinVar, which is still a considerable number to review.
Prior knowledge is essential. For example, if the individual who provided the WES data is known to have a particular disease, you can search the annotations for related terms. For diseases involving retinal degeneration, search for:
"Pigmentary retinal degeneration"
"Rod-cone dystrophy"
"Retinitis pigmentosa"
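A search like this can be run on the annotated VCF with standard tools. The sketch below is one way to do it under stated assumptions: disease names are taken from the `CLNDN` INFO field, where ClinVar writes spaces as underscores, and clinical significance from `CLNSIG`. The function names `filter_by_disease` and `clnsig_tally` are made up for illustration, and search terms containing regular-expression characters would need escaping.

```shell
# Keep header lines plus records whose CLNDN entry matches any search term.
# Both helpers are hypothetical; CLNDN and CLNSIG are assumed INFO keys.
filter_by_disease() {
    # Terms are given with spaces and converted to the underscore form.
    pattern=$(printf '%s\n' "$@" | sed 's/ /_/g' | paste -sd '|' -)
    awk -F'\t' -v pat="$pattern" '
        /^#/ { print; next }
        {
            n = split($8, info, ";")
            for (i = 1; i <= n; i++)
                if (index(info[i], "CLNDN=") == 1 && tolower(info[i]) ~ tolower(pat)) {
                    print
                    break
                }
        }
    '
}

# Tally the CLNSIG values of the records read on stdin.
clnsig_tally() {
    awk -F'\t' '
        /^#/ { next }
        {
            n = split($8, info, ";")
            for (i = 1; i <= n; i++)
                if (index(info[i], "CLNSIG=") == 1) count[substr(info[i], 8)]++
        }
        END { for (k in count) print k "\t" count[k] }
    ' | sort
}

# With the real output:
#   filter_by_disease "Retinitis pigmentosa" < new.clinvar.vcf | clnsig_tally
# Toy records for demonstration:
printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n1\t10\t.\tA\tG\t.\t.\tCLNDN=Retinitis_pigmentosa;CLNSIG=Benign\n1\t20\t.\tC\tT\t.\t.\tCLNDN=Breast_cancer;CLNSIG=Pathogenic\n3\t30\t.\tG\tA\t.\t.\tCLNDN=Rod-cone_dystrophy;CLNSIG=Benign\n' |
    filter_by_disease "Pigmentary retinal degeneration" "Rod-cone dystrophy" "Retinitis pigmentosa" |
    clnsig_tally
```

The tally gives a quick view of whether the hits for the disease of interest include any pathogenic classifications at all.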
If every variant the search returns is Benign and none is Pathogenic, things get harder. You then need to classify the variants yourself according to the ACMG guidelines, focusing on these criteria:
1. PM1: located in a mutational hot spot and/or in a critical functional domain with no known benign variation (such as the active site of an enzyme).
2. PM2: absent from normal controls (or present at extremely low frequency for recessive diseases) in the ESP, 1000 Genomes and ExAC databases.
3. PP1: the variant co-segregates with the disease in the pedigree (it is detected in multiple affected family members). Note: with more segregation data, this can be used as stronger evidence.
4. PP3: multiple lines of computational evidence predict a deleterious effect on the gene or gene product, including conservation, evolutionary impact and splice-site effects. Note: because many bioinformatics algorithms use the same or very similar inputs, the individual algorithms should not be counted as independent criteria.
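Of these criteria, PM2 is the easiest to approximate mechanically once the VCF also carries a population frequency annotation. The sketch below keeps records whose frequency is missing or at most a cutoff. The function name `rare_variants`, the INFO key `AF` and the cutoff `0.001` are all assumptions for illustration. The real key depends on which frequency database was used for annotation, and the sketch assumes one ALT allele per record.

```shell
# Keep header lines plus records that are absent from, or rare in, a population
# frequency annotation. rare_variants is a hypothetical helper; the AF key and
# the 0.001 cutoff are assumptions, not values from the ACMG guidelines.
rare_variants() {
    # $1: maximum allele frequency; $2: INFO key holding the frequency
    awk -F'\t' -v max="${1:-0.001}" -v key="${2:-AF}" '
        /^#/ { print; next }
        {
            af = ""
            n = split($8, info, ";")
            for (i = 1; i <= n; i++)
                if (index(info[i], key "=") == 1) af = substr(info[i], length(key) + 2)
            # No frequency entry means the variant was not seen in the controls.
            if (af == "" || af + 0 <= max + 0) print
        }
    '
}

# With real data:  rare_variants 0.001 AF < new.clinvar.vcf
# Toy records: one without AF, one common, one rare.
printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n1\t10\t.\tA\tG\t.\t.\tDP=5\n1\t20\t.\tC\tT\t.\t.\tAF=0.2\n2\t30\t.\tG\tA\t.\t.\tAF=0.0001\n' |
    rare_variants 0.001 AF
```

PM1, PP1 and PP3 need domain annotation, pedigree data and prediction scores respectively, so they are not covered by a filter like this.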
If many sites still remain, you can narrow the search directly to genes, for example the 37 genes associated with "Retinitis pigmentosa". Far fewer variants fall in these genes, and from them you select the ones that are deleterious and have a low population frequency. This raises a question, though: if the analysis is limited to known disease genes anyway, why do WES at all instead of a targeted panel? See: "The exon study of the family should locate the known disease-related genes anyway."
For classification under the ACMG guidelines, I recommend reading the Materials and Methods section of the article "Germline pathogenic variants of 11 breast cancer genes in 7,051 Japanese patients and 11,241 controls", which explores the process in considerable detail and is very interesting.
The ClinEff software is also recommended.
Its home page is: http://www.dnaminer.com/clineff.html
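Restricting the annotated VCF to a gene list can be sketched as follows. It assumes the gene symbols come from ClinVar's `GENEINFO` INFO field, written as `SYMBOL:GeneID` pairs separated by `|`. The function name `filter_by_genes` is made up, and the one-gene panel file is a toy example, not the 37-gene list mentioned above.

```shell
# Keep header lines plus records whose GENEINFO names a gene from a panel file.
# filter_by_genes is a hypothetical helper; GENEINFO as the source is an assumption.
filter_by_genes() {
    # $1: file with one gene symbol per line (must not be empty); VCF on stdin
    awk -F'\t' '
        NR == FNR { if ($1 != "") panel[$1] = 1; next }   # first file: the panel
        /^#/ { print; next }
        {
            n = split($8, info, ";")
            for (i = 1; i <= n; i++) {
                if (index(info[i], "GENEINFO=") != 1) continue
                m = split(substr(info[i], 10), genes, "[|]")
                for (j = 1; j <= m; j++) {
                    split(genes[j], pair, ":")            # pair[1] is the symbol
                    if (pair[1] in panel) { print; next }
                }
            }
        }
    ' "$1" -
}

# With real data:  filter_by_genes rp_genes.txt < new.clinvar.vcf
# Toy demonstration with a temporary panel file:
panel_file=$(mktemp)
printf 'RHO\n' > "$panel_file"
printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n3\t10\t.\tA\tG\t.\t.\tGENEINFO=RHO:6010;CLNSIG=Pathogenic\n17\t20\t.\tC\tT\t.\t.\tGENEINFO=BRCA1:672;CLNSIG=Benign\n' |
    filter_by_genes "$panel_file"
rm -f "$panel_file"
```

Matching on whole symbols rather than substrings avoids picking up genes whose names merely contain a panel gene's name.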