2025-04-07 Update From: SLTechnology News&Howtos
Shulou(Shulou.com)06/01 Report--
This article describes how to use the snpEff tool. The editor finds it very practical and shares it here for reference.
1. Query the list of all available databases
The command is as follows
java -jar snpEff.jar databases > snpEff.databases.list.txt
At present, there are 42791 databases. The snpEff.databases.list.txt file lists, for each species, the available databases and the corresponding download links.
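As a rough sketch of how this list can be queried, the function below prints the names of the databases available for one species. It assumes the file is tab-separated, with the database name in column 1 and the organism in column 2 (both possibly padded with spaces), and that any header rows start with "Genome" or with dashes. The function name list_databases is made up for this example.

```shell
# list_databases FILE PATTERN
# Print the names of the databases whose organism column contains PATTERN
# (case-insensitive). Assumed layout: tab-separated, name in column 1,
# organism in column 2, values possibly padded with spaces.
list_databases() {
    awk -F'\t' -v pat="$2" '
        {
            name = $1; org = $2
            gsub(/^[ \t]+|[ \t]+$/, "", name)    # trim padding around the name
            gsub(/^[ \t]+|[ \t]+$/, "", org)     # trim padding around the organism
            if (name == "Genome" || name ~ /^-+$/) next   # skip assumed header rows
            if (index(tolower(org), tolower(pat)) > 0) print name
        }
    ' "$1"
}
```

For example, `list_databases snpEff.databases.list.txt Homo_sapiens` would print one database name per line.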
2. Download the database
Taking human as an example, first check which databases are available.
grep -i "Homo_sapiens" snpEff.databases.list.txt | cut -f1
GRCh37.75
GRCh38.86
hg19
hg19kg
hg38
hg38kg
testHg19ChrM
Taking the GRCh38.86 database as an example, the download command is as follows
java -jar snpEff.jar download GRCh38.86
After the download succeeds, the data folder of the software installation directory contains a folder named after the database, which holds all the downloaded files.
GRCh38.86/
├── cytoBand.txt.gz
├── interactions.bin
├── motif.bin
├── nextProt.bin
├── pwms.bin
├── sequence.X.bin
├── sequence.Y.bin
└── snpEffectPredictor.bin
3. Annotate variants
The command is as follows:
java -jar snpEff.jar GRCh38.86 examples/test.chr22.vcf > test.chr22.ann.vcf
GRCh38.86 is the name of the database and test.chr22.vcf is the input file, which is in VCF format.
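For orientation, a VCF data line has eight fixed tab-separated columns (CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO), and header lines start with "#". The sketch below prints the site and its alleles for each record. The function name vcf_sites is made up for this example, and the code assumes an uncompressed VCF file.

```shell
# vcf_sites FILE
# Print chromosome, position, reference allele and alternate allele for
# every data line of an uncompressed VCF file, skipping "#" header lines.
vcf_sites() {
    awk -F'\t' '!/^#/ && NF >= 8 { print $1 "\t" $2 "\t" $4 "\t" $5 }' "$1"
}
```

Running `vcf_sites examples/test.chr22.vcf` before annotating is a quick way to confirm that the input is well formed.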
The output file test.chr22.ann.vcf is as follows
As you can see, a new field named ANN is added to the INFO column of each record. The subfields of ANN are described in the comment lines of the VCF header. By default, the following information is given, taking the first mutation site as an example.
1. Allele
The allele after the mutation. At the first mutation site the base changes from T to C, so the value of Allele is C.
2. Annotation
The type of mutation, expressed as a Sequence Ontology (SO) term. The first mutation site is annotated as downstream_gene_variant.
If the mutation site belongs to multiple types, the types are joined with the & symbol, such as
intron_variant&nc_transcript_variant
3. Annotation_Impact
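A rough estimate of how harmful the mutation is. There are four values: HIGH, MODERATE, LOW and MODIFIER. HIGH marks a disruptive change such as a stop gain or frameshift, MODERATE a non-disruptive change that may alter the protein such as a missense variant, LOW a change that is unlikely to alter the protein such as a synonymous variant, and MODIFIER a variant in a non-coding region or one whose impact is hard to predict.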
4. Gene_Name
Gene name
5. Gene_ID
Gene ID
6. Feature_Type
The type of feature that the annotation refers to, such as transcript, motif or miRNA.
7. Feature_ID
The ID of the feature specified by Feature_Type, for example a transcript ID.
8. Transcript_BioType
The transcript biotype, usually taken from the Ensembl database.
9. Rank
It has a value only when the mutation site is located in the gene region. It gives the number of the exon/intron where the mutation site is located and the total number of exons/introns of the gene. For example, if a mutation site is located in the third exon of a gene that has 12 exons in total, the corresponding Rank value is 3/12.
When the mutation site is outside the gene region, the value of this field is empty.
10. HGVS.c
The variation at the DNA level, named according to the HGVS standard.
11. HGVS.p
The variation at the protein level, named according to the HGVS standard. It has a value only if the mutation site is in the coding region.
12. cDNA.pos/cDNA.length
Location of the mutation site on cDNA / total length of cDNA
13. CDS.pos/CDS.length
Location of the mutation site on CDS / total length of CDS
14. AA.pos/AA.length
Position of the mutation site on the amino acid sequence / total length of the amino acid sequence
15. Distance
When the mutation site is located in an intergenic region, this is the distance to the nearest feature. When the mutation site is located in an exon, this is the distance to the nearest intron boundary.
16. ERRORS/WARNINGS/INFO
Error, warning and informational codes that indicate how reliable the annotation result is.
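The subfields listed above can be pulled out of the annotated file with standard tools. The sketch below assumes an uncompressed VCF in which ANN is one of the ";"-separated INFO entries, annotations for the same site are separated by ",", and the subfields within one annotation are separated by "|" in the order given above. The function name ann_fields is made up for this example.

```shell
# ann_fields FILE
# For every record, print one line per annotation containing:
# CHROM, POS, Allele, Annotation, Annotation_Impact, Gene_Name
ann_fields() {
    awk -F'\t' '
        /^#/ { next }                                     # skip header lines
        {
            n = split($8, info, ";")                      # INFO entries
            for (i = 1; i <= n; i++) {
                if (substr(info[i], 1, 4) != "ANN=") continue
                m = split(substr(info[i], 5), anns, ",")  # one entry per annotation
                for (j = 1; j <= m; j++) {
                    split(anns[j], f, "|")                # subfields of one annotation
                    print $1 "\t" $2 "\t" f[1] "\t" f[2] "\t" f[3] "\t" f[4]
                }
            }
        }
    ' "$1"
}
```

To keep only the annotations with the highest impact, the output can be filtered on its fifth column, for example `ann_fields test.chr22.ann.vcf | awk -F'\t' '$5 == "HIGH"'`.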