How to use the snpEff tool

2025-04-07 Update From: SLTechnology News&Howtos


This article walks through the basic use of the snpEff tool: listing the available databases, downloading one, and annotating a VCF file.

1. Query the list of all available databases

The command is as follows:

java -jar snpEff.jar databases > snpEff.databases.list.txt

At present, there are 42791 databases. For each species, the snpEff.databases.list.txt file gives the available databases and the corresponding download links.

2. Download the database

Taking human as an example, first check which databases are available:

grep -i "Homo_sapiens" snpEff.databases.list.txt | cut -f1

GRCh37.75
GRCh38.86
hg19
hg19kg
hg38
hg38kg
testHg19ChrM

Taking the GRCh38.86 database as an example, the download command is as follows:

java -jar snpEff.jar download GRCh38.86

After the download succeeds, the data folder under the software installation directory contains a folder named after the database, which holds all the downloaded files:

GRCh38.86/
├── cytoBand.txt.gz
├── interactions.bin
├── motif.bin
├── nextProt.bin
├── pwms.bin
├── sequence.X.bin
├── sequence.Y.bin
└── snpEffectPredictor.bin

3. Annotate variants

The command is as follows:

java -jar snpEff.jar GRCh38.86 examples/test.chr22.vcf > test.chr22.ann.vcf

GRCh38.86 is the name of the database, and examples/test.chr22.vcf is the input file, which is in VCF format. The annotated output is written to test.chr22.ann.vcf, which is also a VCF file.

In the output, a new field named ANN is added to the INFO column of each record. The layout of the ANN field is described in the corresponding INFO line of the VCF header. By default, it contains the following sub-fields, separated by the | character; the first variant site in the file is used as an example.

1. Allele

The allele after the mutation. The first variant site changes from T to C, so the corresponding Allele value is C.

2. Annotation

The type of variant, expressed as a Sequence Ontology (SO) term. The first variant site is annotated as downstream_gene_variant.

If a variant site belongs to multiple types, the terms are joined with the & symbol, for example:

intron_variant&nc_transcript_variant
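As a minimal sketch of handling such combined values, the shell functions below split an Annotation value on & and test whether it contains a given term. The names so_terms and has_so_term are hypothetical helpers introduced here for illustration; they are not part of snpEff.

```shell
# Hypothetical helpers (not part of snpEff) for Annotation values
# that join several Sequence Ontology terms with "&".

# Print each term of the Annotation value $1 on its own line.
so_terms() {
    printf '%s\n' "$1" | tr '&' '\n'
}

# Succeed if the Annotation value $1 contains exactly the term $2.
has_so_term() {
    so_terms "$1" | grep -qxF -- "$2"
}
```

For example, has_so_term 'intron_variant&nc_transcript_variant' intron_variant succeeds, while a partial name such as intron does not match, because whole terms are compared.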

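3. Annotation_Impact

A simple estimate of how deleterious the variant is likely to be. There are four values:

HIGH: the variant is assumed to have a disruptive effect on the protein, such as truncation or loss of function.
MODERATE: a non-disruptive variant that might change protein effectiveness.
LOW: assumed to be mostly harmless or unlikely to change protein behavior.
MODIFIER: usually a non-coding variant or a variant affecting a non-coding gene, where predictions are difficult or there is no evidence of impact.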
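One common use of this sub-field is to keep only variants of a given impact. The following is a rough sketch rather than a snpEff feature: filter_by_impact is a hypothetical function name, and the code assumes an annotated VCF on standard input with ANN in the INFO column (column 8) and Annotation_Impact as the third |-separated sub-field, as in the default layout.

```shell
# Hypothetical helper: keep header lines plus every record in which
# at least one ANN entry has the impact given as $1 (e.g. HIGH).
# Assumes the default ANN layout, where the impact is sub-field 3.
filter_by_impact() {
    awk -F'\t' -v impact="$1" '
        /^#/ { print; next }                 # keep header lines unchanged
        {
            keep = 0
            n = split($8, info, ";")         # INFO is KEY=VALUE;KEY=VALUE...
            for (i = 1; i <= n; i++) {
                if (info[i] !~ /^ANN=/) continue
                # Entries for one record are separated by commas.
                m = split(substr(info[i], 5), entries, ",")
                for (j = 1; j <= m; j++) {
                    split(entries[j], f, "|")
                    if (f[3] == impact) keep = 1
                }
            }
            if (keep) print
        }
    '
}
```

A possible invocation would be filter_by_impact HIGH < test.chr22.ann.vcf > test.chr22.high.vcf, where the output file name is only an example.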

4. Gene_Name

Gene name

5. Gene_ID

Gene ID

6. Feature_Type

The type of feature the annotation refers to, such as transcript, motif, or miRNA.

7. Feature_ID

The ID of the feature, according to the type given in Feature_Type.

8. Transcript_BioType

The transcript type, usually following the transcript biotypes of the Ensembl database.

9. Rank

This field has a value only when the variant site is located within a gene. It gives the number of the exon or intron that contains the variant site and the total number of exons or introns in the gene. For example, if a variant site is located in the third exon of a gene that has 12 exons in total, the Rank value is 3/12.

When the variant site is outside any gene, this field is empty.
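A Rank value can be split into its two numbers with plain shell parameter expansion. The sketch below uses a hypothetical helper name, rank_parts, and treats an empty value (a variant outside any gene) as a failure.

```shell
# Hypothetical helper: split a Rank value such as "3/12" into
# "<number> <total>". Fails when the value is empty or has no "/".
rank_parts() {
    case "$1" in
        */*) printf '%s %s\n' "${1%%/*}" "${1##*/}" ;;
        *)   return 1 ;;
    esac
}
```

Called as rank_parts 3/12, it prints the exon or intron number followed by the total, separated by a space.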

10. HGVS.c

The variant at the DNA level, written in HGVS notation.

11. HGVS.p

The variant at the protein level, written in HGVS notation. This field has a value only if the variant site is in a coding region.

12. cDNA.pos/cDNA.length

Position of the variant site in the cDNA / total length of the cDNA

13. CDS.pos/CDS.length

Position of the variant site in the CDS / total length of the CDS

14. AA.pos/AA.length

Position of the variant site in the amino acid sequence / total length of the amino acid sequence
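Because these three sub-fields share the pos/length form, the same small helper can turn any of them into a relative position, for example to see how far along the protein a variant falls. This is only an illustrative sketch; relative_position is a hypothetical name, and rounding to two decimals is an arbitrary choice.

```shell
# Hypothetical helper: turn a "pos/length" value such as "25/100"
# into a fraction with two decimals. Fails on an empty value and
# prints nothing when the length is not a positive number.
relative_position() {
    [ -n "$1" ] || return 1
    printf '%s\n' "$1" | awk -F'/' '$2 > 0 { printf "%.2f\n", $1 / $2 }'
}
```

For an AA.pos/AA.length value of 25/100, the function prints 0.25.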

15. Distance

The distance to the nearest feature. When the variant site is in an intergenic region, this is the distance to the nearest feature; when it is in an exon, this is the distance to the nearest intron boundary.

16. ERRORS/WARNINGS/INFO

Any error, warning, or informational messages attached to the annotation, which indicate how reliable the annotation result is.
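The sub-fields listed above can be pulled out of an INFO value with standard shell tools. The sketch below assumes the default layout described in this article: entries for one record separated by commas, and sub-fields within an entry separated by |. The function names ann_entries and ann_subfield are hypothetical helpers, not snpEff commands.

```shell
# Hypothetical helpers (not part of snpEff) for reading the ANN field.

# Print each ANN entry of the INFO value $1 on its own line.
# INFO is a semicolon-separated list of KEY=VALUE pairs, and the
# entries inside ANN are separated by commas.
ann_entries() {
    printf '%s\n' "$1" | tr ';' '\n' | sed -n 's/^ANN=//p' | tr ',' '\n'
}

# Print sub-field number $2 (1-based, in the order listed above)
# of the single ANN entry $1.
ann_subfield() {
    printf '%s\n' "$1" | cut -d'|' -f"$2"
}
```

With these, ann_subfield "$entry" 1 gives the Allele of one entry, 2 the Annotation, 3 the Annotation_Impact, and so on up to 16 for ERRORS/WARNINGS/INFO.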
