2025-03-26 Update From: SLTechnology News&Howtos
Shulou(Shulou.com)06/02 Report--
Many newcomers are unsure how to use SHAPEIT for haplotype analysis. This article summarizes how the tool works and how to run it, in the hope that it helps you solve that problem.
Welcome to "Shengxin training Manual"!
SHAPEIT is a fast and accurate haplotype phasing tool. It is the pre-phasing tool officially recommended by IMPUTE2.
Haplotypes are inferred with a hidden Markov model; a simplified view of the model is shown below.
The figure has five panels from top to bottom, numbered 1 to 5, which are best read in three parts. Panel 1 shows eight haplotypes over 8 loci: each row is one haplotype and each column is one site. Panel 2 represents the same haplotypes as a graph in which each node is a SNP site, labelled Z1 to Z8 in order, and a complete path from site 1 to site 8 is one haplotype. In panel 1, the first four sites take only three distinct configurations, and the same holds for the last four sites. All haplotypes can therefore be represented by the different connections between sites 4 and 5, and the number on each edge is the corresponding frequency.
Panel 5 shows the genotypes of one sample: 0 means homozygous for the ref allele, 1 means heterozygous, and 2 means homozygous for the alt allele. When the sample is split into haplotypes according to these genotypes, a heterozygous site can carry either of the 2 alleles on a given haplotype. Panel 4 shows the haplotypes compatible with the genotypes, with the ref allele drawn as a blank box and the alt allele as a black box. The first five loci contain two heterozygous sites, so there are four possible paths, and the last three loci likewise give four.
Panel 3 shows the software's hidden Markov model, which treats the true haplotypes as the hidden sequence and the haplotypes predicted from the genotypes as the observation sequence. Once the model is built, the hidden sequence is inferred, and this yields the phased haplotypes.
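The path counting described above can be checked with a small Python sketch. This is only an illustration of the counting argument, not SHAPEIT's own code; the function names and the example genotype vector are hypothetical.

```python
from itertools import product

def compatible_haplotypes(genotypes):
    """List every haplotype (tuple of 0/1 alleles) consistent with a genotype vector.

    genotypes uses the coding from the text: 0 = homozygous ref,
    1 = heterozygous, 2 = homozygous alt.
    """
    choices = []
    for g in genotypes:
        if g == 0:
            choices.append((0,))      # only the ref allele is possible
        elif g == 1:
            choices.append((0, 1))    # either allele may sit on this haplotype
        elif g == 2:
            choices.append((1,))      # only the alt allele is possible
        else:
            raise ValueError(f"unexpected genotype code: {g!r}")
    return list(product(*choices))

def complement(haplotype, genotypes):
    """Return the second haplotype implied by a genotype vector and one haplotype."""
    return tuple(g - a for g, a in zip(genotypes, haplotype))

# Hypothetical genotypes for five loci with two heterozygous sites.
genotypes = [0, 1, 2, 1, 0]
paths = compatible_haplotypes(genotypes)
print(len(paths))  # number of candidate paths through these loci
for hap in paths:
    print(hap, complement(hap, genotypes))
```

Each heterozygous site doubles the number of candidate paths, which is why phasing needs a statistical model to choose among them rather than plain enumeration.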
In the literature, the software is compared with other similar tools, and the results are as follows.
Three different data sets were used, and running time and error rate were compared. In this comparison, SHAPEIT had the lowest error rate and the fastest running time.
The basic usage of the software is as follows
shapeit \
--input-bed gwas.bed gwas.bim gwas.fam \
--input-map genetic_map.txt \
--output-max gwas.phased.haps gwas.phased.sample \
--thread 8
The parameters that need to be specified are divided into the following three parts
1. Input unphased genotypes
The following four formats are supported
ped/map
bed/bim/fam
gen/sample
vcf
The first two are PLINK formats, which are the most common file formats in GWAS analysis. The third is the default file format of the WTCCC, and the fourth is the widely used VCF format.
The usage for each type of input file is as follows
shapeit \
--input-ped gwas.ped gwas.map \
-M genetic_map.txt \
--missing-code N \
-O gwas.phased

shapeit \
--input-bed gwas.bed gwas.bim gwas.fam \
-M genetic_map.txt \
-O gwas.phased

shapeit \
--input-gen gwas \
-M genetic_map.txt \
-O gwas.phased

shapeit \
--input-vcf gwas.vcf \
-M genetic_map.txt \
-O gwas.phased
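When the same analysis has to be run on many data sets, it can help to assemble these commands programmatically. The following Python sketch only builds the argument list; it uses just the options shown in this article. The helper name, the table of suffixes, and the assumption that the executable is called shapeit and is on the PATH are illustrative, not part of the tool.

```python
# Input option and file suffixes for each format, mirroring the commands in this article.
INPUT_FORMATS = {
    "ped": ("--input-ped", (".ped", ".map")),
    "bed": ("--input-bed", (".bed", ".bim", ".fam")),
    "gen": ("--input-gen", ("",)),   # the article passes the bare prefix here
    "vcf": ("--input-vcf", (".vcf",)),
}

def build_shapeit_command(fmt, prefix, output_prefix, genetic_map=None, missing_code=None):
    """Return a shapeit command as a list of arguments (hypothetical helper)."""
    flag, suffixes = INPUT_FORMATS[fmt]
    cmd = ["shapeit", flag] + [prefix + suffix for suffix in suffixes]
    if genetic_map is not None:
        cmd += ["-M", genetic_map]              # the genetic map is optional
    if missing_code is not None:
        cmd += ["--missing-code", missing_code]
    cmd += ["-O", output_prefix]
    return cmd

cmd = build_shapeit_command("bed", "gwas", "gwas.phased", genetic_map="genetic_map.txt")
print(" ".join(cmd))
```

The resulting list can be handed to subprocess.run(cmd, check=True) once the tool is installed.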
For the gen/sample format, files can be converted with the GTOOL software.
2. Genetic map
Supplying the genetic map (linkage map) for the corresponding genome improves the accuracy of haplotype analysis. The official genetic maps from the HapMap project are available for download at the following link
http://mathgen.stats.ox.ac.uk/genetics_software/shapeit/shapeit.html#formats
This parameter is optional; without it, the software estimates the map with a linear model.
3. Output phased haplotypes
By default, haplotypes are written to two files with the suffixes haps and sample. The contents of the haps file are as follows
Columns are separated by spaces. The first column is the chromosome of the SNP locus, the second is the SNP ID, the third is the position on the chromosome, and the fourth and fifth are the two alleles of the site. The remaining columns hold the phased alleles of each sample, where 0 represents the ref allele and 1 the alt allele, and every two columns correspond to one sample.
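As a rough illustration of this layout, the Python sketch below splits one line of a haps file into its site information and one allele pair per sample. It assumes the SHAPEIT layout of five leading columns (chromosome, SNP ID, position, and the two alleles) followed by two columns per sample; the function name and the example line are made up.

```python
def parse_haps_line(line):
    """Split one haps line into site fields and per-sample haplotype pairs."""
    fields = line.split()
    chrom, snp_id, position, allele0, allele1 = fields[:5]
    alleles = [int(x) for x in fields[5:]]
    if len(alleles) % 2 != 0:
        raise ValueError("expected two haplotype columns per sample")
    # Consecutive columns (2i, 2i+1) are the two haplotypes of sample i.
    pairs = [(alleles[i], alleles[i + 1]) for i in range(0, len(alleles), 2)]
    return {
        "chrom": chrom,
        "id": snp_id,
        "position": int(position),
        "alleles": (allele0, allele1),   # allele coded 0, allele coded 1
        "haplotypes": pairs,
    }

# Hypothetical line describing one SNP in two samples.
site = parse_haps_line("1 rs0001 10583 A G 0 1 1 1")
print(site["id"], site["haplotypes"])
```

In this made-up line the first sample is heterozygous (0 on one haplotype, 1 on the other) and the second is homozygous for the allele coded 1.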
The contents of the file with the suffix sample are as follows
The sample file describes the samples and is also space-separated. The first two lines are fixed header lines, and each subsequent line represents one sample. Only the most basic content of the file is shown above; additional columns can describe phenotype information for each sample.
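A sample file with this structure can be read with a few lines of Python. The sketch below treats the first line as column names and skips the second fixed line; the column names ID_1, ID_2 and missing and the sample rows are assumptions chosen for illustration, so real files should be checked against the format documentation.

```python
import io

def read_samples(handle):
    """Read a sample file: two fixed header lines, then one line per sample."""
    rows = [line.split() for line in handle if line.strip()]
    header = rows[0]          # column names
    # rows[1] is the second fixed line and is not a sample, so it is skipped.
    return [dict(zip(header, fields)) for fields in rows[2:]]

# Hypothetical file content with two samples.
example = io.StringIO(
    "ID_1 ID_2 missing\n"
    "0 0 0\n"
    "fam1 ind1 0\n"
    "fam2 ind2 0\n"
)
samples = read_samples(example)
print([s["ID_2"] for s in samples])
```

The order of samples here matches the order of the haplotype column pairs in the haps file, so the i-th sample owns the i-th pair of columns.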
In IMPUTE2, a phased reference panel is represented by three files, hap/legend/sample. The format can be converted with the following command
shapeit \
-convert \
--input-haps gwas.phased \
--output-ref gwas.phased.hap gwas.phased.leg gwas.phased.sam
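Conceptually, this conversion separates the site description from the haplotype columns. The Python sketch below shows the idea for a single line, assuming a haps line with five leading columns and a legend file with the columns id, position, a0 and a1. The function name and the example line are hypothetical, and the actual conversion should be left to shapeit itself.

```python
LEGEND_HEADER = "id position a0 a1"   # assumed legend column names

def haps_line_to_ref_rows(line):
    """Split one haps line into a legend row and a hap row (illustrative only)."""
    fields = line.split()
    if len(fields) < 7:
        raise ValueError("expected five site columns plus at least one sample")
    _chrom, snp_id, position, a0, a1 = fields[:5]
    legend_row = " ".join([snp_id, position, a0, a1])   # site description
    hap_row = " ".join(fields[5:])                      # 0/1 alleles only
    return legend_row, hap_row

legend_row, hap_row = haps_line_to_ref_rows("1 rs0001 10583 A G 0 1 1 1")
print(LEGEND_HEADER)
print(legend_row)
print(hap_row)
```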
For a detailed explanation of the different formats, please refer to the following link
http://mathgen.stats.ox.ac.uk/genetics_software/shapeit/shapeit.html#formats
Phasing the samples to be imputed in advance can effectively improve the efficiency of imputation. If IMPUTE2 will be used afterwards for genotype imputation, it is recommended to phase those samples with SHAPEIT first.