2025-01-21 Update From: SLTechnology News&Howtos shulou
Shulou(Shulou.com)06/01 Report--
This article introduces what the gtool tool is used for. The introduction is detailed and should serve as a useful reference.
Many software packages can perform GWAS analysis, and they require different input file formats. The most common is the ped/map format used by plink; another is the gen/sample format.
A genotype file stores the genotyping results of the SNP loci in the samples. It holds two kinds of information: the samples and the SNP genotyping results. First, let's look at the ped/map system. The ped file records the genotyping results of each sample together with its phenotype information. Its layout is as follows.
Columns are separated by spaces. The first column is the family ID of the sample, the second is the sample ID, the third is the father's sample ID, the fourth is the mother's sample ID, the fifth is the sex (1 for male, 2 for female), and the sixth is the phenotype of the sample, filled with 0 when no phenotype is available. The remaining columns hold the genotyping results of the SNP loci.
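The column layout above can be sketched in a few lines of Python. This is only an illustration: the sample line is made up, the names PedRecord and parse_ped_line are hypothetical, and the sketch assumes the standard plink layout in which each SNP takes two allele columns after the six leading columns.

```python
from typing import List, NamedTuple, Tuple


class PedRecord(NamedTuple):
    family_id: str
    sample_id: str
    father_id: str
    mother_id: str
    sex: str        # 1 = male, 2 = female
    phenotype: str  # 0 when no phenotype is available
    genotypes: List[Tuple[str, str]]  # one (allele1, allele2) pair per SNP


def parse_ped_line(line: str) -> PedRecord:
    """Split one whitespace-separated ped line into its named columns."""
    fields = line.split()
    # Assumption: two allele columns per SNP after the six leading columns.
    if len(fields) < 6 or (len(fields) - 6) % 2 != 0:
        raise ValueError("expected 6 leading columns plus two alleles per SNP")
    alleles = fields[6:]
    genotypes = [(alleles[i], alleles[i + 1]) for i in range(0, len(alleles), 2)]
    return PedRecord(*fields[:6], genotypes)


# Made-up example line: one unrelated male sample genotyped at two SNPs.
record = parse_ped_line("FAM1 S1 0 0 1 0 A A G T")
print(record.sample_id, record.genotypes)
```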
The map file records the SNP locus information. Its layout is as follows.
Columns are separated by spaces. The first column is the name of the chromosome where the SNP locus is located, the second is the SNP ID, the third is the genetic (linkage) distance of the SNP locus, and the fourth is the position of the SNP locus on the chromosome.
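As a rough sketch of the four map columns, the following Python function turns one line into a dictionary. The function name parse_map_line and the example line are hypothetical, chosen only for illustration.

```python
def parse_map_line(line):
    """Split one whitespace-separated map line into its four columns."""
    chrom, snp_id, genetic_distance, position = line.split()
    return {
        "chrom": chrom,
        "snp_id": snp_id,
        "genetic_distance": float(genetic_distance),
        "position": int(position),
    }


# Made-up example line: a SNP on chromosome 1 at base-pair position 10500.
snp = parse_map_line("1 rs0001 0 10500")
print(snp)
```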
In the ped/map system, the ped file holds the family relationships and genotyping results of the samples, and the map file holds the metadata of the SNP loci. Now let's look at the gen/sample system. The gen file is laid out as follows.
Columns are separated by spaces. The first column is the name of the chromosome where the SNP locus is located, the second is the SNP ID, the third is the position on the chromosome, and the columns from the fourth onward hold the genotyping results of the locus in the different samples. 0 represents the ref allele, 1 represents the alt allele, and every two columns correspond to one sample. The sample file is laid out as follows.
The contents of the first two rows are fixed, and each subsequent row represents one sample. The missing column gives the proportion of loci without a genotyping result. This is only the most basic content of the file; further columns can be added to describe the phenotype information of the samples. The gen/sample system is more intuitive: gen is short for genotype and holds the SNP genotyping results, while sample holds the sample information.
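The structure of the sample file (two fixed rows followed by one row per sample) can be sketched as below. The file content is made up, the header names ID_1, ID_2 and missing are an assumption about the usual layout, and read_sample_rows is a hypothetical helper name.

```python
def read_sample_rows(text):
    """Return one dict per sample, keyed by the names in the first row."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    header = lines[0].split()
    # lines[1] is the second fixed row; it carries no sample and is skipped.
    return [dict(zip(header, ln.split())) for ln in lines[2:]]


# Made-up file content; the header names are an assumption.
example = """ID_1 ID_2 missing
0 0 0
S1 S1 0.0
S2 S2 0.01
"""
rows = read_sample_rows(example)
print(rows)
```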
In practice, format conversion is a frequent task. It is tedious, but it is a skill you need to master. gtool is a dedicated tool for converting genotype data formats; its URL is as follows
https://www.well.ox.ac.uk/~cfreeman/software/gwas/gtool.html
Typical use cases are as follows
Extract a subset from a large set of genotyping results
Convert ped format to gen format
Convert gen format to ped format
Merge multiple sets of genotyping results
Correct the strand orientation of genotyping results
Each function corresponds to an operation mode, used as follows
1. Subset
Extract a subset from the genotyping results by filtering samples and SNPs. The corresponding operation mode is -S. The basic usage is as follows
gtool -S \
--g input.gen \
--s input.sample \
--og filter.gen \
--os filter.sample \
--sample_id filter.sample.id.txt \
--inclusion filter.snp.id.txt
--g and --s specify the input genotype data, --og and --os specify the output genotype data, --sample_id specifies the file of sample IDs to retain, and --inclusion specifies the file of SNP IDs to retain.
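When the same subset step is run from a script, the command can be assembled as an argument list. The sketch below uses only the options shown in the command above; the helper name build_subset_command is hypothetical, and actually running the command assumes that gtool is installed and on the PATH.

```python
import shlex


def build_subset_command(gen, sample, out_gen, out_sample, sample_ids, snp_ids):
    """Return the gtool -S invocation as a list of arguments."""
    return [
        "gtool", "-S",
        "--g", gen,                  # input genotype file
        "--s", sample,               # input sample file
        "--og", out_gen,             # output genotype file
        "--os", out_sample,          # output sample file
        "--sample_id", sample_ids,   # file of sample IDs to retain
        "--inclusion", snp_ids,      # file of SNP IDs to retain
    ]


cmd = build_subset_command(
    "input.gen", "input.sample",
    "filter.gen", "filter.sample",
    "filter.sample.id.txt", "filter.snp.id.txt",
)
print(shlex.join(cmd))
# To run it (requires gtool on the PATH): subprocess.run(cmd, check=True)
```

Passing a list rather than a single string avoids shell quoting problems when file names contain spaces.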
2. Convert PED to GEN
Convert ped format to gen format. The corresponding operation mode is -P. The basic usage is as follows
gtool -P \
--ped input.ped \
--map input.map \
--og out.gen \
--os out.sample
3. Convert GEN to PED
Convert gen format to ped format. The corresponding operation mode is -G. The basic usage is as follows
gtool -G \
--g input.gen \
--s input.sample \
--ped out.ped \
--map out.map
4. Merge
Merge multiple sets of genotyping results. The corresponding operation mode is -M. The basic usage is as follows
gtool -M \
--g input1.gen input2.gen \
--s input1.sample input2.sample \
--log merge.log
5. Orient
Orient all SNP loci to the positive strand. The corresponding operation mode is -O. The basic usage is as follows
gtool -O \
--g input.gen \
--strand input.strand \
--og output.gen \
--log orient.log
The --strand parameter specifies a file describing the strand of each SNP locus. It is a file with two columns separated by a space. The first column is the position of the SNP on the chromosome, and the second column is the corresponding strand (positive or negative).
SNP loci on the negative strand are flipped, so that their alleles are reported as the bases on the positive strand.
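The flipping step can be illustrated with a small Python sketch. It is not gtool's implementation: the strand symbols + and -, the file content, and the helper names read_strand_text and to_positive_strand are all assumptions made for illustration.

```python
# Base complements used when flipping an allele to the opposite strand.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def read_strand_text(text):
    """Map each SNP position to its strand symbol (assumed to be + or -)."""
    strands = {}
    for line in text.splitlines():
        if line.strip():
            position, strand = line.split()
            strands[position] = strand
    return strands


def to_positive_strand(alleles, strand):
    """Complement the alleles of a negative-strand SNP; leave others as is."""
    if strand == "-":
        return [COMPLEMENT.get(allele, allele) for allele in alleles]
    return list(alleles)


# Made-up strand file content: position, then strand.
strands = read_strand_text("10500 +\n20300 -")
kept = to_positive_strand(["A", "G"], strands["10500"])
flipped = to_positive_strand(["A", "G"], strands["20300"])
print(kept, flipped)
```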
That is all for this article on what gtool is used for. Thank you for reading, and we hope it helps.
© 2024 shulou.com SLNews company. All rights reserved.