2025-02-24 Update From: SLTechnology News&Howtos
Shulou(Shulou.com)06/02 Report--
What are the three ways to convert a VCF file to PLINK format? Many newcomers are unsure how to do this, so this article summarizes three approaches and how to use each of them.
PLINK is currently the most widely used software for association analysis. The ped/map files it defines, together with the corresponding binary bed/bim/fam files, have become the standard file formats for association analysis. Before running an association analysis, the first step is therefore to convert files in other formats to the formats PLINK expects.
VCF is the standard format for storing genotyping results and is also widely used in practice. This article summarizes three ways to convert VCF files to the corresponding PLINK formats, described in detail below.
1. GATK3
GATK3 provides a tool called VariantsToBinaryPed that converts a VCF file into the binary bed/bim/fam files used by PLINK. The basic usage is as follows:
java -jar GenomeAnalysisTK.jar \
    -T VariantsToBinaryPed \
    -R reference.fasta \
    -V input.vcf \
    -m input.fam \
    -bed output.bed \
    -bim output.bim \
    -fam output.fam
Three input files are required. The -R parameter specifies the FASTA file of the reference genome, the -V parameter specifies the VCF file, and the -m parameter specifies a metadata file that holds the family information for each sample. Two formats are supported for the metadata file.
The first format corresponds to the first six columns of the ped file. If a sample's parents are not known, they are written as UNKN (short for unknown), and a missing phenotype is written as -9. The second format is described in the documentation linked below.
Because the VCF file stores only the genotyping results of the samples, the family information has to be supplied through this additional file. Please refer to the following documentation for more details:
https://software.broadinstitute.org/gatk/documentation/tooldocs/3.8-0/org_broadinstitute_gatk_tools_walkers_variantutils_VariantsToBinaryPed.php
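A metadata file in the first (fam-style) format can be generated from the sample names in the VCF header. The following is a minimal sketch and is not taken from the GATK documentation: demo.vcf and demo.fam are hypothetical file names, the samples S1 and S2 are made up, every sample is assumed to be unrelated with unknown sex, and the family ID is simply set to the sample ID. The UNKN placeholder follows the description above.

```shell
# Build a tiny illustrative VCF with two made-up samples (S1 and S2).
printf '##fileformat=VCFv4.2\n' > demo.vcf
printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1\tS2\n' >> demo.vcf
printf '1\t100\trs1\tA\tG\t50\tPASS\t.\tGT\t0/1\t1/1\n' >> demo.vcf

# Sample names start at column 10 of the #CHROM header line.
# Write one line per sample: family ID, sample ID, father, mother, sex, phenotype.
# Assumptions: family ID = sample ID, parents unknown (UNKN), sex 0, phenotype -9.
awk -F'\t' -v unknown="UNKN" '/^#CHROM/ {
  for (i = 10; i <= NF; i++) print $i, $i, unknown, unknown, 0, -9
}' demo.vcf > demo.fam

cat demo.fam
```

If the real pedigree is known, the generated lines would need to be edited by hand before the file is passed to -m.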
2. VCFtools
VCFtools is a common tool for manipulating VCF files. It can convert a VCF file to the ped/map format used by PLINK. The basic usage is as follows:
vcftools --vcf input.vcf --plink --out output
No additional family information is supplied here. In the output, the family ID and the sample ID are therefore identical, and the remaining leading columns are all 0.
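The leading columns of a ped file can be checked without scrolling through the genotype columns. This is a small sketch using awk; demo.ped is a made-up stand-in for output.ped, with two hypothetical samples and a single variant, written in the layout described above rather than copied from real vcftools output.

```shell
# Illustrative ped content: six leading columns, then two alleles for one variant.
printf 'S1 S1 0 0 0 0 A G\n' > demo.ped
printf 'S2 S2 0 0 0 0 G G\n' >> demo.ped

# Keep only family ID, sample ID, father, mother, sex and phenotype.
# Default awk field splitting accepts both spaces and tabs.
awk '{ print $1, $2, $3, $4, $5, $6 }' demo.ped > first6.txt
cat first6.txt

# Report whether the family ID equals the sample ID on every line.
awk '$1 != $2 { bad++ }
     END { print (bad ? "FID and IID differ" : "FID equals IID for every sample") }' demo.ped > fid_check.txt
cat fid_check.txt
```

The same two awk commands can be pointed at the real output.ped once the conversion has run.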
3. PLINK
PLINK 1.9 can read VCF, gen, and other file formats directly, so no separate conversion step is needed when using this version; by default the software converts the different input formats to the binary bed format. The command here only demonstrates the format conversion. The basic usage is as follows:
plink --vcf input.vcf --recode --out output --double-id
By default the output is in the binary bed format, which is better suited to analysis. Adding the --recode parameter switches the output to ped format, which makes the conversion rules easier to inspect.
By default, PLINK splits each sample name at an underscore and uses the two resulting fields as the family ID and the sample ID in the ped file. If a sample name in the VCF contains more than one underscore, it cannot be split unambiguously and the software reports an error. In that case you can use the --id-delim parameter, which sets the separator. The default is an underscore, and it can be changed to another character so that the names are split correctly.
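Before running the conversion, it can help to see how many underscores each sample name contains. The sketch below is one way to do that with awk; demo_ids.vcf and the three sample names in it are invented for illustration, and the script only counts characters, it does not reproduce PLINK's own parsing.

```shell
# Header-only illustrative VCF with three made-up sample names.
printf '##fileformat=VCFv4.2\n' > demo_ids.vcf
printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tFAM1_S1\tS2\tA_B_C\n' >> demo_ids.vcf

# For every sample name (column 10 onwards), print the name and its underscore count.
# gsub returns the number of replacements, which equals the number of underscores.
awk -F'\t' '/^#CHROM/ {
  for (i = 10; i <= NF; i++) {
    name = $i
    n = gsub(/_/, "_", name)
    print name, n
  }
}' demo_ids.vcf > underscore_counts.txt

cat underscore_counts.txt
```

Names with a count above 1 are the ones that cannot be split into exactly two fields, so they are the candidates for a different --id-delim character or for one of the family ID options described in this section.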
Alternatively, you can specify through parameters how the family ID is set. The first option is --double-id, used in the example above, which makes the family ID and the sample ID identical, so the first two columns of the output ped file both hold the sample name.
The second option, --const-fid, is used as follows:
plink --vcf input.vcf --recode --out output --const-fid family_id
The --const-fid parameter sets the family ID to a constant, which defaults to 0 when no value is given. In the example above, the first column of the output ped file is therefore family_id for every sample.
The parent IDs and the sex are filled with 0 by default, and the phenotype is filled with -9 by default.
These are the three ways to convert a VCF file to PLINK format: VariantsToBinaryPed in GATK3, the --plink option of VCFtools, and direct VCF input in PLINK 1.9.