How to perform HLA typing of whole-genome data with HLA-VBSeq


Updated 2025-04-27. From: SLTechnology News&Howtos (shulou)


This article shows how to perform HLA typing of whole-genome sequencing data with HLA-VBSeq.

Using whole-genome sequencing data, HLA-VBSeq can provide HLA typing results at 8-digit resolution. The method is described in the following paper:

https://bmcgenomics.biomedcentral.com/articles/10.1186/1471-2164-16-S2-S7

The paper evaluates the typing results of HLA-VBSeq, PHLAT, and HLAminer on 30x whole-genome data and summarizes the accuracy of each tool.

According to that comparison, only HLA-VBSeq provides 8-digit typing results, with an accuracy of 99.94%; for 2- to 4-digit typing results, its accuracy is also higher than that of the other two tools.

The paper also evaluates the accuracy of the 4-digit typing results of each tool with different amounts of sequencing data.

Under all of these conditions, HLA-VBSeq has the highest accuracy, so the tool performs HLA typing well. Its official website is:

http://nagasakilab.csml.org/hla/

The software is written in Java, and HLAVBSeq.jar can be downloaded directly. In addition to this file, you also need to download the following files:

bamNameIndex.jar

SamToFastq.jar

parse_result.pl

hla_all.fasta

Allelelist.txt

The first three programs are used when working with the FASTQ files. The last two files come from the IMGT/HLA database; if the version provided on the official website is too old for your purposes, you can download the latest version from the IMGT/HLA database.
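Before starting, it may help to confirm that every file is in place. The following is a minimal sketch, not part of HLA-VBSeq itself: the function name `missing_files` and the idea of keeping all files in one directory are assumptions made for illustration. File names are spelled as they appear in the commands in this article; the capitalization of the two helper jars is an assumption, so adjust it to match what you downloaded.

```shell
#!/usr/bin/env bash
# Sketch: report which HLA-VBSeq input files are missing from a directory.
# The single-directory layout and the function name are illustrative assumptions.

required_files=(
  HLAVBSeq.jar
  bamNameIndex.jar
  SamToFastq.jar
  parse_result.pl
  hla_all.fasta
  Allelelist.txt
)

# Print the name of every required file that is absent from directory $1
# (default: current directory). Returns 0 when nothing is missing, 1 otherwise.
missing_files() {
  local dir=${1:-.} status=0 f
  for f in "${required_files[@]}"; do
    if [[ ! -f "$dir/$f" ]]; then
      echo "$f"
      status=1
    fi
  done
  return "$status"
}

# Usage:
# missing_files /path/to/hlavbseq || echo "some files still need to be downloaded" >&2
```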

The workflow has several steps. First, align the FASTQ reads to the reference genome to obtain a BAM file, then process that BAM file as follows:

1. Select reads located in the HLA gene region

Use the samtools view command to extract the reads aligned to the HLA region, as follows:

samtools view -hb align.bam chr6:29907037-29915661 chr6:31319649-31326989 chr6:31234526-31241863 chr6:32914391-32922899 chr6:32900406-32910847 chr6:32969960-32979389 chr6:32778540-32786825 chr6:33030346-33050555 chr6:33041703-33059473 chr6:32603183-32613429 chr6:32707163-32716664 chr6:32625241-32636466 chr6:32721875-32733330 chr6:32405619-32414826 chr6:32544547-32559613 chr6:32518778-32554154 chr6:32483154-32559613 chr6:30455183-30463982 chr6:29689117-29699106 chr6:29792756-29800899 chr6:29793613-29978954 chr6:29855105-29979733 chr6:29892236-29899009 chr6:30225339-30236728 chr6:31369356-31385092 chr6:31460658-31480901 chr6:29766192-29772202 chr6:32810986-32823755 chr6:32779544-32808599 chr6:29756731-29767588 | samtools fastq -1 R1.fq -2 R2.fq

Note that although samtools view can also take a BED file to select the reads in specific regions, that usage does not take advantage of the BAM index and is therefore very slow. Whole-genome BAM files are very large, so although listing the regions on the command line is verbose, it runs much faster. Region queries require the BAM file to be coordinate-sorted and indexed.
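Because the region list is long and easy to damage when copied, a quick sanity check can catch malformed intervals before samtools runs. The sketch below is illustrative only: `valid_region` and `check_regions` are hypothetical helper names, and the check is limited to the `chrom:start-end` shape with start no greater than end. An interval with a stray space or a dropped digit would be rejected.

```shell
#!/usr/bin/env bash
# Sketch: validate region strings before passing them to samtools view.
# Helper names are hypothetical; only the chrom:start-end form is checked.

# Succeeds only if $1 looks like chrom:start-end with start <= end.
valid_region() {
  local region=$1
  local re='^([^:[:space:]]+):([0-9]+)-([0-9]+)$'
  [[ $region =~ $re ]] || return 1
  # 10# forces base 10 so leading zeros are not read as octal.
  (( 10#${BASH_REMATCH[2]} <= 10#${BASH_REMATCH[3]} ))
}

# Report every malformed region among the arguments on stderr;
# return 1 if any was found, 0 otherwise.
check_regions() {
  local r bad=0
  for r in "$@"; do
    if ! valid_region "$r"; then
      echo "invalid region: $r" >&2
      bad=1
    fi
  done
  return "$bad"
}

# Usage (region list shortened for illustration; use the full list from step 1):
# regions=(chr6:29907037-29915661 chr6:31319649-31326989)
# check_regions "${regions[@]}" &&
#   samtools view -hb align.bam "${regions[@]}" | samtools fastq -1 R1.fq -2 R2.fq
```

Keeping the regions in an array also means the same list can be reused for other samples without retyping it.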

2. Select the unmapped reads

Use the samtools view command to select the reads that did not map to the reference genome. The -f 12 filter keeps only reads for which both the read itself (flag 4) and its mate (flag 8) are unmapped:

samtools view -hb -f 12 /home/pub/output/WGS/18B0315D/6343/6343_final.bam | samtools fastq -1 unmapped_R1.fq -2 unmapped_R2.fq

3. Merge reads

Merge the reads aligned to the HLA region with the unmapped reads, using the following commands:

cat R1.fq unmapped_R1.fq > R1.fastq
cat R2.fq unmapped_R2.fq > R2.fastq

4. Align to the HLA reference sequences
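Before aligning, it can be worth checking that the merged R1.fastq and R2.fastq hold the same number of records, since bwa mem expects the two files of a pair to match read for read, and extracting reads by region can leave a read without its mate. The following is a minimal sketch under the assumption of plain-text FASTQ with exactly four lines per record; `fastq_records` and `same_read_count` are hypothetical helper names, not part of any of the tools used here.

```shell
#!/usr/bin/env bash
# Sketch: confirm that two FASTQ files hold the same number of records.
# Assumes uncompressed FASTQ with four lines per record (no wrapped sequences).

# Print the number of records in FASTQ file $1.
fastq_records() {
  local lines
  lines=$(wc -l < "$1")
  echo $(( lines / 4 ))
}

# Succeed only if both files contain the same number of records.
same_read_count() {
  [[ $(fastq_records "$1") -eq $(fastq_records "$2") ]]
}

# Usage:
# same_read_count R1.fastq R2.fastq || echo "R1/R2 record counts differ" >&2
```

Equal counts do not prove that the reads are in matching order, so this is only a first-pass check.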

Using bwa, align the reads obtained in the previous step to the HLA reference sequences. The -a option makes bwa mem report all alignments it finds rather than a single best hit. The commands are as follows:

bwa index hla_all.fasta
bwa mem -t 8 -P -L 10000 -a hla_all.fasta R1.fastq R2.fastq > out.sam

5. Run HLA-VBSeq

HLA-VBSeq supports paired-end and single-end sequencing data. Taking paired-end data as an example, the usage is as follows:

java -jar HLAVBSeq.jar hla_all.fasta out.sam result.txt --alpha_zero 0.01 --is_paired

6. Format the result

The previous step already produced the results; this step only formats them. The following command extracts the typing results for the HLA-A gene.

perl parse_result.pl Allelelist.txt result.txt | grep "^A\*" | sort -k2 -n -r > HLA.txt

The result after formatting is as follows

17.4022266628604A*11:01:01 01 17.4022266628604A*11:01:01 12.0376819868684

There are two columns: the first is the allele, and the second is the average sequencing depth over that allele.
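To turn this table into candidate genotype calls, one common heuristic (an assumption here, not something this article or its commands prescribe) is to take the two alleles with the highest average depth. The sketch below does only that, with an optional minimum depth; `top_two_alleles` and its threshold argument are hypothetical, and a homozygous sample or a weakly supported second allele still needs manual judgment.

```shell
#!/usr/bin/env bash
# Sketch: list the two alleles with the highest average depth.
# Assumes two whitespace-separated columns: allele name, then depth.
# The function name and the optional depth threshold are illustrative.

# Print up to two "allele<TAB>depth" lines from file $1 whose depth is at
# least $2 (default 0), highest depth first.
top_two_alleles() {
  local file=$1 min_depth=${2:-0}
  awk -v min="$min_depth" 'NF >= 2 && $2 + 0 >= min + 0 { print $1 "\t" $2 }' "$file" |
    LC_ALL=C sort -k2,2nr |
    head -n 2
}

# Usage:
# top_two_alleles HLA.txt       # top two alleles regardless of depth
# top_two_alleles HLA.txt 5     # ignore alleles with depth below 5
```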

