How to convert from FASTQ to uBAM format

2025-02-27 Update From: SLTechnology News&Howtos

Shulou(Shulou.com)06/01 Report--

This article explains how to convert FASTQ files to uBAM format. The material is short and straightforward, so let's work through it step by step.

Data generated by second-generation (next-generation) sequencing platforms is usually stored in FASTQ format, which holds the two things we care about most: the sequences and their base qualities. For sequencing itself this is sufficient, but for analysis some information is still missing.

Given a FASTQ file, you can see basic information such as the sample name, sequencing platform, and read length. If you want to know the sequencing type (WES, WGS, or RNA-seq), the sampling information, or the sample grouping, none of that can be obtained from the FASTQ file. This experiment-related information is called metadata.

Compared with FASTQ, uBAM can store not only sequence and base quality information, but also metadata information.

In GATK4, the data preprocessing workflow is shown in the following schematic:

[Figure: schematic of the GATK4 data preprocessing workflow]

As the schematic shows, raw data can come in two formats: the familiar FASTQ, and uBAM. Officially, uBAM is the recommended format.

How do we convert from FASTQ to uBAM format? We use Picard, which provides a FastqToSam tool that converts FASTQ reads into uBAM format.

The basic usage is as follows:

java -jar picard.jar FastqToSam \
    F1=sampleA_R1.fastq.gz \
    F2=sampleA_R2.fastq.gz \
    PL=illumina \
    SM=sampleA \
    LB=sampleA \
    RG=sampleA \
    O=sampleA.ubam

F1 and F2 specify the raw FASTQ input: for paired-end sequencing, specify both F1 and F2; for single-end sequencing, specify only F1. PL (platform) specifies the sequencing platform, with values such as illumina and solid. SM (sample name) specifies the sample name. LB (library name) specifies the library name, and RG (read group) specifies the read group name; these two parameters are generally set to the same value as the sample name.
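The parameter rules above can be captured in a small helper. This is only a minimal Python sketch, not part of Picard: the function name `build_fastq_to_sam_cmd` and its arguments are hypothetical, it assumes `picard.jar` sits in the working directory, and it only assembles the argument list without running anything.

```python
from typing import List, Optional


def build_fastq_to_sam_cmd(
    sample: str,
    fastq1: str,
    fastq2: Optional[str] = None,
    platform: str = "illumina",
    picard_jar: str = "picard.jar",
) -> List[str]:
    """Assemble a Picard FastqToSam command line as an argument list.

    Hypothetical helper for illustration: LB and RG are set to the sample
    name, and the output is named <sample>.ubam, mirroring the article.
    """
    cmd = ["java", "-jar", picard_jar, "FastqToSam", f"F1={fastq1}"]
    if fastq2 is not None:
        # Paired-end data: the second-read file is passed as F2.
        cmd.append(f"F2={fastq2}")
    cmd += [
        f"PL={platform}",
        f"SM={sample}",
        f"LB={sample}",
        f"RG={sample}",
        f"O={sample}.ubam",
    ]
    return cmd


if __name__ == "__main__":
    paired = build_fastq_to_sam_cmd(
        "sampleA", "sampleA_R1.fastq.gz", "sampleA_R2.fastq.gz"
    )
    print(" ".join(paired))
```

If Java and Picard are installed, the returned list could be handed to `subprocess.run(cmd, check=True)`; returning a list rather than a string avoids shell quoting issues with unusual file names.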

As the name suggests, uBAM is a BAM file, so its content is also divided into two parts: the header and the body.

1. The header

samtools view -H sampleA.ubam

@HD VN:1.5 SO:queryname
@RG ID:sampleA SM:sampleA LB:sampleA PL:illumina

The first line is the standard BAM header declaration, and the @RG line holds the metadata added during the conversion.
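To make the header structure concrete, here is a minimal Python sketch that splits such header lines into a record type and a dictionary of tags. The function name `parse_header_line` is hypothetical, and the sketch assumes tab-separated TAG:VALUE fields, as used by @HD and @RG lines (free-text @CO comment lines would need separate handling).

```python
from typing import Dict, Tuple


def parse_header_line(line: str) -> Tuple[str, Dict[str, str]]:
    """Split one SAM/BAM header line into (record type, {tag: value}).

    Assumes tab-separated TAG:VALUE fields, as in @HD and @RG lines.
    """
    record_type, *fields = line.rstrip("\n").split("\t")
    tags: Dict[str, str] = {}
    for field in fields:
        # Split on the first colon only, because values may contain colons.
        tag, value = field.split(":", 1)
        tags[tag] = value
    return record_type.lstrip("@"), tags


if __name__ == "__main__":
    header = [
        "@HD\tVN:1.5\tSO:queryname",
        "@RG\tID:sampleA\tSM:sampleA\tLB:sampleA\tPL:illumina",
    ]
    for line in header:
        print(parse_header_line(line))
```

For the @RG line, the resulting dictionary maps ID, SM, LB, and PL to the values passed to FastqToSam as RG, SM, LB, and PL.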

2. The body

samtools view sampleA.ubam

Because there are many columns, only the first few are considered here.

Each line represents one read. Lines with the same read ID are the R1 and R2 reads of a pair, which can be told apart by the flag in the second column.

samtools flags 77
0x4d 77 PAIRED,UNMAP,MUNMAP,READ1

samtools flags 141
0x8d 141 PAIRED,UNMAP,MUNMAP,READ2

77 corresponds to R1 and 141 corresponds to R2.
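The flag is a bit field, so the decoding that samtools flags performs can be reproduced by hand. The following Python sketch is an illustration only: the helper name `decode_flag` is hypothetical, and the bit values and names are those defined in the SAM specification as printed by samtools flags.

```python
from typing import List

# SAM flag bits, with the names that `samtools flags` prints.
FLAG_BITS = [
    (0x1, "PAIRED"),
    (0x2, "PROPER_PAIR"),
    (0x4, "UNMAP"),
    (0x8, "MUNMAP"),
    (0x10, "REVERSE"),
    (0x20, "MREVERSE"),
    (0x40, "READ1"),
    (0x80, "READ2"),
    (0x100, "SECONDARY"),
    (0x200, "QCFAIL"),
    (0x400, "DUP"),
    (0x800, "SUPPLEMENTARY"),
]


def decode_flag(flag: int) -> List[str]:
    """Return the names of all bits that are set in a SAM flag value."""
    return [name for bit, name in FLAG_BITS if flag & bit]


if __name__ == "__main__":
    for flag in (77, 141):
        print(hex(flag), flag, ",".join(decode_flag(flag)))
    # prints:
    # 0x4d 77 PAIRED,UNMAP,MUNMAP,READ1
    # 0x8d 141 PAIRED,UNMAP,MUNMAP,READ2
```

Both values share PAIRED, UNMAP, and MUNMAP (1 + 4 + 8 = 13); adding READ1 (64) gives 77, and adding READ2 (128) gives 141.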

The * in the third column means that the read is not aligned to any chromosome, which is where the name unmapped BAM comes from.

FastqToSam produces a uBAM file from FASTQ files; Picard also provides the SamToFastq tool to go the other way and recover FASTQ files from a BAM file.

The usage is as follows:

java -jar picard.jar SamToFastq \
    I=sampleA.ubam \
    F=sampleA_R1.fastq \
    F2=sampleA_R2.fastq

I stands for input and specifies the input BAM file; F and F2 specify the output FASTQ files for read 1 and read 2.
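The reverse conversion can be wrapped the same way. This is a minimal Python sketch under stated assumptions: `build_sam_to_fastq_cmd` is a hypothetical helper, `picard.jar` is assumed to be in the working directory, and only the I, F, and F2 arguments described above are used.

```python
from typing import List, Optional


def build_sam_to_fastq_cmd(
    ubam: str,
    fastq1: str,
    fastq2: Optional[str] = None,
    picard_jar: str = "picard.jar",
) -> List[str]:
    """Assemble a Picard SamToFastq command line as an argument list.

    Hypothetical helper for illustration; it builds the command only
    and does not execute it.
    """
    cmd = ["java", "-jar", picard_jar, "SamToFastq", f"I={ubam}", f"F={fastq1}"]
    if fastq2 is not None:
        # Paired-end data: the second reads are written to the F2 file.
        cmd.append(f"F2={fastq2}")
    return cmd


if __name__ == "__main__":
    cmd = build_sam_to_fastq_cmd(
        "sampleA.ubam", "sampleA_R1.fastq", "sampleA_R2.fastq"
    )
    print(" ".join(cmd))
```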

Thank you for reading. That covers how to convert from FASTQ to uBAM format; the specifics still need to be verified in practice on your own data.
