How to normalize bigwig files


2025-04-26 Update From: SLTechnology News&Howtos


This article explains how to normalize bigwig files so that coverage tracks from different samples can be compared.

When displaying ChIP-seq data, bigwig files are often loaded into a genome browser such as IGV to produce coverage tracks like the following.

We define a peak as a region where reads are enriched in the IP sample relative to the Input sample. In the figure above, this appears as a pile-up of reads in the IP track, such as the area marked in red in the following figure.

This visualization shows the peak regions directly, but in practice we need to pay attention to normalization.

A bigwig file essentially records how sequencing depth is distributed along the genome, and raw sequencing depth is positively correlated with the number of reads sequenced. Suppose the Input sample is sequenced to 10G and the IP sample to 5G: in the raw tracks, the Input sample will then show a higher depth than the IP sample. This is an extreme example, but it illustrates that a difference in sequencing amount directly affects raw depth.

To remove the effect of differing amounts of sequencing data between samples, we normalize. This is similar to quantification in transcriptome analysis: raw sequencing depth plays the role of raw counts, so normalization methods such as RPKM and CPM also apply to bigwig files.

In deepTools, there are several ways to normalize:

1. RPKM

RPKM's formula is as follows

RPKM (per bin) = number of reads per bin / (number of mapped reads (in millions) * bin length (kb))
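As a worked illustration of the formula above, here is a minimal Python sketch. The function name `rpkm_per_bin` and the input numbers are made up for this example; this is not how deepTools computes the value internally.

```python
def rpkm_per_bin(reads_in_bin, total_mapped_reads, bin_length_bp):
    """RPKM for one bin: reads / (mapped reads in millions * bin length in kb)."""
    mapped_millions = total_mapped_reads / 1_000_000
    bin_length_kb = bin_length_bp / 1_000
    return reads_in_bin / (mapped_millions * bin_length_kb)


# Hypothetical numbers: 30 reads in a 50 bp bin, 20 million mapped reads.
value = rpkm_per_bin(reads_in_bin=30, total_mapped_reads=20_000_000, bin_length_bp=50)
print(round(value, 3))
```

With these numbers the denominator is 20 * 0.05 = 1, so the RPKM of the bin equals its raw count of 30.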

With deepTools, the usage is as follows

# Write RPKM-normalized coverage for input.bam using 10 threads
bamCoverage \
  -p 10 \
  --bam input.bam \
  --normalizeUsing RPKM \
  --outFileName rpkm.bigwig

2. CPM

CPM's formula is as follows

CPM (per bin) = number of reads per bin / number of mapped reads (in millions)
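The following Python sketch applies the CPM formula above to two toy samples to show why it removes the effect of sequencing amount. The function name `cpm_per_bin`, the bin counts, and the read totals are all invented for illustration.

```python
def cpm_per_bin(bin_counts, total_mapped_reads):
    """CPM for each bin: reads per bin / mapped reads (in millions)."""
    mapped_millions = total_mapped_reads / 1_000_000
    return [count / mapped_millions for count in bin_counts]


# Hypothetical samples: sample_b was sequenced twice as deeply as sample_a,
# so every raw bin count is doubled although the underlying signal is the same.
sample_a_counts = [10, 40, 10]
sample_b_counts = [20, 80, 20]

cpm_a = cpm_per_bin(sample_a_counts, total_mapped_reads=10_000_000)
cpm_b = cpm_per_bin(sample_b_counts, total_mapped_reads=20_000_000)

print(cpm_a)
print(cpm_b)
```

The raw counts differ by a factor of two, but the CPM values of the two samples are identical.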

With deepTools, the usage is as follows

# Write CPM-normalized coverage for input.bam using 10 threads
bamCoverage \
  -p 10 \
  --bam input.bam \
  --normalizeUsing CPM \
  --outFileName cpm.bigwig

3. BPM

BPM's formula is as follows

BPM (per bin) = number of reads per bin / sum of all reads per bin (in millions)
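A small Python sketch of the BPM formula above, using invented bin counts and a hypothetical helper `bpm_per_bin`. Unlike CPM, the denominator here is the sum of the per-bin counts themselves, so the normalized values always add up to one million.

```python
def bpm_per_bin(bin_counts):
    """BPM for each bin: reads per bin / (sum of all reads per bin, in millions)."""
    total = sum(bin_counts)
    # count / (total / 1e6) is rewritten as count * 1e6 / total.
    return [count * 1_000_000 / total for count in bin_counts]


# Hypothetical per-bin read counts for a tiny three-bin "genome".
counts = [10, 40, 50]
bpm = bpm_per_bin(counts)

print(bpm)
print(sum(bpm))
```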

With deepTools, the usage is as follows

# Write BPM-normalized coverage for input.bam using 10 threads
bamCoverage \
  -p 10 \
  --bam input.bam \
  --normalizeUsing BPM \
  --outFileName bpm.bigwig

4. RPGC

RPGC's formula is as follows

RPGC (per bin) = number of reads per bin / scaling factor for 1x average coverage

Scaling factor = (total number of mapped reads * fragment length) / effective genome size
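The two RPGC formulas above can be sketched in Python as follows. The function names, the read total, and the 200 bp fragment length are assumptions chosen for illustration; only the effective genome size is taken from the command in this article.

```python
def rpgc_scaling_factor(total_mapped_reads, fragment_length_bp, effective_genome_size):
    """Scaling factor: (mapped reads * fragment length) / effective genome size."""
    return (total_mapped_reads * fragment_length_bp) / effective_genome_size


def rpgc_per_bin(reads_in_bin, scaling_factor):
    """RPGC for one bin: reads per bin / scaling factor."""
    return reads_in_bin / scaling_factor


# Hypothetical run: 20 million mapped reads with 200 bp fragments.
factor = rpgc_scaling_factor(
    total_mapped_reads=20_000_000,
    fragment_length_bp=200,
    effective_genome_size=2_864_785_220,
)
rpgc = rpgc_per_bin(reads_in_bin=14, scaling_factor=factor)

print(round(factor, 3))
print(round(rpgc, 3))
```

The scaling factor is the average coverage of the genome by the sequenced fragments, so dividing by it expresses each bin relative to 1x coverage.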

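With deepTools, the usage is as follows

# Write RPGC-normalized coverage for input.bam using 10 threads.
# 2864785220 is the effective genome size listed in the deepTools documentation
# for human GRCh37; substitute the value for your own genome.
bamCoverage \
  -p 10 \
  --bam input.bam \
  --normalizeUsing RPGC \
  --effectiveGenomeSize 2864785220 \
  --outFileName rpgc.bigwig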
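To produce all four tracks for one BAM file in a single pass, the commands can be generated from a script. The sketch below only builds and prints the argument lists; the helper name `build_bamcoverage_command` is hypothetical, and it assumes that deepTools is installed with `bamCoverage` on the PATH and uses only the options shown in this article.

```python
import shlex

METHODS = ["RPKM", "CPM", "BPM", "RPGC"]


def build_bamcoverage_command(bam_path, method, threads=10, effective_genome_size=None):
    """Build a bamCoverage argument list for one normalization method."""
    if method not in METHODS:
        raise ValueError(f"unsupported method: {method}")
    command = [
        "bamCoverage",
        "-p", str(threads),
        "--bam", bam_path,
        "--normalizeUsing", method,
    ]
    if method == "RPGC":
        # RPGC needs the effective genome size to compute 1x coverage.
        if effective_genome_size is None:
            raise ValueError("RPGC requires effective_genome_size")
        command += ["--effectiveGenomeSize", str(effective_genome_size)]
    command += ["--outFileName", f"{method.lower()}.bigwig"]
    return command


for method in METHODS:
    cmd = build_bamcoverage_command(
        "input.bam", method, effective_genome_size=2864785220
    )
    print(shlex.join(cmd))
```

Each list could then be passed to `subprocess.run(cmd, check=True)` to actually run the tool.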

For the same sample, the bigwig files produced by the different normalization methods show the same peak shapes as the un-normalized bigwig file when loaded into IGV, as shown below.

Note the vertical-axis range marked by the red box: the range differs from method to method.
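The reason the shapes match while the axis ranges differ is that, within one sample and with a fixed bin size, each method divides every bin by the same constant. A minimal Python sketch with invented counts, comparing the raw track with its CPM and RPKM versions:

```python
# Hypothetical raw per-bin counts for one sample, in 50 bp bins.
raw_counts = [5, 20, 80, 20, 5]
total_mapped_reads = 10_000_000
bin_length_bp = 50

mapped_millions = total_mapped_reads / 1_000_000
bin_length_kb = bin_length_bp / 1_000

cpm = [count / mapped_millions for count in raw_counts]
rpkm = [count / (mapped_millions * bin_length_kb) for count in raw_counts]


def shape(track):
    """Rescale a track so its highest bin equals 1, leaving only the shape."""
    peak = max(track)
    return [round(value / peak, 6) for value in track]


same_shape = shape(raw_counts) == shape(cpm) == shape(rpkm)
print(same_shape)
print(max(raw_counts), max(cpm), max(rpkm))
```

The three tracks share one shape, but their maxima (and therefore the axis range a browser would pick automatically) are different.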

Normalization is mainly used for comparison between samples. For example, when comparing Input and IP samples, you should use normalized data. Taking RPKM as an example, loading both tracks gives the following result.

The vertical-axis ranges of the two tracks are inconsistent. To compare the samples properly, set both tracks to the same vertical-axis range; because the data have been normalized, they can be compared directly on the same scale. With the same range set, the result is as follows.
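One simple way to choose the common range is to take the largest value across both normalized tracks as the shared upper limit. The Python sketch below uses invented RPKM values and a hypothetical helper `shared_axis_range`; in IGV the resulting limits would be entered by hand for each track, for example through its "Set Data Range" option.

```python
def shared_axis_range(*tracks):
    """Return a (lower, upper) range that fits every track, starting at 0."""
    upper = max(max(track) for track in tracks)
    return (0, upper)


# Hypothetical RPKM values over the same five bins in each sample.
input_rpkm = [1.0, 1.5, 2.0, 1.5, 1.0]
ip_rpkm = [1.0, 4.0, 12.0, 4.0, 1.0]

axis_range = shared_axis_range(input_rpkm, ip_rpkm)
print(axis_range)
```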

All of the normalization methods described above can be used for comparison between samples. In practice, RPKM is the most widely used because it is the most established concept.
