How to use HiCUP to preprocess Hi-C data

2026-02-09 Update From: SLTechnology News&Howtos

Shulou(Shulou.com)06/02 Report--

This article explains how to use HiCUP to preprocess Hi-C data. It first outlines what each step of the pipeline does and then walks through a simple, practical procedure for running it.

HiCUP is a classic tool for preprocessing Hi-C data. Its official website is

https://www.bioinformatics.babraham.ac.uk/projects/hicup/

The data processing workflow is as follows

First, hicup_truncater identifies reads in the raw sequence data that contain a ligation junction. A typical Hi-C read pair looks as follows

R1 and R2 come from two different fragments. Whether this holds depends on the relationship between the insert size and the read length. When the distance from the ligation junction to either end of the sequenced fragment is shorter than the read length, the situation shown in the following figure occurs.

In this case the read at one end is chimeric and would be discarded during alignment. To preserve these otherwise valid reads, hicup_truncater uses the restriction site sequence to identify ligation junctions within reads, which detects chimeric reads like the one in the figure above, and truncates each such read at the junction to remove the chimeric portion. After truncation the read behaves like an ordinary R1 or R2 read and can be used for mapping.
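The truncation idea can be illustrated with a minimal Python sketch. It is not HiCUP's implementation: the function names `ligation_junction` and `truncate_read` are hypothetical, and the sketch assumes an enzyme that leaves a 5' overhang which is filled in before blunt-end ligation (as with HindIII), so that the junction is the filled-in end of one fragment followed by the filled-in end of the other.

```python
def ligation_junction(cut_site: str) -> str:
    """Return the junction expected after fill-in and blunt-end ligation.

    cut_site marks the cut position with '^', e.g. 'A^AGCTT' for HindIII.
    Assumption: the enzyme leaves a 5' overhang that is filled in.
    """
    before, after = cut_site.split("^")
    site = before + after
    left = site[: len(site) - len(before)]   # filled-in end of fragment 1
    right = site[len(before):]               # filled-in end of fragment 2
    return left + right


def truncate_read(read: str, cut_site: str) -> str:
    """Cut a read at the first ligation junction, if there is one.

    This sketch keeps the sequence up to the end of the first fragment's
    filled-in site and drops everything after it.
    """
    junction = ligation_junction(cut_site)
    pos = read.find(junction)
    if pos == -1:
        return read                          # no junction: read is unchanged
    return read[: pos + len(junction) // 2]


if __name__ == "__main__":
    print(ligation_junction("A^AGCTT"))                    # AAGCTAGCTT
    print(truncate_read("GGGGAAGCTAGCTTCCCC", "A^AGCTT"))  # GGGGAAGCT
```

Reads without a junction pass through unchanged, which matches the description above: only chimeric reads need to be shortened.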

hicup_mapper aligns the paired-end reads to the reference genome. Because R1 and R2 in a Hi-C library come from chromatin regions that are close in three-dimensional space, their linear distance on the genome is usually far greater than the insert size of conventional paired-end sequencing. If the pairs were aligned directly in paired-end mode, many of them would fail to align, so the two ends are aligned separately and then paired.
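The "align separately, then pair" step can be sketched in Python as follows. This is only an illustration of the pairing logic, not of the aligner: the function name `pair_alignments` and the representation of a hit as a `(chromosome, position, strand)` tuple (or `None` for an unaligned read) are assumptions made for the example.

```python
def pair_alignments(r1_hits: dict, r2_hits: dict) -> dict:
    """Pair independently aligned read ends by read name.

    Each input maps a read name to a (chrom, pos, strand) tuple, or to None
    if that end did not align. Only pairs with both ends aligned are kept.
    """
    pairs = {}
    for name, hit1 in r1_hits.items():
        hit2 = r2_hits.get(name)             # None if missing or unaligned
        if hit1 is not None and hit2 is not None:
            pairs[name] = (hit1, hit2)
    return pairs


if __name__ == "__main__":
    r1 = {"read_a": ("chr1", 100, "+"), "read_b": None}
    r2 = {"read_a": ("chr5", 900, "-"), "read_b": ("chr1", 1, "+")}
    print(pair_alignments(r1, r2))
```

Because each end is aligned on its own, a pair whose ends lie on different chromosomes, such as `read_a` here, is kept rather than rejected as discordant.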

hicup_filter filters the aligned read pairs, as shown in the following figure

Only valid di-tags are retained; artefacts such as self-ligation and re-ligation products are filtered out.
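A much simplified Python sketch of this kind of filtering is given below. It assumes each read end has already been aligned and only looks at which restriction fragment each end falls in: both ends in the same fragment is treated as an artefact (for example self-ligation), ends in adjacent fragments as re-ligation, and everything else as valid. The names `fragment_index` and `classify_pair` are hypothetical, and the real tool applies further criteria (such as read orientation and di-tag size) that are not modelled here.

```python
from bisect import bisect_right


def fragment_index(cut_positions: list, pos: int) -> int:
    """Index of the restriction fragment containing pos.

    cut_positions is the sorted list of cut coordinates on one chromosome.
    """
    return bisect_right(cut_positions, pos)


def classify_pair(cuts_by_chrom: dict, end1: tuple, end2: tuple) -> str:
    """Classify a read pair given (chrom, pos) for each end."""
    chrom1, pos1 = end1
    chrom2, pos2 = end2
    if chrom1 != chrom2:
        return "valid"                       # ends on different chromosomes
    cuts = cuts_by_chrom[chrom1]
    frag1 = fragment_index(cuts, pos1)
    frag2 = fragment_index(cuts, pos2)
    if frag1 == frag2:
        return "same_fragment"               # e.g. self-ligation
    if abs(frag1 - frag2) == 1:
        return "re_ligation"                 # neighbouring fragments rejoined
    return "valid"


if __name__ == "__main__":
    cuts = {"chr1": [100, 200, 300]}
    print(classify_pair(cuts, ("chr1", 50), ("chr1", 350)))
```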

hicup_deduplicator removes PCR duplicates. The number of valid read pairs is used as a measure of chromatin interaction frequency, and PCR duplicates distort this count: if they are not removed, a high number of reads supporting an interaction may reflect PCR amplification rather than a genuinely high interaction frequency.
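The principle of duplicate removal can be sketched in Python under one stated assumption: two read pairs are considered duplicates when both ends map to the same chromosome, position and strand, regardless of which end is listed first. The function name `deduplicate` and the tuple representation of a read end are hypothetical, and the criteria used by hicup_deduplicator itself may differ.

```python
def deduplicate(pairs: list) -> list:
    """Keep the first occurrence of each distinct read pair.

    Each pair is ((chrom, pos, strand), (chrom, pos, strand)). The two ends
    are sorted to build the key, so (A, B) and (B, A) count as the same pair.
    """
    seen = set()
    unique = []
    for end1, end2 in pairs:
        key = tuple(sorted((end1, end2)))
        if key not in seen:
            seen.add(key)
            unique.append((end1, end2))
    return unique


if __name__ == "__main__":
    pairs = [
        (("chr1", 100, "+"), ("chr2", 500, "-")),
        (("chr2", 500, "-"), ("chr1", 100, "+")),   # duplicate, ends swapped
        (("chr1", 100, "+"), ("chr2", 501, "-")),   # different position
    ]
    print(len(deduplicate(pairs)))  # 2
```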

The software is easy to install: download it and decompress it. The steps for using it are as follows

1. Prepare the index file for the reference genome

Every alignment tool needs the reference genome to be indexed beforehand. HiCUP supports alignment with either bowtie or bowtie2. Taking bowtie2 as an example, the genome index is built as follows

bowtie2-build hg19.fa hg19

The first argument is the FASTA file of the genome, and the second is the basename of the output index files.
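Before moving on, it can be useful to confirm that the index was written completely. The following Python sketch checks for the six files that bowtie2-build normally produces for a given basename. The helper name `missing_index_files` is hypothetical, and the suffix list assumes a small index; for very large genomes bowtie2 writes files ending in .bt2l instead.

```python
from pathlib import Path

# Suffixes of a standard (small) bowtie2 index; large indexes use ".bt2l".
BT2_SUFFIXES = (
    ".1.bt2", ".2.bt2", ".3.bt2", ".4.bt2", ".rev.1.bt2", ".rev.2.bt2",
)


def missing_index_files(basename: str, suffixes=BT2_SUFFIXES) -> list:
    """Return the expected index files that do not exist for this basename."""
    return [
        basename + suffix
        for suffix in suffixes
        if not Path(basename + suffix).is_file()
    ]


if __name__ == "__main__":
    missing = missing_index_files("hg19")
    if missing:
        print("Index incomplete, missing:", ", ".join(missing))
    else:
        print("Index looks complete")
```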

2. Prepare the reference genome restriction site map

The script hicup_digester is used to create the restriction endonuclease map of the genome. The basic usage is as follows.

hicup_digester \
--re1 A^AGCTT,HindIII \
--genome hg19_digester_db \
hg19.fa
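Conceptually, building such a map means scanning the genome for the enzyme's recognition site and cutting at the marked position. A minimal Python sketch of this idea follows. The function name `digest` is hypothetical, the fragments are reported as 0-based half-open intervals chosen for the example, and the output does not reproduce the format of the file written by hicup_digester.

```python
def digest(sequence: str, cut_site: str) -> list:
    """Cut a sequence at every occurrence of a restriction site.

    cut_site marks the cut position with '^', e.g. 'A^AGCTT' for HindIII.
    Returns fragments as (start, end) pairs, 0-based and half-open.
    """
    before, after = cut_site.split("^")
    site = (before + after).upper()
    seq = sequence.upper()
    fragments = []
    start = 0
    pos = seq.find(site)
    while pos != -1:
        cut = pos + len(before)              # cut falls inside the site
        if cut > start:                      # skip empty fragments
            fragments.append((start, cut))
            start = cut
        pos = seq.find(site, pos + 1)
    fragments.append((start, len(seq)))      # last fragment up to the end
    return fragments


if __name__ == "__main__":
    print(digest("GGAAGCTTCCAAGCTTGG", "A^AGCTT"))  # [(0, 3), (3, 11), (11, 18)]
```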

hicup_digester digests the genome sequence in silico at the sites recognized by the restriction endonuclease and records all resulting fragments. --re1 specifies the cut site sequence, with ^ marking the cut position, followed by the name of the enzyme; --genome specifies a name for the genome, which is used in the output file name. An example of the final output file name is as follows

Digest_hg19_digester_db_HindIII_None_09-46-07_17-05-2019.txt

3. Edit configuration file

First generate a template for the configuration file with the following command

hicup --example

This command generates a file called hicup_example.conf, which you can edit as a starting point. Every option in the file is documented with detailed comments and can be adjusted as needed. The options most commonly modified are as follows

# Path to the reference genome indices
# Remember to include the basename of the genome indices
Index: /bi/scratch/Genomes/Human/GRCh38/Homo_sapiens.GRCh38

# Path to the genome digest file produced by hicup_digester
Digest: /bi/scratch/Genomes/Human/GRCh38/Digest_Homo_sapiens_GRCh38_HindIII_None_14-43-31-10-02-2016.txt.gz

# FASTQ files to be analysed, placing paired files on adjacent lines
S_1_1_sequence.fastq.gz
S_1_2_sequence.fastq.gz

These options give the path to the genome index, the path to the restriction digest file, and the paths of the raw Hi-C FASTQ files to be processed.
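A quick sanity check of the edited file can catch mistakes before a long run starts. The Python sketch below reads a configuration in the layout shown above: comment lines start with #, options are written as "Name: value", and any other non-empty line is taken to be a FASTQ file. The names `parse_hicup_conf` and `check_conf` are hypothetical, and this is not HiCUP's own parser, only an illustration based on the layout of the template.

```python
import re

OPTION_LINE = re.compile(r"^(\w+):\s*(.*)$")


def parse_hicup_conf(text: str):
    """Split configuration text into an options dict and a FASTQ file list."""
    options = {}
    fastq_files = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue                         # skip blank lines and comments
        match = OPTION_LINE.match(line)
        if match:
            options[match.group(1)] = match.group(2).strip()
        else:
            fastq_files.append(line)         # assumed to be a FASTQ path
    return options, fastq_files


def check_conf(options: dict, fastq_files: list) -> list:
    """Return a list of human-readable problems (empty if none were found)."""
    problems = []
    for key in ("Index", "Digest"):
        if not options.get(key):
            problems.append(f"missing option: {key}")
    if not fastq_files or len(fastq_files) % 2 != 0:
        problems.append("FASTQ files must be listed in pairs")
    return problems


if __name__ == "__main__":
    with open("hicup.conf") as handle:
        opts, fastqs = parse_hicup_conf(handle.read())
    print(check_conf(opts, fastqs) or "configuration looks plausible")
```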

4. Running

Once the configuration file is ready, run HiCUP as follows

hicup --config hicup.conf

An HTML report is generated in the output directory, showing the quality control metrics, as follows

1. Truncating and Mapping

2. Filtering

As shown below, you can see that the proportion of valid pairs is about 50%.

3. Length Distribution

4. De-duplication

In addition, the output directory contains many files. The file with the suffix .hicup.bam holds the alignments of the read pairs remaining after de-duplication and can be used for downstream analysis.

This concludes the walkthrough of using HiCUP for Hi-C data preprocessing. Combining the theory above with hands-on practice is the best way to learn it, so try it on your own data.
