How to use Juicer

Updated: 2025-01-16. From: SLTechnology News&Howtos

This article introduces the basic usage of Juicer, a pipeline for processing Hi-C sequencing data. Running Juicer is straightforward: only a few parameters need to be set. The walkthrough below uses the small test dataset from the official site.

1. Download test data

Download the test dataset from the page below:

https://github.com/aidenlab/juicer/wiki/Running-Juicer-on-a-cluster

This walkthrough uses the small test dataset listed on that page. To try the complete analysis, use the larger test data provided under Option 1 on the same page instead.

wget http://juicerawsmirror.s3.amazonaws.com/opt/juicer/work/HIC003/fastq/HIC003_S2_L001_R1_001.fastq.gz

wget http://juicerawsmirror.s3.amazonaws.com/opt/juicer/work/HIC003/fastq/HIC003_S2_L001_R2_001.fastq.gz

Place the raw reads in the work/sample/fastq directory under the Juicer installation directory, replacing sample with a name of your choosing (HIC003 in this example).
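The layout described above can be sketched as follows. This is a minimal sketch, assuming the two FASTQ files have already been downloaded into the current directory. JUICER_DIR, SAMPLE and FASTQ_DIR are illustrative variable names, and a temporary directory stands in for the real installation directory when JUICER_DIR is not set.

```shell
# Assumption: JUICER_DIR points at the Juicer installation directory.
# A temporary directory is used as a stand-in when it is not set.
JUICER_DIR="${JUICER_DIR:-$(mktemp -d)}"
SAMPLE="HIC003"   # any self-chosen sample name
FASTQ_DIR="$JUICER_DIR/work/$SAMPLE/fastq"

# Create work/<sample>/fastq under the installation directory.
mkdir -p "$FASTQ_DIR"

# Move the downloaded read files into place, reporting any that are missing.
for f in HIC003_S2_L001_R1_001.fastq.gz HIC003_S2_L001_R2_001.fastq.gz; do
    if [ -f "$f" ]; then
        mv "$f" "$FASTQ_DIR/"
    else
        echo "not found in current directory: $f"
    fi
done

echo "FASTQ directory: $FASTQ_DIR"
```

Keeping the sample name in one variable makes it easier to reuse the same path later when passing the sample directory to Juicer.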

2. Running

Rather than downloading the official reference genome, this walkthrough uses the hg19 genome from UCSC. When you supply your own reference genome, first build a bwa index for it. To keep things organized, place the genome sequence and its index files in the references folder under the Juicer installation directory:

cd references

wget http://hgdownload.soe.ucsc.edu/goldenPath/hg19/bigZips/hg19.fa.gz

gunzip hg19.fa.gz

bwa index hg19.fa
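Because Juicer expects the bwa index to sit next to the FASTA file, it can help to confirm the index is complete before starting a long run. The following is a minimal sketch, not part of Juicer itself: check_bwa_index is a hypothetical helper name, and it relies on bwa index writing five files with the suffixes .amb, .ann, .bwt, .pac and .sa.

```shell
# Report whether every bwa index file exists next to the given FASTA.
# bwa index writes five files: .amb, .ann, .bwt, .pac and .sa
check_bwa_index() {
    fasta="$1"
    missing=0
    for ext in amb ann bwt pac sa; do
        if [ ! -f "$fasta.$ext" ]; then
            echo "missing index file: $fasta.$ext"
            missing=1
        fi
    done
    return "$missing"
}

# Usage: run from the Juicer installation directory after indexing.
if check_bwa_index references/hg19.fa; then
    echo "bwa index is complete"
else
    echo "run 'bwa index' on the FASTA before starting Juicer"
fi
```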

Next, generate the restriction site map and place it in the restriction_sites directory:

python misc/generate_site_positions.py HindIII hg19 references/hg19.fa

Choose the first argument according to the restriction enzyme actually used. Once the restriction site file has been generated, a chromosome sizes file can be derived from it:

awk 'BEGIN {OFS="\t"} {print $1, $NF}' hg19_HindIII.txt > hg19.chrom.sizes
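To see what this awk command does, here is a small sketch on made-up data. It assumes, as the command itself implies, that each line of the site file holds a chromosome name, then the cut-site positions, then the chromosome length as the last field. The file names, chromosome names and numbers below are invented for illustration.

```shell
# Work in a scratch directory so no real files are touched.
demo_dir=$(mktemp -d)

# Hypothetical two-chromosome site file in the assumed layout:
# name, cut-site positions, and the chromosome length as the last field.
printf 'chrA 5 12 30 100\nchrB 7 50\n' > "$demo_dir/demo_HindIII.txt"

# Keep the first field (name) and the last field (length), tab-separated.
awk 'BEGIN {OFS="\t"} {print $1, $NF}' "$demo_dir/demo_HindIII.txt" \
    > "$demo_dir/demo.chrom.sizes"

# Prints two tab-separated lines: chrA 100 and chrB 50
cat "$demo_dir/demo.chrom.sizes"
```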

Chromosome size files for many species can also be downloaded directly from UCSC; the approach above is more general and also works for genome files from other sources. The contents of hg19.chrom.sizes look like this:

chr1 249250621

chr2 243199373

chr3 198022430

chr4 191154276

This file determines which chromosome names appear in the final Hi-C map. Entries for random and unplaced scaffold sequences can simply be deleted from this file so that they do not appear in the final results.
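One way to drop those entries is a simple name filter. The sketch below assumes UCSC-style hg19 naming, in which random, unplaced and alternate-haplotype sequences carry an underscore in their name (for example chr1_gl000191_random or chrUn_gl000211). The scaffold lengths in the demo input are made up, and the output file name is illustrative.

```shell
# Work in a scratch directory so the real hg19.chrom.sizes is untouched.
demo_dir=$(mktemp -d)

# Demo input: two primary chromosomes plus two scaffold entries
# (the scaffold lengths here are made up for illustration).
printf 'chr1\t249250621\nchr1_gl000191_random\t1000\nchrUn_gl000211\t2000\nchr2\t243199373\n' \
    > "$demo_dir/hg19.chrom.sizes"

# Keep only names without an underscore, i.e. the primary chromosomes.
awk '$1 !~ /_/' "$demo_dir/hg19.chrom.sizes" > "$demo_dir/filtered.chrom.sizes"

# Prints the chr1 and chr2 lines only.
cat "$demo_dir/filtered.chrom.sizes"
```

Check the filtered file by eye before using it, since genomes from other sources may follow a different naming scheme.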

Once the raw reads for the sample and the reference genome files are ready, run Juicer as follows:

juicer.sh \
-z references/hg19.fa \
-p restriction_sites/hg19.chrom.sizes \
-y restriction_sites/hg19_HindIII.txt \
-d /home/pub/software/juicer/work/HIC003/ \
-D /home/pub/software/juicer \
-t 5

The -z parameter specifies the path to the reference genome FASTA file; the corresponding bwa index must exist in the same location. The -p parameter specifies the chromosome sizes file, -y the path to the restriction site file, -d the directory containing the sample's raw data, and -D the Juicer installation directory. The -t parameter sets the number of threads used for bwa alignment; by default, all available threads are used.

Note that file paths should be given as absolute paths where possible, especially the path to the directory containing the FASTQ files. Juicer creates symbolic links while it runs, so relative paths can resolve incorrectly.
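A simple way to follow this advice is to resolve the installation directory once and build every other path from it. The sketch below only prints the resulting command for inspection rather than running it. abs_dir is a hypothetical helper, and using "." as the installation directory is an assumption that the script is started from there.

```shell
# Resolve a directory to an absolute path without relying on realpath.
abs_dir() {
    (cd "$1" && pwd)
}

# Assumption: the script is run from the Juicer installation directory;
# "." stands in for it here.
JUICER_DIR=$(abs_dir .)
SAMPLE_DIR="$JUICER_DIR/work/HIC003"

# Print the command with absolute paths for inspection instead of running it.
echo juicer.sh \
    -z "$JUICER_DIR/references/hg19.fa" \
    -p "$JUICER_DIR/restriction_sites/hg19.chrom.sizes" \
    -y "$JUICER_DIR/restriction_sites/hg19_HindIII.txt" \
    -d "$SAMPLE_DIR" \
    -D "$JUICER_DIR" \
    -t 5
```

Once the printed paths look right, remove the leading echo to launch the run.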

When the run finishes, the following directories are created under the sample directory:

splits

aligned

Intermediate results are stored in the splits directory. Because Hi-C datasets are large, the raw reads are split into many chunks that are processed in parallel to speed up the run. Each chunk contains 22.5 million reads by default. This can be adjusted with the -C parameter, which sets the number of lines per split file (default 90000000). Since each FASTQ record occupies four lines, the value must be a multiple of 4. The R1 and R2 reads of each chunk are aligned to the genome with bwa, then merged; chimeric reads are filtered and duplicates are removed to produce the preprocessed result files.
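The relationship between reads per chunk and the -C value is plain arithmetic, sketched below. The chunk size of 10 million reads is an illustrative value, not a recommendation, and the variable names are made up for this example.

```shell
# Desired reads per chunk (illustrative value, not a recommendation).
reads_per_chunk=10000000

# Each FASTQ record occupies 4 lines, so the -C value is reads * 4.
lines_per_chunk=$((reads_per_chunk * 4))

# Sanity check: a -C value must be divisible by 4.
if [ $((lines_per_chunk % 4)) -eq 0 ]; then
    echo "-C $lines_per_chunk"
else
    echo "invalid: $lines_per_chunk is not a multiple of 4" >&2
fi

# The default of 90000000 lines corresponds to this many reads:
default_reads=$((90000000 / 4))
echo "default reads per chunk: $default_reads"
```

Computing the value from a read count, rather than typing a line count directly, guarantees the multiple-of-4 requirement is met.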

The final results are stored in the aligned directory, which contains the contact map files with the .hic suffix that can be loaded into Juicebox: inter.hic and inter_30.hic, where the 30 indicates that the contacts were filtered at a MAPQ threshold of 30. The complete pipeline then continues with downstream analysis, including the identification of TADs, chromatin loops and other structures. The HiCCUPS algorithm that identifies chromatin loops requires GPU acceleration, so an ordinary server without a GPU card cannot run this step.
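After a run, a quick check of the aligned directory shows whether the two contact map files were produced. This is a minimal sketch: report_hic_outputs is a hypothetical helper, the sample path is the one used in this article, and treating nvidia-smi on the PATH as a sign of a usable GPU is only a rough assumption that the card is an NVIDIA one.

```shell
# List which of the expected contact-map files exist in an aligned directory.
report_hic_outputs() {
    aligned_dir="$1"
    for name in inter.hic inter_30.hic; do
        if [ -s "$aligned_dir/$name" ]; then
            echo "found: $name"
        else
            echo "missing or empty: $name"
        fi
    done
}

# Assumption: the sample directory used earlier in this article.
report_hic_outputs /home/pub/software/juicer/work/HIC003/aligned

# Rough indicator only: nvidia-smi on PATH suggests an NVIDIA GPU is present.
if command -v nvidia-smi >/dev/null 2>&1; then
    echo "nvidia-smi found: the GPU-based loop calling step may be able to run"
else
    echo "nvidia-smi not found: the GPU-based loop calling step will likely not run"
fi
```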

As the steps above show, running Juicer is indeed straightforward. However, Hi-C sequencing produces large volumes of data and the downstream analysis algorithms are complex, so the demands on computing resources are high and a high-performance server is needed. The GPU card the software requires is also expensive, at about 20,000 yuan. These factors limit the wider adoption of Hi-C to some extent.
