2025-03-26 Update From: SLTechnology News&Howtos
Shulou(Shulou.com)06/02 Report--
Many newcomers are unsure how to use FitHiC to evaluate the significance of chromatin interactions. This article summarizes the principle behind the software and walks through how to run it.
Hi-C technology captures chromatin interaction information across the whole genome. At a chosen resolution, the genome is divided into bins and an interaction matrix (contact matrix) between bins is obtained; displayed as a heat map, this matrix is called a contact map. From the complete contact matrix we can analyze spatial structural units at different levels, such as A/B compartments, topologically associating domains (TADs), and chromatin loops.
Because it covers chromatin interactions genome-wide, Hi-C can mine spatial structure at every level across the whole genome, which is its unique advantage. As an extension of 3C technology, Hi-C can also be used to study the interaction between specific chromatin regions directly. However, systematic biases in sequencing and sequence alignment make some of the information in the interaction matrix unreliable. To analyze specific interactions directly from Hi-C data, researchers have developed many algorithms that evaluate the entries of the interaction matrix. FitHiC is one of the most commonly used tools; it assigns statistical confidence scores to interactions so that the reliable ones can be extracted.
The software was originally developed in Python; later, for ease of use, its functions were rewritten and packaged as an R package.
The principle of the software is as follows.
First, mid-range interactions are extracted from the original interaction matrix according to a pre-defined distance threshold. Mid-range interactions are interactions between bins on the same chromosome that lie a medium distance apart. The authors state that for yeast the medium distance ranges from 10kb to 25kb, and for humans and mice from 50kb to 10Mb, where distance means the linear genomic distance between the two bins.
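The mid-range selection described above can be sketched in a few lines of Python. This is only an illustration, not FitHiC's own code: the function name `is_mid_range`, the example pairs, and the choice to treat both thresholds as inclusive are assumptions.

```python
def is_mid_range(chr1, mid1, chr2, mid2, dist_low, dist_up):
    """Return True for a same-chromosome pair whose linear distance
    lies within [dist_low, dist_up] (inclusive bounds are an assumption)."""
    if chr1 != chr2:
        return False
    distance = abs(mid2 - mid1)
    return dist_low <= distance <= dist_up


# Hypothetical bin pairs: (chromosome, midpoint, chromosome, midpoint)
pairs = [
    ("chr1", 5000, "chr1", 15000),   # 10 kb apart on the same chromosome
    ("chr1", 5000, "chr1", 505000),  # 500 kb apart, above the upper threshold
    ("chr1", 5000, "chr2", 15000),   # different chromosomes
]

mid_range = [p for p in pairs if is_mid_range(*p, dist_low=10_000, dist_up=250_000)]
```

Only the first pair survives the filter here, because the other two are either too far apart or on different chromosomes.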
From the extracted mid-range interactions, a model relating linear genomic distance to interaction frequency is fitted; this first fit is called spline-1. An outlier threshold is derived from this model (drawn as a dotted line in the accompanying figure), and the data points beyond it (the red dots in the figure) are removed as outliers. The remaining data are fitted again to obtain spline-2. The p-value of each interaction is then calculated from a binomial distribution, and q-values are obtained by correcting for multiple hypothesis testing.
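The last step, a binomial p-value followed by multiple-testing correction, can be sketched as below. This is a minimal illustration under stated assumptions, not FitHiC's implementation: the prior contact probability would come from the fitted spline, but here it is a hypothetical constant, and Benjamini-Hochberg is used as a common correction method without claiming it matches the package exactly. All names and numbers are made up for the example.

```python
import math


def binomial_sf(k, n, p):
    """Upper-tail probability P(X >= k) for X ~ Binomial(n, p),
    computed by direct summation (fine for small n)."""
    total = sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))
    return min(1.0, total)


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values (q-values) in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    qvalues = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that q-values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        qvalues[idx] = running_min
    return qvalues


# Hypothetical numbers: total contacts and the prior probability that the
# spline would assign to a bin pair at this distance.
total_contacts = 500
prior_probability = 0.002
observed_counts = [8, 3, 1]

pvalues = [binomial_sf(k, total_contacts, prior_probability) for k in observed_counts]
qvalues = benjamini_hochberg(pvalues)
```

A pair whose observed count is far above what the distance-based prior predicts gets a small p-value, and the correction step keeps the false discovery rate under control across all tested pairs.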
The software is simple to use, but the original interaction matrix must first be reformatted. A typical interaction matrix looks like this:
Bin1     Bin2     Bin3     Bin4     Bin5     Bin6
7.85957  4.80329  11.4766  9.57416  4.5288   8.55022
8.61621  4.98956  2.35654  5.69483  11.1187  10.1322
4.06803  4.07801  7.98047  2.59144  6.3851   7.74306
4.52869  2.70624  8.94544  4.29185  8.29491  8.38257
Each row and each column represents a bin, and each number is the interaction frequency between the two corresponding bins. Starting from this file, the significance evaluation can be obtained in the following two steps.
1. Prepare the input files
The software needs at least two input files. The first file, called fragsfile, lists the chromatin region corresponding to each bin.
It has five tab-separated columns. The second and fifth columns have no effect and can simply be filled with 0 or 1. The first column is the chromosome on which the bin is located, the third column is the central position of the bin, and the fourth column is the sum of the interaction frequencies involving that bin, that is, the sum of the corresponding row or column in the interaction matrix.
The second file, called intersfile, records the interaction frequency between pairs of bins.
The first two columns give the chromosome name and central position of the first bin, the third and fourth columns give the chromosome name and central position of the second bin, and the fifth column gives the interaction frequency between the two bins.
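The conversion from a contact matrix to these two layouts can be sketched in Python as follows. This is a hedged illustration only: the function names, the small symmetric example matrix, the bin midpoint formula (bin `i` covers `[i * resolution, (i + 1) * resolution)`), the decision to skip self-interactions and zero entries, and the filler values 0 and 1 are all assumptions made for the example.

```python
def matrix_to_fithic_rows(matrix, chrom, resolution):
    """Build fragsfile and intersfile rows from a square, symmetric
    contact matrix for a single chromosome."""
    n = len(matrix)
    mids = [i * resolution + resolution // 2 for i in range(n)]
    # fragsfile: chromosome, filler, bin midpoint, marginal sum, filler
    frags = [(chrom, 0, mids[i], sum(matrix[i]), 1) for i in range(n)]
    # intersfile: chromosome and midpoint of both bins, then the contact count.
    # Only the upper triangle is kept so that each pair appears once.
    inters = [
        (chrom, mids[i], chrom, mids[j], matrix[i][j])
        for i in range(n)
        for j in range(i + 1, n)
        if matrix[i][j] > 0
    ]
    return frags, inters


def to_tsv(rows):
    """Join rows into tab-separated text, one row per line."""
    return "\n".join("\t".join(str(value) for value in row) for row in rows)


# Hypothetical 3-bin matrix at 10 kb resolution
example = [
    [0, 3, 1],
    [3, 0, 2],
    [1, 2, 0],
]
frags, inters = matrix_to_fithic_rows(example, "chr1", 10_000)
frags_text = to_tsv(frags)
inters_text = to_tsv(inters)
```

The resulting text can then be written to disk with whatever file names and compression the installed FitHiC version expects.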
2. Run FitHiC
Once the input files are ready, the software can be run. The basic usage is as follows:
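library(FitHiC)
# fragsfile, intersfile and outdir hold the paths to the two input files and the output directory
FitHiC(
  fragsfile,
  intersfile,
  outdir,
  libname = "test_project",
  distUpThres = 250000,
  distLowThres = 10000,
  visual = TRUE)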
The first three arguments give the paths of the two input files and the output directory. libname sets the prefix of the output files, and distUpThres and distLowThres set the upper and lower distance thresholds used to select mid-range interactions.
The output files fall into two groups, pass1 and pass2, and each group comes with four plots.
The first plot shows interaction probability against linear genomic distance for the mid-range interactions, the second shows the fitted distribution, the third shows the outliers filtered out by the fitted model, and the fourth shows the number of significant interactions retained at different FDR thresholds.
The final significance evaluation is in the file whose name ends in pass2.significances.txt.gz.
Significant chromatin interactions are selected by applying a threshold to the last column, the q-value.
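Filtering on the last column can be sketched in Python as below. This is an illustration under assumptions: the result file is taken to be gzip-compressed, tab-separated text with a single header line, and the function names, the 0.05 threshold, and the demo header names are hypothetical. Only the position of the q-value (last column) comes from the description above.

```python
import csv
import gzip


def filter_by_qvalue(lines, q_threshold=0.05):
    """Return (header, rows) keeping rows whose last column, the q-value,
    is at or below the threshold."""
    reader = csv.reader(lines, delimiter="\t")
    header = next(reader)
    hits = [row for row in reader if row and float(row[-1]) <= q_threshold]
    return header, hits


def read_significances(path, q_threshold=0.05):
    """Open a gzip-compressed result file and filter it by q-value."""
    with gzip.open(path, "rt", newline="") as handle:
        return filter_by_qvalue(handle, q_threshold)


# In-memory demonstration with hypothetical column names and values
demo_lines = [
    "chr1\tmid1\tchr2\tmid2\tcount\tp\tq\n",
    "chr1\t5000\tchr1\t15000\t12\t1e-6\t0.001\n",
    "chr1\t5000\tchr1\t25000\t2\t0.4\t0.6\n",
]
demo_header, demo_hits = filter_by_qvalue(demo_lines, q_threshold=0.05)
```

For a real run, `read_significances` would be called with the path to the file ending in pass2.significances.txt.gz inside the chosen output directory.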