Shulou (Shulou.com), SLTechnology News&Howtos, updated 2025-04-12
This article explains how to use ROSE to identify super enhancers.
ROSE (Rank Ordering of Super-Enhancers) is the classic super enhancer prediction tool, developed by Richard A. Young's lab. The source code is available at
http://younglab.wi.mit.edu/super_enhancer_code.html
The method comes from work on mouse embryonic stem cells. Enhancers were first identified from ChIP-seq data for Oct4, Sox2 and Nanog, which gave 8794 enhancer regions. These enhancers were then ranked by the density of ChIP-seq reads for Med1, a general cofactor for transcriptional activation, within each region. The ranking is strongly polarized:
most enhancers have a very low Med1 level, and a few have a very high one. Besides Med1, data for several other transcription factors and histone modifications were compared in the same way.
Med1 separated the two groups best. According to their Med1 level, enhancers can be divided into the following two categories.
Typical enhancers
Super enhancers
These are abbreviated as TE and SE. Further analysis shows a clear difference in length between them: an SE is more than 10 times as long as a TE. A typical enhancer spans only a few hundred bp, while a super enhancer spans thousands of bp.
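The ranking and splitting step can be sketched in a few lines of Python. This is a minimal illustration, not ROSE's own code: the function name split_by_signal is hypothetical, and the cutoff rule used here (scale rank and signal to [0, 1], then take the point where a line of slope 1 touches the rank-ordered curve) is an assumption about how the two groups might be separated.

```python
def split_by_signal(signals):
    """Split enhancers into typical and super by their signal.

    `signals` holds one signal value (e.g. Med1 read density) per enhancer.
    Returns (typical, supers), two lists of indices into `signals`.
    The slope-1 tangent cutoff is an assumption made for this sketch.
    """
    n = len(signals)
    ranked = sorted(signals)
    if n < 2 or ranked[-1] <= 0:
        # Nothing to separate: treat every enhancer as typical.
        return list(range(n)), []
    top = ranked[-1]
    # With both axes scaled to [0, 1], the slope-1 tangent touches the
    # curve at the point that minimizes (scaled signal - scaled rank).
    tangent = min(range(n), key=lambda k: ranked[k] / top - k / (n - 1))
    cutoff = ranked[tangent]
    typical = [i for i, s in enumerate(signals) if s <= cutoff]
    supers = [i for i, s in enumerate(signals) if s > cutoff]
    return typical, supers


typical, supers = split_by_signal([5, 100, 1, 60, 3, 8, 2, 7, 4, 6])
print(supers)  # indices of the enhancers above the cutoff
```

Only the enhancers with signal strictly above the cutoff are called super enhancers; everything else stays in the typical group.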
Beyond Med1, the distribution of other transcription factors such as Oct4 was compared between TEs and SEs.
Klf4 and Esrrb turned out to be more enriched in SEs than in TEs. The motifs enriched in the SE regions were also analyzed.
The enriched motifs include those of Oct4, Sox2 and Klf4. From the way super enhancers were discovered and defined, two points are key to predicting them.
A super enhancer is built from enhancers; it can be regarded as a region where enhancers cluster.
Compared with typical enhancers, a super enhancer region has a higher density of transcription factors.
ROSE identifies super enhancers on the basis of these two points. The basic workflow is as follows.
First, enhancers are identified. Next, a distance threshold is defined, and enhancers closer to each other than the threshold are merged (stitched) into a single region. Finally, the read distribution in the merged regions is compared to identify the super enhancers.
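The merging step is ordinary interval stitching. A minimal sketch, assuming enhancers are given as (chrom, start, end) tuples; the function name stitch and the tuple layout are hypothetical and are not taken from the ROSE source.

```python
def stitch(regions, max_gap=12500):
    """Merge enhancers on the same chromosome that lie closer than max_gap.

    `regions` is a list of (chrom, start, end) tuples.
    Returns a list of (chrom, start, end, n_constituents) tuples.
    """
    stitched = []
    for chrom, start, end in sorted(regions):
        last = stitched[-1] if stitched else None
        if last and last[0] == chrom and start - last[2] < max_gap:
            # Close enough to the previous region: extend it.
            stitched[-1] = (chrom, last[1], max(last[2], end), last[3] + 1)
        else:
            stitched.append((chrom, start, end, 1))
    return stitched


enhancers = [
    ("chr1", 100, 200),
    ("chr1", 5000, 5600),
    ("chr1", 30000, 30500),
    ("chr2", 150, 300),
]
for region in stitch(enhancers):
    print(region)
```

The first two chr1 enhancers are 4800 bp apart and are merged; the third is 24400 bp away from the merged region and stays separate.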
In practice, different marks can be used for the first and third steps.
ROSE is written in Python. Download the source code from the official website and unpack it. Annotation files for several genome builds ship with the source code, in the annotation folder
annotation/
├── hg18_refseq.ucsc
├── hg19_refseq.ucsc
├── hg38_refseq.ucsc
├── mm10_refseq.ucsc
├── mm8_refseq.ucsc
└── mm9_refseq.ucsc
These are the refGene.txt files downloaded from UCSC for each genome build. The basic usage of the software is as follows
python ROSE_main.py \
-g HG18 \
-i HG18_MM1S_MED1.gff \
-r MM1S_MED1.hg18.bwt.sorted.bam \
-c MM1S_WCE.hg18.bwt.sorted.bam \
-o out_dir \
-s 12500 \
-t 2500
Run the command from the ROSE installation directory, because the program looks for the genome annotation files in the annotation folder of the current working directory.
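If ROSE is driven from a script, the working-directory requirement can be handled explicitly. The sketch below is an illustration only: build_rose_command and run_rose are hypothetical helper names, the interpreter name python is taken from the command above, and the flags are exactly those shown in this article.

```python
import subprocess
from pathlib import Path


def build_rose_command(genome, gff, rankby_bam, control_bam, out_dir,
                       stitch=12500, tss=2500):
    """Assemble the ROSE_main.py argument list shown above."""
    return [
        "python", "ROSE_main.py",
        "-g", genome,
        "-i", str(gff),
        "-r", str(rankby_bam),
        "-c", str(control_bam),
        "-o", str(out_dir),
        "-s", str(stitch),
        "-t", str(tss),
    ]


def run_rose(rose_dir, **kwargs):
    """Run ROSE with the installation directory as working directory.

    Because the working directory changes, input and output paths should
    be absolute or relative to `rose_dir`.
    """
    rose_dir = Path(rose_dir)
    if not (rose_dir / "annotation").is_dir():
        raise FileNotFoundError(f"no annotation folder in {rose_dir}")
    return subprocess.run(build_rose_command(**kwargs), cwd=rose_dir, check=True)
```

Checking for the annotation folder before launching gives a clearer error than letting the run fail partway through.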
-g specifies the reference genome build, which is used to select the matching annotation file. -i specifies a GFF file with the genomic coordinates of the enhancer regions.
The file is tab-separated and has six populated fields. Fields 1, 3 and 4 give the chromosome, start and end of the enhancer region; field 5 gives the strand, with . when the strand is unknown; fields 2 and 6 hold a user-defined unique ID for the enhancer. In the GFF layout described in the ROSE README these sit in columns 1, 2, 4, 5, 7 and 9, and the remaining columns are left empty.
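A peak list can be converted to this input format with a small helper. This is a sketch under one assumption that should be checked against the README shipped with your copy of ROSE: the six populated fields sit in GFF columns 1, 2, 4, 5, 7 and 9, with columns 3, 6 and 8 left empty. The function name to_rose_gff_line is hypothetical.

```python
def to_rose_gff_line(chrom, start, end, region_id, strand="."):
    """Format one enhancer as a 9-column, tab-separated GFF line.

    Assumed layout: chromosome, ID, (empty), start, end, (empty),
    strand, (empty), ID.
    """
    fields = [chrom, region_id, "", str(start), str(end), "", strand, "", region_id]
    return "\t".join(fields)


peaks = [("chr1", 100, 200, "peak_1"), ("chr1", 5000, 5600, "peak_2")]
for chrom, start, end, name in peaks:
    print(to_rose_gff_line(chrom, start, end, name))
```

Each ID must be unique within the file, since ROSE uses it as the enhancer's identifier.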
Once the enhancer regions are defined, the next step is to compare the ChIP-seq read distribution of a chosen mark across them. -r specifies the BAM file of the ChIP-seq IP sample, and -c specifies the BAM file of the input (control) sample.
-s specifies the stitching distance, 12.5 kb by default: two enhancers closer than this distance are merged into one region. -t specifies a distance from the TSS: a peak closer than this to a transcription start site may be a promoter, so such potential promoters are filtered out.
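The promoter filter controlled by -t can be pictured as a simple distance test. The sketch below follows the description above and is not ROSE's implementation, whose exact rule may differ in detail; the function name drop_promoter_peaks and the input layout (a dict mapping each chromosome to a list of TSS coordinates) are hypothetical.

```python
def drop_promoter_peaks(peaks, tss_positions, tss_window=2500):
    """Remove peaks that lie closer than tss_window to any TSS.

    `peaks` is a list of (chrom, start, end) tuples.
    `tss_positions` maps a chromosome name to a list of TSS coordinates.
    """
    kept = []
    for chrom, start, end in peaks:
        near_tss = False
        for tss in tss_positions.get(chrom, []):
            # Distance is 0 when the TSS falls inside the peak.
            distance = max(start - tss, tss - end, 0)
            if distance < tss_window:
                near_tss = True
                break
        if not near_tss:
            kept.append((chrom, start, end))
    return kept


peaks = [("chr1", 1000, 1500), ("chr1", 10000, 10400), ("chr2", 500, 900)]
print(drop_promoter_peaks(peaks, {"chr1": [3000]}))
```

Here the first peak ends 1500 bp from the TSS at position 3000 and is dropped, while the other two are kept.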
ROSE writes a number of files to the output directory, including a PNG plot.
AllEnhancers.table.txt and SuperEnhancers.table.txt list all enhancers and the super enhancers, respectively. The two files have a similar layout.
Both the dbSUPER and SEdb databases use the H3K27ac histone modification as the mark for identifying super enhancers, which is a useful reference when choosing a mark.