How to use bedtools to get gene symbols from start and end positions on a chromosome

2025-03-27 Update From: SLTechnology News&Howtos

Shulou(Shulou.com)06/02 Report--

Many people who are new to the task do not know how to use bedtools to get gene symbols from start and end positions on a chromosome. This article summarizes the problem and walks through a solution step by step.

Step 1: organize your chromosome coordinate file into BED format.

A BED file needs at least its first three columns: the chromosome name, the start position on the chromosome, and the end position on the chromosome. You can prepare it with WordPad, Excel, R, or similar tools. The file extension does not matter at this stage: forcibly renaming the file to .bed here led to errors when bedtools processed it later on the Linux system, so the rename is left to Step 4. The required BED format is shown in the following figure.
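As a rough sketch of what such a file looks like, the snippet below writes a three-line example and checks that every line has at least three tab-separated columns, with an integer start that is smaller than the end. The file name example.bed and the coordinates are invented for illustration and are not taken from the original figure.

```shell
#!/usr/bin/env bash
set -euo pipefail
cd "$(mktemp -d)"

# Hypothetical example file; the coordinates are invented for illustration.
printf 'chr1\t11868\t14409\n'  > example.bed
printf 'chr1\t65418\t71585\n' >> example.bed
printf 'chr2\t38813\t46870\n' >> example.bed

# Count lines that break the minimal three-column layout.
bad_lines=$(awk -F'\t' '
    NF < 3 || $2 !~ /^[0-9]+$/ || $3 !~ /^[0-9]+$/ || $2 + 0 >= $3 + 0 { n++ }
    END { print n + 0 }
' example.bed)

echo "lines failing the check: $bad_lines"
```

A count of 0 means the file has the basic shape that bedtools expects. Keep in mind that BED start positions are 0-based and the end position is exclusive.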

Step 2: get the annotation file of the human genome.

You can download the hg38 or hg19 version of the human genome annotation file from GENCODE according to your needs (this article uses hg38 as the example). One option is to download the file to your local machine from the GENCODE website (https://www.gencodegenes.org/human/) and then transfer it to the server with FileZilla or another file transfer tool. The other option is to download it over FTP directly from the server's Linux system.

Download locally:

FTP download:

After obtaining the download link, enter the following command on the Linux system to download the file over FTP:

wget ftp://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_34/gencode.v34.annotation.gtf.gz

Step 3: process the downloaded genome annotation file on the Linux system to get the coordinates of human protein-coding genes.

Enter the following command on the Linux system to get the hg38 coordinates of human protein-coding genes:

zcat gencode.v34.annotation.gtf.gz | grep protein_coding | perl -alne '{next unless $F[2] eq "gene"; /gene_name \"(.*?)\";/; print "$F[0]\t$F[3]\t$F[4]\t$1"}' > protein_coding.hg38.position
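To see what this one-liner extracts, here is a small sketch that runs the same kind of filter on a hand-made GTF fragment. The three records, the gene names GENEA and GENEB, and the file names tiny.gtf.gz and tiny.position are all invented for illustration. The sketch uses awk in place of Perl, so it is an illustrative equivalent rather than the command above.

```shell
#!/usr/bin/env bash
set -euo pipefail
cd "$(mktemp -d)"

# Tiny hand-made GTF fragment (simplified attributes, invented values).
{
  printf 'chr1\tHAVANA\tgene\t65419\t71585\t.\t+\t.\tgene_id "G1"; gene_type "protein_coding"; gene_name "GENEA";\n'
  printf 'chr1\tHAVANA\texon\t65419\t65433\t.\t+\t.\tgene_id "G1"; gene_type "protein_coding"; gene_name "GENEA";\n'
  printf 'chr1\tHAVANA\tgene\t11869\t14409\t.\t+\t.\tgene_id "G2"; gene_type "pseudogene"; gene_name "GENEB";\n'
} | gzip > tiny.gtf.gz

# Keep protein-coding records, then only rows whose feature column is "gene",
# and print chromosome, start, end and the quoted gene_name value.
gzip -dc tiny.gtf.gz \
  | grep protein_coding \
  | awk -F'\t' -v OFS='\t' '
      $3 == "gene" && match($9, /gene_name "[^"]+"/) {
          # Skip the 11-character prefix (gene_name, space, quote) and the closing quote.
          name = substr($9, RSTART + 11, RLENGTH - 12)
          print $1, $4, $5, name
      }' > tiny.position

cat tiny.position
```

Only the protein-coding gene row survives, written as four tab-separated columns. GTF coordinates are 1-based and inclusive, so a strict BED conversion would subtract 1 from the start; the command in this article keeps the GTF start unchanged.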

Step 4: convert the BED file to be processed into a tab-delimited file on the Linux system.

First, link or copy the coordinate file into the directory that holds the result file from Step 3. Then change the file extension to .bed and convert the file to tab-delimited format with the following commands (motif1.bed is the name chosen here for the coordinate file to be processed):

mv motif1.tsv motif1.bed

perl -p -i -e 's/ /\t/g' motif1.bed

If you already saved the BED file in tab-delimited format in Step 1 but still get an error in later processing, it is worth running the tab conversion again.
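The Perl substitution above replaces every single space with a tab, so a run of two spaces would leave an empty column behind. As a hedged alternative, the sketch below re-joins the fields with awk, which treats any run of blanks as one separator, and then reports how many tab-separated columns each line has. The file names demo.bed and demo.tab.bed and their contents are made up for the example.

```shell
#!/usr/bin/env bash
set -euo pipefail
cd "$(mktemp -d)"

# Hypothetical space-delimited input; the second line contains a double space.
printf 'chr1 100 200 motifA\nchr2 300  400 motifB\n' > demo.bed

# Re-join the fields with tabs; awk splits on any run of blanks by default.
awk -v OFS='\t' '{ $1 = $1; print }' demo.bed > demo.tab.bed

# Report the distinct column counts found in the converted file.
awk -F'\t' '{ print NF }' demo.tab.bed | sort -u
```

A single number in the output (4 for this example) means every line has the same number of tab-separated columns.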

Step 5: use bedtools on the Linux system to get the protein-coding genes that overlap the chromosome coordinates.

First, activate the conda environment in which you installed bedtools, and then enter the following command:

bedtools intersect -a motif1.bed -b ~/dna/exercise/protein_coding.hg38.position -wa -wb
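With -wa and -wb, each output line is the original entry from the -a file followed by the overlapping entry from the -b file. To make that concrete, the sketch below reproduces the pairing on two tiny made-up files using awk only. It illustrates what an overlap means and is not a replacement for bedtools; it assumes both files are tab-delimited, that the -b file fits in memory, and that intervals are half-open.

```shell
#!/usr/bin/env bash
set -euo pipefail
cd "$(mktemp -d)"

# Made-up query intervals (A, three columns) and gene intervals (B, four columns).
printf 'chr1\t100\t200\nchr1\t900\t950\n' > a.bed
printf 'chr1\t150\t400\tGENE1\nchr1\t180\t260\tGENE2\nchr2\t100\t200\tGENE3\n' > b.position

# First pass (NR == FNR) stores B; second pass prints each A line
# followed by every B line on the same chromosome that overlaps it.
awk -F'\t' -v OFS='\t' '
    NR == FNR { bchrom[NR] = $1; bstart[NR] = $2; bend[NR] = $3; bline[NR] = $0; n = NR; next }
    {
        for (i = 1; i <= n; i++)
            if ($1 == bchrom[i] && $2 + 0 < bend[i] + 0 && bstart[i] + 0 < $3 + 0)
                print $0, bline[i]
    }' b.position a.bed > overlaps.tsv

cat overlaps.tsv
```

The first interval pairs with GENE1 and GENE2, giving two seven-column lines, while the second interval overlaps nothing and is left out.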

You can also summarize the results so that the gene symbols overlapping the same chromosome coordinates are written together. To do this, append the code after the "|" in the command below. The number after -c is the column to collapse, which is the last column of the output from the previous command: my output has seven columns, so -c is followed by 7.

bedtools intersect -a motif1.bed -b ~/dna/exercise/protein_coding.hg38.position -wa -wb | bedtools groupby -i - -g 1-4 -c 7 -o collapse

You can also save the results:

bedtools intersect -a motif1.bed -b ~/dna/exercise/protein_coding.hg38.position -wa -wb | bedtools groupby -i - -g 1-4 -c 7 -o collapse > gene.tsv

The newly saved gene.tsv is the result file, which you can then use for downstream processing.
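To show what the collapse step produces, the sketch below applies the same grouping (columns 1-4 as the key, column 7 joined with commas) to a made-up seven-column input using awk. The file names and values are hypothetical, and the awk version is only an illustration: bedtools groupby expects rows of the same group to be adjacent in the input, whereas this sketch groups rows wherever they appear.

```shell
#!/usr/bin/env bash
set -euo pipefail
cd "$(mktemp -d)"

# Made-up seven-column input shaped like the intersect -wa -wb output.
printf 'chr1\t100\t200\tchr1\t150\t400\tGENE1\n'  > overlaps.tsv
printf 'chr1\t100\t200\tchr1\t180\t260\tGENE2\n' >> overlaps.tsv
printf 'chr2\t500\t600\tchr2\t550\t700\tGENE3\n' >> overlaps.tsv

# Key on columns 1-4 and join column 7 with commas, keeping first-seen order.
awk -F'\t' -v OFS='\t' '
    {
        key = $1 OFS $2 OFS $3 OFS $4
        if (!(key in genes)) { order[++n] = key; genes[key] = $7 }
        else                 { genes[key] = genes[key] "," $7 }
    }
    END { for (i = 1; i <= n; i++) print order[i], genes[order[i]] }
' overlaps.tsv > gene_demo.tsv

cat gene_demo.tsv
```

Each output line holds the four key columns followed by the comma-separated gene symbols, so the first interval ends in GENE1,GENE2.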

That covers how to use bedtools to get gene symbols from start and end positions on a chromosome. Thank you for reading!
