How to use MCscan (the Python version in JCVI) to analyze collinearity between species

2025-04-01 Update From: SLTechnology News&Howtos

This article explains how to use the Python version of MCscan, part of the jcvi package, to analyze collinearity (synteny) between species. Many people are not yet familiar with this workflow, so the steps are laid out in order below for reference.

1. Installation

Installation has two parts: the LAST aligner, which provides lastal and lastdb, and the jcvi Python package.

LAST download address: http://last.cbrc.jp/. Compile it and add lastal and lastdb to your PATH.

Installing jcvi is simpler. Run the following command on the command line:

$ pip install jcvi
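Before moving on, it can help to confirm that both parts of the installation are visible from Python. The following is a minimal sketch using only the standard library; the function name check_prerequisites is made up for this example and is not part of jcvi.

```python
import importlib.util
import shutil


def check_prerequisites(tools=("lastal", "lastdb"), module="jcvi"):
    """Return a dict mapping each requirement name to True (found) or False."""
    # shutil.which searches PATH the same way the shell does.
    status = {tool: shutil.which(tool) is not None for tool in tools}
    # find_spec returns None when a top-level package cannot be located.
    status[module] = importlib.util.find_spec(module) is not None
    return status


if __name__ == "__main__":
    for name, ok in check_prerequisites().items():
        print(f"{name}: {'found' if ok else 'MISSING'}")
```

If lastal or lastdb is reported as missing, the usual cause is that the LAST build directory was not added to PATH.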

2. Data download

MCscan needs a sequence file and a coordinate file for each genome, in FASTA and BED format respectively. For example, you can download CDS sequence files and GFF files from Phytozome and then generate the BED files from the GFF files. JCVI also provides a fetch command that downloads these files directly, which saves the trouble of downloading and sorting the data by hand. The steps are as follows:

$ python -m jcvi.apps.fetch phytozome

...

Acoerulea Alyrata Athaliana

Bdistachyon Brapa Cclementina

Cpapaya Creinhardtii Crubella

Csativus Csinensis Csubellipsoidea_C-169

Egrandis Fvesca Gmax

Graimondii Lusitatissimum Mdomestica

Mesculenta Mguttatus Mpusilla_CCMP1545

Mpusilla_RCC299 Mtruncatula Olucimarinus

Osativa Ppatens Ppersica

Ptrichocarpa Pvirgatum Pvulgaris

Rcommunis Sbicolor Sitalica

Slycopersicum Smoellendorffii Stuberosum

Tcacao Thalophila Vcarteri

Vvinifera Zmays early_release

...

$ python -m jcvi.apps.fetch phytozome Vvinifera,Ppersica

...

$ ls

Ppersica_139_cds.fa.gz Ppersica_139_gene.gff3.gz Vvinifera_145_cds.fa.gz Vvinifera_145_gene.gff3.gz

Convert GFF to BED:

$ python -m jcvi.formats.gff bed --type=mRNA --key=Name Vvinifera_145_gene.gff3.gz -o grape.bed
$ python -m jcvi.formats.gff bed --type=mRNA --key=Name Ppersica_139_gene.gff3.gz -o peach.bed

If no CDS sequence file is available for a reference genome, download the chromosome sequences instead and extract the CDS sequences using the coordinates in the GFF file.
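To make the GFF-to-BED step less of a black box, here is a minimal sketch of the kind of conversion it performs: keep only mRNA features, take the Name attribute as the gene label, and shift the start coordinate from 1-based to 0-based. This is an illustration written for this article, not the jcvi implementation; the function name gff_to_bed and the sample record are hypothetical.

```python
def gff_to_bed(gff_lines, feature_type="mRNA", key="Name"):
    """Yield BED rows (chrom, start, end, name, score, strand) from GFF3 lines."""
    for line in gff_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines, directives and comments
        cols = line.split("\t")
        if len(cols) != 9 or cols[2] != feature_type:
            continue  # keep only well-formed rows of the requested type
        # Column 9 holds semicolon-separated key=value attributes.
        attrs = dict(
            field.split("=", 1) for field in cols[8].split(";") if "=" in field
        )
        if key not in attrs:
            continue
        # GFF3 coordinates are 1-based inclusive; BED is 0-based half-open.
        start = int(cols[3]) - 1
        end = int(cols[4])
        yield (cols[0], start, end, attrs[key], "0", cols[6])


if __name__ == "__main__":
    # Hypothetical record, for illustration only.
    sample = [
        "##gff-version 3",
        "chr1\tsrc\tgene\t1001\t2000\t.\t+\t.\tID=g1;Name=g1",
        "chr1\tsrc\tmRNA\t1001\t2000\t.\t+\t.\tID=t1;Name=exampleGene1;Parent=g1",
    ]
    for row in gff_to_bed(sample):
        print("\t".join(map(str, row)))
```

For a compressed input such as the .gff3.gz files above, the lines can be read with gzip.open(path, "rt") and passed to the same function.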

3. Pairwise collinearity search

Once the input files are ready, you can run the collinearity search. Change to the directory that contains the input files (the log shows that the command reads grape.cds, peach.cds, grape.bed and peach.bed), then run the following command:

$ python -m jcvi.compara.catalog ortholog grape peach
20:33:42 [base] lastdb peach peach.cds
20:34:13 [base] lastal -u 0 -P 64 -i3G -f BlastTab peach grape.cds > grape.peach.last
20:34:30 [synteny] Assuming --qbed=grape.bed --sbed=peach.bed
20:34:31 [blastfilter] Load BLAST file `grape.peach.last` (total 403868 lines)
20:34:31 [base] Load file `grape.peach.last`
20:34:38 [blastfilter] running the cscore filter (cscore>=0.70) ..
20:34:39 [blastfilter] after filter (294217->31229) ..
20:34:39 [blastfilter] running the local dups filter (tandem_Nmax=10) ..
20:34:39 [blastfilter] after filter (31229->21089) ..
A total of 30654 (NR:18661) anchors found in 678 clusters.
Stats: Min=4 Max=683 N=678 Mean=45.2123893805 SD=80.8407828815 Median=16.0 Sum=30654
NR stats: Min=4 Max=460 N=678 Mean=27.5235988201 SD=48.0461479616 Median=11.0 Sum=18661
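The log reports a cscore filter with a cutoff of 0.70 that reduces the hit list sharply. The sketch below illustrates one common definition of a C-score, namely a hit's score divided by the larger of the best scores seen for its query and for its subject. That definition is an assumption made for this example and should be checked against the jcvi source before relying on it; the function name and the sample hits are hypothetical.

```python
from collections import defaultdict


def cscore_filter(hits, cutoff=0.7):
    """Keep (query, subject, score) hits whose C-score is >= cutoff.

    Assumed definition: score / max(best score of query, best score of subject).
    """
    best = defaultdict(float)
    for query, subject, score in hits:
        # Track the best score per gene, keeping queries and subjects apart.
        best[("q", query)] = max(best[("q", query)], score)
        best[("s", subject)] = max(best[("s", subject)], score)
    kept = []
    for query, subject, score in hits:
        denom = max(best[("q", query)], best[("s", subject)])
        if denom > 0 and score / denom >= cutoff:
            kept.append((query, subject, score))
    return kept


if __name__ == "__main__":
    # Hypothetical hits: the middle one scores only 0.6 of the best hit.
    hits = [("a", "x", 100.0), ("a", "y", 60.0), ("b", "y", 80.0)]
    print(cscore_filter(hits))
```

Under this definition a cutoff of 0.7 keeps a hit only if it is nearly as good as the best hit of both genes involved, which would explain why most raw alignments are discarded.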

After the run finishes, you will have the following files:

$ ls grape.peach.*
grape.peach.lifted.anchors grape.peach.anchors grape.peach.last.filtered grape.peach.last
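The .anchors files listed above are plain text, so they are easy to inspect with a few lines of Python. The sketch below assumes the layout described in section 6: one gene pair per line, with the first two whitespace-separated fields being the two gene IDs, and blocks separated by lines that begin with "#". The function name parse_anchors and the gene IDs in the demo are hypothetical.

```python
def parse_anchors(lines):
    """Split anchors-file lines into blocks of (gene_a, gene_b) pairs."""
    blocks, current = [], []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith("#"):
            # A separator line closes the block collected so far.
            if current:
                blocks.append(current)
                current = []
            continue
        fields = line.split()
        current.append((fields[0], fields[1]))
    if current:
        blocks.append(current)
    return blocks


if __name__ == "__main__":
    # Hypothetical content, for illustration only.
    demo = [
        "###",
        "grapeGene1\tpeachGene7\t900",
        "grapeGene2\tpeachGene8\t850",
        "###",
        "grapeGene30\tpeachGene2\t400",
    ]
    for i, block in enumerate(parse_anchors(demo)):
        print(f"block {i}: {len(block)} gene pairs")
```

With a real file, pass an open file handle, for example parse_anchors(open("grape.peach.anchors")).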

4. Pairwise collinearity visualization

The most direct way to visualize pairwise collinearity is a dot plot, which takes a single command:

$ python -m jcvi.graphics.dotplot grape.peach.anchors

5. Karyotype (linear) visualization

The same collinearity output can also be drawn as a different kind of figure. This one needs two extra input files, a seqids file and a layout file.

The seqids file lists which chromosomes of each genome to draw; it is best to leave out the shorter scaffolds. The file format is as follows:

chr1,chr2,chr3,chr4,chr5,chr6,chr7,chr8,chr9,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr17,chr18,chr19
scaffold_1,scaffold_2,scaffold_3,scaffold_4,scaffold_5,scaffold_6,scaffold_7,scaffold_8

The first line lists the 19 grape chromosomes and the second line lists the 8 peach chromosomes.
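Typing long comma-separated lists by hand is error-prone, so the seqids content can also be generated. This is a small sketch that builds the two lines shown above; the helper name seqids_text is made up for this example.

```python
def seqids_text(*genomes):
    """Return seqids file content: one comma-separated line per genome."""
    return "\n".join(",".join(ids) for ids in genomes) + "\n"


# Sequence names follow the example above: chr1..chr19 and scaffold_1..scaffold_8.
grape = [f"chr{i}" for i in range(1, 20)]
peach = [f"scaffold_{i}" for i in range(1, 9)]

if __name__ == "__main__":
    print(seqids_text(grape, peach), end="")
```

Writing the returned string to a file named seqids gives the input used by the karyotype command. The order of the lines must match the order of the tracks in the layout file.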

The layout file is the configuration file that controls the drawing. Its format is as follows:

# y, xstart, xend, rotation, color, label, va, bed
.6, .1, .8, 0, , Grape, top, grape.bed
.4, .1, .8, 0, , Peach, top, peach.bed
# edges
e, 0, 1, grape.peach.anchors.simple

The first three columns set the position of each track, followed by rotation, color, label, vertical alignment (va), and the genome's BED file. The color field is left empty here. The edges section at the end defines the connecting lines between track 0 and track 1, using the blocks in grape.peach.anchors.simple.
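The layout file has a fixed column order, which makes it a good candidate for generating from data rather than editing by hand. The following sketch writes the same structure as the example above; the function name layout_text and the dictionary keys are choices made for this example, and the numeric values are printed as 0.6 rather than .6 on the assumption that both spellings are read as the same number.

```python
COLUMNS = ("y", "xstart", "xend", "rotation", "color", "label", "va", "bed")


def layout_text(tracks, edges):
    """Return layout file content for the given tracks and edges.

    tracks: list of dicts with the keys in COLUMNS (color may be "").
    edges:  list of (track_i, track_j, simple_file) tuples.
    """
    lines = ["# " + ", ".join(COLUMNS)]
    for track in tracks:
        lines.append(", ".join(str(track[col]) for col in COLUMNS))
    lines.append("# edges")
    for i, j, simple_file in edges:
        lines.append(f"e, {i}, {j}, {simple_file}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    tracks = [
        dict(y=0.6, xstart=0.1, xend=0.8, rotation=0, color="",
             label="Grape", va="top", bed="grape.bed"),
        dict(y=0.4, xstart=0.1, xend=0.8, rotation=0, color="",
             label="Peach", va="top", bed="peach.bed"),
    ]
    print(layout_text(tracks, [(0, 1, "grape.peach.anchors.simple")]), end="")
```

Track indices in the edges section count from 0 in the order the tracks are listed, so (0, 1, ...) connects the grape track to the peach track.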

grape.peach.anchors.simple is generated by the following command:

$ python -m jcvi.compara.synteny screen --minspan=30 --simple grape.peach.anchors grape.peach.anchors.new

Then we can draw the figure:

$ python -m jcvi.graphics.karyotype seqids layout
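The commands in sections 3 to 5 always run in the same order, so they can be collected in a small driver script. The sketch below only builds the argument lists, using exactly the commands shown in this article; the function name build_commands is made up for this example, and actually running the commands requires jcvi and LAST to be installed.

```python
import shlex
import sys


def build_commands(query="grape", subject="peach", minspan=30, python=sys.executable):
    """Return the workflow's command lines as argument lists, in run order."""
    anchors = f"{query}.{subject}.anchors"
    return [
        # Section 3: pairwise collinearity search.
        [python, "-m", "jcvi.compara.catalog", "ortholog", query, subject],
        # Section 4: dot plot of the anchors.
        [python, "-m", "jcvi.graphics.dotplot", anchors],
        # Section 5: screen the blocks and write the .simple file.
        [python, "-m", "jcvi.compara.synteny", "screen",
         f"--minspan={minspan}", "--simple", anchors, f"{anchors}.new"],
        # Section 5: karyotype figure from the seqids and layout files.
        [python, "-m", "jcvi.graphics.karyotype", "seqids", "layout"],
    ]


if __name__ == "__main__":
    for cmd in build_commands():
        print(shlex.join(cmd))
```

To execute the steps instead of printing them, each list can be passed to subprocess.run(cmd, check=True), which stops at the first failing command.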

6. Interpreting the collinearity results

1. Detailed collinearity result file, with unreliable collinear blocks filtered out: *.anchors.new. The first column is a gene ID from one genome and the second column is a gene ID from the other genome, so each line is a pair of corresponding genes within a collinear region of the two genomes.

Different collinear blocks are separated by lines beginning with "#" (JCVI writes them as "###").

2. Simplified collinearity result file: *.anchors.simple. Each line is one collinear block. The first two columns are the genes that bound the block in one genome, and the third and fourth columns are the genes that bound the matching region in the other genome.

The fifth column is the span of the block, and the last column is the orientation: + is forward, - is reverse.

3. To color specific blocks, edit the *.anchors.simple file from item 2 and add a color value to the start of each line whose color you want to change.

Note that the color value is hexadecimal and is separated from the rest of the line by "*". Lines without a color value are drawn in gray by default.
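The three items above can be tied together with a short script that reads a line of the .simple file and adds or replaces its color prefix. This is a sketch under stated assumptions: each line has six whitespace-separated columns in the order described in item 2, the fifth column is an integer, and the color is written directly before "*" at the start of the first column. The names SimpleBlock, parse_simple_line and set_color, the gene IDs, and the exact spelling of the hexadecimal color value are all hypothetical and should be checked against your jcvi version.

```python
from typing import NamedTuple, Optional


class SimpleBlock(NamedTuple):
    color: Optional[str]  # None when the line has no color prefix
    a_start: str
    a_end: str
    b_start: str
    b_end: str
    span: int
    orientation: str  # "+" or "-"


def parse_simple_line(line):
    """Parse one .simple line into a SimpleBlock."""
    fields = line.split()
    if len(fields) != 6:
        raise ValueError(f"expected 6 columns, got {len(fields)}: {line!r}")
    color = None
    if "*" in fields[0]:
        # An optional "color*" prefix sits in front of the first gene ID.
        color, fields[0] = fields[0].split("*", 1)
    a_start, a_end, b_start, b_end, span, orientation = fields
    return SimpleBlock(color, a_start, a_end, b_start, b_end, int(span), orientation)


def set_color(line, color):
    """Return the line with its color prefix set (or replaced) to `color`."""
    fields = line.split()
    gene = fields[0].split("*", 1)[-1]  # drop any existing prefix
    fields[0] = f"{color}*{gene}"
    return "\t".join(fields)


if __name__ == "__main__":
    # Hypothetical block, for illustration only.
    line = "grapeGene1\tgrapeGene9\tpeachGene3\tpeachGene12\t250\t+"
    print(parse_simple_line(line))
    print(set_color(line, "ff0000"))
```

A typical use is to loop over the file, call set_color only on the blocks of interest (for example those with a large span), and write every line back out unchanged otherwise.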
