Updated 2025-04-01. Source: SLTechnology News&Howtos, Shulou (Shulou.com)
Today I would like to show how to use the Python version of MCscan, provided by the jcvi package, to analyze collinearity (synteny) between species. The walkthrough is detailed and follows the order of the analysis. Many readers are probably not yet familiar with this topic, so I hope this article is a useful reference. Let's take a look.
1. Installation
Installation has two parts: LAST (which provides lastal and lastdb) and the jcvi Python package.
LAST download address: http://last.cbrc.jp/. Compile it and add lastal and lastdb to your PATH.
Installing jcvi is simpler. Run the following command on the command line:
$ pip install jcvi
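Before moving on, it can help to confirm that both parts of the installation are visible from Python. The following is a minimal sketch, not part of jcvi: the function name check_mcscan_prerequisites is hypothetical, and it only checks that the lastal and lastdb executables are on PATH and that the jcvi package can be imported.

```python
import importlib.util
import shutil


def check_mcscan_prerequisites():
    """Report whether the LAST binaries and the jcvi package can be found."""
    status = {}
    for tool in ("lastal", "lastdb"):
        # shutil.which returns the full path, or None if the tool is not on PATH
        status[tool] = shutil.which(tool) is not None
    # find_spec returns None when the package cannot be imported
    status["jcvi"] = importlib.util.find_spec("jcvi") is not None
    return status


if __name__ == "__main__":
    for name, found in check_mcscan_prerequisites().items():
        print(f"{name}: {'found' if found else 'MISSING'}")
```

If any entry is reported as missing, revisit the corresponding installation step before running the commands in the following sections.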
2. Data download
Running MCscan requires sequence files in FASTA format and coordinate files in BED format. For example, you can download CDS sequence files and GFF files from Phytozome and then generate the BED files from the GFF files. JCVI also provides a method that downloads these files directly, which saves the trouble of fetching and organizing the data by hand. The steps are as follows:
$ python -m jcvi.apps.fetch phytozome
...
Acoerulea Alyrata Athaliana
Bdistachyon Brapa Cclementina
Cpapaya Creinhardtii Crubella
Csativus Csinensis Csubellipsoidea_C-169
Egrandis Fvesca Gmax
Graimondii Lusitatissimum Mdomestica
Mesculenta Mguttatus Mpusilla_CCMP1545
Mpusilla_RCC299 Mtruncatula Olucimarinus
Osativa Ppatens Ppersica
Ptrichocarpa Pvirgatum Pvulgaris
Rcommunis Sbicolor Sitalica
Slycopersicum Smoellendorffii Stuberosum
Tcacao Thalophila Vcarteri
Vvinifera Zmays early_release
...
$ python -m jcvi.apps.fetch phytozome Vvinifera,Ppersica
...
$ ls
Ppersica_139_cds.fa.gz Ppersica_139_gene.gff3.gz Vvinifera_145_cds.fa.gz Vvinifera_145_gene.gff3.gz
Convert GFF to BED:
$ python -m jcvi.formats.gff bed --type=mRNA --key=Name Vvinifera_145_gene.gff3.gz -o grape.bed
$ python -m jcvi.formats.gff bed --type=mRNA --key=Name Ppersica_139_gene.gff3.gz -o peach.bed
If the reference genome has no CDS sequence file, you can download the chromosome sequences and extract the CDS sequences using the GFF file.
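To make the GFF-to-BED step less opaque, here is a simplified illustration of what such a conversion involves. It is not jcvi's implementation: the function gff3_to_bed is hypothetical, it assumes tab-separated GFF3 lines with a Name attribute, and it writes a zero in the BED score column as a placeholder.

```python
def gff3_to_bed(lines, feature_type="mRNA", key="Name"):
    """Convert GFF3 lines of one feature type into sorted 6-column BED lines."""
    records = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, directives and comments
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != feature_type:
            continue
        seqid, start, end, strand = cols[0], int(cols[3]), int(cols[4]), cols[6]
        # Column 9 holds semicolon-separated key=value attributes
        attrs = dict(
            field.split("=", 1) for field in cols[8].split(";") if "=" in field
        )
        if key not in attrs:
            continue
        # GFF3 coordinates are 1-based inclusive; BED is 0-based half-open
        records.append((seqid, start - 1, end, attrs[key], "0", strand))
    records.sort(key=lambda r: (r[0], r[1]))
    return ["\t".join(map(str, r)) for r in records]


if __name__ == "__main__":
    sample = [
        "##gff-version 3",
        "chr1\tphytozome\tgene\t100\t500\t.\t+\t.\tID=g1;Name=g1",
        "chr1\tphytozome\tmRNA\t100\t500\t.\t+\t.\tID=g1.1;Name=g1.1;Parent=g1",
    ]
    print("\n".join(gff3_to_bed(sample)))
```

The gene IDs in the sample are made up. The point is the coordinate shift (start minus one) and the choice of which attribute becomes the BED name column, which is what --type and --key select in the commands above.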
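3. Pairwise collinearity search
Once the input files are ready, you can run the collinearity search. First change the working directory to the directory that contains the input files (as the log below shows, the command looks for grape.cds, peach.cds, grape.bed and peach.bed), then run the following command: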
$ python -m jcvi.compara.catalog ortholog grape peach
20:33:42 [base] lastdb peach peach.cds
20:34:13 [base] lastal -u 0 -P 64 -i3G -f BlastTab peach grape.cds > grape.peach.last
20:34:30 [synteny] Assuming --qbed=grape.bed --sbed=peach.bed
20:34:31 [blastfilter] Load BLAST file `grape.peach.last` (total 403868 lines)
20:34:31 [base] Load file `grape.peach.last`
20:34:38 [blastfilter] running the cscore filter (cscore>=0.70) ..
20:34:39 [blastfilter] after filter (294217->31229) ..
20:34:39 [blastfilter] running the local dups filter (tandem_Nmax=10) ..
20:34:39 [blastfilter] after filter (31229->21089) ..
A total of 30654 (NR:18661) anchors found in 678 clusters.
Stats: Min=4 Max=683 N=678 Mean=45.2123893805 SD=80.8407828815 Median=16.0 Sum=30654
NR stats: Min=4 Max=460 N=678 Mean=27.5235988201 SD=48.0461479616 Median=11.0 Sum=18661
When the run finishes, you will have the following files:
$ ls grape.peach.*
grape.peach.lifted.anchors grape.peach.anchors grape.peach.last.filtered grape.peach.last
4. Pairwise collinearity visualization
The most direct way to visualize pairwise collinearity is a dot plot, which needs only one command:
$ python -m jcvi.graphics.dotplot grape.peach.anchors
5. Collinearity visualization along chromosomes
We can also use the same collinearity output to draw a different kind of figure. This requires two input files: the seqids file and the layout file.
The seqids file lists which chromosomes of each genome to include; it is best to leave out the shorter scaffolds. The file format is as follows:
chr1,chr2,chr3,chr4,chr5,chr6,chr7,chr8,chr9,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr17,chr18,chr19
scaffold_1,scaffold_2,scaffold_3,scaffold_4,scaffold_5,scaffold_6,scaffold_7,scaffold_8
The first line lists the 19 grape chromosomes and the second line lists the 8 peach chromosomes (named scaffold_1 to scaffold_8).
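One way to build a seqids line without typing it by hand is to derive it from the BED file. The sketch below is only an illustration: the function seqids_line and the min_genes threshold are hypothetical, and it uses the number of genes per sequence as a stand-in for sequence length when deciding which scaffolds to drop.

```python
from collections import Counter


def seqids_line(bed_lines, min_genes=100):
    """Return a comma-separated seqids line, skipping sequences with few genes."""
    counts = Counter()
    order = []  # remember the order in which sequences first appear
    for line in bed_lines:
        if not line.strip():
            continue
        seqid = line.split("\t", 1)[0]
        if seqid not in counts:
            order.append(seqid)
        counts[seqid] += 1
    return ",".join(s for s in order if counts[s] >= min_genes)


if __name__ == "__main__":
    # Assumes grape.bed and peach.bed exist in the working directory
    with open("seqids", "w") as out:
        for bed in ("grape.bed", "peach.bed"):
            with open(bed) as handle:
                out.write(seqids_line(handle) + "\n")
```

The sequences come out in the order they first appear in the BED file, so you may still want to reorder the line by hand to control how the chromosomes are laid out in the figure.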
The layout file is the configuration file for the drawing, in the following format (the color column is left empty here):
# y, xstart, xend, rotation, color, label, va, bed
.6, .1, .8, 0, , Grape, top, grape.bed
.4, .1, .8, 0, , Peach, top, peach.bed
# edges
e, 0, 1, grape.peach.anchors.simple
The first three columns specify the position of each track on the canvas, followed by rotation, color, label, vertical alignment (va), and the genome's BED file. The final section defines the edges drawn between the tracks; this information comes from the grape.peach.anchors.simple file.
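If you draw many genome pairs, generating the layout text from a small data structure can reduce copy-paste mistakes. This is a rough sketch under assumptions: build_layout is a hypothetical helper, the column order follows the header described above, and an omitted color is written as an empty field.

```python
def build_layout(tracks, edges):
    """Render layout file text from track dictionaries and edge tuples."""
    lines = ["# y, xstart, xend, rotation, color, label, va, bed"]
    for track in tracks:
        fields = [
            str(track["y"]),
            str(track["xstart"]),
            str(track["xend"]),
            str(track["rotation"]),
            track.get("color", ""),  # empty field when no color is given
            track["label"],
            track["va"],
            track["bed"],
        ]
        lines.append(", ".join(fields))
    lines.append("# edges")
    for first, second, simple_file in edges:
        # An edge connects two tracks by their 0-based index in the list above
        lines.append(f"e, {first}, {second}, {simple_file}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    tracks = [
        {"y": 0.6, "xstart": 0.1, "xend": 0.8, "rotation": 0,
         "label": "Grape", "va": "top", "bed": "grape.bed"},
        {"y": 0.4, "xstart": 0.1, "xend": 0.8, "rotation": 0,
         "label": "Peach", "va": "top", "bed": "peach.bed"},
    ]
    print(build_layout(tracks, [(0, 1, "grape.peach.anchors.simple")]), end="")
```

Redirect the printed text to a file named layout to use it with the drawing command.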
The grape.peach.anchors.simple file is generated by the following command:
$ python -m jcvi.compara.synteny screen --minspan=30 --simple grape.peach.anchors grape.peach.anchors.new
Then we can draw!
$ python -m jcvi.graphics.karyotype seqids layout
6. Understanding the collinearity result files
1. Detailed collinearity result file, with unreliable collinear blocks filtered out: *.anchors.new. The first column is a gene ID from one genome and the second column is the corresponding gene ID from the other genome, that is, the gene-to-gene correspondence within the collinear regions of the two genomes.
Different collinear blocks are separated by lines beginning with "#".
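The block structure described above is easy to read back into Python for downstream counting. The sketch below is an assumption-laden illustration, not jcvi code: read_anchor_blocks is a hypothetical helper that treats any line beginning with "#" as a block separator and keeps only the first two whitespace-separated columns of each gene pair.

```python
def read_anchor_blocks(lines):
    """Group gene pairs from an anchors file into a list of collinear blocks."""
    blocks, current = [], []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith("#"):
            # A separator closes the block collected so far, if any
            if current:
                blocks.append(current)
                current = []
            continue
        fields = line.split()
        if len(fields) < 2:
            continue  # skip malformed lines
        current.append((fields[0], fields[1]))
    if current:
        blocks.append(current)
    return blocks


if __name__ == "__main__":
    # Assumes grape.peach.anchors.new exists in the working directory
    with open("grape.peach.anchors.new") as handle:
        blocks = read_anchor_blocks(handle)
    print(f"{len(blocks)} blocks, {sum(len(b) for b in blocks)} gene pairs")
```

Each block is returned as a list of (gene in genome 1, gene in genome 2) tuples, so block sizes and per-gene lookups can be computed with ordinary list operations.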
2. Simplified collinearity result file: *.anchors.simple. Each line is one collinear block. The first two columns are the two genes that bound the region in one genome, and the next two columns are the two genes that bound the collinear region in the other genome.
The fifth column is the span of the region, and the last column is the orientation: + is forward, - is reverse.
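The six-column layout described above can be parsed into named records, for example to count how many blocks are inverted. This is a minimal sketch that assumes exactly six whitespace-separated columns per line in the order given above; the names SimpleBlock and read_simple are hypothetical.

```python
from collections import namedtuple

SimpleBlock = namedtuple(
    "SimpleBlock", "a_start a_end b_start b_end span orientation"
)


def read_simple(lines):
    """Parse lines of an .anchors.simple file into SimpleBlock records."""
    blocks = []
    for line in lines:
        fields = line.split()
        if len(fields) != 6:
            continue  # skip blank or unexpected lines
        a_start, a_end, b_start, b_end, span, orientation = fields
        blocks.append(
            SimpleBlock(a_start, a_end, b_start, b_end, int(span), orientation)
        )
    return blocks


if __name__ == "__main__":
    # Assumes grape.peach.anchors.simple exists in the working directory
    with open("grape.peach.anchors.simple") as handle:
        blocks = read_simple(handle)
    reverse = sum(1 for block in blocks if block.orientation == "-")
    print(f"{len(blocks)} blocks, {reverse} in reverse orientation")
```

Because span is converted to an integer, the records can also be sorted or filtered by block size before drawing.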
3. To set the color of a block, edit the .anchors.simple file described in item 2: add a color value at the start of each line whose color you want to change.
Note that the color value is hexadecimal and is separated from the rest of the line by *; lines without a color value default to gray.
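Editing many lines by hand is error-prone, so the prefixing can be scripted. The sketch below is an illustration under assumptions: color_blocks is a hypothetical helper, the gene IDs are made up, and the color is assumed to be written as a hex string such as #ff0000 placed directly before the * separator.

```python
def color_blocks(lines, color, selected):
    """Prefix `color*` to the lines whose 0-based index is in `selected`."""
    out = []
    for index, line in enumerate(lines):
        line = line.rstrip("\n")
        already_colored = bool(line.strip()) and "*" in line.split()[0]
        if index in selected and line.strip() and not already_colored:
            line = f"{color}*{line}"
        out.append(line)
    return out


if __name__ == "__main__":
    sample = [
        "geneA1 geneA5 geneB1 geneB6 120 +",
        "geneA7 geneA9 geneB9 geneB7 45 -",
    ]
    # Color only the second block (index 1); the first keeps the default
    for line in color_blocks(sample, "#ff0000", {1}):
        print(line)
```

Lines that already carry a color prefix are left unchanged, so the function can be run repeatedly on the same file without stacking prefixes.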