

How to construct the Evolutionary Tree in R language

2025-04-05 Update From: SLTechnology News&Howtos


Shulou (Shulou.com) 06/01 Report

This article introduces "how to build an evolutionary tree in R language" and walks through the process with a real case. The method is simple, fast and practical, and I hope this article helps you solve the problem.

There are many methods and software tools for constructing evolutionary trees. As we explained earlier when discussing the principles of tree construction, the Bayesian method is the most accurate, but it is computationally intensive and time-consuming, so it is not suitable for large data sets. Next comes the maximum likelihood method. Here we cover three programs that build evolutionary trees by maximum likelihood: FastTree, IQ-TREE 2 and RAxML-NG. The input data are the SNPs obtained from population genetic evolution analysis.

Once the resequencing data analysis is finished, the resulting variant file (VCF) containing the SNPs serves as the input file; for details, see the course "Resequencing Data Analysis". We then use the /pop-evol-gwas:v1.3 image for data preparation and the subsequent tree construction:

```bash
# Prepare the operating environment: download the image
docker pull /pop-evol-gwas:v1.3
# Start the population genetic evolution container, mounting D:\pop as /work
docker run --rm -it -m 4G --cpus 1 -v D:\pop:/work /pop-evol-gwas:v1.3

# The following commands are run inside the container.
# Filter the VCF file: biallelic SNPs, MAF >= 0.05, depth 4-1000, site quality >= 30.
# --max-missing takes a proportion between 0 and 1 (1 = no missing genotypes allowed).
vcftools --gzvcf all.varFilter.vcf.gz --recode --recode-INFO-all --stdout \
    --maf 0.05 --max-missing 1 --minDP 4 --maxDP 1000 \
    --minQ 30 --minGQ 0 --min-alleles 2 --max-alleles 2 --remove-indels \
    | gzip - > clean.vcf.gz

# Sort the file with TASSEL
# (the Java heap set by -Xmx should fit within the memory given to the container with -m)
run_pipeline.pl -Xmx30G -SortGenotypeFilePlugin -inputFile clean.vcf.gz \
    -outputFile clean.sorted.vcf.gz -fileType VCF

# Convert the VCF file to Phylip format for phylogenetic tree construction
run_pipeline.pl -Xmx5G -importGuess $workdir/00.filter/clean.sorted.vcf.gz \
    -ExportPlugin -saveAs supergene.phy -format Phylip_Inter
```

1. Construction of Evolutionary Tree by FastTree

FastTree builds evolutionary trees by (approximate) maximum likelihood. Its most important feature is speed: it can build trees from alignments of up to millions of sequences. However, FastTree does not run standard bootstrap analyses itself (it reports SH-like local support values instead), and it offers only a limited choice of substitution models. The official website is: http://www.microbesonline.org/fasttree/

Substitution model selection

FastTree can build trees from both nucleotide and protein alignments. For nucleotides, the available models are JC (Jukes-Cantor) and GTR (generalized time-reversible); the default is JC. For proteins, the available models are JTT (Jones-Taylor-Thornton 1992), LG (Le and Gascuel 2008) and WAG (Whelan & Goldman 2001); the default is JTT. FastTree requires a multiple sequence alignment as input, in FASTA or Phylip format.
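FastTree's default nucleotide model is JC (Jukes-Cantor). As an illustration only, not FastTree's own code, here is a minimal Python sketch of the standard JC69 formulas. The first gives the probability that a site is unchanged along a branch of length d (expected substitutions per site). The second gives the corrected distance d = -3/4 ln(1 - 4p/3), computed from the observed proportion p of differing sites. The two sequences are hypothetical toy data, and the function names are illustrative.

```python
import math


def jc69_prob_same(d: float) -> float:
    """JC69 probability that a site shows the same base at both ends of a branch of length d."""
    return 0.25 + 0.75 * math.exp(-4.0 * d / 3.0)


def jc69_distance(p: float) -> float:
    """JC69-corrected distance from the observed proportion p of differing sites.
    Only defined for p < 0.75 (beyond that the sequences are saturated)."""
    if not 0.0 <= p < 0.75:
        raise ValueError("p must be in [0, 0.75) under JC69")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


# Hypothetical aligned sequences: they differ at 2 of 10 sites
s1 = "ACGTACGTAC"
s2 = "ACGTTCGTAA"
p = sum(a != b for a, b in zip(s1, s2)) / len(s1)
d = jc69_distance(p)
print(f"p = {p:.2f}, JC69 distance = {d:.4f}")

# Consistency check: plugging d back into the model recovers 1 - p
assert abs(jc69_prob_same(d) - (1 - p)) < 1e-12
```

The corrected distance (about 0.23) is larger than the raw proportion of differences (0.2), because the model accounts for multiple substitutions hitting the same site.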

An example command for building an evolutionary tree:

```bash
fasttree -nt -gtr supergene.fa > fasttree.nwk
```

Other FastTree command examples:

```bash
# Protein alignment, basic usage (default JTT model)
fasttree protein.fasta > tree
# Use the LG or WAG substitution model instead
fasttree -lg protein.fasta > tree
fasttree -wag protein.fasta > tree
# Nucleotide alignment, basic usage (default JC model)
fasttree -nt nucleotide.fasta > tree
# Use the GTR substitution model
fasttree -nt -gtr nucleotide.fasta > tree
```

2. Construction of Evolutionary Tree by IQ-TREE

IQ-TREE is another maximum likelihood tree-building program. It has been updated to version 2.0, with substantially improved functionality and performance. It has four main features: efficient tree reconstruction, fast and accurate model selection (ModelFinder), the ultrafast bootstrap approximation, and big data analysis. These features make it particularly well suited to the large SNP data sets produced by high-throughput sequencing.

Model selection

Beginners are often unsure which of the many substitution models is most suitable. IQ-TREE provides automatic model selection through ModelFinder, which its authors report to be up to 100 times faster than jModelTest (for DNA) and ProtTest (for proteins) while remaining accurate. Example commands:

```bash
# Automatically select the best model, then build the tree (-m MFP)
iqtree -s supergene.phy -m MFP
# Only find the best model, without building the tree (-m MF)
iqtree -s example.phy -m MF
```

Model search log (excerpt):

```
ModelFinder will test up to 546 protein models (sample size: 36415)
 No. Model     -LnL          df   AIC           AICc          BIC
  1  LG        10134094.366  349  20268886.731  20268893.505  20271854.186
  2  LG+I      10133927.677  350  20268555.354  20268562.167  20271531.312
  3  LG+G4     10043239.052  350  20087178.104  20087184.917  20090154.062
  4  LG+I+G4   10043175.024  351  20087052.048  20087058.900  20090036.508
  5  LG+R2     10063911.721  351  20128525.442  20128532.294  20131509.902
  6  LG+R3     10045448.117  353  20091602.235  20091609.165  20094603.701
```

MFP stands for ModelFinder Plus. With this option, ModelFinder selects the best-fit model and the program then completes the tree reconstruction with it. ModelFinder computes the log-likelihood of an initial parsimony tree under many different models and reports three criteria: the Akaike information criterion (AIC), the corrected Akaike information criterion (AICc) and the Bayesian information criterion (BIC). By default, ModelFinder selects the model with the lowest BIC score; use -AIC or -AICc to select by AIC or AICc instead. To save time, you can restrict the candidates. For example, -mset WAG,LG,JTT tests only the WAG, LG and JTT substitution models (these are protein models). Similarly, -mrate G,I,I+G tests only the +G, +I and +I+G rate heterogeneity types, and -mfreq FU,F restricts the frequency types tested to FU and F. The command line is as follows:

```bash
iqtree -s example.phy -m MFP -mset WAG,LG,JTT -mrate G,I,I+G -mfreq FU,F
```
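The three criteria can be recomputed from the -LnL and df columns of the ModelFinder log using AIC = 2(-LnL) + 2k, AICc = AIC + 2k(k+1)/(n-k-1) and BIC = 2(-LnL) + k ln n. Here k is the number of free parameters (df) and n is the sample size. The short Python sketch below uses the six rows of that log. The df values for LG and LG+R2 are an assumption: they are back-calculated from the AIC column as (AIC - 2(-LnL))/2, because they are not printed clearly in the excerpt. The helper names are illustrative.

```python
import math

# (model, -LnL, df) from the ModelFinder log excerpt.
# df for LG and LG+R2 is back-calculated from the AIC column: (AIC - 2 * -LnL) / 2.
N_SITES = 36415  # "sample size" reported by ModelFinder
ROWS = [
    ("LG", 10134094.366, 349),
    ("LG+I", 10133927.677, 350),
    ("LG+G4", 10043239.052, 350),
    ("LG+I+G4", 10043175.024, 351),
    ("LG+R2", 10063911.721, 351),
    ("LG+R3", 10045448.117, 353),
]


def aic(neg_lnl: float, k: int) -> float:
    return 2 * neg_lnl + 2 * k


def aicc(neg_lnl: float, k: int, n: int) -> float:
    # AIC plus a small-sample correction term
    return aic(neg_lnl, k) + 2 * k * (k + 1) / (n - k - 1)


def bic(neg_lnl: float, k: int, n: int) -> float:
    return 2 * neg_lnl + k * math.log(n)


for name, neg_lnl, k in ROWS:
    print(f"{name:8s} AIC={aic(neg_lnl, k):.3f} "
          f"AICc={aicc(neg_lnl, k, N_SITES):.3f} BIC={bic(neg_lnl, k, N_SITES):.3f}")

best_bic = min(ROWS, key=lambda r: bic(r[1], r[2], N_SITES))[0]
best_aic = min(ROWS, key=lambda r: aic(r[1], r[2]))[0]
print("Lowest BIC:", best_bic, "| lowest AIC:", best_aic)
```

Among these six rows, LG+I+G4 has the lowest BIC and the lowest AIC. The +I+G4 term improves the log-likelihood by far more than the penalty for its single extra parameter.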

The format for specifying a model is -m MODEL+FreqType+RateType, where:

MODEL: name of the substitution model
+FreqType: (optional) base frequency type
+RateType: (optional) rate heterogeneity type

The available substitution models (MODEL) include:

DNA models:

JC/JC69, F81, K2P/K80, HKY/HKY85, TN/TrN/TN93, TNe

K3P/K81, K81u, TPM2, TPM2u, TPM3, TPM3u, TIM, TIMe

TIM2, TIM2e, TIM3, TIM3e, TVM, TVMe, SYM, GTR, and 6-digit model codes

Protein models:

BLOSUM62, cpREV, Dayhoff, DCMut, FLU, HIVb, HIVw, JTT

JTTDCMut, LG, mtART, mtMAM, mtREV, mtZOA, mtMet, mtVer

mtInv, Poisson, PMB, rtREV, VT, WAG

+FreqType: optional base frequency settings, reflecting base usage preference (composition bias).

If substitutions at every nucleotide site were random, the frequencies of A, T, C and G would be roughly equal. In reality, DNA is under the pressure of natural selection, so the base frequencies are not equal.
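In IQ-TREE, the +F frequency type uses empirical base frequencies counted from the alignment. A minimal Python sketch of that counting follows. It assumes, for simplicity, that gaps, N and IUPAC ambiguity codes are skipped; programs differ in how they treat ambiguity codes. The alignment is hypothetical toy data, and `empirical_base_freqs` is an illustrative name.

```python
from collections import Counter


def empirical_base_freqs(seqs):
    """Proportions of A, C, G and T over all sequences.
    Gaps, N and IUPAC ambiguity codes are simply skipped in this sketch."""
    counts = Counter()
    for s in seqs:
        counts.update(c for c in s.upper() if c in "ACGT")
    total = sum(counts.values())
    return {base: counts[base] / total for base in "ACGT"}


alignment = ["ACGGT-CA", "ACGGTNCA", "GCGGTACA"]  # hypothetical toy alignment
freqs = empirical_base_freqs(alignment)
print(freqs)
```

In this toy alignment, T makes up only 3 of the 22 counted bases, far from the 1/4 that equal frequencies would give. That kind of skew is what +F is meant to capture.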

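+RateType: optional settings for rate heterogeneity across sites, such as +I (a proportion of invariable sites), +G (gamma-distributed rates, e.g. +G4 with four categories) and +R (FreeRate model, e.g. +R2, +R3), as seen in the ModelFinder log above.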
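Under +I, a site in the invariable class can never change, so only constant alignment columns can belong to it. The proportion itself is estimated by maximum likelihood, but the observed fraction of constant columns is a useful reference point. A minimal Python sketch of computing that fraction follows. The names are illustrative, the toy alignment is hypothetical, and gaps, N and ? are ignored within a column (an assumption of this sketch).

```python
def constant_site_fraction(seqs, ignore="-N?"):
    """Fraction of alignment columns in which all characters other than gaps, N and ? agree.
    Ambiguity codes such as R are treated as ordinary states, and columns made only of
    ignored characters count as constant in this sketch."""
    n_cols = len(seqs[0])
    if any(len(s) != n_cols for s in seqs):
        raise ValueError("sequences must be aligned (equal length)")
    constant = 0
    for column in zip(*seqs):
        states = {c.upper() for c in column if c.upper() not in ignore}
        if len(states) <= 1:
            constant += 1
    return constant / n_cols


toy = ["ACGTA", "ACGTT", "ACCTA"]  # hypothetical alignment: columns 3 and 5 vary
frac = constant_site_fraction(toy)
print(f"constant columns: {frac:.2f}")
```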

Example command for building a tree with a specified model:

```bash
iqtree -s example.phy -m TIM2+I+G
```

Evaluation of branch support

Bootstrap method

There is only one true evolutionary history, while the sequence information available to us is always limited. Whether we can recover that history is one question; whether the sequences we use reflect it truly and stably is another. The bootstrap is the most commonly used method for testing branch reliability, especially for trees built by ML. Its biggest drawback is cost. The alignment is resampled and the tree rebuilt over and over, until a convergence criterion is met or a specified number of replicates (say 1000) is reached, which takes a lot of computation and time. IQ-TREE's authors proposed a much faster bootstrap approximation, which was eventually integrated into IQ-TREE. The standard bootstrap is run with -b:

```bash
# Standard nonparametric bootstrap with 1000 replicates
iqtree -s example.phy -m TIM2+I+G -b 1000
```

Ultrafast bootstrap (UFBoot)

This is arguably the highlight of IQ-TREE. As the name implies, the ultrafast bootstrap approximation is very fast. Interested readers can find the details in several papers by the IQ-TREE developers, who report that UFBoot is 10 to 40 times faster than RAxML rapid bootstrap and obtains less biased support values. It is run with -B:

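```bash
# Ultrafast bootstrap with 1000 replicates (-B is the IQ-TREE 2 option; IQ-TREE 1.x used -bb)
iqtree -s example.phy -m TIM2+I+G -B 1000
```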
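The standard bootstrap (-b) draws pseudo-alignments by sampling alignment columns with replacement, then rebuilds the tree for each one; that repeated tree search is where the cost comes from. A minimal Python sketch of generating such pseudo-alignments follows. Tree building on each replicate and the support summary are not shown. The toy alignment and the helper name are hypothetical.

```python
import random


def bootstrap_replicate(seqs, rng):
    """One nonparametric bootstrap pseudo-alignment: sample columns with replacement."""
    n_cols = len(seqs[0])
    cols = [rng.randrange(n_cols) for _ in range(n_cols)]
    return ["".join(s[i] for i in cols) for s in seqs]


alignment = ["ACGTACGT", "ACGTTCGT", "TCGAACGA"]  # hypothetical toy alignment
rng = random.Random(123)  # fixed seed so the replicates are reproducible
replicates = [bootstrap_replicate(alignment, rng) for _ in range(1000)]
print(replicates[0])
```

Each replicate has the same dimensions as the original alignment, but some columns appear several times and others not at all. Fixing the seed makes the replicates reproducible, which is the same reason the RAxML-NG commands later in this article pass --seed.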

Besides the ultrafast bootstrap, IQ-TREE provides the following methods for assessing the reliability of the tree topology:

-alrt: SH-aLRT test (4); FastTree 2 reports a similar SH-like local support value

-abayes: approximate Bayes test, proposed by Maria Anisimova of the Zurich University of Applied Sciences, Switzerland (5)

-lbp: fast local bootstrap probability method, proposed by Adachi and Hasegawa

```bash
iqtree -s example.phy -m TIM2+I+G -B 1000 -alrt 1000
```
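With both -alrt and -B, each internal branch in the resulting .treefile carries a label with two slash-separated values, SH-aLRT first and UFBoot second (for example 80.5/95). The minimal Python sketch below extracts those pairs with a regular expression and flags branches below SH-aLRT 80 or UFBoot 95. Those are the thresholds suggested in the IQ-TREE documentation. The tree string is hypothetical, and the sketch assumes exactly two values per label.

```python
import re

# An internal-branch label follows a closing parenthesis, e.g. ")80.5/95:0.05".
# Assumes exactly two slash-separated values (SH-aLRT/UFBoot), as with -alrt plus -B.
LABEL = re.compile(r"\)([0-9.]+)/([0-9.]+)")


def support_pairs(newick):
    """Return (SH-aLRT, UFBoot) tuples for every labelled internal branch."""
    return [(float(a), float(b)) for a, b in LABEL.findall(newick)]


# Hypothetical .treefile content
tree = "((A:0.1,B:0.2)80.5/95:0.05,(C:0.1,D:0.1)72/88:0.07,E:0.3);"
pairs = support_pairs(tree)
# Thresholds suggested in the IQ-TREE documentation: SH-aLRT >= 80 and UFBoot >= 95
weak = [p for p in pairs if p[0] < 80 or p[1] < 95]
print("supports:", pairs)
print("weakly supported:", weak)
```

A regular expression is enough for this flat label format. For anything beyond reading labels, a proper Newick parser is the safer choice.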

If you request several tests, all the support values are written to the tree file (.treefile), separated by slashes. For example, with -alrt and -B, a branch label such as 80.5/95 means 80.5% SH-aLRT support and 95% UFBoot support. Finally, here is a recommended iqtree command for building an evolutionary tree from population genetic data:

```bash
# -st DNA: sequence type; -T 2: two threads; -mem 8G: memory limit; -redo: overwrite earlier results
# -bnni: refine each UFBoot tree by NNI on its bootstrap alignment (reduces overestimated support)
iqtree2 -s supergene.phy -st DNA -T 2 -mem 8G \
    -m GTR -redo \
    -B 1000 -bnni \
    --prefix iqtree
```

3. Construction of Evolutionary Tree by RAxML

RAxML is a classic tool for maximum likelihood tree construction, developed by Alexandros Stamatakis at the Heidelberg Institute for Theoretical Studies (HITS) in Germany. Its latest version, RAxML-NG, supports more substitution models and runs faster.

Principle of RAxML tree building

RAxML builds trees by maximum likelihood, treating all or part of the tree topology, the branch lengths and the evolutionary model parameters as parameters to be estimated. Given the data set and an evolutionary model, these parameters are estimated by maximizing the likelihood. First, an evolutionary model is chosen, and its parameters are estimated by likelihood on a parsimony tree or a neighbor-joining tree. With the parameters set, the parsimony or neighbor-joining tree serves as the starting tree for the likelihood search. Finally, the best-scoring tree is selected from the multiple likelihood trees by statistical comparison.
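To make "estimate the parameters by maximizing the likelihood" concrete, here is a minimal Python sketch of the simplest possible case. It estimates one parameter, the distance between two sequences, under the JC69 model. The log-likelihood is maximized numerically by golden-section search and checked against the closed-form ML estimate -3/4 ln(1 - 4p/3). This is only an illustration: RAxML optimizes topology, branch lengths and model parameters jointly with far more elaborate algorithms. The counts (80 identical sites, 20 differing sites) and helper names are hypothetical.

```python
import math


def jc69_loglik(d, n_same, n_diff):
    """Log-likelihood of a two-sequence alignment under JC69 at distance d.
    Each site contributes pi * P(i -> j | d), with equal base frequencies pi = 1/4."""
    e = math.exp(-4.0 * d / 3.0)
    p_same = 0.25 + 0.75 * e        # P(i -> i)
    p_each_diff = 0.25 - 0.25 * e   # P(i -> j) for one specific j != i
    return n_same * math.log(0.25 * p_same) + n_diff * math.log(0.25 * p_each_diff)


def golden_section_max(f, lo, hi, tol=1e-10):
    """Maximize a unimodal function f on [lo, hi] by golden-section search."""
    inv_phi = (math.sqrt(5) - 1) / 2  # about 0.618
    a, b = lo, hi
    x1, x2 = b - inv_phi * (b - a), a + inv_phi * (b - a)
    f1, f2 = f(x1), f(x2)
    while b - a > tol:
        if f1 > f2:   # maximum lies in [a, x2]
            b, x2, f2 = x2, x1, f1
            x1 = b - inv_phi * (b - a)
            f1 = f(x1)
        else:         # maximum lies in [x1, b]
            a, x1, f1 = x1, x2, f2
            x2 = a + inv_phi * (b - a)
            f2 = f(x2)
    return (a + b) / 2


n_same, n_diff = 80, 20
d_hat = golden_section_max(lambda x: jc69_loglik(x, n_same, n_diff), 1e-8, 5.0)
d_closed = -0.75 * math.log(1 - 4.0 / 3.0 * n_diff / (n_same + n_diff))
print(f"numerical ML: {d_hat:.6f}, closed form: {d_closed:.6f}")
```

The numerical and closed-form estimates agree. Real tree programs face the same optimization problem, but over thousands of branch lengths and model parameters, plus a discrete search over topologies.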

RAxML-NG usage

RAxML-NG accepts input files in FASTA or Phylip format. For example, to build a tree from a DNA alignment with the GTR nucleotide substitution model and gamma-distributed rate heterogeneity (GTR+G), without bootstrapping, use:

```bash
raxml-ng --msa supergene.phy --model GTR+G --prefix raxml_tree --threads 2 --seed 123
```
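Both input formats are plain text, and the FastTree example earlier reads supergene.fa while the filtering pipeline writes supergene.phy. Converting between them is a common small chore. The minimal Python sketch below turns an aligned FASTA into relaxed PHYLIP: a header line with the number of sequences and sites, then one "name sequence" line per sample. It assumes the target program accepts relaxed PHYLIP (names of any length separated from the sequence by whitespace). The function names and input are hypothetical.

```python
def read_fasta(text):
    """Parse FASTA text into {name: sequence}; the name is the first word of the header.
    Duplicate names overwrite each other in this sketch."""
    records, name, chunks = {}, None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records[name] = "".join(chunks)
            name, chunks = line[1:].split()[0], []
        else:
            chunks.append(line)
    if name is not None:
        records[name] = "".join(chunks)
    return records


def to_relaxed_phylip(records):
    """Format aligned sequences as relaxed PHYLIP."""
    lengths = {len(seq) for seq in records.values()}
    if len(lengths) != 1:
        raise ValueError("sequences must be aligned (equal length)")
    n_sites = lengths.pop()
    width = max(len(name) for name in records)
    lines = [f"{len(records)} {n_sites}"]
    lines += [f"{name.ljust(width)} {seq}" for name, seq in records.items()]
    return "\n".join(lines) + "\n"


fasta_text = ">s1\nACGT\nAC\n>sample_2 some description\nACGTTC\n"  # hypothetical input
phylip_text = to_relaxed_phylip(read_fasta(fasta_text))
print(phylip_text, end="")
```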

To build the tree and run the bootstrap analysis together in one step, add the --all parameter:

```bash
# Tree search plus bootstrapping in one run; use a new --prefix (or add --redo)
# if result files from a previous run with the same prefix already exist
raxml-ng --all --msa supergene.phy --model GTR+G --prefix raxml_tree --threads 2 --seed 123
```


© 2024 shulou.com SLNews company. All rights reserved.
