Maximum likelihood (ML) is based on calculating the probability of the observed data given a hypothesis. In phylogenetic inference, the alternative hypotheses are the various trees that can be drawn for a set of taxa. In an ML search, we aim to find the tree that, given our evolutionary model, results in the highest likelihood of obtaining the data we observe. Thus, we are maximizing the likelihood of the data under the model of evolution that we choose; model choice is covered in more detail in Section 2.5.
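In symbols, the quantity being maximized can be sketched as follows. The notation is not from the original text and is only one conventional way to write it: D is the alignment, D_i is site (column) i of n sites, T is a candidate tree with its branch lengths, and M is the chosen model of evolution. The product form assumes that sites evolve independently, which is the usual simplifying assumption.

```latex
L(T) = P(D \mid T, M) = \prod_{i=1}^{n} P(D_i \mid T, M),
\qquad
\hat{T} = \arg\max_{T} L(T)
```

Because a product of many small probabilities quickly becomes a very small number, software typically reports the log-likelihood, which turns the product into a sum over sites.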

ML is a different approach to tree inference than parsimony, but there are important similarities. For example, both ML and parsimony search tree space in similar ways. When the number of taxa becomes large, one must usually devise a strategy for evaluating only a subset of all possible trees, because no computer is fast enough to evaluate every one: the number of possible trees grows faster than exponentially with the number of taxa. In addition, ML shares with parsimony the goal of finding the "best" tree topology. The difference is that, rather than finding the tree that minimizes the number of inferred changes, ML finds the tree that maximizes the likelihood of the data.
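To give a sense of how quickly tree space grows, here is a small base-R sketch. It relies on a standard result that is not stated in the text above: for n taxa there are (2n - 5)!! = 3 × 5 × ... × (2n - 5) distinct unrooted, fully bifurcating trees. The function name n_unrooted_trees is made up for this illustration.

```r
# Count the distinct unrooted, fully bifurcating trees for n taxa (n >= 3),
# using the double factorial (2n - 5)!! = 3 * 5 * ... * (2n - 5).
# 'n_unrooted_trees' is an illustrative name, not a function from any package.
n_unrooted_trees <- function(n) {
  stopifnot(n >= 3)
  if (n == 3) return(1)            # three taxa have a single unrooted tree
  prod(seq(3, 2 * n - 5, by = 2))  # product of the odd numbers 3 .. (2n - 5)
}

# Print the count for a few numbers of taxa
for (n in c(4, 5, 10, 20, 50)) {
  cat(sprintf("%2d taxa: %.3g unrooted trees\n", n, n_unrooted_trees(n)))
}
```

With 10 taxa there are already more than two million trees, which is why heuristic searches are needed for all but the smallest datasets.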

That may sound confusing at first, but it can be understood if you consider that simply counting changes fails to make use of all the information available. For example, the parsimony method effectively assumes that the probability of a change happening is the same regardless of how long ago two taxa diverged. This is an unrealistic assumption: if the probability of change is roughly constant over time, taxa that diverged in the more distant past have had more time for evolutionary changes to take place. ML uses an explicit probability model that can account for the greater number of evolutionary changes expected on long branches relative to short ones (see Section 2.5).
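The effect of branch length can be illustrated with the simplest substitution model. The sketch below assumes the Jukes-Cantor (JC69) model, which is a choice made here for illustration and not necessarily the model used later in this tutorial. Under JC69, the probability that a site differs between the two ends of a branch of length d (expected substitutions per site) is 3/4 × (1 - exp(-4d/3)). The function name jc69_p_diff is hypothetical.

```r
# Probability that a site differs across a branch of length d under JC69.
# d is measured in expected substitutions per site.
# 'jc69_p_diff' is an illustrative name; JC69 is assumed for simplicity.
jc69_p_diff <- function(d) {
  0.75 * (1 - exp(-4 * d / 3))
}

# Compare short and long branches
branch_lengths <- c(0.01, 0.1, 0.5, 1, 2)
data.frame(
  branch_length = branch_lengths,
  p_different   = round(jc69_p_diff(branch_lengths), 3)
)
```

The probability of observing a difference rises with branch length and levels off near 0.75, because repeated substitutions at the same site start to overwrite one another. A likelihood model takes this into account, whereas a simple count of changes does not.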

Maximum likelihood example in R

In later sections, we will use R and other programs to select a model of evolution, and as part of that process, we will infer a phylogeny using maximum likelihood. Before proceeding, however, it is worth noting that the R package 'phangorn', which was used in the previous two sections, provides some simple tools to compare the likelihood of the data under different models of evolution or among different phylogenies. Keep in mind that we are using a small dataset of just 150 nucleotides, and this is not meant to be a definitive analysis. Rather, this example simply illustrates that many fundamental steps of a maximum likelihood analysis can be conducted easily in R. First, load the two packages we will need for this section (see subsection 1.1.3 for instructions on installing packages): 
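A minimal sketch of that step, assuming the two packages are 'phangorn' and 'ape' (the text names only phangorn, so ape is an assumption here):

```r
# Load the packages used in this section.
# 'phangorn' is named in the text; 'ape' is assumed to be the second package.
library(ape)
library(phangorn)
```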


The phangorn function 'pml' provides a way to compute the likelihood of the data given a phylogenetic tree and evolutionary model. Drawing on the trees obtained in Section 2.2, let's do some simple likelihood computations (you will need to first complete Section 2.2 to run these analyses).
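As a rough sketch of what such a computation can look like, the example below uses the Laurasiatherian alignment that ships with phangorn as a stand-in, because the dataset and trees from Section 2.2 are not reproduced here. The two trees (neighbor-joining and UPGMA) and the object names are likewise assumptions made for illustration, not necessarily the trees built in Section 2.2.

```r
library(ape)
library(phangorn)

# Stand-in data: an example alignment bundled with phangorn
# (an assumption; substitute the alignment from Section 2.2).
data(Laurasiatherian)

# Two candidate trees built from a distance matrix
dm         <- dist.ml(Laurasiatherian)
tree_nj    <- NJ(dm)      # neighbor-joining tree
tree_upgma <- upgma(dm)   # UPGMA tree

# Likelihood of the data given each tree.
# With no further arguments, pml() uses the Jukes-Cantor model.
fit_nj    <- pml(tree_nj, data = Laurasiatherian)
fit_upgma <- pml(tree_upgma, data = Laurasiatherian)

# Log-likelihoods: the value closer to zero indicates the better fit
logLik(fit_nj)
logLik(fit_upgma)
```

Both fits use the same data and the same model, so the two log-likelihoods can be compared directly to see which tree explains the data better.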