

Fritz and Purvis' test in R


Here we test for phylogenetic signal in primate activity period (diurnal vs. nocturnal) using the D statistic (Fritz and Purvis 2010), as implemented in the R package caper. Because of the paper's publication date and delays in publishing the book, this method does not appear in The Comparative Approach in Evolutionary Anthropology and Biology, but it is worth knowing.

To begin, load the R package 'caper' (Orme et al. in press; see Section 1.1.2 for instructions on installing packages):

library(caper)

Next, download the data and tree, set your working directory to the folder containing the downloaded files, and load them into R:



primatedata <- read.table("Primatedata2.txt", sep = "\t", header = TRUE)
primatetree <- read.nexus("consensusTree_10kTrees_Version2.nex")



caper requires your phylogeny and your data to be combined in a special kind of R object called a comparative data object. Creating this object automatically matches the species names in the tree to those in your data, so there is no need to worry about the data being in the same order as the tree. phy is your tree, data is your data set, and names.col is the name of the column containing your species names.



primate <- comparative.data(phy = primatetree, data = primatedata, names.col = Binomial, vcv = TRUE, na.omit = FALSE, warn.dropped = TRUE)



NOTE: vcv = TRUE stores a variance-covariance matrix of your tree. na.omit = FALSE stops the function from removing species with missing data for some variables. warn.dropped = TRUE warns you about any species that are not in both the tree and the dataset and are therefore dropped from the comparative data object. Check that this list is what you expected, as it may reveal typos in your species names. To turn the warning off, use warn.dropped = FALSE.
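The kind of name matching described above can be illustrated in base R. This is only a minimal sketch with hypothetical species names (one containing a deliberate typo); it is not caper's implementation, and the variable names are invented for illustration.

```r
# Hypothetical tip labels and data-set species names (not the real primate data)
tree_tips  <- c("Homo_sapiens", "Pan_troglodytes", "Gorilla_gorilla", "Aotus_azarae")
data_names <- c("Homo_sapiens", "Pan_troglodites", "Gorilla_gorilla", "Aotus_azarae")  # note the typo

# Species present in both the tree and the data would be retained
matched <- intersect(tree_tips, data_names)

# Species present in only one of the two would be dropped
dropped_tips <- setdiff(tree_tips, data_names)  # in the tree, missing from the data
dropped_rows <- setdiff(data_names, tree_tips)  # in the data, missing from the tree

print(matched)
print(dropped_tips)
print(dropped_rows)
```

A misspelled name shows up on both sides of the mismatch, which is why the dropped list is a quick way to catch typos. In caper itself the dropped names should also be stored in the comparative data object (for example primate$dropped), but check ?comparative.data for the version you have installed.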


Estimating D

D is a measure of phylogenetic signal for binary traits (Fritz and Purvis 2010) and is calculated as follows:

D = (dobs - mean(db)) / (mean(dr) - mean(db))

dobs is equal to the number of character state changes required to get the observed distribution of character states at the tips of the phylogeny. To make dobs comparable among different trees and datasets, it is scaled using two null distributions: dr and db.

dr is the distribution of d values obtained from 1000 permutations in which the number of species with each character state is kept constant, but the values are shuffled among the tips of the phylogeny. Thus dr is the expected distribution of d values if character states are randomly distributed among species without respect to phylogeny.

db, on the other hand, is the expected distribution of d values if character states are distributed among species as expected under a Brownian motion model of evolution. db is generated by simulating a continuous trait along the phylogeny, then defining the character state at each tip according to a threshold value of the continuous trait. The threshold is chosen so that the number of tips with each character state is the same as in the observed data. d is then calculated, and the process is repeated 1000 times to get a distribution of d values (Fritz and Purvis 2010).
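The scaling formula and the two resampling steps can be sketched in base R. This is an illustration under stated assumptions, not caper's code: the function name scale_d, the toy d values, and the toy trait vector are all hypothetical, and the continuous trait here is drawn with rnorm as a stand-in rather than simulated along a phylogeny.

```r
# Scale an observed d value against the two null distributions.
# d_r: d values from random shuffles; d_b: d values from Brownian threshold simulations.
scale_d <- function(d_obs, d_r, d_b) {
  (d_obs - mean(d_b)) / (mean(d_r) - mean(d_b))
}

# Made-up null distributions, for illustration only
d_b <- c(18, 20, 22, 19, 21)  # Brownian expectation, mean = 20
d_r <- c(38, 40, 42, 39, 41)  # random expectation, mean = 40

scale_d(20, d_r, d_b)  # observed d equals the Brownian mean, so D = 0
scale_d(40, d_r, d_b)  # observed d equals the random mean, so D = 1
scale_d(10, d_r, d_b)  # observed d below the Brownian mean, so D = -0.5

# Random null: shuffle the observed states among tips, keeping the counts fixed
set.seed(42)
states   <- c(1, 1, 1, 0, 0, 0, 0, 0)  # hypothetical binary trait for 8 tips
shuffled <- sample(states)

# Threshold step: convert a continuous trait to a binary one so that the
# number of tips in state 1 matches the observed data.
# NOTE: x is a stand-in; a real analysis simulates it along the tree.
x           <- rnorm(length(states))
n_ones      <- sum(states)
thresholded <- as.integer(rank(-x, ties.method = "first") <= n_ones)
```

Both the shuffled and thresholded vectors keep the same number of 1s as the observed trait, which is what keeps the null d values comparable to dobs.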


D is 1 if the distribution of the binary trait is random with respect to phylogeny, and greater than 1 if the trait is more overdispersed than the random expectation. D is 0 if the binary trait is distributed as expected under the Brownian motion model of evolution, and less than 0 if the trait is more phylogenetically conserved than the Brownian expectation. The distributions dr and db can also be used to assign p-values to dobs: if dobs is larger than 95% of dr values, the trait is significantly more overdispersed than the random expectation; if dobs is smaller than 95% of db values, the trait is significantly more clumped than the Brownian expectation.
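The 95% rule described above amounts to counting how many null d values fall below or above the observed value. The following base R sketch illustrates that rule with made-up numbers; the helper names prop_below and prop_above are hypothetical, and this is not necessarily how caper computes its reported probabilities.

```r
# Proportion of null d values lying below / above the observed value
prop_below <- function(d_obs, null_d) mean(null_d < d_obs)
prop_above <- function(d_obs, null_d) mean(null_d > d_obs)

# Made-up null distributions, for illustration only
d_r <- 101:200  # random expectation
d_b <- 1:100    # Brownian expectation

# A very small observed d: smaller than 97% of Brownian values,
# so more clumped than the Brownian expectation at the 5% level
d_small <- 3
clumped <- prop_above(d_small, d_b) >= 0.95

# A very large observed d: larger than 97% of random values,
# so more overdispersed than the random expectation at the 5% level
d_large       <- 198
overdispersed <- prop_below(d_large, d_r) >= 0.95

print(prop_above(d_small, d_b))
print(prop_below(d_large, d_r))
```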



To estimate D we use the function 'phylo.d':

result <- phylo.d(data = primate, binvar = Nocturnal, permut = 1000)


To view the output, type "result" and press Enter. The output should look something like this, with slight differences due to the stochastic nature of the simulations:


Calculation of D statistic for the phylogenetic structure of a binary variable
  Data :  primatedata
  Binary variable :  Nocturnal

Phylogeny : primatetree

  Number of permutations :  1000


Estimated D :  -0.7107697
Probability of E(D) resulting from no (random) phylogenetic structure :  0
Probability of E(D) resulting from Brownian phylogenetic structure    :  0.98


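D is significantly less than 1 (the random expectation) but not significantly different from the Brownian expectation (D = 0), i.e., activity period is phylogenetically conserved rather than randomly distributed across the tree.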
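The interpretation above can be written out as a simple decision rule. The sketch below uses a stand-in list holding the values printed in the output, so it runs without caper; the element names DEstimate, Pval1 and Pval0 are assumptions about what phylo.d returns and should be checked with names(result), and the 0.05 cutoff is a conventional choice rather than something fixed by the method.

```r
# Stand-in for the object returned by phylo.d, using the values printed above.
# The element names are assumptions; inspect names(result) on a real object.
result_like <- list(DEstimate = -0.7107697, Pval1 = 0, Pval0 = 0.98)

alpha <- 0.05  # conventional significance level (an assumption)

# Pval1: probability of the estimate under no (random) phylogenetic structure
differs_from_random <- result_like$Pval1 < alpha

# Pval0: probability of the estimate under Brownian phylogenetic structure
differs_from_brownian <- result_like$Pval0 < alpha

cat("Estimated D:", round(result_like$DEstimate, 2), "\n")
cat("Significantly different from random (D = 1):", differs_from_random, "\n")
cat("Significantly different from Brownian (D = 0):", differs_from_brownian, "\n")
```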





Fritz, S. A., and A. Purvis. 2010. Selectivity in Mammalian Extinction Risk and Threat Types: a New Measure of Phylogenetic Signal Strength in Binary Traits. Conservation Biology 24:1042-1051.

Orme, C. D. L., R. P. Freckleton, G. H. Thomas, T. Petzoldt, S. A. Fritz, and N. J. B. Isaac. In press. caper: Comparative Analyses of Phylogenetics and Evolution in R. Methods in Ecology and Evolution.

Contributed by Natalie Cooper
