Duke Wiki  logo
Page tree
Skip to end of metadata
Go to start of metadata

 

Calculating Independent Contrasts in R (caper, ape)

In this section, we will use R to calculate contrasts for a small dataset on home range size and group size in five species of primates, which corresponds to the example in Chapter 7 of the book (although here, we use log-transformed data). Different R packages are available that can facilitate the calculation of independent contrasts. Here, we will use the package caper (Orme et al in press).

To begin, load the R package caper (Orme et al in press; see Section 1.1.2 for instructions on installing packages):

library("caper")
 

 

Next download the data set and the phylogeny below:

 

 

 

 

Next, navigate to the folder containing the files you downloaded and read the tree file into R: 

 

tree_list = read.nexus("ProbSet5.tree.nex")

 

How many trees are in this tree file? How many tips does the 1st tree have? What is the variance in the branch lengths? Use the 'summary' function, recalling that you can access particular trees with the double brackets (e.g., type "tree_list[[1]]")

 

Let’s plot the second tree in the file by passing that tree to a new variable, “tree”, as follows:

 

tree = tree_list[[2]]
plot(tree)

 

Now, let’s read in the data:

 

ds = read.table("ProbSet5.data.txt", header=T)

Type "ds" to see your data. The 'caper' package requires your phylogeny and data to be in a special kind of R object called a comparative data object. This will match the species names in the tree to those in your data automatically so there is no need to worry about data being ordered correctly (as in some of the other packages in R). For the arguments, "phy" is your tree, "data" is your data set, and "names.col" is the name of the column containing your species names.

 

     primate = comparative.data(phy = tree, data = ds, names.col = Species, vcv = TRUE, na.omit =        FALSE, warn.dropped = TRUE)

NOTE: "vcv = TRUE" stores a variance covariance matrix of your tree. It is not needed for independent contrasts, but you will need this for pgls later. "na.omit = FALSE" stops the function from removing species without data for certain variables. "warn.dropped = TRUE" will tell you if there are any species which are not in the tree and the dataset and are therefore dropped from the data object. You can see the list of dropped species by typing primate$dropped. Make sure you check this list is what you expected, it may reveal typos in your species names. If you want to turn this off use "warn.dropped = FALSE". Note that the comparative.data function does all the matching of data and tree for you.

Now we can perform a regression using independent contrasts using the function 'crunch' to investigate the correlation between log home range size and log female body mass using independent contrasts.

 

      model.ic = crunch(log(HomeRange) ~ log(FemMass), data = primate) 
summary(model.ic)

 

The output should look like this. Note the significant positive correlation and high r-squared. Also note that here we take the log of the data, whereas in the book the raw data is used. By removing the two "log" terms above, you will obtain nearly the same results as presented in the book.

 

Call:
lm(log(HomeRange) ~ log(FemMass) - 1, data = contrData)
Residuals:
        6         7         8         9 
-0.042529  0.192080  0.004324 -0.023918 
Coefficients:
             Estimate Std. Error t value Pr(>|t|)  
log(FemMass)   1.3292     0.2741   4.849   0.0167 *
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 
Residual standard error: 0.1144 on 3 degrees of freedom
Multiple R-squared: 0.8869,     Adjusted R-squared: 0.8492 
F-statistic: 23.52 on 1 and 3 DF,  p-value: 0.01673 

 

 

You can examine the contrasts using the function 'caic.table' with the model as output, as in:

caic.table(model.ic)

 

Note that the table provides the standardized contrasts for each of the variables, along with the variance (the branch length connecting the species or nodes), the number of descendent lineages from each node, the node depth, and the studentized residuals (used to identify outliers).

 

Now, let’s display the independent contrasts for each node on the tree:

 

plot(tree)
primatecontrasts<-caic.table(model.ic)  #this extracts the contrasts from model.ic
nodelabels(round(primatecontrasts[,1], 3), adj = c(0, -0.5),frame = "n")
nodelabels(round(primatecontrasts[,2], 3), adj = c(0, 1),frame = "n")

 

 

 

To display the regression results and the contrasts graphically:

 plot(primatecontrasts[,1] ~ primatecontrasts[,2], xlab = "FemMass", ylab = "HomeRange")

 

 abline(model.ic)
 

Now you can run an independent contrasts analysis in R!

 

 

Other considerations

Outliers

Outliers can seriously affect the parameter estimates for any regression model and independent contrasts regressions are no exception. Luckily caper has an inbuilt option which automatically removes outliers with studentized residuals >±3 (Jones and Purvis 1997), using the argument "robust=TRUE".

 

model.ic2<-crunch(log(HomeRange) ~ log(FemMass), data = primate, robust = TRUE)
summary(model.ic2)

 

 

Note that one outlier has automatically been removed.

 

Model diagnostic plots

It is generally worth checking model diagnostic plots whenever you fit a model in R to check that your data meet the assumptions of linear modelling. The method for this is the same for OLS, independent contrasts and PGLS models (though the graphs are slightly different):

par(mfrow=c(2,2)) #so you can see all 4 plots at once
plot(model.ic)
par(mfrow=c(1,1)) #to reset the graphic window to just one graph

 

 


Here are the model diagnostics for our model above, and referred in text below to as 1,2 on top, 3,4 on bottom:

 

 

Here is what you want to see in these plots:

1 and 3) A

fairly random scattering of points. You want to avoid any clear patterns.

2) The points of the Q-Q plot should approximately form a straight line (rather than a banana shape).

4) No points with high leverage (i.e., very high influence over the regression line). These points will fall outside the dotted red lines. Note the one outlier which was removed (using robust = TRUE) in the outlier removal example above.

 

 

It takes practice to know what is “good”, “bad” and “acceptable” with these plots. See an R or statistics book for further details on these plots and model checking in general.

 

References

 

 

Jones, K. E. & Purvis, A. 1997. An optimum body size for mammals? Comparative evidence from bats. Funct. Ecol. 11: 751-756.

Orme, C. D. L., Freckleton, R. P., Thomas, G. H., Petzoldt, T., Fritz, S. A. & Isaac, N. J. B. in press. caper: Comparative Analyses of Phylogenetics and Evolution in R. Methods Ecol. Evol.

 

 

Contributed by Natalie Cooper and Charlie Nunn

  • No labels