Looping over caper::phylo.d()
If you're using caper::phylo.d() to estimate the phylogenetic signal in a series of traits, you may want to use a for-loop, map() or apply() to run it over the whole set of traits instead of one at a time. However, if you've ever tried this, you may have run into a problem with the binvar argument: it does not accept what a for-loop, apply() etc. hands it as valid input. You'll get one of these two types of error:
Error in caper::phylo.d(compdat, ...) : 'var' is not a variable in data.
Error in caper::phylo.d(data = compdat, binvar = "var") :
'"var"' is not a variable in data.
However! Don't worry! Help is at hand!
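As was noticed by the GitHub user MaxKerney, this has to do with the following line of the phylo.d() function, which takes the expression you typed for binvar, unevaluated, and converts it to a string: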
binvar <- deparse(substitute(binvar))
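To see why this breaks inside a loop, here is a minimal sketch in base R. The function show_name() is a hypothetical stand-in for that one line, not code from caper. Because substitute() captures the expression the caller typed rather than its value, the function sees the name of the loop variable, or a string with its quotes still attached, instead of the column name.

```r
# Hypothetical stand-in for the first step of phylo.d(): turn whatever
# expression was passed as `binvar` into a character string.
show_name <- function(binvar) {
  deparse(substitute(binvar))
}

# A bare name typed directly is captured as intended.
direct <- show_name(var1)

# Inside a loop the function sees the loop variable's name, not its value.
traits <- c("var1", "var2")
looped <- character(length(traits))
for (i in seq_along(traits)) {
  trait <- traits[i]
  looped[i] <- show_name(trait)
}

# A quoted string keeps its quotes after deparse().
quoted <- show_name("var1")

print(direct)
print(looped)
print(quoted)
```

The second and third cases correspond to the two error messages above: a name that is not a column in the data, and a column name wrapped in an extra pair of quotes.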
I struggled with how to go about this, so I contacted the package maintainer, David Orme, who was kind enough to join the GitHub thread where some of us with this problem were hanging out and offer a solution. The solution wraps the phylo.d() call in eval() and substitute() so that the binvar argument receives a bare variable name, which is what the function expects.
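The idea can be tried out without caper at all. The sketch below uses a hypothetical toy function, capture_name(), that reads its argument the way phylo.d() reads binvar. substitute() swaps the placeholder this_var in the unevaluated call for a symbol built with as.name(), and eval() then runs the call as if the column name had been typed by hand.

```r
# Hypothetical stand-in for a function that, like phylo.d(), reads the
# name of its argument with deparse(substitute()).
capture_name <- function(binvar) {
  deparse(substitute(binvar))
}

traits <- c("var1", "var2")
captured <- character(length(traits))
for (i in seq_along(traits)) {
  # Build the call with the placeholder `this_var` replaced by the symbol
  # var1 or var2, then evaluate the finished call.
  call_i <- substitute(capture_name(binvar = this_var),
                       list(this_var = as.name(traits[i])))
  captured[i] <- eval(call_i)
}

print(captured)
```

Here the function receives var1 and var2 as bare names on each pass, rather than the name of the loop variable.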
Here is an example of the solution in use, with a small random tree generated from rtree() and some made-up data. What you'll get is a data frame with the D estimate, Pval1 and Pval0 of each trait in a separate row. The example was made by Orme.
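library(ape)    # rtree()
library(caper)  # comparative.data(), phylo.d()
library(dplyr)  # %>%, rename(), mutate()
library(tibble) # tibble()
# Random tree with 10 tips, to match the 10 values per trait below
tree <- rtree(10)
df <- tree$tip.label %>%
  as.data.frame() %>%
  rename(tip.label = ".") %>%
  mutate(var1 = c("a", "b", "b", "b", "b", "b", "b", "b", "b", "b"),
         var2 = c("a", "b", "b", "b", "b", "b", "b", "b", "b", "a"))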
# Build the comparative dataset once
ds <- comparative.data(tree, df, names.col=tip.label)
vars <- colnames(ds$data)
# Create the rows all at once, to avoid rbind (real performance hit in
# larger examples) although it does mean having to loop over indices not names below.
nvar <- length(vars)
result_df <- tibble(Feature = character(nvar), Destimate = numeric(nvar),
                    Pval1 = numeric(nvar), Pval0 = numeric(nvar))
for (idx in seq_along(vars)) {
  # as.name() turns the column name into a symbol that phylo.d() can deparse
  output <- eval(substitute(phylo.d(data = ds, binvar = this_var),
                            list(this_var = as.name(vars[idx]))))
  result_df[idx, 1] <- vars[idx]
  result_df[idx, 2] <- output$DEstimate
  result_df[idx, 3] <- output$Pval1
  result_df[idx, 4] <- output$Pval0
}