Skip to content
Snippets Groups Projects
Code owners
Assign users and groups as approvers for specific file changes. Learn more.
title: 'Project #31 - Data Extraction'
output:
  html_document: default
  toc: TRUE
always_allow_html: yes
![](https://www.idiv.de/fileadmin/content/Files_sDiv/sDiv_Workshops_Photos_Docs/sDiv_WS_Documents_sPlot/splot-long-rgb.png "sPlot Logo")

Timestamp: r date()
Drafted: Francesco Maria Sabatini
Version: 1.0

This report documents the data extraction for sPlot project proposal #31 - The adaptive value of xylem physiology within and across global ecoregions as requested by Daniel Laughlin and Jesse Robert Fleri

library(tidyverse)

library(knitr)
library(kableExtra)
library(viridis)
library(grid)
library(gridExtra)

#library(vegan)
#library(xlsx)
#library(caret)
#library(foreign)
#library(raster)

library(downloader)
library(sp)
library(sf)
library(rgdal)
library(rgeos)

#save temporary files
write("TMPDIR = /data/sPlot/users/Francesco/_tmp", file=file.path(Sys.getenv('TMPDIR'), '.Renviron'))
write("R_USER = /data/sPlot/users/Francesco/_tmp", file=file.path(Sys.getenv('R_USER'), '.Renviron'))
#rasterOptions(tmpdir="/data/sPlot/users/Francesco/_tmp")

Import data from sPlot 3.0

#Import sPlot data
load("/data/sPlot/releases/sPlot3.0/header_sPlot3.0.RData")
load("/data/sPlot/releases/sPlot3.0/DT_sPlot3.0.RData")
load("/data/sPlot/releases/sPlot3.0/Traits_CWMs_sPlot3.RData")
load("/data/sPlot/releases/sPlot3.0/SoilClim_sPlot3.RData")

Import data on xylem traits, provided by Jesse Robert Fleri on October 26th, 2020.

load("xylem_data.RData")

1 Extract plots from sPlot based on species with xylem traits

Extract all plots containing at least one species in the xylem list.

species_list <- xylem_data$Species
plot.sel <- DT2 %>%
  filter(DT2$species %in% species_list) %>%
  dplyr::select(PlotObservationID) %>%
  distinct() %>%
  pull(PlotObservationID)

#exclude plots without geographic information
header.xylem <- header %>%
  filter(PlotObservationID %in% plot.sel) %>% 
  filter(!is.na(Latitude))
#refine plot.sel
plot.sel <- header.xylem$PlotObservationID

DT.xylem <- DT2 %>% 
  filter(taxon_group %in% c("Vascular plant", "Unknown")) %>% 
  filter(PlotObservationID %in% plot.sel)

Out of the r length(species_list) species in the sRoot list, r sum(unique(DT2$species) %in% species_list) species are present in sPlot, for a total of r nrow(DT.xylem %>% filter(species %in% species_list)) records, across r length(plot.sel) plots.

2 Extract woody species

This is partial selection, as we don't have information on the growth form of all species in sPlot

#load list of woody species, as provided to me by Alexander Zizka, within sPlot project #21
load("../Project_21/_input/evowood_species_list.rda")

#Select all woody species and extract relevant traits from TRY
woody_species_traits <- sPlot.traits %>%
  dplyr::select(species, GrowthForm, is.tree.or.tall.shrub, n,
                starts_with("StemDens"),
                starts_with("Stem.cond.dens"), 
                starts_with("StemConduitDiameter"),
                starts_with("LDCM"),
                starts_with("SLA"),
                starts_with("PlantHeight"), 
                starts_with("Wood"), 
                starts_with("SpecificRootLength_mean")) %>%
  filter( (species %in% species_list) |
          (species %in% synonyms$name_binomial) |
          grepl(pattern = "tree|shrub", x = GrowthForm) |
          is.tree.or.tall.shrub==T
              ) %>% 
  #counter proof - exclude species NOT herb
  filter(GrowthForm != "herb" | is.na(GrowthForm))

table(woody_species_traits$GrowthForm, exclude=NULL)
# MEMO: some standardization needed in sPlot 3.0
#
# Using data from A.Zizka whhen selecting species
# improves the selection only marginally (from ~21k to ~22k)
knitr::kable(woody_species_traits %>% 
               sample_n(20), caption="Example of gap-filled trait data from TRY (20 randomly selected species)") %>%
    kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive"), full_width = F, position = "center")

Selected traits are:

  • StemDens - 4 - Stem specific density (SSD) or wood density (stem dry mass per stem fresh volume) (g/cm^3)
  • SLA - 11 - Leaf area per leaf dry mass (specific leaf area, SLA or 1/LMA)
  • PlantHeight - 18 - Plant height (vegetative + generative)
  • StemDiam - 21 - Stem diameter (m)
  • Stem.cond.dens - 169 - Stem conduit density (vessels and tracheids) (mm^-2)
  • StemConduitDiameter - 281 - Stem conduit diameter (vessels, tracheids)_micro (m)
  • Wood.vessel.length - 282 - Wood vessel element length; stem conduit (vessel and tracheids) element length_micro (m)
  • WoodFiberLength - 289 - Wood fiber lengths_micro (m)
  • SpecificRootLength - 1080 - Root length per root dry mass (specific root length, SRL) (cm/g)

Codes correspond to those reported in TRY

#subset DT.xylem to only retain woody species
DT.xylem <- DT.xylem %>% 
  filter(species %in% (woody_species_traits$species))
nrow(DT.xylem)

Merge relative cover across vegetation layers, if needed, and normalize to 1 (=100%)

###combine cover values across layers
combine.cover <- function(x){
    while (length(x)>1){
      x[2] <- x[1]+(100-x[1])*x[2]/100
      x <- x[-1]
    }
    return(x)
}

DT.xylem <- DT.xylem %>%
  dplyr::select(PlotObservationID, species,Layer, Relative.cover) %>%
  # normalize relative cover to 100
  left_join({.} %>% 
              group_by(PlotObservationID, Layer) %>% 
              summarize(Tot.Cover=sum(Relative.cover), .groups="drop"), 
            by=c("PlotObservationID", "Layer")) %>% 
  mutate(Relative.cover=Relative.cover/Tot.Cover) %>% 
  group_by(PlotObservationID, species) %>%
  summarize(Relative.cover=combine.cover(Relative.cover), .groups="drop") %>%
  ungroup()

nrow(DT.xylem)

3 Calculate CWMs and trait coverage

Calculate CWM and trait coverage for each trait and each plot. Select plots having more than 80% coverage for at least one trait.

# Merge species data table with traits
CWM.xylem0 <- DT.xylem %>%
  as_tibble() %>%
  dplyr::select(PlotObservationID, species, Relative.cover) %>%
  left_join(xylem_data %>%
              dplyr::rename(species=Species) %>%
              dplyr::select(species, P50, Ks), 
            by="species")

# Calculate CWM for each trait in each plot
CWM.xylem1 <- CWM.xylem0 %>%
  group_by(PlotObservationID) %>%
  summarize_at(.vars= vars(P50:Ks),
               .funs = list(~weighted.mean(., Relative.cover, na.rm=T)), 
               .groups="drop") %>%
  dplyr::select(PlotObservationID, order(colnames(.))) %>%
  pivot_longer(-PlotObservationID, names_to="trait", values_to="trait.value")

# Calculate coverage for each trait in each plot
CWM.xylem2 <- CWM.xylem0 %>%
  mutate_at(.funs = list(~if_else(is.na(.),0,1) * Relative.cover), 
            .vars = vars(P50:Ks)) %>%
  group_by(PlotObservationID) %>%
  summarize_at(.vars= vars(P50:Ks),
               .funs = list(~sum(., na.rm=T)), 
               .groups="drop") %>%
  dplyr::select(PlotObservationID, order(colnames(.))) %>%
  pivot_longer(-PlotObservationID, names_to="trait", values_to="trait.coverage")

# Calculate CWV
variance2.fun <- function(trait, abu){
  res <- as.double(NA)
  #nam <- nam[!is.na(trait)]
  abu <- abu[!is.na(trait)]
  trait <- trait[!is.na(trait)]
  abu <- abu/sum(abu)
  if (length(trait)>1){
    # you need more than 1 observation to calculate
    # skewness and kurtosis
    # for calculation see 
    # http://r.789695.n4.nabble.com/Weighted-skewness-and-curtosis-td4709956.html
    m.trait <- weighted.mean(trait,abu)
    res <- sum(abu*(trait-m.trait)^2)
  }
  res
}

CWM.xylem3 <- CWM.xylem0 %>%
  group_by(PlotObservationID) %>%
  summarize_at(.vars= vars(P50:Ks),
               .funs = list(~variance2.fun(., Relative.cover))) %>%
  dplyr::select(PlotObservationID, order(colnames(.))) %>%
  pivot_longer(-PlotObservationID, names_to="trait", values_to="trait.variance")

## Calculate proportion of species having traits
CWM.xylem4 <- CWM.xylem0 %>%
  group_by(PlotObservationID) %>%
  summarize_at(.vars= vars(P50:Ks),
               .funs = list(~sum(!is.na(.)))) %>%
  dplyr::select(PlotObservationID, order(colnames(.))) %>%
  pivot_longer(-PlotObservationID, names_to="trait", values_to="trait.nspecies")

# Join together
CWM.xylem <- CWM.xylem1 %>%
  left_join(CWM.xylem2, by=c("PlotObservationID", "trait")) %>%
  left_join(CWM.xylem3, by=c("PlotObservationID", "trait")) %>%
  left_join(CWM.xylem4, by=c("PlotObservationID", "trait")) %>%
  left_join(CWM.xylem0 %>% 
              group_by(PlotObservationID) %>%
              summarize(sp.richness=n()), by=c("PlotObservationID"), 
            .groups="drop") %>%
  mutate(trait.coverage.nspecies=trait.nspecies/sp.richness) %>%
  #filter(trait.coverage>=0.8) %>%
  arrange(PlotObservationID)
knitr::kable(CWM.xylem[1:20,], caption="Example of CWM data file") %>%
    kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive"), full_width = F, position = "center")
CWM.xylem08 <- CWM.xylem %>%
  filter(trait.coverage>=0.8)
knitr::kable(CWM.xylem08 %>%
               group_by(trait) %>%
               summarize("num.plots"=n(),.group="drop"), caption="Number of plots with >=.8 coverage per trait") %>%
    kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive"), full_width = F, position = "center")
#Create list of plots having at least one trait with >=.8 coverage and extract header data

#plot80perc <- (CWM.xylem %>%
#  dplyr::select(PlotObservationID) %>%
#  distinct())$PlotObservationID

#DT.xylem08 <- DT.xylem %>%
#  filter(PlotObservationID %in% header.xylem$PlotID)

#CWM.xylem <- CWM.xylem %>%
#  filter(PlotObservationID %in% header.xylem$PlotID) 

Completeness of header data

knitr::kable(data.frame(Completeness_perc=colSums(!is.na(header.xylem))/
                          nrow(header.xylem)*100)[-c(1,2),,drop=F], 
             caption="Header file - Columns present and % completeness") %>%
    kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive"), full_width = F, position = "center")

The process results in r nrow(header.xylem) plots selected, for a total of r nrow(CWM.xylem) trait * plot combinations.

Geographical distribution of plots

countries <- map_data("world")
ggworld <- ggplot(countries, aes(x=long, y=lat, group = group)) +
  geom_polygon(col=NA, lwd=3, fill = gray(0.9)) +
  geom_point(data=header.xylem, aes(x=Longitude, y=Latitude, group=1), col="red", alpha=0.5, cex=0.7, shape="+") + 
  theme_bw()
ggworld

Summarize data across data sets in sPlot, and create list of data custodians

db.out <- read_csv("/data/sPlot/users/Francesco/_sPlot_Management/Consortium/Databases.out.csv") %>%
  dplyr::select(`GIVD ID`, Custodian)
data.origin <- header.xylem %>% 
   group_by(`GIVD ID`) %>% 
   summarize(Num.plot=n(), .groups="drop") %>%
   left_join(db.out)
knitr::kable(data.origin, caption="Data Origin") %>%
    kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive"), full_width = F, position = "center")

The data derive from r nrow(data.origin) datasets.

4 Extract climate and soils data

soilclim.xylem <- soilclim %>% 
  filter(PlotObservationID %in% plot.sel) %>% 
  rename(Elevation=Elevation_median, -Elevation_q2.5, -Elevation_q97.5)
knitr::kable(soilclim.xylem %>% 
               sample_n(20), caption="Example of climatic and soil variables for 20 randomly selected plots. All values represent the mean in a circle centered on the plot coordinates, having a radius equal to the plot's location uncertainty (capped to 50 km for computing reasons). Sd is also reported.") %>%
    kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive"), full_width = F, position = "center")

The procedure used to obtain these environmental predictors is described here (Click to download the report)

Bioclimatic variables (bio01-bio19) derive from CHELSA

Codes:

Bio1 = Annual Mean Temperature
Bio2 = Mean Diurnal Range
Bio3 = Isothermality
Bio4 = Temperature Seasonality
Bio5 = Max Temperature of Warmest Month
Bio6 = Min Temperature of Coldest Month
Bio7 = Temperature Annual Range
Bio8 = Mean Temperature of Wettest Quarter
Bio9 = Mean Temperature of Driest Quarter
Bio10 = Mean Temperature of Warmest Quarter
Bio11 = Mean Temperature of Coldest Quarter
Bio12 = Annual Precipitation
Bio13 = Precipitation of Wettest Month
Bio14 = Precipitation of Driest Month
Bio15 = Precipitation Seasonality
Bio16 = Precipitation of Wettest Quarter
Bio17 = Precipitation of Driest Quarter
Bio18 = Precipitation of Warmest Quarter
Bio19 = Precipitation of Coldest Quarter

\newline \newline

Soil variables (5 cm depth) derive from the ISRIC dataset, downloaded at 250-m resolution

CECSOL Cation Exchange capacity of soil
CLYPPT Clay mass fraction in %
CRFVOL Coarse fragments volumetric in %
ORCDRC Soil Organic Carbon Content in g/kg
PHIHOX Soil pH x 10 in H20
SLTPPT Silt mass fraction in %
SNDPPT Sand mass fraction in %
BLDFIE Bulk Density (fine earth) in kg/m3
P.ret.cat Phosphorous Retention - Categorical value, see ISRIC 2011-06
\newline \newline

5 Export & SessionInfo

save( woody_species_traits, DT.xylem, CWM.xylem, header.xylem, 
      file="_derived/Xylem_sPlot.RData" )
sessionInfo()