This report documents the data extraction for **sPlot project proposal #31** - *Global variation in fine root traits along climate and soil gradients* as requested by Daniel Laughlin and Alexandra Weigelt.
This R script:
1) Extracts plots from sPlot based on species with root traits,
2) Calculates CWM for plots with >80% trait coverage,
3) Extracts climate and soils data,
4) Classifies plots into broad vegetation types based on species' affinities,
5) Shows the distribution of these root traits across vegetation types (forest, grassland, etc)
6) Returns scatterplots of root traits along climate and soil variables.
# 1 Extract plots from sPlot based on species with root traits
Import data on root traits provided by the sRoot consortium (v 30 sept. 2019). Extract all plots containing at least one species in sRoot list. ~~(Matching could be marginally improved if checking names in sRoot's list with TPL).~~
Out of the `r length(species_list)` species in the sRoot list, `r sum(unique(DT2$species) %in% species_list)` species are present in sPlot, for a total of `r length(unique(DT.sRoot$PlotObservationID))` records.
# 2 Calculate CWMs
Calculate CWM and trait coverage for each trait and each plot. Select plots having more than 80% coverage for at least one trait.
Download and unzip rasters from [Global Aridity Index and Potential Evapotranspiration (ET0) Climate Database v2](https://figshare.com/articles/Global_Aridity_Index_and_Potential_Evapotranspiration_ET0_Climate_Database_v2/7504448/3)
The basic idea for the plots (true for forests, not fully true for grasslands, admittedly) is that N supply is determined by the decomposition of the organic matter and the nitrogen mineralisation during the decomposition.
That means that we discount (1) nitrogen deposition; (2) nitrogen fixation (assuming that very few plots are so dominated by legumes that they overwhelm the signal of nitrogen through mineralisation), (3) application of nitrogen fertiliser.
We also assume that most plots will be nitrogen-limited, so losses of nitrogen (when mineralisation is in excess of plant nitrogen demand) is equally discounted in the estimate of nitrogen supply. Under that assumption we can estimate nitrogen supply from soil organic matter.
In order to do that we need a brief recap which factors affect nitrogen mineralisation, factors that can be easily included in the model.
*Factor 1 - Temperature*
For simplicity we assume that the average annual soil temperature in the uppermost 10-20 cm is equal to the average annual temperature. From models on decomposition we chose a model developed by Yang (a PhD from Wageningen). His temperature function is:
(1) At average temperatures below 0 °C there is no decomposition and N mineralisation (as there is no water which is crucial for both enzyme activity and accessibility of soil organic matter in the soil solution). The few arctic and alpine plots will then possibly have estimates of zero N supply.
(2) The model for temperature works for an average temperature of 9° C, so that has the standard value of 1. For other temperatures we then apply a correction factor.
(3) For temperatures between 0 °C and 9 °C, we use: F = 0.1 (T+1); so the actual model yields 0 at T = -1 °C and f = 1 at 9 °C
(4) For temperatures between 10 and 27 °C we use F = 2^(T-9)/9. So at 9 ° C the formula against yields f = 1; for 18 °C the formula yields f = 2 (which suggests the formula has a Q10, aka temperature sensitivity or slightly over 2).
*Factor 2 - pH and Soil C org*
The second correction factor has to do with what is called protection of organic matter by mineral association. Mineral association protects negatively charged organic matter, if the minerals have positive charge. (There is some protection of positively charged organic matter, e.g. basic amino acids, on negatively charged mineral particles such as clay). However, the database might not be very accurate for clay content, and would have not data for different kinds of clay. So we might overcome this by again simplifying assumptions. I assume that protection is mainly be metal oxides and these have a more positive charge at lower pH (and hence organic matter is better protected against decomposition at lower pH). So I propose to include a second protection factor, based on pH. This simplification is also in QUEFTS and a paper by Sattari et al. (2014) that is based on QUEFTS, a model to estimate soil nutrient supply for agricultural soils, so as to calculate which part of plant nutrient demand can be covered by soil supply rather than fertilizer supply.:
SN = αN * FN * Corg
with
αN: an empirical parameter, set at 6.8
FN: pH-dependent coefficient to take into account the availability of organic matter in relation to mineral association
Corg : the fraction soil organic matter in mg kg-1. C is 50% of SOM content.
The factor N is defined as:
FN = 0.4 for pH < 4.7
FN = 0.25*(pH-3) for 4.7 < pH < 7
FN = 1 for pH > 7
The combination of Factor 1 and Factor 2 would give a reasonable estimate of N supply
Extract data on ecoregion and join into the `env.sRoot` data.frame. Ecoregion derive from [Olson et al. (2001)](https://doi.org/10.1641/0006-3568(2001)051[0933:TEOTWA]2.0.CO;2).
Bioclimatic variables (bio01-bio19) derive from [CHELSA](http://chelsa-climate.org/bioclim/)
Codes:
Bio1 = Annual Mean Temperature
Bio2 = Mean Diurnal Range
Bio3 = Isothermality
Bio4 = Temperature Seasonality
Bio5 = Max Temperature of Warmest Month
Bio6 = Min Temperature of Coldest Month
Bio7 = Temperature Annual Range
Bio8 = Mean Temperature of Wettest Quarter
Bio9 = Mean Temperature of Driest Quarter
Bio10 = Mean Temperature of Warmest Quarter
Bio11 = Mean Temperature of Coldest Quarter
Bio12 = Annual Precipitation
Bio13 = Precipitation of Wettest Month
Bio14 = Precipitation of Driest Month
Bio15 = Precipitation Seasonality
Bio16 = Precipitation of Wettest Quarter
Bio17 = Precipitation of Driest Quarter
Bio18 = Precipitation of Warmest Quarter
Bio19 = Precipitation of Coldest Quarter
Soil variables derive from the [ISRIC dataset](https://www.isric.org/), downloaded at 250-m resolution
CECSOL Cation Exchange capacity of soil
CLYPPT Clay mass fraction in %
CRFVOL Coarse fragments volumetric in %
ORCDRC Soil Organic Carbon Content in g/kg
PHIHOX Soil pH x 10 in H20
SLTPPT Silt mass fraction in %
SNDPTT Sand mass fraction in %
BLDFIE Bulk Density (fine earth) in kg/m3
P.ret.cat Phosphorous Retention - Categorical value, see [ISRIC 2011-06](https://www.isric.org/documents/document-type/isric-report-201106-global-distribution-soil-phosphorus-retention-potential)
N.f1 - Nitrogen factor 1
N.f2 - Nitrogen factor 2
N.est - N.f1 * N.f2 (See description above)
Aridity Index and Potential Evapotranspiration are downloaded from [Global Aridity Index and Potential Evapotranspiration (ET0) Climate Database v2](https://figshare.com/articles/Global_Aridity_Index_and_Potential_Evapotranspiration_ET0_Climate_Database_v2/7504448/3) at 30 arc-seconds.
Trabucco, Antonio; Zomer, Robert (2019): Global Aridity Index and Potential Evapotranspiration (ET0) Climate Database v2. figshare.
# 4 Classifiy plots into broad vegetation types based on species' affinities
sPlot has two independent systems for classifying plots to vegetation types. The first, classifies plots into forest and non-forest, based on the share of phanerophytes. The second system classifies plots into broad habitat types and relies on the expert opinion of data contributors. This is, unfortunately, not consistently available across all plots, being the large majority of classified plots only available for Europe. These broad habitat types are coded using 5, non-mutually exclusive dummy variables:
1) Forest - F
2) Grassland - G
3) Shrubland - S
4) Sparse vegetation - B (Bare)
5) Wetland - W
A plot may belong to more than one formation, e.g. a Savannah is categorized as Forest + Grassland (FG).
On top of the native (but incomplete) classification of plots into vegetation types, we ran ~~two~~ *one* classifications based on ~~a) Cluster analysis, or b)~~ species' affinities
*from v1.8 on, classification based on cluster analysis is discontinued*
Species affinities were assigned based on expert-opinion. Based on the relative cover of each species, we summarized species based on their habitat affinities, and summarized their relative cover. We then assigned each plot to an habitat based on the following ifelse conditions:
wetland>0.5 -> "Wetland"
forest>0.3 -> "Forest"
grassland+heathland+steppe>0.7 -> "Grassland"
"Other", otherwise.
```{r }
# Added in v1.4 - Add results of cluster analysis
# based on beal's smoothing as performed by Helge Bruelheide
#Build a confusion matrix to evaluate the comparison
u <- union(Habitat.compare$TopHabitat, Habitat.compare$HabitatObs)
t <- table( factor(Habitat.compare$TopHabitat, u), factor(Habitat.compare$HabitatObs, u))
confm <- caret::confusionMatrix(t)
```
```{r echo=F}
knitr::kable(confm$table, caption="Confusion matrix between sPlot's native classification of habitats (columns), and classification based on species' habitat affinity (rows)") %>%
Formulas of associated statistics are available on the help page of the [caret package](https://www.rdocumentation.org/packages/caret/versions/6.0-84/topics/confusionMatrix) and associated references.
The overall accuracy of the classification based on species' habitat affinity, when tested against sPlot's native habitat classification is `r round(confm$overall[1],2)`, the Kappa statistics is `r round(confm$overall[2],2)`.
```{r echo=F}
knitr::kable(t(confm$byClass), caption="Associated statistics of confusion matrix by class") %>%