diff --git a/code/06_buildDT.Rmd b/code/06_buildDT.Rmd index 605b102fb1238e9cd79bb9e3de46f590ea238ad5..1979e32765363c7a16698fc9de9f36198b3ac509 100644 --- a/code/06_buildDT.Rmd +++ b/code/06_buildDT.Rmd @@ -17,12 +17,12 @@ output: html_document This report documents the construction of the DT table for sPlot 3.0. It is based on dataset sPlot_3.0.2, received on 24/07/2019 from Stephan Hennekens. -Caution: Layer information is not available for each species in each plot. In case of missing information Layer is set to zero. +Caution: Layer information is not available for all species in each plot. In case of missing information Layer is set to zero. -*Changes in version 1.1* - -1) Added explanation of fields -2) Fixed `taxon_group` of Friesodielsia - +*Changes in version 1.1* +1) Added explanation of fields +2) Fixed `taxon_group` of Friesodielsia +3) Only export the fields `Ab_scale` and `Abundance` ```{r results="hide", message=F, warning=F} knitr::opts_chunk$set(echo = TRUE) library(tidyverse) @@ -74,7 +74,7 @@ DT0 <- DT0 %>% ``` -Species data include `r nrow(DT0)` species * plot records, across `r nplots` plots. Before taxonomic resolution, there are `r nspecies` species . +The DT table includes `r nrow(DT0)` species * plot records, across `r nplots` plots. Before taxonomic resolution, there are `r nspecies` species . \newline @@ -301,9 +301,9 @@ DT1 %>% slice(1:40) ``` -## Calculate relative cover per layer per species in each plot +## Standardize abundance values -Species abundance information varies across datasets and plots. While for the large majority of plots abundance values are returned as percentage cover, there is a subset where abundance is returned with different scales. These are marked in the column `Cover code` as follows: +Species abundance information varies across datasets and plots. While for the large majority of plots abundance values are returned as percentage cover, there is a subset where abundance is returned with different scales. These are marked in the column `Cover code` as follows: \newline \newline *x_BA* - Basal Area *x_IC* - Individual count @@ -313,7 +313,7 @@ Species abundance information varies across datasets and plots. While for the la *x* - Presence absence \newline \newline Still, it's not really intuitive that in case `Cover code` belongs to one of the classes above, then the actual abundance value is stored in the `x_` column. This stems from the way this data is stored in `TURBOVEG`. -To make the cover data more user friendly, I simplify the way cover is stored, so that there are only two columns: +To make the cover data more user friendly, I simplify the way cover it is stored, so that there are only two columns: `Ab_scale` - to report the type of scale used `Abundance` - to coalesce the cover\\abundance values previously in the columns `Cover %` and `x_`. @@ -410,8 +410,8 @@ nrow(scale_check)== length(unique(DT1$PlotObservationID)) table(scale_check$Ab_scale_combined) ``` - -Transform abundances to relative abundance. For consistency with the previous version of sPlot, this field is called `Relative cover`. +## Calculate species' relative covers in each plot +Transform abundances to relative abundance. For consistency with the previous version of sPlot, this field is called `Relative_cover`. *Watch out* - Even plots with p\\a information are transformed to relative cover. ```{r} DT1 <- DT1 %>% @@ -443,7 +443,9 @@ DT2 <- DT1 %>% Taxon_group=`Taxon group`, Cover_perc=`Cover %`, Cover_code=`Cover code`, - Relative_cover=Relative.cover) + Relative_cover=Relative.cover) %>% + ## change in Version 1.1. + dplyr::select(-x_, -Cover_perc) ``` The output of the DT table contains `r nrow(DT2)` records, over `r length(unique(DT2$PlotObservationID))` plots. The total number of taxa is `r length(unique(DT2$Species_original))` and `r length(unique(DT2$species))`, before and after standardization, respectively. Information on the `Taxon group` is available for `r DT2 %>% filter(Taxon_group!="Unknown") %>% distinct(Species) %>% nrow()` standardized species. @@ -457,16 +459,16 @@ knitr::kable(DT2 %>% ``` ## Field List -- PlotObservationID - Plot ID, as in `header` -- Species - Resolved species name, based on taxonomic backbone -- Species_original - Original species name, as provided by data contributor -- Rank_correct - Taxonomic rank at which the `species_original` was matched -- Taxon_group - Possible entries are: "Alga_Stonewort", "Lichen", "Moss", "Vascular plant","Unknown" -- Layer - Vegetation layer, as specified in Turboveg: 0 - No layer specified, 1 - Upper tree layer, 2 - Middle tree layer, 3 - Lower tree layer, 4 - Upper shrub layer, 5 - Lower shrub layer, 6 - Herb layer, 7 - Juvenile, 8 - Seedling, 9 - Moss layer. -- Cover_code - Cover value in original data, before transformation to percentage cover -- Ab_scale - Abundance scale in original data. Possible values are: CoverPerc: Cover Percentage, pa: Presence absence, x_BA: Basal Area, x_IC: Individual count, x_SC: Stem count, x_IV: Relative Importance, x_RF: Relative Frequency. -- Abundance - Abundance value, in original value, or as transformed from original `Cover code` to quantitative values. -- Relative_cover - Abundance of each species after being to normalized to 1 in each plot. +- `PlotObservationID` - Plot ID, as in `header`. +- `Species` - Resolved species name, based on taxonomic backbone +- `Species_original` - Original species name, as provided by data contributor. +- `Rank_correct` - Taxonomic rank at which `Species_original` was matched. +- `Taxon_group` - Possible entries are: *Alga_Stonewort*, *Lichen*, *Moss*, *Vascular plant*, *Unknown*. +- `Layer` - Vegetation layer, as specified in Turboveg: *0*: No layer specified, *1*: Upper tree layer, *2*: Middle tree layer, *3*: Lower tree layer, *4*: Upper shrub layer, *5*: Lower shrub layer, *6*: Herb layer, *7*: Juvenile, *8*: Seedling, *9*: Moss layer. +- `Cover_code` - Cover\\abundance value in original data, before transformation to percentage cover. +- `Ab_scale` - Abundance scale in original data. Possible values are: *CoverPerc*: Cover Percentage, *pa*: Presence absence, *x_BA*: Basal Area, *x_IC*: Individual count, *x_SC*: Stem count, *x_IV*: Relative Importance, *x_RF*: Relative Frequency. +- `Abundance` - Abundance value, in original value, or as transformed from original `Cover code` to quantitative values. +- `Relative_cover` - Abundance of each species after being normalized to 1 in each plot. ```{r} diff --git a/code/07_buildCWMs.Rmd b/code/07_buildCWMs.Rmd index 46718ee3a370cbabbfbeb07cb73590e27b979acb..0e128cc4199df289b873f0cf526107ffd91d3375 100644 --- a/code/07_buildCWMs.Rmd +++ b/code/07_buildCWMs.Rmd @@ -10,7 +10,6 @@ output: html_document  </center> - **Timestamp:** `r date()` **Drafted:** Francesco Maria Sabatini **Revised:** @@ -18,6 +17,8 @@ output: html_document This reports documents 1) the construction of Community Weighted Means (CWMs) and Variance (CWVs); and 2) the classification of plots into forest\\non-forest based on species growth forms. It complements species composition data from sPlot 3.0 and gap-filled plant functional traits from TRY 5.0, as received by [Jens Kattge](jkattge@bgc-jena.mpg.de) on Jan 21, 2020. +*Changes in version 1.1* - Standardized Growth form names in sPlot_traits. + ```{r results="hide", message=F, warning=F} library(tidyverse) library(readr) @@ -31,8 +32,10 @@ library(viridis) # 1 Data import, preparation and cleaning ```{r} -load("../_output/DT_sPlot3.0.RData") +#load("/data/sPlot/releases/sPlot3.0/DT_sPlot3.0.RData") +#load("/data/sPlot/releases/sPlot3.0/Backbone3.0.RData") load("../_output/Backbone3.0.RData") +load("../_output/DT_sPlot3.0.RData") ``` Import TRY data @@ -87,10 +90,10 @@ After attaching resolved names, TRY data contains information on `r try.species. Check for how many of the species in sPlot, trait information is available in TRY. ```{r} sPlot.species <- DT2 %>% - distinct(species) + distinct(Species) sPlot.in.TRY <- sPlot.species %>% - filter(species %in% (try.species.names %>% + filter(Species %in% (try.species.names %>% distinct(Name_short) %>% pull(Name_short))) ``` @@ -203,7 +206,7 @@ This results in the exclusion of `r length(unique(c(toexclude, toexclude2, toexc ## Calculate species level trait means and sd. try.species.means <- try.individuals %>% group_by(Name_short) %>% - #Add a field to indicate the number of observation per taxon + #Add a field to indicate the number of observations per taxon left_join(x={.} %>% summarize(n=n()), y={.} %>% @@ -246,10 +249,10 @@ try.combined.means <- try.genus.means %>% dplyr::select(Taxon_name, Rank_correct, everything()) total.matches <- DT2 %>% - distinct(species, Rank_correct) %>% + distinct(Species, Rank_correct) %>% left_join(try.combined.means %>% - dplyr::rename(species=Taxon_name), - by=c("species", "Rank_correct")) %>% + dplyr::rename(Species=Taxon_name), + by=c("Species", "Rank_correct")) %>% filter(!is.na(SLA_mean)) %>% nrow() ``` @@ -271,7 +274,9 @@ mysummary <- try.combined.means %>% separate(variable, sep="_", into=c("variable", "mean.sd", "stat")) %>% mutate(stat=factor(stat, levels=c("min", "q025", "q50", "q75", "max", "mean", "sd"))) %>% spread(key=stat, value=value) %>% - arrange(desc(Rank_correct)) + arrange(desc(Rank_correct)) %>% + mutate_at(.vars=vars(min:sd), + .funs=~round(.,3)) ``` @@ -297,16 +302,16 @@ combine.cover <- function(x){ } DT2.comb <- DT2 %>% - group_by(PlotObservationID, species, Rank_correct) %>% - summarize(Relative.cover=combine.cover(Relative.cover)) %>% + group_by(PlotObservationID, Species, Rank_correct) %>% + summarize(Relative_cover=combine.cover(Relative_cover)) %>% ungroup() %>% # re-normalize to 100% left_join(x=., y={.} %>% group_by(PlotObservationID) %>% - summarize(Tot.cover=sum(Relative.cover)), + summarize(Tot.cover=sum(Relative_cover)), by="PlotObservationID") %>% - mutate(Relative.cover=Relative.cover/Tot.cover) %>% + mutate(Relative_cover=Relative_cover/Tot.cover) %>% dplyr::select(-Tot.cover) ``` @@ -327,21 +332,21 @@ length(any_pa) CWM0 <- DT2.comb %>% filter(!PlotObservationID %in% any_pa) %>% left_join(try.combined.means %>% - dplyr::rename(species=Taxon_name) %>% - dplyr::select(species, Rank_correct, ends_with("_mean")), - by=c("species", "Rank_correct")) + dplyr::rename(Species=Taxon_name) %>% + dplyr::select(Species, Rank_correct, ends_with("_mean")), + by=c("Species", "Rank_correct")) # Calculate CWM for each trait in each plot CWM1 <- CWM0 %>% group_by(PlotObservationID) %>% summarize_at(.vars= vars(StemDens_mean:LeafWaterCont_mean), - .funs = list(~weighted.mean(., Relative.cover, na.rm=T))) %>% + .funs = list(~weighted.mean(., Relative_cover, na.rm=T))) %>% dplyr::select(PlotObservationID, order(colnames(.))) %>% gather(key=variable, value=CWM, -PlotObservationID) # Calculate coverage for each trait in each plot CWM2 <- CWM0 %>% - mutate_at(.funs = list(~if_else(is.na(.),0,1) * Relative.cover), + mutate_at(.funs = list(~if_else(is.na(.),0,1) * Relative_cover), .vars = vars(StemDens_mean:LeafWaterCont_mean)) %>% group_by(PlotObservationID) %>% summarize_at(.vars= vars(StemDens_mean:LeafWaterCont_mean), @@ -369,7 +374,7 @@ variance2.fun <- function(trait, abu){ CWM3 <- CWM0 %>% group_by(PlotObservationID) %>% summarize_at(.vars= vars(StemDens_mean:LeafWaterCont_mean), - .funs = list(~variance2.fun(., Relative.cover))) %>% + .funs = list(~variance2.fun(., Relative_cover))) %>% dplyr::select(PlotObservationID, order(colnames(.))) %>% gather(key=variable, value=CWV, -PlotObservationID) @@ -501,17 +506,17 @@ We used different sources of information: **Step 1**: Attach growth form trait information to DT table. Growth form information derives from TRY ```{r} DT.gf <- DT2 %>% - filter(taxon_group=="Vascular plant") %>% + filter(Taxon_group=="Vascular plant") %>% #join with try names, using resolved species names as key left_join(try.species.names %>% dplyr::select(Name_short, GrowthForm) %>% - rename(species=Name_short) %>% - distinct(species, .keep_all=T), - by="species") %>% + rename(Species=Name_short) %>% + distinct(Species, .keep_all=T), + by="Species") %>% left_join(try.species.means %>% dplyr::select(Name_short, PlantHeight_mean) %>% - rename(species=Name_short), - by="species") + rename(Species=Name_short), + by="Species") # number of records withouth Growth Form info sum(is.na(DT.gf$GrowthForm)) ``` @@ -520,10 +525,11 @@ sum(is.na(DT.gf$GrowthForm)) ```{r} top.gf.nas <- DT.gf %>% filter(is.na(GrowthForm)) %>% - group_by(species) %>% + group_by(Species) %>% summarize(n=n()) %>% arrange(desc(n)) ``` + ```{r, eval=F, message=F} write_csv(top.gf.nas %>% filter(n>1000), @@ -537,8 +543,8 @@ gf.manual <- read_csv("../_derived/Species_missingGF_complete.csv") DT.gf <- DT.gf %>% left_join(gf.manual %>% - rename(GrowthForm.m=GrowthForm), - by="species") %>% + rename(GrowthForm.m=GrowthForm, Species=species), + by="Species") %>% mutate(GrowthForm=coalesce(GrowthForm, GrowthForm.m)) %>% dplyr::select(-GrowthForm.m) ``` @@ -561,7 +567,7 @@ all.gf <- all.gf0 %>% distinct(AccSpeciesName, OrigValueStr) %>% rename(GrowthForm0=OrigValueStr) %>% mutate(GrowthForm0=tolower(GrowthForm0)) %>% - filter(AccSpeciesName %in% sPlot.species$species) %>% + filter(AccSpeciesName %in% sPlot.species$Species) %>% mutate(GrowthForm_simplified= GrowthForm0) %>% mutate(GrowthForm_simplified=replace(GrowthForm_simplified, list=str_detect(GrowthForm0, @@ -602,8 +608,8 @@ table(all.gf$GrowthForm_simplified, exclude=NULL) #coalesce this info into DT.gf DT.gf <- DT.gf %>% left_join(all.gf %>% - rename(species=AccSpeciesName), - by="species") %>% + rename(Species=AccSpeciesName), + by="Species") %>% mutate(GrowthForm=coalesce(GrowthForm, GrowthForm_simplified)) %>% dplyr::select(-GrowthForm_simplified) ``` @@ -613,12 +619,12 @@ DT.gf <- DT.gf %>% other.trees <- DT.gf %>% filter(Layer==1 & is.na(GrowthForm)) %>% filter(Rank_correct=="species") %>% - distinct(species, Layer, GrowthForm) %>% - pull(species) + distinct(Species, Layer, GrowthForm) %>% + pull(Species) DT.gf <- DT.gf %>% mutate(GrowthForm=replace(GrowthForm, - list=species %in% other.trees, + list=Species %in% other.trees, values="tree")) ``` After cross-matching, the number of records without growth form information decresead to `r sum(is.na(DT.gf$GrowthForm))`. @@ -626,7 +632,7 @@ After cross-matching, the number of records without growth form information decr ```{r, echo=F} knitr::kable(DT.gf %>% - distinct(species, GrowthForm, PlantHeight_mean) %>% + distinct(Species, GrowthForm, PlantHeight_mean) %>% group_by(GrowthForm) %>% summarize(Height=mean(PlantHeight_mean, na.rm=T)), caption="Average height per growth form", digits = 3) %>% @@ -639,8 +645,8 @@ Define a species as `is.tree.or.tall.shrub` when it is either defined as tree, O Define a species as `is.not.tree.or.shrub.and.small` when it has a height <10, as long as it's not defined a tree. When height is not available, it is sufficient that the species is classified as "herb" or "other". ```{r} GF <- DT.gf %>% - distinct(species, GrowthForm, PlantHeight_mean) %>% - mutate(GrowthForm=fct_collape(GrowthForm, + distinct(Species, GrowthForm, PlantHeight_mean) %>% + mutate(GrowthForm=fct_collapse(GrowthForm, "herb/shrub"=c("herb\\shrub","herb/shrub"), "shrub/tree"=c("shrub/tree", "shrub\\tree"))) %>% ## define is.tree.or.tall @@ -731,11 +737,11 @@ plot.vegtype2 <- DT.gf %>% filter(Layer %in% c(1,2,3)) %>% # first sum the cover of all species in a layer group_by(PlotObservationID, Layer) %>% - summarize(cover_perc=sum(cover_perc)) %>% + summarize(Cover_perc=sum(Abundance)) %>% # then combine cover across layers group_by(PlotObservationID) %>% - summarize(cover_perc=combine.cover(cover_perc)) %>% - mutate(is.forest=cover_perc>=25) + summarize(Cover_perc=combine.cover(Cover_perc)) %>% + mutate(is.forest=Cover_perc>=25) table(plot.vegtype1 %>% dplyr::select(is.forest), exclude=NULL) @@ -746,11 +752,11 @@ plot.vegtype3 <- DT.gf %>% #filter plots where all records are recorded as percentage cover filter(PlotObservationID %in% mysel ) %>% # combine cover across layers - group_by(PlotObservationID, species) %>% - summarize(cover_perc=combine.cover(cover_perc)) %>% + group_by(PlotObservationID, Species) %>% + summarize(cover_perc=combine.cover(Abundance)) %>% ungroup() %>% # attach species Growth Form information - left_join(GF, by="species")%>% + left_join(GF, by="Species")%>% group_by(PlotObservationID) %>% summarize(cover_tree=sum(cover_perc*is.tree.or.tall.shrub, na.rm=T), cover_non_tree=sum(cover_perc*(!is.tree.or.tall.shrub), na.rm=T), @@ -860,15 +866,21 @@ The total number of plots with attribution to forest\\non-forest (either coming # 4 Export and update other objects ```{r} sPlot.traits <- sPlot.species %>% - arrange(species) %>% + arrange(Species) %>% left_join(GF %>% - dplyr::select(species, GrowthForm, is.tree.or.tall.shrub), - by="species") %>% + dplyr::select(Species, GrowthForm, is.tree.or.tall.shrub), + by="Species") %>% left_join(try.combined.means %>% - rename(species=Taxon_name), by="species") %>% + rename(Species=Taxon_name), by="Species") %>% + ## some entries are duplicated (both at species and Genus level) + ## Keep only genus-level averages + group_by(Species) %>% + arrange(desc(n)) %>% + slice(1) %>% + ungroup() %>% dplyr::select(-Rank_correct) -save(try.combined.means, CWM, sPlot.traits, file="../_output/Traits_CWMs_sPlot3.RData") +save(try.combined.means, CWM, sPlot.traits, trait.legend, file="../_output/Traits_CWMs_sPlot3.RData") header <- header %>% left_join(plot.vegtype %>%