Aligned 07 and 07

cbb520c2 · Francesco Sabatini · 3a47b40e · cbb520c2 · cbb520c2
Commit cbb520c2 authored 4 years ago by Francesco Sabatini
--- a/code/06_buildDT.Rmd
+++ b/code/06_buildDT.Rmd
@@ -17,12 +17,12 @@ output: html_document

 This report documents the construction of the DT table for sPlot 3.0. It is based on dataset sPlot_3.0.2, received on 24/07/2019 from Stephan Hennekens.    
  
-Caution: Layer information is not available for each species in each plot. In case of missing information Layer is set to zero.   
+Caution: Layer information is not available for all species in each plot. In case of missing information Layer is set to zero.   
  
-*Changes in version 1.1* -  
-1) Added explanation of fields
-2) Fixed `taxon_group` of Friesodielsia
-
+*Changes in version 1.1*  
+1) Added explanation of fields  
+2) Fixed `taxon_group` of Friesodielsia  
+3) Only export the fields `Ab_scale` and `Abundance`  
 ```{r results="hide", message=F, warning=F}
 knitr::opts_chunk$set(echo = TRUE)
 library(tidyverse)
@@ -74,7 +74,7 @@ DT0 <- DT0 %>%
 ```


-Species data include `r nrow(DT0)` species * plot records, across `r nplots` plots. Before taxonomic resolution, there are `r nspecies` species .  
+The DT table includes `r nrow(DT0)` species * plot records, across `r nplots` plots. Before taxonomic resolution, there are `r nspecies` species .  
 \newline


@@ -301,9 +301,9 @@ DT1 %>%
    slice(1:40)
 ```

-## Calculate relative cover per layer per species in each plot
+## Standardize abundance values

-Species abundance information varies across datasets and plots. While for the large majority of plots abundance values are returned as percentage cover, there is a subset where abundance is returned with different scales. These are marked in the column `Cover code` as follows:
+Species abundance information varies across datasets and plots. While for the large majority of plots abundance values are returned as percentage cover, there is a subset where abundance is returned with different scales. These are marked in the column `Cover code` as follows:  
 \newline \newline
 *x_BA* - Basal Area  
 *x_IC* - Individual count  
@@ -313,7 +313,7 @@ Species abundance information varies across datasets and plots. While for the la
 *x* - Presence absence  
 \newline \newline
 Still, it's not really intuitive that in case `Cover code` belongs to one of the classes above, then the actual abundance value is stored in the `x_` column. This stems from the way this data is stored in `TURBOVEG`.  
-To make the cover data more user friendly, I simplify the way cover is stored, so that there are only two columns:  
+To make the cover data more user friendly, I simplify the way cover it is stored, so that there are only two columns:  
 `Ab_scale` - to report the type of scale used  
 `Abundance` - to coalesce the cover\\abundance values previously in the columns `Cover %` and `x_`.  

@@ -410,8 +410,8 @@ nrow(scale_check)== length(unique(DT1$PlotObservationID))
 table(scale_check$Ab_scale_combined)
 ```

-
-Transform abundances to relative abundance. For consistency with the previous version of sPlot, this field is called `Relative cover`.  
+## Calculate species' relative covers in each plot
+Transform abundances to relative abundance. For consistency with the previous version of sPlot, this field is called `Relative_cover`.  
 *Watch out* - Even plots with p\\a information are transformed to relative cover.  
 ```{r}
 DT1 <- DT1 %>% 
@@ -443,7 +443,9 @@ DT2 <- DT1 %>%
         Taxon_group=`Taxon group`, 
         Cover_perc=`Cover %`, 
         Cover_code=`Cover code`, 
-         Relative_cover=Relative.cover)
+         Relative_cover=Relative.cover) %>% 
+  ## change in Version 1.1.
+  dplyr::select(-x_, -Cover_perc)
 ```

 The output of the DT table contains `r nrow(DT2)` records, over `r length(unique(DT2$PlotObservationID))` plots. The total number of taxa is `r length(unique(DT2$Species_original))` and `r length(unique(DT2$species))`, before and after standardization, respectively. Information on the `Taxon group` is available for `r DT2 %>% filter(Taxon_group!="Unknown") %>% distinct(Species) %>% nrow()` standardized species.
@@ -457,16 +459,16 @@ knitr::kable(DT2 %>%
 ```
 ## Field List

- PlotObservationID - Plot ID, as in `header`  
- Species - Resolved species name, based on taxonomic backbone  
- Species_original - Original species name, as provided by data contributor  
- Rank_correct - Taxonomic rank at which the `species_original` was matched  
- Taxon_group - Possible entries are: "Alga_Stonewort", "Lichen", "Moss", "Vascular plant","Unknown"  
- Layer - Vegetation layer, as specified in Turboveg: 0 - No layer specified, 1 - Upper tree layer, 2 - Middle tree layer, 3 - Lower tree layer, 4 - Upper shrub layer, 5 - Lower shrub layer, 6 - Herb layer, 7 - Juvenile, 8 - Seedling, 9 - Moss layer.  
- Cover_code - Cover value in original data, before transformation to percentage cover  
- Ab_scale - Abundance scale in original data. Possible values are: CoverPerc: Cover Percentage, pa: Presence absence, x_BA: Basal Area, x_IC: Individual count, x_SC: Stem count, x_IV: Relative Importance, x_RF: Relative Frequency.  
- Abundance - Abundance value, in original value, or as transformed from original `Cover code` to quantitative values.  
- Relative_cover - Abundance of each species after being to normalized to 1 in each plot.  
+- `PlotObservationID` - Plot ID, as in `header`.  
+- `Species` - Resolved species name, based on taxonomic backbone  
+- `Species_original` - Original species name, as provided by data contributor.  
+- `Rank_correct` - Taxonomic rank at which `Species_original` was matched.  
+- `Taxon_group` - Possible entries are: *Alga_Stonewort*, *Lichen*, *Moss*, *Vascular plant*, *Unknown*.  
+- `Layer` - Vegetation layer, as specified in Turboveg: *0*: No layer specified, *1*: Upper tree layer, *2*: Middle tree layer, *3*: Lower tree layer, *4*: Upper shrub layer, *5*: Lower shrub layer, *6*: Herb layer, *7*: Juvenile, *8*: Seedling, *9*: Moss layer.  
+- `Cover_code` - Cover\\abundance value in original data, before transformation to percentage cover.  
+- `Ab_scale` - Abundance scale in original data. Possible values are: *CoverPerc*: Cover Percentage, *pa*: Presence absence, *x_BA*: Basal Area, *x_IC*: Individual count, *x_SC*: Stem count, *x_IV*: Relative Importance, *x_RF*: Relative Frequency.  
+- `Abundance` - Abundance value, in original value, or as transformed from original `Cover code` to quantitative values.  
+- `Relative_cover` - Abundance of each species after being normalized to 1 in each plot.  


 ```{r}

--- a/code/07_buildCWMs.Rmd
+++ b/code/07_buildCWMs.Rmd
@@ -10,7 +10,6 @@ output: html_document
  ![](/data/sPlot/users/Francesco/_sPlot_Management/splot-long-rgb.png "sPlot Logo")
 </center>

-  
 **Timestamp:** `r date()`  
 **Drafted:** Francesco Maria Sabatini  
 **Revised:**  
@@ -18,6 +17,8 @@ output: html_document
  
 This reports documents 1) the construction of Community Weighted Means (CWMs) and Variance (CWVs); and 2) the classification of plots into forest\\non-forest based on species growth forms. It complements species composition data from sPlot 3.0 and gap-filled plant functional traits from TRY 5.0, as received by [Jens Kattge](jkattge@bgc-jena.mpg.de) on Jan 21, 2020. 

+*Changes in version 1.1* - Standardized Growth form names in sPlot_traits.  
+
 ```{r results="hide", message=F, warning=F}
 library(tidyverse)
 library(readr)
@@ -31,8 +32,10 @@ library(viridis)

 # 1 Data import, preparation and cleaning 
 ```{r}
-load("../_output/DT_sPlot3.0.RData")
+#load("/data/sPlot/releases/sPlot3.0/DT_sPlot3.0.RData")
+#load("/data/sPlot/releases/sPlot3.0/Backbone3.0.RData")
 load("../_output/Backbone3.0.RData")
+load("../_output/DT_sPlot3.0.RData")
 ```

 Import TRY data
@@ -87,10 +90,10 @@ After attaching resolved names, TRY data contains information on `r try.species.
 Check for how many of the species in sPlot, trait information is available in TRY.
 ```{r}
 sPlot.species <- DT2 %>% 
-  distinct(species) 
+  distinct(Species) 

 sPlot.in.TRY <- sPlot.species %>% 
-  filter(species %in% (try.species.names %>% 
+  filter(Species %in% (try.species.names %>% 
                                  distinct(Name_short) %>% 
                                  pull(Name_short))) 
 ```
@@ -203,7 +206,7 @@ This results in the exclusion of `r length(unique(c(toexclude, toexclude2, toexc
 ## Calculate species level trait means and sd. 
 try.species.means <- try.individuals %>% 
  group_by(Name_short) %>% 
-  #Add a field to indicate the number of observation per taxon
+  #Add a field to indicate the number of observations per taxon
  left_join(x={.} %>% 
              summarize(n=n()), 
            y={.} %>% 
@@ -246,10 +249,10 @@ try.combined.means <- try.genus.means %>%
  dplyr::select(Taxon_name, Rank_correct, everything())

 total.matches <- DT2 %>%
-  distinct(species, Rank_correct) %>%
+  distinct(Species, Rank_correct) %>%
  left_join(try.combined.means %>%
-              dplyr::rename(species=Taxon_name), 
-            by=c("species", "Rank_correct")) %>% 
+              dplyr::rename(Species=Taxon_name), 
+            by=c("Species", "Rank_correct")) %>% 
  filter(!is.na(SLA_mean)) %>% 
  nrow()
 ```
@@ -271,7 +274,9 @@ mysummary <- try.combined.means %>%
  separate(variable, sep="_", into=c("variable", "mean.sd", "stat")) %>% 
  mutate(stat=factor(stat, levels=c("min", "q025", "q50", "q75", "max", "mean", "sd"))) %>% 
  spread(key=stat, value=value) %>% 
-  arrange(desc(Rank_correct))
+  arrange(desc(Rank_correct)) %>% 
+  mutate_at(.vars=vars(min:sd),
+            .funs=~round(.,3))
  
 ```

@@ -297,16 +302,16 @@ combine.cover <- function(x){
 }

 DT2.comb <- DT2 %>% 
-  group_by(PlotObservationID, species, Rank_correct) %>% 
-  summarize(Relative.cover=combine.cover(Relative.cover)) %>%
+  group_by(PlotObservationID, Species, Rank_correct) %>% 
+  summarize(Relative_cover=combine.cover(Relative_cover)) %>%
  ungroup() %>% 
  # re-normalize to 100%
  left_join(x=., 
            y={.} %>% 
              group_by(PlotObservationID) %>% 
-              summarize(Tot.cover=sum(Relative.cover)), 
+              summarize(Tot.cover=sum(Relative_cover)), 
            by="PlotObservationID") %>% 
-  mutate(Relative.cover=Relative.cover/Tot.cover) %>% 
+  mutate(Relative_cover=Relative_cover/Tot.cover) %>% 
  dplyr::select(-Tot.cover)

 ```
@@ -327,21 +332,21 @@ length(any_pa)
 CWM0 <- DT2.comb %>%
  filter(!PlotObservationID %in% any_pa) %>% 
  left_join(try.combined.means %>%
-              dplyr::rename(species=Taxon_name) %>% 
-              dplyr::select(species, Rank_correct, ends_with("_mean")), 
-            by=c("species", "Rank_correct"))
+              dplyr::rename(Species=Taxon_name) %>% 
+              dplyr::select(Species, Rank_correct, ends_with("_mean")), 
+            by=c("Species", "Rank_correct"))

 # Calculate CWM for each trait in each plot
 CWM1 <- CWM0 %>%
  group_by(PlotObservationID) %>%
  summarize_at(.vars= vars(StemDens_mean:LeafWaterCont_mean),
-               .funs = list(~weighted.mean(., Relative.cover, na.rm=T))) %>%
+               .funs = list(~weighted.mean(., Relative_cover, na.rm=T))) %>%
  dplyr::select(PlotObservationID, order(colnames(.))) %>%
  gather(key=variable, value=CWM, -PlotObservationID)

 # Calculate coverage for each trait in each plot
 CWM2 <- CWM0 %>%
-  mutate_at(.funs = list(~if_else(is.na(.),0,1) * Relative.cover), 
+  mutate_at(.funs = list(~if_else(is.na(.),0,1) * Relative_cover), 
            .vars = vars(StemDens_mean:LeafWaterCont_mean)) %>%
  group_by(PlotObservationID) %>%
  summarize_at(.vars= vars(StemDens_mean:LeafWaterCont_mean),
@@ -369,7 +374,7 @@ variance2.fun <- function(trait, abu){
 CWM3 <- CWM0 %>%
  group_by(PlotObservationID) %>%
  summarize_at(.vars= vars(StemDens_mean:LeafWaterCont_mean),
-               .funs = list(~variance2.fun(., Relative.cover))) %>%
+               .funs = list(~variance2.fun(., Relative_cover))) %>%
  dplyr::select(PlotObservationID, order(colnames(.))) %>%
  gather(key=variable, value=CWV, -PlotObservationID)

@@ -501,17 +506,17 @@ We used different sources of information:
 **Step 1**: Attach growth form trait information to DT table. Growth form information derives from TRY
 ```{r}
 DT.gf <- DT2 %>% 
-  filter(taxon_group=="Vascular plant") %>% 
+  filter(Taxon_group=="Vascular plant") %>% 
  #join with try names, using resolved species names as key
  left_join(try.species.names %>% 
              dplyr::select(Name_short, GrowthForm) %>% 
-              rename(species=Name_short) %>% 
-              distinct(species, .keep_all=T), 
-            by="species") %>% 
+              rename(Species=Name_short) %>% 
+              distinct(Species, .keep_all=T), 
+            by="Species") %>% 
  left_join(try.species.means %>% 
              dplyr::select(Name_short, PlantHeight_mean) %>% 
-              rename(species=Name_short), 
-            by="species")
+              rename(Species=Name_short), 
+            by="Species")
 # number of records withouth Growth Form info
 sum(is.na(DT.gf$GrowthForm))
 ```
@@ -520,10 +525,11 @@ sum(is.na(DT.gf$GrowthForm))
 ```{r}
 top.gf.nas <- DT.gf %>% 
  filter(is.na(GrowthForm)) %>% 
-  group_by(species) %>% 
+  group_by(Species) %>% 
  summarize(n=n()) %>% 
  arrange(desc(n))
 ```
+
 ```{r, eval=F, message=F}
 write_csv(top.gf.nas %>% 
            filter(n>1000), 
@@ -537,8 +543,8 @@ gf.manual <- read_csv("../_derived/Species_missingGF_complete.csv")

 DT.gf <- DT.gf %>% 
  left_join(gf.manual %>% 
-              rename(GrowthForm.m=GrowthForm),
-            by="species") %>% 
+              rename(GrowthForm.m=GrowthForm, Species=species),
+            by="Species") %>% 
   mutate(GrowthForm=coalesce(GrowthForm, GrowthForm.m)) %>% 
   dplyr::select(-GrowthForm.m)
 ```
@@ -561,7 +567,7 @@ all.gf <- all.gf0 %>%
  distinct(AccSpeciesName, OrigValueStr) %>% 
  rename(GrowthForm0=OrigValueStr) %>% 
  mutate(GrowthForm0=tolower(GrowthForm0)) %>%
-  filter(AccSpeciesName %in% sPlot.species$species) %>% 
+  filter(AccSpeciesName %in% sPlot.species$Species) %>% 
  mutate(GrowthForm_simplified= GrowthForm0) %>% 
  mutate(GrowthForm_simplified=replace(GrowthForm_simplified, 
                                       list=str_detect(GrowthForm0,
@@ -602,8 +608,8 @@ table(all.gf$GrowthForm_simplified, exclude=NULL)
 #coalesce this info into DT.gf
 DT.gf <- DT.gf %>% 
  left_join(all.gf %>% 
-              rename(species=AccSpeciesName), 
-            by="species") %>% 
+              rename(Species=AccSpeciesName), 
+            by="Species") %>% 
  mutate(GrowthForm=coalesce(GrowthForm, GrowthForm_simplified)) %>% 
  dplyr::select(-GrowthForm_simplified)
 ```
@@ -613,12 +619,12 @@ DT.gf <- DT.gf %>%
 other.trees <- DT.gf %>% 
  filter(Layer==1 & is.na(GrowthForm)) %>% 
  filter(Rank_correct=="species") %>% 
-  distinct(species, Layer, GrowthForm) %>% 
-  pull(species)
+  distinct(Species, Layer, GrowthForm) %>% 
+  pull(Species)

 DT.gf <- DT.gf %>% 
  mutate(GrowthForm=replace(GrowthForm, 
-                            list=species %in% other.trees, 
+                            list=Species %in% other.trees, 
                            values="tree"))
 ```
 After cross-matching, the number of records without growth form information decresead to `r sum(is.na(DT.gf$GrowthForm))`.  
@@ -626,7 +632,7 @@ After cross-matching, the number of records without growth form information decr

 ```{r, echo=F}
 knitr::kable(DT.gf %>% 
-              distinct(species, GrowthForm, PlantHeight_mean) %>% 
+              distinct(Species, GrowthForm, PlantHeight_mean) %>% 
              group_by(GrowthForm) %>% 
              summarize(Height=mean(PlantHeight_mean, na.rm=T)), 
  caption="Average height per growth form", digits = 3) %>%
@@ -639,8 +645,8 @@ Define a species as `is.tree.or.tall.shrub` when it is either defined as tree, O
 Define a species as `is.not.tree.or.shrub.and.small` when it has a height <10, as long as it's not defined a tree. When height is not available, it is sufficient that the species is classified as "herb" or "other".
 ```{r}
 GF <- DT.gf %>% 
-  distinct(species, GrowthForm, PlantHeight_mean) %>% 
-  mutate(GrowthForm=fct_collape(GrowthForm, 
+  distinct(Species, GrowthForm, PlantHeight_mean) %>% 
+  mutate(GrowthForm=fct_collapse(GrowthForm, 
                                 "herb/shrub"=c("herb\\shrub","herb/shrub"), 
                                 "shrub/tree"=c("shrub/tree", "shrub\\tree"))) %>% 
  ## define is.tree.or.tall
@@ -731,11 +737,11 @@ plot.vegtype2 <- DT.gf %>%
  filter(Layer %in% c(1,2,3)) %>% 
  # first sum the cover of all species in a layer
  group_by(PlotObservationID, Layer) %>% 
-  summarize(cover_perc=sum(cover_perc)) %>% 
+  summarize(Cover_perc=sum(Abundance)) %>% 
  # then combine cover across layers
  group_by(PlotObservationID) %>% 
-  summarize(cover_perc=combine.cover(cover_perc)) %>% 
-  mutate(is.forest=cover_perc>=25) 
+  summarize(Cover_perc=combine.cover(Cover_perc)) %>% 
+  mutate(is.forest=Cover_perc>=25) 

 table(plot.vegtype1 %>% dplyr::select(is.forest), exclude=NULL)

@@ -746,11 +752,11 @@ plot.vegtype3 <- DT.gf %>%
  #filter plots where all records are recorded as percentage cover
  filter(PlotObservationID %in% mysel ) %>% 
  # combine cover across layers
-  group_by(PlotObservationID, species) %>%
-  summarize(cover_perc=combine.cover(cover_perc)) %>%
+  group_by(PlotObservationID, Species) %>%
+  summarize(cover_perc=combine.cover(Abundance)) %>%
  ungroup() %>% 
  # attach species Growth Form information
-  left_join(GF, by="species")%>% 
+  left_join(GF, by="Species")%>% 
  group_by(PlotObservationID) %>% 
  summarize(cover_tree=sum(cover_perc*is.tree.or.tall.shrub, na.rm=T), 
            cover_non_tree=sum(cover_perc*(!is.tree.or.tall.shrub), na.rm=T), 
@@ -860,15 +866,21 @@ The total number of plots with attribution to forest\\non-forest (either coming
 # 4 Export and update other objects
 ```{r}
 sPlot.traits <- sPlot.species %>% 
-  arrange(species) %>% 
+  arrange(Species) %>% 
  left_join(GF %>% 
-              dplyr::select(species, GrowthForm, is.tree.or.tall.shrub), 
-            by="species") %>% 
+              dplyr::select(Species, GrowthForm, is.tree.or.tall.shrub), 
+            by="Species") %>% 
  left_join(try.combined.means %>% 
-              rename(species=Taxon_name), by="species") %>% 
+              rename(Species=Taxon_name), by="Species") %>% 
+  ## some entries are duplicated (both at species and Genus level)
+  ## Keep only genus-level averages
+  group_by(Species) %>% 
+  arrange(desc(n)) %>% 
+  slice(1) %>% 
+  ungroup() %>% 
  dplyr::select(-Rank_correct)
  
-save(try.combined.means, CWM, sPlot.traits, file="../_output/Traits_CWMs_sPlot3.RData")
+save(try.combined.means, CWM, sPlot.traits, trait.legend, file="../_output/Traits_CWMs_sPlot3.RData")

 header <- header %>% 
  left_join(plot.vegtype %>%