Skip to content
Snippets Groups Projects
Commit cbb520c2 authored by Francesco Sabatini's avatar Francesco Sabatini
Browse files

Aligned 07 and 07

parent 3a47b40e
Branches
No related tags found
No related merge requests found
......@@ -17,12 +17,12 @@ output: html_document
This report documents the construction of the DT table for sPlot 3.0. It is based on dataset sPlot_3.0.2, received on 24/07/2019 from Stephan Hennekens.
Caution: Layer information is not available for each species in each plot. In case of missing information Layer is set to zero.
Caution: Layer information is not available for all species in each plot. In case of missing information Layer is set to zero.
*Changes in version 1.1* -
1) Added explanation of fields
2) Fixed `taxon_group` of Friesodielsia
*Changes in version 1.1*
1) Added explanation of fields
2) Fixed `taxon_group` of Friesodielsia
3) Only export the fields `Ab_scale` and `Abundance`
```{r results="hide", message=F, warning=F}
knitr::opts_chunk$set(echo = TRUE)
library(tidyverse)
......@@ -74,7 +74,7 @@ DT0 <- DT0 %>%
```
Species data include `r nrow(DT0)` species * plot records, across `r nplots` plots. Before taxonomic resolution, there are `r nspecies` species .
The DT table includes `r nrow(DT0)` species * plot records, across `r nplots` plots. Before taxonomic resolution, there are `r nspecies` species .
\newline
......@@ -301,9 +301,9 @@ DT1 %>%
slice(1:40)
```
## Calculate relative cover per layer per species in each plot
## Standardize abundance values
Species abundance information varies across datasets and plots. While for the large majority of plots abundance values are returned as percentage cover, there is a subset where abundance is returned with different scales. These are marked in the column `Cover code` as follows:
Species abundance information varies across datasets and plots. While for the large majority of plots abundance values are returned as percentage cover, there is a subset where abundance is returned with different scales. These are marked in the column `Cover code` as follows:
\newline \newline
*x_BA* - Basal Area
*x_IC* - Individual count
......@@ -313,7 +313,7 @@ Species abundance information varies across datasets and plots. While for the la
*x* - Presence absence
\newline \newline
Still, it's not really intuitive that in case `Cover code` belongs to one of the classes above, then the actual abundance value is stored in the `x_` column. This stems from the way this data is stored in `TURBOVEG`.
To make the cover data more user friendly, I simplify the way cover is stored, so that there are only two columns:
To make the cover data more user friendly, I simplify the way cover it is stored, so that there are only two columns:
`Ab_scale` - to report the type of scale used
`Abundance` - to coalesce the cover\\abundance values previously in the columns `Cover %` and `x_`.
......@@ -410,8 +410,8 @@ nrow(scale_check)== length(unique(DT1$PlotObservationID))
table(scale_check$Ab_scale_combined)
```
Transform abundances to relative abundance. For consistency with the previous version of sPlot, this field is called `Relative cover`.
## Calculate species' relative covers in each plot
Transform abundances to relative abundance. For consistency with the previous version of sPlot, this field is called `Relative_cover`.
*Watch out* - Even plots with p\\a information are transformed to relative cover.
```{r}
DT1 <- DT1 %>%
......@@ -443,7 +443,9 @@ DT2 <- DT1 %>%
Taxon_group=`Taxon group`,
Cover_perc=`Cover %`,
Cover_code=`Cover code`,
Relative_cover=Relative.cover)
Relative_cover=Relative.cover) %>%
## change in Version 1.1.
dplyr::select(-x_, -Cover_perc)
```
The output of the DT table contains `r nrow(DT2)` records, over `r length(unique(DT2$PlotObservationID))` plots. The total number of taxa is `r length(unique(DT2$Species_original))` and `r length(unique(DT2$species))`, before and after standardization, respectively. Information on the `Taxon group` is available for `r DT2 %>% filter(Taxon_group!="Unknown") %>% distinct(Species) %>% nrow()` standardized species.
......@@ -457,16 +459,16 @@ knitr::kable(DT2 %>%
```
## Field List
- PlotObservationID - Plot ID, as in `header`
- Species - Resolved species name, based on taxonomic backbone
- Species_original - Original species name, as provided by data contributor
- Rank_correct - Taxonomic rank at which the `species_original` was matched
- Taxon_group - Possible entries are: "Alga_Stonewort", "Lichen", "Moss", "Vascular plant","Unknown"
- Layer - Vegetation layer, as specified in Turboveg: 0 - No layer specified, 1 - Upper tree layer, 2 - Middle tree layer, 3 - Lower tree layer, 4 - Upper shrub layer, 5 - Lower shrub layer, 6 - Herb layer, 7 - Juvenile, 8 - Seedling, 9 - Moss layer.
- Cover_code - Cover value in original data, before transformation to percentage cover
- Ab_scale - Abundance scale in original data. Possible values are: CoverPerc: Cover Percentage, pa: Presence absence, x_BA: Basal Area, x_IC: Individual count, x_SC: Stem count, x_IV: Relative Importance, x_RF: Relative Frequency.
- Abundance - Abundance value, in original value, or as transformed from original `Cover code` to quantitative values.
- Relative_cover - Abundance of each species after being to normalized to 1 in each plot.
- `PlotObservationID` - Plot ID, as in `header`.
- `Species` - Resolved species name, based on taxonomic backbone
- `Species_original` - Original species name, as provided by data contributor.
- `Rank_correct` - Taxonomic rank at which `Species_original` was matched.
- `Taxon_group` - Possible entries are: *Alga_Stonewort*, *Lichen*, *Moss*, *Vascular plant*, *Unknown*.
- `Layer` - Vegetation layer, as specified in Turboveg: *0*: No layer specified, *1*: Upper tree layer, *2*: Middle tree layer, *3*: Lower tree layer, *4*: Upper shrub layer, *5*: Lower shrub layer, *6*: Herb layer, *7*: Juvenile, *8*: Seedling, *9*: Moss layer.
- `Cover_code` - Cover\\abundance value in original data, before transformation to percentage cover.
- `Ab_scale` - Abundance scale in original data. Possible values are: *CoverPerc*: Cover Percentage, *pa*: Presence absence, *x_BA*: Basal Area, *x_IC*: Individual count, *x_SC*: Stem count, *x_IV*: Relative Importance, *x_RF*: Relative Frequency.
- `Abundance` - Abundance value, in original value, or as transformed from original `Cover code` to quantitative values.
- `Relative_cover` - Abundance of each species after being normalized to 1 in each plot.
```{r}
......
......@@ -10,7 +10,6 @@ output: html_document
![](/data/sPlot/users/Francesco/_sPlot_Management/splot-long-rgb.png "sPlot Logo")
</center>
**Timestamp:** `r date()`
**Drafted:** Francesco Maria Sabatini
**Revised:**
......@@ -18,6 +17,8 @@ output: html_document
This reports documents 1) the construction of Community Weighted Means (CWMs) and Variance (CWVs); and 2) the classification of plots into forest\\non-forest based on species growth forms. It complements species composition data from sPlot 3.0 and gap-filled plant functional traits from TRY 5.0, as received by [Jens Kattge](jkattge@bgc-jena.mpg.de) on Jan 21, 2020.
*Changes in version 1.1* - Standardized Growth form names in sPlot_traits.
```{r results="hide", message=F, warning=F}
library(tidyverse)
library(readr)
......@@ -31,8 +32,10 @@ library(viridis)
# 1 Data import, preparation and cleaning
```{r}
load("../_output/DT_sPlot3.0.RData")
#load("/data/sPlot/releases/sPlot3.0/DT_sPlot3.0.RData")
#load("/data/sPlot/releases/sPlot3.0/Backbone3.0.RData")
load("../_output/Backbone3.0.RData")
load("../_output/DT_sPlot3.0.RData")
```
Import TRY data
......@@ -87,10 +90,10 @@ After attaching resolved names, TRY data contains information on `r try.species.
Check for how many of the species in sPlot, trait information is available in TRY.
```{r}
sPlot.species <- DT2 %>%
distinct(species)
distinct(Species)
sPlot.in.TRY <- sPlot.species %>%
filter(species %in% (try.species.names %>%
filter(Species %in% (try.species.names %>%
distinct(Name_short) %>%
pull(Name_short)))
```
......@@ -203,7 +206,7 @@ This results in the exclusion of `r length(unique(c(toexclude, toexclude2, toexc
## Calculate species level trait means and sd.
try.species.means <- try.individuals %>%
group_by(Name_short) %>%
#Add a field to indicate the number of observation per taxon
#Add a field to indicate the number of observations per taxon
left_join(x={.} %>%
summarize(n=n()),
y={.} %>%
......@@ -246,10 +249,10 @@ try.combined.means <- try.genus.means %>%
dplyr::select(Taxon_name, Rank_correct, everything())
total.matches <- DT2 %>%
distinct(species, Rank_correct) %>%
distinct(Species, Rank_correct) %>%
left_join(try.combined.means %>%
dplyr::rename(species=Taxon_name),
by=c("species", "Rank_correct")) %>%
dplyr::rename(Species=Taxon_name),
by=c("Species", "Rank_correct")) %>%
filter(!is.na(SLA_mean)) %>%
nrow()
```
......@@ -271,7 +274,9 @@ mysummary <- try.combined.means %>%
separate(variable, sep="_", into=c("variable", "mean.sd", "stat")) %>%
mutate(stat=factor(stat, levels=c("min", "q025", "q50", "q75", "max", "mean", "sd"))) %>%
spread(key=stat, value=value) %>%
arrange(desc(Rank_correct))
arrange(desc(Rank_correct)) %>%
mutate_at(.vars=vars(min:sd),
.funs=~round(.,3))
```
......@@ -297,16 +302,16 @@ combine.cover <- function(x){
}
DT2.comb <- DT2 %>%
group_by(PlotObservationID, species, Rank_correct) %>%
summarize(Relative.cover=combine.cover(Relative.cover)) %>%
group_by(PlotObservationID, Species, Rank_correct) %>%
summarize(Relative_cover=combine.cover(Relative_cover)) %>%
ungroup() %>%
# re-normalize to 100%
left_join(x=.,
y={.} %>%
group_by(PlotObservationID) %>%
summarize(Tot.cover=sum(Relative.cover)),
summarize(Tot.cover=sum(Relative_cover)),
by="PlotObservationID") %>%
mutate(Relative.cover=Relative.cover/Tot.cover) %>%
mutate(Relative_cover=Relative_cover/Tot.cover) %>%
dplyr::select(-Tot.cover)
```
......@@ -327,21 +332,21 @@ length(any_pa)
CWM0 <- DT2.comb %>%
filter(!PlotObservationID %in% any_pa) %>%
left_join(try.combined.means %>%
dplyr::rename(species=Taxon_name) %>%
dplyr::select(species, Rank_correct, ends_with("_mean")),
by=c("species", "Rank_correct"))
dplyr::rename(Species=Taxon_name) %>%
dplyr::select(Species, Rank_correct, ends_with("_mean")),
by=c("Species", "Rank_correct"))
# Calculate CWM for each trait in each plot
CWM1 <- CWM0 %>%
group_by(PlotObservationID) %>%
summarize_at(.vars= vars(StemDens_mean:LeafWaterCont_mean),
.funs = list(~weighted.mean(., Relative.cover, na.rm=T))) %>%
.funs = list(~weighted.mean(., Relative_cover, na.rm=T))) %>%
dplyr::select(PlotObservationID, order(colnames(.))) %>%
gather(key=variable, value=CWM, -PlotObservationID)
# Calculate coverage for each trait in each plot
CWM2 <- CWM0 %>%
mutate_at(.funs = list(~if_else(is.na(.),0,1) * Relative.cover),
mutate_at(.funs = list(~if_else(is.na(.),0,1) * Relative_cover),
.vars = vars(StemDens_mean:LeafWaterCont_mean)) %>%
group_by(PlotObservationID) %>%
summarize_at(.vars= vars(StemDens_mean:LeafWaterCont_mean),
......@@ -369,7 +374,7 @@ variance2.fun <- function(trait, abu){
CWM3 <- CWM0 %>%
group_by(PlotObservationID) %>%
summarize_at(.vars= vars(StemDens_mean:LeafWaterCont_mean),
.funs = list(~variance2.fun(., Relative.cover))) %>%
.funs = list(~variance2.fun(., Relative_cover))) %>%
dplyr::select(PlotObservationID, order(colnames(.))) %>%
gather(key=variable, value=CWV, -PlotObservationID)
......@@ -501,17 +506,17 @@ We used different sources of information:
**Step 1**: Attach growth form trait information to DT table. Growth form information derives from TRY
```{r}
DT.gf <- DT2 %>%
filter(taxon_group=="Vascular plant") %>%
filter(Taxon_group=="Vascular plant") %>%
#join with try names, using resolved species names as key
left_join(try.species.names %>%
dplyr::select(Name_short, GrowthForm) %>%
rename(species=Name_short) %>%
distinct(species, .keep_all=T),
by="species") %>%
rename(Species=Name_short) %>%
distinct(Species, .keep_all=T),
by="Species") %>%
left_join(try.species.means %>%
dplyr::select(Name_short, PlantHeight_mean) %>%
rename(species=Name_short),
by="species")
rename(Species=Name_short),
by="Species")
# number of records withouth Growth Form info
sum(is.na(DT.gf$GrowthForm))
```
......@@ -520,10 +525,11 @@ sum(is.na(DT.gf$GrowthForm))
```{r}
top.gf.nas <- DT.gf %>%
filter(is.na(GrowthForm)) %>%
group_by(species) %>%
group_by(Species) %>%
summarize(n=n()) %>%
arrange(desc(n))
```
```{r, eval=F, message=F}
write_csv(top.gf.nas %>%
filter(n>1000),
......@@ -537,8 +543,8 @@ gf.manual <- read_csv("../_derived/Species_missingGF_complete.csv")
DT.gf <- DT.gf %>%
left_join(gf.manual %>%
rename(GrowthForm.m=GrowthForm),
by="species") %>%
rename(GrowthForm.m=GrowthForm, Species=species),
by="Species") %>%
mutate(GrowthForm=coalesce(GrowthForm, GrowthForm.m)) %>%
dplyr::select(-GrowthForm.m)
```
......@@ -561,7 +567,7 @@ all.gf <- all.gf0 %>%
distinct(AccSpeciesName, OrigValueStr) %>%
rename(GrowthForm0=OrigValueStr) %>%
mutate(GrowthForm0=tolower(GrowthForm0)) %>%
filter(AccSpeciesName %in% sPlot.species$species) %>%
filter(AccSpeciesName %in% sPlot.species$Species) %>%
mutate(GrowthForm_simplified= GrowthForm0) %>%
mutate(GrowthForm_simplified=replace(GrowthForm_simplified,
list=str_detect(GrowthForm0,
......@@ -602,8 +608,8 @@ table(all.gf$GrowthForm_simplified, exclude=NULL)
#coalesce this info into DT.gf
DT.gf <- DT.gf %>%
left_join(all.gf %>%
rename(species=AccSpeciesName),
by="species") %>%
rename(Species=AccSpeciesName),
by="Species") %>%
mutate(GrowthForm=coalesce(GrowthForm, GrowthForm_simplified)) %>%
dplyr::select(-GrowthForm_simplified)
```
......@@ -613,12 +619,12 @@ DT.gf <- DT.gf %>%
other.trees <- DT.gf %>%
filter(Layer==1 & is.na(GrowthForm)) %>%
filter(Rank_correct=="species") %>%
distinct(species, Layer, GrowthForm) %>%
pull(species)
distinct(Species, Layer, GrowthForm) %>%
pull(Species)
DT.gf <- DT.gf %>%
mutate(GrowthForm=replace(GrowthForm,
list=species %in% other.trees,
list=Species %in% other.trees,
values="tree"))
```
After cross-matching, the number of records without growth form information decresead to `r sum(is.na(DT.gf$GrowthForm))`.
......@@ -626,7 +632,7 @@ After cross-matching, the number of records without growth form information decr
```{r, echo=F}
knitr::kable(DT.gf %>%
distinct(species, GrowthForm, PlantHeight_mean) %>%
distinct(Species, GrowthForm, PlantHeight_mean) %>%
group_by(GrowthForm) %>%
summarize(Height=mean(PlantHeight_mean, na.rm=T)),
caption="Average height per growth form", digits = 3) %>%
......@@ -639,8 +645,8 @@ Define a species as `is.tree.or.tall.shrub` when it is either defined as tree, O
Define a species as `is.not.tree.or.shrub.and.small` when it has a height <10, as long as it's not defined a tree. When height is not available, it is sufficient that the species is classified as "herb" or "other".
```{r}
GF <- DT.gf %>%
distinct(species, GrowthForm, PlantHeight_mean) %>%
mutate(GrowthForm=fct_collape(GrowthForm,
distinct(Species, GrowthForm, PlantHeight_mean) %>%
mutate(GrowthForm=fct_collapse(GrowthForm,
"herb/shrub"=c("herb\\shrub","herb/shrub"),
"shrub/tree"=c("shrub/tree", "shrub\\tree"))) %>%
## define is.tree.or.tall
......@@ -731,11 +737,11 @@ plot.vegtype2 <- DT.gf %>%
filter(Layer %in% c(1,2,3)) %>%
# first sum the cover of all species in a layer
group_by(PlotObservationID, Layer) %>%
summarize(cover_perc=sum(cover_perc)) %>%
summarize(Cover_perc=sum(Abundance)) %>%
# then combine cover across layers
group_by(PlotObservationID) %>%
summarize(cover_perc=combine.cover(cover_perc)) %>%
mutate(is.forest=cover_perc>=25)
summarize(Cover_perc=combine.cover(Cover_perc)) %>%
mutate(is.forest=Cover_perc>=25)
table(plot.vegtype1 %>% dplyr::select(is.forest), exclude=NULL)
......@@ -746,11 +752,11 @@ plot.vegtype3 <- DT.gf %>%
#filter plots where all records are recorded as percentage cover
filter(PlotObservationID %in% mysel ) %>%
# combine cover across layers
group_by(PlotObservationID, species) %>%
summarize(cover_perc=combine.cover(cover_perc)) %>%
group_by(PlotObservationID, Species) %>%
summarize(cover_perc=combine.cover(Abundance)) %>%
ungroup() %>%
# attach species Growth Form information
left_join(GF, by="species")%>%
left_join(GF, by="Species")%>%
group_by(PlotObservationID) %>%
summarize(cover_tree=sum(cover_perc*is.tree.or.tall.shrub, na.rm=T),
cover_non_tree=sum(cover_perc*(!is.tree.or.tall.shrub), na.rm=T),
......@@ -860,15 +866,21 @@ The total number of plots with attribution to forest\\non-forest (either coming
# 4 Export and update other objects
```{r}
sPlot.traits <- sPlot.species %>%
arrange(species) %>%
arrange(Species) %>%
left_join(GF %>%
dplyr::select(species, GrowthForm, is.tree.or.tall.shrub),
by="species") %>%
dplyr::select(Species, GrowthForm, is.tree.or.tall.shrub),
by="Species") %>%
left_join(try.combined.means %>%
rename(species=Taxon_name), by="species") %>%
rename(Species=Taxon_name), by="Species") %>%
## some entries are duplicated (both at species and Genus level)
## Keep only genus-level averages
group_by(Species) %>%
arrange(desc(n)) %>%
slice(1) %>%
ungroup() %>%
dplyr::select(-Rank_correct)
save(try.combined.means, CWM, sPlot.traits, file="../_output/Traits_CWMs_sPlot3.RData")
save(try.combined.means, CWM, sPlot.traits, trait.legend, file="../_output/Traits_CWMs_sPlot3.RData")
header <- header %>%
left_join(plot.vegtype %>%
......
0% Loading or .
You are about to add 0 people to the discussion. Proceed with caution.
Please register or to comment