Finished v.0.1 of classification is.forest is.non.forest

Commit 07d19e4f authored 5 years ago by Francesco Sabatini
--- a/code/07_buildCWMs.Rmd
+++ b/code/07_buildCWMs.Rmd
@@ -26,6 +26,7 @@ library(data.table)
 library(knitr)
 library(kableExtra)
 library(stringr)
+library(caret)
 ```
 # Import data
@@ -77,12 +78,12 @@ There are `r nrow(try.species)` individual observations from `r nrow(try.species
 ## Attach resolved names from Backbone
 ```{r}
 try.species.names <- try.allinfo %>% 
-  dplyr::select(Species, Genus) %>% 
+  dplyr::select(Species, Genus, GrowthForm) %>% 
  left_join(Backbone %>% 
              dplyr::select(Name_sPlot_TRY, Name_short) %>% 
              rename(Species=Name_sPlot_TRY), 
            by="Species") %>% 
-  dplyr::select(Species, Name_short, Genus)
+  dplyr::select(Species, Name_short, Genus, GrowthForm)
 ```
 After attaching resolved names, TRY data contains information on `r try.species.names %>% distinct(Name_short) %>% nrow()` species.  
 \newline \newline
@@ -156,7 +157,7 @@ try.individuals <- try.individuals0 %>%
              rename_at(col.from, .funs=function(x) col.to)
 ```
+## Fix some known errors in the gap-filled matrix
 Check traits at the individual level. There are some traits with unexpected negative entries:
 ```{r}
 try.species.names %>% 
@@ -168,29 +169,41 @@ try.species.names %>%
  group_by(variable) %>% 
  summarize(n=n())
 ```
-According to Jens Kattge, the entries for `Leaf.delta.15N` are legitimate, while in the other cases, it may be due to bad predictions. He suggested to delete these negative records.
+According to Jens Kattge, the entries for `Leaf.delta.15N` are legitimate, while in the other cases the negative values are probably due to bad predictions. He suggested deleting these negative records.  
+Similarly, there are records with impossible values for height: some species are incorrectly predicted to have a height >100 m, and some herbs a height >10 m. These records are excluded too, except for the two tall conifers exempted in the code (`Pseudotsuga menziesii` and `Sequoia sempervirens`).  
 ```{r}
+try.individuals <- try.species.names %>% 
+  dplyr::select(Name_short) %>% 
+  bind_cols(try.individuals)
 toexclude <- try.individuals %>% 
-  gather(variable, value, -X1) %>% 
+  gather(variable, value, -X1, -Name_short) %>% 
  filter(variable != "Leaf.delta.15N") %>% 
  filter(value<0) %>% 
  pull(X1)
-try.individuals <- try.species.names %>% 
+toexclude2 <- try.individuals %>% 
-  dplyr::select(Name_short) %>% 
+  filter(PlantHeight>100  & (!Name_short %in% c("Pseudotsuga menziesii", "Sequoia sempervirens"))) %>% 
-  bind_cols(try.individuals) %>% 
+  pull(X1)
-  filter(!X1 %in% toexclude) %>% 
+toexclude3 <- try.individuals %>% 
+  filter(X1 %in% (try.allinfo %>% 
+                     filter(GrowthForm=="herb") %>% 
+                     pull(X1))) %>% 
+  filter(PlantHeight>10) %>% 
+  pull(X1)
+try.individuals <- try.individuals %>% 
+  filter(!X1 %in% c(toexclude, toexclude2, toexclude3)) %>% 
  dplyr::select(-X1)
 ```
-This results in the exclusion of `r length(toexclude)` individuals. In this way the total number of species included in TRY reduces to `r try.individuals %>% distinct(Name_short) %>% nrow()`
+This results in the exclusion of `r length(unique(c(toexclude, toexclude2, toexclude3)))` individuals. In this way the total number of species included in TRY reduces to `r try.individuals %>% distinct(Name_short) %>% nrow()`
 ## Calculate species and genus level trait means and sd
 ```{r}
 ## Calculate species level trait means and sd. 
 try.species.means <- try.individuals %>% 
  group_by(Name_short) %>% 
  #Add a field to indicate the number of observations per taxon
@@ -270,7 +283,7 @@ Merge vegetation layers, where necessary. Combine cover values across layers
 ```{r}
 #Ancillary function
 # Combine cover accounting for layers
-combine.cover <- function(x, datatype){
+combine.cover <- function(x){
    while (length(x)>1){
      x[2] <- x[1]+(100-x[1])*x[2]/100
      x <- x[-1]
@@ -279,9 +292,8 @@ combine.cover <- function(x, datatype){
 }
 DT2.comb <- DT2 %>% 
-  #temporary
  group_by(PlotObservationID, species, Rank_correct) %>%
-  summarize(Relative.cover=combine.cover(Relative.cover, cover_code)) %>%
+  summarize(Relative.cover=combine.cover(Relative.cover)) %>%
  ungroup()
 ```
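A worked example of how `combine.cover` merges layers, as a sketch separate from the committed code. The diff elides the end of the function, so the version below (renamed `combine.cover.sketch`, a hypothetical name) assumes it simply returns the last remaining value. Layers are assumed to overlap at random, so a second layer only adds cover to the share left open by the first.

```r
# Sketch: combine percentage cover across overlapping layers.
# Assumption: the elided tail of combine.cover() returns the single remaining value.
combine.cover.sketch <- function(x){
  while (length(x) > 1){
    # the second layer only covers the share left free by the first
    x[2] <- x[1] + (100 - x[1]) * x[2] / 100
    x <- x[-1]
  }
  return(x)
}

two.layers   <- combine.cover.sketch(c(50, 50))      # 50 + 50% of the remaining 50
three.layers <- combine.cover.sketch(c(20, 30, 10))  # 20, then 44, then 49.6
print(c(two.layers, three.layers))
```

Under this rule the combined cover never exceeds 100%, which a plain sum across layers would not guarantee.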
@@ -368,7 +380,7 @@ CWM <- CWM1 %>%
  arrange(PlotObservationID)
 ```
-## Explore CWM output
+### Explore CWM output
 ```{r, echo=F}
 knitr::kable(CWM %>% 
@@ -436,11 +448,388 @@ knitr::kable(coverage.summary,
                  full_width = F, position = "center")
 ```
-## Export CWM and species mean trait values
+### Export CWM and species mean trait values
 ```{r}
 save(try.combined.means, CWM, file="../_output/Traits_CWMs.RData")
 ```
+## Classify plots as `is.forest` or `is.non.forest` based on species traits
+sPlot has two independent systems for classifying plots into vegetation types. The first classifies plots into forest and non-forest, based on the share of trees and the layering of vegetation. The second classifies plots into broad habitat types and relies on the expert opinion of data contributors. Unfortunately, this information is not consistently available across plots: the large majority of classified plots are in Europe. These broad habitat types are coded using five non-mutually exclusive dummy variables:  
+1) Forest - F  
+2) Grassland - G  
+3) Shrubland - S  
+4) Sparse vegetation - B (Bare)  
+5) Wetland - W  
+A plot may belong to more than one formation, e.g. a Savannah is categorized as Forest + Grassland (FG).  
+\newline\newline
+Derive the `is.forest` and `is.non.forest` classification of plots.    
+### Derive species level information on Growth Forms.
+We used different sources of information:  
+1) Data from the gap-filled trait matrix  
+2) Manual cleaning of the most common species for which growth form info is not available  
+3) Data from TRY (public dataset only) on all species with growth form info (Trait ID = 42)  
+4) Cross-match with species assigned to tree layer in DT table.
+\newline\newline
+Step 1: Attach growth form information to the DT table. Growth form information derives from TRY.
+```{r}
+DT.gf <- DT2 %>% 
+  filter(taxon_group=="Vascular plant") %>% 
+  #join with try names, using resolved species names as key
+  left_join(try.species.names %>% 
+              dplyr::select(Name_short, GrowthForm) %>% 
+              rename(species=Name_short) %>% 
+              distinct(species, .keep_all=T), 
+            by="species") %>% 
+  left_join(try.species.means %>% 
+              dplyr::select(Name_short, PlantHeight_mean) %>% 
+              rename(species=Name_short), 
+            by="species")
+# number of records without Growth Form info
+sum(is.na(DT.gf$GrowthForm))
+```
+Step 2: Select the most common species without growth form information to export and check manually
+```{r, eval=F}
+top.gf.nas <- DT.gf %>% 
+  filter(is.na(GrowthForm)) %>% 
+  group_by(species) %>% 
+  summarize(n=n()) %>% 
+  arrange(desc(n))
+write_csv(top.gf.nas %>% 
+            filter(n>1000), 
+  path="../_derived/Species_missingGF.csv")
+```
+The `r nrow(top.gf.nas %>% filter(n>1000))` most common species (those with more than 1000 records) account for `r sum(top.gf.nas %>% filter(n>1000) %>% pull(n))/sum(top.gf.nas$n)*100`% of the missing records. Assign growth forms manually, reimport and coalesce into `DT.gf`.
+```{r}
+gf.manual <- read_csv("../_derived/Species_missingGF_complete.csv")
+DT.gf <- DT.gf %>% 
+  left_join(gf.manual %>% 
+              rename(GrowthForm.m=GrowthForm),
+            by="species") %>% 
+   mutate(GrowthForm=coalesce(GrowthForm, GrowthForm.m)) %>% 
+   dplyr::select(-GrowthForm.m)
+```
+After manual completion, the number of records without growth form information decreased to `r sum(is.na(DT.gf$GrowthForm))`.  
+\newline\newline
+Step 3: Import additional data on growth form from TRY (accessed 10 March 2020). All public data on growth form were downloaded. First, escape unmatched quotation marks in the txt file from the command line.
+```{bash, eval=F}
+# escape all unmatched quotation marks. Run in Linux terminal
+#sed 's/"/\\"/g' 8854.txt > 8854_test.csv
+#sed "s/'/\\'/g" 8854.txt > 8854_test.csv
+```
+```{r}
+all.gf0 <- read_delim("../_input/TRY5.0_v1.1/8854_test.txt", delim="\t") 
+all.gf <- all.gf0 %>% 
+  filter(TraitID==42) %>% 
+  distinct(AccSpeciesName, OrigValueStr) %>% 
+  rename(GrowthForm0=OrigValueStr) %>% 
+  mutate(GrowthForm0=tolower(GrowthForm0)) %>%
+  filter(AccSpeciesName %in% sPlot.species$species) %>% 
+  mutate(GrowthForm_simplified= GrowthForm0) %>% 
+  # build the pattern with paste0 so that the line break does not end up inside the regex
+  mutate(GrowthForm_simplified=replace(GrowthForm_simplified, list=str_detect(GrowthForm0,
+                                                                              paste0("vine|climber|liana|carnivore|epiphyte|^succulent|lichen|parasite|",
+                                                                                     "hydrohalophyte|aquatic|cactous|parasitic|hydrophytes|carnivorous")), "other")) %>%
+  mutate(GrowthForm_simplified=replace(GrowthForm_simplified, list=str_detect(GrowthForm0, "tree|conifer|^woody$|palmoid|mangrove|gymnosperm"), "tree")) %>% 
+  mutate(GrowthForm_simplified=replace(GrowthForm_simplified, list=str_detect(GrowthForm0, "shrub|scrub|bamboo"), "shrub")) %>%
+  mutate(GrowthForm_simplified=replace(GrowthForm_simplified, list=str_detect(GrowthForm0, "herb|sedge|graminoid|fern|forb|herbaceous|grass|chaemaephyte|geophyte|annual"), "herb")) %>%
+  mutate(GrowthForm_simplified=ifelse(GrowthForm_simplified %in% c("other", "herb", "shrub", "tree"), 
+                                      GrowthForm_simplified, NA)) %>% 
+  filter(!is.na(GrowthForm_simplified)) 
+#Some species have multiple attributions - use a majority vote. NA if ties
+get.mode <- function(x){
+  if(length(unique(x))==1){
+    return(as.character(unique(x)))} else{
+    tmp <- sort(table(x), decreasing=T)
+    if(tmp[1]!=tmp[2]){return(names(tmp)[1])} else {
+      #return(paste0(names(tmp)[1:2], collapse="/"))}
+    return("Unknown")}
+    }
+  }
+all.gf <- all.gf %>% 
+  group_by(AccSpeciesName) %>% 
+  summarize(GrowthForm_simplified=get.mode(GrowthForm_simplified)) %>% 
+  filter(GrowthForm_simplified!="Unknown")
+table(all.gf$GrowthForm_simplified, exclude=NULL)  
+#coalesce this info into DT.gf
+DT.gf <- DT.gf %>% 
+  left_join(all.gf %>% 
+              rename(species=AccSpeciesName), 
+            by="species") %>% 
+  mutate(GrowthForm=coalesce(GrowthForm, GrowthForm_simplified)) %>% 
+  dplyr::select(-GrowthForm_simplified)
+```
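A minimal sketch, separate from the committed code, of the majority vote with ties set to "Unknown" that `get.mode` is meant to perform. The function name `majority.label` and the toy inputs are hypothetical.

```r
# Sketch: most frequent label in a character vector; "Unknown" when the top two tie.
majority.label <- function(x){
  counts <- sort(table(x), decreasing = TRUE)
  n <- as.vector(counts)
  if (length(n) == 1) return(names(counts)[1])   # only one distinct label
  if (n[1] == n[2]) return("Unknown")            # tie between the two most frequent labels
  names(counts)[1]
}

clear.case <- majority.label(c("tree", "tree", "shrub"))
tied.case  <- majority.label(c("tree", "shrub"))
single     <- majority.label(c("herb", "herb"))
print(c(clear.case, tied.case, single))
```

Species returning "Unknown" are then dropped, so that only unambiguous growth forms are coalesced into `DT.gf`.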
+Step 4: Cross-match. Classify as tree every species that occurs in the tree layer in at least one relevé. Conservatively, do this only when the record is at species level (exclude records at genus or family level).
+```{r}
+other.trees <- DT.gf %>% 
+  filter(Layer==1 & is.na(GrowthForm)) %>% 
+  filter(Rank_correct=="species") %>% 
+  distinct(species, Layer, GrowthForm) %>% 
+  pull(species)
+DT.gf <- DT.gf %>% 
+  mutate(GrowthForm=replace(GrowthForm, 
+                            list=species %in% other.trees, 
+                            values="tree"))
+```
+After cross-matching, the number of records without growth form information decreased to `r sum(is.na(DT.gf$GrowthForm))`.  
+\newline\newline
+```{r, echo=F}
+knitr::kable(DT.gf %>% 
+              distinct(species, GrowthForm, PlantHeight_mean) %>% 
+              group_by(GrowthForm) %>% 
+              summarize(Height=mean(PlantHeight_mean, na.rm=T)), 
+  caption="Average height per growth form") %>%
+    kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive"), 
+                  full_width = F, position = "center")
+```
+Classify species as tree or tall shrub vs. other. Make a compact table of species growth forms and create the fields `is.tree.or.tall.shrub` and `is.not.tree.or.shrub.and.small`.  
+Define a species as `is.tree.or.tall.shrub` when it is either classified as a tree OR has a mean height >=10 m.  
+Define a species as `is.not.tree.or.shrub.and.small` when it has a mean height <10 m, as long as it is not classified as a tree. When height is not available, it is sufficient that the species is classified as "herb" or "other".
+```{r}
+GF <- DT.gf %>% 
+  distinct(species, GrowthForm, PlantHeight_mean) %>% 
+  ## define is.tree.or.tall
+  mutate(is.tree.or.tall.shrub=NA) %>% 
+  mutate(is.tree.or.tall.shrub=replace(is.tree.or.tall.shrub, 
+                                       list=str_detect(GrowthForm, "tree"), 
+                                       T)) %>% 
+  mutate(is.tree.or.tall.shrub=replace(is.tree.or.tall.shrub, 
+                                       list=PlantHeight_mean>=10, 
+                                       T)) %>% 
+  ## define is.not.tree.or.shrub.and.small 
+  mutate(is.not.tree.or.shrub.and.small=NA) %>% 
+  mutate(is.not.tree.or.shrub.and.small=replace(is.not.tree.or.shrub.and.small,
+                                       list=PlantHeight_mean<10, 
+                                       T)) %>% 
+  mutate(is.not.tree.or.shrub.and.small=replace(is.not.tree.or.shrub.and.small,
+                                       list=is.na(PlantHeight_mean) & str_detect(GrowthForm, "herb|other"), 
+                                       T)) %>%   
+  ## use each field in turn to define which of the records in the other is F
+  mutate(is.not.tree.or.shrub.and.small=replace(is.not.tree.or.shrub.and.small,
+                                       list= is.tree.or.tall.shrub==T,
+                                       F)) %>% 
+  mutate(is.tree.or.tall.shrub=replace(is.tree.or.tall.shrub,
+                                       list= is.not.tree.or.shrub.and.small==T,
+                                       F)) %>% 
+  ## drop redundant field
+  dplyr::select(-is.not.tree.or.shrub.and.small)
+## cross-check classification  
+table(GF$GrowthForm, GF$is.tree.or.tall.shrub, exclude=NULL)
+## Check for herb species classified as trees
+GF %>% 
+  filter(is.tree.or.tall.shrub & GrowthForm=="herb")
+```
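The chain of `replace()` calls above can be read as a single decision rule. The sketch below, separate from the committed code, restates that rule with `dplyr::case_when` on a hypothetical toy table; it assumes `dplyr` and `stringr` are installed and that conditions evaluating to `NA` fall through to the next rule.

```r
library(dplyr)
library(stringr)

# Hypothetical toy data: species names and values are illustrative only
toy <- tibble(species          = c("sp.a", "sp.b", "sp.c", "sp.d", "sp.e"),
              GrowthForm       = c("tree", "herb", "shrub", "shrub", NA),
              PlantHeight_mean = c(NA,     0.3,    12,      NA,      NA))

toy <- toy %>% 
  mutate(is.tree.or.tall.shrub = case_when(
    str_detect(GrowthForm, "tree")  ~ TRUE,   # classified as tree, whatever the height
    PlantHeight_mean >= 10          ~ TRUE,   # tall enough to count as tree or tall shrub
    PlantHeight_mean < 10           ~ FALSE,  # small and not a tree
    is.na(PlantHeight_mean) & str_detect(GrowthForm, "herb|other") ~ FALSE,
    TRUE                            ~ NA))    # not enough information

print(toy)
```

In this toy table the shrub without height data and the species without growth form both stay `NA`, mirroring the records that the original chain leaves unclassified.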
+### Perform actual classification of plots
+Define a plot as forest if:  
+1) Has a total cover of the tree layer >=25% (from header)  
+2) Has a total cover in Layer 1 >=25% (from DT)  
+3) Has a total cover of tree or tall shrub species >=25% (from DT + TRY)  
+4) Has data on basal area summing to more than 10 m2/ha  
+\newline\newline
+The first three criteria are adapted to define non-forest as follows:  
+1) Info on total cover of the tree layer is available and <25%  
+2) Info on total cover in Layer 1 is available and <25%  
+3) The cover of tree or tall shrub species is <25% and the **relative** cover of non-tree species is >75%  
+\newline\newline
+Criteria 2 and 3 only apply to plots having cover data in percentage.  
+Reimport header file
+```{r}
+load("../_output/header_splot3.0.RData")
+```
+```{r}
+# Criterion 1
+plot.vegtype1 <- header %>% 
+  dplyr::select(PlotObservationID, `Cover tree layer (%)`) %>% 
+  rename(Cover_trees=`Cover tree layer (%)`) %>% 
+  mutate(is.forest=Cover_trees>=25) 
+table(plot.vegtype1 %>% dplyr::select(is.forest), exclude=NULL)
+# Criterion 2
+# Select only plots having cover data in percentage
+mysel <- (DT.gf %>% 
+            distinct(PlotObservationID, Ab_scale) %>% 
+            group_by(PlotObservationID) %>% 
+            summarize(AllCovPer=all(Ab_scale=="CoverPerc")) %>% 
+            filter(AllCovPer==T) %>% 
+            pull(PlotObservationID))
+# Excluded plots
+nrow(header)-length(mysel)
+plot.vegtype2 <- DT.gf %>% 
+  filter(PlotObservationID %in% mysel ) %>% 
+  filter(Layer %in% c(1,2,3)) %>% 
+  # first sum the cover of all species in a layer
+  group_by(PlotObservationID, Layer) %>% 
+  summarize(cover_perc=sum(cover_perc)) %>% 
+  # then combine cover across layers
+  group_by(PlotObservationID) %>% 
+  summarize(cover_perc=combine.cover(cover_perc)) %>% 
+  mutate(is.forest=cover_perc>=25) 
+table(plot.vegtype2 %>% dplyr::select(is.forest), exclude=NULL)
+# Criterion 3
+plot.vegtype3 <- DT.gf %>% 
+  #filter plots where all records are recorded as percentage cover
+  filter(PlotObservationID %in% mysel ) %>% 
+  # combine cover across layers
+  group_by(PlotObservationID, species) %>%
+  summarize(cover_perc=combine.cover(cover_perc)) %>%
+  ungroup() %>% 
+  # attach species Growth Form information
+  left_join(GF, by="species")%>% 
+  group_by(PlotObservationID) %>% 
+  summarize(cover_tree=sum(cover_perc*is.tree.or.tall.shrub, na.rm=T), 
+            cover_non_tree=sum(cover_perc*(!is.tree.or.tall.shrub), na.rm=T), 
+            cover_unknown=sum(cover_perc* is.na(is.tree.or.tall.shrub))) %>% 
+  rowwise() %>% 
+  ## classify plots based on cover of different growth forms
+  mutate(tot.cover=sum(cover_tree, cover_non_tree, cover_unknown, na.rm=T)) %>% 
+  mutate(is.forest=cover_tree>=25) %>% 
+  mutate(is.non.forest=cover_tree<25 & (cover_non_tree/tot.cover)>.75)
+table(plot.vegtype3 %>% dplyr::select(is.forest, is.non.forest), exclude=NULL)
+## Criterion 4
+plot.vegtype4 <-  DT.gf %>% 
+  filter(Ab_scale=="x_BA") %>% 
+  group_by(PlotObservationID) %>% 
+  summarize(tot.ba=sum(Abundance)) %>% 
+  mutate(is.forest=tot.ba>10)
+table(plot.vegtype4 %>% dplyr::select(is.forest), exclude=NULL)
+```
+Combine classifications from the four criteria. Use a majority vote to assign plots. In case of ties, priority decreases progressively from criterion 1 to criterion 4. 
+```{r}
+plot.vegtype <- header %>% 
+  dplyr::select(PlotObservationID) %>% 
+  left_join(plot.vegtype1 %>% 
+              dplyr::select(PlotObservationID, is.forest), 
+            by="PlotObservationID") %>% 
+  left_join(plot.vegtype2 %>% 
+              dplyr::select(PlotObservationID, is.forest), 
+            by="PlotObservationID") %>% 
+  left_join(plot.vegtype3 %>% 
+              dplyr::select(PlotObservationID, is.forest, is.non.forest) %>% 
+              rename(is.non.forest.x.x=is.non.forest), 
+            by="PlotObservationID") %>% 
+  left_join(plot.vegtype4 %>% 
+              dplyr::select(PlotObservationID, is.forest), 
+            by="PlotObservationID") %>% 
+  ## assign vegtype based on majority vote. In case of ties use the order of criteria as ranking
+  rowwise() %>% 
+  mutate(mean.forest=mean(c(is.forest.x, is.forest.y, is.forest.x.x, is.forest.y.y), na.rm=T)) %>% 
+  mutate(mean.forest2=coalesce(is.forest.x, is.forest.y, is.forest.x.x, is.forest.y.y)) %>% 
+  mutate(is.forest=ifelse(mean.forest==0.5, mean.forest2, mean.forest>0.5)) %>%  
+  # same for is.non.forest
+  mutate(mean.non.forest=mean(c( (!is.forest.x), (!is.forest.y), is.non.forest.x.x, (!is.forest.y.y)), na.rm=T)) %>% 
+  mutate(mean.non.forest2=coalesce( (!is.forest.x), (!is.forest.y), is.non.forest.x.x, (!is.forest.y.y))) %>% 
+  mutate(is.non.forest=ifelse(mean.non.forest==0.5, mean.non.forest2, mean.non.forest>0.5)) %>% 
+  # when both is.forest & is.non.forest are F transform to NA
+  mutate(both.F=ifelse( (is.forest==F & is.non.forest==F), T, F)) %>% 
+  mutate(is.forest=replace(is.forest, list=both.F==T, values=NA)) %>% 
+  mutate(is.non.forest=replace(is.non.forest, list=both.F==T, values=NA))
+table(plot.vegtype %>% dplyr::select(is.forest, is.non.forest), exclude=NULL)
+```
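A minimal sketch, separate from the committed code, of the vote applied to each plot: the majority of the available (non-`NA`) criteria wins, and an exact tie is resolved by the first available criterion in priority order. The function name `vote.with.priority` and the example vectors are hypothetical.

```r
# Sketch: majority vote over logical criteria ordered by priority.
# NA means the criterion could not be evaluated for that plot.
vote.with.priority <- function(votes){
  if (all(is.na(votes))) return(NA)          # no criterion available
  share <- mean(votes, na.rm = TRUE)         # share of available criteria voting TRUE
  if (share == 0.5){
    return(votes[!is.na(votes)][1])          # tie: highest-priority available criterion decides
  }
  share > 0.5
}

tie.first.true  <- vote.with.priority(c(TRUE,  FALSE, NA,   NA))
tie.first.false <- vote.with.priority(c(FALSE, TRUE,  NA,   NA))
clear.majority  <- vote.with.priority(c(FALSE, TRUE,  TRUE, NA))
no.information  <- vote.with.priority(c(NA, NA, NA, NA))
print(c(tie.first.true, tie.first.false, clear.majority, no.information))
```

The tie-break plays the role of `coalesce()` in the chunk above, which also picks the first non-missing criterion.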
+### Cross-check and validate
+Cross-check with sPlot's 5-class (incomplete) native classification deriving from data contributors. Build a confusion matrix.
+```{r}
+cross.check <- header %>% 
+  dplyr::select(PlotObservationID, Forest) %>% 
+  left_join(plot.vegtype %>% 
+              dplyr::select(PlotObservationID, is.forest, is.non.forest) %>% 
+              rename(Forest=is.forest, Other=is.non.forest) %>% 
+              gather(isfor_isnonfor, value, -PlotObservationID) %>% 
+              filter(value==T) %>% 
+              dplyr::select(-value), 
+            by="PlotObservationID") %>% 
+  mutate(Other=1*Forest!=1) %>% 
+  gather(veg_type, value, -PlotObservationID, -isfor_isnonfor) %>% 
+  filter(value==1) %>% 
+  dplyr::select(-value)
+#Build a confusion matrix to evaluate the comparison  
+u <- union(cross.check$isfor_isnonfor, cross.check$veg_type)
+t <- table( factor(cross.check$isfor_isnonfor, u), factor(cross.check$veg_type, u))
+confm <- caret::confusionMatrix(t)
+```
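For readers unfamiliar with the statistics reported by `caret::confusionMatrix`, the sketch below computes overall accuracy and Cohen's Kappa by hand in base R. It is separate from the committed code, and the 2x2 counts are invented for illustration; they are not results from sPlot.

```r
# Hypothetical confusion matrix: rows = derived classification, columns = native classification
tab <- matrix(c(40, 10, 5, 45), nrow = 2,
              dimnames = list(derived = c("Forest", "Other"),
                              native  = c("Forest", "Other")))

n           <- sum(tab)
accuracy    <- sum(diag(tab)) / n                         # observed agreement
expected    <- sum(rowSums(tab) * colSums(tab)) / n^2     # agreement expected by chance
cohen.kappa <- (accuracy - expected) / (1 - expected)     # agreement beyond chance

print(c(accuracy = accuracy, kappa = cohen.kappa))
```

Kappa rescales accuracy against chance agreement, which matters here because forest and non-forest plots are unlikely to be equally frequent.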
+```{r echo=F}
+knitr::kable(confm$table, caption="Confusion matrix between sPlot's native classification of habitats (columns), and classification based on four criteria based on vegetation layers and growth forms (rows)") %>%
+    kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive"), full_width = T, position = "center")
+```
+Formulas of associated statistics are available on the help page of the [caret package](https://www.rdocumentation.org/packages/caret/versions/6.0-84/topics/confusionMatrix) and associated references.
+The overall accuracy of the classification based on `is.forest`/`is.non.forest`, when tested against sPlot's native habitat classification, is `r round(confm$overall[1],2)`, and the Kappa statistic is `r round(confm$overall[2],2)`.
+```{r echo=F}
+knitr::kable((confm$byClass), caption="Associated statistics of confusion matrix by class") %>%
+    kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive"), full_width = F, position = "center")
+```
+```{r echo=F}
+header.vegtype <- header %>% 
+  dplyr::select(PlotObservationID, Forest:Wetland) %>% 
+  left_join(plot.vegtype %>% 
+              dplyr::select(PlotObservationID, is.forest, is.non.forest),
+            by="PlotObservationID")
+```
+Through the process described above, we managed to classify `r plot.vegtype %>% filter(is.forest==T | is.non.forest==T) %>% nrow()` plots, of which `r plot.vegtype %>% filter(is.forest==T) %>% nrow()` are forest and `r plot.vegtype %>% filter(is.non.forest==T) %>% nrow()` are non-forest.  
+\newline\newline
+The total number of plots with attribution to forest/non-forest (either coming from sPlot's native classification, or from the process above) is: `r header.vegtype %>% dplyr::select(-PlotObservationID) %>% filter(rowMeans(is.na(.)) < 1) %>% nrow()`.
+### Export and update other objects
+```{r}
+sPlot.traits <- sPlot.species %>% 
+  arrange(species) %>% 
+  left_join(GF %>% 
+              dplyr::select(species, GrowthForm, is.tree.or.tall.shrub), 
+            by="species") %>% 
+  left_join(try.combined.means %>% 
+              rename(species=Taxon_name), by="species") %>% 
+  dplyr::select(-Rank_correct)
+save(try.combined.means, CWM, sPlot.traits, file="../_output/Traits_CWMs.RData")
+header <- header %>% 
+  left_join(plot.vegtype %>% 
+              dplyr::select(PlotObservationID, is.forest, is.non.forest),
+            by="PlotObservationID") %>% 
+  dplyr::select(PlotObservationID:ESY, is.forest:is.non.forest, everything())
+save(header, file="../_output/header_splot3.0.RData")
+```
+# Appendix
+## Growth forms of most common species
+```{r, code = readLines("../_derived/Species_missingGF_complete.csv"), eval=F}
+```
 ## SessionInfo
 ```{r}
 sessionInfo()