Timestamp: Mon Nov 30 21:09:08 2020
Drafted: Francesco Maria Sabatini
Revised: Helge Bruelheide
Version: 1.1

This report documents the construction of the DT table for sPlot 3.0. It is based on dataset sPlot_3.0.2, received on 24/07/2019 from Stephan Hennekens.

Caution: Layer information is not available for all species in each plot. Where this information is missing, Layer is set to zero.

Changes in version 1.1
1) Added explanation of fields
2) Fixed taxon_group of Friesodielsia
3) Only export the fields Ab_scale and Abundance

knitr::opts_chunk$set(echo = TRUE)
library(tidyverse)
library(readr)
library(xlsx)
library(knitr)
library(kableExtra)

#save temporary files
write("TMPDIR = /data/sPlot/users/Francesco/_tmp", file=file.path(Sys.getenv('TMPDIR'), '.Renviron'))
write("R_USER = /data/sPlot/users/Francesco/_tmp", file=file.path(Sys.getenv('R_USER'), '.Renviron'))
#rasterOptions(tmpdir="/data/sPlot/users/Francesco/_tmp")

Search for unclosed quotation marks and escape them. Run the command below in a Linux terminal.

# escape all double quotation marks. Run in Linux terminal
# sed 's/"/\\"/g' sPlot_3_0_2_species.csv > sPlot_3_0_2_species_test.csv
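The same escaping can also be done from within R, which may be convenient where sed is not available. The following is a minimal sketch on in-memory example lines; the helper name escape_quotes and the example strings are illustrative assumptions, not part of the original workflow.

```r
# Hypothetical helper: prefix every double quotation mark with a backslash,
# mirroring the effect of the sed command above
escape_quotes <- function(lines) {
  # fixed = TRUE treats both pattern and replacement literally
  gsub('"', '\\"', lines, fixed = TRUE)
}

# Invented example lines (tab-delimited, one with quotation marks)
example_lines <- c('123\tCarex "elata"\t15', '124\tAlnus incana\t1')
escaped <- escape_quotes(example_lines)
print(escaped)
```

For a real file, the lines could be read with readLines(), passed through the helper, and written back with writeLines().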

Import data Table

DT table is the species x plot matrix, in long format.

DT0 <- readr::read_delim("../sPlot_data_export/sPlot_3_0_2_species_test.csv", 
                            delim="\t", 
                         col_types = cols(
                                PlotObservationID = col_double(),
                                Taxonomy = col_character(),
                                `Taxon group` = col_character(),
                                `Taxon group ID` = col_double(),
                                `Turboveg2 concept` = col_character(),
                                `Matched concept` = col_character(),
                                Match = col_double(),
                                Layer = col_double(),
                                `Cover %` = col_double(),
                                `Cover code` = col_character(),
                                x_ = col_double()
                              )
                         ) 
nplots <- length(unique(DT0$PlotObservationID))
nspecies <- length(unique(DT0$`Matched concept`))

Match plots with those in header

load("../_output/header_sPlot3.0.RData")
DT0 <- DT0 %>% 
  filter(PlotObservationID %in% unique(header$PlotObservationID))
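As a minimal sketch of this filtering step, the toy example below keeps only the records whose plot also occurs in the header, and lists the plot IDs that are dropped. The objects dt_toy and header_toy and their values are invented for illustration.

```r
library(dplyr)

# Toy stand-ins for DT0 and header (invented values)
dt_toy <- tibble(
  PlotObservationID = c(1, 1, 2, 3),
  `Matched concept` = c("Alnus incana", "Carex elata", "Carex elata", "Vicia cracca")
)
header_toy <- tibble(PlotObservationID = c(1, 3))

# Keep only records whose plot also appears in the header
dt_kept <- dt_toy %>%
  filter(PlotObservationID %in% header_toy$PlotObservationID)

# Plots present in the species table but missing from the header
dropped_ids <- setdiff(dt_toy$PlotObservationID, header_toy$PlotObservationID)
```

Inspecting the dropped IDs before overwriting the table is a cheap way to confirm that only the expected plots are lost.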

The DT table includes 43093694 species x plot records, across 1978589 plots. Before taxonomic resolution, there are 107676 species.

Example of initial DT table (3 randomly selected plots shown)
PlotObservationID | Taxonomy | Taxon group | Taxon group ID | Turboveg2 concept | Matched concept | Match | Layer | Cover % | Cover code | x_
34576 | AU-Austria | Vascular plant | 1 | Alnus incana | Alnus incana | 3 | 7 | 1.0 |  | NA
34576 | AU-Austria | Vascular plant | 1 | Calamagrostis canescens | Calamagrostis canescens | 3 | 6 | 37.0 | 3 | NA
34576 | AU-Austria | Vascular plant | 1 | Carex elata | Carex elata | 3 | 6 | 15.0 | 2 | NA
34576 | AU-Austria | Vascular plant | 1 | Cirsium arvense | Cirsium arvense | 3 | 6 | 1.0 |  | NA
34576 | AU-Austria | Vascular plant | 1 | Cornus sanguinea | Cornus sanguinea | 3 | 7 | 1.0 |  | NA
34576 | AU-Austria | Vascular plant | 1 | Crataegus monogyna | Crataegus monogyna | 3 | 7 | 1.0 |  | NA
34576 | AU-Austria | Vascular plant | 1 | Equisetum fluviatile | Equisetum fluviatile | 3 | 6 | 1.0 |  | NA
34576 | AU-Austria | Vascular plant | 1 | Fraxinus excelsior | Fraxinus excelsior | 3 | 7 | 1.0 |  | NA
34576 | AU-Austria | Vascular plant | 1 | Galium elongatum | Galium elongatum | 3 | 6 | 1.0 |  | NA
34576 | AU-Austria | Vascular plant | 1 | Ligustrum vulgare | Ligustrum vulgare | 3 | 7 | 1.0 |  | NA
34576 | AU-Austria | Vascular plant | 1 | Lysimachia vulgaris | Lysimachia vulgaris | 3 | 6 | 1.0 |  | NA
34576 | AU-Austria | Vascular plant | 1 | Lythrum salicaria | Lythrum salicaria | 3 | 6 | 1.0 |  | NA
34576 | AU-Austria | Vascular plant | 1 | Mentha aquatica | Mentha aquatica | 3 | 6 | 1.0 |  | NA
34576 | AU-Austria | Vascular plant | 1 | Persicaria amphibia | Persicaria amphibia | 3 | 6 | 1.0 |  | NA
34576 | AU-Austria | Vascular plant | 1 | Phragmites australis | Phragmites australis | 3 | 6 | 62.0 | 4 | NA
34576 | AU-Austria | Vascular plant | 1 | Solidago gigantea | Solidago gigantea | 3 | 6 | 0.2 | r | NA
34576 | AU-Austria | Vascular plant | 1 | Stachys palustris | Stachys palustris | 3 | 6 | 1.0 |  | NA
34576 | AU-Austria | Vascular plant | 1 | Valeriana dioica | Valeriana dioica | 3 | 6 | 1.0 |  | NA
34576 | AU-Austria | Vascular plant | 1 | Valeriana officinalis | Valeriana officinalis subsp. officinalis | 3 | 6 | 1.0 |  | NA
34576 | AU-Austria | Vascular plant | 1 | Viburnum opulus | Viburnum opulus | 3 | 7 | 1.0 |  | NA
34576 | AU-Austria | Vascular plant | 1 | Vicia cracca | Vicia cracca | 3 | 6 | 1.0 |  | NA
116032 | NL-Floranld_2013 | Vascular plant | 1 | Betula pubescens | Betula pubescens | 3 | 6 | 1.0 | r | NA
116032 | NL-Floranld_2013 | Vascular plant | 1 | Carex pilulifera | Carex pilulifera | 3 | 6 | 2.0 |  | NA
116032 | NL-Floranld_2013 | Moss | 3 | Dicranella heteromalla | Dicranella heteromalla | 1 | 9 | 1.0 | r | NA
116032 | NL-Floranld_2013 | Vascular plant | 1 | Dryopteris dilatata | Dryopteris dilatata | 3 | 6 | 2.0 |  | NA
116032 | NL-Floranld_2013 | Moss | 3 | Eurhynchium praelongum | Kindbergia praelonga | 1 | 9 | 1.0 | r | NA
116032 | NL-Floranld_2013 | Vascular plant | 1 | Fagus sylvatica | Fagus sylvatica | 3 | 1 | 18.0 | 2b | NA
116032 | NL-Floranld_2013 | Vascular plant | 1 | Galeopsis tetrahit | Galeopsis tetrahit | 3 | 6 | 1.0 | r | NA
116032 | NL-Floranld_2013 | Vascular plant | 1 | Pinus nigra var. maritima | Pinus nigra | 3 | 1 | 68.0 | 4 | NA
116032 | NL-Floranld_2013 | Vascular plant | 1 | Pinus nigra var. maritima | Pinus nigra | 3 | 6 | 1.0 | r | NA
116032 | NL-Floranld_2013 | Vascular plant | 1 | Prunus serotina | Prunus serotina | 3 | 4 | 8.0 | 2a | NA
116032 | NL-Floranld_2013 | Vascular plant | 1 | Prunus serotina | Prunus serotina | 3 | 6 | 2.0 |  | NA
116032 | NL-Floranld_2013 | Vascular plant | 1 | Quercus robur | Quercus robur | 3 | 4 | 8.0 | 2a | NA
116032 | NL-Floranld_2013 | Vascular plant | 1 | Quercus robur | Quercus robur | 3 | 6 | 2.0 |  | NA
116032 | NL-Floranld_2013 | Vascular plant | 1 | Quercus rubra | Quercus rubra | 3 | 1 | 18.0 | 2b | NA
116032 | NL-Floranld_2013 | Vascular plant | 1 | Quercus rubra | Quercus rubra | 3 | 6 | 2.0 |  | NA
116032 | NL-Floranld_2013 | Vascular plant | 1 | Rubus sect. Rubus | Rubus sect. Rubus | 1 | 6 | 2.0 |  | NA
116032 | NL-Floranld_2013 | Vascular plant | 1 | Sorbus aucuparia | Sorbus aucuparia | 3 | 6 | 1.0 | r | NA
947871 | IR-Ireland2008 | Vascular plant | 1 | Ammophila arenaria | Ammophila arenaria | 1 | 0 | 1.0 | 2 | NA
947871 | IR-Ireland2008 | Vascular plant | 1 | Elytrigia juncea subsp. boreoatlantica | Elytrigia juncea subsp. boreoatlantica | 3 | 0 | 27.5 | 6 | NA

Match species names from DT0 to those in Backbone

Import taxonomic backbone

load("../_output/Backbone3.0.RData")

Match to DT0, using the field Matched concept as matching key. This is the field that was used to build, and resolve, the Backbone.

DT1 <- DT0 %>% 
  left_join(Backbone %>% 
              dplyr::select(Name_sPlot_TRY, Name_short, `Taxon group`, Rank_correct) %>%
              rename(`Matched concept`=Name_sPlot_TRY,
                     Taxongroup_BB=`Taxon group`), 
            by="Matched concept") %>% 
  # Simplify Rank_correct
  mutate(Rank_correct=fct_collapse(Rank_correct, 
                                   lower=c("subspecies", "variety", "infraspecies", "race", "forma"))) %>% 
  mutate(Rank_correct=fct_explicit_na(Rank_correct, "No_match")) %>% 
  mutate(Name_short=replace(Name_short, 
                            list=Name_short=="No suitable", 
                            values=NA))
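The join and the simplification of ranks can be illustrated on a toy example. This is only a sketch: dt_toy and backbone_toy are invented stand-ins, the rank assigned to each name is assumed, and case_when() is used here in place of the forcats functions of the original chunk to keep the example dependent on dplyr only.

```r
library(dplyr)

# Toy stand-ins for DT0 and Backbone (invented values)
dt_toy <- tibble(
  PlotObservationID = c(1, 1, 2),
  `Matched concept` = c("Avenella flexuosa",
                        "Elytrigia juncea subsp. boreoatlantica",
                        "ZwStr samtig 134951")
)
backbone_toy <- tibble(
  Name_sPlot_TRY = c("Avenella flexuosa", "Elytrigia juncea subsp. boreoatlantica"),
  Name_short     = c("Deschampsia flexuosa", "Elymus farctus"),
  Rank_correct   = c("species", "subspecies")
)

joined <- dt_toy %>%
  left_join(backbone_toy %>% rename(`Matched concept` = Name_sPlot_TRY),
            by = "Matched concept") %>%
  # Collapse infraspecific ranks and label names without a match
  mutate(Rank_correct = case_when(
    is.na(Rank_correct) ~ "No_match",
    Rank_correct %in% c("subspecies", "variety", "infraspecies", "race", "forma") ~ "lower",
    TRUE ~ Rank_correct
  ))

# A duplicated key in the backbone would inflate the number of records
stopifnot(nrow(joined) == nrow(dt_toy))
```

Comparing the row count before and after the join is a quick guard against duplicated keys in the lookup table.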

Explore name matching based on Backbone v1.2

Select species entries that changed after taxonomic standardization, as a way to check the backbone.

name.check <- DT1 %>% 
  dplyr::select(`Turboveg2 concept`:`Matched concept`, Name_short) %>% 
  rename(Name_TNRS=Name_short) %>% 
  distinct() %>% 
  mutate(Matched_short=word(`Matched concept`, start = 1L, end=2L)) %>% 
  filter(is.na(Name_TNRS) | Matched_short != Name_TNRS) %>%
  dplyr::select(-Matched_short) %>% 
  arrange(Name_TNRS)
Check 30 random species names from DT that changed name after matching to backbone
Turboveg2 concept | Matched concept | Name_TNRS
Isolepis platycarpa | Isolepis platycarpa | Isolepis cernua
Senecio streptanthifolius | Senecio streptanthifolius | Packera streptanthifolia
Carex refracta | Carex refracta | Carex caryophyllea
Galium pamiro-alaicum | Galium pamiro-alaicum | Galium pamiroalaicum
Araliaceae sp1_Operation_Wallacea | Araliaceae sp1_Operation_Wallacea | Araliaceae
Haussmannianthes jucunda | Haussmannianthes jucunda | Neosepicaea jucunda
Platysace stephensonii | Platysace stephensonii | Trachymene stephensonii
Saxifraga willkommiana | Saxifraga willkommiana | Saxifraga pentadactylis
Satureja kilimandscharica | Satureja kilimandscharica | Clinopodium kilimandschari
Epidendrum acunae | Epidendrum acunae | Epidendrum blancheanum
Bromus willdenowii | Ceratochloa cathartica | Bromus catharticus
Justicia debilis | Justicia debilis | Monechma debile
Selinum silaifolium subsp. orientale | Selinum silaifolium | Cnidium silaifolium
Picradeniopsis species | Picradeniopsis species | Bahia
Chaenorhinum robustum | Linaria serpyllifolia subsp. robusta | Chaenorhinum serpyllifolium
Achlaena piptostachya | Achlaena piptostachya | Arthropogon piptostachyus
Polypodium chnoodes | Polypodium chnoodes | Polypodium dissimile
Mimosa sp1_IUCN2 | Mimosa sp1_IUCN2 | Mimosa
Panicle gross samtig 134050 | Panicle gross samtig 134050 | NA
Pocockia ruthenica | Pocockia ruthenica | Medicago ruthenica
Davallia formosana | Davallia formosana | Araiostegia divaricata
Daphnopsis species | Daphnopsis species | Daphnopsis
Orianthera flaviflora | Orianthera flaviflora | Erianthera
Vigna species | Vigna species | Vigna
ZwStr samtig 134951 | ZwStr samtig 134951 | NA
Sideritis bilgerana | Sideritis bilgerana | Sideritis bilgeriana
Alectryon connatus | Alectryon connatus | Alectryon connatum
Bombacaceae species #2 | Bombacaceae species #2 | Bombacaceae
Nauclea diderichii | Nauclea diderichii | Nauclea diderrichii
Argyrolobium species | Argyrolobium species | Argyrolobium
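The logic of this check can be sketched on a few names: only the first two words of the matched concept are compared with the resolved name, so a record differing only by an infraspecific epithet is not flagged, while unresolved names (NA) always are. The toy table names_toy is an assumption made for illustration.

```r
library(dplyr)
library(stringr)

# Toy table of matched concepts and resolved names (illustrative)
names_toy <- tibble(
  `Matched concept` = c("Carex elata",
                        "Avenella flexuosa",
                        "Valeriana officinalis subsp. officinalis",
                        "ZwStr samtig 134951"),
  Name_TNRS = c("Carex elata", "Deschampsia flexuosa", "Valeriana officinalis", NA)
)

changed <- names_toy %>%
  # Compare only genus + epithet of the matched concept
  mutate(Matched_short = word(`Matched concept`, start = 1L, end = 2L)) %>%
  filter(is.na(Name_TNRS) | Matched_short != Name_TNRS) %>%
  select(-Matched_short)
```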

Check the most common species names from DT after matching to backbone

name.check.freq <- DT1 %>% 
  dplyr::select(`Turboveg2 concept`:`Matched concept`, Name_short) %>% 
  rename(Name_TNRS=Name_short) %>% 
  group_by(`Turboveg2 concept`, `Matched concept`, Name_TNRS) %>% 
  summarize(n=n()) %>% 
  mutate(Matched_short=word(`Matched concept`, start = 1L, end=2L)) %>% 
  filter(is.na(Name_TNRS) | Matched_short != Name_TNRS) %>%
  dplyr::select(-Matched_short) %>% 
  ungroup() %>% 
  arrange(desc(n)) 
## `summarise()` regrouping output by 'Turboveg2 concept', 'Matched concept' (override with `.groups` argument)
Check 40 most common species names from DT that changed name after matching to backbone
Turboveg2 concept | Matched concept | Name_TNRS | n
Deschampsia flexuosa | Avenella flexuosa | Deschampsia flexuosa | 126515
Festuca pratensis | Schedonorus pratensis | Festuca pratensis | 84008
Elymus repens | Elytrigia repens | Elymus repens | 82891
Phalaris arundinacea | Phalaroides arundinacea | Phalaris arundinacea | 75296
Bryophyta species | Bryophyta species | NA | 74393
Poa annua | Ochlopoa annua | Poa annua | 67460
Potentilla anserina | Argentina anserina | Potentilla anserina | 63786
Taraxacum sect. Ruderalia | Taraxacum sect. Taraxacum | Taraxacum | 58429
Taraxacum species | Taraxacum species | Taraxacum | 57167
Cornus sanguinea | Cornus sanguinea | Cornus controversa | 52651
Elytrigia repens | Elytrigia repens | Elymus repens | 51670
Taraxacum officinale | Taraxacum sect. Taraxacum | Taraxacum | 50502
Weinmannia racemosa | Weinmannia racemosa | Leiospermum racemosum | 38269
Bromus erectus | Bromopsis erecta | Bromus erectus | 33765
Cladonia species | Cladonia species | Cladonia | 32464
Avenella flexuosa | Avenella flexuosa | Deschampsia flexuosa | 30787
Rubus sect. Rubus | Rubus sect. Rubus | Rubus | 28684
Festuca arundinacea | Schedonorus arundinaceus | Festuca arundinacea | 26124
Trientalis europaea | Trientalis europaea | Lysimachia europaea | 25940
Rubus fruticosus aggr. | Rubus fruticosus aggr. | Rubus vestitus | 23669
Glaux maritima | Glaux maritima | Lysimachia maritima | 23305
Taraxacum officinale aggr. | Taraxacum sect. Taraxacum | Taraxacum | 22837
Rubus species | Rubus species | Rubus | 22098
Festuca gigantea | Schedonorus giganteus | Festuca gigantea | 20917
Taraxacum sectie Ruderalia | Taraxacum sect. Taraxacum | Taraxacum | 20888
Lophozonia menziesii | Lophozonia menziesii | Lophozonia | 20249
Juncus gerardi | Juncus gerardi | Juncus gerardii | 19094
Sphagnum species | Sphagnum species | Sphagnum | 18293
Festuca rupicola | Festuca stricta subsp. sulcata | Festuca rupicola | 18010
Rosa species | Rosa species | Rosa | 16657
Podocarpus laetus | Podocarpus laetus | Podocarpus spinulosus | 16356
Bromus tectorum | Anisantha tectorum | Bromus tectorum | 16302
Carex species | Carex species | Carex | 15744
Ripogonum scandens | Ripogonum scandens | Rhipogonum | 14984
Rubus hirtus | Rubus hirtus aggr. | Rubus proiectus | 14191
Avenula pubescens | Avenula pubescens | Helictotrichon pubescens | 13490
Notogrammitis billardierei | Notogrammitis billardierei | NA | 13117
Crataegus species | Crataegus species | Crataegus | 13072
Helictotrichon pubescens | Avenula pubescens | Helictotrichon pubescens | 12941
Erophila verna | Draba verna | Erophila verna | 12646

Complete field taxon group

Taxon group information is only available for 35699299 entries, but absent for 7394395. To improve the completeness of this field, we derive additional info from the Backbone, and merge it with the data already present in DT.

table(DT1$`Taxon group`, exclude=NULL)
## 
##           Alga         Lichen           Moss       Mushroom      Stonewort 
##           9497         324078        2034966            513          12166 
##        Unknown Vascular plant 
##        7394395       33318079
DT1 <- DT1 %>% 
  mutate(`Taxon group`=ifelse(`Taxon group`=="Unknown", NA, `Taxon group`)) %>% 
  mutate(Taxongroup_BB=ifelse(Taxongroup_BB=="Unknown", NA, Taxongroup_BB)) %>% 
  mutate(`Taxon group`=coalesce(`Taxon group`, Taxongroup_BB)) %>% 
  dplyr::select(-Taxongroup_BB)


table(DT1$`Taxon group`, exclude=NULL)
## 
##           Alga         Lichen           Moss       Mushroom      Stonewort 
##           9991         366995        2090953            513          12166 
## Vascular plant           <NA> 
##       40522471          90605
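The effect of this step can be sketched on a toy table: "Unknown" is first turned into NA in both columns, then the DT value is kept where present and the Backbone value is used only as a fallback. The table taxa_toy and its values are invented for illustration.

```r
library(dplyr)

# Toy columns: group as stored in DT and as stored in the Backbone (invented)
taxa_toy <- tibble(
  `Taxon group` = c("Vascular plant", "Unknown", "Unknown", "Moss"),
  Taxongroup_BB = c("Vascular plant", "Lichen",  "Unknown", "Lichen")
)

filled <- taxa_toy %>%
  # Treat "Unknown" as missing in both sources
  mutate(across(c(`Taxon group`, Taxongroup_BB), ~ na_if(.x, "Unknown"))) %>%
  # Prefer the DT value, fall back on the Backbone value
  mutate(`Taxon group` = coalesce(`Taxon group`, Taxongroup_BB)) %>%
  select(-Taxongroup_BB)
```

Note that where the two sources disagree (last row), the value already present in DT is retained, which is why conflicts are checked separately further down.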

Taxa for which a measure of Basal Area exists can safely be assumed to be vascular plants.

DT1 <- DT1 %>% 
  mutate(`Taxon group`=replace(`Taxon group`, 
                               list=`Cover code`=="x_BA", 
                               values="Vascular plant"))

Cross-complement Taxon group information: whenever a taxon is assigned to a group in one record, assign it to the same group throughout the DT table.

DT1 <- DT1 %>% 
  left_join(DT1 %>% 
              filter(!is.na(Name_short)) %>% 
              filter(`Taxon group` != "Unknown") %>% 
              dplyr::select(Name_short, `Taxon group`) %>% 
              distinct(Name_short, .keep_all=T) %>% 
              rename(TaxonGroup_compl=`Taxon group`),
            by="Name_short") %>% 
  mutate(`Taxon group`=coalesce(`Taxon group`, TaxonGroup_compl)) %>% 
  dplyr::select(-TaxonGroup_compl)

table(DT1$`Taxon group`, exclude=NULL)
## 
##           Alga         Lichen           Moss       Mushroom      Stonewort 
##           9994         367584        2100586            513          12193 
## Vascular plant           <NA> 
##       40524049          78775
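A minimal sketch of the cross-complementing step follows, on an invented toy table (dt_toy). A lookup of one group per resolved name is built from the records where the group is known, and then joined back to fill the gaps.

```r
library(dplyr)

# Toy records: the same taxon occurs with and without group information
dt_toy <- tibble(
  Name_short    = c("Hypnum cupressiforme", "Hypnum cupressiforme", "Carex elata", NA),
  `Taxon group` = c("Moss", NA, NA, NA)
)

# One group per resolved name, taken from records where it is known
lookup <- dt_toy %>%
  filter(!is.na(Name_short), !is.na(`Taxon group`)) %>%
  distinct(Name_short, .keep_all = TRUE) %>%
  rename(TaxonGroup_compl = `Taxon group`)

completed <- dt_toy %>%
  left_join(lookup, by = "Name_short") %>%
  mutate(`Taxon group` = coalesce(`Taxon group`, TaxonGroup_compl)) %>%
  select(-TaxonGroup_compl)
```

Because distinct() keeps the first record per name, a taxon recorded under two different groups is resolved by row order; this is one reason for the conflict check that follows.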

Check species with conflicting Taxon group information and fix manually.

#check for conflicts in attribution of genera to Taxon groups
DT1 %>% 
  filter(!is.na(Name_short)) %>% 
  filter(!is.na(`Taxon group`)) %>% 
  distinct(Name_short, `Taxon group`) %>% 
  mutate(Genus=word(Name_short,1)) %>% 
  dplyr::select(Genus, `Taxon group`) %>% 
  distinct() %>% 
  group_by(Genus) %>% 
  summarize(n=n()) %>% 
  filter(n>1) %>% 
  arrange(desc(n))
## `summarise()` ungrouping output (override with `.groups` argument)
## # A tibble: 15 x 2
##    Genus                 n
##    <chr>             <int>
##  1 Brachytheciastrum     2
##  2 Brachythecium         2
##  3 Chara                 2
##  4 Characeae             2
##  5 Hepatica              2
##  6 Hypericum             2
##  7 Hypnum                2
##  8 Leptorhaphis          2
##  9 Lychnothamnus         2
## 10 Nitella               2
## 11 Oxymitra              2
## 12 Pancovia              2
## 13 Peltaria              2
## 14 Tonina                2
## 15 Zygodon               2

Manually fix some known problems in Taxon group attribution. Some lists of taxa (e.g., lichen.genera, mushroom) were defined when building the Backbone.

#Attach genus info
DT1 <- DT1 %>% 
    left_join(Backbone %>% 
              dplyr::select(Name_sPlot_TRY, Name_short) %>%
              mutate(Genus=word(Name_short, 1, 1)) %>%
              dplyr::select(-Name_short) %>% 
              rename(`Matched concept`=Name_sPlot_TRY),
            by="Matched concept") %>% 
    mutate(`Taxon group`=fct_collapse(`Taxon group`, 
                                    Alga_Stonewort=c("Alga", "Stonewort")))
#manually fix some known problems
mosses.gen    <- c("Hypnum", "Brachytheciastrum", "Brachythecium",
                  "Zygodon", "Oxymitra", "Bryophyta", "Musci", '\\\"Moos\\\"')
vascular.gen  <- c("Polystichum", "Hypericum", "Peltaria", "Pancovia", "Calythrix", "Ripogonum",
                  "Notogrammitis", "Fuscospora", "Lophozonia",  "Rostellularia", 
                  "Hesperostipa", "Microsorium", "Angiosperm","Dicotyledonae", "Spermatophy", 
                  "Oxymitra", "Friesodielsia")
alga.gen      <- c("Chara", "Characeae", "Tonina", "Nostoc", "Entermorpha", "Hydrocoleum" )
 
DT1 <- DT1 %>% 
  mutate(`Taxon group`=replace(`Taxon group`, 
                               list=Genus %in% mosses.gen, 
                               values="Moss")) %>% 
  mutate(`Taxon group`=replace(`Taxon group`, 
                               list=Genus %in% vascular.gen, 
                               values="Vascular plant")) %>% 
  mutate(`Taxon group`=replace(`Taxon group`, 
                               list=Genus %in% alga.gen, 
                               values="Alga_Stonewort")) %>% 
  mutate(`Taxon group`=replace(`Taxon group`, 
                               list=Genus %in% c(lichen.genera, "Lichenes"),
                               values="Lichen")) %>% 
  mutate(`Taxon group`=replace(`Taxon group`, 
                               list=Genus %in% mushroom, 
                               values="Mushroom"))
  
table(DT1$`Taxon group`, exclude=NULL)
## 
## Alga_Stonewort         Lichen           Moss       Mushroom Vascular plant 
##          23098         367585        2100663            513       40525890 
##           <NA> 
##          75945
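Since the replacements are applied one after the other, a genus that appears in more than one list ends up in the group of the last list applied (in the lists above, Oxymitra is in both mosses.gen and vascular.gen). The following base R sketch shows this order dependence on invented vectors; the object names are illustrative.

```r
# Toy genus and group vectors (invented values)
genus <- c("Hypnum", "Oxymitra", "Chara", "Carex")
group <- c("Lichen", "Moss", NA, "Vascular plant")

# Toy genus lists; "Oxymitra" is deliberately in both
mosses_toy   <- c("Hypnum", "Oxymitra")
vascular_toy <- c("Hypericum", "Oxymitra")

# Replacements are applied in sequence, so the later one wins
group <- replace(group, genus %in% mosses_toy, "Moss")
group <- replace(group, genus %in% vascular_toy, "Vascular plant")
print(group)
```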

Delete all records of fungi, and use lists of genera to fix additional problems. While in the previous round the matching was done on the resolved Genus name, here the match is based on unresolved Genus names.

DT1 <- DT1 %>% 
  dplyr::select(-Genus) %>% 
  left_join(DT1 %>% 
              distinct(`Matched concept`) %>% 
              mutate(Genus=word(`Matched concept`, 1)), 
            by="Matched concept") %>% 
  mutate(`Taxon group`=replace(`Taxon group`, 
                                 list=Genus %in% mushroom, 
                                 values = "Mushroom")) %>% 
  mutate(`Taxon group`=replace(`Taxon group`, 
                               list=Genus %in% lichen.genera, 
                               values="Lichen")) %>% 
  mutate(`Taxon group`=replace(`Taxon group`, 
                               list=Genus %in% mosses.gen, 
                               values="Moss")) %>% 
  mutate(`Taxon group`=replace(`Taxon group`, 
                               list=Genus %in% vascular.gen, 
                               values="Vascular plant")) %>% 
  mutate(`Taxon group` = fct_explicit_na(`Taxon group`, "Unknown")) %>% 
  filter(`Taxon group`!="Mushroom") %>%
  mutate(`Taxon group`=factor(`Taxon group`))
  #dplyr::select(-Genus)

table(DT1$`Taxon group`, exclude=NULL)
## 
## Alga_Stonewort         Lichen           Moss Vascular plant        Unknown 
##          23098         367931        2103320       40563187          35721

After cross-checking all sources of information, the number of records without Taxon group information decreased to 35721.

Standardize abundance values

Species abundance information varies across datasets and plots. While for the large majority of plots abundance values are returned as percentage cover, there is a subset where abundance is returned with different scales. These are marked in the column Cover code as follows:
x_BA - Basal Area
x_IC - Individual count
x_SC - Stem count
x_IV - Relative Importance
x_RF - Relative Frequency
x - Presence absence
It is not intuitive that, when Cover code belongs to one of the classes above, the actual abundance value is stored in the x_ column. This stems from the way these data are stored in TURBOVEG.
To make the cover data more user friendly, I simplify the way cover is stored, so that there are only two columns:
Ab_scale - to report the type of scale used
Abundance - to coalesce the cover/abundance values previously in the columns Cover % and x_.

# Create Ab_scale field
DT1 <- DT1 %>% 
  mutate(Ab_scale = ifelse(`Cover code` %in% 
                             c("x_BA", "x_IC", "x_SC", "x_IV", "x_RF") & !is.na(x_), 
                           `Cover code`, 
                           "CoverPerc"))  
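The rule used to create Ab_scale can be sketched on a toy table: a record keeps its special scale only if the Cover code is one of the listed classes and a value is actually present in x_; everything else is labelled CoverPerc. The table cover_toy is invented for illustration.

```r
library(dplyr)

# Toy records (invented): note the x_BA record without a value in x_
cover_toy <- tibble(
  `Cover code` = c("x_BA", "x_IC", "x_BA", "2b", NA, "x"),
  x_           = c(12.5, 3, NA, NA, NA, NA)
)
special_scales <- c("x_BA", "x_IC", "x_SC", "x_IV", "x_RF")

cover_toy <- cover_toy %>%
  mutate(Ab_scale = if_else(`Cover code` %in% special_scales & !is.na(x_),
                            `Cover code`,
                            "CoverPerc"))
```

In this sketch, the third record falls back to CoverPerc because its x_ value is missing, and presence/absence records (Cover code "x") are also labelled CoverPerc at this stage.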

Fix some errors. There are some plots where all species have zeros in the field Cover %. Some of them are marked as p/a (Cover code=="x"), but others are not. Consider all these plots as presence/absence and set Cover % to 1.

allzeroes <- DT1 %>% 
  group_by(PlotObservationID) %>% 
  summarize(allzero=all(`Cover %`==0) ) %>% 
  filter(allzero==T) %>% 
  pull(PlotObservationID)
## `summarise()` ungrouping output (override with `.groups` argument)
DT1 <- DT1 %>% 
  mutate(`Cover %`=replace(`Cover %`, 
                           list=(PlotObservationID %in% allzeroes), 
                           values=1)) %>% 
  mutate(`Cover code`=replace(`Cover code`, 
                           list=(PlotObservationID %in% allzeroes), 
                           values="x"))
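One detail of the selection is worth illustrating: all() returns NA when a plot mixes zeros with missing values, and filter() then drops that plot, so only plots whose covers are all exactly zero are selected. A minimal sketch on invented data (plots_toy):

```r
library(dplyr)

# Toy plots (invented): plot 1 all zeros, plot 2 mixed, plot 3 zero plus NA
plots_toy <- tibble(
  PlotObservationID = c(1, 1, 2, 2, 3, 3),
  `Cover %`         = c(0, 0, 0, 5, 0, NA)
)

allzero_ids <- plots_toy %>%
  group_by(PlotObservationID) %>%
  summarize(allzero = all(`Cover %` == 0), .groups = "drop") %>%
  # filter() keeps only TRUE, so plots evaluating to NA are dropped
  filter(allzero) %>%
  pull(PlotObservationID)
```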

Consider as presence/absence (p/a) data all plot-layer combinations where every entry has Cover code=="x" and Cover % == 1, and set Ab_scale to “pa”. This is done to avoid confusion with plots where Cover code=="x" but “x” is to be understood as a class in the cover scale used. For p/a plots, replace the field Cover % with NA, and assign the value 1 to the field x_.

#plots with at least one entry in Cover code=="x"
sel <- DT1 %>% 
  filter(`Cover code`=="x") %>% 
  distinct(PlotObservationID) %>% 
  pull(PlotObservationID)

DT1 <- DT1 %>% 
  left_join(DT1 %>%
              filter(PlotObservationID %in% sel) %>% 
              group_by(PlotObservationID, Layer) %>% 
              mutate(to.pa= all(`Cover %`==1 & `Cover code`=="x")) %>% 
              distinct(PlotObservationID, Layer, to.pa), 
            by=c("PlotObservationID", "Layer")) %>% 
  replace_na(list(to.pa=F)) %>% 
  mutate(Ab_scale=ifelse(to.pa==T, "pa", Ab_scale)) %>% 
  mutate(`Cover %`=ifelse(to.pa==T, NA, `Cover %`)) %>% 
  mutate(x_=ifelse(to.pa==T, 1, x_)) %>% 
  dplyr::select(-to.pa)
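The plot-layer rule can be sketched as follows, on an invented toy table (dt_toy): a plot-layer combination is flagged only if every one of its entries has Cover code "x" and Cover % equal to 1.

```r
library(dplyr)

# Toy records (invented values)
dt_toy <- tibble(
  PlotObservationID = c(1, 1, 1, 2, 2),
  Layer             = c(6, 6, 9, 6, 6),
  `Cover %`         = c(1, 1, 1, 1, 30),
  `Cover code`      = c("x", "x", "2", "x", "4")
)

# One flag per plot-layer combination
pa_flags <- dt_toy %>%
  group_by(PlotObservationID, Layer) %>%
  summarize(to.pa = all(`Cover %` == 1 & `Cover code` == "x"), .groups = "drop")

# Join the flag back and relabel the scale
dt_flagged <- dt_toy %>%
  left_join(pa_flags, by = c("PlotObservationID", "Layer")) %>%
  mutate(Ab_scale = if_else(to.pa, "pa", "CoverPerc")) %>%
  select(-to.pa)
```

Here only layer 6 of plot 1 becomes "pa"; in plot 2 the "x" sits next to another cover class in the same layer and is therefore left as a cover value.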

There are also some plots with different cover scales in the same layer. They are few, and I reduce their cover values to p/a.
Find these plots first:

mixed <- DT1 %>% 
  distinct(PlotObservationID, Ab_scale, Layer) %>% 
  group_by(PlotObservationID, Layer) %>% 
  summarize(n=n()) %>% 
  filter(n>1) %>% 
  pull(PlotObservationID) %>% 
  unique()
## `summarise()` regrouping output by 'PlotObservationID' (override with `.groups` argument)
length(mixed)
## [1] 335

Transform these plots to p/a and correct the field Ab_scale. Note: the column Abundance is only created here.

DT1 <- DT1 %>% 
  mutate(Ab_scale=replace(Ab_scale, 
                           list=PlotObservationID %in% mixed, 
                           values="mixed")) %>%
  mutate(`Cover %`=replace(`Cover %`, 
                           list=Ab_scale=="mixed",
                           values=NA)) %>% 
  mutate(x_=replace(x_,  list=Ab_scale=="mixed", values=1)) %>% 
  mutate(Ab_scale=replace(Ab_scale, list=Ab_scale=="mixed", values="pa")) %>% 
  #Create additional field Abundance to avoid overwriting original data
  mutate(Abundance =ifelse(Ab_scale %in% c("x_BA", "x_IC", "x_SC", "x_IV", "x_RF", "pa"), 
                          x_, `Cover %`)) %>% 
  mutate(Abundance=replace(Abundance, 
                           list=PlotObservationID %in% mixed, 
                           values=1))
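The two steps above (finding plots with mixed scales, then reducing them to presence/absence) can be sketched together on a toy table. The data are invented, and case_when() is used here as a compact alternative to the chained replace() calls; it is not the original implementation.

```r
library(dplyr)

# Toy records (invented): plot 1 mixes cover percentage and stem counts in layer 6
dt_toy <- tibble(
  PlotObservationID = c(1, 1, 2, 2),
  Layer             = c(6, 6, 6, 6),
  Ab_scale          = c("CoverPerc", "x_SC", "CoverPerc", "CoverPerc"),
  `Cover %`         = c(20, NA, 5, 10),
  x_                = c(NA, 4, NA, NA)
)

# Plots with more than one scale within a layer
mixed_ids <- dt_toy %>%
  distinct(PlotObservationID, Layer, Ab_scale) %>%
  count(PlotObservationID, Layer) %>%
  filter(n > 1) %>%
  pull(PlotObservationID) %>%
  unique()

# Whole plots are reduced to presence/absence; others keep their value
dt_toy <- dt_toy %>%
  mutate(is_mixed  = PlotObservationID %in% mixed_ids,
         Ab_scale  = if_else(is_mixed, "pa", Ab_scale),
         Abundance = case_when(
           is_mixed                ~ 1,
           Ab_scale == "CoverPerc" ~ `Cover %`,
           TRUE                    ~ x_
         )) %>%
  select(-is_mixed)
```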

Double check and summarize Ab_scales

scale_check <- DT1 %>% 
  distinct(PlotObservationID, Layer, Ab_scale) %>% 
  group_by(PlotObservationID) %>% 
  summarise(Ab_scale_combined=ifelse(length(unique(Ab_scale))==1, 
                                     unique(Ab_scale), 
                                     "Multiple_scales"))
## `summarise()` ungrouping output (override with `.groups` argument)
nrow(scale_check)== length(unique(DT1$PlotObservationID))
## [1] TRUE
table(scale_check$Ab_scale_combined)
## 
##       CoverPerc Multiple_scales              pa            x_BA            x_IC 
##         1690422            2084          271057            6293            2092 
##            x_IV            x_RF            x_SC 
##             146             585            4878

Calculate species’ relative covers in each plot

Transform abundances to relative abundance. For consistency with the previous version of sPlot, this field is called Relative_cover.
Note: even plots with p/a information are transformed to relative cover.

DT1 <- DT1 %>% 
  left_join(x=., 
            y={.} %>%
              group_by(PlotObservationID) %>% 
              summarize(tot.abundance=sum(Abundance)), 
            by=c("PlotObservationID")) %>% 
  mutate(Relative.cover=Abundance/tot.abundance)
## `summarise()` ungrouping output (override with `.groups` argument)
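# check: there should be no plot where the sum of all relative covers != 1
# caution: the filter below compares tot.cover with the sum of the distinct Layer codes
# rather than with 1, which likely explains why almost all plots are counted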
DT1 %>% 
  group_by(PlotObservationID) %>% 
  summarize(tot.cover=sum(Relative.cover), 
            num.layers=sum(unique(Layer))) %>% 
  filter(tot.cover != num.layers) %>% 
  nrow()
## `summarise()` ungrouping output (override with `.groups` argument)
## [1] 1957784
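As a hedged alternative for this check, the sketch below normalizes abundances per plot and then counts the plots whose relative covers do not sum to 1, using a small numerical tolerance instead of an exact comparison. The toy data and the tolerance of 1e-8 are assumptions made for illustration; this is not the check that produced the number reported above.

```r
library(dplyr)

# Toy records (invented; plot 2 reuses the abundances 1 and 27.5)
dt_toy <- tibble(
  PlotObservationID = c(1, 1, 1, 2, 2),
  Layer             = c(6, 6, 7, 0, 0),
  Abundance         = c(37, 15, 1, 1, 27.5)
)

# Relative cover: abundance divided by the plot total
dt_toy <- dt_toy %>%
  group_by(PlotObservationID) %>%
  mutate(Relative_cover = Abundance / sum(Abundance)) %>%
  ungroup()

# Count plots whose relative covers do not sum to 1 (or are undefined)
n_bad <- dt_toy %>%
  group_by(PlotObservationID) %>%
  summarize(tot.cover = sum(Relative_cover), .groups = "drop") %>%
  filter(is.na(tot.cover) | abs(tot.cover - 1) > 1e-8) %>%
  nrow()
```

Plots in which any Abundance is NA would be counted as problematic by this sketch, which makes such cases visible rather than silently dropping them.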

Clean DT and export

DT2 <- DT1 %>% 
  dplyr::select(PlotObservationID, Name_short, `Turboveg2 concept`, Rank_correct, `Taxon group`, Layer:x_, Ab_scale, Abundance, Relative.cover ) %>% 
  rename(Species_original=`Turboveg2 concept`, 
         Species=Name_short,
         Taxon_group=`Taxon group`, 
         Cover_perc=`Cover %`, 
         Cover_code=`Cover code`, 
         Relative_cover=Relative.cover) %>% 
  ## change in Version 1.1.
  dplyr::select(-x_, -Cover_perc)

The output of the DT table contains 43093257 records, over 1977557 plots. The total number of taxa is 116256 and 0, before and after standardization, respectively. Information on the Taxon group is available for 76548 standardized species.

Example of final DT table (the same 3 randomly selected plots shown above)
PlotObservationID | Species | Species_original | Rank_correct | Taxon_group | Layer | Cover_code | Ab_scale | Abundance | Relative_cover
34576 | Alnus incana | Alnus incana | species | Vascular plant | 7 |  | CoverPerc | 1.0 | 0.0076220
34576 | Calamagrostis canescens | Calamagrostis canescens | species | Vascular plant | 6 | 3 | CoverPerc | 37.0 | 0.2820122
34576 | Carex elata | Carex elata | species | Vascular plant | 6 | 2 | CoverPerc | 15.0 | 0.1143293
34576 | Cirsium arvense | Cirsium arvense | species | Vascular plant | 6 |  | CoverPerc | 1.0 | 0.0076220
34576 | Cornus controversa | Cornus sanguinea | species | Vascular plant | 7 |  | CoverPerc | 1.0 | 0.0076220
34576 | Crataegus monogyna | Crataegus monogyna | species | Vascular plant | 7 |  | CoverPerc | 1.0 | 0.0076220
34576 | Equisetum fluviatile | Equisetum fluviatile | species | Vascular plant | 6 |  | CoverPerc | 1.0 | 0.0076220
34576 | Fraxinus excelsior | Fraxinus excelsior | species | Vascular plant | 7 |  | CoverPerc | 1.0 | 0.0076220
34576 | Galium elongatum | Galium elongatum | species | Vascular plant | 6 |  | CoverPerc | 1.0 | 0.0076220
34576 | Ligustrum vulgare | Ligustrum vulgare | species | Vascular plant | 7 |  | CoverPerc | 1.0 | 0.0076220
34576 | Lysimachia vulgaris | Lysimachia vulgaris | species | Vascular plant | 6 |  | CoverPerc | 1.0 | 0.0076220
34576 | Lythrum salicaria | Lythrum salicaria | species | Vascular plant | 6 |  | CoverPerc | 1.0 | 0.0076220
34576 | Mentha aquatica | Mentha aquatica | species | Vascular plant | 6 |  | CoverPerc | 1.0 | 0.0076220
34576 | Persicaria amphibia | Persicaria amphibia | species | Vascular plant | 6 |  | CoverPerc | 1.0 | 0.0076220
34576 | Phragmites australis | Phragmites australis | species | Vascular plant | 6 | 4 | CoverPerc | 62.0 | 0.4725610
34576 | Solidago gigantea | Solidago gigantea | species | Vascular plant | 6 | r | CoverPerc | 0.2 | 0.0015244
34576 | Stachys palustris | Stachys palustris | species | Vascular plant | 6 |  | CoverPerc | 1.0 | 0.0076220
34576 | Valeriana dioica | Valeriana dioica | species | Vascular plant | 6 |  | CoverPerc | 1.0 | 0.0076220
34576 | Valeriana officinalis | Valeriana officinalis | higher | Vascular plant | 6 |  | CoverPerc | 1.0 | 0.0076220
34576 | Viburnum opulus | Viburnum opulus | species | Vascular plant | 7 |  | CoverPerc | 1.0 | 0.0076220
34576 | Vicia cracca | Vicia cracca | species | Vascular plant | 6 |  | CoverPerc | 1.0 | 0.0076220
116032 | Betula pubescens | Betula pubescens | species | Vascular plant | 6 | r | CoverPerc | 1.0 | 0.0072464
116032 | Carex pilulifera | Carex pilulifera | species | Vascular plant | 6 |  | CoverPerc | 2.0 | 0.0144928
116032 | Dicranella heteromalla | Dicranella heteromalla | species | Moss | 9 | r | CoverPerc | 1.0 | 0.0072464
116032 | Dryopteris dilatata | Dryopteris dilatata | species | Vascular plant | 6 |  | CoverPerc | 2.0 | 0.0144928
116032 | Kindbergia praelonga | Eurhynchium praelongum | species | Moss | 9 | r | CoverPerc | 1.0 | 0.0072464
116032 | Fagus sylvatica | Fagus sylvatica | species | Vascular plant | 1 | 2b | CoverPerc | 18.0 | 0.1304348
116032 | Galeopsis tetrahit | Galeopsis tetrahit | species | Vascular plant | 6 | r | CoverPerc | 1.0 | 0.0072464
116032 | Pinus nigra | Pinus nigra var. maritima | species | Vascular plant | 1 | 4 | CoverPerc | 68.0 | 0.4927536
116032 | Pinus nigra | Pinus nigra var. maritima | species | Vascular plant | 6 | r | CoverPerc | 1.0 | 0.0072464
116032 | Prunus serotina | Prunus serotina | species | Vascular plant | 4 | 2a | CoverPerc | 8.0 | 0.0579710
116032 | Prunus serotina | Prunus serotina | species | Vascular plant | 6 |  | CoverPerc | 2.0 | 0.0144928
116032 | Quercus robur | Quercus robur | species | Vascular plant | 4 | 2a | CoverPerc | 8.0 | 0.0579710
116032 | Quercus robur | Quercus robur | species | Vascular plant | 6 |  | CoverPerc | 2.0 | 0.0144928
116032 | Quercus rubra | Quercus rubra | species | Vascular plant | 1 | 2b | CoverPerc | 18.0 | 0.1304348
116032 | Quercus rubra | Quercus rubra | species | Vascular plant | 6 |  | CoverPerc | 2.0 | 0.0144928
116032 | Rubus | Rubus sect. Rubus | genus | Vascular plant | 6 |  | CoverPerc | 2.0 | 0.0144928
116032 | Sorbus aucuparia | Sorbus aucuparia | species | Vascular plant | 6 | r | CoverPerc | 1.0 | 0.0072464
947871 | Ammophila arenaria | Ammophila arenaria | species | Vascular plant | 0 | 2 | CoverPerc | 1.0 | 0.0350877
947871 | Elymus farctus | Elytrigia juncea subsp. boreoatlantica | lower | Vascular plant | 0 | 6 | CoverPerc | 27.5 | 0.9649123

Field List

  • PlotObservationID - Plot ID, as in header.
  • Species - Resolved species name, based on taxonomic backbone
  • Species_original - Original species name, as provided by data contributor.
  • Rank_correct - Taxonomic rank at which Species_original was matched.
  • Taxon_group - Possible entries are: Alga_Stonewort, Lichen, Moss, Vascular plant, Unknown.
  • Layer - Vegetation layer, as specified in Turboveg: 0: No layer specified, 1: Upper tree layer, 2: Middle tree layer, 3: Lower tree layer, 4: Upper shrub layer, 5: Lower shrub layer, 6: Herb layer, 7: Juvenile, 8: Seedling, 9: Moss layer.
  • Cover_code - Cover/abundance value in original data, before transformation to percentage cover.
  • Ab_scale - Abundance scale in original data. Possible values are: CoverPerc: Cover Percentage, pa: Presence absence, x_BA: Basal Area, x_IC: Individual count, x_SC: Stem count, x_IV: Relative Importance, x_RF: Relative Frequency.
  • Abundance - Abundance value, either as in the original data or as transformed from the original Cover code to a quantitative value.
  • Relative_cover - Abundance of each species, normalized so that values sum to 1 in each plot.

save(DT2, file = "../_output/DT_sPlot3.0.RData")

SessionInfo

    ## R version 3.6.3 (2020-02-29)
    ## Platform: x86_64-pc-linux-gnu (64-bit)
    ## Running under: Ubuntu 16.04.7 LTS
    ## 
    ## Matrix products: default
    ## BLAS:   /usr/lib/openblas-base/libblas.so.3
    ## LAPACK: /usr/lib/libopenblasp-r0.2.18.so
    ## 
    ## locale:
    ##  [1] LC_CTYPE=en_US.UTF-8          LC_NUMERIC=C                 
    ##  [3] LC_TIME=en_US.UTF-8           LC_COLLATE=en_US.UTF-8       
    ##  [5] LC_MONETARY=en_US.UTF-8       LC_MESSAGES=en_US.UTF-8      
    ##  [7] LC_PAPER=en_US.UTF-8          LC_NAME=en_US.UTF-8          
    ##  [9] LC_ADDRESS=en_US.UTF-8        LC_TELEPHONE=en_US.UTF-8     
    ## [11] LC_MEASUREMENT=en_US.UTF-8    LC_IDENTIFICATION=en_US.UTF-8
    ## 
    ## attached base packages:
    ## [1] stats     graphics  grDevices utils     datasets  methods   base     
    ## 
    ## other attached packages:
    ##  [1] kableExtra_1.3.1 knitr_1.30       xlsx_0.6.5       forcats_0.5.0   
    ##  [5] stringr_1.4.0    dplyr_1.0.2      purrr_0.3.4      readr_1.4.0     
    ##  [9] tidyr_1.1.2      tibble_3.0.1     ggplot2_3.3.0    tidyverse_1.3.0 
    ## 
    ## loaded via a namespace (and not attached):
    ##  [1] tidyselect_1.1.0  xfun_0.19         rJava_0.9-13      haven_2.3.1      
    ##  [5] colorspace_2.0-0  vctrs_0.3.5       generics_0.1.0    viridisLite_0.3.0
    ##  [9] htmltools_0.5.0   yaml_2.2.1        utf8_1.1.4        rlang_0.4.9      
    ## [13] pillar_1.4.3      glue_1.4.2        withr_2.3.0       DBI_1.1.0        
    ## [17] dbplyr_2.0.0      modelr_0.1.6      readxl_1.3.1      lifecycle_0.2.0  
    ## [21] munsell_0.5.0     gtable_0.3.0      cellranger_1.1.0  rvest_0.3.6      
    ## [25] evaluate_0.14     fansi_0.4.1       xlsxjars_0.6.1    highr_0.8        
    ## [29] broom_0.7.0       Rcpp_1.0.5        scales_1.1.1      backports_1.2.0  
    ## [33] webshot_0.5.2     jsonlite_1.7.1    fs_1.5.0          hms_0.5.3        
    ## [37] digest_0.6.25     stringi_1.5.3     grid_3.6.3        cli_2.2.0        
    ## [41] tools_3.6.3       magrittr_2.0.1    crayon_1.3.4      pkgconfig_2.0.3  
    ## [45] ellipsis_0.3.1    xml2_1.3.2        reprex_0.3.0      lubridate_1.7.9.2
    ## [49] assertthat_0.2.1  rmarkdown_2.5     httr_1.4.2        rstudioapi_0.13  
    ## [53] R6_2.5.0          compiler_3.6.3