MEMO!! What should be done with Layer when it is consistently zero within a plot: change it to NA? And what should be done when Layer == 0 in a plot where layer information is otherwise available? TODO: add an explanation of the fields!!

Timestamp: Fri Mar 13 02:52:06 2020
Drafted: Francesco Maria Sabatini
Revised:
version: 1.0

This report documents the construction of the DT table for sPlot 3.0. It is based on the dataset sPlot_3.0.2, received on 24/07/2019 from Stephan Hennekens.

knitr::opts_chunk$set(echo = TRUE)
library(tidyverse)
library(readr)
library(xlsx)
library(knitr)
library(kableExtra)

#save temporary files in a dedicated folder. R reads TMPDIR from the user's ~/.Renviron
#at start-up, so this setting only takes effect from the next R session.
#Note: write() overwrites any existing ~/.Renviron
write("TMPDIR = /data/sPlot/users/Francesco/_tmp", file=file.path(Sys.getenv('HOME'), '.Renviron'))
#rasterOptions(tmpdir="/data/sPlot/users/Francesco/_tmp")

Escape all double quotation marks in the species export, to avoid problems with unclosed quotes during import. Run in a Linux terminal:

# escape all double quotation marks. Run in Linux terminal
# sed 's/"/\\"/g' sPlot_3_0_2_species.csv > sPlot_3_0_2_species_test.csv

Import data table

The DT table is the species × plot matrix in long format, i.e. one row per species (and layer) in each plot.

DT0 <- readr::read_delim("../sPlot_data_export/sPlot_3_0_2_species_test.csv", 
                         delim = "\t", 
                         # declare column types explicitly (argument is col_types)
                         col_types = cols(
                           PlotObservationID = col_double(),
                           Taxonomy = col_character(),
                           `Taxon group` = col_character(),
                           `Taxon group ID` = col_double(),
                           `Turboveg2 concept` = col_character(),
                           `Matched concept` = col_character(),
                           Match = col_double(),
                           Layer = col_double(),
                           `Cover %` = col_double(),
                           `Cover code` = col_character(),
                           x_ = col_double()
                         )
                         ) 
nplots <- length(unique(DT0$PlotObservationID))
nspecies <- length(unique(DT0$`Matched concept`))

Species data include 43103312 species × plot records across 1978589 plots. Before taxonomic resolution, there are 107676 distinct taxon names (field Matched concept).
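The long format can be illustrated with a minimal base-R sketch. The records below are invented, and the columns are simplified to PlotObservationID, species and cover; xtabs() casts them to the species × plot matrix, filling absent species-plot combinations with 0. Note that xtabs() sums records that share the same species and plot, e.g. the same species recorded in several layers.

```r
# Invented DT-like records in long format: one row per species record in a plot
dt_long <- data.frame(
  PlotObservationID = c(1, 1, 2, 2, 2),
  species = c("Galium verum", "Carex arenaria", "Galium verum",
              "Poa pratensis", "Carex arenaria"),
  cover = c(10, 5, 20, 1, 3),
  stringsAsFactors = FALSE
)

# Cast to a species x plot matrix; absent species-plot combinations become 0
wide <- xtabs(cover ~ species + PlotObservationID, data = dt_long)
print(wide)

# Counts analogous to those reported for DT0
n_records <- nrow(dt_long)
n_plots <- length(unique(dt_long$PlotObservationID))
n_species <- length(unique(dt_long$species))
```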

Example of the initial DT table (3 randomly selected plots)
| PlotObservationID | Taxonomy | Taxon group | Taxon group ID | Turboveg2 concept | Matched concept | Match | Layer | Cover % | Cover code | x_ |
|---|---|---|---|---|---|---|---|---|---|---|
| 354857 | NO-Europe_lenoir | Vascular plant | 1 | Agrostis capillaris | Agrostis capillaris | 3 | 0 | 1 | x | NA |
| 354857 | NO-Europe_lenoir | Vascular plant | 1 | Ammophila arenaria | Ammophila arenaria | 1 | 0 | 1 | x | NA |
| 354857 | NO-Europe_lenoir | Moss | 3 | Bryophyta species | Bryophyta species | 1 | 0 | 1 | x | NA |
| 354857 | NO-Europe_lenoir | Vascular plant | 1 | Carex arenaria | Carex arenaria | 1 | 0 | 1 | x | NA |
| 354857 | NO-Europe_lenoir | Vascular plant | 1 | Festuca rubra subsp. arenaria | Festuca arenaria | 3 | 0 | 1 | x | NA |
| 354857 | NO-Europe_lenoir | Vascular plant | 1 | Galium verum | Galium verum | 3 | 0 | 1 | x | NA |
| 354857 | NO-Europe_lenoir | Vascular plant | 1 | Linaria vulgaris | Linaria vulgaris | 3 | 0 | 1 | x | NA |
| 354857 | NO-Europe_lenoir | Vascular plant | 1 | Poa pratensis subsp. pratensis | Poa pratensis subsp. pratensis | 3 | 0 | 1 | x | NA |
| 354857 | NO-Europe_lenoir | Vascular plant | 1 | Salix repens subsp. repens var. argentea | Salix repens subsp. repens | 2 | 0 | 1 | x | NA |
| 1462431 | CS-Czechia_slovakia_2015 | Vascular plant | 1 | Agrostis rupestris | Agrostis rupestris | 3 | 6 | 3 | 1 | NA |
| 1462431 | CS-Czechia_slovakia_2015 | Vascular plant | 1 | Avenella flexuosa | Avenella flexuosa | 3 | 6 | 3 | 1 | NA |
| 1462431 | CS-Czechia_slovakia_2015 | Vascular plant | 1 | Avenula versicolor | Helictochloa versicolor | 3 | 6 | 38 | 3 | NA |
| 1462431 | CS-Czechia_slovakia_2015 | Vascular plant | 1 | Calamagrostis villosa | Calamagrostis villosa | 3 | 6 | 2 | + | NA |
| 1462431 | CS-Czechia_slovakia_2015 | Vascular plant | 1 | Campanula alpina | Campanula alpina | 3 | 6 | 3 | 1 | NA |
| 1462431 | CS-Czechia_slovakia_2015 | Vascular plant | 1 | Carex sempervirens | Carex sempervirens | 3 | 6 | 3 | 1 | NA |
| 1462431 | CS-Czechia_slovakia_2015 | Lichen | 4 | Cetraria islandica | Cetraria islandica | 1 | 9 | 13 | 2 | NA |
| 1462431 | CS-Czechia_slovakia_2015 | Vascular plant | 1 | Empetrum nigrum | Empetrum nigrum | 3 | 6 | 2 | + | NA |
| 1462431 | CS-Czechia_slovakia_2015 | Vascular plant | 1 | Geum montanum | Geum montanum | 3 | 6 | 2 | + | NA |
| 1462431 | CS-Czechia_slovakia_2015 | Vascular plant | 1 | Hieracium alpinum | Hieracium alpinum | 3 | 6 | 2 | + | NA |
| 1462431 | CS-Czechia_slovakia_2015 | Vascular plant | 1 | Homogyne alpina | Homogyne alpina | 3 | 6 | 3 | 1 | NA |
| 1462431 | CS-Czechia_slovakia_2015 | Vascular plant | 1 | Hypochaeris uniflora | Hypochaeris uniflora | 3 | 6 | 13 | 2 | NA |
| 1462431 | CS-Czechia_slovakia_2015 | Vascular plant | 1 | Persicaria vivipara | Bistorta vivipara | 3 | 6 | 1 | r | NA |
| 1462431 | CS-Czechia_slovakia_2015 | Vascular plant | 1 | Potentilla aurea | Potentilla aurea | 3 | 6 | 2 | + | NA |
| 1462431 | CS-Czechia_slovakia_2015 | Vascular plant | 1 | Pulsatilla alpina subsp. alba auct. sudet. & carpat. | Pulsatilla alpina subsp. alba | 3 | 6 | 1 | r | NA |
| 1462431 | CS-Czechia_slovakia_2015 | Vascular plant | 1 | Vaccinium gaultherioides | Vaccinium uliginosum | 3 | 6 | 3 | 1 | NA |
| 1462431 | CS-Czechia_slovakia_2015 | Vascular plant | 1 | Vaccinium myrtillus | Vaccinium myrtillus | 3 | 6 | 3 | 1 | NA |
| 1462431 | CS-Czechia_slovakia_2015 | Vascular plant | 1 | Vaccinium vitis-idaea | Vaccinium vitis-idaea | 3 | 6 | 2 | + | NA |
| 1585163 | RU-Russia | Vascular plant | 1 | Betula pubescens | Betula pubescens | 3 | 1 | 38 | 3 | NA |
| 1585163 | RU-Russia | Vascular plant | 1 | Betula pubescens | Betula pubescens | 3 | 2 | 38 | 3 | NA |
| 1585163 | RU-Russia | Vascular plant | 1 | Calamagrostis arundinacea | Calamagrostis arundinacea | 3 | 6 | 3 | 1 | NA |
| 1585163 | RU-Russia | Vascular plant | 1 | Carex rhizina | Carex pediformis subsp. rhizodes | 3 | 6 | 3 | 1 | NA |
| 1585163 | RU-Russia | Vascular plant | 1 | Chelidonium majus | Chelidonium majus | 3 | 6 | 3 | 1 | NA |
| 1585163 | RU-Russia | Vascular plant | 1 | Convallaria majalis | Convallaria majalis | 3 | 6 | 3 | 1 | NA |
| 1585163 | RU-Russia | Vascular plant | 1 | Dryopteris carthusiana | Dryopteris carthusiana | 3 | 6 | 3 | 1 | NA |
| 1585163 | RU-Russia | Vascular plant | 1 | Galium mollugo | Galium mollugo | 3 | 6 | 3 | 1 | NA |
| 1585163 | RU-Russia | Vascular plant | 1 | Hypericum perforatum | Hypericum perforatum | 3 | 6 | 3 | 1 | NA |
| 1585163 | RU-Russia | Vascular plant | 1 | Melampyrum pratense | Melampyrum pratense | 3 | 6 | 3 | 1 | NA |
| 1585163 | RU-Russia | Vascular plant | 1 | Moehringia trinervia | Moehringia trinervia | 3 | 6 | 3 | 1 | NA |
| 1585163 | RU-Russia | Vascular plant | 1 | Oxalis acetosella | Oxalis acetosella | 3 | 6 | 3 | 1 | NA |
| 1585163 | RU-Russia | Vascular plant | 1 | Picea obovata | Picea obovata | 3 | 1 | 68 | 4 | NA |
| 1585163 | RU-Russia | Vascular plant | 1 | Picea obovata | Picea obovata | 3 | 2 | 68 | 4 | NA |
| 1585163 | RU-Russia | Vascular plant | 1 | Pinus sylvestris | Pinus sylvestris | 3 | 1 | 3 | 1 | NA |
| 1585163 | RU-Russia | Vascular plant | 1 | Pinus sylvestris | Pinus sylvestris | 3 | 6 | 3 | 1 | NA |
| 1585163 | RU-Russia | Vascular plant | 1 | Rubus idaeus | Rubus idaeus | 3 | 6 | 3 | 1 | NA |
| 1585163 | RU-Russia | Vascular plant | 1 | Rumex acetosella | Rumex acetosella | 3 | 6 | 3 | 1 | NA |
| 1585163 | RU-Russia | Vascular plant | 1 | Sambucus racemosa | Sambucus racemosa | 3 | 4 | 3 | 1 | NA |
| 1585163 | RU-Russia | Vascular plant | 1 | Solidago virgaurea | Solidago virgaurea | 3 | 6 | 3 | 1 | NA |
| 1585163 | RU-Russia | Vascular plant | 1 | Sorbus aucuparia | Sorbus aucuparia | 3 | 4 | 3 | 1 | NA |
| 1585163 | RU-Russia | Vascular plant | 1 | Stellaria media | Stellaria media | 3 | 6 | 3 | 1 | NA |
| 1585163 | RU-Russia | Vascular plant | 1 | Vaccinium myrtillus | Vaccinium myrtillus | 3 | 6 | 3 | 1 | NA |
| 1585163 | RU-Russia | Vascular plant | 1 | Veronica officinalis | Veronica officinalis | 3 | 6 | 3 | 1 | NA |

Match species names from DT0 to those in Backbone

Import taxonomic backbone

load("../_output/Backbone3.0.RData")

Match to DT0, using Matched concept as the matching key. This is the field that was used to build and resolve the Backbone.

DT1 <- DT0 %>% 
  left_join(Backbone %>% 
              dplyr::select(Name_sPlot_TRY, Name_short, `Taxon group`, Rank_correct) %>%
              rename(`Matched concept`=Name_sPlot_TRY,
                     Taxongroup_BB=`Taxon group`), 
            by="Matched concept") %>% 
  # Simplify Rank_correct
  mutate(Rank_correct=fct_collapse(Rank_correct, 
                                   lower=c("subspecies", "variety", "infraspecies", "race", "forma"))) %>% 
  mutate(Rank_correct=fct_explicit_na(Rank_correct, "No_match")) %>% 
  mutate(Name_short=replace(Name_short, 
                            list=Name_short=="No suitable", 
                            values=NA))
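The rank simplification above can be sketched in base R on a few hypothetical ranks: infraspecific ranks collapse into "lower", records without a Backbone match become "No_match", and "No suitable" names become NA. This mirrors the forcats/dplyr pipeline but is only an illustration, not the code that produced DT1.

```r
# Hypothetical ranks returned by the Backbone join; NA = no match in the Backbone
rank <- c("species", "subspecies", "variety", NA, "genus", "forma")
lower_ranks <- c("subspecies", "variety", "infraspecies", "race", "forma")

# Collapse infraspecific ranks into "lower", then label missing matches "No_match"
rank_simple <- ifelse(rank %in% lower_ranks, "lower", rank)
rank_simple[is.na(rank_simple)] <- "No_match"

# Backbone names flagged as "No suitable" are set to NA
name_short <- c("Galium verum", "No suitable", NA)
name_short[name_short %in% "No suitable"] <- NA
```

Using `%in%` rather than `==` keeps the logical index free of NA, so existing NA names do not need special handling.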

Explore name matching based on Backbone v1.2

Select species entries that changed after taxonomic standardization, as a way to check the backbone.

name.check <- DT1 %>% 
  dplyr::select(`Turboveg2 concept`:`Matched concept`, Name_short) %>% 
  rename(Name_TNRS=Name_short) %>% 
  distinct() %>% 
  mutate(Matched_short=word(`Matched concept`, start = 1L, end=2L)) %>% 
  filter(is.na(Name_TNRS) | Matched_short != Name_TNRS) %>%
  dplyr::select(-Matched_short) %>% 
  arrange(Name_TNRS)
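The filter in name.check keeps names that were not resolved, or whose first two words changed after standardization. A base-R sketch on hypothetical names follows; first_two_words() is a hypothetical stand-in for stringr::word(x, 1, 2), and its behaviour on names with fewer than two words may differ from stringr::word().

```r
# Hypothetical Matched concept values and standardized names (Name_TNRS)
matched <- c("Galium verum", "Festuca rubra subsp. arenaria",
             "Avenella flexuosa", "Bryophyta species")
tnrs <- c("Galium verum", "Festuca rubra", "Deschampsia flexuosa", NA)

# Hypothetical helper: keep the first two space-separated words of each name
first_two_words <- function(x) {
  vapply(strsplit(x, " ", fixed = TRUE),
         function(w) paste(head(w, 2), collapse = " "),
         character(1))
}

matched_short <- first_two_words(matched)
# Keep names without a standardized match, or whose first two words changed
changed <- is.na(tnrs) | matched_short != tnrs
print(matched[changed])
```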
Check 30 random species names from DT that changed name after matching to backbone
| Turboveg2 concept | Matched concept | Name_TNRS |
|---|---|---|
| Cyclotella comta | Cyclotella comta | Cyclotella |
| Lespedeza species #2 | Lespedeza species #2 | Lespedeza |
| Alopecurus x brachystylus | Alopecurus x brachystylus | Alopecurus × |
| Verbascum leptocladum | Verbascum leptocladum | Verbascum glabratum |
| Anisomeridium polypori | Anisomeridium nyssaegenum | Anisomeridium nyssigenum |
| Klasea boetica subsp. lusitanica | Klasea boetica subsp. lusitanica | Klasea baetica |
| Arenaria armeniaca | Arenaria armeniaca | Eremogone armeniaca |
| Lauraceae species [FB 87] | Lauraceae species [FB 87] | Lauraceae |
| Lobelia species #5 | Lobelia species #5 | Lobelia |
| Petrocoptis pyrenaica subsp. pyrenaica | Petrocoptis pyrenaica subsp. pyrenaica | Silene glaucifolia |
| Plinia species | Plinia species | Plinia |
| Stereocaulon species | Stereocaulon species | Stereocaulon |
| Gesneria acuminata | Gesneria acuminata | Gesneria humilis |
| Tephrocactus articulatus | Tephrocactus articulatus | Opuntia articulata |
| Tulostoma brumale | Tulostoma brumale | NA |
| Fontinalis species | Fontinalis species | Fontinalis |
| Michelia shiluensis | Michelia shiluensis | Magnolia shiluensis |
| Calamus aff. | Calamus aff. | Calamus |
| Lespedeza daurica | Lespedeza daurica | Lespedeza davurica |
| Arrabidaea truncata | Arrabidaea truncata | Fridericia truncata |
| Launaea acanthoclada | Launaea acanthoclada | Launaea lanifera |
| Jagera pseudorhus var. pseudorhus | Jagera pseudorhus var. pseudorhus | Cupania pseudorhus |
| Alliaria_petiolata species | Alliaria_petiolata species | Alliaria petiolata |
| Hevea cf. guianensis | Hevea cf. guianensis | Hevea guianensis |
| Riccia beirichiana | Riccia beirichiana | Riccia beyrichiana |
| Thesium billardieri | Thesium billardieri | Thesium billardierei |
| Hugonia sp. | Hugonia sp. | Hugonia |
| Coreopsis falcata | Coreopsis falcata | Coreopsis gladiata |
| Vismia species [CMG 3029] | Vismia species [CMG 3029] | Vismia |
| Bufonia mauritanica | Bufonia mauritanica | Bufonia perennis |

Check the most common species names from DT after matching to backbone

name.check.freq <- DT1 %>% 
  dplyr::select(`Turboveg2 concept`:`Matched concept`, Name_short) %>% 
  rename(Name_TNRS=Name_short) %>% 
  group_by(`Turboveg2 concept`, `Matched concept`, Name_TNRS) %>% 
  summarize(n=n()) %>% 
  mutate(Matched_short=word(`Matched concept`, start = 1L, end=2L)) %>% 
  filter(is.na(Name_TNRS) | Matched_short != Name_TNRS) %>%
  dplyr::select(-Matched_short) %>% 
  ungroup() %>% 
  arrange(desc(n)) 
Check 40 most common species names from DT that changed name after matching to backbone
| Turboveg2 concept | Matched concept | Name_TNRS | n |
|---|---|---|---|
| Deschampsia flexuosa | Avenella flexuosa | Deschampsia flexuosa | 126515 |
| Festuca pratensis | Schedonorus pratensis | Festuca pratensis | 84008 |
| Elymus repens | Elytrigia repens | Elymus repens | 82891 |
| Phalaris arundinacea | Phalaroides arundinacea | Phalaris arundinacea | 75296 |
| Bryophyta species | Bryophyta species | NA | 74393 |
| Poa annua | Ochlopoa annua | Poa annua | 67460 |
| Potentilla anserina | Argentina anserina | Potentilla anserina | 63786 |
| Taraxacum sect. Ruderalia | Taraxacum sect. Taraxacum | Taraxacum | 58429 |
| Taraxacum species | Taraxacum species | Taraxacum | 57167 |
| Cornus sanguinea | Cornus sanguinea | Cornus controversa | 52651 |
| Elytrigia repens | Elytrigia repens | Elymus repens | 51670 |
| Taraxacum officinale | Taraxacum sect. Taraxacum | Taraxacum | 50502 |
| Weinmannia racemosa | Weinmannia racemosa | Leiospermum racemosum | 38269 |
| Bromus erectus | Bromopsis erecta | Bromus erectus | 33765 |
| Cladonia species | Cladonia species | Cladonia | 32464 |
| Avenella flexuosa | Avenella flexuosa | Deschampsia flexuosa | 30787 |
| Rubus sect. Rubus | Rubus sect. Rubus | Rubus | 28684 |
| Festuca arundinacea | Schedonorus arundinaceus | Festuca arundinacea | 26124 |
| Trientalis europaea | Trientalis europaea | Lysimachia europaea | 25940 |
| Rubus fruticosus aggr. | Rubus fruticosus aggr. | Rubus vestitus | 23669 |
| Glaux maritima | Glaux maritima | Lysimachia maritima | 23306 |
| Taraxacum officinale aggr. | Taraxacum sect. Taraxacum | Taraxacum | 22837 |
| Rubus species | Rubus species | Rubus | 22098 |
| Festuca gigantea | Schedonorus giganteus | Festuca gigantea | 20917 |
| Taraxacum sectie Ruderalia | Taraxacum sect. Taraxacum | Taraxacum | 20888 |
| Lophozonia menziesii | Lophozonia menziesii | Lophozonia | 20249 |
| Juncus gerardi | Juncus gerardi | Juncus gerardii | 19094 |
| Sphagnum species | Sphagnum species | Sphagnum | 18293 |
| Festuca rupicola | Festuca stricta subsp. sulcata | Festuca rupicola | 18010 |
| Rosa species | Rosa species | Rosa | 16657 |
| Podocarpus laetus | Podocarpus laetus | Podocarpus spinulosus | 16356 |
| Bromus tectorum | Anisantha tectorum | Bromus tectorum | 16305 |
| Carex species | Carex species | Carex | 15744 |
| Ripogonum scandens | Ripogonum scandens | Rhipogonum | 14984 |
| Rubus hirtus | Rubus hirtus aggr. | Rubus proiectus | 14191 |
| Avenula pubescens | Avenula pubescens | Helictotrichon pubescens | 13490 |
| Notogrammitis billardierei | Notogrammitis billardierei | NA | 13117 |
| Crataegus species | Crataegus species | Crataegus | 13072 |
| Helictotrichon pubescens | Avenula pubescens | Helictotrichon pubescens | 12941 |
| Erophila verna | Draba verna | Erophila verna | 12646 |

Complete the field Taxon group

Taxon group information is available for only 35708898 entries, and is missing ("Unknown") for 7394414. To improve the completeness of this field, we derive additional information from the Backbone and merge it with the data already present in DT.

table(DT1$`Taxon group`, exclude=NULL)
## 
##           Alga         Lichen           Moss       Mushroom      Stonewort 
##           9497         324080        2035007            513          12166 
##        Unknown Vascular plant 
##        7394414       33327635
DT1 <- DT1 %>% 
  mutate(`Taxon group`=ifelse(`Taxon group`=="Unknown", NA, `Taxon group`)) %>% 
  mutate(Taxongroup_BB=ifelse(Taxongroup_BB=="Unknown", NA, Taxongroup_BB)) %>% 
  mutate(`Taxon group`=coalesce(`Taxon group`, Taxongroup_BB)) %>% 
  dplyr::select(-Taxongroup_BB)
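The merge above treats "Unknown" as missing in both sources and then takes the DT value first, falling back on the Backbone value. A base-R sketch on four hypothetical records, where `ifelse()` stands in for `dplyr::coalesce()`:

```r
# Hypothetical Taxon group values from DT and from the Backbone for four records
tg_dt <- c("Moss", "Unknown", NA, "Vascular plant")
tg_bb <- c("Vascular plant", "Lichen", "Unknown", NA)

# "Unknown" carries no information: treat it as missing in both sources
tg_dt[tg_dt %in% "Unknown"] <- NA
tg_bb[tg_bb %in% "Unknown"] <- NA

# DT value first, Backbone value only where DT is missing
tg <- ifelse(is.na(tg_dt), tg_bb, tg_dt)
print(tg)
```

In the first record the two sources disagree and the DT value wins; this precedence follows from the argument order of `coalesce()` in the pipeline.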


table(DT1$`Taxon group`, exclude=NULL)
## 
##           Alga         Lichen           Moss       Mushroom      Stonewort 
##           9991         366997        2090994            513          12166 
## Vascular plant           <NA> 
##       40532073          90578

Taxa for which a Basal Area measure exists (Cover code == "x_BA") can be safely assumed to be vascular plants.

DT1 <- DT1 %>% 
  mutate(`Taxon group`=replace(`Taxon group`, 
                               list=`Cover code`=="x_BA", 
                               values="Vascular plant"))

Cross-complete Taxon group information: whenever a taxon is assigned to a group in at least one record, assign it to that group throughout the DT table.

DT1 <- DT1 %>% 
  left_join(DT1 %>% 
              filter(!is.na(Name_short)) %>% 
              # "Unknown" was already converted to NA above
              filter(!is.na(`Taxon group`)) %>% 
              dplyr::select(Name_short, `Taxon group`) %>% 
              distinct(Name_short, .keep_all=TRUE) %>% 
              rename(TaxonGroup_compl=`Taxon group`),
            by="Name_short") %>% 
  mutate(`Taxon group`=coalesce(`Taxon group`, TaxonGroup_compl)) %>% 
  dplyr::select(-TaxonGroup_compl)
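The cross-completion amounts to building a lookup table (name → first known group) and using it to fill missing values. A base-R sketch on hypothetical records; like `distinct(..., .keep_all=TRUE)`, it keeps the first group found for each name, which is why conflicting assignments need to be checked afterwards.

```r
# Hypothetical records: the same standardized name appears several times,
# with Taxon group known only in some of them
name <- c("Sphagnum", "Sphagnum", "Cladonia", "Cladonia", NA)
group <- c("Moss", NA, NA, "Lichen", NA)

# Lookup table: first known group for each name
known <- !is.na(name) & !is.na(group)
first <- !duplicated(name[known])
lookup <- setNames(group[known][first], name[known][first])

# Fill missing groups from the lookup, keeping existing values
group_filled <- ifelse(is.na(group), unname(lookup[name]), group)
print(group_filled)
```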

table(DT1$`Taxon group`, exclude=NULL)
## 
##           Alga         Lichen           Moss       Mushroom      Stonewort 
##           9994         367586        2100627            513          12166 
## Vascular plant           <NA> 
##       40533651          78775

Check for genera assigned to more than one Taxon group, and fix them manually.

#check for conflicts in attribution of genera to Taxon groups
DT1 %>% 
  filter(!is.na(Name_short)) %>% 
  filter(!is.na(`Taxon group`)) %>% 
  distinct(Name_short, `Taxon group`) %>% 
  mutate(Genus=word(Name_short,1)) %>% 
  dplyr::select(Genus, `Taxon group`) %>% 
  distinct() %>% 
  group_by(Genus) %>% 
  summarize(n=n()) %>% 
  filter(n>1) %>% 
  arrange(desc(n))
## # A tibble: 16 x 2
##    Genus                 n
##    <chr>             <int>
##  1 Brachytheciastrum     2
##  2 Brachythecium         2
##  3 Chara                 2
##  4 Characeae             2
##  5 Hepatica              2
##  6 Hypericum             2
##  7 Hypnum                2
##  8 Lamprothamnus         2
##  9 Leptorhaphis          2
## 10 Lychnothamnus         2
## 11 Nitella               2
## 12 Oxymitra              2
## 13 Pancovia              2
## 14 Peltaria              2
## 15 Tonina                2
## 16 Zygodon               2
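The conflict check above keeps distinct genus-group pairs and counts groups per genus. A base-R sketch with invented name-group assignments (they illustrate the mechanics only and are not taken from DT):

```r
# Invented standardized names with their Taxon group
name_short <- c("Hypnum cupressiforme", "Hypnum jutlandicum", "Chara vulgaris",
                "Chara aspera", "Galium verum")
group <- c("Moss", "Vascular plant", "Alga", "Stonewort", "Vascular plant")

# Genus = first word of the standardized name
genus <- sub(" .*$", "", name_short)

# Distinct genus-group pairs, then genera with more than one group
pairs <- unique(data.frame(genus, group, stringsAsFactors = FALSE))
n_groups <- table(pairs$genus)
conflicts <- sort(names(n_groups)[n_groups > 1])
print(conflicts)
```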

Manually fix some known problems in Taxon group attribution. Some lists of taxa (e.g., lichen.genera, mushroom) were defined when building the Backbone.

#Attach genus info
DT1 <- DT1 %>% 
    left_join(Backbone %>% 
              dplyr::select(Name_sPlot_TRY, Name_short) %>%
              mutate(Genus=word(Name_short, 1, 1)) %>%
              dplyr::select(-Name_short) %>% 
              rename(`Matched concept`=Name_sPlot_TRY),
            by="Matched concept") %>% 
    mutate(`Taxon group`=fct_collapse(`Taxon group`, 
                                    Alga_Stonewort=c("Alga", "Stonewort")))
#manually fix some known problems
mosses.gen    <- c("Hypnum", "Brachytheciastrum", "Brachythecium",  
                  "Zygodon", "Oxymitra", "Bryophyta", "Musci", '\\\"Moos\\\"')
vascular.gen  <- c("Polystichum", "Hypericum", "Peltaria", "Pancovia", "Calythrix", "Ripogonum",
                  "Notogrammitis", "Fuscospora", "Lophozonia",  "Rostellularia", 
                  "Hesperostipa", "Microsorium", "Angiosperm","Dicotyledonae", "Spermatophy")
alga.gen      <- c("Chara", "Characeae", "Tonina", "Nostoc", "Entermorpha", "Hydrocoleum" )
 
DT1 <- DT1 %>% 
  mutate(`Taxon group`=replace(`Taxon group`, 
                               list=Genus %in% mosses.gen, 
                               values="Moss")) %>% 
  mutate(`Taxon group`=replace(`Taxon group`, 
                               list=Genus %in% vascular.gen, 
                               values="Vascular plant")) %>% 
  mutate(`Taxon group`=replace(`Taxon group`, 
                               list=Genus %in% alga.gen, 
                               values="Alga_Stonewort")) %>% 
  mutate(`Taxon group`=replace(`Taxon group`, 
                               list=Genus %in% c(lichen.genera, "Lichenes"),
                               values="Lichen")) %>% 
  mutate(`Taxon group`=replace(`Taxon group`, 
                               list=Genus %in% mushroom, 
                               values="Mushroom"))
  
table(DT1$`Taxon group`, exclude=NULL)
## 
## Alga_Stonewort         Lichen           Moss       Mushroom Vascular plant 
##          23071         367587        2100767            513       40535429 
##           <NA> 
##          75945
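Because the `replace()` calls run in sequence, a genus that appears in more than one list ends up with the group assigned last (in the chain above: Moss, Vascular plant, Alga_Stonewort, Lichen, Mushroom). A base-R sketch with hypothetical lists; "Overlapia" is an invented genus placed in two lists on purpose:

```r
# Hypothetical genus lists; "Overlapia" is invented and appears in both
moss_gen <- c("Hypnum", "Overlapia")
lichen_gen <- c("Cladonia", "Overlapia")

genus <- c("Hypnum", "Cladonia", "Overlapia", "Galium")
group <- c("Vascular plant", "Moss", "Vascular plant", "Vascular plant")

# Each rule overwrites the result of the previous ones
group[genus %in% moss_gen] <- "Moss"
group[genus %in% lichen_gen] <- "Lichen"
print(data.frame(genus, group))
```

In the pipeline Taxon group is a factor at this point, so each replacement value must already be an existing level (as "Moss", "Lichen", "Alga_Stonewort" etc. are); otherwise R would insert NA with a warning.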

Delete all records of fungi (Taxon group "Mushroom"), and use the lists of genera to fix additional problems. While in the previous round the matching was based on the genus of the resolved name (Name_short), here it is based on the genus of the unresolved name (Matched concept).

DT1 <- DT1 %>% 
  dplyr::select(-Genus) %>% 
  left_join(DT1 %>% 
              distinct(`Matched concept`) %>% 
              mutate(Genus=word(`Matched concept`, 1)), 
            by="Matched concept") %>% 
  mutate(`Taxon group`=replace(`Taxon group`, 
                                 list=Genus %in% mushroom, 
                                 values = "Mushroom")) %>% 
  mutate(`Taxon group`=replace(`Taxon group`, 
                               list=Genus %in% lichen.genera, 
                               values="Lichen")) %>% 
  mutate(`Taxon group`=replace(`Taxon group`, 
                               list=Genus %in% mosses.gen, 
                               values="Moss")) %>% 
  mutate(`Taxon group`=replace(`Taxon group`, 
                               list=Genus %in% vascular.gen, 
                               values="Vascular plant")) %>% 
  mutate(`Taxon group` = fct_explicit_na(`Taxon group`, "Unknown")) %>% 
  filter(`Taxon group`!="Mushroom") %>%
  mutate(`Taxon group`=factor(`Taxon group`))
  #dplyr::select(-Genus)

table(DT1$`Taxon group`, exclude=NULL)
## 
## Alga_Stonewort         Lichen           Moss Vascular plant        Unknown 
##          23071         367933        2103429       40572721          35721

After cross-checking all sources of information, the number of entries without Taxon group information decreased from 7394414 to 35721.

Calculate relative cover of each species × layer record in each plot

Species abundance information varies across datasets and plots. While the large majority of plots report abundance as percentage cover, a subset uses different scales. These are marked in the column Cover code as follows:
x_BA - Basal area
x_IC - Individual count
x_SC - Stem count
x_IV - Relative importance
x_RF - Relative frequency
x - Presence/absence
However, it is not intuitive that, whenever Cover code is one of the x_ classes above, the actual abundance value is stored in the x_ column. This stems from the way these data are stored in TURBOVEG.
To make the cover data more user-friendly, I simplify the way cover is stored, so that there are only two columns:
Ab_scale - the type of abundance scale used
Abundance - the cover/abundance values previously stored in the columns Cover % and x_, coalesced into a single column

# Create Ab_scale field
DT1 <- DT1 %>% 
  mutate(Ab_scale = ifelse(`Cover code` %in% 
                             c("x_BA", "x_IC", "x_SC", "x_IV", "x_RF") & !is.na(x_), 
                           `Cover code`, 
                           "CoverPerc"))  
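A base-R sketch of how Ab_scale and Abundance relate, on hypothetical TURBOVEG-style records (the "pa" case, added later, is left out). A record with an x_ cover code but no value in x_ falls back to "CoverPerc", exactly as in the `ifelse()` above.

```r
# Hypothetical records: Cover % holds percentage cover, x_ the alternative-scale value
cover_code <- c("1", "x_BA", "x_IC", "r", "x_SC")
cover_perc <- c(3, 0, 0, 1, 0)
x_ <- c(NA, 12.5, 7, NA, NA)

alt_scales <- c("x_BA", "x_IC", "x_SC", "x_IV", "x_RF")

# Ab_scale: the alternative scale if its value is present, otherwise percentage cover
ab_scale <- ifelse(cover_code %in% alt_scales & !is.na(x_), cover_code, "CoverPerc")

# Abundance: read x_ for alternative scales, Cover % otherwise
abundance <- ifelse(ab_scale %in% alt_scales, x_, cover_perc)
print(data.frame(cover_code, ab_scale, abundance))
```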

Fix some errors. In some plots, all species have zero in the field Cover %. Some of these plots are marked as p/a (Cover code=="x"), but others are not. Treat all these plots as presence/absence and set Cover % to 1.
!! Some other plots have individual layers with all zeros. These should be double-checked, but are not transformed here !!

allzeroes <- DT1 %>% 
  group_by(PlotObservationID) %>% 
  summarize(allzero=all(`Cover %`==0) ) %>% 
  filter(allzero==T) %>% 
  pull(PlotObservationID)
DT1 <- DT1 %>% 
  mutate(`Cover %`=replace(`Cover %`, 
                           list=(PlotObservationID %in% allzeroes), 
                           values=1)) %>% 
  mutate(`Cover code`=replace(`Cover code`, 
                           list=(PlotObservationID %in% allzeroes), 
                           values="x"))
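The all-zero detection above works per plot, not per layer. A base-R sketch on hypothetical records, where plot 2 has zero cover for every species; `%in% TRUE` drops plots whose result is NA, mirroring `filter(allzero==T)`.

```r
# Hypothetical records: plot 2 has zero cover for all species
plot_id <- c(1, 1, 2, 2, 2)
cover_perc <- c(5, 0, 0, 0, 0)
cover_code <- c("1", "r", "x", "+", "+")

# TRUE for plots in which every Cover % value is 0
all_zero <- tapply(cover_perc == 0, plot_id, all)
allzeroes <- as.numeric(names(all_zero)[all_zero %in% TRUE])

# Treat these plots as presence/absence: Cover % 1 and Cover code "x"
in_zero_plot <- plot_id %in% allzeroes
cover_perc[in_zero_plot] <- 1
cover_code[in_zero_plot] <- "x"
```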

Treat as presence/absence data all plot-layer combinations in which every entry has Cover code=="x" and Cover % == 1, and set their Ab_scale to "pa". This avoids confusion with plots where Cover code=="x" but "x" is a class of the cover scale in use. For p/a plots, replace the field Cover % with NA, and assign the value 1 to the field x_.

#plots with at least one entry in Cover code=="x"
sel <- DT1 %>% 
  filter(`Cover code`=="x") %>% 
  distinct(PlotObservationID) %>% 
  pull(PlotObservationID)

DT1 <- DT1 %>% 
  left_join(DT1 %>%
              filter(PlotObservationID %in% sel) %>% 
              group_by(PlotObservationID, Layer) %>% 
              mutate(to.pa= all(`Cover %`==1 & `Cover code`=="x")) %>% 
              distinct(PlotObservationID, Layer, to.pa), 
            by=c("PlotObservationID", "Layer")) %>% 
  replace_na(list(to.pa=F)) %>% 
  mutate(Ab_scale=ifelse(to.pa==T, "pa", Ab_scale)) %>% 
  mutate(`Cover %`=ifelse(to.pa==T, NA, `Cover %`)) %>% 
  mutate(x_=ifelse(to.pa==T, 1, x_)) %>% 
  dplyr::select(-to.pa)
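The p/a rule is evaluated per plot-layer combination: a layer becomes "pa" only if all of its entries have Cover code "x" and Cover % 1. A base-R sketch on hypothetical records, with `ave()` standing in for the grouped `mutate()`; in plot 2 the "x" entry shares the layer with a "2" entry, so "x" is read as a cover class and the layer is left unchanged.

```r
# Hypothetical records from plots with at least one Cover code == "x"
plot_id <- c(1, 1, 1, 2, 2)
layer <- c(6, 6, 1, 6, 6)
cover_perc <- c(1, 1, 1, 1, 15)
cover_code <- c("x", "x", "x", "x", "2")

# Plot-layer key; to_pa is TRUE only if every entry in the group qualifies
key <- paste(plot_id, layer, sep = "_")
is_pa_entry <- cover_perc == 1 & cover_code == "x"
to_pa <- ave(is_pa_entry, key, FUN = all)

# Apply the transformation to p/a plot-layers only
ab_scale <- rep("CoverPerc", length(plot_id))
ab_scale[to_pa] <- "pa"
x_ <- rep(NA_real_, length(plot_id))
x_[to_pa] <- 1
cover_perc[to_pa] <- NA
```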

Some plots also use different cover scales within the same layer. There are only a few of them, so I reduce their abundance values to p/a.
First, find these plots:

mixed <- DT1 %>% 
  distinct(PlotObservationID, Ab_scale, Layer) %>% 
  group_by(PlotObservationID, Layer) %>% 
  summarize(n=n()) %>% 
  filter(n>1) %>% 
  pull(PlotObservationID) %>% 
  unique()
length(mixed)
## [1] 335
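A base-R sketch of the mixed-scale detection, on hypothetical records: plot 2 mixes two scales within layer 6, while plot 3 uses different scales in different layers and is therefore not flagged (it would appear as "Multiple_scales" in the later summary).

```r
# Hypothetical records
plot_id <- c(1, 1, 2, 2, 3, 3)
layer <- c(6, 6, 6, 6, 1, 6)
ab_scale <- c("CoverPerc", "CoverPerc", "CoverPerc", "x_IC", "x_BA", "CoverPerc")

# Distinct plot-layer-scale combinations, then number of scales per plot-layer
combos <- unique(data.frame(plot_id, layer, ab_scale, stringsAsFactors = FALSE))
n_scales <- aggregate(combos["ab_scale"], by = combos[c("plot_id", "layer")],
                      FUN = length)

# Plots with more than one scale in at least one layer
mixed <- unique(n_scales$plot_id[n_scales$ab_scale > 1])
print(mixed)
```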

Transform these plots to p/a and update the field Ab_scale accordingly. The whole plot is transformed, not only the layer with mixed scales. Note: the column Abundance is only created at this step.

DT1 <- DT1 %>% 
  mutate(Ab_scale=replace(Ab_scale, 
                           list=PlotObservationID %in% mixed, 
                           values="mixed")) %>%
  mutate(`Cover %`=replace(`Cover %`, 
                           list=Ab_scale=="mixed",
                           values=NA)) %>% 
  mutate(x_=replace(x_,  list=Ab_scale=="mixed", values=1)) %>% 
  mutate(Ab_scale=replace(Ab_scale, list=Ab_scale=="mixed", values="pa")) %>% 
  #Create additional field Abundance to avoid overwriting original data
  mutate(Abundance =ifelse(Ab_scale %in% c("x_BA", "x_IC", "x_SC", "x_IV", "x_RF", "pa"), 
                          x_, `Cover %`)) %>% 
  mutate(Abundance=replace(Abundance, 
                           list=PlotObservationID %in% mixed, 
                           values=1))

Double check and summarize Ab_scales

scale_check <- DT1 %>% 
  distinct(PlotObservationID, Layer, Ab_scale) %>% 
  group_by(PlotObservationID) %>% 
  summarise(Ab_scale_combined=ifelse(length(unique(Ab_scale))==1, 
                                     unique(Ab_scale), 
                                     "Multiple_scales"))

nrow(scale_check)== length(unique(DT1$PlotObservationID))
## [1] TRUE
table(scale_check$Ab_scale_combined)
## 
##       CoverPerc Multiple_scales              pa            x_BA            x_IC 
##         1691454            2084          271057            6293            2092 
##            x_IV            x_RF            x_SC 
##             146             585            4878

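Transform abundances to relative abundances. For consistency with the previous version of sPlot, this field is called Relative.cover. Relative cover is computed per plot: each Abundance value is divided by the sum of all Abundance values in the plot, across all layers.
Watch out: plots with p/a information are also transformed to relative cover, so every record in a p/a plot receives the same value (1 divided by the number of records in the plot).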

DT1 <- DT1 %>% 
  left_join(x=., 
            y={.} %>%
              group_by(PlotObservationID) %>% 
              summarize(tot.abundance=sum(Abundance)), 
            by=c("PlotObservationID")) %>% 
  mutate(Relative.cover=Abundance/tot.abundance)

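# check: relative covers are computed per plot (across all layers),
# so they should sum to 1 in every plot; count the plots where they do not
DT1 %>% 
  group_by(PlotObservationID) %>% 
  summarize(tot.cover=sum(Relative.cover)) %>% 
  filter(abs(tot.cover - 1) > 1e-6) %>% 
  nrow()

An earlier version of this check compared tot.cover with sum(unique(Layer)), i.e. the sum of the layer codes rather than 1, and returned 1958816.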
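A base-R sketch of the relative-cover computation and of the corrected check, on hypothetical plots; `ave()` plays the role of the per-plot total joined back to DT1. For comparison, it also computes the sum of the distinct layer codes per plot, the quantity used by the earlier version of the check.

```r
# Hypothetical plots: plot 1 has two layers, plot 2 is p/a with Layer 0
plot_id <- c(1, 1, 1, 2, 2)
layer <- c(1, 6, 6, 0, 0)
abundance <- c(68, 3, 3, 1, 1)

# Divide each abundance by the plot total across all layers
tot_abundance <- ave(abundance, plot_id, FUN = sum)
relative_cover <- abundance / tot_abundance

# Check: relative covers should sum to 1 in every plot
tot_cover <- tapply(relative_cover, plot_id, sum)
n_bad <- sum(abs(tot_cover - 1) > 1e-6)

# Sum of distinct layer codes per plot (7 and 0 here), not a valid target
layer_sum <- tapply(layer, plot_id, function(l) sum(unique(l)))
```

In the p/a plot each record gets 1/2, in line with the 1/9 = 0.1111111 shown for plot 354857 in the final table.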

Clean DT and export

DT2 <- DT1 %>% 
  dplyr::select(PlotObservationID, Name_short, `Turboveg2 concept`, Rank_correct, `Taxon group`, Layer:x_, Ab_scale, Abundance, Relative.cover ) %>% 
  rename(species_original=`Turboveg2 concept`, 
         species=Name_short,
         taxon_group=`Taxon group`, 
         cover_perc=`Cover %`, 
         cover_code=`Cover code`)

The output DT table contains 43102875 records across 1978589 plots. The total number of taxa is 116256 before and 76912 after standardization. Taxon group information is available for 76548 standardized species.

Example of the final DT table (the same 3 randomly selected plots shown above)
| PlotObservationID | species | species_original | Rank_correct | taxon_group | Layer | cover_perc | cover_code | x_ | Ab_scale | Abundance | Relative.cover |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 354857 | Agrostis capillaris | Agrostis capillaris | species | Vascular plant | 0 | NA | x | 1 | pa | 1 | 0.1111111 |
| 354857 | Ammophila arenaria | Ammophila arenaria | species | Vascular plant | 0 | NA | x | 1 | pa | 1 | 0.1111111 |
| 354857 | NA | Bryophyta species | higher | Moss | 0 | NA | x | 1 | pa | 1 | 0.1111111 |
| 354857 | Carex arenaria | Carex arenaria | species | Vascular plant | 0 | NA | x | 1 | pa | 1 | 0.1111111 |
| 354857 | Festuca vaginata | Festuca rubra subsp. arenaria | species | Vascular plant | 0 | NA | x | 1 | pa | 1 | 0.1111111 |
| 354857 | Galium verum | Galium verum | species | Vascular plant | 0 | NA | x | 1 | pa | 1 | 0.1111111 |
| 354857 | Linaria vulgaris | Linaria vulgaris | species | Vascular plant | 0 | NA | x | 1 | pa | 1 | 0.1111111 |
| 354857 | Poa pratensis | Poa pratensis subsp. pratensis | lower | Vascular plant | 0 | NA | x | 1 | pa | 1 | 0.1111111 |
| 354857 | Salix | Salix repens subsp. repens var. argentea | genus | Vascular plant | 0 | NA | x | 1 | pa | 1 | 0.1111111 |
| 1462431 | Agrostis rupestris | Agrostis rupestris | species | Vascular plant | 6 | 3 | 1 | NA | CoverPerc | 3 | 0.0303030 |
| 1462431 | Deschampsia flexuosa | Avenella flexuosa | species | Vascular plant | 6 | 3 | 1 | NA | CoverPerc | 3 | 0.0303030 |
| 1462431 | Helictochloa versicolor | Avenula versicolor | species | Vascular plant | 6 | 38 | 3 | NA | CoverPerc | 38 | 0.3838384 |
| 1462431 | Calamagrostis villosa | Calamagrostis villosa | species | Vascular plant | 6 | 2 | + | NA | CoverPerc | 2 | 0.0202020 |
| 1462431 | Campanula alpina | Campanula alpina | species | Vascular plant | 6 | 3 | 1 | NA | CoverPerc | 3 | 0.0303030 |
| 1462431 | Carex sempervirens | Carex sempervirens | species | Vascular plant | 6 | 3 | 1 | NA | CoverPerc | 3 | 0.0303030 |
| 1462431 | Cetraria islandica | Cetraria islandica | species | Lichen | 9 | 13 | 2 | NA | CoverPerc | 13 | 0.1313131 |
| 1462431 | Empetrum nigrum | Empetrum nigrum | species | Vascular plant | 6 | 2 | + | NA | CoverPerc | 2 | 0.0202020 |
| 1462431 | Geum montanum | Geum montanum | species | Vascular plant | 6 | 2 | + | NA | CoverPerc | 2 | 0.0202020 |
| 1462431 | Hieracium alpinum | Hieracium alpinum | species | Vascular plant | 6 | 2 | + | NA | CoverPerc | 2 | 0.0202020 |
| 1462431 | Homogyne alpina | Homogyne alpina | species | Vascular plant | 6 | 3 | 1 | NA | CoverPerc | 3 | 0.0303030 |
| 1462431 | Hypochaeris uniflora | Hypochaeris uniflora | species | Vascular plant | 6 | 13 | 2 | NA | CoverPerc | 13 | 0.1313131 |
| 1462431 | Persicaria vivipara | Persicaria vivipara | species | Vascular plant | 6 | 1 | r | NA | CoverPerc | 1 | 0.0101010 |
| 1462431 | Potentilla aurea | Potentilla aurea | species | Vascular plant | 6 | 2 | + | NA | CoverPerc | 2 | 0.0202020 |
| 1462431 | Anemone scherfelii | Pulsatilla alpina subsp. alba auct. sudet. & carpat. | lower | Vascular plant | 6 | 1 | r | NA | CoverPerc | 1 | 0.0101010 |
| 1462431 | Vaccinium uliginosum | Vaccinium gaultherioides | species | Vascular plant | 6 | 3 | 1 | NA | CoverPerc | 3 | 0.0303030 |
| 1462431 | Vaccinium myrtillus | Vaccinium myrtillus | species | Vascular plant | 6 | 3 | 1 | NA | CoverPerc | 3 | 0.0303030 |
| 1462431 | Vaccinium vitis-idaea | Vaccinium vitis-idaea | species | Vascular plant | 6 | 2 | + | NA | CoverPerc | 2 | 0.0202020 |
| 1585163 | Betula pubescens | Betula pubescens | species | Vascular plant | 1 | 38 | 3 | NA | CoverPerc | 38 | 0.1397059 |
| 1585163 | Betula pubescens | Betula pubescens | species | Vascular plant | 2 | 38 | 3 | NA | CoverPerc | 38 | 0.1397059 |
| 1585163 | Calamagrostis arundinacea | Calamagrostis arundinacea | species | Vascular plant | 6 | 3 | 1 | NA | CoverPerc | 3 | 0.0110294 |
| 1585163 | Carex rhizina | Carex rhizina | lower | Vascular plant | 6 | 3 | 1 | NA | CoverPerc | 3 | 0.0110294 |
| 1585163 | Chelidonium majus | Chelidonium majus | species | Vascular plant | 6 | 3 | 1 | NA | CoverPerc | 3 | 0.0110294 |
| 1585163 | Convallaria majalis | Convallaria majalis | species | Vascular plant | 6 | 3 | 1 | NA | CoverPerc | 3 | 0.0110294 |
| 1585163 | Dryopteris carthusiana | Dryopteris carthusiana | species | Vascular plant | 6 | 3 | 1 | NA | CoverPerc | 3 | 0.0110294 |
| 1585163 | Galium mollugo | Galium mollugo | species | Vascular plant | 6 | 3 | 1 | NA | CoverPerc | 3 | 0.0110294 |
| 1585163 | Hypericum perforatum | Hypericum perforatum | species | Vascular plant | 6 | 3 | 1 | NA | CoverPerc | 3 | 0.0110294 |
| 1585163 | Melampyrum pratense | Melampyrum pratense | species | Vascular plant | 6 | 3 | 1 | NA | CoverPerc | 3 | 0.0110294 |
| 1585163 | Moehringia trinervia | Moehringia trinervia | species | Vascular plant | 6 | 3 | 1 | NA | CoverPerc | 3 | 0.0110294 |
| 1585163 | Oxalis acetosella | Oxalis acetosella | species | Vascular plant | 6 | 3 | 1 | NA | CoverPerc | 3 | 0.0110294 |
| 1585163 | Picea obovata | Picea obovata | species | Vascular plant | 1 | 68 | 4 | NA | CoverPerc | 68 | 0.2500000 |
| 1585163 | Picea obovata | Picea obovata | species | Vascular plant | 2 | 68 | 4 | NA | CoverPerc | 68 | 0.2500000 |
| 1585163 | Pinus sylvestris | Pinus sylvestris | species | Vascular plant | 1 | 3 | 1 | NA | CoverPerc | 3 | 0.0110294 |
| 1585163 | Pinus sylvestris | Pinus sylvestris | species | Vascular plant | 6 | 3 | 1 | NA | CoverPerc | 3 | 0.0110294 |
| 1585163 | Rubus idaeus | Rubus idaeus | species | Vascular plant | 6 | 3 | 1 | NA | CoverPerc | 3 | 0.0110294 |
| 1585163 | Rumex acetosella | Rumex acetosella | species | Vascular plant | 6 | 3 | 1 | NA | CoverPerc | 3 | 0.0110294 |
| 1585163 | Sambucus racemosa | Sambucus racemosa | species | Vascular plant | 4 | 3 | 1 | NA | CoverPerc | 3 | 0.0110294 |
| 1585163 | Solidago virgaurea | Solidago virgaurea | species | Vascular plant | 6 | 3 | 1 | NA | CoverPerc | 3 | 0.0110294 |
| 1585163 | Sorbus aucuparia | Sorbus aucuparia | species | Vascular plant | 4 | 3 | 1 | NA | CoverPerc | 3 | 0.0110294 |
| 1585163 | Stellaria media | Stellaria media | species | Vascular plant | 6 | 3 | 1 | NA | CoverPerc | 3 | 0.0110294 |
| 1585163 | Vaccinium myrtillus | Vaccinium myrtillus | species | Vascular plant | 6 | 3 | 1 | NA | CoverPerc | 3 | 0.0110294 |
| 1585163 | Veronica officinalis | Veronica officinalis | species | Vascular plant | 6 | 3 | 1 | NA | CoverPerc | 3 | 0.0110294 |

save(DT2, file = "../_output/DT_sPlot3.0.RData")

SessionInfo

## R version 3.6.3 (2020-02-29)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 16.04.6 LTS
## 
## Matrix products: default
## BLAS:   /usr/lib/openblas-base/libblas.so.3
## LAPACK: /usr/lib/libopenblasp-r0.2.18.so
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8          LC_NUMERIC=C                 
##  [3] LC_TIME=en_US.UTF-8           LC_COLLATE=en_US.UTF-8       
##  [5] LC_MONETARY=en_US.UTF-8       LC_MESSAGES=en_US.UTF-8      
##  [7] LC_PAPER=en_US.UTF-8          LC_NAME=en_US.UTF-8          
##  [9] LC_ADDRESS=en_US.UTF-8        LC_TELEPHONE=en_US.UTF-8     
## [11] LC_MEASUREMENT=en_US.UTF-8    LC_IDENTIFICATION=en_US.UTF-8
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
##  [1] kableExtra_1.1.0 knitr_1.28       xlsx_0.6.3       forcats_0.5.0   
##  [5] stringr_1.4.0    dplyr_0.8.5      purrr_0.3.3      readr_1.3.1     
##  [9] tidyr_1.0.2      tibble_2.1.3     ggplot2_3.3.0    tidyverse_1.3.0 
## 
## loaded via a namespace (and not attached):
##  [1] tidyselect_1.0.0  xfun_0.12         rJava_0.9-11      haven_2.2.0      
##  [5] lattice_0.20-40   colorspace_1.4-1  vctrs_0.2.3       generics_0.0.2   
##  [9] viridisLite_0.3.0 htmltools_0.4.0   yaml_2.2.1        utf8_1.1.4       
## [13] rlang_0.4.4       pillar_1.4.2      glue_1.3.1        withr_2.1.2      
## [17] DBI_1.1.0         dbplyr_1.4.2      modelr_0.1.6      readxl_1.3.1     
## [21] lifecycle_0.2.0   munsell_0.5.0     gtable_0.3.0      cellranger_1.1.0 
## [25] rvest_0.3.5       evaluate_0.14     xlsxjars_0.6.1    fansi_0.4.1      
## [29] highr_0.8         broom_0.5.5       Rcpp_1.0.3        scales_1.1.0     
## [33] backports_1.1.5   webshot_0.5.2     jsonlite_1.6.1    fs_1.3.2         
## [37] hms_0.5.3         digest_0.6.23     stringi_1.4.6     grid_3.6.3       
## [41] cli_2.0.2         tools_3.6.3       magrittr_1.5      crayon_1.3.4     
## [45] pkgconfig_2.0.3   ellipsis_0.3.0    xml2_1.2.2        reprex_0.3.0     
## [49] lubridate_1.7.4   assertthat_0.2.1  rmarkdown_2.1     httr_1.4.1       
## [53] rstudioapi_0.11   R6_2.4.1          nlme_3.1-145      compiler_3.6.3