Timestamp: Thu Mar 25 09:23:03 2021
Drafted: Francesco Maria Sabatini
Revised: Helge Bruelheide
Version: 1.1

This report documents the construction of the DT table for sPlot 3.0. It is based on dataset sPlot_3.0.2, received on 24/07/2019 from Stephan Hennekens.

Caution: Layer information is not available for all species in each plot. In case of missing information Layer is set to zero.

Changes in version 1.1
1) Added explanation of fields
2) Fixed taxon_group of Friesodielsia
3) Only export the fields Ab_scale and Abundance

knitr::opts_chunk$set(echo = TRUE)
library(tidyverse)
library(readr)
library(xlsx)
library(knitr)
library(kableExtra)

#save temporary files
write("TMPDIR = /data/sPlot/users/Francesco/_tmp", file=file.path(Sys.getenv('TMPDIR'), '.Renviron'))
write("R_USER = /data/sPlot/users/Francesco/_tmp", file=file.path(Sys.getenv('R_USER'), '.Renviron'))
#rasterOptions(tmpdir="/data/sPlot/users/Francesco/_tmp")

Search and replace unclosed quotation marks and escape them. Run in Linux terminal

# escape all double quotation marks. Run in Linux terminal
# sed 's/"/\\"/g' sPlot_3_0_2_species.csv > sPlot_3_0_2_species_test.csv

Import data Table

DT table is the species x plot matrix, in long format.

DT0 <- readr::read_delim("../sPlot_data_export/sPlot_3_0_2_species_test.csv", 
                         delim = "\t", 
                         # declare column types explicitly instead of letting readr guess
                         col_types = cols(
                           PlotObservationID = col_double(),
                           Taxonomy = col_character(),
                           `Taxon group` = col_character(),
                           `Taxon group ID` = col_double(),
                           `Turboveg2 concept` = col_character(),
                           `Matched concept` = col_character(),
                           Match = col_double(),
                           Layer = col_double(),
                           `Cover %` = col_double(),
                           `Cover code` = col_character(),
                           x_ = col_double()
                         )
                        ) 

Match plots with those in header

load("../_output/header_sPlot3.0.RData")
DT0 <- DT0 %>% 
  filter(PlotObservationID %in% unique(header$PlotObservationID))

nplots <- length(unique(DT0$PlotObservationID))
nspecies <- length(unique(DT0$`Matched concept`))
# Plots in header but not in DT
empty.plots <- header %>% 
  filter(!PlotObservationID %in% unique(DT0$PlotObservationID)) %>% 
  pull(PlotObservationID)

The DT table includes 43093474 species × plot records, across 1977540 plots. Before taxonomic resolution, there are 107676 species. There are 97 plots in header that have no records in DT. These are plots where the only species reported in Turboveg 3 are not identified (and not in the taxonomic list). Should these be deleted from header?
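The plot matching described above can be tried out on toy data. This is a minimal sketch, assuming two small hypothetical tibbles (header_toy and DT_toy) that stand in for header and DT0; it uses semi_join() and anti_join(), which for this purpose should give the same result as the %in% filters used in the report.

```r
library(dplyr)

# Hypothetical toy data standing in for header and DT0
header_toy <- tibble(PlotObservationID = c(1, 2, 3, 4))
DT_toy <- tibble(PlotObservationID = c(1, 1, 2, 5),
                 `Matched concept` = c("Acer campestre", "Quercus robur",
                                       "Acer campestre", "Viola hirta"))

# Keep only species records whose plot is listed in the header
DT_kept <- DT_toy %>%
  semi_join(header_toy, by = "PlotObservationID")

# Plots listed in the header that have no species record at all
empty_toy <- header_toy %>%
  anti_join(DT_toy, by = "PlotObservationID") %>%
  pull(PlotObservationID)
```

In this toy case, plot 5 is dropped from the species table because it is not in the header, and plots 3 and 4 are reported as empty.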

Example of initial DT table (3 randomly selected plots shown)
| PlotObservationID | Taxonomy | Taxon group | Taxon group ID | Turboveg2 concept | Matched concept | Match | Layer | Cover % | Cover code | x_ |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 532404 | EU-Europe | Vascular plant | 1 | Amaranthus lividus | Amaranthus blitum | 3 | 6 | 13.0 | 2 | NA |
| 532404 | EU-Europe | Vascular plant | 1 | Amaranthus powellii | Amaranthus powellii | 3 | 6 | 3.0 | 1 | NA |
| 532404 | EU-Europe | Vascular plant | 1 | Amaranthus retroflexus | Amaranthus retroflexus | 3 | 6 | 13.0 | 2 | NA |
| 532404 | EU-Europe | Vascular plant | 1 | Brassica rapa | Brassica rapa | 3 | 6 | 2.0 |  | NA |
| 532404 | EU-Europe | Vascular plant | 1 | Calystegia sepium | Calystegia sepium | 3 | 6 | 2.0 |  | NA |
| 532404 | EU-Europe | Vascular plant | 1 | Capsella bursa-pastoris | Capsella bursa-pastoris | 3 | 6 | 2.0 |  | NA |
| 532404 | EU-Europe | Vascular plant | 1 | Chamomilla recutita | Matricaria chamomilla | 3 | 6 | 2.0 |  | NA |
| 532404 | EU-Europe | Vascular plant | 1 | Chenopodium album | Chenopodium album | 3 | 6 | 3.0 | 1 | NA |
| 532404 | EU-Europe | Vascular plant | 1 | Chenopodium ficifolium | Chenopodium ficifolium | 3 | 6 | 2.0 |  | NA |
| 532404 | EU-Europe | Vascular plant | 1 | Chenopodium polyspermum | Lipandra polysperma | 3 | 6 | 3.0 | 1 | NA |
| 532404 | EU-Europe | Vascular plant | 1 | Cirsium arvense | Cirsium arvense | 3 | 6 | 2.0 |  | NA |
| 532404 | EU-Europe | Vascular plant | 1 | Convolvulus arvensis | Convolvulus arvensis | 3 | 6 | 13.0 | 2 | NA |
| 532404 | EU-Europe | Vascular plant | 1 | Digitaria sanguinalis | Digitaria sanguinalis | 3 | 6 | 2.0 |  | NA |
| 532404 | EU-Europe | Vascular plant | 1 | Echinochloa crus-galli | Echinochloa crus-galli | 3 | 6 | 3.0 | 1 | NA |
| 532404 | EU-Europe | Vascular plant | 1 | Galinsoga ciliata | Galinsoga quadriradiata | 3 | 6 | 3.0 | 1 | NA |
| 532404 | EU-Europe | Vascular plant | 1 | Galinsoga parviflora | Galinsoga parviflora | 3 | 6 | 38.0 | 3 | NA |
| 532404 | EU-Europe | Vascular plant | 1 | Geranium dissectum | Geranium dissectum | 3 | 6 | 2.0 |  | NA |
| 532404 | EU-Europe | Vascular plant | 1 | Lamium purpureum | Lamium purpureum | 3 | 6 | 2.0 |  | NA |
| 532404 | EU-Europe | Vascular plant | 1 | Lolium perenne | Lolium perenne | 3 | 6 | 2.0 |  | NA |
| 532404 | EU-Europe | Vascular plant | 1 | Phacelia tanacetifolia | Phacelia tanacetifolia | 3 | 6 | 2.0 |  | NA |
| 532404 | EU-Europe | Vascular plant | 1 | Polygonum lapathifolium | Persicaria lapathifolia | 3 | 6 | 2.0 |  | NA |
| 532404 | EU-Europe | Vascular plant | 1 | Polygonum persicaria | Persicaria maculosa | 3 | 6 | 2.0 |  | NA |
| 532404 | EU-Europe | Vascular plant | 1 | Setaria pumila | Setaria pumila | 3 | 6 | 2.0 |  | NA |
| 532404 | EU-Europe | Vascular plant | 1 | Stachys arvensis | Stachys arvensis | 3 | 6 | 2.0 |  | NA |
| 532404 | EU-Europe | Vascular plant | 1 | Stellaria media | Stellaria media | 3 | 6 | 3.0 | 1 | NA |
| 532404 | EU-Europe | Vascular plant | 1 | Taraxacum officinale | Taraxacum sect. Taraxacum | 3 | 6 | 2.0 |  | NA |
| 532404 | EU-Europe | Vascular plant | 1 | Veronica persica | Veronica persica | 3 | 6 | 2.0 |  | NA |
| 1648095 | RU-Russia | Vascular plant | 1 | Acer campestre | Acer campestre | 3 | 6 | 0.1 | .1 | NA |
| 1648095 | RU-Russia | Vascular plant | 1 | Acer platanoides | Acer platanoides | 3 | 6 | 0.1 | .1 | NA |
| 1648095 | RU-Russia | Vascular plant | 1 | Acer pseudoplatanus | Acer pseudoplatanus | 3 | 6 | 0.1 | .1 | NA |
| 1648095 | RU-Russia | Vascular plant | 1 | Aegonychon purpureocaeruleum | Aegonychon purpurocaeruleum | 3 | 6 | 0.1 | .1 | NA |
| 1648095 | RU-Russia | Vascular plant | 1 | Anemonoides nemorosa | Anemone nemorosa | 3 | 6 | 0.1 | .1 | NA |
| 1648095 | RU-Russia | Vascular plant | 1 | Anemonoides ranunculoides | Anemone ranunculoides | 3 | 6 | 10.0 | 10 | NA |
| 1648095 | RU-Russia | Vascular plant | 1 | Asarum europaeum | Asarum europaeum | 3 | 6 | 0.1 | .1 | NA |
| 1648095 | RU-Russia | Vascular plant | 1 | Brachypodium sylvaticum | Brachypodium sylvaticum | 3 | 6 | 0.1 | .1 | NA |
| 1648095 | RU-Russia | Moss | 3 | Brachythecium velutinum | Brachytheciastrum velutinum | 1 | 9 | 0.1 | .1 | NA |
| 1648095 | RU-Russia | Vascular plant | 1 | Campanula rapunculoides | Campanula rapunculoides | 3 | 6 | 0.1 | .1 | NA |
| 1648095 | RU-Russia | Vascular plant | 1 | Carex contigua | Carex spicata | 3 | 6 | 0.1 | .1 | NA |
| 1648095 | RU-Russia | Vascular plant | 1 | Cerasus avium | Prunus avium | 3 | 1 | 12.0 | 12 | NA |
| 1648095 | RU-Russia | Vascular plant | 1 | Cerasus avium | Prunus avium | 3 | 6 | 0.1 | .1 | NA |
| 1648095 | RU-Russia | Vascular plant | 1 | Cornus mas | Cornus mas | 3 | 4 | 13.0 | 13 | NA |
| 1648095 | RU-Russia | Vascular plant | 1 | Cornus mas | Cornus mas | 3 | 6 | 0.1 | .1 | NA |
| 1648095 | RU-Russia | Vascular plant | 1 | Corylus avellana | Corylus avellana | 3 | 4 | 13.0 | 13 | NA |
| 1648095 | RU-Russia | Vascular plant | 1 | Corylus avellana | Corylus avellana | 3 | 6 | 0.1 | .1 | NA |
| 1648095 | RU-Russia | Vascular plant | 1 | Crataegus curvisepala | Crataegus rhipidophylla | 3 | 4 | 0.1 | .1 | NA |
| 1648095 | RU-Russia | Vascular plant | 1 | Euonymus europaea | Euonymus europaeus | 3 | 6 | 0.1 | .1 | NA |
| 1648095 | RU-Russia | Vascular plant | 1 | Euonymus verrucosa | Euonymus verrucosus | 3 | 4 | 0.1 | .1 | NA |
| 1648095 | RU-Russia | Vascular plant | 1 | Euonymus verrucosa | Euonymus verrucosus | 3 | 6 | 0.1 | .1 | NA |
| 1648095 | RU-Russia | Vascular plant | 1 | Fraxinus excelsior | Fraxinus excelsior | 3 | 1 | 57.0 | 57 | NA |
| 1648095 | RU-Russia | Vascular plant | 1 | Fraxinus excelsior | Fraxinus excelsior | 3 | 4 | 0.1 | .1 | NA |
| 1648095 | RU-Russia | Vascular plant | 1 | Fraxinus excelsior | Fraxinus excelsior | 3 | 6 | 0.1 | .1 | NA |
| 1648095 | RU-Russia | Vascular plant | 1 | Gagea lutea | Gagea lutea | 3 | 6 | 0.1 | .1 | NA |
| 1648095 | RU-Russia | Vascular plant | 1 | Galeobdolon luteum | Lamium galeobdolon subsp. galeobdolon | 3 | 6 | 0.1 | .1 | NA |
| 1648095 | RU-Russia | Vascular plant | 1 | Lilium martagon | Lilium martagon | 3 | 6 | 0.1 | .1 | NA |
| 1648095 | RU-Russia | Vascular plant | 1 | Malus sylvestris | Malus sylvestris | 3 | 4 | 0.1 | .1 | NA |
| 1648095 | RU-Russia | Vascular plant | 1 | Polygonatum multiflorum | Polygonatum multiflorum | 3 | 6 | 0.1 | .1 | NA |
| 1648095 | RU-Russia | Vascular plant | 1 | Primula veris | Primula veris | 3 | 6 | 0.1 | .1 | NA |
| 1648095 | RU-Russia | Vascular plant | 1 | Pulmonaria obscura | Pulmonaria obscura | 3 | 6 | 0.1 | .1 | NA |
| 1648095 | RU-Russia | Vascular plant | 1 | Quercus robur | Quercus robur | 3 | 1 | 8.0 | 8 | NA |
| 1648095 | RU-Russia | Vascular plant | 1 | Quercus robur | Quercus robur | 3 | 6 | 0.1 | .1 | NA |
| 1648095 | RU-Russia | Vascular plant | 1 | Scilla bifolia | Scilla bifolia | 3 | 6 | 0.1 | .1 | NA |
| 1648095 | RU-Russia | Vascular plant | 1 | Viburnum lantana | Viburnum lantana | 3 | 4 | 0.1 | .1 | NA |
| 1648095 | RU-Russia | Vascular plant | 1 | Viburnum lantana | Viburnum lantana | 3 | 6 | 0.1 | .1 | NA |
| 1648095 | RU-Russia | Vascular plant | 1 | Viola hirta | Viola hirta | 3 | 6 | 0.1 | .1 | NA |
| 1648095 | RU-Russia | Vascular plant | 1 | Viola mirabilis | Viola mirabilis | 3 | 6 | 6.0 | 6 | NA |
| 1648095 | RU-Russia | Vascular plant | 1 | Viola odorata | Viola odorata | 3 | 6 | 20.0 | 20 | NA |
| 1839189 | NSW_Austalia | Unknown | 0 | Acacia aneura | Acacia aneura | 0 | 0 | 1.0 | x | NA |

Match species names from DT0 to those in Backbone

Import taxonomic backbone

load("../_output/Backbone3.0.RData")

Match to DT0, using the field Matched concept as the matching key. This is the field that was used to build, and resolve, the Backbone.

DT1 <- DT0 %>% 
  left_join(Backbone %>% 
              dplyr::select(Name_sPlot_TRY, Name_short, `Taxon group`, Rank_correct) %>%
              rename(`Matched concept`=Name_sPlot_TRY,
                     Taxongroup_BB=`Taxon group`), 
            by="Matched concept") %>% 
  # Simplify Rank_correct
  mutate(Rank_correct=fct_collapse(Rank_correct, 
                                   lower=c("subspecies", "variety", "infraspecies", "race", "forma"))) %>% 
  mutate(Rank_correct=fct_explicit_na(Rank_correct, "No_match")) %>% 
  mutate(Name_short=replace(Name_short, 
                            list=Name_short=="No suitable", 
                            values=NA))
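The join and rank simplification above can be illustrated on toy data. This is a minimal sketch, assuming hypothetical stand-ins (DT_toy, Backbone_toy) for DT0 and Backbone. Unmatched names are labelled with coalesce() on a character vector rather than with fct_explicit_na(), which, to my knowledge, is deprecated in recent forcats releases in favour of fct_na_value_to_level().

```r
library(dplyr)
library(forcats)

# Hypothetical toy stand-ins for DT0 and Backbone
DT_toy <- tibble(`Matched concept` = c("Amaranthus blitum",
                                       "Lamium galeobdolon subsp. galeobdolon",
                                       "Algae (spp)"))
Backbone_toy <- tibble(
  Name_sPlot_TRY = c("Amaranthus blitum", "Lamium galeobdolon subsp. galeobdolon"),
  Name_short     = c("Amaranthus blitum", "Lamium galeobdolon"),
  Rank_correct   = factor(c("species", "subspecies"))
)

DT_joined <- DT_toy %>%
  # Rename the backbone key so that it matches the key used in DT
  left_join(Backbone_toy %>% rename(`Matched concept` = Name_sPlot_TRY),
            by = "Matched concept") %>%
  # Collapse infraspecific ranks into a single level "lower"
  mutate(Rank_correct = fct_collapse(Rank_correct, lower = c("subspecies"))) %>%
  # Label names without a backbone match explicitly
  mutate(Rank_correct = coalesce(as.character(Rank_correct), "No_match"))
```

The name "Algae (spp)" has no entry in the toy backbone, so its Name_short stays NA and its rank becomes "No_match".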

Explore name matching based on Backbone v1.2

Select species entries that changed after taxonomic standardization, as a way to check the backbone.

name.check <- DT1 %>% 
  dplyr::select(`Turboveg2 concept`:`Matched concept`, Name_short) %>% 
  rename(Name_TNRS=Name_short) %>% 
  distinct() %>% 
  mutate(Matched_short=word(`Matched concept`, start = 1L, end=2L)) %>% 
  filter(is.na(Name_TNRS) | Matched_short != Name_TNRS) %>%
  dplyr::select(-Matched_short) %>% 
  arrange(Name_TNRS)
Check 30 random species names from DT that changed name after matching to backbone
Turboveg2 concept Matched concept Name_TNRS
Lomatium species Lomatium species Lomatium
Angelica_atropurpurea species Angelica_atropurpurea species Angelica atropurpurea
Artemisia scoparlia Artemisia scoparlia Artemisia scoparia
Alternanthera denticulata Alternanthera denticulata Alternanthera sessilis
Eryngium x chevalieri Eryngium x chevalieri Eryngium ×
Anvillea radiata Anvillea radiata Anvillea garcinii
Manilkara surinamensis Manilkara surinamensis Manilkara bidentata
Acacia penninervis var. longiracemosa Acacia penninervis var. longiracemosa Racosperma penninerve
Anthurium species #7 Anthurium species #7 Anthurium
Plagiothecium succul Plagiothecium succul Plagiothecium
Neslia species Neslia species Neslia
Deplanchea species [M1] Deplanchea species [M1] Deplanchea
Elsholtzia stauntoni Elsholtzia stauntoni Elsholtzia stauntonii
Scrophulariaceae 2915 Scrophulariaceae 2915 Scrophulariaceae
Melanthera aspera Melanthera aspera Melanthera nivea
Lauraceae species [NPZ 5081] Lauraceae species [NPZ 5081] Lauraceae
Betula pendula x pubesens Betula x aurata Betula aurata
Algae (spp) Algae (spp) NA
Celastrus_orbiculatus species Celastrus_orbiculatus species Celastrus orbiculatus
Platysace sp. Eneabba (R. Hnatiuk 770001) Platysace sp. Eneabba (R. Hnatiuk 770001) Platysace
[AT469 Indigofera] [AT469 Indigofera] Indigofera
Scabiosa simplex Lomelosia simplex Scabiosa stellata
Calamagrostis laguroides Calamagrostis laguroides Calamagrostis anthoxanthoides
Guettarda species Guettarda species Guettarda
Philodendron urbanianum Philodendron urbanianum Philodendron consanguineum
Connarus species #1 Connarus species #1 Connarus
Dactylorhiza baltica Dactylorhiza majalis subsp. baltica Dactylorhiza baltica
Lonchocarpu michelianus Lonchocarpu michelianus Lonchocarpus michelianus
Aglaia elliptifolia Aglaia elliptifolia Aglaia rimosa
Eupatorium recurvans Eupatorium recurvans Eupatorium mohrii
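The filter behind this check compares the first two words of Matched concept with the resolved name. A minimal sketch on hypothetical toy names (names_toy is not part of the report) shows which cases are flagged:

```r
library(dplyr)
library(stringr)

# Hypothetical toy names; Name_TNRS plays the role of the resolved Name_short
names_toy <- tibble(
  `Matched concept` = c("Avenella flexuosa", "Acer campestre", "Bryophyta species",
                        "Lamium galeobdolon subsp. galeobdolon"),
  Name_TNRS = c("Deschampsia flexuosa", "Acer campestre", NA, "Lamium galeobdolon")
)

changed <- names_toy %>%
  # Reduce the matched concept to its first two words (genus + epithet)
  mutate(Matched_short = word(`Matched concept`, start = 1L, end = 2L)) %>%
  # Flag names that were not resolved, or whose binomial changed
  filter(is.na(Name_TNRS) | Matched_short != Name_TNRS) %>%
  select(-Matched_short)
```

A subspecies whose binomial is unchanged (the Lamium row) is not flagged, because only the first two words are compared.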

Check the most common species names from DT after matching to backbone

name.check.freq <- DT1 %>% 
  dplyr::select(`Turboveg2 concept`:`Matched concept`, Name_short) %>% 
  rename(Name_TNRS=Name_short) %>% 
  group_by(`Turboveg2 concept`, `Matched concept`, Name_TNRS) %>% 
  summarize(n=n()) %>% 
  mutate(Matched_short=word(`Matched concept`, start = 1L, end=2L)) %>% 
  filter(is.na(Name_TNRS) | Matched_short != Name_TNRS) %>%
  dplyr::select(-Matched_short) %>% 
  ungroup() %>% 
  arrange(desc(n)) 
## `summarise()` has grouped output by 'Turboveg2 concept', 'Matched concept'. You can override using the `.groups` argument.
Check 40 most common species names from DT that changed name after matching to backbone
Turboveg2 concept Matched concept Name_TNRS n
Deschampsia flexuosa Avenella flexuosa Deschampsia flexuosa 126514
Festuca pratensis Schedonorus pratensis Festuca pratensis 84008
Elymus repens Elytrigia repens Elymus repens 82891
Phalaris arundinacea Phalaroides arundinacea Phalaris arundinacea 75296
Bryophyta species Bryophyta species NA 74393
Poa annua Ochlopoa annua Poa annua 67460
Potentilla anserina Argentina anserina Potentilla anserina 63786
Taraxacum sect. Ruderalia Taraxacum sect. Taraxacum Taraxacum 58429
Taraxacum species Taraxacum species Taraxacum 57167
Cornus sanguinea Cornus sanguinea Cornus controversa 52651
Elytrigia repens Elytrigia repens Elymus repens 51670
Taraxacum officinale Taraxacum sect. Taraxacum Taraxacum 50502
Weinmannia racemosa Weinmannia racemosa Leiospermum racemosum 38269
Bromus erectus Bromopsis erecta Bromus erectus 33765
Cladonia species Cladonia species Cladonia 32464
Avenella flexuosa Avenella flexuosa Deschampsia flexuosa 30787
Rubus sect. Rubus Rubus sect. Rubus Rubus 28684
Festuca arundinacea Schedonorus arundinaceus Festuca arundinacea 26124
Trientalis europaea Trientalis europaea Lysimachia europaea 25940
Rubus fruticosus aggr. Rubus fruticosus aggr. Rubus vestitus 23669
Glaux maritima Glaux maritima Lysimachia maritima 23305
Taraxacum officinale aggr. Taraxacum sect. Taraxacum Taraxacum 22837
Rubus species Rubus species Rubus 22098
Festuca gigantea Schedonorus giganteus Festuca gigantea 20917
Taraxacum sectie Ruderalia Taraxacum sect. Taraxacum Taraxacum 20888
Lophozonia menziesii Lophozonia menziesii Lophozonia 20249
Juncus gerardi Juncus gerardi Juncus gerardii 19094
Sphagnum species Sphagnum species Sphagnum 18293
Festuca rupicola Festuca stricta subsp. sulcata Festuca rupicola 18010
Rosa species Rosa species Rosa 16657
Podocarpus laetus Podocarpus laetus Podocarpus spinulosus 16356
Bromus tectorum Anisantha tectorum Bromus tectorum 16302
Carex species Carex species Carex 15744
Ripogonum scandens Ripogonum scandens Rhipogonum 14984
Rubus hirtus Rubus hirtus aggr. Rubus proiectus 14191
Avenula pubescens Avenula pubescens Helictotrichon pubescens 13490
Notogrammitis billardierei Notogrammitis billardierei NA 13117
Crataegus species Crataegus species Crataegus 13072
Helictotrichon pubescens Avenula pubescens Helictotrichon pubescens 12941
Erophila verna Draba verna Erophila verna 12646

Complete field taxon group

Taxon group information is only available for 35699079 entries, but absent for 7394395. To improve the completeness of this field, we derive additional info from the Backbone, and merge it with the data already present in DT.

table(DT1$`Taxon group`, exclude=NULL)
## 
##           Alga         Lichen           Moss       Mushroom      Stonewort 
##           9497         324002        2034938            513          12166 
##        Unknown Vascular plant 
##        7394395       33317963
DT1 <- DT1 %>% 
  mutate(`Taxon group`=ifelse(`Taxon group`=="Unknown", NA, `Taxon group`)) %>% 
  mutate(Taxongroup_BB=ifelse(Taxongroup_BB=="Unknown", NA, Taxongroup_BB)) %>% 
  mutate(`Taxon group`=coalesce(`Taxon group`, Taxongroup_BB)) %>% 
  dplyr::select(-Taxongroup_BB)


table(DT1$`Taxon group`, exclude=NULL)
## 
##           Alga         Lichen           Moss       Mushroom      Stonewort 
##           9991         366919        2090925            513          12166 
## Vascular plant           <NA> 
##       40522355          90605
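The completion step can be reproduced on a few hypothetical rows. This sketch assumes a toy tibble tg_toy and uses na_if() in place of the ifelse() calls; the logic is otherwise the same: "Unknown" is turned into NA, and coalesce() then fills gaps from the backbone column.

```r
library(dplyr)

# Hypothetical toy rows: group as stored in DT and as stored in the Backbone
tg_toy <- tibble(
  `Taxon group` = c("Vascular plant", "Unknown", "Unknown", "Moss"),
  Taxongroup_BB = c("Vascular plant", "Lichen", "Unknown", "Vascular plant")
)

tg_filled <- tg_toy %>%
  # Treat "Unknown" as missing in both sources
  mutate(`Taxon group` = na_if(`Taxon group`, "Unknown"),
         Taxongroup_BB = na_if(Taxongroup_BB, "Unknown")) %>%
  # Keep the DT value where present, otherwise fall back on the Backbone
  mutate(`Taxon group` = coalesce(`Taxon group`, Taxongroup_BB)) %>%
  select(-Taxongroup_BB)
```

Note that coalesce() gives precedence to the value already in DT: in the last toy row the group stays "Moss" even though the backbone column says "Vascular plant".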

Taxa for which a measure of Basal Area exists can be safely assumed to be vascular plants.

DT1 <- DT1 %>% 
  mutate(`Taxon group`=replace(`Taxon group`, 
                               list=`Cover code`=="x_BA", 
                               values="Vascular plant"))

Cross-complement Taxon group information. This means that, whenever a taxon is assigned to a group in at least one record, the same group is assigned to all records of that taxon in the DT table where the group is still missing.

DT1 <- DT1 %>% 
  left_join(DT1 %>% 
              filter(!is.na(Name_short)) %>% 
              filter(`Taxon group` != "Unknown") %>% 
              dplyr::select(Name_short, `Taxon group`) %>% 
              distinct(Name_short, .keep_all=T) %>% 
              rename(TaxonGroup_compl=`Taxon group`),
            by="Name_short") %>% 
  mutate(`Taxon group`=coalesce(`Taxon group`, TaxonGroup_compl)) %>% 
  dplyr::select(-TaxonGroup_compl)

table(DT1$`Taxon group`, exclude=NULL)
## 
##           Alga         Lichen           Moss       Mushroom      Stonewort 
##           9994         367508        2100558            513          12193 
## Vascular plant           <NA> 
##       40523933          78775
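A minimal sketch of the cross-complementing step, assuming a hypothetical toy table dt_toy: a lookup of one group per resolved name is built from the records that have a group, and is then used to fill the records that lack one.

```r
library(dplyr)

# Hypothetical toy records: the same species with and without a group
dt_toy <- tibble(
  Name_short    = c("Hypnum cupressiforme", "Hypnum cupressiforme", "Acer campestre", NA),
  `Taxon group` = c("Moss", NA, NA, NA)
)

# One group per resolved name, taken from the records where it is known
lookup <- dt_toy %>%
  filter(!is.na(Name_short), !is.na(`Taxon group`)) %>%
  distinct(Name_short, .keep_all = TRUE) %>%
  select(Name_short, TaxonGroup_compl = `Taxon group`)

# Fill missing groups from the lookup; existing values are never overwritten
dt_filled <- dt_toy %>%
  left_join(lookup, by = "Name_short") %>%
  mutate(`Taxon group` = coalesce(`Taxon group`, TaxonGroup_compl)) %>%
  select(-TaxonGroup_compl)
```

Because distinct() keeps the first group found for each name, a taxon with conflicting groups would silently receive whichever comes first, which is why the conflict check that follows is needed.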

Check species with conflicting Taxon group information and fix manually.

#check for conflicts in attribution of genera to Taxon groups
DT1 %>% 
  filter(!is.na(Name_short)) %>% 
  filter(!is.na(`Taxon group`)) %>% 
  distinct(Name_short, `Taxon group`) %>% 
  mutate(Genus=word(Name_short,1)) %>% 
  dplyr::select(Genus, `Taxon group`) %>% 
  distinct() %>% 
  group_by(Genus) %>% 
  summarize(n=n()) %>% 
  filter(n>1) %>% 
  arrange(desc(n))
## # A tibble: 15 x 2
##    Genus                 n
##    <chr>             <int>
##  1 Brachytheciastrum     2
##  2 Brachythecium         2
##  3 Chara                 2
##  4 Characeae             2
##  5 Hepatica              2
##  6 Hypericum             2
##  7 Hypnum                2
##  8 Leptorhaphis          2
##  9 Lychnothamnus         2
## 10 Nitella               2
## 11 Oxymitra              2
## 12 Pancovia              2
## 13 Peltaria              2
## 14 Tonina                2
## 15 Zygodon               2
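The conflict check above can be written more compactly with n_distinct(). This is a sketch on hypothetical records (tg_toy, including the made-up second Oxymitra entry), not a replacement for the code in the report:

```r
library(dplyr)
library(stringr)

# Hypothetical toy records: one genus attributed to two different groups
tg_toy <- tibble(
  Name_short    = c("Oxymitra incrassata", "Oxymitra sp.",
                    "Acer campestre", "Acer platanoides"),
  `Taxon group` = c("Moss", "Vascular plant", "Vascular plant", "Vascular plant")
)

conflicts <- tg_toy %>%
  # The genus is the first word of the resolved name
  mutate(Genus = word(Name_short, 1)) %>%
  group_by(Genus) %>%
  # Count how many different groups each genus was assigned to
  summarize(n = n_distinct(`Taxon group`), .groups = "drop") %>%
  filter(n > 1)
```

Only genera assigned to more than one group are returned, here the toy genus Oxymitra.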

Manually fix some known problems in Taxon group attribution. The lists of genera used below for lichens and mushrooms (lichen.genera, mushroom) were defined when building the Backbone.

#Attach genus info
DT1 <- DT1 %>% 
    left_join(Backbone %>% 
              dplyr::select(Name_sPlot_TRY, Name_short) %>%
              mutate(Genus=word(Name_short, 1, 1)) %>%
              dplyr::select(-Name_short) %>% 
              rename(`Matched concept`=Name_sPlot_TRY),
            by="Matched concept") %>% 
    mutate(`Taxon group`=fct_collapse(`Taxon group`, 
                                    Alga_Stonewort=c("Alga", "Stonewort")))
#manually fix some known problems
mosses.gen    <- c("Hypnum", "Brachytheciastrum", "Brachythecium",
                  "Zygodon", "Oxymitra", "Bryophyta", "Musci", '\\\"Moos\\\"')
# note: "Oxymitra" is listed both here and in mosses.gen; because the vascular
# replacement runs after the moss one, it ends up as "Vascular plant"
vascular.gen  <- c("Polystichum", "Hypericum", "Peltaria", "Pancovia", "Calythrix", "Ripogonum",
                  "Notogrammitis", "Fuscospora", "Lophozonia",  "Rostellularia", 
                  "Hesperostipa", "Microsorium", "Angiosperm","Dicotyledonae", "Spermatophy", 
                  "Oxymitra", "Friesodielsia")
alga.gen      <- c("Chara", "Characeae", "Tonina", "Nostoc", "Entermorpha", "Hydrocoleum" )
 
DT1 <- DT1 %>% 
  mutate(`Taxon group`=replace(`Taxon group`, 
                               list=Genus %in% mosses.gen, 
                               values="Moss")) %>% 
  mutate(`Taxon group`=replace(`Taxon group`, 
                               list=Genus %in% vascular.gen, 
                               values="Vascular plant")) %>% 
  mutate(`Taxon group`=replace(`Taxon group`, 
                               list=Genus %in% alga.gen, 
                               values="Alga_Stonewort")) %>% 
  mutate(`Taxon group`=replace(`Taxon group`, 
                               list=Genus %in% c(lichen.genera, "Lichenes"),
                               values="Lichen")) %>% 
  mutate(`Taxon group`=replace(`Taxon group`, 
                               list=Genus %in% mushroom, 
                               values="Mushroom"))
  
table(DT1$`Taxon group`, exclude=NULL)
## 
## Alga_Stonewort         Lichen           Moss       Mushroom Vascular plant 
##          23098         367509        2100635            513       40525774 
##           <NA> 
##          75945

Delete all records of fungi, and use lists of genera to fix additional problems. While in the previous round the matching was done on the resolved Genus name, here the match is based on unresolved Genus names.

DT1 <- DT1 %>% 
  dplyr::select(-Genus) %>% 
  left_join(DT1 %>% 
              distinct(`Matched concept`) %>% 
              mutate(Genus=word(`Matched concept`, 1)), 
            by="Matched concept") %>% 
  mutate(`Taxon group`=replace(`Taxon group`, 
                                 list=Genus %in% mushroom, 
                                 values = "Mushroom")) %>% 
  mutate(`Taxon group`=replace(`Taxon group`, 
                               list=Genus %in% lichen.genera, 
                               values="Lichen")) %>% 
  mutate(`Taxon group`=replace(`Taxon group`, 
                               list=Genus %in% mosses.gen, 
                               values="Moss")) %>% 
  mutate(`Taxon group`=replace(`Taxon group`, 
                               list=Genus %in% vascular.gen, 
                               values="Vascular plant")) %>% 
  mutate(`Taxon group` = fct_explicit_na(`Taxon group`, "Unknown")) %>% 
  filter(`Taxon group`!="Mushroom") %>%
  mutate(`Taxon group`=factor(`Taxon group`))
  #dplyr::select(-Genus)

table(DT1$`Taxon group`, exclude=NULL)
## 
## Alga_Stonewort         Lichen           Moss Vascular plant        Unknown 
##          23098         367855        2103292       40563071          35721

After cross-checking all sources of information, the number of records without Taxon group information decreased to 35721.

Standardize abundance values

Species abundance information varies across datasets and plots. For the large majority of plots, abundance values are reported as percentage cover, but there is a subset where abundance is reported on a different scale. These are marked in the column Cover code as follows:
x_BA - Basal Area
x_IC - Individual count
x_SC - Stem count
x_IV - Relative Importance
x_RF - Relative Frequency
x - Presence/absence
It is not intuitive that, when Cover code belongs to one of the classes above, the actual abundance value is stored in the x_ column. This stems from the way the data are stored in TURBOVEG.
To make the cover data more user friendly, I simplify the way cover is stored, so that there are only two columns:
Ab_scale - to report the type of scale used
Abundance - to coalesce the cover/abundance values previously in the columns Cover % and x_.
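The rule for Ab_scale can be sketched on a few hypothetical records (cover_toy is made up for illustration): the cover code is kept as the scale only when it is one of the special codes and a value is actually present in x_; every other record is treated as percentage cover.

```r
library(dplyr)

# Hypothetical toy records covering the main cases
cover_toy <- tibble(
  `Cover code` = c("x_BA", "x_IC", "2", "x", "x_BA"),
  `Cover %`    = c(NA, NA, 13, 1, NA),
  x_           = c(0.35, 12, NA, NA, NA)
)

special_codes <- c("x_BA", "x_IC", "x_SC", "x_IV", "x_RF")

cover_toy <- cover_toy %>%
  # Use the cover code as scale only if a value is stored in x_
  mutate(Ab_scale = if_else(`Cover code` %in% special_codes & !is.na(x_),
                            `Cover code`,
                            "CoverPerc"))
```

The last toy record has the code x_BA but no value in x_, so it falls back to "CoverPerc"; the plain "x" code is handled separately as presence/absence.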

# Create Ab_scale field
DT1 <- DT1 %>% 
  mutate(Ab_scale = ifelse(`Cover code` %in% 
                             c("x_BA", "x_IC", "x_SC", "x_IV", "x_RF") & !is.na(x_), 
                           `Cover code`, 
                           "CoverPerc"))  

Fix some errors. There are some plots where all species have zeros in the field Cover %. Some of them are marked as p/a (Cover code=="x"), but others are not. Consider all these plots as presence/absence and set Cover % to 1.
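This fix can be tried on a toy example. The sketch below assumes a hypothetical table zero_toy with one all-zero plot and one regular plot, and uses if_else() instead of replace():

```r
library(dplyr)

# Hypothetical toy data: plot 1 has only zeros, plot 2 has one real cover value
zero_toy <- tibble(
  PlotObservationID = c(1, 1, 2, 2),
  `Cover %`         = c(0, 0, 0, 5),
  `Cover code`      = c("x", NA, "1", "2")
)

# Plots where every record has zero cover
allzero_ids <- zero_toy %>%
  group_by(PlotObservationID) %>%
  summarize(allzero = all(`Cover %` == 0), .groups = "drop") %>%
  filter(allzero) %>%
  pull(PlotObservationID)

# Recode those plots as presence/absence: cover set to 1, code set to "x"
zero_fixed <- zero_toy %>%
  mutate(`Cover %`    = if_else(PlotObservationID %in% allzero_ids, 1, `Cover %`),
         `Cover code` = if_else(PlotObservationID %in% allzero_ids, "x", `Cover code`))
```

A single zero in an otherwise regular plot (plot 2 here) is left untouched.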

allzeroes <- DT1 %>% 
  group_by(PlotObservationID) %>% 
  summarize(allzero=all(`Cover %`==0) ) %>% 
  filter(allzero==T) %>% 
  pull(PlotObservationID)
DT1 <- DT1 %>% 
  mutate(`Cover %`=replace(`Cover %`, 
                           list=(PlotObservationID %in% allzeroes), 
                           values=1)) %>% 
  mutate(`Cover code`=replace(`Cover code`, 
                           list=(PlotObservationID %in% allzeroes), 
                           values="x"))

Consider all plot-layer combinations where every entry has Cover code=="x" and Cover % == 1 as presence/absence data, and set Ab_scale to “pa”. This is done to avoid confusion with plots where Cover code=="x" but “x” is meant as a class of the cover scale used. For p/a plots, replace the field Cover % with NA, and assign the value 1 to the field x_.

#plots with at least one entry in Cover code=="x"
sel <- DT1 %>% 
  filter(`Cover code`=="x") %>% 
  distinct(PlotObservationID) %>% 
  pull(PlotObservationID)

DT1 <- DT1 %>% 
  left_join(DT1 %>%
              filter(PlotObservationID %in% sel) %>% 
              group_by(PlotObservationID, Layer) %>% 
              mutate(to.pa= all(`Cover %`==1 & `Cover code`=="x")) %>% 
              distinct(PlotObservationID, Layer, to.pa), 
            by=c("PlotObservationID", "Layer")) %>% 
  replace_na(list(to.pa=F)) %>% 
  mutate(Ab_scale=ifelse(to.pa==T, "pa", Ab_scale)) %>% 
  mutate(`Cover %`=ifelse(to.pa==T, NA, `Cover %`)) %>% 
  mutate(x_=ifelse(to.pa==T, 1, x_)) %>% 
  dplyr::select(-to.pa)
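The same rule can be expressed with a grouped mutate() instead of a self-join. This is a sketch under that assumption, on a hypothetical table pa_toy, and is not the code used to build the table:

```r
library(dplyr)

# Hypothetical toy data: plot 1 is pure presence/absence, plot 2 only partly
pa_toy <- tibble(
  PlotObservationID = c(1, 1, 2, 2),
  Layer             = c(6, 6, 6, 6),
  `Cover %`         = c(1, 1, 1, 20),
  `Cover code`      = c("x", "x", "x", "3"),
  x_                = NA_real_,
  Ab_scale          = "CoverPerc"
)

pa_fixed <- pa_toy %>%
  group_by(PlotObservationID, Layer) %>%
  # TRUE only if every record in the plot-layer has code "x" and cover 1;
  # an undetermined (NA) result is treated as FALSE
  mutate(to.pa = coalesce(all(`Cover %` == 1 & `Cover code` == "x"), FALSE)) %>%
  ungroup() %>%
  mutate(Ab_scale  = if_else(to.pa, "pa", Ab_scale),
         `Cover %` = if_else(to.pa, NA_real_, `Cover %`),
         x_        = if_else(to.pa, 1, x_)) %>%
  select(-to.pa)
```

In plot 2 the code "x" appears next to a record with a regular cover class, so "x" is read as a class of the cover scale and the plot keeps "CoverPerc".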

There are also some plots with different cover scales in the same layer. They are not many, and I reduce their cover values to p/a.
Find these plots first:

mixed <- DT1 %>% 
  distinct(PlotObservationID, Ab_scale, Layer) %>% 
  group_by(PlotObservationID, Layer) %>% 
  summarize(n=n()) %>% 
  filter(n>1) %>% 
  distinct(PlotObservationID) %>% 
  pull(PlotObservationID) 
## `summarise()` has grouped output by 'PlotObservationID'. You can override using the `.groups` argument.
length(mixed)
## [1] 335
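What counts as "mixed" can be made concrete on toy data. In this sketch (mix_toy is hypothetical), a plot is flagged only if two scales occur within the same layer; different scales in different layers of one plot are not flagged.

```r
library(dplyr)

# Hypothetical toy data: plot 1 mixes scales within layer 6,
# plot 2 uses different scales only across layers
mix_toy <- tibble(
  PlotObservationID = c(1, 1, 2, 2, 2),
  Layer             = c(6, 6, 1, 6, 6),
  Ab_scale          = c("CoverPerc", "x_IC", "x_BA", "CoverPerc", "CoverPerc")
)

mixed_ids <- mix_toy %>%
  group_by(PlotObservationID, Layer) %>%
  # Number of different scales within each plot-layer combination
  summarize(n = n_distinct(Ab_scale), .groups = "drop") %>%
  filter(n > 1) %>%
  distinct(PlotObservationID) %>%
  pull(PlotObservationID)
```

Setting .groups = "drop" also avoids the summarise() grouping message shown above.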

Transform these plots to p/a and correct the field Ab_scale. All records of an affected plot are converted, not only those in the mixed layer. Note: the column Abundance is created only at this step.

DT1 <- DT1 %>% 
  mutate(Ab_scale=replace(Ab_scale, 
                           list=PlotObservationID %in% mixed, 
                           values="mixed")) %>%
  mutate(`Cover %`=replace(`Cover %`, 
                           list=Ab_scale=="mixed",
                           values=NA)) %>% 
  mutate(x_=replace(x_,  list=Ab_scale=="mixed", values=1)) %>% 
  mutate(Ab_scale=replace(Ab_scale, list=Ab_scale=="mixed", values="pa")) %>% 
  #Create additional field Abundance to avoid overwriting original data
  mutate(Abundance =ifelse(Ab_scale %in% c("x_BA", "x_IC", "x_SC", "x_IV", "x_RF", "pa"), 
                          x_, `Cover %`)) %>% 
  mutate(Abundance=replace(Abundance, 
                           list=PlotObservationID %in% mixed, 
                           values=1))

Double check and summarize Ab_scales

scale_check <- DT1 %>% 
  distinct(PlotObservationID, Layer, Ab_scale) %>% 
  group_by(PlotObservationID) %>% 
  summarise(Ab_scale_combined=ifelse(length(unique(Ab_scale))==1, 
                                     unique(Ab_scale), 
                                     "Multiple_scales"))

nrow(scale_check)== length(unique(DT1$PlotObservationID))
## [1] TRUE
table(scale_check$Ab_scale_combined)
## 
##       CoverPerc Multiple_scales              pa            x_BA            x_IC 
##         1690405            2084          271057            6293            2092 
##            x_IV            x_RF            x_SC 
##             146             585            4878

Calculate species’ relative covers in each plot

Transform abundances to relative abundance, i.e., the abundance of each record divided by the summed abundance of all records in the same plot. For consistency with the previous version of sPlot, this field is called Relative_cover.
Watch out - Even plots with p/a information are transformed to relative cover.

DT1 <- DT1 %>% 
  left_join(x=., 
            y={.} %>%
              group_by(PlotObservationID) %>% 
              summarize(tot.abundance=sum(Abundance)), 
            by=c("PlotObservationID")) %>% 
  mutate(Relative.cover=Abundance/tot.abundance)
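The same calculation can be written with a grouped mutate(), which avoids the join. A minimal sketch on hypothetical abundances (ab_toy), where plot 2 is a presence/absence plot:

```r
library(dplyr)

# Hypothetical toy abundances: plot 1 has cover values, plot 2 is p/a
ab_toy <- tibble(
  PlotObservationID = c(1, 1, 1, 2, 2),
  Abundance         = c(13, 3, 4, 1, 1)
)

ab_rel <- ab_toy %>%
  group_by(PlotObservationID) %>%
  # Divide each abundance by the total abundance of its plot
  mutate(Relative_cover = Abundance / sum(Abundance)) %>%
  ungroup()

# Within each plot the relative covers sum to 1
plot_sums <- ab_rel %>%
  group_by(PlotObservationID) %>%
  summarize(total = sum(Relative_cover), .groups = "drop")
```

For the p/a plot, every species simply receives 1 divided by the number of records in the plot. Note that a single missing Abundance would make the whole plot NA, since sum() is used without na.rm, as in the report.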

Clean DT and export

DT2 <- DT1 %>% 
  dplyr::select(PlotObservationID, Name_short, `Turboveg2 concept`, Rank_correct, `Taxon group`, Layer:x_, Ab_scale, Abundance, Relative.cover ) %>% 
  rename(Species_original=`Turboveg2 concept`, 
         Species=Name_short,
         Taxon_group=`Taxon group`, 
         Cover_perc=`Cover %`, 
         Cover_code=`Cover code`, 
         Relative_cover=Relative.cover) %>% 
  ## change in Version 1.1.
  dplyr::select(-x_, -Cover_perc)

The output of the DT table contains 43093037 records, over 1977540 plots. The total number of taxa is 116256 and 0, before and after standardization, respectively. Information on the Taxon group is available for 76548 standardized species.

Example of final DT table (same 3 randomly selected plots as shown above)
| PlotObservationID | Species | Species_original | Rank_correct | Taxon_group | Layer | Cover_code | Ab_scale | Abundance | Relative_cover |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 532404 | Amaranthus blitum | Amaranthus lividus | species | Vascular plant | 6 | 2 | CoverPerc | 13.0 | 0.1007752 |
| 532404 | Amaranthus powellii | Amaranthus powellii | species | Vascular plant | 6 | 1 | CoverPerc | 3.0 | 0.0232558 |
| 532404 | Amaranthus retroflexus | Amaranthus retroflexus | species | Vascular plant | 6 | 2 | CoverPerc | 13.0 | 0.1007752 |
| 532404 | Brassica rapa | Brassica rapa | species | Vascular plant | 6 |  | CoverPerc | 2.0 | 0.0155039 |
| 532404 | Calystegia sepium | Calystegia sepium | species | Vascular plant | 6 |  | CoverPerc | 2.0 | 0.0155039 |
| 532404 | Capsella bursa-pastoris | Capsella bursa-pastoris | species | Vascular plant | 6 |  | CoverPerc | 2.0 | 0.0155039 |
| 532404 | Matricaria chamomilla | Chamomilla recutita | higher | Vascular plant | 6 |  | CoverPerc | 2.0 | 0.0155039 |
| 532404 | Chenopodium album | Chenopodium album | species | Vascular plant | 6 | 1 | CoverPerc | 3.0 | 0.0232558 |
| 532404 | Chenopodium ficifolium | Chenopodium ficifolium | species | Vascular plant | 6 |  | CoverPerc | 2.0 | 0.0155039 |
| 532404 | Lipandra polysperma | Chenopodium polyspermum | species | Vascular plant | 6 | 1 | CoverPerc | 3.0 | 0.0232558 |
| 532404 | Cirsium arvense | Cirsium arvense | species | Vascular plant | 6 |  | CoverPerc | 2.0 | 0.0155039 |
| 532404 | Convolvulus arvensis | Convolvulus arvensis | species | Vascular plant | 6 | 2 | CoverPerc | 13.0 | 0.1007752 |
| 532404 | Digitaria sanguinalis | Digitaria sanguinalis | species | Vascular plant | 6 |  | CoverPerc | 2.0 | 0.0155039 |
| 532404 | Echinochloa crus-galli | Echinochloa crus-galli | species | Vascular plant | 6 | 1 | CoverPerc | 3.0 | 0.0232558 |
| 532404 | Galinsoga quadriradiata | Galinsoga ciliata | species | Vascular plant | 6 | 1 | CoverPerc | 3.0 | 0.0232558 |
| 532404 | Galinsoga parviflora | Galinsoga parviflora | species | Vascular plant | 6 | 3 | CoverPerc | 38.0 | 0.2945736 |
| 532404 | Geranium dissectum | Geranium dissectum | species | Vascular plant | 6 |  | CoverPerc | 2.0 | 0.0155039 |
| 532404 | Lamium purpureum | Lamium purpureum | species | Vascular plant | 6 |  | CoverPerc | 2.0 | 0.0155039 |
| 532404 | Lolium perenne | Lolium perenne | species | Vascular plant | 6 |  | CoverPerc | 2.0 | 0.0155039 |
| 532404 | Phacelia tanacetifolia | Phacelia tanacetifolia | species | Vascular plant | 6 |  | CoverPerc | 2.0 | 0.0155039 |
| 532404 | Persicaria lapathifolia | Polygonum lapathifolium | species | Vascular plant | 6 |  | CoverPerc | 2.0 | 0.0155039 |
| 532404 | Persicaria maculosa | Polygonum persicaria | species | Vascular plant | 6 |  | CoverPerc | 2.0 | 0.0155039 |
| 532404 | Setaria pumila | Setaria pumila | species | Vascular plant | 6 |  | CoverPerc | 2.0 | 0.0155039 |
| 532404 | Stachys arvensis | Stachys arvensis | species | Vascular plant | 6 |  | CoverPerc | 2.0 | 0.0155039 |
| 532404 | Stellaria media | Stellaria media | species | Vascular plant | 6 | 1 | CoverPerc | 3.0 | 0.0232558 |
| 532404 | Taraxacum | Taraxacum officinale | genus | Vascular plant | 6 |  | CoverPerc | 2.0 | 0.0155039 |
| 532404 | Veronica persica | Veronica persica | species | Vascular plant | 6 |  | CoverPerc | 2.0 | 0.0155039 |
| 1648095 | Acer campestre | Acer campestre | species | Vascular plant | 6 | .1 | CoverPerc | 0.1 | 0.0007037 |
| 1648095 | Acer platanoides | Acer platanoides | species | Vascular plant | 6 | .1 | CoverPerc | 0.1 | 0.0007037 |
| 1648095 | Acer pseudoplatanus | Acer pseudoplatanus | species | Vascular plant | 6 | .1 | CoverPerc | 0.1 | 0.0007037 |
| 1648095 | Buglossoides purpurocaerulea | Aegonychon purpureocaeruleum | species | Vascular plant | 6 | .1 | CoverPerc | 0.1 | 0.0007037 |
| 1648095 | Anemone nemorosa | Anemonoides nemorosa | species | Vascular plant | 6 | .1 | CoverPerc | 0.1 | 0.0007037 |
| 1648095 | Anemone ranunculoides | Anemonoides ranunculoides | species | Vascular plant | 6 | 10 | CoverPerc | 10.0 | 0.0703730 |
| 1648095 | Asarum europaeum | Asarum europaeum | species | Vascular plant | 6 | .1 | CoverPerc | 0.1 | 0.0007037 |
| 1648095 | Brachypodium sylvaticum | Brachypodium sylvaticum | species | Vascular plant | 6 | .1 | CoverPerc | 0.1 | 0.0007037 |
| 1648095 | Brachytheciastrum velutinum | Brachythecium velutinum | species | Moss | 9 | .1 | CoverPerc | 0.1 | 0.0007037 |
| 1648095 | Campanula rapunculoides | Campanula rapunculoides | species | Vascular plant | 6 | .1 | CoverPerc | 0.1 | 0.0007037 |
| 1648095 | Carex spicata | Carex contigua | species | Vascular plant | 6 | .1 | CoverPerc | 0.1 | 0.0007037 |
| 1648095 | Prunus avium | Cerasus avium | species | Vascular plant | 1 | 12 | CoverPerc | 12.0 | 0.0844476 |
| 1648095 | Prunus avium | Cerasus avium | species | Vascular plant | 6 | .1 | CoverPerc | 0.1 | 0.0007037 |
| 1648095 | Cornus mas | Cornus mas | species | Vascular plant | 4 | 13 | CoverPerc | 13.0 | 0.0914849 |
| 1648095 | Cornus mas | Cornus mas | species | Vascular plant | 6 | .1 | CoverPerc | 0.1 | 0.0007037 |
| 1648095 | Corylus avellana | Corylus avellana | species | Vascular plant | 4 | 13 | CoverPerc | 13.0 | 0.0914849 |
| 1648095 | Corylus avellana | Corylus avellana | species | Vascular plant | 6 | .1 | CoverPerc | 0.1 | 0.0007037 |
| 1648095 | Crataegus rhipidophylla | Crataegus curvisepala | species | Vascular plant | 4 | .1 | CoverPerc | 0.1 | 0.0007037 |
| 1648095 | Euonymus europaeus | Euonymus europaea | species | Vascular plant | 6 | .1 | CoverPerc | 0.1 | 0.0007037 |
| 1648095 | Euonymus verrucosus | Euonymus verrucosa | species | Vascular plant | 4 | .1 | CoverPerc | 0.1 | 0.0007037 |
| 1648095 | Euonymus verrucosus | Euonymus verrucosa | species | Vascular plant | 6 | .1 | CoverPerc | 0.1 | 0.0007037 |
| 1648095 | Fraxinus excelsior | Fraxinus excelsior | species | Vascular plant | 1 | 57 | CoverPerc | 57.0 | 0.4011260 |
| 1648095 | Fraxinus excelsior | Fraxinus excelsior | species | Vascular plant | 4 | .1 | CoverPerc | 0.1 | 0.0007037 |
| 1648095 | Fraxinus excelsior | Fraxinus excelsior | species | Vascular plant | 6 | .1 | CoverPerc | 0.1 | 0.0007037 |
| 1648095 | Gagea lutea | Gagea lutea | species | Vascular plant | 6 | .1 | CoverPerc | 0.1 | 0.0007037 |
| 1648095 | Lamium galeobdolon | Galeobdolon luteum | lower | Vascular plant | 6 | .1 | CoverPerc | 0.1 | 0.0007037 |
| 1648095 | Lilium martagon | Lilium martagon | species | Vascular plant | 6 | .1 | CoverPerc | 0.1 | 0.0007037 |
| 1648095 | Malus sylvestris | Malus sylvestris | species | Vascular plant | 4 | .1 | CoverPerc | 0.1 | 0.0007037 |
1648095 Polygonatum multiflorum Polygonatum multiflorum species Vascular plant 6 .1 CoverPerc 0.1 0.0007037
1648095 Primula veris Primula veris species Vascular plant 6 .1 CoverPerc 0.1 0.0007037
1648095 Pulmonaria obscura Pulmonaria obscura species Vascular plant 6 .1 CoverPerc 0.1 0.0007037
1648095 Quercus robur Quercus robur species Vascular plant 1 8 CoverPerc 8.0 0.0562984
1648095 Quercus robur Quercus robur species Vascular plant 6 .1 CoverPerc 0.1 0.0007037
1648095 Scilla bifolia Scilla bifolia species Vascular plant 6 .1 CoverPerc 0.1 0.0007037
1648095 Viburnum lantana Viburnum lantana species Vascular plant 4 .1 CoverPerc 0.1 0.0007037
1648095 Viburnum lantana Viburnum lantana species Vascular plant 6 .1 CoverPerc 0.1 0.0007037
1648095 Viola hirta Viola hirta species Vascular plant 6 .1 CoverPerc 0.1 0.0007037
1648095 Viola mirabilis Viola mirabilis species Vascular plant 6 6 CoverPerc 6.0 0.0422238
1648095 Viola odorata Viola odorata species Vascular plant 6 20 CoverPerc 20.0 0.1407460
1839189 Acacia aneura Acacia aneura species Vascular plant 0 x pa 1.0 1.0000000

Field List

  • PlotObservationID - Plot ID, as in header.
  • Species - Resolved species name, based on the taxonomic backbone.
  • Species_original - Original species name, as provided by data contributor.
  • Rank_correct - Taxonomic rank at which Species_original was matched.
  • Taxon_group - Possible entries are: Alga_Stonewort, Lichen, Moss, Vascular plant, Unknown.
  • Layer - Vegetation layer, as specified in Turboveg: 0: No layer specified, 1: Upper tree layer, 2: Middle tree layer, 3: Lower tree layer, 4: Upper shrub layer, 5: Lower shrub layer, 6: Herb layer, 7: Juvenile, 8: Seedling, 9: Moss layer.
  • Cover_code - Cover/abundance value in the original data, before transformation to percentage cover.
  • Ab_scale - Abundance scale in the original data. Possible values are: CoverPerc: Cover Percentage, pa: Presence/absence, x_BA: Basal Area, x_IC: Individual count, x_SC: Stem count, x_IV: Relative Importance, x_RF: Relative Frequency.
  • Abundance - Abundance value, either as originally recorded or as transformed from the original cover code to a quantitative value.
  • Relative_cover - Abundance of each species, normalized so that the values of all records in a plot sum to 1.
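
The Relative_cover definition can be illustrated with a minimal sketch. The data frame DT_example and its values are made up for illustration and are not taken from sPlot; the base R call to ave() is one possible way to normalize within plots and is not necessarily the code used to build DT2.

```r
# Hypothetical toy data: two plots, column names follow the field list above
DT_example <- data.frame(
  PlotObservationID = c(1, 1, 1, 2, 2),
  Species           = c("sp_a", "sp_b", "sp_c", "sp_a", "sp_d"),
  Abundance         = c(2, 3, 5, 1, 3),
  stringsAsFactors  = FALSE
)

# Total abundance per plot, repeated on every record of that plot
plot_total <- ave(DT_example$Abundance,
                  DT_example$PlotObservationID,
                  FUN = sum)

# Each record's share of its plot total; shares sum to 1 within each plot
DT_example$Relative_cover <- DT_example$Abundance / plot_total

# Sanity check: per-plot sums of Relative_cover
tapply(DT_example$Relative_cover, DT_example$PlotObservationID, sum)
```

Because the normalization runs over all records of a plot, a species recorded in several layers contributes one share per layer, as in the rows of plot 1648095 shown above.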

save(DT2, file = "../_output/DT_sPlot3.0.RData")

SessionInfo

    ## R version 3.6.3 (2020-02-29)
    ## Platform: x86_64-pc-linux-gnu (64-bit)
    ## Running under: Ubuntu 16.04.7 LTS
    ## 
    ## Matrix products: default
    ## BLAS:   /usr/lib/openblas-base/libblas.so.3
    ## LAPACK: /usr/lib/libopenblasp-r0.2.18.so
    ## 
    ## locale:
    ##  [1] LC_CTYPE=en_US.UTF-8          LC_NUMERIC=C                 
    ##  [3] LC_TIME=en_US.UTF-8           LC_COLLATE=en_US.UTF-8       
    ##  [5] LC_MONETARY=en_US.UTF-8       LC_MESSAGES=en_US.UTF-8      
    ##  [7] LC_PAPER=en_US.UTF-8          LC_NAME=en_US.UTF-8          
    ##  [9] LC_ADDRESS=en_US.UTF-8        LC_TELEPHONE=en_US.UTF-8     
    ## [11] LC_MEASUREMENT=en_US.UTF-8    LC_IDENTIFICATION=en_US.UTF-8
    ## 
    ## attached base packages:
    ## [1] stats     graphics  grDevices utils     datasets  methods   base     
    ## 
    ## other attached packages:
    ##  [1] kableExtra_1.3.4 knitr_1.31       xlsx_0.6.5       forcats_0.5.1   
    ##  [5] stringr_1.4.0    dplyr_1.0.5      purrr_0.3.4      readr_1.4.0     
    ##  [9] tidyr_1.1.3      tibble_3.0.1     ggplot2_3.3.0    tidyverse_1.3.0 
    ## 
    ## loaded via a namespace (and not attached):
    ##  [1] tidyselect_1.1.0  xfun_0.22         bslib_0.2.4       rJava_0.9-13     
    ##  [5] haven_2.3.1       colorspace_2.0-0  vctrs_0.3.6       generics_0.1.0   
    ##  [9] viridisLite_0.3.0 htmltools_0.5.1.1 yaml_2.2.1        utf8_1.2.1       
    ## [13] rlang_0.4.10      jquerylib_0.1.3   pillar_1.4.3      glue_1.4.2       
    ## [17] withr_2.4.1       DBI_1.1.1         gdtools_0.2.3     dbplyr_2.1.0     
    ## [21] modelr_0.1.6      readxl_1.3.1      lifecycle_1.0.0   munsell_0.5.0    
    ## [25] gtable_0.3.0      cellranger_1.1.0  rvest_1.0.0       evaluate_0.14    
    ## [29] fansi_0.4.2       xlsxjars_0.6.1    highr_0.8         broom_0.7.0      
    ## [33] Rcpp_1.0.5        scales_1.1.1      backports_1.2.1   webshot_0.5.2    
    ## [37] jsonlite_1.7.2    systemfonts_1.0.1 fs_1.5.0          hms_1.0.0        
    ## [41] digest_0.6.25     stringi_1.5.3     grid_3.6.3        cli_2.3.1        
    ## [45] tools_3.6.3       magrittr_2.0.1    sass_0.3.1        crayon_1.4.1     
    ## [49] pkgconfig_2.0.3   ellipsis_0.3.1    xml2_1.3.2        reprex_1.0.0     
    ## [53] lubridate_1.7.10  svglite_1.2.3.2   assertthat_0.2.1  rmarkdown_2.7    
    ## [57] httr_1.4.2        rstudioapi_0.13   R6_2.5.0          compiler_3.6.3