Timestamp: Wed Dec 2 00:51:39 2020
Drafted: Francesco Maria Sabatini
Revised: Helge Bruelheide
Version: 1.1

This report documents the construction of the header file for sPlot 3.0. It is based on dataset sPlot_3.0.2, received on 24/07/2019 from Stephan Hennekens.

Changes in version 1.1.
1) Excluded plots from Canada, as recommended by Custodian
2) Filled missing info from most of the ~2000 plots without country information from these datasets.
3) Corrected mismatched sBiomes and ecoregions

knitr::opts_chunk$set(echo = TRUE)
library(tidyverse)
library(viridis)
library(readr)
library(xlsx)
library(knitr)
library(kableExtra)

## Spatial packages
library(rgdal)
library(sp)
library(rgeos)
library(raster)
library(rworldmap)
library(elevatr)
library(sf)
library(rnaturalearth)
library(dggridR)

#save temporary files
write("TMPDIR = /data/sPlot/users/Francesco/_tmp", file=file.path(Sys.getenv('TMPDIR'), '.Renviron'))
write("R_USER = /data/sPlot/users/Francesco/_tmp", file=file.path(Sys.getenv('R_USER'), '.Renviron'))
rasterOptions(tmpdir="/data/sPlot/users/Francesco/_tmp")

1 Import data

Import header data. Clean header data from quotation and double quotation marks from linux terminal.

# escape all double quotation marks. Run in Linux terminal
#sed 's/"/\\"/g' sPlot_3_0_2_header.csv > sPlot_3_0_2_header_test.csv

#more general alternative in case some " are already escaped
##first removing \s before all "s, and then adding \ before all ":
#sed 's/\([^\\]\)"/\1\\\"/g; s/"/\\"/g'

Import cleaned header data.

header0 <- readr::read_delim("../sPlot_data_export/sPlot_3_0_2_header_test.csv", locale = locale(encoding = 'UTF-8'),
                            delim="\t", col_types=cols(
  PlotObservationID = col_double(),
  PlotID = col_double(),
  `TV2 relevé number` = col_double(),
  Country = col_factor(),
  `Cover abundance scale` = col_factor(),
  `Date of recording` = col_date(format="%d-%m-%Y"),
  `Relevé area (m²)` = col_double(),
  `Altitude (m)` = col_double(),
  `Aspect (°)` = col_double(),
  `Slope (°)` = col_double(),
  `Cover total (%)` = col_double(),
  `Cover tree layer (%)` = col_double(),
  `Cover shrub layer (%)` = col_double(),
  `Cover herb layer (%)` = col_double(),
  `Cover moss layer (%)` = col_double(),
  `Cover lichen layer (%)` = col_double(),
  `Cover algae layer (%)` = col_double(),
  `Cover litter layer (%)` = col_double(),
  `Cover open water (%)` = col_double(),
  `Cover bare rock (%)` = col_double(),
  `Height (highest) trees (m)` = col_double(),
  `Height lowest trees (m)` = col_double(),
  `Height (highest) shrubs (m)` = col_double(),
  `Height lowest shrubs (m)` = col_double(),
  `Aver. height (high) herbs (cm)` = col_double(),
  `Aver. height lowest herbs (cm)` = col_double(),
  `Maximum height herbs (cm)` = col_double(),
  `Maximum height cryptogams (mm)` = col_double(),
  `Mosses identified (y/n)` = col_factor(),
  `Lichens identified (y/n)` = col_factor(),
  COMMUNITY = col_character(),
  SUBSTRATE = col_character(),
  Locality = col_character(),
  ORIG_NUM = col_character(),
  ALLIAN_REV = col_character(),
  REV_AUTHOR = col_character(),
  Forest = col_logical(),
  Grassland = col_logical(),
  Wetland = col_logical(),
  `Sparse vegetation` = col_logical(),
  Shrubland = col_logical(),
  `Plants recorded` = col_factor(),
  `Herbs identified (y/n)` = col_factor(),
  Naturalness = col_factor(),
  EUNIS = col_factor(),
  Longitude = col_double(),
  Latitude = col_double(),
  `Location uncertainty (m)` = col_double(),
  Dataset = col_factor(),
  GUID = col_character()
)) %>% 
  rename(Sparse.vegetation=`Sparse vegetation`, 
         ESY=EUNIS) %>% 
  dplyr::select(-COMMUNITY, -ALLIAN_REV, -REV_AUTHOR, -SUBSTRATE) %>%   #too sparse information to be useful
  dplyr::select(-PlotID) #identical to PlotObservationID

The following column names occurred in the header of sPlot v2.1 and are currently missing from the header of v3.0
1. Syntaxon
2. Cover cryptogams (%)
3. Cover bare soil (%)
4. is.forest
5. is.non.forest
6. EVA
7. Biome
8. BiomeID
9. CONTINENT
10. POINT_X
11. POINT_Y
~~ Columns #1, #2, #3, #10, #11 will be dropped. The others will be derived below.

1.1 Exclude unreliable plots

Some canadian plots need to be removed, on indication of Laura Boisvert-Marsh from GIVD NA-CA-004. The plots (and corresponding PlotObservationID) are:

Fabot01 - 1707776
Fadum01, 02 & 03 - 1707779:1707781
Faers01 - 1707782
Pfe-f-08 - 1707849
Pfe-o-05- 1707854

header0 <- header0 %>% 
  filter(!PlotObservationID %in% c(1707776, 1707779:1707782, 1707849, 1707854)) %>% 
  filter(Dataset != "$Coastal_Borja") %>% 
  filter(Dataset != "$Coastal_Poland") 

1.2 Solve spatial problems

There are 2020 plots in the Nile dataset without spatial coordinates. Assign manually with wide (90km) location uncertainty.

header <- header0 %>% 
  mutate(Latitude=replace(Latitude, 
                          list=(is.na(Latitude) & Dataset=="Egypt Nile delta"), 
                          values=30.917351)) %>% 
  mutate(Longitude=replace(Longitude, 
                          list=(is.na(Longitude) & Dataset=="Egypt Nile delta"), 
                          values=31.138534)) %>% 
  mutate(`Location uncertainty (m)`=replace(`Location uncertainty (m)`, 
                          list=(is.na(`Location uncertainty (m)`) & Dataset=="Egypt Nile delta"), 
                          values=-90000))

There are two plots in the Romania Grassland Databse and ~4442 plots in the Japan database, whose lat\long are inverted. Correct.

toswap <- c(which(header$Dataset=="Japan" & header$Latitude>90), 
            which(header$Dataset=="Romania Grassland Database" & header$Longitude>40))
header[toswap, c("Latitude", "Longitude")] <- header[toswap, c("Longitude", "Latitude")]

There are 237546 plots without location uncertainty. As a first approximation, we assign the median of the respective dataset, as a negative value to indicate this is an estimation, rather than a measure.

header <- header %>% 
  left_join(header %>% 
              group_by(Dataset) %>% 
              summarize(loc.uncer.median=median(`Location uncertainty (m)`, na.rm=T)), 
            by="Dataset") %>% 
  mutate(`Location uncertainty (m)`=ifelse( is.na(`Location uncertainty (m)` & !is.na(Latitude)), 
                                            -abs(loc.uncer.median), 
                                            `Location uncertainty (m)`)) %>% 
  dplyr::select(-loc.uncer.median)
## `summarise()` ungrouping output (override with `.groups` argument)

There are still 91943 plots with no estimation of location uncertainty.
Assign plot size to plots in the Patagonia dataset (input of Ana Cingolani)

header <- header %>% 
  mutate(`Relevé area (m²)`=ifelse( (Dataset=="Patagonia" & is.na(`Relevé area (m²)`)), 
                                    -900, `Relevé area (m²)`))

There are 518 plots from the dataset Germany_gvrd (EU-DE-014) having a location uncertainty equal to 2,147,483 km (!). These plots have a location reported. Replace with a more likely estimate (20 km)

header <- header %>% 
  mutate(`Location uncertainty (m)`=replace(`Location uncertainty (m)`, 
                                            list=`Location uncertainty (m)`==2147483647, 
                                            values=20000))

2 Formations

Fill out the columns Forest:Sparse.vegetation with NAs, where necessary. Create columns is.forest and is.non.forest using script developed for sPlot 2.1
~~ I am not assigning plots to Faber-Langedon formation at this stage, as this is only possible for European plots having an ESY classification.

eunis.key <- read.xlsx("../_input/EUNIS_WFT.xlsx", sheetIndex = "Sheet1", endRow = 246) %>% 
  dplyr::select(EUNIS_code, NATURALNESS:SPARSE_VEG) %>% 
  mutate(EUNIS_code=as.character(EUNIS_code)) %>% 
  rename(ESY=EUNIS_code, 
         Naturalness=NATURALNESS, 
         Forest=FOREST,
         Shrubland=SCRUBLAND,
         Grassland=GRASSLAND,
         Wetland=WETLAND,
         Sparse.vegetation=SPARSE_VEG)#,

header <- header %>% # header.backup %>% 
  mutate(ESY=as.character(ESY)) %>% 
  #mutate(ESY=ifelse(ESY=="?", NA, ESY)) %>% 
  # Systematically assign some databases to forest
  mutate(Forest=ifelse(Dataset %in% 
                         c("Turkey Oak_Forest Database", "Turkey Forest Database", "Chile_forest", "Ethiopia"), 
                       T, Forest)) %>% 
  #fill up with F those rows where at least one column on formation is assigned
  rowwise() %>% 
  mutate(Any=any(Forest, Shrubland, Grassland, Wetland, Sparse.vegetation)) %>% 
  mutate(Forest=ifelse( (is.na(Forest) & Any), F, Forest))  %>%
  mutate(Shrubland=ifelse( (is.na(Shrubland) & Any), F, Shrubland))  %>% 
  mutate(Grassland=ifelse( (is.na(Grassland) & Any), F, Grassland))  %>% 
  mutate(Wetland=ifelse( (is.na(Wetland) & Any), F, Wetland))  %>% 
  mutate(Sparse.vegetation=ifelse( (is.na(Sparse.vegetation) & Any), F, Sparse.vegetation))  %>%
  ungroup() %>% 
  dplyr::select(-Any) %>% 
  mutate_at(vars(Forest:Shrubland), .funs=list(~.*1)) %>% 
  mutate(Naturalness=as.numeric(as.character(Naturalness))) %>% 
  ##join and coalesce with eunis.key
  left_join(eunis.key %>% 
              distinct(), by = "ESY") %>% 
    mutate(
        Forest = dplyr:::coalesce(Forest.x, Forest.y), 
        Shrubland = coalesce(Shrubland.x, Shrubland.y),
        Grassland = coalesce(Grassland.x, Grassland.y),
        Wetland = coalesce(Wetland.x, Wetland.y),
        Sparse.vegetation = coalesce(Sparse.vegetation.x, Sparse.vegetation.y),
        Naturalness = coalesce(Naturalness.x, Naturalness.y)
    ) %>% 
  dplyr::select(-ends_with(".x"), -ends_with(".y"))

3 Fix header and attach GIVD codes

Reduce number of factor levels for the column Plants recorded

header <- header %>% 
  mutate(`Plants recorded`=fct_recode(`Plants recorded`, 
                                      "All vascular plants"="complete vegetation",
                                      "All vascular plants"="Complete vegetation",
                                      "All vascular plants"="all vascular plants", 
                                      "All vascular plants"="complete", 
                                      "All vascular plants"="Complete vegetation (including non-terricolous tax",
                                      "All vascular plants"="Vascular plants",
                                      "All woody plants"="Woody plants",
                                      "All woody plants"="All woody species",
                                      "Woody plants >= 10 cm dbh"= "trees>=10cm dbh",
                                      "All trees & dominant understory"="All trees & dominant shrubs",
                                      "Woody plants >= 1 cm dbh" = "Plants >= 1 cm dbh"
                                      )) %>% 
  mutate(`Plants recorded`=factor(`Plants recorded`, exclude = "#N/A"))

Align consortium labels to those in sPlot’s consortium archive

databases <- read_csv("/data/sPlot/users/Francesco/_sPlot_Management/Consortium/Databases.out.csv")

header <- header %>% 
  mutate(Dataset=fct_recode(Dataset,
                            "BIOTA_South_Africa" = "BIOTA_South_Africa_3", 
                            "Kriti"="Cyprus_Bergmeier", 
                            "European Boreal Forest Database"="European Boreal Forest Database 1", 
                            "European Boreal Forest Database"="European Boreal Forest Database 2", 
                            "European Coastal Vegetation Database"= "European Coastal Vegetation Database-A", 
                            "Germany_vegetweb"="Germany_vegetweb2", 
                            "Germany_vegetweb"="Germany_vegetweb3",
                            "Ladakh"="Ladakh_2", 
                            "Netherlands"="Netherlands Military sites",
                            "NSW Australia" = "NSW Austalia",
                            )) %>% 
  left_join(databases %>% 
              dplyr::select(`GIVD ID`, label) %>% 
              rename(Dataset=label),
            by="Dataset")

4 Assign plots to spatial descriptors

Create spatial point dataframe for sPlot data to intersect with spatial layers

header.shp <- header %>%
  filter(!is.na(Longitude) | !is.na(Latitude))
header.shp <- SpatialPointsDataFrame(coords= header.shp %>% 
                                        dplyr::select(Longitude, Latitude),
                               proj4string = CRS("+init=epsg:4326"), 
                               data=data.frame(PlotObservationID= header.shp$PlotObservationID, 
                                               loc.uncert=header.shp$`Location uncertainty (m)`, 
                                               `GIVD ID`=header.shp$`GIVD ID`))
writeOGR(header.shp, dsn="../_derived/", layer="header.shp", driver="ESRI Shapefile", overwrite_layer = T)

Reimport shapefile

header.shp <- readOGR("../_derived/header.shp.shp")
## OGR data source with driver: ESRI Shapefile 
## Source: "/data/sPlot/users/Francesco/sPlot3/_derived/header.shp.shp", layer: "header.shp"
## with 1976235 features
## It has 3 fields
header.shp@data <- header.shp@data %>% 
  rename(PlotObservationID=PltObID, 
         loc.uncert=lc_ncrt,
         `GIVD ID`=GIVD_ID)
crs(header.shp) <- CRS("+init=epsg:4326")

4.1 Assign to Continents

Download and manipulate map of continents

sPDF <- rworldmap::getMap(resolution="coarse")
continent <- sPDF[,"continent"]
crs(continent) <- CRS("+init=epsg:4326")
continent@data[243,"continent"] <- "South America" ## Manually correct missing data
# create clipped version of continent to avoid going beyond 180 lON
coords <- data.frame(x=c(-180,180,180,-180),
                     y=c(-90,-90,90,90))
bboxc = Polygon(coords)
bboxc = SpatialPolygons(list(Polygons(list(bboxc), ID = "a")), proj4string=crs(continent))
continent_clipped <- gIntersection(continent[-137,], bboxc, byid=T) # polygon 137 gives problems... workaround

## same but high resolution (slower, but works better for plots along coastlines)
sPDF <- rworldmap::getMap(resolution="high")
continent.high <- sPDF[,"continent"]
crs(continent.high) <- CRS("+init=epsg:4326")
continent.high@data$continent <- fct_recode(continent.high@data$continent, "South America"="South America and the Caribbean")

Assign plots to continent

continent.out <- sp::over(x=header.shp, y=continent)
#overlay unassigned points to the high resolution layer of continent
toassign <- header.shp[which(is.na(continent.out$continent)),] #154782 remain to assign
crs(toassign) <- crs(continent)
continent.out2 <- sp::over(x=toassign, y=continent.high)
#merge first and second overlay 
continent.out$continent[is.na(continent.out$continent)] <- continent.out2$continent

#correct unassigned points to closest continent
toassign <- header.shp[which(is.na(continent.out$continent)),] #47610 remain to assign
crs(toassign) <- crs(continent)

#go parallel
ncores=8
library(parallel)
library(doParallel)
cl <- makeForkCluster(ncores, outfile="" )
registerDoParallel(cl)
  
nearestContinent <- foreach(i=1:length(toassign), .packages=c('raster'), .combine=rbind) %dopar% { 
  nearestContinent.tmp <- geosphere::dist2Line(toassign[i,], continent_clipped)
}
continent.out$continent[which(is.na(continent.out$continent))] <- as.character(continent[-137,]@data[nearestContinent[,"ID"],])
save(continent.out, file = "../_derived/continent.out")
stopCluster(cl)

Reload, manipulate continent and attach to header

load("../_derived/continent.out")
header <- header %>% 
  left_join(header.shp@data %>% 
              dplyr::select(PlotObservationID) %>% 
              bind_cols(continent.out),
            by="PlotObservationID") %>% 
  mutate(CONTINENT=factor(continent, 
                            levels=c("Africa", "Antarctica", "Australia", "Eurasia", "North America", "South America"),
                            labels=c("AF", "AN", "AU", "EU", "N-A", "S-A"))) %>% 
  
  dplyr::select(-continent)
Summarize
Number of plots per continent
CONTINENT num.plot
AF 26803
AN 19
AU 66857
EU 1775493
N-A 90170
S-A 15844
NA 2451

4.2 Assign to sBiomes

Performed in EVE HPC cluster using function A98_PredictorsExtract.R. Divided in 99 chunks.

sBiomes <- readOGR("/data/sPlot/users/Francesco/Ancillary_Data/Biomes_sPlot/sBiomes.shp")
crs(sBiomes) <- crs(header.shp)

cl <- makeForkCluster(5, outfile="")
registerDoParallel(cl)

clusterEvalQ(cl, {
  library(rgdal)
  library(raster)
  library(sp)
  library(elevatr)
  library(dplyr)
  })

sBiomes.out <- foreach(i=1:99, .combine=rbind) %dopar% {
  source("A98_PredictorsExtract.R")
  PredExtr(header.shp, myfunction=NA, output="../_derived/sBiomes/",  
                     toextract=sBiomes, typp="shp", ncores=1, chunkn=99, chunk.i=i)
}
stopCluster(cl)

Reimport output and join to header

sBiome.files <- list.files("../_derived/sBiomes", pattern="sBiomes-[0-9]+.csv", full.names=T)
sBiome.files <- sBiome.files[order(as.numeric(str_extract(sBiome.files, pattern="[0-9]+")))]
sBiomes.out <- do.call(rbind, lapply(sBiome.files, function(x) {read_csv(x)}))
sBiomes.out <- header.shp@data %>% 
  dplyr::select(PlotObservationID) %>% 
  bind_cols(sBiomes.out)
header <- header %>% 
  left_join(sBiomes.out %>% 
              dplyr::select(PlotObservationID, Name, BiomeID) %>% 
              dplyr::rename(sBiome=Name, sBiomeID=BiomeID), 
            by="PlotObservationID")

There are 2451 unassigned plots.

Summarize:
Number of plots per Biome
sBiome num.plot
Alpine 35578
Boreal zone 34515
Dry midlatitudes 69744
Dry tropics and subtropics 40474
Polar and subpolar zone 7902
Subtropics with winter rain 196070
Subtrop. with year-round rain 78472
Temperate midlatitudes 1485447
Tropics with summer rain 15332
Tropics with year-round rain 11652
NA 2451

4.3 Extract WWF Ecoregions

Extract ecoregion name and ID from Ecoregions of the World. Olson et al. 2001 (BioScience).
Computation was performed in EVE HPC cluster using function A98_PredictorsExtract.R. Divided in 99 chunks.

ecoreg <- readOGR("../sPlot/_Ancillary/official", layer="wwf_terr_ecos")
ecoreg@data <- ecoreg@data %>% 
  dplyr::select(OBJECTID, ECO_NAME, REALM, BIOME, ECO_NUM, ECO_ID, eco_code)

cl <- makeForkCluster(5, outfile="")
registerDoParallel(cl)

clusterEvalQ(cl, {
  library(rgdal)
  library(raster)
  library(sp)
  library(elevatr)
  library(dplyr)
  })

ecoreg.out <- foreach(i=1:99, .combine=rbind) %dopar% {
  source("A98_PredictorsExtract.R")
  PredExtr(header.shp, myfunction=NA, output="../_derived/wwf_ecoregions/",  
                     toextract=ecoreg, typp="shp", ncores=1, chunkn=99, chunk.i=i)
}
stopCluster(cl)

Reimport output and join to header

ecoreg.files <- list.files("../_derived/wwf_ecoregions/", pattern="wwf_terr_ecos-[0-9]+.csv", full.names=T)
ecoreg.files <- ecoreg.files[order(as.numeric(str_extract(ecoreg.files, pattern="[0-9]+")))]
ecoreg.out <- do.call(rbind, lapply(ecoreg.files, function(x) {read_csv(x, 
  col_types=cols(
    .default = col_double(),
    ECO_NAME = col_character(),
    REALM = col_character(),
    G200_REGIO = col_character(),
    eco_code = col_character()))}))

ecoreg.out <- header.shp@data %>% 
  dplyr::select(PlotObservationID) %>% 
  bind_cols(ecoreg.out)

header <- header %>% 
  left_join(ecoreg.out %>% 
              dplyr::select(PlotObservationID, ECO_NAME, ECO_ID) %>% 
              dplyr::rename(Ecoregion=ECO_NAME, EcoregionID=ECO_ID), 
            by="PlotObservationID")

There are 0 unassigned plots.

Summarize:
Number of plots in the 30 best represented Ecoregions
Ecoregion num.plot
Atlantic mixed forests 348365
Western European broadleaf forests 251804
Baltic mixed forests 191195
Central European mixed forests 122612
Pannonian mixed forests 95684
Alps conifer and mixed forests 74185
Carpathian montane forests 68018
Celtic broadleaf forests 64540
Sarmatic mixed forests 34920
Northeastern Spain and Southern France Mediterranean forests 34194
Taiheiyo evergreen forests 31629
English Lowlands beech forests 29728
Balkan mixed forests 28785
Italian sclerophyllous and semi-deciduous forests 28232
Cantabrian mixed forests 26603
Pyrenees conifer and mixed forests 24453
Tyrrhenian-Adriatic Sclerophyllous and mixed forests 22922
Iberian sclerophyllous and semi-deciduous forests 18034
East European forest steppe 16667
Great Basin shrub steppe 16131
Scandinavian and Russian taiga 15344
Eastern Australian temperate forests 15117
Dinaric Mountains mixed forests 14484
Pontic steppe 14085
Rodope montane mixed forests 13454
Caspian lowland desert 13443
Nihonkai montane deciduous forests 12414
Southeast Australia temperate savanna 12134
Colorado Plateau shrublands 10964
North Atlantic moist mixed forests 10591

4.4 Extract elevation

Extract elevation for each plot. Loops over tiles of 1 x 1°, projects to mercator, and extract elevation for plot coordinates, as well as 2.5, 50, and 97.5 quantiles for a buffer area having a radius equal to the location uncertainty of each plot (but only if location uncertainty < 50 km). DEM derive from package elevatr, which uses the Terrain Tiles on Amazon Web Services. Resolutions of DEM rasters vary by region. I set a zoom factor z=10, which corresponds to ~ 75-150 m. Sources are: SRTM, data.gov.at in Austria, NRCAN in Canada, SRTM, NED/3DEP 1/3 arcsec, data.gov.uk in United Kingdom, INEGI in Mexico, ArcticDEM in latitudes above 60°, LINZ in New Zealand, Kartverket in Norway, as described here.
Split data into tiles of 1 x 1 degrees, and create sp::SpatialPointsDataFrame files. Only for plots having a location uncertainty < 50 km, which corresponds to 1582252 plots.

header.tiles <- header %>%
  dplyr::select(PlotObservationID, Dataset, Longitude, Latitude, `Location uncertainty (m)`) %>%
  mutate(`Location uncertainty (m)`=abs(`Location uncertainty (m)`)) %>% 
  filter(`Location uncertainty (m)`<= 50000) %>%
  mutate_at(.vars=vars(Longitude, Latitude), 
            .funs=list(tile=~cut(., breaks = seq(-180,180, by=.2)))) %>%
  filter(!is.na(Longitude_tile) & !is.na(Latitude_tile) ) %>%
  mutate(tilenam=factor(paste(Longitude_tile, Latitude_tile)))

There are 1581231 plots out of 1977637 plots with Location uncertainty <= 50km (or absent). The total number of tiles is 30118.
Performed in EVE HPC cluster using function A97_ElevationExtract.R. Divided in 99 chunks.

cl <- makeForkCluster(5, outfile="")
registerDoParallel(cl)

clusterEvalQ(cl, {
  library(rgdal)
  library(raster)
  library(sp)
  library(elevatr)
  library(dplyr)})

# Divided in 99 chunks
elevation.out <- foreach(i=1:99, .combine=rbind) %dopar% {
  source("A97_ElevationExtract.R")
  ElevationExtract(header.shp, output, ncores=1, chunk.i=1)}
stopCluster(cl)

For those tiles that failed, extract elevation of remaining plots one by one

#create list of tiles for which dem could not be extracted
myfiles <- list.files("../_derived/elevatr/")
failed <- list.files("../_derived/elevatr/", pattern = "[A-Za-z]*_[0-9]+failed\\.RData$")
failed <- as.numeric(unlist(regmatches(failed, gregexpr("[[:digit:]]+", failed))))

#create SpatialPointsDataFrame
sp.tile0 <- SpatialPointsDataFrame(coords=header.tiles %>% 
                                    filter(tilenam %in% levels(header.tiles$tilenam)[failed]) %>%
                                    dplyr::select(Longitude, Latitude),
                                  data=header.tiles %>% 
                                    filter(tilenam %in% levels(header.tiles$tilenam)[failed]) %>%
                                    dplyr::select(-Longitude, -Latitude),
                                  proj4string = CRS("+init=epsg:4326"))
sp.tile0 <- spTransform(sp.tile0, CRSobj = CRS("+init=epsg:3857 +proj=merc +a=6378137 +b=6378137 +lat_ts=0.0
                                               +lon_0=0.0 +x_0=0.0 +y_0=0 +k=1.0 +units=m +nadgrids=@null
                                               +no_defs ")) #project to mercator
output.tile <- data.frame(NULL)

cl <- makeForkCluster(5, outfile="")
registerDoParallel(cl)

clusterEvalQ(cl, {
  library(rgdal)
  library(raster)
  library(sp)
  library(elevatr)
  library(dplyr)})

#Loop over all plots
elevation.failed <- foreach(i=1:nrow(sp.tile0), .packages=c('raster'), .combine=rbind) %dopar% { 
  sp.tile <- sp.tile0[i,]
  tryCatch(raster.tile <- get_elev_raster(sp.tile, z=10, 
                                          expand=max(sp.tile$`Location uncertainty (m)`)),
        error = function(e){
          print(paste("could not retrieve DEM for", sp.tile$PlotObservationID))}
          )
   if(!exists("raster.tile")) {
    output.tile <- data.frame(PlotObservationID==sp.tile$PlotObservationID, 
                              elevation=NA, 
                              Elevation_q2.5=NA, 
                              Elevation_median=NA,
                              Elevation_q97.5=NA,
                              DEM.res=NA)
    return(output.tile)
  } else {
  # clip dem tile with continent shape
  raster.tile <- mask(raster.tile, continent.high.merc)
  
  #extract and summarize elevation data
  elev.tile <- raster::extract(raster.tile, sp.tile, small=T)
  elev.tile.buffer <- raster::extract(raster.tile, sp.tile, 
                                      buffer=sp.tile$`Location uncertainty (m)`, small=T)
  elev.q95 <- t(round(mapply( quantile, 
                            x=elev.tile.buffer,
                            probs=rep(c(0.025, 0.5, 0.975), each=length(elev.tile)), na.rm=T)))
  output.tile <- data.frame(PlotObservationID=sp.tile$PlotObservationID, 
                            elevation=round(elev.tile), 
                            elev.q95, 
                            DEM.res=res(raster.tile)[1]) %>%
  rename(Elevation_q2.5=X2.5., Elevation_median=X50., Elevation_q97.5=X97.5.)
  return(output.tile)
  }
}
stopCluster(cl)
save(elevation.failed, file = "../_derived/elevatr/elevation_missing.RData")

Compose tiles into a single output, and export

myfiles <- list.files("../_derived/elevatr/", pattern = "elevation_tile_[0-9]+\\.RData$", full.names = T)

#create empty data.frame
elevation.out <- matrix(NA, nrow=nrow(header.tiles), ncol=6)
elevation.out <- as.data.frame(elevation.out)
colnames(elevation.out) <- c("PlotObservationID", "elevation", "Elevation_q2.5", "Elevation_median", "Elevation_q97.5","DEM.res")
elevation.out$PlotObservationID <- header.tiles$PlotObservationID

tmp <- NULL
for(i in 1:length(myfiles)){
  load(myfiles[i])
  #attach results to empty data.frame
  tmp <- bind_rows(tmp, output.tile)
  if(i %in% seq(5000, length(myfiles), by=5000)){
    mymatch <- base::match(x=tmp$PlotObservationID, table=elevation.out$PlotObservationID)
    mymatch <- mymatch[!is.na(mymatch)]
    elevation.out[mymatch,] <- tmp
    tmp <- NULL
    print(paste("Attached first", i, "files"))
  }
  if(i %in% seq(1,length(myfiles), by=250)){print(i)}
}

load(file = "../_derived/elevatr/elevation_missing.RData")

mymatch <- base::match(x=elevation.failed$PlotObservationID, table=elevation.out$PlotObservationID)
mymatch <- mymatch[!is.na(mymatch)]
elevation.out[mymatch,] <- elevation.failed

write_csv(elevation.out, path ="../_derived/elevatr/elevation.out.csv")

Reimport output, attach to header and check

elevation.out <- read_csv("../_derived/elevatr/elevation.out.csv")
Example of elevation output (10 randomly selected plots shown)
PlotObservationID elevation Elevation_q2.5 Elevation_median Elevation_q97.5 DEM.res
557790 364 30 306 497 153
980511 1230 1230 1230 1230 153
908236 -4 -3 -2 6 153
486948 93 93 93 93 153
839270 76 76 76 76 153
1345262 857 857 857 857 153
1091788 2 2 2 2 153
731205 128 128 128 128 153
1818615 298 298 298 298 153
262230 246 238 242 246 153
summary(elevation.out %>% 
          dplyr::select(-PlotObservationID, -elevation))
##  Elevation_q2.5    Elevation_median Elevation_q97.5     DEM.res     
##  Min.   :-1018.0   Min.   :-474.0   Min.   :-474     Min.   : 76.4  
##  1st Qu.:   17.0   1st Qu.:  21.0   1st Qu.:  26     1st Qu.:153.0  
##  Median :  163.0   Median : 178.0   Median : 194     Median :153.0  
##  Mean   :  441.6   Mean   : 469.4   Mean   : 506     Mean   :153.1  
##  3rd Qu.:  593.0   3rd Qu.: 644.0   3rd Qu.: 698     3rd Qu.:153.0  
##  Max.   : 6155.0   Max.   :6155.0   Max.   :6155     Max.   :306.0  
##  NA's   :104818    NA's   :104818   NA's   :104818   NA's   :3082

There are 104818 plots without elevation info, corresponding to 5.3% of the number of matched plots. Please not that elevation was extracted only for plots with location uncertainty <50 km, i.e., 1582252 plots.
There are 51389 plots with elevation below sea level.
Join elevation data (only median)

header <- header %>% 
  left_join(elevation.out %>% 
              dplyr::select(PlotObservationID, Elevation_median) %>% 
              rename(elevation_dem=Elevation_median) %>% 
              distinct(PlotObservationID, .keep_all=T), 
            by="PlotObservationID")
Summary and check
Dataset with highest number of plots below sea level (10 randomly selected plots shown)
GIVD ID Dataset num.plot
EU-RO-008 Romania Grassland Database 6
EU-00-004e Spain_sivim_sclerophyllous 4
AS-00-001 Korea 1
AS-CN-007 China_Northern_Mountains_2 15
AU-AU-002 Australia 12
EU-GB-001 Britain_nvcd 67
EU-00-018 Nordic Vegetation Database 7
AU-AU-003 NSW Australia 2
EU-IT-011 Italy_uniroma 48
EU-DE-040 Schleswig-Holstein Db 146

Create Scatterplot between measured elevation in the field, and elevation derived from DEM

#join measured and derived elevation
mydata <- header %>% 
  dplyr::select(PlotObservationID, `Altitude (m)`, elevation_dem) %>%
  filter(!is.na(`Altitude (m)`) & !is.na(elevation_dem)) %>%
  rename(elevation_measured=`Altitude (m)`)

ggplot(data=mydata) + 
  geom_point(aes(x=elevation_measured, y=elevation_dem), alpha=1/10, cex=0.8) + 
  theme_bw() + 
  geom_abline(slope=0, intercept=0, col=2, lty=2) + 
  geom_abline(slope=1, intercept=1, col="Dark green")

4.5 Assign to countries

There is a minor number of plots (4064), not assigned to any countries. Fix that.

countries <- readOGR("../../Ancillary_Data/naturalearth/ne_110m_admin_0_countries.shp") 
## OGR data source with driver: ESRI Shapefile 
## Source: "/data/sPlot/users/Francesco/Ancillary_Data/naturalearth/ne_110m_admin_0_countries.shp", layer: "ne_110m_admin_0_countries"
## with 177 features
## It has 94 fields
## Integer64 fields read as strings:  POP_EST NE_ID
crs(countries) <- crs(header.shp)
tmp.sel <- header %>% 
  filter(is.na(Country)) %>% 
  pull(PlotObservationID)

header.shp.nocontry <- header.shp[which(header.shp$PlotObservationID %in% tmp.sel),]
countries.out <- over(header.shp.nocontry, countries)

header$Country <- as.character(header$Country)
header$Country[tmp.sel] <- countries.out$NAME
## Warning in header$Country[tmp.sel] <- countries.out$NAME: number of items to
## replace is not a multiple of replacement length
header$Country <- as.factor(header$Country)

Plots without country info are now only 4073.

5 Map of plots

Update header.shp

header.shp@data <- header.shp@data %>% 
  left_join(header %>% 
          dplyr::select(PlotObservationID, sBiome, CONTINENT, 
                        Ecoregion, GIVD.ID=`GIVD ID`), 
            by="PlotObservationID") 

header.sf <- header.shp %>% 
  st_as_sf() %>% 
  st_transform(crs = "+proj=eck4")

Basic Map of the world in Eckert projection

countries <- ne_countries(returnclass = "sf") %>% 
  st_transform(crs = "+proj=eck4") %>% 
  st_geometry()
graticules <- ne_download(type = "graticules_15", category = "physical",
                          returnclass = "sf") %>% 
  st_transform(crs = "+proj=eck4") %>% 
  st_geometry()
## OGR data source with driver: ESRI Shapefile 
## Source: "/data/sPlot/users/Francesco/_tmp/Rtmpla5oar", layer: "ne_110m_graticules_15"
## with 35 features
## It has 5 fields
## Integer64 fields read as strings:  degrees scalerank
bb <- ne_download(type = "wgs84_bounding_box", category = "physical",
                  returnclass = "sf") %>% 
  st_transform(crs = "+proj=eck4") %>% 
  st_geometry()
## OGR data source with driver: ESRI Shapefile 
## Source: "/data/sPlot/users/Francesco/_tmp/Rtmpla5oar", layer: "ne_110m_wgs84_bounding_box"
## with 1 features
## It has 2 fields
w3a <- ggplot() +
  geom_sf(data = bb, col = "grey20", fill = "white") +
  geom_sf(data = graticules, col = "grey20", lwd = 0.1) +
  geom_sf(data = countries, fill = "grey90", col = NA, lwd = 0.3) +
  coord_sf(crs = "+proj=eck4") +
  theme_minimal() +
  theme(axis.text = element_blank(), 
        legend.title=element_text(size=12), 
        legend.text=element_text(size=12),
        legend.background = element_rect(size=0.1, linetype="solid", colour = 1), 
        legend.key.height = unit(1.1, "cm"), 
        legend.key.width = unit(1.1, "cm")) +
  scale_fill_viridis()

Graph of plot density (hexagons)

header2 <- header %>% 
  filter(!is.na(Longitude) | !is.na(Latitude)) %>% 
  dplyr::select(PlotObservationID, Latitude, Longitude, `GIVD ID`) %>% 
  filter(!(abs(Longitude) >171 & abs(Latitude>70)))
dggs <- dgconstruct(spacing=300, metric=T, resround='down')
## Resolution: 6, Area (km^2): 69967.8493448681, Spacing (km): 261.246386348549, CLS (km): 298.479323187169
#Get the corresponding grid cells for each earthquake epicenter (lat-long pair)
header2$cell <- dgGEO_to_SEQNUM(dggs, header2$Longitude, header2$Latitude)$seqnum

#Calculate number of plots for each cell
header.out   <- header2 %>% 
  group_by(cell) %>% 
  summarise(value.out=log(n(), 10))

#Get the grid cell boundaries for cells 
grid   <- dgcellstogrid(dggs, header.out$cell, frame=F) %>%
  st_as_sf() %>% 
  mutate(cell = header.out$cell) %>% 
  mutate(value.out=header.out$value.out) %>% 
  st_transform("+proj=eck4") %>% 
  st_wrap_dateline(options = c("WRAPDATELINE=YES"))

## plotting
legpos <- c(0.160, .24)
(w3 <- w3a + 
       geom_sf(data=grid, aes(fill=value.out),lwd=0, alpha=0.9)    +
       geom_sf(data = countries, col = "grey10", fill=NA, lwd = 0.3) + 
       scale_fill_viridis(
         name="# plots", breaks=0:5, labels = c("1", "10", "100",
                                                "1,000", "10,000", "100,000"), option="viridis" ) + 
    #labs(fill="# plots") + 
    theme(legend.position = legpos +c(-0.06, 0.25))
)

ggsave(filename="../_pics/PlotDensityLog10_vir.png", width = 15, height = 7, units="in", dpi=300, plot=w3)

Graph of plot location by Dataset

(w4 <- w3a + 
         geom_sf(data=header.sf %>% 
                   mutate(GIVD.ID=fct_shuffle(GIVD.ID)), aes(col=factor(GIVD.ID)), pch=16, size=0.8, alpha=0.6) +
         geom_sf(data = countries, col = "grey10", fill=NA, lwd = 0.3) + 
         theme(legend.position = "none"))

ggsave(filename="../_pics/PlotDistrib_Dark2_shuffle1984.png", width = 15, height = 7, units="in", dpi=300, plot=w4) ## takes ~40' to render

Double check attribution to continents, Biomes and Ecoregions. Do it only on a subset of plots

tmp.sel <- header %>% 
  group_by(sBiome) %>% 
  sample_n(1000) %>% 
  pull(PlotObservationID)

#sBiomes
(w5 <- w3a + 
         geom_sf(data=header.sf %>% 
                   filter(PlotObservationID %in% tmp.sel), 
                 aes(col=factor(sBiome)), pch=16, size=0.8, alpha=0.6) +
         geom_sf(data = countries, col = "grey10", fill=NA, lwd = 0.3))

#continent
tmp.sel <- header %>%
  filter(CONTINENT!="AN") %>% 
  group_by(CONTINENT) %>% 
  sample_n(1000) %>% 
  pull(PlotObservationID)
(w6 <- w3a + 
         geom_sf(data=header.sf %>% 
                   filter(PlotObservationID %in% tmp.sel), 
                 aes(col=factor(CONTINENT)), pch=16, size=0.8, alpha=0.6) +
         geom_sf(data = countries, col = "grey10", fill=NA, lwd = 0.3))

#Ecoregion - Only 10 random ecoregions tested
tmp.sel <- header %>%
  filter(Ecoregion %in% sample(unique(header$Ecoregion), 10)) %>% 
  pull(PlotObservationID)
(w7 <- w3a + 
         geom_sf(data=header.sf %>% 
                   filter(PlotObservationID %in% tmp.sel) %>% 
                   mutate(Ecoregion=factor(Ecoregion)), 
                 aes(col=factor(Ecoregion)), pch=16, size=0.8, alpha=0.6) +
         geom_sf(data = countries, col = "grey10", fill=NA, lwd = 0.3) + 
    theme(legend.position="bottom"))

6 Fix output and export

#check 
nrow(header)==nrow(header0)
## [1] TRUE
header <- header %>% 
  dplyr::select(
    #Metadata
    PlotObservationID, Dataset, "GIVD ID", "TV2 relevé number", "ORIG_NUM", "GUID", 
    #Geographic info
    Longitude:"Location uncertainty (m)", Country, CONTINENT, sBiome, sBiomeID, Ecoregion, EcoregionID, Locality,
    #Methodological info
    "Relevé area (m²)", "Cover abundance scale", "Date of recording", "Plants recorded", 
    "Herbs identified (y/n)","Mosses identified (y/n)","Lichens identified (y/n)",
    #Topographical
    elevation_dem, "Altitude (m)", "Aspect (°)", "Slope (°)", 
    #Vegetation type
    Forest:Naturalness, ESY, 
    #Vegetation structure
                "Cover total (%)":"Maximum height cryptogams (mm)")
save(header, file = "../_output/header_sPlot3.0.RData")
Example of header (20 random rows shown)
PlotObservationID Dataset GIVD ID TV2 relevé number ORIG_NUM GUID Longitude Latitude Location uncertainty (m) Country CONTINENT sBiome sBiomeID Ecoregion EcoregionID Locality Relevé area (m²) Cover abundance scale Date of recording Plants recorded Herbs identified (y/n) Mosses identified (y/n) Lichens identified (y/n) elevation_dem Altitude (m) Aspect (°) Slope (°) Forest Shrubland Grassland Wetland Sparse.vegetation Naturalness ESY Cover total (%) Cover tree layer (%) Cover shrub layer (%) Cover herb layer (%) Cover moss layer (%) Cover lichen layer (%) Cover algae layer (%) Cover litter layer (%) Cover open water (%) Cover bare rock (%) Height (highest) trees (m) Height lowest trees (m) Height (highest) shrubs (m) Height lowest shrubs (m) Aver. height (high) herbs (cm) Aver. height lowest herbs (cm) Maximum height herbs (cm) Maximum height cryptogams (mm)
1684829 Australia AU-AU-002 7151 NA CA88D509-C4D8-4F48-8255-EB8DCA4B0D3F 146.423410 -31.83544 10 Australia AU Dry tropics and subtropics 3 Southeast Australia temperate savanna 10803 NA 400 Percentage (%) 2010-03-31 All vascular plants NA NA NA 297 293 45 1 1 0 1 0 0 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
128620 INBOVEG EU-BE-002 5681 NA 14C0B4B2-865F-4678-9749-3F9CA2B63D82 4.667992 50.64321 132565 Belgium EU Temperate midlatitudes 6 Atlantic mixed forests 80402 NA NA Procentueel 2014-07-15 NA NA NA NA NA NA NA NA 0 0 1 0 0 2 E34a NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
1340236 Romania Grassland Database EU-RO-008 13109 NA 68022207-E359-441C-9B67-90A40559E537 25.213300 45.52660 1000 Romania EU Temperate midlatitudes 6 Carpathian montane forests 80504 Piatra Craiului Mica/ 16 Braun/Blanquet (old) 1994-07-09 NA NA NA NA 1983 1700 270 45 0 0 1 0 0 1 E44a 80 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
960242 Ireland_nvd EU-IE-001 24002 NA 0FEA7F76-52C6-4DEC-A24B-69880DBC25B9 -7.443679 52.53039 100 Ireland EU Temperate midlatitudes 6 Celtic broadleaf forests 80409 Kyleadohir Wood 100 Domin 2003-05-19 NA NA NA NA 82 80 0 0 1 0 0 0 0 NA G NA 39 16 NA 39 NA NA NA NA NA NA NA NA NA NA NA NA NA
291870 Czechia_nvd EU-CZ-001 433425 NA D85880E4-2497-46C5-A762-819D7E8789AD 16.306944 49.74667 -100 Czech Republic EU Temperate midlatitudes 6 Western European broadleaf forests 80445 Chmelik(okr. Svitavy), SPR V bukach 50 Braun/Blanquet (old) NA NA NA N N 556 -1 NA NA 1 0 0 0 0 1 G16a NA NA NA 90 NA NA NA NA NA NA NA NA NA NA NA NA NA NA
598606 France_SOPHY EU-FR-003 41222 NA F8D009F6-7DFC-4709-AFE9-476E05261D13 7.737229 48.69000 -10 France EU Temperate midlatitudes 6 Western European broadleaf forests 80445 FORET DOM. DU KRITTWALD NA Braun/Blanquet (old) 1984-01-01 NA NA NA NA 138 139 NA NA 1 0 0 0 0 NA G NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
919400 GrassVeg.DE EU-DE-020 6579 NA 793F017D-3539-4D87-9ABE-C90F64AB608D 8.727559 54.29145 358 Germany EU Temperate midlatitudes 6 Atlantic mixed forests 80402 North Frisia and Dithmarschen 1 Percentage (%) 1989-01-01 NA Y N N 0 2 0 0 0 0 1 1 0 2 E 92 NA NA 92 NA NA NA NA NA NA NA NA NA NA 40 NA NA NA
1482496 Slovenia EU-SI-001 6310 NA 1EEA956F-1FDA-4C33-8D8E-F3B252DCC24D 14.713742 46.15488 -121560 Slovenia EU Temperate midlatitudes 6 Pannonian mixed forests 80431 Predalpsko območje NA Braun/Blanquet (old) NA NA NA NA NA NA 320 45 30 1 0 0 0 0 1 G16a NA NA 90 70 40 NA NA NA NA NA NA NA NA NA NA NA NA NA
1074055 Netherlands EU-NL-001 211466 NA 5D553640-188A-4071-A98E-B03220AA0268 4.771816 51.74459 5000 Netherlands EU Temperate midlatitudes 6 Atlantic mixed forests 80402 NA NA Braun/Blanquet (old) 1983-08-09 NA NA Y NA 0 NA NA NA 0 0 0 1 0 1 C51a NA NA NA 80 NA NA NA NA NA NA NA NA NA NA NA NA NA NA
840021 Germany_vegetweb EU-DE-013 90452 NA 8B6392B0-52B4-44AF-8AF6-BF0E42DE2E35 9.198411 51.73354 10 Germany EU Temperate midlatitudes 6 Western European broadleaf forests 80445 Nordrhein-Westfalen; Brakel; Brakel NA Percentage (%) 1971-06-07 NA NA NA NA 158 154 NA NA 0 0 1 0 0 NA E NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
1749064 Japan AS-JP-002 19203 NA 3801F2EB-9F78-451C-8A46-947EFA7A71C9 135.882990 35.49453 -10 Japan EU Subtrop. with year-round rain 5 Nihonkai evergreen forests 80427 NA NA Presence/Absence NA NA NA NA NA 111 NA NA NA 1 0 0 0 0 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
504008 European Coastal Vegetation Database EU-00-017 3235 NA 12C38BCA-2858-44E9-9949-C8E43E1648ED 26.162149 39.93535 -479776 Turkey EU Subtropics with winter rain 4 Aegean and Western Turkey sclerophyllous and mixed forests 81201 Hisarlik (Truva) 10 Ordinale scale (1-9) 1989-01-01 NA NA NA NA NA NA NA NA 0 0 0 1 0 1 A25d 70 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
1824174 New Zealand AU-NZ-001 8908 NA A19BC0FA-DA5D-46AB-8A19-5FB85FE0BB8C 172.774300 -41.83680 50 New Zealand AU Temperate midlatitudes 6 Richmond temperate forests 10408 NA NA Percentage (%) 1984-02-01 NA NA NA NA 996 1000 140 30 1 0 0 0 0 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
957430 Ireland_nvd EU-IE-001 19047 NA 96109D8E-5D24-48D9-B2BE-186CEFD9F3DD -10.006850 53.38878 1000 Ireland EU Temperate midlatitudes 6 Celtic broadleaf forests 80409 Namanawaun Lough, South west Connemara 4 Braun/Blanquet (new) 1975-01-01 NA NA Y NA 16 23 NA NA NA NA NA NA NA NA ? 90 NA NA 70 20 NA NA NA NA NA NA NA NA NA 50 NA 120 NA
1935621 USA_VegBank NA-US-002 25650 NA E7AEC1B8-DBB2-4F9C-9CB8-CF5F69EF2F77 -112.265690 35.76303 -10 United States N-A Temperate midlatitudes 6 Arizona Mountains forests 50503 NA NA Percentage (%) 2002-06-13 NA NA NA NA 1769 1780 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
79689 Balkan Vegetation Database EU-00-019 13815 NA 9B3561B1-FE72-458A-A627-8EE317A19085 24.640142 42.78998 -10 Bulgaria EU Temperate midlatitudes 6 Rodope montane mixed forests 80435 Central Stara planina Mountain, Balkanec, " Razssadnika" 200 Braun/Blanquet (old) 2004-06-06 NA NA NA NA 1031 1100 360 40 1 0 0 0 0 1 G16a NA 80 40 40 NA NA NA NA NA NA NA NA NA NA NA NA NA NA
1469106 Slovakia_nvd EU-SK-001 640916 NA 154D1EE1-6E8E-494A-A7AA-51E7FA24001E 19.663889 48.44792 -10 Slovak Republic EU Temperate midlatitudes 6 Pannonian mixed forests 80431 Cinobaňa, 0.5 kmEEN, on the left side of the road to Turíčky 100 Domin 1974-07-01 NA NA N N 286 290 135 7 0 0 1 0 0 NA E 95 NA NA NA NA NA NA NA NA NA NA NA NA NA 65 NA NA NA
1820498 New Zealand AU-NZ-001 5232 NA FDD0CBD3-FB3E-4199-A225-A26EF46C4475 168.539400 -44.05830 50 New Zealand AU Temperate midlatitudes 6 Westland temperate forests 10414 NA NA Percentage (%) 1985-02-01 NA NA NA NA 556 580 20 15 1 0 0 0 0 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
1714379 Congo AF-CD-001 102 NA 10D95DA3-6C8A-482F-AD08-9E764B0F4074 24.487510 0.80258 10 Congo, The Democratic Republic AF Tropics with year-round rain 1 Northeastern Congolian lowland forests 30124 Yangambi 10000 Presence/Absence 2012-01-01 Woody plants >= 10 cm dbh N N N 485 475 NA NA 1 0 0 0 0 1 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
929628 Greece_nat2000 EU-GR-005 303631 NA F0B10D7D-9D21-4342-ACF9-B2A45E9F549A 27.180833 36.60444 -425000 Greece EU Subtropics with winter rain 4 Aegean and Western Turkey sclerophyllous and mixed forests 81201 NA 100 Braun/Blanquet (new) 2000-05-04 NA NA NA NA NA 290 360 45 1 0 0 0 0 1 G21 96 47 16 65 NA NA NA NA NA NA 4 NA 2 NA NA NA NA NA

Supplementary Material

ANNEX 1 - Ancillary function - PredExtr

# define ancillary functions
robust.mean <- function(x){mean(x, na.rm=T)}
robust.mode <- function(x){if(any(x!=0)) {
  a <- x[which(x!=0)] #exclude zero (i.e. NAs)
  return(as.numeric(names(sort(table(a), decreasing=T))[1]))} else
    return(NA)}
robust.sd <- function(x){sd(x, na.rm=T)}

#main function
PredExtr <- function(x.shp, myfunction=NA, output=NA, 
                     toextract, typp=c("raster", "shp"), ncores, chunkn=1, chunk.i=NA){
  print("Load Packages")
  require(foreach)
  require(parallel)
  require(doParallel)
  require(raster)
  require(rgeos)
  require(rgdal)
  require(geosphere) 
  require(spatialEco)
  require(sf)
 
  print(paste("Loading", typp, "data :", toextract))
  print(paste("output will be:", output))
  if(typp=="raster"){ mypredictor <- raster(toextract)} else
  mypredictor <- readOGR(toextract)
  header.shp <- readOGR(x.shp)
  crs(mypredictor) <- crs(header.shp) #should be verified beforehand!

  ## Divide in chunks if requested
  if(chunkn>1 & !is.na(chunk.i)){
    print(paste("divide in chunks and run on chunk n.", chunk.i))
    indices <- 1:length(header.shp)
    chunks <- split(indices, sort(indices%%chunkn))
    header.shp <- header.shp[chunks[[chunk.i]],]
    } 

  print("myfunction defined as")
  myfunction <- get(myfunction)
  print(myfunction)
  
  print("go parallel")
  cl <- makeForkCluster(ncores, outfile="" )
  registerDoParallel(cl)

  
  if(typp=="raster"){
    print("start main foreach loop")
    out <- foreach(i=1:length(header.shp), .packages=c('raster'), .combine=rbind) %dopar% { 
    tmp <- raster::extract(mypredictor, header.shp[i,], 
                         buffer=min(10000,  max(250, header.shp@data$lc_ncrt[i])), fun=myfunction) 
  }
  if(!is.na(output)) {write.csv(out, file = output)} else{return(out)}
  } else {
    print("Match sequentially")
    out <- sp::over(x=header.shp, y=mypredictor) 
    i.toassign <- which(is.na(out[,1]))
    toassign <- header.shp[i.toassign,]
    print(paste(length(toassign), "plots not matched directly - seek for NN"))
  if(length(toassign)>0){
    # Crack if multipolygon
    if(any( (st_geometry_type(mypredictor %>% st_as_sf)) == "MULTIPOLYGON")) {
      ## explode multipolygon
      mypredictor.expl <- explode(mypredictor) %>% 
        as_Spatial()
      } else {
        mypredictor.expl <- mypredictor}
    print("start main foreach loop")
    nearest <- foreach(i=1:length(toassign), .packages=c('raster'), .combine=rbind) %dopar% { 
      print(i)
      ## create a subset of geoentities based on a 5° buffer radius around each target plot.
      tmp.buff <- gBuffer(toassign[i,], width=5) 
      tryCatch(
        tmp.mypredictor <- spatialEco::spatial.select(
          x = tmp.buff,
          y = mypredictor.expl,
          distance = 0.1,
          predicate = "intersect"
        ),
        error = function(e) {
          print(paste("Nothing close enough for plot", toassign@data$PlotObservationID[i]))
        }
      )
      # find nearest neighbour  
      nearest.tmp <- tryCatch(tmp.mypredictor@data[geosphere::dist2Line(toassign[i,], tmp.mypredictor)[,"ID"],],
                              error = function(e){
                                ee <- myptedictor@data[1,]
                                ee[1,] <- rep(NA, ncol(mypredictor))
                                }
                              )
      return(nearest.tmp)
    }

    out[i.toassign,] <- nearest
   }
    if(!is.na(output)) {write.csv(out, file = output)} else{return(out)}
  }
  stopCluster(cl)
  }

ANNEX 2 - Ancillary function - ElevationExtract

ElevationExtract <- function(header, output, ncores, chunk.i){
  print("load packages")
  require(tidyverse)
  
  require(rgdal)
  require(sp)
  require(rgeos)
  require(raster)
  require(rworldmap)
  require(elevatr)
  
  require(parallel)
  require(doParallel)
  
  print("Import header and divide in tiles")
  header.shp <- readOGR(header)
  header.tiles <- header.shp@data %>% 
    bind_cols(as.data.frame(header.shp@coords)) %>% 
    rename(PlotObservationID=PltObID, Longitude=coords.x1, Latitude=coords.x2) %>% 
    mutate(lc_ncrt=abs(lc_ncrt)) %>% 
    filter(lc_ncrt <= 50000) %>%
    mutate_at(.vars=vars(Longitude, Latitude), 
              .funs=list(tile=~cut(., breaks = seq(-180,180, by=.2)))) %>%
    mutate(tilenam=factor(paste(Longitude_tile, Latitude_tile)))
    
  
  print("Get continent")
  sPDF <- rworldmap::getMap(resolution="high")
  continent.high <- sPDF[,"continent"]
  crs(continent.high) <- CRS("+init=epsg:4326")
  continent.high@data$continent <- fct_recode(continent.high@data$continent, "South America"="South America and the Caribbean")
  continent.high.merc <- spTransform(continent.high, CRS( "+init=epsg:3857 +proj=merc +a=6378137 +b=6378137 +lat_ts=0.0 +lon_0=0.0 +x_0=0.0 +y_0=0 +k=1.0 +units=m
+nadgrids=@null +no_defs"))

  print("go parallel")
  cl <- makeForkCluster(ncores, outfile="")
  registerDoParallel(ncores)
  
  clusterEvalQ(cl, {
    library(rgdal)
    library(raster)
    library(sp)
    library(elevatr)
    library(dplyr)
  })

  print("create list of tiles still to do")
  myfiles <- list.files(path = output, pattern = "[A-Za-z]*_[0-9]+\\.RData$")
  done <- NULL
  done <- as.numeric(unlist(regmatches(myfiles, gregexpr("[[:digit:]]+", myfiles))))
  todo <- 1:nlevels(header.tiles$tilenam)
  if(length(done)>0) {todo <- todo[-which(todo %in% done)]}
  todo <- sample(todo, replace=F)
  print(paste(length(todo), "tiles to do"))
  
  print("divide in chunks")
  #divide in chunks
  todo.chunks <- split(todo, sort(todo%%10))
  
  print(paste("start main foreach loop on chunk n=", chunk.i))
  print(paste(length(todo.chunks[[chunk.i]]), "to do"))
  foreach(i = todo.chunks[[chunk.i]]) %dopar% {  
    print(paste("doing", i))
    #create sp and project data
    if(nrow(header.tiles %>% 
            filter(tilenam %in% levels(header.tiles$tilenam)[i])) ==0 ) next()
    sp.tile <- SpatialPointsDataFrame(coords=header.tiles %>% 
                                        filter(tilenam %in% levels(header.tiles$tilenam)[i]) %>%
                                        dplyr::select(Longitude, Latitude),
                                      data=header.tiles %>% 
                                        filter(tilenam %in% levels(header.tiles$tilenam)[i]) %>%
                                        dplyr::select(-Longitude, -Latitude),
                                      proj4string = CRS("+init=epsg:4326"))
    sp.tile <- spTransform(sp.tile, CRSobj = CRS("+init=epsg:3857 +proj=merc +a=6378137 +b=6378137 +lat_ts=0.0
                                                 +lon_0=0.0 +x_0=0.0 +y_0=0 +k=1.0 +units=m +nadgrids=@null
                                                 +no_defs ")) #project to mercator
    
    #retrieve dem raster
    tryCatch(raster.tile <- get_elev_raster(sp.tile, z=9, expand=max(sp.tile$lc_ncrt), clip="bbox"),
             error = function(e){e}
    )
    if(!exists("raster.tile")) {
      raster.tile <- NA
      print(paste("tile", i, "doesn't work!, skip to next"))
      save(raster.tile, file = paste(output, "elevation_tile_", i, "failed.RData", sep=""))
      rm(raster.tile)
      } else {
    # clip dem tile with continent shape
    raster.tile <- mask(raster.tile, continent.high.merc)
    
    #extract and summarize elevation data
    elev.tile <- raster::extract(raster.tile, sp.tile, small=T)
    elev.tile.buffer <- raster::extract(raster.tile, sp.tile, 
                                        buffer=sp.tile@data$lc_ncrt, 
                                        small=T)
    tmp <- round(mapply( quantile, 
                         x=elev.tile.buffer,
                         #center=elev.tile,
                         probs=rep(c(0.025, 0.5, 0.975), each=length(elev.tile)), 
                         #lc_ncrt=sp.tile$lc_ncrt, 
                         na.rm=T))
    elev.q95 <- setNames(data.frame(matrix(tmp, ncol = 3, nrow = length(elev.tile.buffer))), 
                         c("Elevation_q2.5", "Elevation_median", "Elevation_q97.5"))
    output.tile <- data.frame(PlotObservationID=sp.tile$PlotObservationID, 
                              elevation=round(elev.tile), 
                              elev.q95, 
                              DEM.res=res(raster.tile)[1]) 
    
    #save output
    save(output.tile, file = paste(output, "elevation_tile_", i, ".RData", sep=""))
    print(paste(i, "done"))
  }}
  stopCluster(cl)
  }

ANNEX 3 - SessionInfo()

sessionInfo()
## R version 3.6.3 (2020-02-29)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 16.04.7 LTS
## 
## Matrix products: default
## BLAS:   /usr/lib/openblas-base/libblas.so.3
## LAPACK: /usr/lib/libopenblasp-r0.2.18.so
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8          LC_NUMERIC=C                 
##  [3] LC_TIME=en_US.UTF-8           LC_COLLATE=en_US.UTF-8       
##  [5] LC_MONETARY=en_US.UTF-8       LC_MESSAGES=en_US.UTF-8      
##  [7] LC_PAPER=en_US.UTF-8          LC_NAME=en_US.UTF-8          
##  [9] LC_ADDRESS=en_US.UTF-8        LC_TELEPHONE=en_US.UTF-8     
## [11] LC_MEASUREMENT=en_US.UTF-8    LC_IDENTIFICATION=en_US.UTF-8
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
##  [1] dggridR_2.0.3       rnaturalearth_0.2.0 sf_0.9-3           
##  [4] elevatr_0.2.0       rworldmap_1.3-6     raster_3.0-7       
##  [7] rgeos_0.5-5         rgdal_1.5-18        sp_1.4-4           
## [10] kableExtra_1.3.1    knitr_1.30          xlsx_0.6.5         
## [13] viridis_0.5.1       viridisLite_0.3.0   forcats_0.5.0      
## [16] stringr_1.4.0       dplyr_1.0.2         purrr_0.3.4        
## [19] readr_1.4.0         tidyr_1.1.2         tibble_3.0.1       
## [22] ggplot2_3.3.0       tidyverse_1.3.0    
## 
## loaded via a namespace (and not attached):
##  [1] httr_1.4.2         maps_3.3.0         jsonlite_1.7.1     dotCall64_1.0-0   
##  [5] modelr_0.1.6       assertthat_0.2.1   highr_0.8          xlsxjars_0.6.1    
##  [9] cellranger_1.1.0   yaml_2.2.1         pillar_1.4.3       backports_1.2.0   
## [13] lattice_0.20-41    glue_1.4.2         digest_0.6.25      rvest_0.3.6       
## [17] colorspace_2.0-0   htmltools_0.5.0    pkgconfig_2.0.3    broom_0.7.0       
## [21] haven_2.3.1        scales_1.1.1       webshot_0.5.2      farver_2.0.3      
## [25] generics_0.1.0     ellipsis_0.3.1     withr_2.3.0        cli_2.2.0         
## [29] magrittr_2.0.1     crayon_1.3.4       readxl_1.3.1       maptools_1.0-2    
## [33] evaluate_0.14      ps_1.4.0           fs_1.5.0           fansi_0.4.1       
## [37] class_7.3-17       xml2_1.3.2         foreign_0.8-76     tools_3.6.3       
## [41] hms_0.5.3          lifecycle_0.2.0    munsell_0.5.0      reprex_0.3.0      
## [45] e1071_1.7-4        compiler_3.6.3     rlang_0.4.9        units_0.6-7       
## [49] classInt_0.4-3     grid_3.6.3         rstudioapi_0.13    spam_2.5-1        
## [53] rmarkdown_2.5      gtable_0.3.0       codetools_0.2-18   DBI_1.1.0         
## [57] R6_2.5.0           gridExtra_2.3      lubridate_1.7.9.2  rworldxtra_1.01   
## [61] KernSmooth_2.23-18 rJava_0.9-13       stringi_1.5.3      Rcpp_1.0.5        
## [65] fields_11.6        vctrs_0.3.5        dbplyr_2.0.0       tidyselect_1.1.0  
## [69] xfun_0.19