Timestamp: Fri Nov 27 14:47:19 2020
Drafted: Francesco Maria Sabatini
Revised: Helge Bruelheide
Version: 1.1
This report documents the construction of the header file for sPlot 3.0. It is based on dataset sPlot_3.0.2, received on 24/07/2019 from Stephan Hennekens.
Changes in version 1.1.
1) Excluded plots from Canada, as recommended by Custodian
2) Filled missing info from most of the ~2000 plots without country information from these datasets.
3) Corrected mismatched sBiomes and ecoregions
knitr::opts_chunk$set(echo = TRUE)
library(tidyverse)
library(viridis)
library(readr)
library(xlsx)
library(knitr)
library(kableExtra)
## Spatial packages
library(rgdal)
library(sp)
library(rgeos)
library(raster)
library(rworldmap)
library(elevatr)
library(sf)
library(rnaturalearth)
library(dggridR)
#save temporary files
write("TMPDIR = /data/sPlot/users/Francesco/_tmp", file=file.path(Sys.getenv('TMPDIR'), '.Renviron'))
write("R_USER = /data/sPlot/users/Francesco/_tmp", file=file.path(Sys.getenv('R_USER'), '.Renviron'))
rasterOptions(tmpdir="/data/sPlot/users/Francesco/_tmp")
Import header data. Clean header data from quotation and double quotation marks from linux terminal.
# escape all double quotation marks. Run in Linux terminal
#sed 's/"/\\"/g' sPlot_3_0_2_header.csv > sPlot_3_0_2_header_test.csv
#more general alternative in case some " are already escaped
##first removing \s before all "s, and then adding \ before all ":
#sed 's/\([^\\]\)"/\1\\\"/g; s/"/\\"/g'
Import cleaned header data.
header0 <- readr::read_delim("../sPlot_data_export/sPlot_3_0_2_header_test.csv", locale = locale(encoding = 'UTF-8'),
delim="\t", col_types=cols(
PlotObservationID = col_double(),
PlotID = col_double(),
`TV2 relevé number` = col_double(),
Country = col_factor(),
`Cover abundance scale` = col_factor(),
`Date of recording` = col_date(format="%d-%m-%Y"),
`Relevé area (m²)` = col_double(),
`Altitude (m)` = col_double(),
`Aspect (°)` = col_double(),
`Slope (°)` = col_double(),
`Cover total (%)` = col_double(),
`Cover tree layer (%)` = col_double(),
`Cover shrub layer (%)` = col_double(),
`Cover herb layer (%)` = col_double(),
`Cover moss layer (%)` = col_double(),
`Cover lichen layer (%)` = col_double(),
`Cover algae layer (%)` = col_double(),
`Cover litter layer (%)` = col_double(),
`Cover open water (%)` = col_double(),
`Cover bare rock (%)` = col_double(),
`Height (highest) trees (m)` = col_double(),
`Height lowest trees (m)` = col_double(),
`Height (highest) shrubs (m)` = col_double(),
`Height lowest shrubs (m)` = col_double(),
`Aver. height (high) herbs (cm)` = col_double(),
`Aver. height lowest herbs (cm)` = col_double(),
`Maximum height herbs (cm)` = col_double(),
`Maximum height cryptogams (mm)` = col_double(),
`Mosses identified (y/n)` = col_factor(),
`Lichens identified (y/n)` = col_factor(),
COMMUNITY = col_character(),
SUBSTRATE = col_character(),
Locality = col_character(),
ORIG_NUM = col_character(),
ALLIAN_REV = col_character(),
REV_AUTHOR = col_character(),
Forest = col_logical(),
Grassland = col_logical(),
Wetland = col_logical(),
`Sparse vegetation` = col_logical(),
Shrubland = col_logical(),
`Plants recorded` = col_factor(),
`Herbs identified (y/n)` = col_factor(),
Naturalness = col_factor(),
EUNIS = col_factor(),
Longitude = col_double(),
Latitude = col_double(),
`Location uncertainty (m)` = col_double(),
Dataset = col_factor(),
GUID = col_character()
)) %>%
rename(Sparse.vegetation=`Sparse vegetation`,
ESY=EUNIS) %>%
dplyr::select(-COMMUNITY, -ALLIAN_REV, -REV_AUTHOR, -SUBSTRATE) %>% #too sparse information to be useful
dplyr::select(-PlotID) #identical to PlotObservationID
The following column names occurred in the header of sPlot v2.1 and are currently missing from the header of v3.0
1. Syntaxon
2. Cover cryptogams (%)
3. Cover bare soil (%)
4. is.forest
5. is.non.forest
6. EVA
7. Biome
8. BiomeID
9. CONTINENT
10. POINT_X
11. POINT_Y
~~ Columns #1, #2, #3, #10, #11 will be dropped. The others will be derived below.
Some canadian plots need to be removed, on indication of Laura Boisvert-Marsh from GIVD NA-CA-004. The plots (and corresponding PlotObservationID
) are:
Fabot01 - 1707776
Fadum01, 02 & 03 - 1707779:1707781
Faers01 - 1707782
Pfe-f-08 - 1707849
Pfe-o-05- 1707854
header0 <- header0 %>%
filter(!PlotObservationID %in% c(1707776, 1707779:1707782, 1707849, 1707854))
There are 2020 plots in the Nile dataset without spatial coordinates. Assign manually with wide (90km) location uncertainty.
header <- header0 %>%
mutate(Latitude=replace(Latitude,
list=(is.na(Latitude) & Dataset=="Egypt Nile delta"),
values=30.917351)) %>%
mutate(Longitude=replace(Longitude,
list=(is.na(Longitude) & Dataset=="Egypt Nile delta"),
values=31.138534)) %>%
mutate(`Location uncertainty (m)`=replace(`Location uncertainty (m)`,
list=(is.na(`Location uncertainty (m)`) & Dataset=="Egypt Nile delta"),
values=-90000))
There are two plots in the Romania Grassland Databse
and ~4442 plots in the Japan
database, whose lat\long are inverted. Correct.
toswap <- c(which(header$Dataset=="Japan" & header$Latitude>90),
which(header$Dataset=="Romania Grassland Database" & header$Longitude>40))
header[toswap, c("Latitude", "Longitude")] <- header[toswap, c("Longitude", "Latitude")]
There are 237563 plots without location uncertainty. As a first approximation, we assign the median of the respective dataset, as a negative value to indicate this is an estimation, rather than a measure.
header <- header %>%
left_join(header %>%
group_by(Dataset) %>%
summarize(loc.uncer.median=median(`Location uncertainty (m)`, na.rm=T)),
by="Dataset") %>%
mutate(`Location uncertainty (m)`=ifelse( is.na(`Location uncertainty (m)` & !is.na(Latitude)),
-abs(loc.uncer.median),
`Location uncertainty (m)`)) %>%
dplyr::select(-loc.uncer.median)
## `summarise()` ungrouping output (override with `.groups` argument)
There are still 91960 plots with no estimation of location uncertainty.
Assign plot size to plots in the Patagonia dataset (input of Ana Cingolani)
header <- header %>%
mutate(`Relevé area (m²)`=ifelse( (Dataset=="Patagonia" & is.na(`Relevé area (m²)`)),
-900, `Relevé area (m²)`))
There are 518 plots from the dataset Germany_gvrd (EU-DE-014) having a location uncertainty equal to 2,147,483 km (!). These plots have a location reported. Replace with a more likely estimate (20 km)
header <- header %>%
mutate(`Location uncertainty (m)`=replace(`Location uncertainty (m)`,
list=`Location uncertainty (m)`==2147483647,
values=20000))
Fill out the columns Forest:Sparse.vegetation
with NAs, where necessary. Create columns is.forest
and is.non.forest
using script developed for sPlot 2.1
~~ I am not assigning plots to Faber-Langedon formation at this stage, as this is only possible for European plots having an ESY classification.
eunis.key <- read.xlsx("../_input/EUNIS_WFT.xlsx", sheetIndex = "Sheet1", endRow = 246) %>%
dplyr::select(EUNIS_code, NATURALNESS:SPARSE_VEG) %>%
mutate(EUNIS_code=as.character(EUNIS_code)) %>%
rename(ESY=EUNIS_code,
Naturalness=NATURALNESS,
Forest=FOREST,
Shrubland=SCRUBLAND,
Grassland=GRASSLAND,
Wetland=WETLAND,
Sparse.vegetation=SPARSE_VEG)#,
header <- header %>% # header.backup %>%
mutate(ESY=as.character(ESY)) %>%
#mutate(ESY=ifelse(ESY=="?", NA, ESY)) %>%
# Systematically assign some databases to forest
mutate(Forest=ifelse(Dataset %in%
c("Turkey Oak_Forest Database", "Turkey Forest Database", "Chile_forest", "Ethiopia"),
T, Forest)) %>%
#fill up with F those rows where at least one column on formation is assigned
rowwise() %>%
mutate(Any=any(Forest, Shrubland, Grassland, Wetland, Sparse.vegetation)) %>%
mutate(Forest=ifelse( (is.na(Forest) & Any), F, Forest)) %>%
mutate(Shrubland=ifelse( (is.na(Shrubland) & Any), F, Shrubland)) %>%
mutate(Grassland=ifelse( (is.na(Grassland) & Any), F, Grassland)) %>%
mutate(Wetland=ifelse( (is.na(Wetland) & Any), F, Wetland)) %>%
mutate(Sparse.vegetation=ifelse( (is.na(Sparse.vegetation) & Any), F, Sparse.vegetation)) %>%
ungroup() %>%
dplyr::select(-Any) %>%
mutate_at(vars(Forest:Shrubland), .funs=list(~.*1)) %>%
mutate(Naturalness=as.numeric(as.character(Naturalness))) %>%
##join and coalesce with eunis.key
left_join(eunis.key %>%
distinct(), by = "ESY") %>%
mutate(
Forest = dplyr:::coalesce(Forest.x, Forest.y),
Shrubland = coalesce(Shrubland.x, Shrubland.y),
Grassland = coalesce(Grassland.x, Grassland.y),
Wetland = coalesce(Wetland.x, Wetland.y),
Sparse.vegetation = coalesce(Sparse.vegetation.x, Sparse.vegetation.y),
Naturalness = coalesce(Naturalness.x, Naturalness.y)
) %>%
dplyr::select(-ends_with(".x"), -ends_with(".y"))
Reduce number of factor levels for the column Plants recorded
header <- header %>%
mutate(`Plants recorded`=fct_recode(`Plants recorded`,
"All vascular plants"="complete vegetation",
"All vascular plants"="Complete vegetation",
"All vascular plants"="all vascular plants",
"All vascular plants"="complete",
"All vascular plants"="Complete vegetation (including non-terricolous tax",
"All vascular plants"="Vascular plants",
"All woody plants"="Woody plants",
"All woody plants"="All woody species",
"Woody plants >= 10 cm dbh"= "trees>=10cm dbh",
"All trees & dominant understory"="All trees & dominant shrubs",
"Woody plants >= 1 cm dbh" = "Plants >= 1 cm dbh"
)) %>%
mutate(`Plants recorded`=factor(`Plants recorded`, exclude = "#N/A"))
Align consortium labels to those in sPlot’s consortium archive
databases <- read_csv("/data/sPlot/users/Francesco/_sPlot_Management/Consortium/Databases.out.csv")
header <- header %>%
mutate(Dataset=fct_recode(Dataset,
"BIOTA_South_Africa" = "BIOTA_South_Africa_3",
"Kriti"="Cyprus_Bergmeier",
"European Boreal Forest Database"="European Boreal Forest Database 1",
"European Boreal Forest Database"="European Boreal Forest Database 2",
"European Coastal Vegetation Database"= "European Coastal Vegetation Database-A",
"Germany_vegetweb"="Germany_vegetweb2",
"Germany_vegetweb"="Germany_vegetweb3",
"Ladakh"="Ladakh_2",
"Netherlands"="Netherlands Military sites",
"NSW Australia" = "NSW Austalia",
)) %>%
left_join(databases %>%
dplyr::select(`GIVD ID`, label) %>%
rename(Dataset=label),
by="Dataset")
Create spatial point dataframe for sPlot data to intersect with spatial layers
header.shp <- header %>%
filter(!is.na(Longitude) | !is.na(Latitude))
header.shp <- SpatialPointsDataFrame(coords= header.shp %>%
dplyr::select(Longitude, Latitude),
proj4string = CRS("+init=epsg:4326"),
data=data.frame(PlotObservationID= header.shp$PlotObservationID,
loc.uncert=header.shp$`Location uncertainty (m)`,
`GIVD ID`=header.shp$`GIVD ID`))
writeOGR(header.shp, dsn="../_derived/", layer="header.shp", driver="ESRI Shapefile", overwrite_layer = T)
Reimport shapefile
header.shp <- readOGR("../_derived/header.shp.shp")
## OGR data source with driver: ESRI Shapefile
## Source: "/data/sPlot/users/Francesco/sPlot3/_derived/header.shp.shp", layer: "header.shp"
## with 1976235 features
## It has 3 fields
header.shp@data <- header.shp@data %>%
rename(PlotObservationID=PltObID,
loc.uncert=lc_ncrt,
`GIVD ID`=GIVD_ID)
crs(header.shp) <- CRS("+init=epsg:4326")
Download and manipulate map of continents
sPDF <- rworldmap::getMap(resolution="coarse")
continent <- sPDF[,"continent"]
crs(continent) <- CRS("+init=epsg:4326")
continent@data[243,"continent"] <- "South America" ## Manually correct missing data
# create clipped version of continent to avoid going beyond 180 lON
coords <- data.frame(x=c(-180,180,180,-180),
y=c(-90,-90,90,90))
bboxc = Polygon(coords)
bboxc = SpatialPolygons(list(Polygons(list(bboxc), ID = "a")), proj4string=crs(continent))
continent_clipped <- gIntersection(continent[-137,], bboxc, byid=T) # polygon 137 gives problems... workaround
## same but high resolution (slower, but works better for plots along coastlines)
sPDF <- rworldmap::getMap(resolution="high")
continent.high <- sPDF[,"continent"]
crs(continent.high) <- CRS("+init=epsg:4326")
continent.high@data$continent <- fct_recode(continent.high@data$continent, "South America"="South America and the Caribbean")
Assign plots to continent
continent.out <- sp::over(x=header.shp, y=continent)
#overlay unassigned points to the high resolution layer of continent
toassign <- header.shp[which(is.na(continent.out$continent)),] #154782 remain to assign
crs(toassign) <- crs(continent)
continent.out2 <- sp::over(x=toassign, y=continent.high)
#merge first and second overlay
continent.out$continent[is.na(continent.out$continent)] <- continent.out2$continent
#correct unassigned points to closest continent
toassign <- header.shp[which(is.na(continent.out$continent)),] #47610 remain to assign
crs(toassign) <- crs(continent)
#go parallel
ncores=8
library(parallel)
library(doParallel)
cl <- makeForkCluster(ncores, outfile="" )
registerDoParallel(cl)
nearestContinent <- foreach(i=1:length(toassign), .packages=c('raster'), .combine=rbind) %dopar% {
nearestContinent.tmp <- geosphere::dist2Line(toassign[i,], continent_clipped)
}
continent.out$continent[which(is.na(continent.out$continent))] <- as.character(continent[-137,]@data[nearestContinent[,"ID"],])
save(continent.out, file = "../_derived/continent.out")
stopCluster(cl)
Reload, manipulate continent and attach to header
load("../_derived/continent.out")
header <- header %>%
left_join(header.shp@data %>%
dplyr::select(PlotObservationID) %>%
bind_cols(continent.out),
by="PlotObservationID") %>%
mutate(CONTINENT=factor(continent,
levels=c("Africa", "Antarctica", "Australia", "Eurasia", "North America", "South America"),
labels=c("AF", "AN", "AU", "EU", "NA", "SA"))) %>%
dplyr::select(-continent)
Summarize
CONTINENT | num.plot |
---|---|
AF | 26803 |
AN | 19 |
AU | 66857 |
EU | 1776535 |
NA | 90170 |
SA | 15844 |
NA | 2451 |
Performed in EVE HPC cluster using function A98_PredictorsExtract.R
. Divided in 99 chunks.
sBiomes <- readOGR("/data/sPlot/users/Francesco/Ancillary_Data/Biomes_sPlot/sBiomes.shp")
crs(sBiomes) <- crs(header.shp)
cl <- makeForkCluster(5, outfile="")
registerDoParallel(cl)
clusterEvalQ(cl, {
library(rgdal)
library(raster)
library(sp)
library(elevatr)
library(dplyr)
})
sBiomes.out <- foreach(i=1:99, .combine=rbind) %dopar% {
source("A98_PredictorsExtract.R")
PredExtr(header.shp, myfunction=NA, output="../_derived/sBiomes/",
toextract=sBiomes, typp="shp", ncores=1, chunkn=99, chunk.i=i)
}
stopCluster(cl)
Reimport output and join to header
sBiome.files <- list.files("../_derived/sBiomes", pattern="sBiomes-[0-9]+.csv", full.names=T)
sBiome.files <- sBiome.files[order(as.numeric(str_extract(sBiome.files, pattern="[0-9]+")))]
sBiomes.out <- do.call(rbind, lapply(sBiome.files, function(x) {read_csv(x)}))
sBiomes.out <- header.shp@data %>%
dplyr::select(PlotObservationID) %>%
bind_cols(sBiomes.out)
header <- header %>%
left_join(sBiomes.out %>%
dplyr::select(PlotObservationID, Name, BiomeID) %>%
dplyr::rename(sBiome=Name, sBiomeID=BiomeID),
by="PlotObservationID")
There are 2487 unassigned plots.
Summarize:sBiome | num.plot |
---|---|
Alpine | 38274 |
Boreal zone | 33833 |
Dry midlatitudes | 70676 |
Dry tropics and subtropics | 40750 |
Polar and subpolar zone | 8199 |
Subtropics with winter rain | 191247 |
Subtrop. with year-round rain | 75181 |
Temperate midlatitudes | 1491155 |
Tropics with summer rain | 15769 |
Tropics with year-round rain | 11108 |
NA | 2487 |
Extract ecoregion name and ID from Ecoregions of the World. Olson et al. 2001 (BioScience).
Computation was performed in EVE HPC cluster using function A98_PredictorsExtract.R
. Divided in 99 chunks.
ecoreg <- readOGR("../sPlot/_Ancillary/official", layer="wwf_terr_ecos")
ecoreg@data <- ecoreg@data %>%
dplyr::select(OBJECTID, ECO_NAME, REALM, BIOME, ECO_NUM, ECO_ID, eco_code)
cl <- makeForkCluster(5, outfile="")
registerDoParallel(cl)
clusterEvalQ(cl, {
library(rgdal)
library(raster)
library(sp)
library(elevatr)
library(dplyr)
})
ecoreg.out <- foreach(i=1:99, .combine=rbind) %dopar% {
source("A98_PredictorsExtract.R")
PredExtr(header.shp, myfunction=NA, output="../_derived/wwf_ecoregions/",
toextract=ecoreg, typp="shp", ncores=1, chunkn=99, chunk.i=i)
}
stopCluster(cl)
Reimport output and join to header
ecoreg.files <- list.files("../_derived/wwf_ecoregions/", pattern="wwf_terr_ecos-[0-9]+.csv", full.names=T)
ecoreg.files <- ecoreg.files[order(as.numeric(str_extract(ecoreg.files, pattern="[0-9]+")))]
ecoreg.out <- do.call(rbind, lapply(ecoreg.files, function(x) {read_csv(x,
col_types=cols(
.default = col_double(),
ECO_NAME = col_character(),
REALM = col_character(),
G200_REGIO = col_character(),
eco_code = col_character()))}))
ecoreg.out <- header.shp@data %>%
dplyr::select(PlotObservationID) %>%
bind_cols(ecoreg.out)
header <- header %>%
left_join(ecoreg.out %>%
dplyr::select(PlotObservationID, ECO_NAME, ECO_ID) %>%
dplyr::rename(Ecoregion=ECO_NAME, EcoregionID=ECO_ID),
by="PlotObservationID")
There are 0 unassigned plots.
Summarize:Ecoregion | num.plot |
---|---|
Atlantic mixed forests | 350272 |
Western European broadleaf forests | 254397 |
Baltic mixed forests | 186554 |
Central European mixed forests | 126331 |
Pannonian mixed forests | 96427 |
Alps conifer and mixed forests | 74847 |
Carpathian montane forests | 68489 |
Celtic broadleaf forests | 62939 |
Sarmatic mixed forests | 36192 |
Northeastern Spain and Southern France Mediterranean forests | 33435 |
Taiheiyo evergreen forests | 31108 |
English Lowlands beech forests | 30721 |
Balkan mixed forests | 29995 |
Italian sclerophyllous and semi-deciduous forests | 27766 |
Cantabrian mixed forests | 25972 |
Pyrenees conifer and mixed forests | 24558 |
Tyrrhenian-Adriatic Sclerophyllous and mixed forests | 20071 |
Iberian sclerophyllous and semi-deciduous forests | 18179 |
East European forest steppe | 16728 |
Great Basin shrub steppe | 16170 |
Eastern Australian temperate forests | 14827 |
Dinaric Mountains mixed forests | 14746 |
Scandinavian and Russian taiga | 14456 |
Nihonkai montane deciduous forests | 14131 |
Caspian lowland desert | 14050 |
Rodope montane mixed forests | 13999 |
Pontic steppe | 13703 |
Southeast Australia temperate savanna | 12439 |
Colorado Plateau shrublands | 10964 |
North Atlantic moist mixed forests | 10593 |
Extract elevation for each plot. Loops over tiles of 1 x 1°, projects to mercator, and extract elevation for plot coordinates, as well as 2.5, 50, and 97.5 quantiles for a buffer area having a radius equal to the location uncertainty of each plot (but only if location uncertainty < 50 km). DEM derive from package elevatr, which uses the Terrain Tiles on Amazon Web Services. Resolutions of DEM rasters vary by region. I set a zoom factor z=10, which corresponds to ~ 75-150 m. Sources are: SRTM, data.gov.at in Austria, NRCAN in Canada, SRTM, NED/3DEP 1/3 arcsec, data.gov.uk in United Kingdom, INEGI in Mexico, ArcticDEM in latitudes above 60°, LINZ in New Zealand, Kartverket in Norway, as described here.
Split data into tiles of 1 x 1 degrees, and create sp::SpatialPointsDataFrame
files. Only for plots having a location uncertainty < 50 km.
header.tiles <- header %>%
dplyr::select(PlotObservationID, Dataset, Longitude, Latitude, `Location uncertainty (m)`) %>%
mutate(`Location uncertainty (m)`=abs(`Location uncertainty (m)`)) %>%
filter(`Location uncertainty (m)`<= 50000) %>%
mutate_at(.vars=vars(Longitude, Latitude),
.funs=list(tile=~cut(., breaks = seq(-180,180, by=.2)))) %>%
filter(!is.na(Longitude_tile) & !is.na(Latitude_tile) ) %>%
mutate(tilenam=factor(paste(Longitude_tile, Latitude_tile)))
There are 1582245 plots out of 1978679 plots with Location uncertainty <= 50km (or absent). The total number of tiles is 30123.
Performed in EVE HPC cluster using function A97_ElevationExtract.R
. Divided in 99 chunks.
cl <- makeForkCluster(5, outfile="")
registerDoParallel(cl)
clusterEvalQ(cl, {
library(rgdal)
library(raster)
library(sp)
library(elevatr)
library(dplyr)})
# Divided in 99 chunks
elevation.out <- foreach(i=1:99, .combine=rbind) %dopar% {
source("A97_ElevationExtract.R")
ElevationExtract(header.shp, output, ncores=1, chunk.i=1)}
stopCluster(cl)
For those tiles that failed, extract elevation of remaining plots one by one
#create list of tiles for which dem could not be extracted
myfiles <- list.files("../_derived/elevatr/")
failed <- list.files("../_derived/elevatr/", pattern = "[A-Za-z]*_[0-9]+failed\\.RData$")
failed <- as.numeric(unlist(regmatches(failed, gregexpr("[[:digit:]]+", failed))))
#create SpatialPointsDataFrame
sp.tile0 <- SpatialPointsDataFrame(coords=header.tiles %>%
filter(tilenam %in% levels(header.tiles$tilenam)[failed]) %>%
dplyr::select(Longitude, Latitude),
data=header.tiles %>%
filter(tilenam %in% levels(header.tiles$tilenam)[failed]) %>%
dplyr::select(-Longitude, -Latitude),
proj4string = CRS("+init=epsg:4326"))
sp.tile0 <- spTransform(sp.tile0, CRSobj = CRS("+init=epsg:3857 +proj=merc +a=6378137 +b=6378137 +lat_ts=0.0
+lon_0=0.0 +x_0=0.0 +y_0=0 +k=1.0 +units=m +nadgrids=@null
+no_defs ")) #project to mercator
output.tile <- data.frame(NULL)
cl <- makeForkCluster(5, outfile="")
registerDoParallel(cl)
clusterEvalQ(cl, {
library(rgdal)
library(raster)
library(sp)
library(elevatr)
library(dplyr)})
#Loop over all plots
elevation.failed <- foreach(i=1:nrow(sp.tile0), .packages=c('raster'), .combine=rbind) %dopar% {
sp.tile <- sp.tile0[i,]
tryCatch(raster.tile <- get_elev_raster(sp.tile, z=10,
expand=max(sp.tile$`Location uncertainty (m)`)),
error = function(e){
print(paste("could not retrieve DEM for", sp.tile$PlotObservationID))}
)
if(!exists("raster.tile")) {
output.tile <- data.frame(PlotObservationID==sp.tile$PlotObservationID,
elevation=NA,
Elevation_q2.5=NA,
Elevation_median=NA,
Elevation_q97.5=NA,
DEM.res=NA)
return(output.tile)
} else {
# clip dem tile with continent shape
raster.tile <- mask(raster.tile, continent.high.merc)
#extract and summarize elevation data
elev.tile <- raster::extract(raster.tile, sp.tile, small=T)
elev.tile.buffer <- raster::extract(raster.tile, sp.tile,
buffer=sp.tile$`Location uncertainty (m)`, small=T)
elev.q95 <- t(round(mapply( quantile,
x=elev.tile.buffer,
probs=rep(c(0.025, 0.5, 0.975), each=length(elev.tile)), na.rm=T)))
output.tile <- data.frame(PlotObservationID=sp.tile$PlotObservationID,
elevation=round(elev.tile),
elev.q95,
DEM.res=res(raster.tile)[1]) %>%
rename(Elevation_q2.5=X2.5., Elevation_median=X50., Elevation_q97.5=X97.5.)
return(output.tile)
}
}
stopCluster(cl)
save(elevation.failed, file = "../_derived/elevatr/elevation_missing.RData")
Compose tiles into a single output, and export
myfiles <- list.files("../_derived/elevatr/", pattern = "elevation_tile_[0-9]+\\.RData$", full.names = T)
#create empty data.frame
elevation.out <- matrix(NA, nrow=nrow(header.tiles), ncol=6)
elevation.out <- as.data.frame(elevation.out)
colnames(elevation.out) <- c("PlotObservationID", "elevation", "Elevation_q2.5", "Elevation_median", "Elevation_q97.5","DEM.res")
elevation.out$PlotObservationID <- header.tiles$PlotObservationID
tmp <- NULL
for(i in 1:length(myfiles)){
load(myfiles[i])
#attach results to empty data.frame
tmp <- bind_rows(tmp, output.tile)
if(i %in% seq(5000, length(myfiles), by=5000)){
mymatch <- base::match(x=tmp$PlotObservationID, table=elevation.out$PlotObservationID)
mymatch <- mymatch[!is.na(mymatch)]
elevation.out[mymatch,] <- tmp
tmp <- NULL
print(paste("Attached first", i, "files"))
}
if(i %in% seq(1,length(myfiles), by=250)){print(i)}
}
load(file = "../_derived/elevatr/elevation_missing.RData")
mymatch <- base::match(x=elevation.failed$PlotObservationID, table=elevation.out$PlotObservationID)
mymatch <- mymatch[!is.na(mymatch)]
elevation.out[mymatch,] <- elevation.failed
write_csv(elevation.out, path ="../_derived/elevatr/elevation.out.csv")
Reimport output, attach to header and check
elevation.out <- read_csv("../_derived/elevatr/elevation.out.csv")
PlotObservationID | elevation | Elevation_q2.5 | Elevation_median | Elevation_q97.5 | DEM.res |
---|---|---|---|---|---|
647636 | 1708 | 1708 | 1708 | 1708 | 153 |
562052 | 323 | 323 | 323 | 323 | 153 |
1201030 | -1 | -3 | -2 | 0 | 153 |
1237423 | 18 | -1 | 20 | 69 | 153 |
1569408 | 686 | 686 | 686 | 686 | 153 |
1056489 | 0 | 0 | 0 | 0 | 153 |
1063585 | 0 | -2 | 1 | 5 | 153 |
1533345 | 378 | 86 | 761 | 1485 | 153 |
1459631 | 2027 | 2027 | 2027 | 2027 | 153 |
44137 | 2048 | 2048 | 2048 | 2048 | 153 |
summary(elevation.out %>%
dplyr::select(-PlotObservationID, -elevation))
## Elevation_q2.5 Elevation_median Elevation_q97.5 DEM.res
## Min. :-1018.0 Min. :-474.0 Min. :-474 Min. : 76.4
## 1st Qu.: 17.0 1st Qu.: 21.0 1st Qu.: 26 1st Qu.:153.0
## Median : 163.0 Median : 178.0 Median : 194 Median :153.0
## Mean : 441.6 Mean : 469.4 Mean : 506 Mean :153.1
## 3rd Qu.: 593.0 3rd Qu.: 644.0 3rd Qu.: 698 3rd Qu.:153.0
## Max. : 6155.0 Max. :6155.0 Max. :6155 Max. :306.0
## NA's :104818 NA's :104818 NA's :104818 NA's :3082
There are 104818 plots without elevation info, corresponding to 5.3% of total.
There are 51389 plots with elevation below sea level.
Join elevation data (only median)
header <- header %>%
left_join(elevation.out %>%
dplyr::select(PlotObservationID, Elevation_median) %>%
rename(elevation_dem=Elevation_median) %>%
distinct(PlotObservationID, .keep_all=T),
by="PlotObservationID")
Summary and check
GIVD ID | Dataset | num.plot |
---|---|---|
AS-00-001 | Korea | 1 |
EU-DK-002 | Denmark | 1197 |
EU-RO-008 | Romania Grassland Database | 6 |
EU-DE-001 | Germany_vegmv | 858 |
EU-DE-013 | Germany_vegetweb | 33 |
EU-DE-014 | Germany_gvrd | 35 |
EU-DE-020 | GrassVeg.DE | 27 |
EU-00-004c | Spain_sivim_grasslands | 1 |
AU-AU-003 | NSW Australia | 2 |
AS-JP-002 | Japan | 220 |
Create Scatterplot between measured elevation in the field, and elevation derived from DEM
#join measured and derived elevation
mydata <- header %>%
dplyr::select(PlotObservationID, `Altitude (m)`, elevation_dem) %>%
filter(!is.na(`Altitude (m)`) & !is.na(elevation_dem)) %>%
rename(elevation_measured=`Altitude (m)`)
ggplot(data=mydata) +
geom_point(aes(x=elevation_measured, y=elevation_dem), alpha=1/10, cex=0.8) +
theme_bw() +
geom_abline(slope=0, intercept=0, col=2, lty=2) +
geom_abline(slope=1, intercept=1, col="Dark green")
There is a minor number of plots (4064), not assigned to any countries. Fix that.
countries <- readOGR("../../Ancillary_Data/naturalearth/ne_110m_admin_0_countries.shp")
## OGR data source with driver: ESRI Shapefile
## Source: "/data/sPlot/users/Francesco/Ancillary_Data/naturalearth/ne_110m_admin_0_countries.shp", layer: "ne_110m_admin_0_countries"
## with 177 features
## It has 94 fields
## Integer64 fields read as strings: POP_EST NE_ID
crs(countries) <- crs(header.shp)
tmp.sel <- header %>%
filter(is.na(Country)) %>%
pull(PlotObservationID)
header.shp.nocontry <- header.shp[which(header.shp$PlotObservationID %in% tmp.sel),]
countries.out <- over(header.shp.nocontry, countries)
header$Country <- as.character(header$Country)
header$Country[tmp.sel] <- countries.out$NAME
## Warning in header$Country[tmp.sel] <- countries.out$NAME: number of items to
## replace is not a multiple of replacement length
header$Country <- as.factor(header$Country)
Plots without country info are now only 4019.
Update header.shp
header.shp@data <- header.shp@data %>%
left_join(header %>%
dplyr::select(PlotObservationID, sBiome, CONTINENT,
Ecoregion, GIVD.ID=`GIVD ID`),
by="PlotObservationID")
header.sf <- header.shp %>%
st_as_sf() %>%
st_transform(crs = "+proj=eck4")
Basic Map of the world in Eckert projection
countries <- ne_countries(returnclass = "sf") %>%
st_transform(crs = "+proj=eck4") %>%
st_geometry()
graticules <- ne_download(type = "graticules_15", category = "physical",
returnclass = "sf") %>%
st_transform(crs = "+proj=eck4") %>%
st_geometry()
## OGR data source with driver: ESRI Shapefile
## Source: "/data/sPlot/users/Francesco/_tmp/RtmpCjhJeY", layer: "ne_110m_graticules_15"
## with 35 features
## It has 5 fields
## Integer64 fields read as strings: degrees scalerank
bb <- ne_download(type = "wgs84_bounding_box", category = "physical",
returnclass = "sf") %>%
st_transform(crs = "+proj=eck4") %>%
st_geometry()
## OGR data source with driver: ESRI Shapefile
## Source: "/data/sPlot/users/Francesco/_tmp/RtmpCjhJeY", layer: "ne_110m_wgs84_bounding_box"
## with 1 features
## It has 2 fields
w3a <- ggplot() +
geom_sf(data = bb, col = "grey20", fill = "white") +
geom_sf(data = graticules, col = "grey20", lwd = 0.1) +
geom_sf(data = countries, fill = "grey90", col = NA, lwd = 0.3) +
coord_sf(crs = "+proj=eck4") +
theme_minimal() +
theme(axis.text = element_blank(),
legend.title=element_text(size=12),
legend.text=element_text(size=12),
legend.background = element_rect(size=0.1, linetype="solid", colour = 1),
legend.key.height = unit(1.1, "cm"),
legend.key.width = unit(1.1, "cm")) +
scale_fill_viridis()
Graph of plot density (hexagons)
header2 <- header %>%
filter(!is.na(Longitude) | !is.na(Latitude)) %>%
dplyr::select(PlotObservationID, Latitude, Longitude, `GIVD ID`) %>%
filter(!(abs(Longitude) >171 & abs(Latitude>70)))
dggs <- dgconstruct(spacing=300, metric=T, resround='down')
## Resolution: 6, Area (km^2): 69967.8493448681, Spacing (km): 261.246386348549, CLS (km): 298.479323187169
#Get the corresponding grid cells for each earthquake epicenter (lat-long pair)
header2$cell <- dgGEO_to_SEQNUM(dggs, header2$Longitude, header2$Latitude)$seqnum
#Calculate number of plots for each cell
header.out <- header2 %>%
group_by(cell) %>%
summarise(value.out=log(n(), 10))
#Get the grid cell boundaries for cells
grid <- dgcellstogrid(dggs, header.out$cell, frame=F) %>%
st_as_sf() %>%
mutate(cell = header.out$cell) %>%
mutate(value.out=header.out$value.out) %>%
st_transform("+proj=eck4") %>%
st_wrap_dateline(options = c("WRAPDATELINE=YES"))
## plotting
legpos <- c(0.160, .24)
(w3 <- w3a +
geom_sf(data=grid, aes(fill=value.out),lwd=0, alpha=0.9) +
geom_sf(data = countries, col = "grey10", fill=NA, lwd = 0.3) +
scale_fill_viridis(
name="# plots", breaks=0:5, labels = c("1", "10", "100",
"1,000", "10,000", "100,000"), option="viridis" ) +
#labs(fill="# plots") +
theme(legend.position = legpos +c(-0.06, 0.25))
)
ggsave(filename="../_pics/PlotDensityLog10_vir.png", width = 15, height = 7, units="in", dpi=300, plot=w3)
Graph of plot location by Dataset
(w4 <- w3a +
geom_sf(data=header.sf %>%
mutate(GIVD.ID=fct_shuffle(GIVD.ID)), aes(col=factor(GIVD.ID)), pch=16, size=0.8, alpha=0.6) +
geom_sf(data = countries, col = "grey10", fill=NA, lwd = 0.3) +
theme(legend.position = "none"))
ggsave(filename="../_pics/PlotDistrib_Dark2_shuffle1984.png", width = 15, height = 7, units="in", dpi=300, plot=w4) ## takes ~40' to render
Double check attribution to continents, Biomes and Ecoregions. Do it only on a subset of plots
tmp.sel <- header %>%
group_by(sBiome) %>%
sample_n(1000) %>%
pull(PlotObservationID)
#sBiomes
(w5 <- w3a +
geom_sf(data=header.sf %>%
filter(PlotObservationID %in% tmp.sel),
aes(col=factor(sBiome)), pch=16, size=0.8, alpha=0.6) +
geom_sf(data = countries, col = "grey10", fill=NA, lwd = 0.3))
#continent
tmp.sel <- header %>%
filter(CONTINENT!="AN") %>%
group_by(CONTINENT) %>%
sample_n(1000) %>%
pull(PlotObservationID)
(w6 <- w3a +
geom_sf(data=header.sf %>%
filter(PlotObservationID %in% tmp.sel),
aes(col=factor(CONTINENT)), pch=16, size=0.8, alpha=0.6) +
geom_sf(data = countries, col = "grey10", fill=NA, lwd = 0.3))
#Ecoregion - Only 10 random ecoregions tested
tmp.sel <- header %>%
filter(Ecoregion %in% sample(unique(header$Ecoregion), 10)) %>%
pull(PlotObservationID)
(w7 <- w3a +
geom_sf(data=header.sf %>%
filter(PlotObservationID %in% tmp.sel) %>%
mutate(Ecoregion=factor(Ecoregion)),
aes(col=factor(Ecoregion)), pch=16, size=0.8, alpha=0.6) +
geom_sf(data = countries, col = "grey10", fill=NA, lwd = 0.3) +
theme(legend.position="bottom"))
#check
nrow(header)==nrow(header0)
## [1] TRUE
header <- header %>%
dplyr::select(
#Metadata
PlotObservationID, Dataset, "GIVD ID", "TV2 relevé number", "ORIG_NUM", "GUID",
#Geographic info
Longitude:"Location uncertainty (m)", Country, CONTINENT, sBiome, sBiomeID, Ecoregion, EcoregionID, Locality,
#Methodological info
"Relevé area (m²)", "Cover abundance scale", "Date of recording", "Plants recorded",
"Herbs identified (y/n)","Mosses identified (y/n)","Lichens identified (y/n)",
#Topographical
elevation_dem, "Altitude (m)", "Aspect (°)", "Slope (°)",
#Vegetation type
Forest:Naturalness, ESY,
#Vegetation structure
"Cover total (%)":"Maximum height cryptogams (mm)")
save(header, file = "../_output/header_sPlot3.0.RData")
PlotObservationID | Dataset | GIVD ID | TV2 relevé number | ORIG_NUM | GUID | Longitude | Latitude | Location uncertainty (m) | Country | CONTINENT | sBiome | sBiomeID | Ecoregion | EcoregionID | Locality | Relevé area (m²) | Cover abundance scale | Date of recording | Plants recorded | Herbs identified (y/n) | Mosses identified (y/n) | Lichens identified (y/n) | elevation_dem | Altitude (m) | Aspect (°) | Slope (°) | Forest | Shrubland | Grassland | Wetland | Sparse.vegetation | Naturalness | ESY | Cover total (%) | Cover tree layer (%) | Cover shrub layer (%) | Cover herb layer (%) | Cover moss layer (%) | Cover lichen layer (%) | Cover algae layer (%) | Cover litter layer (%) | Cover open water (%) | Cover bare rock (%) | Height (highest) trees (m) | Height lowest trees (m) | Height (highest) shrubs (m) | Height lowest shrubs (m) | Aver. height (high) herbs (cm) | Aver. height lowest herbs (cm) | Maximum height herbs (cm) | Maximum height cryptogams (mm) |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1767241 | Japan | AS-JP-002 | 37380 | NA | 13F424E1-083D-4BE2-A657-9A7C8748BD21 | 139.000000 | 36.00000 | 1000000 | Japan | EU | Temperate midlatitudes | 6 | Taiheiyo evergreen forests | 80440 | NA | NA | Presence/Absence | NA | NA | NA | NA | NA | NA | NA | NA | NA | 1 | 0 | 0 | 0 | 0 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA |
1987944 | WA_Kwongan-associates_P Macintyre | AU-AU-XXX | 5265 | NA | 45ECC662-598B-4FEE-9E6C-D7BCA28FCD3D | 119.051613 | -30.89685 | NA | Australia | AU | Dry tropics and subtropics | 3 | Coolgardie woodlands | 11201 | NA | NA | Presence/Absence | 2012-01-01 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA |
661720 | France_SOPHY | EU-FR-003 | 106706 | NA | 3538F837-E531-4439-B5EA-8C75623929B0 | 2.551960 | 46.56450 | 685000 | France | EU | Temperate midlatitudes | 6 | Western European broadleaf forests | 80445 | PONTCHARDON-SUR-TOUQUES | NA | Braun/Blanquet (old) | 1983-01-01 | NA | NA | NA | NA | NA | NA | NA | NA | 0 | 0 | 1 | 0 | 0 | 2 | E54 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA |
1799502 | Japan | AS-JP-002 | 69641 | NA | 6CD68FEE-5B63-4262-9B42-94C92CB22948 | 141.870810 | 42.73374 | -10 | Japan | EU | Temperate midlatitudes | 6 | Hokkaido deciduous forests | 80423 | NA | 400.0 | Presence/Absence | 1983-01-01 | NA | NA | NA | NA | 16 | 17 | NA | 0 | 1 | 0 | 0 | 0 | 0 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA |
841876 | Germany_vegetweb | EU-DE-013 | 92308 | NA | DE36DF31-F6A3-4100-AA34-E2F76E895DA9 | 7.495251 | 51.29074 | 10 | Germany | EU | Temperate midlatitudes | 6 | Western European broadleaf forests | 80445 | Nordrhein-Westfalen; Hagen; Hagen-Dahl | NA | Percentage (%) | 1983-08-08 | NA | NA | NA | NA | 362 | 359 | 135 | 2 | 0 | 0 | 1 | 0 | 0 | 2 | E22 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA |
946645 | Ireland_nvd | EU-IE-001 | 7187 | NA | 7BC25DB4-8736-4D2A-98B4-F9760616347B | -9.478401 | 52.77862 | 100 | Ireland | EU | Temperate midlatitudes | 6 | Celtic broadleaf forests | 80409 | Lough Donnell, Quilty - Clare | 16.0 | Percentage (%) | 1996-01-01 | NA | NA | NA | NA | 5 | -4 | NA | NA | 0 | 0 | 1 | 0 | 0 | 2 | E34b | 100 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA |
263296 | Czechia_nvd | EU-CZ-001 | 404332 | NA | 96707E21-A0BA-49BB-BB01-5A022C24D7D6 | 15.158333 | 49.04444 | -100 | Czech Republic | EU | Temperate midlatitudes | 6 | Western European broadleaf forests | 80445 | Eisenbahnstation Hůrky, über der Strasse | 15.0 | Braun/Blanquet (old) | 1978-07-17 | NA | NA | Y | Y | 656 | 650 | NA | NA | 0 | 0 | 1 | 0 | 0 | 2 | E34a | 98 | NA | NA | 98 | 3 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA |
1123456 | Netherlands | EU-NL-001 | 522149 | NA | 7EFC02FB-D0B1-41B6-AB98-BF27E5BFAA18 | 6.308213 | 52.80665 | 10 | Netherlands | EU | Temperate midlatitudes | 6 | Atlantic mixed forests | 80402 | NA | NA | Tansley | 1988-05-14 | NA | NA | NA | NA | 3 | 2 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA |
1343842 | Romania Grassland Database | EU-RO-008 | 16724 | NA | BEB1C4B0-89A2-489C-8A4A-A0A7A1CEFE1B | 25.929300 | 44.40551 | 3000 | Romania | EU | Temperate midlatitudes | 6 | Central European mixed forests | 80412 | Imprejurimile Bucurestiului, Domnesti | 100.0 | Braun/Blanquet (old) | 1967-07-15 | NA | NA | NA | NA | 87 | NA | NA | NA | 0 | 0 | 1 | 0 | 0 | 3 | IE51 | 80 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA |
563449 | France_SOPHY | EU-FR-003 | 4722 | NA | F8504939-991D-4A68-AB8D-FC1941BAEDF8 | 7.247629 | 47.76750 | -10 | France | EU | Temperate midlatitudes | 6 | Western European broadleaf forests | 80445 | REININGUE, PLATEAU AU NE ETANG | NA | Braun/Blanquet (old) | 1983-01-01 | NA | NA | NA | NA | 261 | 266 | NA | NA | 0 | 0 | 1 | 0 | 0 | 3 | IE51 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA |
1350639 | Romania Grassland Database | EU-RO-008 | 23529 | NA | 532172E5-8F57-4FA2-8ED3-BC2A626027A8 | 24.969260 | 45.84361 | 375000 | Romania | EU | Temperate midlatitudes | 6 | Carpathian montane forests | 80504 | jud. Galata, Padurii Balta | 50.0 | Braun/Blanquet (old) | NA | NA | NA | NA | NA | NA | NA | NA | NA | 0 | 0 | 1 | 0 | 0 | 3 | IE51 | 60 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA |
880203 | Germany_vegmv | EU-DE-001 | 20678 | NA | 62292442-9338-43D5-A858-A7914928A982 | 11.123563 | 53.87341 | 3905 | Germany | EU | Temperate midlatitudes | 6 | Baltic mixed forests | 80405 | Mummendorf | 4.0 | Braun/Blanquet (old) | 1976-01-01 | NA | NA | N | N | 30 | 23 | NA | NA | 0 | 0 | 0 | 1 | 0 | 1 | D23a | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA |
822841 | Germany_gvrd | EU-DE-014 | 96062 | NA | D382EEB7-D7EB-4DEC-8A0D-41040B5FB58F | 7.660070 | 52.05875 | -10 | Germany | EU | Temperate midlatitudes | 6 | Atlantic mixed forests | 80402 | Nordrhein-Westfalen | 6.0 | Reichelt&Wilm.73_long | 1992-01-01 | NA | NA | NA | NA | 51 | 42 | NA | NA | 0 | 0 | 1 | 0 | 0 | 2 | E56 | NA | NA | NA | 85 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | 1 | NA | NA | NA |
318884 | Czechia_nvd | EU-CZ-001 | 488404 | NA | C2425597-1AE6-4884-A105-BCC489B99EA8 | 16.975833 | 48.79250 | -100 | Czech Republic | EU | Temperate midlatitudes | 6 | Pannonian mixed forests | 80431 | Hrušky (Břeclav), intravilán obce | 2.0 | Braun/Blanquet (new) | 2000-05-15 | NA | NA | NA | NA | 173 | 175 | NA | NA | 0 | 0 | 1 | 0 | 0 | 3 | IE51 | 50 | NA | NA | 50 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA |
1114255 | Netherlands | EU-NL-001 | 453854 | NA | FA959E10-757B-4500-B1DA-DC85A11F9AF8 | 5.603420 | 52.24926 | 180000 | Netherlands | EU | Temperate midlatitudes | 6 | Atlantic mixed forests | 80402 | NA | 4.0 | Braun/Blanquet (new) | 1999-09-29 | NA | NA | NA | NA | NA | NA | NA | 1 | 0 | 0 | 0 | 1 | 0 | 1 | D12 | 50 | NA | NA | 40 | 15 | NA | NA | NA | NA | NA | NA | NA | NA | NA | 10 | NA | NA | NA |
1347141 | Romania Grassland Database | EU-RO-008 | 20031 | NA | 43AE5B61-5F40-4C2B-B14E-B8F2340EEF0F | 23.058810 | 46.66792 | 5000 | Romania | EU | Temperate midlatitudes | 6 | Carpathian montane forests | 80504 | Barajul Fintinele, Ghiduri | NA | Braun/Blanquet (old) | 1973-01-01 | NA | NA | NA | Y | 1121 | NA | 315 | NA | NA | NA | NA | NA | NA | NA | ? | 90 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA |
797058 | Germany_gvrd | EU-DE-014 | 63155 | NA | A6B33E39-BB95-43A0-9494-62AAC434D9D6 | 12.124970 | 50.32502 | -10 | Germany | EU | Temperate midlatitudes | 6 | Western European broadleaf forests | 80445 | 1 km sö. Neutschau | 27.0 | Braun/Blanquet (old) | 1988-01-01 | NA | NA | NA | NA | 600 | 545 | NA | NA | 0 | 0 | 1 | 0 | 0 | 2 | E22 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA |
150505 | Britain_nvcd | EU-GB-001 | 498 | NA | 3F0096FE-29C2-48C0-AF69-645EF5EB7AE4 | -9.000000 | 53.40000 | 620000 | United Kingdom | EU | Temperate midlatitudes | 6 | Celtic broadleaf forests | 80409 | NA | NA | Domin | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | ? | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA |
20033 | Austria_VINCA | EU-AT-001 | 57715 | NA | 073142BB-329F-4C55-9759-7F54B99F0142 | 14.140190 | 47.59290 | 290000 | Austria | EU | Temperate midlatitudes | 6 | Alps conifer and mixed forests | 80501 | Gampberg-Nenzing | NA | Braun-Blanquet (old) | NA | NA | NA | Y | NA | NA | 1430 | 113 | 85 | 1 | 0 | 0 | 0 | 0 | 1 | G31a | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA |
356250 | Denmark | EU-DK-002 | 26416 | NA | 2478228F-04DA-4D3A-B54C-23FB44DB2B5D | 8.590641 | 57.11473 | 15 | Denmark | EU | Temperate midlatitudes | 6 | Baltic mixed forests | 80405 | NA | 78.5 | Presence/Absence | 2006-01-01 | NA | NA | NA | NA | NA | 16 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA |
PredExtr
# define ancillary functions
robust.mean <- function(x){mean(x, na.rm=T)}
robust.mode <- function(x){if(any(x!=0)) {
a <- x[which(x!=0)] #exclude zero (i.e. NAs)
return(as.numeric(names(sort(table(a), decreasing=T))[1]))} else
return(NA)}
robust.sd <- function(x){sd(x, na.rm=T)}
#main function
PredExtr <- function(x.shp, myfunction=NA, output=NA,
toextract, typp=c("raster", "shp"), ncores, chunkn=1, chunk.i=NA){
print("Load Packages")
require(foreach)
require(parallel)
require(doParallel)
require(raster)
require(rgeos)
require(rgdal)
require(geosphere)
print(paste("Loading", typp, "data :", toextract))
print(paste("output will be:", output))
if(typp=="raster"){ mypredictor <- raster(toextract)} else
mypredictor <- readOGR(toextract)
colnames(mypredictor@data)
header.shp <-readOGR(x.shp)
crs(mypredictor) <- crs(header.shp) #should be verified beforehand!
## Divide in chunks if requested
if(chunkn>1 & !is.na(chunk.i)){
print(paste("divide in chunks and run on chunk n.", chunk.i))
indices <- 1:length(header.shp)
chunks <- split(indices, sort(indices%%chunkn))
header.shp <- header.shp[chunks[[chunk.i]],]
}
print("myfunction defined as")
myfunction <- get(myfunction)
print(myfunction)
print("go parallel")
cl <- makeForkCluster(ncores, outfile="" )
registerDoParallel(cl)
print("start main foreach loop")
if(typp=="raster"){
out <- foreach(i=1:length(header.shp), .packages=c('raster'), .combine=rbind) %dopar% {
tmp <- raster::extract(mypredictor, header.shp[i,],
buffer=min(10000, max(250, header.shp@data$lc_ncrt[i])), fun=myfunction)
}
if(!is.na(output)) {write.csv(out, file = output)} else{return(out)}
} else {
out <- sp::over(x=header.shp, y=mypredictor)
toassign <- header.shp[which(is.na(out[,1])),]
print(paste(length(toassign), "plots not matched directly - seek for NN"))
if(length(toassign)>0){
nearest <- foreach(i=1:length(toassign), .packages=c('raster'), .combine=rbind) %dopar% {
print(i)
nearest.tmp <- tryCatch(geosphere::dist2Line(header.shp[i,], mypredictor),
error = function(e){data.frame(distance=NA, lon=NA, lat=NA,ID=NA)}
)
#nearest.tmp <- geosphere::dist2Line(toassign[i,], mypredictor)
return(nearest.tmp)
}
out[which(is.na(out[,1])),] <- mypredictor@data[nearest[,"ID"],]
}
if(!is.na(output)) {write.csv(out, file = output)} else{return(out)}
}
stopCluster(cl)
}
ElevationExtract
ElevationExtract <- function(header, output, ncores, chunk.i){
print("load packages")
require(tidyverse)
require(rgdal)
require(sp)
require(rgeos)
require(raster)
require(rworldmap)
require(elevatr)
require(parallel)
require(doParallel)
print("Import header and divide in tiles")
header.shp <- readOGR(header)
header.tiles <- header.shp@data %>%
bind_cols(as.data.frame(header.shp@coords)) %>%
rename(PlotObservationID=PltObID, Longitude=coords.x1, Latitude=coords.x2) %>%
mutate(lc_ncrt=abs(lc_ncrt)) %>%
filter(lc_ncrt <= 50000) %>%
mutate_at(.vars=vars(Longitude, Latitude),
.funs=list(tile=~cut(., breaks = seq(-180,180, by=.2)))) %>%
mutate(tilenam=factor(paste(Longitude_tile, Latitude_tile)))
print("Get continent")
sPDF <- rworldmap::getMap(resolution="high")
continent.high <- sPDF[,"continent"]
crs(continent.high) <- CRS("+init=epsg:4326")
continent.high@data$continent <- fct_recode(continent.high@data$continent, "South America"="South America and the Caribbean")
continent.high.merc <- spTransform(continent.high, CRS( "+init=epsg:3857 +proj=merc +a=6378137 +b=6378137 +lat_ts=0.0 +lon_0=0.0 +x_0=0.0 +y_0=0 +k=1.0 +units=m
+nadgrids=@null +no_defs"))
print("go parallel")
cl <- makeForkCluster(ncores, outfile="")
registerDoParallel(ncores)
clusterEvalQ(cl, {
library(rgdal)
library(raster)
library(sp)
library(elevatr)
library(dplyr)
})
print("create list of tiles still to do")
myfiles <- list.files(path = output, pattern = "[A-Za-z]*_[0-9]+\\.RData$")
done <- NULL
done <- as.numeric(unlist(regmatches(myfiles, gregexpr("[[:digit:]]+", myfiles))))
todo <- 1:nlevels(header.tiles$tilenam)
if(length(done)>0) {todo <- todo[-which(todo %in% done)]}
todo <- sample(todo, replace=F)
print(paste(length(todo), "tiles to do"))
print("divide in chunks")
#divide in chunks
todo.chunks <- split(todo, sort(todo%%10))
print(paste("start main foreach loop on chunk n=", chunk.i))
print(paste(length(todo.chunks[[chunk.i]]), "to do"))
foreach(i = todo.chunks[[chunk.i]]) %dopar% {
print(paste("doing", i))
#create sp and project data
if(nrow(header.tiles %>%
filter(tilenam %in% levels(header.tiles$tilenam)[i])) ==0 ) next()
sp.tile <- SpatialPointsDataFrame(coords=header.tiles %>%
filter(tilenam %in% levels(header.tiles$tilenam)[i]) %>%
dplyr::select(Longitude, Latitude),
data=header.tiles %>%
filter(tilenam %in% levels(header.tiles$tilenam)[i]) %>%
dplyr::select(-Longitude, -Latitude),
proj4string = CRS("+init=epsg:4326"))
sp.tile <- spTransform(sp.tile, CRSobj = CRS("+init=epsg:3857 +proj=merc +a=6378137 +b=6378137 +lat_ts=0.0
+lon_0=0.0 +x_0=0.0 +y_0=0 +k=1.0 +units=m +nadgrids=@null
+no_defs ")) #project to mercator
#retrieve dem raster
tryCatch(raster.tile <- get_elev_raster(sp.tile, z=9, expand=max(sp.tile$lc_ncrt), clip="bbox"),
error = function(e){e}
)
if(!exists("raster.tile")) {
raster.tile <- NA
print(paste("tile", i, "doesn't work!, skip to next"))
save(raster.tile, file = paste(output, "elevation_tile_", i, "failed.RData", sep=""))
rm(raster.tile)
} else {
# clip dem tile with continent shape
raster.tile <- mask(raster.tile, continent.high.merc)
#extract and summarize elevation data
elev.tile <- raster::extract(raster.tile, sp.tile, small=T)
elev.tile.buffer <- raster::extract(raster.tile, sp.tile,
buffer=sp.tile@data$lc_ncrt,
small=T)
tmp <- round(mapply( quantile,
x=elev.tile.buffer,
#center=elev.tile,
probs=rep(c(0.025, 0.5, 0.975), each=length(elev.tile)),
#lc_ncrt=sp.tile$lc_ncrt,
na.rm=T))
elev.q95 <- setNames(data.frame(matrix(tmp, ncol = 3, nrow = length(elev.tile.buffer))),
c("Elevation_q2.5", "Elevation_median", "Elevation_q97.5"))
output.tile <- data.frame(PlotObservationID=sp.tile$PlotObservationID,
elevation=round(elev.tile),
elev.q95,
DEM.res=res(raster.tile)[1])
#save output
save(output.tile, file = paste(output, "elevation_tile_", i, ".RData", sep=""))
print(paste(i, "done"))
}}
stopCluster(cl)
}
sessionInfo()
## R version 3.6.3 (2020-02-29)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 16.04.7 LTS
##
## Matrix products: default
## BLAS: /usr/lib/openblas-base/libblas.so.3
## LAPACK: /usr/lib/libopenblasp-r0.2.18.so
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=en_US.UTF-8
## [9] LC_ADDRESS=en_US.UTF-8 LC_TELEPHONE=en_US.UTF-8
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=en_US.UTF-8
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] dggridR_2.0.3 rnaturalearth_0.2.0 sf_0.9-3
## [4] elevatr_0.2.0 rworldmap_1.3-6 raster_3.0-7
## [7] rgeos_0.5-5 rgdal_1.5-18 sp_1.4-4
## [10] kableExtra_1.3.1 knitr_1.30 xlsx_0.6.5
## [13] viridis_0.5.1 viridisLite_0.3.0 forcats_0.5.0
## [16] stringr_1.4.0 dplyr_1.0.2 purrr_0.3.4
## [19] readr_1.4.0 tidyr_1.1.2 tibble_3.0.1
## [22] ggplot2_3.3.0 tidyverse_1.3.0
##
## loaded via a namespace (and not attached):
## [1] httr_1.4.2 maps_3.3.0 jsonlite_1.7.1 dotCall64_1.0-0
## [5] modelr_0.1.6 assertthat_0.2.1 highr_0.8 xlsxjars_0.6.1
## [9] cellranger_1.1.0 yaml_2.2.1 pillar_1.4.3 backports_1.2.0
## [13] lattice_0.20-41 glue_1.4.2 digest_0.6.25 rvest_0.3.6
## [17] colorspace_2.0-0 htmltools_0.5.0 pkgconfig_2.0.3 broom_0.7.0
## [21] haven_2.3.1 scales_1.1.1 webshot_0.5.2 farver_2.0.3
## [25] generics_0.1.0 ellipsis_0.3.1 withr_2.3.0 cli_2.2.0
## [29] magrittr_2.0.1 crayon_1.3.4 readxl_1.3.1 maptools_1.0-2
## [33] evaluate_0.14 fs_1.5.0 fansi_0.4.1 class_7.3-17
## [37] xml2_1.3.2 foreign_0.8-76 tools_3.6.3 hms_0.5.3
## [41] lifecycle_0.2.0 munsell_0.5.0 reprex_0.3.0 e1071_1.7-4
## [45] compiler_3.6.3 rlang_0.4.8 units_0.6-7 classInt_0.4-3
## [49] grid_3.6.3 rstudioapi_0.13 spam_2.5-1 rmarkdown_2.5
## [53] gtable_0.3.0 codetools_0.2-18 DBI_1.1.0 R6_2.5.0
## [57] gridExtra_2.3 lubridate_1.7.9.2 rworldxtra_1.01 KernSmooth_2.23-18
## [61] rJava_0.9-13 stringi_1.5.3 Rcpp_1.0.5 fields_11.6
## [65] vctrs_0.3.5 dbplyr_2.0.0 tidyselect_1.1.0 xfun_0.19