04_buildHeader.Rmd

    title: "sPlot3.0 - Build Header"
    author: "Francesco Maria Sabatini"
    date: "2/4/2020"
    output: html_document
    ![](/data/sPlot/users/Francesco/_sPlot_Management/splot-long-rgb.png "sPlot Logo")

    \newline

    Timestamp: `r date()`
    Drafted: Francesco Maria Sabatini
    Revised: Helge Bruelheide
    Version: 1.2

    This report documents the construction of the header file for sPlot 3.0. It is based on dataset sPlot_3.0.2, received on 24/07/2019 from Stephan Hennekens.

    Changes in version 1.1

    1. Excluded plots from Canada, as recommended by the custodian
    2. Filled in missing information for most of the ~2000 plots without country information from these datasets.
    3. Corrected mismatched sBiomes and ecoregions

    Changes in version 1.2

    4. Reassigned coordinates to ~19,000 misplaced plots (mostly from SOPHY or in Hungary). Assigned country-level centroids
    5. Corrected mismatched continents and countries
    6. Added graphs to check the assignment to continents and countries
    knitr::opts_chunk$set(echo = TRUE)
    library(tidyverse)
    library(purrr)
    library(viridis)
    library(readr)
    library(xlsx)
    library(knitr)
    library(kableExtra)
    
    ## Spatial packages
    library(rgdal)
    library(sp)
    library(rgeos)
    library(raster)
    library(rworldmap)
    library(elevatr)
    library(sf)
    library(rnaturalearth)
    library(dggridR)
    library(shotGroups) #minCircle
    
    #save temporary files
    write("TMPDIR = /data/sPlot/users/Francesco/_tmp", file=file.path(Sys.getenv('TMPDIR'), '.Renviron'))
    write("R_USER = /data/sPlot/users/Francesco/_tmp", file=file.path(Sys.getenv('R_USER'), '.Renviron'))
    rasterOptions(tmpdir="/data/sPlot/users/Francesco/_tmp")

    1 Import data

    Import header data. Before importing, escape the quotation marks in the header file from a Linux terminal.

    # escape all double quotation marks. Run in Linux terminal
    #sed 's/"/\\"/g' sPlot_3_0_2_header.csv > sPlot_3_0_2_header_test.csv
    
    #more general alternative in case some " are already escaped
    ##first remove the \ before every already-escaped ", then add a \ before every ":
    #sed 's/\\"/"/g; s/"/\\"/g'

    Import cleaned header data.

    header0 <- readr::read_delim("../sPlot_data_export/sPlot_3_0_2_header_test.csv", 
                                 locale = locale(encoding = 'UTF-8'),
                                delim="\t", col_types=cols(
      PlotObservationID = col_double(),
      PlotID = col_double(),
      `TV2 relevé number` = col_double(),
      Country = col_character(),
      `Cover abundance scale` = col_factor(),
      `Date of recording` = col_date(format="%d-%m-%Y"),
      `Relevé area (m²)` = col_double(),
      `Altitude (m)` = col_double(),
      `Aspect (°)` = col_double(),
      `Slope (°)` = col_double(),
      `Cover total (%)` = col_double(),
      `Cover tree layer (%)` = col_double(),
      `Cover shrub layer (%)` = col_double(),
      `Cover herb layer (%)` = col_double(),
      `Cover moss layer (%)` = col_double(),
      `Cover lichen layer (%)` = col_double(),
      `Cover algae layer (%)` = col_double(),
      `Cover litter layer (%)` = col_double(),
      `Cover open water (%)` = col_double(),
      `Cover bare rock (%)` = col_double(),
      `Height (highest) trees (m)` = col_double(),
      `Height lowest trees (m)` = col_double(),
      `Height (highest) shrubs (m)` = col_double(),
      `Height lowest shrubs (m)` = col_double(),
      `Aver. height (high) herbs (cm)` = col_double(),
      `Aver. height lowest herbs (cm)` = col_double(),
      `Maximum height herbs (cm)` = col_double(),
      `Maximum height cryptogams (mm)` = col_double(),
      `Mosses identified (y/n)` = col_factor(),
      `Lichens identified (y/n)` = col_factor(),
      COMMUNITY = col_character(),
      SUBSTRATE = col_character(),
      Locality = col_character(),
      ORIG_NUM = col_character(),
      ALLIAN_REV = col_character(),
      REV_AUTHOR = col_character(),
      Forest = col_logical(),
      Grassland = col_logical(),
      Wetland = col_logical(),
      `Sparse vegetation` = col_logical(),
      Shrubland = col_logical(),
      `Plants recorded` = col_factor(),
      `Herbs identified (y/n)` = col_factor(),
      Naturalness = col_factor(),
      EUNIS = col_factor(),
      Longitude = col_double(),
      Latitude = col_double(),
      `Location uncertainty (m)` = col_double(),
      Dataset = col_factor(),
      GUID = col_character()
    )) %>% 
      rename(Sparse.vegetation=`Sparse vegetation`, 
             ESY=EUNIS) %>% 
      dplyr::select(-COMMUNITY, -ALLIAN_REV, -REV_AUTHOR, -SUBSTRATE) %>%   #too sparse information to be useful
      dplyr::select(-PlotID) #identical to PlotObservationID

    The following column names occurred in the header of sPlot v2.1 and are currently missing from the header of v3.0:

    1. Syntaxon
    2. Cover cryptogams (%)
    3. Cover bare soil (%)
    4. is.forest
    5. is.non.forest
    6. EVA
    7. Biome
    8. BiomeID
    9. CONTINENT
    10. POINT_X
    11. POINT_Y

    Columns #1, #2, #3, #10 and #11 will be dropped. The others will be derived below.
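
A comparison of this kind can be reproduced with `setdiff()`. The sketch below is only an illustration: `cols_v21` and `cols_v30` are short, hypothetical subsets of the column names, not the full headers of either version.

```r
# Hypothetical subsets of the column names in the two header versions
cols_v21 <- c("PlotObservationID", "Country", "Syntaxon", "Biome", "CONTINENT")
cols_v30 <- c("PlotObservationID", "Country", "Dataset", "GUID")

# Columns present in v2.1 but absent from v3.0, in the order they appear in v2.1
missing_in_v30 <- setdiff(cols_v21, cols_v30)
print(missing_in_v30)
```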

    1.1 Exclude unreliable plots

    Some Canadian plots need to be removed, as indicated by Laura Boisvert-Marsh from GIVD NA-CA-004. The plots (and corresponding PlotObservationIDs) are:
    \newline

    Fabot01 - 1707776
    Fadum01, 02 & 03 - 1707779:1707781
    Faers01 - 1707782
    Pfe-f-08 - 1707849
    Pfe-o-05 - 1707854

    header0 <- header0 %>% 
      filter(!PlotObservationID %in% c(1707776, 1707779:1707782, 1707849, 1707854)) %>% 
      filter(Dataset != "$Coastal_Borja") %>% 
      filter(Dataset != "$Coastal_Poland") 
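
The exclusion by ID can be checked on a toy table. This is a minimal base R sketch, assuming a hypothetical data frame `toy` with made-up rows; only the excluded IDs are taken from the list above.

```r
# Toy header with a few plot IDs (rows are hypothetical)
toy <- data.frame(
  PlotObservationID = c(1707776, 1707780, 1707782, 1707783, 1707849, 1),
  Dataset = c("A", "A", "A", "A", "A", NA),
  stringsAsFactors = FALSE
)

# IDs to exclude; 1707779:1707782 expands to four consecutive IDs
# (Fadum01, 02, 03 and Faers01)
to_drop <- c(1707776, 1707779:1707782, 1707849, 1707854)

# Keep only the rows whose ID is not in the exclusion list
kept <- toy[!toy$PlotObservationID %in% to_drop, ]
print(kept$PlotObservationID)
```

Note that `dplyr::filter()` drops rows for which the condition evaluates to `NA`, so a filter such as `Dataset != "..."` would also remove any plot with a missing `Dataset` value.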

    1.2 Solve spatial problems

    There are 2020 plots in the Nile dataset without spatial coordinates. Assign coordinates manually, with a wide (90 km) location uncertainty.

    header <- header0 %>% 
      mutate(Latitude=replace(Latitude, 
                              list=(is.na(Latitude) & Dataset=="Egypt Nile delta"), 
                              values=30.917351)) %>% 
      mutate(Longitude=replace(Longitude, 
                              list=(is.na(Longitude) & Dataset=="Egypt Nile delta"), 
                              values=31.138534)) %>% 
      mutate(`Location uncertainty (m)`=replace(`Location uncertainty (m)`, 
                              list=(is.na(`Location uncertainty (m)`) & Dataset=="Egypt Nile delta"), 
                              values=-90000))
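
The conditional fill relies on `replace()` with a logical index. The sketch below shows the same pattern on a hypothetical three-row data frame `toy`, using base R only. It is meant to show that values already present, and missing values in other datasets, are left untouched.

```r
# Toy data: two plots from the target dataset, one from another (hypothetical rows)
toy <- data.frame(
  Dataset  = c("Egypt Nile delta", "Egypt Nile delta", "Other"),
  Latitude = c(NA, 30.5, NA),
  stringsAsFactors = FALSE
)

# Only rows that are both missing AND from the target dataset are filled
fill <- is.na(toy$Latitude) & toy$Dataset == "Egypt Nile delta"
toy$Latitude <- replace(toy$Latitude, list = fill, values = 30.917351)
print(toy)
```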

    There are two plots in the Romania Grassland Database, ~4442 plots in the Japan database, and a few in the European Weed Vegetation Database whose latitude and longitude are swapped. Swap them back.

    toswap <- c(which(header$Dataset=="Japan" & header$Latitude>90), 
                which(header$Dataset=="Romania Grassland Database" & header$Longitude>40), 
                which(header$PlotObservationID==525283))
    header[toswap, c("Latitude", "Longitude")] <- header[toswap, c("Longitude", "Latitude")]
    nouncert <- nrow(header %>% filter(is.na(`Location uncertainty (m)`)))
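
The swap works because the right-hand side is read in full before the assignment takes place, and columns are matched by position. Below is a minimal base R sketch on a hypothetical two-row data frame, where a latitude above 90 marks the swapped row.

```r
# Toy coordinates: the first row has latitude and longitude swapped (hypothetical values)
toy <- data.frame(
  Latitude  = c(139.7, 48.1),
  Longitude = c(35.7, 11.6)
)

# A latitude above 90 degrees is impossible, so it identifies a swapped row
toswap <- which(toy$Latitude > 90)

# Assign (Longitude, Latitude) into (Latitude, Longitude) for those rows only
toy[toswap, c("Latitude", "Longitude")] <- toy[toswap, c("Longitude", "Latitude")]
print(toy)
```

A range check of this kind only detects swaps in which one of the values falls outside the valid range. The other conditions in the chunk above depend on what is known about each dataset.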

    There are `r nouncert` plots without location uncertainty. As a first approximation, we assign the median location uncertainty of the respective dataset, stored as a negative value to indicate that it is an estimate rather than a measurement.

    header <- header %>% 
      left_join(header %>% 
                  group_by(Dataset) %>% 
                  summarize(loc.uncer.median=median(`Location uncertainty (m)`, na.rm=TRUE)), 
                by="Dataset") %>% 
      mutate(`Location uncertainty (m)`=ifelse( is.na(`Location uncertainty (m)`) & !is.na(Latitude), 
                                                -abs(loc.uncer.median), 
                                                `Location uncertainty (m)`)) %>% 
      dplyr::select(-loc.uncer.median)
    nouncert <- nrow(header %>% filter(is.na(`Location uncertainty (m)`)))
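
The per-dataset median fill can be sketched on toy data. This example uses base R `ave()` instead of the join used above, and a hypothetical column `uncert`. It also suggests why some plots can remain without an estimate: a dataset with no reported value has no median to borrow.

```r
# Toy data: dataset A reports two values, dataset B reports none (hypothetical)
toy <- data.frame(
  Dataset = c("A", "A", "A", "B", "B"),
  uncert  = c(10, 30, NA, NA, NA),
  stringsAsFactors = FALSE
)

# Per-dataset median of the reported values, repeated for every row of the group
med <- ave(toy$uncert, toy$Dataset,
           FUN = function(x) median(x, na.rm = TRUE))

# Fill gaps with the negative median to flag the value as an estimate
toy$uncert <- ifelse(is.na(toy$uncert), -abs(med), toy$uncert)
print(toy)
```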

    There are still `r nouncert` plots with no estimate of location uncertainty.
    \newline Assign plot size to plots in the Patagonia dataset (based on input from Ana Cingolani).

    header <- header %>% 
      mutate(`Relevé area (m²)`=ifelse( (Dataset=="Patagonia" & is.na(`Relevé area (m²)`)), 
                                        -900, `Relevé area (m²)`))

    There are 518 plots from the dataset Germany_gvrd (EU-DE-014) with a location uncertainty of 2,147,483 km (!), even though a location is reported for them. Replace it with a more plausible estimate (20 km).