Commit 52bd2d5a authored by Francesco Sabatini


Adapted 01_Extract_elevation. Created function to avoid sea areas in buffer when calculating elevation
parent dfae3940
@@ -17,11 +17,11 @@ output: html_document
**Timestamp:** `r date()`
**Drafted:** Francesco Maria Sabatini
**Version:** 1.0
**Version:** 1.1
*Changes to v1.1* - based on dataset sPlot_3.0.1, received on 29/06/2019 from SH. Calculates elevation also for plots without location uncertainty (arbitrarily set to 100 m).
This report describes how elevation data was matched to plot locations.
```{r results="hide", message=F, warning=F}
knitr::opts_chunk$set(echo = TRUE)
library(reshape2)
@@ -45,15 +45,14 @@ library(ggrepel)
Import header data
```{r import}
header <- readr::read_csv("../sPlot_data_export/sPlot_data_header_fix1.csv",
col_types=cols(
header <- readr::read_delim("../sPlot_data_export/sPlot 3.0.1_header.csv", locale = locale(encoding = 'UTF-8'),
delim="\t", col_types=cols(
PlotObservationID = col_double(),
PlotID = col_double(),
`TV2 relevé number` = col_double(),
Country = col_factor(),
`Cover abundance scale` = col_factor(),
Author = col_character(),
`Date of recording` = col_date(format="%Y"),
`Date of recording` = col_date(format="%d-%m-%Y"),
`Relevé area (m²)` = col_double(),
`Altitude (m)` = col_double(),
`Aspect (°)` = col_double(),
@@ -76,27 +75,38 @@ header <- readr::read_csv("../sPlot_data_export/sPlot_data_header_fix1.csv",
`Aver. height lowest herbs (cm)` = col_double(),
`Maximum height herbs (cm)` = col_double(),
`Maximum height cryptogams (mm)` = col_double(),
`Mosses identified (y/n)` = col_logical(),
`Lichens identified (y/n)` = col_logical(),
`Mosses identified (y/n)` = col_factor(),
`Lichens identified (y/n)` = col_factor(),
COMMUNITY = col_character(),
SUBSTRATE = col_character(),
Locality = col_character(),
Naturalness = col_factor(),
ORIG_NUM = col_double(),
ALLIAN_REV = col_character(),
REV_AUTHOR = col_character(),
Forest = col_logical(),
Shrubland = col_logical(),
Wetland = col_logical(),
Grassland = col_logical(),
Wetland = col_logical(),
`Sparse vegetation` = col_logical(),
Shrubland = col_logical(),
`Plants recorded` = col_factor(),
`Herbs identified (y/n)` = col_factor(),
Naturalness = col_factor(),
EUNIS = col_factor(),
Longitude = col_double(),
Latitude = col_double(),
`Location uncertainty (m)` = col_double(),
Dataset = col_factor()
Dataset = col_factor(),
GUID = col_character()
))
```
Split data into tiles of 1 x 1 degrees, and create sp::SpatialPointsDataFrame files. Only for plots having a location uncertainty < 50 km
Split the data into tiles of 1 x 1 degrees and create sp::SpatialPointsDataFrame objects, but only for plots with a location uncertainty < 50 km (plots without location uncertainty are also included, with the uncertainty arbitrarily set to 100 m).
```{r create tiles}
header.sp <- header %>%
dplyr::select(PlotObservationID, Dataset, Longitude, Latitude, `Location uncertainty (m)`) %>%
  #replace missing location uncertainty with -100 (abs() below turns it into 100 m)
  mutate(`Location uncertainty (m)`=replace(`Location uncertainty (m)`,
                                            list=is.na(`Location uncertainty (m)`),
                                            values=-100)) %>%
filter(`Location uncertainty (m)`<= 50000) %>%
mutate_at(.vars=vars(Longitude, Latitude),
.funs=list(tile=~cut(., breaks = seq(-180,180, by=1)))) %>%
@@ -104,16 +114,16 @@ header.sp <- header %>%
mutate(tilenam=factor(paste(Longitude_tile, Latitude_tile))) %>%
mutate(`Location uncertainty (m)`=abs(`Location uncertainty (m)`))
```
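As a quick illustration of the tiling step (a toy sketch with made-up coordinates, not part of the pipeline), `cut()` bins each coordinate into its 1° interval and the two bin labels are pasted into a tile name:

```r
library(dplyr)

# two hypothetical plots
toy <- data.frame(Longitude = c(11.35, -70.20),
                  Latitude  = c(46.50, -33.45))
toy %>%
  mutate_at(.vars = vars(Longitude, Latitude),
            .funs = list(tile = ~cut(., breaks = seq(-180, 180, by = 1)))) %>%
  mutate(tilenam = factor(paste(Longitude_tile, Latitude_tile)))
# e.g. the first plot gets Longitude_tile "(11,12]" and Latitude_tile "(46,47]",
# i.e. tilenam "(11,12] (46,47]"
```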
There are `r nrow(header.sp)` plots out of `r nrow(header)` plots with Location uncertainty <= 50km.
There are `r nrow(header.sp)` plots out of `r nrow(header)` plots with Location uncertainty <= 50 km (or absent).
Extract elevation for each plot. The loop runs over tiles of 1 x 1°, projects to Mercator, and extracts the elevation at each plot's coordinates, as well as the 2.5, 50, and 97.5 percent quantiles over a buffer whose radius equals the plot's location uncertainty (only for plots with location uncertainty < 50 km). DEMs derive from the package [elevatr](https://cran.r-project.org/web/packages/elevatr/vignettes/introduction_to_elevatr.html#get_raster_elevation_data), which uses the [Terrain Tiles on Amazon Web Services](https://registry.opendata.aws/terrain-tiles/). The resolution of the DEM rasters varies by region. I set a zoom factor z=10, which corresponds to ~75-150 m. Sources are: SRTM; data.gov.at in Austria; NRCAN in Canada; NED/3DEP 1/3 arcsec; data.gov.uk in the United Kingdom; INEGI in Mexico; ArcticDEM at latitudes above 60°; LINZ in New Zealand; and Kartverket in Norway, as described [here](https://github.com/tilezen/joerd/blob/master/docs/data-sources.md)
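As a reference for the loop below (whose body is partly folded in this diff), here is a minimal, self-contained sketch of the per-tile workflow: download a DEM at zoom z=10 with `elevatr::get_elev_raster()`, project the points to Mercator so that buffer radii are in metres, and extract both the point elevation and the buffered values. Coordinates, uncertainties, and object names are illustrative, and the call needs internet access.

```r
library(sp)
library(raster)
library(elevatr)

# three hypothetical plots within one 1 x 1 degree tile
pts <- data.frame(PlotObservationID = 1:3,
                  Longitude = c(11.10, 11.45, 11.80),
                  Latitude  = c(46.20, 46.55, 46.90),
                  uncert    = c(100, 500, 2000))      # location uncertainty (m)
sp.tile <- SpatialPointsDataFrame(coords = pts[, c("Longitude", "Latitude")],
                                  data = pts,
                                  proj4string = CRS("+init=epsg:4326"))
sp.tile <- spTransform(sp.tile, CRSobj = CRS("+init=epsg:3857"))  # Mercator, buffers in metres

raster.tile <- get_elev_raster(sp.tile, z = 10, expand = 1000)    # DEM, ~75-150 m resolution
elev.point  <- raster::extract(raster.tile, sp.tile, small = TRUE)
elev.buffer <- raster::extract(raster.tile, sp.tile,
                               buffer = sp.tile$uncert, small = TRUE)
# per-plot 2.5/50/97.5% quantiles over the buffered cells
t(sapply(elev.buffer, stats::quantile, probs = c(0.025, 0.5, 0.975), na.rm = TRUE))
```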
```{r eval=F}
```{r }
require(parallel)
require(doParallel)
cl <- makeForkCluster(8, outfile="")
cl <- makeForkCluster(6, outfile="")
registerDoParallel(cl)
clusterEvalQ(cl, {
@@ -122,8 +132,20 @@ clusterEvalQ(cl, {
library(elevatr)
library(dplyr)
})
foreach(i = todo[length(todo):1]) %do% {
#Define RobustQuantile function:
#if the plot centre is not below sea level (elevation >= 0) and not all buffer
#values are negative, compute quantiles only over the non-negative values,
#so that sea cells inside the buffer do not bias the elevation estimate
RobustQuantile <- function(x, probs, center, loc.uncert){
x <- x[!is.na(x)]
if(length(x)==0) return(NA) else {
if(all(x<0) | center<0) return(stats::quantile(x, probs, na.rm=T)) else {
x2 <- x[which(x>=0)]
return(stats::quantile(x2, probs, na.rm=T))
}
}
}
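#Illustrative check with made-up values: sea cells (negative) in the buffer are
#dropped when the plot centre is above sea level, e.g.
#RobustQuantile(x=c(-10, -5, 50, 120, 200), probs=0.5, center=80, loc.uncert=500)
#returns 120, the median of c(50, 120, 200)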
foreach(i = 1:nlevels(header.sp$tilenam)) %do% {
#create sp and project data
if(nrow(header.sp %>% filter(tilenam %in% levels(header.sp$tilenam)[i])) ==0 ) next()
sp.tile <- SpatialPointsDataFrame(coords=header.sp %>%
@@ -142,13 +164,18 @@ foreach(i = todo[length(todo):1]) %do% {
#extract and summarize elevation data
elev.tile <- raster::extract(raster.tile, sp.tile, small=T)
elev.tile.buffer <- raster::extract(raster.tile, sp.tile, buffer=sp.tile$`Location uncertainty (m)`, small=T)
elev.q95 <- t(round(sapply(elev.tile.buffer,stats::quantile, probs=c(0.025, 0.5, 0.975), na.rm=T)))
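#RobustQuantile is mapply-ed over the plots and the three probabilities (probs has
#length 3 x n plots, so x, center and loc.uncert are recycled); the resulting vector
#is then reshaped into an n x 3 data.frame whose columns hold the 2.5%, 50% and
#97.5% quantiles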
tmp <- round(mapply( RobustQuantile,
x=elev.tile.buffer,
center=elev.tile,
probs=rep(c(0.025, 0.5, 0.975), each=length(elev.tile)),
loc.uncert=sp.tile$`Location uncertainty (m)`))
elev.q95 <- setNames(data.frame(matrix(tmp, ncol = 3, nrow = length(elev.tile.buffer))),
c("Elevation_q2.5", "Elevation_median", "Elevation_q97.5"))
output.tile <- data.frame(PlotObservationID=sp.tile$PlotObservationID,
elevation=round(elev.tile),
elev.q95,
DEM.res=res(raster.tile)[1]) %>%
rename(Elevation_q2.5=X2.5., Elevation_median=X50., Elevation_q97.5=X97.5.)
elevation=round(elev.tile),
elev.q95,
DEM.res=res(raster.tile)[1])
#save output
save(output.tile, file = paste("../_derived/elevatr/elevation_tile_", i, ".RData", sep=""))
print(i)
@@ -157,7 +184,7 @@ foreach(i = todo[length(todo):1]) %do% {
```
For those tiles that failed, extract the elevation of the remaining plots one by one.
```{r, eval=F}
```{r}
#create list of tiles for which dem could not be extracted
myfiles <- list.files("../_derived/elevatr/")
done <- as.numeric(unlist(regmatches(myfiles, gregexpr("[[:digit:]]+", myfiles))))
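#e.g. (illustrative): "elevation_tile_12.RData" yields 12, so 'done' contains the
#indices of the tiles already processed and 'todo' the remainder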
@@ -172,7 +199,7 @@ sp.tile0 <- SpatialPointsDataFrame(coords=header.sp %>%
filter(tilenam %in% levels(header.sp$tilenam)[todo]) %>%
dplyr::select(-Longitude, -Latitude),
proj4string = CRS("+init=epsg:4326"))
sp.tile0 <- spTransform(sp.tile, CRSobj = CRS("+init=epsg:3857")) #project to mercator
sp.tile0 <- spTransform(sp.tile0, CRSobj = CRS("+init=epsg:3857")) #project to mercator
output.tile <- data.frame(NULL)
#Loop over all plots
for(i in 1:nrow(sp.tile0)){
@@ -192,7 +219,12 @@ for(i in 1:nrow(sp.tile0)){
elev.tile <- raster::extract(raster.tile, sp.tile, small=T)
elev.tile.buffer <- raster::extract(raster.tile, sp.tile,
buffer=sp.tile$`Location uncertainty (m)`, small=T)
elev.q95 <- t(round(sapply(elev.tile.buffer,stats::quantile, probs=c(0.025, 0.5, 0.975), na.rm=T)))
#elev.q95 <- t(round(sapply(elev.tile.buffer,stats::quantile, probs=c(0.025, 0.5, 0.975), na.rm=T)))
elev.q95 <- t(round(mapply( RobustQuantile,
x=elev.tile.buffer,
center=elev.tile,
probs=rep(c(0.025, 0.5, 0.975), each=length(elev.tile)),
loc.uncert=sp.tile$`Location uncertainty (m)`)))
output.tile <- bind_rows(output.tile,
data.frame(PlotObservationID=sp.tile$PlotObservationID,
elevation=round(elev.tile),
@@ -205,7 +237,7 @@ save(output.tile, file = paste("../_derived/elevatr/elevation_tile_", 0, ".RData
Compose tiles into a single output, and export
```{r, eval=F}
```{r }
myfiles <- list.files(path)
myfiles <- myfiles[grep(pattern="*.RData$", myfiles)]
#create empty data.frame
@@ -223,7 +255,7 @@ for(i in 1:length(myfiles)){
if(i %in% seq(1,length(myfiles), by=25)){print(i)}
}
write_csv(elevation.out, path ="../_derived/elevatr/Elevation_out.csv")
write_csv(elevation.out, path ="../_derived/elevatr/Elevation_out_v301.csv")
```
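Because the body of the compose loop is folded in the diff above, here is a plausible sketch of that step, assuming each per-tile `.RData` file stores a data.frame named `output.tile` as produced earlier; the `path` value is spelled out here for completeness.

```r
library(dplyr)
library(readr)

path <- "../_derived/elevatr/"
myfiles <- list.files(path, pattern = "\\.RData$")
elevation.out <- data.frame(NULL)
for(i in 1:length(myfiles)){
  load(file.path(path, myfiles[i]))                       # loads output.tile
  elevation.out <- bind_rows(elevation.out, output.tile)
  if(i %in% seq(1, length(myfiles), by=25)){print(i)}     # progress
}
write_csv(elevation.out, path = "../_derived/elevatr/Elevation_out_v301.csv")
```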
Reimport output and check
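The chunk that defines the set of 'strange' plots is folded out of this diff. Below is a hypothetical version of such a check, assuming it compares the extracted elevation against the `Altitude (m)` recorded in the header; the threshold and object names are illustrative, not the author's.

```r
library(dplyr)
library(readr)

elevation.out <- read_csv("../_derived/elevatr/Elevation_out_v301.csv")
check <- header %>%
  dplyr::select(PlotObservationID, Dataset, Country, Longitude, Latitude, `Altitude (m)`) %>%
  left_join(elevation.out, by = "PlotObservationID") %>%
  mutate(elev_diff = abs(elevation - `Altitude (m)`))
# plots with a large discrepancy (say, > 500 m) deserve visual inspection
check %>% filter(elev_diff > 500)
```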
@@ -296,24 +328,22 @@ for(i in 1:nlevels(strange$tilenam)){
sp.i <- SpatialPoints(coords=strange.i %>%
dplyr::select(Longitude, Latitude),
proj4string = CRS("+init=epsg:4326"))
dem.i <- get_elev_raster(sp.i, z=10, expand = 1)
dem.i <- get_elev_raster(sp.i, z=ifelse(i!=15,5,8), expand=ifelse(i!=15,5,0.1)) #API doesn't work for tile i
dem.df <- data.frame(xyFromCell(dem.i, cell=1:ncell(dem.i)), z=getValues(dem.i))
ggstrange <- ggplot(countries, aes(x=long, y=lat, group = group)) +
geom_tile(data=dem.df, aes(x=x, y=y, colour=z, group=1)) +
geom_raster(data=dem.df, aes(x=x, y=y, fill=z, group=1)) +
geom_polygon(col=gray(0.3), lwd=0.3, fill = gray(0.9), alpha=1/5) +
geom_point(data=strange.i,
aes(x=Longitude, y=Latitude, group=1), col="red")+
coord_equal(xlim= robust.range(strange.i$Longitude),
ylim= robust.range(strange.i$Latitude)) +
geom_label_repel(data=strange.i,
geom_text_repel(data=strange.i,
aes(x=Longitude, y=Latitude, group=1,
label=strange.i$PlotObservationID),
box.padding = 0.35,
point.padding = 0.5,
segment.color = 'grey50') +
ggtitle(strange.i$Dataset[[i]]) +
scale_color_viridis() +
label=strange.i$PlotObservationID)
) +
ggtitle(strange.i$Dataset[1], subtitle = strange.i$Country[1]) +
scale_fill_viridis() +
theme_bw() +
theme(axis.title = element_blank())
print(ggstrange)
@@ -10,14 +10,20 @@ knitr::opts_chunk$set(echo = TRUE)
```
## Previously known problems still to be fixed:
1) Import field 'Plants Recorded' into header (SH) - create dictionary of possible factors (FMS)
2) Import field 'Herbs identified' into header (SH)
1) Import field 'Plants Recorded' into header (SH) - create dictionary of possible factors (FMS). Values not available for EVA's datasets. Ask Milan if we can simply assume 'All vascular plants'.
2) Import field 'Herbs identified' into header (SH) - Values not available for EVA's datasets. According to SH, we can simply assume 'Y'.
3) Database with empty 'Cover code' value in DT - British_columbia_meadows and USA_VegBank (FMS)
4) Formations - Assign zeros to the columns (Forest, Grassland, Shrubland, Wetland, Sparse) when at least one of them is 1
5) Link to EUNIS cross-link table, and assign Faber-Langedon Formation (FMS)
6) Assign plot elevation using external sources (FMS)
7) Add GIVD codes
Arbitrarily set location uncertainty to -100 for missing values
Additional data from external sources