Commit b291f906 authored by Francesco Sabatini

Adapted 00_Check_data script to dataset 3.0.1 and highlighted main problems

parent 20daf8cb
......@@ -13,7 +13,8 @@ always_allow_html: yes
***
**Timestamp:** `r date()`
**Drafted:** Francesco Maria Sabatini
**Revised:** Stephan Hennekens
......@@ -21,8 +22,13 @@ always_allow_html: yes
This report checks for consistency of the dataset used for constructing sPlot 3.0.
*Changes to v1.1* - Added check to species data. Created To Do list.
*Changes to v1.1* - based on dataset sPlot_3.0.1, received on 29/06/2019 from SH
*Changes to v1.2* - based on dataset sPlot_3.0.1, received on 29/06/2019 from SH
***
Key Problems:
Fields 'Herbs identified (y/n)' and 'Plants recorded' are mostly empty!
There is still a high proportion of plots without location uncertainty.
```{r results="hide", message=F, warning=F}
library(reshape2)
library(tidyverse)
......@@ -137,6 +143,11 @@ knitr::kable(header %>%
full_width = F, position = "center")
```
```{r, echo=F}
nNAs <- nrow(header %>% filter(is.na(`Location uncertainty (m)`)))
```
There are still `r nNAs` plots without location uncertainty.
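The raw count is easier to interpret next to the share of plots it represents. A minimal sketch, assuming a toy data frame `header_demo` (hypothetical, standing in for the real `header`); only the column name `Location uncertainty (m)` is taken from the report:

```r
library(dplyr)

# Hypothetical stand-in for `header`; values are made up for illustration
header_demo <- tibble(
  PlotObservationID = 1:5,
  `Location uncertainty (m)` = c(10, NA, 250, NA, NA)
)

# Count and proportion of plots lacking a location uncertainty
na_summary <- header_demo %>%
  summarize(nNAs = sum(is.na(`Location uncertainty (m)`)),
            propNAs = mean(is.na(`Location uncertainty (m)`)))

na_summary
```

Adding `group_by(Dataset)` before `summarize()` would show which datasets contribute most of the missing values, assuming `Dataset` is available in `header` as used elsewhere in the report.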
## Previously known problems still to be fixed:
1) Import field 'Plants Recorded' into header (SH) - create dictionary of possible factors (FMS)
......@@ -149,7 +160,7 @@ knitr::kable(table(levels(header$`Plants recorded`), exclude=NULL),
The field is mostly empty!!
2) Import field 'Herbs identified' into header (SH)
2) Import field 'Herbs identified (y/n)' into header (SH)
```{r}
knitr::kable(table(levels(header$`Herbs identified (y/n)`), exclude=NULL),
caption="Number of records for each level in Herbs identified (y/n)") %>%
......@@ -158,18 +169,6 @@ knitr::kable(table(levels(header$`Herbs identified (y/n)`), exclude=NULL),
```
The field is mostly empty!!
3) Formations - Assign zeros to columns (Forest, Grassland, Shrubland, Wetland, Sparse), when at least one 1 is present (FMS)
```{r}
header <- header %>%
mutate(any1=rowSums(select(., Forest:Shrubland), na.rm=T)) %>%
mutate_at(.vars = vars(Forest:Shrubland),
.funs = ~ifelse(any1>0, ifelse(!is.na(.), ., 0), 0)) %>%
select(Forest:Shrubland, any1) %>%
filter(any1>0)
```
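The rule in item 3 can be illustrated on a toy table. This is a sketch under stated assumptions, not the report's own code: `header_demo` and `formations` are hypothetical, the five columns are named explicitly instead of relying on the range `Forest:Shrubland`, and plots with no formation at all are left as `NA` (the chunk above sets them to 0 and then filters them out).

```r
library(dplyr)

# Formation columns listed in item 3 (named explicitly; column order in the
# real header is not assumed)
formations <- c("Forest", "Grassland", "Shrubland", "Wetland", "Sparse")

# Hypothetical toy data: plot 1 is forest, plot 2 has no formation,
# plot 3 is both grassland and shrubland
header_demo <- tibble(
  PlotObservationID = 1:3,
  Forest    = c(1, NA, NA),
  Grassland = c(NA, NA, 1),
  Shrubland = c(NA, NA, 1),
  Wetland   = rep(NA_real_, 3),
  Sparse    = rep(NA_real_, 3)
)

header_fixed <- header_demo %>%
  # number of formations flagged per plot
  mutate(any1 = rowSums(across(all_of(formations)), na.rm = TRUE)) %>%
  # replace NA with 0 only in plots with at least one formation flagged
  mutate(across(all_of(formations),
                ~ ifelse(any1 > 0 & is.na(.x), 0, .x))) %>%
  select(-any1)

header_fixed
```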
4) Link to EUNIS cross-link table, and assign Faber-Langendoen Formation (FMS)
5) Assign plot elevation using external sources (FMS)
......@@ -177,8 +176,8 @@ header <- header %>%
# Check DT table
```{r, eval=F}
DT0 <- readr::read_delim("../sPlot_data_export/sPlot_data_species.csv",
```{r }
DT0 <- readr::read_delim("../sPlot_data_export/sPlot 3.0.1_species.csv",
delim="\t",
col_type = cols(
PlotObservationID = col_double(),
......@@ -197,7 +196,7 @@ DT0 <- readr::read_delim("../sPlot_data_export/sPlot_data_species.csv",
```
Show problems in DT import
```{r, eval=F}
```{r}
knitr::kable(problems(DT0) %>%
mutate(Dataset=DT0$Taxonomy[problems(DT0)$row]) %>%
dplyr::select(Dataset, col, expected, actual) %>%
......@@ -208,7 +207,7 @@ knitr::kable(problems(DT0) %>%
```
```{r, echo=F, eval=F}
```{r, echo=F}
id <- as.character(DT0$PlotObservationID[(problems(DT0) %>% dplyr::select(row) %>% distinct())$row])
relnum <- (header %>% filter(PlotObservationID == DT0$PlotObservationID[(problems(DT0) %>% dplyr::select(row) %>% distinct())$row]))$`TV2 relevé number`
db <- (problems(DT0) %>% mutate(Dataset=DT0$Taxonomy[problems(DT0)$row]) %>% dplyr::select(Dataset) %>% distinct())[1,1, drop=T]
......@@ -219,7 +218,7 @@ db <- (problems(DT0) %>% mutate(Dataset=DT0$Taxonomy[problems(DT0)$row]) %>% dpl
All problems seem to be concentrated in PlotID = `r id` which corresponds to TV2 relevé Number = `r relnum ` in `r db`.
Other known problems:
```{r, eval=F}
```{r}
#There are some plots without the appropriate cover code
knitr::kable(DT0 %>%
filter(`Cover %` ==0 & is.na(`Cover code`)) %>%
......@@ -237,14 +236,6 @@ knitr::kable(head(DT0 %>%
kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive"),
full_width = F, position = "center")
# in db USA_CVS this seems to be due to a duplication of species records, w/o assignment to layers (SH will fix this in source file)
knitr::kable(head(DT0 %>%
filter(`Cover %` ==0 & is.na(`Cover code`) & Taxonomy=="USA_CVS")%>%
dplyr::select(PlotObservationID, Taxonomy, `Matched concept`:x_),10),
caption="Example from USA_CVS db (only first 10 rows)") %>%
kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive"),
full_width = F, position = "center")
# in db USA_VegBank there seem to be species with only p/a values, next to species with also cover (FMS will have to solve this)
knitr::kable(head(DT0 %>%
......@@ -254,17 +245,21 @@ knitr::kable(head(DT0 %>%
kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive"),
full_width = F, position = "center")
```
FMS will solve these problems by assigning an arbitrarily low cover value (e.g. 0.1%) to species recorded only as presence/absence (p/a). Ideally, FMS should also add a note in the `x_` field.
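One way the proposed fix could look, as a hedged sketch only. It assumes that p/a-only records are those with `Cover %` equal to 0 and no `Cover code` (the same filter used in the chunk above), and that the note goes in the `x_` column. The data frame `DT_demo`, the constant `pa_cover`, the cover code and the wording of the note are all hypothetical.

```r
library(dplyr)

# Hypothetical toy species table; column names follow the report
DT_demo <- tibble(
  PlotObservationID = c(1, 1, 2),
  `Cover %`    = c(25, 0, 0),
  `Cover code` = c(NA, NA, "code_A"),  # "code_A" is a made-up cover code
  x_           = NA_character_
)

pa_cover <- 0.1  # arbitrary low cover (%), as suggested in the text

DT_fixed <- DT_demo %>%
  mutate(
    # assumed definition of a p/a-only record: zero cover and no cover code
    pa_only   = `Cover %` == 0 & is.na(`Cover code`),
    `Cover %` = ifelse(pa_only, pa_cover, `Cover %`),
    # flag the modified records so the change stays traceable
    x_        = ifelse(pa_only, "cover set to 0.1% (p/a only)", x_)
  ) %>%
  select(-pa_only)

DT_fixed
```

Only the second record is modified here; the third keeps its zero cover because it carries a cover code.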
Distribution of plots across datasets:
```{r, echo=F, eval=F}
```{r, echo=F}
knitr::kable(table(header$Dataset), caption="Plots per dataset") %>%
kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive"), full_width = F, position = "center")
```
# Check geographic coordinates
```{r, message=F, eval=T, cache=T}
```{r, message=F, echo=F}
countries <- map_data("world")
ggworld <- ggplot(countries, aes(x=long, y=lat, group = group)) +
geom_polygon(col=gray(0.3), lwd=0.3, fill = gray(0.9)) +
......@@ -300,10 +295,16 @@ for(dd in 1:nlevels(header$Dataset)){
```
# Other observed problems:
Some plots in the Hungary dataset have an altitude >5000 m (!)
```{r, eval=F, echo=F}
#Deprecated below
#Some plots in the Hungary dataset have an altitude >5000 m (!)
print(ggplot(data=datasel %>%
melt()) +
geom_histogram(aes(x=value)) +
......@@ -311,13 +312,8 @@ print(ggplot(data=datasel %>%
theme_minimal() +
theme(axis.text = element_text(size = 8)))
```
Deprecated below
Fix known problems
```{r, eval=F}
#Fix known problems
header.fix <- header %>%
mutate(`Altitude (m)`=gsub(`Altitude (m)`, pattern=" ", replacement="")) %>%
mutate(`Altitude (m)`=gsub(`Altitude (m)`, pattern="-", replacement=NA)) %>%
......