diff --git a/code/00_CheckData.Rmd b/code/00_CheckData.Rmd
index 38708542b2f14060874d24a0104c468d156a160c..386577ae627fb2b79eb0b8edb9b7a1b7f6f121d8 100644
--- a/code/00_CheckData.Rmd
+++ b/code/00_CheckData.Rmd
@@ -18,11 +18,13 @@ always_allow_html: yes
 **Timestamp:** `r date()`
 **Drafted:** Francesco Maria Sabatini
 **Revised:** Stephan Hennekens
-**Version:** 1.2
+**Version:** 1.3
 This report checks for consistency of the dataset used for constructing sPlot 3.0.
 *Changes to v1.1* - Added check to species data. Created To Do list.
 *Changes to v1.2* - based on dataset sPlot_3.0.1, received on 29/06/2019 from SH
+*Changes to v1.3* - based on dataset sPlot_3.0.2, received on 24/07/2019 from SH
+
 ***
 Key Problems:
@@ -46,7 +48,7 @@ library(xlsx)
 # Check Header file
 Import with parse
 ```{r}
-header <- readr::read_delim("../sPlot_data_export/sPlot 3.0.1_header.csv", locale = locale(encoding = 'UTF-8'),
+header <- readr::read_delim("../sPlot_data_export/sPlot_3_0_2_header.csv", locale = locale(encoding = 'UTF-8'),
 delim="\t", col_types=cols(
 PlotObservationID = col_double(),
 PlotID = col_double(),
@@ -129,7 +131,8 @@ knitr::kable(header %>%
 full_width = F, position = "center")
 ```
-Plots without location uncertainty (by dataset)
+Plots without location uncertainty (by dataset).
+We could probably assign at least a broad location, with wide uncertainty to the 2020 plots in the Egypt Nile delta dataset.
 ```{r}
@@ -142,6 +145,15 @@ knitr::kable(header %>%
 full_width = F, position = "center")
 ```
+Big datasets without coordinate uncertainty are:
+Czechia_nvd
+Germany_vegetweb2
+Poland
+Slovenia
+
+--> Talk to contributors and ask for 'average' uncertainty (e.g. 100 m)
+
+
 ```{r, echo=F}
 nNAs <- nrow(header %>% filter(is.na(`Location uncertainty (m)`)))
 ```
@@ -156,7 +168,7 @@ knitr::kable(table(header$`Plants recorded`, exclude=NULL),
 kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive"),
 full_width = F, position = "center")
 ```
-OK - Values not available for EVA's datasets. Ask Milan if we can simply assume 'All vascular plants'
+--> Values not available for EVA's datasets. Ask Milan if we can simply assume 'All vascular plants'
 2) Import field 'Herbs identified (y/n)' into header (SH)
@@ -166,7 +178,7 @@ knitr::kable(table(header$`Herbs identified (y/n)`, exclude=NULL),
 kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive"),
 full_width = F, position = "center")
 ```
-OK - Values not available for EVA's datasets. According to SH, we can simply assume Y.
+--> Values not available for EVA's datasets. According to SH, we can simply assume Y.
@@ -176,7 +188,7 @@ OK - Values not available for EVA's datasets. According to SH, we can simply ass
 # Check DT table
 ```{r }
-DT0 <- readr::read_delim("../sPlot_data_export/sPlot 3.0.1_species.csv",
+DT0 <- readr::read_delim("../sPlot_data_export/sPlot_3_0_2_species.csv",
 delim="\t",
 col_type = cols(
 PlotObservationID = col_double(),
@@ -227,6 +239,9 @@ knitr::kable(DT0 %>%
 kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive"),
 full_width = F, position = "center")
+```
+
+```{r}
 # in db British_Columbia_meadows this seems to depend on lichen\moss species
 knitr::kable(head(DT0 %>%
 filter(`Cover %` ==0 & is.na(`Cover code`) & Taxonomy=="British_Columbia_meadows") %>%
@@ -234,7 +249,9 @@ knitr::kable(head(DT0 %>%
 caption="Example from British Columbia meadows db (only first 10 rows)") %>%
 kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive"),
 full_width = F, position = "center")
+```
+```{r}
 # in db USA_VegBank there seem to be species with only p\a values, next to species with also cover (FMS will have to solve this)
 knitr::kable(head(DT0 %>%