Commit 1ef5f767 authored by Francesco Sabatini

Created v1.1 of 00_check_data. Check species file, and add To Do list

parent e9029d82
---
title: 'sPlot 3.0 - Validity Check'
author: "Francesco Maria Sabatini"
output:
  html_document: default
always_allow_html: yes
---
<center>
-![](../splot-long-rgb.png "sPlot Logo")
+![](/data/sPlot/users/Francesco/_sPlot_Management/splot-long-rgb.png "sPlot Logo")
</center>
...@@ -15,10 +16,11 @@ always_allow_html: yes ...@@ -15,10 +16,11 @@ always_allow_html: yes
**Timestamp:** `r date()` **Timestamp:** `r date()`
**Drafted:** Francesco Maria Sabatini **Drafted:** Francesco Maria Sabatini
**Version:** 1.0 **Revised:** Stephan Hennekens
**Version:** 1.1
This report checks for consistency of the dataset used for constructing sPlot 3.0. This report checks for consistency of the dataset used for constructing sPlot 3.0.
*Changes in v1.1:* added a check of the species data; created a To Do list.
```{r results="hide", message=F, warning=F}
library(reshape2)
@@ -37,7 +39,7 @@ library(xlsx)
```{r}
#Import sPlot data
-header <- readr::read_delim("sPlot_data_export/sPlot_data_header.csv",
+header <- readr::read_delim("../sPlot_data_export/sPlot_data_header.csv",
delim="\t", guess_max = 100000)
```
@@ -66,12 +68,21 @@ header.fix <- header %>%
list=`Mosses identified (y/n)` %in% c("1", "j", "J", "T", "y", "Y" ),
values="TRUE"))
-write_csv(header.fix, path = "sPlot_data_export/sPlot_data_header_fix1.csv")
+write_csv(header.fix, path = "../sPlot_data_export/sPlot_data_header_fix1.csv")
```
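The `replace()` call above turns several spellings of "yes" in `Mosses identified (y/n)` into `"TRUE"` but leaves every other value untouched. A minimal base-R sketch of a three-state recoding for such y/n fields, assuming (not confirmed by the source) that `"0"`, `"n"`, `"N"` and `"F"` are the "no" spellings and that anything else should become `NA`; `to_flag()` is a hypothetical helper:

```r
# Hypothetical helper: recode a free-text y/n field to TRUE / FALSE / NA.
# The "yes" codes are taken from the chunk above; the "no" codes are assumed.
to_flag <- function(x,
                    yes = c("1", "j", "J", "T", "y", "Y"),
                    no = c("0", "n", "N", "F")) {
  out <- rep(NA, length(x))   # unknown or empty values stay NA
  out[x %in% yes] <- TRUE
  out[x %in% no] <- FALSE
  out
}
```

Returning a logical vector (instead of the string `"TRUE"`) lets the column be parsed directly with `col_logical()` on reimport.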
Other known problems still to be fixed:
1) Import field 'Plants Recorded' into header (SH) - create a dictionary of its possible values (FMS)
2) Import field 'Herbs identified' into header (SH)
3) Formations - assign 0 to the empty columns among Forest, Grassland, Shrubland, Wetland and Sparse when at least one of them is 1 (FMS)
4) Link to the EUNIS cross-link table and assign the Faber-Langendoen formation (FMS)
5) Assign plot elevation using external sources (FMS)
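Item 3 can be sketched in base R as follows, assuming the five formation columns hold `1` for "present" and `NA` for "not recorded" (the actual coding in the header is not shown here); `fill_formation_zeros()` is a hypothetical helper:

```r
# Hypothetical helper for To Do item 3: in plots where at least one formation
# column is 1, set the remaining NA formation columns to 0.
# Plots with no 1 at all are left untouched (formation still unknown).
fill_formation_zeros <- function(df,
                                 cols = c("Forest", "Grassland", "Shrubland",
                                          "Wetland", "Sparse")) {
  m <- as.matrix(df[cols])
  # TRUE for plots where at least one formation column equals 1
  has_one <- rowSums(m == 1, na.rm = TRUE) > 0
  # NA cells in those plots only; has_one is recycled down each column
  fill <- is.na(m) & has_one
  m[fill] <- 0
  df[cols] <- as.data.frame(m)
  df
}
```

Keeping `NA` in plots without any formation flag avoids turning "not recorded" into an explicit "absent".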
Reimport with explicit column types
```{r}
-header <- readr::read_csv("sPlot_data_export/sPlot_data_header_fix1.csv",
+header <- readr::read_csv("../sPlot_data_export/sPlot_data_header_fix1.csv",
col_types=cols(
PlotObservationID = col_double(),
PlotID = col_double(),
@@ -118,6 +129,8 @@ header <- readr::read_csv("sPlot_data_export/sPlot_data_header_fix1.csv",
Dataset = col_factor()
))
```
After fixing problems there are `r nrow(header)` plots remaining.
Show remaining problems
```{r}
@@ -131,7 +144,85 @@ knitr::kable(problems(header) %>%
```
-After fixing problems there are `r nrow(header)` plots remaining.
+Check the DT (species) table
```{r}
DT0 <- readr::read_delim("../sPlot_data_export/sPlot_data_species.csv",
delim="\t",
col_types = cols(
PlotObservationID = col_double(),
Taxonomy = col_character(),
`Taxon group` = col_character(),
`Taxon group ID` = col_double(),
`Turboveg2 concept` = col_character(),
`Matched concept` = col_character(),
Match = col_double(),
Layer = col_double(),
`Cover %` = col_double(),
`Cover code` = col_character(),
x_ = col_double()
)
)
```
Show problems in DT import
```{r}
knitr::kable(problems(DT0) %>%
mutate(Dataset=DT0$Taxonomy[problems(DT0)$row]) %>%
dplyr::select(Dataset, col, expected, actual) %>%
distinct(),
caption="Problems when importing Species data") %>%
kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive"),
full_width = F, position = "center")
```
```{r, echo=F}
# PlotObservationID(s) of the rows that failed to parse
id <- unique(DT0$PlotObservationID[unique(problems(DT0)$row)])
# %in% (not ==) so that several problem plots are all matched
relnum <- (header %>% filter(PlotObservationID %in% id))$`TV2 relevé number`
db <- (problems(DT0) %>% mutate(Dataset=DT0$Taxonomy[problems(DT0)$row]) %>% dplyr::select(Dataset) %>% distinct())[1,1, drop=T]
```
All problems appear to be concentrated in PlotObservationID = `r id`, which corresponds to TV2 relevé number `r relnum` in `r db`.
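The lookup above relies on a single problem plot. A minimal base-R sketch of the same lookup that also works when the problem rows span several plots, using `match()` to pull one header entry per plot; `problem_plots()` and its output column `TV2_releve_number` are hypothetical names:

```r
# Hypothetical helper: map parsing-problem row indices in the species table
# to PlotObservationIDs and then to TV2 releve numbers in the header.
problem_plots <- function(species, header, problem_rows,
                          relnum_col = "TV2 relev\u00e9 number") {
  # unique plots touched by the problem rows (several rows can share a plot)
  ids <- unique(species$PlotObservationID[unique(problem_rows)])
  # match() returns one header position per id, in the order of ids
  data.frame(
    PlotObservationID = ids,
    TV2_releve_number = header[[relnum_col]][match(ids, header$PlotObservationID)]
  )
}
```

With the objects above this would be called as `problem_plots(DT0, header, problems(DT0)$row)`.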
Other known problems:
```{r}
# Some species records have 0 % cover and no cover code
knitr::kable(DT0 %>%
filter(`Cover %` ==0 & is.na(`Cover code`)) %>%
group_by(Taxonomy) %>%
summarize(n_records = n()),
caption="Summary of DBs without appropriate cover codes") %>%
kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive"),
full_width = F, position = "center")
# in db British_Columbia_meadows this seems to depend on lichen/moss species
knitr::kable(DT0 %>%
filter(`Cover %` ==0 & is.na(`Cover code`) & Taxonomy=="British_Columbia_meadows") %>%
dplyr::select(PlotObservationID, Taxonomy, `Matched concept`:x_),
caption="Records without appropriate cover codes in British_Columbia_meadows") %>%
kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive"),
full_width = F, position = "center")
# in db USA_CVS this seems to be due to duplicated species records without assignment to layers (SH will fix this in the source file)
knitr::kable(DT0 %>%
filter(`Cover %` ==0 & is.na(`Cover code`) & Taxonomy=="USA_CVS")%>%
dplyr::select(PlotObservationID, Taxonomy, `Matched concept`:x_),
caption="Records without appropriate cover codes in USA_CVS") %>%
kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive"),
full_width = F, position = "center")
# in db USA_VegBank some species seem to have only presence/absence values, next to species that also have cover values (FMS will have to solve this)
knitr::kable(DT0 %>%
filter(`Cover %` ==0 & is.na(`Cover code`) & Taxonomy=="USA_VegBank")%>%
dplyr::select(PlotObservationID, Taxonomy, `Matched concept`:x_),
caption="Records without appropriate cover codes in USA_VegBank") %>%
kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive"),
full_width = F, position = "center")
```
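The USA_VegBank case (presence/absence-only records next to records with a cover value in the same plot) can be screened plot by plot. A minimal base-R sketch, assuming the `Cover %` and `Cover code` columns of `DT0` as imported above; `mixed_cover_plots()` is a hypothetical helper, not part of the original script:

```r
# Hypothetical helper: return the PlotObservationIDs of plots that contain
# both records with 0 % cover and no cover code, and records with cover > 0.
mixed_cover_plots <- function(dt) {
  no_cover <- dt$`Cover %` == 0 & is.na(dt$`Cover code`)
  # per plot: does any record lack a cover value / have a positive cover?
  any_missing <- tapply(no_cover, dt$PlotObservationID, any, na.rm = TRUE)
  any_cover <- tapply(dt$`Cover %` > 0, dt$PlotObservationID, any, na.rm = TRUE)
  as.numeric(names(any_missing)[any_missing & any_cover])
}
```

Plots returned here are candidates for a presence/absence fix; plots where every record lacks a cover value are a different problem and are not returned.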
Distribution of plots across datasets:
@@ -187,7 +278,7 @@ for(dd in 1:nlevels(header$Dataset)){
Other observed problems:
Some plots in the Hungary dataset have an altitude >5000 m (!)
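A minimal base-R sketch for listing such implausible values, assuming the altitude is stored in a numeric header column (called `Altitude` here purely for illustration; the real field name is not shown in this excerpt) and using 5000 m as the cut-off; `flag_altitude()` is a hypothetical helper:

```r
# Hypothetical helper: return the header rows whose altitude exceeds max_alt.
# alt_col is an assumed column name; adjust it to the actual header field.
flag_altitude <- function(header, alt_col = "Altitude", max_alt = 5000) {
  alt <- header[[alt_col]]
  # missing altitudes are not flagged
  header[!is.na(alt) & alt > max_alt, , drop = FALSE]
}
```

A call such as `table(flag_altitude(header)$Dataset)` would then show which datasets contribute the suspicious plots.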
-```{r, eval=F, echo=f}
+```{r, eval=F, echo=F}
print(ggplot(data=datasel %>%
melt()) +
geom_histogram(aes(x=value)) +
...