abstract: "This document describes the workflow (with contributions from Oliver Purschke, Jürgen Dengler and Florian Jansen) that was used to generate the taxonomic backbone that standardizes taxon names across the (i) global vegetation plot database sPlot version 3.0 and (ii) the global plant trait data base TRY version 5."
abstract: "This document describes the workflow (with contributions from Oliver Purschke, Jürgen Dengler and Florian Jansen) used to generate the taxonomic backbone that standardizes taxon names across (i) the global vegetation plot database sPlot version 3.0 and (ii) the global plant trait database TRY version 5."
A total of `r nrow(spec.list.TRY.sPlot %>% filter(OriginalNames != Species))` species names were modified. Although substantially improved, the species list still contains many inconsistencies.
The total list submitted to TNRS containes `r length(unique(spec.list.TRY.sPlot$Species))` species names.
The total list submitted to TNRS contains `r length(unique(spec.list.TRY.sPlot$Species))` species names.
# Match names against Taxonomic Name Resolution Service ([TNRS](http://tnrs.iplantcollaborative.org))
...
...
@@ -324,7 +327,7 @@ tnrs.res <- tnrs.res0 %>%
slice(1)
```
After this first step, there are `r sum(tnrs.res$Name_matched=="No suitable matches found.")` recprds for which no match was found. Another `r sum(tnrs.res$Overall_score<0.9)` were unreliably matched (overall match score <0.9).
After this first step, there are `r sum(tnrs.res$Name_matched=="No suitable matches found.")` records for which no match was found. Another `r sum(tnrs.res$Overall_score<0.9)` were unreliably matched (overall match score <0.9).
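The two counts above can be reproduced on a toy table. This is a minimal sketch, assuming the TNRS output carries the columns `Name_matched` and `Overall_score` used in the text; the data frame `tnrs.toy` and all its values are invented for illustration. Whether unmatched records carry a missing or a zero score depends on the TNRS export, so `na.rm = TRUE` is a defensive assumption here.

```r
# Toy stand-in for the TNRS result table (values are made up).
tnrs.toy <- data.frame(
  Name_submitted = c("Quercus robur", "Qercus robur", "Abcdefg xyz", "Poa annua"),
  Name_matched   = c("Quercus robur", "Quercus robur",
                     "No suitable matches found.", "Poa annua"),
  Overall_score  = c(1.00, 0.85, NA, 0.97),
  stringsAsFactors = FALSE
)

# Records for which no match was returned at all.
n.unmatched <- sum(tnrs.toy$Name_matched == "No suitable matches found.")

# Records matched with an overall score below 0.9.
# na.rm = TRUE keeps a missing score from turning the whole sum into NA.
n.unreliable <- sum(tnrs.toy$Overall_score < 0.9, na.rm = TRUE)
```

If a score column contains `NA`, a plain `sum(x < 0.9)` returns `NA`, which would render as an empty or misleading number in the inline text.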
**Total number of unique standardized taxon names and families:**
```{r, eval = T}
length(unique(Backbone$name_short_correct))-1 # minus 1 for NA
length(unique(Backbone$Name_short))-1 # minus 1 for NA
length(unique(Backbone$Family_correct))-1 # minus 1 for NA
```
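Subtracting 1 gives the right count only when the column actually contains at least one `NA`. A minimal base R sketch of a count that does not depend on that assumption (the vector `names.toy` is invented for illustration):

```r
names.toy <- c("Poa annua", "Poa annua", NA, "Quercus robur")

# unique() treats NA as a value of its own, so it is counted once.
n.with.na <- length(unique(names.toy))

# Dropping NA before counting works whether or not NA is present.
n.without.na <- length(unique(names.toy[!is.na(names.toy)]))
```

Here `n.with.na - 1` and `n.without.na` agree, but only the second form stays correct for a column without missing values.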
...
...
@@ -1190,14 +1572,12 @@ entries per resolved name. (Only first 20 shown") %>%
### Based on `unique` standardized names
Generate version of the backbone that only includes the unique resolved names in `name.short.correct`, and for the non-unique names, the first rows of duplicated name:
Generate a version of the backbone that includes only the unique resolved names in `Name_short`; for non-unique names, keep the first row of each duplicated name:
```{r, eval = T}
Backbone.uni <- Backbone %>%
distinct(Name_short, .keep_all = T) %>%
filter(!is.na(Name_short))
nrow(Backbone.uni)
```
There are `r nrow(Backbone.uni)` unique taxon names in the backbone.
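To illustrate which rows this step keeps, here is a base R sketch on a toy table. It assumes the same semantics as the `distinct()` and `filter()` chunk (first row per name kept, `NA` names dropped); `bb.toy` and its `source` column are hypothetical.

```r
bb.toy <- data.frame(
  Name_short = c("Poa annua", "Poa annua", NA, "Quercus robur"),
  source     = c("row1", "row2", "row3", "row4"),
  stringsAsFactors = FALSE
)

# Keep the first row of every name, then drop rows whose name is NA.
keep <- !duplicated(bb.toy$Name_short) & !is.na(bb.toy$Name_short)
bb.toy.uni <- bb.toy[keep, ]

nrow(bb.toy.uni)
```

Because only the first row of a duplicated name survives, any other columns of the later duplicates (here `source = "row2"`) are discarded, so the row order of the backbone determines which metadata is retained.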
**Now, run the stats for unique resolved names (excluding non-vascular and non-matching taxa):**
```{r, eval = T}
nrow(Backbone.uni.vasc$Name_short)
length(Backbone.uni.vasc$Name_short)
```
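The count is taken with `length()` because `$` extracts a plain vector, and `nrow()` only returns a count for objects that have dimensions. A minimal sketch with a hypothetical data frame `df.toy`:

```r
df.toy <- data.frame(Name_short = c("Poa annua", "Quercus robur"),
                     stringsAsFactors = FALSE)

# A column extracted with $ is a vector without a dim attribute,
# so nrow() returns NULL rather than a count.
res.nrow <- nrow(df.toy$Name_short)

# length() on the vector and nrow() on the data frame agree.
res.length <- length(df.toy$Name_short)
res.rows   <- nrow(df.toy)
```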
There are `r nrow(Backbone.uni.vasc$name.short.correct)` unique (vascular plant) taxon names:
There are `r length(Backbone.uni.vasc$Name_short)` unique (vascular plant) taxon names:
```{r, eval = T, echo=F}
kable((table(Backbone.uni.vasc$sPlot_TRY)), caption = "Number of (standardized) vascular plant taxon names unique to, and shared between, TRY (S), sPlot (T) and the Alpine (A) dataset.") %>%