`Taxon group` information is only available for `r nknown` taxa and is missing for `r nunknown`. To improve the completeness of this field, we derive additional information from the `Backbone` and merge it with the data already present in `DT`.
```{r}
table(DT1$`Taxon group`, exclude=NULL)
...
...
table(DT1$`Taxon group`, exclude=NULL)
```
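
The merge with the `Backbone` is not shown in this excerpt. As a minimal sketch, assuming both tables share a name column (here `Name_short`) and that values already present in `DT` take precedence, missing groups can be filled with base R's `match()`. The toy records and the column name `Taxon_group` are hypothetical:

```{r, eval=F}
# Toy stand-ins for DT and the Backbone (hypothetical columns and values)
dt <- data.frame(
  Name_short  = c("Quercus robur", "Cladonia rangiferina", "Poa annua", "Taxon indet."),
  Taxon_group = c("Vascular plant", NA, NA, NA),
  stringsAsFactors = FALSE
)
backbone <- data.frame(
  Name_short  = c("Cladonia rangiferina", "Poa annua", "Quercus robur"),
  Taxon_group = c("Lichen", "Vascular plant", "Vascular plant"),
  stringsAsFactors = FALSE
)

# Backbone group for each DT record (NA where the name is not in the backbone)
from_backbone <- backbone$Taxon_group[match(dt$Name_short, backbone$Name_short)]

# Keep values already in DT; fill only the missing ones
dt$Taxon_group <- ifelse(is.na(dt$Taxon_group), from_backbone, dt$Taxon_group)

table(dt$Taxon_group, useNA = "ifany")
```

`match()` returns `NA` for names absent from the backbone, so those records keep a missing group.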
Taxa for which basal area measurements exist can safely be assumed to be vascular plants.
```{r}
# Records measured as basal area (x_BA) can only belong to vascular plants
DT1 <- DT1 %>%
  mutate(`Taxon group` = replace(`Taxon group`,
                                 list = `Cover code` == "x_BA",
                                 values = "Vascular plant"))
```
Cross-complement `Taxon group` information. That is, whenever a taxon is assigned to a group in at least one record, assign it to that same group in every record of the `DT` table.
Check for genera with conflicting `Taxon group` attributions and fix them manually.
```{r, eval=F}
# Check for conflicts in the attribution of genera to taxon groups
conflict <- DT1 %>%
  filter(!is.na(Name_short)) %>%
  dplyr::select(Genus, `Taxon group`) %>%
  filter(!is.na(`Taxon group`)) %>%
  distinct() %>%
  group_by(Genus) %>%
  summarize(n = n()) %>%   # number of distinct groups per genus
  filter(n > 1) %>%        # keep genera attributed to more than one group
  arrange(desc(n)) %>%
  pull(Genus)
```
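
The cross-complementing step itself is not included in this excerpt. A minimal base-R sketch, assuming conflicts have already been resolved and keying on `Genus` as the conflict check does (the column `Taxon_group` and the toy records are hypothetical):

```{r, eval=F}
# Toy records: the same genus appears in several plots, with a group known only in some
dt <- data.frame(
  Genus       = c("Sphagnum", "Sphagnum", "Carex", "Carex", "Bryum"),
  Taxon_group = c("Moss", NA, NA, "Vascular plant", NA),
  stringsAsFactors = FALSE
)

# One group per genus: the first non-missing attribution found in the table
known  <- dt[!is.na(dt$Taxon_group), c("Genus", "Taxon_group")]
lookup <- known[!duplicated(known$Genus), ]

# Propagate that group to every record of the same genus
dt$Taxon_group <- lookup$Taxon_group[match(dt$Genus, lookup$Genus)]
dt$Taxon_group
```

Genera never attributed anywhere (here *Bryum*) stay `NA`. Taking the first attribution is only safe once each genus maps to a single group, which is why conflicting genera are fixed manually first.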
Manually fix some known problems in the `Taxon group` attribution. Some lists of taxa (e.g., `lichen.genera`, `mushroom.genera`) are derived from the `Backbone`.
Delete all records of fungi, and use the lists of genera to fix additional problems. Whereas the previous round matched on the resolved genus name, here we match on the unresolved genus name.
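
A minimal sketch of this step, assuming a hypothetical column `Genus_original` that holds the unresolved genus name, a hypothetical group label `"Fungi"`, and short toy vectors standing in for `mushroom.genera` and `lichen.genera`:

```{r, eval=F}
# Toy stand-ins for the genus lists derived from the Backbone
mushroom_toy <- c("Russula", "Boletus")
lichen_toy   <- c("Cladonia", "Usnea")

# Toy records; Genus_original is the unresolved genus name
dt <- data.frame(
  Genus_original = c("Cladonia", "Russula", "Carex", "Usnea", "Amanita"),
  Taxon_group    = c("Unknown", "Unknown", "Vascular plant", "Unknown", "Fungi"),
  stringsAsFactors = FALSE
)

# Fungi: records already labelled as such, plus genera on the mushroom list
# (%in% never returns NA, so the filter is safe even with missing values)
is_fungus <- dt$Taxon_group %in% "Fungi" | dt$Genus_original %in% mushroom_toy
dt <- dt[!is_fungus, ]

# Use the lichen list to fix attributions, matching on the unresolved name
dt$Taxon_group[dt$Genus_original %in% lichen_toy] <- "Lichen"
dt
```

Matching on the unresolved name can presumably catch records whose name failed taxonomic resolution, which a match on the resolved `Genus` would miss.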
After cross-checking all sources of information, the number of taxa lacking `Taxon group` information decreased to `r nunknown`.
Check the most frequent taxa for which `Taxon group` information is missing.
```{r, echo=F, eval=F}
#Check the most frequent species for which we don't have taxon group info
DT1 %>%
filter(`Taxon group` == "Unknown") %>%
group_by(Genus) %>%
...
...
slice(1:40)
```
## Calculate relative cover per layer per species in each plot
Species abundance information varies across datasets and plots. While percentage cover is used for the large majority of plots, a subset reports abundance on other scales. These are flagged in the `Cover code` column as follows:

- *x_BA*: Basal Area
- *x_IC*: Individual count
- *x_SC*: Stem count
- *x_IV*: Relative Importance
- *x_RF*: Relative Frequency
- *x*: Presence/absence

It is not intuitive that, when `Cover code` takes one of the values above, the actual abundance value is stored in the `x_` column rather than in `Cover %`. This is a legacy of how the data are stored in `TURBOVEG`.
To make the cover data more user-friendly, I simplify how cover is stored, so that cover information is held in only two columns:

- `Ab_scale`: the type of scale used;
- `Abundance`: the cover/abundance values, coalesced from the former `Cover %` and `x_` columns.

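
A minimal base-R sketch of this coalescing step on toy records. The label `"CoverPerc"` for percentage cover is hypothetical, and the code assumes that any `Cover code` outside the list above denotes percentage cover:

```{r, eval=F}
# Toy records using the TURBOVEG-style column names (check.names keeps the spaces)
dt <- data.frame(
  `Cover code` = c("3", "x_BA", "x_IC", "x"),
  `Cover %`    = c(25, NA, NA, 0),
  x_           = c(NA, 3.2, 14, NA),
  check.names = FALSE, stringsAsFactors = FALSE
)

# Codes flagging a non-percentage scale (from the list above)
other_scales <- c("x_BA", "x_IC", "x_SC", "x_IV", "x_RF", "x")

# Ab_scale: the code itself for special scales, a percentage-cover label otherwise
dt$Ab_scale <- ifelse(dt$`Cover code` %in% other_scales, dt$`Cover code`, "CoverPerc")

# Abundance: coalesce, i.e. take x_ where present, otherwise `Cover %`
dt$Abundance <- ifelse(is.na(dt$x_), dt$`Cover %`, dt$x_)
dt[, c("Cover code", "Ab_scale", "Abundance")]
```

With dplyr, `coalesce(x_, `Cover %`)` performs the same column-wise fallback in a single call.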
Fix an error: some plots only have presence/absence information (`Cover code`=="x"), yet contain zeros in the `Cover %` field. Treat these records as presences and set `Cover %` to 1.
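
A minimal sketch of this fix on toy records; `%in%` is used instead of `==` so that missing values never produce `NA` indices:

```{r, eval=F}
# Toy records (hypothetical values)
dt <- data.frame(
  `Cover code` = c("x", "x", "3"),
  `Cover %`    = c(0, 1, 15),
  check.names = FALSE, stringsAsFactors = FALSE
)

# Presence/absence records with zero cover: the taxon was recorded, so code it as present
pa_zero <- dt$`Cover code` %in% "x" & dt$`Cover %` %in% 0
dt$`Cover %`[pa_zero] <- 1
dt$`Cover %`
```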
I then transform abundances to relative abundances on a layer-by-layer basis within each plot. For consistency with the previous version of sPlot, I call the field `Relative cover`.
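
A minimal sketch of the per-layer transformation, assuming a hypothetical `Layer` column alongside `PlotObservationID` and `Abundance`, and that each plot uses a single abundance scale. Within every plot-by-layer group, each species' abundance is divided by the group total:

```{r, eval=F}
# Toy records (hypothetical values)
dt <- data.frame(
  PlotObservationID = c(1, 1, 1, 2, 2),
  Layer             = c(1, 1, 6, 6, 6),
  Abundance         = c(30, 10, 50, 5, 15)
)

# ave() applies FUN within each PlotObservationID x Layer group
# and returns a vector aligned with the original rows
dt$`Relative cover` <- ave(dt$Abundance, dt$PlotObservationID, dt$Layer,
                           FUN = function(a) a / sum(a))
dt
```

Within each plot and layer, the resulting values sum to 1.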
The output `DT` table contains `r nrow(DT2)` records across `r length(unique(DT2$PlotObservationID))` plots. The total number of taxa is `r length(unique(DT2$species_original))` before and `r length(unique(DT2$species))` after standardization. Information on `Taxon group` is available for `r DT2 %>% filter(taxon_group!="Unknown") %>% distinct(species) %>% nrow()` standardized species.