Symobio Modeling

Code for a comparative species distribution modeling (SDM) study of about 600 South American mammal species. The study compares different modeling approaches for predicting species distributions.

An analysis of model performance can be found here: https://chrkoenig.quarto.pub/sdm-performance-report/

Project Structure

  • R/: Contains all the R scripts organized by workflow steps.
  • Symobio_modeling.Rproj: RStudio project file for easy navigation.
  • README.md: High-level overview of the project.
  • renv/: Manages package dependencies for reproducibility.
  • renv.lock: Lockfile for renv to ensure consistent package versions.
  • data/: Input data (geo, phylo), intermediate data, and modeling results.
  • plots/: Plots visualizing the data processing and analysis steps.

Workflow Overview

The workflow is divided into several stages, each represented by scripts in the R/ directory. Below is a summary of the key steps:

1. Data Preparation

Pre-process species-specific and environmental information for model fitting and results analysis.

  • 01_01_range_preparation.R: Process species range maps and calculate range dissimilarity.
  • 01_02_traits_preparation.R: Prepare species trait data and calculate functional distances.
  • 01_03_phylo_preparation.R: Process phylogenetic information and calculate phylogenetic distances.
  • 01_04_raster_preparation.R: Prepare environmental raster layers for data extraction and modeling.
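As a rough illustration of what a pairwise range dissimilarity can look like, the sketch below computes the Jaccard distance between rasterized ranges in base R. The metric, the toy grid, and the species labels are assumptions made for this example; the README does not state which dissimilarity measure 01_01_range_preparation.R actually uses.

```r
# Toy presence grid (assumed data): rows are species, columns are grid cells,
# and 1 means the cell lies inside the species' range.
ranges <- rbind(
  sp_a = c(1, 1, 1, 0, 0),
  sp_b = c(0, 1, 1, 1, 0),
  sp_c = c(0, 0, 0, 1, 1)
)

# dist(method = "binary") is the Jaccard distance: the share of cells occupied
# by exactly one of two species among the cells occupied by at least one.
range_dissim <- as.matrix(dist(ranges, method = "binary"))
print(range_dissim)
```

A value of 0 means identical ranges and 1 means no overlap at all. Functional and phylogenetic distances follow the same pattern of one symmetric species-by-species matrix.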

2. Presence/Absence Data Processing

Query presence data from the Symobio DB, sample absence data, and explore the resulting dataset.

  • 02_01_presence_data_preparation.R: Query species occurrence data from the Symobio DB and extract environmental variables from raster files.
  • 02_02_absence_data_preparation.R: Sample pseudo-absence points and extract environmental variables from raster files.
  • 02_03_model_data_finalization.R: Create final dataset for modeling.
  • 02_04_dataset_exploration.R: Explore and visualize the dataset.
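Pseudo-absence sampling can be done in many ways, and the README does not say which strategy 02_02_absence_data_preparation.R follows. The base R sketch below shows one common and simple variant, assumed here purely for illustration: draw random points from a bounding box and reject those too close to a known presence. The function name, its arguments, and the coordinates are all hypothetical.

```r
# Hypothetical helper: sample `n` pseudo-absences uniformly from `bbox`,
# rejecting candidates closer than `min_dist` (in degrees) to any presence.
sample_pseudo_absences <- function(presences, n, bbox, min_dist = 0.5, max_tries = 100) {
  accepted <- data.frame(lon = numeric(0), lat = numeric(0))
  tries <- 0
  while (nrow(accepted) < n && tries < max_tries) {
    tries <- tries + 1
    cand <- data.frame(
      lon = runif(n, bbox["xmin"], bbox["xmax"]),
      lat = runif(n, bbox["ymin"], bbox["ymax"])
    )
    # Planar distance from each candidate to its nearest presence point.
    nearest <- vapply(seq_len(nrow(cand)), function(i) {
      min(sqrt((presences$lon - cand$lon[i])^2 + (presences$lat - cand$lat[i])^2))
    }, numeric(1))
    accepted <- rbind(accepted, cand[nearest >= min_dist, , drop = FALSE])
  }
  accepted <- head(accepted, n)
  rownames(accepted) <- NULL
  accepted$present <- rep(0L, nrow(accepted))  # label all rows as absences
  accepted
}

set.seed(1)
presences <- data.frame(lon = c(-60, -55), lat = c(-10, -12))  # made-up records
bbox <- c(xmin = -80, xmax = -35, ymin = -55, ymax = 12)       # rough, assumed extent
pseudo_abs <- sample_pseudo_absences(presences, n = 50, bbox = bbox)
```

Uniform sampling in longitude and latitude is not equal-area and ignores coastlines, so a real workflow would typically sample from a masked raster or polygon instead.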

3. Modeling

Fit single-species and multi-species distribution models.

  • 03_01_modelling_ssdm.R: Fit single-species distribution models (SSDM) based on different algorithms.
  • 03_02_modelling_msdm_embed.R: Fit a multi-species distribution model (MSDM) based on a deep neural network with species embeddings.
  • 03_03_modelling_msdm_onehot.R: Fit a multi-species distribution model (MSDM) based on a deep neural network with species identity as a factor.
  • 03_04_modelling_msdm_rf.R: Fit a multi-species distribution model (MSDM) based on a random forest with species identity as a factor.
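The difference between encoding species identity as a factor (one-hot) and as an embedding can be sketched in a few lines of base R. This is a conceptual illustration only: the species labels, the embedding dimension, and the random weights are assumptions, and the actual scripts presumably learn the embedding weights inside a neural network framework.

```r
species <- factor(c("sp_a", "sp_b", "sp_a", "sp_c"))  # made-up species labels

# One-hot encoding: one indicator column per species level.
one_hot <- model.matrix(~ species - 1)

# Embedding: each species maps to one row of a weight matrix. The weights are
# random here; in a trained model they would be learned parameters.
set.seed(42)
embedding_dim <- 2
weights <- matrix(rnorm(nlevels(species) * embedding_dim), nrow = nlevels(species))
embedded <- weights[as.integer(species), , drop = FALSE]

# An embedding lookup equals multiplying the one-hot matrix by the weights.
stopifnot(isTRUE(all.equal(unname(one_hot %*% weights), embedded)))
```

With about 600 species, a one-hot input has one column per species, whereas an embedding compresses species identity into a small number of dense dimensions.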

4. Analysis and Reporting

Analyse and report the modeling results.

  • 04_01_performance_report.qmd: Generate an interactive performance evaluation of implemented SDM algorithms.
  • 04_02_publication_analysis.R: Explore results in depth, analyse

Miscellaneous

  • utils.R: Contains utility functions used across multiple scripts.
  • _publish.yml: Configuration for publishing reports and analyses.

Getting Started

  1. Clone the repository and open the Symobio_modeling.Rproj file in RStudio.
  2. Restore the project environment using renv:
    renv::restore()
  3. Set up the directory structure by calling the setup_dirs() function defined in utils.R.
  4. Run the scripts in the R/ directory sequentially. Some scripts, especially those for model fitting, may take a long time to run and benefit from powerful hardware.
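Running the scripts in order can be sketched as follows. The helpers list_workflow_scripts() and run_workflow() are hypothetical and not part of the repository; they rely only on the numeric file-name prefixes listed above and assume the working directory is the project root.

```r
# Hypothetical helper: list the numbered workflow scripts in execution order.
list_workflow_scripts <- function(dir = "R") {
  # Matches names such as 01_01_range_preparation.R; utils.R and .qmd files
  # do not match the pattern and are skipped.
  scripts <- list.files(dir, pattern = "^[0-9]{2}_[0-9]{2}_.*\\.R$", full.names = TRUE)
  sort(scripts)
}

# Hypothetical helper: source each script in turn, stopping at the first error.
run_workflow <- function(dir = "R") {
  for (script in list_workflow_scripts(dir)) {
    message("Running ", script)
    source(script)
  }
  invisible(TRUE)
}
```

Calling run_workflow() would execute every stage back to back. Because the model fitting scripts can be slow, sourcing them one at a time is likely the more practical choice. The Quarto report 04_01_performance_report.qmd is not an R script and would be rendered separately.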

Additional Notes

  • Ensure that all required input data (e.g., range maps, raster files) is available in the expected directories.
  • Outputs from each script are typically saved to disk and used as inputs for subsequent scripts.