Symobio Modeling
Code for a comparative species distribution modeling (SDM) study of about 600 South American mammal species. The study compares different modeling approaches for predicting species distributions.
An analysis of model performance can be found here: https://chrkoenig.quarto.pub/sdm-performance-report/
Project Structure
- R/: Contains all R scripts, organized by workflow step.
- Symobio_modeling.Rproj: RStudio project file for easy navigation.
- README.md: High-level overview of the project.
- renv/: Manages package dependencies for reproducibility.
- renv.lock: Lockfile for renv to ensure consistent package versions.
- data/: Input data (geo, phylo), intermediate data, and modeling results.
- plots/: Plots for visualizing data processing and analysis steps.
Workflow Overview
The workflow is divided into several stages, each represented by scripts in the R/ directory. Below is a summary of the key steps:
1. Data Preparation
Pre-process species-specific and environmental information for model fitting and results analysis.
- 01_01_range_preparation.R: Process species range maps and calculate range dissimilarity.
- 01_02_traits_preparation.R: Prepare species trait data and calculate functional distances.
- 01_03_phylo_preparation.R: Process phylogenetic information and calculate phylogenetic distances.
- 01_04_raster_preparation.R: Prepare environmental raster layers for data extraction and modeling.
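As an illustration of the kind of pairwise distance computed in this stage, the following minimal sketch derives a range dissimilarity matrix from a toy species-by-cell occupancy table using base R. The Jaccard index, the toy data, and all object names are assumptions made for illustration; the scripts above may use a different metric and work directly on spatial range maps.

```r
# Hypothetical toy data: rows are species, columns are grid cells,
# 1 = the species' range covers the cell, 0 = it does not.
ranges <- rbind(
  sp_a = c(1, 1, 1, 0, 0),
  sp_b = c(0, 1, 1, 1, 0),
  sp_c = c(0, 0, 0, 1, 1)
)

# dist(method = "binary") yields the Jaccard dissimilarity:
# cells occupied by exactly one of the two species, divided by
# cells occupied by at least one of them.
range_dissim <- as.matrix(dist(ranges, method = "binary"))
print(round(range_dissim, 2))
```

Species with fully disjoint ranges receive a dissimilarity of 1, and identical ranges receive 0.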
2. Presence/Absence Data Processing
Query presence data from the Symobio DB, sample pseudo-absence data, and perform an initial exploration of the dataset.
- 02_01_presence_data_preparation.R: Query species occurrence data from the Symobio DB and extract environmental variables from raster files.
- 02_02_absence_data_preparation.R: Sample pseudo-absence points and extract environmental variables from raster files.
- 02_03_model_data_finalization.R: Create the final dataset for modeling.
- 02_04_dataset_exploration.R: Explore and visualize the dataset.
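To make the idea of pseudo-absence sampling concrete, here is a minimal base R sketch. The sampling rule (uniform draws within a bounding box, rejecting points too close to a presence record), the helper name sample_pseudo_absences(), the coordinates, and the min_dist threshold are all assumptions for illustration and are not taken from 02_02_absence_data_preparation.R.

```r
set.seed(42)  # reproducible sampling

# Hypothetical presence records (longitude/latitude in degrees).
presences <- data.frame(
  lon = c(-60.2, -58.7, -61.5),
  lat = c(-10.1, -12.4, -9.8)
)

# Hypothetical helper: draw candidate points uniformly within a bounding box
# and keep those farther than `min_dist` degrees from every presence record.
sample_pseudo_absences <- function(presences, n, bbox, min_dist = 0.5) {
  kept <- data.frame(lon = numeric(0), lat = numeric(0))
  while (nrow(kept) < n) {
    cand <- data.frame(
      lon = runif(n, bbox["xmin"], bbox["xmax"]),
      lat = runif(n, bbox["ymin"], bbox["ymax"])
    )
    # Planar distance (in degrees) from each candidate to its nearest presence
    nearest <- vapply(seq_len(nrow(cand)), function(i) {
      min(sqrt((cand$lon[i] - presences$lon)^2 +
               (cand$lat[i] - presences$lat)^2))
    }, numeric(1))
    kept <- rbind(kept, cand[nearest > min_dist, ])
  }
  kept <- kept[seq_len(n), ]
  rownames(kept) <- NULL
  kept
}

bbox <- c(xmin = -65, xmax = -55, ymin = -15, ymax = -5)
absences <- sample_pseudo_absences(presences, n = 10, bbox = bbox)

# Combine presences and pseudo-absences into one table with a 0/1 response.
model_data <- rbind(
  cbind(presences, present = 1L),
  cbind(absences, present = 0L)
)
```

Uniform sampling in degrees is not uniform in area, so a real workflow would more likely sample in a projected coordinate system or from raster cells.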
3. Modeling
Fit single-species and multi-species distribution models.
- 03_01_modelling_ssdm.R: Fit single-species distribution models (SSDM) based on different algorithms.
- 03_02_modelling_msdm_embed.R: Fit a multi-species distribution model (MSDM) based on a deep neural network with species embeddings.
- 03_03_modelling_msdm_onehot.R: Fit a multi-species distribution model (MSDM) based on a deep neural network with species identity as a factor.
- 03_04_modelling_msdm_rf.R: Fit a multi-species distribution model (MSDM) based on a random forest with species identity as a factor.
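The multi-species models differ in how species identity enters the model. The sketch below contrasts the two general encodings in base R: a one-hot matrix derived from a factor, and an integer index of the kind usually fed to an embedding layer. The toy data and object names are hypothetical, and the actual scripts may prepare their inputs differently.

```r
# Hypothetical toy dataset: one row per observation.
obs <- data.frame(
  species = factor(c("sp_a", "sp_b", "sp_c", "sp_a")),
  temp    = c(21.5, 18.2, 25.1, 22.0),
  present = c(1L, 0L, 1L, 0L)
)

# One-hot encoding: one indicator column per species level.
# "- 1" drops the intercept so that no level is absorbed as a reference.
onehot <- model.matrix(~ species - 1, data = obs)

# Integer index per species, the usual input to an embedding layer,
# which maps each index to a learned dense vector.
species_index <- as.integer(obs$species)

print(onehot)
print(species_index)
```

With roughly 600 species, the one-hot matrix has one column per species, whereas an embedding represents each species with a much shorter learned vector.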
4. Analysis and Reporting
Analyse modeling results
- 04_01_performance_report.qmd: Generate an interactive performance evaluation of implemented SDM algorithms.
- 04_02_publication_analysis.R: Explore results in depth, analyse
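As an example of how predictive performance can be scored for presence/absence predictions, the sketch below computes the area under the ROC curve (AUC) in base R using the rank-sum identity. Choosing AUC here is an assumption: the text above does not state which metrics the report uses, and the helper auc() and the toy vectors are hypothetical.

```r
# Hypothetical helper: AUC via the rank-sum (Mann-Whitney) identity.
# observed: 0/1 vector, predicted: predicted probabilities or scores.
auc <- function(observed, predicted) {
  stopifnot(length(observed) == length(predicted),
            all(observed %in% c(0, 1)))
  n_pos <- sum(observed == 1)
  n_neg <- sum(observed == 0)
  ranks <- rank(predicted)  # ties receive average ranks
  (sum(ranks[observed == 1]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

observed  <- c(1, 1, 0, 1, 0, 0)
predicted <- c(0.9, 0.8, 0.7, 0.6, 0.3, 0.1)
print(auc(observed, predicted))
```

The value is the fraction of presence/absence pairs in which the presence receives the higher score, so 0.5 corresponds to random ranking and 1 to perfect separation.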
Miscellaneous
- utils.R: Contains utility functions used across multiple scripts.
- _publish.yml: Configuration for publishing reports and analyses.
Getting Started
- Clone the repository and open the Symobio_modeling.Rproj file in RStudio.
- Restore the project environment using renv: renv::restore()
- Set up the directory structure using the setup_dirs() function in utils.R.
- Run the scripts in the R/ directory sequentially. Some scripts, especially those for model fitting, may run for a long time and benefit from powerful hardware.
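The steps above could be scripted roughly as follows. This is a sketch under stated assumptions: run_pipeline() is a hypothetical helper that is not part of the repository, utils.R is assumed to live in R/, and setup_dirs() is assumed to need no arguments.

```r
# Hypothetical helper: list the numbered .R scripts in execution order and,
# unless dry_run is TRUE, source them one after another.
run_pipeline <- function(script_dir = "R", dry_run = TRUE) {
  scripts <- sort(list.files(
    script_dir,
    pattern = "^[0-9]{2}_[0-9]{2}_.*\\.R$",
    full.names = TRUE
  ))
  for (script in scripts) {
    message("Running ", basename(script))
    if (!dry_run) source(script)
  }
  invisible(scripts)
}

# Typical session, run from the project root:
# renv::restore()                      # restore package versions from renv.lock
# source("R/utils.R")                  # assumed location of setup_dirs()
# setup_dirs()                         # assumed to need no arguments
# run_pipeline("R", dry_run = FALSE)
```

The file name pattern skips utils.R and the Quarto report (04_01_performance_report.qmd), which would be rendered separately. Calling run_pipeline("R") with the default dry_run = TRUE only prints the execution order, which is a cheap way to check it before starting long model fits.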
Additional Notes
- Ensure that all required input data (e.g., range maps, raster files) is available in the expected directories.
- Outputs from each script are typically saved to disk and used as inputs for subsequent scripts.