# Codebase Documentation

This repository implements a study comparing species distribution modeling (SDM) approaches for about 600 South American mammal species. It evaluates how well single-species and multi-species models, built on different algorithms, predict species distributions.

## Project Structure

- **`R/`**: All analysis scripts, organized by workflow stage.
- **`renv/`**: Manages package dependencies for reproducibility.
- **`Symobio_modeling.Rproj`**: RStudio project file for easy navigation.
- **`README.md`**: High-level overview of the project.
- **`occurrences.png`**: Visualization of the occurrence data.
- **`.Rprofile`**: Custom R environment settings.
- **`renv.lock`**: Lockfile for `renv` to ensure consistent package versions.

## Workflow Overview

The workflow is divided into several stages, each represented by scripts in the `R/` directory. Below is a summary of the key steps:

### 1. Data Preparation
Pre-processing of species-specific and environmental information for model fitting and results analysis.

- **`01_01_range_preparation.R`**: Process species range maps and calculate range dissimilarity.
- **`01_02_traits_preparation.R`**: Prepare species trait data and calculate functional distances.
- **`01_03_phylo_preparation.R`**: Process phylogenetic information and calculate phylogenetic distances.
- **`01_04_raster_preparation.R`**: Prepare the environmental raster layers from which predictor values are extracted for modeling.
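The preparation scripts reduce ranges and traits to pairwise species distances. The base-R sketch below illustrates that idea on toy data. It assumes ranges are rasterized to presence/absence grid cells and compared with Jaccard dissimilarity, and that traits are numeric and compared with Euclidean distance on standardized values. The metrics actually used in `01_01_range_preparation.R` and `01_02_traits_preparation.R` may differ, and all species and trait names here are hypothetical.

```r
# Toy range maps (hypothetical): rows are species, columns are grid cells,
# 1 = cell lies inside the species' range
ranges <- rbind(
  sp_a = c(1, 1, 1, 0, 0),
  sp_b = c(0, 1, 1, 1, 0),
  sp_c = c(0, 0, 0, 1, 1)
)

# Jaccard dissimilarity: among cells occupied by at least one of two species,
# the share occupied by only one. dist(method = "binary") computes this for 0/1 data.
range_dissim <- as.matrix(dist(ranges, method = "binary"))

# Toy trait table (hypothetical traits): standardize each trait, then use
# Euclidean distance as a simple functional distance
traits <- data.frame(
  body_mass_log = c(2.1, 3.4, 1.8),
  litter_size   = c(4, 2, 6),
  row.names     = c("sp_a", "sp_b", "sp_c")
)
func_dist <- as.matrix(dist(scale(traits), method = "euclidean"))

print(round(range_dissim, 2))
print(round(func_dist, 2))
```

Mixed trait types (categorical diet, binary activity pattern) would call for a mixed-type distance such as Gower's instead of plain Euclidean distance.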

### 2. Presence/Absence Data Processing
Querying of presence data from Symobio DB, sampling of pseudo-absence data, and initial exploration of the dataset.

- **`02_01_presence_data_preparation.R`**: Query species occurrence data from Symobio DB and extract environmental variables from the raster files.
- **`02_02_absence_data_preparation.R`**: Sample pseudo-absence points and extract environmental variables from the raster files.
- **`02_03_model_data_finalization.R`**: Create the final dataset for modeling.
- **`02_04_dataset_exploration.R`**: Explore and visualize the dataset.
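As a rough illustration of stages 02_02 and 02_03, pseudo-absence sampling can be sketched as drawing random background cells that hold no presence record, then stacking presences and pseudo-absences into one table with a 0/1 response. This is a sketch under simplifying assumptions: cells are plain integer IDs, presences and pseudo-absences are sampled 1:1, and `sample_pseudo_absences()`, `build_model_data()`, and the column names are hypothetical. The repository's scripts may use a different sampling design, and environmental values would then be attached at the sampled locations (for example with `terra::extract()`).

```r
# Hypothetical helper: draw pseudo-absence cells for one species from
# candidate cells that contain no presence record
sample_pseudo_absences <- function(presence_cells, candidate_cells, ratio = 1) {
  available <- setdiff(candidate_cells, presence_cells)
  n_abs <- min(length(available), ratio * length(unique(presence_cells)))
  # Index via sample.int() to avoid sample()'s surprise when length(available) == 1
  available[sample.int(length(available), n_abs)]
}

# Hypothetical helper: stack presences (1) and pseudo-absences (0) into one table
build_model_data <- function(species, presence_cells, absence_cells) {
  data.frame(
    species = species,
    cell    = c(presence_cells, absence_cells),
    present = c(rep(1L, length(presence_cells)), rep(0L, length(absence_cells)))
  )
}

set.seed(42)
all_cells <- 1:100                  # hypothetical study-area grid
pres      <- c(3, 17, 42, 58, 77)   # hypothetical presence cells
abs_cells <- sample_pseudo_absences(pres, all_cells)
model_df  <- build_model_data("sp_a", pres, abs_cells)
```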

### 3. Modeling
Fitting of single-species and multi-species distribution models.

- **`03_01_modelling_ssdm.R`**: Fit single-species distribution models (SSDMs) with different algorithms.
- **`03_02_modelling_msdm_embed.R`**: Fit a multi-species distribution model (MSDM) based on a deep neural network with species embeddings.
- **`03_03_modelling_msdm_onehot.R`**: Fit an MSDM based on a deep neural network with species identity as a one-hot encoded factor.
- **`03_04_modelling_msdm_rf.R`**: Fit an MSDM based on Random Forest with species identity as a factor.
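The SSDM and factor-based MSDM setups differ mainly in how the training data are organized. The base-R sketch below illustrates this with a toy dataset and logistic regression as a stand-in algorithm; the actual scripts use their own algorithms (including deep neural networks and Random Forest), and the column names are hypothetical. An SSDM fits one model per species on that species' rows, while a one-hot MSDM fits a single model on all rows, with species identity expanded into indicator columns by `model.matrix()`.

```r
# Toy stacked dataset (hypothetical columns): one row per presence/pseudo-absence record
set.seed(1)
dat <- data.frame(
  species = factor(rep(c("sp_a", "sp_b", "sp_c"), each = 40)),
  bio1    = rnorm(120),              # hypothetical environmental predictor
  present = rbinom(120, 1, 0.5)
)

# SSDM: split by species and fit one model per species
ssdm_fits <- lapply(split(dat, dat$species), function(d) {
  glm(present ~ bio1, family = binomial(), data = d)
})

# One-hot MSDM: one indicator column per species (no intercept column),
# then a single joint model on all species' rows
onehot   <- model.matrix(~ species - 1, data = dat)
msdm_df  <- data.frame(present = dat$present, onehot, bio1 = dat$bio1)
msdm_fit <- glm(present ~ . - 1, family = binomial(), data = msdm_df)
```

In a GLM the indicator columns only shift each species' intercept; flexible learners such as neural networks or Random Forest can additionally learn species-by-environment interactions from the same input. Embeddings (as in `03_02_modelling_msdm_embed.R`) replace the sparse indicators with learned dense vectors.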

### 4. Analysis and Reporting
- **`04_01_performance_report.qmd`**: Generate an interactive performance evaluation of the implemented SDM algorithms.
- **`04_02_publication_analysis.R`**: Explore and analyze the results in depth.
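Presence/absence model evaluations commonly include the area under the ROC curve (AUC). Assuming AUC is among the metrics in `04_01_performance_report.qmd` (the report may use other or additional metrics, possibly via a package), the sketch below computes it in base R from the rank-sum (Mann-Whitney) identity: AUC equals the probability that a randomly chosen presence gets a higher prediction than a randomly chosen absence, with ties counted as one half. The function name `auc()` is hypothetical.

```r
# AUC via the Mann-Whitney U statistic; rank() assigns average ranks to ties,
# which counts tied presence/absence pairs as 1/2
auc <- function(obs, pred) {
  stopifnot(length(obs) == length(pred), all(obs %in% c(0, 1)))
  n1 <- sum(obs == 1)
  n0 <- sum(obs == 0)
  if (n1 == 0 || n0 == 0) return(NA_real_)  # undefined without both classes
  r <- rank(pred)
  (sum(r[obs == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)
}

obs  <- c(1, 1, 0, 1, 0, 0)
pred <- c(0.9, 0.8, 0.7, 0.4, 0.3, 0.1)
auc(obs, pred)  # 8 of 9 presence/absence pairs correctly ordered: 8/9
```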

### Miscellaneous
- **`utils.R`**: Contains utility functions used across multiple scripts.
- **`_publish.yml`**: Configuration for publishing reports and analyses.

## Getting Started

1. Clone the repository and open the `Symobio_modeling.Rproj` file in RStudio.
2. Restore the project environment using `renv`:
   ```r
   renv::restore()
   ```
3. Run the scripts in the `R/` directory sequentially, in the order of their numeric prefixes. Some scripts, especially the model-fitting scripts in stage 3, can take a long time to run and benefit from powerful hardware.
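Running the stages in order can also be scripted. The sketch below assumes that the `NN_NN_` file prefixes define the execution order and that each script runs on its own in a fresh environment, reading its inputs from disk. `workflow_scripts()` and `run_workflow()` are hypothetical helpers, not part of the repository; Quarto documents such as `04_01_performance_report.qmd` are not picked up and would be rendered separately.

```r
# Hypothetical helper: list numbered R scripts (e.g. "01_01_range_preparation.R")
# in execution order; utils.R and .qmd files do not match the pattern
workflow_scripts <- function(dir = "R") {
  files <- list.files(dir, pattern = "^[0-9]{2}_[0-9]{2}_.*\\.R$")
  file.path(dir, sort(files))
}

# Hypothetical helper: source each script in its own environment so that
# objects do not leak from one step into the next
run_workflow <- function(dir = "R") {
  scripts <- workflow_scripts(dir)
  for (script in scripts) {
    message("Running ", script)
    source(script, local = new.env())
  }
  invisible(scripts)
}

# Example, from the project root:
# run_workflow("R")
```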

## Additional Notes
- Ensure that all required input data (e.g., range maps, raster files) is available in the expected directories. 
- Outputs from each script are typically saved to disk and used as inputs for subsequent scripts.
- Refer to `README.md` for additional project-specific instructions.
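Because each script reads files written by earlier steps, a missing input may only surface deep into a long run. A small guard at the top of a script can fail early instead. The sketch below is hypothetical: `require_inputs()` is not part of `utils.R`, storing intermediate results as `.rds` files is an assumption, and the file path is a placeholder rather than the repository's actual directory layout.

```r
# Hypothetical guard: stop early with a clear message if required inputs are missing
require_inputs <- function(paths) {
  missing <- paths[!file.exists(paths)]
  if (length(missing) > 0) {
    stop("Missing input file(s):\n  ", paste(missing, collapse = "\n  "),
         "\nRun the earlier workflow steps first.", call. = FALSE)
  }
  invisible(TRUE)
}

# Placeholder path: an earlier step saves an object, a later step checks and reads it
out_file <- file.path(tempdir(), "range_dissimilarity.rds")
saveRDS(matrix(0, 2, 2), out_file)   # earlier step writes its output
require_inputs(out_file)             # later step verifies its input exists
range_dissim <- readRDS(out_file)
```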