# Codebase Documentation

This repository implements a comparison study of species distribution modeling approaches for about 600 South American mammal species.

## Project Structure

- **`R/`**: Contains all the R scripts, organized by workflow step.
- **`renv/`**: Manages package dependencies for reproducibility.
- **`Symobio_modeling.Rproj`**: RStudio project file for easy navigation.
- **`README.md`**: High-level overview of the project.
- **`occurrences.png`**: Visualization or reference image for the occurrence data.
- **`.Rprofile`**: Custom R environment settings.
- **`renv.lock`**: Lockfile for `renv` to ensure consistent package versions.

## Workflow Overview

The workflow is divided into several stages, each represented by scripts in the `R/` directory. Below is a summary of the key steps:

### 1. Data Preparation

Pre-processing of species-specific and environmental information for model fitting and results analysis.

- **`01_01_range_preparation.R`**: Process species range maps and calculate range dissimilarity.
- **`01_02_traits_preparation.R`**: Prepare species trait data and calculate functional distances.
- **`01_03_phylo_preparation.R`**: Process phylogenetic information and calculate phylogenetic distances.
- **`01_04_raster_preparation.R`**: Prepare environmental raster layers for data extraction and modeling.

### 2. Presence/Absence Data Processing

Querying of presence data from Symobio DB, sampling of absence data, and initial exploration of the dataset.

- **`02_01_presence_data_preparation.R`**: Query species occurrence data from Symobio DB and extract environmental variables from raster files.
- **`02_02_absence_data_preparation.R`**: Sample pseudo-absence points and extract environmental variables from raster files.
- **`02_03_model_data_finalization.R`**: Create the final dataset for modeling.
- **`02_04_dataset_exploration.R`**: Explore and visualize the dataset.

### 3. Modeling

Scripts for model fitting.

- **`03_01_modelling_ssdm.R`**: Fit single-species distribution models (SSDM) based on different algorithms.
- **`03_02_modelling_msdm_embed.R`**: Fit a multi-species distribution model (MSDM) based on a deep neural network with species embeddings.
- **`03_03_modelling_msdm_onehot.R`**: Fit a multi-species distribution model (MSDM) based on a deep neural network with species identity as a factor.
- **`03_04_modelling_msdm_rf.R`**: Fit a multi-species distribution model (MSDM) based on a random forest with species identity as a factor.

### 4. Analysis and Reporting

- **`04_01_performance_report.qmd`**: Generate an interactive performance evaluation of the implemented SDM algorithms.
- **`04_02_publication_analysis.R`**: Explore and analyse the results in depth.

### Miscellaneous

- **`utils.R`**: Contains utility functions used across multiple scripts.
- **`_publish.yml`**: Configuration for publishing reports and analyses.

## Getting Started

1. Clone the repository and open the `Symobio_modeling.Rproj` file in RStudio.
2. Restore the project environment using `renv`:

   ```r
   renv::restore()
   ```

3. Run the scripts in the `R/` directory sequentially. Some scripts, especially those for model fitting, may run for a long time and benefit from powerful hardware.

## Additional Notes

- Ensure that all required input data (e.g., range maps, raster files) is available in the expected directories.
- Outputs from each script are typically saved to disk and used as inputs for subsequent scripts.
- Refer to the `README.md` file for any additional project-specific instructions.
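
The sequential run described under Getting Started could be scripted roughly as follows. This is a minimal sketch, not part of the repository: the helper names `list_workflow_scripts()` and `run_workflow()` are hypothetical, and the sketch assumes the numbered `.R` scripts can simply be sourced in filename order from the project root. The Quarto report (`04_01_performance_report.qmd`) and `utils.R` are deliberately not matched by the pattern.

```r
# Hypothetical helper (not part of the repository): collect the numbered
# workflow scripts, e.g. "01_01_range_preparation.R", in run order.
list_workflow_scripts <- function(script_dir = "R") {
  scripts <- list.files(
    script_dir,
    pattern = "^[0-9]{2}_[0-9]{2}_.*\\.R$",  # matches NN_NN_*.R only
    full.names = TRUE
  )
  sort(scripts)
}

# Hypothetical helper: source each script in order. With dry_run = TRUE
# (the default) the scripts are only listed, which is useful because the
# model-fitting scripts may run for a long time.
run_workflow <- function(script_dir = "R", dry_run = TRUE) {
  scripts <- list_workflow_scripts(script_dir)
  for (script in scripts) {
    message("Running ", script)
    if (!dry_run) {
      source(script, echo = FALSE)
    }
  }
  invisible(scripts)
}
```

Calling `run_workflow()` prints the planned order without executing anything; `run_workflow(dry_run = FALSE)` would source the scripts one after another, assuming the required input data is already in place.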