From 3b5a8f6b5e2cf0053b050ec259d613e20988a1eb Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Christian=20K=C3=B6nig?= <chr.koenig@outlook.com>
Date: Fri, 21 Mar 2025 12:50:07 +0100
Subject: [PATCH] update readme

---
 README.md | 55 ++++++++++++++++++++++++++++---------------------------
 1 file changed, 28 insertions(+), 27 deletions(-)

diff --git a/README.md b/README.md
index a26b747..35f7174 100644
--- a/README.md
+++ b/README.md
@@ -1,51 +1,52 @@
 # Codebase Documentation
 
-This repository implements a species distribution modeling comparison study for about 600 South American mammal species. Specifically, the 
+This repository implements a species distribution modeling comparison study for about 600 South American mammal species. Specifically, the study compares different modeling approaches for predicting species distributions.
 
 ## Project Structure
 
 - **`R/`**: Contains all the R scripts organized by workflow steps.
+- **`renv/`**: Manages package dependencies for reproducibility.
 - **`Symobio_modeling.Rproj`**: RStudio project file for easy navigation.
 - **`README.md`**: High-level overview of the project.
-- **`renv/`**: Manages package dependencies for reproducibility.
+- **`occurrences.png`**: Visualization or reference image for the occurrence data.
+- **`.Rprofile`**: Custom R environment settings.
 - **`renv.lock`**: Lockfile for `renv` to ensure consistent package versions.
 
 ## Workflow Overview
 
 The workflow is divided into several stages, each represented by scripts in the `R/` directory. Below is a summary of the key steps:
 
-### 1. Preparation of Geographic Data
-- **`01_01_range_map_preparation.R`**: Processes IUCN mammal range maps for South America, converting them to a standardized raster format with consistent projection for downstream analysis.
-- **`01_02_raster_preparation.R`**: Prepares environmental predictor rasters (e.g., climate, elevation, land cover) by cropping to South America extent, resampling to consistent resolution, and performing any necessary transformations.
-
-### 2. Preparation of complementary species-level data
+### 1. Data Preparation
+Pre-processing of species-specific and environmental data for model fitting and analysis of results.
 
+- **`01_01_range_preparation.R`**: Process species range maps and calculate range dissimilarity.
+- **`01_02_traits_preparation.R`**: Prepare species trait data and calculate functional distances.
+- **`01_03_phylo_preparation.R`**: Process phylogenetic information and calculate phylogenetic distances.
+- **`01_04_raster_preparation.R`**: Prepare environmental raster layers for data extraction during modeling.
 
-- **`02_01_functional_group_assignment.R`**: Assigns mammal species to functional groups based on diet, locomotion, and body size characteristics, creating categorical variables for modeling.
-- **`02_02_functional_traits_preparation.R`**: Cleans and standardizes continuous trait data (body mass, diet breadth, etc.) for all study species, handling missing values through imputation where necessary.
-- **`02_03_phylo_preparation.R`**: Extracts phylogenetic information for target mammal species, computes phylogenetic distance matrices, and prepares the data for inclusion in models.
+### 2. Presence/Absence Data Processing
+Querying of presence data from the Symobio DB, sampling of absence data, and initial exploration of the dataset.
 
-### 3. Preparation of Presence/Absence Data
-- **`03_01_presence_preparation.R`**: Processes occurrence records from GBIF and other sources, applies spatial filtering to reduce sampling bias, and aligns taxonomic nomenclature.
-- **`03_02_absence_preparation.R`**: Generates pseudo-absence points using a stratified random approach, with constraints based on environmental conditions and range map boundaries.
-- **`03_03_dataset_exploration.R`**: Produces descriptive statistics and visualizations of presence/absence data, environmental variables, and species coverage to assess data quality.
-- **`03_04_model_data_finalization.R`**: Merges all prepared datasets (occurrences, absences, predictors) into final modeling datasets, splits data into training/testing sets, and applies any necessary scaling or transformations.
+- **`02_01_presence_data_preparation.R`**: Query species occurrence data from the Symobio DB and extract environmental variables from raster files.
+- **`02_02_absence_data_preparation.R`**: Sample pseudo-absence points and extract environmental variables from raster files.
+- **`02_03_model_data_finalization.R`**: Create final dataset for modeling.
+- **`02_04_dataset_exploration.R`**: Explore and visualize the dataset.
 
-### 4. Modeling
-- **`04_01_modelling_ssdm.R`**: Implements traditional single-species distribution modeling (SSDM) approaches with selected algorithms (e.g., MaxEnt, random forests), including hyperparameter tuning.
-- **`04_02_modelling_msdm_embed.R`**: Develops multi-species distribution models using neural network approaches that embed species identities into a latent space, capturing inter-species relationships.
-- **`04_03_modelling_msdm_onehot.R`**: Implements multi-species distribution models using one-hot encoding for species identities, enabling joint prediction across all species simultaneously.
-- **`04_04_modelling_msdm_rf.R`**: Implements random forest-based multi-species distribution modeling, incorporating species identity as a predictor variable alongside environmental variables.
+### 3. Modeling
+Fitting of single-species and multi-species distribution models.
 
-### 5. Analysis
-- **`05_01_performance_report.qmd`**: Generates comprehensive reports on model performance metrics (AUC, TSS, etc.) for all modeling approaches, with visualizations comparing performance across species and methods.
-- **`05_02_publication_analysis.qmd`**: Conducts advanced statistical analyses of model results, creates publication-quality figures, and summarizes findings for manuscript preparation.
+- **`03_01_modelling_ssdm.R`**: Fit single-species distribution models (SSDM) based on different algorithms.
+- **`03_02_modelling_msdm_embed.R`**: Fit a multi-species distribution model (MSDM) based on a deep neural network with species embeddings.
+- **`03_03_modelling_msdm_onehot.R`**: Fit a multi-species distribution model (MSDM) based on a deep neural network with species identity as a factor.
+- **`03_04_modelling_msdm_rf.R`**: Fit a multi-species distribution model (MSDM) based on a random forest with species identity as a factor.
 
-## Miscellaneous
-- **`utils.R`**: Contains utility functions used across multiple scripts, including data processing helpers, custom evaluation metrics, and visualization functions. 
+### 4. Analysis and Reporting
+- **`04_01_performance_report.qmd`**: Generate an interactive performance evaluation of the implemented SDM algorithms.
+- **`04_02_publication_analysis.R`**: Explore results in depth, analyse 
 
 ### Miscellaneous
-- **`utils.R`**: 
+- **`utils.R`**: Contains utility functions used across multiple scripts.
+- **`_publish.yml`**: Configuration for publishing reports and analyses.
 
 ## Getting Started
 
@@ -59,4 +60,4 @@ The workflow is divided into several stages, each represented by scripts in the
 ## Additional Notes
 - Ensure that all required input data (e.g., range maps, raster files) is available in the expected directories. 
 - Outputs from each script are typically saved to disk and used as inputs for subsequent scripts.
-- Refer to the README.md file for any additional project-specific instructions.
+- Refer to the README.md file for any additional project-specific instructions.
\ No newline at end of file