# git and rdm

## (... and reproducibility)

notes:

- `make -B` to create images, then reload
## objectives

- show how git relates to RDM
- ... and reproducibility
## agenda

1. motivation
1. ~~teach ***how*** to use git/GitHub/GitLab~~
1. use cases
1. anti-patterns
1. platforms
1. Q & A

notes:

- feel free to interrupt with immediate questions
- more involved, detailed discussion as part of Q & A
## about version control

> records changes what who when (why)

notes:

- the **why** (aka context) is optional
- you have to do this
- you have to care about it
## about git

### best tool for the job

- simple by design
- powerful if needed
- documentation / community
- industry standard

notes:

- git name: "the stupid content tracker"
- doc: finding answers with web searches
## about me

- scientific computing support @ iDiv since 2014
- satisfied git user since 2010

> There will never be a better version control system than git.
>
> -- Christian Krause, 2017
## about you

![people](img/people.jpg)

notes:

- who has never used any VCS before?
- who actively maintains a VCS repository?
# motivation

> Why should I use version control?

notes:

- motivation chapter is about the concept of version control
- git is just a tool to do it
## motivation #1

### avoid mess

![blah](img/motivation-draft-mess.png)

notes:

- who has seen such a mess?
- who has contributed to such a mess?
- who has created such a mess?
## motivation #1

### want structure

![blah](img/motivation-structure.svg)

notes:

- structure: who, when
- why is hidden (only shows message header, not body)
- ability to inspect old versions and their diff
- ability to revert/undo a change
## motivation #2

### throw-away playgrounds

![playground](img/motivation-throwaway-playground.svg)

notes:

- test stuff without interfering
- throw away if garbage
- integrate if good
- switch back and forth without pain
## motivation #c

### collaboration made easy

> This text cntains a typo.

notes:

- demo GitLab (change **target branch** to create merge request)
- this is how you can do reviews of drafts
- discuss this slide/chapter/section link
## motivation #c

![typo-pr](https://pbs.twimg.com/media/EDsklbLUEAMdusJ.png)

notes:

- might not seem like much
- but you are still making the world a better place
- and it is not too much effort
## motivation #a

### automation

- ***quality***
  - **code analysis**
  - **spell check**
  - **software testing**
  - enforce **style guide**
- ***deployment*** (app store, web server)

notes:

- basically, everything you can script
- refresh presentation before next slide

## motivation #wars

1. view the ***history*** of changes
1. know ***why*** someone changed it
1. ***revert*** a bad change
1. maintain ***multiple versions***
1. see the ***diff*** of two versions
1. find commit ***that broke*** something
1. have free ***backup***
1. have ***non-interfering*** playgrounds
1. have ***automated*** testing
1. have ***automated*** deployment
1. ***contribute*** to a project
1. ***share*** your code
1. let other people do the work ***for you***
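
Finding the commit that broke something is what `git bisect` automates, using a binary search over the history. A minimal Python sketch of that idea, assuming a linear history; the commit names and the `is_good` check are hypothetical, and this is a toy model, not git's implementation:

```python
from typing import Callable, Sequence


def first_bad(commits: Sequence[str], is_good: Callable[[str], bool]) -> str:
    """Return the first commit for which is_good() is False.

    Assumes commits are ordered oldest to newest, the first one is good,
    the last one is bad, and once a commit is bad all later ones are too.
    """
    lo, hi = 0, len(commits) - 1  # lo: known good, hi: known bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_good(commits[mid]):
            lo = mid  # breakage happened after mid
        else:
            hi = mid  # breakage happened at or before mid
    return commits[hi]


# hypothetical history: the bug was introduced in commit "e5"
history = ["a1", "b2", "c3", "d4", "e5", "f6", "g7", "h8"]
broken_since = history.index("e5")
culprit = first_bad(history, lambda c: history.index(c) < broken_since)
print(culprit)
```

With eight commits, only three checks are needed instead of testing every version, which is why this works well even on long histories.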
# use cases

##### for

### version control system (vcs)

##### aka

### source code management (scm)
## software

- generic (as in parameterized)
- (ideally) tested

notes:

- e.g. R package
## scripting

##### aka

### how to run `$software`

- ... in `$environment`
- digital lab notes
- **reproducibility** !!1!
- execution scalability

notes:

- separate software from scripting
- `$environment`: multiple scripts/configurations for different environments:
  - EasyBuild, conda, singularity
  - RStudio server, HPC cluster
- execution scalability: running software or script without having to change it
- keep failed attempts in branches
  - e.g. highly parameterized software, make a record of bad parameter sets
# publishing

### (markdown (with some tex))

- paper / thesis / book
- presentation
- documentation
- blog

notes:

- who is using markdown?
- who is using tex?
- who is using word? why?
- markdown with tex
  - git.idiv.de/help
    - search for markdown
    - search for math
  - show example
## integration

![run `make` to generate the image](img/rdm-use-cases-a.svg)

notes:

- software A: tried and tested, reference to compare to
- software B: your experimental better version
## integration

![run `make` to generate the image](img/rdm-use-cases-b.svg)
## integration

![run `make` to generate the image](img/rdm-use-cases.svg)
## integration

![run `make` to generate the image](img/rdm-use-case-data.svg)
# anti-patterns

![unlimited power](img/unlimited-power.jpg)

notes:

- does anyone know what an anti-pattern is?
# anti-patterns

> An anti-pattern is a common response to a recurring problem that is usually ineffective and risks being highly counterproductive.

notes:

- most anti-patterns are about how to use git
- focus here is on those relating to RDM and reproducibility
# binary files

### (aka non-text)

- no diff with binary
- use textual representation
- convert with automation

notes:

- can someone give me an example of a text file?
  - markdown
  - source code, scripts (R, shell)
  - XSV
- can someone give me an example of a binary file?
  - compiled programs
  - MS Word, Excel
  - PDF, PS
- don't just put everything in the repo, use `.gitignore`
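
Why a textual representation matters can be sketched in Python, assuming a small hypothetical table stored as CSV text (the file name and values are made up): a line-based diff shows exactly which row changed, which a binary spreadsheet file would not.

```python
import difflib

# two versions of the same hypothetical table, stored as plain text (CSV)
old = "site,species,count\nA,oak,12\nB,beech,7\n"
new = "site,species,count\nA,oak,12\nB,beech,9\n"

# line-based unified diff, similar in spirit to what a VCS shows for text
diff = list(
    difflib.unified_diff(
        old.splitlines(),
        new.splitlines(),
        fromfile="counts.csv (old)",
        tofile="counts.csv (new)",
        lineterm="",
    )
)
print("\n".join(diff))
```

The removed and added rows appear as `-` and `+` lines, so a reviewer can see the change without opening either version.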
# data in git

## (scientific/big data)

- version control for data is **DIFF**erent !!1!
- git is VCS for text, not for data

notes:

- do you release data with every script change? no, too expensive
- you don't ever put data files in a git repository
  - not even with LFS
- big ball of mud, no metadata
# platforms

- GitHub
  - https://github.com/idiv-biodiversity (cloud)
- GitLab
  - https://gitlab.com (cloud)
  - https://git.idiv.de (self-hosted @ iDiv)
# platforms

- enable collaboration
- bug tracker / feature requests
- documentation / wiki
- project management tools
  - issue boards, milestones, gantt
- trigger automation
- publish/download releases
# Q & A

notes:

- walk through examples of software/script/publishing
- walk through one of your use cases
- demo consulting
# consulting

[christian.krause@idiv.de](mailto:christian.krause@idiv.de)
### thanks for listening

# EOF
# backup slides