Compare revisions
fix/typo to main
Showing 3 changed files with 169 additions and 1 deletion:
rdm.md (+131, -0)
rdm.yml (+37, -0)
reveal.js (+1, -1)

rdm.md 0 → 100644 @ 1a7f9522

git for RDM and reproducibility
===============================
checklist
---------

- **software** (a generic tool to do *something*)
    - [ ] use separate git repo for software
    - [ ] tag versions for reproducibility
    - [ ] keep software as generic as possible
- **scripts** (*how* to use *software*)
    - [ ] use separate git repo for scripts
    - [ ] tag versions for reproducibility
    - [ ] software is configured here
    - [ ] reference used software tag
- **data management**
    - [ ] publish dataset(s) to scientific data archive system
    - [ ] always attach proper metadata
    - [ ] get DOI for each version of the dataset(s) for reproducibility
    - [ ] reference used scripts tag
- **publishing**
    - [ ] use separate git repo for paper/thesis/...
    - [ ] tag versions for draft/review/final
    - [ ] convert text/source to (binary) products
    - [ ] reference used scripts tag
    - [ ] reference used data DOI
- **platforms** (GitLab, GitHub)
    - [ ] use platforms (GitLab, GitHub) for collaboration
    - [ ] review commits / merge requests
    - [ ] utilize project management tools
    - [ ] utilize automation for testing and publishing

intro
-----

- version control system (VCS) records changes (what, who, when, why)
- use platforms (GitLab, GitHub) for collaboration

git use cases
-------------
### software

- keep software as generic as possible
    - turn configuration/parameters into arguments, e.g. `myapp --seed=42`
    - this avoids having to rewrite software for parameter changes
- use software testing to verify software does what it's supposed to do
- tag versions to enable **reproducibility**

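
A minimal sketch of what "turn configuration/parameters into arguments" could look like, using Python's standard `argparse` module. The program name `myapp` and the `--seed` option come from the example above; the `--samples` option and the `run` function are hypothetical and only serve as illustration.

```python
import argparse
import random


def parse_args(argv=None):
    """Parse command-line arguments; `--seed` mirrors `myapp --seed=42`."""
    parser = argparse.ArgumentParser(prog="myapp")
    parser.add_argument("--seed", type=int, default=0,
                        help="seed for the random number generator")
    # hypothetical second parameter, only here for illustration
    parser.add_argument("--samples", type=int, default=3,
                        help="how many numbers to draw")
    return parser.parse_args(argv)


def run(seed, samples):
    """Draw `samples` pseudo-random numbers from a generator seeded with `seed`."""
    rng = random.Random(seed)
    return [rng.random() for _ in range(samples)]


# An explicit argument list stands in for the real command line here.
args = parse_args(["--seed=42"])
print(run(args.seed, args.samples))
```

Changing the seed then means changing an argument, not the software: the same tagged version can be rerun with any parameter set.
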
### scripting

- separate scripting from software
    - software: generic
    - scripting: software called with specific configuration/arguments
- scripting means **how** to run the software
    - i.e. here is where the parameters/arguments go
    - think of it as digital lab notes
    - this enables **reproducibility**
- specialized script variants for different environments, e.g.
    - laptop
    - RStudio / terminal server
    - HPC cluster
- think about *execution scalability*, i.e. not having to change software and
  scripting when you want to change parameters
- keep failed attempts in branches to preserve the history of what you tried;
  record why it didn't work in the commit message

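
One possible reading of "digital lab notes", sketched in Python: the script holds the parameters and the software tag, and produces both the command to run and a one-line record of it. The tag `v1.0.0`, the parameter values, and the helper names `build_command` and `lab_note` are assumptions made for this illustration.

```python
import shlex

# Hypothetical values: the tag and the parameters are placeholders.
SOFTWARE_TAG = "v1.0.0"
PARAMETERS = {"seed": 42}


def build_command(program, parameters):
    """Turn a parameter mapping into an argument list such as ['myapp', '--seed=42']."""
    return [program] + [f"--{name}={value}" for name, value in parameters.items()]


def lab_note(program, tag, parameters):
    """One line recording which software version was run and how."""
    command = shlex.join(build_command(program, parameters))
    return f"{program}@{tag}: {command}"


print(lab_note("myapp", SOFTWARE_TAG, PARAMETERS))
```

Variants for laptop or HPC cluster would differ only in how the argument list is launched (directly, or wrapped in a job submission), while `PARAMETERS` and `SOFTWARE_TAG` stay under version control in the scripts repo.
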
### publishing

- for paper, thesis, book, presentation, documentation, blog posts
- use *programming languages* (code/scripts) for plots, flowcharts, etc.
- write text/paragraphs in markup language (e.g. markdown)
- use automation workflows to
    - render plot/flowchart code to image files
    - convert text with pandoc to PDF/PS/HTML/EPUB
- use platforms for review process

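
A rough sketch of the conversion step in such an automation workflow. It only builds `pandoc SOURCE -o TARGET` command lines and relies on pandoc choosing the output format from the target's file extension. The file name `paper.md` and the helper `pandoc_command` are hypothetical, and only a subset of the formats listed above is shown.

```python
from pathlib import Path

# A subset of the output formats named above, given as file extensions.
FORMATS = ("pdf", "html", "epub")


def pandoc_command(source, fmt):
    """Argument list for `pandoc SOURCE -o TARGET` with TARGET named after SOURCE."""
    target = Path(source).with_suffix(f".{fmt}")
    return ["pandoc", str(source), "-o", str(target)]


for fmt in FORMATS:
    print(" ".join(pandoc_command("paper.md", fmt)))
```

In a real pipeline each list could be passed to `subprocess.run(cmd, check=True)`, assuming pandoc (and a LaTeX engine for PDF) is installed on the runner. The generated files are products and stay out of the repo.
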
## integration of use cases for reproducibility

anti patterns
-------------
> An anti-pattern is a common response to a recurring problem that is usually
> ineffective and risks being highly counterproductive.

- most git anti-patterns are about *how* to use git
- focus here is on those relating to RDM and reproducibility

### binary files

- git as VCS only good for text files
    - markdown
    - source code, scripts
    - (small) CSV
- binary files can't be diff'ed, e.g.
    - compiled programs
    - MS Word, Excel
    - PDF, PS
    - JPEG, PNG
- use textual representation, e.g.
    - graphviz dot for flowcharts
    - R ggplot and CSV for plots
- use automation to convert textual representation to e.g. images
- use gitignore to never add binary products to the repo

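
To check whether a file is likely to be treated as binary before adding it, a simple heuristic is to look for a NUL byte near the start of its content. The sketch below is only an approximation of that idea; the function name `looks_binary` and the 8000-byte window are assumptions for illustration, not a description of git's exact behaviour.

```python
def looks_binary(data, sample_size=8000):
    """Heuristic: treat content as binary if a NUL byte appears in the first bytes."""
    return b"\0" in data[:sample_size]


# A small CSV is plain text, the start of a PNG file is not.
print(looks_binary(b"x,y\n1,2\n"))                           # False
print(looks_binary(b"\x89PNG\r\n\x1a\n\x00\x00\x00\rIHDR"))  # True
```

Files flagged this way are candidates for a `.gitignore` entry and for regeneration from a textual source.
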
### scientific data in git repos

- data is often binary
- git repo should be small, data blows it up, even if text
- data has different release cycles than code
- even Git LFS (Large File Storage) is a poor fit: the repo is still a big
  ball of mud
- scientific datasets need metadata!
- use proper archive system for data

platforms
---------

- enable collaboration
- bug tracker / feature requests
- documentation / wiki
- project management tools
    - issue boards, milestones, gantt
- trigger automation
- publish/download releases
- go to https://git.idiv.de, log in and create new projects!

rdm.yml 0 → 100644 @ 1a7f9522

---
# these are not shown in the document, they are just for metadata
title: git RDM reproducibility check list cheat sheet
author: Christian Krause
lang: en
keywords:
  - git
  - RDM
  - research data management
  - reproducibility

# highlighting increases readability
linkcolor: blue

# these LaTeX variables fit as much content on as few pages as possible
documentclass: scrartcl
pagestyle: empty
papersize: a4paper
geometry:
  - a4paper
  - left=1cm
  - right=1cm
  - top=1cm
  - bottom=1cm
# you can also add "landscape" to geometry if you want more than 2 columns

# fiddle with these to increase readability
columns: 2
fontsize: 9pt

# this essentially disables justification, which can increase readability
ragged: yes

# color for header background
sectionbg: BurntOrange
subsectionbg: Apricot
...

reveal.js @ a4b7f9df (compare 65bdccd5...a4b7f9df)
-Subproject commit 65bdccd5807b6dfecad6eb3ea38872436d291e81
+Subproject commit a4b7f9dff7ef360afdb6d0cb53fd89063cbe0b66