digraph {
compound = true
node [shape = "box", style = "filled, rounded"]
subgraph cluster_software_a {
label = "software A"
node [color = orchid]
software_a_version_a[label = "v2.1.6"]
}
subgraph cluster_software_b {
label = "software B"
node [color = orchid]
software_b_version_a[label = "v0.2.0"]
}
subgraph cluster_script {
label = "scripts"
node [color = lightskyblue]
script_version_a[label = "nature-v1"]
}
subgraph cluster_paper {
label = "paper"
node [color = limegreen]
paper_version_a[label = "nature-review-1"]
}
script_version_a -> software_a_version_a [label = "\n\n"]
script_version_a -> software_b_version_a [label = "\n\n"]
paper_version_a -> script_version_a [label = "\n\n"]
}
digraph {
compound = true
node [shape = "box", style = "filled, rounded"]
subgraph cluster_software_a {
label = "software A"
node [color = orchid]
software_a_version_a[label = "v2.1.6"]
}
subgraph cluster_software_b {
label = "software B"
node [color = orchid]
software_b_version_b[label = "v0.3.4"]
software_b_version_a[label = "v0.2.0"]
}
subgraph cluster_script {
label = "scripts"
node [color = lightskyblue]
script_version_b[label = "nature-v2"]
script_version_a[label = "nature-v1"]
}
subgraph cluster_paper {
label = "paper"
node [color = limegreen]
paper_version_a[label = "nature-review-1"]
paper_version_b[label = "nature-review-2"]
}
script_version_a -> software_a_version_a [label = "\n\n"]
script_version_b -> software_a_version_a [label = "\n\n"]
script_version_a -> software_b_version_a [label = "\n\n"]
script_version_b -> software_b_version_b [label = "\n\n"]
paper_version_a -> script_version_a [label = "\n\n"]
paper_version_b -> script_version_b [label = "\n\n"]
}
digraph {
compound = true
node [shape = "box", style = "filled, rounded"]
subgraph cluster_software_a {
label = "software A"
node [color = orchid]
software_a_version_a[label = "v2.1.6"]
}
subgraph cluster_software_b {
label = "software B"
node [color = orchid]
software_b_version_b[label = "v0.3.4"]
software_b_version_a[label = "v0.2.0"]
}
subgraph cluster_script {
label = "scripts"
node [color = lightskyblue]
script_version_b[label = "nature-v2"]
script_version_a[label = "nature-v1"]
}
subgraph cluster_paper {
label = "paper"
node [color = limegreen]
paper_version_a[label = "nature-review-1"]
paper_version_b[label = "nature-review-2"]
paper_version_c[label = "nature-final"]
}
script_version_a -> software_a_version_a [label = "\n\n"]
script_version_b -> software_a_version_a [label = "\n\n"]
script_version_a -> software_b_version_a [label = "\n\n"]
script_version_b -> software_b_version_b [label = "\n\n"]
paper_version_a -> script_version_a [label = "\n\n"]
paper_version_b -> script_version_b [label = "\n\n"]
paper_version_c -> script_version_b [label = "\n\n"]
}
digraph {
node [shape = "box", style = "filled,rounded"]
workdir[label = "working directory\n(changes not staged for commit)", color = lightskyblue]
stage[label = "staging area\n(changes to be committed)", color = orchid]
repo[label = "repository\n(tracked content)", color = limegreen]
workdir -> stage [label = "git diff"]
stage -> repo [label = "git diff --staged"]
}
@@ -10,10 +10,10 @@ digraph {
# untracked -> stage [label = "git add"]
# stage -> untracked [label = "git unstage"]
workdir -> stage [label = "git stage file"]
stage -> workdir [label = "git unstage file"]
workdir:sw -> stage:w [label = "git stage file"]
stage:e -> workdir:se [xlabel = "git unstage file"]
# workdir -> discard [label = "git checkout"]
# workdir -> discard [label = "git restore"]
# untracked -> discard [label = "rm"]
stage -> repo [label = "git commit"]
Basics
======
VCS for me
----------
### config
### init
### add
git add foo bar
### commit
git commit -m 'foo bar'
great -
what has changed now?
---------------------
### log
--stat
-p
### diff
- change something
- git diff
- git commit -am
### repo / stage / working
- find a graphic
- what to use it for
### add / reset / checkout
diff and diff --staged and diff HEAD
vcs with backup
---------------
- github new project --> connect project: clone / remote add origin foo
- push
- pull
## merge
mergetool linux = meld
mac = ???
windows = ???
Workflows
=========
branching workflows
-------------------
what are branches and why do I need them? This chapter should answer that
question clearly.
### nvie successful git branching model
### multi master
paper with many journals
same paper text - different theme file for page layout, graph layout, etc.
## tips for papers
git diff -w --ignore-blank-lines --word-diff
Your Projects
=============
##### Stuff
- explain all collaboration workflows via github
-
Blocks
======
each block is about 3 hours, e.g. 9-12, including breaks
0. Intro
- Why git?
- git installation hands on
- show a try-git-like tutorial on the projector (laptops closed)
- hands on - create a project
- wrap up hands on
1. Git Commands
- working dir - index/stage - repo
2. GitLab / GitHub
- project maintenance
- issues and PRs
- "easier collaboration"
3. VCS Workflows
- how to collaborate in an organized manner
Subproject commit 2346a002289757ee612bfd4b56b57db53e0c751b
---
title: pandoc papers
subtitle: a future proof way to write
author:
- Christian Krause
...
intro
=====
## motivation #1
![](img/motivation-draft-mess-half-size.png)
## motivation #2
- **focus** on content/writing
- **communication**
- avoid emails with attachments, right?
- early and frequent reviews
- get rid of MS Word, Google Docs, Dropbox, etc.
## motivation #3
- use **version control** with all its benefits
- **automated**
- spell checking
- high quality typesetting
- publishing
- **future proof** way of doing things, meaning
- tools are open source
- tools are replaceable or optional
- all content is plain text
## agenda
- short presentation including demo
- learn tools and workflows
- establish community
tools
=====
## pandoc -- Swiss Army Knife of Text Conversion
### source: markup
- usually **Markdown**
- less powerful than \LaTeX, but inline \LaTeX\ can be used
### target**s**
- high quality PDF via \LaTeX
- others: HTML web page, EPUB, Office documents
### templates
- different journals
- iDiv branding
## Markdown
### source
```
- **easy to learn**, read and write
- ~~no~~ fewer distractions to *procrastinate*
- inline \LaTeX\ if needed \rightarrow\ $a^2 + b^2 = c^2$
```
### rendered
- **easy to learn**, read and write
- ~~no~~ fewer distractions to *procrastinate*
- inline \LaTeX\ if needed \rightarrow\ $a^2 + b^2 = c^2$
## git & GitLab
### git
- version control
- free backup
### GitLab: [https://git.idiv.de](https://git.idiv.de)
- self-hosted, full access control
- communication: discussion and reviews
- continuous integration, e.g. spell checking
- continuous deployment to e.g. cloud storage
## visualization
### Graphviz/dot
- graph visualization language
- define flowcharts with code
- **dot** handles rendering to **SVG**
### plotting
- any *programming language* can be used
- add results as **CSV** to repository
- plot data to **SVG** e.g. using **R** with **ggplot2** and **svglite**
## build tool
### make
- glues everything together
- automates build
- render images from **dot** and **CSV** sources
- build PDF
- run spell check
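`make` decides what to rebuild by comparing modification times: a target is rebuilt when it is missing or older than one of its prerequisites. As an illustration only (the project template uses a real Makefile), a minimal Python sketch of that rule, with hypothetical file names:

```python
import os
import tempfile


def needs_rebuild(target, prerequisites):
    """Return True if `target` is missing or older than any prerequisite.

    Only make's basic timestamp rule; phony targets, pattern rules and
    everything else a real Makefile can express are ignored.
    """
    if not os.path.exists(target):
        return True
    target_mtime = os.path.getmtime(target)
    return any(os.path.getmtime(p) > target_mtime for p in prerequisites)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # hypothetical source and product of a dot -> SVG rule
        source = os.path.join(tmp, "flowchart.dot")
        product = os.path.join(tmp, "flowchart.svg")
        with open(source, "w") as f:
            f.write("digraph { a -> b }\n")
        print(needs_rebuild(product, [source]))  # True: product is missing
        with open(product, "w") as f:
            f.write("<svg/>\n")
        # force the product to be newer than its source
        newer = os.path.getmtime(source) + 10
        os.utime(product, (newer, newer))
        print(needs_rebuild(product, [source]))  # False: up to date
```

This check is why editing one `.dot` file re-renders only its image; GNU make's `-B` (`--always-make`) flag skips the check and rebuilds everything.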
workflow
========
## workflow #1
### [set up new project in GitLab](https://git.idiv.de/projects/new)
- use project template
- template is managed by our community
- contains Makefile, README, CI/CD, ~~themes~~, visualization examples, etc.
### writing
- start writing the main document in Markdown
- use any text editor you're comfortable with
## workflow #2
### communication
- invite contributors / collaborators to GitLab project
- use GitLab project as main communication platform
- discuss and review in
- commit comments
- issues
- merge requests
## workflow #3
### continuous deployment
- some parts are done automatically with the project template
- deploy PDF to cloud storage:
[https://portal.idiv.de/nextcloud/](https://portal.idiv.de/nextcloud/)
- (uses **DRAFT** watermark ~~if commit not tagged~~)
## demo
(demo here only if all tools are already installed)
your turn
=========
## discussion
- Which tools do you currently use?
- Does anyone know **pandoc-scholar**?
- Does anyone know **authorea**?
- What are your workflows?
- How do you communicate, collaborate and review?
- What annoys you the most? (i.e. how can we improve)
## let's get started
### create new project
- https://git.idiv.de/projects/new
- **create from template** \rightarrow\ **instance** \rightarrow\ **pandoc**
- install tools from `README.md`
- set up Nextcloud share and CD
### editor: Atom
- **atom-latex** for bib syntax highlighting
- **graphviz-preview-plus** for dot syntax highlighting and preview
- **hard-wrap** for wrapping paragraphs
- **language-pfm** for pandoc flavored Markdown
- **language-r** for R
## issues
### prose diff tool
- hard wrap vs soft wrap
- these issues might help:
[gitlab-ce#25650](https://gitlab.com/gitlab-org/gitlab-ce/issues/25650),
[gitlab-ce#26804](https://gitlab.com/gitlab-org/gitlab-ce/issues/26804)
- [https://github.blog/2014-02-14-rendered-prose-diffs/](https://github.blog/2014-02-14-rendered-prose-diffs/)
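The hard vs. soft wrap problem in a nutshell: with soft wrap a whole paragraph is one line, so a line-based diff reports the entire paragraph as changed when one word differs, whereas a word-based diff (the view `git diff --word-diff` gives) isolates that word. A small Python sketch using the standard library's `difflib`; the sample sentence is made up:

```python
import difflib


def changed_lines(old, new):
    """Lines that a line-based diff reports as removed (-) or added (+)."""
    diff = difflib.ndiff(old.splitlines(), new.splitlines())
    return [d for d in diff if d.startswith(("- ", "+ "))]


def changed_words(old, new):
    """Words that a word-based diff reports as removed (-) or added (+)."""
    diff = difflib.ndiff(old.split(), new.split())
    return [d for d in diff if d.startswith(("- ", "+ "))]


# one soft-wrapped paragraph (a single long line), one word changed
old = "We measured the growth of all plants in the greenhouse over two weeks."
new = "We measured the growth of all seedlings in the greenhouse over two weeks."

print(len(changed_lines(old, new)))  # 2: the whole paragraph, old and new
print(changed_words(old, new))       # ['- plants', '+ seedlings']
```

Hard-wrapping paragraphs keeps line diffs smaller, at the cost of rewrapping noise in the following lines whenever a paragraph is edited.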
### spell checking
- integrate editor spell checking with **mdspell**
- per project editor integration
- needs additional ignores, e.g. links, \LaTeX\ and citations
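Those ignores could come from a pre-filter that strips links, inline \LaTeX\ and citations before the text reaches the spell checker. A rough Python sketch of such a filter; the regular expressions are assumptions, not what **mdspell** does internally, and they will miss edge cases like nested brackets or `$` used as a currency sign:

```python
import re

# rough, hypothetical patterns; a real tool should use a Markdown parser
LINK = re.compile(r"\[([^\]]*)\]\([^)]*\)")   # [text](url) -> text
DROP = [
    re.compile(r"https?://\S+"),              # bare URLs
    re.compile(r"\$[^$]*\$"),                 # inline math: $a^2$
    re.compile(r"\\[A-Za-z]+\\?"),            # LaTeX commands: \LaTeX\
    re.compile(r"\[?@[\w:.-]+\]?"),           # pandoc citations: [@key]
]


def strip_for_spellcheck(text):
    """Remove parts of pandoc Markdown that a spell checker should skip."""
    text = LINK.sub(r"\1", text)
    for pattern in DROP:
        text = pattern.sub("", text)
    return " ".join(text.split())


sample = r"See [the docs](https://example.org) about $a^2$ in \LaTeX\ [@smith2020]"
print(strip_for_spellcheck(sample))  # See the docs about in
```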
community
=========
## contributing
### [https://git.idiv.de/publishing](https://git.idiv.de/publishing)
- contribute to **project template** for build / workflow related issues
- e.g. if you need more packages for plotting
- contribute to **bibliography** with new references
- we may even import/sync an existing publication database
- contribute to **pandoc templates**
- had no need for them yet
- we might add some iDiv corporate identity (talk with PR)
- **review-able** template with link to join discussion at every paragraph
<!DOCTYPE html>
<html>
<head>
<title>git@rdm</title>
<link rel="stylesheet" href="reveal.js/dist/reveal.css">
<link rel="stylesheet" href="reveal.js/dist/theme/black.css">
<link rel="stylesheet" href="css/company-logo.css" />
<link rel="stylesheet" href="css/ribbon.css" />
<link rel="stylesheet" href="css/crawl.css" />
</head>
<body>
<div class="reveal">
<div class="slides">
<!---------------------------------------------------------------------
<!-- intro
<!-------------------------------------------------------------------->
<section>
<section id="title" data-markdown>
# git and rdm
## (... and reproducibility)
notes:
- `make -B` to create images, then reload
</section>
<section id="intro-objectives" data-markdown>
## objectives
- show how git relates to RDM
- ... and reproducibility
</section>
<section id="intro-agenda" data-markdown>
## agenda
1. motivation
1. ~~teach ***how*** to use git/GitHub/GitLab~~
1. use cases
1. anti-patterns
1. platforms
1. Q & A
notes:
- feel free to interrupt with immediate questions
- more involved detailed discussion as part of Q & A
</section>
<section id="intro-version-control" data-markdown>
## about version control
> records changes
what who when (why)
notes:
- the **why** (aka context) is optional
- you have to do this
- you have to care about it
</section>
<section id="intro-about-git" data-markdown>
## about git
### best tool for the job
- simple by design
- powerful if needed
- documentation / community
- industry standard
notes:
- git name: "the stupid content tracker"
- doc: finding answers with web searches
</section>
<section id="intro-about-me" data-markdown>
## about me
- scientific computing support @ iDiv since 2014
- satisfied git user since 2010
> There will never be a better version control system than git.
>
> -- Christian Krause, 2017
</section>
<section id="intro-about-you" data-markdown>
## about you
![people](img/people.jpg)
notes:
- who has never used any VCS before?
- who actively maintains a VCS repository?
</section>
</section>
<!---------------------------------------------------------------------
<!-- motivation
<!-------------------------------------------------------------------->
<section>
<section id="motivation" data-markdown>
# motivation
> Why should I use version control?
notes:
- motivation chapter is about the concept of version control
- git is just a tool to do it
</section>
<section id="motivation-avoid-mess" data-markdown>
## motivation #1
### avoid mess
![blah](img/motivation-draft-mess.png)
notes:
- who has seen such a mess?
- who has contributed to such a mess?
- who has created such a mess?
</section>
<section id="motivation-want-structure" data-markdown>
## motivation #1
### want structure
![blah](img/motivation-structure.svg)
notes:
- structure: who, when
- why is hidden (only shows message header, not body)
- ability to inspect old versions and their diff
- ability to revert/undo a change
</section>
<section id="motivation-playground" data-markdown>
## motivation #2
### throw-away playgrounds
![playground](img/motivation-throwaway-playground.svg)
notes:
- test stuff without interfering
- throw away if garbage
- integrate if good
- switch back and forth without pain
</section>
<section id="motivation-collaboration-1" data-markdown>
## motivation #c
### collaboration made easy
<!-- do not fix this typo, it is here on purpose to show collab -->
> This text cntains a typo.
notes:
- demo GitLab (change **target branch** to create merge request)
- this is how you can do reviews of drafts
- discuss this slide/chapter/section link
</section>
<section id="motivation-collaboration-2" data-markdown>
## motivation #c
![typo-pr](https://pbs.twimg.com/media/EDsklbLUEAMdusJ.png)
notes:
- might not seem like much
- but you are still making the world a better place
- and it is not too much effort
</section>
<section id="motivation-automation" data-markdown>
## motivation #a
### automation
- ***quality***
- **code analysis**
- **spell check**
- **software testing**
- enforce **style guide**
- ***deployment*** (app store, web server)
notes:
- basically, everything you can script
- refresh presentation before next slide
</section>
<section class="star-wars" id="motivation-wars">
<div class="crawl">
<div class="title">
<h1>motivation #wars</h1>
</div>
<div data-markdown>
1. view the ***history*** of changes
1. know ***why*** someone changed it
1. ***revert*** a bad change
1. maintain ***multiple versions***
1. see the ***diff*** of two versions
1. find commit ***that broke*** something
1. have free ***backup***
1. have ***non-interfering*** playgrounds
1. have ***automated*** testing
1. have ***automated*** deployment
1. ***contribute*** to a project
1. ***share*** your code
1. let other people do the work ***for you***
</div>
<img src="http://i.giphy.com/90F8aUepslB84.gif" />
</div>
</section>
</section>
<!---------------------------------------------------------------------
<!-- use cases
<!-------------------------------------------------------------------->
<section>
<section id="use-cases" data-markdown>
# use cases
##### for
### version control system (vcs)
##### aka
### source code management (scm)
</section>
<section id="use-case-software" data-markdown>
## software
- generic (as in parameterized)
- (ideally) tested
notes:
- e.g. R package
</section>
<section id="use-case-scripting" data-markdown>
## scripting
##### aka
### how to run `$software`
- ... in `$environment`
- digital lab notes
- **reproducibility** !!1!
- execution scalability
notes:
- separate software from scripting
- `$environment`: multiple scripts/configurations for different
environments:
- EasyBuild, conda, singularity
- RStudio server, HPC cluster
- execution scalability: running software or script without having
to change it
- keep failed attempts in branches
- e.g. highly parameterized software, make a record of bad
parameter sets
</section>
<section id="use-case-publishing" data-markdown>
# publishing
### (markdown (with some tex))
- paper / thesis / book
- presentation
- documentation
- blog
notes:
- who is using markdown?
- who is using tex?
- who is using word? why?
- markdown with tex
- git.idiv.de/help
- search for markdown
- search for math
- show example
</section>
<section id="use-case-integrated-1" data-markdown>
## integration
![run `make` to generate the image](img/rdm-use-cases-a.svg)
notes:
- software A: tried and tested, reference to compare to
- software B: your experimental better version
</section>
<section id="use-case-integrated-2" data-markdown>
## integration
![run `make` to generate the image](img/rdm-use-cases-b.svg)
</section>
<section id="use-case-integrated-3" data-markdown>
## integration
![run `make` to generate the image](img/rdm-use-cases.svg)
</section>
<section id="use-case-integrated-data" data-markdown>
## integration
![run `make` to generate the image](img/rdm-use-case-data.svg)
</section>
</section>
<!---------------------------------------------------------------------
<!-- anti patterns
<!-------------------------------------------------------------------->
<section>
<section id="anti-patterns" data-markdown>
# anti-patterns
![unlimited power](img/unlimited-power.jpg)
notes:
- does anyone know what an anti-pattern is?
</section>
<section id="anti-patterns-def" data-markdown>
# anti-patterns
> An anti-pattern is a common response to a recurring problem that
is usually ineffective and risks being highly counterproductive.
notes:
- most anti-patterns are about how to use git
- the focus here is on those relating to RDM and reproducibility
</section>
<section id="anti-pattern-binary" data-markdown>
# binary files
### (aka non-text)
- no diff with binary
- use textual representation
- convert with automation
notes:
- can someone give me an example of text file?
- markdown
- source code, scripts (R, shell)
- XSV
- can someone give me an example of binary file?
- compiled programs
- MS word excel
- PDF PS
- don't just put everything in the repo, use `.gitignore`
</section>
<section id="anti-pattern-big-data" data-markdown>
# data in git
## (scientific/big data)
- version control for data is **DIFF**erent !!1!
- git is VCS for text, not for data
notes:
- do you release data with every script change? no, too expensive
- never put data files in a git repository
- not even with LFS
- big ball of mud, no metadata
</section>
</section>
<!---------------------------------------------------------------------
<!-- platforms
<!-------------------------------------------------------------------->
<section>
<section id="platforms-list" data-markdown>
# platforms
- GitHub
- https://github.com/idiv-biodiversity (cloud)
- GitLab
- https://gitlab.com (cloud)
- https://git.idiv.de (self-hosted @ iDiv)
</section>
<section id="platforms-purpose" data-markdown>
# platforms
- enable collaboration
- bug tracker / feature requests
- documentation / wiki
- project management tools
- issue boards, milestones, gantt
- trigger automation
- publish/download releases
</section>
</section>
<!---------------------------------------------------------------------
<!-- q & a
<!-------------------------------------------------------------------->
<section>
<section id="q-n-a" data-markdown>
# Q & A
notes:
- walk through examples of software/script/publishing
- walk through one of your use cases
- demo consulting
</section>
<section id="consulting" data-markdown>
# consulting
[christian.krause@idiv.de](mailto:christian.krause@idiv.de)
</section>
</section>
<!---------------------------------------------------------------------
<!-- eof
<!-------------------------------------------------------------------->
<section id="eof" data-background="img/trex.png" data-markdown>
### thanks for listening
# EOF
</section>
<!---------------------------------------------------------------------
<!-- backup
<!-------------------------------------------------------------------->
<section>
<section id="backup" data-markdown>
# backup slides
</section>
<section id="empty" data-markdown>
</section>
</section>
</div>
</div>
<!-------------------------------------------------------------------------
<!-- css/js
<!------------------------------------------------------------------------>
<!-- company logo -->
<div class="logo-wrapper">
<a href="https://www.idiv.de/">
<img src="img/company-logo-small.png" />
</a>
</div>
<!-- ribbon -->
<div class="ribbon-wrapper right">
<div class="ribbon">
<a href="https://git.idiv.de/sc/edu/git-seminar/edit/main/rdm.html"
target="_blank">
edit
</a>
</div>
</div>
<!-- reveal.stuff -->
<script src="reveal.js/dist/reveal.js"></script>
<script src="reveal.js/plugin/highlight/highlight.js"></script>
<script src="reveal.js/plugin/markdown/markdown.js"></script>
<script src="reveal.js/plugin/notes/notes.js"></script>
<script>
Reveal.initialize({
hash: true,
plugins: [RevealHighlight, RevealMarkdown, RevealNotes]
});
</script>
</body>
</html>
git for RDM and reproducibility
===============================
checklist
---------
- **software** (a generic tool to do *something*)
- [ ] use separate git repo for software
- [ ] tag versions for reproducibility
- [ ] keep software as generic as possible
- **scripts** (*how* to use *software*)
- [ ] use separate git repo for scripts
- [ ] tag versions for reproducibility
- [ ] software is configured here
- [ ] reference used software tag
- **data management**
- [ ] publish dataset(s) to scientific data archive system
- [ ] always attach proper metadata
- [ ] get DOI for each version of the dataset(s) for reproducibility
- [ ] reference used scripts tag
- **publishing**
- [ ] use separate git repo for paper/thesis/...
- [ ] tag versions for draft/review/final
- [ ] convert text/source to (binary) products
- [ ] reference used scripts tag
- [ ] reference used data DOI
- **platforms** (GitLab, GitHub)
- [ ] use platforms (GitLab, GitHub) for collaboration
- [ ] review commits / merge requests
- [ ] utilize project management tools
- [ ] utilize automation for testing and publishing
intro
-----
- version control system (VCS) records changes (what, who, when, why)
- use platforms (GitLab, GitHub) for collaboration
git use cases
-------------
### software
- keep software as generic as possible
- turn configuration/parameters into arguments, e.g. `myapp --seed=42`
- this avoids having to rewrite software for parameter changes
- use software testing to verify software does what it's supposed to do
- tag versions to enable **reproducibility**
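In Python, for example, the standard `argparse` module turns parameters into command-line arguments. A minimal sketch; `myapp`, its `--seed`/`--iterations` options and the random walk standing in for real work are all hypothetical:

```python
import argparse
import random


def parse_args(argv=None):
    parser = argparse.ArgumentParser(prog="myapp")
    parser.add_argument("--seed", type=int, default=0,
                        help="random seed, recorded for reproducibility")
    parser.add_argument("--iterations", type=int, default=10)
    return parser.parse_args(argv)


def run(seed, iterations):
    # a local generator: same seed -> same result, no global state
    rng = random.Random(seed)
    position = 0
    for _ in range(iterations):
        position += rng.choice((-1, 1))
    return position


def main(argv=None):
    args = parse_args(argv)
    return run(args.seed, args.iterations)


if __name__ == "__main__":
    print(main())
```

A new parameter set is then a new call (`myapp --seed=43`), not a new version of the software, so the software tag stays the same.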
### scripting
- separate scripting from software
- software: generic
- scripting: software called with specific configuration/arguments
- scripting means **how** to run the software
- i.e. here is where the parameters/arguments go
- think of it as digital lab notes
- this enables **reproducibility**
- specialized script variants for different environments, e.g.
- laptop
- RStudio / terminal server
- HPC cluster
- think about *execution scalability*, i.e. not having to change software and
scripting when you want to change parameters
- keep failed attempts in branches, with commit messages recording what you
tried and why it didn't work
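A scripting repository can then keep the concrete calls as plain, versioned data. A Python sketch of that idea; the program name `myapp`, the tag `v0.3.4`, the `--threads` option and the thread counts are illustrative assumptions:

```python
import shlex

SOFTWARE = "myapp"        # hypothetical program from the software repo
SOFTWARE_TAG = "v0.3.4"   # hypothetical software tag this script targets

# one variant per environment; these values are the "digital lab notes"
ENVIRONMENTS = {
    "laptop": {"threads": 2},
    "server": {"threads": 8},
    "hpc": {"threads": 64},
}


def build_command(environment, seed):
    """Return the argument list for one run in the given environment."""
    threads = ENVIRONMENTS[environment]["threads"]
    return [SOFTWARE, f"--seed={seed}", f"--threads={threads}"]


if __name__ == "__main__":
    for env in ENVIRONMENTS:
        # a shell-ready line per environment, e.g. for a job script
        print(f"# {SOFTWARE} {SOFTWARE_TAG} on {env}")
        print(shlex.join(build_command(env, seed=42)))
```

Moving a run from the laptop to the cluster then changes only this data, and the scripting repo's history shows which parameters ran where, against which software tag.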
### publishing
- for paper, thesis, book, presentation, documentation, blog posts
- use code/scripts in *programming languages* for plots, flowcharts, etc.
- write text/paragraphs in markup language (e.g. markdown)
- use automation workflows to
- convert plot/flowchart code to image files
- convert text with pandoc to PDF/PS/HTML/EPUB
- use platforms for review process
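Conceptually, each conversion is a mapping from a source file to a command. A small Python sketch that only builds the command lines; `dot -Tsvg IN -o OUT` and `pandoc IN -o OUT` are the standard invocations of both tools, while the rule table and file names are assumptions:

```python
from pathlib import Path

# source suffix -> (target suffix, command template); a hypothetical table
RULES = {
    ".dot": (".svg", ["dot", "-Tsvg", "{src}", "-o", "{dst}"]),
    ".md": (".pdf", ["pandoc", "{src}", "-o", "{dst}"]),
}


def conversion_command(source):
    """Build (but do not run) the command that converts `source`."""
    path = Path(source)
    target_suffix, template = RULES[path.suffix]
    target = path.with_suffix(target_suffix)
    return [arg.format(src=path, dst=target) for arg in template]


print(conversion_command("img/flowchart.dot"))
# ['dot', '-Tsvg', 'img/flowchart.dot', '-o', 'img/flowchart.svg']
print(conversion_command("paper.md"))
# ['pandoc', 'paper.md', '-o', 'paper.pdf']
```

pandoc infers the output format from the `-o` extension (PDF goes through LaTeX by default); passing such a list to `subprocess.run(cmd, check=True)` would execute it, though in practice a Makefile handles this step.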
## integration of use cases for reproducibility
![](img/rdm-use-case-merged.svg)
anti patterns
-------------
> An anti-pattern is a common response to a recurring problem that is usually
> ineffective and risks being highly counterproductive.
- most git anti-patterns are about *how* to use git
- the focus here is on those relating to RDM and reproducibility
### binary files
- git as a VCS works well only for text files, e.g.
- markdown
- source code, scripts
- (small) CSV
- binary files can't be diffed meaningfully, e.g.
- compiled programs
- MS word, excel
- PDF, PS
- JPEG, PNG
- use textual representation, e.g.
- graphviz dot for flowcharts
- R ggplot and CSV for plots
- use automation to convert textual representation to e.g. images
- use `.gitignore` to keep generated binary products out of the repo
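How does git tell text from binary? By default it looks for a NUL byte in roughly the first 8000 bytes of a file, and `.gitattributes` can override this per path. A Python sketch of that heuristic (an approximation, not git's code) that can flag binary files before they are added:

```python
SNIFF_BYTES = 8000  # roughly how many leading bytes git inspects


def looks_binary(data: bytes) -> bool:
    """Approximate git's default check: a NUL byte in the first chunk."""
    return b"\0" in data[:SNIFF_BYTES]


def binary_files(paths):
    """Return the paths whose leading bytes look binary."""
    found = []
    for path in paths:
        with open(path, "rb") as f:
            if looks_binary(f.read(SNIFF_BYTES)):
                found.append(path)
    return found


print(looks_binary(b"species,count\noak,3\n"))         # False: CSV text
print(looks_binary(b"\x89PNG\r\n\x1a\n\x00\x00\x00"))  # True: PNG header
```

UTF-16 encoded text also contains NUL bytes, so this check flags it as well.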
### scientific data in git repos
- data is often binary
- a git repo should stay small; data blows it up, even if it is text
- data has different release cycles than code
- even git LFS (Large File Storage) does not solve this: the repo is still a big ball of mud
- scientific datasets need metadata!
- use proper archive system for data
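What belongs in the repo instead is a small text record pointing at the archived data. A Python sketch that builds such a record (DOI plus size and SHA-256 checksum per file) so scripts can state, and later verify, exactly which data version they used; the record layout is an assumption, not a standard:

```python
import hashlib
from pathlib import Path


def sha256sum(path, chunk_size=1 << 20):
    """Checksum a file in chunks so large datasets need little memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def data_manifest(doi, paths):
    """Describe dataset files as small text that is cheap to keep in git."""
    return {
        "doi": doi,
        "files": [
            {"path": str(p), "bytes": Path(p).stat().st_size,
             "sha256": sha256sum(p)}
            for p in sorted(paths)
        ],
    }
```

Saving `json.dumps(data_manifest(doi, files), indent=2)` to a file in the scripting repo gives a diff-friendly text record; the DOI itself comes from the archive system.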
platforms
---------
- enable collaboration
- bug tracker / feature requests
- documentation / wiki
- project management tools
- issue boards, milestones, gantt
- trigger automation
- publish/download releases
- go to https://git.idiv.de, log in and create new projects!
---
# these are not shown in the document, they are just for metadata
title: git RDM reproducibility check list cheat sheet
author: Christian Krause
lang: en
keywords:
- git
- RDM
- research data management
- reproducibility
# highlighting increases readability
linkcolor: blue
# these LaTeX variables fit as much content on as few pages as possible
documentclass: scrartcl
pagestyle: empty
papersize: a4paper
geometry:
- a4paper
- left=1cm
- right=1cm
- top=1cm
- bottom=1cm
# you can also add "landscape" to geometry if you want more than 2 columns
# fiddle with these to increase readability
columns: 2
fontsize: 9pt
# this essentially disables justification, which can increase readability
ragged: yes
# color for header background
sectionbg: BurntOrange
subsectionbg: Apricot
...
Subproject commit 65bdccd5807b6dfecad6eb3ea38872436d291e81
Subproject commit a4b7f9dff7ef360afdb6d0cb53fd89063cbe0b66