diff --git a/codereview-persefone-model-may-2024.md b/codereview-persefone-model-may-2024.md new file mode 100644 index 0000000000000000000000000000000000000000..170d857d1572fe275c2fcfcce6a9cd37ad965f18 --- /dev/null +++ b/codereview-persefone-model-may-2024.md @@ -0,0 +1,614 @@ +# persefone-model code review, May 2024 + +The code is well-structured into understandable units, both in terms +of directories and files, as well as functions of reasonable size. + +Every function has a docstring that is clearly written. Functions and +variable names are easy to understand. + +The documentation is clearly written and well-structured and the +website is nicely illustrated. + +The actual simulation code is behind two different macro systems: the +`Agents.jl` macro system as well as Persefone's built-in +domain-specific language (DSL). This makes the overall system hard to +understand, modify, and validate. If possible, a simpler design +without macros should be considered. + + +## General comments + +- use of `eval` inside simulation code: this seems to me both an + unnecessary reduction of performance as well as making binary + packaging harder, meaning a binary distribution of Persefone will + always have to + +- The domain-specific language (DSL) for defining species is complex + with nested macro calls and large amounts of generated code. I + found it takes quite a bit longer to understand what actually + happens under the hood. Using `@macroexpand` one can see the + generated code in the REPL and this is quite large for the skylark + model, and interspersed with comments from macro expansion. + - If possible, I would suggest using normal types and functions + instead of a macro-based DSL. + +- Some macros could be converted into functions, which would make the + code easier to understand. The macro form is currently necessary + for use inside the DSL, but the macro form also gets used outside of + the DSL. Examples: + - `@param` + - `@rand` + - `@shuffle!` + - `@chance` + +- Hardcoded magic constants in some functions, should be made global + constants or perhaps in `struct model`. Giving these constants + names makes the code clearer, and also makes it easier for users of + the package to refer to these numbers if necessary. It might also + be useful to give these constants physical units (see next point). + +- Use physical units (`Unitful.jl`) for clarity, documentation, and + catching accidental unit mismatch errors, both in Persefone as well + as for users of the package. Some examples: + - `LAItotal` + - `LAIgreen` + - `GDD` + - `conversionfactor # hectares to pixels` + +- Hardcoded pixel size of `10m * 10m = 100 m^2`, this should be a + global const with physical units or part of `struct model`. + +- Coding style + - some overuse of `&&` and ternary operator `cond ? a : b`, some + would probably be better as `if` statements + - sometimes no spaces between operators, e.g. `*` for string + concatenation, makes the code harder to read + - perhaps explicit `return` statements at the end of functions for + clarity, but this is a personal preference of mine that is not + universal in the Julia ecosystem + +- Minimum julia version should be 1.10 due to this bug opened by + Daniel https://github.com/carlobaldassi/ArgParse.jl/issues/121 which + was closed recently recommending upgrading to a newer julia version. + +- Docstrings for functions should state what meaning and type of the + return value is. + +- Possible workflows (Github) / pipelines (Gitlab) for automation + - CI: run unit tests after each commit and for each candidate merge + request + - Docs: build docs automatically after each commit + - CompatHelper: create pull request when dependencies have new + versions + - TagBot: tag releases to be incorporated in the Julia package + registry (once Persefone is released there) + - benchmark: automatically run benchmarks to catch performance + regressions, see also `PkgBenchmarks.jl` + +- Use `PrecompileTools.jl` for precompilation and also check for + invalidations etc. to reduce package loading time. + +- Replace loops in the style of `for i = 1:length(a)` with `for i in + eachindex(a)`, this makes the for loop independent of 1-based or + 0-based indexing etc. + +- `make test` / `test/runtests.jl`: no progress visible, hard to see + what is currently happening. I use the following in my packages: + ```julia + # this function goes in `test/runtests.jl` + # show which testset is currently running + showtestset() = println(" "^(2 * Test.get_testset_depth()), "testing ", + Test.get_testset().description) + + # call showtestset() in each @testset + @testset verbose=true "testset_name" begin + showtestset() + # tests here ... + end + ``` + +- It might make sense to use type-based dispatch (on the type of the + agent) instead of having code of this form: + ```julia + for a in allagents(model) + (typeof(a) != Animal) && continue + pops[a.name] += 1 + end + ``` + - A different alternative is to have an `allanimals` function. + + + +## General comments on testing + +### CI (Continuous integration), recommended + +https://en.wikipedia.org/wiki/Continuous_integration + +There doesn't seem to be a CI (Continuous integration) runner set up, +that runs the unit tests after every commit to master on a matrix of +OS and julia version combinations. + +This should also be activated in merge requests, so that nothing is +merged to the `master` branch that fails the unit tests. + +Julia packages on Github typically include a little badge at the top +of the README indicating the CI status in the form of "Run tests: +passing". + +### Aqua.jl tests, recommended + +https://github.com/JuliaTesting/Aqua.jl + +The package `Aqua.jl` runs some automatic quality assurance checks for +common problems in Julia codebases, for example (excerpt from the link +above): + +- There are no method ambiguities. +- There are no undefined exports. +- There are no unbound type parameters. +- There are no stale dependencies listed in Project.toml +- etc. + +Once the issues highlighted by the Aqua tests are fixed, the Aqua +tests should be run as part of the unit tests. + +Julia packages on Github usually have a little badge "tested with +Aqua.jl" at the top of the README. + +### Code coverage testing, recommended + +Code coverage when running unit tests should be recorded, as code that +is not tested during unit tests or manual testing might fail due to +even a simple typo. Many Julia packages on Github display a badge +showing the code coverage of their package when running the unit tests +as a percentage. This can be integrated into the workflows (Github) / +pipelines (Gitlab) and recorded and updated automatically. When +creating a merge request, the change in coverage can also be recorded, +alerting developers when they add significant amount of code without +corresponding tests. + +Some relevant packages: +- https://github.com/JuliaCI/Coverage.jl +- https://github.com/JuliaCI/LocalCoverage.jl + +External services: +- codecov.io +- coveralls.io + +### Optional: enforce consistent coding style + +If a consistent coding style is desired, a code formatter can be run +automatically to ensure any code committed to the repository follows +the coding style guidelines chosen. + +Coding style does not affect code correctness, and therefore is less +important. It also usually causes a large commit when changing the +code base to the formatting style, unless the coding style was used +from the beginning. + +Code formatting can be checked before merging a merge request, to +always ensure proper coding style. README badges often used on Github +Julia packages for coding styles as well. + +JuliaFormatter is the package usually used for code formatting: +https://github.com/domluna/JuliaFormatter.jl + +Some popular coding styles in the Julia ecosystem: +- https://github.com/SciML/SciMLStyle +- https://github.com/JuliaDiff/BlueStyle + +### Optional: JET.jl testing + +https://github.com/aviatesk/JET.jl + +JET.jl can be used to detect type instabilities which can cause slow +performance, as well as other bugs. + +This package depends on the internals of the Julia compiler, and +therefore results can change between Julia versions. + + + +## persefone-model code + +### `Project.toml` + +All packages listed under `[deps]` should have an entry in the +`[compat]` section (this would also be flagged by an `Aqua.jl` test). + +Version bounds on a compat entry of the form ">=5.6" can cause errors +in the future, as Julia uses semantic versioning (semver) throughout +the ecosystem to indicate API compatibility. A specifier of ">=5.6" +means that any version greater than "5.6.0" is acceptable, but future +major version releases "6.x.y", "7.x.y" will have API changes compared +to version "5.6.x", potentially causing the code to not work or +generate incorrect results. + +If a dependent package is known to work for multiple major versions, +these should be listed explicitly separated by commas. + +Example: the Agents.jl dependency works for versions "^5.6" and "^6", +so the compat section should read "5.6,6", indicating that the range +of accepted versions is "[5.6.0, 7.0.0)". + +See: +- https://pkgdocs.julialang.org/v1/compatibility/ +- https://semver.org/ + + +### `Persefone.jl` + +#### Line 16 + +Explicitly listing which functions are imported with `using` is +usually preferred. This makes it clearer from which package the +functions used come from. + +#### Lines 120-121 + +PrecompileTools.jl should be used so that calls to `precompile` only +run during precompilation, and not everytime when loading the package +via `using Persefone`. + + +### `analysis/makeplots.jl` + +It might make sense to decouple CairoMakie from the model code. + +See +https://git.idiv.de/persefone/persefone-model/-/issues/81 +for details. + +#### Line 15: function visualisemap + +Type annotations on the `date` and `landcover` arguments would make it +clearer which types can be passed in. Alternatively it could be stated +in the docstring. + + +### `core/input.jl` + +#### Line 12 + +The code assumes `/` is the path separator. Does this work on windows, +where it is `\`? Not sure if there is perhaps some on-the-fly +translation on Windows. + +Better: `joinpath(pkgdir(Persefone), "src", "parameters.toml")` + +#### Line 16-17 + +I think this comment can be removed. + +#### Lines 19-35: `@param` + +Why does this have to be a macro? Why not: + +```julia +param(model::AgentBasedModel, domain::AbstractString, par::AbstractString) = model.settings["$domain.$par"] +# or +param(model::AgentBasedModel, domainparam::AbstractString) = model.settings[domainparam] +``` + +#### Lines 118-149: `@rand`, `@shuffle!`, `@chance` + +Why are these macros? Why not: + +```julia +rand(model::AgentBasedModel, args...) = rand(model.rng, args...) +shuffle!(model::AgentBasedModel, collection) = shuffle!(model.rng, collection) +chance(odds) = rand(model.rng) < odds +# better nomenclature instead of 'odds' +chance(prob) = rand(model.rng) < prob +``` + +Note on nomenclature: odds is defined as the ratio `p / (1 - p)` where +`p` is a probability between 0 and 1. What is called `odds` here is a +probability. + +#### Line 180, in `function parsecommandline` + +Perhaps the minimum julia version for Persefone should be set to 1.10, +and the comment can be removed. According to +https://github.com/carlobaldassi/ArgParse.jl/issues/121 the fix is to +upgrade julia. + +#### Line 183, in `function parsecommandline` + +`(args[a] == nothing)`: use `isnothing`. + +#### Lines 188-213, in `function loadmodelobject` + +Why `@warn "..."` instead of `throw(ErrorException("..."))`? + +Otherwise code will continue to run even though there was an error. + + +### `core/output.jl` + +#### Lines 25-27, in `function createdatadir` + +Replace ternary operator with if-then-else and invert logic for +clarity. + +```julia +if overwrite + @warn "Overwriting existing output directory $(outdir)." +else + error("Output directory exists, will not overwrite. Aborting.") +end +``` + +Note: `error("...")` throws an `ErrorException`. Is the `#TODO replace +with exception` comment referring to use a more specific exception +type? + +#### Lines 40-45, in `function modellogger` + +Replace ternary logic with if-then-else. + +`error` instead of `Base.error`. Comment can be removed. + +#### Line 62, in `function withtestlogger` + +Prefer `isnothing(x)` to `x == nothing`. + +Can be written shorter: + +`model.logger = (isnothing(logstate) ? global_logger() : logstate.logger)` + +#### Lines 66-98, `function saveinputfiles` + +This function assumes that the current working directory is a git +repository. It also assumes that the current working directory is the +Persefone.jl git repository. I'm not sure that this makes sense. + +Line 93-95: replace `Base.error` with `error` which already throws an +exception. Not sure what the comment `#TODO replace errors with +exceptions` should mean. + +#### Lines 100-120, `function prepareTOML` + +Type annotation `AbstractDict` on `settings` argument? + +Lines 109-111: replace ternary logic with if-then-else. + +#### Line 154, in `function newdataoutput!` + +Replace `Base.error` with `error`, remove comment. + +#### Line 167, in `function newdataoutput!` + +Use of `Any[]` can mean slower performance. + +#### Line 241, in `function savemodelobject` + +Use `if` statement instead of short-circuit boolean for readability. + +### `core/simulation.jl` + +#### Lines 18-20, in `function simulate` + +Replace ternary operator with if-then-else. + +#### Line 32, in `function simulate!` + +Where is `dummystep` defined? The string `dummystep` only appears once +in the codebase in this line. + +#### Lines 50-52, in `function intialise` + +Replace ternary operator with if-then-else. + +#### Lines 130-151, `function stepsimulation!` + +Lines 136-143: it's not quite clear to me why Agents.jl would throw an +exception here, but the comments here say that this part of the code +will be changed in the next code reorganisation. + +Line 148: hardcoded simulation timestep, might be worthwile to have +`model.timestep`. + +### `crop/crops.jl` + +#### Lines 24-26, in `struct CropCurveParams` + +Perhaps it's worthwile for people like me who are not so well-versed +in the terminology of ALMaSS to add a reminder in the docstring what +`GDD`, `LAItotal`, `LAIgreen` stand for. + +Skimming the documentation in `docs/ALMaSSVegetationModelling.pdf`, +`LAI` seems to refer to "leaf-area index". + +Elsewhere in the code, `GDD` seems to refer to "growing degree days", +which seems to have units of "°C * days". + +#### Lines 55-60, in `function Base.tryparse` + +Replace long chain of ternary operators with if-then-else for clarity. + + +### `crop/farmplot.jl` + +#### Lines 32-60, `function initfields!` + +Line 32: `initfields!` sounds like it refers to the fields of a +struct, maybe a clearer name is `initfarms!`? + +Lines 48-49: the comment sounds unsure, is it correct? perhaps factor +this out into a small function? + +Line 51: hardcoded "natural grass", change to global constant. + +#### Lines 62-80, `function initfields!` + +A few boolean short-circuit `&&` statements could be converted to +if-then-else for clarity. + +Hardcoded default base temperature, change to global constant. + +#### Line 94, in `function sow!` + +Comment: `#XXX test if the crop is sowable?`, should this be done? + +#### Lines 104-106, in `function harvest!` + +Replace ternary operator with if-then-else. + +#### Lines 111-113 + +Seemingly missing functions: + +```julia +#TODO fertilise!() +#TODO spray!() +#TODO till!() +``` + +#### Lines 115-150, `function growcrop!` + +Line 126: replace `for p in 1:length(points)` with `for p in eachindex(points)`. + +Lines 127,129: make magical constants `99999` and `-1` global consts. + +Line 137: replace `p == length(points)` with `p == lastindex(points)`. + +Lines 136, 143-147: the comments here seem to indicate that the +implementation is not quite correct. + + +#### Lines 155-168, `function averagefieldsize` + +Line 161: hardcoded `conversionfactor` should be global const. Has +units of `u"m^2"`. + +Line 162: perhaps clearer as `sizes = Float64[]` + +Line 165: replace `size(a.pixels)[1]` with `length(a.pixels)` as it's +a `Vector`. + +Line 167: Replace `size(sizes)[1]` with `length(sizes)`. Why round to +two digits? Perhaps make a note in a comment why. + +#### Lines 170-178, `function croptype` + +Perhaps rewrite like this: +```julia +function croptype(pos::Tuple{Int64,Int64}, model::AgentBasedModel) + fid = model.landscape[pos...].fieldid + return ismissing(fid) ? nothing : model[fid].croptype +end +``` + +#### Lines 180-189, `function cropname` + +Rewrite in the same way as `croptype` (see previous comment for +`croptype`). + +#### Lines 191-200, `function cropheight` + +Rewrite in the same way as `croptype` (see comment for `croptype`). + + +### `farm/farm.jl` + +Code still missing. + + +### `nature/species/skylark.jl` + +#### Lines 6-214 + +Line 41: empty tuple? (immutable) + +Line 46: `clutch` has type `Vector{Any}`, if the IDs are all integers +`clutch = Int[]` is faster. + +Line 53-56: comments indicate some missing functionality. + +Lines 131, 133: magic constant `-1`. + +### `nature/species/wolpertinger.jl`, `nature/species/wyvern.jl` + +:-) + + +### `nature/ecologicaldata.jl` + +#### Lines 28-31, 53-56 + +```julia +for a in allagents(model) + (typeof(a) != Animal) && continue + # do something with a +end +``` + +Possible alternative: + + ```julia + for a in allanimals(model) + # do something with a + end + ``` + + +### `nature/energy.jl` + +#### Lines 9-28, in `struct DEBparameters` + +It might be helpful to move the comments that explain the meaning of +the struct's fields into the docstring. + +#### Lines 30-53, in `struct EnergyBudget` + +In case `M_E` etc. refer to parameters from the literature, it might +be helpful to move this information to the docstring. + + +#### Line 65, in `function volumentriclength` + +The comment expresses some uncertainty about the correctness of the +units. + +#### Line 199, in `function update!` + +Empty else clause, with comment indicating parts of the model are +currently not impemented? + + +### `nature/insects.jl` + +#### Lines 6-77, `function insectbiomass` + +Many magical constants in the function. + + +### `nature/macros.jl` + +A complex macro-based DSL. + + +### `nature/nature.jl` + +Currently still uses `Agents.jl`. + + +### `nature/populations.jl` + +#### Lines 7-82, `function initpopulation` + +This function is quite long and has a complex design. The function +both accepts an `initfunction` as well as returns a closure. Perhaps +there is a simpler way to do this. + +#### Lines 110, in `function reproduce!` + +Why is `@eval` necessary here? Using `@eval` will mean slower +performance, as well as requiring a binary distribution of Persefone +to always carry along the whole Julia compiler infrastructure. + +Perhaps there is a simpler way to do this. + +