# persefone-model code review, May 2024 The code is well-structured into understandable units, both in terms of directories and files, as well as functions of reasonable size. Every function has a docstring that is clearly written. Functions and variable names are easy to understand. The documentation is clearly written and well-structured and the website is nicely illustrated. The actual simulation code is behind two different macro systems: the `Agents.jl` macro system as well as Persefone's built-in domain-specific language (DSL). This makes the overall system hard to understand, modify, and validate. If possible, a simpler design without macros should be considered. ## General comments - use of `eval` inside simulation code: this seems to me both an unnecessary reduction of performance as well as making binary packaging harder, meaning a binary distribution of Persefone will always have to - The domain-specific language (DSL) for defining species is complex with nested macro calls and large amounts of generated code. I found it takes quite a bit longer to understand what actually happens under the hood. Using `@macroexpand` one can see the generated code in the REPL and this is quite large for the skylark model, and interspersed with comments from macro expansion. - If possible, I would suggest using normal types and functions instead of a macro-based DSL. - Some macros could be converted into functions, which would make the code easier to understand. The macro form is currently necessary for use inside the DSL, but the macro form also gets used outside of the DSL. Examples: - `@param` - `@rand` - `@shuffle!` - `@chance` - Hardcoded magic constants in some functions, should be made global constants or perhaps in `struct model`. Giving these constants names makes the code clearer, and also makes it easier for users of the package to refer to these numbers if necessary. It might also be useful to give these constants physical units (see next point). - Use physical units (`Unitful.jl`) for clarity, documentation, and catching accidental unit mismatch errors, both in Persefone as well as for users of the package. Some examples: - `LAItotal` - `LAIgreen` - `GDD` - `conversionfactor # hectares to pixels` - Hardcoded pixel size of `10m * 10m = 100 m^2`, this should be a global const with physical units or part of `struct model`. - Coding style - some overuse of `&&` and ternary operator `cond ? a : b`, some would probably be better as `if` statements - sometimes no spaces between operators, e.g. `*` for string concatenation, makes the code harder to read - perhaps explicit `return` statements at the end of functions for clarity, but this is a personal preference of mine that is not universal in the Julia ecosystem - Minimum julia version should be 1.10 due to this bug opened by Daniel https://github.com/carlobaldassi/ArgParse.jl/issues/121 which was closed recently recommending upgrading to a newer julia version. - Docstrings for functions should state what meaning and type of the return value is. - Possible workflows (Github) / pipelines (Gitlab) for automation - CI: run unit tests after each commit and for each candidate merge request - Docs: build docs automatically after each commit - CompatHelper: create pull request when dependencies have new versions - TagBot: tag releases to be incorporated in the Julia package registry (once Persefone is released there) - benchmark: automatically run benchmarks to catch performance regressions, see also `PkgBenchmarks.jl` - Use `PrecompileTools.jl` for precompilation and also check for invalidations etc. to reduce package loading time. - Replace loops in the style of `for i = 1:length(a)` with `for i in eachindex(a)`, this makes the for loop independent of 1-based or 0-based indexing etc. - `make test` / `test/runtests.jl`: no progress visible, hard to see what is currently happening. I use the following in my packages: ```julia # this function goes in `test/runtests.jl` # show which testset is currently running showtestset() = println(" "^(2 * Test.get_testset_depth()), "testing ", Test.get_testset().description) # call showtestset() in each @testset @testset verbose=true "testset_name" begin showtestset() # tests here ... end ``` - It might make sense to use type-based dispatch (on the type of the agent) instead of having code of this form: ```julia for a in allagents(model) (typeof(a) != Animal) && continue pops[a.name] += 1 end ``` - A different alternative is to have an `allanimals` function. ## General comments on testing ### CI (Continuous integration), recommended https://en.wikipedia.org/wiki/Continuous_integration There doesn't seem to be a CI (Continuous integration) runner set up, that runs the unit tests after every commit to master on a matrix of OS and julia version combinations. This should also be activated in merge requests, so that nothing is merged to the `master` branch that fails the unit tests. Julia packages on Github typically include a little badge at the top of the README indicating the CI status in the form of "Run tests: passing". ### Aqua.jl tests, recommended https://github.com/JuliaTesting/Aqua.jl The package `Aqua.jl` runs some automatic quality assurance checks for common problems in Julia codebases, for example (excerpt from the link above): - There are no method ambiguities. - There are no undefined exports. - There are no unbound type parameters. - There are no stale dependencies listed in Project.toml - etc. Once the issues highlighted by the Aqua tests are fixed, the Aqua tests should be run as part of the unit tests. Julia packages on Github usually have a little badge "tested with Aqua.jl" at the top of the README. ### Code coverage testing, recommended Code coverage when running unit tests should be recorded, as code that is not tested during unit tests or manual testing might fail due to even a simple typo. Many Julia packages on Github display a badge showing the code coverage of their package when running the unit tests as a percentage. This can be integrated into the workflows (Github) / pipelines (Gitlab) and recorded and updated automatically. When creating a merge request, the change in coverage can also be recorded, alerting developers when they add significant amount of code without corresponding tests. Some relevant packages: - https://github.com/JuliaCI/Coverage.jl - https://github.com/JuliaCI/LocalCoverage.jl External services: - codecov.io - coveralls.io ### Optional: enforce consistent coding style If a consistent coding style is desired, a code formatter can be run automatically to ensure any code committed to the repository follows the coding style guidelines chosen. Coding style does not affect code correctness, and therefore is less important. It also usually causes a large commit when changing the code base to the formatting style, unless the coding style was used from the beginning. Code formatting can be checked before merging a merge request, to always ensure proper coding style. README badges often used on Github Julia packages for coding styles as well. JuliaFormatter is the package usually used for code formatting: https://github.com/domluna/JuliaFormatter.jl Some popular coding styles in the Julia ecosystem: - https://github.com/SciML/SciMLStyle - https://github.com/JuliaDiff/BlueStyle ### Optional: JET.jl testing https://github.com/aviatesk/JET.jl JET.jl can be used to detect type instabilities which can cause slow performance, as well as other bugs. This package depends on the internals of the Julia compiler, and therefore results can change between Julia versions. ## persefone-model code ### `Project.toml` All packages listed under `[deps]` should have an entry in the `[compat]` section (this would also be flagged by an `Aqua.jl` test). Version bounds on a compat entry of the form ">=5.6" can cause errors in the future, as Julia uses semantic versioning (semver) throughout the ecosystem to indicate API compatibility. A specifier of ">=5.6" means that any version greater than "5.6.0" is acceptable, but future major version releases "6.x.y", "7.x.y" will have API changes compared to version "5.6.x", potentially causing the code to not work or generate incorrect results. If a dependent package is known to work for multiple major versions, these should be listed explicitly separated by commas. Example: the Agents.jl dependency works for versions "^5.6" and "^6", so the compat section should read "5.6,6", indicating that the range of accepted versions is "[5.6.0, 7.0.0)". See: - https://pkgdocs.julialang.org/v1/compatibility/ - https://semver.org/ ### `Persefone.jl` #### Line 16 Explicitly listing which functions are imported with `using` is usually preferred. This makes it clearer from which package the functions used come from. #### Lines 120-121 PrecompileTools.jl should be used so that calls to `precompile` only run during precompilation, and not everytime when loading the package via `using Persefone`. ### `analysis/makeplots.jl` It might make sense to decouple CairoMakie from the model code. See https://git.idiv.de/persefone/persefone-model/-/issues/81 for details. #### Line 15: function visualisemap Type annotations on the `date` and `landcover` arguments would make it clearer which types can be passed in. Alternatively it could be stated in the docstring. ### `core/input.jl` #### Line 12 The code assumes `/` is the path separator. Does this work on windows, where it is `\`? Not sure if there is perhaps some on-the-fly translation on Windows. Better: `joinpath(pkgdir(Persefone), "src", "parameters.toml")` #### Line 16-17 I think this comment can be removed. #### Lines 19-35: `@param` Why does this have to be a macro? Why not: ```julia param(model::AgentBasedModel, domain::AbstractString, par::AbstractString) = model.settings["$domain.$par"] # or param(model::AgentBasedModel, domainparam::AbstractString) = model.settings[domainparam] ``` #### Lines 118-149: `@rand`, `@shuffle!`, `@chance` Why are these macros? Why not: ```julia rand(model::AgentBasedModel, args...) = rand(model.rng, args...) shuffle!(model::AgentBasedModel, collection) = shuffle!(model.rng, collection) chance(odds) = rand(model.rng) < odds # better nomenclature instead of 'odds' chance(prob) = rand(model.rng) < prob ``` Note on nomenclature: odds is defined as the ratio `p / (1 - p)` where `p` is a probability between 0 and 1. What is called `odds` here is a probability. #### Line 180, in `function parsecommandline` Perhaps the minimum julia version for Persefone should be set to 1.10, and the comment can be removed. According to https://github.com/carlobaldassi/ArgParse.jl/issues/121 the fix is to upgrade julia. #### Line 183, in `function parsecommandline` `(args[a] == nothing)`: use `isnothing`. #### Lines 188-213, in `function loadmodelobject` Why `@warn "..."` instead of `throw(ErrorException("..."))`? Otherwise code will continue to run even though there was an error. ### `core/output.jl` #### Lines 25-27, in `function createdatadir` Replace ternary operator with if-then-else and invert logic for clarity. ```julia if overwrite @warn "Overwriting existing output directory $(outdir)." else error("Output directory exists, will not overwrite. Aborting.") end ``` Note: `error("...")` throws an `ErrorException`. Is the `#TODO replace with exception` comment referring to use a more specific exception type? #### Lines 40-45, in `function modellogger` Replace ternary logic with if-then-else. `error` instead of `Base.error`. Comment can be removed. #### Line 62, in `function withtestlogger` Prefer `isnothing(x)` to `x == nothing`. Can be written shorter: `model.logger = (isnothing(logstate) ? global_logger() : logstate.logger)` #### Lines 66-98, `function saveinputfiles` This function assumes that the current working directory is a git repository. It also assumes that the current working directory is the Persefone.jl git repository. I'm not sure that this makes sense. Line 93-95: replace `Base.error` with `error` which already throws an exception. Not sure what the comment `#TODO replace errors with exceptions` should mean. #### Lines 100-120, `function prepareTOML` Type annotation `AbstractDict` on `settings` argument? Lines 109-111: replace ternary logic with if-then-else. #### Line 154, in `function newdataoutput!` Replace `Base.error` with `error`, remove comment. #### Line 167, in `function newdataoutput!` Use of `Any[]` can mean slower performance. #### Line 241, in `function savemodelobject` Use `if` statement instead of short-circuit boolean for readability. ### `core/simulation.jl` #### Lines 18-20, in `function simulate` Replace ternary operator with if-then-else. #### Line 32, in `function simulate!` Where is `dummystep` defined? The string `dummystep` only appears once in the codebase in this line. #### Lines 50-52, in `function intialise` Replace ternary operator with if-then-else. #### Lines 130-151, `function stepsimulation!` Lines 136-143: it's not quite clear to me why Agents.jl would throw an exception here, but the comments here say that this part of the code will be changed in the next code reorganisation. Line 148: hardcoded simulation timestep, might be worthwile to have `model.timestep`. ### `crop/crops.jl` #### Lines 24-26, in `struct CropCurveParams` Perhaps it's worthwile for people like me who are not so well-versed in the terminology of ALMaSS to add a reminder in the docstring what `GDD`, `LAItotal`, `LAIgreen` stand for. Skimming the documentation in `docs/ALMaSSVegetationModelling.pdf`, `LAI` seems to refer to "leaf-area index". Elsewhere in the code, `GDD` seems to refer to "growing degree days", which seems to have units of "°C * days". #### Lines 55-60, in `function Base.tryparse` Replace long chain of ternary operators with if-then-else for clarity. ### `crop/farmplot.jl` #### Lines 32-60, `function initfields!` Line 32: `initfields!` sounds like it refers to the fields of a struct, maybe a clearer name is `initfarms!`? Lines 48-49: the comment sounds unsure, is it correct? perhaps factor this out into a small function? Line 51: hardcoded "natural grass", change to global constant. #### Lines 62-80, `function initfields!` A few boolean short-circuit `&&` statements could be converted to if-then-else for clarity. Hardcoded default base temperature, change to global constant. #### Line 94, in `function sow!` Comment: `#XXX test if the crop is sowable?`, should this be done? #### Lines 104-106, in `function harvest!` Replace ternary operator with if-then-else. #### Lines 111-113 Seemingly missing functions: ```julia #TODO fertilise!() #TODO spray!() #TODO till!() ``` #### Lines 115-150, `function growcrop!` Line 126: replace `for p in 1:length(points)` with `for p in eachindex(points)`. Lines 127,129: make magical constants `99999` and `-1` global consts. Line 137: replace `p == length(points)` with `p == lastindex(points)`. Lines 136, 143-147: the comments here seem to indicate that the implementation is not quite correct. #### Lines 155-168, `function averagefieldsize` Line 161: hardcoded `conversionfactor` should be global const. Has units of `u"m^2"`. Line 162: perhaps clearer as `sizes = Float64[]` Line 165: replace `size(a.pixels)[1]` with `length(a.pixels)` as it's a `Vector`. Line 167: Replace `size(sizes)[1]` with `length(sizes)`. Why round to two digits? Perhaps make a note in a comment why. #### Lines 170-178, `function croptype` Perhaps rewrite like this: ```julia function croptype(pos::Tuple{Int64,Int64}, model::AgentBasedModel) fid = model.landscape[pos...].fieldid return ismissing(fid) ? nothing : model[fid].croptype end ``` #### Lines 180-189, `function cropname` Rewrite in the same way as `croptype` (see previous comment for `croptype`). #### Lines 191-200, `function cropheight` Rewrite in the same way as `croptype` (see comment for `croptype`). ### `farm/farm.jl` Code still missing. ### `nature/species/skylark.jl` #### Lines 6-214 Line 41: empty tuple? (immutable) Line 46: `clutch` has type `Vector{Any}`, if the IDs are all integers `clutch = Int[]` is faster. Line 53-56: comments indicate some missing functionality. Lines 131, 133: magic constant `-1`. ### `nature/species/wolpertinger.jl`, `nature/species/wyvern.jl` :-) ### `nature/ecologicaldata.jl` #### Lines 28-31, 53-56 ```julia for a in allagents(model) (typeof(a) != Animal) && continue # do something with a end ``` Possible alternative: ```julia for a in allanimals(model) # do something with a end ``` ### `nature/energy.jl` #### Lines 9-28, in `struct DEBparameters` It might be helpful to move the comments that explain the meaning of the struct's fields into the docstring. #### Lines 30-53, in `struct EnergyBudget` In case `M_E` etc. refer to parameters from the literature, it might be helpful to move this information to the docstring. #### Line 65, in `function volumentriclength` The comment expresses some uncertainty about the correctness of the units. #### Line 199, in `function update!` Empty else clause, with comment indicating parts of the model are currently not impemented? ### `nature/insects.jl` #### Lines 6-77, `function insectbiomass` Many magical constants in the function. ### `nature/macros.jl` A complex macro-based DSL. ### `nature/nature.jl` Currently still uses `Agents.jl`. ### `nature/populations.jl` #### Lines 7-82, `function initpopulation` This function is quite long and has a complex design. The function both accepts an `initfunction` as well as returns a closure. Perhaps there is a simpler way to do this. #### Lines 110, in `function reproduce!` Why is `@eval` necessary here? Using `@eval` will mean slower performance, as well as requiring a binary distribution of Persefone to always carry along the whole Julia compiler infrastructure. Perhaps there is a simpler way to do this.