# persefone-model code review, May 2024

The code is well-structured into understandable units, both in terms
of directories and files, as well as functions of reasonable size.

Every function has a docstring that is clearly written.  Functions and
variable names are easy to understand.

The documentation is clearly written and well-structured and the
website is nicely illustrated.

The actual simulation code is behind two different macro systems: the
`Agents.jl` macro system as well as Persefone's built-in
domain-specific language (DSL).  This makes the overall system hard to
understand, modify, and validate.  If possible, a simpler design
without macros should be considered.


## General comments

- use of `eval` inside simulation code: this seems to me both an
  unnecessary reduction of performance as well as making binary
  packaging harder, meaning a binary distribution of Persefone will
  always have to

- The domain-specific language (DSL) for defining species is complex
  with nested macro calls and large amounts of generated code.  I
  found it takes quite a bit longer to understand what actually
  happens under the hood.  Using `@macroexpand` one can see the
  generated code in the REPL and this is quite large for the skylark
  model, and interspersed with comments from macro expansion.
  - If possible, I would suggest using normal types and functions
    instead of a macro-based DSL.

- Some macros could be converted into functions, which would make the
  code easier to understand.  The macro form is currently necessary
  for use inside the DSL, but the macro form also gets used outside of
  the DSL.  Examples:
  - `@param`
  - `@rand`
  - `@shuffle!`
  - `@chance`

- Hardcoded magic constants in some functions, should be made global
  constants or perhaps in `struct model`.  Giving these constants
  names makes the code clearer, and also makes it easier for users of
  the package to refer to these numbers if necessary.  It might also
  be useful to give these constants physical units (see next point).

- Use physical units (`Unitful.jl`) for clarity, documentation, and
  catching accidental unit mismatch errors, both in Persefone as well
  as for users of the package.  Some examples:
  - `LAItotal`
  - `LAIgreen`
  - `GDD`
  - `conversionfactor  # hectares to pixels`

- Hardcoded pixel size of `10m * 10m = 100 m^2`, this should be a
  global const with physical units or part of `struct model`.

- Coding style
  - some overuse of `&&` and ternary operator `cond ? a : b`, some
    would probably be better as `if` statements
  - sometimes no spaces between operators, e.g. `*` for string
    concatenation, makes the code harder to read
  - perhaps explicit `return` statements at the end of functions for
    clarity, but this is a personal preference of mine that is not
    universal in the Julia ecosystem

- Minimum julia version should be 1.10 due to this bug opened by
  Daniel https://github.com/carlobaldassi/ArgParse.jl/issues/121 which
  was closed recently recommending upgrading to a newer julia version.

- Docstrings for functions should state what meaning and type of the
  return value is.

- Possible workflows (Github) / pipelines (Gitlab) for automation
  - CI: run unit tests after each commit and for each candidate merge
    request
  - Docs: build docs automatically after each commit
  - CompatHelper: create pull request when dependencies have new
    versions
  - TagBot: tag releases to be incorporated in the Julia package
    registry (once Persefone is released there)
  - benchmark: automatically run benchmarks to catch performance
    regressions, see also `PkgBenchmarks.jl`

- Use `PrecompileTools.jl` for precompilation and also check for
  invalidations etc. to reduce package loading time.

- Replace loops in the style of `for i = 1:length(a)` with `for i in
  eachindex(a)`, this makes the for loop independent of 1-based or
  0-based indexing etc.

- `make test` / `test/runtests.jl`: no progress visible, hard to see
  what is currently happening.  I use the following in my packages:
  ```julia
  # this function goes in `test/runtests.jl`
  # show which testset is currently running
  showtestset() = println(" "^(2 * Test.get_testset_depth()), "testing ",
                          Test.get_testset().description)

  # call showtestset() in each @testset
  @testset verbose=true "testset_name" begin
      showtestset()
      # tests here ...
  end
  ```

- It might make sense to use type-based dispatch (on the type of the
  agent) instead of having code of this form:
  ```julia
  for a in allagents(model)
      (typeof(a) != Animal) && continue
      pops[a.name] += 1
  end
  ```
  - A different alternative is to have an `allanimals` function.



## General comments on testing

### CI (Continuous integration), recommended

https://en.wikipedia.org/wiki/Continuous_integration

There doesn't seem to be a CI (Continuous integration) runner set up,
that runs the unit tests after every commit to master on a matrix of
OS and julia version combinations.

This should also be activated in merge requests, so that nothing is
merged to the `master` branch that fails the unit tests.

Julia packages on Github typically include a little badge at the top
of the README indicating the CI status in the form of "Run tests:
passing".

### Aqua.jl tests, recommended

https://github.com/JuliaTesting/Aqua.jl

The package `Aqua.jl` runs some automatic quality assurance checks for
common problems in Julia codebases, for example (excerpt from the link
above):

- There are no method ambiguities.
- There are no undefined exports.
- There are no unbound type parameters.
- There are no stale dependencies listed in Project.toml
- etc.

Once the issues highlighted by the Aqua tests are fixed, the Aqua
tests should be run as part of the unit tests.

Julia packages on Github usually have a little badge "tested with
Aqua.jl" at the top of the README.

### Code coverage testing, recommended

Code coverage when running unit tests should be recorded, as code that
is not tested during unit tests or manual testing might fail due to
even a simple typo.  Many Julia packages on Github display a badge
showing the code coverage of their package when running the unit tests
as a percentage. This can be integrated into the workflows (Github) /
pipelines (Gitlab) and recorded and updated automatically.  When
creating a merge request, the change in coverage can also be recorded,
alerting developers when they add significant amount of code without
corresponding tests.

Some relevant packages:
- https://github.com/JuliaCI/Coverage.jl
- https://github.com/JuliaCI/LocalCoverage.jl

External services:
- codecov.io
- coveralls.io

### Optional: enforce consistent coding style

If a consistent coding style is desired, a code formatter can be run
automatically to ensure any code committed to the repository follows
the coding style guidelines chosen.

Coding style does not affect code correctness, and therefore is less
important.  It also usually causes a large commit when changing the
code base to the formatting style, unless the coding style was used
from the beginning.

Code formatting can be checked before merging a merge request, to
always ensure proper coding style.  README badges often used on Github
Julia packages for coding styles as well.

JuliaFormatter is the package usually used for code formatting:
https://github.com/domluna/JuliaFormatter.jl

Some popular coding styles in the Julia ecosystem:
- https://github.com/SciML/SciMLStyle
- https://github.com/JuliaDiff/BlueStyle

### Optional: JET.jl testing

https://github.com/aviatesk/JET.jl

JET.jl can be used to detect type instabilities which can cause slow
performance, as well as other bugs.

This package depends on the internals of the Julia compiler, and
therefore results can change between Julia versions.



## persefone-model code

### `Project.toml`

All packages listed under `[deps]` should have an entry in the
`[compat]` section (this would also be flagged by an `Aqua.jl` test).

Version bounds on a compat entry of the form ">=5.6" can cause errors
in the future, as Julia uses semantic versioning (semver) throughout
the ecosystem to indicate API compatibility.  A specifier of ">=5.6"
means that any version greater than "5.6.0" is acceptable, but future
major version releases "6.x.y", "7.x.y" will have API changes compared
to version "5.6.x", potentially causing the code to not work or
generate incorrect results.

If a dependent package is known to work for multiple major versions,
these should be listed explicitly separated by commas.

Example: the Agents.jl dependency works for versions "^5.6" and "^6",
so the compat section should read "5.6,6", indicating that the range
of accepted versions is "[5.6.0, 7.0.0)".

See:
- https://pkgdocs.julialang.org/v1/compatibility/
- https://semver.org/


### `Persefone.jl`

#### Line 16

Explicitly listing which functions are imported with `using` is
usually preferred.  This makes it clearer from which package the
functions used come from.

#### Lines 120-121

PrecompileTools.jl should be used so that calls to `precompile` only
run during precompilation, and not everytime when loading the package
via `using Persefone`.


### `analysis/makeplots.jl`

It might make sense to decouple CairoMakie from the model code.

See
https://git.idiv.de/persefone/persefone-model/-/issues/81
for details.

#### Line 15: function visualisemap

Type annotations on the `date` and `landcover` arguments would make it
clearer which types can be passed in. Alternatively it could be stated
in the docstring.


### `core/input.jl`

#### Line 12

The code assumes `/` is the path separator. Does this work on windows,
where it is `\`?  Not sure if there is perhaps some on-the-fly
translation on Windows.

Better: `joinpath(pkgdir(Persefone), "src", "parameters.toml")`

#### Line 16-17

I think this comment can be removed.

#### Lines 19-35: `@param`

Why does this have to be a macro? Why not:

```julia
param(model::AgentBasedModel, domain::AbstractString, par::AbstractString) = model.settings["$domain.$par"]
# or
param(model::AgentBasedModel, domainparam::AbstractString) = model.settings[domainparam]
```

#### Lines 118-149: `@rand`, `@shuffle!`, `@chance`

Why are these macros? Why not:

```julia
rand(model::AgentBasedModel, args...) = rand(model.rng, args...)
shuffle!(model::AgentBasedModel, collection) = shuffle!(model.rng, collection)
chance(odds) = rand(model.rng) < odds
# better nomenclature instead of 'odds'
chance(prob) = rand(model.rng) < prob
```

Note on nomenclature: odds is defined as the ratio `p / (1 - p)` where
`p` is a probability between 0 and 1.  What is called `odds` here is a
probability.

#### Line 180, in `function parsecommandline`

Perhaps the minimum julia version for Persefone should be set to 1.10,
and the comment can be removed.  According to
https://github.com/carlobaldassi/ArgParse.jl/issues/121 the fix is to
upgrade julia.

#### Line 183, in `function parsecommandline`

`(args[a] == nothing)`: use `isnothing`.

#### Lines 188-213, in `function loadmodelobject`

Why `@warn "..."` instead of `throw(ErrorException("..."))`?

Otherwise code will continue to run even though there was an error.


### `core/output.jl`

#### Lines 25-27, in `function createdatadir`

Replace ternary operator with if-then-else and invert logic for
clarity.

```julia
if overwrite
    @warn "Overwriting existing output directory $(outdir)."
else
    error("Output directory exists, will not overwrite. Aborting.")
end
```

Note: `error("...")` throws an `ErrorException`. Is the `#TODO replace
with exception` comment referring to use a more specific exception
type?

#### Lines 40-45, in `function modellogger`

Replace ternary logic with if-then-else.

`error` instead of `Base.error`.  Comment can be removed.

#### Line 62, in `function withtestlogger`

Prefer `isnothing(x)` to `x == nothing`.

Can be written shorter:

`model.logger = (isnothing(logstate) ? global_logger() : logstate.logger)`

#### Lines 66-98, `function saveinputfiles`

This function assumes that the current working directory is a git
repository.  It also assumes that the current working directory is the
Persefone.jl git repository.  I'm not sure that this makes sense.

Line 93-95: replace `Base.error` with `error` which already throws an
exception.  Not sure what the comment `#TODO replace errors with
exceptions` should mean.

#### Lines 100-120, `function prepareTOML`

Type annotation `AbstractDict` on `settings` argument?

Lines 109-111: replace ternary logic with if-then-else.

#### Line 154, in `function newdataoutput!`

Replace `Base.error` with `error`, remove comment.

#### Line 167, in `function newdataoutput!`

Use of `Any[]` can mean slower performance.

#### Line 241, in `function savemodelobject`

Use `if` statement instead of short-circuit boolean for readability.

### `core/simulation.jl`

#### Lines 18-20, in `function simulate`

Replace ternary operator with if-then-else.

#### Line 32, in `function simulate!`

Where is `dummystep` defined? The string `dummystep` only appears once
in the codebase in this line.

#### Lines 50-52, in `function intialise`

Replace ternary operator with if-then-else.

#### Lines 130-151, `function stepsimulation!`

Lines 136-143: it's not quite clear to me why Agents.jl would throw an
exception here, but the comments here say that this part of the code
will be changed in the next code reorganisation.

Line 148: hardcoded simulation timestep, might be worthwile to have
`model.timestep`.

### `crop/crops.jl`

#### Lines 24-26, in `struct CropCurveParams`

Perhaps it's worthwile for people like me who are not so well-versed
in the terminology of ALMaSS to add a reminder in the docstring what
`GDD`, `LAItotal`, `LAIgreen` stand for.

Skimming the documentation in `docs/ALMaSSVegetationModelling.pdf`,
`LAI` seems to refer to "leaf-area index".

Elsewhere in the code, `GDD` seems to refer to "growing degree days",
which seems to have units of "°C * days".

#### Lines 55-60, in `function Base.tryparse`

Replace long chain of ternary operators with if-then-else for clarity.


### `crop/farmplot.jl`

#### Lines 32-60, `function initfields!`

Line 32: `initfields!` sounds like it refers to the fields of a
struct, maybe a clearer name is `initfarms!`?

Lines 48-49: the comment sounds unsure, is it correct? perhaps factor
this out into a small function?

Line 51: hardcoded "natural grass", change to global constant.

#### Lines 62-80, `function initfields!`

A few boolean short-circuit `&&` statements could be converted to
if-then-else for clarity.

Hardcoded default base temperature, change to global constant.

#### Line 94, in `function sow!`

Comment: `#XXX test if the crop is sowable?`, should this be done?

#### Lines 104-106, in `function harvest!`

Replace ternary operator with if-then-else.

#### Lines 111-113

Seemingly missing functions:

```julia
#TODO fertilise!()
#TODO spray!()
#TODO till!()
```

#### Lines 115-150, `function growcrop!`

Line 126: replace `for p in 1:length(points)` with `for p in eachindex(points)`.

Lines 127,129: make magical constants `99999` and `-1` global consts.

Line 137: replace `p == length(points)` with `p == lastindex(points)`.

Lines 136, 143-147: the comments here seem to indicate that the
implementation is not quite correct.


#### Lines 155-168, `function averagefieldsize`

Line 161: hardcoded `conversionfactor` should be global const.  Has
units of `u"m^2"`.

Line 162: perhaps clearer as `sizes = Float64[]`

Line 165: replace `size(a.pixels)[1]` with `length(a.pixels)` as it's
a `Vector`.

Line 167: Replace `size(sizes)[1]` with `length(sizes)`. Why round to
two digits?  Perhaps make a note in a comment why.

#### Lines 170-178, `function croptype`

Perhaps rewrite like this:
```julia
function croptype(pos::Tuple{Int64,Int64}, model::AgentBasedModel)
    fid = model.landscape[pos...].fieldid
    return ismissing(fid) ? nothing : model[fid].croptype
end
```

#### Lines 180-189, `function cropname`

Rewrite in the same way as `croptype` (see previous comment for
`croptype`).

#### Lines 191-200, `function cropheight`

Rewrite in the same way as `croptype` (see comment for `croptype`).


### `farm/farm.jl`

Code still missing.


### `nature/species/skylark.jl`

#### Lines 6-214

Line 41: empty tuple? (immutable)

Line 46: `clutch` has type `Vector{Any}`, if the IDs are all integers
`clutch = Int[]` is faster.

Line 53-56: comments indicate some missing functionality.

Lines 131, 133: magic constant `-1`.

### `nature/species/wolpertinger.jl`, `nature/species/wyvern.jl`

:-)


### `nature/ecologicaldata.jl`

#### Lines 28-31, 53-56

```julia
for a in allagents(model)
    (typeof(a) != Animal) && continue
    # do something with a
end
```

Possible alternative:

    ```julia
    for a in allanimals(model)
        # do something with a
    end
    ```


### `nature/energy.jl`

#### Lines 9-28, in `struct DEBparameters`

It might be helpful to move the comments that explain the meaning of
the struct's fields into the docstring.

#### Lines 30-53, in `struct EnergyBudget`

In case `M_E` etc. refer to parameters from the literature, it might
be helpful to move this information to the docstring.


#### Line 65, in `function volumentriclength`

The comment expresses some uncertainty about the correctness of the
units.

#### Line 199, in `function update!`

Empty else clause, with comment indicating parts of the model are
currently not impemented?


### `nature/insects.jl`

#### Lines 6-77, `function insectbiomass`

Many magical constants in the function.


### `nature/macros.jl`

A complex macro-based DSL.


### `nature/nature.jl`

Currently still uses `Agents.jl`.


### `nature/populations.jl`

#### Lines 7-82, `function initpopulation`

This function is quite long and has a complex design.  The function
both accepts an `initfunction` as well as returns a closure.  Perhaps
there is a simpler way to do this.

#### Lines 110, in `function reproduce!`

Why is `@eval` necessary here?  Using `@eval` will mean slower
performance, as well as requiring a binary distribution of Persefone
to always carry along the whole Julia compiler infrastructure.

Perhaps there is a simpler way to do this.