# 0.42.0

## DynamicPPL 0.39

Turing.jl v0.42 brings with it all the underlying changes in DynamicPPL 0.39.
Please see [the DynamicPPL changelog](https://github.com/TuringLang/DynamicPPL.jl/releases/tag/v0.39.0) for full details; here we summarise only the changes that are most pertinent to end-users of Turing.jl.

### Thread safety opt-in

Turing.jl has supported threaded tilde-statements for a while now, as long as said tilde-statements are **observations** (i.e., likelihood terms).
For example:

```julia
@model function f(y)
    x ~ Normal()
    Threads.@threads for i in eachindex(y)
        y[i] ~ Normal(x)
    end
end
```

**Models where tilde-statements or `@addlogprob!` are used in parallel require what we call 'threadsafe evaluation'.**
In previous releases of Turing.jl, threadsafe evaluation was enabled whenever Julia was launched with more than one thread.
However, this is an imprecise way of determining whether threadsafe evaluation is really needed.
It caused performance degradation for models that did _not_ actually need threadsafe evaluation, and generally led to ill-defined behaviour in various parts of the Turing codebase.

In Turing.jl v0.42, **threadsafe evaluation is now opt-in.**
To enable threadsafe evaluation, after defining a model, you now need to call `setthreadsafe(model, true)` (note that this is not a mutating function; it returns a new model):

```julia
y = randn(100)
model = f(y)
model = setthreadsafe(model, true)
```

You *only* need to do this if your model uses tilde-statements or `@addlogprob!` in parallel.
You do *not* need to do this if:

 - your model has other kinds of parallelism, but does not include tilde-statements or `@addlogprob!` inside the parallelised sections;
 - or you are using `MCMCThreads()` or `MCMCDistributed()` to sample multiple chains in parallel, but your model itself does not use parallelism.

If your model does include parallelised tilde-statements or `@addlogprob!` calls, and you evaluate or sample from it without first calling `setthreadsafe(model, true)`, then you may get statistically incorrect results without any warnings or errors.
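
For illustration, here is a sketch (not taken from the release notes) of a model that accumulates likelihood terms in parallel with `@addlogprob!` and therefore also needs threadsafe evaluation:

```julia
@model function g(y)
    x ~ Normal()
    # Likelihood contributions are added from multiple threads, so this model
    # must be evaluated with threadsafe evaluation enabled.
    Threads.@threads for i in eachindex(y)
        @addlogprob! logpdf(Normal(x), y[i])
    end
end

model = setthreadsafe(g(randn(100)), true)
```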

### Faster performance

Many operations in DynamicPPL have been substantially sped up.
You should find that anything that uses `LogDensityFunction` (i.e., HMC/NUTS samplers, optimisation) is faster in this release.
Prior sampling should also be much faster than before.
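
For example, both of the following calls (for any `model`) go through the faster code paths, the first via `LogDensityFunction` and the second via prior sampling:

```julia
chain_posterior = sample(model, NUTS(), 1000)
chain_prior = sample(model, Prior(), 1000)
```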

### `predict` improvements

If you have a model that requires threadsafe evaluation (i.e., parallel observations), you can now use this with `predict`.
Carrying on from the previous example, you can do:

```julia
model = setthreadsafe(f(y), true)
chain = sample(model, NUTS(), 1000)

pdn_model = f(fill(missing, length(y)))
pdn_model = setthreadsafe(pdn_model, true) # set threadsafe
predictions = predict(pdn_model, chain) # generate new predictions in parallel
```

### Log-density names in chains

When sampling from a Turing model, the resulting `MCMCChains.Chains` object now contains the log-joint, log-prior, and log-likelihood under the names `:logjoint`, `:logprior`, and `:loglikelihood` respectively.
Previously, the log-joint was stored under the name `:lp`.
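
For example, given a `chain` obtained from `sample` in the usual way, the three quantities can be read off directly:

```julia
chain = sample(model, NUTS(), 1000)

chain[:logjoint]       # previously stored as chain[:lp]
chain[:logprior]
chain[:loglikelihood]
```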

### Log-evidence in chains

When sampling into an `MCMCChains.Chains` object, the chain will no longer have its `chain.logevidence` field set.
Instead, you can calculate this yourself from the log-likelihoods stored in the chain.
For SMC samplers, the log-evidence of the entire trajectory is stored in `chain[:logevidence]` (which is the same for every particle in the 'chain').
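
For instance, with a chain obtained from an SMC sampler, the estimate can be read off from any particle (an illustrative sketch):

```julia
chain = sample(model, SMC(), 1000)
logZ = chain[:logevidence][1]  # identical for every particle in the chain
```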

## AdvancedVI 0.6

Turing.jl v0.42 updates `AdvancedVI.jl` compatibility to 0.6 (we skipped the breaking 0.5 update as it does not introduce new features).
`AdvancedVI.jl@0.6` introduces major structural changes, including breaking changes to the interface, and multiple new features.
The summary below covers only the changes that affect end-users of Turing.
For a more comprehensive list of changes, please refer to the [changelog](https://github.com/TuringLang/AdvancedVI.jl/blob/main/HISTORY.md) in `AdvancedVI`.

### Breaking changes

A new interface level for defining variational algorithms was introduced in `AdvancedVI` v0.5. As a result, the function `Turing.vi` now takes a keyword argument `algorithm`, where the object `algorithm <: AdvancedVI.AbstractVariationalAlgorithm` contains all of the algorithm-specific configuration. Keyword arguments of `vi` that were algorithm-specific, such as `objective`, `operator`, and `averager`, have therefore been moved into fields of the relevant `AdvancedVI.AbstractVariationalAlgorithm` subtypes.

In addition, the outputs have changed. Previously, `vi` returned both the last iterate of the algorithm, `q`, and the iterate average, `q_avg`. Now, for algorithms that perform parameter averaging, only `q_avg` is returned. As a result, the number of returned values has been reduced from four to three.

For example,

```julia
q, q_avg, info, state = vi(
    model, q, n_iters; objective=RepGradELBO(10), operator=AdvancedVI.ClipScale()
)
```

is now

```julia
q_avg, info, state = vi(
    model,
    q,
    n_iters;
    algorithm=KLMinRepGradDescent(adtype; n_samples=10, operator=AdvancedVI.ClipScale()),
)
```

Similarly,

```julia
vi(
    model,
    q,
    n_iters;
    objective=RepGradELBO(10; entropy=AdvancedVI.ClosedFormEntropyZeroGradient()),
    operator=AdvancedVI.ProximalLocationScaleEntropy(),
)
```

is now

```julia
vi(model, q, n_iters; algorithm=KLMinRepGradProxDescent(adtype; n_samples=10))
```

Lastly, to obtain the last iterate `q` of `KLMinRepGradDescent`, which is no longer returned under the new interface, simply set the averaging strategy to `AdvancedVI.NoAveraging()`. That is,

```julia
q, info, state = vi(
    model,
    q,
    n_iters;
    algorithm=KLMinRepGradDescent(
        adtype;
        n_samples=10,
        operator=AdvancedVI.ClipScale(),
        averager=AdvancedVI.NoAveraging(),
    ),
)
```

Additionally,

 - The default hyperparameters of `DoG` and `DoWG` have been altered.
 - The deprecated `AdvancedVI@0.2`-era interface is now removed.
 - `estimate_objective` now always returns the value to be minimized by the optimization algorithm. For example, for ELBO maximization algorithms, `estimate_objective` will return the *negative ELBO*. This is a breaking change from the previous behavior, where the ELBO was returned.
 - The initial values for `q_meanfield_gaussian`, `q_fullrank_gaussian`, and `q_locationscale` have changed. Specifically, the default initial value for the scale matrix has been changed from `I` to `0.6*I`.
 - When using algorithms that expect to operate in unconstrained spaces, the user is now explicitly expected to provide a `Bijectors.TransformedDistribution` wrapping an unconstrained distribution. (Refer to the docstring of `vi`; a short sketch follows this list.)
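
As a sketch of the last two points (illustrative only, reusing the `adtype` and `n_iters` placeholders from the examples above): the `q_*` helpers named above construct exactly such a wrapped distribution, with the base Gaussian living in unconstrained space and the new `0.6*I` default scale, so the result can be passed straight to `vi`:

```julia
q_init = q_meanfield_gaussian(model)  # a Bijectors.TransformedDistribution
q_avg, info, state = vi(
    model, q_init, n_iters; algorithm=KLMinRepGradProxDescent(adtype; n_samples=10)
)
```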

### New Features

`AdvancedVI@0.6` adds numerous new features including the following new VI algorithms:

 - `KLMinWassFwdBwd`: Also known as "Wasserstein variational inference," this algorithm minimizes the KL divergence under the Wasserstein-2 metric.
 - `KLMinNaturalGradDescent`: This algorithm, also known as "online variational Newton," is the canonical "black-box" natural gradient variational inference algorithm, which minimizes the KL divergence via mirror descent, using the KL divergence as the Bregman divergence.
 - `KLMinSqrtNaturalGradDescent`: This is a recent variant of `KLMinNaturalGradDescent` that operates in the Cholesky-factor parameterization of Gaussians instead of precision matrices.
 - `FisherMinBatchMatch`: This algorithm, called "batch-and-match," minimizes the variation of the 2nd-order Fisher divergence via a proximal point-type algorithm.

Any of the new algorithms above can readily be used by simply swapping the `algorithm` keyword argument of `vi`.
For example, to use batch-and-match:

```julia
vi(model, q, n_iters; algorithm=FisherMinBatchMatch())
```

## External sampler interface

The interface for defining an external sampler has been reworked.
In general, implementations of external samplers no longer need to depend on Turing.
This is because the required interface functions have been moved upstream to AbstractMCMC.jl.

In particular, you now only need to define the following functions:

 - `AbstractMCMC.step(rng::Random.AbstractRNG, model::AbstractMCMC.LogDensityModel, ::MySampler; kwargs...)` (and also a method with `state`, and the corresponding `step_warmup` methods if needed)
 - `AbstractMCMC.getparams(::MySamplerState)` -> `Vector{<:Real}`
 - `AbstractMCMC.getstats(::MySamplerState)` -> `NamedTuple`
 - `AbstractMCMC.requires_unconstrained_space(::MySampler)` -> `Bool` (defaults to `true`)

This means that you only need to depend on AbstractMCMC.jl.
As long as the above functions are defined correctly, Turing will be able to use your external sampler.
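
As an illustration, here is a minimal sketch of a random-walk Metropolis sampler written against this interface. The names `RWMH` and `RWMHState` are made up for the example, and the implementation is deliberately simplistic; only the `AbstractMCMC` functions listed above are assumed.

```julia
using AbstractMCMC, LogDensityProblems, Random

struct RWMH <: AbstractMCMC.AbstractSampler
    stepsize::Float64
end

struct RWMHState
    x::Vector{Float64}
    logp::Float64
end

# Initial step: draw a starting point and evaluate the log density.
function AbstractMCMC.step(
    rng::Random.AbstractRNG, model::AbstractMCMC.LogDensityModel, ::RWMH; kwargs...
)
    ℓ = model.logdensity
    x = randn(rng, LogDensityProblems.dimension(ℓ))
    state = RWMHState(x, LogDensityProblems.logdensity(ℓ, x))
    return state, state
end

# Subsequent steps: symmetric Gaussian proposal with a Metropolis accept/reject.
function AbstractMCMC.step(
    rng::Random.AbstractRNG,
    model::AbstractMCMC.LogDensityModel,
    sampler::RWMH,
    state::RWMHState;
    kwargs...,
)
    ℓ = model.logdensity
    x_new = state.x .+ sampler.stepsize .* randn(rng, length(state.x))
    logp_new = LogDensityProblems.logdensity(ℓ, x_new)
    new_state = logp_new - state.logp > log(rand(rng)) ? RWMHState(x_new, logp_new) : state
    return new_state, new_state
end

AbstractMCMC.getparams(state::RWMHState) = state.x
AbstractMCMC.getstats(state::RWMHState) = (logp=state.logp,)
AbstractMCMC.requires_unconstrained_space(::RWMH) = true
```

With these methods in place, such a sampler can in principle be passed to Turing via `externalsampler(RWMH(0.1))`.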

The `Turing.Inference.isgibbscomponent(::MySampler)` interface function still exists, but in this version the default has been changed to `true`, so you should not need to overload this.

## Optimisation interface

The Optim.jl interface has been removed (so you cannot call `Optim.optimize` directly on Turing models).
You can use the `maximum_likelihood` or `maximum_a_posteriori` functions with an Optim.jl solver instead (via Optimization.jl: see https://docs.sciml.ai/Optimization/stable/optimization_packages/optim/ for documentation of the available solvers).
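
For example (a sketch, assuming the OptimizationOptimJL package is installed to expose Optim.jl's solvers through Optimization.jl, and using a made-up model):

```julia
using Turing
using OptimizationOptimJL: LBFGS  # Optim.jl solvers via Optimization.jl

@model function demo(x)
    μ ~ Normal(0, 1)
    x ~ Normal(μ, 1)
end

model = demo(1.5)
mle_estimate = maximum_likelihood(model, LBFGS())
map_estimate = maximum_a_posteriori(model, LBFGS())
```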

## Internal changes

The constructors of `OptimLogDensity` have been replaced with a single constructor, `OptimLogDensity(::DynamicPPL.LogDensityFunction)`.
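
If you construct this internal type directly, the new path looks roughly like the following sketch (the module path and context handling here are assumptions; check the docstrings):

```julia
ldf = DynamicPPL.LogDensityFunction(model)
old = Turing.Optimisation.OptimLogDensity(ldf)
```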

# 0.41.4

Fixed a bug where the `check_model=false` keyword argument would not be respected when sampling with multiple threads or cores.