# 0.42.0

## DynamicPPL 0.39

Turing.jl v0.42 brings with it all the underlying changes in DynamicPPL 0.39.
Please see [the DynamicPPL changelog](https://github.com/TuringLang/DynamicPPL.jl/releases/tag/v0.39.0) for full details; here we summarise only the changes that are most pertinent to end-users of Turing.jl.

### Thread safety opt-in

Turing.jl has supported threaded tilde-statements for a while now, as long as said tilde-statements are **observations** (i.e., likelihood terms).
For example:

```julia
@model function f(y)
    x ~ Normal()
    Threads.@threads for i in eachindex(y)
        y[i] ~ Normal(x)
    end
end
```

**Models where tilde-statements or `@addlogprob!` are used in parallel require what we call 'threadsafe evaluation'.**
In previous releases of Turing.jl, threadsafe evaluation was enabled whenever Julia was launched with more than one thread.
However, this is an imprecise way of determining whether threadsafe evaluation is really needed.
It caused performance degradation for models that did _not_ actually need threadsafe evaluation, and generally led to ill-defined behaviour in various parts of the Turing codebase.

In Turing.jl v0.42, **threadsafe evaluation is now opt-in.**
To enable threadsafe evaluation, after defining a model, you now need to call `setthreadsafe(model, true)` (note that this is not a mutating function; it returns a new model):

```julia
y = randn(100)
model = f(y)
model = setthreadsafe(model, true)
```

You *only* need to do this if your model uses tilde-statements or `@addlogprob!` in parallel.
You do *not* need to do this if:

 - your model has other kinds of parallelism, but does not include tilde-statements or `@addlogprob!` inside the parallelised sections;
 - or you are using `MCMCThreads()` or `MCMCDistributed()` to sample multiple chains in parallel, but your model itself does not use parallelism.

If your model does include parallelised tilde-statements or `@addlogprob!` calls, and you evaluate or sample from it without first calling `setthreadsafe(model, true)`, then you may get statistically incorrect results without any warnings or errors.
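
For illustration, here is a sketch (not taken from the release notes) of a model that accumulates likelihood terms in parallel with `@addlogprob!` and therefore also needs threadsafe evaluation:

```julia
@model function g(y)
    x ~ Normal()
    # Likelihood contributions are added from multiple threads, so this model
    # must be evaluated with threadsafe evaluation enabled.
    Threads.@threads for i in eachindex(y)
        @addlogprob! logpdf(Normal(x), y[i])
    end
end

model = setthreadsafe(g(randn(100)), true)
```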

### Faster performance

Many operations in DynamicPPL have been substantially sped up.
You should find that anything that uses `LogDensityFunction` (i.e., HMC/NUTS samplers, optimisation) is faster in this release.
Prior sampling should also be much faster than before.
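
For example, both of the following calls (for any `model`) go through the faster code paths, the first via `LogDensityFunction` and the second via prior sampling:

```julia
chain_posterior = sample(model, NUTS(), 1000)
chain_prior = sample(model, Prior(), 1000)
```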

### `predict` improvements

If you have a model that requires threadsafe evaluation (i.e., parallel observations), you can now use this with `predict`.
Carrying on from the previous example, you can do:

```julia
model = setthreadsafe(f(y), true)
chain = sample(model, NUTS(), 1000)

pdn_model = f(fill(missing, length(y)))
pdn_model = setthreadsafe(pdn_model, true) # set threadsafe
predictions = predict(pdn_model, chain) # generate new predictions in parallel
```

### Log-density names in chains

When sampling from a Turing model, the resulting `MCMCChains.Chains` object now contains the log-joint, log-prior, and log-likelihood under the names `:logjoint`, `:logprior`, and `:loglikelihood` respectively.
Previously, the log-joint was stored under the name `:lp`.
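
For example, given a `chain` obtained from `sample` in the usual way, the three quantities can be read off directly:

```julia
chain = sample(model, NUTS(), 1000)

chain[:logjoint]       # previously stored as chain[:lp]
chain[:logprior]
chain[:loglikelihood]
```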

### Log-evidence in chains

When sampling into an `MCMCChains.Chains` object, the chain will no longer have its `chain.logevidence` field set.
Instead, you can calculate this yourself from the log-likelihoods stored in the chain.
For SMC samplers, the log-evidence of the entire trajectory is stored in `chain[:logevidence]` (which is the same for every particle in the 'chain').
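
For instance, with a chain obtained from an SMC sampler, the estimate can be read off from any particle (an illustrative sketch):

```julia
chain = sample(model, SMC(), 1000)
logZ = chain[:logevidence][1]  # identical for every particle in the chain
```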

## AdvancedVI 0.6

Turing.jl v0.42 updates `AdvancedVI.jl` compatibility to 0.6 (we skipped the breaking 0.5 update as it does not introduce new features).
`AdvancedVI.jl@0.6` introduces major structural changes, including breaking changes to the interface, and multiple new features.
The summary below covers only the changes that affect end-users of Turing.
For a more comprehensive list of changes, please refer to the [changelog](https://github.com/TuringLang/AdvancedVI.jl/blob/main/HISTORY.md) in `AdvancedVI`.

### Breaking changes

A new interface level for defining variational algorithms was introduced in `AdvancedVI` v0.5. As a result, the function `Turing.vi` now takes a keyword argument `algorithm`, where the object `algorithm <: AdvancedVI.AbstractVariationalAlgorithm` contains all of the algorithm-specific configuration. Keyword arguments of `vi` that were algorithm-specific, such as `objective`, `operator`, and `averager`, have therefore been moved into fields of the relevant `AdvancedVI.AbstractVariationalAlgorithm` subtypes.

In addition, the outputs have changed. Previously, `vi` returned both the last iterate of the algorithm, `q`, and the iterate average, `q_avg`. Now, for algorithms that perform parameter averaging, only `q_avg` is returned. As a result, the number of returned values has been reduced from four to three.

For example,

```julia
q, q_avg, info, state = vi(
    model, q, n_iters; objective=RepGradELBO(10), operator=AdvancedVI.ClipScale()
)
```

is now

```julia
q_avg, info, state = vi(
    model,
    q,
    n_iters;
    algorithm=KLMinRepGradDescent(adtype; n_samples=10, operator=AdvancedVI.ClipScale()),
)
```

Similarly,

```julia
vi(
    model,
    q,
    n_iters;
    objective=RepGradELBO(10; entropy=AdvancedVI.ClosedFormEntropyZeroGradient()),
    operator=AdvancedVI.ProximalLocationScaleEntropy(),
)
```

is now

```julia
vi(model, q, n_iters; algorithm=KLMinRepGradProxDescent(adtype; n_samples=10))
```

Lastly, to obtain the last iterate `q` of `KLMinRepGradDescent`, which is no longer returned under the new interface, simply set the averaging strategy to `AdvancedVI.NoAveraging()`. That is,

```julia
q, info, state = vi(
    model,
    q,
    n_iters;
    algorithm=KLMinRepGradDescent(
        adtype;
        n_samples=10,
        operator=AdvancedVI.ClipScale(),
        averager=AdvancedVI.NoAveraging(),
    ),
)
```

Additionally,

 - The default hyperparameters of `DoG` and `DoWG` have been altered.
 - The deprecated `AdvancedVI@0.2`-era interface is now removed.
 - `estimate_objective` now always returns the value to be minimized by the optimization algorithm. For example, for ELBO maximization algorithms, `estimate_objective` will return the *negative ELBO*. This is a breaking change from the previous behavior, where the ELBO was returned.
 - The initial values for `q_meanfield_gaussian`, `q_fullrank_gaussian`, and `q_locationscale` have changed. Specifically, the default initial value for the scale matrix has been changed from `I` to `0.6*I`.
 - When using algorithms that expect to operate in unconstrained spaces, the user is now explicitly expected to provide a `Bijectors.TransformedDistribution` wrapping an unconstrained distribution. (Refer to the docstring of `vi`; a short sketch follows this list.)
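
As a sketch of the last two points (illustrative only, reusing the `adtype` and `n_iters` placeholders from the examples above): the `q_*` helpers named above construct exactly such a wrapped distribution, with the base Gaussian living in unconstrained space and the new `0.6*I` default scale, so the result can be passed straight to `vi`:

```julia
q_init = q_meanfield_gaussian(model)  # a Bijectors.TransformedDistribution
q_avg, info, state = vi(
    model, q_init, n_iters; algorithm=KLMinRepGradProxDescent(adtype; n_samples=10)
)
```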

### New Features

`AdvancedVI@0.6` adds numerous new features including the following new VI algorithms:

 - `KLMinWassFwdBwd`: Also known as "Wasserstein variational inference," this algorithm minimizes the KL divergence under the Wasserstein-2 metric.
 - `KLMinNaturalGradDescent`: This algorithm, also known as "online variational Newton," is the canonical "black-box" natural gradient variational inference algorithm, which minimizes the KL divergence via mirror descent, using the KL divergence as the Bregman divergence.
 - `KLMinSqrtNaturalGradDescent`: This is a recent variant of `KLMinNaturalGradDescent` that operates in the Cholesky-factor parameterization of Gaussians instead of precision matrices.
 - `FisherMinBatchMatch`: This algorithm, called "batch-and-match," minimizes the variation of the 2nd-order Fisher divergence via a proximal point-type algorithm.

Any of the new algorithms above can readily be used by simply swapping the `algorithm` keyword argument of `vi`.
For example, to use batch-and-match:

```julia
vi(model, q, n_iters; algorithm=FisherMinBatchMatch())
```

## External sampler interface

The interface for defining an external sampler has been reworked.
In general, implementations of external samplers no longer need to depend on Turing.
This is because the required interface functions have been moved upstream to AbstractMCMC.jl.

In particular, you now only need to define the following functions:

 - `AbstractMCMC.step(rng::Random.AbstractRNG, model::AbstractMCMC.LogDensityModel, ::MySampler; kwargs...)` (and also a method with `state`, and the corresponding `step_warmup` methods if needed)
 - `AbstractMCMC.getparams(::MySamplerState)` -> `Vector{<:Real}`
 - `AbstractMCMC.getstats(::MySamplerState)` -> `NamedTuple`
 - `AbstractMCMC.requires_unconstrained_space(::MySampler)` -> `Bool` (defaults to `true`)

This means that you only need to depend on AbstractMCMC.jl.
As long as the above functions are defined correctly, Turing will be able to use your external sampler.
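
As an illustration, here is a minimal sketch of a random-walk Metropolis sampler written against this interface. The names `RWMH` and `RWMHState` are made up for the example, and the implementation is deliberately simplistic; only the `AbstractMCMC` functions listed above are assumed.

```julia
using AbstractMCMC, LogDensityProblems, Random

struct RWMH <: AbstractMCMC.AbstractSampler
    stepsize::Float64
end

struct RWMHState
    x::Vector{Float64}
    logp::Float64
end

# Initial step: draw a starting point and evaluate the log density.
function AbstractMCMC.step(
    rng::Random.AbstractRNG, model::AbstractMCMC.LogDensityModel, ::RWMH; kwargs...
)
    ℓ = model.logdensity
    x = randn(rng, LogDensityProblems.dimension(ℓ))
    state = RWMHState(x, LogDensityProblems.logdensity(ℓ, x))
    return state, state
end

# Subsequent steps: symmetric Gaussian proposal with a Metropolis accept/reject.
function AbstractMCMC.step(
    rng::Random.AbstractRNG,
    model::AbstractMCMC.LogDensityModel,
    sampler::RWMH,
    state::RWMHState;
    kwargs...,
)
    ℓ = model.logdensity
    x_new = state.x .+ sampler.stepsize .* randn(rng, length(state.x))
    logp_new = LogDensityProblems.logdensity(ℓ, x_new)
    new_state = logp_new - state.logp > log(rand(rng)) ? RWMHState(x_new, logp_new) : state
    return new_state, new_state
end

AbstractMCMC.getparams(state::RWMHState) = state.x
AbstractMCMC.getstats(state::RWMHState) = (logp=state.logp,)
AbstractMCMC.requires_unconstrained_space(::RWMH) = true
```

With these methods in place, such a sampler can in principle be passed to Turing via `externalsampler(RWMH(0.1))`.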

The `Turing.Inference.isgibbscomponent(::MySampler)` interface function still exists, but in this version the default has been changed to `true`, so you should not need to overload this.

## Optimisation interface

The Optim.jl interface has been removed (so you cannot call `Optim.optimize` directly on Turing models).
You can use the `maximum_likelihood` or `maximum_a_posteriori` functions with an Optim.jl solver instead (via Optimization.jl: see https://docs.sciml.ai/Optimization/stable/optimization_packages/optim/ for documentation of the available solvers).
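
For example (a sketch, assuming the OptimizationOptimJL package is installed to expose Optim.jl's solvers through Optimization.jl, and using a made-up model):

```julia
using Turing
using OptimizationOptimJL: LBFGS  # Optim.jl solvers via Optimization.jl

@model function demo(x)
    μ ~ Normal(0, 1)
    x ~ Normal(μ, 1)
end

model = demo(1.5)
mle_estimate = maximum_likelihood(model, LBFGS())
map_estimate = maximum_a_posteriori(model, LBFGS())
```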

## Internal changes

The constructors of `OptimLogDensity` have been replaced with a single constructor, `OptimLogDensity(::DynamicPPL.LogDensityFunction)`.
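
If you construct this internal type directly, the new path looks roughly like the following sketch (the module path and context handling here are assumptions; check the docstrings):

```julia
ldf = DynamicPPL.LogDensityFunction(model)
old = Turing.Optimisation.OptimLogDensity(ldf)
```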

# 0.41.4

Fixed a bug where the `check_model=false` keyword argument would not be respected when sampling with multiple threads or cores.