Full covariance vs variance NLL
Each study’s V determines which branch of the NLL is
used. admixr2 auto-detects the method:
| V structure | Auto-detected method
|
NLL cost |
|---|---|---|
| Non-diagonal matrix | "cov" |
O(n_t³) Cholesky solve |
| Diagonal matrix | "var" |
O(n_t) element-wise |
| Plain vector (variances only) | "var" |
O(n_t) element-wise |
Override by setting method explicitly in the study
list:
# Force full covariance NLL even for a diagonal V
study_cov <- c(study, list(method = "cov"))
# Diagonal approximation when you only have marginal variances
study_var <- list(
E = E,
V = diag(diag(V)), # drop off-diagonal entries
n = n,
times = times,
ev = rxode2::et(amt = 100),
method = "var"
)Use "cov" when you have a full covariance matrix and
expect temporal correlation to inform parameter estimates. Use
"var" when only marginal variances are available (e.g. from
a published table of means and SDs) or when runtime is a priority.
Gradient modes
admControl() and adfoControl() offer
gradient strategies via the grad argument.
adghControl() offers "analytical",
"fd", "cfd", and "none".
adirmcControl() offers "analytical",
"none", and "fd" only (no "sens"
or "cfd").
admControl() gradient modes:
grad = |
Method | Notes |
|---|---|---|
"sens" |
Sensitivity equations (default) | Analytical; requires ODE or linCmt() model |
"fd" |
Forward finite differences | Falls back to this if sens unavailable |
"cfd" |
Central finite differences | More accurate FD; ~2× slower than "fd"
|
"none" |
BOBYQA (derivative-free) | No gradient; useful for debugging or simple models |
adfoControl() gradient modes:
grad = |
Method | Notes |
|---|---|---|
"none" (default) |
BOBYQA | Derivative-free; robust starting point for FO |
"analytical" |
Chain rule through V_pred | Omega/sigma analytical; struct thetas FD only |
"fd" |
Forward FD of full NLL | All parameters; n_p + 1 NLL evals per step |
"cfd" |
Central FD of full NLL | More accurate; 2 n_p NLL evals per step |
adghControl() gradient modes:
grad = |
Method | Notes |
|---|---|---|
"analytical" (default) |
Closed-form contractions through sensitivity equations | Exact and cheapest; one batched sensitivity solve per study over the node grid |
"fd" |
Forward FD of full NLL | All parameters |
"cfd" |
Central FD of full NLL | More accurate; ~2× slower than "fd"
|
"none" |
BOBYQA (derivative-free) | No gradient |
Because the GH objective is noise-free
(deterministic quadrature, no MC draws), its analytical gradient is
exact and the resulting Hessian is well-conditioned —
"analytical" is the recommended default for
adgh.
For adfo, all gradient modes still use the sensitivity
model inside each NLL evaluation to compute J in a single rxSolve —
grad controls only how the optimizer gradient is
formed.
# Analytical gradient (default; fastest for ODE models)
fit_sens <- nlmixr2(pk_model, admData(), est = "admc",
control = admControl(studies = list(s = study), grad = "sens",
n_sim = 5000L, seed = 1L))
# Derivative-free — ignores maxeval; use nloptr stopping criteria instead
fit_bobyqa <- nlmixr2(pk_model, admData(), est = "admc",
control = admControl(studies = list(s = study), grad = "none",
n_sim = 5000L, seed = 1L))"sens" is recommended for all standard ODE and
linCmt() models. If the sensitivity model is unavailable
(e.g. rare ODE features), admixr2 falls back to
"fd" automatically with a warning.
Mu-referencing and sensitivity equations
Mu-referencing is the nlmixr2 convention of expressing each structural parameter as a fixed effect plus a random effect through a known back-transformation:
cl <- exp(tcl + eta.cl) # mu-referenced: tcl paired with eta.cladmixr2 uses this pairing to classify parameters into
three cases, each handled differently by grad = "sens":
Paired (mu-referenced) parameters — rxode2 augments
the ODE system with sensitivity equations d(pred)/d(eta_i)
for each random effect. Because tcl and eta.cl
enter additively on the log scale, the gradient of the NLL with respect
to tcl equals the gradient with respect to
eta.cl. Both are recovered from a single ODE solve, with no
extra function evaluations.
Non-mu-referenced parameters — when a structural
parameter and its random effect are written separately
(e.g. cl <- exp(tcl) * exp(eta.cl)),
admixr2 cannot recover d(NLL)/d(tcl) from the
sensitivity equations alone. The sensitivity model is still used for all
etas, omega, and sigma; only the unpaired structural parameters require
a targeted CRN finite difference solve. A warning is issued but
grad = "sens" is kept.
Parameters without a random effect — structural
parameters with no corresponding eta
(e.g. v2 <- exp(tv2) with IIV dropped from V2) are
always handled by CRN finite differences. The perturbation uses the same
quasi-random seed as the nominal draw so that Monte Carlo noise largely
cancels in the difference.
IRMC kappa correction
The IRMC estimator shifts IS weights by moving the proposal mean when
a structural parameter changes. This works directly for mu-referenced
parameters (e.g. cl <- exp(tcl + eta.cl)) because the
weight shift is a closed-form function of the back-transformed
ratio.
For non-mu-referenced parameters — structural thetas
without a paired random effect (e.g. ka <- exp(tka) with
no eta.ka) — changing tka during inner
optimisation alters the population mean prediction
f(theta, 0) without a corresponding IS weight path. The
kappa correction accounts for this by adding
kappa = f(theta_cand, 0) - f(theta_outer, 0) to the
predicted mean, keeping the inner NLL anchored to the true prediction at
the candidate point. Kappa is zero when all struct thetas are
mu-referenced.
Two methods control how f(theta_cand, 0) is evaluated
during inner optimisation:
-
"exact"(default): re-evaluatesf(theta_cand, 0)via a single rxSolve at each inner NLL evaluation. Exact; costs one extra rxSolve per inner step. -
"linearized": precomputesJ = df/d(theta)once per outer iteration via a small FD batch. Approximateskappa_fn(theta_cand) ≈ f0 + J %*% (theta_cand - theta0)— pure arithmetic per inner step, zero extra rxSolve calls.
"exact" is accurate but adds an rxSolve to every inner
NLL call. "linearized" is faster and the approximation is
good when inner steps stay small relative to the outer box constraint —
which is typical for converged phases. Prefer "linearized"
for complex ODE models where each rxSolve is expensive.
adirmcControl(..., kappa_method = "exact") # default
adirmcControl(..., kappa_method = "linearized") # faster for complex ODE modelsParallel restarts
Multi-restart fitting guards against local optima.
workers > 1 runs restarts in parallel;
admixr2 manages the worker pool internally:
fit_par <- nlmixr2(
pk_model, admData(), est = "admc",
control = admControl(
studies = list(examplomycin = study),
n_sim = 5000L,
n_restarts = 4L,
workers = 4L, # fork on Unix/macOS; PSOCK cluster on Windows
cores = 8L, # total rxSolve OpenMP threads across workers
restart_sd = 0.3, # SD of log-scale perturbation from the starting point
seed = 1L
)
)cores is distributed automatically:
floor(total_cores / n_workers) threads per worker, with
remainder allocated cyclically (e.g. 13 cores / 4 workers → 4, 3, 3, 3).
Workers are stopped after the restart phase so all cores are freed for
the covariance Hessian step. Call admStopWorkers() manually
if a fit is interrupted before cleanup.
Quasi-random sampling
The sampling argument controls how eta samples are
drawn:
sampling = |
Method |
|---|---|
"sobol" |
Sobol sequence (default) |
"halton" |
Halton sequence |
"torus" |
Kronecker torus |
"lhs" |
Latin hypercube |
"rnorm" |
Plain normal |
Sobol sequences typically require 2–5× fewer samples than plain
normal draws for equivalent NLL variance. For small n_sim,
"lhs" provides better coverage than Sobol.
Parameter uncertainty
covMethod = "r" (default) computes a numerical Hessian
after optimisation and reports standard errors in
print(fit). Important: these SEs are
computed for structural and residual-error parameters only; omega (IIV)
SEs are not computed (matching nlmixr2 FOCEI behavior). A larger
cov_n_sim reduces MC noise in the Hessian:
fit_cov <- nlmixr2(
pk_model, admData(), est = "admc",
control = admControl(
studies = list(examplomycin = study),
n_sim = 5000L,
cov_n_sim = 10000L, # more samples → lower Hessian noise
cov_h_outer = 2.5e-3, # outer FD step scale (default: eps^(1/5) ≈ 2.5e-3)
covMethod = "r",
seed = 1L
)
)Set covMethod = "none" to skip uncertainty estimation
during early model development when you only need the point
estimates:
admControl(..., covMethod = "none")If the Hessian is non-positive-definite (SEs printed as
NA), increase cov_h_outer or
cov_n_sim.
Model comparison: AIC and BIC
Standard information criteria work directly on admFit
objects. AIC and BIC are comparable only across estimators that
evaluate the same likelihood. admc,
adgh, and adirmc all target the exact
aggregate MVN likelihood (MC, quadrature, and importance-weighted
estimates of the same integral), so their information criteria are
mutually comparable. adfo evaluates a linearised
likelihood that differs from these, so it must not be compared against
the other three.
Here we compare the full model (IIV on all five parameters) against a reduced model with IIV on CL and V1 only:
ctl <- admControl(
studies = list(examplomycin = study),
n_sim = 5000L,
cov_n_sim = 10000L,
maxeval = 300L,
seed = 1L
)
fit_full <- nlmixr2(pk_model, admData(), est = "admc", control = ctl)
#> [====|====|====|====|====|====|====|====|====|====] 0:00:07
fit_reduced <- nlmixr2(pk_reduced, admData(), est = "admc", control = ctl)
#> [====|====|====|====|====|====|====|====|====|====] 0:00:00
#> [====|====|====|====|====|====|====|====|====|====] 0:00:00
#> [====|====|====|====|====|====|====|====|====|====] 0:00:00
#> [====|====|====|====|====|====|====|====|====|====] 0:00:00
#> [====|====|====|====|====|====|====|====|====|====] 0:00:00
#> [====|====|====|====|====|====|====|====|====|====] 0:00:00
#>
#> [====|====|====|====|====|====|====|====|====|====] 0:00:00
#>
#> [====|====|====|====|====|====|====|====|====|====] 0:00:00
#> [====|====|====|====|====|====|====|====|====|====] 0:00:00
#> [====|====|====|====|====|====|====|====|====|====] 0:00:00
#> [====|====|====|====|====|====|====|====|====|====] 0:00:00
#> [====|====|====|====|====|====|====|====|====|====] 0:00:00
#> [====|====|====|====|====|====|====|====|====|====] 0:00:04
#>
#> [====|====|====|====|====|====|====|====|====|====] 0:00:07
AIC(fit_full, fit_reduced)
#> df AIC
#> fit_full 11 -3668.835
#> fit_reduced 8 -3528.356
BIC(fit_full, fit_reduced)
#> df BIC
#> fit_full 11 -3598.305
#> fit_reduced 8 -3477.061Lower AIC/BIC favours the more parsimonious model; a difference > 10 is generally considered strong evidence.
