Full covariance vs variance NLL
Each study’s V determines which branch of the NLL is
used. admixr2 auto-detects the method:
| V structure | Auto-detected method
|
NLL cost |
|---|---|---|
| Non-diagonal matrix | "cov" |
O(n_t³) Cholesky solve |
| Diagonal matrix | "var" |
O(n_t) element-wise |
| Plain vector (variances only) | "var" |
O(n_t) element-wise |
Override by setting method explicitly in the study
list:
# Force full covariance NLL even for a diagonal V
study_cov <- c(study, list(method = "cov"))
# Diagonal approximation when you only have marginal variances
study_var <- list(
E = E,
V = diag(diag(V)), # drop off-diagonal entries
n = n,
times = times,
ev = rxode2::et(amt = 100),
method = "var"
)Use "cov" when you have a full covariance matrix and
expect temporal correlation to inform parameter estimates. Use
"var" when only marginal variances are available (e.g. from
a published table of means and SDs) or when runtime is a priority.
Gradient modes
admControl() offers four gradient strategies via the
grad argument:
grad = |
Method | Notes |
|---|---|---|
"sens" |
Sensitivity equations (default) | Analytical; requires ODE or linCmt() model |
"fd" |
Forward finite differences | Falls back to this if sens unavailable |
"cfd" |
Central finite differences | More accurate FD; ~2× slower than "fd"
|
"none" |
BOBYQA (derivative-free) | No gradient; useful for debugging or simple models |
# Analytical gradient (default; fastest for ODE models)
fit_sens <- nlmixr2(pk_model, admData(), est = "admc",
control = admControl(studies = list(s = study), grad = "sens",
n_sim = 5000L, seed = 1L))
# Derivative-free — ignores maxeval; use nloptr stopping criteria instead
fit_bobyqa <- nlmixr2(pk_model, admData(), est = "admc",
control = admControl(studies = list(s = study), grad = "none",
n_sim = 5000L, seed = 1L))"sens" is recommended for all standard ODE and
linCmt() models. If the sensitivity model is unavailable
(e.g. rare ODE features), admixr2 falls back to
"fd" automatically with a warning.
Mu-referencing and sensitivity equations
Mu-referencing is the nlmixr2 convention of expressing each structural parameter as a fixed effect plus a random effect through a known back-transformation:
cl <- exp(tcl + eta.cl) # mu-referenced: tcl paired with eta.cladmixr2 uses this pairing to classify parameters into
three cases, each handled differently by grad = "sens":
Paired (mu-referenced) parameters — rxode2 augments
the ODE system with sensitivity equations d(pred)/d(eta_i)
for each random effect. Because tcl and eta.cl
enter additively on the log scale, the gradient of the NLL with respect
to tcl equals the gradient with respect to
eta.cl. Both are recovered from a single ODE solve, with no
extra function evaluations.
Non-mu-referenced parameters — when a structural
parameter and its random effect are written separately
(e.g. cl <- exp(tcl) * exp(eta.cl)),
admixr2 cannot recover d(NLL)/d(tcl) from the
sensitivity equations alone. The sensitivity model is still used for all
etas, omega, and sigma; only the unpaired structural parameters require
a targeted CRN finite difference solve. A warning is issued but
grad = "sens" is kept.
Parameters without a random effect — structural
parameters with no corresponding eta
(e.g. v2 <- exp(tv2) with IIV dropped from V2) are
always handled by CRN finite differences. The perturbation uses the same
quasi-random seed as the nominal draw so that Monte Carlo noise largely
cancels in the difference.
IRMC kappa correction
The IRMC estimator draws proposals from an inflated omega. For nonlinear models, the mean of simulated predictions does not equal the prediction at eta = 0 — a Jensen’s inequality gap. The kappa correction shifts the predicted mean to account for this bias:
-
"first-order"(default): baseline is the population predictionf(theta, 0). Fast; sufficient for most models. -
"second-order": adds a curvature term(1/2) * sum_k(omega_k * d²f/deta_k²). More accurate for strongly nonlinear models.
adirmcControl(..., kappa_method = "second-order")Parallel restarts
Multi-restart fitting guards against local optima.
workers > 1 runs restarts in parallel;
admixr2 manages the worker pool internally:
fit_par <- nlmixr2(
pk_model, admData(), est = "admc",
control = admControl(
studies = list(examplomycin = study),
n_sim = 5000L,
n_restarts = 4L,
workers = 4L, # fork on Unix/macOS; PSOCK cluster on Windows
cores = 8L, # total rxSolve OpenMP threads across workers
restart_sd = 0.3, # SD of log-scale perturbation from the starting point
seed = 1L
)
)cores is distributed automatically:
floor(total_cores / n_workers) threads per worker, with
remainder allocated cyclically (e.g. 13 cores / 4 workers → 4, 3, 3, 3).
Workers are stopped after the restart phase so all cores are freed for
the covariance Hessian step. Call admStopWorkers() manually
if a fit is interrupted before cleanup.
Quasi-random sampling
The sampling argument controls how eta samples are
drawn:
sampling = |
Method |
|---|---|
"sobol" |
Sobol sequence (default) |
"halton" |
Halton sequence |
"torus" |
Kronecker torus |
"lhs" |
Latin hypercube |
"rnorm" |
Plain normal |
Sobol sequences typically require 2–5× fewer samples than plain
normal draws for equivalent NLL variance. For small n_sim,
"lhs" provides better coverage than Sobol.
Parameter uncertainty
covMethod = "r" (default) computes a numerical Hessian
after optimisation and reports standard errors in
print(fit). A larger cov_n_sim reduces MC
noise in the Hessian:
fit_cov <- nlmixr2(
pk_model, admData(), est = "admc",
control = admControl(
studies = list(examplomycin = study),
n_sim = 5000L,
cov_n_sim = 10000L, # more samples → lower Hessian noise
cov_h_outer = 2.5e-3, # outer FD step scale (default: eps^(1/5) ≈ 2.5e-3)
covMethod = "r",
seed = 1L
)
)Set covMethod = "none" to skip uncertainty estimation
during early model development when you only need the point
estimates:
admControl(..., covMethod = "none")If the Hessian is non-positive-definite (SEs printed as
NA), increase cov_h_outer or
cov_n_sim.
Model comparison: AIC and BIC
Standard information criteria work directly on admFit
objects. Here we compare the full model (IIV on all five parameters)
against a reduced model with IIV on CL and V1 only:
ctl <- admControl(
studies = list(examplomycin = study),
n_sim = 5000L,
cov_n_sim = 10000L,
maxeval = 300L,
seed = 1L
)
fit_full <- nlmixr2(pk_model, admData(), est = "admc", control = ctl)
#> [====|====|====|====|====|====|====|====|====|====] 0:00:07
fit_reduced <- nlmixr2(pk_reduced, admData(), est = "admc", control = ctl)
#> [====|====|====|====|====|====|====|====|====|====] 0:00:00
#> [====|====|====|====|====|====|====|====|====|====] 0:00:00
#> [====|====|====|====|====|====|====|====|====|====] 0:00:00
#> [====|====|====|====|====|====|====|====|====|====] 0:00:00
#> [====|====|====|====|====|====|====|====|====|====] 0:00:00
#> [====|====|====|====|====|====|====|====|====|====] 0:00:00
#>
#> [====|====|====|====|====|====|====|====|====|====] 0:00:00
#>
#> [====|====|====|====|====|====|====|====|====|====] 0:00:00
#> [====|====|====|====|====|====|====|====|====|====] 0:00:00
#> [====|====|====|====|====|====|====|====|====|====] 0:00:00
#> [====|====|====|====|====|====|====|====|====|====] 0:00:00
#> [====|====|====|====|====|====|====|====|====|====] 0:00:00
#> [====|====|====|====|====|====|====|====|====|====] 0:00:04
#>
#> [====|====|====|====|====|====|====|====|====|====] 0:00:08
AIC(fit_full, fit_reduced)
#> df AIC
#> fit_full 11 -3659.832
#> fit_reduced 8 -3519.403
BIC(fit_full, fit_reduced)
#> df BIC
#> fit_full 11 -3589.302
#> fit_reduced 8 -3468.108Lower AIC/BIC favours the more parsimonious model; a difference > 10 is generally considered strong evidence.