
Generate aggregate study data from (possibly different) pharmacometric models
Source:R/datagen.R
datagen.RdGenerates population mean vectors (E) and covariance matrices
(V) for each study by integrating over the IIV distribution – either by
Monte Carlo (the default) or by a deterministic First-Order expansion
(method = "fo", see datagenControl()). Each study may specify its own PK/PD model (as would be the
case when digitising data from several published studies, each fit with a
different structural model). True parameter values are taken from the
ini() block of each study's model. Each element of the returned list
is ready to supply directly to admControl(studies = ...).
Usage
datagen(studies, model = NULL, control = datagenControl())Arguments
- studies
A named list of study specifications. Each element is a list with:
modelAn nlmixr2-style model function with
ini()andmodel()blocks. Serves as the data-generating model for this study. May differ between studies. Can be omitted if a top-level default is supplied via themodelargument.timesNumeric vector of observation times.
evA dosing event table created with
rxode2::et().n(Optional) integer sample size; stored as metadata and used when supplying the result to
admControl().
- model
Optional default model function used for any study that does not supply its own
modelelement. At least one ofmodelor each study'smodelmust be non-NULL.- control
A
datagenControl()object.
Value
A named list with one element per study. Each element contains:
EPopulation mean vector at
times.VPopulation covariance matrix (
length(times)xlength(times); ML denominatorn_simformethod = "mc", the analytical FO covariance formethod = "fo", or the GH weighted covariance formethod = "gh"). The diagonal carries the model's residual-error variance; to generate residual-free (IIV-only) moments, omit the error term from the model.nSample size (
NA_integer_if not supplied).timesObservation times.
evDosing event table.
samplesRaw
n_sim x length(times)prediction matrix (only whencontrol$return_samples = TRUE).
Details
With control = datagenControl(method = "mc") (the default) population
moments are computed via the same Monte Carlo engine as est = "admc":
$$E_t = \bar{f}_s(\hat\theta_s, \eta_i, t)$$
$$V_{ts} = \widehat{\mathrm{Cov}}_\eta[f_{s,t}, f_{s,s'}] + \Sigma_s$$
where \(f_s\) and \(\hat\theta_s\) are the model and initial estimates
from the ini() block of study \(s\), the sample covariance uses the
ML denominator n_sim, and \(\Sigma_s\) is diagonal with entries
determined by that study model's residual error type (additive, proportional,
or log-normal).
With method = "fo" the moments are instead the deterministic First-Order
expansion used by est = "adfo":
$$E = f_s(\hat\theta_s, 0)$$
$$V = J \Omega_s J^\top + \Sigma_s, \quad J_{tj} = \partial f_{s,t}/\partial \eta_j |_{\eta = 0}$$
with the Jacobian \(J\) obtained from the sensitivity model (or finite
differences if that is unavailable). This is the natural choice for design
evaluation and optimal design: the moments are fast and reproducible, and
because the data-generating and data-analytic models coincide, the FO Hessian
of the log-likelihood (the expected information matrix) is evaluated at the
true maximum rather than at a point that is not an MLE of the generated data.
Note est = "adfo" always adds \(\Sigma\) to its predicted covariance, so
for a consistent FIM keep the residual error in the generating model; omit it
only when residual-free (IIV-only) moments are genuinely what you want.
With method = "gh" the moments are computed by deterministic
Gauss-Hermite quadrature over the random-effects prior \(\eta \sim N(0, \Omega)\):
$$E = \sum_q w_q f(\hat\theta, \eta_q), \quad V = \sum_q w_q (f_q - E)(f_q - E)^\top + \Sigma$$
where \((\eta_q, w_q)\) are the Cholesky-scaled tensor-product GH nodes and
weights. Unlike FO this is unbiased at any IIV magnitude; unlike MC the result
is noise-free and exactly reproducible. Matching the moments of est = "adgh"
makes method = "gh" the natural choice for optimal design with that estimator.
Models are compiled and cached on first use (keyed by model expression digest), so repeated calls or multiple studies sharing the same model incur only a single compilation.
Examples
# \donttest{
library(rxode2)
pk_model <- function() {
ini({
tcl <- log(5); tv <- log(30)
prop.sd <- c(0, 0.2)
eta.cl ~ 0.09; eta.v ~ 0.04
})
model({
cl <- exp(tcl + eta.cl)
v <- exp(tv + eta.v)
d/dt(central) <- -(cl/v) * central
cp <- central / v
cp ~ prop(prop.sd)
})
}
study_data <- datagen(
studies = list(
study1 = list(times = c(1, 2, 4, 8, 12, 24),
ev = rxode2::et(amt = 100), n = 200L)
),
model = pk_model,
control = datagenControl(n_sim = 2000L)
)
#>
#>
#> ℹ parameter labels from comments are typically ignored in non-interactive mode
#> ℹ Need to run with the source intact to parse comments
# E and V plug directly into admControl(studies = ...)
round(study_data$study1$E, 2)
#> 1 2 4 8 12 24
#> 2.83 2.37 1.68 0.88 0.48 0.10
# }