Fits mixqr over a range of component counts and scores them by an
information criterion or cross-validated check loss.
Usage
mixqr_select(
  formula,
  data,
  tau = 0.5,
  m = 1:4,
  criterion = c("BIC", "AIC", "cv"),
  engine = "ald",
  error_density = c("unequal", "equal"),
  folds = 5L,
  weights = NULL,
  nstart = 20L,
  control = mixqr_control()
)
Arguments
- formula, data, tau, weights
  Passed to mixqr().
- m
  Integer vector of component counts to try. Default 1:4.
- criterion
  "BIC" (default), "AIC", or "cv".
- engine
  Engine to use ("ald" or "kdEM"). For "AIC"/"BIC" the score is always computed with the ALD working likelihood.
- error_density
  "unequal" (default) or "equal". For the kdEM engine.
- folds
  Number of CV folds when criterion = "cv". Default 5.
- nstart
  Number of multi-starts per fit. Default 20.
- control
  A mixqr_control() list.
Value
A list with table (data frame of scores by m), best (chosen m),
criterion, and fit (the chosen model refit on all data).
Details
For criterion %in% c("AIC", "BIC") the score uses the parametric ALD
working likelihood, because the semiparametric kdEM engine has no global
likelihood (Wu & Yao 2016, p. 164). Note that AIC/BIC selection of the number
of mixture components is heuristic: under the null of m components, the
(m+1)-component model has a mixing proportion on the parameter-space boundary
(pi_j = 0), so the usual penalty asymptotics do not hold (McLachlan & Peel
2000, ch. 6; Chen, Chen & Kalbfleisch 2001). The "cv" criterion
(cross-validated held-out check loss) avoids the likelihood entirely and works
for either engine.
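To make the two kinds of score concrete, here is a minimal base-R sketch of the check loss and of an information criterion built on an ALD working likelihood. It is an illustration under stated assumptions, not the package's internal code: the helper names (check_loss, ald_logdens, mix_ald_loglik, ic_score) are hypothetical, and the parameter count used in the example (coefficients plus scales plus free mixing proportions) is one plausible convention that mixqr_select() may or may not follow.

```r
# Check (pinball) loss: rho_tau(u) = u * (tau - 1{u < 0}).
check_loss <- function(u, tau) u * (tau - (u < 0))

# Log-density of an asymmetric Laplace distribution with location 0 and
# scale sigma: f(u) = tau * (1 - tau) / sigma * exp(-rho_tau(u) / sigma).
ald_logdens <- function(u, tau, sigma) {
  log(tau * (1 - tau) / sigma) - check_loss(u, tau) / sigma
}

# Log-likelihood of an m-component mixture of ALD regressions.
# beta: p x m coefficient matrix; sigma, prop: length-m vectors.
# (Hypothetical helper; argument layout is an assumption.)
mix_ald_loglik <- function(y, X, beta, sigma, prop, tau) {
  resid <- y - X %*% beta  # n x m matrix of residuals
  logd <- sapply(seq_along(prop), function(j) {
    log(prop[j]) + ald_logdens(resid[, j], tau, sigma[j])
  })
  # Row-wise log-sum-exp for numerical stability.
  row_max <- apply(logd, 1, max)
  sum(row_max + log(rowSums(exp(logd - row_max))))
}

# AIC / BIC from a log-likelihood and a parameter count.
ic_score <- function(loglik, n_par, n, criterion = c("BIC", "AIC")) {
  criterion <- match.arg(criterion)
  penalty <- if (criterion == "BIC") log(n) else 2
  -2 * loglik + penalty * n_par
}

# Small simulated example with m = 1 (a single ALD regression).
set.seed(1)
n <- 50
X <- cbind(1, rnorm(n))
y <- 1 + 2 * X[, 2] + rnorm(n)
beta1 <- matrix(c(1, 2), ncol = 1)
ll1 <- mix_ald_loglik(y, X, beta1, sigma = 1, prop = 1, tau = 0.5)

# Assumed parameter count: m * p coefficients + m scales + (m - 1) proportions.
bic1 <- ic_score(ll1, n_par = 2 + 1 + 0, n = n, criterion = "BIC")

# Held-out check loss for a set of predictions, as the "cv" criterion
# would average over folds (here evaluated on the same data for brevity).
mean_check <- mean(check_loss(y - as.vector(X %*% beta1), tau = 0.5))
```

Because the ALD log-density is an affine function of the check loss, maximizing the working likelihood for fixed sigma is the same as minimizing the summed check loss, which is why the two scores are closely related for the "ald" engine.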