
Fits mixqr() over a range of component counts and scores each fit by an information criterion or by cross-validated check loss.

Usage

mixqr_select(
  formula,
  data,
  tau = 0.5,
  m = 1:4,
  criterion = c("BIC", "AIC", "cv"),
  engine = "ald",
  error_density = c("unequal", "equal"),
  folds = 5L,
  weights = NULL,
  nstart = 20L,
  control = mixqr_control()
)

Arguments

formula, data, tau, weights

Passed to mixqr().

m

Integer vector of component counts to try. Default 1:4.

criterion

"BIC" (default), "AIC", or "cv".

engine

Engine to use ("ald" or "kdEM"). For "AIC"/"BIC" the score is always computed with the ALD working likelihood.

error_density

Either "unequal" (default) or "equal". Used only when engine = "kdEM".

folds

Number of CV folds when criterion = "cv". Default 5.

nstart

Number of starting values (multi-starts) tried per fit. Default 20.

control

A mixqr_control() list.

Value

A list with table (data frame of scores by m), best (chosen m), criterion, and fit (the chosen model refit on all data).
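
As a rough illustration of how the returned list might be inspected, the sketch below simulates a small two-component data set and calls mixqr_select() on it. The simulated data and the object names dat and sel are assumptions made for this example; the call is guarded so that it only runs when the package providing mixqr_select() is attached.

```r
# Simulated two-component regression data (illustrative only).
set.seed(1)
n <- 200
x <- runif(n)
z <- rbinom(n, 1, 0.5)  # latent component label
y <- ifelse(z == 1, 1 + 2 * x, 4 - x) + rnorm(n, sd = 0.3)
dat <- data.frame(y = y, x = x)

# Run the selection only if mixqr_select() is available in the session.
if (exists("mixqr_select", mode = "function")) {
  sel <- mixqr_select(y ~ x, data = dat, tau = 0.5, m = 1:3,
                      criterion = "BIC")
  print(sel$table)  # data frame of scores by m
  print(sel$best)   # chosen number of components
  # sel$fit holds the chosen model refit on all data
}
```

The elements table, best, and fit are the ones listed above; what print() shows for each depends on the package's own methods.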

Details

For criterion %in% c("AIC", "BIC") the score uses the parametric asymmetric Laplace (ALD) working likelihood, since the semiparametric kdEM engine has no global likelihood (Wu & Yao 2016, p. 164). Note that AIC/BIC for the number of mixture components is heuristic: if the m-component model is true, the (m+1)-component model has a mixing proportion on the boundary of the parameter space (pi_j = 0), so the regularity conditions behind the usual penalty asymptotics do not hold (McLachlan & Peel 2000, ch. 6; Chen, Chen & Kalbfleisch 2001). The "cv" criterion (cross-validated held-out check loss) avoids the likelihood entirely and works for either engine.
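
The held-out check loss behind criterion = "cv" can be sketched in base R. This is a minimal illustration, not the package's implementation: check_loss() and cv_check_loss() are hypothetical helper names, and an intercept-only sample quantile stands in for a fitted mixqr() model. How mixqr_select() forms folds and held-out predictions may differ.

```r
# Check (pinball) loss: rho_tau(u) = u * (tau - 1{u < 0}).
check_loss <- function(u, tau) u * (tau - (u < 0))

# Hypothetical helper: K-fold held-out check loss for an
# intercept-only quantile model (a stand-in for a mixqr() fit).
cv_check_loss <- function(y, tau = 0.5, folds = 5L) {
  # Randomly assign each observation to one of `folds` folds.
  fold_id <- sample(rep_len(seq_len(folds), length(y)))
  losses <- vapply(seq_len(folds), function(k) {
    train <- y[fold_id != k]
    test <- y[fold_id == k]
    q_hat <- unname(quantile(train, probs = tau))  # fit on training folds
    mean(check_loss(test - q_hat, tau))            # score on held-out fold
  }, numeric(1))
  mean(losses)  # average held-out loss across folds
}

set.seed(42)
y <- rnorm(100)
cv_check_loss(y, tau = 0.5, folds = 5L)
```

A lower average held-out loss indicates better out-of-sample fit at quantile level tau, and comparing it across candidate models needs no likelihood.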