Fit a finite mixture of quantile regressions

Estimates a finite mixture of tau-quantile regressions (clusterwise quantile regression) at a single quantile level tau. Two engines are available: "ald", a fast parametric asymmetric-Laplace mixture with a genuine likelihood (and hence AIC/BIC), and "kdEM", the kernel-density EM of Wu & Yao (2016), which estimates the component error densities nonparametrically under the constraint that their tau-th quantile is 0.
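
To make the parametric "ald" idea concrete, here is a minimal base-R sketch of an asymmetric-Laplace density and the resulting mixture log-likelihood. It is not the package's implementation: the helper names dald() and mix_loglik(), and the scale parameterisation, are assumptions chosen for illustration.

```r
# Asymmetric-Laplace density with location mu, scale sigma and skewness tau.
# Parameterisation assumed here: f(y) = tau * (1 - tau) / sigma * exp(-rho_tau((y - mu) / sigma)),
# where rho_tau(u) = u * (tau - I(u < 0)) is the check loss.
dald <- function(y, mu = 0, sigma = 1, tau = 0.5) {
  u <- (y - mu) / sigma
  tau * (1 - tau) / sigma * exp(-u * (tau - (u < 0)))
}

# Log-likelihood of an m-component mixture of ALD regressions (hypothetical helper).
# X: n x p design matrix, beta: p x m coefficients, sigma: m scales, prop: m mixing proportions.
mix_loglik <- function(y, X, beta, sigma, prop, tau) {
  dens <- sapply(seq_along(prop), function(j) {
    prop[j] * dald(y, mu = drop(X %*% beta[, j]), sigma = sigma[j], tau = tau)
  })
  sum(log(rowSums(dens)))
}

# The location mu is the tau-th quantile: the mass below mu equals tau.
p_below <- integrate(dald, -Inf, 0, tau = 0.3)$value
p_above <- integrate(dald, 0, Inf, tau = 0.3)$value
c(below = p_below, above = p_above, total = p_below + p_above)

# Evaluate the mixture log-likelihood on a small simulated data set.
set.seed(1)
X <- cbind(1, runif(50))
y <- rnorm(50)
ll <- mix_loglik(y, X, beta = cbind(c(0, 1), c(0, -1)), sigma = c(1, 1),
                 prop = c(0.5, 0.5), tau = 0.5)
ll
```

Because the density puts mass tau below mu, maximising this likelihood in mu amounts to minimising the check loss, which is why a parametric ALD mixture can serve as a likelihood-based route to mixture quantile regression.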

Usage

mixqr(
  formula,
  data,
  tau = 0.5,
  m = 2L,
  family = c("quantile", "expectile", "mquantile"),
  engine = "ald",
  error_density = c("unequal", "equal"),
  init = c("ald", "kmeans", "random", "manual"),
  nstart = 20L,
  control = mixqr_control(),
  weights = NULL,
  manual_init = NULL,
  variance = c("none", "sparsity", "stochEM"),
  vcontrol = mixqr_vcontrol(),
  ...
)

Arguments

formula: A model formula y ~ x1 + x2 (intercept implied).
data: A data frame.
tau: Quantile level in (0, 1). Default 0.5.
m: Number of mixture components (>= 1). Default 2.
family: Component-loss family: "quantile" (default; check loss, the Wu & Yao model), "expectile" (asymmetric least squares, Newey & Powell 1987 – a smooth, crossing-free location device), or "mquantile" (asymmetric Huber, Breckling & Chambers 1988 – a robust expectile/quantile blend). A non-quantile family selects the matching engine.
engine: "ald" (default) or "kdEM", or the name of a custom engine registered via register_mixqr_engine() (e.g. "expectile", "mquantile").
error_density: For "kdEM": "unequal" (per-component densities, eq. 2.5) or "equal" (pooled density, eq. 2.8).
init: Initialisation strategy: "ald" (default; ALD pre-fit seeds the kdEM engine), "kmeans", "random", or "manual".
nstart: Number of multi-start initialisations (the mixture likelihood is multimodal). Default 20.
control: A mixqr_control() list.
weights: Optional prior observation weights.
manual_init: Optional n x m responsibility matrix for init = "manual".
variance: Standard-error method: "none" (default), "sparsity" (eq. 3.3) or "stochEM" (Algorithm 3.1 multiple imputation).
vcontrol: A mixqr_vcontrol() list for variance = "stochEM".
...: Reserved for engine extensions; currently unused.
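
The three loss families named under family can be sketched in base R as follows. This is an illustration under stated assumptions, not the package's code: the function names, the Huber tuning constant k = 1.345 and the factor 2 in the M-quantile loss follow one common parameterisation and may differ from what mixqr() uses internally.

```r
# Check (pinball) loss: the "quantile" family.
check_loss <- function(u, tau) u * (tau - (u < 0))

# Asymmetric squared loss: the "expectile" family.
expectile_loss <- function(u, tau) abs(tau - (u < 0)) * u^2

# Huber loss with tuning constant k (hypothetical default below).
huber <- function(u, k) ifelse(abs(u) <= k, u^2 / 2, k * abs(u) - k^2 / 2)

# Asymmetric Huber loss: the "mquantile" family.
mquantile_loss <- function(u, tau, k = 1.345) 2 * abs(tau - (u < 0)) * huber(u, k)

# Minimising the mean check loss over a constant recovers the sample tau-quantile.
set.seed(1)
y <- rexp(500)
tau <- 0.25
q_hat <- optimize(function(q) mean(check_loss(y - q, tau)), range(y))$minimum
c(check_loss_minimiser = q_hat, sample_quantile = unname(quantile(y, tau)))
```

The check loss is piecewise linear, the expectile loss is smooth everywhere, and the asymmetric Huber loss is quadratic near zero and linear in the tails, which is the sense in which it blends the other two.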

Value

An object of class "mixqr".

Bias under asymmetric errors (Wu & Yao 2016, sec. 6)

The semiparametric (kdEM) estimator solves estimating equations whose score, I(a <= 0) - tau, is not orthogonal to the nuisance tangent space of the unknown error densities. It can therefore be BIASED when the component error densities are asymmetric and the clusters have imbalanced overlap (Wu & Yao 2016, sec. 6, Fig. 6). Monitor fit$diagnostics$overlap (responsibility entropy; larger values mean more overlap and a higher risk of bias) and cross-check the estimates against the parametric ald engine. Well-separated clusters with (near-)symmetric errors are the safe regime.
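
As a rough illustration of what a responsibility-entropy overlap measure looks like, the base-R sketch below computes the mean Shannon entropy of an n x m responsibility matrix. How the package actually computes fit$diagnostics$overlap is not documented here, so the helper resp_entropy() and its normalisation by log(m) are assumptions.

```r
# Mean Shannon entropy of an n x m responsibility matrix whose rows sum to 1.
# With normalise = TRUE the result lies in [0, 1]: 0 = crisp assignment, 1 = uniform.
resp_entropy <- function(resp, normalise = TRUE) {
  stopifnot(is.matrix(resp), all(abs(rowSums(resp) - 1) < 1e-8))
  # Treat 0 * log(0) as 0 so that crisp rows contribute zero entropy.
  h <- -rowSums(ifelse(resp > 0, resp * log(resp), 0))
  if (normalise && ncol(resp) > 1) h <- h / log(ncol(resp))
  mean(h)
}

crisp <- rbind(c(1, 0), c(0, 1), c(0.99, 0.01))      # well-separated clusters
fuzzy <- rbind(c(0.5, 0.5), c(0.6, 0.4), c(0.45, 0.55))  # heavily overlapping clusters
e_crisp <- resp_entropy(crisp)
e_fuzzy <- resp_entropy(fuzzy)
c(crisp = e_crisp, fuzzy = e_fuzzy)
```

Values near 0 indicate nearly hard assignments; values near 1 indicate that the components are hard to tell apart, which is the regime where the bias warning applies most strongly.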

Standard errors

Sparsity SEs (variance = "sparsity", eq. 3.3) are CONDITIONAL on the fitted classification and therefore understate uncertainty; summary() flags them as such. For inference, use variance = "stochEM" (the stochastic-EM multiple-imputation estimator, Algorithm 3.1), which propagates classification and mixing uncertainty.

References

Wu, Q. and Yao, W. (2016). Mixtures of quantile regressions. Computational Statistics & Data Analysis 93, 162–176.

Examples

set.seed(1)
d <- sim_mixqr2(n = 300)
fit <- mixqr(y ~ x, data = d, tau = 0.5, m = 2, engine = "ald", nstart = 10)
fit
#> Mixture of quantile regressions (mixqr)
#>   engine: ald   tau = 0.5   components m = 2   n = 300
#>   converged: TRUE in 14 iterations
#> 
#> Mixing probabilities (pi):
#>  comp1  comp2 
#> 0.5413 0.4587 
#> 
#> Component coefficients (beta):
#>                comp1    comp2
#> (Intercept)   9.9792 -10.6760
#> x           -10.7094  12.0896
#> 
#> logLik = -834.810   AIC = 1683.62   BIC = 1709.55
coef(fit)
#>                  comp1     comp2
#> (Intercept)   9.979185 -10.67597
#> x           -10.709414  12.08964
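
The information criteria in the printed fit can be reproduced by hand from the log-likelihood. The sketch below assumes k = 7 free parameters (4 regression coefficients, 2 component scales and 1 free mixing probability); that count is inferred from the printed numbers, not stated by the package.

```r
loglik <- -834.810  # logLik from the printed fit
n <- 300            # number of observations
k <- 7              # assumed: 4 betas + 2 scales + 1 free mixing probability

aic <- -2 * loglik + 2 * k
bic <- -2 * loglik + log(n) * k
round(c(AIC = aic, BIC = bic), 2)
# agrees with the printed AIC = 1683.62 and BIC = 1709.55
```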