Estimates a finite mixture of tau-quantile regressions (clusterwise quantile
regression) at a single quantile level. Two engines are available: "ald", a fast
parametric asymmetric-Laplace mixture with a genuine likelihood (and hence
AIC/BIC), and "kdEM", the kernel-density EM of Wu & Yao (2016), which estimates
nonparametric component error densities constrained to have their tau-th
quantile equal to 0.
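For orientation, the parametric "ald" model can be written out as below. This is a sketch in the standard asymmetric-Laplace parametrisation; the per-component scale sigma_j and the exact parametrisation used internally are assumptions, since this page does not spell them out.

```latex
\rho_\tau(u) = u\,\bigl\{\tau - I(u < 0)\bigr\}
\qquad \text{(check loss)}

f_\tau(e;\sigma) = \frac{\tau(1-\tau)}{\sigma}\,
  \exp\!\bigl\{-\rho_\tau(e/\sigma)\bigr\}
\qquad \text{(asymmetric-Laplace density)}

f(y \mid x) = \sum_{j=1}^{m} \pi_j\,
  f_\tau\!\bigl(y - x^\top\beta_j;\ \sigma_j\bigr)
\qquad \text{(mixture of } m \text{ components)}
```

Because f_tau has its tau-th quantile at 0, x'beta_j is the conditional tau-quantile of component j, which is the same constraint the "kdEM" engine imposes on its nonparametric error densities.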
Usage
mixqr(
  formula,
  data,
  tau = 0.5,
  m = 2L,
  family = c("quantile", "expectile", "mquantile"),
  engine = "ald",
  error_density = c("unequal", "equal"),
  init = c("ald", "kmeans", "random", "manual"),
  nstart = 20L,
  control = mixqr_control(),
  weights = NULL,
  manual_init = NULL,
  variance = c("none", "sparsity", "stochEM"),
  vcontrol = mixqr_vcontrol(),
  ...
)
Arguments
- formula
A model formula, e.g. y ~ x1 + x2 (intercept implied).
- data
A data frame.
- tau
Quantile level in (0, 1). Default 0.5.
- m
Number of mixture components (>= 1). Default 2.
- family
Component-loss family: "quantile" (default; check loss, the Wu & Yao model), "expectile" (asymmetric least squares, Newey & Powell 1987 – a smooth, crossing-free location device), or "mquantile" (asymmetric Huber, Breckling & Chambers 1988 – a robust expectile/quantile blend). A non-quantile family selects the matching engine.
- engine
"ald" (default) or "kdEM", or the name of a custom engine registered via register_mixqr_engine() (e.g. "expectile", "mquantile").
- error_density
For "kdEM": "unequal" (per-component densities, eq. 2.5) or "equal" (pooled density, eq. 2.8).
- init
Initialisation strategy: "ald" (default; ALD pre-fit seeds the kdEM engine), "kmeans", "random", or "manual".
- nstart
Number of multi-start initialisations (the mixture likelihood is multimodal). Default 20.
- control
A mixqr_control() list.
- weights
Optional prior observation weights.
- manual_init
Optional n x m responsibility matrix for init = "manual".
- variance
Standard-error method: "none" (default), "sparsity" (eq. 3.3) or "stochEM" (Algorithm 3.1 multiple imputation).
- vcontrol
A mixqr_vcontrol() list for variance = "stochEM".
- ...
Reserved for engine extensions; currently unused.
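The three family losses can be written out directly. The sketch below is illustrative only: the function names (check_loss, als_loss, ahuber_loss) and the Huber tuning constant k = 1.345 are assumptions and not part of the package, and scaling conventions for the asymmetric Huber loss differ between authors.

```r
# Illustrative loss functions; names and the default k are hypothetical,
# not taken from the package.

# Check loss ("quantile"): rho_tau(u) = u * (tau - I(u < 0))
check_loss <- function(u, tau) u * (tau - (u < 0))

# Asymmetric least squares ("expectile"): |tau - I(u < 0)| * u^2
als_loss <- function(u, tau) abs(tau - (u < 0)) * u^2

# Asymmetric Huber ("mquantile"): quadratic for |u| <= k, linear beyond,
# with the same asymmetric weight |tau - I(u < 0)|
ahuber_loss <- function(u, tau, k = 1.345) {
  huber <- ifelse(abs(u) <= k, 0.5 * u^2, k * abs(u) - 0.5 * k^2)
  abs(tau - (u < 0)) * huber
}

# Compare the three losses on a few residuals at tau = 0.25
u <- c(-2, -0.5, 0.5, 2)
cbind(u,
      check  = check_loss(u, 0.25),
      als    = als_loss(u, 0.25),
      ahuber = ahuber_loss(u, 0.25))
```

In this form the "blend" is visible in k: for large k the asymmetric Huber loss equals half the expectile loss, while for residuals far beyond k it grows like k times the check loss.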
Bias under asymmetric errors (Wu & Yao sec. 6)
The semiparametric (kdEM) estimator solves estimating equations whose score
I(a <= 0) - tau is not orthogonal to the nuisance tangent space of the
unknown error densities, so it can be BIASED when component error densities are
asymmetric and the clusters have imbalanced overlap (Wu & Yao 2016, sec. 6,
Fig. 6). Watch fit$diagnostics$overlap (responsibility entropy; larger values
mean more overlap and hence more bias risk) and cross-check the coefficients
against the parametric ald engine. Well-separated clusters and
(near-)symmetric errors are the safe regime.
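To make "responsibility entropy" concrete, here is a minimal sketch of one common way to compute it from an n x m responsibility matrix. The helper name resp_entropy is hypothetical, and the package's own definition and normalisation of diagnostics$overlap may differ.

```r
# Hypothetical helper: mean Shannon entropy of the rows of an n x m
# responsibility matrix R (each row sums to 1). Not the package's code.
resp_entropy <- function(R) {
  terms <- ifelse(R > 0, -R * log(R), 0)  # treat 0 * log(0) as 0
  mean(rowSums(terms))
}

hard <- diag(2)             # each observation assigned with certainty
soft <- matrix(0.5, 2, 2)   # each observation split evenly between components

resp_entropy(hard)          # no overlap: entropy is 0
resp_entropy(soft)          # maximal overlap for m = 2: entropy is log(2)
```

Under this definition the value ranges from 0 (perfectly separated clusters) to log(m) (responsibilities uniform across components), so readings near the upper end correspond to the high-overlap regime described above.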
Standard errors
Sparsity SEs (variance = "sparsity", eq. 3.3) are CONDITIONAL on the fitted
classification and therefore understate uncertainty (summary() flags them as
such). For inference, use variance = "stochEM" (the stochastic-EM
multiple-imputation estimator, Algorithm 3.1), which propagates classification
and mixing uncertainty.
Examples
set.seed(1)
d <- sim_mixqr2(n = 300)
fit <- mixqr(y ~ x, data = d, tau = 0.5, m = 2, engine = "ald", nstart = 10)
fit
#> Mixture of quantile regressions (mixqr)
#> engine: ald tau = 0.5 components m = 2 n = 300
#> converged: TRUE in 14 iterations
#>
#> Mixing probabilities (pi):
#>  comp1  comp2
#> 0.5413 0.4587
#>
#> Component coefficients (beta):
#>                comp1    comp2
#> (Intercept)   9.9792 -10.6760
#> x           -10.7094  12.0896
#>
#> logLik = -834.810 AIC = 1683.62 BIC = 1709.55
coef(fit)
#>                  comp1     comp2
#> (Intercept)   9.979185 -10.67597
#> x           -10.709414  12.08964