Paper title
c-lasso -- a Python package for constrained sparse and robust regression and classification
Paper authors
Paper abstract
We introduce c-lasso, a Python package that enables sparse and robust linear regression and classification with linear equality constraints. The underlying statistical forward model is assumed to be of the following form: \[ y = X\beta + \sigma\varepsilon \qquad \textrm{subject to} \qquad C\beta = 0 \] Here, $X \in \mathbb{R}^{n \times d}$ is a given design matrix and the vector $y \in \mathbb{R}^{n}$ is a continuous or binary response vector. The matrix $C$ is a general constraint matrix. The vector $\beta \in \mathbb{R}^{d}$ contains the unknown coefficients and $\sigma$ is an unknown scale. Prominent use cases are (sparse) log-contrast regression with compositional data $X$, which requires the constraint $1_d^T \beta = 0$ (Aitchison and Bacon-Shone 1984), and the Generalized Lasso, which is a special case of the described problem (see, e.g., James, Paulson, and Rusmevichientong 2020, Example 3). The c-lasso package provides estimators for inferring unknown coefficients and scale (i.e., perspective M-estimators (Combettes and Müller 2020a)) of the form \[ \min_{\beta \in \mathbb{R}^d, \sigma \in \mathbb{R}_{0}} f\left(X\beta - y, \sigma\right) + \lambda \left\lVert \beta \right\rVert_1 \qquad \textrm{subject to} \qquad C\beta = 0 \] for several convex loss functions $f(\cdot,\cdot)$. This includes the constrained Lasso, the constrained scaled Lasso, and sparse Huber M-estimators with linear equality constraints.
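To make the optimization problem in the abstract concrete, the following is a minimal sketch of the simplest case it mentions, the constrained Lasso, with the squared loss $f(X\beta - y) = \lVert X\beta - y \rVert^2$ and no scale $\sigma$. It does not use the c-lasso package or its API, and it should not be read as the package's algorithm: it is a generic ADMM iteration written with NumPy only. The function names (`soft_threshold`, `constrained_lasso_admm`), the penalty parameter `rho`, the iteration count, the Gaussian stand-in design matrix, and the choice `lam=5.0` are all assumptions made for illustration.

```python
import numpy as np


def soft_threshold(v, t):
    """Elementwise soft-thresholding, the proximal operator of t * ||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)


def constrained_lasso_admm(X, y, C, lam, rho=10.0, n_iter=2000):
    """Approximately solve  min ||X b - y||^2 + lam * ||b||_1  s.t.  C b = 0.

    Generic ADMM on the splitting b = z (hypothetical helper, not c-lasso code).
    Assumes C has full row rank so that the KKT system below is nonsingular.
    """
    n, d = X.shape
    k = C.shape[0]
    # KKT matrix of the equality-constrained quadratic b-update.
    kkt = np.block([[2.0 * X.T @ X + rho * np.eye(d), C.T],
                    [C, np.zeros((k, k))]])
    z = np.zeros(d)  # sparse copy of the coefficients
    u = np.zeros(d)  # scaled dual variable for b = z
    for _ in range(n_iter):
        rhs = np.concatenate([2.0 * X.T @ y + rho * (z - u), np.zeros(k)])
        b = np.linalg.solve(kkt, rhs)[:d]        # satisfies C b = 0
        z = soft_threshold(b + u, lam / rho)     # l1 proximal step
        u = u + b - z                            # dual update
    return b


# Synthetic data from the forward model y = X beta + sigma * eps with a
# zero-sum coefficient vector (all values below are illustrative assumptions).
rng = np.random.default_rng(0)
n, d = 50, 10
X = rng.standard_normal((n, d))
beta_true = np.zeros(d)
beta_true[0], beta_true[1] = 1.0, -1.0           # sums to zero
y = X @ beta_true + 0.1 * rng.standard_normal(n)
C = np.ones((1, d))                              # zero-sum constraint 1_d^T beta = 0

beta_hat = constrained_lasso_admm(X, y, C, lam=5.0)
print(np.round(beta_hat, 3))
print("constraint residual:", float(np.abs(C @ beta_hat).max()))
```

The returned iterate comes from the constrained linear solve, so it satisfies $C\beta = 0$ up to floating-point error at every iteration, while exact zeros appear in the auxiliary variable `z`; the two coincide in the limit. In log-contrast regression, `X` would instead hold log-transformed compositional data.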