Group lasso

$$\hat{\bm \beta}_\lambda = \arg \min_{\bm \beta} \| \bm Y - \bm X \bm \beta \|_2^2 + \lambda \sum_{g=1}^G \|\bm \beta_{\mathcal{I}_g}\|_2,$$
where $\mathcal{I}_g$ is the index set belonging to the $g$th group of variables, $g=1,\ldots,G$.
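As a minimal sketch of how this objective could be evaluated numerically, the snippet below computes the squared-error loss plus $\lambda$ times the sum of per-group Euclidean norms. The function name `group_lasso_objective`, the representation of the groups $\mathcal{I}_g$ as lists of column indices, and the toy data are assumptions made for illustration only.

```python
import numpy as np

def group_lasso_objective(X, y, beta, groups, lam):
    """Squared-error loss plus lam times the sum of per-group Euclidean norms.

    `groups` is a list of index lists, one per group (an illustrative
    encoding of the index sets I_g, not something fixed by the text).
    """
    residual = y - X @ beta
    loss = residual @ residual                       # ||Y - X beta||_2^2
    penalty = sum(np.linalg.norm(beta[idx]) for idx in groups)
    return loss + lam * penalty

# Toy data: three coefficients split into groups {0, 1} and {2}.
X = np.array([[1.0, 0.0, 2.0],
              [0.0, 1.0, 1.0]])
y = np.array([1.0, 2.0])
beta = np.array([3.0, 4.0, 0.0])
groups = [[0, 1], [2]]

value = group_lasso_objective(X, y, beta, groups, lam=2.0)
print(value)  # loss 8 + 2 * (5 + 0) = 18
```

Note that the group norms are not squared, which is what makes the penalty non-differentiable at zero for a whole group.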

  • This penalty can be viewed as an intermediate between the $\ell_1$- and $\ell_2$-type penalties.

The $\ell_1$ penalty treats the coordinate directions differently from all other directions, which encourages sparsity in individual coefficients. The $\ell_2$ penalty treats all directions equally and does not encourage sparsity. The group lasso encourages sparsity at the factor (group) level.
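One way to see the difference between coefficient-level and factor-level sparsity is through the proximal operators of the two penalties. This is an illustration added here, not something the source derives. Elementwise soft-thresholding is the proximal operator of the $\ell_1$ penalty, and block soft-thresholding is the proximal operator of the sum of group norms. The helper names, the threshold, and the vector `v` below are hypothetical.

```python
import numpy as np

def soft_threshold(v, t):
    """Elementwise shrinkage: each coefficient is zeroed on its own."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def block_soft_threshold(v, groups, t):
    """Groupwise shrinkage: a group is kept or zeroed as a whole."""
    out = np.zeros_like(v)
    for idx in groups:
        norm = np.linalg.norm(v[idx])
        if norm > t:
            # Shrink the whole group towards zero by a common factor.
            out[idx] = (1.0 - t / norm) * v[idx]
    return out

v = np.array([3.0, 0.5, 0.6, 0.3])
groups = [[0, 1], [2, 3]]

l1_result = soft_threshold(v, 1.0)
group_result = block_soft_threshold(v, groups, 1.0)
print(l1_result)     # only the largest coefficient survives
print(group_result)  # first group kept (shrunk), second group zeroed entirely
```

In this toy case the small coefficient 0.5 survives under the group operator because its group as a whole is large, while the second group is removed as a unit.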

  • The estimates have the attractive property of being $\textcolor{red}{\text{\small invariant under groupwise orthogonal reparameterizations (transformations)}}$, like ridge regression.
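The invariance property can be checked numerically on the two ingredients of the objective. In the sketch below, each group of coefficients is multiplied by a random orthogonal matrix $Q_g$ and the matching columns of the design matrix by $Q_g^\top$. The fitted values and the group penalty should be unchanged, whereas the $\ell_1$ norm generally is not. The data, the group layout, and all variable names are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((6, 5))
beta = np.array([3.0, 4.0, 1.0, 2.0, 2.0])
groups = [[0, 1], [2, 3, 4]]

def group_penalty(b):
    """Sum of per-group Euclidean norms."""
    return sum(np.linalg.norm(b[idx]) for idx in groups)

X_rot = X.copy()
beta_rot = beta.copy()
for idx in groups:
    # The Q factor of a QR decomposition of a square matrix is orthogonal.
    Q, _ = np.linalg.qr(rng.standard_normal((len(idx), len(idx))))
    X_rot[:, idx] = X[:, idx] @ Q.T
    beta_rot[idx] = Q @ beta[idx]

same_fit = np.allclose(X @ beta, X_rot @ beta_rot)
same_penalty = np.isclose(group_penalty(beta), group_penalty(beta_rot))
l1_before, l1_after = np.abs(beta).sum(), np.abs(beta_rot).sum()
print(same_fit, same_penalty)   # both True
print(l1_before, l1_after)      # the l1 norm is generally not preserved
```

Because both the loss and the penalty take the same value at the transformed pair, the minimizer in the new parameterization is the transformed minimizer of the original problem.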

Group LARS

$$\|\bm M\|_{2,1} = \sum_{i=1}^d \|\bm M_i\|_2,$$
where $\bm M_i$ denotes the $i$th of the $d$ rows (or columns, depending on the convention) of $\bm M$.
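The $\ell_{2,1}$ norm is a sum of Euclidean norms taken over the rows or the columns of a matrix. The sketch below exposes that choice as an `axis` argument, since the text does not say which convention is intended. The function name `l21_norm` and the example matrix are hypothetical.

```python
import numpy as np

def l21_norm(M, axis=1):
    """Sum of Euclidean norms taken along `axis`.

    axis=1 gives one norm per row, axis=0 one norm per column.
    """
    return np.linalg.norm(M, axis=axis).sum()

M = np.array([[3.0, 4.0],
              [0.0, 0.0],
              [5.0, 12.0]])

row_version = l21_norm(M, axis=1)   # 5 + 0 + 13
col_version = l21_norm(M, axis=0)   # sqrt(34) + sqrt(160)
print(row_version, col_version)
```

The two conventions generally give different values, so it is worth stating which one is meant whenever the norm is used.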

Group non-negative garrotte

| | group lasso | group LARS | group non-negative garrotte |
|---|---|---|---|
| performance | excellent | comparable | |
| computational efficiency | intensive in large-scale problems | quick | fastest |
| applicability | | | sub-optimal when $p \rightarrow n$, not applicable when $p > n$ |


Elastic net: Under the elastic net, highly correlated features receive similar weights. This grouping effect is a consequence of the strict convexity contributed by the $\ell_2$ part of the penalty.
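The grouping effect can be illustrated in the extreme case of two identical features. The sketch below is a plain coordinate-descent solver written for this note. It assumes the parameterization $\frac{1}{2n}\|\bm Y - \bm X \bm \beta\|_2^2 + \lambda_1 \|\bm \beta\|_1 + \frac{\lambda_2}{2}\|\bm \beta\|_2^2$, which is one common scaling and not necessarily the one used in the cited paper. The function names, penalty values, and data are hypothetical.

```python
import numpy as np

def soft_threshold(z, t):
    """Scalar soft-thresholding."""
    return np.sign(z) * max(abs(z) - t, 0.0)

def elastic_net_cd(X, y, l1, l2, sweeps=200):
    """Coordinate descent for (1/2n)||y - Xb||^2 + l1*||b||_1 + (l2/2)*||b||^2."""
    n, p = X.shape
    b = np.zeros(p)
    for _ in range(sweeps):
        for j in range(p):
            # Residual with the contribution of feature j added back in.
            partial_residual = y - X @ b + X[:, j] * b[j]
            rho = X[:, j] @ partial_residual / n
            b[j] = soft_threshold(rho, l1) / (X[:, j] @ X[:, j] / n + l2)
    return b

# Two identical columns: perfectly correlated features.
x = np.array([1.0, -1.0, 1.0, -1.0])
X = np.column_stack([x, x])
y = 2.0 * x

lasso_coef = elastic_net_cd(X, y, l1=0.25, l2=0.0)  # pure l1 penalty
enet_coef = elastic_net_cd(X, y, l1=0.25, l2=0.5)   # l1 plus l2 penalty
print(lasso_coef)  # all weight on the first column visited
print(enet_coef)   # weight shared equally between the two columns
```

With the pure $\ell_1$ penalty the objective is not strictly convex here, so the split between the duplicated columns is arbitrary and depends on the update order. Adding the $\ell_2$ term makes the minimizer unique, and by symmetry it assigns both columns the same coefficient.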


  1. Yuan, Ming, and Yi Lin. “Model selection and estimation in regression with grouped variables.” Journal of the Royal Statistical Society: Series B (Statistical Methodology) 68.1 (2006): 49-67.
  2. Zou, Hui, and Trevor Hastie. “Regularization and variable selection via the elastic net.” Journal of the Royal Statistical Society: Series B (Statistical Methodology) 67.2 (2005): 301-320.

