Machine Learning Topic: Lasso, Ridge Regression, and Elastic Net Models in Stata

The elastic net was originally proposed by Zou and Hastie (2005). It extends the lasso with a penalty term that is a mixture of the absolute-value penalty used by the lasso and the squared penalty used by ridge regression. Compared with lasso solutions, elastic-net coefficient estimates are more robust to the presence of highly correlated covariates.
For linear models, the elastic net's penalized objective function is

Q = \frac{1}{2N}\sum_{i=1}^{N}\left(y_i - \beta_0 - \mathbf{x}_i\boldsymbol{\beta}'\right)^2 + \lambda\sum_{j=1}^{p}\left(\frac{1-\alpha}{2}\,\beta_j^{2} + \alpha\,\lvert\beta_j\rvert\right)

where β is the p-dimensional vector of coefficients on the covariates x. Given values of α and λ, the estimated β is the vector of coefficients that minimizes Q. As with the lasso, p can be greater than the sample size N.

When α = 1, the elastic net reduces to the lasso. When α = 0, it reduces to ridge regression. For 0 < α ≤ 1, the elastic net, like the lasso, produces sparse solutions in which many of the coefficient estimates are exactly zero. When α = 0, that is, ridge regression, all coefficients are nonzero, although typically many are small.
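To make the two limiting cases explicit, the penalty term can be written out at α = 1 and α = 0; a short derivation consistent with the objective function above:

P(\boldsymbol{\beta};\alpha,\lambda) = \lambda\sum_{j=1}^{p}\left(\frac{1-\alpha}{2}\,\beta_j^{2} + \alpha\,\lvert\beta_j\rvert\right)

P(\boldsymbol{\beta};1,\lambda) = \lambda\sum_{j=1}^{p}\lvert\beta_j\rvert \quad \text{(the lasso penalty)}

P(\boldsymbol{\beta};0,\lambda) = \frac{\lambda}{2}\sum_{j=1}^{p}\beta_j^{2} \quad \text{(the ridge penalty)}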
Ridge regression has long been used as a way to keep highly collinear variables in a regression model. As the correlations among covariates grow, the ordinary least-squares (OLS) estimator becomes increasingly unstable: OLS produces wild coefficient estimates on highly correlated covariates, estimates that offset one another in the fit. The ridge penalty removes this instability and produces point estimates that can be used for prediction.
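As a small illustration of that instability, here is a minimal simulated sketch (the data and variable names are invented, not from this article's examples). With two nearly collinear covariates, OLS returns large offsetting estimates, while ridge regression shrinks both toward stable values:

clear
set obs 100
set seed 42
generate x1 = rnormal()
generate x2 = x1 + 0.05*rnormal()              // corr(x1, x2) is about 0.999
generate y = x1 + x2 + rnormal()
regress y x1 x2                                // OLS: unstable, offsetting estimates
elasticnet linear y x1 x2, alpha(0) rseed(42)  // ridge regression; lambda chosen by CV
lassocoef, display(coef, penalized)            // shrunken, stable ridge estimates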
Stata implementation
The command is elasticnet. Its syntax is:

elasticnet model depvar [(alwaysvars)] othervars [ if ] [ in ] [weight] [, options]

model specifies the model type and may be linear, logit, probit, or poisson. alwaysvars are variables that are always kept in the model, and othervars are variables that elasticnet chooses to include in or exclude from the model.

The options are as follows (a combined usage sketch appears after the list):
selection(cv[, cv_opts]): select the mixing parameter α* and the lasso penalty parameter λ* using cross-validation (CV)
selection(none): do not select α* or λ*
offset(varname_o): include varname_o in the model with its coefficient constrained to be 1
exposure(varname_e): include ln(varname_e) in the model with its coefficient constrained to be 1 (poisson models only)
alphas(numlist|matname): specify the α grid with a numlist or a matrix
grid(#_g[, ratio(#) min(#)]): specify the set of possible λ's using a logarithmic grid with #_g grid points
crossgrid(augmented): augment the λ grids for each α as necessary to produce a single λ grid; the default
crossgrid(union): use the union of the λ grids for each α to produce a single λ grid
crossgrid(different): use a different λ grid for each α
stop(#): tolerance for stopping iteration over the λ grid early
cvtolerance(#): tolerance for identifying the minimum of the CV function
tolerance(#): convergence tolerance for the coefficients based on their values
dtolerance(#): convergence tolerance for the coefficients based on the deviance
penaltywt(matname): specify a vector of weights for the coefficients in the penalty term
alllambdas: fit models for all λ's in the grid or until the stop(#) tolerance is reached; by default, the CV function is computed sequentially by λ, and estimation stops when a minimum is identified
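As a combined illustration of several of these options, here is a hypothetical sketch (it uses Stata's shipped auto dataset, which is not part of this article's examples) that supplies a user-chosen α grid via alphas(), a 20-point λ grid via grid(), and 5-fold CV via selection():

sysuse auto, clear
elasticnet linear price mpg weight length turn displacement, ///
    alphas(0.1(0.2)0.9) grid(20, ratio(0.01))                ///
    selection(cv, folds(5)) rseed(12345)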
Example 1: Elastic net with data that are not highly correlated
We use the example dataset from the lasso examples to fit an elastic-net model. It stores variable lists created by vl. For a complete description of the vl system and how to use it to manage large variable lists, see [D] vl.

After loading the dataset, we type vl rebuild to reactivate the saved variable lists.
use https://www.stata-press.com/data/r18/fakesurvey_vl
vl rebuild
Results:
. vl rebuild
Rebuilding vl macros ...

-------------------------------------------------------------------------------
                  |                       Macro's contents
                  |------------------------------------------------------------
Macro             |  # Vars   Description
------------------+------------------------------------------------------------
System            |
  $vldummy        |      98   0/1 variables
  $vlcategorical  |      16   categorical variables
  $vlcontinuous   |      29   continuous variables
  $vluncertain    |      16   perhaps continuous, perhaps categorical variables
  $vlother        |      12   all missing or constant variables
User              |
  $demographics   |       4   variables
  $factors        |     110   variables
  $idemographics  |           factor-variable list
  $ifactors       |           factor-variable list
-------------------------------------------------------------------------------
We also use the rseed() option to set the random-number seed so that we can reproduce our results.
. elasticnet linear q104 $idemographics $ifactors $vlcontinuous, rseed(1234)
Results:
. elasticnet linear q104 $idemographics $ifactors $vlcontinuous, rseed(1234)

alpha 1 of 3: alpha = 1

10-fold cross-validation with 109 lambdas ...
Grid value 1: lambda = 1.818102    no. of nonzero coef. = 0
Folds: 1...5....10   CVF = 18.34476
(intermediate output omitted)
... cross-validation complete ... minimum found

Elastic net linear model                        No. of obs        =        914
                                                No. of covariates =        277
Selection: Cross-validation                     No. of CV folds   =         10
-------------------------------------------------------------------------------
               |                                No. of  Out-of-      CV mean
               |                               nonzero   sample   prediction
alpha       ID |  Description          lambda    coef.  R-squared       error
---------------+---------------------------------------------------------------
1.000          |
             1 |  first lambda       1.818102        0     0.0016     18.34476
            32 |  lambda before      .1174085       58     0.3543     11.82553
          * 33 |  selected lambda    .1069782       64     0.3547     11.81814
            34 |  lambda after       .0974746       66     0.3545      11.8222
            37 |  last lambda        .0737359       80     0.3487     11.92887
---------------+---------------------------------------------------------------
0.750          |
            38 |  first lambda       1.818102        0     0.0016     18.34476
            71 |  last lambda        .0974746      126     0.3473     11.95437
---------------+---------------------------------------------------------------
0.500          |
            72 |  first lambda       1.818102        0     0.0012     18.33643
           102 |  last lambda        .1288556      139     0.3418      12.0549
-------------------------------------------------------------------------------
* alpha and lambda selected by cross-validation.
CV selected α* = 1, that is, the results of an ordinary lasso. All of the models fit to these data selected α = 1: the correlations in these data are not strong enough for the elastic net to be needed.
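To gauge on your own data whether correlations are high enough for the elastic net to matter, a quick look at pairwise correlations is an easy first check; a minimal sketch (here restricted to the continuous covariates):

correlate $vlcontinuous   // values near 0.9 or above suggest elastic net may help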
Example 2: Elastic net with highly correlated data
The dataset fakesurvey_vl in example 1 contains data we created in a simulation. We ran the simulation again, setting the correlation parameters to higher values, up to ρ = 0.95, and created two groups of highly correlated variables, with much lower correlations between variables in different groups. We saved these data in a new dataset named fakesurvey2_vl. The elastic net was proposed not merely for highly correlated variables but especially for groups of highly correlated variables.

We load the new dataset and run vl rebuild.
. use https://www.stata-press.com/data/r18/fakesurvey2_vl, clear

. vl rebuild
Rebuilding vl macros ...

-------------------------------------------------------------------------------
                  |                       Macro's contents
                  |------------------------------------------------------------
Macro             |  # Vars   Description
------------------+------------------------------------------------------------
System            |
  $vldummy        |      98   0/1 variables
  $vlcategorical  |      16   categorical variables
  $vlcontinuous   |      29   continuous variables
  $vluncertain    |      16   perhaps continuous, perhaps categorical variables
  $vlother        |      12   all missing or constant variables
User              |
  $demographics   |       4   variables
  $factors        |     110   variables
  $idemographics  |           factor-variable list
  $ifactors       |           factor-variable list
-------------------------------------------------------------------------------
We split the data into two equal-sized samples: one on which we will fit models and another we will use to test their predictions. We use splitsample to generate a variable indicating the samples.
. set seed 1234
. splitsample, generate(sample) nsplit(2)
. label define svalues 1 "Training" 2 "Testing"
. label values sample svalues
We fit an elastic-net model using the default α grid; by default, CV is run over α = 1, 0.75, and 0.5.
elasticnet linear q104 $idemographics $ifactors $vlcontinuous if sample == 1, rseed(1234)
Results:
. elasticnet linear q104 $idemographics $ifactors $vlcontinuous if sample == 1,
> rseed(1234) nolog
note: 1.q14 dropped because of collinearity with another variable
note: 1.q136 dropped because of collinearity with another variable

Elastic net linear model                        No. of obs        =        449
                                                No. of covariates =        275
Selection: Cross-validation                     No. of CV folds   =         10

-------------------------------------------------------------------------------
               |                                No. of  Out-of-      CV mean
               |                               nonzero   sample   prediction
alpha       ID |  Description          lambda    coef.  R-squared       error
---------------+---------------------------------------------------------------
1.000          |
             1 |  first lambda       6.323778        0     0.0036     26.82324
            42 |  last lambda         .161071       29     0.4339     15.12964
---------------+---------------------------------------------------------------
0.750          |
            43 |  first lambda       6.323778        0     0.0036     26.82324
            82 |  last lambda        .1940106       52     0.4360     15.07523
---------------+---------------------------------------------------------------
0.500          |
            83 |  first lambda       6.323778        0     0.0022     26.78722
           124 |  lambda before       .161071       87     0.4473     14.77189
         * 125 |  selected lambda    .1467619       92     0.4476     14.76569
           126 |  lambda after        .133724       96     0.4468     14.78648
           128 |  last lambda          .11102      115     0.4422     14.90808
-------------------------------------------------------------------------------
* alpha and lambda selected by cross-validation.
Cross-validation selected α* = 0.5. According to CV, this value is better than α = 1 or α = 0.75.
We can plot the CV function for the selected α* = 0.5.
cvplot
The CV function is quite flat around the selected λ*.
Alternative λ (and alternative α) can be assessed with lassoknots. We run lassoknots with options requesting displays of the number of nonzero coefficients (nonzero), the estimates of the CV function (cvmpe), and the out-of-sample R² (osr2).
. lassoknots, display(nonzero cvmpe osr2)

--------------------------------------------------------
            |             No. of    CV mean     Out-of-
            |            nonzero      pred.      sample
alpha    ID |   lambda     coef.      error   R-squared
------------+-------------------------------------------
1.000       |
         11 | 2.880996        1    25.23369      0.0559
         13 | 2.391853        3    22.54097      0.1567
         15 | 1.985759        4    20.48949      0.2334
         17 | 1.648612        5    19.04514      0.2874
         18 | 1.502154        6    18.49445      0.3080
         20 | 1.247114        7    17.63197      0.3403
         23 | .9433962        8    16.78305      0.3721
         24 | .8595875        9    16.58457      0.3795
         27 | .6502464       11    16.20348      0.3938
         29 |  .539846       13      15.941      0.4036
         31 | .4481896       16    15.67022      0.4137
         32 | .4083737       17    15.53562      0.4188
         34 |  .339039       19    15.32416      0.4267
         35 | .3089197       20    15.23112      0.4301
         36 | .2814761       19    15.16094      0.4328
         37 | .2564706       18    15.10566      0.4348
         38 | .2336864       21    15.07917      0.4358
         40 | .1940106       23    15.10997      0.4347
         41 | .1767752       23    15.12913      0.4340
         42 |  .161071       29    15.12964      0.4339
------------+-------------------------------------------
0.750       |
         49 | 3.841328        4    25.86136      0.0324
         51 | 3.189138        5    23.30185      0.1282
         53 | 2.880996        9    22.09656      0.1733
         54 | 2.625056       13     21.1115      0.2101
         59 | 1.648612       14    17.94825      0.3285
         60 | 1.502154       13    17.57849      0.3423
         63 | 1.136324       14    16.77006      0.3726
         64 | 1.035376       16    16.54948      0.3808
         65 | .9433962       17    16.35695      0.3880
         66 | .8595875       21    16.18638      0.3944
         67 | .7832241       22    16.02176      0.4006
         68 | .7136446       24    15.86705      0.4064
         70 | .5924803       26    15.59497      0.4165
         71 |  .539846       29    15.46632      0.4213
         72 | .4918876       31    15.34795      0.4258
         73 | .4481896       32    15.24319      0.4297
         74 | .4083737       33    15.15827      0.4329
         76 |  .339039       35    15.04362      0.4372
         77 | .3089197       36     15.0178      0.4381
         78 | .2814761       36    15.01447      0.4382
         79 | .2564706       36    15.02567      0.4378
         80 | .2336864       39    15.05051      0.4369
         81 | .2129264       47    15.06119      0.4365
         82 | .1940106       52    15.07523      0.4360
------------+-------------------------------------------
0.500       |
         84 | 5.761991        4    26.17688      0.0206
         85 | 5.250112        7    25.12334      0.0600
         86 | 4.783706       14    23.96692      0.1033
         87 | 4.358735       18    22.84471      0.1453
         88 | 4.215852       20    22.46586      0.1595
         89 | 3.841328       24    21.49776      0.1957
         91 | 3.189138       26     19.9409      0.2539
         93 | 2.880996       25     19.2717      0.2790
         96 | 2.179368       28     17.9138      0.3298
         97 | 1.985759       27    17.55144      0.3433
         98 | 1.809349       26      17.227      0.3555
        100 | 1.502154       27    16.69448      0.3754
        102 | 1.247114       29    16.28911      0.3906
        103 | 1.136324       32     16.0984      0.3977
        104 | 1.035376       33    15.92088      0.4043
        105 | .9433962       35    15.75425      0.4106
        106 | .8595875       37    15.59659      0.4165
        107 | .7832241       39    15.45526      0.4218
        110 | .5924803       38    15.13986      0.4336
        111 |  .539846       41    15.07284      0.4361
        112 | .4918876       42    15.02262      0.4379
        114 | .4083737       45    14.96922      0.4399
        115 | .3720949       46    14.96618      0.4401
        116 |  .339039       47    14.97837      0.4396
        117 | .3089197       52    14.98625      0.4393
        118 | .2814761       59    14.97406      0.4398
        119 | .2564706       65    14.95937      0.4403
        120 | .2336864       70    14.93461      0.4412
        121 | .2129264       73    14.88621      0.4430
        122 | .1940106       80    14.83697      0.4449
        123 | .1767752       86    14.79936      0.4463
        124 |  .161071       87    14.77189      0.4473
      * 125 | .1467619       92    14.76569      0.4476
        126 |  .133724       96    14.78648      0.4468
        127 | .1218443      106    14.84087      0.4447
        128 |   .11102      115    14.90808      0.4422
--------------------------------------------------------
* alpha and lambda selected by cross-validation.
When we examine the lassoknots output, we see that the CV function is quite flat around its minimizing α* and λ*.
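If a sparser model is preferred, lassoselect can override CV's choice with any knot in the table; a minimal sketch picking ID 115 from the output above (α = 0.5, λ = .3720949), which has nearly the same CV error with 46 rather than 92 nonzero coefficients:

lassoselect alpha = 0.5 lambda = .3720949
cvplot    // replot the CV function with the new selection marked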
Example 3: Ridge regression
We can fit a ridge regression by specifying alpha(0).
elasticnet linear q104 $idemographics $ifactors $vlcontinuous if sample == 1, rseed(1234) alpha(0)
estimates store ridge
Results:
. elasticnet linear q104 $idemographics $ifactors $vlcontinuous if sample == 1,
> rseed(1234) alpha(0) nolog
note: 1.q14 dropped because of collinearity with another variable
note: 1.q136 dropped because of collinearity with another variable

Elastic net linear model                        No. of obs        =        449
                                                No. of covariates =        275
Selection: Cross-validation                     No. of CV folds   =         10

-------------------------------------------------------------------------------
               |                                No. of  Out-of-      CV mean
               |                               nonzero   sample   prediction
alpha       ID |  Description          lambda    coef.  R-squared       error
---------------+---------------------------------------------------------------
0.000          |
             1 |  first lambda       3161.889      275     0.0036     26.82323
            88 |  lambda before      .9655953      275     0.4387     15.00168
          * 89 |  selected lambda    .8798144      275     0.4388     14.99956
            90 |  lambda after       .8016542      275     0.4386     15.00425
           100 |  last lambda        .3161889      275     0.4198     15.50644
-------------------------------------------------------------------------------
* alpha and lambda selected by cross-validation.
Ridge regression as implemented here selects λ* by CV. We can plot the CV function.
. cvplot
Example 4: Comparing elastic net, ridge regression, and lasso
In the previous examples, we fit the elastic net and ridge regression on half of the sample so that we could evaluate predictions on the other half. Let's continue with the data from examples 2 and 3 and fit a lasso.
lasso linear q104 $idemographics $ifactors $vlcontinuous if sample == 1, rseed(1234)
estimates store lasso
Results:
. lasso linear q104 $idemographics $ifactors $vlcontinuous if sample == 1,
> rseed(1234) nolog
note: 1.q14 dropped because of collinearity with another variable
note: 1.q136 dropped because of collinearity with another variable

Lasso linear model                          No. of obs        =        449
                                            No. of covariates =        275
Selection: Cross-validation                 No. of CV folds   =         10

--------------------------------------------------------------------------
         |                                No. of  Out-of-      CV mean
         |                               nonzero   sample   prediction
      ID |  Description          lambda    coef.  R-squared       error
---------+----------------------------------------------------------------
       1 |  first lambda       3.161889        0     0.0020     26.67513
      28 |  lambda before      .2564706       18     0.4348     15.10566
    * 29 |  selected lambda    .2336864       21     0.4358     15.07917
      30 |  lambda after       .2129264       21     0.4355     15.08812
      33 |  last lambda         .161071       29     0.4339     15.12964
--------------------------------------------------------------------------
* lambda selected by cross-validation.

. estimates store lasso
We stored the earlier elastic-net and ridge results in memory with estimates store, and we did the same for the lasso results. Now we can compare out-of-sample predictions using lassogof.
. lassogof elasticnet ridge lasso, over(sample)
Results:
Penalized coefficients
-------------------------------------------------------------
Name         sample |        MSE    R-squared        Obs
--------------------+----------------------------------------
elasticnet          |
           Training |    11.4881       0.5568        489
            Testing |   14.57795       0.5030        504
--------------------+----------------------------------------
ridge               |
           Training |   11.82482       0.5576        449
            Testing |   14.88123       0.4809        476
--------------------+----------------------------------------
lasso               |
           Training |   13.41709       0.4823        506
            Testing |   14.91674       0.4867        513
-------------------------------------------------------------
Based on mean squared error and R² in the testing sample, the elastic net's out-of-sample predictions beat both ridge regression and the lasso.
Note that the numbers of observations in the training and testing samples differ slightly across the models. splitsample split the sample exactly in half, with 529 observations in each half. The sample sizes differ across models because different models contain different sets of selected variables, and so the patterns of missing values differ. If you want the half samples to be exactly equal after dropping missing values, you can use splitsample with an optional varlist to omit observations with missing values in any of those variables; see [D] splitsample for details.
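A hypothetical sketch of that approach (sample2 is an invented variable name): listing the model's variables in splitsample's varlist leaves observations with missing values in any of them unassigned, so every model is then trained and tested on identical observations:

set seed 1234
splitsample q104 $idemographics $ifactors $vlcontinuous, ///
    generate(sample2) nsplit(2)
tabulate sample2   // the two halves are now the same for every model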
Before we conclude that the elastic net beats ridge regression and the lasso, we must point out that we were not fair to the lasso. Theory says that for lasso linear models, postselection coefficients provide slightly better predictions. See predict in [LASSO] lasso postestimation for details.

We run lassogof again for the lasso results, this time specifying that postselection coefficients be used.
lassogof lasso, over(sample) postselection
Results:
. lassogof lasso, over(sample) postselection

Postselection coefficients
-------------------------------------------------------------
Name         sample |        MSE    R-squared        Obs
--------------------+----------------------------------------
lasso               |
           Training |   13.14487       0.4928        506
            Testing |   14.62903       0.4966        513
-------------------------------------------------------------
We declare a tie with the elastic net!
Postselection coefficients should not be used after elasticnet, and especially not after ridge regression. Ridge regression works by shrinking the coefficient estimates, and those shrunken estimates are the ones that should be used for prediction. Because postselection coefficients are OLS regression coefficients for the selected variables, and because ridge regression always selects all variables, postselection coefficients after ridge regression are OLS regression coefficients for all potential variables, which is clearly not what we want to use for prediction.
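When generating predictions from these models, predict after elasticnet already uses the penalized coefficients by default; a minimal sketch (yhat is an invented variable name):

estimates restore elasticnet
predict yhat if sample == 2, penalized   // penalized coefficients, appropriate here
summarize yhat q104 if sample == 2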