Machine Learning Topic: Lasso, Ridge Regression, and Elastic Net Models in Stata

The elastic net was originally proposed by Zou and Hastie (2005). It extends the lasso with a penalty term that is a mixture of the absolute-value penalty used by the lasso and the squared penalty used by ridge regression. Compared with lasso solutions, elastic-net coefficient estimates are more robust to the presence of highly correlated covariates.
For linear models, the elastic net's penalized objective function is

Q = \frac{1}{2N}\sum_{i=1}^{N}\left(y_i - \beta_0 - \mathbf{x}_i\boldsymbol{\beta}'\right)^2 + \lambda\sum_{j=1}^{p}\left(\frac{1-\alpha}{2}\,\beta_j^{2} + \alpha\,\lvert\beta_j\rvert\right)

where β is the p-dimensional vector of coefficients on the covariates x. Given values of α and λ, the estimated β is the vector of coefficients that minimizes Q. As with the lasso, p can be greater than the sample size N.

When α = 1, the elastic net reduces to the lasso. When α = 0, it reduces to ridge regression. For 0 < α ≤ 1, the elastic net, like the lasso, produces sparse solutions in which many of the coefficient estimates are exactly zero. When α = 0, that is, ridge regression, all coefficients are nonzero, although typically many are small.
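To make the two limiting cases explicit, the penalty term can be written out at α = 1 and α = 0; a short derivation consistent with the objective function above:

P(\boldsymbol{\beta};\alpha,\lambda) = \lambda\sum_{j=1}^{p}\left(\frac{1-\alpha}{2}\,\beta_j^{2} + \alpha\,\lvert\beta_j\rvert\right)

P(\boldsymbol{\beta};1,\lambda) = \lambda\sum_{j=1}^{p}\lvert\beta_j\rvert \quad \text{(the lasso penalty)}

P(\boldsymbol{\beta};0,\lambda) = \frac{\lambda}{2}\sum_{j=1}^{p}\beta_j^{2} \quad \text{(the ridge penalty)}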
Ridge regression has long been used as a way to keep highly collinear variables in a regression model. As the correlations among covariates grow, the ordinary least-squares (OLS) estimator becomes increasingly unstable: OLS produces wild coefficient estimates on highly correlated covariates, estimates that offset one another in the fit. The ridge penalty removes this instability and produces point estimates that can be used for prediction.
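As a small illustration of that instability, here is a minimal simulated sketch (the data and variable names are invented, not from this article's examples). With two nearly collinear covariates, OLS returns large offsetting estimates, while ridge regression shrinks both toward stable values:

clear
set obs 100
set seed 42
generate x1 = rnormal()
generate x2 = x1 + 0.05*rnormal()              // corr(x1, x2) is about 0.999
generate y = x1 + x2 + rnormal()
regress y x1 x2                                // OLS: unstable, offsetting estimates
elasticnet linear y x1 x2, alpha(0) rseed(42)  // ridge regression; lambda chosen by CV
lassocoef, display(coef, penalized)            // shrunken, stable ridge estimates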
Stata implementation
The command is elasticnet. Its syntax is:

elasticnet model depvar [(alwaysvars)] othervars [ if ] [ in ] [weight] [, options]

model specifies the model type and may be linear, logit, probit, or poisson. alwaysvars are variables that are always kept in the model, and othervars are variables that elasticnet chooses to include in or exclude from the model.

The options are as follows (a combined usage sketch appears after the list):
selection(cv[, cv_opts]): select the mixing parameter α* and the lasso penalty parameter λ* using cross-validation (CV)
selection(none): do not select α* or λ*
offset(varname_o): include varname_o in the model with its coefficient constrained to be 1
exposure(varname_e): include ln(varname_e) in the model with its coefficient constrained to be 1 (poisson models only)
alphas(numlist|matname): specify the α grid with a numlist or a matrix
grid(#_g[, ratio(#) min(#)]): specify the set of possible λ's using a logarithmic grid with #_g grid points
crossgrid(augmented): augment the λ grids for each α as necessary to produce a single λ grid; the default
crossgrid(union): use the union of the λ grids for each α to produce a single λ grid
crossgrid(different): use a different λ grid for each α
stop(#): tolerance for stopping iteration over the λ grid early
cvtolerance(#): tolerance for identifying the minimum of the CV function
tolerance(#): convergence tolerance for the coefficients based on their values
dtolerance(#): convergence tolerance for the coefficients based on the deviance
penaltywt(matname): specify a vector of weights for the coefficients in the penalty term
alllambdas: fit models for all λ's in the grid or until the stop(#) tolerance is reached; by default, the CV function is computed sequentially by λ, and estimation stops when a minimum is identified
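As a combined illustration of several of these options, here is a hypothetical sketch (it uses Stata's shipped auto dataset, which is not part of this article's examples) that supplies a user-chosen α grid via alphas(), a 20-point λ grid via grid(), and 5-fold CV via selection():

sysuse auto, clear
elasticnet linear price mpg weight length turn displacement, ///
    alphas(0.1(0.2)0.9) grid(20, ratio(0.01))                ///
    selection(cv, folds(5)) rseed(12345)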
Example 1: Elastic net with data that are not highly correlated
We use the example dataset from the lasso examples to fit an elastic-net model. It stores variable lists created by vl. For a complete description of the vl system and how to use it to manage large variable lists, see [D] vl.

After loading the dataset, we type vl rebuild to reactivate the saved variable lists.
use https://www.stata-press.com/data/r18/fakesurvey_vl
vl rebuild
Results:
. vl rebuild
Rebuilding vl macros ...

-------------------------------------------------------------------------------
                  |                       Macro's contents
                  |------------------------------------------------------------
Macro             |  # Vars   Description
------------------+------------------------------------------------------------
System            |
  $vldummy        |      98   0/1 variables
  $vlcategorical  |      16   categorical variables
  $vlcontinuous   |      29   continuous variables
  $vluncertain    |      16   perhaps continuous, perhaps categorical variables
  $vlother        |      12   all missing or constant variables
User              |
  $demographics   |       4   variables
  $factors        |     110   variables
  $idemographics  |           factor-variable list
  $ifactors       |           factor-variable list
-------------------------------------------------------------------------------
We also use the rseed() option to set the random-number seed so that we can reproduce our results.
. elasticnet linear q104 $idemographics $ifactors $vlcontinuous, rseed(1234)
Results:
. elasticnet linear q104 $idemographics $ifactors $vlcontinuous, rseed(1234)

alpha 1 of 3: alpha = 1

10-fold cross-validation with 109 lambdas ...
Grid value 1: lambda = 1.818102    no. of nonzero coef. = 0
Folds: 1...5....10   CVF = 18.34476
(intermediate output omitted)
... cross-validation complete ... minimum found

Elastic net linear model                        No. of obs        =        914
                                                No. of covariates =        277
Selection: Cross-validation                     No. of CV folds   =         10
-------------------------------------------------------------------------------
               |                                No. of  Out-of-      CV mean
               |                               nonzero   sample   prediction
alpha       ID |  Description          lambda    coef.  R-squared       error
---------------+---------------------------------------------------------------
1.000          |
             1 |  first lambda       1.818102        0     0.0016     18.34476
            32 |  lambda before      .1174085       58     0.3543     11.82553
          * 33 |  selected lambda    .1069782       64     0.3547     11.81814
            34 |  lambda after       .0974746       66     0.3545      11.8222
            37 |  last lambda        .0737359       80     0.3487     11.92887
---------------+---------------------------------------------------------------
0.750          |
            38 |  first lambda       1.818102        0     0.0016     18.34476
            71 |  last lambda        .0974746      126     0.3473     11.95437
---------------+---------------------------------------------------------------
0.500          |
            72 |  first lambda       1.818102        0     0.0012     18.33643
           102 |  last lambda        .1288556      139     0.3418      12.0549
-------------------------------------------------------------------------------
* alpha and lambda selected by cross-validation.
CV selected α* = 1, that is, the results of an ordinary lasso. All of the models fit to these data selected α = 1: the correlations in these data are not strong enough for the elastic net to be needed.
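To gauge on your own data whether correlations are high enough for the elastic net to matter, a quick look at pairwise correlations is an easy first check; a minimal sketch (here restricted to the continuous covariates):

correlate $vlcontinuous   // values near 0.9 or above suggest elastic net may help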
Example 2: Elastic net with highly correlated data
The dataset fakesurvey_vl in example 1 contains data we created in a simulation. We ran the simulation again, setting the correlation parameters to higher values, up to ρ = 0.95, and created two groups of highly correlated variables, with much lower correlations between variables in different groups. We saved these data in a new dataset named fakesurvey2_vl. The elastic net was proposed not merely for highly correlated variables but especially for groups of highly correlated variables.

We load the new dataset and run vl rebuild.
. use https://www.stata-press.com/data/r18/fakesurvey2_vl, clear

. vl rebuild
Rebuilding vl macros ...

-------------------------------------------------------------------------------
                  |                       Macro's contents
                  |------------------------------------------------------------
Macro             |  # Vars   Description
------------------+------------------------------------------------------------
System            |
  $vldummy        |      98   0/1 variables
  $vlcategorical  |      16   categorical variables
  $vlcontinuous   |      29   continuous variables
  $vluncertain    |      16   perhaps continuous, perhaps categorical variables
  $vlother        |      12   all missing or constant variables
User              |
  $demographics   |       4   variables
  $factors        |     110   variables
  $idemographics  |           factor-variable list
  $ifactors       |           factor-variable list
-------------------------------------------------------------------------------
We split the data into two equal-sized samples: one on which we will fit models and another we will use to test their predictions. We use splitsample to generate a variable indicating the samples.
. set seed 1234
. splitsample, generate(sample) nsplit(2)
. label define svalues 1 "Training" 2 "Testing"
. label values sample svalues
We fit an elastic-net model using the default α grid; by default, CV is run over α = 1, 0.75, and 0.5.
elasticnet linear q104 $idemographics $ifactors $vlcontinuous if sample == 1, rseed(1234)
Results:
. elasticnet linear q104 $idemographics $ifactors $vlcontinuous if sample == 1,
> rseed(1234) nolog
note: 1.q14 dropped because of collinearity with another variable
note: 1.q136 dropped because of collinearity with another variable

Elastic net linear model                        No. of obs        =        449
                                                No. of covariates =        275
Selection: Cross-validation                     No. of CV folds   =         10

-------------------------------------------------------------------------------
               |                                No. of  Out-of-      CV mean
               |                               nonzero   sample   prediction
alpha       ID |  Description          lambda    coef.  R-squared       error
---------------+---------------------------------------------------------------
1.000          |
             1 |  first lambda       6.323778        0     0.0036     26.82324
            42 |  last lambda         .161071       29     0.4339     15.12964
---------------+---------------------------------------------------------------
0.750          |
            43 |  first lambda       6.323778        0     0.0036     26.82324
            82 |  last lambda        .1940106       52     0.4360     15.07523
---------------+---------------------------------------------------------------
0.500          |
            83 |  first lambda       6.323778        0     0.0022     26.78722
           124 |  lambda before       .161071       87     0.4473     14.77189
         * 125 |  selected lambda    .1467619       92     0.4476     14.76569
           126 |  lambda after        .133724       96     0.4468     14.78648
           128 |  last lambda          .11102      115     0.4422     14.90808
-------------------------------------------------------------------------------
* alpha and lambda selected by cross-validation.
Cross-validation selected α* = 0.5. According to CV, this value is better than α = 1 or α = 0.75.
We can plot the CV function for the selected α* = 0.5.
cvplot
The CV function is quite flat around the selected λ*.
Alternative λ (and alternative α) can be assessed with lassoknots. We run lassoknots with options requesting displays of the number of nonzero coefficients (nonzero), the estimates of the CV function (cvmpe), and the out-of-sample R² (osr2).
. lassoknots, display(nonzero cvmpe osr2)

--------------------------------------------------------
            |             No. of    CV mean     Out-of-
            |            nonzero      pred.      sample
alpha    ID |   lambda     coef.      error   R-squared
------------+-------------------------------------------
1.000       |
         11 | 2.880996        1    25.23369      0.0559
         13 | 2.391853        3    22.54097      0.1567
         15 | 1.985759        4    20.48949      0.2334
         17 | 1.648612        5    19.04514      0.2874
         18 | 1.502154        6    18.49445      0.3080
         20 | 1.247114        7    17.63197      0.3403
         23 | .9433962        8    16.78305      0.3721
         24 | .8595875        9    16.58457      0.3795
         27 | .6502464       11    16.20348      0.3938
         29 |  .539846       13      15.941      0.4036
         31 | .4481896       16    15.67022      0.4137
         32 | .4083737       17    15.53562      0.4188
         34 |  .339039       19    15.32416      0.4267
         35 | .3089197       20    15.23112      0.4301
         36 | .2814761       19    15.16094      0.4328
         37 | .2564706       18    15.10566      0.4348
         38 | .2336864       21    15.07917      0.4358
         40 | .1940106       23    15.10997      0.4347
         41 | .1767752       23    15.12913      0.4340
         42 |  .161071       29    15.12964      0.4339
------------+-------------------------------------------
0.750       |
         49 | 3.841328        4    25.86136      0.0324
         51 | 3.189138        5    23.30185      0.1282
         53 | 2.880996        9    22.09656      0.1733
         54 | 2.625056       13     21.1115      0.2101
         59 | 1.648612       14    17.94825      0.3285
         60 | 1.502154       13    17.57849      0.3423
         63 | 1.136324       14    16.77006      0.3726
         64 | 1.035376       16    16.54948      0.3808
         65 | .9433962       17    16.35695      0.3880
         66 | .8595875       21    16.18638      0.3944
         67 | .7832241       22    16.02176      0.4006
         68 | .7136446       24    15.86705      0.4064
         70 | .5924803       26    15.59497      0.4165
         71 |  .539846       29    15.46632      0.4213
         72 | .4918876       31    15.34795      0.4258
         73 | .4481896       32    15.24319      0.4297
         74 | .4083737       33    15.15827      0.4329
         76 |  .339039       35    15.04362      0.4372
         77 | .3089197       36     15.0178      0.4381
         78 | .2814761       36    15.01447      0.4382
         79 | .2564706       36    15.02567      0.4378
         80 | .2336864       39    15.05051      0.4369
         81 | .2129264       47    15.06119      0.4365
         82 | .1940106       52    15.07523      0.4360
------------+-------------------------------------------
0.500       |
         84 | 5.761991        4    26.17688      0.0206
         85 | 5.250112        7    25.12334      0.0600
         86 | 4.783706       14    23.96692      0.1033
         87 | 4.358735       18    22.84471      0.1453
         88 | 4.215852       20    22.46586      0.1595
         89 | 3.841328       24    21.49776      0.1957
         91 | 3.189138       26     19.9409      0.2539
         93 | 2.880996       25     19.2717      0.2790
         96 | 2.179368       28     17.9138      0.3298
         97 | 1.985759       27    17.55144      0.3433
         98 | 1.809349       26      17.227      0.3555
        100 | 1.502154       27    16.69448      0.3754
        102 | 1.247114       29    16.28911      0.3906
        103 | 1.136324       32     16.0984      0.3977
        104 | 1.035376       33    15.92088      0.4043
        105 | .9433962       35    15.75425      0.4106
        106 | .8595875       37    15.59659      0.4165
        107 | .7832241       39    15.45526      0.4218
        110 | .5924803       38    15.13986      0.4336
        111 |  .539846       41    15.07284      0.4361
        112 | .4918876       42    15.02262      0.4379
        114 | .4083737       45    14.96922      0.4399
        115 | .3720949       46    14.96618      0.4401
        116 |  .339039       47    14.97837      0.4396
        117 | .3089197       52    14.98625      0.4393
        118 | .2814761       59    14.97406      0.4398
        119 | .2564706       65    14.95937      0.4403
        120 | .2336864       70    14.93461      0.4412
        121 | .2129264       73    14.88621      0.4430
        122 | .1940106       80    14.83697      0.4449
        123 | .1767752       86    14.79936      0.4463
        124 |  .161071       87    14.77189      0.4473
      * 125 | .1467619       92    14.76569      0.4476
        126 |  .133724       96    14.78648      0.4468
        127 | .1218443      106    14.84087      0.4447
        128 |   .11102      115    14.90808      0.4422
--------------------------------------------------------
* alpha and lambda selected by cross-validation.
When we examine the lassoknots output, we see that the CV function is quite flat around its minimizing α* and λ*.
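If a sparser model is preferred, lassoselect can override CV's choice with any knot in the table; a minimal sketch picking ID 115 from the output above (α = 0.5, λ = .3720949), which has nearly the same CV error with 46 rather than 92 nonzero coefficients:

lassoselect alpha = 0.5 lambda = .3720949
cvplot    // replot the CV function with the new selection marked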
Example 3: Ridge regression
We can fit a ridge regression by specifying alpha(0).
elasticnet linear q104 $idemographics $ifactors $vlcontinuous if sample == 1, rseed(1234) alpha(0)
estimates store ridge
Results:
. elasticnet linear q104 $idemographics $ifactors $vlcontinuous if sample == 1,
> rseed(1234) alpha(0) nolog
note: 1.q14 dropped because of collinearity with another variable
note: 1.q136 dropped because of collinearity with another variable

Elastic net linear model                        No. of obs        =        449
                                                No. of covariates =        275
Selection: Cross-validation                     No. of CV folds   =         10

-------------------------------------------------------------------------------
               |                                No. of  Out-of-      CV mean
               |                               nonzero   sample   prediction
alpha       ID |  Description          lambda    coef.  R-squared       error
---------------+---------------------------------------------------------------
0.000          |
             1 |  first lambda       3161.889      275     0.0036     26.82323
            88 |  lambda before      .9655953      275     0.4387     15.00168
          * 89 |  selected lambda    .8798144      275     0.4388     14.99956
            90 |  lambda after       .8016542      275     0.4386     15.00425
           100 |  last lambda        .3161889      275     0.4198     15.50644
-------------------------------------------------------------------------------
* alpha and lambda selected by cross-validation.
Ridge regression as implemented here selects λ* by CV. We can plot the CV function.
. cvplot
Example 4: Comparing elastic net, ridge regression, and lasso
In the previous examples, we fit the elastic net and ridge regression on half of the sample so that we could evaluate predictions on the other half. Let's continue with the data from examples 2 and 3 and fit a lasso.
lasso linear q104 $idemographics $ifactors $vlcontinuous if sample == 1, rseed(1234)
estimates store lasso
Results:
. lasso linear q104 $idemographics $ifactors $vlcontinuous if sample == 1,
> rseed(1234) nolog
note: 1.q14 dropped because of collinearity with another variable
note: 1.q136 dropped because of collinearity with another variable

Lasso linear model                          No. of obs        =        449
                                            No. of covariates =        275
Selection: Cross-validation                 No. of CV folds   =         10

--------------------------------------------------------------------------
         |                                No. of  Out-of-      CV mean
         |                               nonzero   sample   prediction
      ID |  Description          lambda    coef.  R-squared       error
---------+----------------------------------------------------------------
       1 |  first lambda       3.161889        0     0.0020     26.67513
      28 |  lambda before      .2564706       18     0.4348     15.10566
    * 29 |  selected lambda    .2336864       21     0.4358     15.07917
      30 |  lambda after       .2129264       21     0.4355     15.08812
      33 |  last lambda         .161071       29     0.4339     15.12964
--------------------------------------------------------------------------
* lambda selected by cross-validation.

. estimates store lasso
We stored the earlier elastic-net and ridge results in memory with estimates store, and we did the same for the lasso results. Now we can compare out-of-sample predictions using lassogof.
. lassogof elasticnet ridge lasso, over(sample)
Results:
Penalized coefficients
-------------------------------------------------------------
Name         sample |        MSE    R-squared        Obs
--------------------+----------------------------------------
elasticnet          |
           Training |    11.4881       0.5568        489
            Testing |   14.57795       0.5030        504
--------------------+----------------------------------------
ridge               |
           Training |   11.82482       0.5576        449
            Testing |   14.88123       0.4809        476
--------------------+----------------------------------------
lasso               |
           Training |   13.41709       0.4823        506
            Testing |   14.91674       0.4867        513
-------------------------------------------------------------
Based on mean squared error and R² in the testing sample, the elastic net's out-of-sample predictions beat both ridge regression and the lasso.
Note that the numbers of observations in the training and testing samples differ slightly across the models. splitsample split the sample exactly in half, with 529 observations in each half. The sample sizes differ across models because different models contain different sets of selected variables, and so the patterns of missing values differ. If you want the half samples to be exactly equal after dropping missing values, you can use splitsample with an optional varlist to omit observations with missing values in any of those variables; see [D] splitsample for details.
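A hypothetical sketch of that approach (sample2 is an invented variable name): listing the model's variables in splitsample's varlist leaves observations with missing values in any of them unassigned, so every model is then trained and tested on identical observations:

set seed 1234
splitsample q104 $idemographics $ifactors $vlcontinuous, ///
    generate(sample2) nsplit(2)
tabulate sample2   // the two halves are now the same for every model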
Before we conclude that the elastic net beats ridge regression and the lasso, we must point out that we were not fair to the lasso. Theory says that for lasso linear models, postselection coefficients provide slightly better predictions. See predict in [LASSO] lasso postestimation for details.

We run lassogof again for the lasso results, this time specifying that postselection coefficients be used.
lassogof lasso, over(sample) postselection
Results:
. lassogof lasso, over(sample) postselection

Postselection coefficients
-------------------------------------------------------------
Name         sample |        MSE    R-squared        Obs
--------------------+----------------------------------------
lasso               |
           Training |   13.14487       0.4928        506
            Testing |   14.62903       0.4966        513
-------------------------------------------------------------
We declare a tie with the elastic net!
Postselection coefficients should not be used after elasticnet, and especially not after ridge regression. Ridge regression works by shrinking the coefficient estimates, and those shrunken estimates are the ones that should be used for prediction. Because postselection coefficients are OLS regression coefficients for the selected variables, and because ridge regression always selects all variables, postselection coefficients after ridge regression are OLS regression coefficients for all potential variables, which is clearly not what we want to use for prediction.
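When generating predictions from these models, predict after elasticnet already uses the penalized coefficients by default; a minimal sketch (yhat is an invented variable name):

estimates restore elasticnet
predict yhat if sample == 2, penalized   // penalized coefficients, appropriate here
summarize yhat q104 if sample == 2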