ML LiR & LassoR: Evaluating house-price prediction models with linear regression and Lasso regression on the Boston house-price dataset (PCA-processed)

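Contents

Evaluating house-price prediction models with linear regression and Lasso regression on the Boston house-price dataset (PCA-processed)

Design approach

Output

Core code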


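Evaluating house-price prediction models with linear regression and Lasso regression on the Boston house-price dataset (PCA-processed)

Design approach

To be updated……

Output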

   Id  MSSubClass MSZoning  ...  SaleType  SaleCondition SalePrice
0   1          60       RL  ...        WD         Normal    208500
1   2          20       RL  ...        WD         Normal    181500
2   3          60       RL  ...        WD         Normal    223500
3   4          70       RL  ...        WD        Abnorml    140000
4   5          60       RL  ...        WD         Normal    250000

[5 rows x 81 columns]
numeric_columns 36 ['LotFrontage', 'LotArea', 'OverallQual', 'OverallCond', 'YearBuilt', 'YearRemodAdd', 'MasVnrArea', 'BsmtFinSF1', 'BsmtFinSF2', 'BsmtUnfSF', 'TotalBsmtSF', '1stFlrSF', '2ndFlrSF', 'LowQualFinSF', 'GrLivArea', 'BsmtFullBath', 'BsmtHalfBath', 'FullBath', 'HalfBath', 'BedroomAbvGr', 'KitchenAbvGr', 'TotRmsAbvGrd', 'Fireplaces', 'GarageYrBlt', 'GarageCars', 'GarageArea', 'WoodDeckSF', 'OpenPorchSF', 'EnclosedPorch', '3SsnPorch', 'ScreenPorch', 'PoolArea', 'MiscVal', 'MoSold', 'YrSold', 'SalePrice']
(1460, 36)
   LotFrontage  LotArea  OverallQual  ...  MoSold  YrSold  SalePrice
0         65.0     8450            7  ...       2    2008     208500
1         80.0     9600            6  ...       5    2007     181500
2         68.0    11250            7  ...       9    2008     223500
3         60.0     9550            7  ...       2    2006     140000
4         84.0    14260            8  ...      12    2008     250000
Number of missing values in each column, in order: 36 [259, 0, 0, 0, 0, 0, 8, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 81, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
Missing_data_Per_dict_0: (33, 0.9167, {'LotArea': 0.0, 'OverallQual': 0.0, 'OverallCond': 0.0, 'YearBuilt': 0.0, 'YearRemodAdd': 0.0, 'BsmtFinSF1': 0.0, 'BsmtFinSF2': 0.0, 'BsmtUnfSF': 0.0, 'TotalBsmtSF': 0.0, '1stFlrSF': 0.0, '2ndFlrSF': 0.0, 'LowQualFinSF': 0.0, 'GrLivArea': 0.0, 'BsmtFullBath': 0.0, 'BsmtHalfBath': 0.0, 'FullBath': 0.0, 'HalfBath': 0.0, 'BedroomAbvGr': 0.0, 'KitchenAbvGr': 0.0, 'TotRmsAbvGrd': 0.0, 'Fireplaces': 0.0, 'GarageCars': 0.0, 'GarageArea': 0.0, 'WoodDeckSF': 0.0, 'OpenPorchSF': 0.0, 'EnclosedPorch': 0.0, '3SsnPorch': 0.0, 'ScreenPorch': 0.0, 'PoolArea': 0.0, 'MiscVal': 0.0, 'MoSold': 0.0, 'YrSold': 0.0, 'SalePrice': 0.0})
Missing_data_Per_dict_Not0: (3, 0.0833, {'LotFrontage': 0.177397, 'MasVnrArea': 0.005479, 'GarageYrBlt': 0.055479})
Missing_data_Per_dict_under01: (2, 0.0556, {'MasVnrArea': 0.005479, 'GarageYrBlt': 0.055479})
Fraction of missing values in each column, in order: {'LotFrontage': 0.177397, 'MasVnrArea': 0.005479, 'GarageYrBlt': 0.055479}
data_Missing_dict {'LotFrontage': 0.1773972602739726, 'LotArea': 0.0, 'OverallQual': 0.0, 'OverallCond': 0.0, 'YearBuilt': 0.0, 'YearRemodAdd': 0.0, 'MasVnrArea': 0.005479452054794521, 'BsmtFinSF1': 0.0, 'BsmtFinSF2': 0.0, 'BsmtUnfSF': 0.0, 'TotalBsmtSF': 0.0, '1stFlrSF': 0.0, '2ndFlrSF': 0.0, 'LowQualFinSF': 0.0, 'GrLivArea': 0.0, 'BsmtFullBath': 0.0, 'BsmtHalfBath': 0.0, 'FullBath': 0.0, 'HalfBath': 0.0, 'BedroomAbvGr': 0.0, 'KitchenAbvGr': 0.0, 'TotRmsAbvGrd': 0.0, 'Fireplaces': 0.0, 'GarageYrBlt': 0.05547945205479452, 'GarageCars': 0.0, 'GarageArea': 0.0, 'WoodDeckSF': 0.0, 'OpenPorchSF': 0.0, 'EnclosedPorch': 0.0, '3SsnPorch': 0.0, 'ScreenPorch': 0.0, 'PoolArea': 0.0, 'MiscVal': 0.0, 'MoSold': 0.0, 'YrSold': 0.0, 'SalePrice': 0.0}
after dropna (1121, 36)
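The missing-value statistics and the `dropna` step above can be reproduced with a few pandas calls. The following is only a minimal sketch: the toy DataFrame and its values are made up for illustration, and the variable names (`missing_counts`, `missing_not0`, `df_clean`) are assumptions, not the author's code.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the numeric columns; the values are illustrative only.
df = pd.DataFrame({
    "LotFrontage": [65.0, 80.0, np.nan, 60.0],
    "LotArea": [8450, 9600, 11250, 9550],
    "MasVnrArea": [196.0, 0.0, 162.0, np.nan],
    "SalePrice": [208500, 181500, 223500, 140000],
})

# Number of missing values in each column, in column order.
missing_counts = df.isnull().sum().tolist()

# Fraction of missing values in each column, rounded to 6 decimals.
missing_ratio = df.isnull().mean().round(6).to_dict()

# Keep only the columns that have at least one missing value.
missing_not0 = {col: r for col, r in missing_ratio.items() if r > 0}

# Drop every row that contains a missing value in any column.
df_clean = df.dropna()

print(missing_counts)
print(missing_not0)
print("after dropna", df_clean.shape)
```

Because `dropna` removes a row when any column is missing, the number of rows dropped can be smaller than the sum of the per-column missing counts when several gaps fall in the same row.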
<class 'numpy.ndarray'>
      LotFrontage   LotArea  OverallQual  ...    MiscVal    MoSold    YrSold
0       -0.233570 -0.205885     0.570704  ...  -0.141407 -1.615345  0.153084
1        0.384834 -0.064358    -0.153825  ...  -0.141407 -0.498715 -0.596291
2       -0.109889  0.138702     0.570704  ...  -0.141407  0.990125  0.153084
3       -0.439705 -0.070512     0.570704  ...  -0.141407 -1.615345 -1.345665
4        0.549742  0.509132     1.295234  ...  -0.141407  2.106755  0.153084
...           ...       ...          ...  ...        ...       ...       ...
1116    -0.357251 -0.271480    -0.153825  ...  -0.141407  0.617915 -0.596291
1117     0.590968  0.375605    -0.153825  ...  -0.141407 -1.615345  1.651832
1118    -0.192343 -0.133030     0.570704  ...  14.947388 -0.498715  1.651832
1119    -0.109889 -0.049960    -0.878355  ...  -0.141407 -0.870925  1.651832
1120     0.178699 -0.022885    -0.878355  ...  -0.141407 -0.126505  0.153084

[1121 rows x 35 columns]
The first 10 principal components explain 63.80% of the variance in the data
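A minimal sketch of the standardization and PCA step that leads to a figure like the one above. It runs on synthetic data, and it assumes `StandardScaler` and `sklearn.decomposition.PCA`; which estimator the author actually called is not shown in the post (the Core code section quotes `TruncatedSVD`), so treat the choice here as an assumption.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the numeric feature matrix (not the real data).
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 3))
X = latent @ rng.normal(size=(3, 8)) + 0.5 * rng.normal(size=(200, 8))

# Standardize each column to zero mean and unit variance.
# fit_transform returns a NumPy array, not a DataFrame.
X_std = StandardScaler().fit_transform(X)
print(type(X_std))

# Keep the first k principal components and sum the variance they explain.
k = 3
pca = PCA(n_components=k)
X_pca = pca.fit_transform(X_std)
explained = pca.explained_variance_ratio_.sum()
print(f"The first {k} principal components explain {explained:.2%} of the variance")
```

Standardizing first matters here because the raw columns are on very different scales (areas, years, counts), and PCA would otherwise be dominated by the columns with the largest variance.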
After PCA, analysis of the first principal component -------------------------------------
[(0.16970682313415306, 'LotFrontage'), (0.1211669980146095, 'LotArea'), (0.3008665261375608, 'OverallQual'), (-0.1017783758120348, 'OverallCond'), (0.23754113423286216, 'YearBuilt'), (0.21067267847804322, 'YearRemodAdd'), (0.19125461510335365, 'MasVnrArea'), (0.14136511574315347, 'BsmtFinSF1'), (-0.013552848692716916, 'BsmtFinSF2'), (0.11439764110410199, 'BsmtUnfSF'), (0.259354275741638, 'TotalBsmtSF'), (0.2591780447881022, '1stFlrSF'), (0.11504305093601253, '2ndFlrSF'), (0.004231304806602964, 'LowQualFinSF'), (0.2877802164879641, 'GrLivArea'), (0.08317879411803167, 'BsmtFullBath'), (-0.02114280846249704, 'BsmtHalfBath'), (0.25499633884283257, 'FullBath'), (0.11080279874459822, 'HalfBath'), (0.1017767099777179, 'BedroomAbvGr'), (-0.01012145139988125, 'KitchenAbvGr'), (0.23572236584667458, 'TotRmsAbvGrd'), (0.17611466785004926, 'Fireplaces'), (0.23726651555979883, 'GarageYrBlt'), (0.2831568046802727, 'GarageCars'), (0.279827792756442, 'GarageArea'), (0.13036585867815073, 'WoodDeckSF'), (0.16664693092097654, 'OpenPorchSF'), (-0.08602539908222213, 'EnclosedPorch'), (0.010532579475601184, '3SsnPorch'), (0.02556170369869493, 'ScreenPorch'), (0.06246570190310543, 'PoolArea'), (-0.015493399959318557, 'MiscVal'), (0.028399126033275164, 'MoSold'), (-0.011129722622237775, 'YrSold')]
[(0.3008665261375608, 'OverallQual'), (0.2877802164879641, 'GrLivArea'), (0.2831568046802727, 'GarageCars'), (0.279827792756442, 'GarageArea'), (0.259354275741638, 'TotalBsmtSF'), (0.2591780447881022, '1stFlrSF'), (0.25499633884283257, 'FullBath'), (0.23754113423286216, 'YearBuilt'), (0.23726651555979883, 'GarageYrBlt'), (0.23572236584667458, 'TotRmsAbvGrd'), (0.21067267847804322, 'YearRemodAdd'), (0.19125461510335365, 'MasVnrArea'), (0.17611466785004926, 'Fireplaces'), (0.16970682313415306, 'LotFrontage'), (0.16664693092097654, 'OpenPorchSF'), (0.14136511574315347, 'BsmtFinSF1'), (0.13036585867815073, 'WoodDeckSF'), (0.1211669980146095, 'LotArea'), (0.11504305093601253, '2ndFlrSF'), (0.11439764110410199, 'BsmtUnfSF'), (0.11080279874459822, 'HalfBath'), (0.1017767099777179, 'BedroomAbvGr'), (0.08317879411803167, 'BsmtFullBath'), (0.06246570190310543, 'PoolArea'), (0.028399126033275164, 'MoSold'), (0.02556170369869493, 'ScreenPorch'), (0.010532579475601184, '3SsnPorch'), (0.004231304806602964, 'LowQualFinSF'), (-0.01012145139988125, 'KitchenAbvGr'), (-0.011129722622237775, 'YrSold'), (-0.013552848692716916, 'BsmtFinSF2'), (-0.015493399959318557, 'MiscVal'), (-0.02114280846249704, 'BsmtHalfBath'), (-0.08602539908222213, 'EnclosedPorch'), (-0.1017783758120348, 'OverallCond')]
After PCA, analysis of the second principal component -------------------------------------
[(0.037140668512444255, 'LotFrontage'), (0.005762269875424171, 'LotArea'), (-0.02265545744738413, 'OverallQual'), (0.06797580738610676, 'OverallCond'), (-0.22034458100877843, 'YearBuilt'), (-0.11769773674122082, 'YearRemodAdd'), (-0.02330741979867707, 'MasVnrArea'), (-0.26830830083400875, 'BsmtFinSF1'), (-0.06776753790369254, 'BsmtFinSF2'), (0.10349973537774373, 'BsmtUnfSF'), (-0.2014230745261159, 'TotalBsmtSF'), (-0.14501101153644946, '1stFlrSF'), (0.43960496790131565, '2ndFlrSF'), (0.11932040000909688, 'LowQualFinSF'), (0.2706724094458561, 'GrLivArea'), (-0.2741406761479087, 'BsmtFullBath'), (-0.001880261013674545, 'BsmtHalfBath'), (0.12608264523927462, 'FullBath'), (0.23358978781221817, 'HalfBath'), (0.3864399252645517, 'BedroomAbvGr'), (0.12179545892853964, 'KitchenAbvGr'), (0.3371810668951179, 'TotRmsAbvGrd'), (0.06581774146310777, 'Fireplaces'), (-0.1834261688794573, 'GarageYrBlt'), (-0.04640661259007604, 'GarageCars'), (-0.08613653500685643, 'GarageArea'), (-0.047991361825782064, 'WoodDeckSF'), (0.03130768246434415, 'OpenPorchSF'), (0.13376424222015906, 'EnclosedPorch'), (-0.02564456693744644, '3SsnPorch'), (0.04211790221668751, 'ScreenPorch'), (0.03032238859229474, 'PoolArea'), (0.04968459727862472, 'MiscVal'), (0.02754218343139985, 'MoSold'), (-0.04555808126996797, 'YrSold')]
[(0.43960496790131565, '2ndFlrSF'), (0.3864399252645517, 'BedroomAbvGr'), (0.3371810668951179, 'TotRmsAbvGrd'), (0.2706724094458561, 'GrLivArea'), (0.23358978781221817, 'HalfBath'), (0.13376424222015906, 'EnclosedPorch'), (0.12608264523927462, 'FullBath'), (0.12179545892853964, 'KitchenAbvGr'), (0.11932040000909688, 'LowQualFinSF'), (0.10349973537774373, 'BsmtUnfSF'), (0.06797580738610676, 'OverallCond'), (0.06581774146310777, 'Fireplaces'), (0.04968459727862472, 'MiscVal'), (0.04211790221668751, 'ScreenPorch'), (0.037140668512444255, 'LotFrontage'), (0.03130768246434415, 'OpenPorchSF'), (0.03032238859229474, 'PoolArea'), (0.02754218343139985, 'MoSold'), (0.005762269875424171, 'LotArea'), (-0.001880261013674545, 'BsmtHalfBath'), (-0.02265545744738413, 'OverallQual'), (-0.02330741979867707, 'MasVnrArea'), (-0.02564456693744644, '3SsnPorch'), (-0.04555808126996797, 'YrSold'), (-0.04640661259007604, 'GarageCars'), (-0.047991361825782064, 'WoodDeckSF'), (-0.06776753790369254, 'BsmtFinSF2'), (-0.08613653500685643, 'GarageArea'), (-0.11769773674122082, 'YearRemodAdd'), (-0.14501101153644946, '1stFlrSF'), (-0.1834261688794573, 'GarageYrBlt'), (-0.2014230745261159, 'TotalBsmtSF'), (-0.22034458100877843, 'YearBuilt'), (-0.26830830083400875, 'BsmtFinSF1'), (-0.2741406761479087, 'BsmtFullBath')]
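Each pair of lists above shows one principal component's weights paired with the feature names, first in column order and then sorted from largest to smallest. A sketch of how such lists could be built from a fitted `PCA` object follows; the data is synthetic and the feature names are hypothetical.

```python
import numpy as np
from sklearn.decomposition import PCA

feature_names = ["f0", "f1", "f2", "f3"]  # hypothetical names for illustration

# Synthetic, standardized data with one correlated pair of columns.
rng = np.random.default_rng(1)
X = rng.normal(size=(100, 4))
X[:, 1] += 2.0 * X[:, 0]
X = (X - X.mean(axis=0)) / X.std(axis=0)

pca = PCA(n_components=2).fit(X)

# pca.components_ has one row per component and one column per feature.
for i, component in enumerate(pca.components_, start=1):
    loadings = list(zip(component.tolist(), feature_names))  # column order
    ranked = sorted(loadings, reverse=True)                  # largest weight first
    print(f"Principal component {i}")
    print(loadings)
    print(ranked)
```

Sorting by signed weight, as in the output above, puts large negative weights at the end of the list; sorting by absolute value instead would rank features by how strongly they contribute regardless of sign.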
MSE of linear regression without PCA: 1644140595.6636596
MSE of linear regression on the first 10 PCA principal components: 1836601962.4751632
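The comparison above (linear regression on all features versus on a reduced set of principal components) can be sketched as follows. The post does not show how the MSE was evaluated, so the hold-out split used here is an assumption, as are the synthetic data and the helper name `holdout_mse`.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Synthetic regression problem in which every feature carries signal.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 12))
y = X @ rng.normal(size=12) + 0.1 * rng.normal(size=300)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

def holdout_mse(train_features, test_features):
    """Fit ordinary least squares on the training part, score on the test part."""
    model = LinearRegression().fit(train_features, y_train)
    return mean_squared_error(y_test, model.predict(test_features))

# Baseline: all original features.
mse_full = holdout_mse(X_train, X_test)

# Reduced: project onto the first 5 principal components (fit on train only).
pca = PCA(n_components=5).fit(X_train)
mse_pca = holdout_mse(pca.transform(X_train), pca.transform(X_test))

print("MSE without PCA:", mse_full)
print("MSE with 5 principal components:", mse_pca)
```

As in the output above, discarding components throws away some predictive information, so the PCA-based model can have a higher MSE than the full model.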
[1e-10, 1e-09, 1e-08, 1e-07, 1e-06, 1e-05, 0.0001, 0.001, 0.01, 0.1]
[1642818822.3530025, 1642818822.3529558, 1642818822.3524888, 1642818822.3471866, 1642818822.3005185, 1642818821.7415214, 1642818817.1179569, 1642818756.7038794, 1642818283.0732899, 1642813588.5752773]
[1e-10, 1e-09, 1e-08, 1e-07, 1e-06, 1e-05, 0.0001, 0.001, 0.01, 0.1]
[1836601962.4751682, 1836601962.4752123, 1836601962.475657, 1836601962.480097, 1836601962.5245085, 1836601962.9652405, 1836601967.4063494, 1836602011.8174434, 1836602455.9288514, 1836606882.1034737]
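The two pairs of lists above are a sweep over Lasso `alpha` values from 1e-10 to 0.1, with one MSE per value. A minimal sketch of such a sweep is shown below on synthetic data; the hold-out split and `max_iter` setting are assumptions rather than the author's settings.

```python
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Synthetic regression data (not the real data set).
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 12))
y = X @ rng.normal(size=12) + 0.1 * rng.normal(size=300)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Ten regularization strengths on a log scale: 1e-10, 1e-9, ..., 0.1.
alphas = [10.0 ** p for p in range(-10, 0)]

mses = []
for alpha in alphas:
    model = Lasso(alpha=alpha, max_iter=10000).fit(X_train, y_train)
    mses.append(mean_squared_error(y_test, model.predict(X_test)))

print(alphas)
print(mses)
```

In the output above the MSE changes only in its last few significant digits across this range, which suggests these `alpha` values are too small relative to the scale of the target to have much regularizing effect.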

Core code

PCA

class TruncatedSVD Found at: sklearn.decomposition._truncated_svd

class TruncatedSVD(TransformerMixin, BaseEstimator):
    """Dimensionality reduction using truncated SVD (aka LSA).

    This transformer performs linear dimensionality reduction by means of
    truncated singular value decomposition (SVD). Contrary to PCA, this
    estimator does not center the data before computing the singular value
    decomposition. This means it can work with sparse matrices
    efficiently.

    In particular, truncated SVD works on term count/tf-idf matrices as
    returned by the vectorizers in :mod:`sklearn.feature_extraction.text`. In
    that context, it is known as latent semantic analysis (LSA).

    This estimator supports two algorithms: a fast randomized SVD solver, and
    a "naive" algorithm that uses ARPACK as an eigensolver on `X * X.T` or
    `X.T * X`, whichever is more efficient.

LinearRegression

class LinearRegression Found at: sklearn.linear_model._base

class LinearRegression(MultiOutputMixin, RegressorMixin, LinearModel):
    """Ordinary least squares Linear Regression.

    LinearRegression fits a linear model with coefficients w = (w1, ..., wp)
    to minimize the residual sum of squares between the observed targets in
    the dataset, and the targets predicted by the linear approximation.

Lasso

class Lasso Found at: sklearn.linear_model._coordinate_descent

class Lasso(ElasticNet):
    """Linear Model trained with L1 prior as regularizer (aka the Lasso)

    The optimization objective for Lasso is::

        (1 / (2 * n_samples)) * ||y - Xw||^2_2 + alpha * ||w||_1

    Technically the Lasso model is optimizing the same objective function as
    the Elastic Net with ``l1_ratio=1.0`` (no L2 penalty).

    Read more in the :ref:`User Guide <lasso>`.
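To make the Lasso objective quoted above concrete, here is a small NumPy sketch that evaluates it directly for a given weight vector. The function name `lasso_objective` and the tiny data set are hypothetical, and the intercept is omitted for simplicity.

```python
import numpy as np

def lasso_objective(X, y, w, alpha):
    """(1 / (2 * n_samples)) * ||y - Xw||^2_2 + alpha * ||w||_1"""
    n_samples = X.shape[0]
    residual = y - X @ w
    return residual @ residual / (2 * n_samples) + alpha * np.abs(w).sum()

# Tiny noiseless example: y is generated exactly by w_true.
X = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 1.0]])
w_true = np.array([2.0, -1.0])
y = X @ w_true

# At w_true the squared-error term is zero, so only the L1 penalty remains.
obj_at_true = lasso_objective(X, y, w_true, alpha=0.5)

# At w = 0 the penalty is zero, so only the squared-error term remains.
obj_at_zero = lasso_objective(X, y, np.zeros(2), alpha=0.5)

print(obj_at_true, obj_at_zero)
```

The two terms pull in opposite directions: a larger `alpha` makes non-zero weights more expensive, which is what drives some Lasso coefficients to exactly zero.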
