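Contents

  • Project overview
    • The Boston dataset
  • Implementation
    • Data processing
      • Loading and inspecting the data
      • Splitting the data
      • Standardization
    • Training the models
      • Option 1: LinearRegression
      • Option 2: SGDRegressor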

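Project overview


The Boston dataset

Because of ethical concerns around race (the feature B is derived from the proportion of Black residents in each town), the Boston housing dataset was deprecated in scikit-learn 1.0 and removed in version 1.2.
This walkthrough therefore uses an older scikit-learn release: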

!pip3 install scikit-learn==0.24.1

On scikit-learn 1.2 and later, importing load_boston raises an ImportError whose message begins: `load_boston` has been removed from scikit-learn since version 1.2.


The dataset contains 506 samples with the following attributes:

  • CRIM (crime rate): per capita crime rate by town
  • ZN (large-lot residential zoning): proportion of residential land zoned for lots over 25,000 sq.ft.
  • INDUS (non-retail business land): proportion of non-retail business acres per town
  • CHAS (borders the Charles River): Charles River dummy variable (= 1 if tract bounds river; 0 otherwise)
  • NOX (nitric oxide concentration): nitric oxides concentration (parts per 10 million)
  • RM (rooms per dwelling): average number of rooms per dwelling
  • AGE (share of older housing): proportion of owner-occupied units built prior to 1940
  • DIS (distance to employment centres): weighted distances to five Boston employment centres
  • RAD (highway accessibility): index of accessibility to radial highways
  • TAX (property-tax rate): full-value property-tax rate per $10,000
  • PTRATIO (pupil-teacher ratio): pupil-teacher ratio by town
  • B (statistic derived from the proportion of Black residents): 1000(Bk - 0.63)^2 where Bk is the proportion of Black people by town
  • LSTAT (lower-status population): % lower status of the population
  • MEDV (median home value): median value of owner-occupied homes in $1000's
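The formula behind the B column is easier to read with numbers plugged in. The sketch below only evaluates 1000(Bk - 0.63)^2 for a few proportions; the function name b_statistic is made up for illustration and is not part of scikit-learn.

```python
def b_statistic(bk: float) -> float:
    """Evaluate the B column formula, 1000 * (Bk - 0.63)^2, for a proportion Bk in [0, 1]."""
    return 1000.0 * (bk - 0.63) ** 2


# Print B for a few illustrative proportions (hypothetical inputs, not rows from the dataset)
for bk in (0.0, 0.63, 1.0):
    print(bk, round(b_statistic(bk), 2))
```

With Bk = 0 the formula gives 396.9, which is the value 3.9690e+02 that shows up repeatedly in the B column of the data printed further down. Because the expression is a parabola centred on 0.63, two different proportions can map to the same B value, so Bk cannot be uniquely recovered from B.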

Implementation

Data processing

Loading and inspecting the data

from sklearn.datasets import load_boston
data = load_boston()
data
{'data': array([[6.3200e-03, 1.8000e+01, 2.3100e+00, ..., 1.5300e+01, 3.9690e+02, 4.9800e+00],
       [2.7310e-02, 0.0000e+00, 7.0700e+00, ..., 1.7800e+01, 3.9690e+02, 9.1400e+00],
       [2.7290e-02, 0.0000e+00, 7.0700e+00, ..., 1.7800e+01, 3.9283e+02, 4.0300e+00],
       ...,
       [4.7410e-02, 0.0000e+00, 1.1930e+01, ..., 2.1000e+01, 3.9690e+02, 7.8800e+00]]),
 'target': array([24. , 21.6, 34.7, 33.4, 36.2, 28.7, 22.9, 27.1, 16.5, 18.9, 15. ,
       18.9, 21.7, 20.4, 18.2, 19.9, 23.1, 17.5, 20.2, 18.2, 13.6, 19.6,
       ...
       23.1, 19.7, 18.3, 21.2, 17.5, 16.8, 22.4, 20.6, 23.9, 22. , 11.9]),
 'feature_names': array(['CRIM', 'ZN', 'INDUS', 'CHAS', 'NOX', 'RM', 'AGE', 'DIS', 'RAD',
       'TAX', 'PTRATIO', 'B', 'LSTAT'], dtype='<U7'),
 'DESCR': ".. _boston_dataset:

Boston house prices dataset
---------------------------

**Data Set Characteristics:**

    :Number of Instances: 506
    :Number of Attributes: 13 numeric/categorical predictive. Median Value (attribute 14) is usually the target.
    :Attribute Information (in order):
        - CRIM     per capita crime rate by town
        - ZN       proportion of residential land zoned for lots over 25,000 sq.ft.
        - INDUS    proportion of non-retail business acres per town
        - CHAS     Charles River dummy variable (= 1 if tract bounds river; 0 otherwise)
        - NOX      nitric oxides concentration (parts per 10 million)
        - RM       average number of rooms per dwelling
        - AGE      proportion of owner-occupied units built prior to 1940
        - DIS      weighted distances to five Boston employment centres
        - RAD      index of accessibility to radial highways
        - TAX      full-value property-tax rate per $10,000
        - PTRATIO  pupil-teacher ratio by town
        - B        1000(Bk - 0.63)^2 where Bk is the proportion of blacks by town
        - LSTAT    % lower status of the population
        - MEDV     Median value of owner-occupied homes in $1000's
    :Missing Attribute Values: None
    :Creator: Harrison, D. and Rubinfeld, D.L.

This is a copy of UCI ML housing dataset.
https://archive.ics.uci.edu/ml/machine-learning-databases/housing/

This dataset was taken from the StatLib library which is maintained at Carnegie Mellon University.

The Boston house-price data of Harrison, D. and Rubinfeld, D.L. 'Hedonic prices and the demand for clean air', J. Environ. Economics & Management, vol.5, 81-102, 1978. Used in Belsley, Kuh & Welsch, 'Regression diagnostics...', Wiley, 1980. N.B. Various transformations are used in the table on pages 244-261 of the latter.

The Boston house-price data has been used in many machine learning papers that address regression problems.

.. topic:: References

   - Belsley, Kuh & Welsch, 'Regression diagnostics: Identifying Influential Data and Sources of Collinearity', Wiley, 1980. 244-261.
   - Quinlan,R. (1993). Combining Instance-Based and Model-Based Learning. In Proceedings on the Tenth International Conference of Machine Learning, 236-243, University of Massachusetts, Amherst. Morgan Kaufmann.",
 'filename': '/Users/xx/opt/anaconda3/lib/python3.7/site-packages/sklearn/datasets/data/boston_house_prices.csv'}

# Show the dataset description
data.DESCR

data is an instance of sklearn.utils.Bunch, a class that inherits from dict.
In this scikit-learn version it is defined in sklearn/utils/__init__.py.


type(data)          # sklearn.utils.Bunch
list(data.keys())   # ['data', 'target', 'feature_names', 'DESCR', 'filename']
len(data.data)      # 506
data.data
# array([[6.3200e-03, 1.8000e+01, 2.3100e+00, ..., 1.5300e+01, 3.9690e+02, 4.9800e+00],
#        [2.7310e-02, 0.0000e+00, 7.0700e+00, ..., 1.7800e+01, 3.9690e+02, 9.1400e+00],
#        ...,
#        [4.7410e-02, 0.0000e+00, 1.1930e+01, ..., 2.1000e+01, 3.9690e+02, 7.8800e+00]])
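To make the "Bunch is a dict subclass" point concrete, here is a minimal sketch of the idea: a dict whose keys can also be read as attributes. MiniBunch is a hypothetical stand-in written for this example; it is not scikit-learn's actual implementation.

```python
class MiniBunch(dict):
    """A dict whose keys are also readable and writable as attributes (illustrative only)."""

    def __getattr__(self, key):
        # Called only when normal attribute lookup fails, so dict methods keep working
        try:
            return self[key]
        except KeyError:
            raise AttributeError(key)

    def __setattr__(self, key, value):
        # Store attributes as dictionary items so both access styles stay in sync
        self[key] = value


bunch = MiniBunch(data=[[6.32e-03, 18.0]], target=[24.0])
print(bunch.data is bunch["data"])   # attribute access and key access return the same object
print(list(bunch.keys()))
```

This is why both data.data and data['data'] work in the snippets above, and why list(data.keys()) behaves like it would on any dictionary.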

Splitting the data

from sklearn.model_selection import train_test_split
# No random_state is passed, so the split (and every number below) changes from run to run
X_train, X_test, y_train, y_test = train_test_split(data.data, data.target)
len(X_train), len(X_test), len(y_train), len(y_test)
# (379, 127, 379, 127)
len(X_train) / len(X_test)  # 2.984251968503937
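The 379/127 split follows from train_test_split's default test_size of 0.25. The sketch below reproduces the arithmetic under the assumption that the test count is rounded up and the training set takes the remainder, which matches the sizes reported above; default_split_sizes is a hypothetical helper, not a scikit-learn function.

```python
import math


def default_split_sizes(n_samples: int, test_size: float = 0.25) -> tuple:
    """Return (n_train, n_test), assuming the test share is rounded up."""
    n_test = math.ceil(test_size * n_samples)   # 0.25 * 506 = 126.5, rounded up
    n_train = n_samples - n_test                # the rest goes to training
    return n_train, n_test


n_train, n_test = default_split_sizes(506)
print(n_train, n_test)      # 379 127
print(n_train / n_test)     # close to, but not exactly, 3
```

Passing an explicit test_size (and a random_state, for reproducibility) to train_test_split is usually clearer than relying on the defaults.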

Standardization

from sklearn.preprocessing import StandardScaler

# Standardize the features: fit on the training set only, then reuse the same statistics on the test set
sds = StandardScaler()
x_train = sds.fit_transform(X_train)
x_test = sds.transform(X_test)
x_train
# array([[-0.40503224,  0.02669292, -0.73056566, ...,  0.22105105,  0.42774266, -0.52050075],
#        [ 1.94694906, -0.49157634,  1.01370896, ...,  0.81392993, -3.69945075,  3.11194771],
#        [-0.41724734, -0.49157634, -0.97415513, ...,  0.17544498,  0.30149725, -0.27341473],
#        ...,
#        [ 0.60505563, -0.49157634,  1.01370896, ...,  0.81392993,  0.42774266,  1.19206096]])

# StandardScaler expects 2-D input, so reshape the targets into column vectors
y_test = y_test.reshape(-1, 1)
y_train = y_train.reshape(-1, 1)
y_test
# array([[22.3], [17.4], [27.1], [22. ], ... [10.9]])

# Standardize the target with its own scaler, again fitted on the training set only
sds_y = StandardScaler()
y_train = sds_y.fit_transform(y_train)
y_test = sds_y.transform(y_test)
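What StandardScaler does here can be written out by hand: subtract the training mean and divide by the training standard deviation, column by column. The sketch below uses NumPy on synthetic data (the *_demo arrays are made up for illustration, not the Boston features) and also shows the inverse step used later to bring predictions back to the original units.

```python
import numpy as np

rng = np.random.default_rng(0)
X_train_demo = rng.normal(loc=10.0, scale=3.0, size=(100, 2))  # synthetic "training" features
X_test_demo = rng.normal(loc=10.0, scale=3.0, size=(20, 2))    # synthetic "test" features

# Statistics come from the training set only
mean = X_train_demo.mean(axis=0)
std = X_train_demo.std(axis=0)   # population standard deviation (ddof=0), as StandardScaler uses

z_train = (X_train_demo - mean) / std
z_test = (X_test_demo - mean) / std   # the test set reuses the training statistics

# Inverse transform: undo the scaling to recover the original values
restored = z_test * std + mean
print(np.allclose(restored, X_test_demo))
```

Reusing the training statistics on the test set is the reason the walkthrough calls fit_transform on the training data but only transform on the test data.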

Training the models

Option 1: LinearRegression

from sklearn.linear_model import LinearRegression, SGDRegressor

lr = LinearRegression()
lr.fit(x_train, y_train)  # LinearRegression()

# Weights estimated by the linear regression; the array has shape (n_targets, n_features), here (1, 13)
lr.coef_
# array([[-0.0612378 ,  0.16416119,  0.00767045,  0.09201928, -0.22140224,
#          0.23731323,  0.02417785, -0.34593363,  0.2620663 , -0.18835647,
#         -0.2258351 ,  0.08609841, -0.46284107]])

y_predict = lr.predict(x_test)
y_predict
# array([[ 0.52472231], [-0.59075883], [ 0.43991597], [ 0.49699826], ... [-0.80045313]])

# Map the standardized predictions back to the original units ($1000's)
y_predict_lr = sds_y.inverse_transform(y_predict)
y_predict_lr
# array([[27.56580687], [17.29643043], [26.78506015], ... [15.36593638]])
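LinearRegression fits ordinary least squares, and the same kind of solution can be sketched directly with NumPy. The example below uses synthetic, noise-free data with known weights (true_w and true_b are invented for the demonstration) and recovers them with np.linalg.lstsq; it illustrates the technique and is not a reproduction of scikit-learn's internals.

```python
import numpy as np

rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 3))        # synthetic features
true_w = np.array([0.5, -1.0, 2.0])       # made-up weights to recover
true_b = 0.3                              # made-up intercept
y_demo = X_demo @ true_w + true_b         # noise-free targets

# Append a column of ones so the intercept is estimated together with the weights
A = np.hstack([X_demo, np.ones((len(X_demo), 1))])
solution, *_ = np.linalg.lstsq(A, y_demo, rcond=None)
w_hat, b_hat = solution[:-1], solution[-1]

print(np.round(w_hat, 6), round(float(b_hat), 6))
```

In the walkthrough both the features and the target are standardized on the training set, so the fitted intercept there is essentially zero and the coefficients in lr.coef_ are directly comparable across features.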

Option 2: SGDRegressor

# SGD
sgd = SGDRegressor()
# SGDRegressor expects a 1-D target. Passing the (n_samples, 1) column vector still works but
# triggers a DataConversionWarning ("A column-vector y was passed when a 1d array was expected.
# Please change the shape of y to (n_samples, ), for example using ravel()."), so flatten it first.
sgd.fit(x_train, y_train.ravel())  # SGDRegressor()

sgd.coef_
# array([-0.04768824,  0.13608779, -0.02603741,  0.09983776, -0.17376075,
#         0.26927606,  0.00684786, -0.31192725,  0.16107088, -0.09118234,
#        -0.21531016,  0.09122253, -0.44832677])

y_predict = sgd.predict(x_test)
y_predict
# array([ 0.46684306, -0.56863864,  0.41733968,  0.50252795, -0.9806062 ,
#        -0.23795072, -1.52966828, -1.8598664 , -1.44979186, -1.6725667 ,
#        ...
#        -0.45739942, -0.42882668, -1.06541168,  0.11113028,  0.29365225,
#        -0.75703282, -0.7820252 ])

# Map the standardized predictions back to the original units ($1000's)
y_predict_sgd = sds_y.inverse_transform(y_predict)
y_predict_sgd
# array([27.03295709, 17.50007403, 26.57721759, 27.36148043, 13.70740567,
#        20.54446321,  8.65261371,  5.61273368,  9.38797442,  7.33705788,
#        ...
#        18.52416787, 18.78721513, 12.92666688, 23.75818334, 25.43852267,
#        15.76567378, 15.53558811])
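For intuition about what an SGD-based regressor does, here is a bare-bones stochastic gradient descent loop for squared error. It is a simplified sketch with a constant learning rate and no regularization, whereas SGDRegressor's defaults add an L2 penalty and a decaying learning-rate schedule; the function name, hyperparameters, and synthetic data are all assumptions made for this example.

```python
import numpy as np


def sgd_linear_regression(X, y, lr=0.01, epochs=50, seed=0):
    """Fit y ~ X @ w + b by per-sample gradient steps on the squared error."""
    rng = np.random.default_rng(seed)
    n_samples, n_features = X.shape
    w = np.zeros(n_features)
    b = 0.0
    for _ in range(epochs):
        for i in rng.permutation(n_samples):   # visit the samples in a new random order each epoch
            error = X[i] @ w + b - y[i]        # prediction error for one sample
            w -= lr * error * X[i]             # gradient of 0.5 * error**2 with respect to w
            b -= lr * error                    # gradient of 0.5 * error**2 with respect to b
    return w, b


rng = np.random.default_rng(1)
X_demo = rng.normal(size=(200, 3))             # synthetic, already roughly standardized features
true_w = np.array([0.5, -1.0, 2.0])            # made-up weights to recover
y_demo = X_demo @ true_w + 0.3

w_hat, b_hat = sgd_linear_regression(X_demo, y_demo)
print(np.round(w_hat, 3), round(float(b_hat), 3))
```

Updates like these are sensitive to feature scale, which is one reason the features are standardized before being passed to SGDRegressor.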

from sklearn.metrics import mean_squared_error
# Compare both models in the original units by undoing the target scaling first
print('LinearRegression MSE:', mean_squared_error(sds_y.inverse_transform(y_test), y_predict_lr))  # 23.811573271484313
print('SGDRegressor MSE:', mean_squared_error(sds_y.inverse_transform(y_test), y_predict_sgd))  # 23.77573271117358
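The mean squared error is simply the average of the squared differences between true and predicted values. The sketch below computes it by hand on three made-up value pairs (not taken from the run above) and also takes the square root, which brings the error back to the units of the target.

```python
import math


def mean_squared_error_manual(y_true, y_pred):
    """Average of squared differences between two equal-length sequences."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical true values and predictions, in $1000's
mse_demo = mean_squared_error_manual([24.0, 21.6, 34.7], [25.0, 21.6, 32.7])
rmse_demo = math.sqrt(mse_demo)   # root mean squared error, same units as the target
print(mse_demo, rmse_demo)
```

Read this way, an MSE of roughly 23.8 on MEDV corresponds to a root mean squared error of about 4.88, that is, a typical error of around $4,900 per home for both models.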

伊织, 2023-02-25 (Sat)
