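Numerical stability and activation functions: summary

  1. ReLU is prone to exploding gradients; sigmoid is prone to vanishing gradients
  2. Xavier initialization is a method for initializing model weights
  3. Adam tolerates a somewhat wider range of learning rates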
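
To make the first two points concrete, here is a minimal sketch in plain Python (no PyTorch). It assumes the standard sigmoid derivative and the Glorot uniform bound sqrt(6 / (fan_in + fan_out)); the helper names are illustrative and do not come from the original notes.

```python
import math

def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_grad(x: float) -> float:
    """Derivative of the sigmoid: s * (1 - s)."""
    s = sigmoid(x)
    return s * (1.0 - s)

def xavier_uniform_bound(fan_in: int, fan_out: int) -> float:
    """Half-width a of the U(-a, a) distribution used by Xavier (Glorot) uniform init."""
    return math.sqrt(6.0 / (fan_in + fan_out))

# The sigmoid derivative peaks at x = 0, where it equals 0.25.
max_grad = sigmoid_grad(0.0)

# Even in this best case, chaining 10 sigmoid layers scales the gradient
# by at most 0.25 ** 10, which is why deep sigmoid stacks tend to vanish.
chain_10 = max_grad ** 10

print(max_grad)
print(chain_10)
print(xavier_uniform_bound(331, 1))  # bound for a 331 -> 1 linear layer
```

The activation-derivative factor alone shrinks the gradient by roughly six orders of magnitude after ten layers; the weights would have to compensate for that.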

House price prediction demo

Downloading the data

import hashlib
import os
import tarfile
import zipfile
import requests
DATA_HUB = dict()
DATA_URL = 'http://d2l-data.s3-accelerate.amazonaws.com/'
  • The statement assert expression is equivalent to
    if not expression: raise AssertionError
def download(name, cache_dir=os.path.join('.', 'data')):
    """Download a file listed in DATA_HUB and return the local filename."""
    assert name in DATA_HUB, f"{name} does not exist in {DATA_HUB}."
    url, sha1_hash = DATA_HUB[name]
    os.makedirs(cache_dir, exist_ok=True)
    fname = os.path.join(cache_dir, url.split('/')[-1])
    if os.path.exists(fname):
        # Hash the cached file in 1 MiB chunks; skip the download if it matches
        sha1 = hashlib.sha1()
        with open(fname, 'rb') as f:
            while True:
                data = f.read(1048576)
                if not data:
                    break
                sha1.update(data)
        if sha1.hexdigest() == sha1_hash:
            return fname
    print(f'Downloading {fname} from {url}...')
    r = requests.get(url, stream=True, verify=True)
    with open(fname, 'wb') as f:
        f.write(r.content)
    return fname

def download_extract(name, folder=None):
    """Download and extract a zip/tar file."""
    fname = download(name)
    base_dir = os.path.dirname(fname)
    data_dir, ext = os.path.splitext(fname)
    if ext == '.zip':
        fp = zipfile.ZipFile(fname, 'r')
    elif ext in ('.tar', '.gz'):
        fp = tarfile.open(fname, 'r')
    else:
        assert False, 'Only zip/tar files can be extracted.'
    fp.extractall(base_dir)
    return os.path.join(base_dir, folder) if folder else data_dir

def download_all():
    """Download every file listed in DATA_HUB."""
    for name in DATA_HUB:
        download(name)
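
The cache check inside download reads the file in fixed-size chunks and compares the SHA-1 digest with the expected one. The sketch below isolates that step using only the standard library; sha1_of_file is a hypothetical helper name, and the temporary file stands in for a downloaded dataset.

```python
import hashlib
import os
import tempfile

def sha1_of_file(path: str, chunk_size: int = 1048576) -> str:
    """Hash a file chunk by chunk so a large file never sits fully in memory."""
    sha1 = hashlib.sha1()
    with open(path, 'rb') as f:
        while True:
            data = f.read(chunk_size)
            if not data:
                break
            sha1.update(data)
    return sha1.hexdigest()

# Write a small throwaway file, then hash it in 64-byte chunks and in one go.
payload = b'hello world' * 1000
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(payload)
    tmp_path = tmp.name

chunked = sha1_of_file(tmp_path, chunk_size=64)
whole = hashlib.sha1(payload).hexdigest()
os.remove(tmp_path)

print(chunked == whole)
```

Feeding the hash object chunk by chunk gives the same digest as hashing the whole payload at once, so the chunk size only affects memory use.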
import numpy as np
import pandas as pd
import torch
from torch import nn
from d2l import torch as d2l
DATA_HUB['kaggle_house_train'] = (
    DATA_URL + 'kaggle_house_pred_train.csv',
    '585e9cc93e70b39160e7921475f9bcd7d31219ce')
DATA_HUB['kaggle_house_test'] = (
    DATA_URL + 'kaggle_house_pred_test.csv',
    'fa19780a7b011d9b009e8bff8e99922a8ee2eb90')
train_data = pd.read_csv(download('kaggle_house_train'))
test_data = pd.read_csv(download('kaggle_house_test'))
print(train_data.shape)
print(test_data.shape)
(1460, 81)
(1459, 80)
# The first four features, the last two features, and the corresponding label
print(train_data.iloc[0:4,[0,1,2,3,-3,-2,-1]])
   Id  MSSubClass MSZoning  LotFrontage SaleType SaleCondition  SalePrice
0   1          60       RL         65.0       WD        Normal     208500
1   2          20       RL         80.0       WD        Normal     181500
2   3          60       RL         68.0       WD        Normal     223500
3   4          70       RL         60.0       WD       Abnorml     140000

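Feature engineering

  • Note: the mean and variance used here are computed over the whole dataset (training and test rows together); in practice the test set may not be available
# The first feature of every sample is the ID, so we drop it from the dataset; we also drop the label from the training set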
all_features = pd.concat((train_data.iloc[:,1:-1],test_data.iloc[:,1:]))
all_features.head()
MSSubClass MSZoning LotFrontage LotArea Street Alley LotShape LandContour Utilities LotConfig ... ScreenPorch PoolArea PoolQC Fence MiscFeature MiscVal MoSold YrSold SaleType SaleCondition
0 60 RL 65.0 8450 Pave NaN Reg Lvl AllPub Inside ... 0 0 NaN NaN NaN 0 2 2008 WD Normal
1 20 RL 80.0 9600 Pave NaN Reg Lvl AllPub FR2 ... 0 0 NaN NaN NaN 0 5 2007 WD Normal
2 60 RL 68.0 11250 Pave NaN IR1 Lvl AllPub Inside ... 0 0 NaN NaN NaN 0 9 2008 WD Normal
3 70 RL 60.0 9550 Pave NaN IR1 Lvl AllPub Corner ... 0 0 NaN NaN NaN 0 2 2006 WD Abnorml
4 60 RL 84.0 14260 Pave NaN IR1 Lvl AllPub FR2 ... 0 0 NaN NaN NaN 0 12 2008 WD Normal

5 rows × 79 columns

all_features.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 2919 entries, 0 to 1458
Data columns (total 79 columns):
 #   Column         Non-Null Count  Dtype
---  ------         --------------  -----
 0   MSSubClass     2919 non-null   int64
 1   MSZoning       2915 non-null   object
 2   LotFrontage    2433 non-null   float64
 3   LotArea        2919 non-null   int64
 4   Street         2919 non-null   object
 5   Alley          198 non-null    object
 6   LotShape       2919 non-null   object
 7   LandContour    2919 non-null   object
 8   Utilities      2917 non-null   object
 9   LotConfig      2919 non-null   object
 10  LandSlope      2919 non-null   object
 11  Neighborhood   2919 non-null   object
 12  Condition1     2919 non-null   object
 13  Condition2     2919 non-null   object
 14  BldgType       2919 non-null   object
 15  HouseStyle     2919 non-null   object
 16  OverallQual    2919 non-null   int64
 17  OverallCond    2919 non-null   int64
 18  YearBuilt      2919 non-null   int64
 19  YearRemodAdd   2919 non-null   int64
 20  RoofStyle      2919 non-null   object
 21  RoofMatl       2919 non-null   object
 22  Exterior1st    2918 non-null   object
 23  Exterior2nd    2918 non-null   object
 24  MasVnrType     2895 non-null   object
 25  MasVnrArea     2896 non-null   float64
 26  ExterQual      2919 non-null   object
 27  ExterCond      2919 non-null   object
 28  Foundation     2919 non-null   object
 29  BsmtQual       2838 non-null   object
 30  BsmtCond       2837 non-null   object
 31  BsmtExposure   2837 non-null   object
 32  BsmtFinType1   2840 non-null   object
 33  BsmtFinSF1     2918 non-null   float64
 34  BsmtFinType2   2839 non-null   object
 35  BsmtFinSF2     2918 non-null   float64
 36  BsmtUnfSF      2918 non-null   float64
 37  TotalBsmtSF    2918 non-null   float64
 38  Heating        2919 non-null   object
 39  HeatingQC      2919 non-null   object
 40  CentralAir     2919 non-null   object
 41  Electrical     2918 non-null   object
 42  1stFlrSF       2919 non-null   int64
 43  2ndFlrSF       2919 non-null   int64
 44  LowQualFinSF   2919 non-null   int64
 45  GrLivArea      2919 non-null   int64
 46  BsmtFullBath   2917 non-null   float64
 47  BsmtHalfBath   2917 non-null   float64
 48  FullBath       2919 non-null   int64
 49  HalfBath       2919 non-null   int64
 50  BedroomAbvGr   2919 non-null   int64
 51  KitchenAbvGr   2919 non-null   int64
 52  KitchenQual    2918 non-null   object
 53  TotRmsAbvGrd   2919 non-null   int64
 54  Functional     2917 non-null   object
 55  Fireplaces     2919 non-null   int64
 56  FireplaceQu    1499 non-null   object
 57  GarageType     2762 non-null   object
 58  GarageYrBlt    2760 non-null   float64
 59  GarageFinish   2760 non-null   object
 60  GarageCars     2918 non-null   float64
 61  GarageArea     2918 non-null   float64
 62  GarageQual     2760 non-null   object
 63  GarageCond     2760 non-null   object
 64  PavedDrive     2919 non-null   object
 65  WoodDeckSF     2919 non-null   int64
 66  OpenPorchSF    2919 non-null   int64
 67  EnclosedPorch  2919 non-null   int64
 68  3SsnPorch      2919 non-null   int64
 69  ScreenPorch    2919 non-null   int64
 70  PoolArea       2919 non-null   int64
 71  PoolQC         10 non-null     object
 72  Fence          571 non-null    object
 73  MiscFeature    105 non-null    object
 74  MiscVal        2919 non-null   int64
 75  MoSold         2919 non-null   int64
 76  YrSold         2919 non-null   int64
 77  SaleType       2918 non-null   object
 78  SaleCondition  2919 non-null   object
dtypes: float64(11), int64(25), object(43)
memory usage: 1.8+ MB
# Number of columns that contain missing values
all_features.isnull().any(axis=0).sum()
34
# Number of rows that contain missing values
all_features.isnull().any(axis=1).sum()
2919
# Standardize the numeric features to zero mean and unit variance, then replace each missing value with the feature mean (which is 0 after standardization)
numeric_features = all_features.dtypes[all_features.dtypes != "object"].index  # in pandas, object is the dtype used for strings
all_features[numeric_features] = all_features[numeric_features].apply(
    lambda x: (x - x.mean()) / x.std())  # applied to each column
all_features[numeric_features] = all_features[numeric_features].fillna(0)
# Check the number of columns with missing values again
all_features.isnull().any(axis=0).sum()
23
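
As the note at the top of this section says, the statistics here come from training and test rows together. When the test set is not available up front, one common alternative is to compute the mean and standard deviation on the training column only and reuse them for the test column. The sketch below shows that idea in plain Python under stated assumptions: None marks a missing value, the numbers are toy values rather than the real data, and fit_standardizer and transform are hypothetical helper names.

```python
from statistics import mean, stdev

def fit_standardizer(values):
    """Return (mean, std) over the non-missing entries of one column."""
    present = [v for v in values if v is not None]
    return mean(present), stdev(present)  # sample std (ddof=1), as in pandas' default

def transform(values, mu, sigma):
    """Standardize, then fill missing entries with 0, the mean after scaling."""
    return [0.0 if v is None else (v - mu) / sigma for v in values]

train_col = [65.0, 80.0, None, 60.0, 84.0]  # toy values
test_col = [70.0, None, 90.0]

mu, sigma = fit_standardizer(train_col)       # statistics from the training column only
train_scaled = transform(train_col, mu, sigma)
test_scaled = transform(test_col, mu, sigma)  # the test column reuses them

print(train_scaled)
print(test_scaled)
```

Note the parentheses in (v - mu) / sigma: without them, division binds tighter than subtraction and the column would only be shifted by mu / sigma instead of standardized.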


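# Handle the string columns with one-hot encoding
# dtype=float keeps the dummy columns numeric; recent pandas versions return bool columns by default
all_features = pd.get_dummies(all_features, dummy_na=True, dtype=float)
all_features.shape
(2919, 331)
# Check the number of columns with missing values again
all_features.isnull().any(axis=0).sum()
0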
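
The count drops to 0 because dummy_na=True turns a missing string into its own indicator column instead of leaving a NaN behind. The hand-rolled sketch below imitates that behavior in plain Python for a single column; it is an illustration, not pandas' implementation, and one_hot is a hypothetical helper with None standing in for a missing value.

```python
def one_hot(values, prefix, dummy_na=True):
    """One-hot encode one column; optionally add an indicator for missing values."""
    categories = sorted({v for v in values if v is not None})
    columns = {
        f'{prefix}_{c}': [1.0 if v == c else 0.0 for v in values]
        for c in categories
    }
    if dummy_na:
        # Missing values get their own column rather than staying missing.
        columns[f'{prefix}_nan'] = [1.0 if v is None else 0.0 for v in values]
    return columns

encoded = one_hot(['RL', 'RM', None, 'RL'], prefix='MSZoning')
for name, col in encoded.items():
    print(name, col)
```

Every row ends up with exactly one 1.0 across the generated columns, including the row whose value was missing.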

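Converting to tensors

# Extract the NumPy arrays from the pandas objects and convert them to tensors
# Be sure to cast to float32, the dtype tensors most commonly use
n_train = train_data.shape[0]  # number of rows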
train_features = torch.tensor(all_features[:n_train].values, dtype=torch.float32)
train_features.shape
torch.Size([1460, 331])
test_features = torch.tensor(all_features[n_train:].values, dtype=torch.float32)
train_labels = torch.tensor(train_data.SalePrice.values.reshape(-1,1), dtype=torch.float32)
# If the training labels are not reshaped into an (n, 1) matrix, training emits a warning

Model and training

Model

loss = nn.MSELoss()
in_features = train_features.shape[1]

def get_net():  # plain linear regression
    net = nn.Sequential(nn.Linear(in_features=in_features, out_features=1))
    return net


def log_rmse(net, features, labels):
    # Clamp predictions to at least 1 so that the logarithm stays well defined
    clipped_preds = torch.clamp(net(features), 1, float('inf'))
    rmse = torch.sqrt(loss(torch.log(clipped_preds), torch.log(labels)))
    return rmse.item()
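
The point of measuring error in log space is that it depends on the ratio between prediction and label rather than on their absolute difference. A minimal plain-Python sketch of the same metric follows; it works on lists instead of tensors, the prices are made up for illustration, and log_rmse_lists is a hypothetical name chosen to avoid clashing with the function above.

```python
import math

def log_rmse_lists(preds, labels):
    """RMSE between log(prediction) and log(label); predictions are clamped to >= 1."""
    clipped = [max(p, 1.0) for p in preds]
    squared = [(math.log(p) - math.log(y)) ** 2 for p, y in zip(clipped, labels)]
    return math.sqrt(sum(squared) / len(squared))

# A 10% overestimate costs the same for a cheap and an expensive house.
cheap = log_rmse_lists([110_000.0], [100_000.0])
expensive = log_rmse_lists([1_100_000.0], [1_000_000.0])
print(cheap, expensive)

# A non-positive prediction is clamped to 1 before taking the logarithm.
print(log_rmse_lists([-5.0], [1.0]))
```

Both errors equal log(1.1) up to floating-point rounding, whereas a plain RMSE on prices would rate the second mistake ten times worse.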

Training function

def train(net, train_features, train_labels, test_features, test_labels,
          num_epochs, learning_rate, weight_decay, batch_size):
    train_ls, test_ls = [], []
    train_iter = d2l.load_array((train_features, train_labels), batch_size)
    optimizer = torch.optim.Adam(net.parameters(), lr=learning_rate,
                                 weight_decay=weight_decay)
    for epoch in range(num_epochs):
        for X, y in train_iter:
            optimizer.zero_grad()
            l = loss(net(X), y)
            l.backward()
            optimizer.step()
        # Record the log RMSE once per epoch
        train_ls.append(log_rmse(net, train_features, train_labels))
        if test_labels is not None:
            test_ls.append(log_rmse(net, test_features, test_labels))
    return train_ls, test_ls

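K-fold cross-validation

Note: the validation set here is split off from the training set

slice(1,4) # the built-in slice object; it behaves exactly like slice syntax on Python sequences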
slice(1, 4, None)
def get_k_fold_data(k, i, X, y):
    assert k > 1
    fold_size = X.shape[0] // k
    X_train, y_train = None, None
    for j in range(k):
        idx = slice(j * fold_size, (j + 1) * fold_size)
        X_part, y_part = X[idx, :], y[idx]
        if j == i:
            X_valid, y_valid = X_part, y_part
        elif X_train is None:
            X_train, y_train = X_part, y_part
        else:
            X_train = torch.cat([X_train, X_part], 0)  # concatenate along rows
            y_train = torch.cat([y_train, y_part], 0)
    return X_train, y_train, X_valid, y_valid

def k_fold(k, X_train, y_train, num_epochs, learning_rate, weight_decay,
           batch_size):
    train_l_sum, valid_l_sum = 0, 0
    for i in range(k):
        data = get_k_fold_data(k, i, X_train, y_train)
        net = get_net()
        train_ls, valid_ls = train(net, *data, num_epochs, learning_rate,
                                   weight_decay, batch_size)
        train_l_sum += train_ls[-1]
        valid_l_sum += valid_ls[-1]
        if i == 0:
            d2l.plot(list(range(1, num_epochs + 1)), [train_ls, valid_ls],
                     xlabel='epoch', ylabel='rmse', xlim=[1, num_epochs],
                     legend=['train', 'valid'], yscale='log')
        print(f'fold {i + 1}, train log rmse {float(train_ls[-1]):f}, '
              f'valid log rmse {float(valid_ls[-1]):f}')
    return train_l_sum / k, valid_l_sum / k
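
To see which rows each fold covers, the slicing logic of get_k_fold_data can be replayed on a plain list. This is a sketch with a toy 11-row dataset; k_fold_slices is a hypothetical helper that only mirrors the index arithmetic, not the tensor handling.

```python
def k_fold_slices(n, k):
    """Return the k contiguous validation slices for n rows split into k folds."""
    assert k > 1
    fold_size = n // k
    return [slice(j * fold_size, (j + 1) * fold_size) for j in range(k)]

rows = list(range(11))  # toy dataset with 11 rows
folds = k_fold_slices(len(rows), 3)

for i, idx in enumerate(folds):
    valid = rows[idx]
    # Training rows are every fold except the i-th one, concatenated in order.
    train_rows = [r for j, s in enumerate(folds) if j != i for r in rows[s]]
    print(i, valid, train_rows)

# Rows that fall outside every fold because of the integer division.
used = {r for s in folds for r in rows[s]}
print(sorted(set(rows) - used))
```

Because fold_size uses integer division, any remainder rows at the end are never used. With the 1460 training rows here and k = 5 the fold size is 292 and nothing is dropped.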

Model selection

Tune the hyperparameters and the model architecture

k, num_epochs, lr, weight_decay, batch_size = 5, 100, 0.03, 0.01, 64
# lr can be this large because the Adam optimizer is used; it tolerates a wider range of learning rates
train_l, valid_l = k_fold(k, train_features, train_labels, num_epochs, lr,
                          weight_decay, batch_size)
print(f'{k}-fold validation: avg train log rmse: {float(train_l):f}, '
      f'avg valid log rmse: {float(valid_l):f}')
fold 1, train log rmse 0.258938, valid log rmse 0.234451
fold 2, train log rmse 0.250196, valid log rmse 0.281742
fold 3, train log rmse 0.255146, valid log rmse 0.254656
fold 4, train log rmse 0.255734, valid log rmse 0.259894
fold 5, train log rmse 0.252717, valid log rmse 0.259565
5-fold validation: avg train log rmse: 0.254546, avg valid log rmse: 0.258062
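
One way to see why Adam is less sensitive to the learning rate: the update divides the gradient by a running estimate of its own magnitude, so the step size stays close to lr whatever the gradient scale. The sketch below performs a single Adam step on a scalar parameter in plain Python. It assumes the usual default hyperparameters (beta1 = 0.9, beta2 = 0.999, eps = 1e-8) and leaves out weight decay; adam_step is an illustrative helper, not the PyTorch implementation.

```python
import math

def adam_step(param, grad, m, v, t, lr=0.03, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update for a single scalar parameter (no weight decay)."""
    m = beta1 * m + (1 - beta1) * grad           # running mean of gradients
    v = beta2 * v + (1 - beta2) * grad * grad    # running mean of squared gradients
    m_hat = m / (1 - beta1 ** t)                 # bias correction
    v_hat = v / (1 - beta2 ** t)
    param = param - lr * m_hat / (math.sqrt(v_hat) + eps)
    return param, m, v

# Gradients spanning six orders of magnitude give nearly the same first step.
for grad in (0.001, 1.0, 1000.0):
    new_param, _, _ = adam_step(param=0.0, grad=grad, m=0.0, v=0.0, t=1)
    print(grad, new_param)
```

On the first step the update is approximately -lr * sign(grad), so each of the three cases moves the parameter by about 0.03. Plain SGD with the same lr would move it by lr * grad, which differs by a factor of a million between the smallest and largest gradient.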

Submitting predictions to Kaggle

def train_and_pred(train_features, test_features, train_labels, test_data,
                   num_epochs, lr, weight_decay, batch_size):
    net = get_net()
    # Train on the full training set; no validation data is passed here
    train_ls, _ = train(net, train_features, train_labels, None, None,
                        num_epochs, lr, weight_decay, batch_size)
    d2l.plot(np.arange(1, num_epochs + 1), [train_ls], xlabel='epoch',
             ylabel='log rmse', xlim=[1, num_epochs], yscale='log')
    print(f'train log rmse {float(train_ls[-1]):f}')  # :f prints six decimal places
    preds = net(test_features).detach().numpy()
    test_data['SalePrice'] = pd.Series(preds.reshape(1, -1)[0])
    # print(test_data['SalePrice'])
    submission = pd.concat([test_data['Id'], test_data['SalePrice']], axis=1)
    submission.to_csv('submission.csv', index=False)

train_and_pred(train_features, test_features, train_labels, test_data,
               num_epochs, lr, weight_decay, batch_size)
train log rmse 0.245931
