Recap

Here’s the code you’ve written so far. Start by running it again.

# Code you have previously used to load data
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Path of the file to read. We changed the directory structure to simplify submitting to a competition
iowa_file_path = 'train.csv'
home_data = pd.read_csv(iowa_file_path)

# Create target object and call it y
y = home_data.SalePrice

# Create X
features = ['LotArea', 'YearBuilt', '1stFlrSF', '2ndFlrSF', 'FullBath', 'BedroomAbvGr', 'TotRmsAbvGrd']
X = home_data[features]

# Split into validation and training data
train_X, val_X, train_y, val_y = train_test_split(X, y, random_state=1)

# Specify Model
iowa_model = DecisionTreeRegressor(random_state=1)
# Fit Model
iowa_model.fit(train_X, train_y)

# Make validation predictions and calculate mean absolute error
val_predictions = iowa_model.predict(val_X)
val_mae = mean_absolute_error(val_y, val_predictions)
print("Validation MAE when not specifying max_leaf_nodes: {:,.0f}".format(val_mae))

# Using best value for max_leaf_nodes
iowa_model = DecisionTreeRegressor(max_leaf_nodes=100, random_state=1)
iowa_model.fit(train_X, train_y)
val_predictions = iowa_model.predict(val_X)
val_mae = mean_absolute_error(val_y, val_predictions)
print("Validation MAE for best value of max_leaf_nodes: {:,.0f}".format(val_mae))

# Define the model. Set random_state to 1
rf_model = RandomForestRegressor(random_state=1)
rf_model.fit(train_X, train_y)
rf_val_predictions = rf_model.predict(val_X)
rf_val_mae = mean_absolute_error(val_y, rf_val_predictions)
print("Validation MAE for Random Forest Model: {:,.0f}".format(rf_val_mae))

Validation MAE when not specifying max_leaf_nodes: 29,653
Validation MAE for best value of max_leaf_nodes: 27,283
Validation MAE for Random Forest Model: 22,762
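
The recap uses max_leaf_nodes=100 as the "best value" without showing how that value was found. A minimal sketch of one way to search for it is below: fit one tree per candidate size and keep the size with the lowest validation MAE. The synthetic data from make_regression, the helper get_mae, the candidate list, and the *_demo variable names are all assumptions made for illustration, so the numbers it prints will not match the housing results above.

```python
from sklearn.datasets import make_regression
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in for the housing data (an assumption for this sketch)
X_demo, y_demo = make_regression(n_samples=500, n_features=7, noise=10.0, random_state=1)
train_X_demo, val_X_demo, train_y_demo, val_y_demo = train_test_split(
    X_demo, y_demo, random_state=1
)


def get_mae(max_leaf_nodes, train_X, val_X, train_y, val_y):
    """Fit a tree with the given max_leaf_nodes and return its validation MAE."""
    model = DecisionTreeRegressor(max_leaf_nodes=max_leaf_nodes, random_state=1)
    model.fit(train_X, train_y)
    preds = model.predict(val_X)
    return mean_absolute_error(val_y, preds)


# Hypothetical candidate sizes; widen or narrow the list as needed
candidate_max_leaf_nodes = [5, 25, 50, 100, 250, 500]

# Validation MAE for each candidate tree size
scores = {
    n: get_mae(n, train_X_demo, val_X_demo, train_y_demo, val_y_demo)
    for n in candidate_max_leaf_nodes
}

# The best size is the one with the smallest validation MAE
best_tree_size = min(scores, key=scores.get)
print("Best max_leaf_nodes on the demo data:", best_tree_size)
```

With the real data you would pass train_X, val_X, train_y and val_y from the recap instead of the demo arrays.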

Creating a Model For the Competition

Build a Random Forest model and train it on all of X and y.

# To improve accuracy, create a new Random Forest model which you will train on all training data
rf_model_on_full_data = RandomForestRegressor(random_state=1)
# Fit rf_model_on_full_data on all data from the training data (X and y, not just the train split)
rf_model_on_full_data.fit(X, y)

RandomForestRegressor(bootstrap=True, criterion='mse', max_depth=None,
                      max_features='auto', max_leaf_nodes=None,
                      min_impurity_decrease=0.0, min_impurity_split=None,
                      min_samples_leaf=1, min_samples_split=2,
                      min_weight_fraction_leaf=0.0, n_estimators=10, n_jobs=None,
                      oob_score=False, random_state=1, verbose=0, warm_start=False)

Make Predictions

Read the file of “test” data and apply your model to make predictions.

# Path to the file you will use for predictions
test_data_path = 'test.csv'
# Read the test data file using pandas
test_data = pd.read_csv(test_data_path)
# Create test_X, which comes from test_data but includes only the columns you used for prediction.
# The list of columns is stored in a variable called features
test_X = test_data[features]
# Make the predictions which we will submit, using the model trained on all the data
test_preds = rf_model_on_full_data.predict(test_X)
# The lines below show how to save predictions in the format used for competition scoring
output = pd.DataFrame({'Id': test_data.Id, 'SalePrice': test_preds})
output.to_csv('submission.csv', index=False)
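
Before uploading, it can help to confirm that the saved file has the shape the code above produces: an Id column, a SalePrice column, one row per test house, and no missing predictions. The sketch below is one possible check, not part of the original exercise. The names demo_ids, demo_preds and demo_output and their values are hypothetical stand-ins, and the CSV is written to an in-memory buffer rather than to submission.csv so the example runs without the competition files.

```python
import io

import pandas as pd

# Hypothetical stand-ins for test_data.Id and test_preds
demo_ids = pd.Series([1461, 1462, 1463], name='Id')
demo_preds = [120000.0, 155000.5, 180250.0]

# Same construction as the submission code above
demo_output = pd.DataFrame({'Id': demo_ids, 'SalePrice': demo_preds})

# Write to an in-memory buffer instead of a file on disk, then read it back
buffer = io.StringIO()
demo_output.to_csv(buffer, index=False)
buffer.seek(0)
reloaded = pd.read_csv(buffer)

# Basic sanity checks on the round-tripped submission
assert list(reloaded.columns) == ['Id', 'SalePrice']
assert len(reloaded) == len(demo_ids)
assert reloaded['SalePrice'].notna().all()
print(reloaded.head())
```

To run the same checks on the real file, read it back with pd.read_csv('submission.csv') and compare the row count against len(test_data).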

Kaggle really is a good learning platform.
