Random forest regression in R with randomForest

Based on the article "The influence of the neighbourhood environment on peer-to-peer accommodations: A random forest regression analysis", https://doi.org/10.1016/j.jhtm.2022.02.028

Multiple linear regression and Random forest regression
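
Both models in the script that follows are fitted through R's formula interface. A formula expects predictor names joined with `+`, not a character vector wrapped in `c()`. When the names are held in a character vector, base R's `reformulate()` can build the formula. The sketch below is only an illustration on the built-in `mtcars` data; `predictors`, `demo_formula` and `lm_demo` are hypothetical names that do not appear in the article's code.

```r
# Predictor names stored as a character vector (an illustrative subset of mtcars columns)
predictors <- c("wt", "hp", "disp")

# reformulate() joins the names with "+" and attaches the response: mpg ~ wt + hp + disp
demo_formula <- reformulate(predictors, response = "mpg")
print(demo_formula)

# The resulting formula object can be passed to any formula-based fitting function
lm_demo <- lm(demo_formula, data = mtcars)
print(coef(lm_demo))
```

The same formula object could be handed to `randomForest()` in place of a formula typed out by hand.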

# Software: R 4.0.5, run in RStudio

# -*- coding: UTF-8 -*-

# This code uses all room types as the example. The code for the other Airbnb listing types (entire home/apt, private room, shared room) is the same and is omitted to avoid redundancy.

## Loading packages and data------------------------------------------------------------

library(randomForest)
library(pheatmap)

library(extrafont)

library(corrplot)

library(car)

setwd("C:/Users/Desktop/Airbnb") # Setting up the work path

Data_Airbnb <- read.csv("Airbnb_data.csv", sep = ",") # Reading data

## Multiple linear regression---------------------------------------------------------------

Lm_Airbnb <- lm(Airbnb ~ PopDen + PGDP + HPrice + Distance + BusDen + MetroDen + CaterDen + ShopDen + RecrDen + UnivDen + HotelDen + AttrDen, data = Data_Airbnb) # Modelling

summary(Lm_Airbnb) # View fitting results

lm.pred_Airbnb <- predict(Lm_Airbnb, Data_Airbnb) # Predicted results

lm.pred_Airbnb1 <- data.frame(lm.pred_Airbnb, Data_Airbnb) # Comparison of predicted and actual results

# Multicollinearity test

round(vif(Lm_Airbnb), digits = 3) # Variance inflation factor (VIF), rounded to 3 decimals

## Random forest regression--------------------------------------------------------------

set.seed(1234) # Setting up random number seeds

Rf_Airbnb <- randomForest(Airbnb ~ PopDen + PGDP + HPrice + Distance + BusDen + MetroDen + CaterDen + ShopDen + RecrDen + UnivDen + HotelDen + AttrDen, data = Data_Airbnb, ntree = 500, importance = TRUE) # Modelling

# Cross-validation

set.seed(1234)

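result <- rfcv(Data_Airbnb[, c("PopDen", "PGDP", "HPrice", "Distance", "BusDen", "MetroDen", "CaterDen", "ShopDen", "RecrDen", "UnivDen", "HotelDen", "AttrDen")], Data_Airbnb$Airbnb, cv.fold = 2, scale = "log", step = 0.5) # rfcv is a random forest cross-validation function; predictors are selected by name so that all twelve are included

result$error.cv # View the cross-validation error for each number of variables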

# Results of random forest regression

forest.pred_Airbnb <- predict(Rf_Airbnb, Data_Airbnb) # Predicted results

forest.pred_Airbnb1 <- data.frame(forest.pred_Airbnb, Data_Airbnb) # Comparison of predicted and actual results

# Plotting the cross-validation error against the number of features

opar <- par(no.readonly = TRUE)

par(lwd = 2, cex = 1, cex.axis = 1, font = 2, cex.lab = 1, tck = -.02)

plot(result$n.var, result$error.cv, type = "o", lwd = 2, font.lab = 2, font = 2, ann = FALSE, family = 'Times')

title(xlab = "Number of features", ylab = "Cross-validation error", font.lab = 2)

par(opar)

# Variable importance - %IncMSE

varImpPlot(Rf_Airbnb, family = 'Times')

dev.off()

# The partial dependencies of variables

opar <- par(no.readonly = TRUE)

partialPlot(Rf_Airbnb, Data_Airbnb, PopDen, main = " ", xlab = " ", ylab = " ", col = "black")

partialPlot(Rf_Airbnb, Data_Airbnb, PGDP, main = " ", xlab = " ", ylab = " ", col = "black") # The same applies to the other variables "Distance", "BusDen", etc.

## Comparison of multiple linear regression and random forest regression results------------

# R-value

cor(lm.pred_Airbnb, Data_Airbnb$Airbnb) # Multiple linear regression R-value

cor(forest.pred_Airbnb, Data_Airbnb$Airbnb) # Random forest regression R-value

# Mean absolute error (MAE)

MAE <- function(actual, predicted){mean(abs(actual - predicted))} # Formula to define MAE

MAE(lm.pred_Airbnb, Data_Airbnb$Airbnb) # Mean absolute error of multiple linear regression

MAE(forest.pred_Airbnb, Data_Airbnb$Airbnb) # Mean absolute error of random forest regression

dev.off()
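
The R and MAE values in the script are computed on the same rows that were used to fit both models, which tends to flatter the random forest. One possible complement is to look at out-of-bag (OOB) predictions, which `predict()` returns for a `randomForest` object when no new data is supplied. The sketch below is only an illustration under assumptions: it uses the built-in `mtcars` data as a stand-in for `Data_Airbnb`, and `rf_demo`, `pred_in`, `pred_oob` and `metrics` are hypothetical names that are not part of the article's code.

```r
library(randomForest)

# Built-in example data, used here as a stand-in for Data_Airbnb (an assumption)
data(mtcars)

set.seed(1234)
rf_demo <- randomForest(mpg ~ ., data = mtcars, ntree = 500, importance = TRUE)

# In-sample predictions: trees may have been trained on the row they predict
pred_in <- predict(rf_demo, mtcars)

# Out-of-bag predictions: with no new data, predict() averages, for each row,
# only the trees whose bootstrap sample did not contain that row
pred_oob <- predict(rf_demo)

# Same MAE definition as in the script
MAE <- function(actual, predicted) mean(abs(actual - predicted))

# Side-by-side comparison of the two kinds of prediction
metrics <- data.frame(
  prediction = c("in-sample", "out-of-bag"),
  R = c(cor(pred_in, mtcars$mpg), cor(pred_oob, mtcars$mpg)),
  MAE = c(MAE(mtcars$mpg, pred_in), MAE(mtcars$mpg, pred_oob))
)
print(metrics)
```

Applied to the Airbnb model, the equivalent call would be `predict(Rf_Airbnb)` without a data argument. The OOB figures are usually the fairer ones to set against the linear model, though a separate test set would be needed for a strict comparison.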
