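Good morning, everyone. My surname is Wu, and you are welcome to call me Teacher Wu. Join me in exploring the world of data analysis and learning together!

If you are interested, you can follow me or my data analysis column, where I share more articles.

I originally had little exposure to data analysis. I used to work with my advisor on sentiment analysis, including text sentiment and multimodal sentiment classification. My first real contact with data analysis came from a course project: I had chosen facial emotion classification, which I had worked on before, while a friend chose second-hand house price prediction. After a few twists and turns, I ended up taking on both topics.

This article is aimed at readers who want to learn data analysis or work in the field. It is all practical content, so consider bookmarking it.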

Project source: https://www.kaggle.com/c/house-prices-advanced-regression-techniques

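1 Purpose of the analysis

Using second-hand housing transaction records from Lianjia (Beijing) covering 2002 to 2018, we explore a price-estimation system for Lianjia second-hand homes.

2 Data collection

The dataset contains 26 fields and 318,852 rows.
A detailed description of the data follows: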

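3 Data cleaning

3.1 Selecting a subset

The url, longitude (Lng), latitude (Lat), id and Cid columns are irrelevant to the price prediction model, so they can be dropped.

totalPrice equals price multiplied by square, i.e. the total price of a home is the price per square meter times the floor area. To keep the prediction useful, we keep only one of totalPrice and price: we drop totalPrice and use price as the label for modeling. (If totalPrice stayed in as a feature, knowing square would be enough to recover the unit price, and the other features would be pointless.)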

import pandas as pd

# Read the CSV file
erhouse = pd.read_csv('./csv_files/new.csv', encoding='gb2312', low_memory=False)
# Drop the columns that are irrelevant to the model
erhouse1 = erhouse.drop(['url', 'id', 'Lng', 'Lat', 'Cid', 'totalPrice'], axis=1)
# Save the result
erhouse1.to_csv('./csv_files/new_3_1.csv', encoding='gb2312', index=False)

3.2 Removing duplicates

import pandas as pd

data = pd.read_csv('./csv_files/new_3_1.csv', encoding='gb2312')
print(list(data.keys()))  # column names
# Drop every row that repeats an earlier row in all columns
newdata = data.drop_duplicates(subset=list(data.keys()), keep='first')
newdata.to_csv('./csv_files/new_3_2.csv', encoding='gb2312', index=False)
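To make the effect of keep='first' concrete, here is a minimal sketch on a made-up three-row frame. The toy values are hypothetical and are not taken from the Lianjia data. Passing every column as subset, as above, behaves the same as the default.

```python
import pandas as pd

# Hypothetical toy data: rows 0 and 2 are identical in every column
toy = pd.DataFrame({
    'square': [60.0, 85.5, 60.0],
    'price': [41000, 52000, 41000],
})

# keep='first' retains the first occurrence and drops the later copies
deduped = toy.drop_duplicates(keep='first')
n_removed = len(toy) - len(deduped)
print(deduped)
print('rows removed:', n_removed)
```

Comparing the row count before and after is a cheap sanity check worth printing on the real data too.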

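3.3 Making the data consistent

All values of 未知 (unknown) are replaced with NaN.
To make modeling easier, floor is converted to a numeric code: 高 (high) = 1, 中 (middle) = 2, 低 (low) = 3, 底 (bottom) = 4, 顶 (top) = 5. The values 钢混结构 (steel-concrete structure) and 混合结构 (mixed structure) have nothing to do with the floor, so they are replaced with NaN as well.
Everything is converted to numeric form so that later modeling is not hampered.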

import pandas as pd
import numpy as np

erhouse = pd.read_csv('./csv_files/new_3_2.csv', encoding='gb2312', low_memory=False)

# Drop the rows whose livingRoom value contains 'NAME', i.e. corrupted rows
delete_row = []
row = 0
for livingRoom in erhouse['livingRoom']:
    if 'NAME' in str(livingRoom):
        delete_row.append(row)
    row += 1
# [92234,92250,92266,92269,92296,92298,92299,92303,92339,92348,92355,92397,92408,92413,92466,92519,92609,92659,92813,92844,92898,113271,141371,208208,220560,220562,220563,224329,243711,244034,245374]
erhouse.drop(erhouse.index[delete_row], inplace=True)

# Convert the Chinese floor labels to numeric codes
# (na=False treats missing or non-string values as "no match")
erhouse.loc[erhouse['floor'].str.contains('高', na=False), 'floor'] = "1"
erhouse.loc[erhouse['floor'].str.contains('中', na=False), 'floor'] = "2"
erhouse.loc[erhouse['floor'].str.contains('低', na=False), 'floor'] = "3"
erhouse.loc[erhouse['floor'].str.contains('底', na=False), 'floor'] = "4"
erhouse.loc[erhouse['floor'].str.contains('顶', na=False), 'floor'] = "5"
# 'unknown' and structure descriptions carry no floor information
erhouse.loc[erhouse['floor'].str.contains('未知|结构', na=False), 'floor'] = np.nan
erhouse.loc[erhouse['constructionTime'].str.contains('未知', na=False), 'constructionTime'] = np.nan
erhouse.to_csv('./csv_files/new_3_3.csv', encoding='gb2312', index=False)
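The same floor encoding can also be written as a single lookup. The sketch below is an alternative, not the code used in this project. It assumes floor strings look like '高 26' (a level character followed by the number of floors); the helper name encode_floor and the toy values are hypothetical.

```python
import numpy as np
import pandas as pd

# Level character -> numeric code, as described in the text
FLOOR_CODES = {'高': 1, '中': 2, '低': 3, '底': 4, '顶': 5}

def encode_floor(floor: pd.Series) -> pd.Series:
    """Map the first level character found in each value to its code; anything else becomes NaN."""
    level = floor.astype(str).str.extract(r'([高中低底顶])', expand=False)
    return level.map(FLOOR_CODES)

# Hypothetical sample values, including 'unknown', a structure description and a missing value
toy = pd.Series(['高 26', '中 6', '低 18', '底 5', '顶 6', '未知 3', '钢混结构', np.nan])
codes = encode_floor(toy)
print(codes.tolist())
```

Values without a level character, such as 未知 or 钢混结构, fall through to NaN automatically, so no separate replacement step is needed.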

3.4 Removing outliers

First, look at the number of records per year:

import pandas as pd
import matplotlib.pyplot as plt

erhouse = pd.read_csv('./csv_files/new_3_3.csv', encoding='gb2312')

# Count the rows whose tradeTime contains each year
years = ['2002', '2003', '2008', '2009', '2010', '2011', '2012', '2013', '2014', '2015', '2016', '2017', '2018']
year = ['Before 2002'] + years
# The 'Before 2002' bucket is left empty, so its count is 0
year_count = [0] + [int(erhouse['tradeTime'].str.contains(y).sum()) for y in years]
for a, b in zip(year, year_count):
    print(a + ": " + str(b))

# Bar chart of the counts
plt.bar(year, year_count)
plt.xlabel("Year")  # axis labels
plt.ylabel("Count")
plt.title('Number of records per year')
# Show the value above each bar
for a, b in zip(year, year_count):
    plt.text(a, b + 0.05, '%.0f' % b, ha='center', va='bottom', fontsize=8)
plt.show()

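The chart of years and their record counts is as follows:

The chart shows that there is very little data from before 2010. It has almost no influence on price, so we delete it.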

# Drop the rows traded in 2002, 2003, 2008 or 2009
delete_row = []
row = 0
for tradeTime in erhouse['tradeTime']:
    tradeTime = str(tradeTime)
    if '2002' in tradeTime or '2003' in tradeTime or '2008' in tradeTime or '2009' in tradeTime:
        delete_row.append(row)
    row += 1
# [94478,126822,126994,223691,93175,296584]
erhouse.drop(erhouse.index[delete_row], inplace=True)
erhouse.to_csv('./csv_files/new_3_4.csv', encoding='gb2312', index=False)
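The year filter can also be done on parsed dates instead of substring checks. This is a minimal sketch, assuming tradeTime holds strings in YYYY-MM-DD form; the toy rows are made up.

```python
import pandas as pd

# Hypothetical toy data in the assumed YYYY-MM-DD format
toy = pd.DataFrame({
    'tradeTime': ['2002-06-01', '2009-12-31', '2010-01-01', '2016-08-09'],
    'price': [10000, 21000, 23000, 60000],
})

# Parse the dates once, then work with the year as a number
trade_year = pd.to_datetime(toy['tradeTime']).dt.year
year_counts = trade_year.value_counts().sort_index()
kept = toy[trade_year >= 2010].reset_index(drop=True)

print(year_counts)
print(kept)
```

Comparing years numerically avoids accidental substring matches and gives the per-year counts in one call.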

3.5 Loading the data

import pandas as pd

# Load the dataset
erhouse = pd.read_csv('./csv_files/new_3_4.csv', encoding='gb2312')
# Check the column names
print(erhouse.columns)

3.6 Inspecting the data

Our goal is to predict the sale price from these features, so the feature study below revolves around price.

import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Load the dataset
erhouse = pd.read_csv('./csv_files/new_3_4.csv', encoding='gb2312')
print('The data size is : {} '.format(erhouse.shape))

# distplot is deprecated in newer seaborn releases; sns.histplot is its replacement
sns.distplot(erhouse['price'], kde=False, norm_hist=False)
plt.show()

# Skewness and kurtosis
print('Skewness: %f' % erhouse['price'].skew())
print('Kurtosis: %f' % erhouse['price'].kurt())

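The dataset has 318,788 rows and 20 columns.

Distribution of price:

The plot shows that prices are concentrated in the 20,000-50,000 range, with a skewness of 1.303165 and a kurtosis of 2.173967.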
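To get a feel for what these two numbers mean, the sketch below compares a symmetric sample with a right-skewed one. Both samples are synthetic and the parameters are arbitrary assumptions. pandas reports excess kurtosis, so a normal distribution scores roughly 0 on both measures.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
# Synthetic samples: one roughly normal, one with a long right tail
symmetric = pd.Series(rng.normal(loc=40000, scale=5000, size=10000))
right_skewed = pd.Series(rng.lognormal(mean=10.5, sigma=0.5, size=10000))

for name, s in [('symmetric', symmetric), ('right-skewed', right_skewed)]:
    print('%s: skewness=%.3f, kurtosis=%.3f' % (name, s.skew(), s.kurt()))
```

A clearly positive skewness, like the value reported for price, indicates a long tail toward high prices.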

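3.7 Handling outliers

Outliers are values that do not plausibly belong in the dataset; they are noise in the sample.

We have already dealt with outliers in the time feature. Next we explore the other features to reduce the noise as much as possible.

Outliers can be spotted with descriptive statistics or with distribution-based methods such as the normal distribution; the exact cut-off depends on experience and on the situation at hand.

Ways to handle outliers: delete them (the simplest and bluntest option, suitable when there are few outliers, and the one applied below to the handful of extreme points; deleting many rows distorts the overall distribution); treat them as missing values; replace them with the mean, median or a similar statistic; or leave them alone.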
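For comparison with deletion, here is a minimal sketch of the "replace with the median" option, using the common 1.5 x IQR rule to flag outliers. The rule, the helper name replace_outliers_with_median and the toy values are assumptions for illustration; the project itself picks its outliers by eye from scatter plots.

```python
import pandas as pd

def replace_outliers_with_median(s: pd.Series, k: float = 1.5):
    """Flag values outside [Q1 - k*IQR, Q3 + k*IQR] and replace them with the median of the rest."""
    q1, q3 = s.quantile(0.25), s.quantile(0.75)
    iqr = q3 - q1
    is_outlier = (s < q1 - k * iqr) | (s > q3 + k * iqr)
    inlier_median = float(s[~is_outlier].median())
    return s.mask(is_outlier, inlier_median), is_outlier

# Hypothetical days-on-market values with one extreme point
toy = pd.Series([30, 45, 28, 60, 33, 1677, 41, 52])
cleaned, is_outlier = replace_outliers_with_median(toy)
print(cleaned.tolist())
print('outliers found:', int(is_outlier.sum()))
```

Replacing rather than deleting keeps the row count unchanged, which matters when the other columns of the row are still informative.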

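We mainly study how each feature relates to price, and outliers are examined in the light of that relationship.

First, a heatmap shows the relationships among the features and between each feature and the sale price.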

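import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Load the dataset
erhouse = pd.read_csv('./csv_files/new_3_4.csv', encoding='gb2312')

# Correlation matrix of the numeric columns (non-numeric columns such as tradeTime are skipped)
corrmat = erhouse.select_dtypes(include='number').corr()
sns.heatmap(corrmat, vmax=.8, square=True)
plt.show()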

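This makes the strength of the correlations between features, and between features and price, easy to see.

The heatmap shows that some features correlate strongly with price and others only weakly. To find outliers more effectively, we examine the relationship between price and the features one group at a time.

The relationships between price and DOM, followers and square are plotted below: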

sns.set()
cols = ['DOM', 'followers', 'square', 'price']
sns.pairplot(erhouse[cols], height=2.5)
plt.show()

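These plots show that the features are fairly closely related to each other and to price, but all three features have obvious outliers.

Next, we look more closely at each of these features against price.

# price vs. DOM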
var = 'DOM'
data = pd.concat([erhouse['price'], erhouse[var]], axis=1)
data.plot.scatter(x=var, y='price', ylim=(0,150000))
plt.show()

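The plot suggests that as DOM grows, the relationship with price weakens and the points drift away from the main cluster.

The few right-most points lie far from the cluster and do not follow the overall trend, so they are clearly outliers. Deleting too many points, however, may end up causing overfitting, so we delete only the right-most value.

# Inspect the two rows with the largest DOM
print(erhouse.sort_values(by='DOM', ascending=False)[:2])
# Drop the rows with the extreme value DOM == 1677
erhouse = erhouse.drop(erhouse[erhouse['DOM'] == 1677].index)

After the deletion, look again:

# price vs. followers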
var = 'followers'
data = pd.concat([erhouse['price'], erhouse[var]], axis=1)
data.plot.scatter(x=var, y='price', ylim=(0,150000))
plt.show()

The outliers in this plot are not pronounced, so nothing is deleted.

# price vs. square
var = 'square'
data = pd.concat([erhouse['price'], erhouse[var]], axis=1)
data.plot.scatter(x=var, y='price', ylim=(0,150000))
plt.show()

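Again, we delete only the right-most, most conspicuous point.

# Inspect the two rows with the largest square
print(erhouse.sort_values(by='square', ascending=False)[:2])
# Drop the rows with the extreme value square == 1745.5
erhouse = erhouse.drop(erhouse[erhouse['square'] == 1745.5].index)

After the deletion:

Relationship between price and livingRoom, drawingRoom, kitchen, bathRoom and floor: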

sns.set()
cols = [ 'livingRoom', 'drawingRoom', 'kitchen', 'bathRoom', 'floor', 'price']
sns.pairplot(erhouse[cols], height = 2.5)
plt.show()

There is no obvious trend, and no obvious outliers either.

# price vs. buildingType
var = 'buildingType'
data = pd.concat([erhouse['price'], erhouse[var]], axis=1)
data.plot.scatter(x=var, y='price', ylim=(0,150000))
plt.show()

The outliers in this plot are not pronounced, so nothing is deleted.

# price vs. renovationCondition
var = 'renovationCondition'
data = pd.concat([erhouse['price'], erhouse[var]], axis=1)
data.plot.scatter(x=var, y='price', ylim=(0,150000))
plt.show()

Nothing is deleted.

# price vs. buildingStructure
var = 'buildingStructure'
data = pd.concat([erhouse['price'], erhouse[var]], axis=1)
data.plot.scatter(x=var, y='price', ylim=(0,150000))
plt.show()

Nothing is deleted.

# price vs. ladderRatio
var = 'ladderRatio'
data = pd.concat([erhouse['price'], erhouse[var]], axis=1)
data.plot.scatter(x=var, y='price', ylim=(0,150000))
plt.show()

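The outliers in ladderRatio are fairly obvious. Again, we delete only the two right-most, most conspicuous points.

# Inspect the two rows with the largest ladderRatio
print(erhouse.sort_values(by='ladderRatio', ascending=False)[:2])
# Both points have ladderRatio == 10009400
erhouse = erhouse.drop(erhouse[erhouse['ladderRatio'] == 10009400].index)

After the deletion, look again:

This is acceptable; the remaining points are left alone.

Relationship between price and elevator, fiveYearsProperty, subway, district and communityAverage: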

sns.set()
cols = [ 'elevator', 'fiveYearsProperty', 'subway', 'district', 'communityAverage', 'price']
sns.pairplot(erhouse[cols], height = 2.5)
plt.show()

No obvious outliers are visible, so nothing is done.

How much data is left at this point:

print(erhouse.shape)
erhouse.to_csv('./csv_files/new_3_7.csv', encoding='gb2312', index=False)

The output is (318784, 20).

3.8 Processing the target variable

Next we look at the distribution of price with a histogram:

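import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

# Load the dataset saved at the end of section 3.7
erhouse = pd.read_csv('./csv_files/new_3_7.csv', encoding='gb2312')

# distplot is deprecated in newer seaborn releases; sns.histplot is its replacement
sns.distplot(erhouse['price'], kde=False, norm_hist=False)
plt.show()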

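The sale price is skewed to the right and is not normally distributed. Many regression models fit better when the target is closer to a normal distribution, so we apply a log transform (np.log) to correct the skew.

erhouse['price'] = np.log(erhouse['price'])
# Plot the distribution again
sns.distplot(erhouse['price'], kde=False, norm_hist=False)
plt.show()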
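One consequence of training on the log of the price is that the model's predictions are on the log scale too, so they have to be converted back with the inverse function before they can be read as prices. A minimal sketch with made-up unit prices:

```python
import numpy as np

# Hypothetical unit prices
prices = np.array([20000.0, 35000.0, 50000.0, 150000.0])

# Forward transform applied to the label
log_prices = np.log(prices)

# Anything predicted on the log scale is mapped back with np.exp
restored = np.exp(log_prices)
print(log_prices.round(3))
print(restored)
```

If np.log1p were used for the forward transform instead, the matching inverse would be np.expm1.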

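Separate the features from the label:

erhouse_labels = erhouse['price'].reset_index(drop=True)
features = erhouse.drop(['price'], axis=1)

# header=False keeps the label file header-less, matching the header=None read in section 5
erhouse_labels.to_csv('./csv_files/erhouse_labels.csv', encoding='gb2312', index=False, header=False)
features.to_csv('./csv_files/features_3_8.csv', encoding='gb2312', index=False)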

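3.9 Handling missing values

Missing values reduce the effective sample size and therefore the overall data quality. We should explore them further so that the data is complete and better suited to modeling.

Exploring the missing values: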

import pandas as pd

# Load the features saved in section 3.8
features = pd.read_csv('./csv_files/features_3_8.csv', encoding='gb2312')

# Percentage of missing values per column, largest first
features_na = (features.isnull().sum() / len(features)) * 100
features_na = features_na.drop(features_na[features_na == 0].index).sort_values(ascending=False)[:30]
missing_data = pd.DataFrame({'Missing Ratio': features_na})
print(missing_data.head())

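Some common ways to handle missing values:
a. Deletion: drop the feature or the individual row that has missing values. This wastes a lot of data and can leave the sample unbalanced.
b. Imputation: manual entry; filling with a special value; replacement (filling with the mean, mode or median of the non-missing data, or Lagrange interpolation); predictive filling. (This article mainly uses imputation.)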
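Instead of typing the fill values in by hand, they can be computed from the data. The sketch below is one possible way to do replacement filling: the mean for a continuous column and the mode for category-like columns. The toy frame is hypothetical; only the column names come from this dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data with one gap per column
toy = pd.DataFrame({
    'DOM': [10.0, np.nan, 30.0, 20.0],
    'buildingType': [4, 4, np.nan, 1],
    'constructionTime': [2004, np.nan, 2004, 1998],
})

# Mean for the continuous column, most frequent value for the others
fill_values = {
    'DOM': toy['DOM'].mean(),
    'buildingType': toy['buildingType'].mode()[0],
    'constructionTime': toy['constructionTime'].mode()[0],
}
filled = toy.fillna(fill_values)
print(fill_values)
print(filled)
```

Computing the statistics in code keeps them in sync with the data if rows are added or removed upstream.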

# DOM is the time on market; fill with the mean, 28.82319154
features['DOM'] = features['DOM'].fillna(28.82319154)
# buildingType is the building type; fill with the common value 4 (slab-type building)
features['buildingType'] = features['buildingType'].fillna(4)
# floor is the floor level category; fill with the common value 2
features['floor'] = features['floor'].fillna('2')
# communityAverage is the average price in the community; fill with the mode, 92360
features['communityAverage'] = features['communityAverage'].fillna(92360)
# constructionTime is the construction year; fill with the mode, 2004
features['constructionTime'] = features['constructionTime'].fillna(2004)
features.to_csv('./csv_files/features_3_9.csv', encoding='gb2312', index=False)

# Are there any missing values left?
features = pd.read_csv('./csv_files/features_3_9.csv', encoding='gb2312')
features_na = (features.isnull().sum() / len(features)) * 100
features_na = features_na.drop(features_na[features_na == 0].index).sort_values(ascending=False)[:30]
missing_data = pd.DataFrame({'Missing Ratio': features_na})
print(missing_data.head())

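There are no missing values left.

4 Feature engineering

Put simply, feature engineering turns the feature data into a form suitable for model training. This may mean changing the form of the data, adding features, or removing features.

4.1 Data transformation

First, we need to make our messy data more regular.

There are many ways to transform data; common ones include the log transform and the Box-Cox transform.

During data cleaning we used a log transform to normalize the target; here we use the Box-Cox transform.

The transform applies to numeric variables, so we first pick out the numeric features and leave the non-numeric ones alone for now.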

numeric_dtypes = ['int16', 'int32', 'int64', 'float16', 'float32', 'float64']
numeric = []
for i in features.columns:
    if features[i].dtype in numeric_dtypes:
        numeric.append(i)

Find the skewed numeric features:

from scipy.stats import skew

skew_features = features[numeric].apply(lambda x: skew(x)).sort_values(ascending=False)
high_skew = skew_features[skew_features > 0.5]
skew_index = high_skew.index
print('There are {} numerical features with Skew > 0.5 :'.format(high_skew.shape[0]))
skewness = pd.DataFrame({'Skew': high_skew})
print(skew_features.head())

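We use the scipy function boxcox1p to compute the Box-Cox transform. The aim is a simple transform that brings the data closer to a normal distribution.

from scipy.special import boxcox1p
from scipy.stats import boxcox_normmax

for i in skew_index:
    features[i] = boxcox1p(features[i], boxcox_normmax(features[i] + 1))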
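To see what this step does on a single column, here is a small sketch on a synthetic right-skewed feature; the sample and its parameters are assumptions. boxcox1p(x, lmbda) computes ((1 + x)**lmbda - 1) / lmbda, and reduces to log(1 + x) when lmbda is 0, while boxcox_normmax estimates a suitable lmbda from the shifted data.

```python
import numpy as np
from scipy.special import boxcox1p
from scipy.stats import boxcox_normmax, skew

rng = np.random.default_rng(0)
# Synthetic, strictly positive, right-skewed feature
x = rng.lognormal(mean=1.0, sigma=0.8, size=2000)

# Estimate lambda on x + 1, then transform x with boxcox1p
lam = float(boxcox_normmax(x + 1))
transformed = boxcox1p(x, lam)

print('lambda = %.3f' % lam)
print('skew before = %.3f, after = %.3f' % (skew(x), skew(transformed)))
```

The skewness after the transform should be much closer to 0 than before, which is the same check the next step performs on the real features.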

Check whether all skewed features have been handled.

skew_features = features[numeric].apply(lambda x: skew(x)).sort_values(ascending=False)
high_skew = skew_features[skew_features > 0.5]
skew_index = high_skew.index
print('There are {} numerical features with Skew > 0.5 :'.format(high_skew.shape[0]))
skewness = pd.DataFrame({'Skew': high_skew})
print(skew_features.head())
features.to_csv('./csv_files/features_4_1.csv', encoding='gb2312', index=False)

After the transform, the skewed features are much closer to a normal distribution.

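4.2 Adding features

Here we use the existing features to generate new ones through simple operations (addition, subtraction, multiplication, division, logarithms and so on). Note that the new features must make sense in the real world; only then are they valuable for the later analysis.

# Features built by summation
# Sum of bedrooms, kitchens and bathrooms
features['TotalNum'] = features['livingRoom'] + features['kitchen'] + features['bathRoom']
# Sum of building type, renovation condition, building structure, five-year ownership, elevator and subway access
features['TotalYN'] = (features['buildingType'] + features['renovationCondition'] + features['buildingStructure']
                       + features['fiveYearsProperty'] + features['elevator'] + features['subway'])

# Features built by multiplication
# Market price = community average price * area
features['TotaMprice'] = features['communityAverage'] * features['square']

# Save the result; section 4.3 reads this file
features.to_csv('./csv_files/features_4_2.csv', encoding='gb2312', index=False)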

4.3 Feature transformation

We create more features by taking the logarithm and the square of numeric features.

import pandas as pd
import numpy as np

features = pd.read_csv('./csv_files/features_4_2.csv', encoding='gb2312')

# Create new features with a log transform
def logs(res, ls):
    # Append log(1.01 + x) of each listed column as a new '<name>_log' column
    for l in ls:
        new_col = pd.Series(np.log(1.01 + res[l]).values, index=res.index, name=l + '_log')
        res = pd.concat([res, new_col], axis=1)
    return res

log_features = ['DOM', 'followers', 'square', 'livingRoom', 'kitchen', 'bathRoom', 'buildingType', 'renovationCondition', 'buildingStructure', 'ladderRatio', 'district', 'communityAverage']
features = logs(features, log_features)
# NOTE: the log step is applied a second time, so every '_log' column appears twice;
# the 57 columns used for PCA in section 5 include these duplicates
features = logs(features, log_features)

# Create new features with a square transform
def squares(res, ls):
    # Append x * x of each listed column as a new '<name>_sq' column
    for l in ls:
        new_col = pd.Series((res[l] * res[l]).values, index=res.index, name=l + '_sq')
        res = pd.concat([res, new_col], axis=1)
    return res

squared_features = ['DOM', 'followers', 'square', 'livingRoom', 'kitchen', 'bathRoom', 'buildingType', 'renovationCondition', 'buildingStructure', 'ladderRatio', 'district', 'communityAverage']
features = squares(features, squared_features)
features.to_csv('./csv_files/features_4_3.csv', encoding='gb2312', index=False)
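The same kind of derived columns can be built without a loop by transforming a block of columns at once and renaming them with add_suffix. This is a sketch of an alternative on a hypothetical two-column frame, not the code used above.

```python
import numpy as np
import pandas as pd

# Hypothetical toy features
toy = pd.DataFrame({'DOM': [0.0, 9.0], 'square': [50.0, 100.0]})
cols = ['DOM', 'square']

# Vectorised log(1.01 + x) and x**2, with suffixed column names
log_part = np.log(1.01 + toy[cols]).add_suffix('_log')
sq_part = (toy[cols] ** 2).add_suffix('_sq')
expanded = pd.concat([toy, log_part, sq_part], axis=1)

print(expanded.columns.tolist())
print(expanded)
```

Building all new columns first and concatenating once also avoids growing the frame column by column.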

4.4 Removing string-typed data

import pandas as pd

features = pd.read_csv('./csv_files/features_4_3.csv', encoding='gb2312')
print(features.info())
# tradeTime is a string column, so drop it
features = features.drop(['tradeTime'], axis=1)
print(features.info())
features.to_csv('./csv_files/features_4_4.csv', encoding='gb2312', index=False)

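5 Building the model

Before modeling, we run a principal component analysis (PCA).

PCA is a multivariate statistical method for examining the correlations among several variables. It looks for a small number of principal components that reveal the internal structure of the variables: the components are derived from the original variables, retain as much of their information as possible, and are mutually uncorrelated. Mathematically, the original P indicators are combined linearly to form new composite indicators.

The features we added earlier may be highly correlated with one another, so we use PCA to reduce the correlation between features. Note that the aim here is not dimensionality reduction, so the number of components equals the number of features.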

import pandas as pd
from sklearn.decomposition import PCA

features = pd.read_csv('./csv_files/features_4_4.csv', encoding='gb2312')
# Keep all 57 dimensions: the goal is decorrelation, not dimensionality reduction
pca_model = PCA(n_components=57)
features = pca_model.fit_transform(features)
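To illustrate why PCA helps here even without dropping dimensions, the sketch below builds three synthetic features, two of them almost collinear, and checks the correlations before and after. The data is made up for the purpose.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
a = rng.normal(size=500)
# Column 1 is nearly a multiple of column 0; column 2 is independent noise
X = np.column_stack([a, 2 * a + rng.normal(scale=0.1, size=500), rng.normal(size=500)])

# Same number of components as features: rotate, do not reduce
pca = PCA(n_components=X.shape[1])
Z = pca.fit_transform(X)

corr_before = np.corrcoef(X, rowvar=False)
corr_after = np.corrcoef(Z, rowvar=False)
max_offdiag_after = np.abs(corr_after - np.eye(3)).max()

print('correlation of columns 0 and 1 before PCA: %.4f' % corr_before[0, 1])
print('largest off-diagonal correlation after PCA: %.2e' % max_offdiag_after)
```

The transformed columns are uncorrelated with each other, while the shape of the data stays the same.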

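The model I chose for this project is the support vector machine. SVMs can be used for classification as well as regression.

from sklearn.svm import SVR
from sklearn.model_selection import GridSearchCV
import joblib

def train_svr(x_train, y_train):
    # svr = SVR(verbose=True)
    # parameters = {'kernel': ('linear', 'rbf'), 'C': [1, 2, 4], 'gamma': [0.5, 1, 2]}
    # clf = GridSearchCV(svr, parameters, scoring='r2')
    # clf.fit(x_train, y_train)
    # print('Best parameters: ')
    # print(clf.best_params_)
    clf = SVR(kernel='rbf', C=1, gamma=1, verbose=True)
    clf.fit(x_train, y_train)
    # Save the fitted model; the full script below loads it from this path
    # (the model/ directory must already exist)
    joblib.dump(clf, 'model/svr.pkl')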

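Here I tried GridSearchCV to look for the best parameters. If you have the time, feel free to run it (I estimate it takes about a week); I never finished running it myself, haha.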
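If you want to see the grid search work without waiting a week, one option is to try it on a small sample first. The sketch below uses a synthetic one-dimensional regression problem; the data, the reduced grid and cv=3 are all assumptions for illustration. scoring='r2' is used because this is regression; classification metrics such as F1 do not apply to SVR.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVR

rng = np.random.default_rng(0)
# Synthetic data: a noisy sine curve
X = rng.uniform(0, 5, size=(120, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.05, size=120)

# A deliberately small grid so the search finishes quickly
param_grid = {'kernel': ['linear', 'rbf'], 'C': [1, 4], 'gamma': [0.5, 1]}
search = GridSearchCV(SVR(), param_grid, scoring='r2', cv=3)
search.fit(X, y)

print('best parameters:', search.best_params_)
print('best cross-validated R^2: %.3f' % search.best_score_)
```

SVR training time grows faster than linearly with the number of rows, so tuning on a random subsample of the housing data and then refitting the best setting on the full training set is a common compromise.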

Complete code for the whole process:

import joblib
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

from sklearn.decomposition import PCA
from sklearn.svm import SVR
from sklearn.model_selection import train_test_split
from sklearn.model_selection import GridSearchCV
from sklearn.metrics import r2_score

# train_svr is the function defined above

if __name__ == '__main__':
    features = pd.read_csv('./csv_files/features_4_4.csv', encoding='gb2312')
    # Flatten the single label column into a 1-D array
    labels = pd.read_csv('./csv_files/erhouse_labels.csv', encoding='gb2312', header=None).values.ravel()

    pca_model = PCA(n_components=57)
    features = pca_model.fit_transform(features)
    print(features.shape)
    print(labels.shape)

    x_train, x_test, y_train, y_test = train_test_split(features, labels, test_size=0.25)
    train_svr(x_train, y_train)

    print('Loading the SVM model...')
    model = joblib.load('model/svr.pkl')
    y_pred = model.predict(x_test)
    print("Score:", r2_score(y_test, y_pred))

    # Plot predictions against the true values
    r = len(x_test) + 1
    print(y_test)
    plt.plot(np.arange(1, r), y_pred, 'go-', label="predict")
    plt.plot(np.arange(1, r), y_test, 'co-', label="real")
    plt.legend()
    plt.show()

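By the way, if you would like to try classification instead, you can split price into several intervals; how many depends on how many classes you want. When splitting, try to make the intervals hold roughly the same amount of data to avoid class imbalance.

Once the split is done, change every "svr" in the code above to "svc" and change "r2_score" to "classification_report".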
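One way to get intervals with roughly equal amounts of data is quantile binning with pd.qcut. The sketch below splits synthetic prices into three classes; the sample and the choice of three classes are assumptions.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
# Synthetic right-skewed prices
prices = pd.Series(rng.lognormal(mean=10.5, sigma=0.5, size=300))

# q=3 cuts at the 1/3 and 2/3 quantiles, so each class gets about the same number of rows
price_class = pd.qcut(prices, q=3, labels=[0, 1, 2])
class_counts = price_class.value_counts().sort_index()
print(class_counts)
```

Equal-width bins (pd.cut) would put most homes in the lowest class for a skewed price distribution, which is exactly the imbalance to avoid.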

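Closing remarks

I had only just started with data analysis when I did this project and had a lot to learn. I drew on many resources from the web, and I am very grateful to all the experts out there!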
