#!/usr/bin/env python
# coding: utf-8
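# Goal of the analysis: understand Beijing housing prices in recent years to guide a home purchase
# Number of listings, mean floor area, and mean price per district
# Mean total price per district, with / without a subway nearby
# Mean price per district for listings near a subway, with / without an elevator
# 2017: mean price per district for 2-room, 1-hall, 1-kitchen, 1-bathroom homes, by elevator and by subway
# Daily price trend: mean unit price of all listings for each day
# 2017: share of listings with a total price of 2 to 4 million yuan and a unit price of 40,000 to 70,000 yuan
# Import the libraries used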
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
# Load the data file
# df = pd.read_csv('./beijing_houst_price.csv')
# The plain call above raises: DtypeWarning: Columns (0,6,7,9) have mixed types. Specify dtype option on import or set low_memory=False.
df = pd.read_csv('./beijing_houst_price.csv', dtype={'id':'str','tradeTime':'str', 'livingRoom':'str', 'drawingRoom':'str', 'bathRoom':'str'})
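The effect of the `dtype` argument can be tried on a small in-memory sample. This is a minimal sketch: the three rows below are a hypothetical stand-in for the real CSV file, and only a few of its columns are reproduced.

```python
import io

import pandas as pd

# Hypothetical three-row sample standing in for the real CSV file.
sample_csv = io.StringIO(
    "id,tradeTime,livingRoom,price\n"
    "101084782030,2016-08-09,2,31680\n"
    "BJXC91739524,2016-03-12,1,115586\n"
    "101086041636,2016-12-11,3,52021\n"
)

# Declaring the mixed columns as strings stops pandas from guessing a type for them.
sample = pd.read_csv(
    sample_csv,
    dtype={"id": "str", "tradeTime": "str", "livingRoom": "str"},
)
print(sample.dtypes)
```

Columns that are not listed in `dtype`, such as `price` here, are still inferred as usual.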
# Take a quick look at the columns
df.head()
id tradeTime followers totalPrice price square livingRoom drawingRoom kitchen bathRoom floor buildingType buildingStructure ladderRatio elevator fiveYearsProperty subway district communityAverage
0 101084782030 2016-08-09 106 415.0 31680 131.00 2 1 1 1 高 26 1.0 6 0.217 1.0 0.0 1.0 7 56021.0
1 101086012217 2016-07-28 126 575.0 43436 132.38 2 2 1 2 高 22 1.0 6 0.667 1.0 1.0 0.0 7 71539.0
2 101086041636 2016-12-11 48 1030.0 52021 198.00 3 2 1 3 中 4 4.0 6 0.500 1.0 0.0 0.0 7 48160.0
3 101086406841 2016-09-30 138 297.5 22202 134.00 3 1 1 1 底 21 1.0 6 0.273 1.0 0.0 0.0 6 51238.0
4 101086920653 2016-08-28 286 392.0 48396 81.00 2 1 1 1 中 6 4.0 2 0.333 0.0 1.0 1.0 1 62588.0
# Check the number of columns and their types
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 318851 entries, 0 to 318850
Data columns (total 19 columns):
id                   318851 non-null object
tradeTime            318851 non-null object
followers            318851 non-null int64
totalPrice           318851 non-null float64
price                318851 non-null int64
square               318851 non-null float64
livingRoom           318851 non-null object
drawingRoom          318851 non-null object
kitchen              318851 non-null int64
bathRoom             318851 non-null object
floor                318851 non-null object
buildingType         316830 non-null float64
buildingStructure    318851 non-null int64
ladderRatio          318851 non-null float64
elevator             318819 non-null float64
fiveYearsProperty    318819 non-null float64
subway               318819 non-null float64
district             318851 non-null int64
communityAverage     318388 non-null float64
dtypes: float64(8), int64(5), object(6)
memory usage: 46.2+ MB
# Summary statistics for the numeric columns
df.describe()
followers totalPrice price square kitchen buildingType buildingStructure ladderRatio elevator fiveYearsProperty subway district communityAverage
count 318851.000000 318851.000000 318851.000000 318851.000000 318851.000000 316830.000000 318851.000000 3.188510e+05 318819.000000 318819.000000 318819.000000 318851.000000 318388.000000
mean 16.731508 349.030201 43530.436379 83.240597 0.994599 3.009790 4.451026 6.316486e+01 0.577055 0.645601 0.601112 6.763564 63682.446305
std 34.209185 230.780778 21709.024204 37.234661 0.109609 1.269857 1.901753 2.506851e+04 0.494028 0.478331 0.489670 2.812616 22329.215447
min 0.000000 0.100000 1.000000 6.900000 0.000000 0.048000 0.000000 0.000000e+00 0.000000 0.000000 0.000000 1.000000 10847.000000
25% 0.000000 205.000000 28050.000000 57.900000 1.000000 1.000000 2.000000 2.500000e-01 0.000000 0.000000 0.000000 6.000000 46339.000000
50% 5.000000 294.000000 38737.000000 74.260000 1.000000 4.000000 6.000000 3.330000e-01 1.000000 1.000000 1.000000 7.000000 59015.000000
75% 18.000000 425.500000 53819.500000 98.710000 1.000000 4.000000 6.000000 5.000000e-01 1.000000 1.000000 1.000000 8.000000 75950.000000
max 1143.000000 18130.000000 156250.000000 1745.500000 4.000000 4.000000 6.000000 1.000940e+07 1.000000 1.000000 1.000000 13.000000 183109.000000
# Count the non-null values in each column
df.count()
id                   318851
tradeTime            318851
followers            318851
totalPrice           318851
price                318851
square               318851
livingRoom           318851
drawingRoom          318851
kitchen              318851
bathRoom             318851
floor                318851
buildingType         316830
buildingStructure    318851
ladderRatio          318851
elevator             318819
fiveYearsProperty    318819
subway               318819
district             318851
communityAverage     318388
dtype: int64
# Start cleaning the data
# Check for fully duplicated rows
df[df.duplicated()]
# --> no fully duplicated rows
id tradeTime followers totalPrice price square livingRoom drawingRoom kitchen bathRoom floor buildingType buildingStructure ladderRatio elevator fiveYearsProperty subway district communityAverage
# Check whether the id column has duplicated values
df[df['id'].duplicated()]
# --> no rows with a duplicated id
id tradeTime followers totalPrice price square livingRoom drawingRoom kitchen bathRoom floor buildingType buildingStructure ladderRatio elevator fiveYearsProperty subway district communityAverage
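The two duplicate checks can also be reduced to counts, which is easier to read than an empty frame. A minimal sketch on a hypothetical four-row table (the names `toy`, `n_full_dupes`, `n_id_dupes` and `deduped` are made up for illustration):

```python
import pandas as pd

# Hypothetical toy data: the third row repeats the second one.
toy = pd.DataFrame({"id": ["a1", "a2", "a2", "a3"], "price": [100, 200, 200, 300]})

n_full_dupes = int(toy.duplicated().sum())      # rows identical in every column
n_id_dupes = int(toy["id"].duplicated().sum())  # rows whose id was already seen

# If duplicates did exist, keeping the first occurrence of each id is one option.
deduped = toy.drop_duplicates(subset="id", keep="first")
print(n_full_dupes, n_id_dupes, len(deduped))
```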
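# Given the goals of the analysis, keep only the columns that are needed
# 'id', 'tradeTime', 'totalPrice', 'price', 'square', 'livingRoom', 'drawingRoom', 'kitchen', 'bathRoom', 'floor', 'elevator', 'subway','district', 'communityAverage'
# .copy() makes the selection an independent frame, so the later column assignments do not act on a view
df = df[['id', 'tradeTime', 'totalPrice', 'price', 'square', 'livingRoom', 'drawingRoom', 'kitchen', 'bathRoom', 'floor', 'elevator', 'subway','district', 'communityAverage']].copy()
# Look at the values in the tradeTime column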
df['tradeTime'].value_counts()
2016-02-28    1096
2016-03-06     948
2016-07-31     940
2016-08-31     910
2016-03-05     824
             ...
2011-02-18       1
2010-08-13       1
2010-11-27       1
2010-01-15       1
2010-03-09       1
Name: tradeTime, Length: 2560, dtype: int64
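# tradeTime spans a long period: very old records have little reference value, and some periods have too few records to be representative
# So the tradeTime column needs cleaning
df['tradeTime'] = pd.to_datetime(df['tradeTime'])
# Check the data types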
df.dtypes
id                          object
tradeTime           datetime64[ns]
totalPrice                 float64
price                        int64
square                     float64
livingRoom                  object
drawingRoom                 object
kitchen                      int64
bathRoom                    object
floor                       object
elevator                   float64
subway                     float64
district                     int64
communityAverage           float64
dtype: object
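If the date strings might contain malformed entries, `pd.to_datetime` can turn them into `NaT` instead of raising. The sketch below uses a hypothetical three-value series; whether the real `tradeTime` column contains such entries is not known from the output above.

```python
import pandas as pd

# Hypothetical raw values; the last one is deliberately not a date.
raw = pd.Series(["2016-08-09", "2017-01-15", "not a date"])

# errors="coerce" converts unparseable strings to NaT rather than raising an error.
parsed = pd.to_datetime(raw, format="%Y-%m-%d", errors="coerce")

n_bad = int(parsed.isna().sum())  # how many values failed to parse
years = parsed.dt.year            # NaT becomes NaN in the derived year
print(n_bad)
```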
# Count the records per year
df['year'] = df['tradeTime'].dt.year
df['year'].value_counts()
# 2002, 2003, 2008, 2009, 2010 and 2018 have very few records
2016    90829
2015    69805
2017    43217
2013    38751
2012    37221
2014    32602
2011     6010
2018      221
2010      189
2002        3
2009        1
2008        1
2003        1
Name: year, dtype: int64
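# Drop the sparse and very old years; keep the 2013 to 2017 records
df.drop(df[df['year']  < 2013].index, inplace = True)
df.drop(df[df['year']  > 2017].index, inplace = True)
# Drop listings with totalPrice below 1 million yuan (totalPrice is in units of 10,000 yuan) --> remote or very small homes
df.drop(df[df['totalPrice']  < 100].index, inplace = True)
# Check the data again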
df.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 272923 entries, 0 to 318850
Data columns (total 15 columns):
id                  272923 non-null object
tradeTime           272923 non-null datetime64[ns]
totalPrice          272923 non-null float64
price               272923 non-null int64
square              272923 non-null float64
livingRoom          272923 non-null object
drawingRoom         272923 non-null object
kitchen             272923 non-null int64
bathRoom            272923 non-null object
floor               272923 non-null object
elevator            272917 non-null float64
subway              272917 non-null float64
district            272923 non-null int64
communityAverage    272558 non-null float64
year                272923 non-null int64
dtypes: datetime64[ns](1), float64(5), int64(4), object(5)
memory usage: 33.3+ MB
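The same year and price filter can be written as a single boolean mask instead of several in-place `drop` calls. A minimal sketch on hypothetical toy rows:

```python
import pandas as pd

# Hypothetical toy rows; totalPrice is in units of 10,000 yuan, as in the data set.
toy = pd.DataFrame({
    "year": [2010, 2013, 2016, 2017, 2018, 2015],
    "totalPrice": [300.0, 50.0, 415.0, 575.0, 600.0, 99.9],
})

# Keep 2013 to 2017 (both ends included) and a total price of at least 100.
keep = toy["year"].between(2013, 2017) & (toy["totalPrice"] >= 100)
filtered = toy[keep].copy()
print(filtered)
```

Building the mask first also makes it easy to inspect `keep.sum()` before any rows are discarded.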
# Are there null values in the elevator and subway columns?
print(df['elevator'].isnull(), df['subway'].isnull())
0         False
1         False
2         False
3         False
4         False
          ...
318846    False
318847    False
318848    False
318849    False
318850    False
Name: elevator, Length: 272923, dtype: bool 0         False
1         False
2         False
3         False
4         False
          ...
318846    False
318847    False
318848    False
318849    False
318850    False
Name: subway, Length: 272923, dtype: bool
# Count the values in elevator and subway, including NaN
print(df['elevator'].value_counts(dropna = False))
print(df['subway'].value_counts(dropna = False))
1.0    157827
0.0    115090
NaN         6
Name: elevator, dtype: int64
1.0    164183
0.0    108734
NaN         6
Name: subway, dtype: int64
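# Mark the missing values with a sentinel string so that those rows can be dropped afterwards
df['elevator'] = df['elevator'].fillna('ABCNAN')
df['subway'] = df['subway'].fillna('ABCNAN')
# Check the data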
df.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 272923 entries, 0 to 318850
Data columns (total 15 columns):
id                  272923 non-null object
tradeTime           272923 non-null datetime64[ns]
totalPrice          272923 non-null float64
price               272923 non-null int64
square              272923 non-null float64
livingRoom          272923 non-null object
drawingRoom         272923 non-null object
kitchen             272923 non-null int64
bathRoom            272923 non-null object
floor               272923 non-null object
elevator            272923 non-null object
subway              272923 non-null object
district            272923 non-null int64
communityAverage    272558 non-null float64
year                272923 non-null int64
dtypes: datetime64[ns](1), float64(3), int64(4), object(7)
memory usage: 33.3+ MB
# Drop the rows where elevator or subway was missing
df.drop(df[df['elevator'] == 'ABCNAN'].index, inplace = True)
df.drop(df[df['subway'] == 'ABCNAN'].index, inplace = True)
# Check the data
df.info()
# communityAverage still has some missing values
<class 'pandas.core.frame.DataFrame'>
Int64Index: 272917 entries, 0 to 318850
Data columns (total 15 columns):
id                  272917 non-null object
tradeTime           272917 non-null datetime64[ns]
totalPrice          272917 non-null float64
price               272917 non-null int64
square              272917 non-null float64
livingRoom          272917 non-null object
drawingRoom         272917 non-null object
kitchen             272917 non-null int64
bathRoom            272917 non-null object
floor               272917 non-null object
elevator            272917 non-null object
subway              272917 non-null object
district            272917 non-null int64
communityAverage    272552 non-null float64
year                272917 non-null int64
dtypes: datetime64[ns](1), float64(3), int64(4), object(7)
memory usage: 33.3+ MB
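Filling with a sentinel string and then dropping turns `elevator` and `subway` into `object` columns, as the output above shows. An alternative, sketched here on hypothetical toy data, is `dropna` with `subset`, which removes the same rows and keeps the columns numeric:

```python
import numpy as np
import pandas as pd

# Hypothetical toy rows with missing elevator / subway flags.
toy = pd.DataFrame({
    "elevator": [1.0, 0.0, np.nan, 1.0],
    "subway": [1.0, np.nan, np.nan, 0.0],
    "totalPrice": [415.0, 575.0, 1030.0, 297.5],
})

# Drop every row that has NaN in either of the two columns.
cleaned = toy.dropna(subset=["elevator", "subway"])
print(cleaned.dtypes)
```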
# communityAverage
df[df['communityAverage'].isnull()]  # show the rows with missing values
id tradeTime totalPrice price square livingRoom drawingRoom kitchen bathRoom floor elevator subway district communityAverage year
2027 101091727692 2016-12-05 1255.0 139290 90.10 4 0 0 0 底 1 0 1 10 NaN 2016
3902 101091913830 2016-06-20 238.0 51830 45.92 1 1 1 1 高 6 0 1 7 NaN 2016
4982 101092003852 2016-06-28 291.0 41195 70.64 1 1 1 1 高 11 1 1 7 NaN 2016
5809 101092065365 2016-09-30 176.0 110000 16.00 1 0 0 0 底 1 0 1 1 NaN 2016
6088 101092088297 2016-07-11 382.0 39024 97.89 2 2 1 1 中 28 1 1 7 NaN 2016
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
316175 BJXC91739524 2016-03-12 155.0 115586 13.41 1 0 0 0 底 1 0 1 10 NaN 2016
317054 BJXC92150717 2016-05-23 214.0 149442 14.32 1 0 0 0 底 1 0 0 10 NaN 2016
317133 BJXC92215207 2016-05-22 227.0 145981 15.55 1 0 0 0 底 1 0 1 10 NaN 2016
317186 BJXC92255534 2016-05-25 191.8 49987 38.36 1 1 1 1 底 1 0 1 10 NaN 2016
317217 BJXC92289286 2016-06-05 180.0 102390 17.58 1 0 0 0 底 1 0 1 10 NaN 2016

365 rows × 15 columns

# Fill the missing communityAverage values with the column mean
df['communityAverage'] = df['communityAverage'].fillna(df['communityAverage'].mean())
# Check the data
df.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 272917 entries, 0 to 318850
Data columns (total 15 columns):
id                  272917 non-null object
tradeTime           272917 non-null datetime64[ns]
totalPrice          272917 non-null float64
price               272917 non-null int64
square              272917 non-null float64
livingRoom          272917 non-null object
drawingRoom         272917 non-null object
kitchen             272917 non-null int64
bathRoom            272917 non-null object
floor               272917 non-null object
elevator            272917 non-null object
subway              272917 non-null object
district            272917 non-null int64
communityAverage    272917 non-null float64
year                272917 non-null int64
dtypes: datetime64[ns](1), float64(3), int64(4), object(7)
memory usage: 33.3+ MB
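A global mean ignores the large price differences between districts. One possible refinement, which is not what the analysis above does, is to fill each gap with the mean of its own district. A minimal sketch on hypothetical toy data:

```python
import numpy as np
import pandas as pd

# Hypothetical toy rows: two districts, one missing value in each.
toy = pd.DataFrame({
    "district": [1, 1, 1, 7, 7],
    "communityAverage": [90000.0, np.nan, 88000.0, 60000.0, np.nan],
})

# transform("mean") returns the district mean aligned to every row of that district.
district_mean = toy.groupby("district")["communityAverage"].transform("mean")
toy["communityAverage"] = toy["communityAverage"].fillna(district_mean)
print(toy)
```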
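# Rebuild the index
# Dropping rows leaves the original row labels in place; to get consecutive labels the index has to be regenerated
df = df.reset_index()
# Cleaning is done; start the analysis
# Summary statistics
df['year'] = df['year'].astype('str')  # so that describe() does not compute statistics on the year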
df.describe()
index totalPrice price square kitchen district communityAverage
count 272917.000000 272917.00000 272917.000000 272917.000000 272917.000000 272917.000000 272917.000000
mean 154336.376825 374.59492 46617.560471 83.670028 0.996325 6.738968 63832.871276
std 94013.407724 235.14487 21598.793862 37.584340 0.095351 2.798208 22298.588824
min 0.000000 100.00000 4335.000000 6.900000 0.000000 1.000000 10847.000000
25% 68616.000000 227.00000 31075.000000 58.060000 1.000000 6.000000 46505.000000
50% 157116.000000 316.00000 41700.000000 74.580000 1.000000 7.000000 59179.000000
75% 234415.000000 450.00000 57072.000000 98.850000 1.000000 8.000000 76223.000000
max 318850.000000 18130.00000 156250.000000 1745.500000 4.000000 13.000000 183109.000000
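The `index` column in the summary above is the old row label that `reset_index()` keeps as a regular column. If it is not needed, `drop=True` discards it. A small sketch on a hypothetical frame:

```python
import pandas as pd

# Hypothetical frame whose index has gaps, as after dropping rows.
toy = pd.DataFrame({"price": [10, 20, 30]}, index=[0, 5, 9])

with_col = toy.reset_index()              # old labels are kept in a new 'index' column
without_col = toy.reset_index(drop=True)  # old labels are discarded

print(list(with_col.columns), list(without_col.columns))
```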
# Number of listings, mean floor area, and mean price per district
df_dis = df.groupby('district', as_index = False)
df_dis_count = df_dis.count()[['district','id']]
df_dis_count.rename(columns={'id':'num'},inplace = True)            # listings per district
# Select the numeric column before taking the mean, so the string columns are not averaged
df_dis_mean_square = df_dis['square'].mean()                        # mean floor area per district
df_dis_mean_comm = df_dis['communityAverage'].mean()                # mean price per district
df_dis_info = pd.merge(df_dis_count, pd.merge(df_dis_mean_square, df_dis_mean_comm, on = 'district'), on = 'district')
df_dis_info.sort_values('num', ascending = False, inplace = True)  # sort the summary by listing count, descending
df_dis_info
district num square communityAverage
6 7 92720 84.822103 63003.715434
5 6 33140 101.336912 43109.573240
7 8 32376 79.900437 79591.773777
9 10 26899 67.598403 101684.515659
1 2 24864 78.545938 54975.301568
0 1 14998 72.598140 89713.498843
3 4 13062 87.911008 44792.826757
10 11 11715 86.374318 44031.363518
8 9 9487 75.088564 50449.493201
12 13 7369 96.919533 39418.455028
4 5 2847 90.847980 36074.011195
2 3 2137 97.458615 48023.558727
11 12 1303 85.445825 39053.454139
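The count, the two means, and the merges can also be expressed as one grouped aggregation. This is a sketch using named aggregation on hypothetical toy data, not the code that produced the table above:

```python
import pandas as pd

# Hypothetical toy listings in two districts.
toy = pd.DataFrame({
    "district": [7, 7, 6, 6, 6],
    "id": ["a", "b", "c", "d", "e"],
    "square": [80.0, 90.0, 100.0, 110.0, 120.0],
    "communityAverage": [60000.0, 62000.0, 43000.0, 44000.0, 45000.0],
})

# One pass: listing count, mean floor area, and mean community price per district.
summary = (
    toy.groupby("district", as_index=False)
    .agg(
        num=("id", "count"),
        square=("square", "mean"),
        communityAverage=("communityAverage", "mean"),
    )
    .sort_values("num", ascending=False)
)
print(summary)
```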
df_dis.head()
index id tradeTime totalPrice price square livingRoom drawingRoom kitchen bathRoom floor elevator subway district communityAverage year
0 0 101084782030 2016-08-09 415.0 31680 131.00 2 1 1 1 高 26 1 1 7 56021.0 2016
1 1 101086012217 2016-07-28 575.0 43436 132.38 2 2 1 2 高 22 1 0 7 71539.0 2016
2 2 101086041636 2016-12-11 1030.0 52021 198.00 3 2 1 3 中 4 1 0 7 48160.0 2016
3 3 101086406841 2016-09-30 297.5 22202 134.00 3 1 1 1 底 21 1 0 6 51238.0 2016
4 4 101086920653 2016-08-28 392.0 48396 81.00 2 1 1 1 中 6 0 1 1 62588.0 2016
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
291 295 101090284214 2016-07-29 355.0 32569 109.00 3 2 1 1 中 22 1 0 9 47095.0 2016
397 403 101090681279 2016-07-27 395.0 29351 134.58 3 2 1 2 底 6 0 0 11 40026.0 2016
472 480 101090865126 2016-09-12 290.0 43524 66.63 1 2 1 1 底 9 1 1 11 39787.0 2016
578 587 101091076817 2016-08-31 410.0 33607 122.00 3 2 1 2 高 7 0 1 11 42790.0 2016
580 589 101091085930 2016-06-23 395.0 36336 108.71 2 2 1 2 低 12 1 0 11 43196.0 2016

65 rows × 16 columns

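# Mean total price per district, with / without a subway (assuming subway == 1 means a subway is nearby)
df_dis_sub = df[['id', 'district', 'subway','totalPrice']]
# Average only totalPrice; id is a string column and cannot be averaged
df_dis_sub = df_dis_sub.groupby(['district', 'subway'])[['totalPrice']].mean()
print(df_dis_sub)
# df_dis_sub_1 = df_dis[df_dis['subway'] == 1]
# df_dis_sub_0 = df_dis[df_dis['subway'] == 0]
# df_dis_sub_0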
                 totalPrice
district subway
1        0.0     469.878265
         1.0     465.033473
2        0.0     322.250488
         1.0     315.975104
3        0.0     372.588384
         1.0     257.979536
4        0.0     277.002999
         1.0     281.420831
5        0.0     238.821941
         1.0     281.531037
6        0.0     296.897313
         1.0     312.152069
7        0.0     375.204474
         1.0     401.876686
8        0.0     456.372647
         1.0     463.602992
9        0.0     286.795664
         1.0     291.473189
10       0.0     469.243572
         1.0     486.890207
11       0.0     242.557678
         1.0     264.659448
12       0.0     250.708532
         1.0     426.000000
13       0.0     257.419935
         1.0     231.075452
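To compare the two subway groups side by side, the long table can be pivoted so that each district is one row. A minimal sketch on hypothetical toy data; the `premium` column is an illustrative name for the difference between the two means:

```python
import pandas as pd

# Hypothetical toy listings: subway is 0.0 / 1.0 as in the data set.
toy = pd.DataFrame({
    "district": [1, 1, 1, 7, 7, 7],
    "subway": [0.0, 1.0, 1.0, 0.0, 0.0, 1.0],
    "totalPrice": [470.0, 460.0, 480.0, 370.0, 380.0, 400.0],
})

# One row per district, one column per subway value, cells hold the mean total price.
wide = toy.pivot_table(index="district", columns="subway",
                       values="totalPrice", aggfunc="mean")

# Difference between the with-subway and without-subway means.
wide["premium"] = wide[1.0] - wide[0.0]
print(wide)
```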
# Mean total price per district for listings near a subway, with / without an elevator
df_dis_sub_01 = df[['id', 'district', 'subway', 'elevator', 'totalPrice']]
df_dis_sub_1 = df_dis_sub_01[df_dis_sub_01['subway'] == 1]
# Average only totalPrice; id is a string column and cannot be averaged
df_dis_sub_1 = df_dis_sub_1.groupby(['district', 'elevator'], as_index = False)['totalPrice'].mean()
df_dis_sub_1.rename(columns = {'totalPrice':'totalPrice_mean'}, inplace = True)
print(df_dis_sub_1)
    district  elevator  totalPrice_mean
0          1       0.0       415.038504
1          1       1.0       500.240515
2          2       0.0       267.338108
3          2       1.0       334.515961
4          3       0.0       518.142857
5          3       1.0       236.086011
6          4       0.0       239.914759
7          4       1.0       335.179934
8          5       0.0       258.421368
9          5       1.0       287.453998
10         6       0.0       308.621994
11         6       1.0       316.539542
12         7       0.0       302.067409
13         7       1.0       441.639936
14         8       0.0       409.848486
15         8       1.0       512.433305
16         9       0.0       222.063187
17         9       1.0       345.343639
18        10       0.0       461.317588
19        10       1.0       510.121250
20        11       0.0       253.602230
21        11       1.0       268.960810
22        12       1.0       426.000000
23        13       0.0       202.840370
24        13       1.0       281.763431
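# 2017: mean total price per district for 2-room, 1-hall, 1-kitchen, 1-bathroom homes, by elevator and by subway
df_dis_want = df[['id', 'district','livingRoom', 'drawingRoom', 'kitchen', 'bathRoom', 'subway', 'elevator', 'totalPrice','year']]
print(df_dis_want.info())
df_dis_w = df_dis_want[(df_dis_want['year'] == '2017') & (df_dis_want['livingRoom'] == '2') & (df_dis_want['drawingRoom'] == '1') & (df_dis_want['kitchen'] == 1) & (df_dis_want['bathRoom'] == '1')]
# Note: whether the comparison value needs quotes depends on each column's dtype (string vs. integer); leaving the room columns as strings is an oversight of the cleaning step
# Average only the numeric columns; the string columns cannot be averaged
df_dis_w = df_dis_w.groupby(['district', 'elevator', 'subway'], as_index = False)[['kitchen', 'totalPrice']].mean()
df_dis_w.rename(columns = {'totalPrice':'totalPrice_mean'}, inplace = True)
print(df_dis_w)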
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 272917 entries, 0 to 272916
Data columns (total 10 columns):
id             272917 non-null object
district       272917 non-null int64
livingRoom     272917 non-null object
drawingRoom    272917 non-null object
kitchen        272917 non-null int64
bathRoom       272917 non-null object
subway         272917 non-null object
elevator       272917 non-null object
totalPrice     272917 non-null float64
year           272917 non-null object
dtypes: float64(1), int64(2), object(7)
memory usage: 20.8+ MB
None
district elevator subway kitchen totalPrice_mean
0 1 0.0 0.0 1 493.103448
1 1 0.0 1.0 1 597.487764
2 1 1.0 0.0 1 656.100000
3 1 1.0 1.0 1 700.313229
4 2 0.0 0.0 1 355.323571
5 2 0.0 1.0 1 381.905593
6 2 1.0 0.0 1 495.709938
7 2 1.0 1.0 1 466.610455
8 3 0.0 0.0 1 358.381250
9 3 0.0 1.0 1 541.500000
10 3 1.0 0.0 1 457.250000
11 3 1.0 1.0 1 448.420000
12 4 0.0 0.0 1 311.772622
13 4 0.0 1.0 1 305.311983
14 4 1.0 0.0 1 439.410204
15 4 1.0 1.0 1 412.669841
16 5 0.0 0.0 1 256.658491
17 5 0.0 1.0 1 316.978571
18 5 1.0 0.0 1 317.132948
19 5 1.0 1.0 1 359.614839
20 6 0.0 0.0 1 352.025395
21 6 0.0 1.0 1 395.474759
22 6 1.0 0.0 1 420.451366
23 6 1.0 1.0 1 439.371875
24 7 0.0 0.0 1 364.164934
25 7 0.0 1.0 1 409.597200
26 7 1.0 0.0 1 554.104437
27 7 1.0 1.0 1 536.223223
28 8 0.0 0.0 1 503.982394
29 8 0.0 1.0 1 532.799109
30 8 1.0 0.0 1 621.156806
31 8 1.0 1.0 1 653.117304
32 9 0.0 0.0 1 315.139793
33 9 0.0 1.0 1 322.747917
34 9 1.0 0.0 1 480.467907
35 9 1.0 1.0 1 445.882243
36 10 0.0 0.0 1 644.037000
37 10 0.0 1.0 1 638.352427
38 10 1.0 0.0 1 741.245455
39 10 1.0 1.0 1 744.362667
40 11 0.0 0.0 1 356.081275
41 11 0.0 1.0 1 389.598276
42 11 1.0 0.0 1 374.167647
43 11 1.0 1.0 1 425.826744
44 12 0.0 0.0 1 298.251190
45 12 1.0 0.0 1 401.925397
46 12 1.0 1.0 1 390.000000
47 13 0.0 0.0 1 303.945556
48 13 0.0 1.0 1 290.141912
49 13 1.0 0.0 1 388.379070
50 13 1.0 1.0 1 409.612766
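The mixed quoting in the filter can be avoided by converting the room columns to numbers during cleaning. A minimal sketch on hypothetical toy data; the value `"unknown"` is made up to show how an unparseable entry is handled, and is not taken from the data set:

```python
import pandas as pd

# Hypothetical toy rows; the room counts are strings, as in the loaded data.
toy = pd.DataFrame({
    "livingRoom": ["2", "3", "2"],
    "drawingRoom": ["1", "2", "1"],
    "kitchen": [1, 1, 1],
    "bathRoom": ["1", "1", "unknown"],
    "totalPrice": [415.0, 575.0, 392.0],
})

# Convert each string column to numbers; entries that cannot be parsed become NaN.
room_cols = ["livingRoom", "drawingRoom", "bathRoom"]
toy[room_cols] = toy[room_cols].apply(pd.to_numeric, errors="coerce")

# Every comparison can now use plain integers.
layout = toy[(toy["livingRoom"] == 2) & (toy["drawingRoom"] == 1)
             & (toy["kitchen"] == 1) & (toy["bathRoom"] == 1)]
print(layout)
```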
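# Daily price trend
# Mean unit price of all listings for each day
df_day_price = df.groupby('tradeTime')['price'].mean()
df_day_price.sort_index(inplace=True) # sort by the date index
df_day_price.plot() # plot the trend


Clear outliers appear at the start of each year. Is something causing them, or are they simply erroneous values?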
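One way to investigate this question is to look at how many listings stand behind each daily mean: a day with only one or two transactions can produce an extreme average. Whether that explains the spikes here is an untested assumption. The sketch below uses hypothetical toy data, and the threshold `min_listings` is an arbitrary illustrative value:

```python
import pandas as pd

# Hypothetical toy data: a single expensive sale on the first day.
dates = pd.to_datetime([
    "2016-01-01",
    "2016-01-02", "2016-01-02", "2016-01-02",
    "2016-01-03", "2016-01-03", "2016-01-03",
])
toy = pd.DataFrame({
    "tradeTime": dates,
    "price": [150000, 40000, 42000, 44000, 41000, 43000, 45000],
})

# Daily mean together with the number of listings behind it.
daily = toy.groupby("tradeTime")["price"].agg(["mean", "count"])

# Keep only the days with enough listings for the mean to be meaningful.
min_listings = 3  # hypothetical threshold
reliable = daily[daily["count"] >= min_listings]
print(daily)
```

If the spikes disappear in `reliable`, sparse trading days are the likely cause; if they remain, the underlying records deserve a closer look.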

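# 2017: share of listings with a total price of 2 to 4 million yuan, a unit price of 40,000 to 70,000 yuan, and an elevator (assuming elevator == 1 means an elevator is present)
df_2017 = df[df['year'] == '2017']
# Count the matches within 2017 only, so that the numerator and the denominator cover the same year
num1 = len(df_2017[(df_2017['totalPrice'] > 200) & (df_2017['totalPrice'] < 400) & (df_2017['price'] > 40000) & (df_2017['price'] < 70000) & (df_2017['elevator'] == 1)])
num2 = len(df_2017) # number of 2017 records
want_ratio = num1/num2
print(want_ratio) # share
# The value below was printed with the numerator counted over all years (df instead of df_2017), so it overstates the 2017 share
0.6929146649957065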
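A share like this can also be computed as the mean of a boolean mask, which keeps the numerator and the denominator on the same subset by construction. A minimal sketch on hypothetical toy data:

```python
import pandas as pd

# Hypothetical toy rows; totalPrice is in units of 10,000 yuan.
toy = pd.DataFrame({
    "year": ["2017", "2017", "2017", "2017", "2016"],
    "totalPrice": [300.0, 350.0, 500.0, 250.0, 300.0],
    "price": [50000, 80000, 60000, 45000, 50000],
    "elevator": [1.0, 1.0, 1.0, 0.0, 1.0],
})

toy_2017 = toy[toy["year"] == "2017"]

# True where a 2017 listing meets all three conditions.
mask = (
    (toy_2017["totalPrice"] > 200) & (toy_2017["totalPrice"] < 400)
    & (toy_2017["price"] > 40000) & (toy_2017["price"] < 70000)
    & (toy_2017["elevator"] == 1)
)

# The mean of a boolean Series is the fraction of True values.
share = mask.mean()
print(share)
```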
