Python Data Analysis: Beijing Housing Price Analysis

#!/usr/bin/env python
# coding: utf-8

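# Goal of the analysis: understand Beijing housing prices in recent years to guide a home purchase
# Number of listings, mean area, and mean price per district
# Mean total price per district, with and without a subway nearby
# Mean price per district for listings near a subway, with and without an elevator
# 2017, 2-bedroom/1-living-room/1-kitchen/1-bathroom listings: mean price per district by elevator and subway
# Daily price trend: mean unit price of all listings per day
# 2017: share of listings with total price 2~4 million CNY and unit price 40,000~70,000 CNY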

# Import the libraries used
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Load the data file
# df = pd.read_csv('./beijing_houst_price.csv')
# The plain call above raises: DtypeWarning: Columns (0,6,7,9) have mixed types. Specify dtype option on import or set low_memory=False.
df = pd.read_csv('./beijing_houst_price.csv', dtype={'id':'str','tradeTime':'str', 'livingRoom':'str', 'drawingRoom':'str', 'bathRoom':'str'})

# Take a quick look at the columns
df.head()

	id	tradeTime	followers	totalPrice	price	square	livingRoom	drawingRoom	kitchen	bathRoom	floor	buildingType	buildingStructure	ladderRatio	elevator	fiveYearsProperty	subway	district	communityAverage
0	101084782030	2016-08-09	106	415.0	31680	131.00	2	1	1	1	高 26	1.0	6	0.217	1.0	0.0	1.0	7	56021.0
1	101086012217	2016-07-28	126	575.0	43436	132.38	2	2	1	2	高 22	1.0	6	0.667	1.0	1.0	0.0	7	71539.0
2	101086041636	2016-12-11	48	1030.0	52021	198.00	3	2	1	3	中 4	4.0	6	0.500	1.0	0.0	0.0	7	48160.0
3	101086406841	2016-09-30	138	297.5	22202	134.00	3	1	1	1	底 21	1.0	6	0.273	1.0	0.0	0.0	6	51238.0
4	101086920653	2016-08-28	286	392.0	48396	81.00	2	1	1	1	中 6	4.0	2	0.333	0.0	1.0	1.0	1	62588.0

# Check the number of columns and their dtypes
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 318851 entries, 0 to 318850
Data columns (total 19 columns):
id                   318851 non-null object
tradeTime            318851 non-null object
followers            318851 non-null int64
totalPrice           318851 non-null float64
price                318851 non-null int64
square               318851 non-null float64
livingRoom           318851 non-null object
drawingRoom          318851 non-null object
kitchen              318851 non-null int64
bathRoom             318851 non-null object
floor                318851 non-null object
buildingType         316830 non-null float64
buildingStructure    318851 non-null int64
ladderRatio          318851 non-null float64
elevator             318819 non-null float64
fiveYearsProperty    318819 non-null float64
subway               318819 non-null float64
district             318851 non-null int64
communityAverage     318388 non-null float64
dtypes: float64(8), int64(5), object(6)
memory usage: 46.2+ MB

# Summary statistics for the numeric columns
df.describe()

	followers	totalPrice	price	square	kitchen	buildingType	buildingStructure	ladderRatio	elevator	fiveYearsProperty	subway	district	communityAverage
count	318851.000000	318851.000000	318851.000000	318851.000000	318851.000000	316830.000000	318851.000000	3.188510e+05	318819.000000	318819.000000	318819.000000	318851.000000	318388.000000
mean	16.731508	349.030201	43530.436379	83.240597	0.994599	3.009790	4.451026	6.316486e+01	0.577055	0.645601	0.601112	6.763564	63682.446305
std	34.209185	230.780778	21709.024204	37.234661	0.109609	1.269857	1.901753	2.506851e+04	0.494028	0.478331	0.489670	2.812616	22329.215447
min	0.000000	0.100000	1.000000	6.900000	0.000000	0.048000	0.000000	0.000000e+00	0.000000	0.000000	0.000000	1.000000	10847.000000
25%	0.000000	205.000000	28050.000000	57.900000	1.000000	1.000000	2.000000	2.500000e-01	0.000000	0.000000	0.000000	6.000000	46339.000000
50%	5.000000	294.000000	38737.000000	74.260000	1.000000	4.000000	6.000000	3.330000e-01	1.000000	1.000000	1.000000	7.000000	59015.000000
75%	18.000000	425.500000	53819.500000	98.710000	1.000000	4.000000	6.000000	5.000000e-01	1.000000	1.000000	1.000000	8.000000	75950.000000
max	1143.000000	18130.000000	156250.000000	1745.500000	4.000000	4.000000	6.000000	1.000940e+07	1.000000	1.000000	1.000000	13.000000	183109.000000

# Count the non-null values in each column
df.count()

id                   318851
tradeTime            318851
followers            318851
totalPrice           318851
price                318851
square               318851
livingRoom           318851
drawingRoom          318851
kitchen              318851
bathRoom             318851
floor                318851
buildingType         316830
buildingStructure    318851
ladderRatio          318851
elevator             318819
fiveYearsProperty    318819
subway               318819
district             318851
communityAverage     318388
dtype: int64

# Start cleaning the data

# Check for fully duplicated rows
df[df.duplicated()]
# --> no fully duplicated rows

	id	tradeTime	followers	totalPrice	price	square	livingRoom	drawingRoom	kitchen	bathRoom	floor	buildingType	buildingStructure	ladderRatio	elevator	fiveYearsProperty	subway	district	communityAverage

# Check whether the id column contains duplicates
df[df['id'].duplicated()]
# --> no rows with a duplicated id

	id	tradeTime	followers	totalPrice	price	square	livingRoom	drawingRoom	kitchen	bathRoom	floor	buildingType	buildingStructure	ladderRatio	elevator	fiveYearsProperty	subway	district	communityAverage
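
The two checks above display the duplicated rows themselves; when only the number matters, summing the boolean result is easier to read. A minimal sketch on made-up data (the `sample` frame is an assumption for illustration, not the Beijing data set):

```python
import pandas as pd

# Hypothetical sample: the last row reuses the id of the first row.
sample = pd.DataFrame({
    "id": ["a1", "a2", "a3", "a1"],
    "totalPrice": [415.0, 575.0, 1030.0, 420.0],
})

# duplicated() returns a boolean Series; summing it counts the True values.
full_dups = int(sample.duplicated().sum())      # rows identical in every column
id_dups = int(sample["id"].duplicated().sum())  # rows whose id appeared earlier
print(full_dups, id_dups)
```

A result of zero for both counts corresponds to the empty tables shown above.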

# Keep only the columns needed for the analysis goals
# 'id', 'tradeTime', 'totalPrice', 'price', 'square', 'livingRoom', 'drawingRoom', 'kitchen', 'bathRoom', 'floor', 'elevator', 'subway','district', 'communityAverage'
# .copy() makes the selection an independent frame, which avoids SettingWithCopyWarning on later column assignments
df = df[['id', 'tradeTime', 'totalPrice', 'price', 'square', 'livingRoom', 'drawingRoom', 'kitchen', 'bathRoom', 'floor', 'elevator', 'subway','district', 'communityAverage']].copy()

# Inspect the tradeTime column
df['tradeTime'].value_counts()

2016-02-28    1096
2016-03-06     948
2016-07-31     940
2016-08-31     910
2016-03-05     824
              ...
2011-02-18       1
2010-08-13       1
2010-11-27       1
2010-01-15       1
2010-03-09       1
Name: tradeTime, Length: 2560, dtype: int64

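# tradeTime spans a long period: the oldest records have little reference value, and some periods have too few records to be representative
# so the tradeTime column needs cleaning
df['tradeTime'] = pd.to_datetime(df['tradeTime'])
# Check the dtypes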
df.dtypes

id                          object
tradeTime           datetime64[ns]
totalPrice                 float64
price                        int64
square                     float64
livingRoom                  object
drawingRoom                 object
kitchen                      int64
bathRoom                    object
floor                       object
elevator                   float64
subway                     float64
district                     int64
communityAverage           float64
dtype: object

# Count the records per year
df['year'] = df['tradeTime'].dt.year
df['year'].value_counts()
# 2002, 2003, 2008, 2009, 2010, and 2018 have very few records

2016    90829
2015    69805
2017    43217
2013    38751
2012    37221
2014    32602
2011     6010
2018      221
2010      189
2002        3
2009        1
2008        1
2003        1
Name: year, dtype: int64

# Drop the years with few records and the older years; analyze 2013~2017 only
df.drop(df[df['year']  < 2013].index, inplace = True)
df.drop(df[df['year']  > 2017].index, inplace = True)

# Drop listings with totalPrice below 100 (under 1 million CNY) --> remote location or very small area
df.drop(df[df['totalPrice']  < 100].index, inplace = True)
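
The three `drop` calls above can also be expressed as a single boolean mask, which some readers find easier to audit. A minimal sketch on made-up rows (the `sample` frame and its values are assumptions, not taken from the data set):

```python
import pandas as pd

# Hypothetical sample covering the cases filtered above.
sample = pd.DataFrame({
    "tradeTime": pd.to_datetime(
        ["2011-05-01", "2013-01-15", "2016-08-09", "2018-01-02", "2017-03-03"]
    ),
    "totalPrice": [150.0, 80.0, 415.0, 300.0, 250.0],
})
sample["year"] = sample["tradeTime"].dt.year

# Keep 2013~2017 (between() is inclusive on both ends) and totalPrice >= 100.
mask = sample["year"].between(2013, 2017) & (sample["totalPrice"] >= 100)
filtered = sample[mask].copy()
print(filtered)
```

Selecting with a mask keeps the rows that pass, whereas `drop` removes the rows that fail; the resulting frame is the same.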

# Inspect the data again
df.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 272923 entries, 0 to 318850
Data columns (total 15 columns):
id                  272923 non-null object
tradeTime           272923 non-null datetime64[ns]
totalPrice          272923 non-null float64
price               272923 non-null int64
square              272923 non-null float64
livingRoom          272923 non-null object
drawingRoom         272923 non-null object
kitchen             272923 non-null int64
bathRoom            272923 non-null object
floor               272923 non-null object
elevator            272917 non-null float64
subway              272917 non-null float64
district            272923 non-null int64
communityAverage    272558 non-null float64
year                272923 non-null int64
dtypes: datetime64[ns](1), float64(5), int64(4), object(5)
memory usage: 33.3+ MB

# Check whether the elevator and subway columns contain nulls
print(df['elevator'].isnull(), df['subway'].isnull())

0         False
1         False
2         False
3         False
4         False
          ...
318846    False
318847    False
318848    False
318849    False
318850    False
Name: elevator, Length: 272923, dtype: bool 0         False
1         False
2         False
3         False
4         False
          ...
318846    False
318847    False
318848    False
318849    False
318850    False
Name: subway, Length: 272923, dtype: bool

# Count the values in elevator and subway, including NaN
print(df['elevator'].value_counts(dropna = False))
print(df['subway'].value_counts(dropna = False))

1.0    157827
0.0    115090
NaN         6
Name: elevator, dtype: int64
1.0    164183
0.0    108734
NaN         6
Name: subway, dtype: int64

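# Mark the missing elevator/subway values with a sentinel string (this turns both columns into object dtype)
df['elevator'] = df['elevator'].fillna('ABCNAN')
df['subway'] = df['subway'].fillna('ABCNAN')

# Inspect the data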
df.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 272923 entries, 0 to 318850
Data columns (total 15 columns):
id                  272923 non-null object
tradeTime           272923 non-null datetime64[ns]
totalPrice          272923 non-null float64
price               272923 non-null int64
square              272923 non-null float64
livingRoom          272923 non-null object
drawingRoom         272923 non-null object
kitchen             272923 non-null int64
bathRoom            272923 non-null object
floor               272923 non-null object
elevator            272923 non-null object
subway              272923 non-null object
district            272923 non-null int64
communityAverage    272558 non-null float64
year                272923 non-null int64
dtypes: datetime64[ns](1), float64(3), int64(4), object(7)
memory usage: 33.3+ MB

# Drop the rows whose elevator or subway value was missing
df.drop(df[df['elevator'] == 'ABCNAN'].index, inplace = True)
df.drop(df[df['subway'] == 'ABCNAN'].index, inplace = True)
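
Filling with a sentinel and then dropping on it works, but it converts both columns to `object` dtype, as the `info()` output shows. A possible alternative is `dropna` with `subset`, sketched here on made-up data (the `sample` frame is an assumption for illustration):

```python
import numpy as np
import pandas as pd

# Hypothetical sample with one missing elevator value and one missing subway value.
sample = pd.DataFrame({
    "elevator": [1.0, 0.0, np.nan, 1.0],
    "subway": [1.0, np.nan, 0.0, 0.0],
    "totalPrice": [415.0, 575.0, 1030.0, 297.5],
})

# Drop every row that has NaN in either of the two columns.
cleaned = sample.dropna(subset=["elevator", "subway"])
print(cleaned.dtypes)
```

With this approach the two columns stay `float64`, so no string values are ever mixed into them.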

# Inspect the data
df.info()
# communityAverage still has some missing values

<class 'pandas.core.frame.DataFrame'>
Int64Index: 272917 entries, 0 to 318850
Data columns (total 15 columns):
id                  272917 non-null object
tradeTime           272917 non-null datetime64[ns]
totalPrice          272917 non-null float64
price               272917 non-null int64
square              272917 non-null float64
livingRoom          272917 non-null object
drawingRoom         272917 non-null object
kitchen             272917 non-null int64
bathRoom            272917 non-null object
floor               272917 non-null object
elevator            272917 non-null object
subway              272917 non-null object
district            272917 non-null int64
communityAverage    272552 non-null float64
year                272917 non-null int64
dtypes: datetime64[ns](1), float64(3), int64(4), object(7)
memory usage: 33.3+ MB

# communityAverage
df[df['communityAverage'].isnull()]  # show the rows with missing values

	id	tradeTime	totalPrice	price	square	livingRoom	drawingRoom	kitchen	bathRoom	floor	elevator	subway	district	communityAverage	year
2027	101091727692	2016-12-05	1255.0	139290	90.10	4	0	0	0	底 1	0	1	10	NaN	2016
3902	101091913830	2016-06-20	238.0	51830	45.92	1	1	1	1	高 6	0	1	7	NaN	2016
4982	101092003852	2016-06-28	291.0	41195	70.64	1	1	1	1	高 11	1	1	7	NaN	2016
5809	101092065365	2016-09-30	176.0	110000	16.00	1	0	0	0	底 1	0	1	1	NaN	2016
6088	101092088297	2016-07-11	382.0	39024	97.89	2	2	1	1	中 28	1	1	7	NaN	2016
...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...
316175	BJXC91739524	2016-03-12	155.0	115586	13.41	1	0	0	0	底 1	0	1	10	NaN	2016
317054	BJXC92150717	2016-05-23	214.0	149442	14.32	1	0	0	0	底 1	0	0	10	NaN	2016
317133	BJXC92215207	2016-05-22	227.0	145981	15.55	1	0	0	0	底 1	0	1	10	NaN	2016
317186	BJXC92255534	2016-05-25	191.8	49987	38.36	1	1	1	1	底 1	0	1	10	NaN	2016
317217	BJXC92289286	2016-06-05	180.0	102390	17.58	1	0	0	0	底 1	0	1	10	NaN	2016

365 rows × 15 columns

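# Fill the missing communityAverage values with the column mean
# (assigning the result back avoids relying on inplace=True on a column selection)
df['communityAverage'] = df['communityAverage'].fillna(df['communityAverage'].mean())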
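
A single global mean ignores that community prices differ a lot between districts. One alternative, which is a suggestion here and not what this analysis uses, is to fill each gap with the mean of its own district. A minimal sketch on made-up numbers:

```python
import numpy as np
import pandas as pd

# Hypothetical sample: one missing value in district 7 and one in district 10.
sample = pd.DataFrame({
    "district": [7, 7, 7, 10, 10],
    "communityAverage": [60000.0, 64000.0, np.nan, 100000.0, np.nan],
})

# transform("mean") returns the district mean aligned to every original row;
# NaN values are skipped when the mean is computed.
district_mean = sample.groupby("district")["communityAverage"].transform("mean")
sample["communityAverage"] = sample["communityAverage"].fillna(district_mean)
print(sample)
```

A district with no known values at all would still be left as NaN and would need a fallback such as the global mean.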

# Inspect the data
df.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 272917 entries, 0 to 318850
Data columns (total 15 columns):
id                  272917 non-null object
tradeTime           272917 non-null datetime64[ns]
totalPrice          272917 non-null float64
price               272917 non-null int64
square              272917 non-null float64
livingRoom          272917 non-null object
drawingRoom         272917 non-null object
kitchen             272917 non-null int64
bathRoom            272917 non-null object
floor               272917 non-null object
elevator            272917 non-null object
subway              272917 non-null object
district            272917 non-null int64
communityAverage    272917 non-null float64
year                272917 non-null int64
dtypes: datetime64[ns](1), float64(3), int64(4), object(7)
memory usage: 33.3+ MB

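# Rebuild the index
# Dropping rows leaves the original row labels in place; reset the index to get consecutive labels again
# (reset_index() without drop=True also keeps the old labels as a new 'index' column)
df = df.reset_index()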
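
`reset_index()` keeps the previous row labels as an extra `index` column unless `drop=True` is passed. A small sketch on made-up data (the three-row `sample` frame is illustrative only) shows the difference:

```python
import pandas as pd

# Hypothetical frame whose labels have gaps, as after dropping rows.
sample = pd.DataFrame({"price": [31680, 43436, 52021]}, index=[0, 5, 9])

kept = sample.reset_index()              # old labels become an 'index' column
dropped = sample.reset_index(drop=True)  # old labels are discarded

print(kept.columns.tolist())
print(dropped.columns.tolist())
```

If the old labels are not needed later, `drop=True` keeps them out of summaries such as `describe()`.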

# Data cleaning is done; start the analysis
# Summary statistics
df['year'] = df['year'].astype('str')  # cast to str so describe() does not compute statistics on the year
df.describe()

	index	totalPrice	price	square	kitchen	district	communityAverage
count	272917.000000	272917.00000	272917.000000	272917.000000	272917.000000	272917.000000	272917.000000
mean	154336.376825	374.59492	46617.560471	83.670028	0.996325	6.738968	63832.871276
std	94013.407724	235.14487	21598.793862	37.584340	0.095351	2.798208	22298.588824
min	0.000000	100.00000	4335.000000	6.900000	0.000000	1.000000	10847.000000
25%	68616.000000	227.00000	31075.000000	58.060000	1.000000	6.000000	46505.000000
50%	157116.000000	316.00000	41700.000000	74.580000	1.000000	7.000000	59179.000000
75%	234415.000000	450.00000	57072.000000	98.850000	1.000000	8.000000	76223.000000
max	318850.000000	18130.00000	156250.000000	1745.500000	4.000000	13.000000	183109.000000

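# Number of listings, mean area, and mean price per district
df_dis = df.groupby('district', as_index = False)
df_dis_count = df_dis.count()[['district','id']]
df_dis_count.rename(columns={'id':'num'},inplace = True)            # number of listings per district
# numeric_only=True skips the string columns, which recent pandas versions refuse to average
df_dis_mean_square = df_dis.mean(numeric_only = True)[['district','square']]           # mean area per district
df_dis_mean_comm = df_dis.mean(numeric_only = True)[['district','communityAverage']]   # mean price per district (mean of communityAverage)
df_dis_info = pd.merge(df_dis_count, pd.merge(df_dis_mean_square, df_dis_mean_comm, on = 'district'), on = 'district')
df_dis_info.sort_values('num', ascending = False, inplace = True)  # sort the combined table by number of listings, descending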
df_dis_info

	district	num	square	communityAverage
6	7	92720	84.822103	63003.715434
5	6	33140	101.336912	43109.573240
7	8	32376	79.900437	79591.773777
9	10	26899	67.598403	101684.515659
1	2	24864	78.545938	54975.301568
0	1	14998	72.598140	89713.498843
3	4	13062	87.911008	44792.826757
10	11	11715	86.374318	44031.363518
8	9	9487	75.088564	50449.493201
12	13	7369	96.919533	39418.455028
4	5	2847	90.847980	36074.011195
2	3	2137	97.458615	48023.558727
11	12	1303	85.445825	39053.454139
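
The count, the two means, and the two merges above can also be computed in one pass with named aggregation (available since pandas 0.25). A minimal sketch on made-up data; the `sample` frame is an assumption, and the output column names simply mirror the table above:

```python
import pandas as pd

# Hypothetical sample with two districts.
sample = pd.DataFrame({
    "district": [7, 7, 6, 6, 6],
    "id": ["a", "b", "c", "d", "e"],
    "square": [80.0, 90.0, 100.0, 110.0, 120.0],
    "communityAverage": [60000.0, 66000.0, 40000.0, 44000.0, 48000.0],
})

# Each keyword names an output column as (input column, aggregation).
info = (
    sample.groupby("district", as_index=False)
    .agg(
        num=("id", "count"),
        square=("square", "mean"),
        communityAverage=("communityAverage", "mean"),
    )
    .sort_values("num", ascending=False)
)
print(info)
```

This avoids the intermediate frames and makes it explicit which column feeds each statistic.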

df_dis.head()

	index	id	tradeTime	totalPrice	price	square	livingRoom	drawingRoom	kitchen	bathRoom	floor	elevator	subway	district	communityAverage	year
0	0	101084782030	2016-08-09	415.0	31680	131.00	2	1	1	1	高 26	1	1	7	56021.0	2016
1	1	101086012217	2016-07-28	575.0	43436	132.38	2	2	1	2	高 22	1	0	7	71539.0	2016
2	2	101086041636	2016-12-11	1030.0	52021	198.00	3	2	1	3	中 4	1	0	7	48160.0	2016
3	3	101086406841	2016-09-30	297.5	22202	134.00	3	1	1	1	底 21	1	0	6	51238.0	2016
4	4	101086920653	2016-08-28	392.0	48396	81.00	2	1	1	1	中 6	0	1	1	62588.0	2016
...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...
291	295	101090284214	2016-07-29	355.0	32569	109.00	3	2	1	1	中 22	1	0	9	47095.0	2016
397	403	101090681279	2016-07-27	395.0	29351	134.58	3	2	1	2	底 6	0	0	11	40026.0	2016
472	480	101090865126	2016-09-12	290.0	43524	66.63	1	2	1	1	底 9	1	1	11	39787.0	2016
578	587	101091076817	2016-08-31	410.0	33607	122.00	3	2	1	2	高 7	0	1	11	42790.0	2016
580	589	101091085930	2016-06-23	395.0	36336	108.71	2	2	1	2	低 12	1	0	11	43196.0	2016

65 rows × 16 columns

# Mean total price per district, with and without a subway (assuming subway == 1 means a subway is nearby)
df_dis_sub = df[['id', 'district', 'subway','totalPrice']]
df_dis_sub = df_dis_sub.groupby(['district', 'subway']).mean(numeric_only = True)  # numeric_only skips the string id column
print(df_dis_sub)
# df_dis_sub_1 = df_dis[df_dis['subway'] == 1]
# df_dis_sub_0 = df_dis[df_dis['subway'] == 0]
# df_dis_sub_0

                 totalPrice
district subway
1        0.0     469.878265
         1.0     465.033473
2        0.0     322.250488
         1.0     315.975104
3        0.0     372.588384
         1.0     257.979536
4        0.0     277.002999
         1.0     281.420831
5        0.0     238.821941
         1.0     281.531037
6        0.0     296.897313
         1.0     312.152069
7        0.0     375.204474
         1.0     401.876686
8        0.0     456.372647
         1.0     463.602992
9        0.0     286.795664
         1.0     291.473189
10       0.0     469.243572
         1.0     486.890207
11       0.0     242.557678
         1.0     264.659448
12       0.0     250.708532
         1.0     426.000000
13       0.0     257.419935
         1.0     231.075452
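
To compare the two subway groups side by side, the grouped means can be pivoted with `unstack` so that each district becomes one row. A minimal sketch on made-up prices (the `sample` frame and the `diff` column name are assumptions for illustration):

```python
import pandas as pd

# Hypothetical sample with two districts.
sample = pd.DataFrame({
    "district": [1, 1, 1, 2, 2],
    "subway": [0.0, 1.0, 1.0, 0.0, 1.0],
    "totalPrice": [400.0, 500.0, 600.0, 300.0, 350.0],
})

# Rows: district; columns: subway value (0.0 / 1.0); cells: mean totalPrice.
by_subway = (
    sample.groupby(["district", "subway"])["totalPrice"].mean().unstack("subway")
)
# Premium of listings near a subway over those without one.
by_subway["diff"] = by_subway[1.0] - by_subway[0.0]
print(by_subway)
```

A negative `diff` flags a district where listings near a subway are cheaper on average, which is usually worth a second look at the group sizes.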

# Mean price per district for listings near a subway, with and without an elevator
df_dis_sub_01 = df[['id', 'district', 'subway', 'elevator', 'totalPrice']]
df_dis_sub_1 = df_dis_sub_01[df_dis_sub_01['subway'] == 1]
df_dis_sub_1 = df_dis_sub_1.groupby(['district', 'elevator'], as_index = False).mean(numeric_only = True)
df_dis_sub_1.rename(columns = {'totalPrice':'totalPrice_mean'}, inplace = True)
print(df_dis_sub_1)

    district  elevator  totalPrice_mean
0          1       0.0       415.038504
1          1       1.0       500.240515
2          2       0.0       267.338108
3          2       1.0       334.515961
4          3       0.0       518.142857
5          3       1.0       236.086011
6          4       0.0       239.914759
7          4       1.0       335.179934
8          5       0.0       258.421368
9          5       1.0       287.453998
10         6       0.0       308.621994
11         6       1.0       316.539542
12         7       0.0       302.067409
13         7       1.0       441.639936
14         8       0.0       409.848486
15         8       1.0       512.433305
16         9       0.0       222.063187
17         9       1.0       345.343639
18        10       0.0       461.317588
19        10       1.0       510.121250
20        11       0.0       253.602230
21        11       1.0       268.960810
22        12       1.0       426.000000
23        13       0.0       202.840370
24        13       1.0       281.763431

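# 2017, 2-bedroom/1-living-room/1-kitchen/1-bathroom listings: mean price per district by elevator and subway
df_dis_want = df[['id', 'district','livingRoom', 'drawingRoom', 'kitchen', 'bathRoom', 'subway', 'elevator', 'totalPrice','year']]
print(df_dis_want.info())
df_dis_w = df_dis_want[(df['year'] == '2017') & (df['livingRoom'] == '2') & (df['drawingRoom'] == '1') & (df['kitchen'] == 1) & (df['bathRoom'] == '1')]
# Note: livingRoom, drawingRoom, and bathRoom were loaded as strings, so their values need quotes here, while kitchen is an integer;
# leaving the dtypes inconsistent can be considered an oversight of the cleaning step
df_dis_w = df_dis_w.groupby(['district', 'elevator', 'subway'], as_index = False).mean(numeric_only = True)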
df_dis_w.rename(columns = {'totalPrice':'totalPrice_mean'}, inplace = True)
print(df_dis_w)

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 272917 entries, 0 to 272916
Data columns (total 10 columns):
id             272917 non-null object
district       272917 non-null int64
livingRoom     272917 non-null object
drawingRoom    272917 non-null object
kitchen        272917 non-null int64
bathRoom       272917 non-null object
subway         272917 non-null object
elevator       272917 non-null object
totalPrice     272917 non-null float64
year           272917 non-null object
dtypes: float64(1), int64(2), object(7)
memory usage: 20.8+ MB
None

	district	elevator	subway	kitchen	totalPrice_mean
0	1	0.0	0.0	1	493.103448
1	1	0.0	1.0	1	597.487764
2	1	1.0	0.0	1	656.100000
3	1	1.0	1.0	1	700.313229
4	2	0.0	0.0	1	355.323571
5	2	0.0	1.0	1	381.905593
6	2	1.0	0.0	1	495.709938
7	2	1.0	1.0	1	466.610455
8	3	0.0	0.0	1	358.381250
9	3	0.0	1.0	1	541.500000
10	3	1.0	0.0	1	457.250000
11	3	1.0	1.0	1	448.420000
12	4	0.0	0.0	1	311.772622
13	4	0.0	1.0	1	305.311983
14	4	1.0	0.0	1	439.410204
15	4	1.0	1.0	1	412.669841
16	5	0.0	0.0	1	256.658491
17	5	0.0	1.0	1	316.978571
18	5	1.0	0.0	1	317.132948
19	5	1.0	1.0	1	359.614839
20	6	0.0	0.0	1	352.025395
21	6	0.0	1.0	1	395.474759
22	6	1.0	0.0	1	420.451366
23	6	1.0	1.0	1	439.371875
24	7	0.0	0.0	1	364.164934
25	7	0.0	1.0	1	409.597200
26	7	1.0	0.0	1	554.104437
27	7	1.0	1.0	1	536.223223
28	8	0.0	0.0	1	503.982394
29	8	0.0	1.0	1	532.799109
30	8	1.0	0.0	1	621.156806
31	8	1.0	1.0	1	653.117304
32	9	0.0	0.0	1	315.139793
33	9	0.0	1.0	1	322.747917
34	9	1.0	0.0	1	480.467907
35	9	1.0	1.0	1	445.882243
36	10	0.0	0.0	1	644.037000
37	10	0.0	1.0	1	638.352427
38	10	1.0	0.0	1	741.245455
39	10	1.0	1.0	1	744.362667
40	11	0.0	0.0	1	356.081275
41	11	0.0	1.0	1	389.598276
42	11	1.0	0.0	1	374.167647
43	11	1.0	1.0	1	425.826744
44	12	0.0	0.0	1	298.251190
45	12	1.0	0.0	1	401.925397
46	12	1.0	1.0	1	390.000000
47	13	0.0	0.0	1	303.945556
48	13	0.0	1.0	1	290.141912
49	13	1.0	0.0	1	388.379070
50	13	1.0	1.0	1	409.612766
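
The quoting issue noted in the filter above goes away if the room columns are converted to numbers during cleaning. A minimal sketch, assuming the columns hold numeric strings plus an occasional non-numeric entry (the `sample` frame and the `"unknown"` value are made up for illustration):

```python
import pandas as pd

# Hypothetical sample; "unknown" stands in for any non-numeric entry.
sample = pd.DataFrame({
    "livingRoom": ["2", "3", "unknown", "2"],
    "drawingRoom": ["1", "1", "1", "2"],
    "bathRoom": ["1", "1", "1", "1"],
})

room_cols = ["livingRoom", "drawingRoom", "bathRoom"]
# errors="coerce" turns anything that is not a number into NaN instead of raising.
sample[room_cols] = sample[room_cols].apply(pd.to_numeric, errors="coerce")

# All three conditions can now use plain numbers.
two_one_one = sample[
    (sample["livingRoom"] == 2)
    & (sample["drawingRoom"] == 1)
    & (sample["bathRoom"] == 1)
]
print(two_one_one)
```

Rows that fail the conversion end up as NaN and can be inspected or dropped before the analysis.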

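# Daily price trend
# Mean unit price of all listings per day
df_day_price = df.groupby('tradeTime')['price'].mean()  # select price first so the string columns are never averaged
df_day_price.sort_index(inplace=True)  # sort by the date index
df_day_price.plot()  # plot the trend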

Clear outliers appear at the start of each year. Is something causing them, or are they simply erroneous values?
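
One way to probe this question is to check how many listings stand behind each daily mean: a day with only a handful of trades can swing the average sharply. Whether that explains the spikes here is a guess, not something the data above confirms. A minimal sketch on made-up trades (the `sample` frame and the threshold of 2 are assumptions):

```python
import pandas as pd

# Hypothetical sample: one thinly traded day with an extreme price.
sample = pd.DataFrame({
    "tradeTime": pd.to_datetime(
        ["2016-01-01", "2016-01-02", "2016-01-02", "2016-01-02", "2016-02-01"]
    ),
    "price": [90000, 40000, 42000, 44000, 43000],
})

# Mean price and number of listings per day, side by side.
daily = sample.groupby("tradeTime")["price"].agg(["mean", "count"])

# Days backed by fewer listings than the (arbitrary) threshold.
thin_days = daily[daily["count"] < 2]
print(thin_days)
```

If the outlier dates turn out to be thin days, averaging over a longer window such as a month would smooth them out.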

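# 2017: share of listings with total price 2~4 million CNY, unit price 40,000~70,000 CNY, and an elevator (assuming elevator == 1 means the building has one)
df_2017 = df[df['year'] == '2017']
# Filter df_2017, not df, so that the numerator and the denominator both cover 2017 only
num1 = len(df_2017[(df_2017['totalPrice'] > 200) & (df_2017['totalPrice'] < 400) & (df_2017['price'] > 40000) & (df_2017['price'] < 70000) & (df_2017['elevator'] == 1)])
num2 = len(df_2017)  # number of 2017 records
want_ratio = num1/num2
print(want_ratio)  # share
# Caution: the figure below was printed with the filter applied to df (2013~2017) in the numerator, so it overstates the 2017 share

0.6929146649957065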
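
A share like this can also be computed as the mean of a boolean mask taken over the 2017 subset, which guarantees that the numerator and the denominator refer to the same rows. A minimal sketch on made-up listings (the `sample` frame is an assumption for illustration):

```python
import pandas as pd

# Hypothetical sample: four listings from 2017 and one from 2016.
sample = pd.DataFrame({
    "year": ["2017", "2017", "2017", "2017", "2016"],
    "totalPrice": [300.0, 350.0, 500.0, 250.0, 300.0],
    "price": [50000, 80000, 60000, 45000, 50000],
    "elevator": [1.0, 1.0, 1.0, 0.0, 1.0],
})

year_2017 = sample[sample["year"] == "2017"]
mask = (
    (year_2017["totalPrice"] > 200) & (year_2017["totalPrice"] < 400)
    & (year_2017["price"] > 40000) & (year_2017["price"] < 70000)
    & (year_2017["elevator"] == 1)
)
# The mean of a boolean Series is the fraction of True values.
ratio = mask.mean()
print(ratio)
```

Because the mask is built from `year_2017`, listings from other years cannot leak into the count.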
