data whale_data_analysis_task2

Data visualization

First, import the required libraries and load the data.

%matplotlib inline
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
text = pd.read_csv(r'result.csv')
text.head()

	Unnamed: 0	PassengerId	Survived	Pclass	Name	Sex	Age	SibSp	Ticket	Fare	Cabin	Embarked
0	0	1	0	3	Braund, Mr. Owen Harris	male	22.0	1.0	A/5 21171	7.2500	NaN	S
1	1	2	1	1	Cumings, Mrs. John Bradley (Florence Briggs Th...	female	38.0	1.0	PC 17599	71.2833	C85	C
2	2	3	1	3	Heikkinen, Miss. Laina	female	26.0	0.0	STON/O2. 3101282	7.9250	NaN	S
3	3	4	1	1	Futrelle, Mrs. Jacques Heath (Lily May Peel)	female	35.0	1.0	113803	53.1000	C123	S
4	4	5	0	3	Allen, Mr. William Henry	male	35.0	0.0	373450	8.0500	NaN	S

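2.7.1 Making the data readable

# My network connection was so bad that I could not find the relevant data, and I have not really memorized the functions either, so I am skipping this part for now and will fill it in when I get the chance.

2.7.2 Bar charts

Here the number of survivors by sex is used as an example of how to draw a bar chart.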

sex = text.groupby('Sex')['Survived'].sum()
sex.plot.bar()
plt.title('survived_count')
plt.show()
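Summing works here because `Survived` is coded as 0/1, so the per-group sum is the number of survivors. A minimal sketch on a tiny made-up table (the rows below are invented for illustration and are not taken from result.csv):

```python
import pandas as pd

# Hypothetical toy data; only the column names mirror the Titanic table.
toy = pd.DataFrame({
    "Sex": ["male", "female", "female", "male", "female"],
    "Survived": [0, 1, 1, 1, 0],
})

# Sum of a 0/1 column per group = number of survivors per group.
survivors_by_sum = toy.groupby("Sex")["Survived"].sum()

# Same numbers, obtained by keeping only survivors and counting rows.
survivors_by_count = toy[toy["Survived"] == 1].groupby("Sex").size()

print(survivors_by_sum)
```

Note that `count()` instead of `sum()` would count every passenger in the group, survivors or not.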

A bar chart with two variables, using the numbers of survivors and deaths for each sex as the example.

text.groupby(['Sex','Survived'])['Survived'].count()

Sex     Survived
female  0            81
        1           233
male    0           468
        1           109
Name: Survived, dtype: int64

It is easy to see that the code above counts the survivors and the deaths separately for each sex, and that the result is a Series.

text.groupby(['Sex','Survived'])['Survived'].count().unstack()

Survived	0	1
Sex
female	81	233
male	468	109

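Recall how stack() was used earlier: stack turns a DataFrame into a Series, and its inverse, unstack, turns a Series with a multi-level index back into a DataFrame. Now the DataFrame can be visualized with the plot function. (A Series has plotting functions as well, as the earlier sex.plot.bar() call already shows; I am short on time and will tidy this up once the course is finished.)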
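The round trip between the two shapes can be sketched on a small made-up table (the data are invented; only the column names follow the Titanic file):

```python
import pandas as pd

# Hypothetical toy data standing in for result.csv.
toy = pd.DataFrame({
    "Sex": ["male", "female", "female", "male", "female", "male"],
    "Survived": [0, 1, 1, 1, 0, 0],
})

# Series indexed by the two levels (Sex, Survived).
counts = toy.groupby(["Sex", "Survived"])["Survived"].count()

# unstack moves the inner level (Survived) into the columns: a DataFrame.
table = counts.unstack()

# stack moves the columns back into the index: a Series again.
back = table.stack()

print(type(counts).__name__, type(table).__name__, type(back).__name__)
# prints: Series DataFrame Series
```

Depending on the pandas version, `stack()` may emit a deprecation warning about its new implementation; the result for this table is the same.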

text.groupby(['Sex','Survived'])['Survived'].count().unstack().plot(kind='bar',stacked=True)
plt.title('survived_count')
plt.ylabel('count')

2.7.3 Line charts

Here the distribution of survival across ticket fares is used as the example.

fare_sur = text.groupby(['Fare'])['Survived'].value_counts()
fare_sur

Fare      Survived
0.0000    0           14
          1            1
4.0125    0            1
5.0000    0            1
6.2375    0            1
                      ..
247.5208  1            1
262.3750  1            2
263.0000  0            2
          1            2
512.3292  1            3
Name: Survived, Length: 330, dtype: int64

# Plot without sorting by count
fig = plt.figure(figsize=(20,18))
fare_sur.plot(grid=True)
plt.legend()
plt.show()

# Sort by count, descending
fare_sur.sort_values(ascending=False)

Fare     Survived
8.0500   0           38
7.8958   0           37
13.0000  0           26
7.7500   0           22
13.0000  1           16
                     ..
7.7417   0            1
26.2833  1            1
7.7375   1            1
26.3875  1            1
22.5250  0            1
Name: Survived, Length: 330, dtype: int64

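# sort_values returns a new Series and leaves fare_sur unchanged,
# so the sorted result has to be plotted explicitly
fig = plt.figure(figsize=(20, 18))
fare_sur.sort_values(ascending=False).plot(grid=True)
plt.legend()
plt.show()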
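One possible way to make this line chart easier to read, offered as a sketch rather than the intended solution: unstack `Survived` into columns so that `Fare` alone forms the x axis and each outcome gets its own line. The toy data and the names `fare_sur_toy` and `fare_table` are invented for illustration.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical toy data standing in for result.csv.
toy = pd.DataFrame({
    "Fare": [7.25, 7.25, 7.25, 13.0, 13.0, 26.0, 26.0, 71.3],
    "Survived": [0, 0, 1, 0, 1, 1, 1, 1],
})

fare_sur_toy = toy.groupby("Fare")["Survived"].value_counts()

# Rows = Fare (sorted ascending), columns = Survived (0/1);
# combinations that never occur are filled with 0.
fare_table = fare_sur_toy.unstack(fill_value=0)

# One line per outcome, with Fare on the x axis.
ax = fare_table.plot(grid=True, figsize=(8, 4))
ax.set_xlabel("Fare")
ax.set_ylabel("count")
```

With the two-level Series plotted directly, the x axis consists of (Fare, Survived) pairs, which is why the original figure is hard to interpret.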

2.7.4 Plotting with seaborn

Using the distribution of survival counts by passenger class (Pclass) as the example.

import seaborn as sns
sns.countplot(x='Pclass',hue='Survived',data=text)

<AxesSubplot:xlabel='Pclass', ylabel='count'>

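To be honest, seaborn does not seem to need the data extracted and reshaped into a DataFrame first: you only tell it what each axis means, which variable to count, and where the data comes from.
One question, though: isn't machine learning supposed to mean the machine does the learning? Why am I the one who has to learn???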
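The observation about seaborn can be checked on a small made-up table: `countplot` counts rows straight from the long-form data, and the same numbers can be reproduced by hand with groupby and unstack. The data and the name `manual` below are invented for illustration.

```python
import pandas as pd
import seaborn as sns

# Hypothetical toy data; only the column names mirror result.csv.
toy = pd.DataFrame({
    "Pclass": [1, 1, 2, 3, 3, 3],
    "Survived": [1, 0, 1, 0, 0, 1],
})

# seaborn does the counting itself from the raw rows.
ax = sns.countplot(x="Pclass", hue="Survived", data=toy)

# The same counts computed by hand with pandas, for comparison.
manual = toy.groupby(["Pclass", "Survived"]).size().unstack(fill_value=0)
print(manual)
```

The pandas route needs the explicit reshaping step; seaborn hides it behind the `x`, `hue` and `data` arguments.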

2.7.5 Free plotting

(1) Distribution of survival and death by age

text = pd.read_csv(r'result.csv')
df = text.groupby('Age')['Survived'].value_counts().unstack()
df=df.fillna(0)
df

Survived	0	1
Age
0.42	0.0	1.0
0.67	0.0	1.0
0.75	0.0	2.0
0.83	0.0	2.0
0.92	0.0	1.0
...	...	...
70.00	2.0	0.0
70.50	1.0	0.0
71.00	2.0	0.0
74.00	1.0	0.0
80.00	0.0	1.0

88 rows × 2 columns

sns.lineplot(x='Age',hue='Survived',data=df)

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-122-e008732b892e> in <module>
----> 1 sns.lineplot(x='Age',hue='Survived',data=df)

~/opt/anaconda3/lib/python3.8/site-packages/seaborn/_decorators.py in inner_f(*args, **kwargs)
     44             )
     45         kwargs.update({k: arg for k, arg in zip(sig.parameters, args)})
---> 46         return f(**kwargs)
     47     return inner_f
     48 

~/opt/anaconda3/lib/python3.8/site-packages/seaborn/relational.py in lineplot(x, y, hue, size, style, data, palette, hue_order, hue_norm, sizes, size_order, size_norm, dashes, markers, style_order, units, estimator, ci, n_boot, seed, sort, err_style, err_kws, legend, ax, **kwargs)
    683 
    684     variables = _LinePlotter.get_semantics(locals())
--> 685     p = _LinePlotter(
    686         data=data, variables=variables,
    687         estimator=estimator, ci=ci, n_boot=n_boot, seed=seed,

~/opt/anaconda3/lib/python3.8/site-packages/seaborn/relational.py in __init__(self, data, variables, estimator, ci, n_boot, seed, sort, err_style, err_kws, legend)
    365         )
    366 
--> 367         super().__init__(data=data, variables=variables)
    368 
    369         self.estimator = estimator

~/opt/anaconda3/lib/python3.8/site-packages/seaborn/_core.py in __init__(self, data, variables)
    602     def __init__(self, data=None, variables={}):
    603 
--> 604         self.assign_variables(data, variables)
    605 
    606         for var, cls in self._semantic_mappings.items():

~/opt/anaconda3/lib/python3.8/site-packages/seaborn/_core.py in assign_variables(self, data, variables)
    665         else:
    666             self.input_format = "long"
--> 667             plot_data, variables = self._assign_variables_longform(
    668                 data, **variables,
    669             )

~/opt/anaconda3/lib/python3.8/site-packages/seaborn/_core.py in _assign_variables_longform(self, data, **kwargs)
    900 
    901                 err = f"Could not interpret value `{val}` for parameter `{key}`"
--> 902                 raise ValueError(err)
    903 
    904             else:

ValueError: Could not interpret value `Survived` for parameter `hue`

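I cannot get this plot to work. Somebody save me.

I will stop here. If anyone passing by can tell me why the lineplot above fails, please take a look.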
