组队学习-动手学数据分析-第二章第2、3节
复习:在前面我们已经学习了Pandas基础,第二章我们开始进入数据分析的业务部分,在第二章第一节的内容中,我们学习了数据的清洗,这一部分十分重要,只有数据变得相对干净,我们之后对数据的分析才可以更有力。而这一节,我们要做的是数据重构,数据重构依旧属于数据理解(准备)的范围。
开始之前,导入numpy、pandas包和数据
# 导入基本库
import numpy as np
import pandas as pd
df = pd.DataFrame([[1.4, np.nan],[np.nan, 2]],index=['a','b'],columns=['one','two'])
df
one | two | |
---|---|---|
a | 1.4 | NaN |
b | NaN | 2.0 |
df.one + df.two
a NaN
b NaN
dtype: float64
df.sum(skipna=False)
one NaN
two NaN
dtype: float64
df.iloc[0,0]+df.iloc[1,0]
nan
# 载入data文件中的:train-left-up.csv
df10 = pd.read_csv(r'data/train-left-up.csv')
df10
PassengerId | Survived | Pclass | Name | |
---|---|---|---|---|
0 | 1 | 0 | 3 | Braund, Mr. Owen Harris |
1 | 2 | 1 | 1 | Cumings, Mrs. John Bradley (Florence Briggs Th... |
2 | 3 | 1 | 3 | Heikkinen, Miss. Laina |
3 | 4 | 1 | 1 | Futrelle, Mrs. Jacques Heath (Lily May Peel) |
4 | 5 | 0 | 3 | Allen, Mr. William Henry |
... | ... | ... | ... | ... |
434 | 435 | 0 | 1 | Silvey, Mr. William Baird |
435 | 436 | 1 | 1 | Carter, Miss. Lucile Polk |
436 | 437 | 0 | 3 | Ford, Miss. Doolina Margaret "Daisy" |
437 | 438 | 1 | 2 | Richards, Mrs. Sidney (Emily Hocking) |
438 | 439 | 0 | 1 | Fortune, Mr. Mark |
439 rows × 4 columns
2 第二章:数据重构
2.4 数据的合并
2.4.1 任务一:将data文件夹里面的所有数据都载入,观察数据的之间的关系
#写入代码
df_left_up = pd.read_csv("data/train-left-up.csv")
df_left_down = pd.read_csv("data/train-left-down.csv")
df_right_up = pd.read_csv("data/train-right-up.csv")
df_right_down = pd.read_csv("data/train-right-down.csv")
#写入代码
df_left_up.head()
PassengerId | Survived | Pclass | Name | |
---|---|---|---|---|
0 | 1 | 0 | 3 | Braund, Mr. Owen Harris |
1 | 2 | 1 | 1 | Cumings, Mrs. John Bradley (Florence Briggs Th... |
2 | 3 | 1 | 3 | Heikkinen, Miss. Laina |
3 | 4 | 1 | 1 | Futrelle, Mrs. Jacques Heath (Lily May Peel) |
4 | 5 | 0 | 3 | Allen, Mr. William Henry |
df_left_down.head()
PassengerId | Survived | Pclass | Name | |
---|---|---|---|---|
0 | 440 | 0 | 2 | Kvillner, Mr. Johan Henrik Johannesson |
1 | 441 | 1 | 2 | Hart, Mrs. Benjamin (Esther Ada Bloomfield) |
2 | 442 | 0 | 3 | Hampe, Mr. Leon |
3 | 443 | 0 | 3 | Petterson, Mr. Johan Emil |
4 | 444 | 1 | 2 | Reynaldo, Ms. Encarnacion |
df_right_up.head()
Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked | |
---|---|---|---|---|---|---|---|---|
0 | male | 22.0 | 1 | 0 | A/5 21171 | 7.2500 | NaN | S |
1 | female | 38.0 | 1 | 0 | PC 17599 | 71.2833 | C85 | C |
2 | female | 26.0 | 0 | 0 | STON/O2. 3101282 | 7.9250 | NaN | S |
3 | female | 35.0 | 1 | 0 | 113803 | 53.1000 | C123 | S |
4 | male | 35.0 | 0 | 0 | 373450 | 8.0500 | NaN | S |
df_right_down.head()
Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked | |
---|---|---|---|---|---|---|---|---|
0 | male | 31.0 | 0 | 0 | C.A. 18723 | 10.500 | NaN | S |
1 | female | 45.0 | 1 | 1 | F.C.C. 13529 | 26.250 | NaN | S |
2 | male | 20.0 | 0 | 0 | 345769 | 9.500 | NaN | S |
3 | male | 25.0 | 1 | 0 | 347076 | 7.775 | NaN | S |
4 | female | 28.0 | 0 | 0 | 230434 | 13.000 | NaN | S |
【提示】结合之前我们加载的train.csv数据,大致预测一下上面的数据是什么
2.4.2:任务二:使用concat方法:将数据train-left-up.csv和train-right-up.csv横向合并为一张表,并保存这张表为result_up
#写入代码
result_up = pd.concat([df_left_up,df_right_up],axis =1)
result_up
PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked | |
---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1 | 0 | 3 | Braund, Mr. Owen Harris | male | 22.0 | 1 | 0 | A/5 21171 | 7.2500 | NaN | S |
1 | 2 | 1 | 1 | Cumings, Mrs. John Bradley (Florence Briggs Th... | female | 38.0 | 1 | 0 | PC 17599 | 71.2833 | C85 | C |
2 | 3 | 1 | 3 | Heikkinen, Miss. Laina | female | 26.0 | 0 | 0 | STON/O2. 3101282 | 7.9250 | NaN | S |
3 | 4 | 1 | 1 | Futrelle, Mrs. Jacques Heath (Lily May Peel) | female | 35.0 | 1 | 0 | 113803 | 53.1000 | C123 | S |
4 | 5 | 0 | 3 | Allen, Mr. William Henry | male | 35.0 | 0 | 0 | 373450 | 8.0500 | NaN | S |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
434 | 435 | 0 | 1 | Silvey, Mr. William Baird | male | 50.0 | 1 | 0 | 13507 | 55.9000 | E44 | S |
435 | 436 | 1 | 1 | Carter, Miss. Lucile Polk | female | 14.0 | 1 | 2 | 113760 | 120.0000 | B96 B98 | S |
436 | 437 | 0 | 3 | Ford, Miss. Doolina Margaret "Daisy" | female | 21.0 | 2 | 2 | W./C. 6608 | 34.3750 | NaN | S |
437 | 438 | 1 | 2 | Richards, Mrs. Sidney (Emily Hocking) | female | 24.0 | 2 | 3 | 29106 | 18.7500 | NaN | S |
438 | 439 | 0 | 1 | Fortune, Mr. Mark | male | 64.0 | 1 | 4 | 19950 | 263.0000 | C23 C25 C27 | S |
439 rows × 12 columns
2.4.3 任务三:使用concat方法:将train-left-down和train-right-down横向合并为一张表,并保存这张表为result_down。然后将上边的result_up和result_down纵向合并为result。
#写入代码
result_down = pd.concat([df_left_down,df_right_down],axis=1)
result_down
PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked | |
---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 440 | 0 | 2 | Kvillner, Mr. Johan Henrik Johannesson | male | 31.0 | 0 | 0 | C.A. 18723 | 10.500 | NaN | S |
1 | 441 | 1 | 2 | Hart, Mrs. Benjamin (Esther Ada Bloomfield) | female | 45.0 | 1 | 1 | F.C.C. 13529 | 26.250 | NaN | S |
2 | 442 | 0 | 3 | Hampe, Mr. Leon | male | 20.0 | 0 | 0 | 345769 | 9.500 | NaN | S |
3 | 443 | 0 | 3 | Petterson, Mr. Johan Emil | male | 25.0 | 1 | 0 | 347076 | 7.775 | NaN | S |
4 | 444 | 1 | 2 | Reynaldo, Ms. Encarnacion | female | 28.0 | 0 | 0 | 230434 | 13.000 | NaN | S |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
447 | 887 | 0 | 2 | Montvila, Rev. Juozas | male | 27.0 | 0 | 0 | 211536 | 13.000 | NaN | S |
448 | 888 | 1 | 1 | Graham, Miss. Margaret Edith | female | 19.0 | 0 | 0 | 112053 | 30.000 | B42 | S |
449 | 889 | 0 | 3 | Johnston, Miss. Catherine Helen "Carrie" | female | NaN | 1 | 2 | W./C. 6607 | 23.450 | NaN | S |
450 | 890 | 1 | 1 | Behr, Mr. Karl Howell | male | 26.0 | 0 | 0 | 111369 | 30.000 | C148 | C |
451 | 891 | 0 | 3 | Dooley, Mr. Patrick | male | 32.0 | 0 | 0 | 370376 | 7.750 | NaN | Q |
452 rows × 12 columns
result = pd.concat([result_up,result_down])
result
PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked | |
---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1 | 0 | 3 | Braund, Mr. Owen Harris | male | 22.0 | 1 | 0 | A/5 21171 | 7.2500 | NaN | S |
1 | 2 | 1 | 1 | Cumings, Mrs. John Bradley (Florence Briggs Th... | female | 38.0 | 1 | 0 | PC 17599 | 71.2833 | C85 | C |
2 | 3 | 1 | 3 | Heikkinen, Miss. Laina | female | 26.0 | 0 | 0 | STON/O2. 3101282 | 7.9250 | NaN | S |
3 | 4 | 1 | 1 | Futrelle, Mrs. Jacques Heath (Lily May Peel) | female | 35.0 | 1 | 0 | 113803 | 53.1000 | C123 | S |
4 | 5 | 0 | 3 | Allen, Mr. William Henry | male | 35.0 | 0 | 0 | 373450 | 8.0500 | NaN | S |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
447 | 887 | 0 | 2 | Montvila, Rev. Juozas | male | 27.0 | 0 | 0 | 211536 | 13.0000 | NaN | S |
448 | 888 | 1 | 1 | Graham, Miss. Margaret Edith | female | 19.0 | 0 | 0 | 112053 | 30.0000 | B42 | S |
449 | 889 | 0 | 3 | Johnston, Miss. Catherine Helen "Carrie" | female | NaN | 1 | 2 | W./C. 6607 | 23.4500 | NaN | S |
450 | 890 | 1 | 1 | Behr, Mr. Karl Howell | male | 26.0 | 0 | 0 | 111369 | 30.0000 | C148 | C |
451 | 891 | 0 | 3 | Dooley, Mr. Patrick | male | 32.0 | 0 | 0 | 370376 | 7.7500 | NaN | Q |
891 rows × 12 columns
2.4.4 任务四:使用DataFrame自带的方法join方法和append:完成任务二和任务三的任务
#写入代码
result_up = df_left_up.join(df_right_up)
result_down = df_left_down.join(df_right_down)
result = result_up.append(result_down)
result
C:\Users\Ji-Luo\AppData\Local\Temp\ipykernel_11888\552922610.py:1: FutureWarning: The frame.append method is deprecated and will be removed from pandas in a future version. Use pandas.concat instead.result = result_up.append(result_down)
PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked | |
---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1 | 0 | 3 | Braund, Mr. Owen Harris | male | 22.0 | 1 | 0 | A/5 21171 | 7.2500 | NaN | S |
1 | 2 | 1 | 1 | Cumings, Mrs. John Bradley (Florence Briggs Th... | female | 38.0 | 1 | 0 | PC 17599 | 71.2833 | C85 | C |
2 | 3 | 1 | 3 | Heikkinen, Miss. Laina | female | 26.0 | 0 | 0 | STON/O2. 3101282 | 7.9250 | NaN | S |
3 | 4 | 1 | 1 | Futrelle, Mrs. Jacques Heath (Lily May Peel) | female | 35.0 | 1 | 0 | 113803 | 53.1000 | C123 | S |
4 | 5 | 0 | 3 | Allen, Mr. William Henry | male | 35.0 | 0 | 0 | 373450 | 8.0500 | NaN | S |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
447 | 887 | 0 | 2 | Montvila, Rev. Juozas | male | 27.0 | 0 | 0 | 211536 | 13.0000 | NaN | S |
448 | 888 | 1 | 1 | Graham, Miss. Margaret Edith | female | 19.0 | 0 | 0 | 112053 | 30.0000 | B42 | S |
449 | 889 | 0 | 3 | Johnston, Miss. Catherine Helen "Carrie" | female | NaN | 1 | 2 | W./C. 6607 | 23.4500 | NaN | S |
450 | 890 | 1 | 1 | Behr, Mr. Karl Howell | male | 26.0 | 0 | 0 | 111369 | 30.0000 | C148 | C |
451 | 891 | 0 | 3 | Dooley, Mr. Patrick | male | 32.0 | 0 | 0 | 370376 | 7.7500 | NaN | Q |
891 rows × 12 columns
2.4.5 任务五:使用Panads的merge方法和DataFrame的append方法:完成任务二和任务三的任务
#写入代码
result_up = pd.merge(df_left_up,df_right_up,left_index=True,right_index=True)
result_up
PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked | |
---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1 | 0 | 3 | Braund, Mr. Owen Harris | male | 22.0 | 1 | 0 | A/5 21171 | 7.2500 | NaN | S |
1 | 2 | 1 | 1 | Cumings, Mrs. John Bradley (Florence Briggs Th... | female | 38.0 | 1 | 0 | PC 17599 | 71.2833 | C85 | C |
2 | 3 | 1 | 3 | Heikkinen, Miss. Laina | female | 26.0 | 0 | 0 | STON/O2. 3101282 | 7.9250 | NaN | S |
3 | 4 | 1 | 1 | Futrelle, Mrs. Jacques Heath (Lily May Peel) | female | 35.0 | 1 | 0 | 113803 | 53.1000 | C123 | S |
4 | 5 | 0 | 3 | Allen, Mr. William Henry | male | 35.0 | 0 | 0 | 373450 | 8.0500 | NaN | S |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
434 | 435 | 0 | 1 | Silvey, Mr. William Baird | male | 50.0 | 1 | 0 | 13507 | 55.9000 | E44 | S |
435 | 436 | 1 | 1 | Carter, Miss. Lucile Polk | female | 14.0 | 1 | 2 | 113760 | 120.0000 | B96 B98 | S |
436 | 437 | 0 | 3 | Ford, Miss. Doolina Margaret "Daisy" | female | 21.0 | 2 | 2 | W./C. 6608 | 34.3750 | NaN | S |
437 | 438 | 1 | 2 | Richards, Mrs. Sidney (Emily Hocking) | female | 24.0 | 2 | 3 | 29106 | 18.7500 | NaN | S |
438 | 439 | 0 | 1 | Fortune, Mr. Mark | male | 64.0 | 1 | 4 | 19950 | 263.0000 | C23 C25 C27 | S |
439 rows × 12 columns
result_down = pd.merge(df_left_down,df_right_down,left_index=True,right_index=True)
result_down
PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked | |
---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 440 | 0 | 2 | Kvillner, Mr. Johan Henrik Johannesson | male | 31.0 | 0 | 0 | C.A. 18723 | 10.500 | NaN | S |
1 | 441 | 1 | 2 | Hart, Mrs. Benjamin (Esther Ada Bloomfield) | female | 45.0 | 1 | 1 | F.C.C. 13529 | 26.250 | NaN | S |
2 | 442 | 0 | 3 | Hampe, Mr. Leon | male | 20.0 | 0 | 0 | 345769 | 9.500 | NaN | S |
3 | 443 | 0 | 3 | Petterson, Mr. Johan Emil | male | 25.0 | 1 | 0 | 347076 | 7.775 | NaN | S |
4 | 444 | 1 | 2 | Reynaldo, Ms. Encarnacion | female | 28.0 | 0 | 0 | 230434 | 13.000 | NaN | S |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
447 | 887 | 0 | 2 | Montvila, Rev. Juozas | male | 27.0 | 0 | 0 | 211536 | 13.000 | NaN | S |
448 | 888 | 1 | 1 | Graham, Miss. Margaret Edith | female | 19.0 | 0 | 0 | 112053 | 30.000 | B42 | S |
449 | 889 | 0 | 3 | Johnston, Miss. Catherine Helen "Carrie" | female | NaN | 1 | 2 | W./C. 6607 | 23.450 | NaN | S |
450 | 890 | 1 | 1 | Behr, Mr. Karl Howell | male | 26.0 | 0 | 0 | 111369 | 30.000 | C148 | C |
451 | 891 | 0 | 3 | Dooley, Mr. Patrick | male | 32.0 | 0 | 0 | 370376 | 7.750 | NaN | Q |
452 rows × 12 columns
result = result_up.append(result_down)
result
C:\Users\Ji-Luo\AppData\Local\Temp\ipykernel_11888\552922610.py:1: FutureWarning: The frame.append method is deprecated and will be removed from pandas in a future version. Use pandas.concat instead.result = result_up.append(result_down)
PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked | |
---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1 | 0 | 3 | Braund, Mr. Owen Harris | male | 22.0 | 1 | 0 | A/5 21171 | 7.2500 | NaN | S |
1 | 2 | 1 | 1 | Cumings, Mrs. John Bradley (Florence Briggs Th... | female | 38.0 | 1 | 0 | PC 17599 | 71.2833 | C85 | C |
2 | 3 | 1 | 3 | Heikkinen, Miss. Laina | female | 26.0 | 0 | 0 | STON/O2. 3101282 | 7.9250 | NaN | S |
3 | 4 | 1 | 1 | Futrelle, Mrs. Jacques Heath (Lily May Peel) | female | 35.0 | 1 | 0 | 113803 | 53.1000 | C123 | S |
4 | 5 | 0 | 3 | Allen, Mr. William Henry | male | 35.0 | 0 | 0 | 373450 | 8.0500 | NaN | S |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
447 | 887 | 0 | 2 | Montvila, Rev. Juozas | male | 27.0 | 0 | 0 | 211536 | 13.0000 | NaN | S |
448 | 888 | 1 | 1 | Graham, Miss. Margaret Edith | female | 19.0 | 0 | 0 | 112053 | 30.0000 | B42 | S |
449 | 889 | 0 | 3 | Johnston, Miss. Catherine Helen "Carrie" | female | NaN | 1 | 2 | W./C. 6607 | 23.4500 | NaN | S |
450 | 890 | 1 | 1 | Behr, Mr. Karl Howell | male | 26.0 | 0 | 0 | 111369 | 30.0000 | C148 | C |
451 | 891 | 0 | 3 | Dooley, Mr. Patrick | male | 32.0 | 0 | 0 | 370376 | 7.7500 | NaN | Q |
891 rows × 12 columns
【思考】对比merge、join以及concat的方法的不同以及相同。思考一下在任务四和任务五的情况下,为什么都要求使用DataFrame的append方法,如何只要求使用merge或者join可不可以完成任务四和任务五呢?
2.4.6 任务六:完成的数据保存为result.csv
#写入代码
result.to_csv('result.csv')
2.5 换一种角度看数据
2.5.1 任务一:将我们的数据变为Series类型的数据
#写入代码
unit_result=result.stack().head(20)
unit_result
0 PassengerId 1Survived 0Pclass 3Name Braund, Mr. Owen HarrisSex maleAge 22.0SibSp 1Parch 0Ticket A/5 21171Fare 7.25Embarked S
1 PassengerId 2Survived 1Pclass 1Name Cumings, Mrs. John Bradley (Florence Briggs Th...Sex femaleAge 38.0SibSp 1Parch 0Ticket PC 17599
dtype: object
#写入代码
复习:在前面我们已经学习了Pandas基础,第二章我们开始进入数据分析的业务部分,在第二章第一节的内容中,我们学习了数据的清洗,这一部分十分重要,只有数据变得相对干净,我们之后对数据的分析才可以更有力。而这一节,我们要做的是数据重构,数据重构依旧属于数据理解(准备)的范围。
开始之前,导入numpy、pandas包和数据
# 导入基本库
import numpy as np
import pandas as pd
# 载入上一个任务人保存的文件中:result.csv,并查看这个文件
df = pd.read_csv('result.csv')
df.head()
Unnamed: 0 | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 0 | 1 | 0 | 3 | Braund, Mr. Owen Harris | male | 22.0 | 1.0 | 0.0 | A/5 21171 | 7.2500 | NaN | S |
1 | 1 | 2 | 1 | 1 | Cumings, Mrs. John Bradley (Florence Briggs Th... | female | 38.0 | 1.0 | 0.0 | PC 17599 | 71.2833 | C85 | C |
2 | 2 | 3 | 1 | 3 | Heikkinen, Miss. Laina | female | 26.0 | 0.0 | 0.0 | STON/O2. 3101282 | 7.9250 | NaN | S |
3 | 3 | 4 | 1 | 1 | Futrelle, Mrs. Jacques Heath (Lily May Peel) | female | 35.0 | 1.0 | 0.0 | 113803 | 53.1000 | C123 | S |
4 | 4 | 5 | 0 | 3 | Allen, Mr. William Henry | male | 35.0 | 0.0 | 0.0 | 373450 | 8.0500 | NaN | S |
2 第二章:数据重构
第一部分:数据聚合与运算
2.6 数据运用
2.6.1 任务一:通过教材《Python for Data Analysis》P303、Google or anything来学习了解GroupBy机制
#写入心得
2.4.2:任务二:计算泰坦尼克号男性与女性的平均票价
# 写入代码
df.groupby('Sex')['Fare'].mean()
Sex
female 44.479818
male 25.523893
Name: Fare, dtype: float64
在了解GroupBy机制之后,运用这个机制完成一系列的操作,来达到我们的目的。
下面通过几个任务来熟悉GroupBy机制。
2.4.3:任务三:统计泰坦尼克号中男女的存活人数
# 写入代码
df.groupby('Sex')['Survived'].sum()
Sex
female 233
male 109
Name: Survived, dtype: int64
2.4.4:任务四:计算客舱不同等级的存活人数
# 写入代码
df.groupby('Pclass')['Survived'].sum()
Pclass
1 136
2 87
3 119
Name: Survived, dtype: int64
【提示:】表中的存活那一栏,可以发现如果还活着记为1,死亡记为0
【思考】从数据分析的角度,上面的统计结果可以得出那些结论
#思考心得
df.groupby('Pclass')['Survived'].apply(lambda x: x.sum() / x.count())
Pclass
1 0.629630
2 0.472826
3 0.242363
Name: Survived, dtype: float64
【思考】从任务二到任务三中,这些运算可以通过agg()函数来同时计算。并且可以使用rename函数修改列名。你可以按照提示写出这个过程吗?
#思考心得
df.groupby('Sex').agg({'Fare':'mean','Pclass':'count'}).rename(columns = {'Fare':'mean_fare','Pclass':'count_pclass'})
mean_fare | count_pclass | |
---|---|---|
Sex | ||
female | 44.479818 | 314 |
male | 25.523893 | 577 |
2.4.5:任务五:统计在不同等级的票中的不同年龄的船票花费的平均值
# 写入代码
df.groupby(['Pclass','Age'])['Fare'].mean()
Pclass Age
1 0.92 151.55002.00 151.55004.00 81.858311.00 120.000014.00 120.0000...
3 61.00 6.237563.00 9.587565.00 7.750070.50 7.750074.00 7.7750
Name: Fare, Length: 182, dtype: float64
2.4.6:任务六:将任务二和任务三的数据合并,并保存到sex_fare_survived.csv
# 写入代码
df1 = df.groupby('Sex')['Fare'].mean()
df2 = df.groupby('Sex')['Survived'].sum()
pd.merge(df1,df2,on='Sex')
Fare | Survived | |
---|---|---|
Sex | ||
female | 44.479818 | 233 |
male | 25.523893 | 109 |
2.4.7:任务七:得出不同年龄的总的存活人数,然后找出存活人数最多的年龄段,最后计算存活人数最高的存活率(存活人数/总人数)
# 写入代码
df['Age2'] = pd.cut(df['Age'],[0,5,15,30,50,80])
chrs = df.groupby('Age2')['Survived'].sum()
chrs
Age2
(0, 5] 44
(5, 15] 39
(15, 30] 326
(30, 50] 241
(50, 80] 64
Name: Survived, dtype: int64
# 写入代码
chrs.idxmax()
Interval(15, 30, closed='right')
# 写入代码# 各年龄段/各年龄段总人数存活率
df.groupby('Age2')['Survived'].apply(lambda x:x.sum() / x.count())
Age2
(0, 5] 0.704545
(5, 15] 0.461538
(15, 30] 0.358896
(30, 50] 0.423237
(50, 80] 0.343750
Name: Survived, dtype: float64
# 写入代码
# 总人数
df.shape[0]
# 存活人数# 各年龄段/总人数存活率
df.groupby('Age2')['Survived'].apply(lambda x:x.sum() / df.shape[0])
Age2
(0, 5] 0.034792
(5, 15] 0.020202
(15, 30] 0.131313
(30, 50] 0.114478
(50, 80] 0.024691
Name: Survived, dtype: float64
组队学习-动手学数据分析-第二章第2、3节相关推荐
- 组队学习-动手学数据分析
第一章 数据载入及初步观察 1.1 载入数据 import numpy as np import pandas as pd import os #相对路径加载数据 df=pd.read_csv('tr ...
- 【TL第二期】动手学数据分析-第二章 数据预处理
文章目录 第二章 第一节 数据清洗及特征处理 第二节 数据重构1 第三节 数据重构2 第四节 数据可视化 第二章 第一节 数据清洗及特征处理 数据清洗:对于原始数据中的缺失值.异常值进行处理.相当于数 ...
- Datawhale分组学习—动手学数据分析(五)
Datawhale分组学习-动手学数据分析(五)主要是做数据建模以及模型评估.模型搭建部分:1)切分数据集为训练集和测试集:2)搭建逻辑回归模型或随机森林模型完成分类任务,通过不断调参来优化模型.模型 ...
- Datawhale---动手学数据分析---第二章:第二章:数据清洗及特征处理(泰坦尼克的任务)
Datawhale---动手学数据分析---第一章:数据载入及初步观察(泰坦尼克的任务) [回顾&引言]前面一章的内容大家可以感觉到我们主要是对基础知识做一个梳理,让大家了解数 据分析的一些操 ...
- 【TL第二期】动手学数据分析-第一章 数据基本操作
文章目录 第一章 第一节 数据载入与初步观察 0 导库 1 载入数据 2 查看数据基本信息 第二节 pandas基础 1 数据类型DataFrame 和 Series 2 对文件数据的基本操作 3 数 ...
- 动手学数据分析 第一章之探索性数据分析
要不今天开篇先吐槽一下工作,一句话--底层数据乱得我不想说话.今天一天很平静,跟昨天很像,下面是给我自己说的...关键是我们做的东西是服务于我们自己的,难道不应该是我们想看什么,顺便给某人看一下什么数 ...
- 动手学计算机视觉--第二章,关于单通道卷积,多通道卷积的讨论
订正,本章内容仍使用keras框架进行分析,主要参考<Python深度学习(keras)>(Deep Learning with Python). 关于深度学习和计算机视觉相结合的卷积操作 ...
- 凌亮:动手学数据分析笔记
凌亮是华北电力大学数理系大二的学生,LSGO软件技术团队(Dreamtech算法组)成员,参加了多期Datawhale的组队学习. 这篇图文是他在线下组队学习时,为大家分享自己学习"动手学数 ...
- 【Datawhale】动手学数据分析
动手学数据分析 第一章:数据载入及初步观察 载入数据 任务一:导入numpy和pandas import numpy as np import pandas as pd 任务二:载入数据 train_ ...
最新文章
- 如何根据sessionID获取session解决方案
- but was actually of type 'com.sun.proxy.$Proxy**'的两种解决方法
- 使用Docker-容器命令案例1
- JSON数据格式以及与后台交互数据转换实例
- 实战03_SSM整合ActiveMQ支持多种类型消息
- nginx下虚拟目录配置301域名重定向
- MMdetection安装使用(1)
- leetcode - 98. 验证二叉搜索树
- Bootstrap 工具提示插件Tooltip 的选项
- win10 python免安装_使用Python编写免安装运行时、以Windows后台服务形式运行的WEB服务器...
- PlusWell FileMirror软件产品简介
- 神经网络过拟合解决方法,神经网络过拟合现象
- python 球的表面积和体积_[给球的体积算表面积]C语言求球的表面积和体积
- C技能树:运算符优先级与求值顺序
- Cesium中的相机—旋转矩阵
- Chrome播放视频时只有声音没有画面
- Python基础知识学习笔记(一)
- PHP仿拼多多拼团系统源码+可封装APP/多商家入驻
- matlab仿真没有synchr,Synchro 7怎么进行仿真?Synchro界面图标介绍
- 四川大学计算机考研难度,二本考生打算考研四川大学这样的985,难吗?该注意些什么?...
热门文章
- 有关保险及公积金的文章,阅读绝对获益!!
- JAVA实现MD5带盐加密_MD5加盐加密
- 不可错过的年度AI学术盛会 2021新一代人工智能院士高峰论坛暨启智开发者大会议程惊喜发布~
- VBA调用系统调色板
- 袁永福对北京奥运会的评论
- Kvaser Memorator Professional五通道CAN/CANFD总线分析记录仪
- 520表白浪漫的句子文案用便签记下来
- [048量化交易]python获取股票 量比 换手率 市盈率-动态 市净率 总市值 流通市值写入数据库MongoDB
- 计算机教师个人业绩成果自述,个人评价自述
- [工具使用]Wireshark