1. Import the pandas library under the alias pd and print its version.

In [2]:

import pandas as pd
pd.__version__

Out[2]:

'1.4.4'

2. Create a Series from a list

In [3]:

data = [1,2,3,4,5]
frame = pd.Series(data, index = ['a','b','c','d','e'])
frame

Out[3]:

a    1
b    2
c    3
d    4
e    5
dtype: int64

3. Create a Series from a dictionary

In [4]:

data = {'a':1, 'b':2, 'c':3, 'd':4,'e':5}
frame=pd.Series(data)
frame

Out[4]:

a    1
b    2
c    3
d    4
e    5
dtype: int64
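When a dictionary is combined with an explicit index, pandas selects and reorders entries by label. The sketch below illustrates this; the label 'z' is a hypothetical key chosen only to show what happens when a label is missing from the dictionary.

```python
import pandas as pd

data = {'a': 1, 'b': 2, 'c': 3, 'd': 4, 'e': 5}

# Passing index= picks dictionary entries by label, in the order given.
# 'z' is not a key of the dictionary, so its value becomes NaN.
subset = pd.Series(data, index=['c', 'a', 'z'])
print(subset)
```

Because NaN is a floating-point value, the resulting Series is stored as float64 rather than int64.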

4. Create a DataFrame from a NumPy random array, using a time series as the row index and letters as the column labels.

In [21]:

import numpy as np
dt1 = pd.date_range(start="today", periods=6, freq="D")
dt1

Out[21]:

DatetimeIndex(['2023-06-05 11:19:00.732923', '2023-06-06 11:19:00.732923',
               '2023-06-07 11:19:00.732923', '2023-06-08 11:19:00.732923',
               '2023-06-09 11:19:00.732923', '2023-06-10 11:19:00.732923'],
              dtype='datetime64[ns]', freq='D')

In [25]:
num_arr=np.random.randn(6,4)
columns=['A','B','C','D']
df=pd.DataFrame(num_arr,index=dt1,columns=columns)
df
Out[25]:

	A	B	C	D
2023-06-05 11:19:00.732923	0.853507	0.461207	-0.698314	1.271267
2023-06-06 11:19:00.732923	0.621321	-0.032685	0.334610	0.536929
2023-06-07 11:19:00.732923	0.774693	-1.199595	1.263980	-0.769168
2023-06-08 11:19:00.732923	-1.041118	0.610756	0.880698	0.474968
2023-06-09 11:19:00.732923	-1.963501	0.655607	-0.185408	-2.162950
2023-06-10 11:19:00.732923	-0.190997	1.608209	-1.175479	0.692370

5. Create a Series object with the structure shown in the figure, then get its index, its data, and the value at position 2.

In [27]:

import pandas as pd
ser_obj=pd.Series([1,2,3,4,5],index=['No.0','No.1','No.2','No.3','No.4'])
ser_obj

Out[27]:

No.0    1
No.1    2
No.2    3
No.3    4
No.4    5
dtype: int64
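The task also asks for the index, the data, and the value at position 2, which the cell above does not retrieve. A minimal sketch of those three lookups, rebuilding the same Series so that it runs on its own:

```python
import pandas as pd

ser_obj = pd.Series([1, 2, 3, 4, 5],
                    index=['No.0', 'No.1', 'No.2', 'No.3', 'No.4'])

index_labels = ser_obj.index    # the row labels
values = ser_obj.values         # the underlying NumPy array
third = ser_obj.iloc[2]         # value at integer position 2 (label 'No.2')

print(index_labels)
print(values)
print(third)
```

Using iloc makes the positional intent explicit; plain ser_obj[2] on a string-labelled Series relies on a positional fallback that newer pandas versions deprecate.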

6. Given the tabular data shown in the figure below, perform the following operations:

In [29]:

# (1) Create a DataFrame with the structure shown in the figure above
import numpy as np
import pandas as pd
df_data = np.array([[1, 5, 8, 8], [2, 2, 4, 9], [7, 4, 2, 3], [3, 0, 5, 2]])  # data array
col_data = np.array(['A', 'B', 'C', 'D'])  # column labels
# Create the DataFrame from the arrays
df_obj = pd.DataFrame(columns=col_data, data=df_data)
df_obj

Out[29]:

	A	B	C	D
0	1	5	8	8
1	2	2	4	9
2	7	4	2	3
3	3	0	5	2

In [30]:

# (2) Sort the data by column B in descending order.
sort_values_data = df_obj.sort_values(by=['B'], ascending=False)
sort_values_data

Out[30]:

	A	B	C	D
0	1	5	8	8
2	7	4	2	3
1	2	2	4	9
3	3	0	5	2

In [32]:

# (3) Write the sorted data to a CSV file named write_data.csv.
sort_values_data.to_csv(r'F:\实训\数据分析实训\项目二 Pandas基础练习\write_data.csv')
'Write complete'

Out[32]:
'Write complete'
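To check that the file really holds the sorted table, the CSV can be read back. The sketch below is an illustration only: it writes to a temporary directory (an assumption, in place of the F:\ path used above) and rebuilds the table so that it runs on its own.

```python
import os
import tempfile

import pandas as pd

df_obj = pd.DataFrame(
    [[1, 5, 8, 8], [2, 2, 4, 9], [7, 4, 2, 3], [3, 0, 5, 2]],
    columns=['A', 'B', 'C', 'D'],
)
sorted_df = df_obj.sort_values(by='B', ascending=False)

# Hypothetical location: a temporary directory instead of the F:\ path above.
with tempfile.TemporaryDirectory() as tmp_dir:
    csv_path = os.path.join(tmp_dir, 'write_data.csv')
    sorted_df.to_csv(csv_path)                      # the row index is written as the first column
    restored = pd.read_csv(csv_path, index_col=0)   # read that first column back as the index

print(restored)
```

If the row labels are not needed in the file, to_csv(csv_path, index=False) leaves them out.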

7. Given the tabular data shown in the figure below, perform the following operations

In [39]:

# (1) Create a Series with a two-level index (province, then city).
mulitindex_series = pd.Series(
    [15848, 13472, 12073.8, 7813, 7446, 6444, 15230, 8269],
    index=[['河北省', '河北省', '河北省', '河北省', '河南省', '河南省', '河南省', '河南省'],
           ['石家庄市', '唐山市', '邯郸市', '秦皇岛市', '郑州市', '开封市', '洛阳市', '新乡市']])
mulitindex_series

Out[39]:

河北省  石家庄市    15848.0
     唐山市     13472.0
     邯郸市     12073.8
     秦皇岛市     7813.0
河南省  郑州市      7446.0
     开封市      6444.0
     洛阳市     15230.0
     新乡市      8269.0
dtype: float64

In [40]:

# (2) Get the subset whose outer index is '河北省'.
mulitindex_series['河北省']

Out[40]:

石家庄市    15848.0
唐山市     13472.0
邯郸市     12073.8
秦皇岛市     7813.0
dtype: float64

In [44]:

# (3) Get the subset whose inner index is '洛阳市'.
mulitindex_series[:,'洛阳市']

Out[44]:

河南省    15230.0
dtype: float64

In [46]:

# (4) Swap the positions of the outer and inner index levels.
mulitindex_series.swaplevel()

Out[46]:

石家庄市  河北省    15848.0
唐山市   河北省    13472.0
邯郸市   河北省    12073.8
秦皇岛市  河北省     7813.0
郑州市   河南省     7446.0
开封市   河南省     6444.0
洛阳市   河南省    15230.0
新乡市   河南省     8269.0
dtype: float64
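The selections above address index levels by position. The sketch below shows the same lookups with named levels; the level names 'province' and 'city' are assumptions added here for readability and are not part of the original data.

```python
import pandas as pd

series = pd.Series(
    [15848, 13472, 12073.8, 7813, 7446, 6444, 15230, 8269],
    index=[['河北省'] * 4 + ['河南省'] * 4,
           ['石家庄市', '唐山市', '邯郸市', '秦皇岛市', '郑州市', '开封市', '洛阳市', '新乡市']],
)
# Hypothetical level names, introduced only for this illustration.
series.index.names = ['province', 'city']

# Cross-section on the inner level: every entry whose city is '洛阳市'.
luoyang = series.xs('洛阳市', level='city')

# Swap the two levels, then sort so the new outer level is in order.
swapped = series.swaplevel().sort_index()

print(luoyang)
print(swapped)
```

swaplevel only reorders the levels and leaves the rows where they were, so a following sort_index is a common way to regroup the rows by the new outer level.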

8. Given the tabular data shown in the figure below, perform the following operations

In [47]:

# (1) Sort the data by column C in ascending order.
import numpy as np
import pandas as pd
df_data = np.array([[1, 5, 8, 8], [2, 2, 4, 9], [7, 4, 2, 3], [3, 0, 5, 2]])  # data array
col_data = np.array(['A', 'B', 'C', 'D'])  # column labels
# Create the DataFrame from the arrays
df_obj = pd.DataFrame(columns=col_data, data=df_data)
df_obj

Out[47]:

	A	B	C	D
0	1	5	8	8
1	2	2	4	9
2	7	4	2	3
3	3	0	5	2

In [48]:

sort_values_data = df_obj.sort_values(by=['C'])
sort_values_data

Out[48]:

	A	B	C	D
2	7	4	2	3
1	2	2	4	9
3	3	0	5	2
0	1	5	8	8

In [49]:

# (2) Compute the sum, the maximum, and the descriptive statistics of each column.
print(df_obj.sum())
print(df_obj.max())
print(df_obj.describe())

A    13
B    11
C    19
D    22
dtype: int64
A    7
B    5
C    8
D    9
dtype: int32
              A         B     C         D
count  4.000000  4.000000  4.00  4.000000
mean   3.250000  2.750000  4.75  5.500000
std    2.629956  2.217356  2.50  3.511885
min    1.000000  0.000000  2.00  2.000000
25%    1.750000  1.500000  3.50  2.750000
50%    2.500000  3.000000  4.50  5.500000
75%    4.000000  4.250000  5.75  8.250000
max    7.000000  5.000000  8.00  9.000000
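Several statistics can also be collected into a single table with agg. The sketch below rebuilds the same data so that it runs on its own.

```python
import pandas as pd

df_obj = pd.DataFrame(
    [[1, 5, 8, 8], [2, 2, 4, 9], [7, 4, 2, 3], [3, 0, 5, 2]],
    columns=['A', 'B', 'C', 'D'],
)

# One row per statistic, one column per original column.
summary = df_obj.agg(['sum', 'max'])
print(summary)
```

Compared with separate sum() and max() calls, this returns one DataFrame whose row labels are the statistic names.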

9. Create a DataFrame as specified and complete the following operations:

In [50]:

# (1) Create the DataFrame below from a dictionary, using labels as the index.
import numpy as np
data = {'animal': ['cat', 'cat', 'snake', 'dog', 'dog', 'cat', 'snake', 'cat', 'dog', 'dog'],
        'age': [2.5, 3, 0.5, np.nan, 5, 2, 4.5, np.nan, 7, 3],
        'visits': [1, 3, 2, 3, 2, 3, 1, 1, 2, 1],
        'priority': ['yes', 'yes', 'no', 'yes', 'no', 'no', 'no', 'yes', 'no', 'no']}
labels = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j']
df = pd.DataFrame(data, index=labels)
df

Out[50]:

	animal	age	visits	priority
a	cat	2.5	1	yes
b	cat	3.0	3	yes
c	snake	0.5	2	no
d	dog	NaN	3	yes
e	dog	5.0	2	no
f	cat	2.0	3	no
g	snake	4.5	1	no
h	cat	NaN	1	yes
i	dog	7.0	2	no
j	dog	3.0	1	no

In [60]:

# (2) Show basic information about the DataFrame, including the row count, column names, non-null counts, and dtypes.
#df.describe()
df.info()

<class 'pandas.core.frame.DataFrame'>
Index: 10 entries, a to j
Data columns (total 4 columns):
animal      10 non-null object
age         8 non-null float64
visits      10 non-null int64
priority    10 non-null object
dtypes: float64(1), int64(1), object(2)
memory usage: 400.0+ bytes

In [61]:

# (3) Show the first three rows (two ways).
#df.iloc[:3]
df.head(3)

Out[61]:

	animal	age	visits	priority
a	cat	2.5	1	yes
b	cat	3.0	3	yes
c	snake	0.5	2	no

In [62]:

# (4) Select the animal and age columns of df.
df.loc[:, ['animal', 'age']]
# df[['animal', 'age']]

Out[62]:

	animal	age
a	cat	2.5
b	cat	3.0
c	snake	0.5
d	dog	NaN
e	dog	5.0
f	cat	2.0
g	snake	4.5
h	cat	NaN
i	dog	7.0
j	dog	3.0

In [63]:

# (5) Select the animal and age columns for the rows at positions [3, 4, 8].
df.loc[df.index[[3, 4, 8]], ['animal', 'age']]

Out[63]:

	animal	age
d	dog	NaN
e	dog	5.0
i	dog	7.0

In [78]:

# (6) Select the rows where age is greater than 3.
df[df['age'] > 3]

Out[78]:

	animal	age	visits	priority
e	dog	5.0	2	no
g	snake	4.5	1	no
i	dog	7.0	2	no

In [79]:

# (7) Select the rows where age is missing.
df[df['age'].isnull()]

Out[79]:

	animal	age	visits	priority
d	dog	NaN	3	yes
h	cat	NaN	1	yes

In [82]:

# (8) Select the rows whose age lies between 2 and 4 (endpoints excluded).
# between(2, 4) includes both endpoints by default, so compare explicitly
df[(df['age'] > 2) & (df['age'] < 4)]

Out[82]:

	animal	age	visits	priority
a	cat	2.5	1	yes
b	cat	3.0	3	yes
j	dog	3.0	1	no
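Whether the endpoints count matters for row f, whose age is exactly 2. The sketch below compares Series.between with its default settings against two strict comparisons, using the age column from the table above.

```python
import numpy as np
import pandas as pd

age = pd.Series([2.5, 3, 0.5, np.nan, 5, 2, 4.5, np.nan, 7, 3],
                index=list('abcdefghij'))

inclusive_mask = age.between(2, 4)        # default: both endpoints are included
strict_mask = (age > 2) & (age < 4)       # both endpoints are excluded

inclusive_rows = age[inclusive_mask].index.tolist()
strict_rows = age[strict_mask].index.tolist()
print(inclusive_rows)   # contains 'f' (age 2.0)
print(strict_rows)      # does not contain 'f'
```

Missing ages compare as False in both masks, so rows d and h are never selected.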

In [85]:
# (9) Change the age in row f to 1.5.
df.loc['f', 'age'] = 1.5
df
Out[85]:

	animal	age	visits	priority
a	cat	2.5	1	yes
b	cat	3.0	3	yes
c	snake	0.5	2	no
d	dog	NaN	3	yes
e	dog	5.0	2	no
f	cat	1.5	3	no
g	snake	4.5	1	no
h	cat	NaN	1	yes
i	dog	7.0	2	no
j	dog	3.0	1	no

In [84]:

# (10) Compute the sum of visits.
df['visits'].sum()

Out[84]:

19

In [86]:

# (11) Compute the mean age for each kind of animal.
df.groupby('animal')['age'].mean()

Out[86]:

animal
cat      2.333333
dog      5.000000
snake    2.500000
Name: age, dtype: float64
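The group means skip missing ages, so it helps to see how many values each mean is based on. A minimal sketch that rebuilds only the two relevant columns, with row f already set to 1.5:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'animal': ['cat', 'cat', 'snake', 'dog', 'dog', 'cat', 'snake', 'cat', 'dog', 'dog'],
    'age': [2.5, 3, 0.5, np.nan, 5, 1.5, 4.5, np.nan, 7, 3],  # row 'f' already set to 1.5
}, index=list('abcdefghij'))

# 'count' is the number of non-missing ages that went into each mean.
stats = df.groupby('animal')['age'].agg(['mean', 'count'])
print(stats)
```

Here cat and dog each have one missing age, so their means are computed over three values rather than four.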

In [87]:
# (12) Count the number of animals of each kind in df.
df.groupby('animal').size()
Out[87]:

animal
cat      4
dog      4
snake    2
dtype: int64

In [88]:
# (13) Sort by age in descending order, then by visits in ascending order.
df.sort_values(by=['age', 'visits'], ascending=[False, True])
Out[88]:

	animal	age	visits	priority
i	dog	7	2	no
e	dog	5	2	no
g	snake	4.5	1	no
j	dog	3	1	no
b	cat	3	3	yes
a	cat	2.5	1	yes
f	cat	1.5	3	no
c	snake	0.5	2	no
h	cat	NaN	1	yes
d	dog	NaN	3	yes

In [89]:
# (14) Replace yes and no in the priority column with the booleans True and False.
df['priority'] = df['priority'].map({'yes': True, 'no': False})
df
Out[89]:

	animal	age	visits	priority
a	cat	2.5	1	True
b	cat	3	3	True
c	snake	0.5	2	False
d	dog	NaN	3	True
e	dog	5	2	False
f	cat	1.5	3	False
g	snake	4.5	1	False
h	cat	NaN	1	True
i	dog	7	2	False
j	dog	3	1	False

In [90]:
# (15) Replace snake with python in the animal column.
df['animal'] = df['animal'].replace('snake', 'python')
df
Out[90]:

	animal	age	visits	priority
a	cat	2.5	1	True
b	cat	3	3	True
c	python	0.5	2	False
d	dog	NaN	3	True
e	dog	5	2	False
f	cat	1.5	3	False
g	python	4.5	1	False
h	cat	NaN	1	True
i	dog	7	2	False
j	dog	3	1	False

In [100]:

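# (16) For each animal and each distinct number of visits, compute the mean age; that is, return a table whose rows are animal kinds, whose columns are visit counts, and whose values are the mean age for that animal and visit count.
# Make sure age is numeric before aggregating
df['age'] = df['age'].astype(float)
df.pivot_table(index='animal', columns='visits', values='age', aggfunc='mean')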

Out[100]:

visits	1	2	3
animal
cat	2.5	NaN	2.25
dog	3.0	6.0	NaN
python	4.5	0.5	NaN
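The same table can be built step by step with groupby and unstack, which makes it easier to see what pivot_table does. The sketch rebuilds the three relevant columns in their state at this point (snake already renamed to python, row f set to 1.5).

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'animal': ['cat', 'cat', 'python', 'dog', 'dog', 'cat', 'python', 'cat', 'dog', 'dog'],
    'age': [2.5, 3, 0.5, np.nan, 5, 1.5, 4.5, np.nan, 7, 3],
    'visits': [1, 3, 2, 3, 2, 3, 1, 1, 2, 1],
})

pivoted = df.pivot_table(index='animal', columns='visits', values='age', aggfunc='mean')

# Group by both keys, average the ages, then move 'visits' from rows to columns.
grouped = df.groupby(['animal', 'visits'])['age'].mean().unstack()

print(pivoted)
print(grouped)
```

A cell is NaN either because no row has that combination (python with 3 visits) or because every age in the group is missing (dog with 3 visits).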

In [93]:
# (17) Insert a new row k, ['cat', 5, 2, 'no'], into df, then delete it.
# Insert (values follow the column order: animal, age, visits, priority)
df.loc['k'] = ['cat', 5, 2, 'no']
# Delete
df = df.drop('k')
df
Out[93]:

	animal	age	visits	priority
a	cat	2.5	1	True
b	cat	3.0	3	True
c	python	0.5	2	False
d	dog	NaN	3	True
e	dog	5.0	2	False
f	cat	1.5	3	False
g	python	4.5	1	False
h	cat	NaN	1	True
i	dog	7.0	2	False
j	dog	3.0	1	False

10. Read the P2P online lending master table and inspect its basic information

In [104]:
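# (1) Read the data from Training_Master.csv;
import pandas as pd
# Open the file in a with block so it is closed again once pandas has read it
with open('F:/实训/数据分析实训/项目二 Pandas基础练习/Training_Master.csv') as dt1:
    data = pd.read_csv(dt1)
data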
Out[104]:

Idx UserInfo_1 UserInfo_2 UserInfo_3 UserInfo_4 WeblogInfo_1 WeblogInfo_2 WeblogInfo_3 WeblogInfo_4 WeblogInfo_5 ... SocialNetwork_10 SocialNetwork_11 SocialNetwork_12 SocialNetwork_13 SocialNetwork_14 SocialNetwork_15 SocialNetwork_16 SocialNetwork_17 target ListingInfo

0 10001 1.0 深圳 4.0 深圳 NaN 1.0 NaN 1.0 1.0 ... 222 -1 0 0 0 0 0 1 0 2014/3/5

1 10002 1.0 温州 4.0 温州 NaN 0.0 NaN 1.0 1.0 ... 1 -1 0 0 0 0 0 2 0 2014/2/26

2 10003 1.0 宜昌 3.0 宜昌 NaN 0.0 NaN 2.0 2.0 ... -1 -1 -1 1 0 0 0 0 0 2014/2/28

3 10006 4.0 南平 1.0 南平 NaN NaN NaN NaN NaN ... -1 -1 -1 0 0 0 0 0 0 2014/2/25

4 10007 5.0 辽阳 1.0 辽阳 NaN 0.0 NaN 1.0 1.0 ... -1 -1 -1 0 0 0 0 0 0 2014/2/27

... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...

29995 9991 3.0 南阳 4.0 南阳 NaN 1.0 NaN 3.0 2.0 ... 0 -1 0 1 0 0 0 1 0 2014/2/22

29996 9992 3.0 宁德 4.0 泉州 NaN 0.0 NaN 6.0 1.0 ... 407 -1 0 0 0 0 0 1 0 2014/2/28

29997 9995 1.0 天津 2.0 天津 NaN 0.0 NaN 2.0 2.0 ... -1 -1 -1 0 0 0 0 0 0 2014/2/24

29998 9997 3.0 运城 3.0 运城 NaN 0.0 NaN 1.0 1.0 ... 612 -1 0 1 0 0 0 1 0 2014/2/28

29999 9998 4.0 金华 5.0 无锡 NaN 0.0 NaN 1.0 1.0 ... -1 -1 -1 0 0 0 0 0 0 2014/3/5

30000 rows × 228 columns

In [105]:

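# (2) Use ndim, shape, and memory_usage() to view the number of dimensions, the shape, and the memory usage;
# Number of dimensions of the master table
print("Number of dimensions of the master table:", data.ndim)
# Shape of the master table
print("Shape of the master table:", data.shape)
# Memory usage of the master table
print("Memory usage of the master table:\n", data.memory_usage())

Number of dimensions of the master table: 2
Shape of the master table: (30000, 228)
Memory usage of the master table:
 Index                  128
Idx                 240000
UserInfo_1          240000
UserInfo_2          240000
UserInfo_3          240000
                     ...
SocialNetwork_15    240000
SocialNetwork_16    240000
SocialNetwork_17    240000
target              240000
ListingInfo         240000
Length: 229, dtype: int64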
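Every column above reports the same 240000 bytes, text columns included, because by default memory_usage does not measure the contents of Python strings. A minimal sketch of the deep=True option on a tiny hypothetical frame (three values borrowed from the table above, not the real file):

```python
import pandas as pd

# Hypothetical three-row sample; the real table has 30000 rows.
sample = pd.DataFrame({
    'Idx': [10001, 10002, 10003],
    'UserInfo_2': ['深圳', '温州', '宜昌'],
})

shallow = sample.memory_usage()           # default: string contents are not inspected
deep = sample.memory_usage(deep=True)     # also measures the stored strings

print(shallow)
print(deep)
```

On a table with many text columns the deep figure gives a more realistic picture, at the cost of a slower computation.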

In [106]:

# (3) Use the describe method to compute descriptive statistics.
a_describe = data.describe()
print("Descriptive statistics from describe():",a_describe)

Descriptive statistics from describe():                 Idx    UserInfo_1    UserInfo_3  WeblogInfo_1  WeblogInfo_2  \
count  30000.000000  29994.000000  29993.000000    970.000000  28342.000000
mean   46318.673267      3.219911      4.694329      2.201031      0.131466
std    26640.397805      1.827684      1.321458      7.831679      0.358486
min        3.000000      0.000000      0.000000      1.000000      0.000000
25%    22924.250000      1.000000      4.000000      1.000000      0.000000
50%    46849.500000      3.000000      5.000000      1.000000      0.000000
75%    69447.250000      5.000000      5.000000      1.000000      0.000000
max    91703.000000      7.000000      7.000000    133.000000      4.000000

       WeblogInfo_3  WeblogInfo_4  WeblogInfo_5  WeblogInfo_6  WeblogInfo_7  \
count    970.000000  28349.000000  28349.000000  28349.000000  30000.000000
mean       1.308247      3.025962      1.816960      2.948711     10.632800
std        7.866457      3.772421      1.701177      3.770300     16.097588
min        0.000000      1.000000      1.000000      1.000000      0.000000
25%        0.000000      1.000000      1.000000      1.000000      2.000000
50%        0.000000      2.000000      1.000000      2.000000      6.000000
75%        1.000000      3.000000      2.000000      3.000000     13.000000
max      133.000000    165.000000     73.000000    165.000000    722.000000

       ...  SocialNetwork_9  SocialNetwork_10  SocialNetwork_11  \
count  ...     30000.000000      30000.000000      30000.000000
mean   ...        35.516167         75.211233         -0.999267
std    ...       135.954587        742.978305          0.052911
min    ...        -1.000000         -1.000000         -1.000000
25%    ...        -1.000000         -1.000000         -1.000000
50%    ...        -1.000000         -1.000000         -1.000000
75%    ...        -1.000000         -1.000000         -1.000000
max    ...      3242.000000      71253.000000          6.000000

       SocialNetwork_12  SocialNetwork_13  SocialNetwork_14  SocialNetwork_15  \
count      30000.000000      30000.000000      30000.000000      30000.000000
mean          -0.745033          0.221167          0.062033          0.027967
std            0.441473          0.420545          0.242598          0.164880
min           -1.000000          0.000000          0.000000          0.000000
25%           -1.000000          0.000000          0.000000          0.000000
50%           -1.000000          0.000000          0.000000          0.000000
75%            0.000000          0.000000          0.000000          0.000000
max            1.000000          2.000000          3.000000          1.000000

       SocialNetwork_16  SocialNetwork_17        target
count      30000.000000      30000.000000  30000.000000
mean           0.016633          0.253467      0.073267
std            0.127895          0.437296      0.260578
min            0.000000          0.000000      0.000000
25%            0.000000          0.000000      0.000000
50%            0.000000          0.000000      0.000000
75%            0.000000          1.000000      0.000000
max            1.000000          3.000000      1.000000

[8 rows x 208 columns]

11. Explore the Euro 2012 data

In [121]:
import pandas as pd
# Open the file in a with block so it is closed again once pandas has read it
with open('F:/实训/数据分析实训/项目二 Pandas基础练习/Euro2012.csv') as dt1:
    data = pd.read_csv(dt1)
data
Out[121]:

Team Goals Shots on target Shots off target Shooting Accuracy % Goals-to-shots Total shots (inc. Blocked) Hit Woodwork Penalty goals Penalties not scored ... Saves made Saves-to-shots ratio Fouls Won Fouls Conceded Offsides Yellow Cards Red Cards Subs on Subs off Players Used

0 Croatia 4 13 12 51.9% 16.0% 32 0 0 0 ... 13 81.3% 41 62 2 9 0 9 9 16

1 Czech Republic 4 13 18 41.9% 12.9% 39 0 0 0 ... 9 60.1% 53 73 8 7 0 11 11 19

2 Denmark 4 10 10 50.0% 20.0% 27 1 0 0 ... 10 66.7% 25 38 8 4 0 7 7 15

3 England 5 11 18 50.0% 17.2% 40 0 0 0 ... 22 88.1% 43 45 6 5 0 11 11 16

4 France 3 22 24 37.9% 6.5% 65 1 0 0 ... 6 54.6% 36 51 5 6 0 11 11 19

5 Germany 10 32 32 47.8% 15.6% 80 2 1 0 ... 10 62.6% 63 49 12 4 0 15 15 17

6 Greece 5 8 18 30.7% 19.2% 32 1 1 1 ... 13 65.1% 67 48 12 9 1 12 12 20

7 Italy 6 34 45 43.0% 7.5% 110 2 0 0 ... 20 74.1% 101 89 16 16 0 18 18 19

8 Netherlands 2 12 36 25.0% 4.1% 60 2 0 0 ... 12 70.6% 35 30 3 5 0 7 7 15

9 Poland 2 15 23 39.4% 5.2% 48 0 0 0 ... 6 66.7% 48 56 3 7 1 7 7 17

10 Portugal 6 22 42 34.3% 9.3% 82 6 0 0 ... 10 71.5% 73 90 10 12 0 14 14 16

11 Republic of Ireland 1 7 12 36.8% 5.2% 28 0 0 0 ... 17 65.4% 43 51 11 6 1 10 10 17

12 Russia 5 9 31 22.5% 12.5% 59 2 0 0 ... 10 77.0% 34 43 4 6 0 7 7 16

13 Spain 12 42 33 55.9% 16.0% 100 0 1 0 ... 15 93.8% 102 83 19 11 0 17 17 18

14 Sweden 5 17 19 47.2% 13.8% 39 3 0 0 ... 8 61.6% 35 51 7 7 0 9 9 18

15 Ukraine 2 7 26 21.2% 6.0% 38 0 0 0 ... 13 76.5% 48 31 4 5 0 9 9 18

16 rows × 35 columns

In [124]:
# (1) Name the dataset euro12
# Load the dataset from the target path
path2 = "F:/实训/数据分析实训/项目二 Pandas基础练习/Euro2012.csv"
# Euro2012_stats.csv
euro12 = pd.read_csv(path2)
euro12
Out[124]:

Team Goals Shots on target Shots off target Shooting Accuracy % Goals-to-shots Total shots (inc. Blocked) Hit Woodwork Penalty goals Penalties not scored ... Saves made Saves-to-shots ratio Fouls Won Fouls Conceded Offsides Yellow Cards Red Cards Subs on Subs off Players Used

0 Croatia 4 13 12 51.9% 16.0% 32 0 0 0 ... 13 81.3% 41 62 2 9 0 9 9 16

1 Czech Republic 4 13 18 41.9% 12.9% 39 0 0 0 ... 9 60.1% 53 73 8 7 0 11 11 19

2 Denmark 4 10 10 50.0% 20.0% 27 1 0 0 ... 10 66.7% 25 38 8 4 0 7 7 15

3 England 5 11 18 50.0% 17.2% 40 0 0 0 ... 22 88.1% 43 45 6 5 0 11 11 16

4 France 3 22 24 37.9% 6.5% 65 1 0 0 ... 6 54.6% 36 51 5 6 0 11 11 19

5 Germany 10 32 32 47.8% 15.6% 80 2 1 0 ... 10 62.6% 63 49 12 4 0 15 15 17

6 Greece 5 8 18 30.7% 19.2% 32 1 1 1 ... 13 65.1% 67 48 12 9 1 12 12 20

7 Italy 6 34 45 43.0% 7.5% 110 2 0 0 ... 20 74.1% 101 89 16 16 0 18 18 19

8 Netherlands 2 12 36 25.0% 4.1% 60 2 0 0 ... 12 70.6% 35 30 3 5 0 7 7 15

9 Poland 2 15 23 39.4% 5.2% 48 0 0 0 ... 6 66.7% 48 56 3 7 1 7 7 17

10 Portugal 6 22 42 34.3% 9.3% 82 6 0 0 ... 10 71.5% 73 90 10 12 0 14 14 16

11 Republic of Ireland 1 7 12 36.8% 5.2% 28 0 0 0 ... 17 65.4% 43 51 11 6 1 10 10 17

12 Russia 5 9 31 22.5% 12.5% 59 2 0 0 ... 10 77.0% 34 43 4 6 0 7 7 16

13 Spain 12 42 33 55.9% 16.0% 100 0 1 0 ... 15 93.8% 102 83 19 11 0 17 17 18

14 Sweden 5 17 19 47.2% 13.8% 39 3 0 0 ... 8 61.6% 35 51 7 7 0 9 9 18

15 Ukraine 2 7 26 21.2% 6.0% 38 0 0 0 ... 13 76.5% 48 31 4 5 0 9 9 18

16 rows × 35 columns

In [125]:
# (2) Select only the Goals column
euro12.Goals
Out[125]:
0      4
1      4
2      4
3      5
4      3
5     10
6      5
7      6
8      2
9      2
10     6
11     1
12     5
13    12
14     5
15     2
Name: Goals, dtype: int64
In [126]:
# (3) How many teams took part in Euro 2012?
euro12.shape[0]
Out[126]:
16
In [127]:
# (4) How many columns does the dataset have?
euro12.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 16 entries, 0 to 15
Data columns (total 35 columns):
Team                          16 non-null object
Goals                         16 non-null int64
Shots on target               16 non-null int64
Shots off target              16 non-null int64
Shooting Accuracy             16 non-null object
% Goals-to-shots              16 non-null object
Total shots (inc. Blocked)    16 non-null int64
Hit Woodwork                  16 non-null int64
Penalty goals                 16 non-null int64
Penalties not scored          16 non-null int64
Headed goals                  16 non-null int64
Passes                        16 non-null int64
Passes completed              16 non-null int64
Passing Accuracy              16 non-null object
Touches                       16 non-null int64
Crosses                       16 non-null int64
Dribbles                      16 non-null int64
Corners Taken                 16 non-null int64
Tackles                       16 non-null int64
Clearances                    16 non-null int64
Interceptions                 16 non-null int64
Clearances off line           15 non-null float64
Clean Sheets                  16 non-null int64
Blocks                        16 non-null int64
Goals conceded                16 non-null int64
Saves made                    16 non-null int64
Saves-to-shots ratio          16 non-null object
Fouls Won                     16 non-null int64
Fouls Conceded                16 non-null int64
Offsides                      16 non-null int64
Yellow Cards                  16 non-null int64
Red Cards                     16 non-null int64
Subs on                       16 non-null int64
Subs off                      16 non-null int64
Players Used                  16 non-null int64
dtypes: float64(1), int64(29), object(5)
memory usage: 4.5+ KB
In [128]:
# (5) Store the Team, Yellow Cards, and Red Cards columns in a separate DataFrame named discipline
discipline = euro12[['Team','Yellow Cards','Red Cards']]
discipline
Out[128]:

Team Yellow Cards Red Cards

0 Croatia 9 0

1 Czech Republic 7 0

2 Denmark 4 0

3 England 5 0

4 France 6 0

5 Germany 4 0

6 Greece 9 1

7 Italy 16 0

8 Netherlands 5 0

9 Poland 7 1

10 Portugal 12 0

11 Republic of Ireland 6 1

12 Russia 6 0

13 Spain 11 0

14 Sweden 7 0

15 Ukraine 5 0

In [129]:
# (6) Sort discipline by Red Cards first, then by Yellow Cards
discipline.sort_values(['Red Cards','Yellow Cards'],ascending = False)
Out[129]:

Team Yellow Cards Red Cards

6 Greece 9 1

9 Poland 7 1

11 Republic of Ireland 6 1

7 Italy 16 0

10 Portugal 12 0

13 Spain 11 0

0 Croatia 9 0

1 Czech Republic 7 0

14 Sweden 7 0

4 France 6 0

12 Russia 6 0

3 England 5 0

8 Netherlands 5 0

15 Ukraine 5 0

2 Denmark 4 0

5 Germany 4 0
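A single ascending=False applies to every sort key. When the keys should run in different directions, ascending accepts one flag per column. A minimal sketch using four rows copied from the discipline table above:

```python
import pandas as pd

# Four rows copied from the discipline table above.
discipline = pd.DataFrame({
    'Team': ['Greece', 'Poland', 'Italy', 'Denmark'],
    'Yellow Cards': [9, 7, 16, 4],
    'Red Cards': [1, 1, 0, 0],
})

# Red cards in descending order; ties broken by yellow cards in ascending order.
ordered = discipline.sort_values(['Red Cards', 'Yellow Cards'],
                                 ascending=[False, True])
print(ordered)
```

With this ordering Poland comes before Greece, because both have one red card and Poland has fewer yellow cards.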

In [130]:
# (7) Compute the mean number of yellow cards per team
round(discipline['Yellow Cards'].mean())
Out[130]:
7
In [131]:
# (8) Find the teams that scored more than 6 goals
euro12[euro12.Goals > 6]
Out[131]:

Team Goals Shots on target Shots off target Shooting Accuracy % Goals-to-shots Total shots (inc. Blocked) Hit Woodwork Penalty goals Penalties not scored ... Saves made Saves-to-shots ratio Fouls Won Fouls Conceded Offsides Yellow Cards Red Cards Subs on Subs off Players Used

5 Germany 10 32 32 47.8% 15.6% 80 2 1 0 ... 10 62.6% 63 49 12 4 0 15 15 17

13 Spain 12 42 33 55.9% 16.0% 100 0 1 0 ... 15 93.8% 102 83 19 11 0 17 17 18

2 rows × 35 columns

In [132]:
# (9) Select the teams whose names start with the letter G
euro12[euro12.Team.str.startswith('G')]
Out[132]:

Team Goals Shots on target Shots off target Shooting Accuracy % Goals-to-shots Total shots (inc. Blocked) Hit Woodwork Penalty goals Penalties not scored ... Saves made Saves-to-shots ratio Fouls Won Fouls Conceded Offsides Yellow Cards Red Cards Subs on Subs off Players Used

5 Germany 10 32 32 47.8% 15.6% 80 2 1 0 ... 10 62.6% 63 49 12 4 0 15 15 17

6 Greece 5 8 18 30.7% 19.2% 32 1 1 1 ... 13 65.1% 67 48 12 9 1 12 12 20

2 rows × 35 columns

In [133]:
# (10) Select the first 7 columns
euro12.iloc[:,0:7]
Out[133]:

Team Goals Shots on target Shots off target Shooting Accuracy % Goals-to-shots Total shots (inc. Blocked)

0 Croatia 4 13 12 51.9% 16.0% 32

1 Czech Republic 4 13 18 41.9% 12.9% 39

2 Denmark 4 10 10 50.0% 20.0% 27

3 England 5 11 18 50.0% 17.2% 40

4 France 3 22 24 37.9% 6.5% 65

5 Germany 10 32 32 47.8% 15.6% 80

6 Greece 5 8 18 30.7% 19.2% 32

7 Italy 6 34 45 43.0% 7.5% 110

8 Netherlands 2 12 36 25.0% 4.1% 60

9 Poland 2 15 23 39.4% 5.2% 48

10 Portugal 6 22 42 34.3% 9.3% 82

11 Republic of Ireland 1 7 12 36.8% 5.2% 28

12 Russia 5 9 31 22.5% 12.5% 59

13 Spain 12 42 33 55.9% 16.0% 100

14 Sweden 5 17 19 47.2% 13.8% 39

15 Ukraine 2 7 26 21.2% 6.0% 38

In [134]:
# (11) Select all columns except the last 3
euro12.iloc[:,:-3]
Out[134]:

Team Goals Shots on target Shots off target Shooting Accuracy % Goals-to-shots Total shots (inc. Blocked) Hit Woodwork Penalty goals Penalties not scored ... Clean Sheets Blocks Goals conceded Saves made Saves-to-shots ratio Fouls Won Fouls Conceded Offsides Yellow Cards Red Cards

0 Croatia 4 13 12 51.9% 16.0% 32 0 0 0 ... 0 10 3 13 81.3% 41 62 2 9 0

1 Czech Republic 4 13 18 41.9% 12.9% 39 0 0 0 ... 1 10 6 9 60.1% 53 73 8 7 0

2 Denmark 4 10 10 50.0% 20.0% 27 1 0 0 ... 1 10 5 10 66.7% 25 38 8 4 0

3 England 5 11 18 50.0% 17.2% 40 0 0 0 ... 2 29 3 22 88.1% 43 45 6 5 0

4 France 3 22 24 37.9% 6.5% 65 1 0 0 ... 1 7 5 6 54.6% 36 51 5 6 0

5 Germany 10 32 32 47.8% 15.6% 80 2 1 0 ... 1 11 6 10 62.6% 63 49 12 4 0

6 Greece 5 8 18 30.7% 19.2% 32 1 1 1 ... 1 23 7 13 65.1% 67 48 12 9 1

7 Italy 6 34 45 43.0% 7.5% 110 2 0 0 ... 2 18 7 20 74.1% 101 89 16 16 0

8 Netherlands 2 12 36 25.0% 4.1% 60 2 0 0 ... 0 9 5 12 70.6% 35 30 3 5 0

9 Poland 2 15 23 39.4% 5.2% 48 0 0 0 ... 0 8 3 6 66.7% 48 56 3 7 1

10 Portugal 6 22 42 34.3% 9.3% 82 6 0 0 ... 2 11 4 10 71.5% 73 90 10 12 0

11 Republic of Ireland 1 7 12 36.8% 5.2% 28 0 0 0 ... 0 23 9 17 65.4% 43 51 11 6 1

12 Russia 5 9 31 22.5% 12.5% 59 2 0 0 ... 0 8 3 10 77.0% 34 43 4 6 0

13 Spain 12 42 33 55.9% 16.0% 100 0 1 0 ... 5 8 1 15 93.8% 102 83 19 11 0

14 Sweden 5 17 19 47.2% 13.8% 39 3 0 0 ... 1 12 5 8 61.6% 35 51 7 7 0

15 Ukraine 2 7 26 21.2% 6.0% 38 0 0 0 ... 0 4 4 13 76.5% 48 31 4 5 0

16 rows × 32 columns

In [135]:
# (12) Find the shooting accuracy (Shooting Accuracy) of England, Italy, and Russia
euro12.loc[euro12.Team.isin(['England','Italy','Russia']),['Team','Shooting Accuracy']]
Out[135]:

Team Shooting Accuracy

3 England 50.0%

7 Italy 43.0%

12 Russia 22.5%
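Shooting Accuracy is stored as text such as '50.0%' (the info() output lists it as object), so it cannot be compared or averaged directly. A minimal sketch of a numeric conversion, using the three values shown above:

```python
import pandas as pd

accuracy = pd.Series(['50.0%', '43.0%', '22.5%'],
                     index=['England', 'Italy', 'Russia'])

# Drop the trailing percent sign, then convert the remaining text to float.
accuracy_pct = accuracy.str.rstrip('%').astype(float)

print(accuracy_pct)
print(accuracy_pct.idxmax())   # team with the highest accuracy among the three
```

The result is still expressed in percent; divide by 100 if a fraction between 0 and 1 is needed.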

12. Explore the Chipotle fast-food data

In [137]:
# (1) Load the dataset into a DataFrame named chipo
import pandas as pd
chipo = pd.read_csv('F:/实训/数据分析实训/项目二 Pandas基础练习/chipotle.csv',sep='\t')
'Done'
Out[137]:
'Done'
In [138]:
# (2) View the first 10 rows
chipo.head(10)
Out[138]:

order_id quantity item_name choice_description item_price

0 1 1 Chips and Fresh Tomato Salsa NaN $2.39

1 1 1 Izze [Clementine] $3.39

2 1 1 Nantucket Nectar [Apple] $3.39

3 1 1 Chips and Tomatillo-Green Chili Salsa NaN $2.39

4 2 2 Chicken Bowl [Tomatillo-Red Chili Salsa (Hot), [Black Beans... $16.98

5 3 1 Chicken Bowl [Fresh Tomato Salsa (Mild), [Rice, Cheese, Sou... $10.98

6 3 1 Side of Chips NaN $1.69

7 4 1 Steak Burrito [Tomatillo Red Chili Salsa, [Fajita Vegetables... $11.75

8 4 1 Steak Soft Tacos [Tomatillo Green Chili Salsa, [Pinto Beans, Ch... $9.25

9 5 1 Steak Burrito [Fresh Tomato Salsa, [Rice, Black Beans, Pinto... $9.25

In [139]:
# (3) View the last 10 rows
chipo.tail(10)
Out[139]:

order_id quantity item_name choice_description item_price

4612 1831 1 Carnitas Bowl [Fresh Tomato Salsa, [Fajita Vegetables, Rice,... $9.25

4613 1831 1 Chips NaN $2.15

4614 1831 1 Bottled Water NaN $1.50

4615 1832 1 Chicken Soft Tacos [Fresh Tomato Salsa, [Rice, Cheese, Sour Cream]] $8.75

4616 1832 1 Chips and Guacamole NaN $4.45

4617 1833 1 Steak Burrito [Fresh Tomato Salsa, [Rice, Black Beans, Sour ... $11.75

4618 1833 1 Steak Burrito [Fresh Tomato Salsa, [Rice, Sour Cream, Cheese... $11.75

4619 1834 1 Chicken Salad Bowl [Fresh Tomato Salsa, [Fajita Vegetables, Pinto... $11.25

4620 1834 1 Chicken Salad Bowl [Fresh Tomato Salsa, [Fajita Vegetables, Lettu... $8.75

4621 1834 1 Chicken Salad Bowl [Fresh Tomato Salsa, [Fajita Vegetables, Pinto... $8.75

In [140]:
# (4) View the shape of the data, returned as (rows, columns)
chipo.shape
Out[140]:
(4622, 5)
In [141]:
# (5) How many columns does the dataset have?
chipo.columns.size
#chipo.shape[1]
Out[141]:
5
In [161]:
# (6) Print all column names
chipo.columns
#chipo.keys()
Out[161]:
Index(['order_id', 'quantity', 'item_name', 'choice_description',
       'item_price'],
      dtype='object')
In [162]:
# (7) What does the dataset's index look like?
chipo.index
Out[162]:
RangeIndex(start=0, stop=4622, step=1)
In [163]:
# (8) View summary statistics for the numeric columns
chipo.describe()
Out[163]:

order_id quantity

count 4622.000000 4622.000000

mean 927.254868 1.075725

std 528.890796 0.410186

min 1.000000 1.000000

25% 477.250000 1.000000

50% 926.000000 1.000000

75% 1393.000000 1.000000

max 1834.000000 15.000000

In [164]:
# (9) View the column labels (Columns), data types (Dtype), non-null counts (Non-Null Count), and memory usage
chipo.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4622 entries, 0 to 4621
Data columns (total 5 columns):
order_id              4622 non-null int64
quantity              4622 non-null int64
item_name             4622 non-null object
choice_description    3376 non-null object
item_price            4622 non-null object
dtypes: int64(2), object(3)
memory usage: 180.7+ KB
In [165]:
# (10) View the item name column
#chipo.item_name
chipo['item_name']
Out[165]:
0                Chips and Fresh Tomato Salsa
1                                        Izze
2                            Nantucket Nectar
3       Chips and Tomatillo-Green Chili Salsa
4                                Chicken Bowl
                        ...
4617                            Steak Burrito
4618                            Steak Burrito
4619                       Chicken Salad Bowl
4620                       Chicken Salad Bowl
4621                       Chicken Salad Bowl
Name: item_name, Length: 4622, dtype: object
In [169]:
# (11) View the item name and quantity columns, returned as a DataFrame
chipo[['item_name','quantity']]
Out[169]:

item_name quantity

0 Chips and Fresh Tomato Salsa 1

1 Izze 1

2 Nantucket Nectar 1

3 Chips and Tomatillo-Green Chili Salsa 1

4 Chicken Bowl 2

... ... ...

4617 Steak Burrito 1

4618 Steak Burrito 1

4619 Chicken Salad Bowl 1

4620 Chicken Salad Bowl 1

4621 Chicken Salad Bowl 1

4622 rows × 2 columns

In [170]:
#(12)    查看行索引从3开始到10结束（不包含）
chipo[3:15]
Out[170]:

	order_id	quantity	item_name	choice_description	item_price
3	1	1	Chips and Tomatillo-Green Chili Salsa	NaN	$2.39
4	2	2	Chicken Bowl	[Tomatillo-Red Chili Salsa (Hot), [Black Beans...	$16.98
5	3	1	Chicken Bowl	[Fresh Tomato Salsa (Mild), [Rice, Cheese, Sou...	$10.98
6	3	1	Side of Chips	NaN	$1.69
7	4	1	Steak Burrito	[Tomatillo Red Chili Salsa, [Fajita Vegetables...	$11.75
8	4	1	Steak Soft Tacos	[Tomatillo Green Chili Salsa, [Pinto Beans, Ch...	$9.25
9	5	1	Steak Burrito	[Fresh Tomato Salsa, [Rice, Black Beans, Pinto...	$9.25
10	5	1	Chips and Guacamole	NaN	$4.45
11	6	1	Chicken Crispy Tacos	[Roasted Chili Corn Salsa, [Fajita Vegetables,...	$8.75
12	6	1	Chicken Soft Tacos	[Roasted Chili Corn Salsa, [Rice, Black Beans,...	$8.75
13	7	1	Chicken Bowl	[Fresh Tomato Salsa, [Fajita Vegetables, Rice,...	$11.25
14	7	1	Chips and Guacamole	NaN	$4.45
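A plain integer slice such as the one in step (12) selects rows by position and leaves out the end point. The sketch below contrasts that with `.iloc` and `.loc` on a tiny made-up frame; `demo` and its values are hypothetical and are not part of the Chipotle data.

```python
import pandas as pd

# Hypothetical toy frame with the default 0..5 integer index.
demo = pd.DataFrame({"quantity": [1, 2, 1, 3, 1, 2]})

by_brackets = demo[1:4]   # positional slice, end point excluded
by_iloc = demo.iloc[1:4]  # same rows, explicitly positional
by_loc = demo.loc[1:4]    # label-based slice, end point included

print(len(by_brackets), len(by_iloc), len(by_loc))
```

With a default integer index the bracket slice and `.iloc` agree, while `.loc` returns one more row because label slices include the end label.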

In [172]:
#(13) View the orders where the quantity sold is greater than 5
cond = chipo.quantity>5
# cond is a boolean Series
chipo[cond]
# returns the rows whose quantity is greater than 5
Out[172]:

	order_id	quantity	item_name	choice_description	item_price
3598	1443	15	Chips and Fresh Tomato Salsa	NaN	$44.25
3599	1443	7	Bottled Water	NaN	$10.50
3887	1559	8	Side of Chips	NaN	$13.52
4152	1660	10	Bottled Water	NaN	$15.00

In [173]:
#(14) View the orders where the quantity sold is greater than 5 and the item name is 'Bottled Water'
cond = (chipo.quantity>5) & (chipo.item_name =='Bottled Water')  # element-wise AND; yields a boolean Series
chipo[cond]
Out[173]:

	order_id	quantity	item_name	choice_description	item_price
3599	1443	7	Bottled Water	NaN	$10.50
4152	1660	10	Bottled Water	NaN	$15.00
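The combined filter in step (14) relies on element-wise `&` with each comparison wrapped in parentheses. A minimal sketch on made-up rows follows; the `orders` frame is hypothetical, and `DataFrame.query` is shown only as one possible alternative spelling of the same filter.

```python
import pandas as pd

# Hypothetical rows, made up for illustration.
orders = pd.DataFrame({
    "item_name": ["Bottled Water", "Chips", "Bottled Water", "Izze"],
    "quantity": [7, 8, 1, 6],
})

# Parentheses are required: & binds more tightly than > and ==.
mask = (orders["quantity"] > 5) & (orders["item_name"] == "Bottled Water")
filtered = orders[mask]

# The same filter written as a query string.
same = orders.query("quantity > 5 and item_name == 'Bottled Water'")

print(filtered)
```

Only the first row satisfies both conditions, so both spellings return a single row.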

In [180]:
#(15) Which item was ordered most often?
#chipo[['item_name','quantity']].groupby(by=['item_name']).sum().sort_values(by=['quantity'],ascending=False)
chipo["item_name"].value_counts().head(1)
# Counted by order lines, the most frequently ordered item is Chicken Bowl
Out[180]:
Chicken Bowl    726
Name: item_name, dtype: int64
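Step (15) has two candidate approaches: `value_counts()` counts order lines, while the commented-out `groupby(...).sum()` adds up units. The two can disagree, as the sketch below shows on made-up rows (the `orders` frame and its values are hypothetical).

```python
import pandas as pd

# Hypothetical rows: two single-unit lines versus one ten-unit line.
orders = pd.DataFrame({
    "item_name": ["Chicken Bowl", "Chicken Bowl", "Bottled Water"],
    "quantity": [1, 1, 10],
})

# Number of order lines per item.
line_counts = orders["item_name"].value_counts()
# Total units ordered per item.
unit_totals = orders.groupby("item_name")["quantity"].sum()

top_by_lines = line_counts.idxmax()
top_by_units = unit_totals.idxmax()
print(top_by_lines, top_by_units)
```

Which of the two definitions matches "most ordered" depends on whether the question is about order lines or about units sold.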
In [166]:
#(16) How many distinct items were ordered, according to the item_name column?
len(chipo["item_name"].unique())
#chipo["item_name"].nunique()
Out[166]:
50
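The two spellings in step (16) give the same answer for `item_name` because that column has no missing values. On a column with NaN, such as `choice_description`, they can differ. A small sketch on a made-up Series (`choices` is hypothetical):

```python
import numpy as np
import pandas as pd

# Hypothetical values with two missing entries.
choices = pd.Series(["[Diet Coke]", np.nan, "[Coke]", np.nan, "[Diet Coke]"])

with_nan = len(choices.unique())          # NaN counts as one distinct value
without_nan = choices.nunique()           # NaN is dropped by default
keep_nan = choices.nunique(dropna=False)  # opt in to counting NaN

print(with_nan, without_nan, keep_nan)
```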
In [183]:
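#(17) Which choice_description was ordered most often?
# Alternative ranking by total quantity (left commented out: only the last expression of a cell is displayed)
#chipo[['choice_description','quantity']].groupby(by=['choice_description']).sum().sort_values(by=['quantity'],ascending=False)
chipo['choice_description'].value_counts().head(1)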
Out[183]:
[Diet Coke]    134
Name: choice_description, dtype: int64
In [167]:
#(18) How many items were ordered in total?
chipo["quantity"].sum()
Out[167]:
4972
In [175]:
#(19) Convert item_price to float
print("dtype before conversion:", chipo["item_price"].dtypes)
# Strip the leading '$' with a vectorized string method instead of assigning element by element;
# the element-wise chained assignment is what triggers SettingWithCopyWarning
chipo["item_price"] = chipo["item_price"].str.replace('$', '', regex=False).astype('float')
print("dtype after conversion:", chipo["item_price"].dtypes)
dtype before conversion: object
dtype after conversion: float64
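Step (19) assumes every price string parses cleanly once the dollar sign is removed. Where that is not guaranteed, one possible approach is `pd.to_numeric` with `errors="coerce"`, which turns unparseable entries into NaN instead of raising. The `raw` values below are made up for illustration.

```python
import pandas as pd

# Hypothetical raw prices, including stray whitespace and a bad entry.
raw = pd.Series(["$2.39", "$16.98 ", "n/a"])

# Remove the literal '$' (regex=False avoids treating it as a pattern), then trim spaces.
cleaned = raw.str.replace("$", "", regex=False).str.strip()

# Unparseable strings become NaN rather than raising an error.
prices = pd.to_numeric(cleaned, errors="coerce")

print(prices)
```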
In [176]:
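#(20) What was the revenue for the period covered by the dataset?
# Caution: the rows above (7 Bottled Water at $10.50, 10 at $15.00) suggest item_price may already be
# a line total; if so, multiplying by quantity again overstates revenue
chipo['sub_total'] = round(chipo['item_price'] * chipo['quantity'],2)  # price x quantity
chipo['sub_total'].sum()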
Out[176]:
39237.02
In [174]:
#(21) How many orders were placed in the period covered by the dataset?
len(chipo["order_id"].unique())
#chipo["order_id"].nunique()
Out[174]:
1834
In [177]:
#(22) What is the average total per order?
(chipo['quantity']*chipo['item_price']).sum()/chipo["order_id"].nunique()
Out[177]:
21.39423118865867
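The average in step (22) divides the overall total by the number of distinct orders. An equivalent route is to sum per order first and then take the mean, which also leaves the per-order totals available for inspection. The sketch assumes a per-line total is already available; `orders` and its `line_total` column are hypothetical.

```python
import pandas as pd

# Hypothetical order lines with a precomputed per-line total.
orders = pd.DataFrame({
    "order_id": [1, 1, 2],
    "line_total": [2.0, 6.0, 10.0],
})

# Sum the lines of each order, then average across orders.
per_order = orders.groupby("order_id")["line_total"].sum()
mean_via_groupby = per_order.mean()

# Same figure computed as in step (22): grand total over distinct orders.
mean_via_ratio = orders["line_total"].sum() / orders["order_id"].nunique()

print(mean_via_groupby, mean_via_ratio)
```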
