Python Data Analysis: pandas Practice Exercises (7)
Python Data Analysis Basics
- Preparation
- Exercise 1- MPG Cars
- Introduction:
- Step 1. Import the necessary libraries
- Step 2. Import the first dataset [cars1](https://raw.githubusercontent.com/guipsamora/pandas_exercises/master/05_Merge/Auto_MPG/cars1.csv) and [cars2](https://raw.githubusercontent.com/guipsamora/pandas_exercises/master/05_Merge/Auto_MPG/cars2.csv).
- Step 3. Assign each to a variable called cars1 and cars2
- Step 4. Oops, it seems our first dataset has some unnamed blank columns, fix cars1
- Step 5. What is the number of observations in each dataset?
- Step 6. Join cars1 and cars2 into a single DataFrame called cars
- Step 7. Oops, there is a column missing, called owners. Create a random number Series from 15,000 to 73,000.
- Step 8. Add the column owners to cars
- Exercise 2-Fictitious Names
- Introduction:
- Step 1. Import the necessary libraries
- Step 2. Create the 3 DataFrames based on the following raw data
- Step 3. Assign each to a variable called data1, data2, data3
- Step 4. Join the two dataframes along rows and assign all_data
- Step 5. Join the two dataframes along columns and assign to all_data_col
- Step 6. Print data3
- Step 7. Merge all_data and data3 along the subject_id value
- Step 8. Merge only the data that has the same 'subject_id' on both data1 and data2
- Step 9. Merge all values in data1 and data2, with matching records from both sides where available.
- Exercise 3-Housing Market
- Introduction:
- Step 1. Import the necessary libraries
- Step 2. Create 3 different Series, each of length 100, as follows:
- Step 3. Let's create a DataFrame by joining the Series by column
- Step 4. Change the name of the columns to bedrs, bathrs, price_sqr_meter
- Step 5. Create a one column DataFrame with the values of the 3 Series and assign it to 'bigcolumn'
- Step 6. Oops, it seems it is going only until index 99. Is it true?
- Step 7. Reindex the DataFrame so it goes from 0 to 299
- Conclusion
Preparation
If you need the datasets, you can find them online yourself or message the author directly; they are not uploaded to CSDN because downloading from there requires a membership. The dataset links below may not always download successfully.
Exercise 1- MPG Cars
Introduction:
The following exercise utilizes data from UC Irvine Machine Learning Repository
Step 1. Import the necessary libraries
The code is as follows:
import pandas as pd
import numpy as np
Step 2. Import the first dataset cars1 and cars2.
Step 3. Assign each to a variable called cars1 and cars2
The code is as follows:
cars1 = pd.read_csv('cars1.csv')
cars2 = pd.read_csv('cars2.csv')
print(cars1.head())
print(cars2.head())
The output is as follows:
    mpg  cylinders  displacement horsepower  weight  acceleration  model  \
0  18.0          8           307        130    3504          12.0     70
1  15.0          8           350        165    3693          11.5     70
2  18.0          8           318        150    3436          11.0     70
3  16.0          8           304        150    3433          12.0     70
4  17.0          8           302        140    3449          10.5     70

   origin                        car  Unnamed: 9  Unnamed: 10  Unnamed: 11  \
0       1  chevrolet chevelle malibu         NaN          NaN          NaN
1       1          buick skylark 320         NaN          NaN          NaN
2       1         plymouth satellite         NaN          NaN          NaN
3       1              amc rebel sst         NaN          NaN          NaN
4       1                ford torino         NaN          NaN          NaN

   Unnamed: 12  Unnamed: 13
0          NaN          NaN
1          NaN          NaN
2          NaN          NaN
3          NaN          NaN
4          NaN          NaN

    mpg  cylinders  displacement horsepower  weight  acceleration  model  \
0  33.0          4            91         53    1795          17.4     76
1  20.0          6           225        100    3651          17.7     76
2  18.0          6           250         78    3574          21.0     76
3  18.5          6           250        110    3645          16.2     76
4  17.5          6           258         95    3193          17.8     76

   origin                 car
0       3         honda civic
1       1      dodge aspen se
2       1   ford granada ghia
3       1  pontiac ventura sj
4       1       amc pacer d/l
Step 4. Oops, it seems our first dataset has some unnamed blank columns, fix cars1
The code is as follows:
# cars1 = cars1.dropna(axis=1)  # dropping the all-NaN columns would also work (note: dropna returns a copy)
cars1 = cars1.loc[:, "mpg" : "car"] # keep the columns from mpg through car and assign back to cars1
cars1.head()
The output is as follows:
 | mpg | cylinders | displacement | horsepower | weight | acceleration | model | origin | car |
---|---|---|---|---|---|---|---|---|---|
0 | 18.0 | 8 | 307 | 130 | 3504 | 12.0 | 70 | 1 | chevrolet chevelle malibu |
1 | 15.0 | 8 | 350 | 165 | 3693 | 11.5 | 70 | 1 | buick skylark 320 |
2 | 18.0 | 8 | 318 | 150 | 3436 | 11.0 | 70 | 1 | plymouth satellite |
3 | 16.0 | 8 | 304 | 150 | 3433 | 12.0 | 70 | 1 | amc rebel sst |
4 | 17.0 | 8 | 302 | 140 | 3449 | 10.5 | 70 | 1 | ford torino |
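Slicing by column name works, but it hard-codes the column boundaries. As an alternative sketch (the toy frame and names below are illustrative, not the real cars1), every column that is entirely NaN can be dropped programmatically:

```python
import numpy as np
import pandas as pd

# Toy stand-in for cars1: two real columns plus trailing all-NaN "Unnamed" columns
df = pd.DataFrame({
    "mpg": [18.0, 15.0],
    "car": ["chevelle", "skylark"],
    "Unnamed: 9": [np.nan, np.nan],
    "Unnamed: 10": [np.nan, np.nan],
})

# how="all" drops only columns where every value is NaN; dropna returns a copy,
# so the result must be assigned back
cleaned = df.dropna(axis=1, how="all")
print(list(cleaned.columns))  # ['mpg', 'car']
```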
Step 5. What is the number of observations in each dataset?
The code is as follows:
print(cars1.shape)
print(cars2.shape)
The output is as follows:
(198, 9)
(200, 9)
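For reference, shape is a (rows, columns) tuple, so the observation count can also be read off directly; a tiny sketch with a toy frame:

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})

# shape[0] and len() both give the number of observations (rows)
print(df.shape)     # (3, 2)
print(df.shape[0])  # 3
print(len(df))      # 3
```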
Step 6. Join cars1 and cars2 into a single DataFrame called cars
The code is as follows:
cars = pd.concat([cars1, cars2])  # append cars2 below cars1
# Note: DataFrame.append was removed in pandas 2.0, so pd.concat is the current idiom.
# cars = pd.concat([cars1, cars2], axis=0, ignore_index=True) also works and resets the index.
cars
The output is as follows:
 | mpg | cylinders | displacement | horsepower | weight | acceleration | model | origin | car |
---|---|---|---|---|---|---|---|---|---|
0 | 18.0 | 8 | 307 | 130 | 3504 | 12.0 | 70 | 1 | chevrolet chevelle malibu |
1 | 15.0 | 8 | 350 | 165 | 3693 | 11.5 | 70 | 1 | buick skylark 320 |
2 | 18.0 | 8 | 318 | 150 | 3436 | 11.0 | 70 | 1 | plymouth satellite |
3 | 16.0 | 8 | 304 | 150 | 3433 | 12.0 | 70 | 1 | amc rebel sst |
4 | 17.0 | 8 | 302 | 140 | 3449 | 10.5 | 70 | 1 | ford torino |
5 | 15.0 | 8 | 429 | 198 | 4341 | 10.0 | 70 | 1 | ford galaxie 500 |
6 | 14.0 | 8 | 454 | 220 | 4354 | 9.0 | 70 | 1 | chevrolet impala |
7 | 14.0 | 8 | 440 | 215 | 4312 | 8.5 | 70 | 1 | plymouth fury iii |
8 | 14.0 | 8 | 455 | 225 | 4425 | 10.0 | 70 | 1 | pontiac catalina |
9 | 15.0 | 8 | 390 | 190 | 3850 | 8.5 | 70 | 1 | amc ambassador dpl |
10 | 15.0 | 8 | 383 | 170 | 3563 | 10.0 | 70 | 1 | dodge challenger se |
11 | 14.0 | 8 | 340 | 160 | 3609 | 8.0 | 70 | 1 | plymouth 'cuda 340 |
12 | 15.0 | 8 | 400 | 150 | 3761 | 9.5 | 70 | 1 | chevrolet monte carlo |
13 | 14.0 | 8 | 455 | 225 | 3086 | 10.0 | 70 | 1 | buick estate wagon (sw) |
14 | 24.0 | 4 | 113 | 95 | 2372 | 15.0 | 70 | 3 | toyota corona mark ii |
15 | 22.0 | 6 | 198 | 95 | 2833 | 15.5 | 70 | 1 | plymouth duster |
16 | 18.0 | 6 | 199 | 97 | 2774 | 15.5 | 70 | 1 | amc hornet |
17 | 21.0 | 6 | 200 | 85 | 2587 | 16.0 | 70 | 1 | ford maverick |
18 | 27.0 | 4 | 97 | 88 | 2130 | 14.5 | 70 | 3 | datsun pl510 |
19 | 26.0 | 4 | 97 | 46 | 1835 | 20.5 | 70 | 2 | volkswagen 1131 deluxe sedan |
20 | 25.0 | 4 | 110 | 87 | 2672 | 17.5 | 70 | 2 | peugeot 504 |
21 | 24.0 | 4 | 107 | 90 | 2430 | 14.5 | 70 | 2 | audi 100 ls |
22 | 25.0 | 4 | 104 | 95 | 2375 | 17.5 | 70 | 2 | saab 99e |
23 | 26.0 | 4 | 121 | 113 | 2234 | 12.5 | 70 | 2 | bmw 2002 |
24 | 21.0 | 6 | 199 | 90 | 2648 | 15.0 | 70 | 1 | amc gremlin |
25 | 10.0 | 8 | 360 | 215 | 4615 | 14.0 | 70 | 1 | ford f250 |
26 | 10.0 | 8 | 307 | 200 | 4376 | 15.0 | 70 | 1 | chevy c20 |
27 | 11.0 | 8 | 318 | 210 | 4382 | 13.5 | 70 | 1 | dodge d200 |
28 | 9.0 | 8 | 304 | 193 | 4732 | 18.5 | 70 | 1 | hi 1200d |
29 | 27.0 | 4 | 97 | 88 | 2130 | 14.5 | 71 | 3 | datsun pl510 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
170 | 27.0 | 4 | 112 | 88 | 2640 | 18.6 | 82 | 1 | chevrolet cavalier wagon |
171 | 34.0 | 4 | 112 | 88 | 2395 | 18.0 | 82 | 1 | chevrolet cavalier 2-door |
172 | 31.0 | 4 | 112 | 85 | 2575 | 16.2 | 82 | 1 | pontiac j2000 se hatchback |
173 | 29.0 | 4 | 135 | 84 | 2525 | 16.0 | 82 | 1 | dodge aries se |
174 | 27.0 | 4 | 151 | 90 | 2735 | 18.0 | 82 | 1 | pontiac phoenix |
175 | 24.0 | 4 | 140 | 92 | 2865 | 16.4 | 82 | 1 | ford fairmont futura |
176 | 23.0 | 4 | 151 | ? | 3035 | 20.5 | 82 | 1 | amc concord dl |
177 | 36.0 | 4 | 105 | 74 | 1980 | 15.3 | 82 | 2 | volkswagen rabbit l |
178 | 37.0 | 4 | 91 | 68 | 2025 | 18.2 | 82 | 3 | mazda glc custom l |
179 | 31.0 | 4 | 91 | 68 | 1970 | 17.6 | 82 | 3 | mazda glc custom |
180 | 38.0 | 4 | 105 | 63 | 2125 | 14.7 | 82 | 1 | plymouth horizon miser |
181 | 36.0 | 4 | 98 | 70 | 2125 | 17.3 | 82 | 1 | mercury lynx l |
182 | 36.0 | 4 | 120 | 88 | 2160 | 14.5 | 82 | 3 | nissan stanza xe |
183 | 36.0 | 4 | 107 | 75 | 2205 | 14.5 | 82 | 3 | honda accord |
184 | 34.0 | 4 | 108 | 70 | 2245 | 16.9 | 82 | 3 | toyota corolla |
185 | 38.0 | 4 | 91 | 67 | 1965 | 15.0 | 82 | 3 | honda civic |
186 | 32.0 | 4 | 91 | 67 | 1965 | 15.7 | 82 | 3 | honda civic (auto) |
187 | 38.0 | 4 | 91 | 67 | 1995 | 16.2 | 82 | 3 | datsun 310 gx |
188 | 25.0 | 6 | 181 | 110 | 2945 | 16.4 | 82 | 1 | buick century limited |
189 | 38.0 | 6 | 262 | 85 | 3015 | 17.0 | 82 | 1 | oldsmobile cutlass ciera (diesel) |
190 | 26.0 | 4 | 156 | 92 | 2585 | 14.5 | 82 | 1 | chrysler lebaron medallion |
191 | 22.0 | 6 | 232 | 112 | 2835 | 14.7 | 82 | 1 | ford granada l |
192 | 32.0 | 4 | 144 | 96 | 2665 | 13.9 | 82 | 3 | toyota celica gt |
193 | 36.0 | 4 | 135 | 84 | 2370 | 13.0 | 82 | 1 | dodge charger 2.2 |
194 | 27.0 | 4 | 151 | 90 | 2950 | 17.3 | 82 | 1 | chevrolet camaro |
195 | 27.0 | 4 | 140 | 86 | 2790 | 15.6 | 82 | 1 | ford mustang gl |
196 | 44.0 | 4 | 97 | 52 | 2130 | 24.6 | 82 | 2 | vw pickup |
197 | 32.0 | 4 | 135 | 84 | 2295 | 11.6 | 82 | 1 | dodge rampage |
198 | 28.0 | 4 | 120 | 79 | 2625 | 18.6 | 82 | 1 | ford ranger |
199 | 31.0 | 4 | 119 | 82 | 2720 | 19.4 | 82 | 1 | chevy s-10 |
398 rows × 9 columns
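Note that the combined frame above still ends at label 199 because plain pd.concat keeps each frame's original row labels. A small sketch with toy frames of what ignore_index changes:

```python
import pandas as pd

a = pd.DataFrame({"x": [1, 2]})
b = pd.DataFrame({"x": [3, 4]})

stacked = pd.concat([a, b])                       # labels repeat: 0, 1, 0, 1
restacked = pd.concat([a, b], ignore_index=True)  # fresh RangeIndex: 0, 1, 2, 3

print(list(stacked.index))    # [0, 1, 0, 1]
print(list(restacked.index))  # [0, 1, 2, 3]
```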
Step 7. Oops, there is a column missing, called owners. Create a random number Series from 15,000 to 73,000.
The code is as follows:
# Create a random Series with values from 15,000 to 73,000
# (np.random.randint's upper bound is exclusive, so the draws fall in [15000, 73000))
my_owners = np.random.randint(15000, 73000, 398)
my_owners
The output is as follows:
array([30395, 42733, 44554, 34325, 50270, 60139, 24218, 25925, 42502,
       45041, 21449, 34472, 42783, 56380, 15707, 25707, 61160, 29297,
       ...,
       34077, 60663, 46497, 48174, 19764, 56893, 52080, 41104, 21126,
       56865, 39795])
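Since the owners values are random, they change on every run. If reproducibility matters, the generator can be seeded; a sketch using NumPy's newer Generator API (the seed 42 is arbitrary, not part of the exercise):

```python
import numpy as np

rng = np.random.default_rng(42)  # seeded generator: same sequence every run
owners = rng.integers(15000, 73000, size=398)

# The upper bound of integers() is exclusive, just like np.random.randint
print(owners.min() >= 15000, owners.max() < 73000)  # True True
```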
Step 8. Add the column owners to cars
The code is as follows:
# Add an owners column
cars['owners'] = my_owners
cars.tail()
The output is as follows:
 | mpg | cylinders | displacement | horsepower | weight | acceleration | model | origin | car | owners |
---|---|---|---|---|---|---|---|---|---|---|
195 | 27.0 | 4 | 140 | 86 | 2790 | 15.6 | 82 | 1 | ford mustang gl | 52080 |
196 | 44.0 | 4 | 97 | 52 | 2130 | 24.6 | 82 | 2 | vw pickup | 41104 |
197 | 32.0 | 4 | 135 | 84 | 2295 | 11.6 | 82 | 1 | dodge rampage | 21126 |
198 | 28.0 | 4 | 120 | 79 | 2625 | 18.6 | 82 | 1 | ford ranger | 56865 |
199 | 31.0 | 4 | 119 | 82 | 2720 | 19.4 | 82 | 1 | chevy s-10 | 39795 |
Exercise 2-Fictitious Names
Introduction:
This time you will create the data yourself.
Special thanks to Chris Albon for sharing the dataset and materials.
All the credit for this exercise belongs to him.
To learn more about it, go here.
Step 1. Import the necessary libraries
The code is as follows:
import pandas as pd
Step 2. Create the 3 DataFrames based on the following raw data
The code is as follows:
raw_data_1 = {'subject_id': ['1', '2', '3', '4', '5'],
              'first_name': ['Alex', 'Amy', 'Allen', 'Alice', 'Ayoung'],
              'last_name': ['Anderson', 'Ackerman', 'Ali', 'Aoni', 'Atiches']}
raw_data_2 = {'subject_id': ['4', '5', '6', '7', '8'],
              'first_name': ['Billy', 'Brian', 'Bran', 'Bryce', 'Betty'],
              'last_name': ['Bonder', 'Black', 'Balwner', 'Brice', 'Btisan']}
raw_data_3 = {'subject_id': ['1', '2', '3', '4', '5', '7', '8', '9', '10', '11'],
              'test_id': [51, 15, 15, 61, 16, 14, 15, 1, 61, 16]}
Step 3. Assign each to a variable called data1, data2, data3
The code is as follows:
data1 = pd.DataFrame(raw_data_1, columns = ['subject_id', 'first_name', 'last_name'])
data2 = pd.DataFrame(raw_data_2, columns = ['subject_id', 'first_name', 'last_name'])
data3 = pd.DataFrame(raw_data_3, columns = ['subject_id', 'test_id'])
print(data1)
print(data2)
print(data3)
The output is as follows:
subject_id first_name last_name
0 1 Alex Anderson
1 2 Amy Ackerman
2 3 Allen Ali
3 4 Alice Aoni
4 5 Ayoung Atiches
subject_id first_name last_name
0 4 Billy Bonder
1 5 Brian Black
2 6 Bran Balwner
3 7 Bryce Brice
4 8 Betty Btisan
subject_id test_id
0 1 51
1 2 15
2 3 15
3 4 61
4 5 16
5 7 14
6 8 15
7 9 1
8 10 61
9 11 16
Step 4. Join the two dataframes along rows and assign all_data
The code is as follows:
# all_data = data1.append(data2)  # append was removed in pandas 2.0
# all_data = pd.merge(data1, data2, how='outer')
# Both of the above approaches also work
all_data = pd.concat([data1, data2])
all_data
The output is as follows:
 | subject_id | first_name | last_name |
---|---|---|---|
0 | 1 | Alex | Anderson |
1 | 2 | Amy | Ackerman |
2 | 3 | Allen | Ali |
3 | 4 | Alice | Aoni |
4 | 5 | Ayoung | Atiches |
0 | 4 | Billy | Bonder |
1 | 5 | Brian | Black |
2 | 6 | Bran | Balwner |
3 | 7 | Bryce | Brice |
4 | 8 | Betty | Btisan |
Step 5. Join the two dataframes along columns and assign to all_data_col
The code is as follows:
# Concatenate along columns and assign to all_data_col
all_data_col = pd.concat([data1, data2], axis=1)
all_data_col
The output is as follows:
 | subject_id | first_name | last_name | subject_id | first_name | last_name |
---|---|---|---|---|---|---|
0 | 1 | Alex | Anderson | 4 | Billy | Bonder |
1 | 2 | Amy | Ackerman | 5 | Brian | Black |
2 | 3 | Allen | Ali | 6 | Bran | Balwner |
3 | 4 | Alice | Aoni | 7 | Bryce | Brice |
4 | 5 | Ayoung | Atiches | 8 | Betty | Btisan |
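It is worth knowing that pd.concat(axis=1) aligns rows by index label rather than by position; data1 and data2 happen to share index 0-4, which is why no NaN appears above. A sketch with mismatched indexes (toy frames, not the exercise data):

```python
import pandas as pd

left = pd.DataFrame({"a": [1, 2, 3]})                # index 0, 1, 2
right = pd.DataFrame({"b": [10, 20]}, index=[1, 2])  # index 1, 2

# Rows are matched by index label; labels missing on one side are filled with NaN
side = pd.concat([left, right], axis=1)
print(side["b"].isna().sum())  # 1  (index 0 has no match in `right`)
```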
Step 6. Print data3
The code is as follows:
data3
The output is as follows:
 | subject_id | test_id |
---|---|---|
0 | 1 | 51 |
1 | 2 | 15 |
2 | 3 | 15 |
3 | 4 | 61 |
4 | 5 | 16 |
5 | 7 | 14 |
6 | 8 | 15 |
7 | 9 | 1 |
8 | 10 | 61 |
9 | 11 | 16 |
Step 7. Merge all_data and data3 along the subject_id value
The code is as follows:
all_data_3 = pd.merge(all_data, data3, on='subject_id') # how='inner' is the default
all_data_3
The output is as follows:
 | subject_id | first_name | last_name | test_id |
---|---|---|---|---|
0 | 1 | Alex | Anderson | 51 |
1 | 2 | Amy | Ackerman | 15 |
2 | 3 | Allen | Ali | 15 |
3 | 4 | Alice | Aoni | 61 |
4 | 4 | Billy | Bonder | 61 |
5 | 5 | Ayoung | Atiches | 16 |
6 | 5 | Brian | Black | 16 |
7 | 7 | Bryce | Brice | 14 |
8 | 8 | Betty | Btisan | 15 |
Step 8. Merge only the data that has the same ‘subject_id’ on both data1 and data2
The code is as follows:
data = pd.merge(data1, data2, on='subject_id', how='inner')
data
The output is as follows:
 | subject_id | first_name_x | last_name_x | first_name_y | last_name_y |
---|---|---|---|---|---|
0 | 4 | Alice | Aoni | Billy | Bonder |
1 | 5 | Ayoung | Atiches | Brian | Black |
Step 9. Merge all values in data1 and data2, with matching records from both sides where available.
The code is as follows:
# Merge all values in data1 and data2, keeping matching records from both sides where available
pd.merge(data1, data2, on='subject_id', how='outer') # suffixes=['_A', '_B'] can be passed to rename the overlapping columns
The output is as follows:
 | subject_id | first_name_x | last_name_x | first_name_y | last_name_y |
---|---|---|---|---|---|
0 | 1 | Alex | Anderson | NaN | NaN |
1 | 2 | Amy | Ackerman | NaN | NaN |
2 | 3 | Allen | Ali | NaN | NaN |
3 | 4 | Alice | Aoni | Billy | Bonder |
4 | 5 | Ayoung | Atiches | Brian | Black |
5 | 6 | NaN | NaN | Bran | Balwner |
6 | 7 | NaN | NaN | Bryce | Brice |
7 | 8 | NaN | NaN | Betty | Btisan |
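To see which side each row came from, merge also accepts an indicator flag, and the suffixes mentioned in the comment rename the overlapping columns. A small sketch (the frames and suffix names here are illustrative):

```python
import pandas as pd

d1 = pd.DataFrame({"subject_id": ["4", "5"], "first_name": ["Alice", "Ayoung"]})
d2 = pd.DataFrame({"subject_id": ["4", "6"], "first_name": ["Billy", "Bran"]})

# suffixes renames the clashing first_name columns; indicator=True adds a
# _merge column recording 'left_only', 'right_only', or 'both' for each row
out = pd.merge(d1, d2, on="subject_id", how="outer",
               suffixes=("_A", "_B"), indicator=True)
print(list(out.columns))  # ['subject_id', 'first_name_A', 'first_name_B', '_merge']
print(sorted(out["_merge"].astype(str)))  # ['both', 'left_only', 'right_only']
```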
Exercise 3-Housing Market
Introduction:
This time we will create our own dataset with fictional numbers to describe a house market. As we are going to create random data, don't try to make sense of the numbers.
Step 1. Import the necessary libraries
The code is as follows:
import pandas as pd
import numpy as np
Step 2. Create 3 different Series, each of length 100, as follows:
- The first a random number from 1 to 4
- The second a random number from 1 to 3
- The third a random number from 10,000 to 30,000
The code is as follows:
# np.random.randint's upper bound is exclusive, so these draw from 1-3, 1-2, and 10,000-29,999
s1 = pd.Series(np.random.randint(1, 4, 100))
s2 = pd.Series(np.random.randint(1, 3, 100))
s3 = pd.Series(np.random.randint(10000, 30000, 100))
print(s1, s2, s3)
The output is as follows:
0 2
1 3
2 1
3 3
4 1
5 1
6 2
7 1
8 1
9 1
10 1
11 3
12 1
13 2
14 3
15 2
16 1
17 1
18 3
19 3
20 1
21 3
22 3
23 1
24 1
25 2
26 1
27 1
28 2
29 1
..
70 1
71 1
72 3
73 2
74 2
75 1
76 2
77 1
78 3
79 2
80 3
81 3
82 3
83 2
84 1
85 3
86 2
87 1
88 3
89 3
90 1
91 3
92 2
93 3
94 1
95 2
96 3
97 2
98 3
99 1
Length: 100, dtype: int32
0 1
1 2
2 1
3 2
4 1
5 2
6 1
7 1
8 1
9 2
10 2
11 1
12 1
13 2
14 2
15 2
16 1
17 2
18 1
19 1
20 2
21 2
22 1
23 1
24 1
25 1
26 1
27 2
28 1
29 1
..
70 2
71 2
72 1
73 1
74 1
75 1
76 2
77 2
78 2
79 2
80 1
81 2
82 1
83 2
84 1
85 1
86 2
87 2
88 1
89 2
90 1
91 2
92 1
93 1
94 1
95 2
96 1
97 1
98 2
99 2
Length: 100, dtype: int32
0 11973
1 10804
2 26866
3 25940
4 23147
5 14552
6 22151
7 19312
8 25373
9 29329
10 17069
11 19629
12 26174
13 20524
14 16489
15 22613
16 25266
17 11566
18 28599
19 27562
20 12922
21 29055
22 12709
23 21727
24 16735
25 20818
26 20199
27 21400
28 21602
29 16792
...
70 10076
71 20091
72 28284
73 12185
74 15879
75 12907
76 24946
77 20168
78 24435
79 12175
80 18286
81 18001
82 10938
83 19116
84 12802
85 11623
86 15048
87 10624
88 18989
89 19797
90 17798
91 21317
92 27047
93 25692
94 27564
95 23411
96 18808
97 16854
98 21737
99 18968
Length: 100, dtype: int32
Step 3. Let's create a DataFrame by joining the Series by column
The code is as follows:
housemkt = pd.concat([s1, s2, s3], axis=1)
housemkt.head()
The output is as follows:
 | 0 | 1 | 2 |
---|---|---|---|
0 | 2 | 1 | 11973 |
1 | 3 | 2 | 10804 |
2 | 1 | 1 | 26866 |
3 | 3 | 2 | 25940 |
4 | 1 | 1 | 23147 |
Step 4. Change the name of the columns to bedrs, bathrs, price_sqr_meter
The code is as follows:
'''
The main parameters of rename are:
columns: mapping of new column names
index: mapping of new row labels
axis: the axis the mapping applies to
inplace: defaults to False. When False, rename returns the modified result and the original
is left untouched; when True, it returns None and modifies the DataFrame itself.
'''
housemkt.rename(columns={0: 'bedrs', 1: 'bathrs', 2: 'price_sqr_meter'}, inplace = True)
housemkt.head()
The output is as follows:
 | bedrs | bathrs | price_sqr_meter |
---|---|---|---|
0 | 2 | 1 | 11973 |
1 | 3 | 2 | 10804 |
2 | 1 | 1 | 26866 |
3 | 3 | 2 | 25940 |
4 | 1 | 1 | 23147 |
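Because inplace=True was used above, rename returned None and modified housemkt directly. A sketch of the non-inplace form, which returns a renamed copy instead:

```python
import pandas as pd

df = pd.DataFrame({0: [2], 1: [1], 2: [11973]})

# Without inplace=True, rename returns a renamed copy and leaves df untouched
renamed = df.rename(columns={0: "bedrs", 1: "bathrs", 2: "price_sqr_meter"})
print(list(df.columns))       # [0, 1, 2]
print(list(renamed.columns))  # ['bedrs', 'bathrs', 'price_sqr_meter']
```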
Step 5. Create a one column DataFrame with the values of the 3 Series and assign it to ‘bigcolumn’
The code is as follows:
bigcolumn = pd.concat([s1, s2, s3], axis=0)
bigcolumn = bigcolumn.to_frame() # to_frame() converts the Series into a DataFrame
print(type(bigcolumn))
bigcolumn
The output is as follows:
<class 'pandas.core.frame.DataFrame'>
 | 0 |
---|---|
0 | 2 |
1 | 3 |
2 | 1 |
3 | 3 |
4 | 1 |
5 | 1 |
6 | 2 |
7 | 1 |
8 | 1 |
9 | 1 |
10 | 1 |
11 | 3 |
12 | 1 |
13 | 2 |
14 | 3 |
15 | 2 |
16 | 1 |
17 | 1 |
18 | 3 |
19 | 3 |
20 | 1 |
21 | 3 |
22 | 3 |
23 | 1 |
24 | 1 |
25 | 2 |
26 | 1 |
27 | 1 |
28 | 2 |
29 | 1 |
... | ... |
70 | 10076 |
71 | 20091 |
72 | 28284 |
73 | 12185 |
74 | 15879 |
75 | 12907 |
76 | 24946 |
77 | 20168 |
78 | 24435 |
79 | 12175 |
80 | 18286 |
81 | 18001 |
82 | 10938 |
83 | 19116 |
84 | 12802 |
85 | 11623 |
86 | 15048 |
87 | 10624 |
88 | 18989 |
89 | 19797 |
90 | 17798 |
91 | 21317 |
92 | 27047 |
93 | 25692 |
94 | 27564 |
95 | 23411 |
96 | 18808 |
97 | 16854 |
98 | 21737 |
99 | 18968 |
300 rows × 1 columns
Step 6. Oops, it seems it is going only until index 99. Is it true?
The code is as follows:
len(bigcolumn)
The output is as follows:
300
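So the frame really has 300 rows; it only looks like it stops at 99 because the three concatenated Series each kept their 0-99 labels, leaving duplicate index values. A miniature sketch of the same effect:

```python
import pandas as pd

s = pd.Series([1, 2])
col = pd.concat([s, s, s]).to_frame()

print(len(col))             # 6 rows...
print(col.index.is_unique)  # False: labels 0 and 1 each appear three times
print(col.index.max())      # 1, so the display "stops" at the old max label
```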
Step 7. Reindex the DataFrame so it goes from 0 to 299
The code is as follows:
# reset_index(): the drop parameter defaults to False; set drop=True to discard the old
# index instead of keeping it as a column. inplace=True modifies the data in place; it is
# required when the result is not assigned back, otherwise the reset has no effect.
'''
set_index():
Signature: DataFrame.set_index(keys, drop=True, append=False, inplace=False, verify_integrity=False)
Parameters:
keys: the column label(s), or arrays, to set as the new index
drop: default True; delete the column(s) being used as the new index
append: default False; whether to append the column(s) to the existing index
inplace: default False; modify the DataFrame in place rather than creating a new object
verify_integrity: default False; check the new index for duplicates. Otherwise the check is
deferred until necessary; leaving it False improves performance.

reset_index():
Signature: DataFrame.reset_index(level=None, drop=False, inplace=False, col_level=0, col_fill='')
Parameters:
level: int, str, tuple, or list, default None; remove only the given level(s) from the index.
By default all levels are removed.
drop: if False, the old index is restored as an ordinary column; if True, it is discarded
inplace: default False; modify the DataFrame in place rather than creating a new object
col_level: int or str, default 0; with multi-level columns, the level the labels are inserted
into. By default they go into the first level.
col_fill: object, default ''; with multi-level columns, how the other levels are named.
If None, the index name is repeated.
Note: reset_index covers two cases: resetting the original DataFrame's index, and undoing a
previous set_index() call.
'''
bigcolumn.reset_index(drop=True, inplace=True)
bigcolumn
The output is as follows:
 | 0 |
---|---|
0 | 2 |
1 | 3 |
2 | 1 |
3 | 3 |
4 | 1 |
5 | 1 |
6 | 2 |
7 | 1 |
8 | 1 |
9 | 1 |
10 | 1 |
11 | 3 |
12 | 1 |
13 | 2 |
14 | 3 |
15 | 2 |
16 | 1 |
17 | 1 |
18 | 3 |
19 | 3 |
20 | 1 |
21 | 3 |
22 | 3 |
23 | 1 |
24 | 1 |
25 | 2 |
26 | 1 |
27 | 1 |
28 | 2 |
29 | 1 |
... | ... |
270 | 10076 |
271 | 20091 |
272 | 28284 |
273 | 12185 |
274 | 15879 |
275 | 12907 |
276 | 24946 |
277 | 20168 |
278 | 24435 |
279 | 12175 |
280 | 18286 |
281 | 18001 |
282 | 10938 |
283 | 19116 |
284 | 12802 |
285 | 11623 |
286 | 15048 |
287 | 10624 |
288 | 18989 |
289 | 19797 |
290 | 17798 |
291 | 21317 |
292 | 27047 |
293 | 25692 |
294 | 27564 |
295 | 23411 |
296 | 18808 |
297 | 16854 |
298 | 21737 |
299 | 18968 |
300 rows × 1 columns
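As a shortcut, the reset_index step can be avoided entirely by asking concat for a fresh index up front; a minimal sketch:

```python
import pandas as pd

s = pd.Series([1, 2])

# ignore_index=True builds the 0..n-1 RangeIndex during the concat itself
col = pd.concat([s, s, s], ignore_index=True).to_frame()
print(list(col.index))  # [0, 1, 2, 3, 4, 5]
```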
Conclusion
Today we mainly practiced merge and concatenation operations, along with other related functions. A reminder: this pandas series uses Anaconda's Jupyter Notebook, which I highly recommend.