Python Data Analysis: pandas Practice Exercises (7)
Python Data Analysis Basics
- Preparation
- Exercise 1- MPG Cars
- Introduction:
- Step 1. Import the necessary libraries
- Step 2. Import the first dataset [cars1](https://raw.githubusercontent.com/guipsamora/pandas_exercises/master/05_Merge/Auto_MPG/cars1.csv) and [cars2](https://raw.githubusercontent.com/guipsamora/pandas_exercises/master/05_Merge/Auto_MPG/cars2.csv).
- Step 3. Assign each to a variable called cars1 and cars2
- Step 4. Oops, it seems our first dataset has some unnamed blank columns, fix cars1
- Step 5. What is the number of observations in each dataset?
- Step 6. Join cars1 and cars2 into a single DataFrame called cars
- Step 7. Oops, there is a column missing, called owners. Create a random number Series from 15,000 to 73,000.
- Step 8. Add the column owners to cars
- Exercise 2-Fictitious Names
- Introduction:
- Step 1. Import the necessary libraries
- Step 2. Create the 3 DataFrames based on the following raw data
- Step 3. Assign each to a variable called data1, data2, data3
- Step 4. Join the two dataframes along rows and assign all_data
- Step 5. Join the two dataframes along columns and assign to all_data_col
- Step 6. Print data3
- Step 7. Merge all_data and data3 along the subject_id value
- Step 8. Merge only the data that has the same 'subject_id' on both data1 and data2
- Step 9. Merge all values in data1 and data2, with matching records from both sides where available.
- Exercise 3-Housing Market
- Introduction:
- Step 1. Import the necessary libraries
- Step 2. Create 3 different Series, each of length 100, as follows:
- Step 3. Let's create a DataFrame by joining the Series by column
- Step 4. Change the name of the columns to bedrs, bathrs, price_sqr_meter
- Step 5. Create a one column DataFrame with the values of the 3 Series and assign it to 'bigcolumn'
- Step 6. Oops, it seems it is going only until index 99. Is it true?
- Step 7. Reindex the DataFrame so it goes from 0 to 299
- Conclusion
Preparation
If you need the datasets, you can find them online yourself or message the author directly; they are not uploaded to CSDN because downloading from there requires a membership. The dataset links below may not always download successfully.
Exercise 1- MPG Cars
Introduction:
The following exercise utilizes data from UC Irvine Machine Learning Repository
Step 1. Import the necessary libraries
The code is as follows:
import pandas as pd
import numpy as np
Step 2. Import the first dataset cars1 and cars2.
Step 3. Assign each to a variable called cars1 and cars2
The code is as follows:
cars1 = pd.read_csv('cars1.csv')
cars2 = pd.read_csv('cars2.csv')
print(cars1.head())
print(cars2.head())
The output is as follows:
    mpg  cylinders  displacement horsepower  weight  acceleration  model  \
0  18.0          8           307        130    3504          12.0     70
1  15.0          8           350        165    3693          11.5     70
2  18.0          8           318        150    3436          11.0     70
3  16.0          8           304        150    3433          12.0     70
4  17.0          8           302        140    3449          10.5     70

   origin                        car  Unnamed: 9  Unnamed: 10  Unnamed: 11  \
0       1  chevrolet chevelle malibu         NaN          NaN          NaN
1       1          buick skylark 320         NaN          NaN          NaN
2       1         plymouth satellite         NaN          NaN          NaN
3       1              amc rebel sst         NaN          NaN          NaN
4       1                ford torino         NaN          NaN          NaN

   Unnamed: 12  Unnamed: 13
0          NaN          NaN
1          NaN          NaN
2          NaN          NaN
3          NaN          NaN
4          NaN          NaN

    mpg  cylinders  displacement horsepower  weight  acceleration  model  \
0  33.0          4            91         53    1795          17.4     76
1  20.0          6           225        100    3651          17.7     76
2  18.0          6           250         78    3574          21.0     76
3  18.5          6           250        110    3645          16.2     76
4  17.5          6           258         95    3193          17.8     76

   origin                 car
0       3         honda civic
1       1      dodge aspen se
2       1   ford granada ghia
3       1  pontiac ventura sj
4       1       amc pacer d/l
Step 4. Oops, it seems our first dataset has some unnamed blank columns, fix cars1
The code is as follows:
# cars1 = cars1.dropna(axis=1)  # dropping the all-NaN columns would also work (note: dropna returns a copy)
cars1 = cars1.loc[:, "mpg" : "car"] # keep the columns from mpg through car and assign back to cars1
cars1.head()
The output is as follows:
 | mpg | cylinders | displacement | horsepower | weight | acceleration | model | origin | car |
---|---|---|---|---|---|---|---|---|---|
0 | 18.0 | 8 | 307 | 130 | 3504 | 12.0 | 70 | 1 | chevrolet chevelle malibu |
1 | 15.0 | 8 | 350 | 165 | 3693 | 11.5 | 70 | 1 | buick skylark 320 |
2 | 18.0 | 8 | 318 | 150 | 3436 | 11.0 | 70 | 1 | plymouth satellite |
3 | 16.0 | 8 | 304 | 150 | 3433 | 12.0 | 70 | 1 | amc rebel sst |
4 | 17.0 | 8 | 302 | 140 | 3449 | 10.5 | 70 | 1 | ford torino |
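Slicing by column name works, but it hard-codes the column boundaries. As an alternative sketch (the toy frame and names below are illustrative, not the real cars1), every column that is entirely NaN can be dropped programmatically:

```python
import numpy as np
import pandas as pd

# Toy stand-in for cars1: two real columns plus trailing all-NaN "Unnamed" columns
df = pd.DataFrame({
    "mpg": [18.0, 15.0],
    "car": ["chevelle", "skylark"],
    "Unnamed: 9": [np.nan, np.nan],
    "Unnamed: 10": [np.nan, np.nan],
})

# how="all" drops only columns where every value is NaN; dropna returns a copy,
# so the result must be assigned back
cleaned = df.dropna(axis=1, how="all")
print(list(cleaned.columns))  # ['mpg', 'car']
```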
Step 5. What is the number of observations in each dataset?
The code is as follows:
print(cars1.shape)
print(cars2.shape)
The output is as follows:
(198, 9)
(200, 9)
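For reference, shape is a (rows, columns) tuple, so the observation count can also be read off directly; a tiny sketch with a toy frame:

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})

# shape[0] and len() both give the number of observations (rows)
print(df.shape)     # (3, 2)
print(df.shape[0])  # 3
print(len(df))      # 3
```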
Step 6. Join cars1 and cars2 into a single DataFrame called cars
The code is as follows:
cars = pd.concat([cars1, cars2])  # append cars2 below cars1
# Note: DataFrame.append was removed in pandas 2.0, so pd.concat is the current idiom.
# cars = pd.concat([cars1, cars2], axis=0, ignore_index=True) also works and resets the index.
cars
The output is as follows:
 | mpg | cylinders | displacement | horsepower | weight | acceleration | model | origin | car |
---|---|---|---|---|---|---|---|---|---|
0 | 18.0 | 8 | 307 | 130 | 3504 | 12.0 | 70 | 1 | chevrolet chevelle malibu |
1 | 15.0 | 8 | 350 | 165 | 3693 | 11.5 | 70 | 1 | buick skylark 320 |
2 | 18.0 | 8 | 318 | 150 | 3436 | 11.0 | 70 | 1 | plymouth satellite |
3 | 16.0 | 8 | 304 | 150 | 3433 | 12.0 | 70 | 1 | amc rebel sst |
4 | 17.0 | 8 | 302 | 140 | 3449 | 10.5 | 70 | 1 | ford torino |
5 | 15.0 | 8 | 429 | 198 | 4341 | 10.0 | 70 | 1 | ford galaxie 500 |
6 | 14.0 | 8 | 454 | 220 | 4354 | 9.0 | 70 | 1 | chevrolet impala |
7 | 14.0 | 8 | 440 | 215 | 4312 | 8.5 | 70 | 1 | plymouth fury iii |
8 | 14.0 | 8 | 455 | 225 | 4425 | 10.0 | 70 | 1 | pontiac catalina |
9 | 15.0 | 8 | 390 | 190 | 3850 | 8.5 | 70 | 1 | amc ambassador dpl |
10 | 15.0 | 8 | 383 | 170 | 3563 | 10.0 | 70 | 1 | dodge challenger se |
11 | 14.0 | 8 | 340 | 160 | 3609 | 8.0 | 70 | 1 | plymouth 'cuda 340 |
12 | 15.0 | 8 | 400 | 150 | 3761 | 9.5 | 70 | 1 | chevrolet monte carlo |
13 | 14.0 | 8 | 455 | 225 | 3086 | 10.0 | 70 | 1 | buick estate wagon (sw) |
14 | 24.0 | 4 | 113 | 95 | 2372 | 15.0 | 70 | 3 | toyota corona mark ii |
15 | 22.0 | 6 | 198 | 95 | 2833 | 15.5 | 70 | 1 | plymouth duster |
16 | 18.0 | 6 | 199 | 97 | 2774 | 15.5 | 70 | 1 | amc hornet |
17 | 21.0 | 6 | 200 | 85 | 2587 | 16.0 | 70 | 1 | ford maverick |
18 | 27.0 | 4 | 97 | 88 | 2130 | 14.5 | 70 | 3 | datsun pl510 |
19 | 26.0 | 4 | 97 | 46 | 1835 | 20.5 | 70 | 2 | volkswagen 1131 deluxe sedan |
20 | 25.0 | 4 | 110 | 87 | 2672 | 17.5 | 70 | 2 | peugeot 504 |
21 | 24.0 | 4 | 107 | 90 | 2430 | 14.5 | 70 | 2 | audi 100 ls |
22 | 25.0 | 4 | 104 | 95 | 2375 | 17.5 | 70 | 2 | saab 99e |
23 | 26.0 | 4 | 121 | 113 | 2234 | 12.5 | 70 | 2 | bmw 2002 |
24 | 21.0 | 6 | 199 | 90 | 2648 | 15.0 | 70 | 1 | amc gremlin |
25 | 10.0 | 8 | 360 | 215 | 4615 | 14.0 | 70 | 1 | ford f250 |
26 | 10.0 | 8 | 307 | 200 | 4376 | 15.0 | 70 | 1 | chevy c20 |
27 | 11.0 | 8 | 318 | 210 | 4382 | 13.5 | 70 | 1 | dodge d200 |
28 | 9.0 | 8 | 304 | 193 | 4732 | 18.5 | 70 | 1 | hi 1200d |
29 | 27.0 | 4 | 97 | 88 | 2130 | 14.5 | 71 | 3 | datsun pl510 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
170 | 27.0 | 4 | 112 | 88 | 2640 | 18.6 | 82 | 1 | chevrolet cavalier wagon |
171 | 34.0 | 4 | 112 | 88 | 2395 | 18.0 | 82 | 1 | chevrolet cavalier 2-door |
172 | 31.0 | 4 | 112 | 85 | 2575 | 16.2 | 82 | 1 | pontiac j2000 se hatchback |
173 | 29.0 | 4 | 135 | 84 | 2525 | 16.0 | 82 | 1 | dodge aries se |
174 | 27.0 | 4 | 151 | 90 | 2735 | 18.0 | 82 | 1 | pontiac phoenix |
175 | 24.0 | 4 | 140 | 92 | 2865 | 16.4 | 82 | 1 | ford fairmont futura |
176 | 23.0 | 4 | 151 | ? | 3035 | 20.5 | 82 | 1 | amc concord dl |
177 | 36.0 | 4 | 105 | 74 | 1980 | 15.3 | 82 | 2 | volkswagen rabbit l |
178 | 37.0 | 4 | 91 | 68 | 2025 | 18.2 | 82 | 3 | mazda glc custom l |
179 | 31.0 | 4 | 91 | 68 | 1970 | 17.6 | 82 | 3 | mazda glc custom |
180 | 38.0 | 4 | 105 | 63 | 2125 | 14.7 | 82 | 1 | plymouth horizon miser |
181 | 36.0 | 4 | 98 | 70 | 2125 | 17.3 | 82 | 1 | mercury lynx l |
182 | 36.0 | 4 | 120 | 88 | 2160 | 14.5 | 82 | 3 | nissan stanza xe |
183 | 36.0 | 4 | 107 | 75 | 2205 | 14.5 | 82 | 3 | honda accord |
184 | 34.0 | 4 | 108 | 70 | 2245 | 16.9 | 82 | 3 | toyota corolla |
185 | 38.0 | 4 | 91 | 67 | 1965 | 15.0 | 82 | 3 | honda civic |
186 | 32.0 | 4 | 91 | 67 | 1965 | 15.7 | 82 | 3 | honda civic (auto) |
187 | 38.0 | 4 | 91 | 67 | 1995 | 16.2 | 82 | 3 | datsun 310 gx |
188 | 25.0 | 6 | 181 | 110 | 2945 | 16.4 | 82 | 1 | buick century limited |
189 | 38.0 | 6 | 262 | 85 | 3015 | 17.0 | 82 | 1 | oldsmobile cutlass ciera (diesel) |
190 | 26.0 | 4 | 156 | 92 | 2585 | 14.5 | 82 | 1 | chrysler lebaron medallion |
191 | 22.0 | 6 | 232 | 112 | 2835 | 14.7 | 82 | 1 | ford granada l |
192 | 32.0 | 4 | 144 | 96 | 2665 | 13.9 | 82 | 3 | toyota celica gt |
193 | 36.0 | 4 | 135 | 84 | 2370 | 13.0 | 82 | 1 | dodge charger 2.2 |
194 | 27.0 | 4 | 151 | 90 | 2950 | 17.3 | 82 | 1 | chevrolet camaro |
195 | 27.0 | 4 | 140 | 86 | 2790 | 15.6 | 82 | 1 | ford mustang gl |
196 | 44.0 | 4 | 97 | 52 | 2130 | 24.6 | 82 | 2 | vw pickup |
197 | 32.0 | 4 | 135 | 84 | 2295 | 11.6 | 82 | 1 | dodge rampage |
198 | 28.0 | 4 | 120 | 79 | 2625 | 18.6 | 82 | 1 | ford ranger |
199 | 31.0 | 4 | 119 | 82 | 2720 | 19.4 | 82 | 1 | chevy s-10 |
398 rows × 9 columns
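Note that the combined frame above still ends at label 199 because plain pd.concat keeps each frame's original row labels. A small sketch with toy frames of what ignore_index changes:

```python
import pandas as pd

a = pd.DataFrame({"x": [1, 2]})
b = pd.DataFrame({"x": [3, 4]})

stacked = pd.concat([a, b])                       # labels repeat: 0, 1, 0, 1
restacked = pd.concat([a, b], ignore_index=True)  # fresh RangeIndex: 0, 1, 2, 3

print(list(stacked.index))    # [0, 1, 0, 1]
print(list(restacked.index))  # [0, 1, 2, 3]
```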
Step 7. Oops, there is a column missing, called owners. Create a random number Series from 15,000 to 73,000.
The code is as follows:
# Create a random Series with values from 15,000 to 73,000
# (np.random.randint's upper bound is exclusive, so the draws fall in [15000, 73000))
my_owners = np.random.randint(15000, 73000, 398)
my_owners
The output is as follows:
array([30395, 42733, 44554, 34325, 50270, 60139, 24218, 25925, 42502,
       45041, 21449, 34472, 42783, 56380, 15707, 25707, 61160, 29297,
       ...,
       34077, 60663, 46497, 48174, 19764, 56893, 52080, 41104, 21126,
       56865, 39795])
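Since the owners values are random, they change on every run. If reproducibility matters, the generator can be seeded; a sketch using NumPy's newer Generator API (the seed 42 is arbitrary, not part of the exercise):

```python
import numpy as np

rng = np.random.default_rng(42)  # seeded generator: same sequence every run
owners = rng.integers(15000, 73000, size=398)

# The upper bound of integers() is exclusive, just like np.random.randint
print(owners.min() >= 15000, owners.max() < 73000)  # True True
```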
Step 8. Add the column owners to cars
The code is as follows:
# Add an owners column
cars['owners'] = my_owners
cars.tail()
The output is as follows:
 | mpg | cylinders | displacement | horsepower | weight | acceleration | model | origin | car | owners |
---|---|---|---|---|---|---|---|---|---|---|
195 | 27.0 | 4 | 140 | 86 | 2790 | 15.6 | 82 | 1 | ford mustang gl | 52080 |
196 | 44.0 | 4 | 97 | 52 | 2130 | 24.6 | 82 | 2 | vw pickup | 41104 |
197 | 32.0 | 4 | 135 | 84 | 2295 | 11.6 | 82 | 1 | dodge rampage | 21126 |
198 | 28.0 | 4 | 120 | 79 | 2625 | 18.6 | 82 | 1 | ford ranger | 56865 |
199 | 31.0 | 4 | 119 | 82 | 2720 | 19.4 | 82 | 1 | chevy s-10 | 39795 |
Exercise 2-Fictitious Names
Introduction:
This time you will create the data yourself.
Special thanks to Chris Albon for sharing the dataset and materials.
All the credit for this exercise belongs to him.
To learn more about it, go here.
Step 1. Import the necessary libraries
The code is as follows:
import pandas as pd
Step 2. Create the 3 DataFrames based on the following raw data
The code is as follows:
raw_data_1 = {'subject_id': ['1', '2', '3', '4', '5'],
              'first_name': ['Alex', 'Amy', 'Allen', 'Alice', 'Ayoung'],
              'last_name': ['Anderson', 'Ackerman', 'Ali', 'Aoni', 'Atiches']}
raw_data_2 = {'subject_id': ['4', '5', '6', '7', '8'],
              'first_name': ['Billy', 'Brian', 'Bran', 'Bryce', 'Betty'],
              'last_name': ['Bonder', 'Black', 'Balwner', 'Brice', 'Btisan']}
raw_data_3 = {'subject_id': ['1', '2', '3', '4', '5', '7', '8', '9', '10', '11'],
              'test_id': [51, 15, 15, 61, 16, 14, 15, 1, 61, 16]}
Step 3. Assign each to a variable called data1, data2, data3
The code is as follows:
data1 = pd.DataFrame(raw_data_1, columns = ['subject_id', 'first_name', 'last_name'])
data2 = pd.DataFrame(raw_data_2, columns = ['subject_id', 'first_name', 'last_name'])
data3 = pd.DataFrame(raw_data_3, columns = ['subject_id', 'test_id'])
print(data1)
print(data2)
print(data3)
The output is as follows:
subject_id first_name last_name
0 1 Alex Anderson
1 2 Amy Ackerman
2 3 Allen Ali
3 4 Alice Aoni
4 5 Ayoung Atiches
subject_id first_name last_name
0 4 Billy Bonder
1 5 Brian Black
2 6 Bran Balwner
3 7 Bryce Brice
4 8 Betty Btisan
subject_id test_id
0 1 51
1 2 15
2 3 15
3 4 61
4 5 16
5 7 14
6 8 15
7 9 1
8 10 61
9 11 16
Step 4. Join the two dataframes along rows and assign all_data
The code is as follows:
# all_data = data1.append(data2)  # append was removed in pandas 2.0
# all_data = pd.merge(data1, data2, how='outer')
# Both of the above approaches also work
all_data = pd.concat([data1, data2])
all_data
The output is as follows:
 | subject_id | first_name | last_name |
---|---|---|---|
0 | 1 | Alex | Anderson |
1 | 2 | Amy | Ackerman |
2 | 3 | Allen | Ali |
3 | 4 | Alice | Aoni |
4 | 5 | Ayoung | Atiches |
0 | 4 | Billy | Bonder |
1 | 5 | Brian | Black |
2 | 6 | Bran | Balwner |
3 | 7 | Bryce | Brice |
4 | 8 | Betty | Btisan |
Step 5. Join the two dataframes along columns and assign to all_data_col
The code is as follows:
# Concatenate along columns and assign to all_data_col
all_data_col = pd.concat([data1, data2], axis=1)
all_data_col
The output is as follows:
 | subject_id | first_name | last_name | subject_id | first_name | last_name |
---|---|---|---|---|---|---|
0 | 1 | Alex | Anderson | 4 | Billy | Bonder |
1 | 2 | Amy | Ackerman | 5 | Brian | Black |
2 | 3 | Allen | Ali | 6 | Bran | Balwner |
3 | 4 | Alice | Aoni | 7 | Bryce | Brice |
4 | 5 | Ayoung | Atiches | 8 | Betty | Btisan |
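It is worth knowing that pd.concat(axis=1) aligns rows by index label rather than by position; data1 and data2 happen to share index 0-4, which is why no NaN appears above. A sketch with mismatched indexes (toy frames, not the exercise data):

```python
import pandas as pd

left = pd.DataFrame({"a": [1, 2, 3]})                # index 0, 1, 2
right = pd.DataFrame({"b": [10, 20]}, index=[1, 2])  # index 1, 2

# Rows are matched by index label; labels missing on one side are filled with NaN
side = pd.concat([left, right], axis=1)
print(side["b"].isna().sum())  # 1  (index 0 has no match in `right`)
```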
Step 6. Print data3
The code is as follows:
data3
The output is as follows:
 | subject_id | test_id |
---|---|---|
0 | 1 | 51 |
1 | 2 | 15 |
2 | 3 | 15 |
3 | 4 | 61 |
4 | 5 | 16 |
5 | 7 | 14 |
6 | 8 | 15 |
7 | 9 | 1 |
8 | 10 | 61 |
9 | 11 | 16 |
Step 7. Merge all_data and data3 along the subject_id value
The code is as follows:
all_data_3 = pd.merge(all_data, data3, on='subject_id') # how='inner' is the default
all_data_3
The output is as follows:
 | subject_id | first_name | last_name | test_id |
---|---|---|---|---|
0 | 1 | Alex | Anderson | 51 |
1 | 2 | Amy | Ackerman | 15 |
2 | 3 | Allen | Ali | 15 |
3 | 4 | Alice | Aoni | 61 |
4 | 4 | Billy | Bonder | 61 |
5 | 5 | Ayoung | Atiches | 16 |
6 | 5 | Brian | Black | 16 |
7 | 7 | Bryce | Brice | 14 |
8 | 8 | Betty | Btisan | 15 |
Step 8. Merge only the data that has the same ‘subject_id’ on both data1 and data2
The code is as follows:
data = pd.merge(data1, data2, on='subject_id', how='inner')
data
The output is as follows:
 | subject_id | first_name_x | last_name_x | first_name_y | last_name_y |
---|---|---|---|---|---|
0 | 4 | Alice | Aoni | Billy | Bonder |
1 | 5 | Ayoung | Atiches | Brian | Black |
Step 9. Merge all values in data1 and data2, with matching records from both sides where available.
The code is as follows:
# Merge all values in data1 and data2, keeping matching records from both sides where available
pd.merge(data1, data2, on='subject_id', how='outer') # suffixes=['_A', '_B'] can be passed to rename the overlapping columns
The output is as follows:
 | subject_id | first_name_x | last_name_x | first_name_y | last_name_y |
---|---|---|---|---|---|
0 | 1 | Alex | Anderson | NaN | NaN |
1 | 2 | Amy | Ackerman | NaN | NaN |
2 | 3 | Allen | Ali | NaN | NaN |
3 | 4 | Alice | Aoni | Billy | Bonder |
4 | 5 | Ayoung | Atiches | Brian | Black |
5 | 6 | NaN | NaN | Bran | Balwner |
6 | 7 | NaN | NaN | Bryce | Brice |
7 | 8 | NaN | NaN | Betty | Btisan |
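To see which side each row came from, merge also accepts an indicator flag, and the suffixes mentioned in the comment rename the overlapping columns. A small sketch (the frames and suffix names here are illustrative):

```python
import pandas as pd

d1 = pd.DataFrame({"subject_id": ["4", "5"], "first_name": ["Alice", "Ayoung"]})
d2 = pd.DataFrame({"subject_id": ["4", "6"], "first_name": ["Billy", "Bran"]})

# suffixes renames the clashing first_name columns; indicator=True adds a
# _merge column recording 'left_only', 'right_only', or 'both' for each row
out = pd.merge(d1, d2, on="subject_id", how="outer",
               suffixes=("_A", "_B"), indicator=True)
print(list(out.columns))  # ['subject_id', 'first_name_A', 'first_name_B', '_merge']
print(sorted(out["_merge"].astype(str)))  # ['both', 'left_only', 'right_only']
```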
Exercise 3-Housing Market
Introduction:
This time we will create our own dataset with fictional numbers to describe a house market. As we are going to create random data, don't try to make sense of the numbers.
Step 1. Import the necessary libraries
The code is as follows:
import pandas as pd
import numpy as np
Step 2. Create 3 different Series, each of length 100, as follows:
- The first a random number from 1 to 4
- The second a random number from 1 to 3
- The third a random number from 10,000 to 30,000
The code is as follows:
# np.random.randint's upper bound is exclusive, so these draw from 1-3, 1-2, and 10,000-29,999
s1 = pd.Series(np.random.randint(1, 4, 100))
s2 = pd.Series(np.random.randint(1, 3, 100))
s3 = pd.Series(np.random.randint(10000, 30000, 100))
print(s1, s2, s3)
The output is as follows:
0 2
1 3
2 1
3 3
4 1
5 1
6 2
7 1
8 1
9 1
10 1
11 3
12 1
13 2
14 3
15 2
16 1
17 1
18 3
19 3
20 1
21 3
22 3
23 1
24 1
25 2
26 1
27 1
28 2
29 1
..
70 1
71 1
72 3
73 2
74 2
75 1
76 2
77 1
78 3
79 2
80 3
81 3
82 3
83 2
84 1
85 3
86 2
87 1
88 3
89 3
90 1
91 3
92 2
93 3
94 1
95 2
96 3
97 2
98 3
99 1
Length: 100, dtype: int32
0 1
1 2
2 1
3 2
4 1
5 2
6 1
7 1
8 1
9 2
10 2
11 1
12 1
13 2
14 2
15 2
16 1
17 2
18 1
19 1
20 2
21 2
22 1
23 1
24 1
25 1
26 1
27 2
28 1
29 1
..
70 2
71 2
72 1
73 1
74 1
75 1
76 2
77 2
78 2
79 2
80 1
81 2
82 1
83 2
84 1
85 1
86 2
87 2
88 1
89 2
90 1
91 2
92 1
93 1
94 1
95 2
96 1
97 1
98 2
99 2
Length: 100, dtype: int32
0 11973
1 10804
2 26866
3 25940
4 23147
5 14552
6 22151
7 19312
8 25373
9 29329
10 17069
11 19629
12 26174
13 20524
14 16489
15 22613
16 25266
17 11566
18 28599
19 27562
20 12922
21 29055
22 12709
23 21727
24 16735
25 20818
26 20199
27 21400
28 21602
29 16792
...
70 10076
71 20091
72 28284
73 12185
74 15879
75 12907
76 24946
77 20168
78 24435
79 12175
80 18286
81 18001
82 10938
83 19116
84 12802
85 11623
86 15048
87 10624
88 18989
89 19797
90 17798
91 21317
92 27047
93 25692
94 27564
95 23411
96 18808
97 16854
98 21737
99 18968
Length: 100, dtype: int32
Step 3. Let's create a DataFrame by joining the Series by column
The code is as follows:
housemkt = pd.concat([s1, s2, s3], axis=1)
housemkt.head()
The output is as follows:
 | 0 | 1 | 2 |
---|---|---|---|
0 | 2 | 1 | 11973 |
1 | 3 | 2 | 10804 |
2 | 1 | 1 | 26866 |
3 | 3 | 2 | 25940 |
4 | 1 | 1 | 23147 |
Step 4. Change the name of the columns to bedrs, bathrs, price_sqr_meter
The code is as follows:
'''
The main parameters of rename are:
columns: mapping of new column names
index: mapping of new row labels
axis: the axis the mapping applies to
inplace: defaults to False. When False, rename returns the modified result and the original
is left untouched; when True, it returns None and modifies the DataFrame itself.
'''
housemkt.rename(columns={0: 'bedrs', 1: 'bathrs', 2: 'price_sqr_meter'}, inplace = True)
housemkt.head()
The output is as follows:
 | bedrs | bathrs | price_sqr_meter |
---|---|---|---|
0 | 2 | 1 | 11973 |
1 | 3 | 2 | 10804 |
2 | 1 | 1 | 26866 |
3 | 3 | 2 | 25940 |
4 | 1 | 1 | 23147 |
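Because inplace=True was used above, rename returned None and modified housemkt directly. A sketch of the non-inplace form, which returns a renamed copy instead:

```python
import pandas as pd

df = pd.DataFrame({0: [2], 1: [1], 2: [11973]})

# Without inplace=True, rename returns a renamed copy and leaves df untouched
renamed = df.rename(columns={0: "bedrs", 1: "bathrs", 2: "price_sqr_meter"})
print(list(df.columns))       # [0, 1, 2]
print(list(renamed.columns))  # ['bedrs', 'bathrs', 'price_sqr_meter']
```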
Step 5. Create a one column DataFrame with the values of the 3 Series and assign it to ‘bigcolumn’
The code is as follows:
bigcolumn = pd.concat([s1, s2, s3], axis=0)
bigcolumn = bigcolumn.to_frame() # to_frame() converts the Series into a DataFrame
print(type(bigcolumn))
bigcolumn
The output is as follows:
<class 'pandas.core.frame.DataFrame'>
 | 0 |
---|---|
0 | 2 |
1 | 3 |
2 | 1 |
3 | 3 |
4 | 1 |
5 | 1 |
6 | 2 |
7 | 1 |
8 | 1 |
9 | 1 |
10 | 1 |
11 | 3 |
12 | 1 |
13 | 2 |
14 | 3 |
15 | 2 |
16 | 1 |
17 | 1 |
18 | 3 |
19 | 3 |
20 | 1 |
21 | 3 |
22 | 3 |
23 | 1 |
24 | 1 |
25 | 2 |
26 | 1 |
27 | 1 |
28 | 2 |
29 | 1 |
... | ... |
70 | 10076 |
71 | 20091 |
72 | 28284 |
73 | 12185 |
74 | 15879 |
75 | 12907 |
76 | 24946 |
77 | 20168 |
78 | 24435 |
79 | 12175 |
80 | 18286 |
81 | 18001 |
82 | 10938 |
83 | 19116 |
84 | 12802 |
85 | 11623 |
86 | 15048 |
87 | 10624 |
88 | 18989 |
89 | 19797 |
90 | 17798 |
91 | 21317 |
92 | 27047 |
93 | 25692 |
94 | 27564 |
95 | 23411 |
96 | 18808 |
97 | 16854 |
98 | 21737 |
99 | 18968 |
300 rows × 1 columns
Step 6. Oops, it seems it is going only until index 99. Is it true?
The code is as follows:
len(bigcolumn)
The output is as follows:
300
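So the frame really has 300 rows; it only looks like it stops at 99 because the three concatenated Series each kept their 0-99 labels, leaving duplicate index values. A miniature sketch of the same effect:

```python
import pandas as pd

s = pd.Series([1, 2])
col = pd.concat([s, s, s]).to_frame()

print(len(col))             # 6 rows...
print(col.index.is_unique)  # False: labels 0 and 1 each appear three times
print(col.index.max())      # 1, so the display "stops" at the old max label
```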
Step 7. Reindex the DataFrame so it goes from 0 to 299
The code is as follows:
# reset_index(): the drop parameter defaults to False; set drop=True to discard the old
# index instead of keeping it as a column. inplace=True modifies the data in place; it is
# required when the result is not assigned back, otherwise the reset has no effect.
'''
set_index():
Signature: DataFrame.set_index(keys, drop=True, append=False, inplace=False, verify_integrity=False)
Parameters:
keys: the column label(s), or arrays, to set as the new index
drop: default True; delete the column(s) being used as the new index
append: default False; whether to append the column(s) to the existing index
inplace: default False; modify the DataFrame in place rather than creating a new object
verify_integrity: default False; check the new index for duplicates. Otherwise the check is
deferred until necessary; leaving it False improves performance.

reset_index():
Signature: DataFrame.reset_index(level=None, drop=False, inplace=False, col_level=0, col_fill='')
Parameters:
level: int, str, tuple, or list, default None; remove only the given level(s) from the index.
By default all levels are removed.
drop: if False, the old index is restored as an ordinary column; if True, it is discarded
inplace: default False; modify the DataFrame in place rather than creating a new object
col_level: int or str, default 0; with multi-level columns, the level the labels are inserted
into. By default they go into the first level.
col_fill: object, default ''; with multi-level columns, how the other levels are named.
If None, the index name is repeated.
Note: reset_index covers two cases: resetting the original DataFrame's index, and undoing a
previous set_index() call.
'''
bigcolumn.reset_index(drop=True, inplace=True)
bigcolumn
The output is as follows:
 | 0 |
---|---|
0 | 2 |
1 | 3 |
2 | 1 |
3 | 3 |
4 | 1 |
5 | 1 |
6 | 2 |
7 | 1 |
8 | 1 |
9 | 1 |
10 | 1 |
11 | 3 |
12 | 1 |
13 | 2 |
14 | 3 |
15 | 2 |
16 | 1 |
17 | 1 |
18 | 3 |
19 | 3 |
20 | 1 |
21 | 3 |
22 | 3 |
23 | 1 |
24 | 1 |
25 | 2 |
26 | 1 |
27 | 1 |
28 | 2 |
29 | 1 |
... | ... |
270 | 10076 |
271 | 20091 |
272 | 28284 |
273 | 12185 |
274 | 15879 |
275 | 12907 |
276 | 24946 |
277 | 20168 |
278 | 24435 |
279 | 12175 |
280 | 18286 |
281 | 18001 |
282 | 10938 |
283 | 19116 |
284 | 12802 |
285 | 11623 |
286 | 15048 |
287 | 10624 |
288 | 18989 |
289 | 19797 |
290 | 17798 |
291 | 21317 |
292 | 27047 |
293 | 25692 |
294 | 27564 |
295 | 23411 |
296 | 18808 |
297 | 16854 |
298 | 21737 |
299 | 18968 |
300 rows × 1 columns
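As a shortcut, the reset_index step can be avoided entirely by asking concat for a fresh index up front; a minimal sketch:

```python
import pandas as pd

s = pd.Series([1, 2])

# ignore_index=True builds the 0..n-1 RangeIndex during the concat itself
col = pd.concat([s, s, s], ignore_index=True).to_frame()
print(list(col.index))  # [0, 1, 2, 3, 4, 5]
```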
Conclusion
Today we mainly practiced merge and concatenation operations, along with other related functions. A reminder: this pandas series uses Anaconda's Jupyter Notebook, which I highly recommend.