Python Data Analysis Basics

  • Preparation
  • Exercise 1- MPG Cars
    • Introduction:
    • Step 1. Import the necessary libraries
    • Step 2. Import the first dataset [cars1](https://raw.githubusercontent.com/guipsamora/pandas_exercises/master/05_Merge/Auto_MPG/cars1.csv) and [cars2](https://raw.githubusercontent.com/guipsamora/pandas_exercises/master/05_Merge/Auto_MPG/cars2.csv).
    • Step 3. Assign each to a variable called cars1 and cars2
    • Step 4. Oops, it seems our first dataset has some unnamed blank columns; fix cars1
    • Step 5. What is the number of observations in each dataset?
    • Step 6. Join cars1 and cars2 into a single DataFrame called cars
    • Step 7. Oops, there is a column missing, called owners. Create a random number Series from 15,000 to 73,000.
    • Step 8. Add the column owners to cars
  • Exercise 2-Fictitious Names
    • Introduction:
    • Step 1. Import the necessary libraries
    • Step 2. Create the 3 DataFrames based on the following raw data
    • Step 3. Assign each to a variable called data1, data2, data3
    • Step 4. Join the two dataframes along rows and assign all_data
    • Step 5. Join the two dataframes along columns and assign to all_data_col
    • Step 6. Print data3
    • Step 7. Merge all_data and data3 along the subject_id value
    • Step 8. Merge only the data that has the same 'subject_id' on both data1 and data2
    • Step 9. Merge all values in data1 and data2, with matching records from both sides where available.
  • Exercise 3-Housing Market
    • Introduction:
    • Step 1. Import the necessary libraries
    • Step 2. Create 3 different Series, each of length 100, as follows:
    • Step 3. Let's create a DataFrame by joining the Series by column
    • Step 4. Change the name of the columns to bedrs, bathrs, price_sqr_meter
    • Step 5. Create a one column DataFrame with the values of the 3 Series and assign it to 'bigcolumn'
    • Step 6. Oops, it seems it only goes up to index 99. Is that true?
    • Step 7. Reindex the DataFrame so it goes from 0 to 299
  • Conclusion

Preparation

If you need the datasets, you can find them online yourself or message the author; uploading them to CSDN would require a membership to download, so they are not uploaded there. The dataset download links below are not guaranteed to work.

Exercise 1- MPG Cars

Introduction:

The following exercise utilizes data from the UC Irvine Machine Learning Repository.

Step 1. Import the necessary libraries

The code is as follows:

import pandas as pd
import numpy as np

Step 2. Import the first dataset cars1 and cars2.

Step 3. Assign each to a variable called cars1 and cars2

The code is as follows:

cars1 = pd.read_csv('cars1.csv')
cars2 = pd.read_csv('cars2.csv')
print(cars1.head())
print(cars2.head())

The output is as follows:

    mpg  cylinders  displacement horsepower  weight  acceleration  model  \
0  18.0          8           307        130    3504          12.0     70
1  15.0          8           350        165    3693          11.5     70
2  18.0          8           318        150    3436          11.0     70
3  16.0          8           304        150    3433          12.0     70
4  17.0          8           302        140    3449          10.5     70

   origin                        car  Unnamed: 9  Unnamed: 10  Unnamed: 11  \
0       1  chevrolet chevelle malibu         NaN          NaN          NaN
1       1          buick skylark 320         NaN          NaN          NaN
2       1         plymouth satellite         NaN          NaN          NaN
3       1              amc rebel sst         NaN          NaN          NaN
4       1                ford torino         NaN          NaN          NaN

   Unnamed: 12  Unnamed: 13
0          NaN          NaN
1          NaN          NaN
2          NaN          NaN
3          NaN          NaN
4          NaN          NaN

    mpg  cylinders  displacement horsepower  weight  acceleration  model  \
0  33.0          4            91         53    1795          17.4     76
1  20.0          6           225        100    3651          17.7     76
2  18.0          6           250         78    3574          21.0     76
3  18.5          6           250        110    3645          16.2     76
4  17.5          6           258         95    3193          17.8     76

   origin                 car
0       3         honda civic
1       1      dodge aspen se
2       1   ford granada ghia
3       1  pontiac ventura sj
4       1       amc pacer d/l

Step 4. Oops, it seems our first dataset has some unnamed blank columns; fix cars1

The code is as follows:

# Alternative: cars1.dropna(axis=1, how='all') drops the all-NaN columns
cars1 = cars1.loc[:, "mpg" : "car"]   # keep the columns from mpg through car
cars1.head()

The output is as follows:

mpg cylinders displacement horsepower weight acceleration model origin car
0 18.0 8 307 130 3504 12.0 70 1 chevrolet chevelle malibu
1 15.0 8 350 165 3693 11.5 70 1 buick skylark 320
2 18.0 8 318 150 3436 11.0 70 1 plymouth satellite
3 16.0 8 304 150 3433 12.0 70 1 amc rebel sst
4 17.0 8 302 140 3449 10.5 70 1 ford torino

Step 5. What is the number of observations in each dataset?

The code is as follows:

print(cars1.shape)
print(cars2.shape)

The output is as follows:

(198, 9)
(200, 9)

Step 6. Join cars1 and cars2 into a single DataFrame called cars

The code is as follows:

# DataFrame.append was removed in pandas 2.0; use pd.concat instead
cars = pd.concat([cars1, cars2])   # stack cars2 below cars1
# or: cars = pd.concat([cars1, cars2], axis=0, ignore_index=True) for a fresh 0..397 index
cars

The output is as follows:

mpg cylinders displacement horsepower weight acceleration model origin car
0 18.0 8 307 130 3504 12.0 70 1 chevrolet chevelle malibu
1 15.0 8 350 165 3693 11.5 70 1 buick skylark 320
2 18.0 8 318 150 3436 11.0 70 1 plymouth satellite
3 16.0 8 304 150 3433 12.0 70 1 amc rebel sst
4 17.0 8 302 140 3449 10.5 70 1 ford torino
5 15.0 8 429 198 4341 10.0 70 1 ford galaxie 500
6 14.0 8 454 220 4354 9.0 70 1 chevrolet impala
7 14.0 8 440 215 4312 8.5 70 1 plymouth fury iii
8 14.0 8 455 225 4425 10.0 70 1 pontiac catalina
9 15.0 8 390 190 3850 8.5 70 1 amc ambassador dpl
10 15.0 8 383 170 3563 10.0 70 1 dodge challenger se
11 14.0 8 340 160 3609 8.0 70 1 plymouth 'cuda 340
12 15.0 8 400 150 3761 9.5 70 1 chevrolet monte carlo
13 14.0 8 455 225 3086 10.0 70 1 buick estate wagon (sw)
14 24.0 4 113 95 2372 15.0 70 3 toyota corona mark ii
15 22.0 6 198 95 2833 15.5 70 1 plymouth duster
16 18.0 6 199 97 2774 15.5 70 1 amc hornet
17 21.0 6 200 85 2587 16.0 70 1 ford maverick
18 27.0 4 97 88 2130 14.5 70 3 datsun pl510
19 26.0 4 97 46 1835 20.5 70 2 volkswagen 1131 deluxe sedan
20 25.0 4 110 87 2672 17.5 70 2 peugeot 504
21 24.0 4 107 90 2430 14.5 70 2 audi 100 ls
22 25.0 4 104 95 2375 17.5 70 2 saab 99e
23 26.0 4 121 113 2234 12.5 70 2 bmw 2002
24 21.0 6 199 90 2648 15.0 70 1 amc gremlin
25 10.0 8 360 215 4615 14.0 70 1 ford f250
26 10.0 8 307 200 4376 15.0 70 1 chevy c20
27 11.0 8 318 210 4382 13.5 70 1 dodge d200
28 9.0 8 304 193 4732 18.5 70 1 hi 1200d
29 27.0 4 97 88 2130 14.5 71 3 datsun pl510
... ... ... ... ... ... ... ... ... ...
170 27.0 4 112 88 2640 18.6 82 1 chevrolet cavalier wagon
171 34.0 4 112 88 2395 18.0 82 1 chevrolet cavalier 2-door
172 31.0 4 112 85 2575 16.2 82 1 pontiac j2000 se hatchback
173 29.0 4 135 84 2525 16.0 82 1 dodge aries se
174 27.0 4 151 90 2735 18.0 82 1 pontiac phoenix
175 24.0 4 140 92 2865 16.4 82 1 ford fairmont futura
176 23.0 4 151 ? 3035 20.5 82 1 amc concord dl
177 36.0 4 105 74 1980 15.3 82 2 volkswagen rabbit l
178 37.0 4 91 68 2025 18.2 82 3 mazda glc custom l
179 31.0 4 91 68 1970 17.6 82 3 mazda glc custom
180 38.0 4 105 63 2125 14.7 82 1 plymouth horizon miser
181 36.0 4 98 70 2125 17.3 82 1 mercury lynx l
182 36.0 4 120 88 2160 14.5 82 3 nissan stanza xe
183 36.0 4 107 75 2205 14.5 82 3 honda accord
184 34.0 4 108 70 2245 16.9 82 3 toyota corolla
185 38.0 4 91 67 1965 15.0 82 3 honda civic
186 32.0 4 91 67 1965 15.7 82 3 honda civic (auto)
187 38.0 4 91 67 1995 16.2 82 3 datsun 310 gx
188 25.0 6 181 110 2945 16.4 82 1 buick century limited
189 38.0 6 262 85 3015 17.0 82 1 oldsmobile cutlass ciera (diesel)
190 26.0 4 156 92 2585 14.5 82 1 chrysler lebaron medallion
191 22.0 6 232 112 2835 14.7 82 1 ford granada l
192 32.0 4 144 96 2665 13.9 82 3 toyota celica gt
193 36.0 4 135 84 2370 13.0 82 1 dodge charger 2.2
194 27.0 4 151 90 2950 17.3 82 1 chevrolet camaro
195 27.0 4 140 86 2790 15.6 82 1 ford mustang gl
196 44.0 4 97 52 2130 24.6 82 2 vw pickup
197 32.0 4 135 84 2295 11.6 82 1 dodge rampage
198 28.0 4 120 79 2625 18.6 82 1 ford ranger
199 31.0 4 119 82 2720 19.4 82 1 chevy s-10

398 rows × 9 columns
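The commented alternative above uses ignore_index=True. A minimal sketch of what that changes, using two tiny hypothetical frames standing in for cars1 and cars2: plain concatenation keeps each frame's own row labels, while ignore_index renumbers the result.

```python
import pandas as pd

# Hypothetical toy stand-ins for cars1 and cars2
a = pd.DataFrame({"mpg": [18.0, 15.0]})
b = pd.DataFrame({"mpg": [33.0, 20.0]})

kept = pd.concat([a, b])                      # keeps each frame's index: 0, 1, 0, 1
fresh = pd.concat([a, b], ignore_index=True)  # renumbers: 0, 1, 2, 3

print(list(kept.index))   # [0, 1, 0, 1]
print(list(fresh.index))  # [0, 1, 2, 3]
```

This is why the concatenated cars above shows duplicated row labels up to 199 even though it has 398 rows.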

Step 7. Oops, there is a column missing, called owners. Create a random number Series from 15,000 to 73,000.

The code is as follows:

# Create a random Series with values ranging from 15,000 to 73,000
# (note: np.random.randint excludes the upper bound, so 73,000 itself is never drawn)
my_owners = np.random.randint(15000, 73000, 398)
my_owners

The output is as follows:

array([30395, 42733, 44554, 34325, 50270, 60139, 24218, 25925, 42502,
       45041, 21449, 34472, 42783, 56380, 15707, 25707, 61160, 29297,
       ...
       34077, 60663, 46497, 48174, 19764, 56893, 52080, 41104, 21126,
       56865, 39795])

Step 8. Add the column owners to cars

The code is as follows:

# Add an owners column
cars['owners'] = my_owners
cars.tail()

The output is as follows:

mpg cylinders displacement horsepower weight acceleration model origin car owners
195 27.0 4 140 86 2790 15.6 82 1 ford mustang gl 52080
196 44.0 4 97 52 2130 24.6 82 2 vw pickup 41104
197 32.0 4 135 84 2295 11.6 82 1 dodge rampage 21126
198 28.0 4 120 79 2625 18.6 82 1 ford ranger 56865
199 31.0 4 119 82 2720 19.4 82 1 chevy s-10 39795
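One subtlety worth noting: the assignment above works because my_owners is a NumPy array, which pandas assigns by position. A pandas Series would instead be aligned by index label, which matters here because cars kept the repeated 0..199 index from the join. A small sketch with hypothetical toy data:

```python
import pandas as pd
import numpy as np

# Toy frame with a repeated index, mimicking the result of concatenating
# cars1 and cars2 without ignore_index (labels 0, 1, 0, 1)
df = pd.DataFrame({"x": [1, 2, 3, 4]}, index=[0, 1, 0, 1])

df["arr"] = np.array([10, 20, 30, 40])   # ndarray: assigned by position
df["ser"] = pd.Series([10, 20, 30, 40])  # Series: aligned by index label

print(df["arr"].tolist())  # [10, 20, 30, 40]
print(df["ser"].tolist())  # [10, 20, 10, 20]  (labels 0 and 1 are reused)
```

So if owners had been built as a Series, the second half of cars would silently have received the first 200 values again.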

Exercise 2-Fictitious Names

Introduction:

This time you will create the data yourself.

Special thanks to Chris Albon for sharing the dataset and materials. All the credits for this exercise belong to him.

To learn more about it, go here.

Step 1. Import the necessary libraries

The code is as follows:

import pandas as pd

Step 2. Create the 3 DataFrames based on the following raw data

The code is as follows:

raw_data_1 = {
    'subject_id': ['1', '2', '3', '4', '5'],
    'first_name': ['Alex', 'Amy', 'Allen', 'Alice', 'Ayoung'],
    'last_name': ['Anderson', 'Ackerman', 'Ali', 'Aoni', 'Atiches']}

raw_data_2 = {
    'subject_id': ['4', '5', '6', '7', '8'],
    'first_name': ['Billy', 'Brian', 'Bran', 'Bryce', 'Betty'],
    'last_name': ['Bonder', 'Black', 'Balwner', 'Brice', 'Btisan']}

raw_data_3 = {
    'subject_id': ['1', '2', '3', '4', '5', '7', '8', '9', '10', '11'],
    'test_id': [51, 15, 15, 61, 16, 14, 15, 1, 61, 16]}

Step 3. Assign each to a variable called data1, data2, data3

The code is as follows:

data1 = pd.DataFrame(raw_data_1, columns = ['subject_id', 'first_name', 'last_name'])
data2 = pd.DataFrame(raw_data_2, columns = ['subject_id', 'first_name', 'last_name'])
data3 = pd.DataFrame(raw_data_3, columns = ['subject_id', 'test_id'])
print(data1)
print(data2)
print(data3)

The output is as follows:

  subject_id first_name last_name
0          1       Alex  Anderson
1          2        Amy  Ackerman
2          3      Allen       Ali
3          4      Alice      Aoni
4          5     Ayoung   Atiches

  subject_id first_name last_name
0          4      Billy    Bonder
1          5      Brian     Black
2          6       Bran   Balwner
3          7      Bryce     Brice
4          8      Betty    Btisan

  subject_id  test_id
0          1       51
1          2       15
2          3       15
3          4       61
4          5       16
5          7       14
6          8       15
7          9        1
8         10       61
9         11       16

Step 4. Join the two dataframes along rows and assign all_data

The code is as follows:

# all_data = data1.append(data2)   # append was removed in pandas 2.0
# all_data = pd.merge(data1, data2, how='outer')
# Either of the two methods above also works
all_data = pd.concat([data1, data2])
all_data

The output is as follows:

subject_id first_name last_name
0 1 Alex Anderson
1 2 Amy Ackerman
2 3 Allen Ali
3 4 Alice Aoni
4 5 Ayoung Atiches
0 4 Billy Bonder
1 5 Brian Black
2 6 Bran Balwner
3 7 Bryce Brice
4 8 Betty Btisan

Step 5. Join the two dataframes along columns and assign to all_data_col

The code is as follows:

# Concatenate along columns and assign to all_data_col
all_data_col = pd.concat([data1, data2], axis=1)
all_data_col

The output is as follows:

subject_id first_name last_name subject_id first_name last_name
0 1 Alex Anderson 4 Billy Bonder
1 2 Amy Ackerman 5 Brian Black
2 3 Allen Ali 6 Bran Balwner
3 4 Alice Aoni 7 Bryce Brice
4 5 Ayoung Atiches 8 Betty Btisan
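The side-by-side result above contains subject_id, first_name, and last_name twice. One optional way to keep the duplicated column names distinguishable (not part of the original exercise) is the keys argument of pd.concat, which adds an outer column level; a sketch on hypothetical one-row frames:

```python
import pandas as pd

# Hypothetical one-row slices of data1 and data2
d1 = pd.DataFrame({"first_name": ["Alex"], "last_name": ["Anderson"]})
d2 = pd.DataFrame({"first_name": ["Billy"], "last_name": ["Bonder"]})

# keys= labels each frame's columns under an outer level
wide = pd.concat([d1, d2], axis=1, keys=["data1", "data2"])

print(wide[("data1", "first_name")].iloc[0])  # Alex
print(wide[("data2", "last_name")].iloc[0])   # Bonder
```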

Step 6. Print data3

The code is as follows:

data3

The output is as follows:

subject_id test_id
0 1 51
1 2 15
2 3 15
3 4 61
4 5 16
5 7 14
6 8 15
7 9 1
8 10 61
9 11 16

Step 7. Merge all_data and data3 along the subject_id value

The code is as follows:

all_data_3 = pd.merge(all_data, data3, on='subject_id')   # how='inner' is the default
all_data_3

The output is as follows:

subject_id first_name last_name test_id
0 1 Alex Anderson 51
1 2 Amy Ackerman 15
2 3 Allen Ali 15
3 4 Alice Aoni 61
4 4 Billy Bonder 61
5 5 Ayoung Atiches 16
6 5 Brian Black 16
7 7 Bryce Brice 14
8 8 Betty Btisan 15
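The default inner merge above silently drops the subject_ids that appear in only one side (6, 9, 10, 11). If you want to see where each row came from, merge supports indicator=True, which adds a _merge column; a small sketch on a hypothetical subset of the data:

```python
import pandas as pd

left = pd.DataFrame({"subject_id": ["1", "4"], "first_name": ["Alex", "Alice"]})
right = pd.DataFrame({"subject_id": ["4", "9"], "test_id": [61, 1]})

# _merge tags each row as left_only, right_only, or both
m = pd.merge(left, right, on="subject_id", how="outer", indicator=True)

print(sorted(m["_merge"].astype(str)))  # ['both', 'left_only', 'right_only']
```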

Step 8. Merge only the data that has the same ‘subject_id’ on both data1 and data2

The code is as follows:

data = pd.merge(data1, data2, on='subject_id', how='inner')
data

The output is as follows:

subject_id first_name_x last_name_x first_name_y last_name_y
0 4 Alice Aoni Billy Bonder
1 5 Ayoung Atiches Brian Black

Step 9. Merge all values in data1 and data2, with matching records from both sides where available.

The code is as follows:

# Merge all values in data1 and data2, using matching records from both sides where available
pd.merge(data1, data2, on='subject_id', how='outer')  # suffixes=['_A', '_B'] can be set to keep the duplicated column names distinguishable

The output is as follows:

subject_id first_name_x last_name_x first_name_y last_name_y
0 1 Alex Anderson NaN NaN
1 2 Amy Ackerman NaN NaN
2 3 Allen Ali NaN NaN
3 4 Alice Aoni Billy Bonder
4 5 Ayoung Atiches Brian Black
5 6 NaN NaN Bran Balwner
6 7 NaN NaN Bryce Brice
7 8 NaN NaN Betty Btisan
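A sketch of the suffixes option mentioned in the comment, on a hypothetical one-row overlap; it replaces the default _x/_y with labels of your choosing:

```python
import pandas as pd

# Hypothetical one-row overlap between data1 and data2
d1 = pd.DataFrame({"subject_id": ["4"], "first_name": ["Alice"]})
d2 = pd.DataFrame({"subject_id": ["4"], "first_name": ["Billy"]})

# suffixes= controls how the clashing column names are disambiguated
out = pd.merge(d1, d2, on="subject_id", suffixes=("_A", "_B"))

print(list(out.columns))  # ['subject_id', 'first_name_A', 'first_name_B']
```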

Exercise 3-Housing Market

Introduction:

This time we will create our own dataset with fictional numbers to describe a housing market. Since we are going to generate random data, don't try to read meaning into the numbers.

Step 1. Import the necessary libraries

The code is as follows:

import pandas as pd
import numpy as np

Step 2. Create 3 different Series, each of length 100, as follows:

  1. The first a random number from 1 to 4
  2. The second a random number from 1 to 3
  3. The third a random number from 10,000 to 30,000

The code is as follows:

s1 = pd.Series(np.random.randint(1, 4, 100))
s2 = pd.Series(np.random.randint(1, 3, 100))
s3 = pd.Series(np.random.randint(10000, 30000, 100))
print(s1, s2, s3)

The output is as follows:

0     2
1     3
2     1
3     3
4     1
5     1
6     2
7     1
8     1
9     1
10    1
11    3
12    1
13    2
14    3
15    2
16    1
17    1
18    3
19    3
20    1
21    3
22    3
23    1
24    1
25    2
26    1
27    1
28    2
29    1
..
70    1
71    1
72    3
73    2
74    2
75    1
76    2
77    1
78    3
79    2
80    3
81    3
82    3
83    2
84    1
85    3
86    2
87    1
88    3
89    3
90    1
91    3
92    2
93    3
94    1
95    2
96    3
97    2
98    3
99    1
Length: 100, dtype: int32
0     1
1     2
2     1
3     2
4     1
5     2
6     1
7     1
8     1
9     2
10    2
11    1
12    1
13    2
14    2
15    2
16    1
17    2
18    1
19    1
20    2
21    2
22    1
23    1
24    1
25    1
26    1
27    2
28    1
29    1
..
70    2
71    2
72    1
73    1
74    1
75    1
76    2
77    2
78    2
79    2
80    1
81    2
82    1
83    2
84    1
85    1
86    2
87    2
88    1
89    2
90    1
91    2
92    1
93    1
94    1
95    2
96    1
97    1
98    2
99    2
Length: 100, dtype: int32
0     11973
1     10804
2     26866
3     25940
4     23147
5     14552
6     22151
7     19312
8     25373
9     29329
10    17069
11    19629
12    26174
13    20524
14    16489
15    22613
16    25266
17    11566
18    28599
19    27562
20    12922
21    29055
22    12709
23    21727
24    16735
25    20818
26    20199
27    21400
28    21602
29    16792
...
70    10076
71    20091
72    28284
73    12185
74    15879
75    12907
76    24946
77    20168
78    24435
79    12175
80    18286
81    18001
82    10938
83    19116
84    12802
85    11623
86    15048
87    10624
88    18989
89    19797
90    17798
91    21317
92    27047
93    25692
94    27564
95    23411
96    18808
97    16854
98    21737
99    18968
Length: 100, dtype: int32
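One caveat: np.random.randint excludes its upper bound, so randint(1, 4, 100) draws values 1 through 3 and randint(10000, 30000, 100) never produces 30,000 itself. If the prompt is read as inclusive ranges, a sketch using NumPy's newer Generator API (the seed 42 is an arbitrary choice for reproducibility):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)  # arbitrary seed, only for reproducibility

# integers() also excludes the upper bound by default, so pass high + 1
s1 = pd.Series(rng.integers(1, 5, 100))          # values in 1..4
s2 = pd.Series(rng.integers(1, 4, 100))          # values in 1..3
s3 = pd.Series(rng.integers(10000, 30001, 100))  # values in 10000..30000

print(s1.between(1, 4).all(), s3.between(10000, 30000).all())
```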

Step 3. Let’s create a DataFrame by joining the Series by column

The code is as follows:

housemkt = pd.concat([s1, s2, s3], axis=1)
housemkt.head()

The output is as follows:

0 1 2
0 2 1 11973
1 3 2 10804
2 1 1 26866
3 3 2 25940
4 1 1 23147

Step 4. Change the name of the columns to bedrs, bathrs, price_sqr_meter

The code is as follows:

'''
The main parameters of the rename function:
columns: mapping for the column names
index: mapping for the row labels
axis: which axis to apply the mapping to
inplace: whether to modify in place, default False. With inplace=False the modified result is returned and the variable itself is unchanged; with inplace=True, None is returned and the variable itself is modified.
'''
housemkt.rename(columns={0: 'bedrs', 1: 'bathrs', 2: 'price_sqr_meter'}, inplace = True)
housemkt.head()

The output is as follows:

bedrs bathrs price_sqr_meter
0 2 1 11973
1 3 2 10804
2 1 1 26866
3 3 2 25940
4 1 1 23147
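The same rename can also be written without inplace=True by reassigning the result, which is generally considered the more idiomatic style in recent pandas; a sketch on a toy one-row frame with the same integer column labels the concat produced:

```python
import pandas as pd

# Toy frame with the default integer column labels 0, 1, 2
housemkt = pd.DataFrame({0: [2], 1: [1], 2: [11973]})

# Reassign instead of mutating in place
housemkt = housemkt.rename(columns={0: 'bedrs', 1: 'bathrs', 2: 'price_sqr_meter'})

print(list(housemkt.columns))  # ['bedrs', 'bathrs', 'price_sqr_meter']
```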

Step 5. Create a one column DataFrame with the values of the 3 Series and assign it to ‘bigcolumn’

The code is as follows:

bigcolumn = pd.concat([s1, s2, s3], axis=0)
bigcolumn = bigcolumn.to_frame()   # to_frame() converts the Series into a DataFrame
print(type(bigcolumn))
bigcolumn

The output is as follows:

<class 'pandas.core.frame.DataFrame'>
0
0 2
1 3
2 1
3 3
4 1
5 1
6 2
7 1
8 1
9 1
10 1
11 3
12 1
13 2
14 3
15 2
16 1
17 1
18 3
19 3
20 1
21 3
22 3
23 1
24 1
25 2
26 1
27 1
28 2
29 1
... ...
70 10076
71 20091
72 28284
73 12185
74 15879
75 12907
76 24946
77 20168
78 24435
79 12175
80 18286
81 18001
82 10938
83 19116
84 12802
85 11623
86 15048
87 10624
88 18989
89 19797
90 17798
91 21317
92 27047
93 25692
94 27564
95 23411
96 18808
97 16854
98 21737
99 18968

300 rows × 1 columns

Step 6. Oops, it seems it only goes up to index 99. Is that true?

The code is as follows:

len(bigcolumn)

The output is as follows:

300

Step 7. Reindex the DataFrame so it goes from 0 to 299

The code is as follows:

# reset_index(): drop defaults to False; set drop=True to discard the original index column. To modify the object itself, also pass inplace=True, especially when the result is not reassigned, otherwise the drop has no lasting effect.
'''
set_index():
Signature: DataFrame.set_index(keys, drop=True, append=False, inplace=False, verify_integrity=False)
Parameters:
keys: column label(s), or arrays of matching length, to set as the index
drop: default True; delete the column(s) being used as the new index
append: default False; whether to append the column(s) to the existing index
inplace: default False; modify the DataFrame in place instead of creating a new object
verify_integrity: default False; check the new index for duplicates, otherwise defer the check until necessary (leaving it False improves performance)

reset_index():
Signature: DataFrame.reset_index(level=None, drop=False, inplace=False, col_level=0, col_fill='')
Parameters:
level: int, str, tuple, or list, default None; remove only the given levels from the index (all levels by default)
drop: if False the index column(s) are restored as ordinary columns, otherwise they are discarded
inplace: default False; modify the DataFrame in place instead of creating a new object
col_level: int or str, default 0; if the columns have multiple levels, determines which level the labels are inserted into (the first level by default)
col_fill: object, default ''; if the columns have multiple levels, determines how the other levels are named (if None, the index name is repeated)
Note: reset_index is used in two situations: resetting the index of the original DataFrame, or resetting a DataFrame on which set_index() has been used
'''
bigcolumn.reset_index(drop=True, inplace=True)
bigcolumn

The output is as follows:

0
0 2
1 3
2 1
3 3
4 1
5 1
6 2
7 1
8 1
9 1
10 1
11 3
12 1
13 2
14 3
15 2
16 1
17 1
18 3
19 3
20 1
21 3
22 3
23 1
24 1
25 2
26 1
27 1
28 2
29 1
... ...
270 10076
271 20091
272 28284
273 12185
274 15879
275 12907
276 24946
277 20168
278 24435
279 12175
280 18286
281 18001
282 10938
283 19116
284 12802
285 11623
286 15048
287 10624
288 18989
289 19797
290 17798
291 21317
292 27047
293 25692
294 27564
295 23411
296 18808
297 16854
298 21737
299 18968

300 rows × 1 columns
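The drop=True argument matters here: by default, reset_index keeps the old index as a new column rather than discarding it. A minimal sketch on a toy frame:

```python
import pandas as pd

df = pd.DataFrame({"v": [10, 20]}, index=[7, 9])

kept = df.reset_index()              # old index becomes an 'index' column
dropped = df.reset_index(drop=True)  # old index is discarded

print(list(kept.columns))   # ['index', 'v']
print(list(dropped.index))  # [0, 1]
```

With drop=False, bigcolumn would have gained an extra 'index' column holding the repeated 0..99 labels instead of simply being renumbered 0..299.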

Conclusion

Today we mainly practiced merge and concatenation operations, along with the use of other related functions. As a reminder, this pandas series uses Anaconda with Jupyter Notebook; highly recommended, it works wonderfully!
