Data Analysis task01 (2021.06.15)
1 Chapter 1: Loading Data and Initial Observation
1.1 Loading the data
1.1.1 Task 1: Import numpy and pandas
import numpy as np
import pandas as pd
1.1.2 Task 2: Load the data
(1) Load the data using a relative path
df = pd.read_csv("train.csv")
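(2) Load the data using an absolute path
df = pd.read_csv(r"C:\Users\Administrator\Desktop\数据分析\train.csv")
[Hint] If loading via the relative path raises an error, use os.getcwd() to check the current working directory.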
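When a relative path fails, it can help to print the working directory and build the full path from it. The following is a minimal sketch, assuming the file is named train.csv as above; `csv_path` is an illustrative name, not part of the original notes.

```python
import os
from pathlib import Path

# Directory that relative paths such as "train.csv" are resolved against
cwd = Path(os.getcwd())
print(cwd)

# Illustrative variable: the absolute path pandas would try to open
csv_path = cwd / "train.csv"
print(csv_path, csv_path.exists())
```

If `exists()` prints False, the file is not in the working directory, and either the file or the working directory needs to be moved.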
[Think] How do pd.read_csv() and pd.read_table() differ?
df = pd.read_csv("train.csv")
print(df.values)
df.values.shape
[[1 0 3 ... 7.25 nan 'S']
 [2 1 1 ... 71.2833 'C85' 'C']
 [3 1 3 ... 7.925 nan 'S']
 ...
 [889 0 3 ... 23.45 nan 'S']
 [890 1 1 ... 30.0 'C148' 'C']
 [891 0 3 ... 7.75 nan 'Q']]
(891, 12)
df = pd.read_table("train.csv")
df.values.shape
(891, 1)
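As shown above, the two functions differ in their default delimiter: read_csv splits on commas, while read_table splits on tabs. Because train.csv is comma-separated, read_csv parses it into 12 columns, whereas read_table finds no tabs and reads each whole line as a single column, which gives the shape (891, 1).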
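The difference in default delimiters can be reproduced without the real file. This sketch uses a small in-memory sample (the first three rows and columns of the data above) in place of train.csv; the variable names are illustrative.

```python
import io
import pandas as pd

# Small comma-separated sample standing in for train.csv
csv_text = "PassengerId,Survived,Pclass\n1,0,3\n2,1,1\n3,1,3\n"

df_csv = pd.read_csv(io.StringIO(csv_text))                     # default sep=","
df_table = pd.read_table(io.StringIO(csv_text))                 # default sep="\t"
df_table_comma = pd.read_table(io.StringIO(csv_text), sep=",")  # explicit comma

print(df_csv.shape)          # (3, 3)
print(df_table.shape)        # (3, 1)
print(df_table_comma.shape)  # (3, 3)
```

Passing sep="," to read_table gives the same result as read_csv, which suggests the two functions differ mainly in that default.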
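[Summary] Loading the data is the first step of any analysis. We will meet different data formats (e.g. .csv, .tsv, .xlsx), but the method and approach for loading them are the same.
1.1.3 Task 3: Read the data chunk by chunk, with 1000 rows per chunk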
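# chunksize=1000 yields a single chunk here, because train.csv has only 891 rows
df = pd.read_csv("train.csv", chunksize=1000)
# chunksize=500 actually splits the file into two chunks
df = pd.read_csv("train.csv", chunksize=500)
for temp in df:
    print(temp)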
Output (abridged; pandas wraps the 12 columns of each chunk across three blocks, which are rejoined here):

Chunk 1 (index 0 to 499)
     PassengerId  Survived  Pclass  Name                     Sex   Age   SibSp  Parch  Ticket     Fare    Cabin  Embarked
0    1            0         3       Braund, Mr. Owen Harris  male  22.0  1      0      A/5 21171  7.2500  NaN    S
..   ...          ...       ...     ...                      ...   ...   ...    ...    ...        ...     ...    ...
499  500          0         3       Svensson, Mr. Olof       male  24.0  0      0      350035     7.7958  NaN    S
[500 rows x 12 columns]

Chunk 2 (index 500 to 890)
     PassengerId  Survived  Pclass  Name                     Sex   Age   SibSp  Parch  Ticket     Fare    Cabin  Embarked
500  501          0         3       Calic, Mr. Petar         male  17.0  0      0      315086     8.6625  NaN    S
..   ...          ...       ...     ...                      ...   ...   ...    ...    ...        ...     ...    ...
890  891          0         3       Dooley, Mr. Patrick      male  32.0  0      0      370376     7.7500  NaN    Q
[391 rows x 12 columns]
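[Think] What is chunked reading, and why read in chunks?
Chunked reading splits the dataset into pieces and reads it one piece at a time.
The chunksize parameter of read_csv sets the number of rows per chunk; with it, read_csv returns an iterable object instead of a DataFrame.
Reasons for reading in chunks:
1. The dataset is large, and loading all of it makes it hard to get a feel for the data.
2. Reading everything at once takes a long time and uses a lot of memory.
3. A plain read may fail with a MemoryError.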
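The reasons above can be illustrated by computing a statistic one chunk at a time, so that only one chunk is held in memory. This is a minimal sketch under stated assumptions: the in-memory sample (five rows taken from the data above) stands in for a large file, and `total_rows` and `survivors` are illustrative names.

```python
import io
import pandas as pd

# Small in-memory sample standing in for a large file
csv_text = (
    "PassengerId,Survived,Fare\n"
    "1,0,7.25\n"
    "2,1,71.2833\n"
    "3,1,7.925\n"
    "4,1,53.1\n"
    "5,0,8.05\n"
)

total_rows = 0
survivors = 0
# With chunksize=2, read_csv yields DataFrames of at most 2 rows each
for chunk in pd.read_csv(io.StringIO(csv_text), chunksize=2):
    total_rows += len(chunk)
    survivors += int(chunk["Survived"].sum())

print(total_rows, survivors)  # 5 3
```

Only the running totals persist between iterations, so peak memory depends on the chunk size rather than on the size of the file.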
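1.1.4 Task 4: Change the column headers to Chinese and set the passenger ID as the index [for English-language material, translating the headers can make the data more intuitive to work with]
PassengerId => 乘客ID (passenger ID)
Survived => 是否幸存 (survived or not)
Pclass => 乘客等级(1/2/3等舱位) (passenger class: 1st/2nd/3rd)
Name => 乘客姓名 (passenger name)
Sex => 性别 (sex)
Age => 年龄 (age)
SibSp => 堂兄弟/妹个数 (the label literally reads "number of cousins"; the column actually counts siblings and spouses aboard)
Parch => 父母与小孩个数 (number of parents and children aboard)
Ticket => 船票信息 (ticket information)
Fare => 票价 (fare)
Cabin => 客舱 (cabin)
Embarked => 登船港口 (port of embarkation)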
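# New headers, in the same order as the columns of train.csv
titles = ["乘客ID", "是否幸存", "乘客等级(1/2/3等舱位)", "乘客姓名", "性别", "年龄", "堂兄弟/妹个数", "父母与小孩个数", "船票信息", "票价", "客舱", "登船港口"]
index_name = "乘客ID"
df = pd.read_csv("train.csv")
df.columns = titles
# Use the passenger ID column as the row index
df = df.set_index(index_name)
df.head()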
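Assigning a list to df.columns relies on the list being in exactly the same order as the file's columns. An alternative sketch, not the approach used above, maps old names to new names explicitly with rename; the sample data and the names `column_map` and `df_small` are illustrative, and only three columns are shown.

```python
import io
import pandas as pd

# Small sample with three of the original columns
csv_text = "PassengerId,Survived,Pclass\n1,0,3\n2,1,1\n"

# Explicit mapping from original header to new header
column_map = {
    "PassengerId": "乘客ID",
    "Survived": "是否幸存",
    "Pclass": "乘客等级(1/2/3等舱位)",
}

df_small = pd.read_csv(io.StringIO(csv_text))
df_small = df_small.rename(columns=column_map).set_index("乘客ID")
print(df_small)
```

With a mapping, a reordered or missing column does not silently receive the wrong label.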
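1.2 Initial observation
1.2.1 Task 1: View basic information about the data
df.info()  # print a summary
df.describe()  # descriptive statistics
df.values  # the data as a NumPy array
df.shape  # shape (number of rows, number of columns)
df.columns  # column labels
df.columns.values  # column labels as an array
df.index  # row labels
df.index.values  # row labels as an array
df.head(n)  # first n rows
df.tail(n)  # last n rows
pd.options.display.max_columns = n  # display at most n columns
pd.options.display.max_rows = n  # display at most n rows
df.memory_usage()  # memory used by each column, in bytes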
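A few of these calls can be tried on a small frame to see what each returns. This sketch assumes an in-memory sample built from the first three passengers above; `df_small` and `mean_age` are illustrative names.

```python
import io
import pandas as pd

csv_text = "PassengerId,Survived,Age\n1,0,22.0\n2,1,38.0\n3,1,26.0\n"
df_small = pd.read_csv(io.StringIO(csv_text))

print(df_small.shape)          # (3, 3)
print(list(df_small.columns))  # ['PassengerId', 'Survived', 'Age']
print(df_small.head(2))        # first two rows

# describe() returns a DataFrame, so single statistics can be looked up by label
mean_age = df_small.describe().loc["mean", "Age"]
print(mean_age)

df_small.info()                # prints dtypes and non-null counts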
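1.2.2 Task 2: Look at the first 10 rows and the last 15 rows of the table
# the first 10 passengers
df.head(10)
# the last 15 passengers
df.tail(15)
1.2.3 Task 3: Check whether the data is null; return True where a value is missing and False elsewhere
df.isna()  # boolean DataFrame, True where a value is missing
df.isna().sum()  # number of missing values in each column
df[df.notna().all(axis=1)]  # keep only the rows with no missing values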
Output of the last expression (abridged):
乘客ID  是否幸存  乘客等级(1/2/3等舱位)  乘客姓名  性别  年龄  堂兄弟/妹个数  父母与小孩个数  船票信息  票价  客舱  登船港口
2    1  1  Cumings, Mrs. John Bradley (Florence Briggs Th...  female  38.0  1  0  PC 17599  71.2833  C85   C
4    1  1  Futrelle, Mrs. Jacques Heath (Lily May Peel)       female  35.0  1  0  113803    53.1000  C123  S
7    0  1  McCarthy, Mr. Timothy J                            male    54.0  0  0  17463     51.8625  E46   S
...  ...
888  1  1  Graham, Miss. Margaret Edith                       female  19.0  0  0  112053    30.0000  B42   S
890  1  1  Behr, Mr. Karl Howell                              male    26.0  0  0  111369    30.0000  C148  C
183 rows × 11 columns
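The three expressions in this task can be checked on a tiny frame with known gaps. This is a sketch under stated assumptions: the sample reuses passengers 1, 2 and 6 from the data above with only two columns, and `df_small` and `complete` are illustrative names.

```python
import io
import pandas as pd

# Empty fields are parsed as NaN by read_csv
csv_text = "PassengerId,Age,Cabin\n1,22.0,\n2,38.0,C85\n6,,\n"
df_small = pd.read_csv(io.StringIO(csv_text), index_col="PassengerId")

print(df_small.isna())        # True where a value is missing
print(df_small.isna().sum())  # missing values per column

# Rows in which every column is present
complete = df_small[df_small.notna().all(axis=1)]
print(complete)

# The same filter is available as dropna()
print(complete.equals(df_small.dropna()))
```

The boolean-mask form makes the condition explicit, while dropna() is the shorter spelling of the same filter.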
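1.3 Saving the data
1.3.1 Task 1: Save the data you loaded and modified as a new file, train_chinese.csv, in the working directory
df.to_csv("train_chinese.csv", encoding="utf-8")
Note: depending on the operating system, the saved file may show garbled characters when opened. If that happens, try encoding="GBK" or encoding="utf-8".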
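One way to check an encoding choice is to write the file and read it straight back. The sketch below is an assumption-laden example rather than part of the original task: it writes to a temporary directory, uses a two-row frame with illustrative names, and picks "utf-8-sig", which adds a byte order mark that Excel on Windows generally uses to recognise UTF-8.

```python
import os
import tempfile
import pandas as pd

df_small = pd.DataFrame({"是否幸存": [0, 1]}, index=pd.Index([1, 2], name="乘客ID"))

with tempfile.TemporaryDirectory() as tmp_dir:
    out_path = os.path.join(tmp_dir, "train_chinese.csv")
    # "utf-8-sig" prepends a byte order mark to the UTF-8 output
    df_small.to_csv(out_path, encoding="utf-8-sig")
    # Read back with the same encoding and restore the index column
    restored = pd.read_csv(out_path, encoding="utf-8-sig", index_col="乘客ID")

print(restored.equals(df_small))
```

Reading with the same encoding used for writing is what keeps the Chinese headers intact.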