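• Problem description

Competition page
Kaggle Titanic competition description

The sinking of the Titanic is one of the most infamous shipwrecks in history. On April 15, 1912, during her maiden voyage, the Titanic sank after colliding with an iceberg, killing 1502 of the 2224 passengers and crew. This sensational tragedy shocked the international community and led to stricter safety regulations for ships.

One reason the disaster cost so many lives was that there were not enough lifeboats for the passengers and crew. Although surviving involved some luck, some groups of people were more likely to survive than others, such as women, children, and the upper class.

In this challenge, we ask you to complete the analysis of what sorts of people were likely to survive. In particular, we ask you to apply machine learning tools to predict which passengers survived the tragedy.

  • Code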
import os
import pandas as pd
import numpy as np
import warnings
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
import seaborn as sns
from sklearn import preprocessing
from sklearn.model_selection import GridSearchCV
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.discriminant_analysis import QuadraticDiscriminantAnalysis
from sklearn.ensemble import GradientBoostingClassifier
from sklearn import model_selection
excl = lambda x: os.popen(x).readlines()
%matplotlib inline
warnings.filterwarnings('ignore')
train = pd.read_csv('./titanic_datas/train.csv')
train.head()
PassengerId Survived Pclass Name Sex Age SibSp Parch Ticket Fare Cabin Embarked
0 1 0 3 Braund, Mr. Owen Harris male 22.0 1 0 A/5 21171 7.2500 NaN S
1 2 1 1 Cumings, Mrs. John Bradley (Florence Briggs Th... female 38.0 1 0 PC 17599 71.2833 C85 C
2 3 1 3 Heikkinen, Miss. Laina female 26.0 0 0 STON/O2. 3101282 7.9250 NaN S
3 4 1 1 Futrelle, Mrs. Jacques Heath (Lily May Peel) female 35.0 1 0 113803 53.1000 C123 S
4 5 0 3 Allen, Mr. William Henry male 35.0 0 0 373450 8.0500 NaN S
test = pd.read_csv('./titanic_datas/test.csv')
test.head()
PassengerId Pclass Name Sex Age SibSp Parch Ticket Fare Cabin Embarked
0 892 3 Kelly, Mr. James male 34.5 0 0 330911 7.8292 NaN Q
1 893 3 Wilkes, Mrs. James (Ellen Needs) female 47.0 1 0 363272 7.0000 NaN S
2 894 2 Myles, Mr. Thomas Francis male 62.0 0 0 240276 9.6875 NaN Q
3 895 3 Wirz, Mr. Albert male 27.0 0 0 315154 8.6625 NaN S
4 896 3 Hirvonen, Mrs. Alexander (Helga E Lindqvist) female 22.0 1 1 3101298 12.2875 NaN S
train.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 891 entries, 0 to 890
Data columns (total 12 columns):
PassengerId    891 non-null int64
Survived       891 non-null int64
Pclass         891 non-null int64
Name           891 non-null object
Sex            891 non-null object
Age            714 non-null float64
SibSp          891 non-null int64
Parch          891 non-null int64
Ticket         891 non-null object
Fare           891 non-null float64
Cabin          204 non-null object
Embarked       889 non-null object
dtypes: float64(2), int64(5), object(5)
memory usage: 83.6+ KB
test.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 418 entries, 0 to 417
Data columns (total 11 columns):
PassengerId    418 non-null int64
Pclass         418 non-null int64
Name           418 non-null object
Sex            418 non-null object
Age            418 non-null float64
SibSp          418 non-null int64
Parch          418 non-null int64
Ticket         418 non-null object
Fare           418 non-null float64
Cabin          91 non-null object
Embarked       418 non-null object
dtypes: float64(2), int64(4), object(5)
memory usage: 36.0+ KB
train.describe()
PassengerId Survived Pclass Age SibSp Parch Fare
count 891.000000 891.000000 891.000000 714.000000 891.000000 891.000000 891.000000
mean 446.000000 0.383838 2.308642 29.699118 0.523008 0.381594 32.204208
std 257.353842 0.486592 0.836071 14.526497 1.102743 0.806057 49.693429
min 1.000000 0.000000 1.000000 0.420000 0.000000 0.000000 0.000000
25% 223.500000 0.000000 2.000000 20.125000 0.000000 0.000000 7.910400
50% 446.000000 0.000000 3.000000 28.000000 0.000000 0.000000 14.454200
75% 668.500000 1.000000 3.000000 38.000000 1.000000 0.000000 31.000000
max 891.000000 1.000000 3.000000 80.000000 8.000000 6.000000 512.329200
test.describe()
PassengerId Pclass Age SibSp Parch Fare
count 418.000000 418.000000 418.000000 418.000000 418.000000 418.000000
mean 1100.500000 2.265550 30.154603 0.447368 0.392344 35.619000
std 120.810458 0.841838 12.636666 0.896760 0.981429 55.840751
min 892.000000 1.000000 0.170000 0.000000 0.000000 0.000000
25% 996.250000 1.000000 23.000000 0.000000 0.000000 7.895800
50% 1100.500000 3.000000 29.699118 0.000000 0.000000 14.454200
75% 1204.750000 3.000000 35.750000 1.000000 0.000000 31.500000
max 1309.000000 3.000000 76.000000 8.000000 9.000000 512.329200
fare_mean = train["Fare"].mean()
test.loc[pd.isnull(test.Fare),'Fare'] = fare_mean
embarked_mode = train['Embarked'].mode()
train.loc[pd.isnull(train.Embarked),['Embarked']] = embarked_mode[0]
age_mean = train['Age'].mean()
train.loc[pd.isnull(train.Age),['Age']] = age_mean
test.loc[pd.isnull(test.Age),['Age']] = age_mean
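All three imputation steps above follow the same rule: the fill value (mean Fare, most frequent Embarked, mean Age) is computed on train and reused for test, so nothing from the test set leaks into preprocessing. A minimal sketch of the same idea as a single fillna call with a dict, on a hypothetical toy frame rather than the real CSV files:

```python
import numpy as np
import pandas as pd

# Toy stand-ins for train.csv / test.csv (hypothetical values, not the real data)
train = pd.DataFrame({
    "Age": [22.0, np.nan, 38.0],
    "Fare": [7.25, 71.28, np.nan],
    "Embarked": ["S", None, "S"],
})
test = pd.DataFrame({
    "Age": [np.nan, 47.0],
    "Fare": [np.nan, 7.0],
    "Embarked": ["Q", None],
})

# Fill values come from the training data only...
fill_values = {
    "Age": train["Age"].mean(),               # mean() skips NaN
    "Fare": train["Fare"].mean(),
    "Embarked": train["Embarked"].mode()[0],  # most frequent port
}
# ...and the same values are applied to both frames
train = train.fillna(fill_values)
test = test.fillna(fill_values)
print(test)
```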
label = train['Survived']
train.drop('Survived',axis=1,inplace=True)
X_train,X_test,Y_train,Y_test = train_test_split(train,label,test_size = 0.3,random_state = 1)
X_train['Survived'] = Y_train
X_test['Survived'] = Y_test
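The split above is used only for exploration plots and for a hold-out score; later the post looks rows up again with train.loc[X_train.index], which works because train_test_split keeps the original row index. A small sketch on toy data (the stratify argument is an optional addition, not something the post uses):

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical toy data: features and label kept in separate objects, as in the post
features = pd.DataFrame({
    "Pclass": [1, 3, 2, 3, 1, 3, 2, 3, 3, 1],
    "Fare": [71.3, 7.9, 13.0, 8.1, 53.1, 7.2, 26.0, 8.0, 7.8, 80.0],
})
label = pd.Series([1, 0, 1, 0, 1, 0, 1, 0, 0, 1], name="Survived")

# stratify=label keeps the survival rate roughly equal in both parts
X_tr, X_val, y_tr, y_val = train_test_split(
    features, label, test_size=0.3, random_state=1, stratify=label)

# The original index survives the split; assign() adds the label
# without writing into a slice of another DataFrame
X_tr = X_tr.assign(Survived=y_tr)
X_val = X_val.assign(Survived=y_val)
print(X_tr.index.tolist(), X_val.index.tolist())
```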
fig, (axis1,axis2) = plt.subplots(1,2,figsize=(15,5))
sns.barplot(x='Sex', y='Survived', data=X_train, ax=axis1)
sns.barplot(x='Sex', y='Survived', data=X_test, ax=axis2)
train['Sex'] = train['Sex'].apply(lambda x: 1 if x == 'male' else 0)
test['Sex'] = test['Sex'].apply(lambda x: 1 if x == 'male' else 0)
train = pd.get_dummies(data=train, columns=['Sex'])
test = pd.get_dummies(data= test,columns=['Sex'])
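Mapping Sex to 0/1 and then one-hot encoding it produces the Sex_0 (female) and Sex_1 (male) columns that appear in train.head() further down. The two columns are exact complements, so one of them adds no information; for the tree models used here that is harmless. A sketch on a toy frame:

```python
import pandas as pd

df = pd.DataFrame({"Sex": ["male", "female", "female", "male"]})

# Step 1: binary code, as in the post (male -> 1, female -> 0)
df["Sex"] = (df["Sex"] == "male").astype(int)

# Step 2: one-hot encode; new columns are named <column>_<value>
df = pd.get_dummies(df, columns=["Sex"], dtype=int)
print(df.columns.tolist())  # ['Sex_0', 'Sex_1']
```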
def Name_Title_Code(x):
    if x == 'Mr.':
        return 1
    if x in ('Mrs.', 'Ms.', 'Lady.', 'Mlle.', 'Mme'):
        return 2
    if x == 'Miss':
        return 3
    if x == 'Rev.':
        return 4
    return 5
X_train['Name_Title'] = X_train['Name'].apply(lambda x: x.split(',')[1]).apply(lambda x: x.split()[0])
X_test['Name_Title'] = X_test['Name'].apply(lambda x: x.split(',')[1]).apply(lambda x: x.split()[0])
X_train.groupby('Name_Title')['Survived'].count()
Name_Title
Capt.        1
Col.         2
Don.         1
Dr.          4
Lady.        1
Major.       1
Master.     27
Miss.      126
Mlle.        1
Mme.         1
Mr.        365
Mrs.        87
Rev.         5
the          1
Name: Survived, dtype: int64
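The counts above include a title "the": the title is taken as the first word after the comma, so a name shaped like "Surname, the Countess. of ..." yields "the" rather than "Countess.". The same extraction can be written with pandas' vectorized string methods; the names below are illustrative:

```python
import pandas as pd

# Illustrative names in the "Surname, Title. Given names" format
names = pd.Series([
    "Braund, Mr. Owen Harris",
    "Heikkinen, Miss. Laina",
    "Rothes, the Countess. of (Lucy Noel Martha Dyer-Edwards)",
])

# Vectorized equivalent of x.split(',')[1].split()[0]:
# the text after the first comma, then its first whitespace-separated word
titles = names.str.split(",").str[1].str.split().str[0]
print(titles.tolist())  # ['Mr.', 'Miss.', 'the']
```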
fig, (axis1,axis2) = plt.subplots(1,2,figsize=(15,5))
sns.barplot(x='Name_Title', y='Survived', data=X_train.sort_values('Name_Title'), ax=axis1)
sns.barplot(x='Name_Title', y='Survived', data=X_test.sort_values('Name_Title'), ax=axis2)
train['Name_Title'] = train['Name'].apply(lambda x: x.split(',')[1]).apply(lambda x: x.split()[0])
test['Name_Title'] = test['Name'].apply(lambda x: x.split(',')[1]).apply(lambda x: x.split()[0])
train['Name_Title'] = train['Name_Title'].apply(Name_Title_Code)
test['Name_Title'] = test['Name_Title'].apply(Name_Title_Code)
train = pd.get_dummies(columns = ['Name_Title'], data = train)
test = pd.get_dummies(columns = ['Name_Title'], data = test)
train.head()
PassengerId Pclass Name Age SibSp Parch Ticket Fare Cabin Embarked Sex_0 Sex_1 Name_Title_1 Name_Title_2 Name_Title_4 Name_Title_5
0 1 3 Braund, Mr. Owen Harris 22.0 1 0 A/5 21171 7.2500 NaN S 0 1 1 0 0 0
1 2 1 Cumings, Mrs. John Bradley (Florence Briggs Th... 38.0 1 0 PC 17599 71.2833 C85 C 1 0 0 1 0 0
2 3 3 Heikkinen, Miss. Laina 26.0 0 0 STON/O2. 3101282 7.9250 NaN S 1 0 0 0 0 1
3 4 1 Futrelle, Mrs. Jacques Heath (Lily May Peel) 35.0 1 0 113803 53.1000 C123 S 1 0 0 1 0 0
4 5 3 Allen, Mr. William Henry 35.0 0 0 373450 8.0500 NaN S 0 1 1 0 0 0
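In the output above, Heikkinen (a "Miss.") lands in Name_Title_5 and there is no Name_Title_3 column at all: Name_Title_Code compares against 'Miss' and 'Mme' without the trailing dot that the extracted titles carry, so those branches never match. A hypothetical dict-based variant with dotted keys is sketched below; note that using it would change the features (Miss. passengers get their own Name_Title_3), so the scores reported later would not be reproduced exactly.

```python
import pandas as pd

# Hypothetical dict variant of Name_Title_Code; every key includes the trailing dot
TITLE_CODES = {
    "Mr.": 1,
    "Mrs.": 2, "Ms.": 2, "Lady.": 2, "Mlle.": 2, "Mme.": 2,
    "Miss.": 3,
    "Rev.": 4,
}

titles = pd.Series(["Mr.", "Miss.", "Mme.", "Dr.", "the"])
# Unlisted titles (Dr., Master., Col., 'the', ...) fall back to 5, like the if-chain
codes = titles.map(TITLE_CODES).fillna(5).astype(int)
print(codes.tolist())  # [1, 3, 2, 5, 5]
```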
X_train['Name_len'] = X_train['Name'].apply(lambda x: len(x))
X_test['Name_len'] = X_test['Name'].apply(lambda x: len(x))
fig, (axis1,axis2) = plt.subplots(1,2,figsize=(20,10))
sns.barplot(x='Name_len', y='Survived', data=X_train.sort_values(['Name_len']), ax=axis1)
sns.barplot(x='Name_len', y='Survived', data=X_test.sort_values(['Name_len']), ax=axis2)
train['Name_len'] = train['Name'].apply(lambda x: len(x))
test['Name_len'] = test['Name'].apply(lambda x: len(x))
def Ticket_First_Let(x):
    return x[0]
X_train['Ticket_First_Letter'] = X_train['Ticket'].apply(Ticket_First_Let)
X_test['Ticket_First_Letter'] = X_test['Ticket'].apply(Ticket_First_Let)
X_train.groupby('Ticket_First_Letter')['Survived'].count()
Ticket_First_Letter
1     87
2    129
3    225
4     10
5      2
6      6
7      6
8      1
9      1
A     20
C     32
F      3
L      3
P     49
S     40
W      9
Name: Survived, dtype: int64
fig, (axis1,axis2) = plt.subplots(1,2,figsize=(15,5))
sns.barplot(x='Ticket_First_Letter', y='Survived', data=X_train.sort_values('Ticket_First_Letter'), ax=axis1)
sns.barplot(x='Ticket_First_Letter', y='Survived', data=X_test.sort_values('Ticket_First_Letter'), ax=axis2)
def Ticket_First_Letter_Code(x):
    if x == '1': return 1
    if x == '3': return 2
    if x == '4': return 3
    if x == 'C': return 4
    if x == 'S': return 5
    if x == 'P': return 6
    if x == '6': return 7
    if x == '7': return 8
    if x == 'A': return 9
    if x == 'W': return 10
    return 11
train['Ticket_First_Letter'] = train['Ticket'].apply(Ticket_First_Let)
test['Ticket_First_Letter'] = test['Ticket'].apply(Ticket_First_Let)
train['Ticket_First_Letter'].unique()
array(['A', 'P', 'S', '1', '3', '2', 'C', '7', 'W', '4', 'F', 'L', '9','6', '5', '8'], dtype=object)
test['Ticket_First_Letter'].unique()
array(['3', '2', '7', 'A', '6', 'W', 'S', 'P', 'C', '1', 'F', '4', '9','L'], dtype=object)
train['Ticket_First_Letter'] = train['Ticket_First_Letter'].apply(Ticket_First_Letter_Code)
test['Ticket_First_Letter'] = test['Ticket_First_Letter'].apply(Ticket_First_Letter_Code)
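Ticket_First_Letter_Code is a lookup table with a default value, which a dict plus Series.map expresses directly. Unlike Sex or Cabin, the resulting code stays a single ordinal integer column (it is not one-hot encoded), which tree models tolerate but a linear model would read as a meaningful ordering. A sketch using ticket strings of the kind shown in the head() outputs above:

```python
import pandas as pd

# Dict form of Ticket_First_Letter_Code; any first character not listed maps to 11
TICKET_CODES = {"1": 1, "3": 2, "4": 3, "C": 4, "S": 5, "P": 6,
                "6": 7, "7": 8, "A": 9, "W": 10}

tickets = pd.Series(["A/5 21171", "PC 17599", "113803", "STON/O2. 3101282", "240276"])
first = tickets.str[0]  # same as Ticket_First_Let
codes = first.map(TICKET_CODES).fillna(11).astype(int)
print(codes.tolist())  # [9, 6, 1, 5, 11]
```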
X_train['Cabin'] = X_train['Cabin'].fillna('Missing')
X_test['Cabin'] = X_test['Cabin'].fillna('Missing')
def Cabin_First_Letter(x):
    if x == 'Missing':
        return 'XX'
    return x[0]
X_train['Cabin_First_Letter'] = X_train['Cabin'].apply(Cabin_First_Letter)
X_test['Cabin_First_Letter'] = X_test['Cabin'].apply(Cabin_First_Letter)
X_train.groupby('Cabin_First_Letter')['Survived'].count()
Cabin_First_Letter
A      12
B      28
C      41
D      21
E      22
F       8
G       3
XX    488
Name: Survived, dtype: int64
fig, (axis1,axis2) = plt.subplots(1,2,figsize=(15,5))
sns.barplot(x='Cabin_First_Letter', y='Survived', data=X_train.sort_values('Cabin_First_Letter'), ax=axis1)
sns.barplot(x='Cabin_First_Letter', y='Survived', data=X_test.sort_values('Cabin_First_Letter'), ax=axis2)
def Cabin_First_Letter_Code(x):
    if x == 'XX':
        return 1
    if x == 'B':
        return 2
    if x == 'C':
        return 3
    if x == 'D':
        return 4
    return 5
train['Cabin'] = train['Cabin'].fillna('Missing')
test['Cabin'] = test['Cabin'].fillna('Missing')
train['Cabin_First_Letter'] = train['Cabin'].apply(Cabin_First_Letter)
test['Cabin_First_Letter'] = test['Cabin'].apply(Cabin_First_Letter)
train['Cabin_First_Letter'] = train['Cabin_First_Letter'].apply(Cabin_First_Letter_Code)
test['Cabin_First_Letter'] = test['Cabin_First_Letter'].apply(Cabin_First_Letter_Code)
train = pd.get_dummies(columns = ['Cabin_First_Letter'], data = train)
test = pd.get_dummies(columns = ['Cabin_First_Letter'], data = test)
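The cabin pipeline (fill missing, take the deck letter, bucket it, one-hot encode) can also be written with vectorized pandas calls. One detail worth seeing: get_dummies only creates columns for codes that actually occur, so a code missing from one frame (4, i.e. deck D, in this toy example) produces no column there. The cabin strings are illustrative:

```python
import pandas as pd

cabins = pd.Series(["C85", None, "C123", "B57 B59 B63 B66", "E46", None])

# Deck letter with missing cabins bucketed as 'XX'
# (same result as fillna('Missing') followed by Cabin_First_Letter)
deck = cabins.str[0].fillna("XX")

# Same mapping as Cabin_First_Letter_Code: XX->1, B->2, C->3, D->4, anything else->5
codes = deck.map({"XX": 1, "B": 2, "C": 3, "D": 4}).fillna(5).astype(int)

dummies = pd.get_dummies(codes, prefix="Cabin_First_Letter", dtype=int)
print(dummies.columns.tolist())  # no Cabin_First_Letter_4: no deck D in this sample
```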
fig, (axis1,axis2) = plt.subplots(1,2,figsize=(15,5))
sns.barplot(x='Embarked', y='Survived', data=X_train.sort_values('Embarked'), ax=axis1)
sns.barplot(x='Embarked', y='Survived', data=X_test.sort_values('Embarked'), ax=axis2)
train = pd.get_dummies(train,columns = ['Embarked'])
test = pd.get_dummies(test,columns = ['Embarked'])
fig, (axis1,axis2) = plt.subplots(1,2,figsize=(15,5))
sns.barplot(x='SibSp', y='Survived', data=X_train.sort_values('SibSp'), ax=axis1)
sns.barplot(x='SibSp', y='Survived', data=X_test.sort_values('SibSp'), ax=axis2)
X_train['Fam_Size'] = X_train['SibSp']  + X_train['Parch']
X_test['Fam_Size'] = X_test['SibSp']  + X_test['Parch']
fig, (axis1,axis2) = plt.subplots(1,2,figsize=(15,5))
sns.barplot(x='Fam_Size', y='Survived', data=X_train.sort_values('Fam_Size'), ax=axis1)
sns.barplot(x='Fam_Size', y='Survived', data=X_test.sort_values('Fam_Size'), ax=axis2)
def Family_feature(train, test):
    for i in [train, test]:
        # 0 relatives -> Solo, 1 to 3 -> Nuclear, more than 3 -> Big
        i['Fam_Size'] = np.where((i['SibSp']+i['Parch']) == 0, 'Solo',
                                 np.where((i['SibSp']+i['Parch']) <= 3, 'Nuclear', 'Big'))
        del i['SibSp']
        del i['Parch']
    return train, test
train, test  = Family_feature(train, test)
train = pd.get_dummies(train,columns = ['Fam_Size'])
test =  pd.get_dummies(test,columns = ['Fam_Size'])
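The nested np.where in Family_feature is a three-way bucketing of SibSp + Parch. np.select states the same rules as a list of conditions checked in order (first match wins), which stays readable if more buckets are added. A sketch on toy rows:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"SibSp": [0, 1, 3, 1, 5], "Parch": [0, 0, 0, 2, 2]})
fam = df["SibSp"] + df["Parch"]  # 0, 1, 3, 3, 7

# Same thresholds as Family_feature: 0 -> Solo, 1..3 -> Nuclear, >3 -> Big
df["Fam_Size"] = np.select([fam == 0, fam <= 3], ["Solo", "Nuclear"], default="Big")
print(df["Fam_Size"].tolist())  # ['Solo', 'Nuclear', 'Nuclear', 'Nuclear', 'Big']
```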
fig, (axis1,axis2) = plt.subplots(1,2,figsize=(15,5))
sns.barplot(x='Pclass', y='Survived', data=X_train.sort_values('Pclass'), ax=axis1)
sns.barplot(x='Pclass', y='Survived', data=X_test.sort_values('Pclass'), ax=axis2)
train['Pclass_1']  = np.int32(train['Pclass'] == 1)
train['Pclass_2']  = np.int32(train['Pclass'] == 2)
train['Pclass_3']  = np.int32(train['Pclass'] == 3)
test['Pclass_1']  = np.int32(test['Pclass'] == 1)
test['Pclass_2']  = np.int32(test['Pclass'] == 2)
test['Pclass_3']  = np.int32(test['Pclass'] == 3)
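Building Pclass_1..3 by hand is equivalent to pd.get_dummies on the Pclass column. The manual version has one practical advantage: it always creates all three columns, even in a frame where some class happens to be absent, whereas get_dummies only creates columns for values it sees. A sketch comparing the two on toy data:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"Pclass": [3, 1, 3, 2]})

# Manual indicators, as in the post
manual = pd.DataFrame({f"Pclass_{k}": (df["Pclass"] == k).astype(np.int32) for k in (1, 2, 3)})

# The same thing with get_dummies
auto = pd.get_dummies(df["Pclass"], prefix="Pclass", dtype=np.int32)

print(manual.columns.tolist() == auto.columns.tolist())    # True
print(np.array_equal(manual.to_numpy(), auto.to_numpy()))  # True
```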
fig, (axis1,axis2) = plt.subplots(1,2,figsize=(15,5))
sns.distplot(X_train[X_train.Survived==1]['Age'].dropna().values, bins=range(0, 81, 6),color='red', ax=axis1)
sns.distplot(X_train[X_train.Survived==0]['Age'].dropna().values, bins=range(0, 81, 6),color = 'blue', ax=axis1)
sns.distplot(X_test[X_test.Survived==1]['Age'].dropna().values, bins=range(0, 81, 6),color='red', ax=axis2)
sns.distplot(X_test[X_test.Survived==0]['Age'].dropna().values, bins=range(0, 81, 6),color = 'blue', ax=axis2)
train['Small_Age'] = np.int32(train['Age'] <= 5)
train['Old_Age'] = np.int32(train['Age'] >= 65)
train['Middle_Age'] = np.int32((train['Age'] >= 15) & (train['Age'] <= 25))
test['Small_Age'] = np.int32(test['Age'] <= 5)
test['Old_Age'] = np.int32(test['Age'] >= 65)
test['Middle_Age'] = np.int32((test['Age'] >= 15) & (test['Age'] <= 25))
X_train['Fare'] = X_train['Fare'] + 1
X_test['Fare'] = X_test['Fare'] + 1
X_train['Fare'] = X_train['Fare'].apply(np.log)
X_test['Fare'] = X_test['Fare'].apply(np.log)
fig, (axis1,axis2) = plt.subplots(1,2,figsize=(15,5))
sns.distplot(X_train[X_train.Survived==1]['Fare'].dropna().values, bins=range(0, 10, 1),color='red', ax=axis1)
sns.distplot(X_train[X_train.Survived==0]['Fare'].dropna().values, bins=range(0, 10, 1),color = 'blue', ax=axis1)
sns.distplot(X_test[X_test.Survived==1]['Fare'].dropna().values, bins=range(0, 10, 1),color='red', ax=axis2)
sns.distplot(X_test[X_test.Survived==0]['Fare'].dropna().values, bins=range(0, 10, 1),color = 'blue', ax=axis2)
train['Fare'] = train['Fare'] + 1
test['Fare'] = test['Fare'] + 1
train['Fare'] = train['Fare'].apply(np.log)
test['Fare'] = test['Fare'].apply(np.log)
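The "+1 then log" step is there because the minimum Fare in train.describe() is 0, and log(0) is -inf. NumPy's log1p computes log(1 + x) in one call (and more accurately for very small x). A quick check that the two give the same values:

```python
import numpy as np
import pandas as pd

# Fares including 0.0, the minimum seen in train.describe()
fare = pd.Series([0.0, 7.25, 71.2833, 512.3292])

two_step = (fare + 1).apply(np.log)  # the post's version
one_step = np.log1p(fare)            # log(1 + x) in a single call

print(np.allclose(two_step, one_step))  # True
```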
train['Fare_0_2'] = np.int32(train['Fare'] <= 2)
train['Fare_2_3'] = np.int32((train['Fare'] > 2) & (train['Fare'] <= 3) )
train['Fare_3_4'] = np.int32((train['Fare'] > 3) & (train['Fare'] <= 4) )
train['Fare_4_5'] = np.int32((train['Fare'] > 4) & (train['Fare'] <= 5))
train['Fare_5_'] = np.int32(train['Fare'] > 5)
test['Fare_0_2'] = np.int32(test['Fare'] <= 2)
test['Fare_2_3'] = np.int32((test['Fare'] > 2) & (test['Fare'] <= 3) )
test['Fare_3_4'] = np.int32((test['Fare'] > 3) & (test['Fare'] <= 4) )
test['Fare_4_5'] = np.int32((test['Fare'] > 4) & (test['Fare'] <= 5))
test['Fare_5_'] = np.int32(test['Fare'] > 5)
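The five Fare_* indicators are a binning of log-fare at 2, 3, 4 and 5, with each interval closed on the right (<= 2, (2, 3], ..., > 5). pd.cut with its default right=True produces the same intervals, and get_dummies on the resulting categorical yields one column per bin. A sketch with the post's column names as bin labels:

```python
import numpy as np
import pandas as pd

log_fare = pd.Series([1.5, 2.0, 2.2, 3.9, 4.5, 6.2])

bins = [-np.inf, 2, 3, 4, 5, np.inf]
labels = ["Fare_0_2", "Fare_2_3", "Fare_3_4", "Fare_4_5", "Fare_5_"]

# right=True (the default): intervals are (a, b], so 2.0 falls in Fare_0_2
binned = pd.cut(log_fare, bins=bins, labels=labels)
dummies = pd.get_dummies(binned, dtype=np.int32)
print(dummies)
```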
train.drop(['Ticket','PassengerId','Name','Age','Cabin','Pclass'],axis = 1, inplace=True)
test.drop( ['PassengerId','Ticket','Name','Age','Cabin','Pclass'],axis =1, inplace=True)
X_train_ = train.loc[X_train.index]
X_test_ = train.loc[X_test.index]
Y_train_ = label.loc[X_train.index]
Y_test_ = label.loc[X_test.index]
X_test_ = X_test_[X_train_.columns]
pd.set_option('display.max_columns',50)
train.head()
Fare Sex_0 Sex_1 Name_Title_1 Name_Title_2 Name_Title_4 Name_Title_5 Name_len Ticket_First_Letter Cabin_First_Letter_1 Cabin_First_Letter_2 Cabin_First_Letter_3 Cabin_First_Letter_4 Cabin_First_Letter_5 Embarked_C Embarked_Q Embarked_S Fam_Size_Big Fam_Size_Nuclear Fam_Size_Solo Pclass_1 Pclass_2 Pclass_3 Small_Age Old_Age Middle_Age Fare_0_2 Fare_2_3 Fare_3_4 Fare_4_5 Fare_5_
0 2.110213 0 1 1 0 0 0 23 9 1 0 0 0 0 0 0 1 0 1 0 0 0 1 0 0 1 0 1 0 0 0
1 4.280593 1 0 0 1 0 0 51 6 0 0 1 0 0 1 0 0 0 1 0 1 0 0 0 0 0 0 0 0 1 0
2 2.188856 1 0 0 0 0 1 22 5 1 0 0 0 0 0 0 1 0 0 1 0 0 1 0 0 0 0 1 0 0 0
3 3.990834 1 0 0 1 0 0 44 1 0 0 1 0 0 0 0 1 0 1 0 1 0 0 0 0 0 0 0 1 0 0
4 2.202765 0 1 1 0 0 0 24 2 1 0 0 0 0 0 0 1 0 0 1 0 0 1 0 0 0 0 1 0 0 0
test = test[train.columns]
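test = test[train.columns] only works if every dummy column created on train also exists in test; if some category never occurs in test, the selection raises a KeyError. reindex is a more defensive alternative: it adds missing columns filled with 0 and drops any extras. A sketch with a hypothetical Deck column:

```python
import pandas as pd

# Hypothetical dummy-encoded frames where test lacks two of train's categories
train = pd.get_dummies(pd.DataFrame({"Deck": ["B", "C", "XX", "F"]}), columns=["Deck"], dtype=int)
test = pd.get_dummies(pd.DataFrame({"Deck": ["C", "XX"]}), columns=["Deck"], dtype=int)

# test[train.columns] would raise KeyError here (no Deck_B / Deck_F in test);
# reindex adds them as all-zero columns, in train's column order
test = test.reindex(columns=train.columns, fill_value=0)
print(test)
```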
rf_ = RandomForestClassifier(criterion='gini', n_estimators=700,
#                              max_depth=5,
                             min_samples_split=16, min_samples_leaf=1,
                             max_features='sqrt',  # 'auto' originally: same behaviour for classifiers, removed in scikit-learn 1.3
                             random_state=10, n_jobs=-1)
rf_.fit(X_train_,Y_train_)
rf_.score(X_test_,Y_test_)
0.7910447761194029
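The 0.791 above comes from a single 30% hold-out (268 rows), so it can shift noticeably with a different random_state. k-fold cross-validation averages over several splits and gives a steadier estimate. A sketch on synthetic data standing in for the engineered feature matrix (sample count mirrors train.csv; everything else, including the smaller n_estimators, is an assumption to keep it fast):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for the engineered features (891 rows, like train.csv)
X, y = make_classification(n_samples=891, n_features=20, random_state=0)

rf = RandomForestClassifier(n_estimators=200, min_samples_split=16,
                            random_state=10, n_jobs=-1)

# 5-fold CV: each row is used for validation exactly once
scores = cross_val_score(rf, X, y, cv=5, scoring="accuracy")
print(scores.mean(), scores.std())
```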
rf_.fit(train,label)
RandomForestClassifier(bootstrap=True, class_weight=None, criterion='gini',max_depth=None, max_features='auto', max_leaf_nodes=None,min_impurity_decrease=0.0, min_impurity_split=None,min_samples_leaf=1, min_samples_split=16,min_weight_fraction_leaf=0.0, n_estimators=700, n_jobs=-1,oob_score=False, random_state=10, verbose=0, warm_start=False)
pd.concat((pd.DataFrame(train.columns, columns = ['variable']), pd.DataFrame(rf_.feature_importances_, columns = ['importance'])), axis = 1).sort_values(by='importance', ascending = False)[:20]
variable importance
1 Sex_0 0.136334
3 Name_Title_1 0.125036
2 Sex_1 0.118254
0 Fare 0.096483
7 Name_len 0.089186
6 Name_Title_5 0.055360
22 Pclass_3 0.050127
8 Ticket_First_Letter 0.045200
9 Cabin_First_Letter_1 0.034312
17 Fam_Size_Big 0.033951
4 Name_Title_2 0.033745
20 Pclass_1 0.022517
18 Fam_Size_Nuclear 0.021219
21 Pclass_2 0.015824
23 Small_Age 0.014996
27 Fare_2_3 0.013717
16 Embarked_S 0.012581
19 Fam_Size_Solo 0.011034
29 Fare_4_5 0.010546
14 Embarked_C 0.008645
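The concat one-liner above pairs column names with rf_.feature_importances_. A pandas Series indexed by feature name does the same pairing more directly. Keep in mind that these are impurity-based importances, which tend to favour features with many distinct values (such as Fare and Name_len here). A sketch on synthetic data, with hypothetical feature names f0..f4:

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=300, n_features=5, n_informative=2,
                           n_redundant=0, random_state=0)
cols = [f"f{i}" for i in range(5)]  # hypothetical feature names

rf = RandomForestClassifier(n_estimators=100, random_state=10)
rf.fit(pd.DataFrame(X, columns=cols), y)

# One Series: index = feature name, value = importance, largest first
importances = pd.Series(rf.feature_importances_, index=cols).sort_values(ascending=False)
print(importances.head(20))
```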
excl("ls titanic_datas")
['gender_submission.csv\n', 'test.csv\n', 'train.csv\n']
submit = pd.read_csv('./titanic_datas/gender_submission.csv')
submit.set_index('PassengerId',inplace=True)
res_rf = rf_.predict(test)
submit['Survived'] = res_rf
submit['Survived'] = submit['Survived'].apply(int)
submit.to_csv('./titanic_datas/submit.csv')
excl("ls titanic_datas")
['gender_submission.csv\n', 'submit.csv\n', 'test.csv\n', 'train.csv\n']
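Reusing gender_submission.csv for the PassengerId column works as long as its rows are in the same order as test.csv. An alternative is to save PassengerId from test.csv before it is dropped and build the submission explicitly. A sketch with hypothetical IDs and predictions; to_csv without a path returns the CSV text, which makes the format easy to check:

```python
import pandas as pd

# Hypothetical values: PassengerId saved from test.csv before it is dropped,
# and predictions from the fitted model
passenger_ids = pd.Series([892, 893, 894], name="PassengerId")
predictions = [0, 1, 0]

submission = pd.DataFrame({"PassengerId": passenger_ids, "Survived": predictions})
submission["Survived"] = submission["Survived"].astype(int)

csv_text = submission.to_csv(index=False)  # pass a file path instead to write submit.csv
print(csv_text)
```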
  • Submission result
Your Best Entry
Your submission scored 0.81339
