ML - CatboostC: Binary classification on the Titanic dataset with the CatBoost algorithm

Table of Contents

Binary classification on the Titanic dataset with the CatBoost algorithm

Design Approach

Output

Core Code


Related Content
ML - CatBoost: An introduction to the CatBoost algorithm, installation, and application cases (a detailed guide)
ML - CatboostC: Binary classification on the Titanic dataset with the CatBoost algorithm
ML - CatboostC: Binary classification on the Titanic dataset with the CatBoost algorithm (implementation)

Binary classification on the Titanic dataset with the CatBoost algorithm

Design Approach
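
Load the Titanic data and keep five columns as features (Pclass, Sex, Age, SibSp, Parch), with Survived as the binary label. Inspect the dtypes and collect the indices of the object-typed columns; here only Sex (index 1) is categorical, and it is handed to CatBoost directly as a categorical feature instead of being encoded by hand. Hold out an evaluation set, train a CatBoostClassifier for 100 iterations with eval_set and use_best_model enabled, and keep the model at the iteration with the best validation Logloss: iteration 37 in the run below, so the model is shrunk to its first 38 trees.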

Output

   Pclass     Sex   Age  SibSp  Parch  Survived
0       3    male  22.0      1      0         0
1       1  female  38.0      1      0         1
2       3  female  26.0      0      0         1
3       1  female  35.0      1      0         1
4       3    male  35.0      0      0         0
Pclass        int64
Sex          object
Age         float64
SibSp         int64
Parch         int64
Survived      int64
dtype: object
object_features_ID: [1]
0:  learn: 0.5469469    test: 0.5358272 best: 0.5358272 (0) total: 98.1ms   remaining: 9.71s
1:  learn: 0.4884967    test: 0.4770551 best: 0.4770551 (1) total: 98.7ms   remaining: 4.84s
2:  learn: 0.4459496    test: 0.4453159 best: 0.4453159 (2) total: 99.3ms   remaining: 3.21s
3:  learn: 0.4331858    test: 0.4352757 best: 0.4352757 (3) total: 99.8ms   remaining: 2.4s
4:  learn: 0.4197131    test: 0.4266055 best: 0.4266055 (4) total: 100ms    remaining: 1.91s
5:  learn: 0.4085381    test: 0.4224953 best: 0.4224953 (5) total: 101ms    remaining: 1.58s
6:  learn: 0.4063807    test: 0.4209804 best: 0.4209804 (6) total: 102ms    remaining: 1.35s
7:  learn: 0.4007713    test: 0.4155077 best: 0.4155077 (7) total: 102ms    remaining: 1.17s
8:  learn: 0.3971064    test: 0.4135872 best: 0.4135872 (8) total: 103ms    remaining: 1.04s
9:  learn: 0.3943774    test: 0.4105674 best: 0.4105674 (9) total: 103ms    remaining: 928ms
10: learn: 0.3930801    test: 0.4099915 best: 0.4099915 (10)    total: 104ms    remaining: 839ms
11: learn: 0.3904409    test: 0.4089840 best: 0.4089840 (11)    total: 104ms    remaining: 764ms
12: learn: 0.3890830    test: 0.4091666 best: 0.4089840 (11)    total: 105ms    remaining: 701ms
13: learn: 0.3851196    test: 0.4108839 best: 0.4089840 (11)    total: 105ms    remaining: 647ms
14: learn: 0.3833366    test: 0.4106298 best: 0.4089840 (11)    total: 106ms    remaining: 600ms
15: learn: 0.3792283    test: 0.4126097 best: 0.4089840 (11)    total: 106ms    remaining: 558ms
16: learn: 0.3765680    test: 0.4114997 best: 0.4089840 (11)    total: 107ms    remaining: 522ms
17: learn: 0.3760966    test: 0.4112166 best: 0.4089840 (11)    total: 107ms    remaining: 489ms
18: learn: 0.3736951    test: 0.4122305 best: 0.4089840 (11)    total: 108ms    remaining: 461ms
19: learn: 0.3719966    test: 0.4101199 best: 0.4089840 (11)    total: 109ms    remaining: 435ms
20: learn: 0.3711460    test: 0.4097299 best: 0.4089840 (11)    total: 109ms    remaining: 411ms
21: learn: 0.3707144    test: 0.4093512 best: 0.4089840 (11)    total: 110ms    remaining: 389ms
22: learn: 0.3699238    test: 0.4083409 best: 0.4083409 (22)    total: 110ms    remaining: 370ms
23: learn: 0.3670864    test: 0.4071850 best: 0.4071850 (23)    total: 111ms    remaining: 351ms
24: learn: 0.3635514    test: 0.4038399 best: 0.4038399 (24)    total: 111ms    remaining: 334ms
25: learn: 0.3627657    test: 0.4025837 best: 0.4025837 (25)    total: 112ms    remaining: 319ms
26: learn: 0.3621028    test: 0.4018449 best: 0.4018449 (26)    total: 113ms    remaining: 304ms
27: learn: 0.3616121    test: 0.4011693 best: 0.4011693 (27)    total: 113ms    remaining: 291ms
28: learn: 0.3614262    test: 0.4011820 best: 0.4011693 (27)    total: 114ms    remaining: 278ms
29: learn: 0.3610673    test: 0.4005475 best: 0.4005475 (29)    total: 114ms    remaining: 267ms
30: learn: 0.3588062    test: 0.4002801 best: 0.4002801 (30)    total: 115ms    remaining: 256ms
31: learn: 0.3583703    test: 0.3997255 best: 0.3997255 (31)    total: 116ms    remaining: 246ms
32: learn: 0.3580553    test: 0.4001878 best: 0.3997255 (31)    total: 116ms    remaining: 236ms
33: learn: 0.3556808    test: 0.4004169 best: 0.3997255 (31)    total: 118ms    remaining: 228ms
34: learn: 0.3536833    test: 0.4003229 best: 0.3997255 (31)    total: 119ms    remaining: 220ms
35: learn: 0.3519948    test: 0.4008047 best: 0.3997255 (31)    total: 119ms    remaining: 212ms
36: learn: 0.3515452    test: 0.4000576 best: 0.3997255 (31)    total: 120ms    remaining: 204ms
37: learn: 0.3512962    test: 0.3997214 best: 0.3997214 (37)    total: 120ms    remaining: 196ms
38: learn: 0.3507648    test: 0.4001569 best: 0.3997214 (37)    total: 121ms    remaining: 189ms
39: learn: 0.3489575    test: 0.4009203 best: 0.3997214 (37)    total: 121ms    remaining: 182ms
40: learn: 0.3480966    test: 0.4014031 best: 0.3997214 (37)    total: 122ms    remaining: 175ms
41: learn: 0.3477613    test: 0.4009293 best: 0.3997214 (37)    total: 122ms    remaining: 169ms
42: learn: 0.3472945    test: 0.4006602 best: 0.3997214 (37)    total: 123ms    remaining: 163ms
43: learn: 0.3465271    test: 0.4007531 best: 0.3997214 (37)    total: 124ms    remaining: 157ms
44: learn: 0.3461538    test: 0.4010608 best: 0.3997214 (37)    total: 124ms    remaining: 152ms
45: learn: 0.3455060    test: 0.4012489 best: 0.3997214 (37)    total: 125ms    remaining: 146ms
46: learn: 0.3449922    test: 0.4013439 best: 0.3997214 (37)    total: 125ms    remaining: 141ms
47: learn: 0.3445333    test: 0.4010754 best: 0.3997214 (37)    total: 126ms    remaining: 136ms
48: learn: 0.3443186    test: 0.4011180 best: 0.3997214 (37)    total: 126ms    remaining: 132ms
49: learn: 0.3424633    test: 0.4016071 best: 0.3997214 (37)    total: 127ms    remaining: 127ms
50: learn: 0.3421565    test: 0.4013135 best: 0.3997214 (37)    total: 128ms    remaining: 123ms
51: learn: 0.3417523    test: 0.4009993 best: 0.3997214 (37)    total: 128ms    remaining: 118ms
52: learn: 0.3415669    test: 0.4009101 best: 0.3997214 (37)    total: 129ms    remaining: 114ms
53: learn: 0.3413867    test: 0.4010833 best: 0.3997214 (37)    total: 130ms    remaining: 110ms
54: learn: 0.3405166    test: 0.4014830 best: 0.3997214 (37)    total: 130ms    remaining: 107ms
55: learn: 0.3401535    test: 0.4015556 best: 0.3997214 (37)    total: 131ms    remaining: 103ms
56: learn: 0.3395217    test: 0.4021097 best: 0.3997214 (37)    total: 132ms    remaining: 99.4ms
57: learn: 0.3393024    test: 0.4023377 best: 0.3997214 (37)    total: 132ms    remaining: 95.8ms
58: learn: 0.3389909    test: 0.4019616 best: 0.3997214 (37)    total: 133ms    remaining: 92.3ms
59: learn: 0.3388494    test: 0.4019746 best: 0.3997214 (37)    total: 133ms    remaining: 88.9ms
60: learn: 0.3384901    test: 0.4017470 best: 0.3997214 (37)    total: 134ms    remaining: 85.6ms
61: learn: 0.3382250    test: 0.4018783 best: 0.3997214 (37)    total: 134ms    remaining: 82.4ms
62: learn: 0.3345761    test: 0.4039633 best: 0.3997214 (37)    total: 135ms    remaining: 79.3ms
63: learn: 0.3317548    test: 0.4050218 best: 0.3997214 (37)    total: 136ms    remaining: 76.3ms
64: learn: 0.3306501    test: 0.4036656 best: 0.3997214 (37)    total: 136ms    remaining: 73.3ms
65: learn: 0.3292310    test: 0.4034339 best: 0.3997214 (37)    total: 137ms    remaining: 70.5ms
66: learn: 0.3283600    test: 0.4033661 best: 0.3997214 (37)    total: 137ms    remaining: 67.6ms
67: learn: 0.3282389    test: 0.4034237 best: 0.3997214 (37)    total: 138ms    remaining: 64.9ms
68: learn: 0.3274603    test: 0.4039310 best: 0.3997214 (37)    total: 138ms    remaining: 62.2ms
69: learn: 0.3273430    test: 0.4041663 best: 0.3997214 (37)    total: 139ms    remaining: 59.6ms
70: learn: 0.3271585    test: 0.4044144 best: 0.3997214 (37)    total: 140ms    remaining: 57.1ms
71: learn: 0.3268457    test: 0.4046981 best: 0.3997214 (37)    total: 140ms    remaining: 54.6ms
72: learn: 0.3266497    test: 0.4042724 best: 0.3997214 (37)    total: 141ms    remaining: 52.1ms
73: learn: 0.3259684    test: 0.4048797 best: 0.3997214 (37)    total: 141ms    remaining: 49.7ms
74: learn: 0.3257845    test: 0.4044766 best: 0.3997214 (37)    total: 142ms    remaining: 47.3ms
75: learn: 0.3256157    test: 0.4047031 best: 0.3997214 (37)    total: 143ms    remaining: 45.1ms
76: learn: 0.3251433    test: 0.4043698 best: 0.3997214 (37)    total: 144ms    remaining: 42.9ms
77: learn: 0.3247743    test: 0.4041652 best: 0.3997214 (37)    total: 144ms    remaining: 40.6ms
78: learn: 0.3224876    test: 0.4058880 best: 0.3997214 (37)    total: 145ms    remaining: 38.5ms
79: learn: 0.3223339    test: 0.4058139 best: 0.3997214 (37)    total: 145ms    remaining: 36.3ms
80: learn: 0.3211858    test: 0.4060056 best: 0.3997214 (37)    total: 146ms    remaining: 34.2ms
81: learn: 0.3200423    test: 0.4067103 best: 0.3997214 (37)    total: 147ms    remaining: 32.2ms
82: learn: 0.3198329    test: 0.4069039 best: 0.3997214 (37)    total: 147ms    remaining: 30.1ms
83: learn: 0.3196561    test: 0.4067853 best: 0.3997214 (37)    total: 148ms    remaining: 28.1ms
84: learn: 0.3193160    test: 0.4072288 best: 0.3997214 (37)    total: 148ms    remaining: 26.1ms
85: learn: 0.3184463    test: 0.4077451 best: 0.3997214 (37)    total: 149ms    remaining: 24.2ms
86: learn: 0.3175777    test: 0.4086243 best: 0.3997214 (37)    total: 149ms    remaining: 22.3ms
87: learn: 0.3173824    test: 0.4082013 best: 0.3997214 (37)    total: 150ms    remaining: 20.4ms
88: learn: 0.3172840    test: 0.4083946 best: 0.3997214 (37)    total: 150ms    remaining: 18.6ms
89: learn: 0.3166252    test: 0.4086761 best: 0.3997214 (37)    total: 151ms    remaining: 16.8ms
90: learn: 0.3164144    test: 0.4083237 best: 0.3997214 (37)    total: 151ms    remaining: 15ms
91: learn: 0.3162137    test: 0.4083699 best: 0.3997214 (37)    total: 152ms    remaining: 13.2ms
92: learn: 0.3155611    test: 0.4091627 best: 0.3997214 (37)    total: 152ms    remaining: 11.5ms
93: learn: 0.3153976    test: 0.4089484 best: 0.3997214 (37)    total: 153ms    remaining: 9.76ms
94: learn: 0.3139281    test: 0.4116939 best: 0.3997214 (37)    total: 154ms    remaining: 8.08ms
95: learn: 0.3128878    test: 0.4146652 best: 0.3997214 (37)    total: 154ms    remaining: 6.42ms
96: learn: 0.3127863    test: 0.4145767 best: 0.3997214 (37)    total: 155ms    remaining: 4.78ms
97: learn: 0.3126696    test: 0.4142118 best: 0.3997214 (37)    total: 155ms    remaining: 3.17ms
98: learn: 0.3120048    test: 0.4140831 best: 0.3997214 (37)    total: 156ms    remaining: 1.57ms
99: learn: 0.3117563    test: 0.4138267 best: 0.3997214 (37)    total: 156ms    remaining: 0us

bestTest = 0.3997213503
bestIteration = 37

Shrink model to first 38 iterations.
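
The original training script is not reproduced in this section, but a minimal sketch consistent with the log above would look like the following; the CSV path, split ratio, and random seed are assumptions.

import pandas as pd
from catboost import CatBoostClassifier
from sklearn.model_selection import train_test_split

# Load the Kaggle Titanic training data (file path is an assumption).
data = pd.read_csv('train.csv')

# Keep the five features shown above plus the Survived label.
data = data[['Pclass', 'Sex', 'Age', 'SibSp', 'Parch', 'Survived']]
print(data.head())
print(data.dtypes)

X = data.drop('Survived', axis=1)
y = data['Survived']

# Indices of object-typed columns -> [1], i.e. Sex. CatBoost consumes these
# directly as categorical features; missing Age values need no imputation,
# since nan_mode defaults to 'Min' for numeric features.
object_features_ID = [i for i, dtype in enumerate(X.dtypes) if dtype == object]
print('object_features_ID:', object_features_ID)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=27)  # ratio and seed are assumptions

model = CatBoostClassifier(iterations=100, use_best_model=True)
model.fit(X_train, y_train,
          cat_features=object_features_ID,
          eval_set=(X_test, y_test),
          verbose=True)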

Core Code

# class CatBoostClassifier, found at: catboost.core

class CatBoostClassifier(CatBoost):
    _estimator_type = 'classifier'
    """
    Implementation of the scikit-learn API for CatBoost classification.

    Parameters
    ----------
    iterations : int, [default=500]
        Max count of trees. range: [1,+inf]
    learning_rate : float, [default is selected automatically for binary
        classification with other parameters set to default; in all other
        cases the default is 0.03]
        Step size shrinkage used in update to prevent overfitting. range: (0,1]
    depth : int, [default=6]
        Depth of a tree. All trees are the same depth. range: [1,+inf]
    l2_leaf_reg : float, [default=3.0]
        Coefficient at the L2 regularization term of the cost function.
        range: [0,+inf]
    rsm : float, [default=None]
        Subsample ratio of columns when constructing each tree. range: (0,1]
    loss_function : string or object, [default='Logloss']
        The metric to use in training; also selects the machine learning
        problem to solve.
    border_count : int, [default=254 for training on CPU, 128 on GPU]
        The number of partitions in numeric feature binarization.
    od_type : string, [default=None]
        Type of overfitting detector: 'IncToDec' or 'Iter'.
    nan_mode : string, [default=None]
        How missing values in numeric features are processed: 'Forbidden',
        'Min' or 'Max'. If None, then nan_mode=Min.
    random_seed : int, [default=None]
        Random number seed. If None, 0 is used.
    use_best_model : bool, [default=None]
        Limit the number of trees in predict() using information about the
        optimal value of the error function. Can be used only with eval_set.
    verbose : bool
        True sets logging_level to 'Verbose'; False sets it to 'Silent'.
    cat_features : list or numpy.ndarray, [default=None]
        Indices or names of the categorical features.
    early_stopping_rounds : int
        Synonym for od_wait. Only one of these parameters should be set.
    ... (the full docstring documents dozens more parameters: CTR and
    binarization settings, class weights, bootstrap/sampling and GPU options,
    text/embedding features, and scikit-learn synonyms such as n_estimators,
    max_depth, eta, random_state, reg_lambda and objective) ...
    """

    def __init__(self, iterations=None, learning_rate=None, depth=None,
                 l2_leaf_reg=None, loss_function=None, border_count=None,
                 nan_mode=None, random_seed=None, use_best_model=None,
                 verbose=None, cat_features=None, early_stopping_rounds=None):
        # Signature abridged: the real __init__ accepts every parameter
        # documented above, all defaulting to None.
        params = {}
        not_params = ["not_params", "self", "params", "__class__"]
        for key, value in iteritems(locals().copy()):
            if key not in not_params and value is not None:
                params[key] = value
        super(CatBoostClassifier, self).__init__(params)

    def fit(self, X, y=None, cat_features=None, text_features=None,
            embedding_features=None, sample_weight=None, baseline=None,
            use_best_model=None, eval_set=None, verbose=None,
            logging_level=None, plot=False, column_description=None,
            verbose_eval=None, metric_period=None, silent=None,
            early_stopping_rounds=None, save_snapshot=None,
            snapshot_file=None, snapshot_interval=None, init_model=None):
        """
        Fit the CatBoostClassifier model.

        X : catboost.Pool, list, numpy.ndarray, pandas.DataFrame or Series.
        y : labels, 1-dimensional array-like; use only if X is not a Pool.
        cat_features : list of categorical column indices; only if X is not
            a Pool.
        eval_set : catboost.Pool or list of (X, y) pairs to use as a
            validation set for early stopping.
        early_stopping_rounds : activates the Iter overfitting detector with
            od_wait set to early_stopping_rounds.

        Returns
        -------
        model : CatBoost
        """
        params = self._init_params.copy()
        _process_synonyms(params)
        if 'loss_function' in params:
            self._check_is_classification_objective(params['loss_function'])
        self._fit(X, y, cat_features, text_features, embedding_features, None,
                  sample_weight, None, None, None, None, baseline,
                  use_best_model, eval_set, verbose, logging_level, plot,
                  column_description, verbose_eval, metric_period, silent,
                  early_stopping_rounds, save_snapshot, snapshot_file,
                  snapshot_interval, init_model)
        return self

    def predict(self, data, prediction_type='Class', ntree_start=0,
                ntree_end=0, thread_count=-1, verbose=None):
        """Predict with data. prediction_type is one of 'RawFormulaVal',
        'Class', 'Probability' or 'LogProbability'."""
        return self._predict(data, prediction_type, ntree_start, ntree_end,
                             thread_count, verbose, 'predict')

    def predict_proba(self, data, ntree_start=0, ntree_end=0,
                      thread_count=-1, verbose=None):
        """Predict class probabilities; returns an ndarray of shape
        (number_of_objects, number_of_classes)."""
        return self._predict(data, 'Probability', ntree_start, ntree_end,
                             thread_count, verbose, 'predict_proba')

    def predict_log_proba(self, data, ntree_start=0, ntree_end=0,
                          thread_count=-1, verbose=None):
        """Predict class log probabilities."""
        return self._predict(data, 'LogProbability', ntree_start, ntree_end,
                             thread_count, verbose, 'predict_log_proba')

    def staged_predict(self, data, prediction_type='Class', ntree_start=0,
                       ntree_end=0, eval_period=1, thread_count=-1,
                       verbose=None):
        """Predict the target at each stage; returns a generator."""
        return self._staged_predict(data, prediction_type, ntree_start,
                                    ntree_end, eval_period, thread_count,
                                    verbose, 'staged_predict')

    def staged_predict_proba(self, data, ntree_start=0, ntree_end=0,
                             eval_period=1, thread_count=-1, verbose=None):
        """Predict class probabilities at each stage; returns a generator."""
        return self._staged_predict(data, 'Probability', ntree_start,
                                    ntree_end, eval_period, thread_count,
                                    verbose, 'staged_predict_proba')

    def staged_predict_log_proba(self, data, ntree_start=0, ntree_end=0,
                                 eval_period=1, thread_count=-1, verbose=None):
        """Predict class log probabilities at each stage; returns a generator."""
        return self._staged_predict(data, 'LogProbability', ntree_start,
                                    ntree_end, eval_period, thread_count,
                                    verbose, 'staged_predict_log_proba')

    def score(self, X, y=None):
        """Calculate accuracy."""
        if isinstance(X, Pool):
            if y is not None:
                raise CatBoostError(
                    "Wrong initializing y: X is catboost.Pool object, "
                    "y must be initialized inside catboost.Pool.")
            y = X.get_label()
            if y is None:
                raise CatBoostError("Label in X has not initialized.")
        if isinstance(y, DataFrame):
            if len(y.columns) != 1:
                raise CatBoostError(
                    "y is DataFrame and has {} columns, but must have "
                    "exactly one.".format(len(y.columns)))
            y = y[y.columns[0]]
        elif y is None:
            raise CatBoostError("y should be specified.")
        y = np.array(y)
        predicted_classes = self._predict(
            X, prediction_type='Class', ntree_start=0, ntree_end=0,
            thread_count=-1, verbose=None,
            parent_method_name='score').reshape(-1)
        if np.issubdtype(predicted_classes.dtype, np.number):
            if np.issubdtype(y.dtype, np.character):
                raise CatBoostError(
                    'predicted classes have numeric type but specified y '
                    'contains strings')
        elif np.issubdtype(y.dtype, np.number):
            raise CatBoostError(
                'predicted classes have string type but specified y is numeric')
        elif np.issubdtype(y.dtype, np.bool_):
            raise CatBoostError(
                'predicted classes have string type but specified y is boolean')
        return np.mean(np.array(predicted_classes) == np.array(y))

    def _check_is_classification_objective(self, loss_function):
        if isinstance(loss_function, str) and not self._is_classification_objective(loss_function):
            raise CatBoostError(
                "Invalid loss_function='{}': for classifier use "
                "Logloss, CrossEntropy, MultiClass, MultiClassOneVsAll "
                "or custom objective object".format(loss_function))
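
Applied to the model trained in the earlier sketch, the class above reduces prediction and evaluation to a few calls; the variable names (model, X_test, y_test) carry over from that sketch.

# Class labels, per-class probabilities, and accuracy on the held-out split.
pred_labels = model.predict(X_test)           # prediction_type defaults to 'Class'
pred_probs = model.predict_proba(X_test)      # shape: (n_objects, 2)
print('accuracy:', model.score(X_test, y_test))

# With use_best_model, the model was shrunk to the best iteration (37 above).
print('tree count:', model.tree_count_)       # 38 after shrinking
print('best score:', model.get_best_score())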

