ML with XGBoost: binary classification on the Pima Indians diabetes dataset using an XGBoost model (predicting whether diabetes develops within 5 years)

Output

X_train:
[[  3.    102.     44.    ...  30.8     0.4    26.   ]
 [  1.     77.     56.    ...  33.3     1.251  24.   ]
 [  9.    124.     70.    ...  35.4     0.282  34.   ]
 ...
 [  0.     57.     60.    ...  21.7     0.735  67.   ]
 [  1.    105.     58.    ...  24.3     0.187  21.   ]
 [  8.    179.     72.    ...  32.7     0.719  36.   ]]

y_train:
[0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 1. 1. 0. 0. 1. 0. 0. 0. 0. 0. 0. 1. 0. 1.
 0. 0. 1. 0. 1. 0. 0. 0. 0. 0. 0. 0. 1. 1. 0. 0. 0. 1. 1. 0. 0. 0. 0. 0.
 1. 0. 0. 1. 1. 1. 0. 0. 0. 1. 0. 0. 0. 1. 1. 0. 1. 0. 0. 0. 1. 0. 1. 1.
 1. 0. 0. 0. 0. 0. 0. 1. 1. 0. 0. 0. 0. 1. 0. 1. 0. 1. 1. 0. 0. 0. 0. 0.
 0. 1. 1. 0. 0. 1. 0. 0. 1. 0. 1. 1. 0. 0. 1. 1. 0. 1. 0. 0. 0. 0. 0. 1.
 0. 0. 0. 1. 1. 0. 0. 0. 1. 0. 0. 0. 1. 1. 1. 0. 0. 0. 0. 0. 0. 0. 1. 0.
 0. 1. 1. 0. 0. 0. 0. 0. 1. 0. 0. 1. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
 1. 1. 0. 0. 0. 1. 0. 0. 0. 1. 0. 0. 0. 1. 0. 0. 0. 1. 1. 1. 1. 1. 0. 1.
 0. 0. 1. 0. 1. 1. 0. 0. 0. 0. 0. 0. 1. 0. 1. 1. 1. 0. 1. 0. 1. 1. 0. 0.
 0. 0. 1. 1. 0. 1. 1. 1. 0. 0. 1. 0. 1. 0. 1. 0. 0. 1. 1. 0. 1. 1. 1. 1.
 0. 0. 0. 0. 0. 1. 1. 1. 0. 1. 0. 0. 0. 0. 1. 0. 0. 1. 0. 1. 0. 0. 1. 0.
 0. 1. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 1. 1. 0. 0. 1. 0. 0.
 0. 1. 1. 0. 1. 1. 1. 1. 1. 0. 0. 0. 0. 0. 0. 0. 0. 0. 1. 1. 0. 0. 1. 1.
 1. 0. 0. 0. 1. 0. 0. 1. 0. 1. 0. 1. 1. 1. 0. 1. 0. 0. 1. 0. 0. 1. 0. 1.
 1. 0. 1. 0. 0. 1. 1. 1. 0. 1. 0. 1. 0. 0. 1. 0. 0. 0. 0. 0. 0. 1. 0. 0.
 1. 0. 0. 0. 0. 0. 0. 0. 1. 0. 0. 0. 0. 1. 0. 0. 0. 0. 0. 1. 0. 0. 0. 0.
 0. 0. 1. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 1. 0. 1. 0. 0. 0. 0. 0. 0. 1. 1.
 0. 1. 0. 0. 0. 1. 1. 0. 0. 1. 1. 0. 1. 0. 0. 0. 0. 0. 0. 1. 1. 0. 0. 0.
 0. 0. 0. 0. 0. 0. 0. 0. 0. 1. 0. 1. 1. 0. 0. 0. 0. 1. 1. 1. 1. 0. 0. 1.
 1. 0. 0. 0. 1. 1. 1. 0. 0. 0. 1. 1. 0. 1. 0. 0. 1. 1. 0. 0. 0. 0. 0. 0.
 1. 0. 1. 0. 0. 1. 0. 0. 0. 1. 1. 0. 0. 0. 1. 0. 0. 0. 1. 0. 1. 0. 1. 1.
 0. 1. 0. 0. 0. 1. 1. 0. 0. 1.]
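The y_train printout above holds 514 labels. The Pima Indians diabetes dataset has 768 rows, so this count is consistent with holding out roughly a third of the data for testing. The post does not show the split call itself, so the following is only a minimal plain-Python sketch of a seeded shuffle split; the 0.33 test fraction, the seed value, and the helper name `train_test_split_indices` are assumptions made for illustration, and the shuffle order will not match any particular library.

```python
import math
import random


def train_test_split_indices(n_samples, test_size, seed):
    """Return (train_indices, test_indices) from a seeded shuffle.

    Hypothetical helper: the test set size is rounded up, and the
    remaining rows form the training set.
    """
    n_test = math.ceil(test_size * n_samples)
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # local RNG, global state untouched
    return indices[n_test:], indices[:n_test]


# 768 rows in the dataset; test_size=0.33 and seed=7 are assumed values.
train_idx, test_idx = train_test_split_indices(768, test_size=0.33, seed=7)
print(len(train_idx), len(test_idx))  # 514 254
```

The index lists can then be used to select the matching rows of the feature matrix and label vector.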

Design approach

Core code

class XGBClassifier Found at: xgboost.sklearn

class XGBClassifier(XGBModel, XGBClassifierBase):
    # pylint: disable=missing-docstring,too-many-arguments,invalid-name
    __doc__ = "Implementation of the scikit-learn API for XGBoost classification.\n\n" \
        + '\n'.join(XGBModel.__doc__.split('\n')[2:])

    def __init__(self, max_depth=3, learning_rate=0.1,
                 n_estimators=100, silent=True,
                 objective="binary:logistic", booster='gbtree',
                 n_jobs=1, nthread=None, gamma=0, min_child_weight=1,
                 max_delta_step=0, subsample=1, colsample_bytree=1,
                 colsample_bylevel=1, reg_alpha=0, reg_lambda=1,
                 scale_pos_weight=1, base_score=0.5, random_state=0,
                 seed=None, missing=None, **kwargs):
        super(XGBClassifier, self).__init__(
            max_depth, learning_rate, n_estimators, silent, objective,
            booster, n_jobs, nthread, gamma, min_child_weight,
            max_delta_step, subsample, colsample_bytree, colsample_bylevel,
            reg_alpha, reg_lambda, scale_pos_weight, base_score,
            random_state, seed, missing, **kwargs)

    def fit(self, X, y, sample_weight=None, eval_set=None, eval_metric=None,
            early_stopping_rounds=None, verbose=True, xgb_model=None,
            sample_weight_eval_set=None, callbacks=None):
        # pylint: disable = attribute-defined-outside-init,arguments-differ
        """
        Fit gradient boosting classifier

        Parameters
        ----------
        X : array_like
            Feature matrix
        y : array_like
            Labels
        sample_weight : array_like
            Weight for each instance
        eval_set : list, optional
            A list of (X, y) pairs to use as a validation set for
            early-stopping
        sample_weight_eval_set : list, optional
            A list of the form [L_1, L_2, ..., L_n], where each L_i is a list of
            instance weights on the i-th validation set.
        eval_metric : str, callable, optional
            If a str, should be a built-in evaluation metric to use. See
            doc/parameter.rst. If callable, a custom evaluation metric. The call
            signature is func(y_predicted, y_true) where y_true will be a
            DMatrix object such that you may need to call the get_label
            method. It must return a str, value pair where the str is a name
            for the evaluation and value is the value of the evaluation
            function. This objective is always minimized.
        early_stopping_rounds : int, optional
            Activates early stopping. Validation error needs to decrease at
            least every <early_stopping_rounds> round(s) to continue training.
            Requires at least one item in evals. If there's more than one,
            will use the last. If early stopping occurs, the model will have
            three additional fields: bst.best_score, bst.best_iteration and
            bst.best_ntree_limit (bst.best_ntree_limit is the ntree_limit parameter
            default value in predict method if not any other value is specified).
            (Use bst.best_ntree_limit to get the correct value if num_parallel_tree
            and/or num_class appears in the parameters)
        verbose : bool
            If `verbose` and an evaluation set is used, writes the evaluation
            metric measured on the validation set to stderr.
        xgb_model : str
            file name of stored xgb model or 'Booster' instance Xgb model to be
            loaded before training (allows training continuation).
        callbacks : list of callback functions
            List of callback functions that are applied at end of each iteration.
            It is possible to use predefined callbacks by using :ref:`callback_api`.
            Example:

            .. code-block:: python

                [xgb.callback.reset_learning_rate(custom_rates)]
        """
        evals_result = {}
        self.classes_ = np.unique(y)
        self.n_classes_ = len(self.classes_)

        xgb_options = self.get_xgb_params()

        if callable(self.objective):
            obj = _objective_decorator(self.objective)
            # Use default value. Is it really not used ?
            xgb_options["objective"] = "binary:logistic"
        else:
            obj = None

        if self.n_classes_ > 2:
            # Switch to using a multiclass objective in the underlying XGB instance
            xgb_options["objective"] = "multi:softprob"
            xgb_options['num_class'] = self.n_classes_

        feval = eval_metric if callable(eval_metric) else None
        if eval_metric is not None:
            if callable(eval_metric):
                eval_metric = None
            else:
                xgb_options.update({"eval_metric": eval_metric})

        self._le = XGBLabelEncoder().fit(y)
        training_labels = self._le.transform(y)

        if eval_set is not None:
            if sample_weight_eval_set is None:
                sample_weight_eval_set = [None] * len(eval_set)
            evals = list(
                DMatrix(eval_set[i][0],
                        label=self._le.transform(eval_set[i][1]),
                        missing=self.missing,
                        weight=sample_weight_eval_set[i],
                        nthread=self.n_jobs)
                for i in range(len(eval_set)))
            nevals = len(evals)
            eval_names = ["validation_{}".format(i) for i in range(nevals)]
            evals = list(zip(evals, eval_names))
        else:
            evals = ()

        self._features_count = X.shape[1]

        if sample_weight is not None:
            train_dmatrix = DMatrix(X, label=training_labels, weight=sample_weight,
                                    missing=self.missing, nthread=self.n_jobs)
        else:
            train_dmatrix = DMatrix(X, label=training_labels,
                                    missing=self.missing, nthread=self.n_jobs)

        self._Booster = train(xgb_options, train_dmatrix, self.n_estimators,
                              evals=evals,
                              early_stopping_rounds=early_stopping_rounds,
                              evals_result=evals_result, obj=obj, feval=feval,
                              verbose_eval=verbose, xgb_model=xgb_model,
                              callbacks=callbacks)

        self.objective = xgb_options["objective"]
        if evals_result:
            for val in evals_result.items():
                evals_result_key = list(val[1].keys())[0]
                evals_result[val[0]][evals_result_key] = val[1][evals_result_key]
            self.evals_result_ = evals_result

        if early_stopping_rounds is not None:
            self.best_score = self._Booster.best_score
            self.best_iteration = self._Booster.best_iteration
            self.best_ntree_limit = self._Booster.best_ntree_limit

        return self

    def predict(self, data, output_margin=False, ntree_limit=None,
                validate_features=True):
        """
        Predict with `data`.

        .. note:: This function is not thread safe.

          For each booster object, predict can only be called from one thread.
          If you want to run prediction using multiple thread, call ``xgb.copy()`` to make copies
          of model object and then call ``predict()``.

        .. note:: Using ``predict()`` with DART booster

          If the booster object is DART type, ``predict()`` will perform dropouts, i.e. only
          some of the trees will be evaluated. This will produce incorrect results if ``data`` is
          not the training data. To obtain correct results on test sets, set ``ntree_limit`` to
          a nonzero value, e.g.

          .. code-block:: python

            preds = bst.predict(dtest, ntree_limit=num_round)

        Parameters
        ----------
        data : DMatrix
            The dmatrix storing the input.
        output_margin : bool
            Whether to output the raw untransformed margin value.
        ntree_limit : int
            Limit number of trees in the prediction; defaults to best_ntree_limit if defined
            (i.e. it has been trained with early stopping), otherwise 0 (use all trees).
        validate_features : bool
            When this is True, validate that the Booster's and data's feature_names are identical.
            Otherwise, it is assumed that the feature_names are the same.

        Returns
        -------
        prediction : numpy array
        """
        test_dmatrix = DMatrix(data, missing=self.missing, nthread=self.n_jobs)
        if ntree_limit is None:
            ntree_limit = getattr(self, "best_ntree_limit", 0)
        class_probs = self.get_booster().predict(test_dmatrix,
                                                 output_margin=output_margin,
                                                 ntree_limit=ntree_limit,
                                                 validate_features=validate_features)
        if output_margin:
            # If output_margin is active, simply return the scores
            return class_probs

        if len(class_probs.shape) > 1:
            column_indexes = np.argmax(class_probs, axis=1)
        else:
            column_indexes = np.repeat(0, class_probs.shape[0])
            column_indexes[class_probs > 0.5] = 1
        return self._le.inverse_transform(column_indexes)

    def predict_proba(self, data, ntree_limit=None, validate_features=True):
        """
        Predict the probability of each `data` example being of a given class.

        .. note:: This function is not thread safe

          For each booster object, predict can only be called from one thread.
          If you want to run prediction using multiple thread, call ``xgb.copy()`` to make copies
          of model object and then call predict

        Parameters
        ----------
        data : DMatrix
            The dmatrix storing the input.
        ntree_limit : int
            Limit number of trees in the prediction; defaults to best_ntree_limit if defined
            (i.e. it has been trained with early stopping), otherwise 0 (use all trees).
        validate_features : bool
            When this is True, validate that the Booster's and data's feature_names are identical.
            Otherwise, it is assumed that the feature_names are the same.

        Returns
        -------
        prediction : numpy array
            a numpy array with the probability of each data example being of a given class.
        """
        test_dmatrix = DMatrix(data, missing=self.missing, nthread=self.n_jobs)
        if ntree_limit is None:
            ntree_limit = getattr(self, "best_ntree_limit", 0)
        class_probs = self.get_booster().predict(test_dmatrix,
                                                 ntree_limit=ntree_limit,
                                                 validate_features=validate_features)
        if self.objective == "multi:softprob":
            return class_probs
        else:
            classone_probs = class_probs
            classzero_probs = 1.0 - classone_probs
            return np.vstack((classzero_probs, classone_probs)).transpose()

    def evals_result(self):
        """Return the evaluation results.

        If **eval_set** is passed to the `fit` function, you can call
        ``evals_result()`` to get evaluation results for all passed **eval_sets**.
        When **eval_metric** is also passed to the `fit` function, the
        **evals_result** will contain the **eval_metrics** passed to the `fit` function.

        Returns
        -------
        evals_result : dictionary

        Example
        -------

        .. code-block:: python

            param_dist = {'objective':'binary:logistic', 'n_estimators':2}

            clf = xgb.XGBClassifier(**param_dist)

            clf.fit(X_train, y_train,
                    eval_set=[(X_train, y_train), (X_test, y_test)],
                    eval_metric='logloss',
                    verbose=True)

            evals_result = clf.evals_result()

        The variable **evals_result** will contain

        .. code-block:: python

            {'validation_0': {'logloss': ['0.604835', '0.531479']},
            'validation_1': {'logloss': ['0.41965', '0.17686']}}
        """
        if self.evals_result_:
            evals_result = self.evals_result_
        else:
            raise XGBoostError('No results.')

        return evals_result
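For the binary case used in this post, the interesting part of predict and predict_proba in the source above is the post-processing of the booster's output: a probability strictly greater than 0.5 becomes class 1, and the two-column probability array is built as (1 - p, p). The sketch below restates only that step in plain Python so it runs without NumPy or XGBoost. The function names and the `classes` tuple (standing in for the label encoder) are hypothetical and not part of the library.

```python
def labels_from_probs(class_one_probs, classes=(0.0, 1.0), threshold=0.5):
    """Turn P(class 1) values into labels.

    Mirrors the binary branch of predict: only probabilities strictly
    above the threshold map to the second class.
    """
    return [classes[1] if p > threshold else classes[0] for p in class_one_probs]


def two_column_probs(class_one_probs):
    """Build [P(class 0), P(class 1)] rows, as in the binary branch of predict_proba."""
    return [[1.0 - p, p] for p in class_one_probs]


# Illustrative probabilities, not output from a trained model.
probs = [0.25, 0.5, 0.75]
print(labels_from_probs(probs))  # [0.0, 0.0, 1.0]
print(two_column_probs(probs))   # [[0.75, 0.25], [0.5, 0.5], [0.25, 0.75]]
```

Note that a probability of exactly 0.5 is assigned to class 0, because the comparison in the source is a strict greater-than.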
