ML with sklearn: a detailed guide to the code and usage of RobustScaler, KFold, and cross_val_score
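Contents
RobustScaler in sklearn: code walkthrough and usage
RobustScaler code walkthrough
RobustScaler usage
KFold in sklearn: code walkthrough and usage
KFold code walkthrough
KFold usage
cross_val_score in sklearn: code walkthrough and usage
cross_val_score code walkthrough
Values accepted by the scoring parameter
cross_val_score usage
1. Regression: diabetes dataset
2. Classification: iris dataset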
RobustScaler in sklearn: code walkthrough and usage
RobustScaler code walkthrough
class RobustScaler(BaseEstimator, TransformerMixin):
Scale features using statistics that are robust to outliers.
This scaler removes the median and scales the data according to the quantile range (defaults to IQR: interquartile range).
Standardization of a dataset is a common requirement for many machine learning estimators. Typically this is done by removing the mean and scaling to unit variance. However, outliers can often influence the sample mean / variance in a negative way. In such cases, the median and the interquartile range often give better results.
.. versionadded:: 0.17
Read more in the User Guide (preprocessing_scaler).
Parameters
with_scaling : boolean, True by default
quantile_range : tuple (q_min, q_max), 0.0 < q_min < q_max < 100.0 (added in version 0.18)
copy : boolean, optional, default is True
Attributes
scale_ : array of floats (added in version 0.17)
See also
sklearn.decomposition.PCA
Notes
https://en.wikipedia.org/wiki/Median_(statistics)
Outline of the class source (abridged in the original):
def __init__(self, with_centering=True, with_scaling=True, ...
def _check_array(self, X, copy):
    if sparse.issparse(X): ...
def fit(self, X, y=None):
    if self.with_scaling:
        q = np.percentile(X, self.quantile_range, axis=0)
def transform(self, X):
    (docstring fragment) Can be called on sparse input, provided that ``RobustScaler`` has been ...
    if sparse.issparse(X): ...
def inverse_transform(self, X):
    if sparse.issparse(X): ...
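The docstring's description (subtract the median, divide by the interquartile range) can be checked by hand. The following is a minimal sketch, assuming the default `quantile_range` of (25.0, 75.0); the toy array is made up for illustration and is not from the original article.

```python
import numpy as np
from sklearn.preprocessing import RobustScaler

# Toy data (illustrative only): one feature with a single extreme outlier.
X = np.array([[1.0], [2.0], [3.0], [4.0], [100.0]])

scaler = RobustScaler()  # defaults: with_centering=True, with_scaling=True
X_scaled = scaler.fit_transform(X)

# Reproduce the transform by hand: subtract the median, divide by the IQR.
median = np.median(X, axis=0)
q25, q75 = np.percentile(X, [25.0, 75.0], axis=0)
X_manual = (X - median) / (q75 - q25)

print(scaler.center_, scaler.scale_)    # [3.] [2.]
print(np.allclose(X_scaled, X_manual))  # True
```

The outlier (100.0) does not move the median or the IQR, so the four ordinary values keep a sensible scale, which is the motivation given in the docstring.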
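RobustScaler usage
# Requires make_pipeline (sklearn.pipeline), RobustScaler (sklearn.preprocessing), and Lasso / ElasticNet (sklearn.linear_model).
lasso = make_pipeline(RobustScaler(), Lasso(alpha=0.5, random_state=1))
ENet = make_pipeline(RobustScaler(), ElasticNet(alpha=0.5, l1_ratio=.9, random_state=3))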
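A pipeline like the ones above is usually evaluated as a single estimator. Here is a hedged, self-contained sketch of that: the synthetic data from `make_regression`, the 5 folds, and the `r2` metric are assumptions chosen for illustration, not part of the original article.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import RobustScaler

# Synthetic regression data (assumption: stands in for the author's dataset).
X, y = make_regression(n_samples=200, n_features=10, noise=10.0, random_state=0)

# Same pipeline as in the article: robust scaling followed by Lasso.
lasso = make_pipeline(RobustScaler(), Lasso(alpha=0.5, random_state=1))

# The whole pipeline is cloned and refit on each training fold, so the
# scaler's median and IQR are never computed from the held-out fold.
scores = cross_val_score(lasso, X, y, cv=5, scoring="r2")
print(scores.mean())
```

Putting the scaler inside the pipeline, rather than scaling the full dataset first, keeps the validation folds out of the preprocessing statistics.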
KFold in sklearn: code walkthrough and usage
KFold code walkthrough
class KFold, found at sklearn.model_selection._split
class KFold(_BaseKFold):
shuffle : boolean, optional
random_state : int, RandomState instance or None, optional
Examples
--------
>>> from sklearn.model_selection import KFold
>>> X = np.array([[1, 2], [3, 4], [1, 2], [3, 4]])
>>> y = np.array([1, 2, 3, 4])
>>> kf = KFold(n_splits=2)
>>> kf.get_n_splits(X)
2
>>> print(kf)  # doctest: +NORMALIZE_WHITESPACE
KFold(n_splits=2, random_state=None, shuffle=False)
>>> for train_index, test_index in kf.split(X):
...     print("TRAIN:", train_index, "TEST:", test_index)
...     X_train, X_test = X[train_index], X[test_index]
...     y_train, y_test = y[train_index], y[test_index]
TRAIN: [2 3] TEST: [0 1]
TRAIN: [0 1] TEST: [2 3]
Notes
-----
The first ``n_samples % n_splits`` folds have size ``n_samples // n_splits + 1``, other folds have size ``n_samples // n_splits``, where ``n_samples`` is the number of samples.
See also
--------
StratifiedKFold: Takes class information into account to avoid building folds with imbalanced class distributions (for binary or multiclass classification tasks).
GroupKFold: K-fold iterator variant with non-overlapping groups.
RepeatedKFold: Repeats K-Fold n times.
def __init__(self, n_splits=3, shuffle=False, random_state=None):
    super(KFold, self).__init__(n_splits, shuffle, random_state)

def _iter_test_indices(self, X, y=None, groups=None):
    n_samples = _num_samples(X)
    indices = np.arange(n_samples)
    if self.shuffle:
        check_random_state(self.random_state).shuffle(indices)
    n_splits = self.n_splits
    # This older source used dtype=np.int; that alias was removed in NumPy 1.24, so plain int is used here.
    fold_sizes = (n_samples // n_splits) * np.ones(n_splits, dtype=int)
    # The first n_samples % n_splits folds receive one extra sample.
    fold_sizes[:n_samples % n_splits] += 1
    current = 0
    for fold_size in fold_sizes:
        start, stop = current, current + fold_size
        yield indices[start:stop]
        current = stop
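The fold-size rule implemented by `_iter_test_indices` can be observed from the public API. A small sketch, assuming 10 samples and 3 splits (numbers picked only to make the remainder non-zero):

```python
import numpy as np
from sklearn.model_selection import KFold

X = np.arange(10).reshape(-1, 1)  # 10 samples, 1 feature

# Without shuffling, the test folds are consecutive index blocks.
kf = KFold(n_splits=3)
fold_sizes = [len(test_index) for _, test_index in kf.split(X)]
print(fold_sizes)  # [4, 3, 3]: 10 % 3 = 1 fold gets 10 // 3 + 1 samples

# With shuffling, indices are permuted first, but every sample still
# lands in exactly one test fold.
kf_shuffled = KFold(n_splits=3, shuffle=True, random_state=0)
all_test = np.concatenate([test for _, test in kf_shuffled.split(X)])
print(sorted(all_test.tolist()) == list(range(10)))  # True
```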
KFold usage
>>> import numpy as np
>>> from sklearn.model_selection import KFold
>>> X = np.array([[1, 2], [3, 4], [1, 2], [3, 4]])
>>> y = np.array([1, 2, 3, 4])
>>> kf = KFold(n_splits=2)
>>> kf.get_n_splits(X)
2
>>> print(kf)  # doctest: +NORMALIZE_WHITESPACE
KFold(n_splits=2, random_state=None, shuffle=False)
>>> for train_index, test_index in kf.split(X):
...     print("TRAIN:", train_index, "TEST:", test_index)
...     X_train, X_test = X[train_index], X[test_index]
...     y_train, y_test = y[train_index], y[test_index]
TRAIN: [2 3] TEST: [0 1]
TRAIN: [0 1] TEST: [2 3]
cross_val_score in sklearn: code walkthrough and usage
cross_val_score code walkthrough
def cross_val_score, found at sklearn.model_selection._validation
def cross_val_score(estimator, X, y=None, groups=None, scoring=None, cv=None, n_jobs=1, verbose=0, fit_params=None, pre_dispatch='2*n_jobs'):
Evaluate a score by cross-validation. Read more in the User Guide.
Parameters
----------
estimator : estimator object implementing 'fit'
    The object to use to fit the data.
X : array-like
    The data to fit. Can be, for example, a list or an array.
y : array-like, optional, default: None
    The target variable to try to predict in the case of supervised learning.
groups : array-like, with shape (n_samples,), optional
    Group labels for the samples used while splitting the dataset into train/test set.
scoring : string, callable or None, optional, default: None
    A string (see model evaluation documentation) or a scorer callable object / function with signature ``scorer(estimator, X, y)``.
cv : int, cross-validation generator or an iterable, optional
    Determines the cross-validation splitting strategy. Possible inputs for cv are:
    - None, to use the default 3-fold cross validation (5-fold since scikit-learn 0.22),
    - integer, to specify the number of folds in a `(Stratified)KFold`,
    - An object to be used as a cross-validation generator.
    - An iterable yielding train, test splits.
    For integer/None inputs, if the estimator is a classifier and ``y`` is either binary or multiclass, StratifiedKFold is used. In all other cases, KFold is used.
    Refer to the User Guide (cross_validation) for the various cross-validation strategies that can be used here.
n_jobs : integer, optional
    The number of CPUs to use to do the computation. -1 means 'all CPUs'.
verbose : integer, optional
    The verbosity level.
fit_params : dict, optional
    Parameters to pass to the fit method of the estimator.
pre_dispatch : int, or string, optional
    Controls the number of jobs that get dispatched during parallel execution. Reducing this number can be useful to avoid an explosion of memory consumption when more jobs get dispatched than CPUs can process. This parameter can be:
    - None, in which case all the jobs are immediately created and spawned. Use this for lightweight and fast-running jobs, to avoid delays due to on-demand spawning of the jobs
    - An int, giving the exact number of total jobs that are spawned
    - A string, giving an expression as a function of n_jobs, as in '2*n_jobs'
Returns
-------
scores : array of float, shape=(len(list(cv)),)
    Array of scores of the estimator for each run of the cross validation.
Examples
--------
>>> from sklearn import datasets, linear_model
>>> from sklearn.model_selection import cross_val_score
>>> diabetes = datasets.load_diabetes()
>>> X = diabetes.data[:150]
>>> y = diabetes.target[:150]
>>> lasso = linear_model.Lasso()
>>> print(cross_val_score(lasso, X, y))  # doctest: +ELLIPSIS
[ 0.33150734 0.08022311 0.03531764]
See Also
---------
sklearn.model_selection.cross_validate: To run cross-validation on multiple metrics and also to return train scores, fit times and score times.
sklearn.metrics.make_scorer: Make a scorer from a performance metric or loss function.

Function body:
# To ensure multimetric format is not supported
scorer = check_scoring(estimator, scoring=scoring)
cv_results = cross_validate(estimator=estimator, X=X, y=y, groups=groups,
                            scoring={'score': scorer}, cv=cv,
                            return_train_score=False, n_jobs=n_jobs,
                            verbose=verbose, fit_params=fit_params,
                            pre_dispatch=pre_dispatch)
return cv_results['test_score']
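Two of the parameters described above accept objects rather than plain values: `cv` can be a splitter instance and `scoring` can be a callable with the `scorer(estimator, X, y)` signature. The sketch below illustrates both under stated assumptions: the helper name `neg_mae`, the `alpha=0.1` setting, and the shuffled 5-fold splitter are hypothetical choices for illustration only.

```python
from sklearn.datasets import load_diabetes
from sklearn.linear_model import Lasso
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import KFold, cross_val_score

X, y = load_diabetes(return_X_y=True)


def neg_mae(estimator, X, y):
    """Hypothetical scorer with the (estimator, X, y) signature.

    Returns the negated mean absolute error so that higher is better.
    """
    return -mean_absolute_error(y, estimator.predict(X))


# An explicit splitter object instead of an integer number of folds.
cv = KFold(n_splits=5, shuffle=True, random_state=0)

scores = cross_val_score(Lasso(alpha=0.1), X, y, scoring=neg_mae, cv=cv)
print(scores)  # one score per fold
```

Passing a splitter makes the shuffling and the seed explicit, which an integer `cv` does not.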
Values accepted by the scoring parameter
Source: 3.3. Metrics and scoring: quantifying the quality of predictions (scikit-learn 1.2.2 documentation)
Scoring | Function | Comment
---|---|---
Classification | |
'accuracy' | metrics.accuracy_score |
'balanced_accuracy' | metrics.balanced_accuracy_score |
'average_precision' | metrics.average_precision_score |
'neg_brier_score' | metrics.brier_score_loss |
'f1' | metrics.f1_score | for binary targets
'f1_micro' | metrics.f1_score | micro-averaged
'f1_macro' | metrics.f1_score | macro-averaged
'f1_weighted' | metrics.f1_score | weighted average
'f1_samples' | metrics.f1_score | by multilabel sample
'neg_log_loss' | metrics.log_loss | requires
'precision' etc. | metrics.precision_score | suffixes apply as with 'f1'
'recall' etc. | metrics.recall_score | suffixes apply as with 'f1'
'jaccard' etc. | metrics.jaccard_score | suffixes apply as with 'f1'
'roc_auc' | metrics.roc_auc_score |
'roc_auc_ovr' | metrics.roc_auc_score |
'roc_auc_ovo' | metrics.roc_auc_score |
'roc_auc_ovr_weighted' | metrics.roc_auc_score |
'roc_auc_ovo_weighted' | metrics.roc_auc_score |
Clustering | |
'adjusted_mutual_info_score' | metrics.adjusted_mutual_info_score |
'adjusted_rand_score' | metrics.adjusted_rand_score |
'completeness_score' | metrics.completeness_score |
'fowlkes_mallows_score' | metrics.fowlkes_mallows_score |
'homogeneity_score' | metrics.homogeneity_score |
'mutual_info_score' | metrics.mutual_info_score |
'normalized_mutual_info_score' | metrics.normalized_mutual_info_score |
'v_measure_score' | metrics.v_measure_score |
Regression | |
'explained_variance' | metrics.explained_variance_score |
'max_error' | metrics.max_error |
'neg_mean_absolute_error' | metrics.mean_absolute_error |
'neg_mean_squared_error' | metrics.mean_squared_error |
'neg_root_mean_squared_error' | metrics.mean_squared_error |
'neg_mean_squared_log_error' | metrics.mean_squared_log_error |
'neg_median_absolute_error' | metrics.median_absolute_error |
'r2' | metrics.r2_score |
'neg_mean_poisson_deviance' | metrics.mean_poisson_deviance |
'neg_mean_gamma_deviance' | metrics.mean_gamma_deviance |
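Several names in the table carry a `neg_` prefix because scorers follow a "greater is better" convention, so loss functions are reported with their sign flipped. A brief sketch of using such a string and converting back to a familiar error value; the `Ridge` model and its `alpha=1.0` are assumptions picked for illustration:

```python
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score

X, y = load_diabetes(return_X_y=True)

# The scoring string selects the metric; the result is the negated MSE per fold.
neg_mse = cross_val_score(Ridge(alpha=1.0), X, y, cv=5,
                          scoring="neg_mean_squared_error")

# Flip the sign and take the square root to report RMSE in target units.
rmse = np.sqrt(-neg_mse)
print(rmse.mean())
```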
cross_val_score usage
1. Regression: diabetes dataset
>>> from sklearn import datasets, linear_model
>>> from sklearn.model_selection import cross_val_score
>>> diabetes = datasets.load_diabetes()
>>> X = diabetes.data[:150]
>>> y = diabetes.target[:150]
>>> lasso = linear_model.Lasso()
>>> print(cross_val_score(lasso, X, y))  # doctest: +ELLIPSIS
[ 0.33150734 0.08022311 0.03531764]
Three scores are printed because cv defaulted to 3 folds in the release this output comes from; since scikit-learn 0.22 the default is 5 folds.
2. Classification: iris dataset
from sklearn import datasets  # bundled datasets
from sklearn.model_selection import train_test_split, cross_val_score  # data splitting and cross-validation
from sklearn.neighbors import KNeighborsClassifier  # a simple supervised classifier whose main hyperparameter is K
import matplotlib.pyplot as plt
iris = datasets.load_iris()  # load the iris dataset bundled with sklearn
X = iris.data  # feature matrix
y = iris.target  # class label for each sample
train_X, test_X, train_y, test_y = train_test_split(X, y, test_size=1/3, random_state=3)  # hold out 1/3 of the data for testing; train on the rest
k_range = range(1, 31)
cv_scores = []  # mean cross-validated accuracy for each K
for n in k_range:
    knn = KNeighborsClassifier(n_neighbors=n)  # one hyperparameter can be scanned in a loop; for several, use GridSearchCV
    scores = cross_val_score(knn, train_X, train_y, cv=10, scoring='accuracy')  # cv: number of folds; 'accuracy' is also the default score for classifiers (other options are in the scoring table above)
    cv_scores.append(scores.mean())
plt.plot(k_range, cv_scores)
plt.xlabel('K')
plt.ylabel('Accuracy')  # choose the best K from the plot
plt.show()
best_knn = KNeighborsClassifier(n_neighbors=3)  # K=3, read off the plot as the best value
best_knn.fit(train_X, train_y)  # train on the training set
print(best_knn.score(test_X, test_y))  # accuracy on the held-out test set
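The loop above scans a single hyperparameter; its comment points to GridSearchCV for the multi-parameter case. The following is a rough sketch of that alternative, not the author's code: the second hyperparameter (`weights`) and its two candidate values are assumptions added only to make the grid two-dimensional.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
train_X, test_X, train_y, test_y = train_test_split(
    X, y, test_size=1/3, random_state=3)

# Grid over K plus one extra hyperparameter (assumed for illustration).
param_grid = {
    "n_neighbors": list(range(1, 31)),
    "weights": ["uniform", "distance"],
}

# Each combination is scored with 10-fold cross-validation on the training set.
search = GridSearchCV(KNeighborsClassifier(), param_grid, cv=10, scoring="accuracy")
search.fit(train_X, train_y)

print(search.best_params_, search.best_score_)
print(search.score(test_X, test_y))  # accuracy of the refit best model on the test set
```

Because `refit=True` is the default, `search` is retrained on the full training set with the best combination, so it can be scored on the test set directly instead of reading K off a plot.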