sklearn: An introduction to and usage guide for the SelectFromModel class in sklearn.feature_selection

Contents

Introduction to SelectFromModel

1、Feature selection using SelectFromModel and LassoCV

2、L1-based feature selection

3、Tree-based feature selection

Using SelectFromModel

1、Source code of SelectFromModel


Introduction to SelectFromModel

SelectFromModel is a meta-transformer that can be used with any estimator that exposes a coef_ or feature_importances_ attribute after fitting. Features whose corresponding coef_ or feature_importances_ values fall below the given threshold parameter are considered unimportant and removed. Besides a numeric threshold, a string argument selects one of the built-in heuristics: "mean", "median", or a float multiple of either, such as "0.1*mean".
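The threshold options described above can be compared side by side. The following is a minimal sketch, not part of the original article: the choice of RandomForestClassifier, the iris data, and the variable names are assumptions made purely for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel

X, y = load_iris(return_X_y=True)

# Try one numeric threshold and the built-in string heuristics.
results = {}
for threshold in (0.2, "mean", "median", "0.1*mean"):
    selector = SelectFromModel(
        RandomForestClassifier(n_estimators=50, random_state=0),
        threshold=threshold,
    )
    selector.fit(X, y)
    # threshold_ is the numeric value the setting resolved to;
    # the second entry is the number of features that survive.
    results[threshold] = (selector.threshold_, selector.transform(X).shape[1])
    print(threshold, results[threshold])
```

A lower threshold such as "0.1*mean" keeps at least as many features as "mean", because every feature that passes the stricter cut also passes the looser one.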

Official documentation: https://scikit-learn.org/stable/modules/feature_selection.html#feature-selection-using-selectfrommodel

    """Meta-transformer for selecting features based on importance weights.

    .. versionadded:: 0.17

    Parameters
    ----------
    estimator : object
    The base estimator from which the transformer is built.
    This can be both a fitted (if ``prefit`` is set to True)
    or a non-fitted estimator. The estimator must have either a
    ``feature_importances_`` or ``coef_`` attribute after fitting.
    
    threshold : string, float, optional default None
    The threshold value to use for feature selection. Features whose
    importance is greater or equal are kept while the others are
    discarded. If "median" (resp. "mean"), then the ``threshold`` value is
    the median (resp. the mean) of the feature importances. A scaling
    factor (e.g., "1.25*mean") may also be used. If None and if the
    estimator has a parameter penalty set to l1, either explicitly
    or implicitly (e.g, Lasso), the threshold used is 1e-5.
    Otherwise, "mean" is used by default.
    
    prefit : bool, default False
    Whether a prefit model is expected to be passed into the constructor
    directly or not. If True, ``transform`` must be called directly
    and SelectFromModel cannot be used with ``cross_val_score``,
    ``GridSearchCV`` and similar utilities that clone the estimator.
    Otherwise train the model using ``fit`` and then ``transform`` to do
    feature selection.
    
    norm_order : non-zero int, inf, -inf, default 1
    Order of the norm used to filter the vectors of coefficients below
    ``threshold`` in the case where the ``coef_`` attribute of the
    estimator is of dimension 2.

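Parameter notes

estimator: the base estimator the transformer is built from, either already fitted (with prefit=True) or unfitted. After fitting it must expose feature_importances_ or coef_.

threshold: string or float, optional, default None. Features whose importance is greater than or equal to the threshold are kept; the rest are discarded. "median" or "mean" uses the median or mean of the feature importances, optionally scaled (e.g. "1.25*mean"). When it is None, the threshold is 1e-5 if the estimator uses an L1 penalty, explicitly or implicitly (e.g. Lasso), and "mean" otherwise.

prefit: bool, default False. Whether an already fitted model is passed to the constructor. If True, call transform directly; SelectFromModel then cannot be used with cross_val_score, GridSearchCV, or similar utilities that clone the estimator. If False, train with fit and then call transform.

norm_order: non-zero int, inf, or -inf, default 1. The order of the norm used to reduce a 2-dimensional coef_ to one score per feature before those scores are compared with threshold.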
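The default-threshold rule for threshold=None can be checked directly. This is a small sketch under assumptions of my own: the diabetes dataset, Lasso(alpha=1.0), and RandomForestRegressor are illustrative choices, not taken from the original article.

```python
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.ensemble import RandomForestRegressor
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import Lasso

X, y = load_diabetes(return_X_y=True)

# Lasso is implicitly L1-penalised, so threshold=None resolves to 1e-5.
lasso_selector = SelectFromModel(Lasso(alpha=1.0)).fit(X, y)
print(lasso_selector.threshold_)

# A forest has no L1 penalty, so threshold=None falls back to "mean".
forest_selector = SelectFromModel(
    RandomForestRegressor(n_estimators=50, random_state=0)
).fit(X, y)
forest_mean = forest_selector.estimator_.feature_importances_.mean()
print(forest_selector.threshold_, forest_mean)
```

The tiny 1e-5 cut for L1 models effectively means "keep every feature whose coefficient was not driven to zero".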

    Attributes
    ----------
    estimator_ : an estimator
    The base estimator from which the transformer is built.
    This is stored only when a non-fitted estimator is passed to the
    ``SelectFromModel``, i.e when prefit is False.
    
    threshold_ : float
    The threshold value used for feature selection.
    """

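Attribute notes

estimator_: the fitted base estimator. It is stored only when an unfitted estimator is passed to SelectFromModel, i.e. when prefit is False.

threshold_: float. The threshold value actually used for feature selection.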
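Because an unfitted estimator is cloned and stored as estimator_, a SelectFromModel with prefit=False can sit inside a Pipeline and be cross-validated. The sketch below assumes a pipeline layout, step names ("select", "clf"), and estimators chosen only for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline

X, y = load_iris(return_X_y=True)

pipe = Pipeline([
    ("select", SelectFromModel(ExtraTreesClassifier(n_estimators=50, random_state=0))),
    ("clf", LogisticRegression(max_iter=1000)),
])

# The unfitted selector can be cloned and refitted on every CV split.
scores = cross_val_score(pipe, X, y, cv=5)
print(scores.mean())

# After fitting, the fitted clone of the base estimator lives in estimator_,
# while the object passed to the constructor stays unfitted.
pipe.fit(X, y)
selector = pipe.named_steps["select"]
print(type(selector.estimator_).__name__, selector.threshold_)
```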

1、Feature selection using SelectFromModel and LassoCV

# Author: Manoj Kumar <mks542@nyu.edu>
# License: BSD 3 clause

# Note: load_boston was removed in scikit-learn 1.2, so this example
# needs an older release to run as written.
import matplotlib.pyplot as plt
import numpy as np

from sklearn.datasets import load_boston
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import LassoCV

# Load the boston dataset.
X, y = load_boston(return_X_y=True)

# We use the base estimator LassoCV since the L1 norm promotes sparsity of features.
clf = LassoCV()

# Set a minimum threshold of 0.25
sfm = SelectFromModel(clf, threshold=0.25)
sfm.fit(X, y)
X_transform = sfm.transform(X)
n_features = X_transform.shape[1]

# Reset the threshold till the number of features equals two.
# Note that the attribute can be set directly instead of repeatedly
# fitting the metatransformer.
while n_features > 2:
    sfm.threshold += 0.1
    X_transform = sfm.transform(X)
    n_features = X_transform.shape[1]

# Plot the selected two features from X.
plt.title("Features selected from Boston using SelectFromModel with "
          "threshold %0.3f." % sfm.threshold)
feature1 = X_transform[:, 0]
feature2 = X_transform[:, 1]
plt.plot(feature1, feature2, 'r.')
plt.xlabel("Feature number 1")
plt.ylabel("Feature number 2")
plt.ylim([np.min(feature2), np.max(feature2)])
plt.show()
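On newer scikit-learn releases the same "keep exactly two features" goal can be reached without a threshold loop. The sketch below rests on two assumptions that are not in the original article: it relies on the max_features parameter (added to SelectFromModel in scikit-learn 0.20, and absent from the docstring quoted above), and it swaps in load_diabetes because load_boston is no longer shipped.

```python
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import LassoCV

X, y = load_diabetes(return_X_y=True)

# threshold=-np.inf disables the threshold, so max_features alone decides
# how many of the highest-scoring features are kept.
sfm = SelectFromModel(LassoCV(), threshold=-np.inf, max_features=2)
X_transform = sfm.fit(X, y).transform(X)
print(X_transform.shape)
```

This fits the model once and selects by rank, whereas the loop above raises the threshold in steps of 0.1 until only two features remain.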

2、L1-based feature selection

>>> from sklearn.svm import LinearSVC
>>> from sklearn.datasets import load_iris
>>> from sklearn.feature_selection import SelectFromModel
>>> X, y = load_iris(return_X_y=True)
>>> X.shape
(150, 4)
>>> lsvc = LinearSVC(C=0.01, penalty="l1", dual=False).fit(X, y)
>>> model = SelectFromModel(lsvc, prefit=True)
>>> X_new = model.transform(X)
>>> X_new.shape
(150, 3)
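For a multiclass linear model such as the LinearSVC above, coef_ is 2-dimensional (one row per class), and norm_order controls how it is collapsed into one score per feature. The sketch below reproduces that step by hand; it fits through SelectFromModel rather than using prefit=True, and the random_state value and variable names are assumptions added for illustration.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectFromModel
from sklearn.svm import LinearSVC

X, y = load_iris(return_X_y=True)

lsvc = LinearSVC(C=0.01, penalty="l1", dual=False, random_state=0)
selector = SelectFromModel(lsvc, norm_order=1).fit(X, y)

coef = selector.estimator_.coef_  # shape (n_classes, n_features)
# Collapse the per-class coefficients into one L1-norm score per feature.
scores = np.linalg.norm(coef, ord=1, axis=0)
# penalty="l1" means the default threshold resolves to 1e-5.
manual_mask = scores >= selector.threshold_

print(coef.shape, scores, selector.get_support())
```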

3、Tree-based feature selection

>>> from sklearn.ensemble import ExtraTreesClassifier
>>> from sklearn.datasets import load_iris
>>> from sklearn.feature_selection import SelectFromModel
>>> X, y = load_iris(return_X_y=True)
>>> X.shape
(150, 4)
>>> clf = ExtraTreesClassifier(n_estimators=50)
>>> clf = clf.fit(X, y)
>>> clf.feature_importances_
array([ 0.04...,  0.05...,  0.4...,  0.4...])
>>> model = SelectFromModel(clf, prefit=True)
>>> X_new = model.transform(X)
>>> X_new.shape
(150, 2)
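The transformed array no longer says which columns survived. A short sketch of recovering them with get_support() follows; it fits the selector directly instead of passing a prefit model, and random_state=0 is an assumption added for reproducibility.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.feature_selection import SelectFromModel

iris = load_iris()
X, y = iris.data, iris.target

selector = SelectFromModel(
    ExtraTreesClassifier(n_estimators=50, random_state=0)
).fit(X, y)

# Boolean mask over the input columns: True means the feature is kept.
mask = selector.get_support()
kept = [name for name, keep in zip(iris.feature_names, mask) if keep]
print(kept)
```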

Using SelectFromModel

1、Source code of SelectFromModel

# class SelectFromModel, found at: sklearn.feature_selection.from_model

class SelectFromModel(BaseEstimator, SelectorMixin, MetaEstimatorMixin):
    """Meta-transformer for selecting features based on importance weights.

    .. versionadded:: 0.17

    Parameters
    ----------
    estimator : object
        The base estimator from which the transformer is built.
        This can be both a fitted (if ``prefit`` is set to True)
        or a non-fitted estimator. The estimator must have either a
        ``feature_importances_`` or ``coef_`` attribute after fitting.

    threshold : string, float, optional default None
        The threshold value to use for feature selection. Features whose
        importance is greater or equal are kept while the others are
        discarded. If "median" (resp. "mean"), then the ``threshold`` value is
        the median (resp. the mean) of the feature importances. A scaling
        factor (e.g., "1.25*mean") may also be used. If None and if the
        estimator has a parameter penalty set to l1, either explicitly
        or implicitly (e.g, Lasso), the threshold used is 1e-5.
        Otherwise, "mean" is used by default.

    prefit : bool, default False
        Whether a prefit model is expected to be passed into the constructor
        directly or not. If True, ``transform`` must be called directly
        and SelectFromModel cannot be used with ``cross_val_score``,
        ``GridSearchCV`` and similar utilities that clone the estimator.
        Otherwise train the model using ``fit`` and then ``transform`` to do
        feature selection.

    norm_order : non-zero int, inf, -inf, default 1
        Order of the norm used to filter the vectors of coefficients below
        ``threshold`` in the case where the ``coef_`` attribute of the
        estimator is of dimension 2.

    Attributes
    ----------
    estimator_ : an estimator
        The base estimator from which the transformer is built.
        This is stored only when a non-fitted estimator is passed to the
        ``SelectFromModel``, i.e when prefit is False.

    threshold_ : float
        The threshold value used for feature selection.
    """
    def __init__(self, estimator, threshold=None, prefit=False, norm_order=1):
        self.estimator = estimator
        self.threshold = threshold
        self.prefit = prefit
        self.norm_order = norm_order

    def _get_support_mask(self):
        # SelectFromModel can directly call on transform.
        if self.prefit:
            estimator = self.estimator
        elif hasattr(self, 'estimator_'):
            estimator = self.estimator_
        else:
            raise ValueError('Either fit SelectFromModel before transform or set "prefit='
                             'True" and pass a fitted estimator to the constructor.')
        scores = _get_feature_importances(estimator, self.norm_order)
        threshold = _calculate_threshold(estimator, scores, self.threshold)
        return scores >= threshold

    def fit(self, X, y=None, **fit_params):
        """Fit the SelectFromModel meta-transformer.

        Parameters
        ----------
        X : array-like of shape (n_samples, n_features)
            The training input samples.

        y : array-like, shape (n_samples,)
            The target values (integers that correspond to classes in
            classification, real numbers in regression).

        **fit_params : Other estimator specific parameters

        Returns
        -------
        self : object
            Returns self.
        """
        if self.prefit:
            raise NotFittedError("Since 'prefit=True', call transform directly")
        self.estimator_ = clone(self.estimator)
        self.estimator_.fit(X, y, **fit_params)
        return self

    @property
    def threshold_(self):
        scores = _get_feature_importances(self.estimator_, self.norm_order)
        return _calculate_threshold(self.estimator, scores, self.threshold)

    @if_delegate_has_method('estimator')
    def partial_fit(self, X, y=None, **fit_params):
        """Fit the SelectFromModel meta-transformer only once.

        Parameters
        ----------
        X : array-like of shape (n_samples, n_features)
            The training input samples.

        y : array-like, shape (n_samples,)
            The target values (integers that correspond to classes in
            classification, real numbers in regression).

        **fit_params : Other estimator specific parameters

        Returns
        -------
        self : object
            Returns self.
        """
        if self.prefit:
            raise NotFittedError("Since 'prefit=True', call transform directly")
        if not hasattr(self, "estimator_"):
            self.estimator_ = clone(self.estimator)
        self.estimator_.partial_fit(X, y, **fit_params)
        return self
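The listing calls a private helper, _calculate_threshold, whose body is not shown. The following is a simplified, hypothetical stand-in written from the documented behaviour of the threshold parameter, not a copy of scikit-learn's code; the function name calculate_threshold_sketch and its l1_penalized flag are invented for illustration.

```python
import numpy as np


def calculate_threshold_sketch(scores, threshold, l1_penalized=False):
    """Resolve a threshold setting to a number (hypothetical helper)."""
    if threshold is None:
        # Documented default: 1e-5 for L1-penalised models, else "mean".
        threshold = 1e-5 if l1_penalized else "mean"
    if isinstance(threshold, str):
        scale, reference = 1.0, threshold
        if "*" in threshold:
            # Strings such as "1.25*mean" carry a scaling factor.
            scale_text, reference = threshold.split("*")
            scale = float(scale_text.strip())
        reference = reference.strip()
        if reference == "mean":
            return scale * np.mean(scores)
        if reference == "median":
            return scale * np.median(scores)
        raise ValueError("Unknown reference: %r" % reference)
    return float(threshold)


scores = np.array([0.05, 0.05, 0.4, 0.5])
# Same comparison as the last line of _get_support_mask.
mask = scores >= calculate_threshold_sketch(scores, "mean")
print(mask)
```

The final comparison mirrors `return scores >= threshold` in the listing: a feature is kept when its score is at least the resolved threshold.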
