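Background

I've recently been organizing some feature-engineering material, and when I got to feature-selection algorithms I came across ReliefF. It seemed like a common algorithm that sklearn ought to provide, but I searched the API and couldn't find it. Maybe it's too simple? Or maybe it goes by a different name? In any case, when in doubt, check GitHub first: someone there has written an implementation of ReliefF. As far as I can tell the implementation is fine, and the evaluation criterion it uses appears to be based on k-nearest neighbors (KNN).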

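Results

Link to the original GitHub source
However, the last line of the source code has a problem: the transform method returns only a single feature, whereas it should return all of the top n_features_to_keep features.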

return X[:, self.top_features[self.n_features_to_keep]]

It should be

return X[:, self.top_features[:self.n_features_to_keep]]
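The difference between indexing the ranking with a single integer and slicing it can be checked on a small array. This is only a minimal sketch with made-up values: `X_demo`, `top_features`, and `k` are hypothetical stand-ins for the feature matrix, `self.top_features`, and `self.n_features_to_keep`.

```python
import numpy as np

# Toy feature matrix: 3 samples, 5 features (values are arbitrary).
X_demo = np.arange(15).reshape(3, 5)
# Hypothetical ranking of feature indices, best first.
top_features = np.array([4, 0, 2, 1, 3])
k = 2

# Indexing the ranking with a single integer selects exactly one column.
single = X_demo[:, top_features[k]]
# Slicing the ranking selects the top-k columns.
top_k = X_demo[:, top_features[:k]]

print(single.shape)  # (3,)
print(top_k.shape)   # (3, 2)
```

The first form silently drops every feature except the one at position `k` in the ranking, which is why the transform method returned a single feature.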

Test it using the example from the sklearn documentation for the RFE feature-selection method:

Imports and environment

import matplotlib as mpl
import matplotlib.pyplot as plt
import numpy as np
import sklearn
import pandas as pd
import os
import sys
import time

print("------- Runtime environment -------")
print(sys.version_info)
for module in mpl, np, pd, sklearn:
    print(module.__name__, module.__version__)

------- Runtime environment -------
sys.version_info(major=3, minor=7, micro=7, releaselevel='final', serial=0)
matplotlib 3.3.1
numpy 1.19.1
pandas 1.1.1
sklearn 0.23.2

Dataset

from sklearn.datasets import make_friedman1
from sklearn.feature_selection import RFE
from sklearn.svm import SVR
X, y = make_friedman1(n_samples=5000, n_features=100, random_state=0)
n_features_to_keep = 10

reliefF

from sklearn.neighbors import KDTree


class ReliefF(object):
    """Feature selection using data-mined expert knowledge.

    Based on the ReliefF algorithm as introduced in:
    Kononenko, Igor et al. Overcoming the myopia of inductive learning
    algorithms with RELIEFF (1997), Applied Intelligence, 7(1), p39-55
    """

    def __init__(self, n_neighbors=100, n_features_to_keep=n_features_to_keep):
        """Sets up ReliefF to perform feature selection.

        Parameters
        ----------
        n_neighbors: int (default: 100)
            The number of neighbors to consider when assigning feature
            importance scores. More neighbors results in more accurate
            scores, but takes longer.

        Returns
        -------
        None
        """
        self.feature_scores = None
        self.top_features = None
        self.tree = None
        self.n_neighbors = n_neighbors
        self.n_features_to_keep = n_features_to_keep

    def fit(self, X, y):
        """Computes the feature importance scores from the training data.

        Parameters
        ----------
        X: array-like {n_samples, n_features}
            Training instances to compute the feature importance scores from
        y: array-like {n_samples}
            Training labels

        Returns
        -------
        None
        """
        self.feature_scores = np.zeros(X.shape[1])
        self.tree = KDTree(X)

        for source_index in range(X.shape[0]):
            distances, indices = self.tree.query(
                X[source_index].reshape(1, -1), k=self.n_neighbors + 1)

            # First match is self, so ignore it
            for neighbor_index in indices[0][1:]:
                similar_features = X[source_index] == X[neighbor_index]
                label_match = y[source_index] == y[neighbor_index]

                # If the labels match, then increment features that match
                # and decrement features that do not match
                # Do the opposite if the labels do not match
                if label_match:
                    self.feature_scores[similar_features] += 1.
                    self.feature_scores[~similar_features] -= 1.
                else:
                    self.feature_scores[~similar_features] += 1.
                    self.feature_scores[similar_features] -= 1.

        self.top_features = np.argsort(self.feature_scores)[::-1]

    def transform(self, X):
        """Reduces the feature set down to the top `n_features_to_keep` features.

        Parameters
        ----------
        X: array-like {n_samples, n_features}
            Feature matrix to perform feature selection on

        Returns
        -------
        X_reduced: array-like {n_samples, n_features_to_keep}
            Reduced feature matrix
        """
        return X[:, self.top_features[:self.n_features_to_keep]]


rel = ReliefF()
rel.fit(X, y)
print(rel.top_features)

[99 36 26 27 28 29 30 31 32 33 34 35 37 98 38 39 40 41 42 43 44 45 46 47
25 24 23 22 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
21 48 49 50 75 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95
96 97 76 74 51 73 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69
70 71 72 0]
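
One caveat: something may be off here, whether in the method itself, in my understanding of it, or in how I'm using the test dataset, because after fit finishes every value in feature_scores is 5000.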
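A possible explanation, offered as a guess rather than a confirmed diagnosis: the implementation compares feature values and labels with exact equality (`==`), which suits discrete features and class labels, whereas `make_friedman1` returns continuous features and a continuous regression target. The sketch below checks that assumption on a smaller sample than the run above, so it finishes quickly; the names `X_small` and `y_small` are illustrative.

```python
import numpy as np
from sklearn.datasets import make_friedman1

# Smaller sample than in the post, only to keep the check fast.
X_small, y_small = make_friedman1(n_samples=200, n_features=10, random_state=0)

# Count exact matches between the first sample and every other sample,
# mirroring the `==` comparisons used inside ReliefF.fit.
feature_matches = int((X_small[1:] == X_small[0]).sum())
label_matches = int((y_small[1:] == y_small[0]).sum())

print(feature_matches, label_matches)
```

If both counts are zero, every neighbor comparison takes the "labels do not match" branch with no matching features, so each feature's score is incremented by the same amount and the resulting ordering carries no information about feature relevance. Under that reading, this implementation would be better suited to a classification dataset with discrete features.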

estimator = SVR(kernel="linear")
rfe = RFE(estimator, n_features_to_select=n_features_to_keep, step=5)
rfe = rfe.fit(X, y)
print(rfe.ranking_)

[ 1 1 19 1 1 13 12 4 14 7 1 10 8 18 7 19 14 5 10 4 13 14 7 17
3 16 18 8 3 8 6 6 13 18 2 5 12 17 12 2 17 10 9 11 7 15 9 16
9 2 8 4 18 5 15 4 2 6 3 9 10 2 1 1 4 16 12 11 13 11 8 11
5 17 6 1 1 16 19 13 19 19 15 3 11 1 14 12 10 6 17 7 14 18 3 15
16 9 15 5]
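
In `ranking_`, the features marked 1 are the ones RFE kept. A short sketch of how the selected column indices can be read from `support_` and how the matrix can be reduced with `transform` follows. It runs on a smaller sample than above so the linear SVR fits quickly; the sizes and the names `X_small`, `y_small`, and `rfe_small` are illustrative choices, not part of the original run.

```python
import numpy as np
from sklearn.datasets import make_friedman1
from sklearn.feature_selection import RFE
from sklearn.svm import SVR

X_small, y_small = make_friedman1(n_samples=200, n_features=20, random_state=0)

rfe_small = RFE(SVR(kernel="linear"), n_features_to_select=5, step=5)
rfe_small.fit(X_small, y_small)

# support_ is a boolean mask; ranking_ == 1 marks the same selected features.
selected = np.flatnonzero(rfe_small.support_)
# transform keeps only the selected columns.
X_reduced = rfe_small.transform(X_small)

print(selected)
print(X_reduced.shape)
```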
