#Citation

##LaTeX

@article{GARCIAPEDRAJAS2013150,
title = "A scalable approach to simultaneous evolutionary instance and feature selection",
journal = "Information Sciences",
volume = "228",
pages = "150 - 174",
year = "2013",
issn = "0020-0255",
doi = "https://doi.org/10.1016/j.ins.2012.10.006",
url = "http://www.sciencedirect.com/science/article/pii/S0020025512006718",
author = "Nicol\'{a}s Garc\'{\i}a-Pedrajas and Aida de Haro-Garc\'{\i}a and Javier P\'{e}rez-Rodr\'{\i}guez",
keywords = "Simultaneous instance and feature selection, Instance selection, Feature selection, Instance-based learning, Very large problems"
}

##Normal

Nicolás García-Pedrajas, Aida de Haro-García, Javier Pérez-Rodríguez,
A scalable approach to simultaneous evolutionary instance and feature selection,
Information Sciences,
Volume 228,
2013,
Pages 150-174,
ISSN 0020-0255,
https://doi.org/10.1016/j.ins.2012.10.006.
(http://www.sciencedirect.com/science/article/pii/S0020025512006718)
Keywords: Simultaneous instance and feature selection; Instance selection; Feature selection; Instance-based learning; Very large problems


#Abstract

enormous amount of information
bioinformatics, security and intrusion detection and text mining

data reduction — removing

  1. missing
  2. redundant
  3. information-poor data and/or
  4. erroneous data

from the dataset to obtain a tractable problem size

the most common data reduction methods:

  1. feature selection
  2. feature-value discretization
  3. instance selection

the divide-and-conquer principle + bookkeeping

in linear time


#Main content

Instance selection: choosing a subset of the total available data to achieve the original purpose of the data mining application as though all the data were being used

  1. prototype selection (k-Nearest Neighbors)
  2. obtaining the training set for a learning algorithm (classification trees or neural networks)

"the isolation of the smallest set of instances that enable us to predict the class of a query instance with the same (or higher) accuracy than the original set"

The objectives of feature selection:

  1. To avoid over-fitting and to improve model performance
  2. To provide faster and more cost-effective models
  3. To gain a deeper insight into the underlying processes that generated the data

simultaneous instance and feature selection

scalable simultaneous instance and feature selection method (SSIFSM)


#Related work

three fundamental approaches for scaling up learning methods:

  1. designing fast algorithms
  2. partitioning the data
  3. using a relational representation

The stratification strategy splits the training data into disjoint strata with equal class distribution


#Algorithm

Scalable simultaneous instance and feature selection method (SSIFSM)

$K$ classes and $N$ training instances with $M$ features
$T = \{ (x_1, y_1), (x_2, y_2), \ldots, (x_N, y_N) \}$
$Y = \{ 1, \ldots, K \}$

simultaneous instance and feature selection

Bookkeeping

an evolutionary algorithm

a CHC algorithm —
Cross-generational elitist selection, Heterogeneous recombination and Cataclysmic mutation

  1. To obtain the next generation for a population of size $P$, the parents and the offspring are put together and the $P$ best individuals are selected.
  2. To avoid premature convergence, only different individuals separated by a threshold Hamming distance—in our implementation, the length of the chromosome divided by four—are allowed to mate.
  3. During crossover, two parents exchange exactly half of their nonmatching bits. This operator is referred to as Half Uniform Crossover (HUX).
  4. Mutation is not used during the regular evolution. To avoid premature convergence or stagnation of the search, the population is reinitialized when the individuals are not diverse. In such a case, only the best individual is retained in the new population.
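
Below is a minimal, hypothetical Python sketch of a CHC loop with these four properties (binary chromosomes; `evaluate` is any fitness function to maximise). It is an illustration of the technique, not the authors' C implementation.

```python
import random

def hux(p1, p2):
    """Half Uniform Crossover: children exchange exactly half of the parents' non-matching bits."""
    c1, c2 = p1[:], p2[:]
    diff = [i for i in range(len(p1)) if p1[i] != p2[i]]
    random.shuffle(diff)
    for i in diff[:len(diff) // 2]:
        c1[i], c2[i] = p2[i], p1[i]
    return c1, c2

def chc(population, evaluate, generations=100):
    """CHC loop: elitist survival, incest prevention, HUX crossover, cataclysmic restart."""
    length = len(population[0])
    d = length // 4                               # incest-prevention threshold (chromosome length / 4)
    for _ in range(generations):
        random.shuffle(population)
        offspring = []
        for p1, p2 in zip(population[::2], population[1::2]):
            hamming = sum(a != b for a, b in zip(p1, p2))
            if hamming > d:                       # only sufficiently different parents mate
                offspring.extend(hux(p1, p2))
        if not offspring:
            d -= 1                                # no pair mated this generation: lower the threshold
        pool = population + offspring
        pool.sort(key=evaluate, reverse=True)
        population = pool[:len(population)]       # keep the P best of parents + offspring
        if d < 0:                                 # stagnation: restart, retaining only the best individual
            best = population[0]
            population = [best] + [[random.randint(0, 1) for _ in range(length)]
                                   for _ in range(len(population) - 1)]
            d = length // 4
    return max(population, key=evaluate)
```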

$acc(x)$ — the accuracy — 1-NN classifier
$red(x)$ — the reduction — $1 - \frac{N'}{N}$, with $N'$ the number of selected instances

$\alpha$ — a weighting parameter — $0.5$
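
The two terms are presumably combined into a single weighted fitness (a standard formulation for this kind of wrapper; the exact expression in the paper may differ):

$$\mathrm{fitness}(x) = \alpha \cdot acc(x) + (1 - \alpha) \cdot red(x), \qquad \alpha = 0.5$$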

the instance and feature selection

record the number of times that each instance and feature has been selected to be kept
— the number of votes (its similarity to the combination of classifiers in an ensemble by voting)
— repeated for $r$ rounds
all the instances of a certain subset belong to the same class — ignored

the combination of the different rounds

the philosophy of ensembles of classifiers
several weak learners are combined to form a strong classifier — several weak (in the sense that they are applied to subsets of the training data) instance and feature selection procedures are combined to produce a strong and fast selection method

bagging or boosting

number of votes — $[0, r \cdot s]$ (for an instance)
an instance is in $s$ subsets in every round.
Each feature is in $t$ subsets each round.
number of votes — $[0, r \cdot t]$ (for a feature)
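
A rough sketch of the vote-counting loop under the assumptions above. `select_subset` is a hypothetical stand-in for the subset-level CHC selection, and NumPy is used only for convenience; this is not the paper's code.

```python
import numpy as np

def accumulate_votes(X, y, rounds, n_sub, m_sub, select_subset, seed=0):
    """Count, over `rounds` rounds, how often each instance and each feature is kept.

    `select_subset(X_part, y_part)` is assumed to return two boolean masks
    (kept_instances, kept_features) for the given data partition.
    """
    rng = np.random.default_rng(seed)
    N, M = X.shape
    instance_votes = np.zeros(N, dtype=int)
    feature_votes = np.zeros(M, dtype=int)
    for _ in range(rounds):
        inst_groups = np.array_split(rng.permutation(N), max(1, N // n_sub))
        feat_groups = np.array_split(rng.permutation(M), max(1, M // m_sub))
        for rows in inst_groups:
            if len(np.unique(y[rows])) < 2:   # all instances of one class: subset is ignored
                continue
            for cols in feat_groups:
                keep_i, keep_f = select_subset(X[np.ix_(rows, cols)], y[rows])
                instance_votes[rows[keep_i]] += 1
                feature_votes[cols[keep_f]] += 1
    return instance_votes, feature_votes
```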

threshold

majority voting — at least half (depends heavily on the problem)

automatically

$\theta_i$ — threshold for instances
$\theta_f$ — threshold for features
$T(\theta_i, \theta_f)$ — the subset of the training set
1-NN classifier
all the possible values
$\beta$ — close to $1$ — $0.75$
$red(x) = 1 - \dfrac{s_i + s_f}{N + M}$

evaluate all possible pairs of instance and feature thresholds — $[0, r \cdot s] \times [0, r \cdot t]$ possible pairs — each 1-NN evaluation is $O(N^2 M)$
a divide-and-conquer method
the same partition philosophy
the training set is divided into random disjoint subsets, the accuracy is estimated separately in each subset, and the average evaluation over all the subsets is used as the fitness of each pair of thresholds
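
A sketch of this threshold search under those assumptions. The fitness $\beta \cdot acc + (1 - \beta) \cdot red$ and the partition-based accuracy estimate are my reading of the description above; scikit-learn's 1-NN is used purely for illustration, so this is not the authors' implementation.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

def choose_thresholds(X, y, inst_votes, feat_votes, beta=0.75, n_parts=10, seed=0):
    """Pick the vote thresholds (theta_i, theta_f) maximising beta*acc + (1-beta)*red.

    acc: 1-NN accuracy estimated separately on random disjoint partitions of the
    training set and then averaged (divide and conquer).
    red: 1 - (s_i + s_f) / (N + M), with s_i, s_f the counts of kept instances/features.
    """
    N, M = X.shape
    parts = np.array_split(np.random.default_rng(seed).permutation(N), n_parts)
    best_fit, best_pair = -1.0, (0, 0)
    for theta_i in np.unique(inst_votes):
        for theta_f in np.unique(feat_votes):
            keep_i = inst_votes >= theta_i
            keep_f = feat_votes >= theta_f
            if keep_f.sum() == 0:
                continue
            cols = np.where(keep_f)[0]
            accs = []
            for part in parts:
                train = part[keep_i[part]]        # selected instances inside this partition
                if len(train) == 0:
                    accs.append(0.0)
                    continue
                knn = KNeighborsClassifier(n_neighbors=1)
                knn.fit(X[np.ix_(train, cols)], y[train])
                accs.append(knn.score(X[np.ix_(part, cols)], y[part]))
            red = 1.0 - (keep_i.sum() + keep_f.sum()) / (N + M)
            fit = beta * np.mean(accs) + (1.0 - beta) * red
            if fit > best_fit:
                best_fit, best_pair = fit, (theta_i, theta_f)
    return best_pair
```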


The scalability of the method is assured by the following features:

  1. Application of the method to small datasets. Due to the small size of the subsets in which the selection process is applied, the selection process will always be fast, regardless of the complexity of the instance and feature selection method used in the subsets.
  2. Only small datasets must be kept in memory. This allows the application of instance and feature selection when datasets do not fit into memory.
  3. Bookkeeping is applied in the evolution for every subset.

##Complexity of our methodology

linear in the number of instances, $N$, of the dataset

The process of random partition — $O(NM)$
a subset of fixed size, $n$ instances and $m$ features
the complexity of the CHC selection algorithm
$K$ — the number of operations required by the selection algorithm to perform its task in a dataset of size $n$ (a constant; not the number of classes here)
$N$ instances and $M$ features
this selection process is executed once for each subset, $\frac{N}{n} \cdot \frac{M}{m}$ times
$O\left( \frac{N}{n} \cdot \frac{M}{m} \cdot K \right)$
$r$ rounds
$O\left( r \cdot \frac{N}{n} \cdot \frac{M}{m} \cdot K \right)$
linear
easy parallel implementation


##Parallel implementation

the master/slave architecture
master — performs the partition of the dataset and sends the subsets to each slave
slave — performs the selection algorithm + returns the selected instances and features to the master
master — stores the votes for each kept instance and feature

communication between different tasks
occurs only twice:

  1. Before each slave initiates its selection process, it must receive the subset of data to perform that process. This amount of information is always small because the method is based on each slave taking care of only a small part of the whole dataset. Furthermore, if the slaves can access the disk, they can read the necessary data directly from it.
  2. Once the selection process is finished, the slaves send the selection performed to the master. This selection consists of a list of the selected instances and features, which is a small sequence of integers.

the same pattern holds when searching for the best values of the votes thresholds:

  1. Before each slave initiates the evaluation of a certain pair of thresholds, it must receive the subset of data to perform that task. This amount of information is always small because the method is based on each slave taking care of only a small part of the whole dataset.
  2. Once the evaluation process is finished, the slaves send the evaluation performed to the master. The evaluation is the error obtained when the corresponding pair of thresholds is used, which is a real number.
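
A rough sketch of the master/slave pattern, using Python's multiprocessing in place of a real cluster. `run_chc_selection` is a hypothetical placeholder for the subset-level selection (here it simply keeps everything); in the paper the nodes communicate over the network instead of shared memory.

```python
from multiprocessing import Pool
import numpy as np

def run_chc_selection(X_sub, y_sub):
    """Placeholder for the subset-level CHC selection; here it keeps every instance and feature."""
    return np.ones(X_sub.shape[0], dtype=bool), np.ones(X_sub.shape[1], dtype=bool)

def slave_select(task):
    """Slave: run the selection on one small subset and return the kept global indices."""
    rows, cols, X_sub, y_sub = task
    keep_i, keep_f = run_chc_selection(X_sub, y_sub)
    return rows[keep_i], cols[keep_f]

def master_round(X, y, inst_groups, feat_groups, inst_votes, feat_votes, workers=8):
    """Master: partition the data, send each subset to a slave, collect and count the votes."""
    tasks = [(rows, cols, X[np.ix_(rows, cols)], y[rows])
             for rows in inst_groups
             for cols in feat_groups]
    with Pool(workers) as pool:
        for kept_rows, kept_cols in pool.map(slave_select, tasks):
            inst_votes[kept_rows] += 1
            feat_votes[kept_cols] += 1
```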

#Experiments

50 problems

the UCI Machine Learning Repository

the reduction and accuracy
a 10-fold cross-validation method
a 1-nearest neighbor (1-NN) classifier

from small to medium sizes


##Algorithms for the comparison

  1. Nearest neighbor error using all instances and features is chosen as a baseline measure (1-NN). Any method of performing selection of features or instances must at least match the performance of the 1-NN algorithm or improve its accuracy if possible.
  2. As stated, Cano et al. [10] performed a comprehensive comparison of the performances of different evolutionary algorithms for instance selection. They compared a generational genetic algorithm, a steady-state genetic algorithm, a CHC genetic algorithm and a population-based incremental learning algorithm. Among these methods, CHC achieved the best overall performance (ISCHC). Therefore, we have included the CHC algorithm in our comparison. [10] J.R. Cano, F. Herrera, M. Lozano, Using evolutionary algorithms as instance selection for data reduction in KDD: an experimental study, IEEE Transactions on Evolutionary Computation 7 (2003) 561–575.
  3. Feature selection using a genetic algorithm (FSCHC). Following the same idea used for instance selection, we used a CHC algorithm for feature selection with the same characteristics of the algorithm used for instance selection.
  4. Instance and feature selection using a genetic algorithm (IS + FSCHC). Combining the two previous methods, we also used a CHC algorithm that simultaneously evolved the instances and features.
  5. Intelligent Multiobjective Evolutionary Algorithm (IMOEA) [13] method. This method is a multi-objective evolutionary algorithm, which considers both instance and feature selection. The algorithm has two objectives, maximization of training accuracy and minimization of the number of instances and features selected. The multi-objective algorithm used is based on Pareto dominance because the approach is common in multi-objective algorithms [65]. The fitness of each individual is the difference between the individuals it dominates and the individuals that dominate it. The algorithm also includes a new crossover operator called intelligent crossover, which incorporates the systematic reasoning ability of orthogonal experimental design [43] to estimate the contribution of each gene to the fitness of the individuals.
  6. The major aim of our method is scalability. However, if scalability can be achieved with a simple random sampling method, it may be argued that our method is superfluous. Thus, our last method for comparison is a CHC algorithm for instance and feature selection, which uses a random sampling of instances (SAMPLING). The method is applied using a random 10% of all the instances in the dataset.

The source code used for all methods is in C and is licensed under the GNU General Public License.

in a cluster of 32 blades
Each blade is a biprocessor DELL PowerEdge M600 with four cores per processor at 2.5 GHz
256 cores in total
a 1 Gb network
16 GB of memory per blade


##Statistical tests

Iman-Davenport test — based on the $\chi^2_F$ statistic of the Friedman test
Friedman test — compares the average ranks of $k$ algorithms
the Iman-Davenport test is less conservative (more powerful) than the Friedman test


the Iman-Davenport statistic follows an $F$ distribution with $k - 1$ and $(k - 1)(N - 1)$ degrees of freedom
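
For reference, the usual forms of the two statistics (as given in Demšar's survey on statistical comparisons of classifiers, not copied from this paper), with $R_j$ the average rank of algorithm $j$ over $N$ datasets:

$$\chi^2_F = \frac{12N}{k(k+1)} \left[ \sum_{j=1}^{k} R_j^2 - \frac{k(k+1)^2}{4} \right], \qquad F_F = \frac{(N-1)\,\chi^2_F}{N(k-1) - \chi^2_F}$$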

pairwise comparisons — the Wilcoxon test — stronger
family-wise error

The test statistic for comparing the $i$th and $j$th classifier

$z$ — used to find the corresponding probability from the table of the normal distribution

The Bonferroni–Dunn test — Holm
Holm’s procedure — more powerful than Bonferroni–Dunn’s
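
A small sketch of Holm's step-down procedure against a control algorithm. The $z$ statistic uses the standard rank-based standard error $\sqrt{k(k+1)/(6N)}$ from Demšar's survey; scipy provides the normal CDF. This is illustrative, not the paper's code.

```python
import numpy as np
from scipy.stats import norm

def holm_vs_control(avg_ranks, control, n_datasets, alpha=0.05):
    """Holm's step-down procedure: compare a control algorithm against all the others.

    avg_ranks maps algorithm name -> average Friedman rank over the datasets.
    """
    k = len(avg_ranks)
    se = np.sqrt(k * (k + 1) / (6.0 * n_datasets))            # std. error of a rank difference
    pvals = {name: 2.0 * (1.0 - norm.cdf(abs(avg_ranks[control] - r) / se))
             for name, r in avg_ranks.items() if name != control}
    rejected = []
    for i, (name, p) in enumerate(sorted(pvals.items(), key=lambda kv: kv[1])):
        if p < alpha / (k - 1 - i):                           # step-down adjusted significance level
            rejected.append(name)
        else:
            break                                             # stop at the first hypothesis not rejected
    return rejected
```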


##Evaluation measures

accuracy — the percentage of instances classified correctly
reduction — the percentage of total data removed during the evolution

class-imbalanced problems

  1. true positives (TPs)
  2. false positives (FPs)
  3. true negatives (TNs)
  4. false negatives (FNs)

  • the sensitivity (Sn) — $Sn = \frac{TP}{TP + FN}$
  • the specificity (Sp) — $Sp = \frac{TN}{TN + FP}$
  • the G-mean measure — $G\text{-}mean = \sqrt{Sp \cdot Sn}$

the reduction — $1 - \frac{n}{N} \cdot \frac{m}{M}$
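
A small helper computing these measures for a binary problem, assuming the positive class is labelled 1 and the negative class 0; `kept_instances` and `kept_features` are the counts of retained instances and features (illustrative only).

```python
import numpy as np

def imbalance_measures(y_true, y_pred, kept_instances, N, kept_features, M):
    """Sensitivity, specificity and G-mean of the predictions, plus the data reduction."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = np.sum((y_true == 1) & (y_pred == 1))
    tn = np.sum((y_true == 0) & (y_pred == 0))
    fp = np.sum((y_true == 0) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == 0))
    sn = tp / (tp + fn) if (tp + fn) else 0.0     # sensitivity (true-positive rate)
    sp = tn / (tn + fp) if (tn + fp) else 0.0     # specificity (true-negative rate)
    g_mean = np.sqrt(sn * sp)
    reduction = 1.0 - (kept_instances / N) * (kept_features / M)
    return sn, sp, g_mean, reduction
```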


##Experimental results

regarding the size of the subsets used

  1. 1000 instances and seven features — (1000,7)
  2. 500 instances and nine features — (500,9)
  3. 100 instances and 13 features — (100,13)

cache — 300–400 MB

memory thrashing


no significant differences among the three configurations for both accuracy and reduction

execution time — remarkable differences


The Iman–Davenport test with a p-value of 0.0000 for both accuracy and reduction — significant differences

Holm’s procedure —

the $\kappa$-error relative movement diagrams —


— the reduction difference — instead of the $\kappa$ difference value
These diagrams use an arrow to represent the results of two methods applied to the same dataset.
a convenient way of summarizing the results
arrows pointing up-right
arrows pointing down-left


Regarding time — an approximately linear behavior

Holm test — (15 rounds)


###Running time

the philosophy of divide-and-conquer
bookkeeping

the wall-clock time

significantly faster

the speedup of SSIFSM with respect to the standard IS + FSCHC for the parallel and sequential implementations respectively —

three control experiments —

  1. a random selection method — $[5\%, 15\%]$ and $[15\%, 25\%]$ (instances and features)
  2. IB3 [1] algorithm for instance selection and ReliefF [55] method for feature selection
  3. the same two methods applied in the reverse order: ReliefF [55] for feature selection followed by IB3 [1] for instance selection

the performance in terms of reduction was similar
a very significant worsening of the accuracy


###Scalability to very large datasets

problems with many instances, many features and both many instances and features

the largest set containing 50 million instances and 800 features

a large imbalance ratio

G-mean


#Conclusions

simultaneous instance and feature selection

a bookkeeping mechanism

a voting method

a CHC algorithm

class-imbalanced data

scaled up to problems of very large size

scalability:

  1. instance and feature selection is always performed over small datasets
  2. a bookkeeping approach
  3. only small subsets must be kept in memory

decision trees
support vector machines
