NCA Feature Selection for Classification

Consider a multi-class classification problem with a training set containing n observations:

$$S = \{(\mathbf{x}_i, y_i),\ i = 1, 2, \ldots, n\},$$

where $\mathbf{x}_i \in \mathbb{R}^p$ are the feature vectors, $y_i \in \{1, 2, \ldots, c\}$ are the class labels, and $c$ is the number of classes. The aim is to learn a classifier $f : \mathbb{R}^p \to \{1, 2, \ldots, c\}$ that accepts a feature vector and makes a prediction $f(\mathbf{x})$ for the true label $y$ of $\mathbf{x}$.

Consider a randomized classifier that:

1. Randomly picks a point, $\mathrm{Ref}(\mathbf{x})$, from $S$ as the 'reference point' for $\mathbf{x}$.

2. Labels $\mathbf{x}$ using the label of the reference point $\mathrm{Ref}(\mathbf{x})$.

This scheme is similar to that of a 1-NN classifier, where the reference point is chosen to be the nearest neighbor of the new point $\mathbf{x}$. In NCA, the reference point is chosen randomly and all points in $S$ have some probability of being selected as the reference point. The probability $P(\mathrm{Ref}(\mathbf{x}) = \mathbf{x}_j \mid S)$ that point $\mathbf{x}_j$ is picked from $S$ as the reference point for $\mathbf{x}$ is higher if $\mathbf{x}_j$ is closer to $\mathbf{x}$ as measured by the distance function $d_w$, where

$$d_w(\mathbf{x}_i, \mathbf{x}_j) = \sum_{r=1}^{p} w_r^2 \, |x_{ir} - x_{jr}|,$$

and $w_r$ are the feature weights. Assume that

$$P(\mathrm{Ref}(\mathbf{x}) = \mathbf{x}_j \mid S) \propto k(d_w(\mathbf{x}, \mathbf{x}_j)),$$

where $k$ is a kernel or similarity function that assumes large values when $d_w(\mathbf{x}, \mathbf{x}_j)$ is small. Suppose $k$ is

$$k(z) = \exp\left(-\frac{z}{\sigma}\right),$$

as suggested in [1]. The reference point for $\mathbf{x}$ is chosen from $S$, so the sum of $P(\mathrm{Ref}(\mathbf{x}) = \mathbf{x}_j \mid S)$ over all $j$ must equal 1. Therefore, it is possible to write

$$P(\mathrm{Ref}(\mathbf{x}) = \mathbf{x}_j \mid S) = \frac{k(d_w(\mathbf{x}, \mathbf{x}_j))}{\sum_{j=1}^{n} k(d_w(\mathbf{x}, \mathbf{x}_j))}.$$
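For concreteness, here is a minimal MATLAB sketch of these definitions. The helper name and inputs are hypothetical, and this is not the fscnca implementation; it simply evaluates the weighted distance, the kernel, and the normalized reference-point probabilities for a query point $\mathbf{x}$:

```matlab
function P = refPointProbs(x, X, w, sigma)
% Reference-point probabilities P(Ref(x) = x_j | S) for a query point x.
%   x     : 1-by-p query feature vector
%   X     : n-by-p training feature matrix (row j is x_j)
%   w     : 1-by-p feature weights
%   sigma : kernel width

% Weighted distance d_w(x, x_j) = sum_r w_r^2 * |x_r - x_jr|, for every j
d = sum(abs(X - x) .* (w.^2), 2);      % n-by-1 (implicit expansion, R2016b+)

% Kernel k(z) = exp(-z / sigma)
kvals = exp(-d / sigma);

% Normalize so that the probabilities sum to 1 over all points in S
P = kvals / sum(kvals);
end
```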

Now consider the leave-one-out application of this randomized classifier, that is, predicting the label of $\mathbf{x}_i$ using the data in $S^{-i}$, the training set $S$ excluding the point $(\mathbf{x}_i, y_i)$. The probability that point $\mathbf{x}_j$ is picked as the reference point for $\mathbf{x}_i$ is

$$p_{ij} = P(\mathrm{Ref}(\mathbf{x}_i) = \mathbf{x}_j \mid S^{-i}) = \frac{k(d_w(\mathbf{x}_i, \mathbf{x}_j))}{\sum_{j=1,\, j \neq i}^{n} k(d_w(\mathbf{x}_i, \mathbf{x}_j))}.$$

The leave-one-out probability of correct classification for observation $i$ is the probability $p_i$ that the randomized classifier correctly classifies observation $i$ using $S^{-i}$:

$$p_i = \sum_{j=1,\, j \neq i}^{n} P(\mathrm{Ref}(\mathbf{x}_i) = \mathbf{x}_j \mid S^{-i}) \, I(y_i = y_j) = \sum_{j=1,\, j \neq i}^{n} p_{ij} \, y_{ij},$$

where

$$y_{ij} = I(y_i = y_j) = \begin{cases} 1 & \text{if } y_i = y_j, \\ 0 & \text{otherwise.} \end{cases}$$

The average leave-one-out probability of correct classification using the randomized classifier can be written as

$$F(\mathbf{w}) = \frac{1}{n} \sum_{i=1}^{n} p_i.$$
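These quantities translate directly into code. The following MATLAB sketch (hypothetical helper name, not the fscnca implementation) computes the leave-one-out probabilities $p_{ij}$, the per-observation probabilities $p_i$, and $F(\mathbf{w})$ for a labeled training set with numeric labels:

```matlab
function [F, pLOO] = ncaObjective(X, y, w, sigma)
% Unregularized NCA objective F(w) = (1/n) * sum_i p_i (leave-one-out).
%   X : n-by-p feature matrix,  y : n-by-1 numeric class labels
%   w : 1-by-p feature weights, sigma : kernel width

n    = size(X, 1);
pLOO = zeros(n, 1);                              % p_i for each observation
for i = 1:n
    d        = sum(abs(X - X(i,:)) .* (w.^2), 2);  % d_w(x_i, x_j) for all j
    kvals    = exp(-d / sigma);
    kvals(i) = 0;                                % exclude j = i (leave one out)
    pij      = kvals / sum(kvals);               % p_ij; sums to 1 over j ~= i
    pLOO(i)  = sum(pij .* (y == y(i)));          % p_i = sum_j p_ij * y_ij
end
F = mean(pLOO);                                  % F(w) = (1/n) * sum_i p_i
end
```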

The right-hand side of $F(\mathbf{w})$ depends on the weight vector $\mathbf{w}$. The goal of neighborhood component analysis is to maximize $F(\mathbf{w})$ with respect to $\mathbf{w}$. fscnca uses the regularized objective function introduced in [1]:

$$F(\mathbf{w}) = \frac{1}{n} \sum_{i=1}^{n} p_i - \lambda \sum_{r=1}^{p} w_r^2 = \frac{1}{n} \sum_{i=1}^{n} \underbrace{\left[ \sum_{j=1,\, j \neq i}^{n} p_{ij} \, y_{ij} - \lambda \sum_{r=1}^{p} w_r^2 \right]}_{F_i(\mathbf{w})} = \frac{1}{n} \sum_{i=1}^{n} F_i(\mathbf{w}),$$

where λ is the regularization parameter. The regularization term drives many of the weights in w to 0.

After choosing the kernel parameter $\sigma$ in $p_{ij}$ as 1, finding the weight vector $\mathbf{w}$ can be expressed as the following minimization problem for a given $\lambda$:

$$\hat{\mathbf{w}} = \operatorname*{argmin}_{\mathbf{w}} f(\mathbf{w}) = \operatorname*{argmin}_{\mathbf{w}} \frac{1}{n} \sum_{i=1}^{n} f_i(\mathbf{w}),$$

where $f(\mathbf{w}) = -F(\mathbf{w})$ and $f_i(\mathbf{w}) = -F_i(\mathbf{w})$.
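Put together, the regularized objective $f(\mathbf{w})$ with $\sigma = 1$ can be written as a function handle built on the hypothetical ncaObjective helper sketched above. Purely as a sanity check for a problem with only a few features, it could be minimized with a general-purpose derivative-free optimizer such as fminsearch; this illustrates the objective and is not how fscnca performs the optimization:

```matlab
% Regularized objective f(w) = -F(w) + lambda * sum_r w_r^2, with sigma = 1.
% Uses the hypothetical ncaObjective helper sketched above.
lambda = 0.1;                                % example regularization value
sigma  = 1;
fobj   = @(w) -ncaObjective(X, y, w(:).', sigma) + lambda * sum(w.^2);

% Derivative-free sanity check for a problem with only a few features:
w0   = ones(size(X, 2), 1);                  % initial feature weights
wHat = fminsearch(fobj, w0);                 % minimizes f(w) over w
```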

Note that

$$\frac{1}{n} \sum_{i=1}^{n} \sum_{j=1,\, j \neq i}^{n} p_{ij} = 1,$$

and the argument of the minimum does not change if you add a constant to an objective function. Therefore, you can rewrite the objective function by adding the constant 1.

$$\begin{aligned}
\hat{\mathbf{w}} &= \operatorname*{argmin}_{\mathbf{w}} \{1 + f(\mathbf{w})\} \\
&= \operatorname*{argmin}_{\mathbf{w}} \left\{ \frac{1}{n} \sum_{i=1}^{n} \sum_{j=1,\, j \neq i}^{n} p_{ij} - \frac{1}{n} \sum_{i=1}^{n} \sum_{j=1,\, j \neq i}^{n} p_{ij} \, y_{ij} + \lambda \sum_{r=1}^{p} w_r^2 \right\} \\
&= \operatorname*{argmin}_{\mathbf{w}} \left\{ \frac{1}{n} \sum_{i=1}^{n} \sum_{j=1,\, j \neq i}^{n} p_{ij} (1 - y_{ij}) + \lambda \sum_{r=1}^{p} w_r^2 \right\} \\
&= \operatorname*{argmin}_{\mathbf{w}} \left\{ \frac{1}{n} \sum_{i=1}^{n} \sum_{j=1,\, j \neq i}^{n} p_{ij} \, l(y_i, y_j) + \lambda \sum_{r=1}^{p} w_r^2 \right\},
\end{aligned}$$

where the loss function is defined as

$$l(y_i, y_j) = \begin{cases} 1 & \text{if } y_i \neq y_j, \\ 0 & \text{otherwise.} \end{cases}$$

The argument of the minimum is the weight vector that minimizes the classification error. You can specify a custom loss function using the LossFunction name-value pair argument in the call to fscnca.
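In practice, you do not minimize this objective by hand; you call fscnca from Statistics and Machine Learning Toolbox. A brief usage sketch, assuming X is an n-by-p predictor matrix and y an n-by-1 label vector, with an example (not prescribed) Lambda value:

```matlab
% Fit an NCA feature selection model for classification with the default
% 0-1 classification loss ('classiferror'); the Lambda value is an example.
lambda = 0.5 / size(X, 1);
mdl = fscnca(X, y, 'Lambda', lambda, 'Solver', 'lbfgs');

% Fitted feature weights; weights driven toward zero by the regularization
% term indicate features that contribute little and can be discarded.
w = mdl.FeatureWeights;
bar(w);
xlabel('Feature index');
ylabel('Feature weight');
```

A custom loss can be supplied through the LossFunction name-value pair; see the fscnca documentation for the required form of the function handle.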
