Scene Recognition with Bag of Words

A while ago I completed a small assignment for the Stanford University course "Computer Vision: Algorithms and Applications": scene recognition with bag of words. My notes are organized below.

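I. The bag-of-words model

The bag-of-visual-words idea was first proposed by Josef Sivic et al., who based it on a model from natural language processing. In its original setting the model is widely used for document classification, where the frequency of each word serves as the feature given to the classifier. By analogy, just as an article is composed of many textual words, an image can be represented as a collection of visual words, so techniques from text retrieval can be applied directly to image retrieval. Given how efficient text retrieval systems are today, this "textualization" of the image representation also helps the efficiency of large-scale image retrieval systems.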

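A simple example illustrates how bag of words is used in text processing.

Take the following two simple documents:

A dictionary built from these two documents looks like this:

The dictionary consists of 10 distinct words. Using them as indexes, we can represent each of the two documents as a 10-entry vector:

In plain terms:

The bag-of-words model represents a document as a vector whose dimensionality equals the number of words in the dictionary. In the example above, the i-th element of the vector counts how many times the i-th dictionary word occurs in the document. The BoW model is therefore a simple document representation built from a histogram of word frequencies.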
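The counting described above can be sketched in a few lines of MATLAB. The two sentences below are an assumption: they are a commonly used pair of example sentences that happens to yield 10 distinct words, not necessarily the documents from the original figures. The dictionary here is sorted alphabetically, so the order of the vector entries may differ from the original illustration.

```matlab
% Two illustrative documents (assumed; not taken from the original post).
docs = {'John likes to watch movies. Mary likes movies too.', ...
        'John also likes to watch football games.'};

% Tokenize: lower-case each document and keep runs of letters as words.
tokens = cell(size(docs));
for k = 1:numel(docs)
    tokens{k} = regexp(lower(docs{k}), '[a-z]+', 'match');
end

% The dictionary is the sorted set of distinct words across all documents.
dictionary = unique([tokens{:}]);

% bow(k, w) counts how often dictionary word w occurs in document k.
bow = zeros(numel(docs), numel(dictionary));
for k = 1:numel(docs)
    for w = 1:numel(dictionary)
        bow(k, w) = sum(strcmp(tokens{k}, dictionary{w}));
    end
end

disp(dictionary);
disp(bow);
```

Each row of bow is the word-frequency histogram of one document, which is exactly the vector a classifier would receive.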

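II. Bag of words in computer vision

One of the simplest algorithms for category recognition is the bag of words approach (also called bag of features or bag of keypoints). The typical processing pipeline of a bag-of-words category recognition system is shown in the figure below:

As the figure shows, we first detect key regions in the training data, extract features from them to obtain feature vectors, build a visual dictionary from those vectors, and then compute a histogram for each image. In other words, the "words" of an image are its features, analogous to the words of a document described above. A commonly used feature is the local SIFT descriptor, which lets us represent an image as a statistical histogram over its features.

The BoVW (bag of visual words) model framework is as follows:

That is, we first extract SIFT features from the training image set to form a collection of local feature descriptors, cluster them into groups of visual words, and so obtain the visual dictionary. Concretely, the visual-word histogram of an image is generated as shown in the figure below:

As the upper half of the figure shows, the bag-of-words model turns an image into a numeric vector in the following steps: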

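1. Feature extraction. Use the SIFT algorithm to extract features from the images of the different categories (the training images). These are the visual word vectors mentioned above, and they represent locally invariant feature points in the image.

2. Pool all the feature vectors together and use the K-Means algorithm to merge visual words with similar meaning, building a vocabulary of K words.

3. Count how many times each word of the vocabulary occurs in an image, so that the image can be represented as a K-dimensional numeric vector, i.e. a K-bin histogram.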

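To recognize or classify a scene, we represent both the training images and the test images as such histograms and compare them: a test image is assigned the category of the training images whose histograms it most closely resembles.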
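Steps 2 and 3 can be sketched without any toolbox. The example below is a minimal illustration under stated assumptions: toy 2-D points stand in for 128-D SIFT descriptors, the k-means loop is a plain hand-written Lloyd iteration with a deterministic initialization (not VLFeat's vl_kmeans), and all variable names are hypothetical.

```matlab
% Toy "descriptors": 2-D points, one per column, standing in for SIFT vectors.
descriptors = [0.0 0.2 0.1 5.0 5.2 4.9 9.8 10.0 10.1; ...
               0.1 0.0 0.2 5.1 4.9 5.0 0.1  0.0  0.2];
K = 3;                                   % vocabulary size
n = size(descriptors, 2);

% Initialise the centres from evenly spaced descriptors (deterministic demo).
centers = descriptors(:, round(linspace(1, n, K)));

for iter = 1:20
    % Assignment step: squared Euclidean distance from each centre to each point.
    D = zeros(K, n);
    for k = 1:K
        delta = descriptors - repmat(centers(:, k), 1, n);
        D(k, :) = sum(delta .^ 2, 1);
    end
    [~, assignment] = min(D, [], 1);

    % Update step: move each centre to the mean of the points assigned to it.
    for k = 1:K
        members = descriptors(:, assignment == k);
        if ~isempty(members)
            centers(:, k) = mean(members, 2);
        end
    end
end

% Describe a new image by counting how many of its descriptors map to each word.
image_descriptors = [0.1 0.3 5.1 9.9 10.2; ...
                     0.0 0.1 5.0 0.1  0.0];
m = size(image_descriptors, 2);
D = zeros(K, m);
for k = 1:K
    delta = image_descriptors - repmat(centers(:, k), 1, m);
    D(k, :) = sum(delta .^ 2, 1);
end
[~, words] = min(D, [], 1);
bow_histogram = accumarray(words(:), 1, [K 1])';   % 1 x K vector of counts
disp(bow_histogram);
```

The columns of centers play the role of the visual dictionary, and bow_histogram is the K-dimensional vector that represents the image.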

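III. Implementing scene recognition

For this scene recognition implementation I relied on the function package of a computer vision library, the VLFeat 0.9.17 binary package. It is configured in MATLAB as follows: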

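I. What to prepare first:

1. MATLAB (I used the R2013b trial version).
2. The VLFeat files, either the binary package or the source code. On Windows, the binary package is recommended.
The binary package can be downloaded from the official site or from my personal network drive:
Official site: http://www.vlfeat.org/download/vlfeat-0.9.18-bin.tar.gz
My personal network drive: http://pan.baidu.com/s/1c0zPSqs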

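II. Installation
1. Extract the downloaded binary package to some location, e.g. the D:\ drive.

2. Copy the extracted vlfeat folder into the toolbox folder of the MATLAB installation directory.
3. Open MATLAB and type edit startup.m to create the startup file startup.m.
4. Add the following content to startup.m (note: if VLFeat is installed somewhere else, replace "D:\" below with your own installation path):

5. Save and close startup.m, then restart MATLAB; the installation is complete. (After installing, do not delete the extracted vlfeat folder: vl_setup only adds the path of the VLFeat toolbox to the MATLAB path so that MATLAB can use the toolbox.)

6. To check that VLFeat is configured correctly, run the following command in MATLAB; output like the following shows that the configuration succeeded.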
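The startup line for step 4 and the check command for step 6 did not survive in this copy. Based on the VLFeat installation instructions, they would look roughly like the sketch below. The folder name D:\vlfeat-0.9.18 is an assumption; substitute the folder you actually extracted.

```matlab
% startup.m (step 4): run VLFeat's setup script at every MATLAB start.
% The path is hypothetical; point it at your own vlfeat folder.
run('D:\vlfeat-0.9.18\toolbox\vl_setup');

% Step 6, typed at the MATLAB prompt after restarting:
% prints the VLFeat version and build details if the toolbox is on the path.
vl_version verbose
```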

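For this assignment in Computer Vision: Algorithms and Applications, the data consist mainly of training and test image sets covering different scenes. Scene recognition is implemented with two methods: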

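1. Tiny images and nearest neighbor classification (accuracy 18% to 25%)

The tiny image feature is one of the simplest image representations. Following the assignment, I simply resize each image to a fixed resolution (16x16). The method works better if the tiny image is made zero mean and unit length. It is still not a very good image representation, because it discards high-frequency image content and is not invariant to image scale.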
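The zero-mean, unit-length normalization mentioned above can be sketched as follows. This is a minimal illustration on a hypothetical 2x2 stand-in for a tiny image; it also shows why the pixels should be converted to double first, since integer image types saturate instead of going negative.

```matlab
% A 2x2 uint8 stand-in for a resized tiny image (values are made up).
tiny = uint8([10 20; 30 40]);

% Pitfall: uint8 arithmetic saturates at 0, so subtracting the mean
% directly on the raw pixels would clip every negative value.
clipped = tiny(1, 1) - 25;          % gives 0, not -15

% Convert to double, flatten to a row vector, then normalise.
v = double(tiny(:))';
v = v - mean(v);                    % zero mean
v = v / norm(v);                    % unit length

disp(v);
```

After these two steps, differences in overall brightness and contrast between images no longer dominate the Euclidean distance.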

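For nearest neighbor classification, I simply compute the Euclidean distance between the feature vector of each test image and the feature vector of every training image, then assign the test image the category of the closest training image. Nearest neighbor classification has many advantages: it needs no training, it is simple, easy to understand and easy to implement, and it has no parameters to estimate. However, it is sensitive to noise, and as the feature dimensionality grows it has no way to learn which dimensions are irrelevant to the decision.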
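The idea can be illustrated on toy data without VLFeat. The feature values and category names below are hypothetical; the sketch only shows the mechanics of 1-nearest-neighbor classification.

```matlab
% Toy training features (one row per image) and their labels.
train_feats  = [0 0; 0 1; 5 5; 6 5];
train_labels = {'kitchen'; 'kitchen'; 'forest'; 'forest'};
test_feats   = [0.2 0.4; 5.5 4.8];

predicted = cell(size(test_feats, 1), 1);
for i = 1:size(test_feats, 1)
    % Squared Euclidean distance from test image i to every training image.
    delta = train_feats - repmat(test_feats(i, :), size(train_feats, 1), 1);
    d2 = sum(delta .^ 2, 2);

    % The label of the closest training image becomes the prediction.
    [~, nearest] = min(d2);
    predicted{i} = train_labels{nearest};
end

disp(predicted);
```

Because the minimum of the squared distance is also the minimum of the distance itself, taking the square root is unnecessary here.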

The MATLAB code is as follows:

function image_feats = get_tiny_images(image_paths)
% image_paths is an N x 1 cell array of strings where each string is an
%  image path on the file system.
% image_feats is an N x d matrix of resized and then vectorized tiny
%  images. E.g. if the images are resized to 16x16, d would equal 256.
% Each image is resized to a small square resolution, e.g. 16x16. You can
% either resize the images to square while ignoring their aspect ratio or
% you can crop the center square portion out of each image. Making the tiny
% images zero mean and unit length (normalizing them) will increase
% performance modestly.
n_images = numel(image_paths);
side = 16;                               % tiny image side length
d = side * side;
image_feats = zeros(n_images, d);
for i = 1:n_images
    img = imread(image_paths{i});
    if size(img, 3) == 3
        img = rgb2gray(img);             % keep a single channel so d stays 256
    end
    img = imresize(img, [side side]);    % ignores the aspect ratio
    v = double(reshape(img, 1, d));      % convert first: uint8 arithmetic saturates
    v = v - mean(v);                     % make the tiny image zero mean
    % v = v / norm(v);                   % optional: scale to unit length
    image_feats(i, :) = v;
end

The nearest neighbor classification code is as follows:

function predicted_categories = nearest_neighbor_classify(train_image_feats, train_labels, test_image_feats)
% train_image_feats is an N x d matrix, where d is the dimensionality of the
%  feature representation.
% train_labels is an N x 1 cell array, where each entry is a string
%  indicating the ground truth category for each training image.
% test_image_feats is an M x d matrix, where d is the dimensionality of the
%  feature representation. You can assume M = N unless you've modified the
%  starter code.
%{
D = vl_alldist2(X,Y)
  http://www.vlfeat.org/matlab/vl_alldist2.html
  returns the pairwise distance matrix D of the columns of X and Y.
  D(i,j) = sum((X(:,i) - Y(:,j)).^2)
  Note that vl_feat represents points as columns vs this code (and Matlab
  in general) represents points as rows. So you probably want to use the
  transpose operator '
  vl_alldist2 supports different distance metrics which can influence
  performance significantly. The default distance, L2, is fine for images.
  CHI2 tends to work well for histograms.
[Y,I] = MIN(X) if you're only doing 1 nearest neighbor, or
[Y,I] = SORT(X) if you're going to be reasoning about many nearest
  neighbors
%}
% D(i,j) is the squared L2 distance between test image i and training image j.
% Computing the full M x N matrix in one call also works when M differs from N.
D = vl_alldist2(test_image_feats', train_image_feats');
[~, nearest] = min(D, [], 2);            % closest training image for each test image
predicted_categories = reshape(train_labels(nearest), [], 1);

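The results are as follows:

2. SIFT features and nearest neighbor classification (accuracy 50% to 60%)

The code that uses the VLFeat vision library for SIFT feature detection is as follows:

(1) Build the visual vocabulary from the training image set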

function vocab = build_vocabulary( image_paths, vocab_size )
% The inputs are images, a N x 1 cell array of image paths and the size of
% the vocabulary.
%{
[centers, assignments] = vl_kmeans(X, K)
  http://www.vlfeat.org/matlab/vl_kmeans.html
  X is a d x M matrix of sampled SIFT features, where M is the number of
  features sampled. M should be pretty large! Make sure matrix is of type
  single to be safe. E.g. single(matrix).
  K is the number of clusters desired (vocab_size)
  centers is a d x K matrix of cluster centroids. This is your vocabulary.
%}
N = numel(image_paths);
image_sampledSIFT = [];
for i = 1:4:N                            % sample every 4th training image
    img = imread(image_paths{i});        % image_paths is a cell array, so index with {}
    if size(img, 3) == 3
        img = rgb2gray(img);             % vl_dsift expects a grayscale image
    end
    [locations, SIFT_features] = vl_dsift(single(img), 'STEP', 10);
    image_sampledSIFT = [image_sampledSIFT single(SIFT_features)];
end
[vocab, assignments] = vl_kmeans(image_sampledSIFT, vocab_size);

(2) Compute the bag-of-SIFT features

function image_feats = get_bags_of_sifts(image_paths)
% image_paths is an N x 1 cell array of strings where each string is an
% image path on the file system.
%{
Useful functions:
[locations, SIFT_features] = vl_dsift(img)
  http://www.vlfeat.org/matlab/vl_dsift.html
  locations is a 2 x n list of locations, which can be used for extra
  credit if you are constructing a "spatial pyramid".
  SIFT_features is a 128 x N matrix of SIFT features
D = vl_alldist2(X,Y)
  http://www.vlfeat.org/matlab/vl_alldist2.html
  returns the pairwise distance matrix D of the columns of X and Y.
  D(i,j) = sum((X(:,i) - Y(:,j)).^2)
%}
load('vocab.mat')
fprintf('vocab loaded\n')
vocab_size = size(vocab, 2);
N = numel(image_paths);
image_feats = zeros(N, vocab_size);
for i = 1:N
    img = imread(image_paths{i});
    if size(img, 3) == 3
        img = rgb2gray(img);             % vl_dsift expects a grayscale image
    end
    [locations, SIFT_features] = vl_dsift(single(img), 'STEP', 10);
    D = vl_alldist2(single(vocab), single(SIFT_features));   % vocab_size x n
    [~, nearest_word] = min(D, [], 1);   % closest visual word for every descriptor
    % Count all n descriptors (the earlier loop stopped after vocab_size of them).
    h = accumarray(nearest_word(:), 1, [vocab_size 1]);
    h = h / norm(h);                     % normalise so image size does not matter
    image_feats(i, :) = h';
end

Classification is again done with the nearest neighbor classifier.
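To score either pipeline, the predicted categories can be compared against the ground-truth test labels. The sketch below uses hypothetical label lists and is only meant to show how an accuracy figure like the ones quoted above could be computed; it is not the assignment's own evaluation code.

```matlab
% Hypothetical ground-truth and predicted categories for four test images.
truth     = {'kitchen'; 'forest'; 'street'; 'forest'};
predicted = {'kitchen'; 'forest'; 'forest'; 'forest'};

% strcmp compares the two cell arrays element by element;
% the mean of the resulting logical vector is the fraction correct.
accuracy = mean(strcmp(predicted, truth));
fprintf('accuracy = %.2f\n', accuracy);
```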

The results are as follows:
