FLANN快速近似最邻近算法官方指导文档

  • Introduction
    • Quick Start 快速学习
  • Downloading and compiling FLANN 下载和编译
    • Upgrading from a previous version 版本升级
    • Compiling FLANN with multithreading support 多线程支持
  • Using FLANN 使用细则
    • Using FLANN from C++
      • flann::Index
      • flann::Index::buildIndex
      • flann::Index::addPoints
      • flann::Index::removePoint
      • flann::Index::getPoint
      • flann::Index::knnSearch
      • flann::Index::radiusSearch
      • flann::Index::save
      • flann::hierarchicalClustering
      • flann::KdTreeCuda3dIndex

by Oxidane Lin

本文翻译自FLANN的官方文档,原文是LATEX,改成了md格式方便阅读。能力时间所限,只翻译了C++部分,C/MATLAB/PYTHON还请自行对应。

Introduction

We can define the nearest neighbor search (NNS) problem in the
following way: given a set of points P=p1,p2,…,pnP=p_1,p_2,\dots,p_nP=p1​,p2​,…,pn​ in a metric
space XXX, these points must be preprocessed in such a way that given a
new query point q∈Xq \in Xq∈X, finding the point in PPP that is nearest to
qqq can be done quickly.

定义“最邻近搜索”问题如下:在某个空间XXX中任意给定一组点P=p1,p2,…,pnP=p_1,p_2,\dots,p_nP=p1​,p2​,…,pn​,这些点经过预处理后,需要达到以下效果:任意给定一个新的查询点 q∈Xq \in Xq∈X,可以快速找到距离qqq最近的点PPP

The problem of nearest neighbor search is one of major importance in a
variety of applications such as image recognition, data compression,
pattern recognition and classification, machine learning, document
retrieval systems, statistics and data analysis. However, solving this
problem in high dimensional spaces seems to be a very difficult task and
there is no algorithm that performs significantly better than the
standard brute-force search. This has lead to an increasing interest in
a class of algorithms that perform approximate nearest neighbor
searches, which have proven to be a good-enough approximation in most
practical applications and in most cases, orders of magnitude faster
that the algorithms performing the exact searches.

在图像识别、数据压缩、模式识别和分类、机器学习、文档检索系统、统计和数据分析等各种应用中,最近邻搜索问题是一个非常重要的问题。然而,在高维空间中解决这个问题似乎是一项非常困难的任务,而且没有任何算法能比标准的蛮力搜索表现得更好。这使得人们对使用近似最近邻搜索的一类算法越来越感兴趣,在大多数实际应用中,这已经证明是一个足够好的近似,在大多数情况下,比执行精确搜索的算法快一个数量级。

FLANN (Fast Library for Approximate Nearest Neighbors) is a library for
performing fast approximate nearest neighbor searches. FLANN is written
in the C++ programming language. FLANN can be easily used in many
contexts through the C, MATLAB, Python, and Ruby bindings provided with
the library.

FLANN就是用C++写的一个快速近似最邻近算法。

Quick Start 快速学习

This section contains small examples of how to use the FLANN library
from different programming languages (C++, C, MATLAB, Python, and Ruby).

  • C++
        // file flann_example.cpp// 最简单的例子#include <flann/flann.hpp>#include <flann/io/hdf5.h>#include <stdio.h>int main(int argc, char** argv){int nn = 3;flann::Matrix<float> dataset;flann::Matrix<float> query;flann::load_from_file(dataset, "dataset.hdf5","dataset");flann::load_from_file(query, "dataset.hdf5","query");flann::Matrix<int> indices(new int[query.rows*nn], query.rows, nn);flann::Matrix<float> dists(new float[query.rows*nn], query.rows, nn);// construct an randomized kd-tree index using 4 kd-treesflann::Index<flann::L2<float> > index(dataset, flann::KDTreeIndexParams(4));index.buildIndex();                                                                                               // do a knn search, using 128 checksindex.knnSearch(query, indices, dists, nn, flann::SearchParams(128));flann::save_to_file(indices,"result.hdf5","result");delete[] dataset.ptr();delete[] query.ptr();delete[] indices.ptr();delete[] dists.ptr();return 0;}

Downloading and compiling FLANN 下载和编译

FLANN can be downloaded from the following address:

http://www.cs.ubc.ca/\simmariusm/flann

After downloading and unpacking, the following files and directories
should be present:

  • bin: directory various for scripts and binary files

  • doc: directory containg this documentation

  • examples: directory containg examples of using FLANN

  • src: directory containg the source files

  • test: directory containg unit tests for FLANN

To compile the FLANN library the CMake[^1] build system is required.
Below is an example of how FLANN can be compiled on Linux (replace x.y.z
with the corresponding version number).

$ cd flann-x.y.z-src
$ mkdir build
$ cd build
$ cmake ..
$ make

On windows the steps are very similar:

> "C:\Program Files\Microsoft Visual Studio 9.0\VC\vcvarsall.bat"
> cd flann-x.y.z-src
> mkdir build
> cd build
> cmake ..
> nmake

There are several compile options that can be configured before FLANN is
compiled, for example the build type (Release, RelWithDebInfo, Debug) or
whether to compile the C, Python or the MATLAB bindings. To change any
of this options use the cmake-gui application after cmake has
finished (see figure [fig:cmake-gui]).

> cmake-gui .

Configuring the FLANN compile options

Upgrading from a previous version 版本升级

This section contains changes that you need to be aware of when
upgrading from a previous version of FLANN.

Version 1.8.0

Due to changes in the library, the on-disk format of the saved
indexes has changed and it is not possible to load indexes
previously saved with an older version of the library.

Version 1.7.0

A small breaking API change in flann::Matrix requires client code
to be updated. In order to release the memory used by a matrix, use:

  delete[] matrix.ptr();

instead of:

  delete[] matrix.data;

or:

  matrix.free();

The member data of flann::Matrix is not publicly accessible any
more, use the ptr() method to obtain the pointer to the memory
buffer.

Compiling FLANN with multithreading support 多线程支持

For taking advantage of multithreaded search, the project that uses FLANN needs to be compiled with a compiler that supports the OpenMP standard and the OpenMP support must be enabled. The number of cores to be used can be selected with the cores field in the SearchParams structure. By default a single core will be used. Setting the cores field to zero will automatically use as many threads as cores available on the machine.

SearchParams里把cores设为零可以自动用最多的核实现快速多线程搜索。

Using FLANN 使用细则

Using FLANN from C++

The core of the FLANN library is written in C++. To make use of the full power and flexibility of the templated code one should use the C++ bindings if possible. To use the C++ bindings you only need to include the the library header file flann.hpp. An example of the compile command that must be used will look something like this:

g++ flann_example.cpp -I $FLANN_ROOT/include -o flann_example_cpp

where $FLANN_ROOT is the library main directory.

The following sections describe the public C++ API.

用C++最能体现模板代码的优势所在,最好用C++进行编程。编译过程如上↑。

flann::Index

The FLANN nearest neighbor index class. This class is used to abstract different types of nearest neighbor search indexes. The class is templated on the distance functor to be used for computing distances between pairs of features.

Index类。是各种最临近搜索索引的抽象。该类用到了距离因子的模版,以计算特征之间的距离。

namespace flann
{template<typename Distance>class Index {typedef typename Distance::ElementType ElementType;typedef typename Distance::ResultType DistanceType;public:Index(const IndexParams& params, Distance distance = Distance() );Index(const Matrix<ElementType>& points, const IndexParams& params,Distance distance = Distance() );~Index();void buildIndex();        void buildIndex(const Matrix<ElementType>& points);void addPoints(const Matrix<ElementType>& points, float rebuild_threshold = 2);void removePoint(size_t point_id);ElementType* getPoint(size_t point_id);int knnSearch(const Matrix<ElementType>& queries, Matrix<int>& indices, Matrix<DistanceType>& dists, size_t knn, const SearchParams& params);int knnSearch(const Matrix<ElementType>& queries,std::vector< std::vector<int> >& indices,std::vector<std::vector<DistanceType> >& dists,size_t knn,const SearchParams& params);int radiusSearch(const Matrix<ElementType>& queries, Matrix<int>& indices, Matrix<DistanceType>& dists, float radius, const SearchParams& params);int radiusSearch(const Matrix<ElementType>& queries,std::vector< std::vector<int> >& indices,std::vector<std::vector<DistanceType> >& dists,float radius,const SearchParams& params);void save(std::string filename);int veclen() const;int size() const;IndexParams getParameters() const;flann_algorithm_t getType() const;};
}

The Distance functor

The distance functor is a class whose operator() computes the distance between two features. If the distance is also a kd-tree compatible distance it should also provide an accum_dist() method that computes the distance between individual feature dimensions. A typical distance functor looks like this (see the dist.h file for more examples):

距离因子是一个类,它的operator()用于计算特征之间的“距离”。如果该距离兼容k-d树(多维树),它同时会提供一个accum_dist()方法,用于计算单独维度中的距离。一个典型的距离因子计算方法如下,更多请参考dist.h

    template<class T>struct L2{typedef bool is_kdtree_distance;//判断是否为多维树typedef T ElementType;//元素的类型和模板类型一致typedef typename Accumulator<T>::Type ResultType; //template <typename Iterator1, typename Iterator2>ResultType operator()(Iterator1 a, Iterator2 b, size_t size, ResultType /*worst_dist*/ = -1) const{ResultType result = ResultType();ResultType diff;for(size_t i = 0; i < size; ++i ) {diff = *a++ - *b++;//对应位置上的a-b为差值result += diff*diff;//a-b的平方和为距离}return result;}template <typename U, typename V>inline ResultType accum_dist(const U& a, const V& b, int) const{return (a-b)*(a-b);}};

In addition to operator() and accum_dist(), a distance functor
should also define the ElementType and the ResultType as the types
of the elements it operates on and the type of the result it computes.

程序还应该指明元素的类型 ElementTypeResultType

If a distance functor can be used as a kd-tree distance (meaning that the full distance between a pair of features can be accumulated from the partial distances between the individual dimensions) a typedef is_kdtree_distance should be present inside the distance functor. If the distance is not a kd-tree distance, but it’s a distance in a vector space (the individual dimensions of the elements it operates on can be accessed independently) a typedef is_vector_space_distance should be defined inside the functor. If neither typedef is defined, the distance is assumed to be a metric distance and will only be used with indexes operating on generic metric distances.\

如果一个距离因子是kdtree的距离,也就是说,一对特征之间的总距离可以从各个独立维度的微分距离累加得到,那么变量is_kdtree_distance应该存在;如果某个距离因子不是kdtree的距离,而是一个向量空间的距离,(操作的每个独立的维度都可以独立获取),is_vector_space_distance变量应该定义。如果这两者都没有被定义,这个距离会被认为是一个度量距离,只会用通用的度量距离算子进行处理。

flann::Index::Index Constructs a nearest neighbor search index for a given dataset.

    Index(const IndexParams& params, Distance distance = Distance() );Index(const Matrix<ElementType>& points, const IndexParams& params,Distance distance = Distance() );
points 需要建立索引的特征形成的矩阵,每行一个特征,矩阵大小是特征数×维度。
:   Matrix containing the features(points) that should be indexed,stored in a row-major order (one point on each row of the matrix).The size of the matrix is $num\_features \times dimensionality$.params 包含索引参数的结构,构建索引的类型取决于这个参数,可用的类型有:
:   Structure containing the index parameters. The type of index thatwill be constructed depends on the type of this parameter. Thepossible parameter types are:

LinearIndexParams 传递一个该类型的对象时,索引会做线性暴力搜索。
When passing an object of this type, the index will perform a linear, brute-force search.

    struct LinearIndexParams : public IndexParams {};

KDTreeIndexParams 传递一个该类型的对象时,索引将由一系列随机的kdtrees并行搜索。
When passing an object of this type the index constructed will consist of a set of randomized kd-trees which will be searched in parallel.

    struct KDTreeIndexParams : public IndexParams{KDTreeIndexParams( int trees = 4 );};
trees 并行kdtree的数量,好的取值在1-16之间。
:   The number of parallel kd-trees to use. Good values are in therange [1..16]

KMeansIndexParams 传递一个该类型的对象时,索引将被构建成一个层次k均值树。
k均值聚类:以空间中k个点为中心进行聚类,对最靠近他们的对象归类。通过迭代的方法,逐次更新各聚类中心的值,直至得到最好的聚类结果。
When passing an object of this type the index constructed will be a hierarchical k-means tree.

    struct KMeansIndexParams : public IndexParams{KMeansIndexParams( int branching = 32,int iterations = 11,flann_centers_init_t centers_init = FLANN_CENTERS_RANDOM,float cb_index = 0.2 );};
branching 分支数
:   <span> The branching factor to use for the hierarchical k-meanstree </span>iterations 聚类过程中的最大迭代数。如果是-1,聚类将一直进行到收敛。
:   <span> The maximum number of iterations to use in the k-meansclustering stage when building the k-means tree. If a value of-1 is used here, it means that the k-means clustering should beiterated until convergence</span>centers\_init 进行k均值聚类的时候,挑选初始中心点的算法。可以用的参数有:CENTERS\_RANDOM 随机选CENTERS\_GONZALES Gonzales算法CENTERS\_KMEANSPP Kmeanspp算法
:   <span> The algorithm to use for selecting the initial centerswhen performing a k-means clustering step. The possible valuesare CENTERS\_RANDOM (picks the initial cluster centersrandomly), CENTERS\_GONZALES (picks the initial centers usingGonzales’ algorithm) and CENTERS\_KMEANSPP (picks the initialcenters using the algorithm suggested in @arthur_kmeanspp_2007)</span>cb\_index 聚类边界系数:该参数影响分层k均值树中“探索”进行的方式。如果是0,下一个探索的区域就是最近中心点的区域;大于0的值,会把区域的大小也考虑在内。
:   <span> This parameter (cluster boundary index) influences theway exploration is performed in the hierarchical kmeans tree.When `cb_index` is zero the next kmeans domain to be explored ischoosen to be the one with the closest center. A value greaterthen zero also takes into account the size of the domain.</span>

CompositeIndexParams 组合索引,用这种索引时,随机kdtree和层次k均值树会被组合使用。
When using a parameters object of this type the index created combines the randomized kd-trees and the hierarchical k-means tree.

    struct CompositeIndexParams : public IndexParams{CompositeIndexParams( int trees = 4,int branching = 32,int iterations = 11,flann_centers_init_t centers_init = FLANN_CENTERS_RANDOM, float cb_index = 0.2 );};

KDTreeSingleIndexParams 传递一个该类型的对象时,索引将会包含一个单独的kdtree,用于搜索低维度的数据,例如3d点云。
When passing an object of this type the index will contain a single kd-tree optimized for searching lower dimensionality data (for example 3D point clouds)

    struct KDTreeSingleIndexParams : public IndexParams{KDTreeSingleIndexParams( int leaf_max_size = 10 );};
max\_leaf\_size 每片叶子上包含点的最大数量。
:   The maximum number of points to have in a leaf for not branchingthe tree any more.

KDTreeCuda3dIndexParams 传递一个该类型的对象时,索引将会包含一个单独的kdtree,并且在CUDA上运算。大量搜索和查询点时,搜索表现最好。
When passing an object of this type the index will be a single kd-tree that is built and performs searches on a CUDA compatible GPU. Search performance is best for large numbers of search and query points. For more information, see section [sec:flann::cuda]

    struct KDTreeCuda3dIndexParams : public IndexParams{KDTreeCuda3dIndexParams( int leaf_max_size = 64 );};
max\_leaf\_size 每片叶子上包含点的最大数量。
:   The maximum number of points to have in a leaf for not branchingthe tree any more.

HierarchicalClusteringIndexParams 传递一个该类型的对象时,索引类型会是一个层次聚类索引。这种索引适用于任何度量距离,并且可以用汉明距离匹配二进制特征。
When passing an object of this type the index constructed will be a hierarchical clustering index. This type of index works with any metric distance and can be used for matching binary features using Hamming distances.

    struct HierarchicalClusteringIndexParams : public IndexParams{HierarchicalClusteringIndexParams(int branching = 32,flann_centers_init_t centers_init = FLANN_CENTERS_RANDOM,int trees = 4, int leaf_max_size = 100)};
branching 分支数
:   <span> The branching factor to use for the hierarchicalclustering tree </span>centers\_init 中心点选取算法
:   <span> The algorithm to use for selecting the initial centerswhen performing a k-means clustering step. The possible valuesare CENTERS\_RANDOM (picks the initial cluster centersrandomly), CENTERS\_GONZALES (picks the initial centers usingGonzales’ algorithm) and CENTERS\_KMEANSPP (picks the initialcenters using the algorithm suggested in @arthur_kmeanspp_2007)</span>trees 并行树,最好在3-8.
:   The number of parallel trees to use. Good values are in therange [3..8]leaf\_size 叶片大小
:   The maximum number of points a leaf node should contain.

LshIndexParams 传递一个该类型的对象时,索引将会是一种多探寻的局部敏感哈希方法,只适用于汉明距离匹配二进制特征。
When passing an object of this type the index constructed will be a multi-probe LSH (Locality-Sensitive Hashing) index. This type of index can only be used for matching binary features using Hamming distances.

    struct LshIndexParams : public IndexParams{LshIndexParams(unsigned int table_number = 12, unsigned int key_size = 20, unsigned int multi_probe_level = 2);};
table\_number 使用哈希表的数量
:   <span> The number of hash tables to use </span>key\_size 哈希表键的长度
:   <span> The length of the key in the hash tables</span>multi\_probe\_level 多探寻中用到的级别,0是标准LSH
:   Number of levels to use in multi-probe (0 for standard LSH)

AutotunedIndexParams 传递一个该类型的对象时,自动选择最佳的索引类型和参数(随机kdtree,层次k均值,线性)
When passing an object of this type the index created is automatically tuned to offer the best performance, by choosing the optimal index type (randomized kd-trees, hierarchical kmeans, linear) and parameters for the dataset provided.

    struct AutotunedIndexParams : public IndexParams{AutotunedIndexParams( float target_precision = 0.9,float build_weight = 0.01,float memory_weight = 0,float sample_fraction = 0.1 );};
target\_precision 0-1之间表征近似最临近搜索返回值是真正最临近值的百分比,此值越大,结果越精确,但搜索时间更长。最佳值一般取决于应用场景。
:   <span> Is a number between 0 and 1 specifying the percentage ofthe approximate nearest-neighbor searches that return the exactnearest-neighbor. Using a higher value for this parameter givesmore accurate results, but the search takes longer. The optimumvalue usually depends on the application. </span>build\_weight 表征索引构建时间相比于最临近搜索时间重要性的比例。有些应用场合,构建索引花费很长时间,随后的搜索非常快是允许的;另一些场景中要求索引被快速建立,即使这会使搜索时间更长。
:   <span> Specifies the importance of the index build time raportedto the nearest-neighbor search time. In some applications it’sacceptable for the index build step to take a long time if thesubsequent searches in the index can be performed very fast. Inother applications it’s required that the index be build as fastas possible even if that leads to slightly longer searchtimes.</span>memory\_weight 表征索引建立和搜索所需时间和消耗内存之间的权重关系,小于1的值更在意时间,大于1的值更在意内存消耗。
:   <span>Is used to specify the tradeoff between time (index buildtime and search time) and memory used by the index. A value lessthan 1 gives more importance to the time spent and a valuegreater than 1 gives more importance to the memory usage.</span>sample\_fraction 采样比例:0-1之间表征自动参数认证算法中使用的数据集的比例大小。当数据集过大时,耗时会非常久,用一部分表示整体也能得出很好的评估结果。
:   <span>Is a number between 0 and 1 indicating what fraction ofthe dataset to use in the automatic parameter configurationalgorithm. Running the algorithm on the full dataset gives themost accurate results, but for very large datasets can takelonger than desired. In such case using just a fraction of thedata helps speeding up this algorithm while still giving goodapproximations of the optimum parameters.</span>

SavedIndexParams 从硬盘读取一个之前保存好的索引。
This object type is used for loading a previously saved index from the disk.

    struct SavedIndexParams : public IndexParams{SavedIndexParams( std::string filename );};
filename 文件名字
:   <span> The filename in which the index was saved. </span>

flann::Index::buildIndex

构建最临近搜索索引,有两个重载,一个以点为参数,另一个以对象创建时提供的点为参数。
Builds the nearest neighbor search index. There are two versions of the
buildIndex method, one that uses the points provided as argument and
one that uses the points provided to the constructor when the object was
constructed.

    void buildIndex();        void buildIndex(const Matrix<ElementType>& points);

flann::Index::addPoints

addPoints添加点的方法,用于向建好的索引中加点,为了防止索引失衡,有阈值参数控制何时重建索引,默认为2.
The addPoints method can be used to incrementally add points to the
index after the index was build. To avoid the index getting unbalanced,
the addPoints method has the option of rebuilding the index after a
large number of points have been added. The rebuild_threshold
parameter controls when the index is rebuild, by default this is done
when it doubles in size (rebuild_threshold = 2).

    void addPoints(const Matrix<ElementType>& points, float rebuild_threshold = 2);

flann::Index::removePoint

removePoint移除点的方法,留下的点的索引号不会改变。
The removePoint method removes one point with the specified point_id
from the index. The indices (of the remaining points) returned by the
nearest neighbor operations do not change when points are removed from
the index.

    void removePoint(size_t point_id);

flann::Index::getPoint

getPoint返回一个点的id号,指向数据中的一个点。
The getPoint method returns a pointer to the data point with the specified point_id.

    ElementType* getPoint(size_t point_id);

flann::Index::knnSearch

进行k最近邻搜索,就是指最接近的k个邻居(数据),即每个样本都可以由它的K个邻居来表达。这个方法有两个重载,一个是预先分配内存的flann::Matrix对象,用于存放返回的索引号和距离;另一个是两层向量,会自动调整大小。
Performs a K-nearest neighbor search for a set of query points. There
are two signatures for this method, one that takes pre-allocated
flann::Matrix objects for returning the indices of and distances to
the neighbors found, and one that takes std::vector<std::vector> that
will re resized automatically as needed.

    int Index::knnSearch(const Matrix<ElementType>& queries,Matrix<int>& indices, Matrix<DistanceType>& dists,size_t knn,const SearchParams& params);int Index::knnSearch(const Matrix<ElementType>& queries,std::vector< std::vector<int> >& indices,std::vector<std::vector<DistanceType> >& dists,size_t knn,const SearchParams& params);
queries 包含查询点的矩阵,大小是查询点数×维度。
:   <span>Matrix containing the query points. Size of matrix is($num\_queries \times dimentionality $)</span>indices 包含找到的k临近结果的索引号,对预分配版本,最少要查询点数×knn的值。
:   <span>Matrix that will contain the indices of the K-nearestneighbors found (size should be at least $num\_queries \times knn$for the pre-allocated version).</span>dists 包含找到的k临近结果的距离信息,对预分配版本,最少要查询点数×knn的值。注意:对于欧式距离,`flann::L2`计算的是平方距离,所以传到此处的值也应该是平方值。
:   <span>Matrix that will contain the distances to the K-nearestneighbors found (size should be at least $num\_queries \times knn$for the pre-allocated version).\*Note:* For Euclidean distances, the `flann::L2` functor computessquared distances, so the value passed here needs to be a squareddistance as well. </span>knn 最临近搜索要找的点的数量
:   <span>Number of nearest neighbors to search for.</span>params 搜索参数,搜索中要用的参数的结构。
:   <span>Search parameters.</span> Structure containing parameters usedduring search.

SearchParameters 搜索参数

    struct SearchParams{SearchParams(int checks = 32,float eps = 0,bool sorted = true);int checks;float eps;bool sorted;int max_neighbors;tri_type use_heap;int cores;bool matrices_in_gpu_ram;};
checks 搜索临近点时最大叶子的数量。该值越大,搜索精度越高,耗时越久。所有叶片都搜到,用
`FLANN_CHECKS_UNLIMITED`;如果是自动认证的情况,达到预期精度所需checks的数量也会被计算,
想用该值的话,设为`FLANN_CHECKS_AUTOTUNED`。
:   specifies the maximum leafs to visit when searching forneighbours. A higher value for this parameter would give bettersearch precision, but also take more time. For all leafs to bechecked use the value `FLANN_CHECKS_UNLIMITED`. If automaticconfiguration was used when the index was created, the number ofchecks required to achieve the specified precision was alsocomputed, to use that value specify `FLANN_CHECKS_AUTOTUNED`.eps 搜索 eps-大致临近的点,大概是epsilon的意思?只用于KDTreeSingleIndex/KDTreeCuda3dIndex。
:   Search for eps-approximate neighbors (only used byKDTreeSingleIndex and KDTreeCuda3dIndex).sorted 只用于半径搜索,表征返回的临近点是否按距离排序。
:   Used only by radius search, specifies if the neighbors returnedshould be sorted by distance.max\_neighbours 表征半径搜索返回临近值的最大数量,仅用于半径搜索。
:   Specifies the maximum number of neighbors radius search shouldreturn (default: -1 = unlimited). Only used for radius search.use\_heap 用堆排序。用堆的数据结构来管理内部的临近列表(可选值 FLANN\_False, FLANN\_True,FLANN\_Undefined)。堆对于大量临近点更高效,对少量值的效果不明显。默认值是Undefined,让代码根据临近点的数量自己选择最佳值。该选项只用与knn搜索。
:   Use a heap data structure to manage the list of neighborsinternally (possible values: FLANN\_False, FLANN\_True,FLANN\_Undefined). A heap is more efficient for a large numberof neighbors and less efficient for a small number of neighbors.Default value is FLANN\_Undefined, which lets the code choosethe best option depending on the number of neighbors requested.Only used for KNN search.cores 搜索中内核的数量。
:   How many cores to assign to the search (specify 0 for automaticcore selection).matrices\_in\_gpu\_ram GPU搜索中,表征矩阵是否已经在GPU内存中。
:   for GPU search indicates if matrices are already in GPU ram.

flann::Index::radiusSearch

进行半径最近邻搜索。这个方法有两个重载,一个是预先分配内存的flann::Matrix对象,用于存放返回的索引号和距离;另一个是两层向量,会自动调整大小。
Performs a radius nearest neighbor search for a set of query points.
There are two signatures for this method, one that takes pre-allocated
flann::Matrix objects for returning the indices of and distances to
the neighbors found, and one that takes std::vector<std::vector> that
will resized automatically as needed.

    int Index::radiusSearch(const Matrix<ElementType>& queries,Matrix<int>& indices,Matrix<DistanceType>& dists,float radius,const SearchParams& params); int Index::radiusSearch(const Matrix<ElementType>& queries,std::vector< std::vector<int> >& indices,std::vector<std::vector<DistanceType> >& dists,float radius,const SearchParams& params);
queries
The query point. Size of matrix is ($num_queries \times dimentionality $).
indices
Matrix that will contain the indices of the K-nearest
neighbors found. For the pre-allocated version, only as many
neighbors are returned as many columns in this matrix. If fewer
neighbors are found than columns in this matrix, the element after
that last index returned is -1. In case of the std::vector version,
the rows will be resized as needed to fit all the neighbors to be
returned, except if the “max_neighbors” search parameter is
set.
dists
Matrix that will contain the distances to the K-nearest
neighbors found. The same number of values are returned here as for
the indices matrix.
Note: For Euclidean distances, the flann::L2 functor computes
squared distances, so the value passed here needs to be a squared
distance as well.
radius
The search radius.
Note: For Euclidean distances, the flann::L2 functor computes
squared distances, so the value passed here needs to be a squared
distance as well.
params
Search parameters.

The method returns the number of nearest neighbors found.

flann::Index::save

保存索引到一个文件
Saves the index to a file.

    void Index::save(std::string filename);
filename 文件名字
The file to save the index to

flann::hierarchicalClustering

多层聚类:将给定的点构造成一个层次k均值树,并且选择其中使聚类方差最小的一支cut。
Clusters the given points by constructing a hierarchical k-means tree
and choosing a cut in the tree that minimizes the clusters’ variance.

    template <typename Distance>int hierarchicalClustering(const Matrix<typename Distance::ElementType>& points, Matrix<typename Distance::ResultType>& centers,const KMeansIndexParams& params,Distance d = Distance())
features 要被聚类的点
:   <span>The points to be clustered</span>centers 聚类得到的中心点。这个矩阵中的行数表征了所需聚类数量。然而,基于层次树中分支的选择,计算得到的聚类数量会是
$(branching-1)*k+1$ 的最大值,该值会比所求聚类数小。$branching$是分支数。
:   <span>The centers of the clusters obtained. The number of rows inthis matrix represents the number of clusters desired. However,because of the way the cut in the hierarchical tree is choosen, thenumber of clusters computed will be the highest number of the form$(branching-1)*k+1$ that’s lower than the number of clustersdesired, where $branching$ is the tree’s branching factor (seedescription of the KMeansIndexParams). </span>params
:   <span>Parameters used in the construction of the hierarchicalk-means tree</span>

该方法的返回值是计算得到聚类的数量。
The function returns the number of clusters computed.

flann::KdTreeCuda3dIndex

提供CUDA接口,用于大量3d数据集的提速。
FLANN provides a CUDA implementation of the kd-tree build and search
algorithms to improve the build and query speed for large 3d data sets.
This section will provide all the necessary information to use the
KdTreeCuda3dIndex index type.

编译:如果CMake检测到CUDA模块,libflann_cuda.so库将会被编译。
Building: If CMake detects a CUDA install during the build (see
section [sec:downloadingandcompiling]), a library libflann_cuda.so
will be built.

基本用法:
Basic Usage: To be able to use the new index type, you have to include the FLANN header this way:

    #define FLANN_USE_CUDA#include <flann/flann.hpp>

如果在头文件之前定义了FLANN_USE_CUDA,你需要手动链接libflann_cuda.so到你的项目。这样的好处是你不用编译的时候加上nvcc参数了,除非你还要用其它的CUDA代码。
If you define the symbol FLANN_USE_CUDA before including the FLANN
header, you will have to link libflann_cuda.so or libflann_cuda_s.a
with your project. However, you will not have to compile your source
code with nvcc, except if you use other CUDA code, of course.

然后,可以用KDTreeCuda3dIndexParams方法创建索引,这个索引会负责所有索引创建和搜索的数据拷贝工作。
You can then create your index by using the KDTreeCuda3dIndexParams to
create the index. The index will take care of copying all the data from
and to the GPU for you, both for index creation and search.

当心:关于选用CUDA还是多线程CPU进行kdtree运算,很难给出一条通用的指导意见。最好是尝试一下,看看效果决定。
A word of caution: A general guideline for deciding whether to use the
CUDA kd tree or a (multi-threaded) CPU implementation is hard to give,
since it depends on the combination of CPU and GPU in each system and
the data sets. For example, on a system with a Phenom II 955 CPU and a
Geforce GTX 260 GPU, the maximum search speedup on a synthetic (random)
data set is a factor of about 8-9 vs the single-core CPU search, and
starts to be reached at about 100k search and query points. (This
includes transfer times.) Build time does not profit as much from the
GPU acceleration; here the benefit is about 2x at 200k points, but this
largely depends on the data set. The only way to know which
implementation is best suited is to try it.

高级用法:一些情况下,GPU里已经有了缓存的数据,这时可以直接用缓存,而不必再从系统缓存中拷贝。具体的实现方法是,把GPU的指针传给flann::Matrix的输入,并且告诉索引按照GPU的指针去处理。
Advanced Usage: In some cases, you might already have your data in a
buffer on the GPU. In this case, you can re-use these buffers instead of
copying the buffers back to system RAM for index creation and search.
The way to do this is to pass GPU pointers via the flann::Matrix
inputs and tell the index via the index and search params to treat the
pointers as GPU pointers.

    thrust::device_vector<float4>  points_vector( n_points );// ... fill vector here...float* gpu_pointer = (float*)thrust::raw_pointer_cast(&points_vector[0]);flann::Matrix<float> matrix_gpu(gpu_pointer,n_points,3, 4);flann::KDTreeCuda3dIndexParams params;params["input_is_gpu_float4"]=true;flann::Index<flann::L2<float> > flannindex( matrix_gpu, params  );flannindex.buildIndex();

首先,创建并填充一个格式为float4的GPU缓存
First, a GPU buffer of float4 values is created and filled with points.[^2]

然后,把一个指向该缓存的GPU指针存放在一个FLANN的矩阵中,3列,步长为4(·float4的最后一个元素应当为0
Then, a GPU pointer to the buffer is stored in a flann matrix with 3 columns and a stride of 4 (the last element in the float4 should be zero).

最后,创建索引,input_is_gpu_float4标识表明以GPU缓存的形式去处理矩阵。
Last, the index is created. The input_is_gpu_float4 flag tells the index to treat the matrix as a gpu buffer.

同样的,返回值为flann矩阵的搜索过程也可以用GPU缓存来处理,但返回值为向量的不可以。要实现这种效果,索引中的指针和距离矩阵一定要指向GPU缓存,并且cols的值必须设为3(因为我们在处理三维点)。这里的步长值stride可以任意设置。

Similarly, you can specify GPU buffers for the search routines that
return the result via flann matrices (but not for those that return them
via std::vectors). To do this, the pointers in the index and dist
matrices have to point to GPU buffers and the cols value has to be set
to 3 (as we are dealing with 3d points). Here, any value for stride
can be used.

    flann::Matrix<int> indices_gpu(gpu_pointer_indices,n_points, knn, stride);flann::Matrix<float> dists_gpu(gpu_pointer_dists,n_points, knn, stride);flann::SearchParams params;params.matrices_in_gpu_ram = true;flannindex.knnSearch( queries_gpu ,indices_gpu,dists_gpu,knn, params);

注意:这里千万不能混用CPU里的矩阵信息。
Note that you cannot mix matrices in system and CPU ram here!

Search Parameters: 搜索参数
The search routines support three parameters:搜索支持以下参数:

  • eps - used for approximate knn search. The maximum possible error
    is e=dbest∗epse= d_{best} * epse=dbest​∗eps, i.e. the distance of the returned neighbor
    is at maximum epsepseps times larger than the distance to the real best
    neighbor.

    用于大致k临近搜索。最大可能误差值是 e=dbest∗epse= d_{best} * epse=dbest​∗eps,也就是返回的临近点的距离最多也就比实际最短距离多出了eee

  • use_heap - used in knn and radius search. If set to true, a heap
    structure will be used in the search to keep track of the distance
    to the farthest neighbor. Beneficial with about k>64k>64k>64 elements.

    knn和半径搜索中用的参数,堆结构。堆结构主要用来跟踪最远临近点,元素数多于64比较有益。

  • sorted - if set to true, the results of the radius and knn
    searches will be sorted in ascending order by their distance to the
    query point.

    设为true时,半径搜索和knn搜索会按照到查询点距离的升序排列

  • matrices_in_gpu_ram - set to true to indicate that all (input and
    output) matrices are located in GPU RAM.

    设为true表示所有矩阵信息都是在GPU内存里的。

FLANN快速近似最邻近算法官方指导文档相关推荐

  1. golang中文文档_【译】Go 语言源码贡献官方指导文档

    以前给 Go 语言项目源码提交过一些 commits,期间阅读他们的官方指导文档的时候觉得这篇指导文档可以作为绝佳的关于大型软件项目的规范管理的参考,因为最近又提交了几个 commits,就又把这篇文 ...

  2. FreeRTOS官方指导文档阅读笔记

    FreeRTOS官方指导文档阅读笔记 基于 161204_Mastering_the_FreeRTOS_Real_Time_Kernel-A_Hands-On_Tutorial_Guide.pdf,可 ...

  3. SprintBoot开发官方指导文档

    2019独角兽企业重金招聘Python工程师标准>>> http://docs.spring.io/spring-boot/docs/current/reference/html/b ...

  4. 学习记录《Simulink 快速入门 —— 官方Help文档》

    Simulink 快速入门 -- 官方Help文档 1. Simulink 模块图 2 创建简单模型 2.1 打开新模型 2.2 打开 Simulink库浏览器 2.2.1将模块添加到模型 2.2.2 ...

  5. 【Auto.JS】Autojs官方提取文档使用说明函数 (2)

    接上一篇文章:Autojs官方提取文档使用说明函数 (1) Images Stability: 2 - Stable images模块提供了一些手机设备中常见的图片处理函数,包括截图.读写图片.图片剪 ...

  6. PyQt5随笔:PyQt5 官方网站文档、一些基础教程网站和书籍推荐

    PyQt5随笔:PyQt5 官方网站文档.一些基础教程网站和书籍推荐 文章目录 PyQt5随笔:PyQt5 官方网站文档.一些基础教程网站和书籍推荐 1. 官方网站文档 2.基础教程网站(点击标题即可 ...

  7. java se说明文档_Java笔记---Java官方API文档的下载及使用

    个人网站还在做数据迁移和备案,就先写在简书了(博客园将常被盗取),但是知道为啥不能发布Markdown Java 官方API文档 官方文档即是最好的学习教材 API文档是什么 说明文档,用于说明每个类 ...

  8. UniRx官方入门文档

    UniRx 官方文档翻译 UniRx官方入门文档 UniRx入门系列一 UniRx入门系列二 UniRx入门系列三 UniRx入门系列四 UniRx入门系列五 UniRx 基于Unity的响应式编程框 ...

  9. Java日志框架-Logback手册中文版以及官方配置文档教程

    Logback手册中文版:(链接: https://pan.baidu.com/s/1bpMyasR 密码: 6u5c),虽然版本有点旧,但是大体意思差不多,先用中文版了解个大概,然后一切最新的配置以 ...

最新文章

  1. java正则表达式 匹配%号_java正则表达式匹配带有括号的电话号为什么匹配不上...
  2. 在GridView中如何格式化Money型字段?
  3. LinkedIn已将可自定义的索引引擎IndexTank开源
  4. api中文文档 mws_中英文排版规范化 API
  5. P3243-[HNOI2015]菜肴制作【拓扑排序,优先队列】
  6. NAND Flash和NOR Flash的比较
  7. java执行db2命令_送你一份P6级Java面试题
  8. 数据结构与算法之递归和分治思想
  9. 计算几何——poj1410,线段不规范交
  10. Redis之连接redis服务命令
  11. openwrt 南京电信 PPPOE IPv6 dhcpv6 client设置方法
  12. css如何使图片在右下角,这个右下角折角用css怎么画出来?
  13. A study finds nearly half of jobs are vulnerable to automation
  14. 微信图片怎么添加竖排文字_轻松设置微信竖排昵称
  15. 深度学习概念——Epoch, Batch, Iteration
  16. 在1-10中选择一个数,输出x+xx+xxx+xxx....x之和,如:数字为2,则2+22=24
  17. 血杀英雄为什么不显示服务器,《血杀英雄》新服开启后 这些地方你去过吗?...
  18. java:IO流(缓冲流、对象流、控制台IO、转换流、java.io.File 类 )
  19. 初雪  文/奧斯·科特林
  20. Arduino极速入门教程——两篇文章让你会用Arduino(下)

热门文章

  1. 2020小迪安全第六天笔记-加密算法
  2. 《微信公众号-腾讯问卷》03-查看问卷答题结果统计
  3. Android 设置背景之layer-list,层叠多个背景
  4. Apache NIFI 安装 ● 操作 ● 文件同步 ● oracle 数据库增量同步实例讲解
  5. RTE 大会回顾 | 基于 Web 引擎技术的 Web 内容录制
  6. InDesign CC2018软件安装
  7. 新iPhone将采用全新四核A6芯片,用9针数据线?
  8. CSS介绍/音视频标签/字体/文本/选择器
  9. 【Premiere Pro模板】动态模板(.mogrt文件)无法改变颜色的解决方法
  10. 【python】《多媒体技术与应用》实验报告「数字图像压缩」「傅里叶变换」