Curse of dimensionality

curse [kɜːs]: n. a curse, an imprecation; vt. to curse, to swear at; vi. to curse, to swear
dimensionality [dɪ,menʃə'nælətɪ]: n. dimensionality, dimension; extent, spatial extension

The curse of dimensionality refers to various phenomena that arise when analyzing and organizing data in high-dimensional spaces (often with hundreds or thousands of dimensions) that do not occur in low-dimensional settings such as the three-dimensional physical space of everyday experience. The expression was coined by Richard E. Bellman when considering problems in dynamic optimization.[1] [2]

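phenomena [fə'nɒmɪnə]: n. phenomena (plural of phenomenon)
physical ['fɪzɪk(ə)l]: adj. physical, bodily, material; in accordance with the laws of nature; n. a physical examination
coin [kɒɪn]: vt. to mint (money); to fabricate; to create (a word or phrase); n. a coin, money
expression [ɪkˈspreʃn]: n. expression, indication, representation; facial expression, look; attitude, tone of voice; (mathematical) expression, symbol; phrase, wording, turn of phrase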

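For example, 100 evenly distributed points suffice to sample a unit interval with no more than 0.01 between neighboring points; when the dimension increases to 10, sampling a unit hypercube with a grid whose adjacent points are no more than 0.01 apart requires $10^{20}$ sample points. In this sense, the 10-dimensional hypercube can be said to be $10^{18}$ times “larger” than the unit interval. (This example is due to Richard E. Bellman.)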

Cursed phenomena occur in domains such as numerical analysis, sampling, combinatorics, machine learning, data mining and databases. The common theme of these problems is that when the dimensionality increases, the volume of the space increases so fast that the available data become sparse. This sparsity is problematic for any method that requires statistical significance. In order to obtain a statistically sound and reliable result, the amount of data needed to support the result often grows exponentially with the dimensionality. Also, organizing and searching data often relies on detecting areas where objects form groups with similar properties; in high-dimensional data, however, all objects appear to be sparse and dissimilar in many ways, which prevents common data organization strategies from being efficient.
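To make the sparsity argument concrete, here is a small Python sketch of a standard back-of-the-envelope calculation (not taken from the sources cited here): if data are spread uniformly over a unit hypercube, a sub-cube that captures a fraction $r$ of the data must have edge length $r^{1/d}$. The function name `edge_length_for_fraction` is hypothetical.

```python
def edge_length_for_fraction(fraction: float, d: int) -> float:
    """Edge of a sub-cube that holds `fraction` of the volume of the unit hypercube in d dims."""
    # A sub-cube with edge e has volume e**d, so e = fraction ** (1 / d).
    return fraction ** (1.0 / d)


for d in (1, 2, 10, 100):
    e = edge_length_for_fraction(0.10, d)
    print(f"d={d:>3}: edge needed to capture 10% of uniform data = {e:.3f}")
```

Under this uniform-data assumption, a neighborhood holding 10% of the data in 10 dimensions must span about 79% of every coordinate's range, so it is hardly “local” any more.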

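occur [ə'kɜː]: vi. to happen, to appear, to exist
combinatorics [,kɒmbɪnə'tɒrɪks]: n. combinatorics, combinatorial mathematics (same as combinatorial analysis, combinatorial mathematics)
theme [θiːm]: n. theme, main motif, topic; adj. decorated around a (fanciful) theme
sparse [spɑːs]: adj. sparse, scanty
problematic [prɒblə'mætɪk]: adj. problematic, questionable, uncertain
statistically [stə'tɪstɪkli]: adv. statistically, in terms of statistics
exponentially [ˌekspəʊˈnenʃəlɪ]: adv. exponentially, at an exponential rate
dissimilar [dɪ'sɪmɪlə]: adj. different, unlike
strategy [ˈstrætədʒɪ]: n. strategy, tactics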

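The curse of dimensionality is often used as a weak excuse for not dealing with high-dimensional data. Nevertheless, the academic community has remained interested in it, and research on it continues. On the other hand, because of intrinsic dimensionality (the notion that any low-dimensional data space can trivially be turned into a higher-dimensional space by adding redundant, e.g. duplicated, or random dimensions), many high-dimensional data sets can conversely be reduced to lower-dimensional data without losing significant information. This is also reflected in the effectiveness of the many dimensionality-reduction methods, such as the widely used principal component analysis. For distance functions and nearest neighbor search, recent research also shows that data sets exhibiting the curse of dimensionality can still be processed unless there are too many irrelevant dimensions, because relevant dimensions can actually make many problems (such as cluster analysis) easier. In addition, methods such as Markov chain Monte Carlo or shared nearest neighbor methods often perform well on data sets that other methods find intractable because of their high dimensionality.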
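The idea that high-dimensional data can have a low intrinsic dimension can be illustrated with a stdlib-only Python sketch. It is an illustration, not a PCA implementation: 2-D points are lifted into 10 dimensions by keeping the original coordinates and appending random linear combinations of them (redundant dimensions), and the rank of the sample covariance matrix, which equals the number of nonzero principal-component variances, stays 2. The helper names, the seed and the tolerance are assumptions.

```python
import random


def embed(points_2d, extra_dims, rng):
    """Lift 2-D points into 2 + extra_dims dimensions: keep the original coordinates
    and append random linear combinations of them (redundant dimensions)."""
    weights = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(extra_dims)]
    return [[x, y] + [a * x + b * y for a, b in weights] for x, y in points_2d]


def covariance(data):
    """Sample covariance matrix of a list of equal-length rows."""
    n, d = len(data), len(data[0])
    means = [sum(row[j] for row in data) / n for j in range(d)]
    centered = [[row[j] - means[j] for j in range(d)] for row in data]
    return [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
            for i in range(d)]


def matrix_rank(m, rel_tol=1e-9):
    """Numerical rank via Gaussian elimination with partial pivoting."""
    a = [row[:] for row in m]
    rows, cols = len(a), len(a[0])
    scale = max(abs(v) for row in a for v in row) or 1.0
    rank = 0
    for col in range(cols):
        if rank == rows:
            break
        pivot = max(range(rank, rows), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) <= rel_tol * scale:
            continue  # no usable pivot in this column
        a[rank], a[pivot] = a[pivot], a[rank]
        for r in range(rank + 1, rows):
            factor = a[r][col] / a[rank][col]
            for c in range(col, cols):
                a[r][c] -= factor * a[rank][c]
        rank += 1
    return rank


rng = random.Random(0)
points = [(rng.gauss(0, 3), rng.gauss(0, 1)) for _ in range(200)]
lifted = embed(points, extra_dims=8, rng=rng)
print(len(lifted[0]), "ambient dimensions; covariance rank =", matrix_rank(covariance(lifted)))
```

A full PCA would additionally report how much variance each of the two components carries; the rank check only confirms how many of them are nonzero.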

Combinatorics

In some problems, each variable can take one of several discrete values, or the range of possible values is divided to give a finite number of possibilities. Taking the variables together, a huge number of combinations of values must be considered. This effect is also known as the combinatorial explosion. Even in the simplest case of $d$ binary variables, the number of possible combinations is already $O(2^d)$, exponential in the dimensionality. Naively, each additional dimension doubles the effort needed to try all combinations.
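A minimal Python sketch of the combinatorial explosion described above: enumerating every assignment of $d$ binary variables with the standard library's `itertools.product` yields $2^d$ tuples, so each extra variable doubles the work. The function name `all_binary_assignments` is hypothetical.

```python
from itertools import product


def all_binary_assignments(d: int):
    """Lazily yield every tuple of d binary values (0/1); there are 2**d of them."""
    return product((0, 1), repeat=d)


for d in (1, 2, 10, 20):
    count = sum(1 for _ in all_binary_assignments(d))
    print(f"d={d:>2}: {count} combinations")
```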

finite ['faɪnaɪt]: adj. finite, limited; n. something finite
explosion [ɪk'spləʊʒ(ə)n; ek-]: n. explosion, outburst, sudden increase
combinatorial [kɒm,baɪnə'tɔːrɪəl]: adj. combinatorial
exponential [,ekspə'nenʃ(ə)l]: adj. exponential; n. an exponential
naively [naɪ'iːvli]: adv. naively, innocently

Sampling

There is an exponential increase in volume associated with adding extra dimensions to a mathematical space. For example, $10^{2}=100$ evenly spaced sample points suffice to sample a unit interval (a “1-dimensional cube”) with no more than $10^{-2}=0.01$ distance between points; an equivalent sampling of a 10-dimensional unit hypercube with a lattice that has a spacing of $10^{-2}=0.01$ between adjacent points would require $10^{20}\,[=(10^{2})^{10}]$ sample points. In general, with a spacing distance of $10^{-n}$ the 10-dimensional hypercube appears to be a factor of $10^{n(10-1)}\,[=(10^{n})^{10}/(10^{n})]$ “larger” than the 1-dimensional hypercube, which is the unit interval. In the above example $n=2$: when using a sampling distance of 0.01, the 10-dimensional hypercube appears to be $10^{18}$ times “larger” than the unit interval. This effect is a combination of the combinatorics problems above and the distance function problems explained below.
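The arithmetic of Bellman's example can be checked with exact integer math. This sketch (the function name `lattice_points` is hypothetical) follows the text in taking $10^{n}$ points per axis for a spacing of $10^{-n}$, so a $d$-dimensional lattice needs $(10^{n})^{d}$ points.

```python
def lattice_points(n: int, d: int) -> int:
    """Sample points for a d-dimensional unit hypercube with spacing 10**-n.

    Following the text, 10**n points are used per axis, so the total is (10**n)**d.
    """
    return (10 ** n) ** d


n = 2  # spacing 0.01
interval = lattice_points(n, d=1)   # points on the unit interval
cube10 = lattice_points(n, d=10)    # points in the 10-dimensional hypercube
print(interval, cube10, cube10 // interval)  # the ratio is 10**(n * (10 - 1))
```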

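volume ['vɒljuːm]: n. amount, volume (space occupied), a book or tome, loudness; a large quantity; adj. large-scale; vi. to roll up in a mass; vt. to collect into volumes
extra ['ekstrə]: adv. especially, exceptionally, additionally; n. an extra (actor), a special edition, something additional, a premium product; adj. extra, additional, charged separately, extra-large
evenly ['i:vənlɪ]: adv. evenly, uniformly, smoothly, equally
cube [kjuːb]: n. cube, third power, die; vt. to cut into cubes, to raise to the third power, to measure the volume of
equivalent [ɪ'kwɪv(ə)l(ə)nt]: adj. equivalent, equal, synonymous; n. an equivalent
hypercube ['haipə,kju:b]: n. hypercube
lattice ['lætɪs]: n. lattice, grid, trellis; vt. to form into a lattice
adjacent [ə'dʒeɪs(ə)nt]: adj. adjacent, neighboring
suffice [sə'faɪs]: vt. to satisfy, to be enough for; vi. to be sufficient, to be adequate
appear [ə'pɪə]: vi. to appear, to seem, to show up; to appear in court; to come on stage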

Optimization

When solving dynamic optimization problems by numerical backward induction, the objective function must be computed for each combination of values. This is a significant obstacle when the dimension of the “state variable” is large.
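A toy Python sketch of why numerical backward induction suffers: the value function is stored on a grid with $m$ points per state dimension, so every period evaluates the objective at $m^d$ states for every action. The problem itself (a hypothetical reward that penalizes distance from the grid center, with actions that stay put or move one grid step along one axis) is invented purely to count evaluations; it is not from the source.

```python
from itertools import product


def backward_induction(m: int, d: int, horizon: int):
    """Toy finite-horizon problem on an m**d state grid.

    Returns (value function at t = 0, number of objective evaluations).
    """
    states = list(product(range(m), repeat=d))
    center = (m - 1) / 2
    # Hypothetical action set: stay put, or move one grid step along a single axis.
    actions = [(0,) * d] + [
        tuple(step if j == axis else 0 for j in range(d))
        for axis in range(d)
        for step in (-1, 1)
    ]
    value = {s: 0.0 for s in states}  # terminal value V_T(s) = 0
    evaluations = 0
    for _ in range(horizon):  # step backward from t = T - 1 to t = 0
        new_value = {}
        for s in states:
            best = float("-inf")
            for a in actions:
                nxt = tuple(min(m - 1, max(0, si + ai)) for si, ai in zip(s, a))
                reward = -sum((x - center) ** 2 for x in nxt)  # toy objective
                best = max(best, reward + value[nxt])
                evaluations += 1
            new_value[s] = best
        value = new_value
    return value, evaluations


for d in (1, 2, 3):
    _, evals = backward_induction(m=10, d=d, horizon=5)
    print(f"d={d}: {evals} objective evaluations")  # horizon * m**d * (2d + 1)
```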

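induction [ɪn'dʌkʃ(ə)n]: n. induction (electromagnetic), induction (mathematical or logical reasoning); induction training, inauguration; inducement
backward ['bækwəd]: adj. backward, reverse, slow to develop; adv. backward(s), in reverse
obstacle ['ɒbstək(ə)l]: n. obstacle, hindrance, impediment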

Machine learning

In machine learning problems that involve learning a “state-of-nature” from a finite number of data samples in a high-dimensional feature space with each feature having a range of possible values, typically an enormous amount of training data is required to ensure that there are several samples with each combination of values. A typical rule of thumb is that there should be at least 5 training examples for each dimension in the representation.[3] With a fixed number of training samples, the predictive power of a classifier or regressor first increases as the number of dimensions/features used is increased but then decreases,[4] which is known as the Hughes phenomenon[5] (after Gordon F. Hughes) or the peaking phenomenon.[3]
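The two sample-size requirements in this paragraph scale very differently. The sketch below contrasts the linear rule of thumb with the exponential count needed to see every combination of discretized feature values; the helper names and the choice of 3 discrete values per feature are hypothetical.

```python
def rule_of_thumb_samples(d: int, per_dimension: int = 5) -> int:
    """Training-set size suggested by the 'at least 5 examples per dimension' heuristic."""
    return per_dimension * d


def samples_to_cover_all_cells(d: int, values_per_feature: int, per_cell: int = 1) -> int:
    """Samples needed to see every combination of discretized feature values per_cell times."""
    return per_cell * values_per_feature ** d


for d in (2, 10, 50):
    print(f"d={d:>2}: rule of thumb {rule_of_thumb_samples(d)}, "
          f"full coverage with 3 values/feature {samples_to_cover_all_cells(d, 3)}")
```

The gap between the two numbers is one way to read why the heuristic is only a rule of thumb: it does not guarantee that every combination of values is represented.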

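involve [ɪn'vɒlv]: vt. to involve, to include, to entangle; to be absorbed in
finite ['faɪnaɪt]: adj. finite, limited; n. something finite
enormous [ɪ'nɔːməs]: adj. enormous, huge; (archaic) monstrous, wicked
predictive power: the ability to make accurate predictions
tour [tʊə]: n. tour, trip, travel; a performance tour; vt. to tour, to travel around, to perform on tour in; vi. to travel, to go on tour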

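Bayesian statistics

In Bayesian statistics, the curse of dimensionality is often a difficulty, because the posterior distribution usually involves many parameters. However, this problem has been largely overcome by the advent of simulation-based Bayesian inference, especially Markov chain Monte Carlo methods, which suffice for many practical problems. Simulation-based methods do converge slowly, though, so they are not a panacea for high-dimensional problems.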
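As a rough illustration of simulation-based inference, here is a random-walk Metropolis sampler in plain Python that targets a $d$-dimensional standard normal, used as a stand-in for a posterior with many parameters. The target, the step-size rule $2.38/\sqrt{d}$ (a common heuristic for Gaussian targets), the seed and the sample counts are assumptions chosen for the sketch.

```python
import math
import random


def log_target(x):
    """Unnormalized log-density of a standard normal in len(x) dimensions (toy 'posterior')."""
    return -0.5 * sum(v * v for v in x)


def metropolis(d: int, n_steps: int, seed: int = 0):
    """Random-walk Metropolis sampler; returns (samples, acceptance rate)."""
    rng = random.Random(seed)
    step = 2.38 / math.sqrt(d)  # common scaling heuristic for Gaussian proposals
    x = [0.0] * d
    log_p = log_target(x)
    accepted = 0
    samples = []
    for _ in range(n_steps):
        proposal = [v + step * rng.gauss(0, 1) for v in x]
        log_p_new = log_target(proposal)
        # Accept with probability min(1, p(proposal) / p(x)).
        if rng.random() < math.exp(min(0.0, log_p_new - log_p)):
            x, log_p = proposal, log_p_new
            accepted += 1
        samples.append(x)
    return samples, accepted / n_steps


for d in (2, 50):
    samples, rate = metropolis(d, n_steps=5000)
    kept = samples[1000:]  # discard burn-in
    mean_x0 = sum(s[0] for s in kept) / len(kept)
    print(f"d={d:>2}: acceptance rate {rate:.2f}, mean of first coordinate {mean_x0:+.2f}")
```

Because the proposal step shrinks like $1/\sqrt{d}$, each move covers less of the space as $d$ grows, which is one way to see why such samplers converge slowly in high dimensions.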

Distance functions

When a measure such as a Euclidean distance is defined using many coordinates, there is little difference in the distances between different pairs of samples.

pair [peə]: n. a pair, a couple, a set; vt. to form into a pair

One way to illustrate the “vastness” of high-dimensional Euclidean space is to compare the volume of a hypersphere with radius $r$ in dimension $d$ with the volume of the hypercube with edges of length $2r$ in which it is inscribed. The volume of such a sphere is $\frac{2r^{d}\pi^{d/2}}{d\,\Gamma(d/2)}$, where $\Gamma$ is the gamma function, while the volume of the cube is $(2r)^d$. As the dimension $d$ of the space increases, the hypersphere becomes an insignificant volume relative to that of the hypercube. This can clearly be seen by comparing the proportions as the dimension $d$ goes to infinity:
$$\frac{V_{hypersphere}}{V_{hypercube}}=\frac{\pi^{d/2}}{d\,2^{d-1}\,\Gamma(d/2)}\rightarrow 0 \quad \text{as } d\rightarrow\infty.$$
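The volume formulas above can be evaluated directly with `math.gamma` from the Python standard library. As sanity checks, the ratio is $\pi/4$ for $d=2$ (a circle inside its bounding square) and $\pi/6$ for $d=3$. The function names are hypothetical.

```python
import math


def ball_volume(r: float, d: int) -> float:
    """Volume of a d-dimensional ball of radius r: 2 r^d pi^(d/2) / (d Gamma(d/2))."""
    return 2 * r ** d * math.pi ** (d / 2) / (d * math.gamma(d / 2))


def ball_to_cube_ratio(d: int) -> float:
    """V_hypersphere / V_hypercube = pi^(d/2) / (d 2^(d-1) Gamma(d/2)); independent of r."""
    return math.pi ** (d / 2) / (d * 2 ** (d - 1) * math.gamma(d / 2))


for d in (1, 2, 3, 10, 20):
    print(f"d={d:>2}: ratio = {ball_to_cube_ratio(d):.3e}")
```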

A further development of this phenomenon is as follows. Any fixed distribution on $\mathbb{R}$ induces a product distribution on points in $\mathbb{R}^d$. For any fixed $n$, it turns out that the difference between the minimum and the maximum distance from a random reference point $Q$ to a list of $n$ random data points $P_1,\ldots,P_n$ becomes indiscernible compared to the minimum distance:

$$\lim_{d\to\infty} E\left(\frac{\operatorname{dist}_{\max}(d)-\operatorname{dist}_{\min}(d)}{\operatorname{dist}_{\min}(d)}\right) = 0.$$
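The limit can be observed empirically. The Monte Carlo sketch below estimates $E[(\operatorname{dist}_{\max}-\operatorname{dist}_{\min})/\operatorname{dist}_{\min}]$ for a random query $Q$ against $n$ random points, using `math.dist` (Python 3.8+). The i.i.d. uniform coordinates, sample sizes, seed and the function name are assumptions.

```python
import math
import random


def relative_contrast(d: int, n: int = 100, trials: int = 20, seed: int = 0) -> float:
    """Monte Carlo estimate of E[(dist_max - dist_min) / dist_min] for i.i.d. uniform data in [0, 1]^d."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        query = [rng.random() for _ in range(d)]  # random reference point Q
        dists = [math.dist(query, [rng.random() for _ in range(d)]) for _ in range(n)]  # P_1..P_n
        total += (max(dists) - min(dists)) / min(dists)
    return total / trials


for d in (2, 10, 100, 1000):
    print(f"d={d:>4}: relative contrast = {relative_contrast(d):.3f}")
```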

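phenomenon [fɪ'nɒmɪnən]: n. phenomenon; marvel; prodigy (an outstanding person)
induce [ɪn'djuːs]: vt. to induce, to cause, to entice; to produce by (electromagnetic) induction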

This is often cited as distance functions losing their usefulness in high dimensions (for the nearest-neighbor criterion in feature-comparison algorithms, for example). However, recent research has shown this to only hold in the artificial scenario when the one-dimensional distributions on $\mathbb{R}$ are independent and identically distributed.[7] When attributes are correlated, the data can become easier to handle and provide higher distance contrast; the signal-to-noise ratio was also found to play an important role, so feature selection should be used.
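To contrast with the i.i.d. case, this sketch draws points whose $d$ attributes are all driven by a single hidden factor $t$ plus a little Gaussian noise, so they are strongly correlated and the data is essentially one-dimensional. The construction (hidden factor, noise level 0.05, sizes, seed and function names) is an assumption made for illustration; under it, the relative contrast stays large even in hundreds of dimensions, unlike i.i.d. coordinates.

```python
import math
import random


def sample_point(d, rng, correlated):
    """One point in d dimensions; correlated points share a single hidden factor t."""
    if correlated:
        t = rng.random()  # hypothetical hidden factor driving every attribute
        return [t + 0.05 * rng.gauss(0, 1) for _ in range(d)]
    return [rng.random() for _ in range(d)]  # i.i.d. uniform attributes


def contrast(d, correlated, n=100, trials=20, seed=0):
    """Average (dist_max - dist_min) / dist_min from a random query to n random points."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        query = sample_point(d, rng, correlated)
        dists = [math.dist(query, sample_point(d, rng, correlated)) for _ in range(n)]
        total += (max(dists) - min(dists)) / min(dists)
    return total / trials


for d in (10, 500):
    print(f"d={d:>3}: i.i.d. contrast = {contrast(d, False):.3f}, "
          f"correlated contrast = {contrast(d, True):.3f}")
```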

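cite [saɪt]: vt. to cite, to quote; to summon (to court); to call to mind; to commend
criterion [kraɪ'tɪərɪən]: n. criterion, standard, norm, benchmark
artificial [ɑːtɪ'fɪʃ(ə)l]: adj. artificial, imitation, insincere; not native to the place; arbitrary
scenario [sɪ'nɑːrɪəʊ]: n. scenario, plan, plot, script, hypothetical situation
identically [aɪ'dɛntɪkli]: adv. identically, equally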

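Thus, in some sense, nearly all of high-dimensional space is “far away” from the centre; put another way, the high-dimensional unit space can be said to consist almost entirely of the “corners” of the hypercube, with almost no “middle”. This is an important intuition for understanding the chi-squared distribution. Given a single distribution, the minimum and maximum distances become indiscernible, because their difference relative to the minimum converges to 0, which is why distance functions are often said to lose their meaning in high dimensions.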
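The “no middle” intuition and its link to the chi-squared distribution can be checked numerically: for a standard normal vector in $d$ dimensions, $\lVert x\rVert^2$ follows a chi-squared distribution with $d$ degrees of freedom (mean $d$, variance $2d$), so the norm concentrates around $\sqrt{d}$ and almost no samples fall near the center. The Monte Carlo sketch below (sample sizes, seed and the function name are assumptions) reports the mean norm, its relative spread and the smallest norm observed.

```python
import math
import random
import statistics


def norm_stats(d: int, n: int = 1000, seed: int = 0):
    """Mean norm, relative spread (std / mean) and smallest norm of n standard normal vectors."""
    rng = random.Random(seed)
    # ||x||^2 is chi-squared with d degrees of freedom, so ||x|| concentrates near sqrt(d).
    norms = [math.sqrt(sum(rng.gauss(0, 1) ** 2 for _ in range(d))) for _ in range(n)]
    mean = statistics.fmean(norms)
    return mean, statistics.pstdev(norms) / mean, min(norms)


for d in (1, 10, 100, 1000):
    mean, spread, closest = norm_stats(d)
    print(f"d={d:>4}: mean norm {mean:6.2f} vs sqrt(d) {math.sqrt(d):6.2f}, "
          f"relative spread {spread:.3f}, closest to center {closest:.2f}")
```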

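illustrate ['ɪləstreɪt]: vt. to illustrate, to explain with examples or diagrams; vi. to give an example
vastness ['væstnɪs]: n. vastness, immensity
proportion [prə'pɔːʃ(ə)n]: n. proportion, ratio, share, part; area; balance; vt. to make proportionate, to balance, to apportion
inscribe [ɪn'skraɪb]: vt. to inscribe, to write on; to dedicate; to commemorate; to engrave
hypersphere ['haipəsfiə]: n. hypersphere
radius ['reɪdɪəs]: n. radius; radial range; (anatomy) the radius bone; a radiating line; effective range (of travel)
hypercube ['haipə,kju:b]: n. hypercube
sphere [sfɪə]: n. sphere (ball), domain, scope; vt. to encircle, to enclose in a sphere, to make spherical; adj. spherical
insignificant [ɪnsɪg'nɪfɪk(ə)nt]: adj. insignificant, unimportant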

Nearest neighbor search

The effect complicates nearest neighbor search in high dimensional space. It is not possible to quickly reject candidates by using the difference in one coordinate as a lower bound for a distance based on all the dimensions.[8][9]
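The lower-bound argument can be checked directly. For any coordinate $i$, $|q_i - p_i| \le \lVert q - p\rVert$, so a candidate can be discarded without computing its full distance once that single-coordinate gap exceeds the best distance found so far. The sketch below (uniform data, sizes, seed and the function name are assumptions) measures what fraction of candidates this one-coordinate test could reject if the true nearest-neighbor distance were already known, which is the most favorable case for pruning.

```python
import math
import random


def prunable_fraction(d: int, n: int = 1000, seed: int = 0) -> float:
    """Fraction of candidates a one-coordinate lower bound could reject, given the true NN distance."""
    rng = random.Random(seed)
    points = [[rng.random() for _ in range(d)] for _ in range(n)]
    query = [rng.random() for _ in range(d)]
    best = min(math.dist(query, p) for p in points)  # exact nearest-neighbor distance
    # |query[0] - p[0]| <= dist(query, p), so p cannot beat `best` if this gap already exceeds it.
    rejectable = sum(1 for p in points if abs(query[0] - p[0]) > best)
    return rejectable / n


for d in (2, 10, 100):
    print(f"d={d:>3}: fraction rejectable using coordinate 0 = {prunable_fraction(d):.2f}")
```

In the unit hypercube a single coordinate differs by at most 1, while the nearest-neighbor distance grows with $d$, so in high dimensions the bound stops rejecting anything.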

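complicate ['kɒmplɪkeɪt]: vt. to complicate, to worsen, to involve
reject [(for v.)rɪˈdʒɛkt; (for n.)ˈriːdʒɛkt]: vt. to reject, to refuse, to resist, to discard; n. a reject, a discarded item or person
candidate ['kændɪdeɪt; -dət]: n. candidate, applicant, examinee
lower bound: lower bound, lower limit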

However, it has recently been observed that the mere number of dimensions does not necessarily result in difficulties,[10] since relevant additional dimensions can also increase the contrast. In addition, for the resulting ranking it remains useful to discern close and far neighbors. Irrelevant (“noise”) dimensions, however, reduce the contrast in the manner described above. In time series analysis, where the data are inherently high-dimensional, distance functions also work reliably as long as the signal-to-noise ratio is high enough.[11]
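The effect of relevant versus irrelevant (“noise”) dimensions can be simulated with two Gaussian classes. In the sketch below, all parameters are assumptions: the class means differ by 3 in each relevant coordinate, every coordinate carries unit-variance noise, and there are 50 points per class. The 1-nearest-neighbor label agreement serves as a simple measure of how useful the distance ranking is.

```python
import math
import random


def make_data(n_relevant, n_noise, n_per_class=50, seed=0):
    """Two classes whose means differ only in the first n_relevant coordinates."""
    rng = random.Random(seed)
    data = []
    for label, shift in ((0, -1.5), (1, 1.5)):
        for _ in range(n_per_class):
            x = [shift + rng.gauss(0, 1) for _ in range(n_relevant)]  # relevant dims
            x += [rng.gauss(0, 1) for _ in range(n_noise)]  # irrelevant ("noise") dims
            data.append((x, label))
    return data


def nn_agreement(data):
    """Fraction of points whose nearest other point (Euclidean) has the same label."""
    hits = 0
    for i, (x, label) in enumerate(data):
        _, nn_label = min(
            (math.dist(x, y), other) for j, (y, other) in enumerate(data) if j != i
        )
        hits += nn_label == label
    return hits / len(data)


for relevant, noise in ((2, 0), (2, 500), (50, 500)):
    score = nn_agreement(make_data(relevant, noise))
    print(f"relevant={relevant:>2}, noise={noise:>3}: 1-NN label agreement = {score:.2f}")
```

Under these assumptions, the many noise dimensions push the agreement toward chance, while adding enough relevant dimensions restores it, mirroring the contrast argument above.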

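observe [əb'zɜːv]: vt. to celebrate (a holiday); to observe, to comply with, to say, to notice, to remark; vi. to observe, to say, to notice, to comment
mere [mɪə]: adj. mere, nothing more than; n. a small lake, a pond
relevant [ˈreləvənt]: adj. relevant, pertinent, to the point, significant, purposeful
contrast ['kɒntrɑːst]: vi. to contrast, to form a contrast; vt. to contrast with, to compare; n. contrast, difference, foil
discern [dɪ'sɜːn]: vt. to discern, to perceive, to recognize; vi. to see clearly, to distinguish
inherently [ɪnˈhɪərəntlɪ]: adv. inherently, intrinsically, by nature
reliably [ri'laiəbli]: adv. reliably, dependably
irrelevant [ɪ'relɪv(ə)nt]: adj. irrelevant, beside the point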

$k$-nearest neighbor classification

Another effect of high dimensionality on distance functions concerns $k$-nearest neighbor ($k$-NN) graphs constructed from a data set using a distance function. As the dimension increases, the indegree distribution of the $k$-NN digraph becomes skewed with a peak on the right because of the emergence of a disproportionate number of hubs, that is, data points that appear in many more $k$-NN lists of other data points than the average. This phenomenon can have a considerable impact on various techniques for classification (including the $k$-NN classifier), semi-supervised learning, and clustering,[12] and it also affects information retrieval.[13]
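Hubness can be reproduced in a few lines. This sketch computes the $k$-occurrence $N_k(x)$, the number of other points that have $x$ among their $k$ nearest neighbors (the in-degree in the $k$-NN digraph), and reports the skewness of its distribution; a larger positive skew means a few hubs appear in many $k$-NN lists. Gaussian data, $n=200$, $k=5$, the seed and the function name are assumptions.

```python
import math
import random


def k_occurrence_skewness(d: int, n: int = 200, k: int = 5, seed: int = 0) -> float:
    """Skewness of N_k, the number of times each point appears in other points' k-NN lists."""
    rng = random.Random(seed)
    points = [[rng.gauss(0, 1) for _ in range(d)] for _ in range(n)]
    counts = [0] * n  # counts[j] = N_k(points[j]), the in-degree in the k-NN digraph
    for i, p in enumerate(points):
        nearest = sorted((math.dist(p, q), j) for j, q in enumerate(points) if j != i)[:k]
        for _, j in nearest:
            counts[j] += 1
    mean = sum(counts) / n  # always equals k
    m2 = sum((c - mean) ** 2 for c in counts) / n
    m3 = sum((c - mean) ** 3 for c in counts) / n
    return m3 / m2 ** 1.5


for d in (2, 20, 100):
    print(f"d={d:>3}: skewness of N_5 = {k_occurrence_skewness(d):.2f}")
```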

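concern [kən'sɜːn]: vt. to concern, to relate to, to worry; n. relation, concern, a matter of concern, anxiety
indegree: n. in-degree (number of incoming edges)
skew [skjuː]: n. skew, slant; adj. skewed, oblique; vt. to skew, to slant; vi. to deviate, to swerve, to squint
emergence [ɪ'mɜːdʒ(ə)ns]: n. emergence, appearance, occurrence, outcrop
disproportionate [,dɪsprə'pɔːʃ(ə)nət]: adj. disproportionate
hub [hʌb]: n. hub (center), wheel hub; wood chip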

References

[1] Dynamic Programming
[2] Adaptive Control Processes: A Guided Tour
[3] Pattern Recognition 4th Edition
[4] A Problem of Dimensionality: A Simple Example
[5] On the mean accuracy of statistical pattern recognizers
[6] When Is “Nearest Neighbor” Meaningful?
[7] A survey on unsupervised outlier detection in high‐dimensional numerical data
[8] Nearest Neighbour Searches and the Curse of Dimensionality
[9] Searching in metric spaces
[10] Can Shared-Neighbor Distances Defeat the Curse of Dimensionality?
[11] Quality of Similarity Rankings in Time Series
[12] Hubs in Space: Popular Nearest Neighbors in High-Dimensional Data
[13] On the Existence of Obstinate Results in Vector Space Models