机器人学数学理论_基于格理论的机器学习数学

机器人学数学理论

This is a third article in the series of works (see also first one and second one) describing Machine Learning system based on Lattice Theory named 'VKF-system'. It uses structural (lattice theoretic) approach to representing training objects and their fragments considered to be causes of the target property. The system computes these fragments as similarities between some subsets of training objects. There exists the algebraic theory for such representations, called Formal Concept Analysis (FCA). However the system uses randomized algorithms to remove drawbacks of the unrestricted approach. The details follow…

这是这一系列工作的第三篇文章(另请参阅第一篇和第二篇 )，描述了基于格子理论的名为“ VKF-system”的机器学习系统。它使用结构(晶格理论)方法来表示训练对象及其被认为是目标属性原因的片段。系统将这些片段计算为训练对象的某些子集之间的相似度。对于这种表示，存在代数理论，称为形式概念分析(FCA)。然而，系统使用随机算法来消除无限制方法的缺点。详细信息如下...

介绍 (Introduction)

We begin with demonstration of our approach by its application to a school problem. It is to find sufficient conditions on a convex quadrangle possessing symmetries to be circled and to predict this property of the rectangle.

我们首先通过将其应用于学校问题来演示我们的方法。在具有对称性的凸四边形上找到足够的条件以进行圆化，并预测矩形的这种特性。

Hence there are two target classes: positive (there exists a circle around the quadrangle) and negative.

因此，有两个目标类别：正目标(在四边形周围有一个圆)和负目标。

The training sample contains square, isosceles trapezoid, diamond, and deltoid (see the rows labels in Table below).

训练样本包含正方形，等腰梯形，菱形和三角肌(请参见下表中的行标签)。

A single test example is the rectangle.

一个测试示例是矩形。

We represent each quadrangle by a subset of attributes related to its possible symmetries:

我们用与其可能的对称性相关的属性子集来表示每个四边形：

"There exists a central symmetry point" (A), "The group of rotations is trivial" (B), "The group of rotations contains at least two elements" (C), "There is a diagonal symmetry axis" (D), "There is a non-diagonal symmetry axis" (E). They correspond to columns labels in Table below.

“存在中心对称点”( A )，“旋转组是微不足道的”( B )，“旋转组包含至少两个元素”( C )，“有对角对称轴”( D ) ，“有一个非对角对称轴”( E )。它们对应于下表中的列标签。

quadrangle	target	A	B	C	D	E
square	1	1	0	1	1	1
trapezoid	1	0	1	0	0	1
diamond	0	1	0	1	1	0
deltoid	0	0	1	0	1	0
rectangle	?	1	0	1	0	1

四边形	目标	一个	乙	C	d	Ë
广场	1个	1个	0	1个	1个	1个
梯形	1个	0	1个	0	0	1个
钻石	0	1个	0	1个	1个	0
三角肌	0	0	1个	0	1个	0
长方形	？	1个	0	1个	0	1个

To discover possible causes (in terms of symmetries) the system computes similarities (common attributes) between training examples of same sign. Hence we have

为了发现可能的原因(就对称性而言)，系统计算相同符号的训练示例之间的相似性(公共属性)。因此，我们有

where first subset collects parents (all the training objects whose similarity computed), and the second is a common fragment of these examples.

其中第一个子集收集父母(计算出相似性的所有训练对象)，第二个子集是这些示例的共同片段。

Since common fragment is a part of rectangle's description , the system predicts the target property positively, i.e. the rectangle is circled. It corresponds to Analogy cognitive procedure of the JSM-method. The analogues of the rectangle are parents (square and trapezoid) that have the same fragment as common part.

由于共同的片段是矩形说明的一部分，系统会积极预测目标属性，即，将矩形圈出。它对应于JSM方法的类比认知过程。矩形的类似物是父母(正方形和梯形)，它们具有与公共部分相同的片段。

However, we can exchange signs: the similarity between negative examples is

但是，我们可以交换符号：否定示例之间的相似之处是

This observation leads to Argumentation Logics, but we prefer to omit the details here. Interested reader refers to the author's papers from Финн В.К., Аншаков О.М., Виноградов Д.В. (Ред.). Многозначные логики и их применения. Том 2: Логики в системах искусственного интеллекта, M.: URSS, 2020, 238 с. ISBN 978-5-382-01977-2 (in Russian).

这种观察导致了论证逻辑，但是我们更愿意在这里省略细节。有兴趣的读者可以参考ФиннВ.К.，АншаковО.М.，ВиноградовД.В的作者论文。 (Ред。)。 Многозначныелогикииихприменения。 2月2日：路易斯安那州，拉里斯，2020年，238с.。上一篇：Логикивсистемахискусственногоинтеллекта。 ISBN 978-5-382-01977-2(俄语)。

English translations may be requested from the author (Allerton Press now is a part of Springer, but original translations are not available through any sites).

可能需要作者提供英文翻译(Allerton Press现在是Springer的一部分，但任何网站都无法提供原始翻译)。

However the similarity between negative examples demonstrate the 'counter-example forbidden' condition, since its fragment is a part of description of opposite sign example 'square'.

但是，否定示例之间的相似性表明了“反例禁止”条件，因为其片段是描述的一部分相反符号示例“正方形”。

1.形式概念分析 (1. Formal Concept Analysis)

Initially the author planned to represent the theory in terms of so-called JSM-method of automatic hypotheses generation. But its creator said the doubt on possibility to express 'rich ideas of JSM-method in popular fashion'. Hence the author decide to use FCA language for this article. However the author will use some own terms paired with original ones (in brackets) where he prefer to change the terminology.

最初，作者计划用自动假设生成的所谓JSM方法来代表该理论。但是它的创造者对是否有可能表达“流行时尚的JSM方法的丰富思想”表示怀疑。因此，作者决定在本文中使用FCA语言。但是，作者将使用一些自己的术语与原始术语(在方括号中)配对，在其中他更喜欢更改术语。

A sample (=formal context) is a triple where and are finite sets and . The elements of and are called objects and attributes, respectively. As usual, we write instead of to denote that object has attribute .

一个样本 (= 正式语境 )是一个三元组哪里和是有限集。的要素和分别称为对象和属性。和往常一样，我们写代替表示那个物体有属性。

For and , define

对于和，定义

so is the set of attributes common to all the objects in and is the set of objects possessing all the attributes in . The maps and are called polars (=derivation operators) of sample .

所以是所有对象中共有的属性集和是拥有以下所有属性的对象的集合。地图和被称为样本的极点 (= 导数 ) 。

A candidate (=formal concept) of sample is defined to be a pair , where , , , and . The first component of candidate is called the parents list (=extent) of the candidate, and the second component is called its fragment (=intent). The set of all candidates of sample is denoted by .

样本的候选人 (= 正式概念 ) 被定义为一对，在哪里，，和。第一部分候选人被称为候选人的父母名单 (= 程度 )，第二部分称为其片段 (= intent )。样本的所有候选集用表示。

It is an easy exercise to check that is a lattice with operations

检查一下是否容易是带有操作的晶格

We use a special case: for , , and define

我们使用一种特殊情况：，和定义

We call these operations CbO because the first one is used in well-known Close-by-One (CbO) Algorithm to generate all the elements of .

之所以将这些操作称为CbO，是因为第一个操作用于众所周知的“一人制(CbO)”算法中，以生成。

Most important (monotonicity) property of CbO operations is represented in the following Lemma

CbO操作最重要的(单调性)性质由以下引理表示

Let be a sample, , , and . Then

让成为样本，和。然后

2. FCA问题 (2. Problems with FCA)

Unfortunately, the author and his colleagues have discovered and investigated some theoretical shortcomings of FCA based approach to Machine Learning:

不幸的是，作者及其同事发现并研究了基于FCA的机器学习方法的理论缺陷：

The number of hypotheses can be an exponentially large with respect to the size of input data (training sample) in the worst case.

在最坏的情况下，相对于输入数据(训练样本)的大小，假设的数量可能成倍增长。
Problem of detection of large hypothesis is computational (NP-)hard.

大假设的检测问题是计算(NP-)困难的。
Overtraining is unavoidable and appears in practice.

过度培训是不可避免的，并且在实践中会出现。
There are 'phantom' similarities between training examples, where each such parent has alternative hypothesis on the target property cause.

训练示例之间存在“幻像”相似之处，其中每个这样的父母都有关于目标属性原因的替代假设。

To demonstrate drawback 1 we need Boolean algebra case corresponding to the sample of coatoms as positive examples:

为了证明缺点1，我们需要与coveroms样本相对应的布尔代数格作为正例：


0	1	1
1	0	1

1	1	0


0	1个	1个
1个	0	1个

1个	1个	0

Then it is easy to check that any pair is a candidate. Hence there are candidates.

然后很容易检查是否有任何一对是候选人。因此有候选人。

To evaluate the exponential growth of the output with respect to the input, estimate memory needed to store the sample for as 128 bytes and memory for candidates as bites, i.e. 16 Gigabytes!

要评估输出相对于输入的指数增长，请估算存储样本所需的内存作为128字节和内存候选人为咬，即16 GB！

Drawback 2 was discovered by Prof. Sergei O. Kuznetsov (HSE Moscow).

缺点2由Sergei O. Kuznetsov教授(HSE莫斯科)发现。

Shortcomings 3 and 4 were discovered by the author during his Dr.Science investigations. He introduced several probabilistic models to generate 'phantom' similarities together with corresponding counter-examples to deny them. Most clear result is the asymptotic theorem that asserts that a probability of generation of 'phantom' similarity between two parents without counter-examples tends to

作者在他的Dr.Science调查中发现了缺点3和缺点4。他介绍了几种概率模型来生成“幻像”相似性，并通过相应的反例予以否认。最明显的结果是渐近定理，该定理断定，在没有反例的情况下，两个亲本之间产生“幻像”相似性的可能性倾向于

when the probability of appearance of each attribute (considered as i.i.d. Bernoulli variable) is , the number of counter-examples is and attributes number too.

当每个属性(被视为iid Bernoulli变量)出现的概率为，反例数为和属性编号太。

Note, that even smaller number is positive since it coincides with the probability that the Poisson variable with mean has value >1.

请注意，数字甚至更少是正的，因为它与均值的泊松变量相符值> 1。

Consult with the author's Dr.Science Thesis for more details and results.

有关更多详细信息和结果，请咨询作者的Dr.Science论文。

3.随机算法 (3. Randomized Algorithms)

The key idea of VKF-method is to random generate a small subset of the lattice of candidates and to use its elements as hypothetical causes for the target property. By this trick we avoid the exponentially high computational complexity of usual algorithms of FCA (and JSM-method too).

VKF方法的关键思想是随机生成候选晶格的一小部分，并将其元素用作目标属性的假设原因。通过这个技巧，我们避免了FCA常用算法(以及JSM方法)的指数级高计算复杂性。

So we need algorithms like random walks on the huge lattice with generation of a candidate only when we need it.

因此，仅在需要时，我们才需要像在大型格子上随机行走那样的算法，并生成候选对象。

The author invented and investigated mathematical properties of several algorithms (such as non-monotonic, monotonic, coupled, lazy coupled, and stopped coupled Markov chains). Details may be found in the author's Dr.Science Thesis.

作者发明和研究了几种算法的数学特性(例如非单调，单调，耦合，延迟耦合和停止耦合马尔可夫链)。详细信息可以在作者的Dr.Science论文中找到。

Now we represent the coupled Markov chain algorithm that is a core of probabilistic approach to machine learning based on FCA (VKF-method).

现在，我们表示耦合马尔可夫链算法，它是基于FCA(VKF方法)的概率机器学习方法的核心。

input: sample (G,M,I), external functions CbO( , )
result: random candidate <A,B>
X=G U M;
A = M'; B = M;
C = G; D = G';
while (A!=C || B!= D) {select random element x from X;<A,B> = CbO(<A,B>,x);<C,D> = CbO(<C,D>,x);
}

There exists a lazy variant of the coupled Markov chain. The author proved that lazy computations lead to acceleration (with respect to classical scheme above) up-to

耦合马尔可夫链存在一个惰性变体。作者证明了惰性计算导致加速(相对于上述经典方案)达到

times, where is the attributes total and is a number of training examples.

时间，在哪里是属性的总和是许多培训示例。

This result matches well to experimental estimates obtained by former RSUH student Lyudmila A. Yakimova.

这个结果与前RSUH学生Lyudmila A. Yakimova获得的实验估计非常吻合。

4. VKF方法的一般结构 (4. General Structure of VKF-method)

In supervised Machine Learning there are two sets of objects called the training and test samples, respectively.

在有监督的机器学习中，有两组对象分别称为训练样本和测试样本 。

From positive examples of the training sample the program generates a sample . The negative examples form the set of counter-examples (obstacles to become a VKF-hypothesis).

从训练样本的积极示例中，程序生成样本。否定的例子形成集合反例 (成为VKF假设的障碍)。

Set of tests contains all test objects to predict the target class.

组测试包含所有测试对象以预测目标类别。

The program invokes the algorithm of the lazy coupled Markov chain to generate random candidate . The program saves VKF-hypothesis , if there is no obstacle such that .

该程序调用延迟耦合马尔可夫链的算法来生成随机候选。该程序保存了VKF假设，如果没有障碍这样。

The main Inductive Generalization Algorithm is the following

主要的归纳概括算法如下

input: number N of VKF-hypotheses to generate
result: random sample S of requested VKF-hypotheses
while (i<N) {generate random candidate <A,B> for (G,M,I);hasObstacle = false;for (o in O) {if (B is a part of {o}') hasObstacle = true;}if (hasObstacle == false) {S = S U {<A,B>};i = i+1;}
}

Condition means the inclusion of fragment of candidate into the fragment (attributes subset) of counter-example .

健康)状况表示包含片段候选人放入反例的片段(属性子集) 。

If a candidate avoids all such obstacles it is added to the result set of generated VKF-hypotheses.

如果候选人避开所有此类障碍，则会将其添加到生成的VKF假设的结果集中。

We replace a time-consuming deterministic algorithm (for instance, the well-known "Close-by-One" algorithm) for generation of the all candidates by the probabilistic one to randomly generate the prescribed number of VKF-hypotheses.

我们用费时的确定性算法替换了所有概率的生成所用的费时的确定性算法(例如，众所周知的“一次关闭”算法)，以随机生成规定数量的VKF假设。

After that Machine Learning system predicts the target class of tests and compares the results of prediction with the original target values. This is Prediction Algorithm

之后，机器学习系统将预测目标测试类别，并将预测结果与原始目标值进行比较。这是预测算法

input: list T of test examples to predict the target property
input: random sample S of candidates without counter-examples
for (x in T) {target(x) = false;for (<A,B> in S) {if (B is a part of {x}') target(x) = true;}
}

The worst situation occurs when some important positive test is missed by all generated VKF-hypotheses and obtains negative sign.

当所有生成的VKF假设都错过了一些重要的阳性测试并获得负号时，就会出现最坏的情况。

Test object is an -important, if the probability of all VKF-hypotheses with exceeds .

测试对象是一个 - 重要，如果所有VKF假设的可能性与超过。

The author proved theorem to estimate parameter from Inductive Generalization Algorithm to avoid the worst case.

作者证明了定理来估计参数从归纳概括算法避免最坏的情况。

For , for any , and any random sample of VKF-hypotheses of cardinality

对于，对于任何，以及任何随机样品 VKF基数假设

with probability has property that every -important object contains a fragment of some VKF-hypothesis , i.e. .

很有可能具有每个重要对象包含一些VKF假设的片段，即。

This theorem is an analogue of famous results of Prof. Vladimir N. Vapnik and Prof. Alexey Y. Chervonenkis from Computational Learning Theory.

该定理与计算学习理论的弗拉基米尔·N·瓦普尼克(Vladimir N. Vapnik)教授和阿列克谢·谢尔文嫩基斯(Alexey Y. Chervonenkis)教授的著名结果相似。

结论 (Conclusion)

The article describes main mathematical aspects of Machine Learning system based on Lattice Theory. The author call it 'VKF-system' in honour his teacher Prof. Victor K. Finn.

本文介绍了基于格理论的机器学习系统的主要数学方面。作者称其为“ VKF系统”，以纪念他的老师Victor K. Finn教授。

The last article of the series will be devoted to representations of objects with attributes of different types for application of described here Learning Machine.

该系列的最后一篇文章将专门介绍具有不同类型属性的对象的表示形式，以供此处所述的学习机应用。

Discrete attributes again require some technique from FCA. Continuous attributes ask for logistic regression, entropy-based separation of their ranges into subintervals, and presentation corresponding to convex envelope for subintervals those similarity is computed.

离散属性再次需要FCA的某种技术。连续属性要求逻辑回归，将其范围划分为子区间的基于熵的分离，以及与子区间的凸包络相对应的表示形式，这些相似度可通过计算得出。

The author would like to thanks his colleagues and students for support and stimulus.

作者要感谢他的同事和学生的支持和刺激。

翻译自: https://habr.com/en/post/510534/

机器人学数学理论