(For recognizing hand-written numbers) Far better results can be obtained by adopting a machine learning approach in which a large set of N digits {x_1, ..., x_N} called a training set is used to tune the parameters of an adaptive model. The categories of the digits in the training set are known in advance, typically by inspecting them individually and hand-labelling them. We can express the category of a digit using a target vector t, which represents the identity of the corresponding digit. Suitable techniques for representing categories in terms of vectors will be discussed later. Note that there is one such target vector t for each digit image x.
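Although the book leaves the vector representation of categories for later, a common scheme is 1-of-K ("one-hot") coding: the target vector t is all zeros except for a single 1 marking the class. A minimal sketch (the function name and default class count are illustrative choices, not from the text):

```python
import numpy as np

def one_hot(digit, num_classes=10):
    """Encode a digit label as a 1-of-K target vector t:
    all zeros except a single 1 at the position of the class."""
    t = np.zeros(num_classes)
    t[digit] = 1.0
    return t

# The digit 3 maps to (0, 0, 0, 1, 0, 0, 0, 0, 0, 0).
```

One such vector is produced for each labelled training image.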

The result of running the machine learning algorithm can be expressed as a function y(x) which takes a new digit image x as input and generates an output vector y, encoded in the same way as the target vectors. The precise form of the function y(x) is determined during the training phase, also known as the learning phase, on the basis of the training data. Once the model is trained it can then determine the identity of new digit images, which are said to comprise a test set. The ability to correctly categorize new examples that differ from those used for training is known as generalization. In practical applications, the variability of the input vectors will be such that the training data can comprise only a tiny fraction of all possible input vectors, and so generalization is a central goal in pattern recognition.

For most practical applications, the original input variables are typically preprocessed to transform them into some new space of variables where, it is hoped, the pattern recognition problem will be easier to solve. For instance, in the digit recognition problem, the images of the digits are typically translated and scaled so that each digit is contained within a box of a fixed size. This greatly reduces the variability within each digit class, because the location and scale of all the digits are now the same, which makes it much easier for a subsequent pattern recognition algorithm to distinguish between the different classes. This pre-processing stage is sometimes also called feature extraction. Note that new test data must be pre-processed using the same steps as the training data.
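The translate-and-scale pre-processing described above can be sketched in a few lines. This is an illustrative implementation only, assuming digit images stored as NumPy arrays with background value 0; a real system would typically use an image library with proper interpolation rather than nearest-neighbour resampling:

```python
import numpy as np

def normalize_digit(img, out_size=16):
    """Crop the digit's bounding box and rescale it to a fixed-size
    square, removing variation in position and scale (the box size
    16 is an arbitrary illustrative choice)."""
    rows = np.any(img > 0, axis=1)
    cols = np.any(img > 0, axis=0)
    r0, r1 = np.where(rows)[0][[0, -1]]
    c0, c1 = np.where(cols)[0][[0, -1]]
    crop = img[r0:r1 + 1, c0:c1 + 1]
    # Nearest-neighbour resampling onto an out_size x out_size grid.
    ri = np.arange(out_size) * crop.shape[0] // out_size
    ci = np.arange(out_size) * crop.shape[1] // out_size
    return crop[np.ix_(ri, ci)]
```

Because the same function is applied to every image, training and test data receive identical pre-processing, as the text requires.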

Pre-processing might also be performed in order to speed up computation. For example, if the goal is real-time face detection in a high-resolution video stream, the computer must handle huge numbers of pixels per second, and presenting these directly to a complex pattern recognition algorithm may be computationally infeasible. Instead, the aim is to find useful features that are fast to compute, and also preserve useful discriminatory information enabling faces to be distinguished from non-faces. These features are then used as the inputs to the pattern recognition algorithm. For instance, the average value of the image intensity over a rectangular subregion can be evaluated extremely efficiently (Viola and Jones, 2004), and a set of such features can prove very effective in fast face detection. Because the number of such features is smaller than the number of pixels, this kind of pre-processing represents a form of dimensionality reduction. Care must be taken during pre-processing because often information is discarded, and if this information is important to the solution of the problem then the overall accuracy of the system can suffer.
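The rectangular-mean feature is fast because it can be read off a summed-area table (the "integral image" used by Viola and Jones) with four array lookups, independent of the rectangle's size. A minimal sketch:

```python
import numpy as np

def integral_image(img):
    """Summed-area table S with S[i, j] = sum of img[:i, :j]."""
    return np.pad(img, ((1, 0), (1, 0))).cumsum(axis=0).cumsum(axis=1)

def mean_intensity(S, r0, c0, r1, c1):
    """Mean intensity over img[r0:r1, c0:c1] using four lookups."""
    total = S[r1, c1] - S[r0, c1] - S[r1, c0] + S[r0, c0]
    return total / ((r1 - r0) * (c1 - c0))
```

Building S costs one pass over the image; after that, any number of rectangular features can be evaluated in constant time each, which is what makes this pre-processing suitable for real-time detection.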

Applications in which the training data comprises examples of the input vectors along with their corresponding target vectors are known as supervised learning problems. Cases such as the digit recognition example, in which the aim is to assign each input vector to one of a finite number of discrete categories, are called classification problems. If the desired output consists of one or more continuous variables, then the task is called regression. An example of a regression problem would be the prediction of the yield in a chemical manufacturing process in which the inputs consist of the concentrations of reactants, the temperature, and the pressure.
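As an illustration of the regression setting, one simple choice of model is a linear least-squares fit of yield against reactant concentration, temperature, and pressure. The numbers below are entirely made up; only the mechanics matter:

```python
import numpy as np

# Hypothetical supervised data: each row of X is one observation of
# (reactant concentration, temperature, pressure); y holds the yields.
X = np.array([[0.5, 300.0, 1.0],
              [0.8, 310.0, 1.2],
              [0.6, 305.0, 1.1],
              [0.9, 320.0, 1.3]])
y = np.array([10.0, 16.0, 12.0, 18.0])

# Fit a linear model y ≈ X w + b by least squares; the appended
# column of ones absorbs the bias term b.
A = np.column_stack([X, np.ones(len(X))])
w, *_ = np.linalg.lstsq(A, y, rcond=None)
predicted = A @ w
```

Here the targets are continuous values rather than discrete category labels, which is exactly the distinction between regression and classification drawn above.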

In other pattern recognition problems, the training data consists of a set of input vectors x without any corresponding target values. The goal in such unsupervised learning problems may be to discover groups of similar examples within the data, where it is called clustering, or to determine the distribution of data within the input space, known as density estimation, or to project the data from a high-dimensional space down to two or three dimensions for the purpose of visualization.
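A minimal sketch of clustering, using the k-means algorithm (one standard approach; the text does not prescribe a particular method). Given only unlabelled input vectors, it groups similar examples together:

```python
import numpy as np

def kmeans(X, k, n_iter=20, seed=0):
    """Minimal k-means: partition the rows of X into k clusters of
    similar examples, without any target values."""
    rng = np.random.default_rng(seed)
    # Initialize the cluster centres at k randomly chosen data points.
    centers = X[rng.choice(len(X), k, replace=False)]
    for _ in range(n_iter):
        # Assign each point to its nearest centre.
        labels = np.argmin(((X[:, None] - centers) ** 2).sum(-1), axis=1)
        # Move each centre to the mean of its assigned points.
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return labels, centers
```

No target vectors appear anywhere: the structure is discovered from the inputs alone, which is the defining feature of the unsupervised setting.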

Finally, the technique of reinforcement learning (Sutton and Barto, 1998) is concerned with the problem of finding suitable actions to take in a given situation in order to maximize a reward. Here the learning algorithm is not given examples of optimal outputs, in contrast to supervised learning, but must instead discover them by a process of trial and error. Typically there is a sequence of states and actions in which the learning algorithm is interacting with its environment. In many cases, the current action not only affects the immediate reward but also has an impact on the reward at all subsequent time steps. For example, by using appropriate reinforcement learning techniques a neural network can learn to play the game of backgammon to a high standard (Tesauro, 1994).

Here the network must learn to take a board position as input, along with the result of a dice throw, and produce a strong move as the output. This is done by having the network play against a copy of itself for perhaps a million games. A major challenge is that a game of backgammon can involve dozens of moves, and yet it is only at the end of the game that the reward, in the form of victory, is achieved. The reward must then be attributed appropriately to all of the moves that led to it, even though some moves will have been good ones and others less so. This is an example of a credit assignment problem. A general feature of reinforcement learning is the trade-off between exploration, in which the system tries out new kinds of actions to see how effective they are, and exploitation, in which the system makes use of actions that are known to yield a high reward. Too strong a focus on either exploration or exploitation will yield poor results. Reinforcement learning continues to be an active area of machine learning research. However, a detailed treatment lies beyond the scope of this book.

Figure 1.2 Plot of a training data set of N = 10 points, shown as blue circles, each comprising an observation of the input variable x along with the corresponding target variable t. The green curve shows the function sin(2πx) used to generate the data. Our goal is to predict the value of t for some new value of x, without knowledge of the green curve.

The figure shows an example of a regression problem.

Although each of these tasks needs its own tools and techniques, many of the key ideas that underpin them are common to all such problems. One of the main goals of this chapter is to introduce, in a relatively informal way, several of the most important of these concepts and to illustrate them using simple examples. Later in the book we shall see these same ideas re-emerge in the context of more sophisticated models that are applicable to real-world pattern recognition applications. This chapter also provides a self-contained introduction to three important tools that will be used throughout the book, namely probability theory, decision theory, and information theory. Although these might sound like daunting topics, they are in fact straightforward, and a clear understanding of them is essential if machine learning techniques are to be used to best effect in practical applications.

1.1. Example: Polynomial Curve Fitting

We begin by introducing a simple regression problem, which we shall use as a running example throughout this chapter to motivate a number of key concepts. Suppose we observe a real-valued input variable x and we wish to use this observation to predict the value of a real-valued target variable t. For the present purposes, it is instructive to consider an artificial example using synthetically generated data because we then know the precise process that generated the data for comparison against any learned model. The data for this example is generated from the function sin(2πx) with random noise included in the target values, as described in detail in Appendix A.

Now suppose that we are given a training set comprising N observations of x, written x ≡ (x_1, ..., x_N)^T, together with corresponding observations of the values of t, denoted t ≡ (t_1, ..., t_N)^T. Figure 1.2 shows a plot of a training set comprising N = 10 data points. The input data set x in Figure 1.2 was generated by choosing values of x_n, for n = 1, ..., N, spaced uniformly in the range [0, 1], and the target data set t was obtained by first computing the corresponding values of the function sin(2πx) and then adding a small level of random noise having a Gaussian distribution (the Gaussian distribution is discussed in Section 1.2.4) to each such point in order to obtain the corresponding value t_n. By generating data in this way, we are capturing a property of many real data sets, namely that they possess an underlying regularity, which we wish to learn, but that individual observations are corrupted by random noise. This noise might arise from intrinsically stochastic (i.e. random) processes such as radioactive decay but more typically is due to there being sources of variability that are themselves unobserved.
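The data-generation recipe just described can be reproduced in a few lines. The noise standard deviation of 0.3 below is an illustrative choice; the exact setup is specified in Appendix A of the book:

```python
import numpy as np

# N = 10 inputs spaced uniformly in [0, 1]; targets are sin(2*pi*x)
# corrupted by small Gaussian noise, mimicking an underlying
# regularity observed through noisy measurements.
N = 10
rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, N)
t = np.sin(2.0 * np.pi * x) + rng.normal(0.0, 0.3, size=N)
```

Plotting t against x reproduces the qualitative picture of Figure 1.2: noisy points scattered around the green sinusoid.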

Our goal is to exploit this training set in order to make predictions of the value t̂ of the target variable for some new value x̂ of the input variable. As we shall see later, this involves implicitly trying to discover the underlying function sin(2πx). This is intrinsically a difficult problem as we have to generalize from a finite data set. Furthermore the observed data are corrupted with noise, and so for a given x̂ there is uncertainty as to the appropriate value for t̂. Probability theory, discussed in Section 1.2, provides a framework for expressing such uncertainty in a precise and quantitative manner, and decision theory, discussed in Section 1.5, allows us to exploit this probabilistic representation in order to make predictions that are optimal according to appropriate criteria.

For the moment, however, we shall proceed rather informally and consider a simple approach based on curve fitting. In particular, we shall fit the data using a polynomial function of the form

y(x, w) = w_0 + w_1 x + w_2 x^2 + ... + w_M x^M = Σ_{j=0}^{M} w_j x^j        (1.1)

where M is the order of the polynomial.
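Fitting such a polynomial by minimizing a sum-of-squares error (the approach taken in the rest of the section) reduces to linear least squares in the coefficients w, since y(x, w) is linear in w even though it is nonlinear in x. A minimal sketch:

```python
import numpy as np

def fit_polynomial(x, t, M):
    """Least-squares fit of an order-M polynomial
    y(x, w) = w_0 + w_1 x + ... + w_M x**M."""
    # Design matrix with columns 1, x, x**2, ..., x**M.
    Phi = np.vander(x, M + 1, increasing=True)
    w, *_ = np.linalg.lstsq(Phi, t, rcond=None)
    return w

def predict(w, x):
    """Evaluate the fitted polynomial y(x, w) at the inputs x."""
    return np.vander(x, len(w), increasing=True) @ w
```

Applying `fit_polynomial` to the noisy sin(2πx) training set for various orders M reproduces the under- and over-fitting behaviour examined in the rest of the chapter.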
