Python+OpenCV: Understanding the k-Nearest Neighbour (kNN) Algorithm

Theory

kNN is one of the simplest classification algorithms available for supervised learning.

The idea is to search for the closest match(es) of the test data in the feature space.

We will look into it with the help of the image below.

In the image, there are two families: Blue Squares and Red Triangles. We refer to each family as a Class.

Their houses are shown in their town map which we call the Feature Space.

You can consider a feature space as a space where all data are projected.

For example, consider a 2D coordinate space. Each datum has two features, an x coordinate and a y coordinate.

You can represent this datum in your 2D coordinate space, right? Now imagine there are three features: you will need a 3D space.

Now consider N features: you need an N-dimensional space, right? This N-dimensional space is its feature space. In our image, you can consider it as a 2D case with two features.
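As a minimal sketch of this representation (the numbers here are made up for illustration), data in an N-dimensional feature space is just an array with one row per datum and one column per feature:

import numpy as np

# Hypothetical toy data: each row is one house, each column one feature.
# Two columns -> a 2D feature space; N columns would mean an N-D feature space.
houses = np.array([[30.0, 70.0],
                   [55.0, 10.0],
                   [80.0, 45.0]], dtype=np.float32)
print(houses.shape)  # (3, 2): 3 data points living in a 2D feature space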

Now consider what happens if a new member comes into the town and creates a new home, which is shown as the green circle.

He should be added to one of these Blue or Red families (or classes). We call that process Classification.

How exactly should this new member be classified? Since we are dealing with kNN, let us apply the algorithm.

One simple method is to check who is his nearest neighbour. From the image, it is clear that it is a member of the Red Triangle family. So he is classified as a Red Triangle.

This method is called simply Nearest Neighbour classification, because classification depends only on the nearest neighbour.
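To make this concrete, here is a minimal plain-NumPy sketch of Nearest Neighbour classification (the function and data are my own illustration, not OpenCV code):

import numpy as np

def nearest_neighbour(train, labels, x):
    """Classify x by the label of its single closest training point."""
    d2 = ((train - x) ** 2).sum(axis=1)  # squared Euclidean distance to each point
    return labels[np.argmin(d2)]

train = np.array([[1.0, 1.0], [9.0, 8.0]])   # one Blue house, one Red house
labels = np.array(['Blue', 'Red'])
print(nearest_neighbour(train, labels, np.array([8.0, 9.0])))  # -> Red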

But there is a problem with this approach! Red Triangle may be the nearest neighbour, but what if there are also a lot of Blue Squares nearby?

Then Blue Squares have more strength in that locality than Red Triangles, so just checking the nearest one is not sufficient. Instead we may want to check some k nearest families.

Then the new guy should belong to whichever family is the majority amongst them.

In our image, let's take k=3, i.e. consider the 3 nearest neighbours.

The new member has two Red neighbours and one Blue neighbour (there are two Blues equidistant, but since k=3, we can take only one of them), so again he should be added to Red family.

But what if we take k=7? Then he has 5 Blue neighbours and 2 Red neighbours and should be added to the Blue family. The result will vary with the selected value of k.

Note that if k is not an odd number, we can get a tie, as would happen in the above case with k=4.

We would see that our new member has 2 Red and 2 Blue neighbours as his four nearest neighbours and we would need to choose a method for breaking the tie to perform classification.

So to reiterate, this method is called k-Nearest Neighbour since classification depends on the k nearest neighbours.
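As a sketch, majority voting over the k nearest neighbours can be written in a few lines of NumPy (illustrative code under my own names, not OpenCV's); choosing an odd k avoids the two-class tie described above:

import numpy as np

def knn_classify(train, labels, x, k=3):
    """Majority vote among the k training points closest to x."""
    d2 = ((train - x) ** 2).sum(axis=1)       # squared distances to all points
    nearest = labels[np.argsort(d2)[:k]]      # labels of the k closest points
    values, counts = np.unique(nearest, return_counts=True)
    return values[np.argmax(counts)]          # on a tie, argmax keeps the first maximum

train = np.array([[1, 1], [2, 1], [8, 8], [9, 8], [8, 9]], dtype=np.float32)
labels = np.array([0, 0, 1, 1, 1])            # 0 = Blue, 1 = Red
print(knn_classify(train, labels, np.array([7.5, 8.0], dtype=np.float32)))  # -> 1 (Red)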

Again, in kNN, it is true we are considering k neighbours, but we are giving equal importance to all, right? Is this justified?

For example, take the tied case of k=4. As we can see, the 2 Red neighbours are actually closer to the new member than the other 2 Blue neighbours, so he is more eligible to be added to the Red family.

How do we mathematically explain that? We give some weights to each neighbour depending on their distance to the new-comer: those who are nearer to him get higher weights, while those that are farther away get lower weights.

Then we add the total weights of each family separately and classify the new-comer as part of whichever family received higher total weights. This is called modified kNN or weighted kNN.
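Below is a minimal sketch of one common weighting scheme, inverse-distance weights (again my own illustrative code; since OpenCV's findNearest returns both the neighbour labels and their distances, a weighting like this can be applied on top of its output):

import numpy as np

def weighted_knn(train, labels, x, k=4):
    """Weighted kNN: nearer neighbours cast larger votes (inverse distance)."""
    d = np.sqrt(((train - x) ** 2).sum(axis=1))
    idx = np.argsort(d)[:k]                   # indices of the k nearest points
    weights = 1.0 / (d[idx] + 1e-9)           # epsilon avoids division by zero
    votes = {}
    for label, w in zip(labels[idx], weights):
        votes[label] = votes.get(label, 0.0) + w
    return max(votes, key=votes.get)          # class with the highest total weight

# The tied k=4 case: two Red houses close by, two Blue houses farther away.
train = np.array([[5, 5], [6, 5], [2, 2], [9, 9]], dtype=np.float32)
labels = np.array(['Red', 'Red', 'Blue', 'Blue'])
print(weighted_knn(train, labels, np.array([5.5, 5.0], dtype=np.float32)))  # -> Red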

So what are some important things you see here?

  • Because we have to check the distance from the new-comer to all the existing houses to find the nearest neighbour(s), we need to have information about all of the houses in town, right?
    If there are plenty of houses and families, this takes a lot of memory, and also more time for calculation.
  • There is almost zero time for any kind of "training" or preparation. Our "learning" involves only memorizing (storing) the data, before testing and classifying.

Now let's see this algorithm at work in OpenCV.

kNN in OpenCV

####################################################################################################
# Understanding the k-Nearest Neighbour (kNN) algorithm
import numpy as np
import cv2 as lmc_cv  # OpenCV imported under the alias used throughout this code
from matplotlib import pyplot


def lmc_cv_knn():
    """Understand the k-Nearest Neighbour (kNN) algorithm."""
    # Feature set containing (x, y) values of 25 known/training data
    train_data = np.random.randint(0, 100, (25, 2)).astype(np.float32)
    # Label each one either Red or Blue with numbers 0 and 1
    responses = np.random.randint(0, 2, (25, 1)).astype(np.float32)
    # Take Red members and plot them
    red = train_data[responses.ravel() == 0]
    # Take Blue members and plot them
    blue = train_data[responses.ravel() == 1]
    # The new-comers are marked in green.
    # If you have multiple new-comers (test data), you can just pass them as an array;
    # the corresponding results are also obtained as arrays.
    newcomer = np.random.randint(0, 100, (10, 2)).astype(np.float32)
    # Train the kNN model on the 25 labelled points, then classify the new-comers with k=3
    knn = lmc_cv.ml.KNearest_create()
    knn.train(train_data, lmc_cv.ml.ROW_SAMPLE, responses)
    ret, results, neighbours, dist = knn.findNearest(newcomer, 3)
    print("result:  {}\n".format(results))
    print("neighbours:  {}\n".format(neighbours))
    print("distance:  {}\n".format(dist))
    # Plot Red as triangles, Blue as squares, and the new-comers as green circles
    pyplot.figure('k-Nearest Neighbour (kNN) algorithm', figsize=(16, 9))
    pyplot.scatter(red[:, 0], red[:, 1], 80, 'r', '^')
    pyplot.scatter(blue[:, 0], blue[:, 1], 80, 'b', 's')
    pyplot.scatter(newcomer[:, 0], newcomer[:, 1], 80, 'g', 'o')
    pyplot.savefig('%02d.png' % (0 + 1))
    pyplot.show()
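Running the example is just a matter of calling the function (assuming the imports above):

if __name__ == '__main__':
    lmc_cv_knn()

findNearest returns, for each new-comer, the predicted class (results), the labels of its k nearest training points (neighbours), and their distances (dist). One run of the script produced the output below; your numbers will differ because both the training data and the new-comers are random.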

result:  [[0.]
 [0.]
 [0.]
 [1.]
 [0.]
 [1.]
 [0.]
 [1.]
 [1.]
 [0.]]

neighbours:  [[0. 1. 0.]
 [1. 0. 0.]
 [1. 0. 0.]
 [1. 1. 0.]
 [0. 0. 1.]
 [0. 1. 1.]
 [0. 0. 0.]
 [1. 1. 1.]
 [1. 1. 0.]
 [0. 0. 0.]]

distance:  [[ 41. 122. 360.]
 [145. 241. 449.]
 [193. 225. 712.]
 [ 53.  68. 409.]
 [  4.  50. 610.]
 [117. 449. 565.]
 [ 17.  25.  53.]
 [101. 149. 232.]
 [ 61.  74. 377.]
 [ 85. 125. 153.]]
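Reading the output: row i of result is the predicted family of new-comer i, row i of neighbours lists the labels of its 3 nearest training points, and the matching row of distance gives their distances (these appear to be squared Euclidean distances, e.g. 4 for two points that are 2 apart). As a hypothetical extra step inside lmc_cv_knn(), after findNearest, the numeric predictions could be turned back into family names:

label_names = ['Red', 'Blue']                 # 0 was labelled Red, 1 Blue above
print([label_names[int(r)] for r in results.ravel()])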
