Line Fitting with the RANSAC Algorithm

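I had already learned the principles behind RANSAC, and over the past couple of evenings I used my spare time to read through a Python implementation of it. There is plenty of material on this topic already, so I won't repeat it here. Before analyzing the RANSAC.py code, I want to summarize the algorithm in my own words.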

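Data collected in real applications often contains noisy points that interfere with building a model. We call such points outliers, and the points that actually support the model inliers. RANSAC works as follows. It first randomly selects some points and uses them to estimate a model (this sounds abstract, but in the line-fitting example here, where the line passes through the origin, the model is simply the slope). It then tests the remaining points against this model: a point whose error falls within the allowed tolerance is judged an inlier, otherwise an outlier. If the number of inliers reaches a preset threshold, the randomly chosen sample is considered acceptable; otherwise the algorithm goes back to the random selection and repeats all of the steps. This continues until an acceptable sample is found or, in practice, until a maximum number of iterations is reached, and the resulting model is taken as the best model found for the data.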
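The steps above can be sketched in a few lines. This is only a minimal illustration, not the Cookbook code discussed below: it assumes a line with an intercept (y = a*x + b), samples exactly two points per iteration, tests every point against the candidate line, and keeps the candidate with the most inliers. The function name `ransac_line` and its parameters are made up for this sketch.

```python
import numpy as np


def ransac_line(x, y, n_iters=200, threshold=1.0, min_inliers=10, seed=0):
    """Toy RANSAC for y = a*x + b; returns (a, b, inlier_mask) or None."""
    rng = np.random.default_rng(seed)
    best = None
    best_count = 0
    for _ in range(n_iters):
        # 1. Randomly pick the minimal sample: two distinct points define a line.
        i, j = rng.choice(len(x), size=2, replace=False)
        if x[i] == x[j]:
            continue  # vertical line, cannot be written as y = a*x + b
        a = (y[j] - y[i]) / (x[j] - x[i])
        b = y[i] - a * x[i]
        # 2. Test every point against the candidate model.
        residuals = np.abs(y - (a * x + b))
        inliers = residuals < threshold
        # 3. Accept the candidate only if enough points agree with it.
        count = int(inliers.sum())
        if count >= min_inliers and count > best_count:
            # 4. Refit on all inliers with ordinary least squares.
            a_ref, b_ref = np.polyfit(x[inliers], y[inliers], deg=1)
            best = (a_ref, b_ref, inliers)
            best_count = count
    return best


# Usage: 50 points on y = 2x + 1, with every fifth point shifted far away.
x = np.linspace(0.0, 10.0, 50)
y = 2.0 * x + 1.0
y[::5] += 40.0
a, b, mask = ransac_line(x, y)
```

Choosing the candidate by inlier count is one common variant; the Cookbook implementation below instead compares the mean error of the refitted model.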

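Cookbook/RANSAC gives an example of line fitting with RANSAC. The example is intuitive and the code is short and easy to follow. To make the detailed walkthrough below easier, I reproduce it here: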

# -*- coding: utf-8 -*-
import numpy
import scipy.linalg # use numpy if scipy unavailable
import pylab

## Copyright (c) 2004-2007, Andrew D. Straw. All rights reserved.

def ransac(data,model,n,k,t,d,debug=False,return_all=False):
    """fit model parameters to data using the RANSAC algorithm

    This implementation written from pseudocode found at
    http://en.wikipedia.org/w/index.php?title=RANSAC&oldid=116358182

    Given:
        data - a set of observed data points
        model - a model that can be fitted to data points
        n - the minimum number of data values required to fit the model
        k - the maximum number of iterations allowed in the algorithm
        t - a threshold value for determining when a data point fits a model
        d - the number of close data values required to assert that a model fits well to data
    Return:
        bestfit - model parameters which best fit the data (or nil if no good model is found)
    """
    iterations = 0
    bestfit = None
    besterr = numpy.inf
    best_inlier_idxs = None
    while iterations < k:
        maybe_idxs, test_idxs = random_partition(n,data.shape[0])
        maybeinliers = data[maybe_idxs,:]
        test_points = data[test_idxs]
        maybemodel = model.fit(maybeinliers)
        test_err = model.get_error( test_points, maybemodel)
        also_idxs = test_idxs[test_err < t] # select indices of rows with accepted points
        alsoinliers = data[also_idxs,:]
        if debug:
            print('test_err.min()',test_err.min())
            print('test_err.max()',test_err.max())
            print('numpy.mean(test_err)',numpy.mean(test_err))
            print('iteration %d:len(alsoinliers) = %d'%(
                iterations,len(alsoinliers)))
        if len(alsoinliers) > d:
            betterdata = numpy.concatenate( (maybeinliers, alsoinliers) )
            bettermodel = model.fit(betterdata)
            better_errs = model.get_error( betterdata, bettermodel)
            thiserr = numpy.mean( better_errs )
            if thiserr < besterr:
                bestfit = bettermodel
                besterr = thiserr
                best_inlier_idxs = numpy.concatenate( (maybe_idxs, also_idxs) )
        iterations+=1
    if bestfit is None:
        raise ValueError("did not meet fit acceptance criteria")
    if return_all:
        return bestfit, {'inliers':best_inlier_idxs}
    else:
        return bestfit

def random_partition(n,n_data):
    """return n random rows of data (and also the other len(data)-n rows)"""
    all_idxs = numpy.arange( n_data )
    numpy.random.shuffle(all_idxs)
    idxs1 = all_idxs[:n]
    idxs2 = all_idxs[n:]
    return idxs1, idxs2

class LinearLeastSquaresModel:
    """linear system solved using linear least squares

    This class serves as an example that fulfills the model interface
    needed by the ransac() function.

    """
    def __init__(self,input_columns,output_columns,debug=False):
        self.input_columns = input_columns
        self.output_columns = output_columns
        self.debug = debug
    def fit(self, data):
        A = numpy.vstack([data[:,i] for i in self.input_columns]).T
        B = numpy.vstack([data[:,i] for i in self.output_columns]).T
        x,resids,rank,s = scipy.linalg.lstsq(A,B)
        return x
    def get_error( self, data, model):
        A = numpy.vstack([data[:,i] for i in self.input_columns]).T
        B = numpy.vstack([data[:,i] for i in self.output_columns]).T
        B_fit = numpy.dot(A,model)
        err_per_point = numpy.sum((B-B_fit)**2,axis=1) # sum squared error per row
        return err_per_point

def test():
    # generate perfect input data
    n_samples = 500
    n_inputs = 1
    n_outputs = 1
    A_exact = 20*numpy.random.random((n_samples,n_inputs) ) # x coordinates
    perfect_fit = 60*numpy.random.normal(size=(n_inputs,n_outputs) ) # the model (slope)
    B_exact = numpy.dot(A_exact,perfect_fit) # y coordinates
    assert B_exact.shape == (n_samples,n_outputs) # check the shape of the y array
    #pylab.plot( A_exact, B_exact, 'b.', label='data' )
    #pylab.show()

    # add a little gaussian noise (linear least squares alone should handle this well)
    A_noisy = A_exact + numpy.random.normal(size=A_exact.shape ) # add Gaussian noise to x
    B_noisy = B_exact + numpy.random.normal(size=B_exact.shape ) # add Gaussian noise to y
    #pylab.plot( A_noisy, B_noisy, 'b.', label='data' )

    if 1:
        # add some outliers
        n_outliers = 100 # 100 of the 500 data points are outliers
        all_idxs = numpy.arange( A_noisy.shape[0] )
        numpy.random.shuffle(all_idxs) # shuffle the indices
        outlier_idxs = all_idxs[:n_outliers] # first 100 shuffled indices become outliers
        non_outlier_idxs = all_idxs[n_outliers:] # the rest are non-outliers
        A_noisy[outlier_idxs] =  20*numpy.random.random((n_outliers,n_inputs) ) # x coordinates of the outliers
        B_noisy[outlier_idxs] = 50*numpy.random.normal(size=(n_outliers,n_outputs) ) # y coordinates of the outliers
        #pylab.plot( A_noisy, B_noisy, 'b.', label='data' )
        #pylab.show()


    # setup model

    all_data = numpy.hstack( (A_noisy,B_noisy) ) # stack into (x, y) pairs
    input_columns = list(range(n_inputs)) # the first columns of the array
    output_columns = [n_inputs+i for i in range(n_outputs)] # the last columns of the array
    debug = False
    model = LinearLeastSquaresModel(input_columns,output_columns,debug=debug)

    linear_fit,resids,rank,s = scipy.linalg.lstsq(all_data[:,input_columns],
                                                  all_data[:,output_columns])

    # run RANSAC algorithm
    ransac_fit, ransac_data = ransac(all_data,model,
                                     50, 1000, 7e3, 300, # misc. parameters
                                     debug=debug,return_all=True)
    if 1:
        sort_idxs = numpy.argsort(A_exact[:,0]) # sort_idxs orders A_exact by x
        A_col0_sorted = A_exact[sort_idxs] # maintain as rank-2 array

        if 1:
            pylab.plot( A_noisy[:,0], B_noisy[:,0], 'k.', label='data' )
            pylab.plot( A_noisy[ransac_data['inliers'],0], B_noisy[ransac_data['inliers'],0], 'bx', label='RANSAC data' )
        else:
            pylab.plot( A_noisy[non_outlier_idxs,0], B_noisy[non_outlier_idxs,0], 'k.', label='noisy data' )
            pylab.plot( A_noisy[outlier_idxs,0], B_noisy[outlier_idxs,0], 'r.', label='outlier data' )
        pylab.plot( A_col0_sorted[:,0],
                    numpy.dot(A_col0_sorted,ransac_fit)[:,0],
                    label='RANSAC fit' )
        pylab.plot( A_col0_sorted[:,0],
                    numpy.dot(A_col0_sorted,perfect_fit)[:,0],
                    label='exact system' )
        pylab.plot( A_col0_sorted[:,0],
                    numpy.dot(A_col0_sorted,linear_fit)[:,0],
                    label='linear fit' )
        pylab.legend()
        pylab.show()

if __name__=='__main__':
    test()

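Compared with the original code, I removed some redundant parts. test() performs the line fit. Before looking at test(), let's look at the RANSAC part. The ransac() function takes 8 parameters, of which the first 6 are the important ones. data is the full set of data points. model is described in the docstring as a model that can be fitted to the data points; in the code it is an object providing fit() and get_error(), and in this line-fitting example the parameter it fits is the slope. n is the minimum number of data points needed for a fit; here it is the number of points randomly selected for the line fit. k is the maximum number of iterations allowed. t is a manually chosen threshold that decides whether a point's error is acceptable. d is the number of additional close points required before a model is considered to fit the data well.

With the parameters understood, what remains is the body of the while loop. In each iteration, all data points are first randomly partitioned into two groups, maybeinliers and test_points. maybeinliers is used for the line fit, done here by least squares, which gives the slope maybemodel. That line and the x coordinates of the test points are then used to predict the test points' y coordinates, i.e. their estimated values under the model. The squared difference between each estimate and the actual value gives a per-point error, which is compared against t: points with an error below t are judged inliers, the rest outliers. When the number of such inliers exceeds d, they are combined with maybeinliers and fitted again by least squares. Across all iterations, the refit with the smallest mean error is kept as the final best slope.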
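Because ransac() only ever calls model.fit(data) and model.get_error(data, model), any object with those two methods should be usable in place of LinearLeastSquaresModel. As an illustration, here is a hypothetical model for a line with an intercept, y = a*x + b. The class name `LineWithInterceptModel` and the fixed column layout (x in column 0, y in column 1) are assumptions of this sketch and do not appear in the Cookbook code.

```python
import numpy as np


class LineWithInterceptModel:
    """Hypothetical model for y = a*x + b exposing fit() and get_error()."""

    def fit(self, data):
        # data has two columns: x in column 0, y in column 1.
        A = np.column_stack([data[:, 0], np.ones(len(data))])
        params, _, _, _ = np.linalg.lstsq(A, data[:, 1], rcond=None)
        return params  # array([a, b])

    def get_error(self, data, model):
        a, b = model
        # Squared vertical distance of each point from the fitted line.
        return (data[:, 1] - (a * data[:, 0] + b)) ** 2


# Three points on y = 2x + 1 and one point far away from that line.
points = np.array([[0.0, 1.0], [1.0, 3.0], [2.0, 5.0], [3.0, 40.0]])
model = LineWithInterceptModel()
params = model.fit(points[:3])            # fit on the three collinear points
errors = model.get_error(points, params)  # one squared error per row
```

get_error returns one value per row, which is the shape ransac() needs for the comparison test_err < t.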

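test() is simple. It first generates discrete points lying on a line; the slope of that line is the exact model. It then adds Gaussian noise to the points, turns some of them into outliers, and finally plots the RANSAC fit alongside the exact line and the plain least-squares fit. That is the whole process. When I have time, I will follow up on the earlier post on BoW image retrieval in Python by using RANSAC for a re-ranking step.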
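To see why the outliers matter, the data generation in test() can be imitated on a small scale and the plain least-squares slope compared with and without the outliers. This is a simplified sketch under stated assumptions: the slope is fixed at 30.0 rather than drawn at random, noise is added only to y, and the seed and the helper `slope_through_origin` are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(42)
n_samples, n_outliers = 500, 100
true_slope = 30.0  # assumed value; test() draws the slope at random

# Points on the exact line through the origin, plus Gaussian noise on y.
x = 20.0 * rng.random(n_samples)
y = true_slope * x + rng.normal(size=n_samples)

# Overwrite the first 100 points with outliers unrelated to the line.
# x is already in random order, so no extra shuffle is needed here.
is_outlier = np.zeros(n_samples, dtype=bool)
is_outlier[:n_outliers] = True
x[is_outlier] = 20.0 * rng.random(n_outliers)
y[is_outlier] = 50.0 * rng.normal(size=n_outliers)


def slope_through_origin(x, y):
    """Least-squares slope for y = slope * x (no intercept)."""
    return float(np.dot(x, y) / np.dot(x, x))


slope_all = slope_through_origin(x, y)
slope_clean = slope_through_origin(x[~is_outlier], y[~is_outlier])
print(f"all points: {slope_all:.2f}, inliers only: {slope_clean:.2f}")
```

The fit over all points is pulled toward zero by the outliers, while the fit over the true inliers stays close to the assumed slope. RANSAC aims to recover the second situation without being told which points are outliers.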

from: http://yongyuan.name/blog/fitting-line-with-ransac.html
