过程原理

简介

随机抽样一致算法（Random Sample Consensus，RANSAC）。它采用迭代的方式从一组包含离群的被观测数据中估算出数学模型的参数。 RANSAC是一个非确定性算法，在某种意义上说，它会产生一个在一定概率下合理的结果，而更多次的迭代会使这一概率增加。此RANSAC算法在1981年由Fischler和Bolles首次提出。

RANSAC的基本假设：

“内群”(inlier, 似乎译为内点群更加妥当，即正常数据，正确数据)数据可以通过几组模型的参数来叙述其分布，而“离群”(outlier,似乎译为外点群更加妥当，异常数据，噪声点)数据则是不适合模型化的数据。
数据会受噪声影响，噪声指的是离群，例如从极端的噪声或错误解释有关数据的测量或不正确的假设。
RANSAC假定，给定一组（通常很小）的内点群，存在一个模型，可以估算最佳解释或最适用于这一数据模型的参数。

算法原理

在数据中随机选择若干个点设定为内点群
计算拟合内点群的模型
把其它刚才没选到的点带入刚才建立的模型中，计算是否属于内点群
记下内点群数量
重复以上步骤
比较哪次计算中内点群数量最多，内点群最多的那次所建的模型就是我们所要求的解

这里有几个问题

一开始的时候我们要随机选择多少点（n）
以及要重复做多少次（k）

参数求解

假设每个点是真正内点群的几率是 www，则：

www = 真正内点群的数目 / 数据总数

通常我们不知道www 是多少，wnw^{n}wn是所选择的nnn个点都是内点群的几率，1−wn1-w^{n}1−wn是所选择的 nnn个点至少有一个不是内点群的几率，(1−wn)k(1-w^{n})^{k}(1−wn)k是表示重复kkk 次都没有全部的 nnn 个点都是内点群的几率，假设算法跑kkk 次以后成功的几率是ppp，那么：
1−p=(1−wn)k1-p=(1-w^{n})^{k}1−p=(1−wn)kk=log(1−p)/log(1−wn)kk=log(1-p)/log(1-w^{n})^{k}k=log(1−p)/log(1−wn)k

所以如果希望成功几率高，p=0.99p=0.99p=0.99，
当 nnn不变时，kkk越大，ppp 越大，
当 www不变时，nnn 越大，所需的 kkk 就越大，
通常 www 未知，所以 nnn 选小一点比较好。

数学知识

参考1
参考2

在拟合过程中会用到一些数学的基础知识，长时间不用，很容易忘记，这里简单介绍下，有助于代码的理解。
以直线模型为例，即y=kx+by=kx+by=kx+b或者是多项式的写法y=a0+a1xy=a0+a1xy=a0+a1x，由此也可以泛化到n次多项式中，即曲线模型。
将拟合方程转换成矩阵形式：XA=YXA=YXA=Y矩阵A=(k,b)−1A=(k, b)^{-1}A=(k,b)−1即为所求的模型系数。
A=(XTX)−1XTYA=(X^TX)^{-1}X^TYA=(XTX)−1XTY
若X为满秩的方阵，则：
A=(X)−1YA=(X)^{-1}YA=(X)−1Y

代码实现

伪代码

参考链接
参考链接

 伪码形式的算法如下所示：
输入：
data —— 一组观测数据
model —— 适应于数据的模型
n —— 适用于模型的最少数据个数
k —— 算法的迭代次数
t —— 用于决定数据是否适应于模型的阀值
d —— 判定模型是否适用于数据集的数据数目
输出：
best_model —— 跟数据最匹配的模型参数（如果没有找到好的模型，返回null）
best_consensus_set —— 估计出模型的数据点
best_error —— 跟数据相关的估计出的模型错误iterations = 0
best_model = null
best_consensus_set = null
best_error = 无穷大
while ( iterations < k )maybe_inliers = 从数据集中随机选择n个点maybe_model = 适合于maybe_inliers的模型参数consensus_set = maybe_inliersfor ( 每个数据集中不属于maybe_inliers的点 ）if ( 如果点适合于maybe_model，且错误小于t ）将点添加到consensus_setif （ consensus_set中的元素数目大于d ）已经找到了好的模型，现在测试该模型到底有多好better_model = 适合于consensus_set中所有点的模型参数this_error = better_model究竟如何适合这些点的度量if ( this_error < best_error )我们发现了比以前好的模型，保存该模型直到更好的模型出现best_model =  better_modelbest_consensus_set = consensus_setbest_error =  this_error增加迭代次数
返回 best_model, best_consensus_set, best_errorRANSAC算法的可能变化包括以下几种：（1）如果发现了一种足够好的模型（该模型有足够小的错误率），则跳出主循环。这样可能会节约计算额外参数的时间。（2）直接从maybe_model计算this_error，而不从consensus_set重新估计模型。这样可能会节约比较两种模型错误的时间，但可能会对噪声更敏感。

Given:data – A set of observations.model – A model to explain observed data points.n – Minimum number of data points required to estimate model parameters.k – Maximum number of iterations allowed in the algorithm.t – Threshold value to determine data points that are fit well by model.d – Number of close data points required to assert that a model fits well to data.Return:bestFit – model parameters which best fit the data (or null if no good model is found)iterations = 0
bestFit = null
bestErr = something really largewhile iterations < k domaybeInliers := n randomly selected values from datamaybeModel := model parameters fitted to maybeInliersalsoInliers := empty setfor every point in data not in maybeInliers doif point fits maybeModel with an error smaller than tadd point to alsoInliersend ifend forif the number of elements in alsoInliers is > d then// This implies that we may have found a good model// now test how good it is.betterModel := model parameters fitted to all points in maybeInliers and alsoInliersthisErr := a measure of how well betterModel fits these pointsif thisErr < bestErr thenbestFit := betterModelbestErr := thisErrend ifend ifincrement iterations
end whilereturn bestFit

python代码实现

# 测试发现每次拟合结果都不一样，还是有很多次拟合不准，需要多调参
def ransac(points:np.ndarray, best_model=None, max_iters=2000, min_samples=2, num_inliers=10,residual_threshold=0.1, best_error=np.inf, probability=0.99):"""Determine number trials such that at least one outlier-free subset issampled for the given inlier/outlier ratio.Parameters----------n_inliers : intNumber of inliers in the data.n_samples : intTotal number of samples in the data.min_samples : intMinimum number of samples chosen randomly from original data.probability : floatProbability (confidence) that one outlier-free sample is generated.Returns-------"""rng = np.random.default_rng()_EPSILON = np.spacing(1)n_samples = points.shape[0]best_inliers = 0iterations = 0best_iters = 0x_points = points[:, 0].reshape(-1,1)y_points = points[:, 1].reshape(-1,1)while iterations < max_iters:print(f"-----------------------------iterations {iterations}----------------------------")ridx = rng.permutation(points.shape[0])select_inliers = ridx[:min_samples] #随机排序后取前几个r = x_points[select_inliers, :].shape[0]X = x_points[select_inliers]X = np.hstack([np.ones((r, 1)), X])Y = y_points[select_inliers]try:np.linalg.inv(X.T @ X)except:print('存在不可逆')continueparams = np.linalg.inv(X.T @ X) @ X.T @ Y # A = (x.T*x)^-1 * x.T * Yy_true = y_points[ridx][min_samples:]y_pred = np.hstack([np.ones((x_points[ridx][min_samples:].shape[0], 1)), x_points[ridx][min_samples:]]) @ paramsresidual = y_true - y_predinliers_ids = np.where(abs(residual) < residual_threshold)[0]# inliers = []# for k in range(n_samples):#     y_hat = slope * X[k][0] + intercept#     if abs(y[k] - y_hat) < residual_threshold:#         inliers.append(k)inliers = inliers_ids.sizeif inliers > best_inliers:best_inliers = inliersbest_model = paramsiterations += 1# 求最大迭代次数# max_iters = np.ceil(np.log(max(_EPSILON, 1 - probability)) / #     np.log(max(_EPSILON, 1 - (inliers / n_samples) ** min_samples)))if inliers > num_inliers:inliers_points = np.hstack([select_inliers, inliers_ids])x_inliers = x_points[inliers_points]x_matrix = np.hstack([np.ones((x_inliers.shape[0], 1)), x_inliers])y_inliers = y_points[inliers_points]better_model = np.linalg.inv(x_matrix.T @ x_matrix) @ x_matrix.T @ y_inliersy_pred_better = x_matrix @ better_modelmean_square_error = np.sum((y_inliers - y_pred_better) ** 2) / y_inliers.shape[0]if mean_square_error < best_error:best_error = mean_square_errorbest_model = better_modelbest_iters = iterations# best_slope = best_model[1,0]# best_intercept = best_model[0,0]print('max_iters: ', max_iters, best_iters)print(f"#n_liers: {inliers}, #best_n_inliers: {best_inliers}, best_error:{best_error}")print(f"quit after {iterations} iterations")return best_model, best_inliers

from copy import copy
import numpy as np
from numpy.random import default_rng
rng = default_rng()class RANSAC:def __init__(self, n=10, k=100, t=0.05, d=10, model=None, loss=None, metric=None):self.n = n              # `n`: Minimum number of data points to estimate parametersself.k = k              # `k`: Maximum iterations allowedself.t = t              # `t`: Threshold value to determine if points are fit wellself.d = d              # `d`: Number of close data points required to assert model fits wellself.model = model      # `model`: class implementing `fit` and `predict`self.loss = loss        # `loss`: function of `y_true` and `y_pred` that returns a vectorself.metric = metric    # `metric`: function of `y_true` and `y_pred` and returns a floatself.best_fit = Noneself.best_error = np.infdef fit(self, X, y):for _ in range(self.k):ids = rng.permutation(X.shape[0])maybe_inliers = ids[: self.n]maybe_model = copy(self.model).fit(X[maybe_inliers], y[maybe_inliers])thresholded = (self.loss(y[ids][self.n :], maybe_model.predict(X[ids][self.n :]))< self.t)inlier_ids = ids[self.n :][np.flatnonzero(thresholded).flatten()]if inlier_ids.size > self.d:inlier_points = np.hstack([maybe_inliers, inlier_ids])better_model = copy(self.model).fit(X[inlier_points], y[inlier_points])this_error = self.metric(y[inlier_points], better_model.predict(X[inlier_points]))if this_error < self.best_error:self.best_error = this_errorself.best_fit = maybe_modelreturn selfdef predict(self, X):return self.best_fit.predict(X)def square_error_loss(y_true, y_pred):return (y_true - y_pred) ** 2def mean_square_error(y_true, y_pred):return np.sum(square_error_loss(y_true, y_pred)) / y_true.shape[0]class LinearRegressor:def __init__(self):self.params = Nonedef fit(self, X: np.ndarray, y: np.ndarray):r, _ = X.shapeX = np.hstack([np.ones((r, 1)), X])self.params = np.linalg.inv(X.T @ X) @ X.T @ yreturn selfdef predict(self, X: np.ndarray):r, _ = X.shapeX = np.hstack([np.ones((r, 1)), X])return X @ self.paramsif __name__ == "__main__":regressor = RANSAC(model=LinearRegressor(), loss=square_error_loss, metric=mean_square_error)X = np.array([-0.848,-0.800,-0.704,-0.632,-0.488,-0.472,-0.368,-0.336,-0.280,-0.200,-0.00800,-0.0840,0.0240,0.100,0.124,0.148,0.232,0.236,0.324,0.356,0.368,0.440,0.512,0.548,0.660,0.640,0.712,0.752,0.776,0.880,0.920,0.944,-0.108,-0.168,-0.720,-0.784,-0.224,-0.604,-0.740,-0.0440,0.388,-0.0200,0.752,0.416,-0.0800,-0.348,0.988,0.776,0.680,0.880,-0.816,-0.424,-0.932,0.272,-0.556,-0.568,-0.600,-0.716,-0.796,-0.880,-0.972,-0.916,0.816,0.892,0.956,0.980,0.988,0.992,0.00400]).reshape(-1,1)y = np.array([-0.917,-0.833,-0.801,-0.665,-0.605,-0.545,-0.509,-0.433,-0.397,-0.281,-0.205,-0.169,-0.0531,-0.0651,0.0349,0.0829,0.0589,0.175,0.179,0.191,0.259,0.287,0.359,0.395,0.483,0.539,0.543,0.603,0.667,0.679,0.751,0.803,-0.265,-0.341,0.111,-0.113,0.547,0.791,0.551,0.347,0.975,0.943,-0.249,-0.769,-0.625,-0.861,-0.749,-0.945,-0.493,0.163,-0.469,0.0669,0.891,0.623,-0.609,-0.677,-0.721,-0.745,-0.885,-0.897,-0.969,-0.949,0.707,0.783,0.859,0.979,0.811,0.891,-0.137]).reshape(-1,1)regressor.fit(X, y)import matplotlib.pyplot as pltplt.style.use("seaborn-darkgrid")fig, ax = plt.subplots(1, 1)ax.set_box_aspect(1)plt.scatter(X, y)line = np.linspace(-1, 1, num=100).reshape(-1, 1)plt.plot(line, regressor.predict(line), c="peru")plt.show()

python:随机采样一致性（RANSAC）直线模型拟合原理及代码实现相关推荐

最小二乘法以及RANSAC（随机采样一致性）思想及实现
线性回归–最小二乘法(Least Square Method) 线性回归: 什么是线性回归? 举个例子,某商品的利润在售价为2元.5元.10元时分别为4元.10元.20元, 我们很容易得出商品的利润与 ...
PCL 点云分割与分类 Segmentation RANSAC随机采样一致性平面模型分割欧氏距离分割区域聚类分割算法最小分割算法超体聚类渐进式形态学滤波器
点云分割博文末尾支持二维码赞赏哦 _ 点云分割是根据空间,几何和纹理等特征对点云进行划分, 使得同一划分内的点云拥有相似的特征,点云的有效分割往往是许多应用的前提, 例如逆向工作,CAD领域对零件的 ...
PCL学习：随机采样一致性算法（RANSAC）
此文是在他人的文章上进行了补充完善.另外代码部分是在Ziv Yaniv的c++实现上重新实现了一次,加了中文注释,修正了一个错误.便于理解算法实现. 随机采样一致性算法,RANSAC是"RA ...
Python-pcl 随机采样一致性算法
RANSAC 随机采样一致性算法 RANSAC是一种随机参数估计算法.RANSAC从样本中随机抽选出一个样本子集,使用最小方差估计算法对这个子集计算模型参数,然后计算所有样本与该模型的偏差,在使用一个 ...
点云PCL学习笔记-分割segmentation-RANSAC随机采样一致性算法欧式聚类提取
随机采样一致性算法RANSAC 程序实例参考网址: https://pcl.readthedocs.io/projects/tutorials/en/latest/random_sample_cons ...
一致性哈希算法原理及代码实现
一致性哈希安装 go get -u github.com/junhaideng/consistent 使用 c := consistent.New() ips := []string{"1 ...
PCL采样一致性算法
在计算机视觉领域广泛的使用各种不同的采样一致性参数估计算法用于排除错误的样本,样本不同对应的应用不同,例如剔除错误的配准点对,分割出处在模型上的点集,PCL中以随机采样一致性算法(RANSAC)为核心 ...
5.7 随机采样最小二乘法
5.7 随机采样最小二乘法如果万一出现差错,又难以检测出,则强影响点影响很大,此时可以采用一种随机方法,尽量避免差错带来的影响. 随机最小二乘法不采用所有测量数据,而是随机抽取部分测量数据,根据这些 ...
python求扇形面积_Python随机生成均匀分布在单位圆内的点代码示例
Python有一随机函数可以产生[0,1)区间内的随机数,但是如果我们想生成随机分布在单位圆上的,那么我们可以首先生成随机分布在单位圆边上的点,然后随机调整每个点距离原点的距离,但是我们发现这个距离不 ...

python:随机采样一致性（RANSAC）直线模型拟合原理及代码实现

RANSAC直线模型拟合