A Brief Discussion of Camera Motion in Multi-Object Tracking

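©PaperWeekly Original · Author | 黄飘 (Huang Piao)

Affiliation | Master's student, Huazhong University of Science and Technology

Research interest | Multi-object tracking

In a previous article I introduced the Kalman filter, which is widely used as the pedestrian motion model in multi-object tracking (MOT). Real scenes, however, often involve camera motion, and a pedestrian motion model alone is not enough. This time I introduce camera motion models, focusing on epipolar geometry and ECC. The complete code and examples are on GitHub: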

https://github.com/nightmaredimple/libmot

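Camera Motion in Multi-Object Tracking

Multi-object tracking scenes often contain complex motion patterns. Besides non-rigid motion such as that of pedestrians, there is also rigid motion such as that of the camera. Take the MOT Challenge dataset as an example: it contains a large number of sequences with camera motion, even more than the number of static-camera sequences.

For example, in sequence MOT17-13 a car-mounted camera observes two slowly moving pedestrians while the vehicle is turning:

As the illustration shows, because the vehicle turns quickly, the position of one pedestrian in the previous frame maps onto the position of the other pedestrian in the next frame. Camera motion therefore has a large impact on multi-object tracking. Models that rely only on motion cues suffer most, because camera motion severely disturbs the motion model.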

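Epipolar Geometry

2.1 The Epipolar Geometry Model

I covered camera motion before, when introducing unsupervised models for monocular depth estimation: two frames whose content does not change drastically can be approximated as views of the same scene from two different camera poses, which is exactly the setting of epipolar geometry. For background, see the camera geometry chapters of Multiple View Geometry in Computer Vision:

First, though, I need to explain a few concepts to make the model that follows easier to describe: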

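1. Baseline: the line CC' joining the two camera centers.

2. Epipolar pencil: the family of planes that have the baseline as their axis.

3. Epipolar plane: any plane that contains the baseline.

4. Epipole: the intersection of the baseline with each image plane, i.e. the points e and e' in the figure above.

5. Epipolar line: the intersection of an epipolar plane with an image plane.

6. Five-point coplanarity: the image points x and x', the camera centers C and C', and the space point X all lie in one plane.

7. Epipolar constraint: the correspondence between points on the two epipolar lines. A point x in one image must have its match x' on the corresponding epipolar line in the other image, which is written x'^T F x = 0 with the fundamental matrix F.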
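The concepts in the list above can be checked numerically. The following is a minimal sketch, not taken from the original article: the intrinsics K, the rotation R, the translation t and the space point X are made-up values, and the helper names are hypothetical. It builds F = K^(-T) [t]x R K^(-1) for two cameras P1 = K[I|0] and P2 = K[R|t], then verifies the epipolar constraint and the epipole.

```python
import numpy as np


def skew(t):
    """Cross-product matrix [t]x, so that skew(t) @ v == np.cross(t, v)."""
    return np.array([[0.0, -t[2], t[1]],
                     [t[2], 0.0, -t[0]],
                     [-t[1], t[0], 0.0]])


def fundamental_from_pose(K, R, t):
    """F = K^-T [t]x R K^-1 for cameras P1 = K[I|0] and P2 = K[R|t]."""
    K_inv = np.linalg.inv(K)
    return K_inv.T @ skew(t) @ R @ K_inv


def project(K, R, t, X):
    """Project a 3D point X to homogeneous pixel coordinates [u, v, 1]."""
    x = K @ (R @ X + t)
    return x / x[2]


# Hypothetical camera setup (assumed values, for illustration only).
K = np.array([[800.0, 0.0, 320.0],
              [0.0, 800.0, 240.0],
              [0.0, 0.0, 1.0]])
angle = 0.1  # small rotation about the y axis, like a turning vehicle
R = np.array([[np.cos(angle), 0.0, np.sin(angle)],
              [0.0, 1.0, 0.0],
              [-np.sin(angle), 0.0, np.cos(angle)]])
t = np.array([0.5, 0.0, 0.1])

F = fundamental_from_pose(K, R, t)

X = np.array([0.3, -0.2, 5.0])                    # a space point
x1 = project(K, np.eye(3), np.zeros(3), X)        # its image in view 1
x2 = project(K, R, t, X)                          # its image in view 2

residual = float(x2 @ F @ x1)   # epipolar constraint x2^T F x1, close to 0
line2 = F @ x1                  # epipolar line of x1 in view 2 (a, b, c)
e2 = K @ t                      # epipole in view 2 (image of camera center C)
```

Here `line2` is the epipolar line on which `x2` must lie, and every such line passes through the epipole `e2`, which is why `F.T @ e2` vanishes.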

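Next, let us look at TNT [1], an ACM MM 2019 paper that studies an end-to-end framework combining motion and appearance information:

What we care about here, though, is the camera motion model it proposes:

As we can see, the authors combine pedestrian motion with camera motion. The first part of the objective exploits a property of the fundamental matrix F from epipolar geometry; the figure below gives the derivation:

Here x denotes the coordinates of the four corners of the bounding box. In the second part, the authors assume that the shape of the same target stays approximately unchanged between the two frames. So once the fundamental matrix F is known, the box in the next frame can be obtained from the box in the previous frame by least squares.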
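For reference, the objective described above can be written out as follows. This is a reconstruction from the description in the text and from the docstring of the code in Section 2.2, not a copy of the formula in the paper; the corner indexing (1 = top left, 4 = bottom right) is an assumption made here.

```latex
\min_{\{x_i^{t+1}\}} \;
\sum_{i=1}^{4} \left( \left(x_i^{t+1}\right)^{\top} F \, x_i^{t} \right)^{2}
\; + \;
\left\| \left(x_4^{t+1} - x_1^{t+1}\right) - \left(x_4^{t} - x_1^{t}\right) \right\|_2^{2}
```

Each x_i is a corner in homogeneous coordinates [u, v, 1]. The first term asks every corner in frame t+1 to lie on the epipolar line of the matching corner in frame t, and the second term keeps the diagonal of the box, and hence its width and height, unchanged.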

To estimate the fundamental matrix F, the authors say they use SURF feature extraction together with RANSAC sampling.
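To make the estimation step more concrete, here is a hedged sketch of the normalized eight-point algorithm, a standard solver that a RANSAC loop can call on each sampled subset of matches. It is not the authors' implementation and not what `cv2.findFundamentalMat` is guaranteed to run internally; the function names and the synthetic camera setup are assumptions for illustration.

```python
import numpy as np


def normalize_points(pts):
    """Hartley normalization: zero centroid, mean distance sqrt(2)."""
    pts = np.asarray(pts, dtype=np.float64)
    centroid = pts.mean(axis=0)
    scale = np.sqrt(2.0) / np.mean(np.linalg.norm(pts - centroid, axis=1))
    T = np.array([[scale, 0.0, -scale * centroid[0]],
                  [0.0, scale, -scale * centroid[1]],
                  [0.0, 0.0, 1.0]])
    pts_h = np.column_stack([pts, np.ones(len(pts))])
    return pts_h @ T.T, T


def eight_point(pts1, pts2):
    """Estimate F with [p2;1]^T F [p1;1] = 0 from >= 8 matches (Nx2 arrays)."""
    n1, T1 = normalize_points(pts1)
    n2, T2 = normalize_points(pts2)
    # One row per match; the unknown is F flattened row by row.
    A = np.column_stack([n2[:, 0] * n1[:, 0], n2[:, 0] * n1[:, 1], n2[:, 0],
                         n2[:, 1] * n1[:, 0], n2[:, 1] * n1[:, 1], n2[:, 1],
                         n1[:, 0], n1[:, 1], np.ones(len(n1))])
    _, _, vt = np.linalg.svd(A)
    F = vt[-1].reshape(3, 3)
    # Enforce rank 2 by zeroing the smallest singular value.
    u, s, vt = np.linalg.svd(F)
    F = u @ np.diag([s[0], s[1], 0.0]) @ vt
    # Undo the normalization and fix the arbitrary scale.
    F = T2.T @ F @ T1
    return F / np.linalg.norm(F)


# Synthetic, noise-free matches from two hypothetical camera poses.
rng = np.random.default_rng(0)
K = np.array([[800.0, 0.0, 320.0], [0.0, 800.0, 240.0], [0.0, 0.0, 1.0]])
angle = 0.1
R = np.array([[np.cos(angle), 0.0, np.sin(angle)],
              [0.0, 1.0, 0.0],
              [-np.sin(angle), 0.0, np.cos(angle)]])
t = np.array([0.5, 0.0, 0.1])
X = np.column_stack([rng.uniform(-2, 2, 20),
                     rng.uniform(-2, 2, 20),
                     rng.uniform(4, 8, 20)])
p1 = X @ K.T
p1 = p1[:, :2] / p1[:, 2:3]
p2 = (X @ R.T + t) @ K.T
p2 = p2[:, :2] / p2[:, 2:3]

F = eight_point(p1, p2)
h1 = np.column_stack([p1, np.ones(len(p1))])
h2 = np.column_stack([p2, np.ones(len(p2))])
residuals = np.abs(np.sum(h2 * (h1 @ F.T), axis=1))  # |p2^T F p1| per match
```

With real feature matches the data contain outliers, which is why the solver is wrapped in RANSAC: F is fitted on random minimal subsets and the estimate with the most inliers is kept.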

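The authors do not give implementation details, though, so let me attempt a derivation. First, since the objective requires the box shape to stay consistent, we may as well treat the shape (width and height) of the box in the next frame as known.

Second, assume that the fundamental matrix F has already been estimated; it is a 3x3 matrix. To simplify the derivation, I make one assumption:

For each corner of any bounding box in frame t, we append a third coordinate because the geometry is three-dimensional (homogeneous coordinates), so the product of the corner with F is a known 3-vector. A box then yields four such vectors, which we stack into a 4x3 matrix M.

The objective can then be expanded. Here (w, h) is known and (x, y) is the top-left corner of the box in the next frame:

This is clearly a standard Ax = b problem, and the rest follows easily.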
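Written out, the system looks as follows. This restates the docstring of `EstimateBox` in Section 2.2, with m_{ij} denoting the entries of M (row i is the epipolar line of corner i) and the corners ordered top left, top right, bottom left, bottom right.

```latex
\begin{aligned}
m_{11}\,x + m_{12}\,y + m_{13} &= 0 \\
m_{21}\,(x+w) + m_{22}\,y + m_{23} &= 0 \\
m_{31}\,x + m_{32}\,(y+h) + m_{33} &= 0 \\
m_{41}\,(x+w) + m_{42}\,(y+h) + m_{43} &= 0
\end{aligned}
\quad\Longrightarrow\quad
\underbrace{\begin{bmatrix}
m_{11} & m_{12} \\ m_{21} & m_{22} \\ m_{31} & m_{32} \\ m_{41} & m_{42}
\end{bmatrix}}_{A}
\begin{bmatrix} x \\ y \end{bmatrix}
=
\underbrace{-\begin{bmatrix}
m_{13} \\ m_{23} + m_{21} w \\ m_{33} + m_{32} h \\ m_{43} + m_{41} w + m_{42} h
\end{bmatrix}}_{b}
```

There are four equations and two unknowns, so the system is overdetermined and is solved in the least-squares sense, (x, y)^T = A^+ b, where A^+ is the pseudo-inverse of A.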

2.2 Experimental Analysis

For efficiency I use ORB feature extraction here, followed by brute-force matching:

import cv2
import numpy as np


class Epipolar(object):
    def __init__(self, feature_method='orb', match_method='brute force',
                 metric=cv2.NORM_HAMMING, n_points=50, nfeatures=500,
                 scaleFactor=1.2, nlevels=8):
        """Using Epipolar Geometry to Estimate Camera Motion

        Parameters
        ----------
        feature_method : str
            the method of feature extraction, the default is ORB,
            more methods will be added in the future
        match_method : str
            the method of feature matching, the default is brute force,
            more methods will be added in the future
        metric : metrics in cv2
            distance metric for feature matching
        n_points : int
            number of matched points to be considered
        nfeatures : int
            number of features to be extracted
        scaleFactor : float
            scale factor for ORB
        nlevels : int
            number of pyramid levels for ORB
        """
        self.metric = metric
        if feature_method == 'orb':
            self.feature_extractor = cv2.ORB_create(nfeatures=nfeatures,
                                                    scaleFactor=scaleFactor,
                                                    nlevels=nlevels)
        if match_method == 'brute force':
            self.matcher = cv2.BFMatcher(metric, crossCheck=True)
        self.n_points = n_points

    def FeatureExtract(self, img):
        """Detect and compute the input image's keypoints and descriptors

        Parameters
        ----------
        img : ndarray of opencv
            An HxW(x3) matrix of img

        Returns
        -------
        keypoints : List of cv2.KeyPoint
            keypoint.pt gives (x, y); keypoints for which a descriptor
            cannot be computed are removed
        descriptors : ndarray
            An Nx32 ndarray of uint8 when using the "orb" method
        """
        if img.ndim == 3:
            img = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
        # find the keypoints with ORB
        keypoints = self.feature_extractor.detect(img, None)
        # compute the descriptors with ORB
        keypoints, descriptors = self.feature_extractor.compute(img, keypoints)
        return keypoints, descriptors

Estimating the fundamental matrix and applying least squares can both be done directly with existing toolboxes, OpenCV and NumPy:

    def GetFundamentalMat(self, keypoints1, descriptors1, keypoints2, descriptors2):
        """Estimate the fundamental matrix using the BF matcher and RANSAC

        [p2;1]^T K^(-T) E K^(-1) [p1;1] = 0, T means transpose,
        K means the intrinsic matrix of the camera
        F = K^(-T) E K^(-1)

        Parameters
        ----------
        keypoints : List of cv2.KeyPoint
            keypoint.pt gives (x, y)
        descriptors : ndarray
            An Nx32 matrix of descriptors

        Returns
        -------
        F : ndarray
            A 3x3 fundamental matrix
        mask : ndarray
            An Nx1 matrix marking the inlier points
        pts1 : ndarray
            An Nx2 int32 matrix of matched points in the first image
        pts2 : ndarray
            An Nx2 int32 matrix of matched points in the second image
        matches : List of matches
            distance - distance of two points,
            queryIdx - query image's descriptor id, default is the first image
            trainIdx - train image's descriptor id, default is the second image
            imageIdx - train image's id, default is 0
        """
        # matching points, best matches first
        matches = self.matcher.match(descriptors1, descriptors2)
        matches = sorted(matches, key=lambda x: x.distance)

        pts1 = []
        pts2 = []
        for i, match in enumerate(matches):
            if i >= self.n_points:
                break
            pts1.append(keypoints1[match.queryIdx].pt)
            pts2.append(keypoints2[match.trainIdx].pt)

        pts1 = np.int32(pts1)
        pts2 = np.int32(pts2)
        matches = matches[:self.n_points]

        # Estimate the fundamental matrix by RANSAC,
        # distance_threshold = 1, confidence_threshold = 0.99
        F, mask = cv2.findFundamentalMat(pts1, pts2, cv2.FM_RANSAC, 1, 0.99)
        return F, mask, pts1, pts2, matches

    def EstimateBox(self, boxes, F):
        """Estimate boxes in the target image using the fundamental matrix

        Parameters
        ----------
        boxes : array like
            An Nx4 matrix of boxes in the source image (x, y, w, h)
        F : ndarray
            A 3x3 fundamental matrix

        Returns
        -------
        aligned_boxes : ndarray
            An Nx4 matrix of boxes in the target image (x, y, w, h)

        Method
        -------
            L = sum_i ||Bi^T F Ai||^2 + ||(B3-B0)-(A3-A0)||^2
            A is the four corners of the box in the source image
            B is the four corners of the aligned box in the target image
            A0,B0: top left corner of box, [x;y;1]
            A1,B1: top right corner of box
            A2,B2: bottom left corner of box
            A3,B3: bottom right corner of box
            the height and width of boxes and aligned boxes are assumed to be the same
            we can use a greedy strategy: let M = A^T F^T, then:
                M11   x1   + M12   y1   + M13 = 0
                M21 (x1+w) + M22   y1   + M23 = 0
                M31   x1   + M32 (y1+h) + M33 = 0
                M41 (x1+w) + M42 (y1+h) + M43 = 0
            => M[:, :2] [x;y] + M[:, 2] + [0; M21 w; M32 h; M41 w + M42 h] = 0  ->  Ax = b
            x = (pseudo inverse of A) b
        """
        boxes = np.asarray(boxes)
        if boxes.ndim == 1:
            boxes = boxes[np.newaxis, :]
        aligned_boxes = np.zeros(boxes.shape)

        for i, bbox in enumerate(boxes):
            w = bbox[2]
            h = bbox[3]
            # four corners of the source box in homogeneous coordinates
            AT = np.array([[bbox[0], bbox[1], 1],
                           [bbox[0] + w, bbox[1], 1],
                           [bbox[0], bbox[1] + h, 1],
                           [bbox[0] + w, bbox[1] + h, 1]])
            # each row of M is the epipolar line of one corner
            M = AT @ F.T
            b = -M[:, 2] - np.array([0, M[1][0] * w, M[2][1] * h,
                                     M[3][0] * w + M[3][1] * h])
            # least-squares solution for the top-left corner
            aligned_tl = np.linalg.pinv(M[:, :2]) @ b

            aligned_boxes[i, 0] = aligned_tl[0]
            aligned_boxes[i, 1] = aligned_tl[1]
            aligned_boxes[i, 2] = w
            aligned_boxes[i, 3] = h

        return aligned_boxes.astype(np.int32)

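The results are as follows:

The normal of the epipolar lines above also happens to lie along the direction of the car-mounted camera, and the problem shown in the example of the first chapter is greatly alleviated: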

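ECC

3.1 How It Works

The epipolar geometry approach of the previous chapter estimates 3D information from 2D observations only, so it inevitably has errors. In this chapter I present another simple and effective option: an "affine transformation". It is not quite the affine transformation we usually have in mind, though; I will explain the details step by step.

I first came across the ECC algorithm in Tracktor++ [3] from ICCV 2019, but the authors mention it only in passing and do not say how it is implemented. ECC stands for Enhanced Correlation Coefficient [2]. It comes from a 2008 PAMI paper and is designed for image registration:

That is, given two images whose content differs little but that are affected by changes in illumination, scale, color, translation and so on, the goal is to align them. At its core, ECC is an objective function: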
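As a hedged illustration of that objective: in its basic form the ECC criterion compares the zero-mean, unit-norm versions of the reference image vector i_r and the warped image vector i_w(p), E(p) = || i_r/||i_r|| - i_w(p)/||i_w(p)|| ||^2. The sketch below only evaluates this criterion for two given images; it does not perform the optimization over the warp parameters p, and the function names are hypothetical.

```python
import numpy as np


def ecc_coefficient(reference, warped):
    """Correlation coefficient between the zero-mean, unit-norm image vectors."""
    r = np.asarray(reference, dtype=np.float64).ravel()
    w = np.asarray(warped, dtype=np.float64).ravel()
    r = r - r.mean()
    w = w - w.mean()
    return float(r @ w / (np.linalg.norm(r) * np.linalg.norm(w)))


def ecc_criterion(reference, warped):
    """E = || r/||r|| - w/||w|| ||^2, which equals 2 - 2 * ecc_coefficient."""
    r = np.asarray(reference, dtype=np.float64).ravel()
    w = np.asarray(warped, dtype=np.float64).ravel()
    r = r - r.mean()
    w = w - w.mean()
    r = r / np.linalg.norm(r)
    w = w / np.linalg.norm(w)
    return float(np.sum((r - w) ** 2))


rng = np.random.default_rng(0)
img = rng.random((8, 8))
brighter = 2.5 * img + 10.0       # same content, different gain and bias
other = rng.random((8, 8))        # unrelated content

e_same = ecc_criterion(img, brighter)
e_other = ecc_criterion(img, other)
```

Because of the mean subtraction and normalization, a global change in contrast or brightness leaves the criterion at zero, which is why ECC tolerates illumination changes between frames.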

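Of course, this is only the original form, and it is adjusted during optimization; I will not go into the theory here. What is worth noting is the function y = warp(x): the algorithm assumes that some transformation relates the two frames. It need not be affine; it can be one of several kinds (translation, Euclidean, affine, or perspective, matching the four warp modes in the code below):

The last one, the perspective transformation, has the matrix form:

The first three transformations do not need the last row, so they take the form of a 2x3 matrix.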
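The four kinds of warp can be sketched with plain NumPy as follows. The constructor and helper names are hypothetical and the numbers are arbitrary; the point is only to show the 2x3 versus 3x3 shapes and the extra division that the perspective case needs.

```python
import numpy as np


def translation(tx, ty):
    """2x3 warp: shift only."""
    return np.array([[1.0, 0.0, tx],
                     [0.0, 1.0, ty]])


def euclidean(theta, tx, ty):
    """2x3 warp: rotation by theta (radians) plus shift."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, tx],
                     [s, c, ty]])


def affine(a11, a12, a21, a22, tx, ty):
    """2x3 warp: arbitrary linear part (rotation, scale, shear) plus shift."""
    return np.array([[a11, a12, tx],
                     [a21, a22, ty]])


def homography(h11, h12, h13, h21, h22, h23, h31, h32):
    """3x3 perspective warp with the last entry fixed to 1."""
    return np.array([[h11, h12, h13],
                     [h21, h22, h23],
                     [h31, h32, 1.0]])


def apply_warp(warp, pts):
    """Apply a 2x3 or 3x3 warp to an Nx2 array of points."""
    pts = np.asarray(pts, dtype=np.float64)
    pts_h = np.column_stack([pts, np.ones(len(pts))])
    out = pts_h @ np.asarray(warp, dtype=np.float64).T
    if out.shape[1] == 3:
        # perspective case: divide by the third homogeneous coordinate
        out = out[:, :2] / out[:, 2:3]
    return out


shifted = apply_warp(translation(3.0, -2.0), [[1.0, 1.0]])
rotated = apply_warp(euclidean(np.pi / 2, 0.0, 0.0), [[1.0, 0.0]])
projected = apply_warp(homography(1, 0, 0, 0, 1, 0, 0.001, 0), [[100.0, 0.0]])
```

In the perspective case the point (100, 0) gets the homogeneous coordinate 1.1, so it lands at roughly (90.9, 0); this dependence on position is what the 2x3 models cannot express.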

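3.2 Experimental Analysis

OpenCV happens to provide the ECC functionality, so all we need is a thin wrapper to make it convenient for multi-object tracking. The core of the ECC algorithm is solving for the transformation (warp) matrix: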

import cv2
import numpy as np


def ECC(src, dst, warp_mode=cv2.MOTION_EUCLIDEAN, eps=1e-5,
        max_iter=100, scale=None, align=False):
    """Compute the warp matrix from src to dst.

    Parameters
    ----------
    src : ndarray
        An NxM matrix of source img (BGR or Gray), it must be the same format as dst.
    dst : ndarray
        An NxM matrix of target img (BGR or Gray).
    warp_mode : flags of opencv
        translation: cv2.MOTION_TRANSLATION
        rotated and shifted: cv2.MOTION_EUCLIDEAN
        affine (shift, rotated, shear): cv2.MOTION_AFFINE
        homography (3d): cv2.MOTION_HOMOGRAPHY
    eps : float
        the threshold of the increment in the correlation coefficient between two iterations
    max_iter : int
        the number of iterations.
    scale : float or [int, int]
        scale_ratio: float
        scale_size: [W, H]
    align : bool
        whether to warp affine or perspective transforms to the source image

    Returns
    -------
    warp matrix : ndarray
        Returns the warp matrix from src to dst.
        if motion model is homography, the warp matrix will be 3x3, otherwise 2x3
    src_aligned : ndarray
        aligned source image of gray
    """
    assert src.shape == dst.shape, "the source image must be the same format to the target image!"

    # BGR2GRAY
    if src.ndim == 3:
        # Convert images to grayscale
        src = cv2.cvtColor(src, cv2.COLOR_BGR2GRAY)
        dst = cv2.cvtColor(dst, cv2.COLOR_BGR2GRAY)

    # make the imgs smaller to speed up
    if scale is not None:
        if isinstance(scale, (float, int)):
            if scale != 1:
                src_r = cv2.resize(src, (0, 0), fx=scale, fy=scale,
                                   interpolation=cv2.INTER_LINEAR)
                dst_r = cv2.resize(dst, (0, 0), fx=scale, fy=scale,
                                   interpolation=cv2.INTER_LINEAR)
                scale = [scale, scale]
            else:
                src_r, dst_r = src, dst
                scale = None
        else:
            # resize if either the width or the height differs from the target size
            if scale[0] != src.shape[1] or scale[1] != src.shape[0]:
                src_r = cv2.resize(src, (scale[0], scale[1]),
                                   interpolation=cv2.INTER_LINEAR)
                dst_r = cv2.resize(dst, (scale[0], scale[1]),
                                   interpolation=cv2.INTER_LINEAR)
                scale = [scale[0] / src.shape[1], scale[1] / src.shape[0]]
            else:
                src_r, dst_r = src, dst
                scale = None
    else:
        src_r, dst_r = src, dst

    # Define 2x3 or 3x3 matrices and initialize the matrix to identity
    if warp_mode == cv2.MOTION_HOMOGRAPHY:
        warp_matrix = np.eye(3, 3, dtype=np.float32)
    else:
        warp_matrix = np.eye(2, 3, dtype=np.float32)

    # Define termination criteria
    criteria = (cv2.TERM_CRITERIA_EPS | cv2.TERM_CRITERIA_COUNT, max_iter, eps)

    # Run the ECC algorithm. The results are stored in warp_matrix.
    # The last two arguments are inputMask (None) and gaussFiltSize (1).
    (cc, warp_matrix) = cv2.findTransformECC(src_r, dst_r, warp_matrix,
                                             warp_mode, criteria, None, 1)

    # map the translation estimated on the resized images back to full resolution
    if scale is not None:
        warp_matrix[0, 2] = warp_matrix[0, 2] / scale[0]
        warp_matrix[1, 2] = warp_matrix[1, 2] / scale[1]

    if align:
        sz = src.shape
        if warp_mode == cv2.MOTION_HOMOGRAPHY:
            # Use warpPerspective for Homography
            src_aligned = cv2.warpPerspective(src, warp_matrix, (sz[1], sz[0]),
                                              flags=cv2.INTER_LINEAR)
        else:
            # Use warpAffine for Translation, Euclidean and Affine
            src_aligned = cv2.warpAffine(src, warp_matrix, (sz[1], sz[0]),
                                         flags=cv2.INTER_LINEAR)
        return warp_matrix, src_aligned
    else:
        return warp_matrix, None
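The wrapper above divides only the two translation entries by the scale factors. That is exact when both axes are scaled by the same factor and the model is 2x3; with different factors per axis, or with a homography, other entries of the matrix change as well. As a sketch of the general case (the helper name `rescale_warp` is hypothetical and not part of the original code): if S = diag(sx, sy, 1) maps full-resolution coordinates to resized coordinates, the full-resolution warp is S^(-1) W S.

```python
import numpy as np


def rescale_warp(warp_small, sx, sy):
    """Convert a warp estimated on images resized by (sx, sy) to full resolution.

    Resized coordinates are S @ full coordinates with S = diag(sx, sy, 1),
    so the full-resolution warp is S^-1 @ W_small @ S.
    """
    W = np.asarray(warp_small, dtype=np.float64)
    is_2x3 = W.shape == (2, 3)
    if is_2x3:
        W = np.vstack([W, [0.0, 0.0, 1.0]])
    S = np.diag([sx, sy, 1.0])
    W_full = np.linalg.inv(S) @ W @ S
    return W_full[:2] if is_2x3 else W_full


def map_point(W, p):
    """Apply a 2x3 or 3x3 warp to a single (x, y) point."""
    W = np.asarray(W, dtype=np.float64)
    if W.shape == (2, 3):
        W = np.vstack([W, [0.0, 0.0, 1.0]])
    q = W @ np.array([p[0], p[1], 1.0])
    return q[:2] / q[2]


# Uniform scale, Euclidean warp: only the translation changes (4, -2) -> (8, -4).
theta = 0.05
small = np.array([[np.cos(theta), -np.sin(theta), 4.0],
                  [np.sin(theta), np.cos(theta), -2.0]])
full = rescale_warp(small, 0.5, 0.5)

# Non-uniform scale with a homography: the last row changes too.
H_small = np.array([[1.0, 0.02, 3.0],
                    [-0.01, 1.0, -1.5],
                    [1e-4, 2e-4, 1.0]])
sx, sy = 0.5, 0.25
H_full = rescale_warp(H_small, sx, sy)
p = (120.0, 80.0)
via_small = map_point(H_small, (p[0] * sx, p[1] * sy)) / np.array([sx, sy])
via_full = map_point(H_full, p)
```

For the uniform 2x3 case this reduces to the translation-only correction used in the wrapper, which is the common case when a single scale ratio is passed.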

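I added one trick here. Because ECC operates on the two whole images, the image size has a large effect on how fast the algorithm solves. So, based on the form of the transformation matrix, I designed a simple scheme that adapts automatically to the image rescaling. The results are as follows:

The results are good as well. It is worth noting that ECC takes only a few milliseconds, but because its solving cost depends on how difficult the transformation is, the larger the gap between frames, the slower it gets. The epipolar geometry approach has a more stable running time, but it is considerably slower.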
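To use the estimated warp in a tracker, the boxes from the previous frame have to be mapped into the current frame. The following is a minimal sketch under stated assumptions: boxes are in (x, y, w, h) format, the warp matrix maps source coordinates to target coordinates as returned by the ECC wrapper above, and the helper name `warp_boxes` is hypothetical. It warps the four corners and takes their axis-aligned bounding rectangle, which is one simple choice among several.

```python
import numpy as np


def warp_boxes(boxes, warp_matrix):
    """Map (x, y, w, h) boxes through a 2x3 or 3x3 warp matrix."""
    boxes = np.atleast_2d(np.asarray(boxes, dtype=np.float64))
    W = np.asarray(warp_matrix, dtype=np.float64)
    if W.shape == (2, 3):
        W = np.vstack([W, [0.0, 0.0, 1.0]])

    warped_boxes = np.zeros_like(boxes)
    for i, (x, y, w, h) in enumerate(boxes):
        corners = np.array([[x, y, 1.0],
                            [x + w, y, 1.0],
                            [x, y + h, 1.0],
                            [x + w, y + h, 1.0]])
        warped = corners @ W.T
        warped = warped[:, :2] / warped[:, 2:3]
        # axis-aligned rectangle that encloses the warped corners
        x_min, y_min = warped.min(axis=0)
        x_max, y_max = warped.max(axis=0)
        warped_boxes[i] = [x_min, y_min, x_max - x_min, y_max - y_min]
    return warped_boxes


shift = np.array([[1.0, 0.0, 5.0],
                  [0.0, 1.0, -3.0]])
moved = warp_boxes([10, 20, 30, 40], shift)
unchanged = warp_boxes([[10, 20, 30, 40], [0, 0, 5, 5]], np.eye(3))
```

A typical use would be to warp the previous-frame boxes first and only then run the pedestrian motion model or the association step, so that the remaining displacement is mostly due to the pedestrians themselves.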

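Other Approximate Solutions

4.1 Optical Flow

Everything above is a dedicated solution to camera motion from the last two years. In practice, for some algorithms, an explicit camera motion model is not really needed as long as the scene does not change drastically. One example is optical-flow-based multi-object tracking, of which the best known is NOMT [4] from ICCV 2015.

The authors use a simplified fast optical flow method. A more illustrative example is a paper published this year, Multiple Object Tracking by Flowing and Fusing. I will not go into detail; it simply adds an optical flow prediction branch to the Tracktor++ framework:

As we can see, optical flow also captures similar pixel information across adjacent frames, which is somewhat similar to the two camera motion models presented in the previous chapters, so an explicit camera motion model is not required.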
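As a rough illustration of why dense flow can stand in for a camera motion model, the sketch below shifts a box by the median flow vector inside it. This is a deliberate simplification and not the method of NOMT or of the Flowing and Fusing paper; the helper name is hypothetical, and the flow field is assumed to be an HxWx2 array of per-pixel (dx, dy), the layout that functions such as `cv2.calcOpticalFlowFarneback` return.

```python
import numpy as np


def shift_box_by_flow(box, flow):
    """Move an (x, y, w, h) box by the median flow inside it."""
    x, y, w, h = [int(round(v)) for v in box]
    H, W = flow.shape[:2]
    # clip the box to the image before sampling the flow
    x0, y0 = max(x, 0), max(y, 0)
    x1, y1 = min(x + w, W), min(y + h, H)
    if x1 <= x0 or y1 <= y0:
        return tuple(float(v) for v in box)  # box lies outside the frame
    region = flow[y0:y1, x0:x1].reshape(-1, 2)
    dx, dy = np.median(region, axis=0)
    return (x + float(dx), y + float(dy), float(w), float(h))


# Synthetic flow: the background moves by (12, 0) because of camera motion,
# while the pixels of one pedestrian move by (15, -3).
flow = np.zeros((100, 200, 2))
flow[..., 0] = 12.0
flow[40:80, 50:70] = (15.0, -3.0)

predicted = shift_box_by_flow((50, 40, 20, 40), flow)
outside = shift_box_by_flow((500, 500, 10, 10), flow)
```

The flow inside the box already contains both the camera-induced motion and the pedestrian's own motion, so no separate compensation step is needed in this simplified picture.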

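4.2 SOT

SOT-based methods, whether they use classic correlation filters or Siamese-style deep learning frameworks, search for the target in the next frame within a region 1.5 to 2.5 times the size of the target around its previous position, and this explicitly or implicitly involves feature comparison. Unlike the pixel-level comparison above, however, the comparison here is done on higher-level features.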
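The search region mentioned above can be sketched as a box enlarged around the previous target center. Real trackers define the padding in different ways, so this is only a simplified illustration; the function name, the default factor of 2.0 and the clipping behavior are assumptions.

```python
def search_region(box, factor=2.0, frame_size=None):
    """Enlarge an (x, y, w, h) box by `factor` around its center.

    If frame_size = (W, H) is given, the region is clipped to the frame.
    """
    x, y, w, h = box
    cx, cy = x + w / 2.0, y + h / 2.0
    sw, sh = w * factor, h * factor
    sx, sy = cx - sw / 2.0, cy - sh / 2.0
    if frame_size is not None:
        W, H = frame_size
        x0, y0 = max(sx, 0.0), max(sy, 0.0)
        x1, y1 = min(sx + sw, float(W)), min(sy + sh, float(H))
        sx, sy = x0, y0
        sw, sh = max(x1 - x0, 0.0), max(y1 - y0, 0.0)
    return (sx, sy, sw, sh)


region = search_region((100, 100, 40, 80), factor=2.0)
clipped = search_region((100, 100, 40, 80), factor=2.0, frame_size=(150, 200))
```

As long as the combined camera and target displacement between frames stays inside this region, the tracker can still find the target without any explicit camera motion model.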

References

[1] Wang G, Wang Y, Zhang H, et al. Exploit the connectivity: Multi-object tracking with trackletnet[C]. in: Proceedings of the 27th ACM International Conference on Multimedia. 2019. 482-490.

[2] Evangelidis G D, Psarakis E Z. Parametric image alignment using enhanced correlation coefficient maximization[J]. IEEE transactions on pattern analysis and machine intelligence, 2008, 30(10): 1858-1865.

[3] Bergmann P, Meinhardt T, Leal-Taixe L. Tracking without bells and whistles[C]. in: Proceedings of the IEEE International Conference on Computer Vision. 2019. 941-951.

[4] Choi W. Near-online multi-target tracking with aggregated local flow descriptor[C]. in: Proceedings of the IEEE international conference on computer vision. 2015. 3029-3037.

[5] Feng W, Hu Z, Wu W, et al. Multi-object tracking with multiple cues and switcher-aware classification[J]. arXiv preprint arXiv:1901.06129, 2019.

[6] https://blog.csdn.net/ssw_1990/article/details/53355572
