Notes on the SORT Multi-Object Tracking Algorithm

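SORT is a simple online, real-time multi-object tracking algorithm. The key points of the paper are:

IoU serves as the measure of association between targets in consecutive frames;
a Kalman filter predicts each target's current position;
the Hungarian algorithm associates detection boxes with targets;
a probationary period screens out false detections;
Faster R-CNN is used as the detector, showing that tracking can be very simple when detection is good.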

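Technical Approach

SORT treats detection as the key component, propagates target states into future frames, associates current detections with existing targets, and manages the lifespan of tracked targets.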

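Detection

The tracking framework uses Faster R-CNN with its default parameters from the PASCAL VOC challenge, keeping only pedestrian detections with a probability above 50% and ignoring all other classes.
In its experiments, the paper swaps the detector used by MDP and by the proposed method, and finds that detection quality has a significant impact on tracking performance.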

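Estimation Model

The object model is the representation and motion model used to propagate a target's identity into the next frame. SORT approximates the inter-frame displacement of each object with a linear constant-velocity model that is independent of other objects and of camera motion. The state of each target is modeled as:

$$\mathbf{x} = [u,v,s,r,\dot{u},\dot{v},\dot{s}]^T,$$

where $u$ and $v$ are the horizontal and vertical pixel positions of the target center, and $s$ and $r$ are the scale (area) and aspect ratio of the target's bounding box. Note that the aspect ratio is treated as constant. When a detection is associated with a target, the detected bounding box is used to update the target state, and the velocity components are solved optimally within the Kalman filter framework. If no detection is associated with the target, its state is simply predicted with the linear velocity model, without correction.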
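As a minimal sketch of the constant-velocity prediction described above, the following applies one transition step to a made-up state vector. The helper name `constant_velocity_F` and the numeric values are assumptions for illustration; covariance propagation is left out.

```python
import numpy as np

def constant_velocity_F(dt=1.0):
    """7x7 transition matrix for x = [u, v, s, r, du, dv, ds]^T."""
    F = np.eye(7)
    F[0, 4] = dt  # u advances by du * dt
    F[1, 5] = dt  # v advances by dv * dt
    F[2, 6] = dt  # s advances by ds * dt
    return F      # r has no velocity term: the aspect ratio stays constant

# Hypothetical target: centre (100, 50), area 2000, aspect ratio 0.5,
# moving 3 px right and 1 px up per frame while its area grows by 20.
x = np.array([100.0, 50.0, 2000.0, 0.5, 3.0, -1.0, 20.0])
x_next = constant_velocity_F() @ x
print(x_next)
```

Only the first three components change; the aspect ratio and the velocities are carried over unchanged.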

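Data Association

When assigning detections to existing targets:

predict each target's new location in the current frame and estimate its bounding box geometry;
compute the assignment cost matrix from the intersection-over-union (IoU) between each detection and all predicted bounding boxes of the existing targets;
solve the assignment optimally with the Hungarian algorithm;
reject assignments whose detection-to-target overlap is less than $IOU_{min}$.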
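The steps above can be sketched in a few lines of plain Python. To stay dependency-free, this sketch replaces the Hungarian algorithm with a brute-force search over all one-to-one assignments, which gives the same optimum but is only practical for a handful of boxes. The function names and the threshold value 0.3 are assumptions for illustration.

```python
from itertools import permutations

def iou(a, b):
    """IoU of two boxes given as [x1, y1, x2, y2]."""
    w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = w * h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def associate(detections, predictions, iou_min=0.3):
    """Return (matches, unmatched_detections, unmatched_predictions)."""
    n_d, n_p = len(detections), len(predictions)
    scores = [[iou(d, p) for p in predictions] for d in detections]
    # Enumerate every one-to-one assignment (stand-in for the Hungarian algorithm).
    if n_d <= n_p:
        candidates = (list(zip(range(n_d), cols)) for cols in permutations(range(n_p), n_d))
    else:
        candidates = (list(zip(rows, range(n_p))) for rows in permutations(range(n_d), n_p))
    best = max(candidates, key=lambda pairs: sum(scores[d][p] for d, p in pairs))
    # Reject assignments whose overlap is below the threshold.
    matches = [(d, p) for d, p in best if scores[d][p] >= iou_min]
    matched_d = {d for d, _ in matches}
    matched_p = {p for _, p in matches}
    unmatched_d = [d for d in range(n_d) if d not in matched_d]
    unmatched_p = [p for p in range(n_p) if p not in matched_p]
    return matches, unmatched_d, unmatched_p

detections = [[0, 0, 10, 10], [20, 20, 30, 30], [100, 100, 110, 110]]
predictions = [[21, 21, 31, 31], [1, 1, 11, 11]]
matches, unmatched_d, unmatched_p = associate(detections, predictions)
print(matches, unmatched_d, unmatched_p)
```

The third detection overlaps no prediction, so it ends up unmatched and would seed a new track.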

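The paper finds that the IoU distance of bounding boxes implicitly handles short-term occlusion caused by passing targets. Specifically, when an occluder covers a target, only the occluder is detected. Even though the hidden target may lie closer to the center of the detection box, the IoU distance favors detections of similar scale. As a result, the occluder's track is corrected with the detection, while the covered target is left unaffected because no detection is assigned to it.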
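A small numeric example may make the occlusion argument concrete. The three boxes below are invented: a large detection of the occluder, the predicted box of the occluder's track, and the predicted box of a smaller, covered target whose center coincides with the detection's center.

```python
def iou(a, b):
    """IoU of two boxes given as [x1, y1, x2, y2]."""
    w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = w * h
    return inter / ((a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter)

def center_distance(a, b):
    """Euclidean distance between the centers of two boxes."""
    ax, ay = (a[0] + a[2]) / 2.0, (a[1] + a[3]) / 2.0
    bx, by = (b[0] + b[2]) / 2.0, (b[1] + b[3]) / 2.0
    return ((ax - bx) ** 2 + (ay - by) ** 2) ** 0.5

detection = [0, 0, 100, 200]       # only the occluder is detected
occluder_pred = [5, 0, 105, 200]   # predicted box of the occluder's track
covered_pred = [30, 60, 70, 140]   # predicted box of the hidden, smaller target

dist_occluder = center_distance(detection, occluder_pred)
dist_covered = center_distance(detection, covered_pred)
iou_occluder = iou(detection, occluder_pred)
iou_covered = iou(detection, covered_pred)
print(dist_occluder, dist_covered, iou_occluder, iou_covered)
```

A center-distance criterion would pick the covered target here, whereas IoU clearly prefers the occluder's track because its box has a similar scale.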

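Creating and Deleting Track Identities

When targets enter and leave the image, unique identities need to be created or destroyed accordingly. For creating trackers, the paper treats any detection with an overlap less than $IOU_{min}$ as evidence of an untracked target. The tracker is initialized from the bounding box geometry with the velocity set to zero. Since the velocity cannot be observed at this point, the covariance of the velocity components is initialized with large values to reflect this uncertainty. In addition, a new tracker goes through a probationary period during which the target must be associated with detections to accumulate enough evidence, which prevents tracking of false positives.

A track is terminated if it goes undetected for $T_{Lost}$ frames. This prevents unbounded growth in the number of trackers and localization errors caused by predicting for too long without correction from the detector. In all experiments $T_{Lost}$ is set to 1, for the following reasons:

first, the constant-velocity model is a poor predictor of the true dynamics;
second, the work is mainly concerned with frame-to-frame tracking, and object re-identification is beyond its scope;
in addition, deleting lost targets early aids efficiency. If a target reappears, tracking implicitly resumes under a new identity.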
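The life cycle described above can be simulated with two counters. This is a bookkeeping-only sketch: the class and function names are made up, the counters follow the hit_streak and time_since_update logic of sort.py, and the start-of-sequence exception (which reports tracks immediately during the first few frames) is ignored.

```python
class TrackLifecycle:
    """Counters only, no motion model (hypothetical helper)."""
    def __init__(self):
        self.hit_streak = 0
        self.time_since_update = 0

    def step(self, detected):
        # Prediction part: a miss in the previous frame breaks the streak.
        if self.time_since_update > 0:
            self.hit_streak = 0
        self.time_since_update += 1
        # Update part: an associated detection resets the miss counter.
        if detected:
            self.time_since_update = 0
            self.hit_streak += 1

def simulate(pattern, min_hits=3, t_lost=1):
    """Return one status string per frame for a single target."""
    statuses, trk = [], None
    for detected in pattern:
        if trk is None:
            if detected:
                trk = TrackLifecycle()  # created from an unmatched detection
                statuses.append("probation")
            else:
                statuses.append("no track")
            continue
        trk.step(detected)
        if trk.time_since_update > t_lost:
            statuses.append("deleted")
            trk = None
        elif trk.time_since_update > 0:
            statuses.append("coasting")   # predicted only, not reported
        elif trk.hit_streak >= min_hits:
            statuses.append("reported")
        else:
            statuses.append("probation")
    return statuses

statuses = simulate([True, True, True, True, True, False, False])
print(statuses)
```

With these settings the target is first reported on its fourth detected frame and is deleted on the second consecutive miss.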

sort.py

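Both the algorithm and the program are fairly simple. The program relies on linear_assignment from scikit-learn for Hungarian matching (newer scikit-learn releases no longer ship this function; scipy.optimize.linear_sum_assignment is the usual replacement). KalmanFilter is provided by FilterPy.

matplotlib.pyplot.ion() turns on interactive mode.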

  # all train
  sequences = ['PETS09-S2L1','TUD-Campus','TUD-Stadtmitte','ETH-Bahnhof','ETH-Sunnyday','ETH-Pedcross2','KITTI-13','KITTI-17','ADL-Rundle-6','ADL-Rundle-8','Venice-2']
  args = parse_args()
  display = args.display
  phase = 'train'
  total_time = 0.0
  total_frames = 0
  colours = np.random.rand(32,3) #used only for display
  if(display):
    if not os.path.exists('mot_benchmark'):
      print('\n\tERROR: mot_benchmark link not found!\n\n    Create a symbolic link to the MOT benchmark\n    (https://motchallenge.net/data/2D_MOT_2015/#download). E.g.:\n\n    $ ln -s /path/to/MOT2015_challenge/2DMOT2015 mot_benchmark\n\n')
      exit()
    plt.ion()
    fig = plt.figure()

  if not os.path.exists('output'):
    os.makedirs('output')

For each sequence, create a SORT tracker instance.
Load the sequence's detections. The detection boxes are in [x1,y1,w,h] format.

  for seq in sequences:
    mot_tracker = Sort() #create instance of the SORT tracker
    seq_dets = np.loadtxt('data/%s/det.txt'%(seq),delimiter=',') #load detections
    with open('output/%s.txt'%(seq),'w') as out_file:
      print("Processing %s."%(seq))
      for frame in range(int(seq_dets[:,0].max())):
        frame += 1 #detection and frame numbers begin at 1
        dets = seq_dets[seq_dets[:,0]==frame,2:7]
        dets[:,2:4] += dets[:,0:2] #convert from [x1,y1,w,h] to [x1,y1,x2,y2]
        total_frames += 1
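To illustrate the slicing in the loop above, here is a small sketch on three invented rows laid out like a MOT det.txt file (frame, id, bb_left, bb_top, bb_width, bb_height, conf, x, y, z). The numbers are made up; only the indexing pattern is taken from the code.

```python
import numpy as np

# Hypothetical detections in MOT det.txt column order.
seq_dets = np.array([
    [1, -1, 10.0, 20.0, 30.0, 60.0, 0.9, -1, -1, -1],
    [1, -1, 100.0, 50.0, 20.0, 40.0, 0.6, -1, -1, -1],
    [2, -1, 12.0, 21.0, 30.0, 60.0, 0.8, -1, -1, -1],
])

frame = 1
# Boolean-mask indexing returns a copy, so seq_dets itself is not modified below.
dets = seq_dets[seq_dets[:, 0] == frame, 2:7]   # columns: x1, y1, w, h, score
dets[:, 2:4] += dets[:, 0:2]                    # (w, h) becomes (x2, y2)
print(dets)
```

Each row of `dets` is now [x1, y1, x2, y2, score], which is the layout that Sort.update expects.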

skimage.io.imread loads an image from a file.

        if(display):
          ax1 = fig.add_subplot(111, aspect='equal')
          fn = 'mot_benchmark/%s/%s/img1/%06d.jpg'%(phase,seq,frame)
          im =io.imread(fn)
          ax1.imshow(im)
          plt.title(seq+' Tracked Targets')

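update updates the tracks with the detection boxes. The variable name trackers is misleading, since the returned array holds tracking results rather than tracker objects.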

        start_time = time.time()
        trackers = mot_tracker.update(dets)
        cycle_time = time.time() - start_time
        total_time += cycle_time

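matplotlib.axes.Axes.add_patch adds the patch p to the Axes' list of patches; the clip box is set to the Axes clipping box. If no transform is set, it is set to transData. The patch is returned.
matplotlib.axes.Axes.set_adjustable defines which parameter the Axes will change to achieve a given aspect.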

        for d in trackers:
          print('%d,%d,%.2f,%.2f,%.2f,%.2f,1,-1,-1,-1'%(frame,d[4],d[0],d[1],d[2]-d[0],d[3]-d[1]),file=out_file)
          if(display):
            d = d.astype(np.int32)
            ax1.add_patch(patches.Rectangle((d[0],d[1]),d[2]-d[0],d[3]-d[1],fill=False,lw=3,ec=colours[d[4]%32,:]))
            ax1.set_adjustable('box-forced') #note: newer matplotlib releases dropped 'box-forced'; 'box' is the closest option there

        if(display):
          fig.canvas.flush_events()
          plt.draw()
          ax1.cla()

  print("Total Tracking took: %.3f for %d frames or %.1f FPS"%(total_time,total_frames,total_frames/total_time))
  if(display):
    print("Note: to get real runtime results run without the option: --display")

Sort

Sort is a multi-object tracker that manages multiple KalmanBoxTracker objects.

  def __init__(self,max_age=1,min_hits=3):
    """
    Sets key parameters for SORT
    """
    self.max_age = max_age
    self.min_hits = min_hits
    self.trackers = []
    self.frame_count = 0

update

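Parameter dets: a numpy array of detections in the format [[x1,y1,x2,y2,score],[x1,y1,x2,y2,score],...].
Requirement: this method must be called once for every frame, even when there are no detections. It returns a similar array in which the last column is the object ID.

Note: the number of objects returned may differ from the number of detections provided.

update takes dets as a numpy.array, whereas KalmanBoxTracker expects a list as input.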

[Flowchart of update, node labels in order: update, dets, KalmanBoxTracker.predict, associate_detections_to_trackers, KalmanBoxTracker.update, KalmanBoxTracker, tracks, End]

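Get the predicted locations from the existing trackers.
predict advances the state vector and returns the predicted bounding box estimate.

Predict the position of each track in the current frame, one by one, and record the indices of trackers whose state is invalid. trks stores the trackers' predictions; unfortunately its name is easily confused with the tracker objects used further down.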

    self.frame_count += 1
    #get predicted locations from existing trackers.
    trks = np.zeros((len(self.trackers),5))
    to_del = []
    ret = []
    for t,trk in enumerate(trks):
      pos = self.trackers[t].predict()[0]
      trk[:] = [pos[0], pos[1], pos[2], pos[3], 0]
      if(np.any(np.isnan(pos))):
        to_del.append(t)

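numpy.ma.masked_invalid masks an array where invalid values (NaN or inf) occur.
numpy.ma.compress_rows suppresses whole rows of a 2-D array that contain masked values. This is equivalent to np.ma.compress_rowcols(a, 0); see extras.compress_rowcols for details.
reversed returns a reverse iterator. seq must be an object that has a __reversed__() method or that supports the sequence protocol (the __len__() method and the __getitem__() method with integer arguments starting at 0).

Invalid trackers are deleted in reverse order so that the indices still to be deleted stay valid. Compressing preserves the relative order of the remaining rows, so trks stays aligned with self.trackers.
associate_detections_to_trackers assigns detections to tracked objects (both represented as bounding boxes). It returns 3 lists: matches, unmatched_detections and unmatched_trackers.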

    trks = np.ma.compress_rows(np.ma.masked_invalid(trks))
    for t in reversed(to_del):
      self.trackers.pop(t)
    matched, unmatched_dets, unmatched_trks = associate_detections_to_trackers(dets,trks)
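The clean-up step can be tried in isolation. In this sketch the tracker objects are replaced by plain strings and the prediction values are invented; the NumPy calls are the same ones used above.

```python
import numpy as np

trackers = ["trk_a", "trk_b", "trk_c", "trk_d"]   # stand-ins for tracker objects
trks = np.array([
    [0.0, 0.0, 10.0, 10.0, 0.0],
    [np.nan, 5.0, 15.0, 15.0, 0.0],   # invalid prediction
    [20.0, 20.0, 30.0, 30.0, 0.0],
    [1.0, np.nan, 2.0, 3.0, 0.0],     # invalid prediction
])
to_del = [t for t, row in enumerate(trks) if np.any(np.isnan(row))]

# Drop every row that contains NaN/inf; the surviving rows keep their order.
trks = np.ma.compress_rows(np.ma.masked_invalid(trks))

# Pop from the back so that earlier indices in to_del stay valid.
for t in reversed(to_del):
    trackers.pop(t)

print(trks.shape, trackers)
```

Popping in forward order would fail here: after removing index 1, the list has only three elements and index 3 no longer exists.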

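Update the matched trackers with their assigned detections. Why not select the trackers through the indices stored in matched?
update updates the state vector with the observed bounding box.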

    #update matched trackers with assigned detections
    for t,trk in enumerate(self.trackers):
      if(t not in unmatched_trks):
        d = matched[np.where(matched[:,1]==t)[0],0]
        trk.update(dets[d,:][0])
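One possible answer to the question above is to iterate over matched directly. The sketch below assumes that each tracker index appears at most once in matched (which holds for a one-to-one assignment) and uses a hypothetical StubTracker in place of KalmanBoxTracker.

```python
import numpy as np

class StubTracker:
    """Hypothetical stand-in that only records what update() receives."""
    def __init__(self):
        self.last = None

    def update(self, bbox):
        self.last = list(bbox)

trackers = [StubTracker() for _ in range(3)]
dets = np.array([[0, 0, 10, 10, 0.9], [50, 50, 60, 60, 0.8]], dtype=float)
matched = np.array([[0, 2], [1, 0]])   # rows of (detection index, tracker index)

# Each row names the detection and the tracker it was assigned to.
for d, t in matched:
    trackers[t].update(dets[d, :])

print([trk.last for trk in trackers])
```

This avoids the membership test and the np.where lookup for every tracker, at the cost of relying on the structure of matched.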

Create and initialize new trackers from the unmatched detections.

    #create and initialise new trackers for unmatched detections
    for i in unmatched_dets:
      trk = KalmanBoxTracker(dets[i,:])
      self.trackers.append(trk)

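get_state returns the current bounding box estimate.
ret has the format [[x1,y1,x2,y2,id],[x1,y1,x2,y2,id],...]; the last column is the track ID rather than a score.

Iterating from back to front, only tracks that were updated in the current frame and whose hit streak is at least self.min_hits (unless tracking has only just started) are returned; a tracker is deleted if the time since its last update exceeds self.max_age.
The hit_streak condition suppresses the output for the first few frames of a target.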

    i = len(self.trackers)
    for trk in reversed(self.trackers):
      d = trk.get_state()[0]
      if((trk.time_since_update < 1) and (trk.hit_streak >= self.min_hits or self.frame_count <= self.min_hits)):
        ret.append(np.concatenate((d,[trk.id+1])).reshape(1,-1)) # +1 as MOT benchmark requires positive
      i -= 1
      #remove dead tracklet
      if(trk.time_since_update > self.max_age):
        self.trackers.pop(i)

    if(len(ret)>0):
      return np.concatenate(ret)
    return np.empty((0,5))
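The reporting condition in the loop above is compact enough to check case by case. The function name should_report is made up for this sketch; the predicate itself is copied from the code.

```python
def should_report(time_since_update, hit_streak, frame_count, min_hits=3):
    """Mirror of the output gate in Sort.update (hypothetical helper)."""
    updated_this_frame = time_since_update < 1
    confirmed = hit_streak >= min_hits
    warming_up = frame_count <= min_hits   # first frames of the sequence
    return updated_this_frame and (confirmed or warming_up)

cases = {
    "new track, first frame of sequence": should_report(0, 0, 1),
    "new track, later in sequence": should_report(0, 1, 10),
    "confirmed track": should_report(0, 3, 10),
    "confirmed but missed this frame": should_report(1, 5, 10),
}
for name, result in cases.items():
    print(name, result)
```

The warm-up clause means that tracks created in the first min_hits frames of a sequence are reported without waiting out the probationary period.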

associate_detections_to_trackers

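The naming here is imprecise: detections are associated with tracked targets (objects) or tracks, not with trackers.
If there are no trackers, the result is constructed directly.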

  if(len(trackers)==0):
    return np.empty((0,2),dtype=int), np.arange(len(detections)), np.empty((0,5),dtype=int)
  iou_matrix = np.zeros((len(detections),len(trackers)),dtype=np.float32)

iou does not support array inputs.
The IoU is therefore computed pair by pair, and linear_assignment is then called to perform the matching.

  for d,det in enumerate(detections):
    for t,trk in enumerate(trackers):
      iou_matrix[d,t] = iou(det,trk)
  matched_indices = linear_assignment(-iou_matrix)
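The double loop could be replaced by a broadcast computation. The following is a sketch of such a vectorized IoU, not code from sort.py as shown here; the name iou_batch is an assumption, and the function assumes both inputs contain at least one box with positive area.

```python
import numpy as np

def iou_batch(dets, trks):
    """Pairwise IoU between dets (N, 4+) and trks (M, 4+), boxes as [x1, y1, x2, y2, ...]."""
    dets = np.asarray(dets, dtype=float)[:, None, :4]   # shape (N, 1, 4)
    trks = np.asarray(trks, dtype=float)[None, :, :4]   # shape (1, M, 4)
    w = np.maximum(0.0, np.minimum(dets[..., 2], trks[..., 2]) - np.maximum(dets[..., 0], trks[..., 0]))
    h = np.maximum(0.0, np.minimum(dets[..., 3], trks[..., 3]) - np.maximum(dets[..., 1], trks[..., 1]))
    inter = w * h
    area_d = (dets[..., 2] - dets[..., 0]) * (dets[..., 3] - dets[..., 1])
    area_t = (trks[..., 2] - trks[..., 0]) * (trks[..., 3] - trks[..., 1])
    return inter / (area_d + area_t - inter)            # shape (N, M)

dets = [[0, 0, 10, 10], [20, 20, 30, 30]]
trks = [[0, 0, 10, 10], [5, 0, 15, 10], [100, 100, 110, 110]]
iou_matrix = iou_batch(dets, trks)
print(iou_matrix)
```

Entry [d, t] of the result corresponds to iou(detections[d], trackers[t]) in the loop version.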

Record the unmatched detection boxes and tracks.

  unmatched_detections = []
  for d,det in enumerate(detections):
    if(d not in matched_indices[:,0]):
      unmatched_detections.append(d)
  unmatched_trackers = []
  for t,trk in enumerate(trackers):
    if(t not in matched_indices[:,1]):
      unmatched_trackers.append(t)

Filter out matches with low IoU.

  #filter out matched with low IOU
  matches = []
  for m in matched_indices:
    if(iou_matrix[m[0],m[1]]<iou_threshold):
      unmatched_detections.append(m[0])
      unmatched_trackers.append(m[1])
    else:
      matches.append(m.reshape(1,2))

Lists are used while building the results, and NumPy arrays are used for the return values.

  if(len(matches)==0):
    matches = np.empty((0,2),dtype=int)
  else:
    matches = np.concatenate(matches,axis=0)
  return matches, np.array(unmatched_detections), np.array(unmatched_trackers)

KalmanBoxTracker

This class represents the internal state of an individual tracked object observed as a bounding box.
It defines the constant-velocity model.
Internally it uses a KalmanFilter with 7 state variables and 4 measurement inputs.
F is the state transition model, H is the measurement function, R is the measurement noise matrix, P is the covariance matrix, and Q is the process noise matrix.
The state transition matrix F follows from the kinematic equations: each of $u$, $v$ and $s$ advances by its velocity times the time step $\Delta t$, which is one frame ($\Delta t = 1$ in the code).

$$\mathbf{x} = [u,v,s,r,\dot{u},\dot{v},\dot{s}]^T,$$

$$F=\begin{bmatrix} 1 & 0 & 0 & 0 & \Delta t & 0 & 0 \\ 0 & 1 & 0 & 0 & 0 & \Delta t & 0 \\ 0 & 0 & 1 & 0 & 0 & 0 & \Delta t \\ 0 & 0 & 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 1 \end{bmatrix}$$

$$H=\begin{bmatrix} 1 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 & 0 & 0 \end{bmatrix}$$

  count = 0
  def __init__(self,bbox):
    """
    Initialises a tracker using initial bounding box.
    """
    #define constant velocity model
    self.kf = KalmanFilter(dim_x=7, dim_z=4)
    self.kf.F = np.array([[1,0,0,0,1,0,0],[0,1,0,0,0,1,0],[0,0,1,0,0,0,1],[0,0,0,1,0,0,0],  [0,0,0,0,1,0,0],[0,0,0,0,0,1,0],[0,0,0,0,0,0,1]])
    self.kf.H = np.array([[1,0,0,0,0,0,0],[0,1,0,0,0,0,0],[0,0,1,0,0,0,0],[0,0,0,1,0,0,0]])

    self.kf.R[2:,2:] *= 10.
    self.kf.P[4:,4:] *= 1000. #give high uncertainty to the unobservable initial velocities
    self.kf.P *= 10.
    self.kf.Q[-1,-1] *= 0.01
    self.kf.Q[4:,4:] *= 0.01

    self.kf.x[:4] = convert_bbox_to_z(bbox)
    self.time_since_update = 0
    self.id = KalmanBoxTracker.count
    KalmanBoxTracker.count += 1
    self.history = []
    self.hits = 0
    self.hit_streak = 0
    self.age = 0
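The in-place scalings of R, P and Q are easier to read once the resulting diagonals are written out. This sketch assumes, as FilterPy's KalmanFilter does by default, that all three matrices start as identity matrices, and reproduces the scalings with plain NumPy.

```python
import numpy as np

R = np.eye(4)   # measurement noise for [u, v, s, r]
P = np.eye(7)   # state covariance
Q = np.eye(7)   # process noise

R[2:, 2:] *= 10.0      # trust measured area and aspect ratio less than the center
P[4:, 4:] *= 1000.0    # velocities are unobserved at creation
P *= 10.0
Q[-1, -1] *= 0.01      # area velocity gets scaled twice ...
Q[4:, 4:] *= 0.01      # ... once here together with the other velocities

print(np.diag(R))
print(np.diag(P))
print(np.diag(Q))
```

The velocity variances in P end up 1000 times larger than the position variances, which matches the large initial uncertainty described for new trackers.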

update

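Updates the state vector with the observed bounding box. filterpy.kalman.KalmanFilter.update modifies the internal state estimate self.kf.x according to the measurement.
It also resets self.time_since_update and clears self.history.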

    self.time_since_update = 0
    self.history = []
    self.hits += 1
    self.hit_streak += 1
    self.kf.update(convert_bbox_to_z(bbox))

predict

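Advances the state vector and returns the predicted bounding box estimate.
The prediction is appended to self.history. Because get_state accesses self.kf.x directly, self.history is not used anywhere else.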

    if((self.kf.x[6]+self.kf.x[2])<=0):
      self.kf.x[6] *= 0.0
    self.kf.predict()
    self.age += 1
    if(self.time_since_update>0):
      self.hit_streak = 0
    self.time_since_update += 1
    self.history.append(convert_x_to_bbox(self.kf.x))
    return self.history[-1]
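The first two lines of predict guard against a non-positive area. The sketch below isolates that guard on the state mean only (no covariance, unit time step); the function name guarded_predict and the sample states are assumptions.

```python
import numpy as np

def guarded_predict(x):
    """One constant-velocity step on [u, v, s, r, du, dv, ds] with the area guard."""
    x = np.array(x, dtype=float)
    if x[6] + x[2] <= 0:   # the next area would be zero or negative
        x[6] = 0.0         # freeze the area instead
    x[:3] += x[4:7]        # u, v, s each advance by their velocity
    return x

shrinking_fast = guarded_predict([100, 50, 50, 0.5, 0, 0, -80])
shrinking_slow = guarded_predict([100, 50, 500, 0.5, 1, 2, -80])
print(shrinking_fast[2], shrinking_slow[2])
```

Without the guard, the first state would reach an area of -30, and the square root in convert_x_to_bbox would then produce NaN.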

get_state

Returns the current bounding box estimate, computed from self.kf.x by convert_x_to_bbox.

    return convert_x_to_bbox(self.kf.x)

iou

@numba.jit compiles the decorated function just in time to produce efficient machine code. All of its arguments are optional.

@jit
def iou(bb_test,bb_gt):
  """
  Computes IOU between two bboxes in the form [x1,y1,x2,y2]
  """
  xx1 = np.maximum(bb_test[0], bb_gt[0])
  yy1 = np.maximum(bb_test[1], bb_gt[1])
  xx2 = np.minimum(bb_test[2], bb_gt[2])
  yy2 = np.minimum(bb_test[3], bb_gt[3])
  w = np.maximum(0., xx2 - xx1)
  h = np.maximum(0., yy2 - yy1)
  wh = w * h
  o = wh / ((bb_test[2]-bb_test[0])*(bb_test[3]-bb_test[1])
    + (bb_gt[2]-bb_gt[0])*(bb_gt[3]-bb_gt[1]) - wh)
  return(o)

convert_bbox_to_z

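Converts a detection box of the form [x1,y1,x2,y2] into the representation [x,y,s,r] used by the filter, where x,y is the center of the box, s is the scale (area), and r is the aspect ratio (width over height).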

  w = bbox[2]-bbox[0]
  h = bbox[3]-bbox[1]
  x = bbox[0]+w/2.
  y = bbox[1]+h/2.
  s = w*h    #scale is just area
  r = w/float(h)
  return np.array([x,y,s,r]).reshape((4,1))

convert_x_to_bbox

Converts a box in the form [cx,cy,s,r] into the form [x_min,y_min,x_max,y_max].

  w = np.sqrt(x[2]*x[3])
  h = x[2]/w
  if(score is None):
    return np.array([x[0]-w/2.,x[1]-h/2.,x[0]+w/2.,x[1]+h/2.]).reshape((1,4))
  else:
    return np.array([x[0]-w/2.,x[1]-h/2.,x[0]+w/2.,x[1]+h/2.,score]).reshape((1,5))
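The two conversions are inverses of each other, which a round trip on one box shows. The sketch uses plain lists instead of NumPy column vectors, and the function names bbox_to_z and z_to_bbox are chosen for this example only.

```python
import math

def bbox_to_z(bbox):
    """[x1, y1, x2, y2] -> [cx, cy, s, r] with s = area and r = width / height."""
    w = bbox[2] - bbox[0]
    h = bbox[3] - bbox[1]
    return [bbox[0] + w / 2.0, bbox[1] + h / 2.0, w * h, w / float(h)]

def z_to_bbox(z):
    """[cx, cy, s, r] -> [x1, y1, x2, y2]; uses w = sqrt(s * r) and h = s / w."""
    w = math.sqrt(z[2] * z[3])
    h = z[2] / w
    return [z[0] - w / 2.0, z[1] - h / 2.0, z[0] + w / 2.0, z[1] + h / 2.0]

box = [10.0, 20.0, 50.0, 100.0]   # a 40 x 80 box
z = bbox_to_z(box)
restored = z_to_bbox(z)
print(z, restored)
```

Since s = w·h and r = w/h, the product s·r equals w², which is why the width is recovered with a square root.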

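Ideas for Improvement

Because SORT is designed as an online method, it simply discards every output a target produces during its probationary period. That is a case of giving up eating for fear of choking. Since the screening period is short, one option is to delay the decision and emit those outputs once the target has been confirmed.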
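One way the delayed output could look is a small buffer that holds a track's boxes during probation and releases them once the track is confirmed. Everything in this sketch is hypothetical (the DelayedReporter class, its method names and the sample boxes); it is not part of SORT and trades a few frames of latency for complete trajectories.

```python
from collections import defaultdict

class DelayedReporter:
    """Hypothetical wrapper: buffer probation-period boxes, release them on confirmation."""
    def __init__(self, min_hits=3):
        self.min_hits = min_hits
        self.pending = defaultdict(list)   # track id -> [(frame, bbox), ...]
        self.confirmed = set()

    def observe(self, frame, track_id, bbox, hit_streak):
        """Return the (frame, track_id, bbox) rows that can be emitted now."""
        if track_id in self.confirmed:
            return [(frame, track_id, bbox)]
        self.pending[track_id].append((frame, bbox))
        if hit_streak >= self.min_hits:
            self.confirmed.add(track_id)
            return [(f, track_id, b) for f, b in self.pending.pop(track_id)]
        return []

    def drop(self, track_id):
        """Forget a track that was deleted before it was confirmed."""
        self.pending.pop(track_id, None)

reporter = DelayedReporter(min_hits=3)
emitted = []
# Invented track 7: created in frame 1, matched in every following frame.
for frame, streak in enumerate([0, 1, 2, 3, 4], start=1):
    box = [10 * frame, 0, 10 * frame + 5, 5]
    emitted.append(reporter.observe(frame, 7, box, streak))

print([len(rows) for rows in emitted])
```

Nothing is emitted for the first three frames; the fourth call releases frames 1 to 4 at once, and later frames pass through immediately.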
