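This tutorial will teach you how to perform object tracking using dlib and Python. After reading today's blog post, you will be able to track objects in real-time video with dlib.

Running an object detector on every frame is expensive. Instead, the idea is to:

1. Perform object detection once (or once every N frames)
2. Then apply a dedicated tracking algorithm that keeps following the object as it moves through subsequent frames, without having to run object detection again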

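Is such an approach possible?

The answer is yes. In particular, we can use dlib's implementation of a correlation tracking algorithm.

In the remainder of today's blog post, you will learn how to apply dlib's correlation tracker to track an object in a video stream in real time.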
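
The detect-once (or every N frames), track-in-between idea can be sketched as a simple per-frame plan. This is only an illustration: the function name plan_frames and the detect_every parameter are assumptions made for this sketch and are not part of the script built later in this post, which detects only until a tracker has been started.

```python
def plan_frames(num_frames, detect_every=None):
    """Return a list saying, per frame, whether to detect or track.

    Assumption for illustration: frame 0 always runs the detector; if
    detect_every is given, the detector is re-run on every N-th frame.
    """
    plan = []
    for i in range(num_frames):
        if i == 0 or (detect_every and i % detect_every == 0):
            plan.append("detect")
        else:
            plan.append("track")
    return plan


# detect once, then only track
once = plan_frames(6)
# detect on every 3rd frame, track in between
periodic = plan_frames(6, detect_every=3)
print(once)
print(periodic)
```

The cheaper "track" step runs on most frames, which is what keeps the overall pipeline fast.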

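1. Object tracking with dlib

We'll begin today's tutorial with a brief discussion of dlib's implementation of correlation-based object tracking.

From there I'll show you how to use dlib's object tracker in your own applications.

Finally, we'll wrap up by discussing some of the limitations and drawbacks of dlib's object tracker.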

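2. What is a correlation tracker?

dlib's correlation tracker implementation builds on the MOSSE tracker from Bolme et al.'s 2010 work, Visual Object Tracking using Adaptive Correlation Filters. While the MOSSE tracker works well for objects that are translated, it often fails for objects that change in scale.

Danelljan et al.'s 2014 work, Accurate Scale Estimation for Robust Visual Tracking, proposed using a scale pyramid to accurately estimate the scale of an object after the optimal translation has been found. This breakthrough allows us to track objects that change in both (1) translation and (2) scale throughout a video stream. Furthermore, we can perform this tracking in real time.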
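
To get a feel for what "correlation-based" means, here is a minimal NumPy sketch. It is not the MOSSE filter and not dlib's implementation: it only shows the underlying idea of locating the peak of a cross-correlation (computed via the FFT) to recover how far a patch has translated. The function name estimate_shift and the synthetic random patch are assumptions made for illustration, and the shift is circular to keep the example short.

```python
import numpy as np


def estimate_shift(prev_patch, curr_patch):
    """Estimate the (dy, dx) translation from prev_patch to curr_patch.

    Uses the correlation theorem: the cross-correlation of two images is
    the inverse FFT of one spectrum times the conjugate of the other.
    """
    f_prev = np.fft.fft2(prev_patch)
    f_curr = np.fft.fft2(curr_patch)
    response = np.real(np.fft.ifft2(f_curr * np.conj(f_prev)))

    # the location of the strongest response is the most likely shift
    (peak_y, peak_x) = np.unravel_index(np.argmax(response), response.shape)

    # FFT indices wrap around, so map large indices to negative shifts
    (h, w) = response.shape
    dy = peak_y - h if peak_y > h // 2 else peak_y
    dx = peak_x - w if peak_x > w // 2 else peak_x
    return (int(dy), int(dx))


# synthetic test: move a random patch 5 pixels down and 3 pixels left
rng = np.random.default_rng(0)
prev_patch = rng.random((64, 64))
curr_patch = np.roll(prev_patch, shift=(5, -3), axis=(0, 1))
print(estimate_shift(prev_patch, curr_patch))
```

A real correlation filter tracker learns and adapts the filter over time instead of correlating raw patches, and the scale pyramid adds a second search over object size.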

3. Project structure

To see how the project is organized, simply use the tree command in your terminal:

$ tree
.
├── input
│   ├── cat.mp4
│   └── race.mp4
├── output
│   ├── cat_output.avi
│   └── race_output.avi
├── mobilenet_ssd
│   ├── MobileNetSSD_deploy.caffemodel
│   └── MobileNetSSD_deploy.prototxt
└── track_object.py

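We have three directories:

input/ : Contains the input videos for object tracking.
output/ : Our processed videos. In the processed videos, the tracked object is annotated with a box and a label.
mobilenet_ssd/ : The Caffe CNN model files are contained in this directory.

Today we'll be reviewing a single Python script: track_object.py.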

4. Implementing our dlib object tracker

Let's go ahead and get started implementing our object tracker using dlib.
Open up track_object.py and insert the following code:

# import the necessary packages
from imutils.video import FPS
import numpy as np
import argparse
import imutils
import dlib
import cv2

Here we import the packages we need. Notably, we are using dlib, imutils, and OpenCV.
Next, let's parse our command line arguments:

# construct the argument parse and parse the arguments
ap = argparse.ArgumentParser()
ap.add_argument("-p", "--prototxt", required=True,help="path to Caffe 'deploy' prototxt file")
ap.add_argument("-m", "--model", required=True,help="path to Caffe pre-trained model")
ap.add_argument("-v", "--video", required=True,help="path to input video file")
ap.add_argument("-l", "--label", required=True,help="class label we are interested in detecting + tracking")
ap.add_argument("-o", "--output", type=str,help="path to optional output video file")
ap.add_argument("-c", "--confidence", type=float, default=0.2,help="minimum probability to filter weak detections")
args = vars(ap.parse_args())

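Our script has four required command line arguments:

--prototxt : The path to the Caffe "deploy" prototxt file.
--model : The path to the pre-trained Caffe model that accompanies the prototxt.
--video : The path to the input video file. Today's script works with video files rather than your webcam (but you could easily change it to support a webcam stream).
--label : The class label we are interested in detecting and tracking. See the next code block for the classes this model supports.

And two optional ones:

--output : An optional path to an output video file, if you would like to save the results of the object tracker.
--confidence : The minimum probability threshold used to filter weak detections from the Caffe object detector. It defaults to 0.2.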
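
To see how the required and optional arguments behave, the sketch below rebuilds the same parser and feeds it an explicit argument list instead of reading the real command line. The wrapper function build_parser is an assumption for illustration. The flags are the ones from the script, and the file paths are the ones from this project's layout.

```python
import argparse


def build_parser():
    """Rebuild the argument parser used by track_object.py."""
    ap = argparse.ArgumentParser()
    ap.add_argument("-p", "--prototxt", required=True,
        help="path to Caffe 'deploy' prototxt file")
    ap.add_argument("-m", "--model", required=True,
        help="path to Caffe pre-trained model")
    ap.add_argument("-v", "--video", required=True,
        help="path to input video file")
    ap.add_argument("-l", "--label", required=True,
        help="class label we are interested in detecting + tracking")
    ap.add_argument("-o", "--output", type=str,
        help="path to optional output video file")
    ap.add_argument("-c", "--confidence", type=float, default=0.2,
        help="minimum probability to filter weak detections")
    return ap


# parse an explicit list so the example does not depend on sys.argv
args = vars(build_parser().parse_args([
    "--prototxt", "mobilenet_ssd/MobileNetSSD_deploy.prototxt",
    "--model", "mobilenet_ssd/MobileNetSSD_deploy.caffemodel",
    "--video", "input/cat.mp4",
    "--label", "cat",
]))
print(args["label"], args["confidence"], args["output"])
```

Because --output and --confidence were not supplied, they fall back to None and 0.2 respectively. Leaving out any of the four required flags makes argparse exit with a usage error.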

Let's define the classes that this model supports and load our network from disk:

# initialize the list of class labels MobileNet SSD was trained to
# detect
CLASSES = ["background", "aeroplane", "bicycle", "bird", "boat",
    "bottle", "bus", "car", "cat", "chair", "cow", "diningtable",
    "dog", "horse", "motorbike", "person", "pottedplant", "sheep",
    "sofa", "train", "tvmonitor"]
# load our serialized model from disk
print("[INFO] loading model...")
net = cv2.dnn.readNetFromCaffe(args["prototxt"], args["model"])

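We'll be using a pre-trained MobileNet SSD to perform object detection in a single frame. From there, the object location is handed off to dlib's correlation tracker, which tracks it through the remaining frames of the video. The model supports 20 object classes (plus 1 for the background class).

Note: If you use a different Caffe model, you'll need to redefine this CLASSES list. Likewise, if you are using the model included with today's download, do not modify this list.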
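
Since the --label argument must match one of these class names exactly, it can help to validate it up front. The helper below is a hypothetical addition (class_index is not part of the original script). It maps a label to the integer class ID that the detector reports and fails early on a typo.

```python
CLASSES = ["background", "aeroplane", "bicycle", "bird", "boat",
    "bottle", "bus", "car", "cat", "chair", "cow", "diningtable",
    "dog", "horse", "motorbike", "person", "pottedplant", "sheep",
    "sofa", "train", "tvmonitor"]


def class_index(label, classes=CLASSES):
    """Return the class ID for a label (hypothetical helper).

    Raises ValueError for unknown labels and for "background", which
    is not a trackable object class.
    """
    if label == "background" or label not in classes:
        raise ValueError("unsupported label: {!r}".format(label))
    return classes.index(label)


print(len(CLASSES))           # 20 object classes + 1 background class
print(class_index("cat"))
print(class_index("person"))
```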

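Before looping over frames, we need to load our model into memory. All that is required to load a Caffe model is the path to the prototxt file and the path to the model file (both are available in our command line args dictionary).

Now let's perform our important initializations, notably our video stream: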

# initialize the video stream, dlib correlation tracker, output video
# writer, and predicted class label
print("[INFO] starting video stream...")
vs = cv2.VideoCapture(args["video"])
tracker = None
writer = None
label = ""
# start the frames per second throughput estimator
fps = FPS().start()

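We initialize our video stream, tracker, and video writer objects. We also initialize our text label and instantiate our frames-per-second estimator.
Now we're ready to begin looping over our video frames: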

# loop over frames from the video file stream
while True:
    # grab the next frame from the video file
    (grabbed, frame) = vs.read()

    # check to see if we have reached the end of the video file
    if frame is None:
        break

    # resize the frame for faster processing and then convert the
    # frame from BGR to RGB ordering (dlib needs RGB ordering)
    frame = imutils.resize(frame, width=600)
    rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)

    # if we are supposed to be writing a video to disk, initialize
    # the writer
    if args["output"] is not None and writer is None:
        fourcc = cv2.VideoWriter_fourcc(*"MJPG")
        writer = cv2.VideoWriter(args["output"], fourcc, 30,
            (frame.shape[1], frame.shape[0]), True)

Next, we need to detect an object to track (if we haven't already):

    # if our correlation object tracker is None we first need to
    # apply an object detector to seed the tracker with something
    # to actually track
    if tracker is None:
        # grab the frame dimensions and convert the frame to a blob
        (h, w) = frame.shape[:2]
        blob = cv2.dnn.blobFromImage(frame, 0.007843, (w, h), 127.5)

        # pass the blob through the network and obtain the detections
        # and predictions
        net.setInput(blob)
        detections = net.forward()

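If our tracker object is None, we first need to detect objects in the input frame. To do so, we create a blob and pass it through the network. Now let's process the detections: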
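
The two numbers passed to blobFromImage, 0.007843 and 127.5, are a scale factor and a mean: each pixel has the mean subtracted and is then multiplied by the scale factor, which maps 0..255 to roughly -1..1. The NumPy sketch below approximates that preprocessing for a frame that already has the target size. The function to_blob is an assumption for illustration and skips the resizing that OpenCV also performs.

```python
import numpy as np


def to_blob(frame, scale=0.007843, mean=127.5):
    """Approximate cv2.dnn.blobFromImage for an already-sized frame."""
    # subtract the mean first, then apply the scale factor
    normalized = (frame.astype("float32") - mean) * scale

    # reorder from (height, width, channels) to (channels, height,
    # width) and add a leading batch dimension
    return normalized.transpose(2, 0, 1)[np.newaxis, ...]


# a tiny all-black test frame with a single white pixel
frame = np.zeros((4, 6, 3), dtype="uint8")
frame[0, 0] = (255, 255, 255)

blob = to_blob(frame)
print(blob.shape, float(blob.min()), float(blob.max()))
```

Note that 0.007843 is approximately 1 / 127.5, which is why the extremes land very close to -1 and 1.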

        # ensure at least one detection is made (the network output
        # has shape (1, 1, N, 7), so we check N rather than len())
        if detections.shape[2] > 0:
            # find the index of the detection with the largest
            # probability -- out of convenience we are only going
            # to track the first object we find with the largest
            # probability; future examples will demonstrate how to
            # detect and extract *specific* objects
            i = np.argmax(detections[0, 0, :, 2])

            # grab the probability associated with the object along
            # with its class label
            conf = detections[0, 0, i, 2]
            label = CLASSES[int(detections[0, 0, i, 1])]

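If our object detector finds any objects, we grab the one with the largest probability. In this post we are only demonstrating how to perform single-object tracking with dlib, so we need the detected object with the highest probability. Next week's blog post will cover multi-object tracking with dlib.

From there, we grab the confidence (conf) and the label associated with the object.

Now it's time to filter the detection. Here we make sure that we have the right type of object, as passed in via the command line argument: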

            # filter out weak detections by requiring a minimum
            # confidence
            if conf > args["confidence"] and label == args["label"]:
                # compute the (x, y)-coordinates of the bounding box
                # for the object
                box = detections[0, 0, i, 3:7] * np.array([w, h, w, h])
                (startX, startY, endX, endY) = box.astype("int")

                # construct a dlib rectangle object from the bounding
                # box coordinates and then start the dlib correlation
                # tracker
                tracker = dlib.correlation_tracker()
                rect = dlib.rectangle(startX, startY, endX, endY)
                tracker.start_track(rgb, rect)

                # draw the bounding box and text for the object
                cv2.rectangle(frame, (startX, startY), (endX, endY),
                    (0, 255, 0), 2)
                cv2.putText(frame, label, (startX, startY - 15),
                    cv2.FONT_HERSHEY_SIMPLEX, 0.45, (0, 255, 0), 2)

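We check that conf exceeds the confidence threshold and that the object is actually of the class we are looking for. When we run the script later, we'll use "person" and "cat" as examples so you can see how the results are filtered.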
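
The selection and filtering logic can be tried without a model by feeding it a hand-made detections array. The sketch below assumes the usual SSD output layout of shape (1, 1, N, 7), where each row holds an image ID, a class ID, a confidence, and a box in relative coordinates. The function pick_detection and the synthetic values are assumptions for illustration only.

```python
import numpy as np

CLASSES = ["background", "aeroplane", "bicycle", "bird", "boat",
    "bottle", "bus", "car", "cat", "chair", "cow", "diningtable",
    "dog", "horse", "motorbike", "person", "pottedplant", "sheep",
    "sofa", "train", "tvmonitor"]


def pick_detection(detections, w, h, wanted_label, min_conf=0.2):
    """Return (label, conf, box) for the top detection, or None."""
    if detections.shape[2] == 0:
        return None

    # only the single most confident detection is considered
    i = int(np.argmax(detections[0, 0, :, 2]))
    conf = float(detections[0, 0, i, 2])
    label = CLASSES[int(detections[0, 0, i, 1])]

    # reject weak detections and detections of the wrong class
    if conf <= min_conf or label != wanted_label:
        return None

    # scale the relative box to pixel coordinates
    box = detections[0, 0, i, 3:7] * np.array([w, h, w, h])
    return (label, conf, tuple(int(v) for v in box.astype("int")))


# synthetic output: a weak "person" (ID 15) and a strong "cat" (ID 8)
detections = np.array([[[
    [0, 15, 0.35, 0.10, 0.10, 0.30, 0.60],
    [0, 8, 0.90, 0.25, 0.50, 0.75, 1.00],
]]], dtype="float32")

cat = pick_detection(detections, w=600, h=400, wanted_label="cat")
person = pick_detection(detections, w=600, h=400, wanted_label="person")
print(cat)
print(person)
```

Asking for "person" returns None even though a person was detected, because only the top-scoring detection is examined. The script in this post behaves the same way: it keeps running the detector on later frames until the most confident detection has the requested label.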

Now let's handle the case where the tracker has already been established:

    # otherwise, we've already performed detection so let's track
    # the object
    else:
        # update the tracker and grab the position of the tracked
        # object
        tracker.update(rgb)
        pos = tracker.get_position()

        # unpack the position object
        startX = int(pos.left())
        startY = int(pos.top())
        endX = int(pos.right())
        endY = int(pos.bottom())

        # draw the bounding box from the correlation object tracker
        cv2.rectangle(frame, (startX, startY), (endX, endY),
            (0, 255, 0), 2)
        cv2.putText(frame, label, (startX, startY - 15),
            cv2.FONT_HERSHEY_SIMPLEX, 0.45, (0, 255, 0), 2)

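This else block handles the case where we have already locked on to an object to track. Think of it like a dogfight in the movie Top Gun: once the enemy aircraft has been locked on by the "guidance system", it can be tracked via updates.

This requires two main actions on our part:

Update our tracker object. The heavy lifting is performed behind the scenes in the update method.
Grab the position of our object from the tracker (get_position). This is where a PID control loop would come in handy if, for example, a robot were trying to follow the tracked object. In our case, we simply annotate the object in the frame with a bounding box and a label.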
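
As a small illustration of the robot-following idea, a control loop would typically work from an error signal such as the offset between the tracked box's center and the frame's center. The sketch below computes that offset. The function center_offset is hypothetical and not part of the script, and no actual PID controller is implemented here.

```python
def center_offset(box, frame_size):
    """Offset of the box center from the frame center, in pixels.

    box is (startX, startY, endX, endY) and frame_size is (width,
    height). Positive x means the object is right of center; positive
    y means it is below center.
    """
    (startX, startY, endX, endY) = box
    (w, h) = frame_size
    cx = (startX + endX) / 2.0
    cy = (startY + endY) / 2.0
    return (cx - w / 2.0, cy - h / 2.0)


# a box centered horizontally but sitting low in a 600x400 frame
offset = center_offset((150, 200, 450, 400), (600, 400))
print(offset)
```

A controller would try to drive both components of this offset toward zero, for example by panning or tilting a camera.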

Let's finish the loop:

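    # check to see if we should write the frame to disk
    if writer is not None:
        writer.write(frame)

    # show the output frame
    cv2.imshow("Frame", frame)
    key = cv2.waitKey(1) & 0xFF

    # if the `q` key was pressed, break from the loop
    if key == ord("q"):
        break

    # update the FPS counter
    fps.update()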

Finally, let's print out the FPS throughput information and release our pointers before the script exits:

# stop the timer and display FPS information
fps.stop()
print("[INFO] elapsed time: {:.2f}".format(fps.elapsed()))
print("[INFO] approx. FPS: {:.2f}".format(fps.fps()))
# check to see if we need to release the video writer pointer
if writer is not None:
    writer.release()
# do a bit of cleanup
cv2.destroyAllWindows()
vs.release()
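
The FPS helper used above is essentially a frame counter plus a timer. The class below is a minimal stand-in written for illustration. SimpleFPS is an assumption, not the imutils implementation, but it follows the same start / update / stop / elapsed / fps pattern the script relies on.

```python
import time


class SimpleFPS:
    """Minimal frames-per-second estimator (illustrative stand-in)."""

    def __init__(self):
        self._start = None
        self._end = None
        self._frames = 0

    def start(self):
        # begin timing and return self so calls can be chained
        self._start = time.perf_counter()
        return self

    def update(self):
        # count one processed frame
        self._frames += 1

    def stop(self):
        self._end = time.perf_counter()

    def elapsed(self):
        # seconds between start() and stop()
        return self._end - self._start

    def fps(self):
        # average throughput over the whole run
        return self._frames / self.elapsed()


fps = SimpleFPS().start()
for _ in range(3):
    time.sleep(0.01)  # stand-in for per-frame processing work
    fps.update()
fps.stop()
print("elapsed: {:.2f}s, approx. FPS: {:.2f}".format(fps.elapsed(), fps.fps()))
```

The reported figure is an average over the entire run, so slow detection frames and fast tracking frames are blended into a single number.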

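5. Complete code

# import the necessary packages
from imutils.video import FPS
import numpy as np
import argparse
import imutils
import dlib
import cv2

# construct the argument parser and parse the arguments
ap = argparse.ArgumentParser()
ap.add_argument("-p", "--prototxt", required=True,
    help="path to Caffe 'deploy' prototxt file")
ap.add_argument("-m", "--model", required=True,
    help="path to Caffe pre-trained model")
ap.add_argument("-v", "--video", required=True,
    help="path to input video file")
ap.add_argument("-l", "--label", required=True,
    help="class label we are interested in detecting + tracking")
ap.add_argument("-o", "--output", type=str,
    help="path to optional output video file")
ap.add_argument("-c", "--confidence", type=float, default=0.2,
    help="minimum probability to filter weak detections")
args = vars(ap.parse_args())

# initialize the list of class labels MobileNet SSD was trained to
# detect
CLASSES = ["background", "aeroplane", "bicycle", "bird", "boat",
    "bottle", "bus", "car", "cat", "chair", "cow", "diningtable",
    "dog", "horse", "motorbike", "person", "pottedplant", "sheep",
    "sofa", "train", "tvmonitor"]

# load our serialized model from disk
print("[INFO] loading model...")
net = cv2.dnn.readNetFromCaffe(args["prototxt"], args["model"])

# initialize the video stream, dlib correlation tracker, output video
# writer, and predicted class label
print("[INFO] starting video stream...")
vs = cv2.VideoCapture(args["video"])
tracker = None
writer = None
label = ""

# start the frames per second throughput estimator
fps = FPS().start()

# loop over frames from the video file stream
while True:
    # grab the next frame from the video file
    (grabbed, frame) = vs.read()

    # check to see if we have reached the end of the video file
    if frame is None:
        break

    # resize the frame for faster processing and then convert the
    # frame from BGR to RGB ordering (dlib needs RGB ordering)
    frame = imutils.resize(frame, width=600)
    rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)

    # if we are supposed to be writing a video to disk, initialize
    # the writer
    if args["output"] is not None and writer is None:
        fourcc = cv2.VideoWriter_fourcc(*"MJPG")
        writer = cv2.VideoWriter(args["output"], fourcc, 30,
            (frame.shape[1], frame.shape[0]), True)

    # if our correlation object tracker is None we first need to
    # apply an object detector to seed the tracker with something
    # to actually track
    if tracker is None:
        # grab the frame dimensions and convert the frame to a blob
        (h, w) = frame.shape[:2]
        blob = cv2.dnn.blobFromImage(frame, 0.007843, (w, h), 127.5)

        # pass the blob through the network and obtain the detections
        # and predictions
        net.setInput(blob)
        detections = net.forward()

        # ensure at least one detection is made (the network output
        # has shape (1, 1, N, 7), so we check N rather than len())
        if detections.shape[2] > 0:
            # find the index of the detection with the largest
            # probability -- out of convenience we are only going
            # to track the first object we find with the largest
            # probability; future examples will demonstrate how to
            # detect and extract *specific* objects
            i = np.argmax(detections[0, 0, :, 2])

            # grab the probability associated with the object along
            # with its class label
            conf = detections[0, 0, i, 2]
            label = CLASSES[int(detections[0, 0, i, 1])]

            # filter out weak detections by requiring a minimum
            # confidence
            if conf > args["confidence"] and label == args["label"]:
                # compute the (x, y)-coordinates of the bounding box
                # for the object
                box = detections[0, 0, i, 3:7] * np.array([w, h, w, h])
                (startX, startY, endX, endY) = box.astype("int")

                # construct a dlib rectangle object from the bounding
                # box coordinates and then start the dlib correlation
                # tracker
                tracker = dlib.correlation_tracker()
                rect = dlib.rectangle(startX, startY, endX, endY)
                tracker.start_track(rgb, rect)

                # draw the bounding box and text for the object
                cv2.rectangle(frame, (startX, startY), (endX, endY),
                    (0, 255, 0), 2)
                cv2.putText(frame, label, (startX, startY - 15),
                    cv2.FONT_HERSHEY_SIMPLEX, 0.45, (0, 255, 0), 2)

    # otherwise, we've already performed detection so let's track
    # the object
    else:
        # update the tracker and grab the position of the tracked
        # object
        tracker.update(rgb)
        pos = tracker.get_position()

        # unpack the position object
        startX = int(pos.left())
        startY = int(pos.top())
        endX = int(pos.right())
        endY = int(pos.bottom())

        # draw the bounding box from the correlation object tracker
        cv2.rectangle(frame, (startX, startY), (endX, endY),
            (0, 255, 0), 2)
        cv2.putText(frame, label, (startX, startY - 15),
            cv2.FONT_HERSHEY_SIMPLEX, 0.45, (0, 255, 0), 2)

    # check to see if we should write the frame to disk
    if writer is not None:
        writer.write(frame)

    # show the output frame
    cv2.imshow("Frame", frame)
    key = cv2.waitKey(1) & 0xFF

    # if the `q` key was pressed, break from the loop
    if key == ord("q"):
        break

    # update the FPS counter
    fps.update()

# stop the timer and display FPS information
fps.stop()
print("[INFO] elapsed time: {:.2f}".format(fps.elapsed()))
print("[INFO] approx. FPS: {:.2f}".format(fps.fps()))

# check to see if we need to release the video writer pointer
if writer is not None:
    writer.release()

# do a bit of cleanup
cv2.destroyAllWindows()
vs.release()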

6. Running dlib's object tracker in real time

Open up a terminal and execute the following command:

$ python track_object.py --prototxt mobilenet_ssd/MobileNetSSD_deploy.prototxt \
    --model mobilenet_ssd/MobileNetSSD_deploy.caffemodel --video input/race.mp4 \
    --label person --output output/race_output.avi
[INFO] loading model...
[INFO] starting video stream...
[INFO] elapsed time: 13.18
[INFO] approx. FPS: 25.80

Below is a second example of object tracking with dlib:

$ python track_object.py --prototxt mobilenet_ssd/MobileNetSSD_deploy.prototxt \
    --model mobilenet_ssd/MobileNetSSD_deploy.caffemodel --video input/cat.mp4 \
    --label cat --output output/cat_output.avi
[INFO] loading model...
[INFO] starting video stream...
[INFO] elapsed time: 6.76
[INFO] approx. FPS: 24.12

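7. Drawbacks and potential improvements

You'll notice that the object tracker loses the object at the end of the demo. Keep in mind that there is no such thing as a "perfect" object tracker. On the other hand, this tracking algorithm does not require you to run a more expensive object detector on every frame of the input video.

Instead, dlib's correlation tracker combines (1) prior information about the location of the object's bounding box in the previous frame with (2) data obtained from the current frame to infer where the new location of the object is.

There will certainly be times when the algorithm loses the object. To remedy this situation, I recommend occasionally running the more expensive object detector to (1) validate that the object is still there and (2) re-seed the object tracker with updated (and ideally correct) bounding box coordinates.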
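
One possible way to act on an occasional re-detection is to compare the tracker's box with the detector's box using intersection over union (IoU). This is a sketch under stated assumptions, not something the script in this post does: the functions iou and reconcile, the 0.5 threshold, and the "keep" / "reseed" / "drop" outcomes are all choices made for illustration.

```python
def iou(a, b):
    """Intersection over union of two (startX, startY, endX, endY) boxes."""
    # corners of the overlapping region (if there is one)
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)

    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def reconcile(tracked_box, detected_box, min_iou=0.5):
    """Decide what to do with the tracker after a re-detection.

    Assumed policy: no detection means the object is gone ("drop"),
    good agreement means carry on ("keep"), and poor agreement means
    the tracker has drifted and should be restarted ("reseed").
    """
    if detected_box is None:
        return "drop"
    if iou(tracked_box, detected_box) >= min_iou:
        return "keep"
    return "reseed"


tracked = (0, 0, 10, 10)
print(reconcile(tracked, (0, 0, 10, 10)))
print(reconcile(tracked, (5, 0, 15, 10)))
print(reconcile(tracked, None))
```

On "reseed", the script would create a fresh dlib.correlation_tracker() and call start_track with the detector's box, exactly as it does for the first detection. On "drop", it would set tracker back to None so detection resumes.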

BONUS: Object tracking in C++

/*
    This example shows how to use the correlation_tracker from the dlib C++
    library. To use it, you give the correlation_tracker the bounding box of
    the object you want to track in the current video frame. Then it will
    identify the location of the object in subsequent frames.

    In this particular example, we are going to run on the video sequence
    that comes with dlib, which can be found in the examples/video_frames
    folder. This video shows a juice box sitting on a table while someone
    waves the camera around. The task is to track the position of the juice
    box as the camera moves around.
*/

#include <dlib/image_processing.h>
#include <dlib/gui_widgets.h>
#include <dlib/image_io.h>
#include <dlib/dir_nav.h>

using namespace dlib;
using namespace std;

int main(int argc, char** argv) try
{
    if (argc != 2)
    {
        cout << "Call this program like this: " << endl;
        cout << "./video_tracking_ex ../video_frames" << endl;
        return 1;
    }

    // Get the list of video frames.
    std::vector<file> files = get_files_in_directory_tree(argv[1], match_ending(".jpg"));
    std::sort(files.begin(), files.end());
    if (files.size() == 0)
    {
        cout << "No images found in " << argv[1] << endl;
        return 1;
    }

    // Load the first frame.
    array2d<unsigned char> img;
    load_image(img, files[0]);

    // Now create a tracker and start a track on the juice box. If you look at
    // the first frame you will see that the juice box is centered at pixel
    // point (92,110) and is 38 pixels wide and 86 pixels tall.
    correlation_tracker tracker;
    tracker.start_track(img, centered_rect(point(93,110), 38, 86));

    // Now run the tracker. All we have to do is call tracker.update() and it
    // will keep track of the juice box!
    image_window win;
    for (unsigned long i = 1; i < files.size(); ++i)
    {
        load_image(img, files[i]);
        tracker.update(img);

        win.set_image(img);
        win.clear_overlay();
        win.add_overlay(tracker.get_position());

        cout << "hit enter to process next frame" << endl;
        cin.get();
    }
}
catch (std::exception& e)
{
    cout << e.what() << endl;
}

References

https://pyimagesearch.com/2018/10/22/object-tracking-with-dlib/
http://dlib.net/video_tracking_ex.cpp.html
