Realtime Multiple Person 2D Pose Estimation Using TensorFlow 2.x

Introduction

As described by Zhe Cao in his 2017 paper, realtime multi-person 2D pose estimation is crucial in enabling machines to understand people in images and videos.

However, what is Pose Estimation?

As the name suggests, it is a technique used to estimate how a person is physically positioned, such as standing, sitting, or lying down. One way to obtain this estimate is to find the 18 “joints of the body,” or, as they are named in the Artificial Intelligence field, “key points.” The images below show our goal, which is to find these points in an image:

PhysicsWorld — Einstein in Oxford (1933)

The key points run from point #0 (top of the neck), going down through the body joints and returning to the head, ending with point #17 (right ear).

The first significant work using an Artificial Intelligence-based approach was DeepPose, a 2014 paper by Toshev and Szegedy from Google. The paper proposed a human pose estimation method based on Deep Neural Networks (DNNs), where pose estimation was formulated as a DNN-based regression problem towards body joints.

The model consisted of an AlexNet backend (7 layers) with an extra final layer that outputs 2k joint coordinates. The significant problem with this approach is that a single person must first be detected (classic object detection), followed by the application of the model. So, each human body found in an image must be treated separately, which considerably increases the time needed to process the image. This type of approach is known as “top-down,” because it first finds the bodies and, from them, the joints associated with each.
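As an illustration of this regression formulation, the sketch below shows a network with a small convolutional backbone and a final layer producing 2k values, i.e., (x, y) for each of k joints. It is only a minimal sketch in TensorFlow 2 with an arbitrary backbone, not the actual DeepPose/AlexNet architecture or weights.

import tensorflow as tf

K_JOINTS = 18  # number of joints to regress (illustrative)

def build_pose_regressor(input_shape=(220, 220, 3), k=K_JOINTS):
    # Minimal "pose as regression" model: CNN features -> 2k joint coordinates.
    inputs = tf.keras.Input(shape=input_shape)
    x = tf.keras.layers.Conv2D(64, 7, strides=2, activation='relu')(inputs)
    x = tf.keras.layers.MaxPooling2D(3, strides=2)(x)
    x = tf.keras.layers.Conv2D(128, 5, activation='relu')(x)
    x = tf.keras.layers.GlobalAveragePooling2D()(x)
    x = tf.keras.layers.Dense(512, activation='relu')(x)
    outputs = tf.keras.layers.Dense(2 * k)(x)  # (x, y) for each of the k joints
    return tf.keras.Model(inputs, outputs)

model = build_pose_regressor()
model.compile(optimizer='adam', loss='mse')  # L2 regression towards the joint coordinates

Training then minimizes the distance between predicted and ground-truth joint coordinates, which is exactly the regression framing described above.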

Challenges with Pose Estimation

There are several problems related to Pose Estimation, such as:

  1. Each image may contain an unknown number of people that can appear at any position or scale.
  2. Interactions between people induce complex spatial interference, due to contact, occlusion, or limb articulations, making association of parts difficult.
  3. Runtime complexity tends to grow with the number of people in the image, making realtime performance a challenge.

To solve those problems, a more exciting approach (and the one used in this project) is OpenPose, which was introduced in 2016 by Zhe Cao and his colleagues from the Robotics Institute at Carnegie Mellon University.

OpenPose

The proposed OpenPose method uses a nonparametric representation, referred to as Part Affinity Fields (PAFs), to “connect” the body joints found in an image, associating them with individual people. In other words, OpenPose does the opposite of DeepPose: it first finds all the joints in an image and then goes “up,” looking for the most probable body that contains each joint, without using any person detector (a “bottom-up” approach). OpenPose finds the key points in an image regardless of the number of people in it. The image below, retrieved from the OpenPose presentation at the ILSVRC and COCO workshop 2016, gives us an idea of the process.

OpenPose presentation at ILSVRC and COCO workshop 2016

The image below shows the architecture of the two-branch multi-stage CNN model used for training. First, a feed-forward network simultaneously predicts a set of 2D confidence maps (S) of body part locations (keypoint annotations from dataset/COCO/annotations/) and a set of 2D vector fields of part affinities (L), which encode the degree of association between parts. After each stage, the two branches’ predictions, along with the image features, are concatenated for the next stage. Finally, the confidence maps and the affinity fields are parsed by greedy inference to output the 2D key points for all people in the image.

2017 OpenPose Paper
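As a rough illustration of this two-branch, multi-stage idea (an assumption-laden sketch, not the actual OpenPose layer configuration or the tf-pose-estimation implementation), each stage can be written in Keras as two small convolutional branches whose outputs (S and L) are concatenated with the shared image features before the next stage:

import tensorflow as tf

N_PARTS = 19  # 18 keypoints + one background ("none") channel
N_PAFS = 38   # 2 x 19, following the article's description of the part affinity fields

def stage(features, prev=None, name='stage'):
    # One refinement stage: two branches predict confidence maps (S) and PAFs (L)
    x = features if prev is None else tf.keras.layers.Concatenate(name=name + '_in')([features, prev])
    s = tf.keras.layers.Conv2D(128, 3, padding='same', activation='relu', name=name + '_s1')(x)
    s = tf.keras.layers.Conv2D(N_PARTS, 1, name=name + '_S')(s)  # branch 1: confidence maps
    l = tf.keras.layers.Conv2D(128, 3, padding='same', activation='relu', name=name + '_l1')(x)
    l = tf.keras.layers.Conv2D(N_PAFS, 1, name=name + '_L')(l)   # branch 2: part affinity fields
    return tf.keras.layers.Concatenate(name=name + '_out')([s, l])

inputs = tf.keras.Input(shape=(368, 432, 3))
features = tf.keras.layers.Conv2D(64, 3, padding='same', activation='relu')(inputs)  # stand-in backbone
out = None
for i in range(3):  # the real model uses more stages; three keep the sketch short
    out = stage(features, out, name='stage%d' % (i + 1))
model = tf.keras.Model(inputs, out)

Each stage therefore sees both the shared image features and the previous stage’s predictions, which is what lets later stages refine earlier ones.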

During the execution of the project, we will return to some of those concepts for clarification. However, it is highly recommended to follow the OpenPose ILSVRC and COCO workshop 2016 presentation and the video recording at CVPR 2017 for a better understanding.


TensorFlow 2 OpenPose installation (tf-pose-estimation)

The original OpenPose was developed using the model-based VGG pre-trained network and the Caffe framework. However, for this installation, we will follow Ildoo Kim’s TensorFlow approach, as detailed in his tf-pose-estimation GitHub.

What is tf-pose-estimation?

tf-pose-estimation is the “OpenPose” human pose estimation algorithm implemented using TensorFlow. It also provides several variants with changes to the network structure for realtime processing on the CPU or on low-power embedded devices.

The tf-pose-estimation GitHub shows several experiments with different models, such as:

  • cmu: the model-based VGG pretrained network described in the original paper with weights in Caffe format converted to be used in TensorFlow.

  • dsconv: same architecture as the cmu version except for the depthwise separable convolution of mobilenet.

  • mobilenet: based on the mobilenet V1 paper, 12 convolutional layers are used as feature-extraction layers.

  • mobilenet v2: similar to mobilenet, but using an improved version of it.

The studies in this article were done with mobilenet V1 (“mobilenet_thin”), which has an intermediate performance regarding computation budget and latency:

https://github.com/ildoonet/tf-pose-estimation/blob/master/etcs/experiments.md

Part 1 — Installing tf-pose-estimation

Here, we follow the excellent Gunjan Seth article Pose Estimation with TensorFlow 2.0.

  • Go to the terminal and create a working directory (for example, “Pose_Estimation”), moving to it:

mkdir Pose_Estimation
cd Pose_Estimation

  • Create a Virtual Environment (for example, Tf2_Py37):

conda create --name Tf2_Py37 python=3.7.6 -y
conda activate Tf2_Py37

  • Install TF2:

pip install --upgrade pip
pip install tensorflow

  • Install the basic packages to be used during development:

conda install -c anaconda numpy
conda install -c conda-forge matplotlib
conda install -c conda-forge opencv

  • Clone the tf-pose-estimation repository:

git clone https://github.com/gsethi2409/tf-pose-estimation.git

  • Go to the tf-pose-estimation folder and install the requirements:

cd tf-pose-estimation/
pip install -r requirements.txt

In the next step, install SWIG, an interface compiler that connects programs written in C and C++ with scripting languages such as Python. It works by taking the declarations found in C/C++ header files and using them to generate the wrapper code that scripting languages need to access the underlying C/C++ code.


conda install swig
  • Using SWIG, build the C++ library for post-processing:

cd tf_pose/pafprocess
swig -python -c++ pafprocess.i && python3 setup.py build_ext --inplace

Now, install the tf-slim library, a lightweight library used for defining, training, and evaluating complex models in TensorFlow.

pip install git+https://github.com/adrianc-a/tf-slim.git@remove_contrib

That is it! Now, it is essential to run a quick test. For that, return to the main tf-pose-estimation directory.

If you followed the sequence, you should be inside tf_pose/pafprocess. Otherwise, use the appropriate command to change directories.

cd ../..

Inside the tf-pose-estimation directory there is a Python script, run.py. Let’s run it with the following parameters:

  • model=mobilenet_thin
  • resize=432x368 (size of the image at pre-processing)
  • image=./images/ski.jpg (sample image inside the images directory)

python run.py --model=mobilenet_thin --resize=432x368 --image=./images/ski.jpg

Note that for a few seconds nothing will happen, but after a minute or so, the terminal should present something similar to the image below:

More importantly, however, an image will appear in an independent OpenCV window:

Great! The images are proof that everything was properly installed and is working fine! We will go into more detail in the next section. However, as a quick explanation of what the four images mean: the top-left (“Result”) is the pose detection skeleton drawn with the original image (in this case, ski.jpg) as background. The top-right image is a “heat map,” where the “parts detected” (S’s) are shown, and both bottom images show the part associations (L’s). The “Result” is obtained by connecting the S’s and L’s to individual persons.

The next test is a live video:


If the computer has only one camera installed, use: camera=0


python run_webcam.py --model=mobilenet_thin --resize=432x368 --camera=1

If everything goes well, a window will appear with a real live video, like this screenshot:


Image source: screenshot of the author’s webcam

Part 2 — Going Deeper with Pose Estimation in Images

In this section, we will go more in-depth with our TensorFlow Pose Estimation implementation. It is advised to follow along, trying to reproduce the Jupyter Notebook 10_Pose_Estimation_Images, which can be downloaded from the project GitHub.

As a reference, this project was 100% developed on a MacPro (2.9GHz Quad-Core i7, 16GB 2133MHz RAM).

Import Libraries

import sys
import time
import logging
import numpy as np
import matplotlib.pyplot as plt
import cv2
from tf_pose import common
from tf_pose.estimator import TfPoseEstimator
from tf_pose.networks import get_graph_path, model_wh

Model definition and TfPose Estimator creation

It is possible to use the models located in the models/graph sub-directory, such as mobilenet_v2_large or cmu (the VGG pretrained model).

For cmu, the *.pb files were not downloaded during installation because they are large. To use it, run the bash script download.sh located in the cmu sub-directory.

This project uses mobilenet_thin (MobilenetV1), considering that all images used should be reshaped to 432x368.


Parameters:


model = 'mobilenet_thin'
resize = '432x368'
w, h = model_wh(resize)

Create the estimator:

e = TfPoseEstimator(get_graph_path(model), target_size=(w, h))

Let us load a simple human image for an easy analysis. OpenCV is used to read images. Images are stored as RGB, but internally, OpenCV works with BGR. Using OpenCV to show an image is not a problem, because the image is converted from BGR to RGB before it is presented in a window (as seen with ski.jpg in the previous section).

Since the image will be plotted in a Jupyter cell, Matplotlib will be used instead of OpenCV. Because of that, the image should be converted before being displayed, as shown below:

image_path = './images/human.png'
image = cv2.imread(image_path)
image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
plt.imshow(image)
plt.grid();

Observe that this image has a shape of 567x567. When reading an image, OpenCV automatically converts it to an array, where each value goes from 0 to 255, with 0 being black and 255 being white.

Once the image is an array, it is simple to verify its size, using shape:


image.shape

The result will be (567, 567, 3), where the shape is (height, width, color channels).

Although the image can be read using OpenCV, we will use the function read_imgfile(image_path) from the tf_pose.common library to prevent any trouble with color channels.

image = common.read_imgfile(image_path, None, None)

Once we have the image as an array, we can apply the inference method of the estimator (e), with the image array as input (the image will be resized using the parameters w and h defined at the beginning).

humans = e.inference(image, resize_to_default=(w > 0 and h > 0), upsample_size=4.0)

After running the above command, let us inspect the array e.heatMat. This array has a shape of (184, 216, 19), where 184 is h/2, 216 is w/2, and 19 relates to the probability of that specific pixel belonging to one of the 18 joints (0 to 17), plus one extra channel (18: none). For example, when inspecting the top-left pixel, a “none” should be expected:
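A minimal check (the exact values depend on the image):

e.heatMat[0][0]        # 19 values: one probability per joint, plus the final "none" channel
e.heatMat[0][0][-1]    # the last ("none") value, expected to be by far the highest here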

It is possible to verify that the last value of this array is the highest of all, which can be understood as meaning that, with 99.6% probability, this pixel does not belong to any of the 18 joints.

Let us try to find the base of the neck (the midpoint between the shoulders). In the heat map, it is located around mid-width (0.5 × 216 = 108) and at around 20% of the height, measured from the top down (0.2 × 184 ≈ 37). So, let us inspect this specific pixel:
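A minimal way to do it (e.heatMat is indexed as [row][column], i.e., [37][108] here):

e.heatMat[37][108]             # the 19 probabilities for this pixel
e.heatMat[37][108].max()       # the highest probability among them
np.argmax(e.heatMat[37][108])  # index of the most likely joint (1 = base of the neck, as shown below)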

It is easy to see that position 1 holds the maximum value of 0.7059… (which can also be obtained by calculating e.heatMat[37][108].max()), meaning that this specific pixel has a 70% probability of being a “base neck.” The figure below shows all 18 COCO key points (or “body joints”), where “1” corresponds to the “base neck.”

COCO keypoint format for human pose skeletons.

It is possible to plot, for every pixel, a color representing its maximum value. Doing that, a heat map showing the key points will magically appear:

max_prob = np.amax(e.heatMat[:, :, :-1], axis=2)
plt.imshow(max_prob)
plt.grid();

Let us now plot the key points over the reshaped original image:

plt.figure(figsize=(15,8))
bgimg = cv2.cvtColor(image.astype(np.uint8), cv2.COLOR_BGR2RGB)
bgimg = cv2.resize(bgimg, (e.heatMat.shape[1], e.heatMat.shape[0]), interpolation=cv2.INTER_AREA)
plt.imshow(bgimg, alpha=0.5)
plt.imshow(max_prob, alpha=0.5)
plt.colorbar()
plt.grid();

So, it is possible to see the key points (S’s) over the image; as the colorbar shows, more yellow means a higher probability.

To get the L’s — the most probable connections (or “bones”) between the key points (or “joints”) — we can use the resulting array e.pafMat. This array has a shape of (184, 216, 38), where 38 (2 × 19) relates to the probability of that pixel being part of the horizontal (x) or vertical (y) component of a connection with one of the 18 specific joints, plus the “none” channel.
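As a rough illustration (not the notebook’s exact plotting code), the strongest PAF component at each pixel could be visualized like this:

paf_amplitude = np.amax(np.abs(e.pafMat), axis=2)  # strongest PAF component at each pixel
plt.figure(figsize=(15, 8))
plt.imshow(paf_amplitude)
plt.colorbar()
plt.grid();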

The functions to plot the above figures are in the Notebook.


Draw the skeleton using the draw_humans method

With the list humans, the result of the e.inference() method, it is possible to draw the skeleton using the draw_humans method:

image = TfPoseEstimator.draw_humans(image, humans, imgcopy=False)

The result will be the image below:

If desired, it is possible to plot only the skeleton, as shown here (let us rerun all code for a recap):


image = common.read_imgfile(image_path, None, None)
humans = e.inference(image, resize_to_default=(w > 0 and h > 0), upsample_size=4.0)
black_background = np.zeros(image.shape)
skeleton = TfPoseEstimator.draw_humans(black_background, humans, imgcopy=False)
plt.figure(figsize=(15,8))
plt.imshow(skeleton);
plt.grid();
plt.axis('off');

Getting the Key Points (Joints) coordinates

Pose estimation can be used in a series of applications, such as robotics, gaming, or medicine. For that, it can be interesting to get the physical key point coordinates from the image so that they can be used by other applications.

Looking at the humans list returned by e.inference(), it can be verified that it is a list with a single element. In the string representation of this element, every key point appears with its relative coordinates and associated probability.

For example:


BodyPart:0-(0.49, 0.09) score=0.79
BodyPart:1-(0.49, 0.20) score=0.75
...
BodyPart:17-(0.53, 0.09) score=0.73

We can extract an array (of size 18) from this list, with the real coordinates scaled to the original image shape:

keypoints = str(str(str(humans[0]).split('BodyPart:')[1:]).split('-')).split(' score=')
# Rebuild the list of (x, y) pairs from the split fragments: the relative coordinates
# sit between the last '(' and ')' of each fragment
keypoints_list = [tuple(map(float, k[k.rfind('(') + 1:k.rfind(')')].split(', ')))
                  for k in keypoints[:-1]]
keypts_array = np.array(keypoints_list)
keypts_array = keypts_array * (image.shape[1], image.shape[0])  # scale to the original image size
keypts_array = keypts_array.astype(int)

Let us plot this array (where the array’s index is the key point number) over the original image. Here is the result:

plt.figure(figsize=(10,10))
plt.axis([0, image.shape[1], 0, image.shape[0]])
plt.scatter(*zip(*keypts_array), s=200, color='orange', alpha=0.6)
img = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
plt.imshow(img)
ax = plt.gca()
ax.set_ylim(ax.get_ylim()[::-1])
ax.xaxis.tick_top()
plt.grid();
for i, txt in enumerate(keypts_array):
    ax.annotate(i, (keypts_array[i][0]-5, keypts_array[i][1]+5))

Creating Functions to reproduce the studies on generic images quickly

The Notebook shows all the code developed so far, “encapsulated” as functions. For example, let us see another image:


image_path = '../images/einstein_oxford.jpg'
img, hum = get_human_pose(image_path)
keypoints = show_keypoints(img, hum, color='orange')

PhysicsWorld — Einstein in Oxford (1933)

img, hum = get_human_pose(image_path, showBG=False)
keypoints = show_keypoints(img, hum, color='white', showBG=False)

Studying images with multiple persons

So far, only images containing a single person have been explored. Since the algorithm was developed to capture all the joints (S’s) and PAFs (L’s) from the image at the same time, working with a single person was only for simplicity. The code to get the result is the same; only the result (“humans”) changes: the list will have a size compatible with the number of people in the image.
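For instance, the number of detected people can be checked directly (a minimal check, assuming humans is the list returned by e.inference()):

print(len(humans))  # one entry per person detected in the image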

For example, let us use a “busy image” with five people on it:


image_path = './images/ski.jpg'
img, hum = get_human_pose(image_path)
plot_img(img, axis=False)

OpenPose — IEEE-2019

The algorithm found all the S’s and L’s, associating them with the five people. The result is excellent!

From reading the image path to plotting the result, the whole process took less than 0.5 s, independent of the number of people found in the image.

Let us make it more complicated and look at an image where people are more “mixed,” such as a couple dancing:

image_path = '../images/figure-836178_1920.jpg'
img, hum = get_human_pose(image_path)
plot_img(img, axis=False)

Pixabay

The result also looks very good. Let us plot only the key points, with a different color for each person:
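The code below uses keypoints_1 and keypoints_2, one coordinate array per detected person; these are built by the notebook’s helper functions. A minimal sketch of how they could be derived, reusing the string-parsing approach shown earlier (the helper below is an illustrative assumption, not the notebook’s exact code):

def human_to_keypts(human, shape):
    # Hypothetical helper: parse one detected 'human' into an array of (x, y) pixel coordinates
    frags = str(str(str(human).split('BodyPart:')[1:]).split('-')).split(' score=')
    pts = [tuple(map(float, k[k.rfind('(') + 1:k.rfind(')')].split(', '))) for k in frags[:-1]]
    return (np.array(pts) * (shape[1], shape[0])).astype(int)

keypoints_1 = human_to_keypts(hum[0], img.shape)
keypoints_2 = human_to_keypts(hum[1], img.shape)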

plt.figure(figsize=(10,10))
plt.axis([0, img.shape[1], 0, img.shape[0]])
plt.scatter(*zip(*keypoints_1), s=200, color='green', alpha=0.6)
plt.scatter(*zip(*keypoints_2), s=200, color='yellow', alpha=0.6)
ax = plt.gca()
ax.set_ylim(ax.get_ylim()[::-1])
ax.xaxis.tick_top()
plt.title('Keypoints of all humans detected\n')
plt.grid();

Part 3: Pose Estimation in Videos and Live Camera

The process of getting pose estimation in videos is the same as with images, because a video can be treated as a succession of images (frames). It is advised to follow along with this section, trying to reproduce the Jupyter Notebook 20_Pose_Estimation_Video, which can be downloaded from the project GitHub.

OpenCV does a fantastic job of handling videos.


So, let us get a .mp4 video and inform OpenCV that we will capture its frames:


video_path = '../videos/dance.mp4'
cap = cv2.VideoCapture(video_path)

Now, let us create a loop that captures each frame. With a frame in hand, we apply e.inference() and, from the result, draw the skeleton, the same way we did with images. Code was included at the end to stop the video when a key (‘q’, for example) is pressed.

Below is the necessary code:

showBG = True   # set to False to draw the skeleton over a black background
fps_time = 0
while True:
    ret_val, image = cap.read()
    humans = e.inference(image,
                         resize_to_default=(w > 0 and h > 0),
                         upsample_size=4.0)
    if not showBG:
        image = np.zeros(image.shape)
    image = TfPoseEstimator.draw_humans(image, humans, imgcopy=False)
    cv2.putText(image, "FPS: %f" % (1.0 / (time.time() - fps_time)), (10, 10),
                cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 255, 0), 2)
    cv2.imshow('tf-pose-estimation result', image)
    fps_time = time.time()
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break
cap.release()
cv2.destroyAllWindows()
Screenshot from the video example on the tf-pose-estimation GitHub

The result is fantastic, but a little slow. The movie, which originally had around 30 FPS (frames per second), will run here in “slow motion,” at around 3 FPS.

Here is another experiment, where the movie was run twice, recording the pose-estimated skeleton with and without the background video. The videos were manually synchronized, and even if the result is not perfect, it is fascinating. I cut the last scene of the 1928 Chaplin movie “The Circus,” where the way the Tramp walks is classic.

Testing with a live camera

It is advised to follow along with this section, trying to reproduce the Jupyter Notebook 30_Pose_Estimation_Camera, which can be downloaded from the project GitHub.

The code needed to run a live camera is almost the same as the one used with video, except that the OpenCV VideoCapture() method receives, as an input parameter, an integer referring to the real camera being used. For example, an internal camera uses “0” and an external one “1”. Also, the camera should be set to capture frames at “432x368”, as used by the model.

Parameter initialization:

camera = 1
resize = '432x368'       # resize images before they are processed
resize_out_ratio = 4.0   # resize heatmaps before they are post-processed
model = 'mobilenet_thin'
show_process = False
tensorrt = False         # for tensorrt process
cam = cv2.VideoCapture(camera)
cam.set(3, w)
cam.set(4, h)

The loop part of the code should be very similar to the one used with video:


fps_time = 0
while True:
    ret_val, image = cam.read()
    humans = e.inference(image,
                         resize_to_default=(w > 0 and h > 0),
                         upsample_size=resize_out_ratio)
    image = TfPoseEstimator.draw_humans(image, humans, imgcopy=False)
    cv2.putText(image, "FPS: %f" % (1.0 / (time.time() - fps_time)), (10, 10),
                cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 255, 0), 2)
    cv2.imshow('tf-pose-estimation result', image)
    fps_time = time.time()
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break
cam.release()
cv2.destroyAllWindows()
Image source: screenshot of the author’s webcam

Again, the standard video capture rate of 30 FPS is reduced to around 10% of that when the algorithm is used. Here is a full video where the delay can be better observed. However, the result is excellent!

Conclusion

As always, I hope this article can inspire others to find their way in the fantastic world of AI!


All the code used in this article is available for download from the project GitHub: TF2_Pose_Estimation

Regards from the South of the World!


See you in my next article!


Thank you


Marcelo


Translated from: https://towardsdatascience.com/realtime-multiple-person-2d-pose-estimation-using-tensorflow2-x-93e4c156d45f
