Authors: Igor Barros Barbosa, Marco Cristani, Alessio Del Bue, Loris Bazzani, and Vittorio Murino

Published in: Computer Vision – ECCV 2012. Workshops and Demonstrations (Lecture Notes in Computer Science)

Contents

Abstract

1 Introduction

2 State of the art

3 Our approach

3.1 First stage: signature extraction

3.2 Second stage: signature matching

4 Experiments

4.1 Database creation

4.2 Semi-Cooperative re-id

4.3 Non-Cooperative re-id

5 Conclusions

References

Abstract

People re-identification is a fundamental operation for any multi-camera surveillance scenario. Until now, it has been performed by exploiting primarily appearance cues, under the hypothesis that individuals do not change their clothes. In this paper, we relax this constraint by presenting a set of 3D soft-biometric cues, insensitive to appearance variations, that are gathered using RGB-D technology. The joint use of these characteristics provides encouraging performance on a benchmark of 79 people, captured on different days and in different clothing. This promotes a novel research direction for the re-identification community, supported also by the fact that a new brand of affordable RGB-D cameras has recently entered the worldwide market.

Keywords: Re-identification, RGB-D sensors, Kinect



1 Introduction

The task of person re-identification (re-id) consists in recognizing an individual at different locations over a set of non-overlapping camera views. It is a fundamental task for heterogeneous video surveillance applications, especially for modeling long-term activities inside large and structured environments, such as airports, museums, shopping malls, etc. In most cases, re-id approaches rely on appearance-based techniques only, in which it is assumed that individuals do not change their clothing within the observation period [1–3]. This hypothesis is a very strong restriction, since it constrains re-id methods to a limited temporal range of applicability (reasonably, on the order of minutes).


In this paper we remove this restriction, presenting a new approach to person re-id that uses soft biometric cues as features. In general, soft biometric cues have been exploited in different contexts: to aid facial recognition [4], as features in security surveillance solutions [5, 6], or for person recognition under a bag-of-words policy [7]. In [4] the soft biometric cues are the sizes of limbs, which were manually measured. The approaches in [5–7] are based on data coming from 2D cameras and extract soft biometric cues such as gender, ethnicity, clothing, etc.


To the best of our knowledge, 3D soft biometric features for re-identification have been employed only in [4], but in that case the scenario is strongly supervised and requires the complete cooperation of the user in order to take manual measurements. In contrast, a viable soft biometrics system should mostly deal with subjects without requiring strong collaboration from them, in order to extend its applicability to more practical scenarios.


In our case, the cues are extracted from range data computed using RGB-D cameras. Recently, novel RGB-D camera sensors such as the Microsoft Kinect and the Asus Xtion PRO, both manufactured using the techniques developed by PrimeSense [8], have provided the community with a fast and affordable way of acquiring depth information. This drove researchers to use RGB-D cameras in different fields of application, such as pose estimation [9] and object recognition [10], to name a few. In our opinion, re-id can be extended to novel scenarios by exploiting this technology, making it possible to overcome the constraint of analyzing only people who do not change their clothes.


In particular, our aim is to extract a set of features computed directly on the range measurements given by the sensor. Such features are related to specific anthropometric measurements computed automatically from the person's body. In more detail, we introduce two distinct subsets of features. The first subset contains cues computed from the skeleton fitted to the depth data, i.e., Euclidean distances between selected body parts such as legs and arms, and the overall height. The second subset contains features computed on the surface given by the range data. They come in the form of geodesic distances computed between a predefined set of joints (e.g., from torso to right hip). The latter measure gives an indication of the curvature (and, by approximation, of the size) of specific regions of the body.


After analyzing the effectiveness of each feature separately and performing a pruning stage aimed at removing uninformative cues, we studied how such features have to be weighted in order to maximize the re-identification performance. We obtained encouraging re-id results on a pool of 79 people, acquired at different times and across intervals of days. This promotes our approach and, in general, the idea of performing re-id with 3D soft biometric cues extracted from RGB-D cameras.


The remainder of the paper is organized as follows. Section 2 briefly presents the re-identification literature. Section 3 details our approach, followed by Section 4, which shows experimental results. Finally, Section 5 concludes the paper, envisaging some future perspectives.


2 State of the art

Most re-identification approaches build on appearance-based features [1, 11, 3], and this prevents them from addressing re-id scenarios where the clothing may change. A few approaches constrain the re-id operative conditions by reducing the problem to temporal reasoning: they use information on the layout of the cameras, together with temporal information, in order to prune away some candidates in the gallery set [12].


The adoption of 3D body information in the re-identification problem was first introduced by [13], where a coarse, rigid 3D body model was fitted to different pedestrians. Given such 3D localization, the person silhouettes can be related across the different body orientations as viewed from different cameras. Then, the registered data are used to perform appearance-based re-identification. In contrast, in our case we deal with genuine soft biometric cues of a body that is truly non-rigid, and we disregard appearance-based cues altogether. This possibility is afforded by today's technology, which allows reliable anatomic cues to be extracted from the depth information provided by a range sensor.


In general, the methodological approaches to re-identification can be divided into two groups: learning-based and direct strategies. Learning-based methods split a re-id dataset into two sets, training and test [1, 3]: the training set is used for learning features and strategies for combining them, while the test set is used for validation. Direct strategies [11] are simple feature extractors. Usually, learning-based strategies are strongly time-consuming (considering the training and testing steps), but more effective than direct ones. Under this taxonomy, our proposal can be defined as a learning-based strategy.


3 Our approach

Our re-identification approach has two distinct phases. First, a signature is computed from the range data of each subject. Such a signature is a composition of several soft biometric cues extracted from the depth data acquired with an RGB-D sensor. In the second phase, these signatures are matched against the subjects in the gallery set. A learning stage, computed beforehand, establishes how each single feature has to be weighted when combined with the others: a feature with a high weight is one that is useful for obtaining good re-identification performance.


3.1 First stage: signature extraction

The first step processes the data acquired from an RGB-D camera such as the Kinect. In particular, this sensor projects a structured-light infrared pattern [8] onto the scene/objects. The system then obtains a depth map of the scene by measuring the distortion of the pattern created by the 3D relief of the objects. When RGB-D cameras are used with the OpenNI framework [14], it is possible to use the acquired depth map to segment and track human bodies, estimate the human pose, and perform metric 3D scene reconstruction. In our case, the information used is the segmented point cloud of a person, the positions of the fifteen body joints, and the estimate of the floor plane. Although the person's depth map and pose are given by the OpenNI software libraries, the segmentation of the floor required an initial pre-processing step using RANSAC to fit a plane to the ground. Additionally, a mesh was generated from the person's point cloud using the "Greedy Projection" method [15].

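The floor-fitting step above can be sketched as follows. This is a minimal RANSAC plane fit in Python with NumPy, not the authors' implementation; the function name, the thresholds, and the synthetic point cloud are our own assumptions.

```python
import numpy as np

def ransac_floor_plane(points, n_iters=200, inlier_thresh=0.02, seed=None):
    """Fit a plane n.x + d = 0 to an (N, 3) point cloud with RANSAC.

    inlier_thresh is a point-to-plane distance in the same units as the
    coordinates (e.g. meters). Returns the (normal, d) pair supported by
    the largest number of inliers.
    """
    rng = np.random.default_rng(seed)
    best_inliers, best_plane = 0, None
    for _ in range(n_iters):
        # 1. Sample a minimal set of three points.
        p0, p1, p2 = points[rng.choice(len(points), 3, replace=False)]
        normal = np.cross(p1 - p0, p2 - p0)
        norm = np.linalg.norm(normal)
        if norm < 1e-9:                      # degenerate (collinear) sample
            continue
        normal /= norm
        d = -normal.dot(p0)
        # 2. Count the points lying within the threshold of this plane.
        inliers = int((np.abs(points @ normal + d) < inlier_thresh).sum())
        # 3. Keep the plane with the strongest support.
        if inliers > best_inliers:
            best_inliers, best_plane = inliers, (normal, d)
    return best_plane

# Synthetic check: a noisy floor at z = 0 plus off-plane clutter.
rng = np.random.default_rng(0)
floor = np.c_[rng.uniform(-1, 1, (500, 2)), rng.normal(0, 0.005, 500)]
clutter = rng.uniform(-1, 1, (50, 3)) + np.array([0.0, 0.0, 1.0])
normal, d = ransac_floor_plane(np.vstack([floor, clutter]), seed=0)
```

On this synthetic cloud the recovered normal is close to the z axis (up to sign) and d close to zero, i.e. RANSAC locks onto the dominant floor plane rather than the clutter.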

Before focusing on the signature extraction, a preliminary study was performed by examining a set of 121 features on a dataset of 79 individuals, each captured on 4 different days (see more information on the dataset in Sec. 4). These features can be partitioned into two groups: the first contains the skeleton-based features, i.e., cues based on the exhaustive combination of distances among joints and distances between the floor plane and all the possible joints. The second group contains the surface-based features, i.e., the geodesic distances on the mesh surface computed between different joint pairs. In order to determine the most relevant features, a feature selection stage evaluates the re-identification performance of each single cue, one at a time, independently. In particular, as a measure of re-id accuracy, we evaluated the normalized area under curve (nAUC) of the cumulative matching curve (CMC), discarding those features which turned out to be equivalent to a random choice of the correct match (see more information on these classification measures in Sec. 4).


The result of this pruning stage was a set of 10 features:

– Skeleton-based features:

  • d1: Euclidean distance between floor and head

  • d2: Ratio between torso and legs

  • d3: Height estimate

  • d4: Euclidean distance between floor and neck

  • d5: Euclidean distance between neck and left shoulder

  • d6: Euclidean distance between neck and right shoulder

  • d7: Euclidean distance between torso center and right shoulder

– Surface-based features:

  • d8: Geodesic distance between torso center and left shoulder

  • d9: Geodesic distance between torso center and left hip

  • d10: Geodesic distance between torso center and right hip

Some of the features based on the distance from the floor are illustrated in Fig. 1, together with the localization of the joints on the body. In particular, the second feature, d2, is the ratio between the estimated torso length and leg length.


Fig. 1. Distances employed for building the soft-biometric features (in black), and some of the soft biometric features (in green).

It is important to notice that the joints are not localized on the outskirts of the point cloud but, in most cases, in the proximity of the real articulations of the human body.

Fig. 2. Geodesic features: the red line represents the path found by A* between torso and left shoulder, torso and left hip, and torso and right hip.


The computation of the (approximated) geodesic distances, i.e., torso to left shoulder, torso to left hip and torso to right hip, is given by the following steps. First, the selected joint pairs, which normally do not lie on the point cloud, are projected onto the respective closest points in depth. This generates a starting and an ending point on the surface, from which an A* algorithm can be initialized to compute the minimum path over the point cloud (Fig. 2). Since the torso is usually recovered by the RGB-D sensor with higher precision, the computed geodesic features should also be reliable.

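The steps above can be sketched as follows. The paper runs A* over the reconstructed mesh; here, as a stand-in, we build a k-nearest-neighbour graph directly over the point cloud and run A* with a null heuristic (which reduces to Dijkstra). Function names and the toy half-circle "surface" are our own assumptions.

```python
import heapq
import numpy as np

def knn_graph(points, k=6):
    """Adjacency list of a k-NN graph over an (N, d) point cloud, with
    Euclidean edge lengths standing in for mesh connectivity."""
    dists = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    graph = {}
    for i in range(len(points)):
        nbrs = np.argsort(dists[i])[1:k + 1]      # skip the point itself
        graph[i] = [(int(j), float(dists[i, j])) for j in nbrs]
    return graph

def geodesic(graph, start, goal, heuristic=None):
    """A* shortest-path length between two graph nodes; with the default
    null heuristic this is plain Dijkstra."""
    h = heuristic or (lambda n: 0.0)
    frontier = [(h(start), 0.0, start)]           # entries are (f = g + h, g, node)
    best = {start: 0.0}
    while frontier:
        _, g, node = heapq.heappop(frontier)
        if node == goal:
            return g
        if g > best.get(node, float("inf")):
            continue                              # stale queue entry
        for nbr, w in graph[node]:
            ng = g + w
            if ng < best.get(nbr, float("inf")):
                best[nbr] = ng
                heapq.heappush(frontier, (ng + h(nbr), ng, nbr))
    return float("inf")

# Toy surface: points along a unit half-circle. The path over the surface
# approximates the arc length (pi), not the straight-line chord (2).
theta = np.linspace(0, np.pi, 50)
pts = np.c_[np.cos(theta), np.sin(theta), np.zeros_like(theta)]
arc_len = geodesic(knn_graph(pts, k=4), 0, len(pts) - 1)
```

This is why the geodesic cue carries curvature information: for two joints on a curved body region, the surface path is systematically longer than the Euclidean distance between them.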

As a further check on the 10 selected features, we verified their accuracy by manually measuring the features on a restricted set of subjects. We found that the features related to height (d1, ..., d4) were captured with higher precision, while the other features were slightly noisier. In general, all these features are well suited to indoor usage, in which people do not wear heavy clothes that might hide the shape of the body.


3.2 Second stage: signature matching

This section illustrates how the selected features can be jointly employed in the re-id problem. In the literature, a re-id technique is usually evaluated considering two sets of personal ID signatures: a gallery set A and a probe set B.


The evaluation consists in associating each ID signature of the probe set B to a corresponding ID signature in the gallery set A. For the sake of clarity, let us suppose that there are N different ID signatures (each one representing a different individual, so N different individuals) in the probe set, and the same in the gallery set. All the N subjects in the probe set are present in the gallery. For evaluating the performance of a re-id technique, the most widely used measure is the Cumulative Matching Curve (CMC) [1], which models the mean probability that any probe signature is correctly matched within the first T ranked gallery individuals, where the ranking is obtained by sorting the distances between ID signatures in ascending order.

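The CMC and its normalized AUC can be sketched as follows, assuming an N × N probe-vs-gallery distance matrix in which entry (i, i) is the correct match. Taking the mean of the CMC as the normalized AUC is one common convention; the function names and the toy matrix are our own.

```python
import numpy as np

def cmc_curve(dist):
    """Cumulative Matching Curve from an N x N distance matrix where
    dist[i, j] compares probe i with gallery j, and gallery i is the
    correct match of probe i."""
    n = dist.shape[0]
    # Rank of the correct match for each probe (1 = ranked first).
    ranks = np.array([1 + np.sum(dist[i] < dist[i, i]) for i in range(n)])
    # CMC at rank t = fraction of probes matched within the top t.
    return np.array([(ranks <= t).mean() for t in range(1, n + 1)])

def nauc(cmc):
    """Normalized area under the CMC (1.0 = perfect, ~0.5 = random)."""
    return float(cmc.mean())

# Toy example: probes 0-2 rank their match first, probe 3 ranks it second.
dist = np.array([[0.1, 0.9, 0.8, 0.7],
                 [0.9, 0.2, 0.8, 0.7],
                 [0.9, 0.8, 0.1, 0.7],
                 [0.3, 0.8, 0.7, 0.5]])
cmc = cmc_curve(dist)
```

Here cmc is [0.75, 1.0, 1.0, 1.0] and its nAUC is 0.9375.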

In our case, each ID signature is composed of F features (here, F = 10), and each feature has a numerical value. Let us then define the distance between corresponding features as the squared difference between them. For each feature, we obtain an N × N distance matrix. However, such matrices are biased towards features with larger measured values, leading to a problem of heterogeneity of the measures: a feature such as the height would count more than other features whose range of values is more compact (e.g., the distance between neck and left shoulder). To avoid this problem, we normalize all the features to zero mean and unit variance, using the data from the gallery set to compute the mean and variance of each feature.

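A sketch of the normalization and the per-feature distance matrices; the function name and the toy data are our own, while the use of gallery statistics for the z-scoring follows the text.

```python
import numpy as np

def feature_distance_matrices(gallery, probe):
    """Per-feature N x N squared-difference matrices from (N, F) feature
    arrays. Each feature is z-scored with the gallery mean and standard
    deviation so that, e.g., height (spread of several cm) cannot
    dominate the neck-to-shoulder distance (spread of a few cm)."""
    mu, sigma = gallery.mean(axis=0), gallery.std(axis=0)
    g = (gallery - mu) / sigma
    p = (probe - mu) / sigma
    # mats[f][i, j]: squared difference of feature f, probe i vs gallery j.
    return np.stack([(p[:, f][:, None] - g[:, f][None, :]) ** 2
                     for f in range(gallery.shape[1])])

# Toy data: 5 subjects, 2 features (height in cm, torso/legs ratio); the
# probe set is a noisy re-acquisition of the same subjects.
rng = np.random.default_rng(1)
gallery = rng.normal([175.0, 0.8], [7.0, 0.05], size=(5, 2))
probe = gallery + rng.normal(0.0, [1.0, 0.01], size=(5, 2))
mats = feature_distance_matrices(gallery, probe)
```

After z-scoring, both features contribute on a comparable scale, so the F matrices can be combined into a single distance matrix by (weighted) summation.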

Given the normalized N × N distance matrices, we now have to merge them into a single distance matrix, obtaining thus a final CMC curve. The naive way to integrate them would be to simply average the matrices. Instead, we propose to use a weighted sum of the distance matrices. Let us define a set of weights wi, for i = 1, ..., F, representing the importance of the i-th feature: the higher the weight, the more important the feature. Since tuning these weights by hand is usually difficult, we propose a quasi-exhaustive learning strategy, i.e., we explore the weight space (from 0 to 1 with step 0.01) in order to select the weights that maximize the nAUC score. In the experiments, we report the values of those weights and compare this strategy with the average baseline.

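The weighting scheme can be sketched as follows. A full grid with step 0.01 over 10 weights is far too large to enumerate literally (hence the paper's "quasi-exhaustive" search); this sketch uses a much coarser step and a toy F = 2 problem, and all names and data are our own assumptions.

```python
import itertools
import numpy as np

def combined_nauc(weights, mats):
    """nAUC of the CMC produced by the weighted sum of the per-feature
    distance matrices (mats has shape (F, N, N))."""
    dist = np.tensordot(weights, mats, axes=1)          # (N, N)
    n = dist.shape[0]
    ranks = np.array([1 + np.sum(dist[i] < dist[i, i]) for i in range(n)])
    cmc = np.array([(ranks <= t).mean() for t in range(1, n + 1)])
    return float(cmc.mean())

def grid_search_weights(mats, step=0.25):
    """Grid search over the weight space, as in the text but with a far
    coarser step than the paper's 0.01 to keep the sketch cheap."""
    f = mats.shape[0]
    grid = np.arange(0.0, 1.0 + 1e-9, step)
    best_w, best_score = None, -1.0
    for w in itertools.product(grid, repeat=f):
        if not any(w):
            continue                                    # skip all-zero weights
        score = combined_nauc(np.array(w), mats)
        if score > best_score:
            best_w, best_score = np.array(w), score
    return best_w, best_score

# Toy setup: feature 0 is discriminative (its matrix is small on the
# diagonal, i.e. for the correct matches), feature 1 is pure noise.
rng = np.random.default_rng(2)
n = 6
good = rng.uniform(0.5, 1.0, (n, n))
np.fill_diagonal(good, rng.uniform(0.0, 0.1, n))
noise = rng.uniform(0.0, 1.0, (n, n))
w, score = grid_search_weights(np.stack([good, noise]))
```

On this toy problem the search reaches a perfect nAUC of 1.0, since weighting only the discriminative feature ranks every correct match first.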

4 Experiments

In this section, we first describe how we built the experimental dataset and how we formalised the re-id protocol. Then, an extensive validation is carried out on the test dataset under different conditions.


4.1 Database creation

Fig. 3. Illustration of the different groups in the recorded data, rows from top to bottom: "Walking", "Walking2", "Backwards" and "Collaborative". Note that people changed their clothing between acquisitions on different days. On the right, statistics of the "Walking" dataset: for each feature, the histogram is shown; in parentheses, its mean value (in cm, except d2) and standard deviation.


Our dataset is composed of four different groups of data. The first, "Collaborative", group was obtained by recording 79 people from a frontal view, walking slowly, avoiding occlusions and with stretched arms. This happened in an indoor scenario, where the people were at least 2 meters away from the camera. It represents a collaborative setting, the only one that we considered in these experiments. The second ("Walking") and third ("Walking2") groups of data are composed of frontal recordings of the same 79 people walking normally while entering the lab where they normally work. The fourth group ("Backwards") is a back-view recording of the people walking away from the lab. Since the acquisitions were performed on different days, there is no guarantee that visual aspects like clothing or accessories are kept constant. Figure 3 shows the meshes computed for different people during the recording of the four sessions, together with some statistics about the collected features.


From each acquisition, a single frame was automatically selected for the computation of the biometric features: the frame with the highest tracking confidence for the skeleton joints (such a confidence score is a by-product of the skeleton-fitting algorithm) among those closest to the camera and not cropped by the sensor's field of view. In most cases, the selected frame was captured approximately 2.5 meters away from the camera.

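The selection rule amounts to a filter-then-argmax over the frames of an acquisition. The per-frame fields and the 2-meter threshold below are our own illustrative assumptions, not the authors' data structures.

```python
def select_frame(frames, min_depth_m=2.0):
    """Among frames whose skeleton is not cropped by the field of view
    and lies beyond min_depth_m, pick the one with the highest mean
    joint-tracking confidence (a by-product of the skeleton fitting)."""
    candidates = [f for f in frames
                  if not f["cropped"] and f["depth_m"] >= min_depth_m]
    return max(candidates, key=lambda f: f["mean_joint_confidence"])

# Toy acquisition: the first frame is cropped (excluded); of the rest,
# the third has the highest confidence.
frames = [
    {"cropped": True,  "depth_m": 1.8, "mean_joint_confidence": 0.90},
    {"cropped": False, "depth_m": 2.4, "mean_joint_confidence": 0.70},
    {"cropped": False, "depth_m": 2.6, "mean_joint_confidence": 0.85},
]
best = select_frame(frames)
```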

After that, the mesh of each subject was computed, and the 10 soft biometric cues were extracted using both skeleton and geodesic information.


4.2 Semi-Cooperative re-id

Fig. 4. Single-feature CMCs — "Collaborative" vs "Walking 2" (best viewed in color)

Given the four datasets, we first built a semi-collaborative scenario, where the gallery set is composed of the ID signatures of the "Collaborative" setting, and the test data is the "Walking 2" set. The CMCs related to each feature are portrayed in Fig. 4: they show how well each feature captures discriminative information about the analyzed subjects. Fig. 5 shows the normalized AUC of each feature. Notice that the features associated with the height of the person are very meaningful, as is the ratio between torso and legs.


Fig. 5. Area under the curve for each feature (the numbering here follows the feature enumeration presented in Sec. 3) — "Collaborative" vs "Walking 2". The numbers over the bars indicate the numerical nAUC values of the different features.

The results in Fig. 5 highlight that the nAUC over the different features spans from 52.8% to 88.1%; thus, all of them contribute to better re-identification results. To investigate how their combination helps re-id, we exploit the learning strategy proposed in Sec. 3.2. The weights wi are learned once, using a different dataset than the one used during testing. The obtained weights are: w1 = 0.24, w2 = 0.17, w3 = 0.18, w4 = 0.09, w5 = 0.02, w6 = 0.02, w7 = 0.03, w8 = 0.05, w9 = 0.08, w10 = 0.12. The weights mirror the nAUC obtained for each feature independently (Fig. 5): the most relevant ones are d1 (Euclidean distance between floor and head), d2 (ratio between torso and legs), d3 (height estimate), and d10 (geodesic distance between torso center and right hip). In Fig. 6, we compare this strategy with a baseline: the average case, where wi = 1/F for each i. The learning strategy clearly gives better results (nAUC = 88.88%) than both the baseline (nAUC = 76.19%) and the best single feature (nAUC = 88.10%, corresponding to d1 in Fig. 5). For the rest of the experiments the learning strategy is adopted.


Fig. 6. Compilation of final CMC curves — "Collaborative" - "Walking 2"

4.3 Non-Cooperative re-id

Non-cooperative scenarios involve the "Walking", "Walking2" and "Backwards" datasets. We generated different experiments by combining cooperative and non-cooperative scenarios as gallery and probe sets. Table 1 reports the nAUC scores for the trials we carried out. The non-cooperative scenarios gave rise to higher performance than the cooperative ones. The reason is that, in the collaborative acquisition, people tended to move in a very unnatural and constrained way, thus producing measurements biased towards a specific posture. In the non-cooperative setting this clearly did not happen.


Table 1. nAUC scores for the different re-id scenarios.

5 Conclusions

In this paper, we presented a person re-identification approach which exploits soft-biometric features extracted from range data, investigating collaborative and non-collaborative settings. Each feature has a particular discriminative expressiveness, with height and the torso/legs ratio being the most informative cues. Re-identification by 3D soft biometric information seems to be a very fruitful research direction. Beyond the main advantage of a soft biometric policy, i.e., that of being to some extent invariant to clothing, there are several other reasons: on one side, the availability of precise yet affordable RGB-D sensors encourages the study of robust software solutions towards the creation of real surveillance systems; on the other side, the classical appearance-based re-id literature is characterized by powerful learning approaches that can be easily embedded in the 3D setting. Our future research will focus on this last point, and on the creation of a larger 3D non-collaborative dataset.


References

1. D. Gray and H. Tao, "Viewpoint invariant pedestrian recognition with an ensemble of localized features," in ECCV, Marseille, France, 2008, pp. 262–275.

2. M. Farenzena, L. Bazzani, A. Perina, V. Murino, and M. Cristani, “Person reidentification by symmetry-driven accumulation of local features,” in CVPR, 2010.

3. W. Zheng, S. Gong, and T. Xiang, “Person re-identification by probabilistic relative distance comparison,” in Computer Vision and Pattern Recognition (CVPR), 2011 IEEE Conference on. IEEE, 2011, pp. 649–656.

4. C. Velardo and J.-L. Dugelay, "Improving identification by pruning: a case study on face recognition and body soft biometric," Eurecom, Tech. Rep. EURECOM+3593, Jan. 2012.

5. Y.-F. Wang, E. Y. Chang, and K. P. Cheng, “A video analysis framework for soft biometry security surveillance,” in Proceedings of the third ACM international workshop on Video surveillance & sensor networks, ser. VSSN ’05, 2005, pp. 71–78.

6. M. Demirkus and K. Garg, “Automated person categorization for video surveillance using soft biometrics,” Proc of SPIE, Biometric Technology for, 2010.

7. A. Dantcheva, J.-L. Dugelay, and P. Elia, “Person recognition using a bag of facial soft biometrics (BoFSB),” in 2010 IEEE International Workshop on Multimedia Signal Processing, vol. 85. IEEE, Oct. 2010, pp. 511–516.

8. B. Freedman, A. Shpunt, M. Machline, and Y. Ariel, “US Patent - US2010/0118123,” 2010.

9. J. Shotton, A. Fitzgibbon, M. Cook, T. Sharp, M. Finocchio, R. Moore, A. Kipman, and A. Blake, “Real-time human pose recognition in parts from single depth images,” in CVPR 2011. IEEE, Jun. 2011, pp. 1297–1304.

10. L. Bo, K. Lai, X. Ren, and D. Fox, “Object recognition with hierarchical kernel descriptors,” in CVPR 2011, no. c. IEEE, Jun. 2011, pp. 1729–1736.

11. D. S. Cheng, M. Cristani, M. Stoppa, L. Bazzani, and V. Murino, “Custom pictorial structures for re-identification,” in British Machine Vision Conference (BMVC), 2011.

12. O. Javed, K. Shafique, Z. Rasheed, and M. Shah, “Modeling inter-camera spacetime and appearance relationships for tracking across non-overlapping views,” Comput. Vis. Image Underst., vol. 109, no. 2, pp. 146–162, 2008.

13. D. Baltieri, R. Vezzani, and R. Cucchiara, “Sarc3d: a new 3d body model for people tracking and re-identification,” in Proceedings of the 16th international conference on Image analysis and processing, ser. ICIAP’11, 2011, pp. 197–206.

14. OpenNI. (2012, Feb.) OpenNI framework @ONLINE. [Online]. Available: http://www.openni.org/

15. Z. C. Marton, R. B. Rusu, and M. Beetz, “On Fast Surface Reconstruction Methods for Large and Noisy Datasets,” in Proceedings of the IEEE International Conference on Robotics and Automation (ICRA), Kobe, Japan, May 12-17 2009.
