Vehicle Search - Training a Vehicle Recognition Model with Triplet Loss

I recently read the paper "Learning a Repression Network for Precise Vehicle Search" and am recording here what I learned from it.

Background and model overview

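This paper is mainly about vehicle identification. Although every vehicle has a unique license plate number, surveillance cameras are not aimed specifically at license plates. Some cameras also have low resolution and sometimes cannot capture the plate number clearly. In addition, some similar digits and letters are hard to tell apart, such as 0 and O, 8 and B, or D and O. Some people also cover or alter their plates. So besides the plate number, we need other cues to decide whether two images show the same vehicle. The first thing that comes to mind is the area where the annual inspection stickers are placed. On the one hand, these stickers are mandatory; on the other hand, everyone has their own habits, so the order and arrangement of the stickers differ. We can also compare fine details such as decorations on the body, hanging ornaments, and scratches.
Vehicle identification differs from face recognition. Many vehicles share the same model and color, so we need to compare finer details on the vehicle. Training with the same model as used for face recognition does not easily yield good results. The paper therefore proposes an improvement: it adds classification tasks for the vehicle model and color of the anchor image. The overall model structure is as follows: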

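As the figure above shows, the input consists of anchor, positive, and negative images. The three images pass through a CNN, and their features are extracted at the Fbase layer; at this stage we can choose a model pretrained on ImageNet, such as VGG, Inception, or ResNet. After Fbase the network splits into two branches. One branch is trained with the triplet loss: by learning the similarity between the anchor and positive images and the difference between the anchor and negative images, it lowers the triplet loss and learns the fine-grained features of the vehicle. The other branch learns to classify the vehicle model and color of the anchor image. The features learned at its F_acs layer are passed to the triplet loss branch and concatenated with the output of that branch's F_sls1 layer to form the input of the F_sls2 layer, so that the triplet branch focuses only on fine-grained features. The model outputs are model, color, and triplet_loss.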

Building the model

Following the model described in the paper, I implemented it in Keras. The data comes from the VehicleID dataset. The code is as follows:

# Use triplet loss to identify vehicles
from keras.applications.inception_v3 import InceptionV3
from keras.callbacks import EarlyStopping, ReduceLROnPlateau, ModelCheckpoint
from keras.layers import Dense, Input, Lambda, concatenate
from keras.models import Model
from keras.optimizers import SGD

LEARNING_RATE = 0.00001
IMG_WIDTH = 299
IMG_HEIGHT = 299
NBR_MODELS = 250
NBR_COLORS = 7
INITIAL_EPOCH = 0

# Define the model: reuse the ImageNet weights of InceptionV3 without its top layer
inception = InceptionV3(include_top=False, input_tensor=None, input_shape=(IMG_WIDTH, IMG_HEIGHT, 3), pooling='avg')
f_base = inception.get_layer(index=-1).output
f_acs = Dense(1024, name='f_acs')(f_base)
feature_model = Model(inputs=inception.input, outputs=f_acs)

ancor = Input(shape=(IMG_WIDTH, IMG_HEIGHT, 3), name='ancor')
positive = Input(shape=(IMG_WIDTH, IMG_HEIGHT, 3), name='positive')
negative = Input(shape=(IMG_WIDTH, IMG_HEIGHT, 3), name='negative')

# Car model and color classification is done only for the ancor image.
# Get the ancor's f_acs features, then classify model and color from them.
f_acs_ancor = feature_model(ancor)
f_ancor_model = Dense(NBR_MODELS, activation='softmax', name='pred_model')(f_acs_ancor)
f_ancor_color = Dense(NBR_COLORS, activation='softmax', name='pred_color')(f_acs_ancor)

# Triplet branch: compares the ancor with the positive and negative images.
# f_sls1 is computed per image from the shared base features.
f_sls1 = Dense(1024, name='f_sls1')(f_base)
sls1_model = Model(inputs=inception.input, outputs=f_sls1)
# The second and third layers of this branch are shared by all three images.
f_sls2_layer = Dense(1024, name='f_sls2')
f_sls3_layer = Dense(256, name='f_sls3')

def sls_embedding(image):
    # The input of f_sls2 is the ancor's f_acs features concatenated with this image's f_sls1 output
    f_sls2_data = concatenate([f_acs_ancor, sls1_model(image)], axis=-1)
    return f_sls3_layer(f_sls2_layer(f_sls2_data))

# Embeddings of the three images
sls_ancor = sls_embedding(ancor)
sls_positive = sls_embedding(positive)
sls_neg = sls_embedding(negative)
# With the three embeddings we can compute the loss.
# triplet_loss and identity_loss are given in the sections below and must be defined before this code runs.
loss = Lambda(triplet_loss, output_shape=(1,))([sls_ancor, sls_positive, sls_neg])

# With the classification branch and the similarity branch built, assemble the whole model.
# The inputs are the three images; the outputs are the classification results and the loss.
model = Model(inputs=[ancor, positive, negative], outputs=[f_ancor_model, f_ancor_color, loss])

# Create the optimizer
optimizer = SGD(lr=LEARNING_RATE, momentum=0.9, decay=0.0, nesterov=True)
# Compile the model
model.compile(loss=["categorical_crossentropy", "categorical_crossentropy", identity_loss], optimizer=optimizer, metrics=["accuracy"])
# Print the model summary
model.summary()

Implementation of the triplet loss:

import keras.backend as K

MARGIN = 1

def triplet_loss(vecs):
    ancor, pos, neg = vecs
    # L2-normalize the three embeddings
    l2_ancor = K.l2_normalize(ancor, axis=-1)
    l2_pos = K.l2_normalize(pos, axis=-1)
    l2_neg = K.l2_normalize(neg, axis=-1)
    # Squared Euclidean distances ancor-positive and ancor-negative
    distance_ancor_pos = K.sum(K.square(K.abs(l2_ancor - l2_pos)), axis=-1, keepdims=True)
    distance_ancor_neg = K.sum(K.square(K.abs(l2_ancor - l2_neg)), axis=-1, keepdims=True)
    loss = distance_ancor_pos + MARGIN - distance_ancor_neg
    return loss
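To make the arithmetic concrete, here is a minimal pure-Python sketch of the same computation for a single triplet. The helper names (l2_normalize, squared_distance, triplet_loss_single) and the toy vectors are made up for illustration. The optional clamp argument is an assumption added here: it applies the max(0, ...) hinge that the usual triplet loss formulation uses, which the Keras function above does not apply.

```python
import math

MARGIN = 1.0  # same margin value as in the post


def l2_normalize(vec):
    """Scale a vector to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def squared_distance(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def triplet_loss_single(anchor, pos, neg, margin=MARGIN, clamp=False):
    """d(anchor, pos) + margin - d(anchor, neg) on L2-normalized vectors.

    clamp=True applies the hinge max(loss, 0); this is an illustrative
    addition, not part of the Keras function in the post.
    """
    a, p, n = l2_normalize(anchor), l2_normalize(pos), l2_normalize(neg)
    loss = squared_distance(a, p) + margin - squared_distance(a, n)
    return max(loss, 0.0) if clamp else loss


anchor = [1.0, 0.0]
positive = [1.0, 0.0]  # same direction as the anchor
negative = [0.0, 1.0]  # orthogonal to the anchor

# Easy triplet: d_ap = 0, d_an = 2, so the raw loss is 0 + 1 - 2
easy = triplet_loss_single(anchor, positive, negative)
easy_clamped = triplet_loss_single(anchor, positive, negative, clamp=True)
# Swapping positive and negative gives a hard triplet: 2 + 1 - 0
hard = triplet_loss_single(anchor, negative, positive)
print(easy, easy_clamped, hard)
```

Without the clamp, an already well-separated triplet yields a negative loss and keeps contributing to the gradient; with the clamp it contributes nothing.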

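The triplet loss used in the vehicle search model is implemented differently from the one in FaceNet. Here the training data consists of anchor, positive, and negative images that have been selected in advance, and each training step takes the three images directly as input. FaceNet training instead takes all the images as input, determines from the resulting embeddings which (anchor, positive, negative) triplets are valid, and then computes the triplet loss from those valid triplets. If you are interested, see the post on triplet loss training for face detection.
In addition, identity_loss is only a dummy implementation, as follows: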

def identity_loss(y_true, y_pred):
    return K.mean(y_pred - 0 * y_true)
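The reason a dummy works is that the third model output is already the per-triplet loss, so the loss function only has to average it; the target passed for that output is typically just a placeholder array. The pure-Python sketch below mirrors the same arithmetic on plain lists. The values and the names per_triplet_losses and dummy_targets are made up for illustration.

```python
def identity_loss(y_true, y_pred):
    """Mean of y_pred; y_true is multiplied by 0 and therefore ignored."""
    return sum(p - 0 * t for t, p in zip(y_true, y_pred)) / len(y_pred)


# Hypothetical per-triplet losses produced by the triplet_loss output
per_triplet_losses = [0.5, -1.0, 2.0]

# Any placeholder target gives the same result
loss_with_zeros = identity_loss([0.0, 0.0, 0.0], per_triplet_losses)
loss_with_sevens = identity_loss([7.0, 7.0, 7.0], per_triplet_losses)
print(loss_with_zeros, loss_with_sevens)
```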

Using the model

After training, how do we apply the trained model to vehicle search?

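We treat all vehicle images already in the database as anchor images and pass them through the trained model in batches to extract their color and model labels together with their finer-grained features.
We use the data from the previous step as input to KNN or another clustering model and fit that model to obtain the groupings.
To search for a vehicle, we extract the corresponding features from its image and then look up similar vehicles in the database.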
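The lookup step can be sketched as a brute-force nearest-neighbor search over stored embeddings. This is a simplified stand-in for the KNN or clustering model mentioned above, not the post's actual pipeline: the function search, the gallery dictionary, the vehicle ids, and the 3-dimensional toy embeddings are all hypothetical.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def search(query, gallery, k=2):
    """Return the ids of the k gallery embeddings closest to the query.

    Distance is the squared Euclidean distance between L2-normalized
    embeddings, the same measure the triplet loss is trained on.
    """
    q = l2_normalize(query)
    scored = []
    for vehicle_id, embedding in gallery.items():
        g = l2_normalize(embedding)
        dist = sum((a - b) ** 2 for a, b in zip(q, g))
        scored.append((dist, vehicle_id))
    scored.sort()
    return [vehicle_id for _, vehicle_id in scored[:k]]


# Hypothetical embeddings previously extracted from database images
gallery = {
    "car_a": [1.0, 0.0, 0.0],
    "car_b": [0.0, 1.0, 0.0],
    "car_c": [0.9, 0.1, 0.0],
}

result = search([1.0, 0.05, 0.0], gallery, k=2)
print(result)
```

In practice the predicted model and color labels could first narrow the candidates, so that the distance comparison runs only over vehicles of the same model and color.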

Reflections on the model

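After reading the paper, I wondered why the SLS layers need to concatenate the features extracted by the color and model classification tasks on the anchor image, and why this helps the SLS layers converge. The reason is that the concatenated data gives the negative and positive images the same color and model features as the anchor image. The SLS layers therefore cannot tell them apart by color and model features and can only attend to finer-grained features.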
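This argument can be checked on a toy example. The sketch below assumes F_sls2 is a purely linear layer (a Dense layer without activation, as in the Keras code above) and uses made-up weights and features. Under that assumption, the difference between the positive and negative outputs of F_sls2 does not depend on the anchor's classification features at all, because the same anchor term appears in both and cancels. With a nonlinear activation the cancellation would no longer be exact.

```python
def dense(weights, bias, x):
    """Linear layer without activation: y = W x + b."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


# Hypothetical F_sls2 weights: the first two columns act on the anchor's
# F_acs features, the last two on an image's own F_sls1 features
W = [[0.5, -1.0, 2.0, 0.0],
     [1.5, 0.25, -0.5, 1.0]]
b = [0.1, -0.2]


def sls2(anchor_acs, sls1):
    """F_sls2 applied to the concatenation [anchor F_acs, image F_sls1]."""
    return dense(W, b, anchor_acs + sls1)


sls1_pos = [1.0, 2.0]   # toy F_sls1 features of the positive image
sls1_neg = [3.0, -1.0]  # toy F_sls1 features of the negative image


def pos_neg_difference(anchor_acs):
    """Difference between the positive and negative F_sls2 outputs."""
    p = sls2(anchor_acs, sls1_pos)
    n = sls2(anchor_acs, sls1_neg)
    return [a - c for a, c in zip(p, n)]


# Two very different anchor classification features give the same difference
diff_one = pos_neg_difference([1.0, 0.0])
diff_two = pos_neg_difference([-4.0, 7.0])
print(diff_one, diff_two)
```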