Orderless Recurrent Models for Multi-label Classification (CVPR 2020)

Contents

  • Introduction
  • Innovation
  • Method
    • Image-to-sequence model
    • Training recurrent models
    • Orderless recurrent models
  • Experiments
    • Convergence rate
    • Co-occurrence in the predictions
    • Comparison of different ordering methods
    • Comparison with the state of the art

Introduction

Multi-label classification is the task of assigning a wide range of visual concepts (labels) to images. The large variety of concepts and the uncertain relations among them make this a very challenging task, and to successfully address it a model should exploit the dependencies among labels. RNNs have demonstrated good performance in many tasks that require processing variable-length sequential data, including multi-label classification. Such a model naturally incorporates the relation patterns among labels into the training process. But since RNNs produce sequential outputs, the labels need to be ordered for the multi-label classification task.

Several recent works have tried to address this issue by imposing an arbitrary, but consistent, ordering on the ground truth label sequences. Despite alleviating the problem, these approaches fall short of solving it, and many of the original issues remain. For example, in an image that features a clearly visible and prominent dog, the LSTM may choose to predict that label first, as the evidence for it is very strong. However, if dog is not the label that happened to be first in the chosen ordering, the network will be penalized for that output, and then penalized again for not predicting dog in the “correct” step according to the ground truth sequence. In this way, the training process can become very slow.

In this paper, we propose ways to dynamically align the ground truth labels with the predicted label sequence. There are two ways of doing this: predicted label alignment (PLA) and minimal loss alignment (MLA). We empirically show that these approaches lead to faster training and also eliminate other nuisances such as repeated labels in the predicted sequence.

Innovation

  1. Orderless recurrent models with minimal loss alignment (MLA) and predicted label alignment (PLA)

Method

Image-to-sequence model

This type of model consists of a CNN (encoder) part that extracts a compact visual representation from the image, and of an RNN (decoder) part that uses the encoding to generate a sequence of labels, modeling the label dependencies.

Linearized activations from the fourth convolutional layer are used as input for the attention module, along with the hidden state of the LSTM at each time step, so that the attention module can focus on different parts of the image at each step. These attention-weighted features are then concatenated with the word embedding of the class predicted in the previous time step and given to the LSTM as input for the current time step.
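As an illustration, a soft-attention step of this kind might look as follows in PyTorch. This is a sketch, not the paper's exact architecture: the additive scoring network and the layer sizes are assumptions.

```python
import torch
import torch.nn as nn

class AttentionStep(nn.Module):
    """Soft attention over spatial CNN features, conditioned on the LSTM state.

    A minimal sketch; the single-layer additive scoring function and the
    dimensions are assumptions, not the paper's exact design.
    """
    def __init__(self, feat_dim, hidden_dim, attn_dim):
        super().__init__()
        self.feat_proj = nn.Linear(feat_dim, attn_dim)
        self.hidden_proj = nn.Linear(hidden_dim, attn_dim)
        self.score = nn.Linear(attn_dim, 1)

    def forward(self, feats, h):
        # feats: (B, R, feat_dim) linearized conv activations over R regions
        # h:     (B, hidden_dim)  LSTM hidden state from the previous step
        e = self.score(torch.tanh(self.feat_proj(feats)
                                  + self.hidden_proj(h).unsqueeze(1)))  # (B, R, 1)
        alpha = torch.softmax(e, dim=1)        # attention weight per region
        context = (alpha * feats).sum(dim=1)   # (B, feat_dim) weighted features
        return context, alpha
```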

The predictions for the current time step $t$ are computed in the following way:

$$
\begin{aligned}
x_{t} &= E \cdot \hat{l}_{t-1} \\
h_{t} &= \mathrm{LSTM}(x_{t}, h_{t-1}, c_{t-1}) \\
p_{t} &= W \cdot h_{t} + b
\end{aligned}
$$

where $E$ is a word embedding matrix and $\hat{l}_{t-1}$ is the index of the label predicted at the previous time step. $c_{t-1}$ and $h_{t-1}$ are the cell and hidden states of the previous LSTM unit. The prediction vector is denoted by $p_t$, and $W$ and $b$ are the weights and the bias of the fully connected layer.
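As a concrete illustration, a single decoding step could look as follows in PyTorch. This is a sketch that mirrors the three equations above; the hyper-parameter names are placeholders, and attention is omitted for clarity.

```python
import torch
import torch.nn as nn

class DecoderStep(nn.Module):
    """One LSTM decoding step, following the three equations above.

    A minimal sketch: m is the label vocabulary size (including start/end
    tokens); embed_dim and hidden_dim are placeholder hyper-parameters.
    """
    def __init__(self, m, embed_dim, hidden_dim):
        super().__init__()
        self.E = nn.Embedding(m, embed_dim)           # word embedding matrix E
        self.lstm = nn.LSTMCell(embed_dim, hidden_dim)
        self.fc = nn.Linear(hidden_dim, m)            # weights W and bias b

    def forward(self, prev_label, h_prev, c_prev):
        x_t = self.E(prev_label)                      # x_t = E . l_hat_{t-1}
        h_t, c_t = self.lstm(x_t, (h_prev, c_prev))   # h_t = LSTM(x_t, h_{t-1}, c_{t-1})
        p_t = self.fc(h_t)                            # p_t = W . h_t + b
        return p_t, h_t, c_t
```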

Training recurrent models

To train the model, a dataset with pairs of images and sets of labels is used. Let $(I, L)$ be one of these pairs, containing an image $I$ and its $n$ labels $L = \{l_1, l_2, \ldots, l_n\}$, $l_i \in \mathcal{L}$, with $\mathcal{L}$ the set of all labels with cardinality $m = |\mathcal{L}|$, including the start and end tokens.

The predictions $p_t$ of the LSTM are collected in the matrix $P = [p_1\, p_2\, \ldots\, p_n]$, $P \in \mathbb{R}^{m \times n}$. When the number of predicted labels $k$ is larger than $n$, we only keep the first $n$ prediction vectors. In case $k$ is smaller than $n$, we pad the matrix with empty vectors to obtain the desired dimensions.

We can now define the standard cross-entropy loss for recurrent models as:
$$
\mathfrak{L} = -\operatorname{tr}(T \log(P)), \quad \text{with} \quad
\begin{cases}
T_{tj} = 1, & \text{if } l_{t} = j \\
T_{tj} = 0, & \text{otherwise}
\end{cases}
\tag{3}
$$

where $T \in \mathbb{R}^{n \times m}$ contains the ground truth label for each time step $t$. The loss is computed by comparing the prediction of the model at step $t$ with the label at the same step of the ground truth sequence.
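As a sanity check, the trace formulation is just ordinary per-step cross-entropy. A small sketch, assuming `P` holds per-step label probabilities (e.g. after a softmax); building `T` explicitly is for clarity only:

```python
import torch

def fixed_order_loss(P, labels):
    """L = -tr(T log(P)) for a fixed ground-truth ordering (Eq. 3).

    P:      (m, n) tensor; column t holds the label probabilities at step t
    labels: length-n list of ground-truth label indices, already ordered
    """
    m, n = P.shape
    T = torch.zeros(n, m)
    T[torch.arange(n), torch.tensor(labels)] = 1.0  # T_tj = 1 iff l_t = j
    return -torch.trace(T @ torch.log(P))           # = -sum_t log P[l_t, t]
```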
For inherently orderless tasks like multi-label classification, where labels often come in random order, it becomes essential to minimize unnecessary penalization, and several approaches have been proposed in the literature. The most popular solution to improve the alignment between ground truth and predicted labels consists of defining an arbitrary criterion by which the labels are sorted, such as frequent-first, rare-first, or dictionary order. However, these methods delay convergence, as the network has to learn the arbitrary ordering in addition to predicting the correct labels given the image. Furthermore, any misalignment between the predictions and the labels will still result in a higher loss and misleading updates to the network.
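For concreteness, such a fixed ordering simply sorts each image's label set by a global criterion. A toy sketch, with made-up frequency counts:

```python
# Toy example of a fixed "rare-first" ordering (the counts are made up).
label_freq = {"person": 66808, "chair": 12774, "dog": 5508, "sports ball": 4431}
image_labels = ["dog", "person", "sports ball", "chair"]
rare_first = sorted(image_labels, key=lambda l: label_freq[l])
print(rare_first)  # ['sports ball', 'dog', 'chair', 'person']
```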

Orderless recurrent models

To alleviate the problems caused by imposing a fixed order to the labels, we propose to align them to the predictions of the network before computing the loss. We consider two different strategies to achieve this:
The first strategy, called minimal loss alignment (MLA), is computed with:

$$
\mathfrak{L} = \min_{T} -\operatorname{tr}(T \log(P)) \quad
\text{s.t.}
\begin{cases}
T_{tj} \in \{0, 1\}, & \sum_{j} T_{tj} = 1 \\
\sum_{t} T_{tj} = 1, & \forall j \in L \\
\sum_{t} T_{tj} = 0, & \forall j \notin L
\end{cases}
$$
where $T \in \mathbb{R}^{n \times m}$ is a permutation matrix, constrained so that each time step is assigned exactly one ground truth label ($\sum_{j} T_{tj} = 1$) and each label in the ground truth set $L$ is assigned to exactly one time step. The matrix $T$ is chosen in such a way as to minimize the summed cross-entropy loss. This minimization is an assignment problem and can be solved with the Hungarian algorithm.
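Concretely, the alignment can be computed with `scipy.optimize.linear_sum_assignment`. The following is a minimal sketch (function and variable names are our own; it assumes the number of decoding steps equals the number of ground truth labels):

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def mla_targets(P, gt_labels):
    """Minimal loss alignment (MLA): order the ground-truth labels so that
    the total cross-entropy is minimal, via the Hungarian algorithm.

    P:         (m, n) array; column t = label probabilities at step t
    gt_labels: list of n ground-truth label indices (unordered)
    Returns the label index to use as the target at each time step.
    """
    cost = -np.log(P[gt_labels, :])             # (n, n): cost of label i at step t
    rows, cols = linear_sum_assignment(cost.T)  # rows = time steps, cols = label slots
    targets = np.empty(len(gt_labels), dtype=int)
    targets[rows] = np.asarray(gt_labels)[cols]
    return targets
```

The aligned `targets` then replace the fixed-order label sequence in the cross-entropy loss above.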

We also consider the predicted label alignment (PLA) solution. If we predict a label which is in the set of ground truth labels for the image, then we do not wish to change it. That leads to the following optimization problem:

$$
\mathfrak{L} = \min_{T} -\operatorname{tr}(T \log(P)) \quad
\text{s.t.}
\begin{cases}
T_{tj} \in \{0, 1\}, & \sum_{j} T_{tj} = 1 \\
T_{tj} = 1, & \text{if } \hat{l}_{t} \in L \text{ and } j = \hat{l}_{t} \\
\sum_{t} T_{tj} = 1, & \forall j \in L \\
\sum_{t} T_{tj} = 0, & \forall j \notin L
\end{cases}
$$

where $\hat{l}_{t}$ is the label predicted by the model at step $t$. Here we first fix those elements of the matrix $T$ for which we know that the prediction is in the ground truth set $L$, and then apply the Hungarian algorithm to assign the remaining labels. This second approach can result in a higher loss than the first one, since there are more restrictions on the matrix $T$. Nevertheless, it is more consistent with the labels that were actually predicted by the LSTM.
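A corresponding sketch for PLA, under the same assumptions as the MLA snippet above: correctly predicted labels are pinned to their own time steps first, and the Hungarian algorithm only assigns the leftover labels to the remaining steps.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def pla_targets(P, gt_labels, predicted):
    """Predicted label alignment (PLA).

    P:         (m, n) array of per-step label probabilities
    gt_labels: list of n ground-truth label indices (unordered)
    predicted: labels the model actually predicted, one per step
    """
    n = len(gt_labels)
    targets = np.full(n, -1, dtype=int)
    remaining = list(gt_labels)
    for t, l_hat in enumerate(predicted[:n]):
        if l_hat in remaining:                 # fix T_tj = 1 for correct predictions
            targets[t] = l_hat
            remaining.remove(l_hat)
    free_steps = [t for t in range(n) if targets[t] == -1]
    if remaining:                              # Hungarian on the leftover labels
        cost = -np.log(P[np.ix_(remaining, free_steps)])
        rows, cols = linear_sum_assignment(cost.T)
        for r, c in zip(rows, cols):
            targets[free_steps[r]] = remaining[c]
    return targets
```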

To further illustrate our proposed approach to training orderless recurrent models, we consider an example image and its cost matrix (see Figure 4). The cost matrix shows the cost of assigning each label to the different time steps, computed as the negative logarithm of the probability at the corresponding time step. Although the MLA approach achieves the order that yields the lowest loss, in some cases this can cause misguided gradients, as in the example in the figure: MLA puts the label chair at time step $t_3$, although the network already predicts it at time step $t_4$. The gradients therefore force the network to output chair instead of sports ball, even though sports ball is also one of the labels.

Experiments

Convergence rate

Co-occurrence in the predictions

Comparison of different ordering methods

Comparison with the state of the art
