Attention Transfer

2024-06-04 04:10:57

Paying More Attention to Attention: Improving the Performance of Convolutional Neural Networks via Attention Transfer

Motivation

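Many papers have shown that attention plays an important role in both CV and NLP. This paper therefore uses attention for knowledge distillation (KD): the student learns to mimic the teacher's attention maps.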

Activation-based attention transfer

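How to define the spatial attention map: each option below maps an activation tensor A of shape C×H×W to an H×W map F(A).

Sum of absolute values across channels at each position: $F_{sum}(A) = \sum_{i=1}^{C} |A_i|$
Sum of absolute values raised to the power p across channels: $F_{sum}^p(A) = \sum_{i=1}^{C} |A_i|^p$. Compared with the first option, this puts more weight on strongly activated positions.
Max of absolute values raised to the power p across channels: $F_{max}^p(A) = \max_{i} |A_i|^p$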

Each of the three mappings emphasizes something different; the last two put more weight on positions with especially strong responses.
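As an illustration, the three mappings can be sketched in a few lines of plain Python. The list-of-channels layout and the helper `share_of_peak` are assumptions made for this sketch, not part of the paper; a real implementation would operate on framework tensors.

```python
# Sketch of the three activation-to-attention mappings.
# Assumption (for illustration only): an activation tensor A of shape C x H x W
# is stored as a list of C channels, each a flat row-major list of H*W values.


def f_sum(A):
    """Sum over channels of |A_c| at each spatial position."""
    return [sum(abs(ch[i]) for ch in A) for i in range(len(A[0]))]


def f_sum_p(A, p=2):
    """Sum over channels of |A_c|**p; p > 1 boosts strongly activated positions."""
    return [sum(abs(ch[i]) ** p for ch in A) for i in range(len(A[0]))]


def f_max_p(A, p=2):
    """Max over channels of |A_c|**p at each spatial position."""
    return [max(abs(ch[i]) ** p for ch in A) for i in range(len(A[0]))]


def share_of_peak(att):
    """Hypothetical helper: fraction of total attention mass at the strongest position."""
    return max(att) / sum(att)


if __name__ == "__main__":
    # Toy activation: C = 2 channels, H * W = 3 positions.
    A = [[1.0, -2.0, 0.5],
         [3.0, 0.0, -0.5]]
    print(f_sum(A))     # [4.0, 2.0, 1.0]
    print(f_sum_p(A))   # [10.0, 4.0, 0.5]
    print(f_max_p(A))   # [9.0, 4.0, 0.25]
    print(share_of_peak(f_sum(A)), share_of_peak(f_sum_p(A)))
```

On the toy input, the strongest position holds 4/7 of the total mass under the plain sum but 10/14.5 under the sum of squares, which is the "focus on high responses" effect described above.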

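The final loss, where $\mathcal{L}(W_S, x)$ is the student's standard classification loss and $\mathcal{I}$ is the set of student/teacher layer pairs whose attention maps are transferred:

$$\mathcal{L}_{AT} = \mathcal{L}(W_S, x) + \frac{\beta}{2} \sum_{j \in \mathcal{I}} \left\| \frac{Q_S^j}{\|Q_S^j\|_2} - \frac{Q_T^j}{\|Q_T^j\|_2} \right\|_p$$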

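Here $Q_S^j$ and $Q_T^j$ are the j-th pair of student and teacher attention maps in vectorized form ($Q^j = \mathrm{vec}(F(A^j))$); the paper uses the norm type p = 2 in its experiments.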

β is set to 1000. The second term is averaged over all positions of the attention map, so overall its effective weight is around 0.1.
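A minimal sketch of the activation-based AT objective for a single input: normalize each vectorized attention map by its L2 norm, take the L2 distance between the student and teacher maps, weight it by β/2 and add it to the task loss. Dividing β by the number of positions is this sketch's reading of "averaged over all positions"; the function names are hypothetical.

```python
import math

# Sketch of the activation-based AT objective for one input, in pure Python.
# Assumptions: attention maps are already vectorized (flat lists, e.g. produced
# by a sum-of-squares mapping), and "averaged over all positions" is realized by
# dividing beta by the number of positions in each map.


def l2_normalize(q):
    """Divide a vectorized attention map by its L2 norm (all-zero maps pass through)."""
    norm = math.sqrt(sum(v * v for v in q))
    return [v / norm for v in q] if norm > 0 else list(q)


def at_distance(q_s, q_t):
    """L2 distance between the L2-normalized student and teacher maps."""
    a, b = l2_normalize(q_s), l2_normalize(q_t)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def at_loss(task_loss, pairs, beta=1000.0):
    """task_loss + sum over (student, teacher) pairs of (beta / n) / 2 * distance."""
    total = task_loss
    for q_s, q_t in pairs:
        if len(q_s) != len(q_t):
            raise ValueError("student and teacher maps must have the same spatial size")
        total += (beta / len(q_s)) / 2 * at_distance(q_s, q_t)
    return total
```

Because each map is L2-normalized first, a teacher map that is just a scaled copy of the student map gives zero penalty: the loss compares where the networks focus, not how large their activations are.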

Gradient-based attention transfer

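Here attention means the network's sensitivity to the input at each position: for example, perturb the pixels at some positions and watch how the output changes. If the output changes a lot when certain positions are perturbed, the network is "paying attention" to those positions. Concretely, the paper measures this with the gradient of the loss with respect to the input, $J = \partial \mathcal{L} / \partial x$, and trains the student to match the teacher's gradient.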
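The perturbation idea can be sketched directly with finite differences: nudge each input position up and down, measure the change in the output, and compare the resulting sensitivity vectors of student and teacher with a penalty of the form β/2 ||J_S − J_T||_2. Finite differences and the toy loss functions are stand-ins chosen for this sketch; in practice the input gradient would come from backpropagation.

```python
import math

# Sketch of gradient-based attention: how sensitive the output is to each input
# position. Assumptions: finite differences stand in for backpropagation, and
# the toy "loss" functions below are hypothetical placeholders for real networks.


def input_sensitivity(f, x, eps=1e-4):
    """Central finite-difference estimate of df/dx_i for each input position i."""
    grads = []
    for i in range(len(x)):
        up = list(x)
        up[i] += eps
        down = list(x)
        down[i] -= eps
        grads.append((f(up) - f(down)) / (2 * eps))
    return grads


def gradient_at_penalty(f_student, f_teacher, x, beta=1.0):
    """beta / 2 * ||J_S - J_T||_2, where J is the input-sensitivity vector."""
    j_s = input_sensitivity(f_student, x)
    j_t = input_sensitivity(f_teacher, x)
    return beta / 2 * math.sqrt(sum((a - b) ** 2 for a, b in zip(j_s, j_t)))


if __name__ == "__main__":
    # Hypothetical toy losses over a 2-pixel "image".
    def teacher(x):
        return 3.0 * x[0] + 0.1 * x[1] ** 2  # mostly sensitive to pixel 0

    def student(x):
        return x[0] + x[1]  # equally sensitive to both pixels

    x = [1.0, 2.0]
    print(input_sensitivity(teacher, x))  # approximately [3.0, 0.4]
    print(gradient_at_penalty(student, teacher, x))
```

The penalty is zero only when the student reacts to input perturbations the same way the teacher does, position by position.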

Experiments

Methods compared include activation-based AT and F-ActT (similar to FitNets: a 1x1 convolution adapts the student features, followed by an L2 loss).
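For comparison, an F-ActT-style baseline can be sketched as a 1x1 convolution (a per-position linear map across channels) followed by an L2 loss against the teacher features. Here "L2 loss" is taken to mean the squared Euclidean distance, and the adaptation weights are fixed rather than learned; both are simplifying assumptions, and the function names are hypothetical.

```python
# Sketch of an F-ActT-style (FitNets-like) feature loss in pure Python.
# Assumptions: features are lists of channels, each a flat list of H*W values;
# the 1x1 adaptation weights W would normally be learned, here they are fixed.


def conv1x1(A, W):
    """1x1 convolution without bias: out[o][i] = sum_c W[o][c] * A[c][i].

    A has C_in channels; W is a C_out x C_in matrix, so each spatial position
    is mapped independently from C_in to C_out channels.
    """
    n_pos = len(A[0])
    return [
        [sum(w_oc * A[c][i] for c, w_oc in enumerate(W[o])) for i in range(n_pos)]
        for o in range(len(W))
    ]


def f_actt_loss(A_s, A_t, W):
    """Squared L2 distance between adapted student features and teacher features."""
    adapted = conv1x1(A_s, W)
    if len(adapted) != len(A_t):
        raise ValueError("W must map student channels to the teacher's channel count")
    return sum(
        (a - t) ** 2
        for ch_a, ch_t in zip(adapted, A_t)
        for a, t in zip(ch_a, ch_t)
    )
```

Unlike AT, this loss has to match every C×H×W value, which is why the student features first need an adaptation layer to reach the teacher's channel count.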

Among the attention mappings, the sum of squares (p = 2) works best.

Activation-based AT works better than gradient-based AT.

In addition, on the Scenes dataset AT does much better than conventional KD. The authors' explanation: "we speculate is due to importance of intermediate attention for fine-grained recognition".

This looks like a mistake by the authors: CUB, not Scenes, is the fine-grained dataset here.

Important

KD struggles to work if teacher and student have different architecture/depth (we observe the same on CIFAR), so we tried using the same architecture and depth for attention transfer.

We also could not find applications of FitNets, KD or similar methods on ImageNet in the literature. Given that, we can assume that proposed activation-based AT is the first knowledge transfer method to be successfully applied on ImageNet.
