1. Title

Learning from Noisy Labels with Deep Neural Networks: A Survey
Author team: Korea Advanced Institute of Science and Technology (KAIST)
Song H., Kim M., Park D., et al. Learning from Noisy Labels with Deep Neural Networks: A Survey. 2020.

2. Abstract

  1. Problem restatement: frames the problem of learning with noisy labels from a supervised learning perspective;

  2. Method review: comprehensively reviews 57 state-of-the-art robust training methods, grouped into 5 categories by methodological difference, and systematically compares them on six properties to assess their strengths;

  3. Evaluation: gives an in-depth analysis of noise-rate estimation and summarizes the commonly used evaluation methodology, including public noisy datasets and evaluation metrics;

  4. Outlook: proposes several promising research directions.

3. Why Study Learning with Noisy Labels

The paper contrasts three scenarios: first, training on clean data; second, training on noisy data without regularization (Reg); third, training on noisy data with regularization. Normally we put a lot of effort into regularization (data augmentation, weight decay, dropout, batch normalization), yet the impact of label noise remains large, as the gap in the paper's figure shows.

4. Existing Related Surveys

Frenay and Verleysen [12], "Classification in the presence of label noise: A survey" (2013): classical supervised learning (covers the definition of noise, its sources, etc.; Bayes, SVM); treats noisy-label learning from a statistical learning viewpoint.

Zhang et al. [27], "Learning from crowdsourced labeled data: A survey" (2016): discusses methods for crowdsourced data (expectation-maximization (EM) algorithms). A solid survey, quite useful in engineering; crowdsourcing is a natural setting for label noise.

That survey can be read together with the weak supervision literature; the connection shows up later in its part on inferring the ground truth.

Tool: the CEKA project is available at http://ceka.sourceforge.net/

Nigam et al. [28], "Impact of noisy labels in learning techniques: A survey" (2020): limited to two aspects, the loss function and sample selection.

Han et al. [29], "A survey of label-noise representation learning: Past, present and future" (2020): summarizes the essential components of robust learning with noisy labels, but its taxonomy is entirely different from this survey's methodological one. It tells the story of noisy-label learning starting from the definition of machine learning, which makes it very helpful for understanding machine learning itself; I once read another related paper that also argues from the definition of machine learning, though I forget which one.

  1. Gives a formal definition of LNRL;
  2. Develops a deeper understanding of noisy-label training from the viewpoint of learning theory;
  3. Categorizes existing work along data, objective, and optimization, and analyzes the pros and cons of each category;
  4. Proposes new research directions;
  5. https://github.com/bhanML/label-noise-papers

That survey is also quite clearly organized:

Section 1: introduction; motivation, contributions, and the organization of the paper;

Section 2: the literature on label-noise learning, with the complete version in Appendix 1 (early stage (from 1988), emerging stage (2015), flourished stage (2019));

Section 3: overview of the survey, including the formal definition of LNRL, its core problems, and a taxonomy of existing work by data, objective, and optimization;

Section 4: methods that solve LNRL via the noise transition matrix;

Section 5: methods that modify the objective function to make LNRL feasible;

Section 6: methods that exploit properties of deep networks to tackle LNRL;

Section 7: future directions for LNRL; beyond LNRL itself, the survey also points out several promising directions;

Section 8: conclusion.

On the data side, the key object is the noise transition matrix T, which captures the relationship between clean labels and noisy labels; three families of methods use T to handle noisy labels.
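For reference, the standard formal definition of T (my restatement of common notation, not a verbatim quote from the survey): with clean label y, observed noisy label ỹ, and c classes,

```latex
% Noise transition matrix over c classes: entry (i, j) is the
% probability that a sample whose clean label is i is observed
% with noisy label j.
T_{ij} = P(\tilde{y} = j \mid y = i), \qquad
T \in [0, 1]^{c \times c}, \qquad \sum_{j=1}^{c} T_{ij} = 1 \quad \forall i.
```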

5. Preliminaries

This survey is organized mainly around a systematic methodology; [29] instead takes a general perspective (input data, objective functions, optimization policies).

This survey also compares the existing robust training methods side by side.

5.0 Supervised Learning with Label Noise

5.1 Taxonomy of Label Noise

  a. instance-independent label noise;

  b. instance-dependent label noise;

5.2 Non-Deep-Learning Methods (four categories)

  a. data cleaning;

  b. surrogate loss;

  c. probabilistic methods;

  d. model-based methods;

5.3 Theoretical Foundations

  a. Label Transition: from the data perspective, noise arises through a label transition matrix, which exposes the underlying relation between clean and noisy labels;

  b. Risk Minimization;

  c. Memorization Effect;

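To make items b and c above concrete (standard definitions, not verbatim from the survey): the learner only sees samples drawn from the noisy distribution, so the risk it minimizes is not the clean risk we actually care about; the memorization effect says DNNs fit clean, dominant patterns in early epochs before they memorize noisy samples.

```latex
% Risk minimization with noisy labels: training observes pairs
% (x, \tilde{y}) from the noisy distribution \tilde{D}, so it minimizes
\tilde{R}_L(f) = \mathbb{E}_{(x, \tilde{y}) \sim \tilde{D}}
    \big[ L(f(x), \tilde{y}) \big],
% whereas the quantity of interest is the clean risk
R_L(f) = \mathbb{E}_{(x, y) \sim D} \big[ L(f(x), y) \big].
```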
5.4 Regression with Noisy Labels

6. Deep Learning Methods

Robust training for deep learning (divided into 5 categories):

The focus is on making deep learning more robust during supervised training.

(P1) Flexibility, (P2) No Pre-training, (P3) Full Exploration, (P4) No Supervision, (P5) Heavy Noise, (P6) Complex Noise

In the paper's comparison table, a circle means fully supported, a cross means not supported, and a triangle means partially supported.

6.1 Robust Architecture

Add a noise adaptation layer on top of the DNN to learn the label transition, or develop a dedicated architecture to handle the noise.

6.1.A Noise Adaptation Layer

The principle of this method: stack an extra layer on top of the base network that models the transition from the (latent) clean label to the observed noisy label, so the combined network fits the noisy labels while the base network learns the clean distribution.

Paper: "Training deep neural-networks using a noise adaptation layer," in Proc. ICLR, 2017.

This paper handles the latent clean labels with an EM-style formulation and is fairly theory-heavy.
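To make the idea concrete, here is a minimal PyTorch sketch of a noise adaptation layer (my own illustration, not the authors' code; `NoiseAdaptationLayer`, `base_model`, and all shapes are assumptions):

```python
# A minimal sketch of a noise adaptation layer: a learnable c x c
# transition matrix stacked on top of the base classifier's softmax.
import torch
import torch.nn as nn
import torch.nn.functional as F

class NoiseAdaptationLayer(nn.Module):
    def __init__(self, num_classes: int):
        super().__init__()
        # Unconstrained logits; a row-softmax keeps each row a distribution.
        # Initialized near the identity (mostly-correct labels).
        self.transition_logits = nn.Parameter(torch.eye(num_classes) * 5.0)

    def forward(self, clean_probs: torch.Tensor) -> torch.Tensor:
        # T[i, j] ~= P(noisy label = j | clean label = i)
        T = F.softmax(self.transition_logits, dim=1)
        # Mix the clean predictions through T to predict the *noisy* label.
        return clean_probs @ T

# Training: the base model outputs clean-label probabilities; the
# adaptation layer maps them to noisy-label probabilities, which are
# matched against the observed (noisy) targets.
num_classes = 10
base_model = nn.Sequential(nn.Linear(784, 128), nn.ReLU(),
                           nn.Linear(128, num_classes), nn.Softmax(dim=1))
adapt = NoiseAdaptationLayer(num_classes)

x = torch.randn(32, 784)
noisy_y = torch.randint(0, num_classes, (32,))
noisy_probs = adapt(base_model(x))
loss = F.nll_loss(torch.log(noisy_probs + 1e-8), noisy_y)
loss.backward()
```

At test time the adaptation layer is simply discarded and the base network's predictions are used directly.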

A.1. Noise Adaptation Layer

| Year | Venue | Title | Implementation |
|------|-------|-------|----------------|
| 2015 | ICCV | Webly supervised learning of convolutional networks | Official (Caffe) |
| 2015 | ICLRW | Training convolutional networks with noisy labels | Unofficial (Keras) |
| 2016 | ICDM | Learning deep networks from noisy labels with dropout regularization | Official (MATLAB) |
| 2016 | ICASSP | Training deep neural-networks based on unreliable labels | Unofficial (Chainer) |
| 2017 | ICLR | Training deep neural-networks using a noise adaptation layer | Official (Keras) |

A.2. Dedicated Architecture

| Year | Venue | Title | Implementation | Notes |
|------|-------|-------|----------------|-------|
| 2015 | CVPR | Learning from massive noisy labeled data for image classification | Official (Caffe) | runs two separate networks |
| 2018 | NeurIPS | Masking: A new perspective of noisy supervision | Official (TensorFlow) | human-assisted approach |
| 2018 | TIP | Deep learning from noisy image labels with quality embedding | N/A | |
| 2019 | ICML | Robust inference via generative classifiers for handling noisy labels | Official (PyTorch) | |

6.2 Robust Regularization

B.1. Explicit Regularization

| Year | Venue | Title | Implementation |
|------|-------|-------|----------------|
| 2018 | ECCV | Deep bilevel learning | Official (TensorFlow) |
| 2019 | CVPR | Learning from noisy labels by regularized estimation of annotator confusion | Official (TensorFlow) |
| 2019 | ICML | Using pre-training can improve model robustness and uncertainty | Official (PyTorch) |
| 2020 | ICLR | Can gradient clipping mitigate label noise? | Unofficial (PyTorch) |
| 2020 | ICLR | Wasserstein adversarial regularization (WAR) on label noise | N/A |
| 2021 | ICLR | Robust early-learning: Hindering the memorization of noisy labels | Official (PyTorch) |

B.2. Implicit Regularization

| Year | Venue | Title | Implementation |
|------|-------|-------|----------------|
| 2015 | ICLR | Explaining and harnessing adversarial examples | Unofficial (PyTorch) |
| 2017 | ICLRW | Regularizing neural networks by penalizing confident output distributions | Unofficial (PyTorch) |
| 2018 | ICLR | Mixup: Beyond empirical risk minimization | Official (PyTorch) |
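Mixup (the last row above) fits in a few lines. A minimal PyTorch sketch, assuming a standard classifier trained with cross-entropy (my illustration, not the official implementation; `mixup_step` and its defaults are mine):

```python
# Minimal mixup sketch: convex-combine pairs of inputs and their labels;
# this implicitly regularizes the network and dampens noise memorization.
import torch
import torch.nn.functional as F
from torch.distributions import Beta

def mixup_step(model, x, y, alpha: float = 0.2):
    lam = Beta(alpha, alpha).sample().item()    # mixing coefficient
    perm = torch.randperm(x.size(0))            # random pairing of samples
    x_mix = lam * x + (1.0 - lam) * x[perm]     # blend the inputs
    logits = model(x_mix)
    # Blending the two CE terms is equivalent to blending one-hot labels.
    loss = lam * F.cross_entropy(logits, y) \
         + (1.0 - lam) * F.cross_entropy(logits, y[perm])
    return loss
```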

6.3 Robust Loss Function

| Year | Venue | Title | Implementation |
|------|-------|-------|----------------|
| 2017 | AAAI | Robust loss functions under label noise for deep neural networks | N/A |
| 2019 | ICCV | Symmetric cross entropy for robust learning with noisy labels | Official (Keras) |
| 2018 | NeurIPS | Generalized cross entropy loss for training deep neural networks with noisy labels | Unofficial (PyTorch) |
| 2020 | ICLR | Curriculum loss: Robust learning and generalization against label corruption | N/A |
| 2020 | ICML | Normalized loss functions for deep learning with noisy labels | Official (PyTorch) |
| 2020 | ICML | Peer loss functions: Learning from noisy labels without knowing noise rates | Official (PyTorch) |
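As one concrete example from this table, the generalized cross entropy (GCE) loss (the NeurIPS 2018 row) interpolates between cross-entropy (q → 0) and a robust MAE-like loss (q = 1). A minimal PyTorch sketch (mine, not the official code; q = 0.7 follows the paper's default):

```python
# Generalized cross entropy: L_q(f(x), y) = (1 - p_y^q) / q, where p_y is
# the softmax probability assigned to the given (possibly noisy) label y.
import torch
import torch.nn.functional as F

def gce_loss(logits: torch.Tensor, targets: torch.Tensor, q: float = 0.7):
    probs = F.softmax(logits, dim=1)
    p_y = probs.gather(1, targets.unsqueeze(1)).squeeze(1)  # p of given label
    return ((1.0 - p_y.pow(q)) / q).mean()

# Usage: replace F.cross_entropy(logits, y) with gce_loss(logits, y).
```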

6.4 Loss Adjustment

These methods adjust the loss function during training.

D.1. Loss Correction

| Year | Venue | Title | Implementation |
|------|-------|-------|----------------|
| 2017 | CVPR | Making deep neural networks robust to label noise: A loss correction approach | Official (Keras) |
| 2018 | NeurIPS | Using trusted data to train deep networks on labels corrupted by severe noise | Official (PyTorch) |
| 2019 | NeurIPS | Are anchor points really indispensable in label-noise learning? | Official (PyTorch) |
| 2020 | NeurIPS | Dual T: Reducing estimation error for transition matrix in label-noise learning | N/A |
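The CVPR 2017 row popularized forward correction: push the model's clean-label prediction through an estimated transition matrix before computing the loss against the noisy label. A minimal sketch (my illustration; how `T_hat` is estimated, e.g. from anchor points, is a separate step):

```python
# Forward loss correction: given an estimated transition matrix T_hat
# (row i = P(noisy = . | clean = i)), train against noisy labels by
# pushing the clean predictions through T_hat first.
import torch
import torch.nn.functional as F

def forward_corrected_loss(logits, noisy_targets, T_hat):
    clean_probs = F.softmax(logits, dim=1)   # model's clean-label belief
    noisy_probs = clean_probs @ T_hat        # predicted noisy-label dist.
    return F.nll_loss(torch.log(noisy_probs + 1e-8), noisy_targets)
```

Unlike the noise adaptation layer in 6.1, here T_hat is estimated once and held fixed rather than learned end-to-end.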

D.2. Loss Reweighting

| Year | Venue | Title | Implementation |
|------|-------|-------|----------------|
| 2017 | TNNLS | Multiclass learning with partially corrupted labels | Unofficial (PyTorch) |
| 2017 | NeurIPS | Active Bias: Training more accurate neural networks by emphasizing high variance samples | Unofficial (TensorFlow) |

D.3. Label Refurbishment

| Year | Venue | Title | Implementation |
|------|-------|-------|----------------|
| 2015 | ICLR | Training deep neural networks on noisy labels with bootstrapping | Unofficial (Keras) |
| 2018 | ICML | Dimensionality-driven learning with noisy labels | Official (Keras) |
| 2019 | ICML | Unsupervised label noise modeling and loss correction | Official (PyTorch) |
| 2020 | NeurIPS | Self-adaptive training: beyond empirical risk minimization | Official (PyTorch) |
| 2020 | ICML | Error-bounded correction of noisy labels | Official (PyTorch) |
| 2021 | AAAI | Beyond class-conditional assumption: A primary attempt to combat instance-dependent label noise | Official (PyTorch) |
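Label refurbishment replaces the given label with a convex combination of the (possibly noisy) label and the model's own prediction, as in the bootstrapping paper (the ICLR 2015 row). A minimal sketch of the soft variant (mine; detaching the prediction and treating it as a fixed target is a simplification):

```python
# Soft bootstrapping sketch: the training target becomes
#   t = beta * one_hot(noisy_y) + (1 - beta) * model_prediction,
# so confident model predictions gradually refurbish noisy labels.
import torch
import torch.nn.functional as F

def bootstrap_loss(logits, noisy_targets, num_classes: int, beta: float = 0.8):
    probs = F.softmax(logits, dim=1)
    one_hot = F.one_hot(noisy_targets, num_classes).float()
    # detach(): treat the current prediction as a fixed soft target
    refurbished = beta * one_hot + (1.0 - beta) * probs.detach()
    return -(refurbished * F.log_softmax(logits, dim=1)).sum(dim=1).mean()
```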

D.4. Meta Learning

| Year | Venue | Title | Implementation |
|------|-------|-------|----------------|
| 2017 | NeurIPSW | Learning to learn from weak supervision by full supervision | Unofficial (TensorFlow) |
| 2017 | ICCV | Learning from noisy labels with distillation | N/A |
| 2018 | ICML | Learning to reweight examples for robust deep learning | Official (TensorFlow) |
| 2019 | NeurIPS | Meta-Weight-Net: Learning an explicit mapping for sample weighting | Official (PyTorch) |
| 2020 | CVPR | Distilling effective supervision from severe label noise | Official (TensorFlow) |
| 2021 | AAAI | Meta label correction for noisy label learning | Official (PyTorch) |

6.5 Sample Selection

Identify true-labeled samples from the noisy training data through multi-network or multi-round learning.

E.1. Multi-network Learning

| Year | Venue | Title | Implementation |
|------|-------|-------|----------------|
| 2017 | NeurIPS | Decoupling when to update from how to update | Official (TensorFlow) |
| 2018 | ICML | MentorNet: Learning data-driven curriculum for very deep neural networks on corrupted labels | Official (TensorFlow) |
| 2018 | NeurIPS | Co-teaching: Robust training of deep neural networks with extremely noisy labels | Official (PyTorch) |
| 2019 | ICML | How does disagreement help generalization against label corruption? | Official (PyTorch) |
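Co-teaching (the NeurIPS 2018 row) builds directly on the memorization effect from Section 5.3: small-loss samples are likely clean, so each of two peer networks selects its small-loss subset and its peer trains on it. A one-step sketch (my illustration; in practice `forget_rate` is gradually ramped up toward the estimated noise rate):

```python
# Co-teaching sketch: each network keeps the (1 - forget_rate) fraction
# of samples with the smallest loss (likely clean), and the *other*
# network is updated on that subset.
import torch
import torch.nn.functional as F

def coteaching_step(net1, net2, opt1, opt2, x, y, forget_rate: float):
    n_keep = int((1.0 - forget_rate) * x.size(0))

    with torch.no_grad():  # rank samples by per-example loss
        loss1 = F.cross_entropy(net1(x), y, reduction='none')
        loss2 = F.cross_entropy(net2(x), y, reduction='none')
    idx_for_net2 = torch.argsort(loss1)[:n_keep]  # net1's clean picks
    idx_for_net1 = torch.argsort(loss2)[:n_keep]  # net2's clean picks

    opt1.zero_grad()
    F.cross_entropy(net1(x[idx_for_net1]), y[idx_for_net1]).backward()
    opt1.step()

    opt2.zero_grad()
    F.cross_entropy(net2(x[idx_for_net2]), y[idx_for_net2]).backward()
    opt2.step()
```

Cross-feeding the selections keeps the two networks from reinforcing their own selection errors, which is the motivation over single-network small-loss selection.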

E.2. Multi-round Learning

| Year | Venue | Title | Implementation |
|------|-------|-------|----------------|
| 2018 | CVPR | Iterative learning with open-set noisy labels | Official (Keras) |
| 2019 | ICML | Learning with bad training data via iterative trimmed loss minimization | Official (GluonCV) |
| 2019 | ICML | Understanding and utilizing deep neural networks trained with noisy labels | Official (Keras) |
| 2019 | ICCV | O2U-Net: A simple noisy label detection approach for deep neural networks | Unofficial (PyTorch) |
| 2020 | ICMLW | How does early stopping help generalization against label noise? | Official (TensorFlow) |
| 2020 | NeurIPS | A topological filter for learning with label noise | Official (PyTorch) |

E.3. Hybrid Learning

| Year | Venue | Title | Implementation |
|------|-------|-------|----------------|
| 2019 | ICML | SELFIE: Refurbishing unclean samples for robust deep learning | Official (TensorFlow) |
| 2020 | ICLR | SELF: Learning to filter noisy labels with self-ensembling | N/A |
| 2020 | ICLR | DivideMix: Learning with noisy labels as semi-supervised learning | Official (PyTorch) |
| 2021 | ICLR | Robust curriculum learning: from clean label detection to noisy label self-correction | N/A |

7. Datasets

8. Summary

Weakly supervised learning, noisy-label learning, and active learning all share the same starting point: solving the data/annotation problem. Weak supervision aims to automatically label unannotated data and then softly merge those annotations; noisy-label learning deals with the noise in the labels that get produced; active learning trains a model on the already-labeled data to estimate the unlabeled samples, aiming to let as few labeled samples as possible stand in for the whole dataset.

One observation: much of this work targets images; is natural language worth exploring?

9. References

Paper: https://arxiv.org/pdf/2007.08199.pdf
Resources: https://github.com/songhwanjun/Awesome-Noisy-Labels

Appendix:

LNRL: Label-Noise Representation Learning

LNSL: Label-Noise Statistical Learning

Surrogate loss function: when the target function is non-convex or discontinuous, its mathematical properties are poor and it is hard to optimize, so it is replaced by another function with better properties.
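A classic illustration (mine, not from the survey): the 0-1 classification loss is non-convex and non-differentiable, so SVMs optimize the convex hinge loss as a surrogate.

```latex
% 0-1 loss (the true objective) vs. hinge loss (the surrogate SVMs use):
\ell_{0\text{-}1}\big(y, f(x)\big) = \mathbb{1}\big[\, y f(x) \le 0 \,\big],
\qquad
\ell_{\mathrm{hinge}}\big(y, f(x)\big) = \max\big(0,\ 1 - y f(x)\big),
\qquad y \in \{-1, +1\}.
```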

ICCV: IEEE International Conference on Computer Vision

ICDM: IEEE International Conference on Data Mining
ICASSP: IEEE International Conference on Acoustics, Speech and Signal Processing
ICLR: International Conference on Learning Representations
