Reposted from a WeChat official account
Original link: https://mp.weixin.qq.com/s?__biz=Mzg4MjgxMjgyMg==&mid=2247486049&idx=1&sn=1d98375dcbb9d0d68e8733f2dd0a2d40&chksm=cf51b898f826318ead24e414144235cfd516af4abb71190aeca42b1082bd606df6973eb963f0#rd

OpenAI Self-Supervised Learning Notes


Table of Contents

  • OpenAI Self-Supervised Learning Notes
    • Outline
    • Introduction
      • What is self-supervised learning?
      • What's Possible with Self-Supervised Learning?
    • Early Work
      • Early Work: Connecting the Dots
      • Restricted Boltzmann Machines
      • Autoencoder: Self-Supervised Learning for Vision in Early Days
      • Word2Vec: Self-Supervised Learning for Language
      • Autoregressive Modeling
      • Siamese Networks
      • Multiple Instance Learning & Metric Learning
    • Methods
      • Methods for Framing Self-Supervised Learning Tasks
      • Self-Prediction
      • Self-prediction: Autoregressive Generation
      • Self-Prediction: Masked Generation
      • Self-Prediction: Innate Relationship Prediction
      • Self-Prediction: Hybrid Self-Prediction Models
      • Contrastive Learning
      • Contrastive Learning: Inter-Sample Classification
        • Loss function 1: Contrastive loss
        • Loss function 2: Triplet loss
        • Loss function 3: N-pair loss
        • Loss function 4: Lifted structured loss
        • Loss function 5: Noise Contrastive Estimation (NCE)
        • Loss function 6: InfoNCE
        • Loss function 7: Soft-Nearest Neighbors Loss
      • Contrastive Learning: Feature Clustering
      • Contrastive Learning: Multiview Coding
      • Contrastive Learning between Modalities
    • Pretext tasks
      • Recap: Pretext Tasks
      • Pretext Tasks: Taxonomy
      • Image / Vision Pretext Tasks
        • Image Pretext Tasks: Variational AutoEncoders
        • Image Pretext Tasks: Generative Adversarial Networks
        • Vision Pretext Tasks: Autoregressive Image Generation
        • Vision Pretext Tasks: Diffusion Model
        • Vision Pretext Tasks: Masked Prediction
        • Vision Pretext Tasks: Colorization and More
        • Vision Pretext Tasks: Innate Relationship Prediction
        • Contrastive Predictive Coding and InfoNCE
        • Vision Pretext Tasks: Inter-Sample Classification
        • Vision Pretext Tasks: Contrastive Learning
        • Vision Pretext Tasks: Data Augmentation and Multiple Views
        • Vision Pretext Tasks: Inter-Sample Classification
          • MoCo
          • SimCLR
          • Barlow Twins
        • Vision Pretext Tasks: Non-Contrastive Siamese Networks
        • Vision Pretext Tasks: Feature Clustering with K-Means
        • Vision Pretext Tasks: Feature Clustering with Sinkhorn-Knopp
        • Vision Pretext Tasks: Feature Clustering to improve SSL
        • Vision Pretext Tasks: Nearest-Neighbor
        • Vision Pretext Tasks: Combining with Supervised Loss
      • Video Pretext Tasks
        • Video Pretext Tasks: Innate Relationship Prediction
        • Video Pretext Tasks: Optical Flow
        • Video Pretext Tasks: Sequence Ordering
        • Video Pretext Tasks: Colorization
        • Video Pretext Tasks: Contrastive Multi-View Learning
        • Video Pretext Task: Autoregressive Generation
      • Audio Pretext Tasks
        • Audio Pretext Tasks: Contrastive Learning
        • Audio Pretext Task: Masked Language Modeling for ASR
      • Multimodal Pretext Tasks
      • Language Pretext Tasks
        • Language Pretext Tasks: Generative Language Modeling
        • Language Pretext Tasks: Sentence Embedding
    • Training Techniques
      • Techniques: Data augmentation
        • Techniques: Data augmentation -- Image Augmentation
        • Techniques: Data augmentation -- Text Augmentation
      • Hard Negative Mining
        • What is "hard negative mining"
        • Explicit hard negative mining
        • Implicit hard negative mining
    • Theories
      • Contrastive learning captures shared information between views
      • The InfoMin Principle
      • Alignment and Uniformity on the Hypersphere
      • Dimensional Collapse
      • Provable Guarantees for Contrastive Learning
    • Future Directions
      • Future Directions

Video: https://www.youtube.com/watch?v=7l6fttRJzeU
Slides: https://nips.cc/media/neurips-2021/Slides/21895.pdf

Self-Supervised Learning
– Self-Prediction and Contrastive Learning

  • Self-Supervised Learning

    • a popular paradigm of representation learning

Outline

  • Introduction: motivation, basic concepts, examples
  • Early Work: a look at connections with earlier methods
  • Methods
    • Self-prediction
    • Contrastive Learning
    • (for each subsection, present the framework and categorization)
  • Pretext tasks: a broad literature review
  • Techniques: improving training efficiency

Introduction

What is self-supervised learning and why do we need it?

What is self-supervised learning?

  • Self-supervised learning (SSL):

    • a special type of representation learning that enables learning good data representations from unlabelled datasets
  • Motivation:
    • the idea of constructing supervised learning tasks out of unsupervised datasets

    • Why?

      ✅ Data labeling is expensive, so high-quality labeled datasets are limited

      ✅ Learning good representations makes it easier to transfer useful information to a variety of downstream tasks ⇒ e.g. few-shot learning / zero-shot transfer to new tasks

Self-supervised learning tasks are also known as pretext tasks

What’s Possible with Self-Supervised Learning?

  • Video Colorization (Vondrick et al. 2018)

    • a self-supervised learning method

    • resulting in a rich representation

    • can be used for video segmentation + unlabelled visual region tracking, without extra fine-tuning

    • just label the first frame

  • Zero-shot CLIP (Radford et al. 2021)

    • Despite not being trained on supervised labels,

    • the zero-shot CLIP classifier achieves great performance on challenging image classification tasks

Early Work

Precursors to recent self-supervised approaches

Early Work: Connecting the Dots

Some ideas:

  • Restricted Boltzmann Machines

  • Autoencoders

  • Word2Vec

  • Autoregressive Modeling

  • Siamese networks

  • Multiple Instance / Metric Learning

Restricted Boltzmann Machines

  • RBM:

    • a special case of a Markov random field

    • consisting of visible units and hidden units

    • has connections between every pair of visible and hidden units, but no connections within each group (the standard energy function is sketched below)
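
For reference, the standard RBM energy function (notation assumed here, not taken from the slides), with visible units $v$, hidden units $h$, connection weights $W$, and biases $a$, $b$:

$$E(v, h) = -a^\top v - b^\top h - v^\top W h, \qquad p(v, h) = \frac{e^{-E(v, h)}}{Z}$$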

Autoencoder: Self-Supervised Learning for Vision in Early Days

  • Autoencoder: a precursor to the modern self-supervised approaches

    • Such as the Denoising Autoencoder (its objective is sketched below)
  • Has inspired many self-supervised approaches in later years
    • such as masked language models (e.g. BERT) and MAE
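
A minimal sketch of the denoising autoencoder objective (notation assumed here): the encoder $f$ and decoder $g$ are trained to reconstruct the clean input $x$ from a corrupted copy $\tilde{x}$:

$$\mathcal{L}_{\text{DAE}} = \mathbb{E}\big[\, \| x - g(f(\tilde{x})) \|^2 \,\big]$$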

Word2Vec: Self-Supervised Learning for Language

  • Word embeddings map words to vectors

    • extracting features of words
  • Idea:
    • the sum of the neighboring word embeddings is predictive of the word in the middle (see the sketch below)

  • An interesting phenomenon resulting from Word2Vec:

    • you can observe linear substructure in the embedding space: the lines connecting comparable concepts, such as corresponding masculine and feminine words, are roughly parallel
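
A sketch of the CBOW-style objective described above (notation assumed here, not from the slides): the averaged embeddings of the $2m$ context words predict the middle word $w_t$ through a softmax over the vocabulary:

$$p(w_t \mid \text{context}) = \operatorname{softmax}\Big(U \cdot \frac{1}{2m} \sum_{-m \le j \le m,\, j \ne 0} v_{w_{t+j}}\Big)$$

where $v_w$ are the input word embeddings and $U$ is the output embedding matrix.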

Autoregressive Modeling

  • Autoregressive model:

    • Autoregressive (AR) models are a class of time series models in which the value at a given time step is modeled as a linear function of previous values (see the formulation sketched after this list)

    • NADE: Neural Autoregressive Distribution Estimator

  • Autoregressive models have also been the basis for many self-supervised methods such as GPT
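
As a sketch of the classical formulation (notation assumed here), an AR(p) model writes the value at time $t$ as a linear function of the previous $p$ values plus noise:

$$x_t = c + \sum_{i=1}^{p} \varphi_i \, x_{t-i} + \varepsilon_t$$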

Siamese Networks

Many contrastive self-supervised learning methods use a pair of neural networks and learn from the difference between their outputs
– this idea can be traced back to Siamese networks

  • Self-organizing neural networks

    • where two neural networks take separate but related parts of the input, and learn to maximize the agreement between the two outputs
  • Siamese Networks
    • if you believe that a network f can encode x well and produce a good representation f(x)

    • then, for two different inputs $x_1$ and $x_2$, their distance can be defined as $d(x_1, x_2) = L(f(x_1), f(x_2))$

    • the idea of running two identical CNNs on two different inputs and then comparing their outputs is known as a Siamese network

    • Trained by (see the code sketch after this list):

      ✅ If $x_i$ and $x_j$ are the same person, $\|f(x_i) - f(x_j)\|$ is small

      ✅ If $x_i$ and $x_j$ are different people, $\|f(x_i) - f(x_j)\|$ is large
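
A minimal NumPy sketch of this pairwise training idea (function name, margin value, and embedding size are illustrative assumptions, not from the slides): the distance between embeddings is pulled down for matching pairs and pushed above a margin for non-matching pairs.

```python
import numpy as np

def pairwise_siamese_loss(f_xi, f_xj, same_person, margin=1.0):
    """Contrastive-style loss for one pair of embeddings f(x_i), f(x_j)."""
    d = np.linalg.norm(f_xi - f_xj)           # Euclidean distance ||f(x_i) - f(x_j)||
    if same_person:
        return d ** 2                         # pull matching pairs together
    return max(0.0, margin - d) ** 2          # push non-matching pairs at least `margin` apart

# Toy usage with random 128-d embeddings
rng = np.random.default_rng(0)
a, b = rng.normal(size=128), rng.normal(size=128)
print(pairwise_siamese_loss(a, b, same_person=True))
print(pairwise_siamese_loss(a, b, same_person=False))
```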

Multiple Instance Learning & Metric Learning

Predecessors of the predecessors of the recent contrastive learning techniques: multiple instance learning and metric learning

  • deviate from the typical framework of empirical risk minimization

    • define the objective function in terms of multiple samples from the dataset ⇒ multiple instance learning
  • early work:

    • centered around non-linear dimensionality reduction
    • e.g., multi-dimensional scaling and locally linear embedding
    • advantage over PCA: these methods can preserve the local structure of data samples
  • Metric learning (see the formulas sketched after this list):

    • $x$ and $y$: two samples
    • $A$: a learnable positive semi-definite matrix
  • Contrastive loss:

    • uses a spring-system analogy to decrease the distance between inputs of the same type and increase the distance between inputs of different types
  • Triplet loss

    • another way to obtain a learned metric
    • defined using 3 data points
    • anchor, positive and negative
    • the anchor is trained to be similar to the positive and dissimilar to the negative
  • N-pair loss:

    • a generalization of the triplet loss
    • recent contrastive learning methods use the N-pair loss as a prototype (see the loss sketches below)
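
For reference, a sketch of the learned metric and the two losses mentioned above, with notation assumed here (anchor $x$, positive $x^+$, negatives $x^-_i$, margin $\epsilon$) rather than taken from the slides:

$$d_A(x, y) = \sqrt{(x - y)^\top A \,(x - y)}, \qquad A \succeq 0$$

$$\mathcal{L}_{\text{triplet}} = \max\Big(0,\ \|f(x) - f(x^+)\|_2^2 - \|f(x) - f(x^-)\|_2^2 + \epsilon\Big)$$

$$\mathcal{L}_{\text{N-pair}} = -\log \frac{\exp\big(f(x)^\top f(x^+)\big)}{\exp\big(f(x)^\top f(x^+)\big) + \sum_{i=1}^{N-1} \exp\big(f(x)^\top f(x^-_i)\big)}$$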

Methods

  • Self-prediction
  • Contrastive learning

Methods for Framing Self-Supervised Learning Tasks

  • Self-prediction: Given an individual data sample, the task is to predict one part of the sample given the other part

    • i.e., "intra-sample" prediction

We pretend that the part to be predicted is missing

  • Contrastive learning: Given multiple data samples, the task is to predict the relationship among them

    • the relationship can be based on inherent logic within the data

      ✅ such as different camera views of the same scene

      ✅ or on multiple augmented versions of the same sample

The multiple samples can be selected from the dataset based on some known logic (e.g., the order of words / sentences), or fabricated by altering the original version
i.e., we know the true relationship between samples but pretend not to know it

Self-Prediction

  • Self-prediction constructs prediction tasks within every individual data sample

    • to predict a part of the data from the rest while pretending we don’t know that part

    • The following figure demonstrates how flexible and diverse the options for constructing self-prediction learning tasks are

      ✅ can mask any dimensions

  • Categories:

    • Autoregressive generation
    • Masked generation
    • Innate relationship prediction
    • Hybrid self-prediction

Self-prediction: Autoregressive Generation

  • The autoregressive model predicts future behavior based on past behavior

    • Any data that comes with an innate sequential order can be modeled with autoregression (see the factorization sketched after this list)
  • Examples:

    • Audio (WaveNet, WaveRNN)
    • Autoregressive language modeling (GPT, XLNet)
    • Images in raster scan order (PixelCNN, PixelRNN, iGPT)
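
As a sketch, the shared objective behind these examples (standard notation, assumed here): an ordered sequence $x = (x_1, \dots, x_T)$ is factorized autoregressively and trained with maximum likelihood:

$$p_\theta(x) = \prod_{t=1}^{T} p_\theta(x_t \mid x_{<t}), \qquad \mathcal{L} = -\sum_{t=1}^{T} \log p_\theta(x_t \mid x_{<t})$$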

Self-Prediction: Masked Generation

  • mask a random portion of the information and pretend it is missing, irrespective of the natural sequence

    • The model learns to predict the missing portion given the other, unmasked information (a minimal masking sketch follows this list)
  • e.g.,

    • predicting randomly masked words based on the other words in the same context
  • Examples:

    • Masked language modeling (BERT)
    • Images with masked patches (denoising autoencoder, context autoencoder, colorization)
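
A minimal Python sketch of the masking step (function name, mask token, and masking probability are illustrative assumptions, not from any specific paper): a random portion of the tokens is replaced with a placeholder, and only those positions become prediction targets.

```python
import random

def mask_tokens(tokens, mask_token="[MASK]", mask_prob=0.15, seed=0):
    """Randomly replace tokens with a mask symbol; return masked inputs and targets."""
    rng = random.Random(seed)
    masked, targets = [], []
    for tok in tokens:
        if rng.random() < mask_prob:
            masked.append(mask_token)   # the model only sees the placeholder
            targets.append(tok)         # and is trained to predict the original token
        else:
            masked.append(tok)
            targets.append(None)        # no prediction loss on unmasked positions
    return masked, targets

inputs, targets = mask_tokens("the quick brown fox jumps over the lazy dog".split())
print(inputs)
print(targets)
```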

Self-Prediction: Innate Relationship Prediction

  • Some transformations (e.g., segmentation, rotation) of a data sample should maintain the original information or follow the desired innate logic (a rotation-prediction sketch follows this list)

  • Examples

    • Order of image patches

      ✅ e.g., shuffle the patches

      ✅ e.g., relative position, jigsaw puzzle

    • Image rotation

    • Counting features across patches
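
A minimal NumPy sketch of the rotation-prediction task mentioned above (function names are illustrative assumptions): each image is rotated by a random multiple of 90 degrees, and the rotation index serves as a free classification label.

```python
import numpy as np

def make_rotation_task(images, seed=0):
    """Rotate each image by k * 90 degrees; the rotation index k is a free label."""
    rng = np.random.default_rng(seed)
    rotated, labels = [], []
    for img in images:                    # img: H x W x C array
        k = int(rng.integers(0, 4))       # 0, 90, 180, or 270 degrees
        rotated.append(np.rot90(img, k))  # self-supervised input
        labels.append(k)                  # self-supervised target
    return rotated, labels

images = [np.zeros((32, 32, 3)) for _ in range(8)]
inputs, targets = make_rotation_task(images)
print(targets)
```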

Self-Prediction: Hybrid Self-Prediction Models

Hybrid self-prediction models: combine different types of generative modeling

  • VQ-VAE + AR

    • Jukebox (Dhariwal et al. 2020), DALL-E (Ramesh et al. 2021)
  • VQ-VAE + AR + Adversarial
    • VQGAN (Esser & Rombach et al. 2021)

    • VQ-VAE: learns a discrete codebook of context-rich visual parts

    • A transformer model: trained to autoregressively model the composition of codebook entries (a sketch of the two-stage recipe follows)
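
A sketch of the two-stage recipe described above (notation assumed here, not from the slides): the VQ-VAE encoder $E$ and quantizer $q$ map an input $x$ to a sequence of discrete codebook indices, and an autoregressive prior is then trained over those indices:

$$z = q\big(E(x)\big) \in \{1, \dots, K\}^{L}, \qquad p_\theta(z) = \prod_{t=1}^{L} p_\theta(z_t \mid z_{<t})$$

where $K$ is the codebook size and $L$ the number of discrete tokens per sample.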

Contrastive Learning