李宏毅 Study Notes 45: Meta Learning Extras
Table of Contents
- Overview
- What is meta learning?
- Why meta learning?
- How and what to do meta learning?
- Categories
- A roundup of names
- Taxonomy
- Datasets
- Models
- Taxonomy
- Black-box
- Optimization / Gradient based
- Problems of MAML
- Other improvements on MAML
- Metric-based / non-parametric
- Problems of metric-based
- Hybrid
- Bayesian meta-learning
By: TA 陈建层
Besides "machine learning", the abbreviation ML has taken on a second meaning: Meta Learning.
This lecture covers a lot of ground, so these notes only record the broad strokes.
Overview
Outline
● What is meta learning?
● Why meta learning?
● How and what to do meta learning?
● Categories
● Datasets
● Models
What is meta learning?
This was covered in the previous lecture, so only briefly: meta learning means learn to learn.
It is usually considered a way to achieve few-shot learning (but is not limited to that).
Why meta learning?
- Too many tasks to learn → learn more efficiently
  ○ Faster learning methods (adaptation)
  ○ Better hyper-parameters / learning algorithms
  ○ Related to:
    ■ transfer learning
    ■ domain adaptation
    ■ multi-task learning
    ■ life-long learning
- Too little data → fit more accurately (a better learner fits more quickly)
  ○ Traditional supervised learning may not work here
How and what to do meta learning?
Categories
A roundup of names
● MAML (Model Agnostic Meta-Learning)
● Reptile (not an acronym — presumably named because the training trajectory crawls along like a reptile)
● SNAIL (Simple Neural AttentIve Learner)
● PLATIPUS (Probabilistic LATent model for Incorporating Priors and Uncertainty in few-Shot learning) — platypus
● LLAMA (Lightweight Laplace Approximation for Meta-Adaptation) — llama
● ALPaCA (Adaptive Learning for Probabilistic Connectionist Architectures) — alpaca
● CAML (Conditional class-Aware Meta Learning) — camel
● LEO (Latent Embedding Optimization) — lion (Latin "leo")
● LEOPARD (Learning to generate softmax parameters for diverse classification) — leopard
● CAVIA (Context Adaptation via meta-learning) (not CAML) — guinea pig (Cavia)
● R2-D2 (Ridge Regression Differentiable Discriminator) — the droid
Taxonomy
Grouping all the models above by what they learn:
- Model parameters (suitable for the few-shot framework)
  ○ Initializations
  ○ Embeddings / representations / metrics
  ○ Optimizers
  ○ Reinforcement learning (policies / other settings)
- Hyperparameters (e.g. AutoML — automatic hyper-parameter tuning)
  (beyond the scope of today, but can be viewed as a kind of meta learning)
  ○ Hyperparameter search ((training) settings) — see 李宏毅's tuning lectures
  ○ Network architectures → Network Architecture Search (NAS)
    (related to: evolution strategies, genetic algorithms…)
- Others
  ○ The algorithm itself (literally an algorithm, not a network)… (more in DLHLP)
Datasets
A quick look at the datasets commonly used in meta learning:
Omniglot (omni = all, glot = language)
○ Launched as a website by linguist Simon Ager in 1998
○ Introduced as a dataset by Lake et al. in 2015 (Science)
○ Concept learning
Besides real-world scripts, it also contains invented ("2D-world") scripts; with enough imagination, one could even add writing systems from anime.
miniImageNet
○ drawn from ImageNet, but set up for few-shot learning
CUB (Caltech-UCSD Birds)
Models
Taxonomy
In the lecture's color-coded diagram, Meta-LSTM can be viewed as a fusion of the yellow and green categories.
There are also other ways of combining the categories.
The four color-coded categories are explained briefly below.
Black-box
The idea is that every task has its own corresponding $f_\theta$.
So treat the tasks themselves as data: feed them into an RNN and hope that the RNN can output the predictions (effectively, the parameters) for a new task.
There are also LSTM-based versions, with an attention mechanism added on top.
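As a minimal sketch of this idea (all names and dimensions below are my own illustration, not from the lecture): an LSTM reads the support set as a sequence of (input, label) pairs, its hidden state implicitly "becomes" the task, and it then labels the query inputs directly — no per-task parameters are ever materialized.

```python
import torch
import torch.nn as nn

class BlackBoxLearner(nn.Module):
    """Hypothetical black-box meta-learner: an LSTM ingests the support
    set (x concatenated with its one-hot label), then predicts labels for
    queries (whose label slots are fed in as zeros)."""
    def __init__(self, x_dim=16, n_way=5, hidden=64):
        super().__init__()
        self.lstm = nn.LSTM(x_dim + n_way, hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_way)

    def forward(self, support_x, support_y, query_x):
        # support_x: (B, N*K, x_dim); support_y: (B, N*K, n_way) one-hot
        pad = torch.zeros(query_x.size(0), query_x.size(1), support_y.size(-1))
        seq = torch.cat([torch.cat([support_x, support_y], -1),
                         torch.cat([query_x, pad], -1)], dim=1)
        out, _ = self.lstm(seq)            # hidden state encodes the task
        return self.head(out[:, -query_x.size(1):])   # query logits

model = BlackBoxLearner()
logits = model(torch.randn(2, 25, 16),             # 5-way 5-shot support
               torch.eye(5).repeat(2, 5, 1),       # one-hot labels
               torch.randn(2, 3, 16))              # 3 queries per task
print(logits.shape)                                # torch.Size([2, 3, 5])
```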
Optimization / Gradient based
Learn model initialization — models that learn the initial parameters:
● MAML (Model Agnostic Meta Learning)
● Reptile
● Meta-LSTM (can also be viewed as an RNN black-box)
Improvements of MAML — models that refine MAML:
● Meta-SGD
● MAML++
● AlphaMAML
● DEML
● CAVIA
Different meta-parameters — models that learn other kinds of parameters:
● iMAML
● R2-D2 / LR-D2
● ALPaCA
● MetaOptNet
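All of the gradient-based methods build on MAML's two-level loop. Here is a minimal sketch of that loop on a toy sine-regression family (my own illustration, not the lecture's code; the architecture, learning rates, and task distribution are all arbitrary choices):

```python
import torch
import torch.nn as nn

net = nn.Sequential(nn.Linear(1, 40), nn.ReLU(), nn.Linear(40, 1))
meta_opt = torch.optim.Adam(net.parameters(), lr=1e-3)
inner_lr, loss_fn = 0.01, nn.MSELoss()

def sample_task():
    # a task = one random sine curve; returns a batch-sampling closure
    a, b = torch.rand(1) * 4 + 0.1, torch.rand(1) * 3.14
    def batch(n=10):
        x = torch.rand(n, 1) * 10 - 5
        return x, a * torch.sin(x + b)
    return batch

def forward_with(params, x):
    # functional forward pass through the 2-layer MLP with given params
    w1, b1, w2, b2 = params
    return torch.relu(x @ w1.t() + b1) @ w2.t() + b2

for step in range(1000):
    meta_loss = 0.0
    for _ in range(4):                           # meta-batch of 4 tasks
        batch = sample_task()
        x_s, y_s = batch(); x_q, y_q = batch()   # support / query sets
        params = list(net.parameters())
        loss_s = loss_fn(forward_with(params, x_s), y_s)
        # create_graph=True keeps the second-order terms discussed below
        grads = torch.autograd.grad(loss_s, params, create_graph=True)
        fast = [p - inner_lr * g for p, g in zip(params, grads)]  # inner step
        meta_loss = meta_loss + loss_fn(forward_with(fast, x_q), y_q)
    meta_opt.zero_grad()
    meta_loss.backward()                         # outer (meta) update
    meta_opt.step()
```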
Problems of MAML
● Learning rate → Meta-SGD, MAML++: using one shared learning rate for every parameter of $\theta$ on every task is not ideal.
Meta-SGD is an "adaptive learning rate" version of MAML: it introduces a learnable $\alpha$ so that different parameters (and tasks) get different learning rates — a sketch follows this list.
● Second-order derivatives (instability) → MAML++: the second-order partial derivatives that MAML's first-order approximation drops make the results less accurate.
● Batch Normalization → MAML++: handle BN statistics properly during training.
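A sketch of how Meta-SGD changes the inner step of the MAML sketch above: the scalar `inner_lr` becomes one learnable tensor per parameter, trained by the outer loop (the initial value 0.01 and all names are illustrative):

```python
# Meta-SGD (sketch): a learnable, element-wise learning rate per parameter.
alphas = [torch.full_like(p, 0.01, requires_grad=True)
          for p in net.parameters()]
meta_opt = torch.optim.Adam(list(net.parameters()) + alphas, lr=1e-3)

# the inner update in the MAML loop then becomes element-wise:
# fast = [p - a * g for p, a, g in zip(params, alphas, grads)]
```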
These issues give MAML the problems below:
- Training instability: the outer-loop gradients are unstable, prone to exploding or vanishing
  ○ Gradient issues
- Second-order derivative cost
  ○ Expensive to compute
  ○ First-order approximation → harmful to performance
- Batch Normalization statistics
  ○ No accumulation
  ○ Shared bias
- Shared (across steps and across parameters) inner-loop learning rate
  ○ Not well scaled
- Fixed outer-loop learning rate
Solutions proposed
- Training instability ⇒ Multi-Step Loss optimization (MSL): take a weighted query loss after every inner-loop update instead of only after the last one (see the sketch after this list)
  ○ Gradient issues
- Second-order derivative cost ⇒ Derivative-Order Annealing (DA): drop the second-order terms for the first part of training, then switch them back on
  ○ Expensive to compute
  ○ First-order approximation → harmful to performance
- Batch Normalization statistics
  ○ No accumulation ⇒ Per-Step Batch Normalization Running Statistics
  ○ Shared bias ⇒ Per-Step Batch Normalization Weights & Biases
- Shared (across steps and across parameters) inner-loop learning rate
  ⇒ Learning Per-Layer Per-Step Learning Rates & Gradient Directions (LSLR)
- Fixed outer-loop learning rate
  ⇒ Cosine Annealing of the Meta-Optimizer Learning Rate (CA)
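Reusing the pieces from the MAML sketch above, the MSL objective looks roughly like this (the per-step weights here are illustrative; MAML++ anneals them so that the final step eventually dominates):

```python
# Multi-Step Loss (sketch): accumulate a weighted query loss after
# every inner step, instead of only after the last one.
fast = list(net.parameters())
meta_loss, w = 0.0, [0.1, 0.2, 0.3, 0.4]      # illustrative step weights
for k in range(4):                            # 4 inner-loop steps
    loss_s = loss_fn(forward_with(fast, x_s), y_s)
    grads = torch.autograd.grad(loss_s, fast, create_graph=True)
    fast = [p - inner_lr * g for p, g in zip(fast, grads)]
    meta_loss = meta_loss + w[k] * loss_fn(forward_with(fast, x_q), y_q)
```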
Other improvements on MAML
There are also two models that learn an initialization but differ from MAML architecturally:
● Implicit gradients → iMAML
(in the lecture's figure: left is the original MAML, middle is first-order MAML with the second-order terms dropped, right is iMAML)
● Closed-form solver on top of feature extraction → R2-D2: replace the CNN's final fully-connected classifier with an L2-regularized (ridge-regression) solver.
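The appeal of the closed-form head is that ridge regression has an exact, differentiable solution, so per-task "training" is a single matrix solve and gradients still flow back into the embedding network. A sketch (dimensions illustrative; the actual R2-D2 solves the equivalent n×n dual system via the Woodbury identity when the support set is small):

```python
import torch

def ridge_head(phi_s, Y_s, phi_q, lam=1.0):
    """Closed-form ridge-regression classifier (R2-D2-style sketch).
    phi_s: (n, d) support embeddings, Y_s: (n, C) one-hot labels,
    phi_q: (m, d) query embeddings."""
    d = phi_s.size(1)
    W = torch.linalg.solve(phi_s.t() @ phi_s + lam * torch.eye(d),
                           phi_s.t() @ Y_s)   # (d, C), exact solution
    return phi_q @ W                          # (m, C) query logits

logits = ridge_head(torch.randn(25, 64),           # 5-way 5-shot support
                    torch.eye(5).repeat(5, 1),     # one-hot labels
                    torch.randn(15, 64))           # 15 queries
print(logits.shape)                                # torch.Size([15, 5])
```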
Metric-based / non-parametric
Learn to compare!
The earlier meta-learning picture was: learn an $F$ that produces a suitable $f$ for each task, where that task's $f$ has parameters $\hat\theta$.
For the classification setting we can change perspective: instead of learning an $f$ that decides whether the testing data is a cat or a dog,
directly compare the testing data against the support cats and dogs and assign it to whichever it resembles most; the architecture then becomes the one below.
The model's job reduces to feature extraction: turn every example into a vector, then measure similarity with kNN, L2 distance, and so on.
Common models:
• Siamese network
• Prototypical network: represent each known class by a prototype, then compare the extracted query features against the prototypes' feature vectors (a sketch follows the lists below)
• Matching network: builds on the above, but also accounts for relationships between the different classes, storing them with a BiLSTM
• Relation network
Two more approaches:
- IMP (Infinite Mixture Prototypes)
  • modified from Prototypical Networks
  • the number of mixture components is determined from the data via Bayesian non-parametric methods
- GNN
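A sketch of the prototypical-network computation (the embedding network is omitted; the mean-prototype and squared-L2 scoring follow the standard formulation, but all names and shapes below are my own):

```python
import torch
import torch.nn.functional as F

def proto_logits(emb_s, y_s, emb_q, n_way):
    # class prototype = mean of that class's support embeddings;
    # score queries by negative squared L2 distance to each prototype
    protos = torch.stack([emb_s[y_s == c].mean(0) for c in range(n_way)])
    return -torch.cdist(emb_q, protos) ** 2        # (m, n_way) logits

emb_s = torch.randn(25, 64)                        # 5-way 5-shot embeddings
y_s = torch.arange(5).repeat_interleave(5)         # support labels
logits = proto_logits(emb_s, y_s, torch.randn(15, 64), n_way=5)
loss = F.cross_entropy(logits, torch.randint(5, (15,)))  # episode loss
```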
Problems of metric-based
• When the K in N-way K-shot gets large → difficult to scale (comparing against a large support set is hard)
• Limited to classification (it only learns to compare)
Hybrid
Optimization-based model + metric-based embedding (RelationNet, $z$).
Here an encoder-decoder pair is used: the training task is mapped by the encoder to a latent code $z$, and the decoder maps $z$ back to the task's parameters $\theta$.
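A heavily simplified sketch of that encode-adapt-decode loop (LEO-flavored; the real LEO uses per-class codes and a relation-network encoder — the single global code and linear encoder/decoder here are purely illustrative):

```python
import torch
import torch.nn as nn

d, z_dim, n_way = 64, 16, 5
encoder = nn.Linear(d + n_way, z_dim)      # support set -> latent code z
decoder = nn.Linear(z_dim, d * n_way)      # z -> classifier parameters

def task_logits(z, emb):
    theta = decoder(z).view(n_way, d)      # decode theta from z
    return emb @ theta.t()

emb_s = torch.randn(25, d)                 # 5-way 5-shot embeddings
y_s = torch.arange(n_way).repeat_interleave(5)
one_hot = nn.functional.one_hot(y_s, n_way).float()
z = encoder(torch.cat([emb_s, one_hot], -1)).mean(0)

# the inner loop adapts z (a handful of numbers), not the whole network;
# an outer loop would backprop a query loss into the encoder/decoder
for _ in range(3):
    loss = nn.functional.cross_entropy(task_logits(z, emb_s), y_s)
    z = z - 0.1 * torch.autograd.grad(loss, z, create_graph=True)[0]
```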
Bayesian meta-learning
As one extra topic, a look at Bayesian meta-learning.
(PS: the slide even features Lucious Lyon from Empire.)
On the left the data exhibit three distinguishing features; if, as in the figure below, only two are visible, how should a new example be classified? This ambiguity is the uncertainty problem.
Models that currently tackle this uncertainty problem include:
Black-box:
• VERSA
Optimization:
• PLATIPUS
• Bayesian MAML (BMAML)
• Probabilistic MAML (PMAML)