诸神缄默不语 - index of my CSDN blog posts

Paper title: Pre-train, Prompt, and Predict: A Systematic Survey of Prompting Methods in Natural Language Processing
ArXiv link: https://arxiv.org/abs/2107.13586
ACM Computing Surveys official link: https://dl.acm.org/doi/abs/10.1145/3560815
(I read the ArXiv version.)

Official site: Pretrain Language Models
(I have not included screenshots of the meta-analysis part of the paper.)

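This prompt survey is fairly well known in the NLP community; I remember several WeChat public accounts rushing to cover it when it came out.
The authors are from Carnegie Mellon University.
I recently noticed that ACM Computing Surveys accepted this survey in 2022. That version has fewer pages than the ArXiv one, so I kept using the ArXiv version for these notes.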

The paper gives a unified definition of prompting and surveys the methods currently in use.

Contents

1. What is prompt-based learning
- 1.1 Prompt Addition
- 1.2 Answer Search
- 1.3 Answer Mapping
2. The shift in NLP learning paradigms
3. Design Considerations for Prompting
- 3.1 Pre-trained Model Choice
- 3.2 Prompt Engineering
- 3.3 Answer Engineering
- 3.4 Multi-Prompt Learning
- 3.5 Training Strategies for Prompting Methods / Prompt-based Training Strategies
4. Applications
5. Prompt-relevant Topics
6. Challenges
7. Other online resources used while writing this post

1. What is prompt-based learning

Traditional supervised learning takes an input $x$ and predicts an output $y$ by modeling the probability $P(y \mid x; \theta)$
($\theta$ denotes the model parameters)
label
label set

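Prompt-based learning instead relies on a pre-trained language model that models the probability of text directly:
input $x$
template / prompting function (with two slots: one for the input and one for the output to be generated)
The template turns $x$ into a textual string prompt $x'$ (by filling $x$ into the template); $x'$ still contains unfilled slots
The language model fills the unfilled slots according to probability, yielding the final string $\hat{x}$
The final output $y$ is derived from $\hat{x}$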

Examples:

Tweet: I missed the bus today.
To predict sentiment, append: I felt so ____
For translation, $x'$ is: English: I missed the bus today. French: _____
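The two examples above can be sketched as a tiny prompting function. This is only an illustration: the `[X]` / `[Z]` slot markers follow the survey's notation, while `fill_template` and the two template constants are names I made up for this sketch.

```python
def fill_template(template: str, x: str) -> str:
    """Prompting function: put the input x into the [X] slot and leave [Z] unfilled."""
    if "[X]" not in template or "[Z]" not in template:
        raise ValueError("template needs an [X] input slot and a [Z] answer slot")
    return template.replace("[X]", x)


# Hypothetical templates matching the two examples in these notes.
SENTIMENT_TEMPLATE = "[X] I felt so [Z]"
TRANSLATION_TEMPLATE = "English: [X] French: [Z]"

tweet = "I missed the bus today."
sentiment_prompt = fill_template(SENTIMENT_TEMPLATE, tweet)
translation_prompt = fill_template(TRANSLATION_TEMPLATE, tweet)
print(sentiment_prompt)    # I missed the bus today. I felt so [Z]
print(translation_prompt)  # English: I missed the bus today. French: [Z]
```

The remaining `[Z]` slot is what the language model is asked to fill in.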

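Terminology list: [table omitted]

Examples of sub-tasks: [table omitted]

Advantage: it can be applied directly to few-shot and even zero-shot learning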

1.1 Prompt Addition

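A template whose answer slot sits in the middle is called a cloze prompt; one whose answer slot sits at the end is called a prefix prompt
The template does not have to consist of natural-language tokens; it can also use pseudo-tokens (which are still embedded into continuous vectors) or be continuous vectors directly
The number of slots is not fixed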
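A rough way to tell the two shapes apart is to check whether any text follows the answer slot. The sketch below assumes exactly one `[Z]` slot per template (a simplification, since the number of slots is not fixed) and treats any trailing text, even punctuation, as making the prompt a cloze prompt; `prompt_shape` is a hypothetical helper name.

```python
def prompt_shape(template: str, answer_slot: str = "[Z]") -> str:
    """Return 'prefix' if nothing follows the answer slot, otherwise 'cloze'."""
    if template.count(answer_slot) != 1:
        raise ValueError("this sketch assumes exactly one answer slot")
    _, after = template.split(answer_slot)
    # Assumption: any non-whitespace text after the slot makes it a cloze prompt.
    return "prefix" if after.strip() == "" else "cloze"


cloze_shape = prompt_shape("[X] Overall, it was a [Z] movie.")
prefix_shape = prompt_shape("English: [X] French: [Z]")
print(cloze_shape, prefix_shape)
```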

1.2 Answer Search

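Find the highest-scoring answer $\hat{z}$

$\mathcal{Z}$: the set of permissible values for $z$

Based on the filled prompt $f_{\text{fill}}(x', z)$,

compute $\hat{z} = \underset{z \in \mathcal{Z}}{\text{search}}\; P(f_{\text{fill}}(x', z); \theta)$

The search can be an argmax search or sampling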
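To make the two search strategies concrete, here is a minimal sketch over a three-word answer set. Everything in it is a stand-in: `toy_lm_probability` returns hand-picked numbers rather than real LM probabilities, and `f_fill` is just my name for the step that writes a candidate answer into the `[Z]` slot.

```python
import random

Z = ["happy", "fine", "bad"]  # permissible answers (toy answer set)


def f_fill(prompt: str, z: str) -> str:
    """Fill the [Z] slot of the prompt with a candidate answer."""
    return prompt.replace("[Z]", z)


def toy_lm_probability(text: str) -> float:
    """Stand-in for the LM probability: hand-picked numbers, not a real model."""
    table = {"happy": 0.05, "fine": 0.15, "bad": 0.80}
    return table[text.split()[-1]]


def argmax_search(prompt: str, answers: list) -> str:
    """Pick the answer whose filled prompt scores highest."""
    return max(answers, key=lambda z: toy_lm_probability(f_fill(prompt, z)))


def sampling_search(prompt: str, answers: list, rng: random.Random) -> str:
    """Draw an answer at random, weighted by the score of its filled prompt."""
    weights = [toy_lm_probability(f_fill(prompt, z)) for z in answers]
    return rng.choices(answers, weights=weights, k=1)[0]


prompt = "I missed the bus today. I felt so [Z]"
z_hat = argmax_search(prompt, Z)
z_sampled = sampling_search(prompt, Z, random.Random(0))
print(z_hat)      # bad
print(z_sampled)  # one of the three answers, drawn at random
```

Argmax always returns the same answer for the same scores, while sampling can return lower-scoring answers some of the time.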

1.3 Answer Mapping

Map the highest-scoring answer $\hat{z}$ to the highest-scoring output $\hat{y}$
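For classification, this mapping can be as simple as a lookup table in which several answer words share one label. The table below is a made-up illustration (the words and the "++" / "--" labels are assumptions for this sketch), and `map_answer` is a hypothetical helper.

```python
# Hypothetical answer-to-label table: several answers may share one label.
ANSWER_TO_LABEL = {
    "excellent": "++",
    "fabulous": "++",
    "wonderful": "++",
    "bad": "--",
    "terrible": "--",
}


def map_answer(z_hat: str, answer_to_label: dict) -> str:
    """Convert the selected answer into the task output."""
    if z_hat not in answer_to_label:
        raise ValueError(f"answer {z_hat!r} is outside the answer space")
    return answer_to_label[z_hat]


y_hat = map_answer("fabulous", ANSWER_TO_LABEL)
print(y_hat)  # ++
```

For generation tasks such as translation the answer is already the output, so the mapping is just the identity.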

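2. The shift in NLP learning paradigms

Fully supervised learning: the traditional machine learning paradigm
To give models a suitable inductive bias, early NLP models relied on feature engineering; once neural networks arrived, the reliance moved to architecture engineering.
A small amount of pre-training appeared at this stage (e.g. word2vec and GloVe), but the pre-trained parameters made up only a small fraction of the model.
pre-train and fine-tune
A model with a fixed architecture is pre-trained as a language model (LM) that predicts the probability of observed text data.
Relies on objective engineering.
Not conducive to exploring model architectures: 1. unsupervised pre-training lets models learn with fewer structural priors, so there is less to choose from; 2. pre-training to test different architectures is too expensive.
pre-train, prompt, and predict
Adding a textual prompt reformulates the downstream task so that it looks more like the pre-training task. In some cases no task-specific training is needed at all.
Relies on prompt engineering.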

3. Design Considerations for Prompting

3.1 Pre-trained Model Choice

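My notes on the paper's introduction to pre-trained models are in a separate post: Overview of Pre-trained Language Models (continuously updated…)

The choice of training objective depends on how well it fits the specific prompting task: left-to-right AR LMs suit prefix prompts, while reconstruction objectives suit cloze prompts. Standard LM and FTR (full text reconstruction) objectives are better suited to text generation tasks.

Prefix LMs and encoder-decoder architectures are a natural fit for text generation, but with suitable prompts they can be adapted to other tasks as well.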

3.2 Prompt Engineering

prompt template engineering → first choose the prompt shape, then decide between a manual and an automated approach

Prompt Shape
cloze prompts VS. prefix prompts
Manual Template Engineering
Automated Template Learning
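1. discrete prompts / hard prompts: text (this part always reminds me of the template/rule-based methods of traditional NLG; the paper's reference list really does include Re3Sum¹, though it does not seem to be cited in the main text)
  1. Prompt Mining: mined from a corpus
  2. Prompt Paraphrasing: paraphrase an existing seed prompt
  3. Gradient-based Search
  4. Prompt Generation: treated directly as a text generation task
  5. Prompt Scoring
2. continuous prompts / soft prompts: vectors in the LM's embedding space
  1. Prefix Tuning
    $M_\phi$: trainable prefix matrix
    $\theta$: fixed pre-trained LM parameters
    
    If a time step falls within the prefix, its activation is copied directly from $M_\phi$; otherwise it is computed by the pre-trained LM.
    (I did not fully follow the detailed description later in the paper, so it is omitted.)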
  2. Tuning Initialized with Discrete Prompts
  3. Hard-Soft Prompt Hybrid Tuning
3. static
4. dynamic
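The copy-or-compute rule noted under Prefix Tuning can be sketched in a few lines. This is a deliberately simplified picture under my own assumptions: one vector per time step instead of activations from every layer, and `toy_lm_step` is an invented stand-in for the frozen LM, not how a Transformer actually computes its states.

```python
from typing import Callable, List, Sequence

Vector = List[float]


def compute_activations(
    num_steps: int,
    prefix_matrix: Sequence[Vector],
    frozen_lm_step: Callable[[int, List[Vector]], Vector],
) -> List[Vector]:
    """Copy states from the trainable prefix matrix, compute the rest with the frozen LM."""
    hidden: List[Vector] = []
    for i in range(num_steps):
        if i < len(prefix_matrix):
            hidden.append(list(prefix_matrix[i]))     # inside the prefix: copy from M_phi
        else:
            hidden.append(frozen_lm_step(i, hidden))  # outside: frozen LM computes it
    return hidden


def toy_lm_step(i: int, previous: List[Vector]) -> Vector:
    """Hypothetical frozen LM: element-wise mean of all earlier states."""
    dim = len(previous[0])
    return [sum(h[d] for h in previous) / len(previous) for d in range(dim)]


M_phi = [[1.0, 0.0], [0.0, 1.0]]  # toy prefix matrix covering two time steps
states = compute_activations(4, M_phi, toy_lm_step)
print(states[:2])  # the two prefix states, copied unchanged
print(states[2:])  # states produced by the stand-in LM
```

During training only `M_phi` would receive gradient updates; the LM parameters stay fixed.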

3.3 Answer Engineering

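Covers the design of the answer space $\mathcal{Z}$ and of the mapping function

answer shape: granularity
1. tokens
2. spans: commonly used with cloze prompts
3. sentence: commonly used with prefix prompts
answer design method
1. Manual Design
  1. Unconstrained Spaces: the space of all possible fillers; the answer $z$ is usually mapped directly to $y$
  2. Constrained Spaces
2. Discrete Answer Search
  1. Answer Paraphrasing: start from an initial answer space $\mathcal{Z}'$ (I did not follow the rest)
  2. Prune-then-Search
    $y \to z$: verbalizer (I did not follow the rest)
  3. Label Decomposition: relation extraction
    Relation: per:city_of_death
    Decomposed label words: {person, city, death}
    The probability of an answer span is the sum of the probabilities of its tokens
3. Continuous Answer Search: omitted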
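The scoring rule in Label Decomposition (span probability = sum of token probabilities) is easy to write down. The sketch uses the relation label per:city_of_death decomposed into the words person, city and death, which is the example I recall from the survey; the probabilities are invented numbers and `span_probability` is a hypothetical helper.

```python
def span_probability(token_probs: dict, answer_tokens: list) -> float:
    """Score an answer span as the sum of the probabilities of its tokens."""
    return sum(token_probs[token] for token in answer_tokens)


# Decomposed label words for the relation "per:city_of_death".
decomposed_label = ["person", "city", "death"]

# Invented per-token probabilities standing in for LM outputs.
toy_token_probs = {"person": 0.25, "city": 0.5, "death": 0.125, "date": 0.0625}

score = span_probability(toy_token_probs, decomposed_label)
print(score)  # 0.875
```

Because the values are summed rather than multiplied, the result is better read as a score for ranking relations than as a normalized probability.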

3.4 Multi-Prompt Learning

Prompt Ensembling: for continuous prompts, the ensemble members may be learned from different initializations or random seeds
1. Uniform averaging
2. Weighted averaging
3. Majority voting
4. Knowledge distillation
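5. Prompt ensembling for text generation: ensemble token by token:
  $P(z_t \mid x, z_{<t}) := \frac{1}{K} \sum_{i=1}^{K} P(z_t \mid f_{\text{prompt},i}(x), z_{<t})$ ²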
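The first three ensembling schemes in the list can be compared on a toy example. The answer distributions and weights below are invented, and the function names are mine; in practice the weights would be pre-specified or tuned on training data.

```python
from collections import Counter


def uniform_average(dists: list) -> dict:
    """Average the answer probabilities from all prompts with equal weight."""
    return {z: sum(d[z] for d in dists) / len(dists) for z in dists[0]}


def weighted_average(dists: list, weights: list) -> dict:
    """Average the answer probabilities with one weight per prompt."""
    total = sum(weights)
    return {z: sum(w * d[z] for w, d in zip(weights, dists)) / total for z in dists[0]}


def majority_vote(dists: list) -> str:
    """Let each prompt vote for its top answer and return the most common vote."""
    votes = Counter(max(d, key=d.get) for d in dists)
    return votes.most_common(1)[0][0]


# Invented answer distributions produced by three different prompts.
per_prompt = [
    {"good": 0.75, "bad": 0.25},
    {"good": 0.25, "bad": 0.75},
    {"good": 0.625, "bad": 0.375},
]

uniform = uniform_average(per_prompt)
weighted = weighted_average(per_prompt, [1.0, 2.0, 1.0])  # trust the second prompt more
voted = majority_vote(per_prompt)
print(max(uniform, key=uniform.get))    # good
print(max(weighted, key=weighted.get))  # bad
print(voted)                            # good
```

With these numbers, up-weighting the second prompt flips the decision, which shows why the choice of weights matters.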
Prompt Augmentation / demonstration learning: details omitted
Provide answered prompts to learn from by analogy (the LM picks up the repeated pattern)
1. Sample Selection
2. Sample Ordering
Prompt Composition
Prompt Decomposition
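Prompt augmentation is simple to sketch: fill the same template once per answered example and once more for the actual query, then join the pieces. The capital-city example is the one I recall from the survey; `build_augmented_prompt` is a hypothetical helper, and the demonstrations are used in the order given, which is exactly where sample selection and sample ordering come in.

```python
def build_augmented_prompt(template: str, demonstrations: list, query: str) -> str:
    """Prepend answered prompts to the unanswered prompt for the query."""
    answered = [
        template.replace("[X]", x).replace("[Z]", z) for x, z in demonstrations
    ]
    unanswered = template.replace("[X]", query)  # the [Z] slot stays open
    return " ".join(answered + [unanswered])


template = "[X]'s capital is [Z]."
demos = [("Great Britain", "London"), ("Japan", "Tokyo")]
augmented = build_augmented_prompt(template, demos, "China")
print(augmented)
# Great Britain's capital is London. Japan's capital is Tokyo. China's capital is [Z].
```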

3.5 Training Strategies for Prompting Methods / Prompt-based Training Strategies

Training Settings
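No training: zero-shot setting (not truly zero-shot; details omitted)
full-data learning
few-shot learning
Parameter Update Methods
1. Promptless Fine-tuning: pre-train and fine-tune strategy
  The problem: prone to overfitting or lack of robustness, and to catastrophic forgetting
2. Tuning-free Prompting
  The input can be augmented with answered prompts: in-context learning
3. Fixed-LM Prompt Tuning: drawbacks omitted
4. Fixed-prompt LM Tuning
  Details omitted
  null prompt
5. Prompt+LM Tuning: pros and cons omitted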
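The five parameter update methods differ only in which parameters exist and which are updated. The sketch below encodes my reading of the survey's comparison table as data; the class and field names are my own.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TuningStrategy:
    name: str
    lm_params_tuned: bool      # are the pre-trained LM parameters updated?
    has_prompt_params: bool    # are extra prompt parameters introduced?
    prompt_params_tuned: bool  # if so, are they updated?


STRATEGIES = [
    TuningStrategy("Promptless Fine-tuning", True, False, False),
    TuningStrategy("Tuning-free Prompting", False, False, False),
    TuningStrategy("Fixed-LM Prompt Tuning", False, True, True),
    TuningStrategy("Fixed-prompt LM Tuning", True, False, False),
    TuningStrategy("Prompt+LM Tuning", True, True, True),
]


def trainable_parts(strategy: TuningStrategy) -> list:
    """List which parameter groups receive gradient updates."""
    parts = []
    if strategy.lm_params_tuned:
        parts.append("LM")
    if strategy.prompt_params_tuned:
        parts.append("prompt")
    return parts


summary = {s.name: trainable_parts(s) for s in STRATEGIES}
for name, parts in summary.items():
    print(f"{name}: {parts if parts else 'nothing is trained'}")
```

Promptless Fine-tuning and Fixed-prompt LM Tuning look identical in this table; the difference is that the latter still feeds a fixed prompt to the model at training and inference time.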

4. Applications

The specific list of papers is omitted.

Knowledge Probing
1. Factual Probing / fact retrieval: quantifies how much factual knowledge the pre-trained model's representations contain; the focus is on learning templates
2. Linguistic Probing
Classification-based Tasks: e.g. implemented as slot filling
1. Text Classification: typically cloze prompts, prompt engineering + answer engineering, few-shot, fixed-prompt LM Tuning
2. Natural Language Inference (NLI): typically cloze prompts; prompt engineering focuses on template search in few-shot settings. Answer spaces are usually hand-picked from the vocabulary in advance.
Information Extraction: details omitted
1. Relation Extraction
2. Semantic Parsing
3. Named Entity Recognition (NER)
“Reasoning” in NLP: details omitted
1. Commonsense Reasoning
2. Mathematical Reasoning
Question Answering
extractive QA
multiple-choice QA
free-form QA
Text Generation: other details omitted
prefix prompts + AR pre-trained language models: text summarization, machine translation
in-context learning
fixed-LM prompt tuning: data-to-text generation
Automatic Evaluation of Text Generation: framed as a text generation task itself (nesting dolls, right?)
Multi-modal Learning
Meta-Applications
1. Domain Adaptation (this looks a bit like text style transfer to me, so presumably there is prompt-based work on text style transfer as well?)
2. Debiasing
3. Dataset Construction

Datasets: [table omitted]

5. Prompt-relevant Topics

Ensemble Learning VS. prompt ensembling
Few-shot Learning
Prompt augmentation / priming-based few-shot learning
Larger-context Learning
Query Reformulation
QA-based Task Formulation
Controlled Generation
Supervised Attention
Data Augmentation

6. Challenges

Prompt Design
1. Tasks beyond Classification and Generation
2. Prompting with Structured Information
3. Entanglement of Template and Answer
Answer Engineering
1. Many-class and Long-answer Classification Tasks
2. Multiple Answers for Generation Tasks
Selection of Tuning Strategy
Multiple Prompt Learning
1. Prompt Ensembling
2. Prompt Composition and Decomposition
3. Prompt Augmentation
4. Prompt Sharing
Selection of Pre-trained Models
Theoretical and Empirical Analysis of Prompting
Transferability of Prompts
Combination of Different Paradigms
Calibration of Prompting Methods
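Probability prediction? I did not understand what this term actually means here. Does it refer to a model's tendency to favor certain predictions, which is then corrected in some way?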

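7. Other online resources used while writing this post

[Paper notes] Pre-train, Prompt, and Predict: A Systematic Survey of Prompting Methods in NLP, 烫烫烫烫的若愚's blog - CSDN: fairly simple notes
Pre-train, Prompt, and Predict: A Systematic Survey of Prompting Methods in NLP - YouTube: I watched a minute and a half of this video; it looked like a simple introduction, so I stopped there
[Survey] Pengfei's Pre-train, Prompt, and Predict [1] - Zhihu: on top of introducing the paper, this post lists some further reading that is worth consulting
The new NLP paradigm after fine-tuning: prompting keeps getting hotter, and a Chinese postdoc at CMU has written a survey
The "fourth paradigm" in the development of modern natural language processing - Zhihu: the author's own blog post; simpler and higher-level, written in an entertaining style, recommended. The comments look worth digging into, with quite a few deep discussions
thunlp/PromptPapers: Must-read papers on prompt-based tuning for pre-trained language models.
Natural Language Inference study notes | 如果没有人看着我: very useful as an entry point to the state of NLI; it introduces some classic work
AllenNLP series, part 6: Textual Entailment (natural language inference), sparkexpert's blog - CSDN: describes how the NLI functionality is implemented in AllenNLP
Natural language inference | NLP-progress: a site that tracks the latest progress on NLI
Introduction to KBQA, 洲洲_starry's blog - CSDN: hard to follow; it seems to require some domain knowledge
Meta-analysis - Wikipedia (Chinese edition)
What Does TLDR Mean? Understanding the Internet Shorthand: it simply stands for Too Long Didn't Read
Semantic parsing - Wikipedia
Text generation 2: Data-to-text Generation with Entity Modeling - Zhihu: I have not read the content yet, only checked what the task is. It seems to amount to describing a table in words, like IELTS Writing Task 1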

¹ Retrieve, Rerank and Rewrite: Soft Template Based Neural Summarization
² "What does the symbol made of a colon and an equals sign mean?" - Zhihu: it means "is defined as"
