Fear the REAPER: A System for Automatic Multi-Document Summarization with Reinforcement Learning

Cody Rioux, Sadid A. Hasan, Yllias Chali

##Abstract

Achieve the largest coverage of the documents' content.
Concentrate distributed information into hidden units layer by layer.
The whole deep architecture is fine-tuned by minimizing the information loss of reconstruction validation.
Based on the concentrated information, dynamic programming (DP) is used to find the most informative set of sentences to serve as the summary.
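One way to picture the dynamic programming step is as a 0/1 knapsack over sentences. The sketch below is only an illustration under stated assumptions: each sentence already has an informativeness score and an integer length, and the summary has a fixed length budget. The function name `select_sentences`, the scores, and the budget are hypothetical and do not come from the paper.

```python
def select_sentences(scores, lengths, budget):
    """0/1 knapsack: maximize total score with total length <= budget."""
    n = len(scores)
    # best[i][b] = best total score using the first i sentences within length b
    best = [[0.0] * (budget + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for b in range(budget + 1):
            best[i][b] = best[i - 1][b]  # option 1: skip sentence i-1
            if lengths[i - 1] <= b:      # option 2: take it if it fits
                take = best[i - 1][b - lengths[i - 1]] + scores[i - 1]
                if take > best[i][b]:
                    best[i][b] = take
    # Walk the table backwards to recover which sentences were taken.
    chosen, b = [], budget
    for i in range(n, 0, -1):
        if best[i][b] != best[i - 1][b]:
            chosen.append(i - 1)
            b -= lengths[i - 1]
    return sorted(chosen), best[n][budget]


# Hypothetical scores and lengths for three sentences, budget of 5 words.
picked, total = select_sentences([3.0, 4.0, 5.0], [2, 3, 4], budget=5)
print(picked, total)
```

With these toy numbers the first two sentences together fit the budget and beat the single highest-scoring sentence, which is the kind of trade-off the DP is there to resolve.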
##Related Work
We explore the use of SARSA, a derivative of TD(λ) that models the action space in addition to the state space modelled by TD(λ). Furthermore, we explore the use of an algorithm based not on temporal difference methods but on policy iteration techniques.
REAPER (Relatedness-focused Extractive Automatic summary Preparation Exploiting Reinforcement learning)
##Motivation
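TD(λ) is relatively old as far as reinforcement learning (RL) algorithms are concerned, and the optimal ILP did not outperform ASRL using the same reward function.
Reinforcement learning still has a lot of room for improvement.
Query-focused summarization has received wide attention.
The effect of sentence compression is not explored further.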
##Model
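TD(λ)
Temporal difference (TD) learning is a prediction-based machine learning method. It is mainly used for reinforcement learning problems and is said to be "a combination of Monte Carlo ideas and dynamic programming (DP) ideas".[1] TD resembles Monte Carlo methods because it learns by sampling the environment according to some policy, and it is related to dynamic programming techniques because it approximates its current estimate from previously learned estimates (a process known as bootstrapping). The TD learning algorithm is related to the temporal difference model of animal learning.[2]
Source: Wikipedia, temporal difference methods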
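To make the bootstrapping idea concrete, here is a minimal tabular TD(λ) sketch with accumulating eligibility traces. It is a generic textbook form, not the paper's summarization setup: the states "A" and "B", the rewards, and the step-size values are all made up for illustration.

```python
def td_lambda_episode(values, episode, alpha=0.1, gamma=1.0, lam=0.8):
    """Run one episode of tabular TD(lambda) with accumulating traces.

    episode is a list of (state, reward, next_state) transitions;
    next_state is None when the episode terminates.
    """
    traces = {s: 0.0 for s in values}
    for state, reward, next_state in episode:
        # Bootstrapping: the target uses the current estimate of the next state.
        next_value = 0.0 if next_state is None else values[next_state]
        td_error = reward + gamma * next_value - values[state]
        traces[state] += 1.0
        for s in values:
            # Every recently visited state shares in the TD error.
            values[s] += alpha * td_error * traces[s]
            traces[s] *= gamma * lam
    return values


# Hypothetical two-state chain: A -> B -> terminal, reward 1 at the end.
values = {"A": 0.0, "B": 0.0}
td_lambda_episode(values, [("A", 0.0, "B"), ("B", 1.0, None)])
print(values)
```

State "A" receives part of the final reward even though it was visited one step earlier, because its trace has only decayed by a factor of γλ.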
Approximate Policy Iteration
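Approximate policy iteration (API) follows a different paradigm: it iteratively improves a policy for a Markov decision process until the policy converges.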
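The evaluate-then-improve loop can be sketched on a toy problem. Note the assumption: this is plain tabular policy iteration on a tiny deterministic MDP, where evaluation is done by repeated sweeps. An approximate variant would replace that evaluation step with estimates (for example from samples or a function approximator). The two-state MDP and all names below are hypothetical.

```python
def policy_iteration(states, actions, transition, reward, gamma=0.9, sweeps=100):
    """Alternate policy evaluation and greedy improvement until the policy is stable."""
    policy = {s: actions[0] for s in states}
    while True:
        # Policy evaluation: iterate the Bellman equation for the current policy.
        values = {s: 0.0 for s in states}
        for _ in range(sweeps):
            values = {
                s: reward(s, policy[s]) + gamma * values[transition(s, policy[s])]
                for s in states
            }
        # Policy improvement: act greedily with respect to the evaluated values.
        improved = {
            s: max(actions, key=lambda a, s=s: reward(s, a) + gamma * values[transition(s, a)])
            for s in states
        }
        if improved == policy:  # converged: no state changes its action
            return policy, values
        policy = improved


# Hypothetical MDP: "move" flips the state, "stay" keeps it; landing in state 1 pays 1.
def transition(s, a):
    return 1 - s if a == "move" else s

def reward(s, a):
    return 1.0 if transition(s, a) == 1 else 0.0

policy, values = policy_iteration([0, 1], ["stay", "move"], transition, reward)
print(policy)
```

On this toy problem the loop settles after one improvement: move out of state 0 and stay in state 1.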
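The SARSA algorithm
When updating, Q-learning assumes the next step will take the best action (the one with the largest Q-value), whereas SARSA uses the Q-value of the action its current policy actually selects for the next step. Both then update the estimate for the previous step, and this is how they learn.
On-policy SARSA versus off-policy Q-learning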
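The on-policy versus off-policy difference comes down to one term in the update target. The sketch below shows the two standard tabular updates side by side; the states, actions, and Q-values are hypothetical numbers chosen only to make the difference visible.

```python
def sarsa_update(q, s, a, r, s_next, a_next, alpha=0.5, gamma=0.9):
    """On-policy: the target uses the action actually chosen in s_next."""
    target = r + gamma * q[(s_next, a_next)]
    q[(s, a)] += alpha * (target - q[(s, a)])


def q_learning_update(q, s, a, r, s_next, actions, alpha=0.5, gamma=0.9):
    """Off-policy: the target uses the best action available in s_next."""
    target = r + gamma * max(q[(s_next, b)] for b in actions)
    q[(s, a)] += alpha * (target - q[(s, a)])


actions = ["left", "right"]
start = {
    ("s0", "left"): 0.0, ("s0", "right"): 0.0,
    ("s1", "left"): 1.0, ("s1", "right"): 2.0,
}

# Same transition (s0, left, reward 0, s1); the policy happens to pick "left" next.
q_sarsa = dict(start)
sarsa_update(q_sarsa, "s0", "left", 0.0, "s1", "left")

q_learn = dict(start)
q_learning_update(q_learn, "s0", "left", 0.0, "s1", actions)

print(q_sarsa[("s0", "left")], q_learn[("s0", "left")])
```

SARSA backs up the value of the action that was really taken next, while Q-learning backs up the larger value of "right" regardless of what the policy does.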
##Experiment
Feature Space depends on the presence of top bigrams rather than tf*idf words
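A rough sketch of what a bigram-presence feature vector could look like is given below. How the paper ranks or selects its top bigrams is not stated in these notes, so ranking by raw frequency, whitespace tokenization, and the function names are all assumptions for illustration.

```python
from collections import Counter


def bigrams(tokens):
    """Return the adjacent word pairs of a token list."""
    return list(zip(tokens, tokens[1:]))


def top_bigrams(documents, k):
    """Assumed ranking: the k most frequent bigrams across the documents."""
    counts = Counter()
    for doc in documents:
        counts.update(bigrams(doc.lower().split()))
    return [bg for bg, _ in counts.most_common(k)]


def presence_features(summary, vocabulary):
    """Binary vector: 1 if the summary contains the bigram, else 0."""
    present = set(bigrams(summary.lower().split()))
    return [1 if bg in present else 0 for bg in vocabulary]


docs = ["the cat sat", "the cat sat down", "the cat ran"]
vocabulary = top_bigrams(docs, k=2)
features = presence_features("The cat ran", vocabulary)
print(vocabulary, features)
```

The state of a partial summary is then a fixed-length 0/1 vector, one slot per top bigram, instead of a vector over tf*idf-weighted words.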
Reward Function
based on the n-gram concurrence score metric
the longest-common-subsequence recall metric
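Both metrics can be sketched in a few lines. This is a simplified illustration in the spirit of ROUGE, not the official ROUGE toolkit and not the paper's exact reward: it uses a single reference, whitespace tokens, and no stemming or stopword handling, and the function names are hypothetical.

```python
from collections import Counter


def ngram_recall(candidate, reference, n=2):
    """Fraction of reference n-grams that also appear in the candidate (clipped counts)."""
    def ngrams(tokens):
        return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

    cand, ref = ngrams(candidate), ngrams(reference)
    total = sum(ref.values())
    if total == 0:
        return 0.0
    overlap = sum(min(count, cand[gram]) for gram, count in ref.items())
    return overlap / total


def lcs_recall(candidate, reference):
    """Length of the longest common subsequence divided by the reference length."""
    m, n = len(candidate), len(reference)
    if n == 0:
        return 0.0
    table = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if candidate[i - 1] == reference[j - 1]:
                table[i][j] = table[i - 1][j - 1] + 1
            else:
                table[i][j] = max(table[i - 1][j], table[i][j - 1])
    return table[m][n] / n


reference = "the cat sat on the mat".split()
candidate = "the cat lay on the mat".split()
bigram_score = ngram_recall(candidate, reference, n=2)
lcs_score = lcs_recall(candidate, reference)
print(bigram_score, lcs_score)
```

In this example one changed word breaks two of the five reference bigrams, while the subsequence-based score only loses the single missing word.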

Immediate Rewards
Query Focused Rewards
