I. Paper Information

Title: Search-Based Testing Approach for Deep Reinforcement Learning Agents

Year: 2022

Journal/Conference: arXiv

Paper link: http://arxiv.org/abs/2206.07813

Authors: Amirhossein Zolfagharian, Manel Abdellatif, Lionel Briand, Mojtaba Bagherzadeh and Ramesh S

II. Paper Structure

1. Introduction
2. Background
   2.1 Definitions
   2.2 State Abstraction
3. Problem Definition
   3.1 RL Agent Testing Challenges
   3.2 Assumptions
4. Approach
   4.1 Reformulation as a Search Problem
   4.2 Overview of the Approach
   4.3 Initial Population
   4.4 Fitness Computations
   4.5 Search Operators
   4.6 Execution of Final Results
5. Empirical Evaluation
   5.1 Research Questions
   5.2 Case Study
   5.3 Implementation
   5.4 Evaluation and Results
6. Discussions
7. Threats to Validity
8. Related Work
9. Conclusion

III. Paper Content

Abstract

Deep Reinforcement Learning (DRL) algorithms have been increasingly employed during the last decade to solve various decision-making problems such as autonomous driving and robotics. However, these algorithms have faced great challenges when deployed in safety-critical environments since they often exhibit erroneous behaviors that can lead to potentially critical errors.

One way to assess the safety of DRL agents is to test them to detect possible faults leading to critical failures during their execution. This raises the question of how we can efficiently test DRL policies to ensure their correctness and adherence to safety requirements.

Most existing works on testing DRL agents use adversarial attacks that perturb states or actions of the agent. However, such attacks often lead to unrealistic states of the environment. Their main goal is to test the robustness of DRL agents rather than testing the compliance of agents’ policies with respect to requirements.

Due to the huge state space of DRL environments, the high cost of test execution, and the black-box nature of DRL algorithms, the exhaustive testing of DRL agents is impossible. In this paper, we propose a Search-based Testing Approach of Reinforcement Learning Agents (STARLA) to test the policy of a DRL agent by effectively searching for failing executions of the agent within a limited testing budget. We use machine learning models and a dedicated genetic algorithm to narrow the search towards faulty episodes.

We apply STARLA on a Deep-Q-Learning agent which is widely used as a benchmark and show that it significantly outperforms Random Testing by detecting more faults related to the agent’s policy. We also investigate how to extract rules that characterize faulty episodes of the DRL agent using our search results. Such rules can be used to understand the conditions under which the agent fails and thus assess its deployment risks.

Summary

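Over the last decade, deep reinforcement learning (DRL) algorithms have been increasingly used to solve various decision-making problems, such as autonomous driving, trading decisions, and robotics. However, these algorithms face great challenges when deployed in safety-critical environments, since they often exhibit erroneous behaviors that can lead to potentially critical errors.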

One way to assess the safety of DRL agents is to test them in order to detect faults that could lead to critical failures during execution. This raises the question of how we can efficiently test DRL policies to ensure their correctness and adherence to safety requirements.

Most existing work on testing DRL agents uses adversarial attacks that perturb the agent's states or actions. However, such attacks often lead to unrealistic states of the environment. Furthermore, their main goal is to test the robustness of DRL agents rather than the compliance of the agents' policies with requirements.

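Because of the huge state space of DRL environments, the high cost of test execution, and the black-box nature of DRL algorithms, exhaustive testing of DRL agents is impossible. This paper proposes a Search-based Testing Approach of Reinforcement Learning Agents (STARLA), which tests the policy of a DRL agent by effectively searching for failing executions of the agent within a limited testing budget. It relies on machine learning models and a dedicated genetic algorithm to narrow the search toward faulty episodes, where an episode is a sequence of states and actions produced by the DRL agent. Applied to a Deep-Q-Learning agent that is widely used as a benchmark, STARLA significantly outperforms random testing by detecting more faults related to the agent's policy.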
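The search idea in this paragraph can be illustrated with a minimal sketch. It is not STARLA: the environment, the `toy_policy` agent, the `LIMIT` safety bound, and the margin-based fitness are all assumptions made up for illustration, and the machine learning models that the paper uses to guide the search are left out. The sketch only shows how a genetic loop (initial population, fitness, crossover and mutation) can spend a fixed budget of episode executions looking for failing ones, next to a random-testing baseline with the same budget.

```python
import random

LIMIT = 12        # hypothetical safety requirement: |x| must never exceed this
MAX_STEPS = 30    # hypothetical episode length
X_RANGE, V_RANGE = (-10, 10), (-3, 3)   # allowed initial positions / velocities


def toy_policy(x, v):
    """Stand-in for the agent under test: push against the predicted position."""
    s = x + v
    return -1 if s > 0 else (1 if s < 0 else 0)


def run_episode(start):
    """Run the policy from `start`; return (smallest safety margin seen, failed?)."""
    x, v = start
    margin = LIMIT - abs(x)
    for _ in range(MAX_STEPS):
        v += toy_policy(x, v)
        x += v
        margin = min(margin, LIMIT - abs(x))
        if abs(x) > LIMIT:
            return margin, True
    return margin, False


def random_start(rng):
    return (rng.randint(*X_RANGE), rng.randint(*V_RANGE))


def mutate(start, rng):
    """Nudge each component by at most one unit, staying inside the ranges."""
    x, v = start
    x = max(X_RANGE[0], min(X_RANGE[1], x + rng.choice((-1, 0, 1))))
    v = max(V_RANGE[0], min(V_RANGE[1], v + rng.choice((-1, 0, 1))))
    return (x, v)


def crossover(a, b):
    """Take the position from one parent and the velocity from the other."""
    return (a[0], b[1])


def genetic_search(budget=200, pop_size=10, seed=0):
    """Return the set of failing initial states found within `budget` episodes."""
    rng = random.Random(seed)
    population = [random_start(rng) for _ in range(pop_size)]
    failures = set()
    for _ in range(budget // pop_size):          # one generation per iteration
        scored = []
        for start in population:
            margin, failed = run_episode(start)
            if failed:
                failures.add(start)
            scored.append((margin, start))
        scored.sort()                            # smallest margin = closest to failing
        parents = [s for _, s in scored[: max(1, pop_size // 2)]]
        population = [
            mutate(crossover(rng.choice(parents), rng.choice(parents)), rng)
            for _ in range(pop_size)
        ]
    return failures


def random_testing(budget=200, seed=0):
    """Baseline: execute `budget` random initial states and keep the failing ones."""
    rng = random.Random(seed)
    starts = (random_start(rng) for _ in range(budget))
    return {s for s in starts if run_episode(s)[1]}


if __name__ == "__main__":
    print("genetic search failures:", len(genetic_search()))
    print("random testing failures:", len(random_testing()))
```

Both functions count distinct failing initial states, so their results can be compared for the same budget. In this toy setting the fitness is a hand-written margin; a closer analogue of the approach described above would replace it with scores that include a learned estimate of how likely an episode is to be faulty.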

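We also investigate how to use the search results to extract rules that characterize the faulty episodes of the DRL agent. These rules can be used to understand the conditions under which the agent fails, and thus to assess the risks of deploying it.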
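As a rough illustration of what "rules that characterize faulty episodes" could look like, the sketch below scores one very simple rule family: "an episode that visits abstract state S is faulty". This is an assumption chosen for brevity, and the paper's actual rule-extraction procedure may differ. The function name `single_state_rules`, the state labels, and the five example episodes are all hypothetical.

```python
from collections import namedtuple

Rule = namedtuple("Rule", "state precision recall")


def single_state_rules(episodes, min_precision=0.8):
    """Score rules of the form 'visits state s => faulty'.

    episodes: list of (set of abstract state ids, is_faulty) pairs.
    Returns the rules whose precision reaches `min_precision`,
    ordered by decreasing recall.
    """
    total_faulty = sum(1 for _, faulty in episodes if faulty)
    all_states = set().union(*(states for states, _ in episodes))
    rules = []
    for s in sorted(all_states):
        # Labels of every episode that visits state s.
        covered = [faulty for states, faulty in episodes if s in states]
        hits = sum(covered)
        precision = hits / len(covered)
        recall = hits / total_faulty if total_faulty else 0.0
        if precision >= min_precision:
            rules.append(Rule(s, precision, recall))
    return sorted(rules, key=lambda r: (-r.recall, r.state))


# Hypothetical labeled episodes, invented for this example only.
example_episodes = [
    ({"S1", "S2"}, False),
    ({"S1", "S7"}, True),
    ({"S3", "S7"}, True),
    ({"S2", "S3"}, False),
    ({"S1", "S3", "S9"}, True),
]

if __name__ == "__main__":
    for rule in single_state_rules(example_episodes):
        print(rule)
```

A rule with high precision and high recall points to a condition under which the agent tends to fail, which is the kind of information the paragraph above describes as useful for assessing deployment risk.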
