Keeping AI Well Behaved: How Do We Engineer An Artificial System That Has Values?


In Brief

By some estimates, 10 million self-driving cars will be on the road by the end of 2020, which raises questions about how an AI will respond in potentially lethal situations.

In this exclusive interview, FLI researcher David Parkes discusses his work to engineer a value-aligned AI system.

Teaching AI to Make Decisions

Imagine you’re sitting in a self-driving car that’s about to make a left turn across oncoming traffic. One small AI system in the car will be responsible for making the vehicle turn, another might speed it up or hit the brakes, other systems will have sensors that detect obstacles, and yet another system may be in communication with other vehicles on the road. Each system has its own goals (starting or stopping, turning or traveling straight, recognizing potential problems, and so on), but they all have to work together toward one common goal: turning into traffic without causing an accident.

Harvard professor and Future of Life researcher David Parkes is trying to solve just this type of problem. Parkes told FLI, “The particular question I’m asking is: If we have a system of AIs, how can we construct rewards for individual AIs, such that the combined system is well behaved?”


Essentially, an AI within a system of AIs, like the ones in the car example above, needs to learn how to meet its own objective as well as how to compromise so that its actions help satisfy the group objective. On top of that, the system of AIs needs to consider the preferences of society. For example, it needs to determine whether the safety of the passenger in the car or of a pedestrian in the crosswalk is a higher priority than turning left.

Because environments like a busy street are so complicated, an engineer can’t just program an AI to behave in a fixed way that always achieves its objectives. Instead, AIs need to learn proper behavior through a system of rewards. “Each AI has a reward for its action and the action of the other AI,” Parkes explained. With the world constantly changing, the rewards have to evolve, and the AIs need to keep up not only with how their own goals change but also with the evolving objectives of the system as a whole.
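
As a minimal sketch of what keeping up with a changing reward can mean (an illustration, not Parkes’ method), the Python below compares two standard ways an agent can estimate the reward an action currently earns. A plain sample average weights all past feedback equally, so it lags after conditions change; a constant step size (the hypothetical `alpha` parameter) weights recent feedback more heavily. The reward stream is made up for illustration.

```python
def track(rewards, alpha=None):
    """Incrementally estimate the reward an action currently earns.

    alpha=None uses the sample average (step size 1/n); a float in (0, 1]
    uses a constant step size, which discounts older feedback exponentially.
    """
    estimate, n = 0.0, 0
    for r in rewards:
        n += 1
        step = 1.0 / n if alpha is None else alpha
        estimate += step * (r - estimate)
    return estimate


# Hypothetical reward stream: an action earns 1.0 for 100 steps,
# then the environment changes and the same action earns 0.0.
stream = [1.0] * 100 + [0.0] * 100

print(track(stream))             # sample average: about 0.5, still anchored to the past
print(track(stream, alpha=0.1))  # constant step size: close to 0.0, follows the change
```

The constant step size gives up some precision when a reward really is fixed, in exchange for responsiveness when it is not.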


Making an Evolving AI

The idea of a rewards-based learning system is something most people can likely relate to. Who doesn’t remember the excitement of a gold star or a smiley face on a test? And any dog owner has experienced how much more likely their pet is to perform a trick when it realizes it will get a treat. A reward for an AI is similar.


A technique often used in designing artificial intelligence is reinforcement learning. With reinforcement learning, when the AI takes an action, it receives positive or negative feedback, and it then tries to adjust its actions to earn more positive rewards. However, the AI can’t simply be told in advance which actions will be rewarded. It has to interact with its environment to learn which actions will be considered good, bad, or neutral. Again, the idea is similar to a dog learning that tricks can earn it treats or praise, while misbehaving could result in punishment.
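
This feedback loop can be sketched in its simplest form, a multi-armed bandit, where the agent repeatedly picks one action and receives a noisy reward. The Python below is a minimal illustration under assumed names: the actions (`yield`, `wait`, `cut_in`) and their hidden average rewards are hypothetical, and epsilon-greedy exploration is one common choice among many. The agent is never told which action is good; it learns that only from the rewards it receives.

```python
import random

# Hypothetical average feedback for each action; the agent never sees this table.
TRUE_MEANS = {"yield": 1.0, "wait": 0.0, "cut_in": -1.0}


def feedback(action, rng):
    """Environment: a noisy reward centered on the action's hidden mean."""
    return rng.gauss(TRUE_MEANS[action], 0.5)


def learn(steps=2000, epsilon=0.1, seed=0):
    """Epsilon-greedy learning of each action's average reward."""
    rng = random.Random(seed)
    actions = list(TRUE_MEANS)
    value = {a: 0.0 for a in actions}  # current reward estimates
    count = {a: 0 for a in actions}
    for _ in range(steps):
        if rng.random() < epsilon:
            action = rng.choice(actions)          # explore: try something at random
        else:
            action = max(actions, key=value.get)  # exploit: best estimate so far
        reward = feedback(action, rng)
        count[action] += 1
        value[action] += (reward - value[action]) / count[action]  # running average
    return value


values = learn()
print(max(values, key=values.get))  # the action the agent has learned to prefer
```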

Beyond this, Parkes wants to understand how to distribute rewards to subcomponents (the individual AIs) in order to achieve good system-wide behavior. How often should there be positive (or negative) reinforcement, and in response to which types of actions?
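
One well-known answer to this credit-assignment question from the multi-agent learning literature, offered here as an illustration rather than as Parkes’ own approach, is the difference reward: each subcomponent is rewarded by how much the system objective would drop without it, D_i = G(z) - G(z without agent i). The Python sketch below assumes a hypothetical system objective, `coverage`, that counts how many distinct directions the car’s sensor AIs are watching.

```python
from typing import Callable, List, Sequence


def coverage(choices: Sequence[str]) -> int:
    """Hypothetical system objective G: number of distinct directions watched."""
    return len(set(choices))


def difference_rewards(
    choices: Sequence[str], G: Callable[[Sequence[str]], float] = coverage
) -> List[float]:
    """Reward each agent with G(all agents) - G(all agents except that one)."""
    total = G(choices)
    return [
        total - G(list(choices[:i]) + list(choices[i + 1:]))
        for i in range(len(choices))
    ]


# Three sensor AIs; two of them are watching the same direction.
choices = ["left", "ahead", "ahead"]
print(coverage(choices))            # 2: the shared system score every agent would get
print(difference_rewards(choices))  # [1, 0, 0]: only the agent adding coverage is credited
```

If every agent simply received the shared score of 2, the two redundant sensors would get no signal telling them to look elsewhere; the difference reward gives them one.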


"Rather than programming a reward specifically into the AI, Parkes shapes the way rewards flow from the environment to the AI in order to promote desirable behaviors as the AI interacts with the world around it."

For example, if you were to play a video game without any points or lives or levels or other indicators of success or failure, you might run around the world killing or fighting aliens and monsters, and you might eventually beat the game, but you wouldn’t know which specific actions led you to win. Instead, games are designed to provide regular feedback and reinforcement so that you know when you make progress and what steps you need to take next. To train an AI, Parkes has to determine which smaller actions will merit feedback so that the AI can move toward a larger, overarching goal.
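
The video-game analogy corresponds to what reinforcement-learning researchers call reward shaping. Below is a minimal Python sketch of one standard variant, potential-based shaping (Ng, Harada and Russell, 1999), offered as an illustration rather than as Parkes’ method. The corridor of states 0 to 5, the goal at state 5, the sparse environment reward, and the `potential` function are all hypothetical. The shaping bonus gamma * potential(s') - potential(s) gives an intermediate signal for every step toward the goal, and because the bonuses telescope along a trajectory, they do not change which behavior is best in the long run.

```python
GOAL = 5      # hypothetical corridor of states 0..5 with the goal at state 5
GAMMA = 0.9   # discount factor


def env_reward(s: int, s_next: int) -> float:
    """Sparse environment reward: +1 only for reaching the goal."""
    return 1.0 if s_next == GOAL and s != GOAL else 0.0


def potential(s: int) -> float:
    """Hypothetical potential: higher (less negative) closer to the goal."""
    return -float(abs(GOAL - s))


def shaping_bonus(s: int, s_next: int, gamma: float = GAMMA) -> float:
    """Potential-based shaping term F(s, s') = gamma * phi(s') - phi(s)."""
    return gamma * potential(s_next) - potential(s)


def discounted_sum(states, fn, gamma: float = GAMMA) -> float:
    """Sum of gamma**t * fn(s_t, s_{t+1}) along a trajectory of states."""
    return sum(gamma**t * fn(states[t], states[t + 1]) for t in range(len(states) - 1))


print(shaping_bonus(1, 2))  # positive: a step toward the goal earns immediate feedback
print(shaping_bonus(2, 1))  # negative: a step away is discouraged

# The bonuses telescope: their discounted sum is gamma**T * phi(s_T) - phi(s_0).
path = [0, 1, 2, 1, 2, 3, 4, 5]
T = len(path) - 1
print(discounted_sum(path, shaping_bonus), GAMMA**T * potential(path[-1]) - potential(path[0]))
```

Without the bonus, the only feedback on this path arrives at the final step, much like a game with no points until the credits roll.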


But this is all for just one AI. How do these techniques apply to two or more AIs?


Gaming the System

Much of Parkes’ work involves game theory. Game theory helps researchers understand what types of rewards will elicit collaboration among otherwise self-interested players, or in this case, rational AIs. Once an AI figures out how to maximize its own reward, what will entice it to act in accordance with another AI?
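
A toy version of this question can be worked through in a few lines of Python. The sketch below uses the textbook prisoner’s dilemma payoffs as hypothetical rewards for two self-interested AIs, enumerates the pure-strategy Nash equilibria (joint actions where neither AI gains by changing only its own action), and then adds a hypothetical team term that gives each AI a share of the other’s reward. It illustrates how reshaping rewards can change what rational agents do; it is not a model of Parkes’ specific reward designs.

```python
from itertools import product

ACTIONS = ("cooperate", "defect")

# Hypothetical rewards (AI 1, AI 2) for each joint action.
SELFISH = {
    ("cooperate", "cooperate"): (3, 3),
    ("cooperate", "defect"): (0, 5),
    ("defect", "cooperate"): (5, 0),
    ("defect", "defect"): (1, 1),
}


def with_team_reward(payoffs, weight):
    """Add `weight` times the other AI's reward to each AI's own reward."""
    return {a: (r1 + weight * r2, r2 + weight * r1) for a, (r1, r2) in payoffs.items()}


def pure_nash_equilibria(payoffs):
    """Joint actions where neither AI can do better by deviating alone."""
    equilibria = []
    for a1, a2 in product(ACTIONS, ACTIONS):
        r1, r2 = payoffs[(a1, a2)]
        best1 = all(payoffs[(d, a2)][0] <= r1 for d in ACTIONS)
        best2 = all(payoffs[(a1, d)][1] <= r2 for d in ACTIONS)
        if best1 and best2:
            equilibria.append((a1, a2))
    return equilibria


print(pure_nash_equilibria(SELFISH))                         # [('defect', 'defect')]
print(pure_nash_equilibria(with_team_reward(SELFISH, 1.0)))  # [('cooperate', 'cooperate')]
```

With purely selfish rewards, both AIs defect even though mutual cooperation would pay each of them more; once each AI’s reward includes the other’s outcome, cooperation becomes the only stable outcome.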


To answer this question, Parkes turns to an economic theory called mechanism design.


Mechanism design theory is a Nobel Prize-winning theory that allows researchers to determine how the rules of a system with multiple parts can be set so that the system achieves an overarching goal. It is a kind of “inverse game theory”: where game theory predicts how players will behave under given rules, mechanism design starts from the desired outcome and works backward to the rules. How can rules of interaction, such as ways to distribute rewards, be designed so that individual AIs will act in favor of system-wide and societal preferences? Among other things, mechanism design theory has been applied to problems in auctions, e-commerce, regulations, environmental policy, and now, artificial intelligence.
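
Auctions are the textbook example of mechanism design. As a minimal Python sketch (a classic illustration, not an application from Parkes’ work), the sealed-bid second-price, or Vickrey, auction sets its rules so that the highest bidder wins but pays the second-highest bid. Under that rule, bidding one’s true value is a dominant strategy: no other bid ever does better, whatever the others bid. The check at the end tests this on a small grid of hypothetical values and bids.

```python
from typing import Sequence, Tuple


def second_price_auction(bids: Sequence[float]) -> Tuple[int, float]:
    """Return (winner index, price): the highest bid wins and pays the second-highest bid.

    Ties go to the lower index, because Python's sort is stable even with reverse=True.
    """
    order = sorted(range(len(bids)), key=lambda i: bids[i], reverse=True)
    winner = order[0]
    price = bids[order[1]] if len(bids) > 1 else 0.0
    return winner, price


def utility(value: float, my_bid: float, other_bids: Sequence[float]) -> float:
    """Bidder 0's payoff: its private value minus the price if it wins, else 0."""
    winner, price = second_price_auction([my_bid, *other_bids])
    return value - price if winner == 0 else 0.0


# Truthful bidding is never beaten by any alternative bid on this hypothetical grid.
truthful_is_dominant = all(
    utility(v, v, others) >= utility(v, b, others)
    for v in range(11)
    for others in [(3,), (7, 2), (5, 5), (10, 0)]
    for b in range(13)
)
print(truthful_is_dominant)  # True
```

Under a first-price rule, by contrast, a winner who bids its true value pays all of it and gains nothing, so bidders have a reason to shade their bids; the second-price rule removes that incentive by design.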

The difference between Parkes’ work with AIs and classical mechanism design is that the latter assumes some sort of mechanism, or manager, overseeing the entire system. In an automated car or a drone, by contrast, the AIs inside have to work together to achieve group goals without a mechanism making final decisions. As the environment changes, the external rewards will change, and as the AIs within the system realize they want to make some sort of change to maximize their rewards, they’ll have to communicate with each other, shifting the goals for the entire autonomous system.


Parkes summarized his work for FLI, saying, “The work that I’m doing as part of the FLI grant program is all about aligning incentives so that when autonomous AIs decide how to act, they act in a way that’s not only good for the AI system, but also good for society more broadly.”


Parkes is also involved with the One Hundred Year Study on Artificial Intelligence, and he explained his “research with FLI has informed a broader perspective on thinking about the role that AI can play in an urban context in the near future.” As he considers the future, he asks, “What can we see, for example, from the early trajectory of research and development on autonomous vehicles and robots in the home, about where the hard problems will be in regard to the engineering of value-aligned systems?”



