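PS: First, a disclaimer: this was a school assignment. I call it BetaGo (forgive my shamelessness), because I always thought AlphaGo was impressive but far out of my reach. Pushed by the assignment, I finally wrote my first presentable AI, and I am pretty happy that it gets a decent win rate.

Main text

Win rate against a random agent

Each configuration plays 100 games, recording wins/losses/draws and the AI's total thinking time (the third one is tic-tac-toe).

My CPU: i5-12500H, 12 cores

Test cases

[--size SIZE] (board size) size of the board
[--games GAMES] (number of games) how many games to play
[--iterations ITERATIONS] (number of iterations allowed by the agent) raising this makes the agent search more per move, but it plays more slowly
[--print-board {all,final}] used for debugging
[--parallel PARALLEL] number of threads; my machine could actually handle 12, the instructor's setting was 8, and I did not bother to change it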

    python main.py --games 100 --size 5 --iterations 100 --parallel 8 shapes1.txt >> results.txt    # two in a row
    python main.py --games 100 --size 10 --iterations 100 --parallel 8 shapes1.txt >> results.txt   # two in a row large
    python main.py --games 100 --size 3 --iterations 1000 --parallel 8 shapes2.txt >> results.txt   # tic-tac-toe
    python main.py --games 100 --size 8 --iterations 1000 --parallel 8 shapes3.txt >> results.txt   # plus
    python main.py --games 100 --size 8 --iterations 1000 --parallel 8 shapes4.txt >> results.txt   # circle
    python main.py --games 100 --size 8 --iterations 100 --parallel 8 shapes4.txt >> results.txt    # circle fast
    python main.py --games 100 --size 10 --iterations 1000 --parallel 8 shapes5.txt >> results.txt  # disjoint
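For readers who do not have the course's main.py, the options listed above could be declared with argparse roughly as follows. This is only a sketch: the default values, the help strings, and the positional `shapes` argument (for files like shapes1.txt) are my assumptions, not taken from the real script.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring the options listed above."""
    parser = argparse.ArgumentParser(description="Play games against a random agent.")
    # All defaults below are assumptions made for this sketch.
    parser.add_argument("--size", type=int, default=3, help="board size")
    parser.add_argument("--games", type=int, default=1, help="number of games")
    parser.add_argument("--iterations", type=int, default=100,
                        help="number of iterations allowed by the agent")
    parser.add_argument("--print-board", choices=["all", "final"], default=None,
                        help="print every board or only the final one")
    parser.add_argument("--parallel", type=int, default=1,
                        help="number of parallel workers")
    # Assumption: the shapes file is passed as a positional argument.
    parser.add_argument("shapes", help="file describing the winning shapes")
    return parser


# Parse the first command line from the list above.
args = build_parser().parse_args(
    ["--games", "100", "--size", "5", "--iterations", "100", "--parallel", "8", "shapes1.txt"]
)
print(args.size, args.games, args.iterations, args.parallel, args.shapes)
```

Note that argparse turns `--print-board` into the attribute `args.print_board`.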

Approach / pseudocode

1. Get every possible move

2. Simulate games for each possible move

3. Calculate the reward for each possible move

4. Return move choice for the real game
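The four steps above can be tried out without the game framework. The sketch below is a toy version, not the assignment code: `choose_move` and `win_probs` are hypothetical names, and a hidden win probability per move stands in for a real random playout. Early iterations mostly explore random moves, later ones mostly exploit the best move found so far.

```python
import random


def choose_move(win_probs, iterations, rng):
    """Pick the index of the best move by repeated simulation.

    win_probs[i] is a stand-in for real playouts: the hidden chance that
    a random playout after move i ends in a win.
    """
    n = len(win_probs)                  # 1. every possible move
    reward = [0.0] * n
    count = [0] * n
    mean = [0.0] * n
    for t in range(iterations):
        epsilon = t / iterations        # grows from 0 (explore) towards 1 (exploit)
        if rng.random() > epsilon:
            move = rng.randrange(n)     # explore: try a random move
        else:
            move = max(range(n), key=lambda i: mean[i])  # exploit: best so far
        # 2. simulate one game for this move (+1 for a win, -1 for a loss)
        outcome = 1 if rng.random() < win_probs[move] else -1
        # 3. update the reward statistics of this move
        reward[move] += outcome
        count[move] += 1
        mean[move] = reward[move] / count[move]
    # 4. return the move with the highest mean reward
    return max(range(n), key=lambda i: mean[i])


rng = random.Random(0)
best = choose_move([0.2, 0.9, 0.5], iterations=2000, rng=rng)
print(best)
```

With win probabilities this far apart, the move at index 1 should come out on top for practically any seed.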

The code

It will not run as-is; the point is the idea, though I commented it in a lot of detail.

    from random_agent import RandomAgent
    from game import Game
    import numpy as np
    import copy
    import random


    class Agent:
        def __init__(self, iterations, id):
            self.iterations = iterations
            self.id = id

        def make_move(self, game):
            # statistics for each available position, indexed like free_positions
            free_positions = game.board.free_positions()
            freeposnum = len(free_positions)
            pos_winrate = np.zeros(freeposnum)
            pos_reward = np.zeros(freeposnum)
            pos_cnt = np.zeros(freeposnum)

            # every simulation runs on a deep copy, so the real board is never changed
            for iter_cnt in range(self.iterations):
                board = copy.deepcopy(game.board)

                # dynamic epsilon: grows from 0 (exploration) towards 1 (exploitation)
                epsilon = iter_cnt / self.iterations

                # exploration & exploitation, with a fresh random draw on every iteration
                if np.random.random() > epsilon:
                    pointer = random.randrange(0, freeposnum)
                else:
                    pointer = np.argmax(pos_winrate)

                # make the move on the copy, then play the game out with random agents
                finalmove = free_positions[pointer]
                board.place(finalmove, self.id)

                # attention: the opponent takes the next move (player ids are assumed to be 1 and 2)
                opponent_id = 2 if self.id == 1 else 1
                deepcopy_players = [RandomAgent(opponent_id), RandomAgent(self.id)]
                deepcopy_game = game.from_board(board, game.objectives, deepcopy_players, game.print_board)
                if deepcopy_game.victory(finalmove, self.id):
                    winner = self
                else:
                    winner = deepcopy_game.play()

                # reward by outcome: +1 for a win, -1 for a loss, 0 for a draw
                if winner:
                    if winner.id == self.id:
                        pos_reward[pointer] += 1
                    else:
                        pos_reward[pointer] -= 1

                # visit count + 1, then update the mean reward ("win rate") of this position
                pos_cnt[pointer] += 1
                pos_winrate[pointer] = pos_reward[pointer] / pos_cnt[pointer]

            # back to the real match: take the position with the highest win rate
            highest_winrate_pos = np.argmax(pos_winrate)
            finalmove = free_positions[highest_winrate_pos]
            return finalmove

        def __str__(self):
            return f'Player {self.id} (betago agent)'
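One thing to watch in the agent above: `pos_winrate` starts at zero for every position, so a position that was never simulated (score 0) beats any visited position whose mean reward is negative when `np.argmax` is taken. With 100 iterations on a 10x10 board, many positions are never visited. A possible tweak, not part of the assignment code, is to consider only visited positions; `best_visited` below is a hypothetical helper that accepts plain lists or NumPy arrays.

```python
def best_visited(pos_winrate, pos_cnt):
    """Index of the highest mean reward among positions visited at least once."""
    visited = [i for i, cnt in enumerate(pos_cnt) if cnt > 0]
    if not visited:
        return 0  # nothing simulated yet; fall back to the first position
    return max(visited, key=lambda i: pos_winrate[i])


pos_winrate = [-0.5, 0.0, -0.2]   # mean reward per position
pos_cnt = [4, 0, 5]               # position 1 was never simulated

# Plain argmax picks the untried position 1, because 0.0 is the largest value.
naive = max(range(len(pos_winrate)), key=lambda i: pos_winrate[i])
print(naive)                                # 1
print(best_visited(pos_winrate, pos_cnt))   # 2
```

Whether this actually helps the win rate would need to be measured; an untried move is not necessarily a bad one.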

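PSS: Honestly I am also a bit lazy and did not post screenshots of all the test cases, but my time and energy really are limited; for example, I still have other homework to finish.

I just hope this still helps someone (lol)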
