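Reinforcement Learning: Board Game Series - 01 Gomoku

Organizing what I know about reinforcement learning; real understanding comes from practice.

This post contains only code: a game (the Gomoku environment), a player (plays random moves), and WuziBoard (board visualization).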

Drawing the board:

#!/usr/bin/env python
# -*- coding:utf-8 -*-
# Author:Hiuhung Wan
import time
import turtle
from enum import Enum


class PotColor(Enum):
    Black = 1
    White = 2


class WuziBoard(object):
    def __init__(self, RowNum):
        turtle.speed(9)
        turtle.hideturtle()
        self.RowNum = RowNum
        # Half the distance between two grid lines; the grid spans 500 pixels.
        self.halfDim = 500 / (RowNum - 1) / 2.0

    def drawBoard(self, ActionHis=None):
        turtle.screensize(400, 400, "white")
        turtle.title('五子棋')  # window title: "Gomoku"
        turtle.home()
        turtle.speed(0)
        time.sleep(5)  # pause before drawing, kept from the original
        # Vertical grid lines
        for i in range(self.RowNum):
            x = -250 + i * self.halfDim * 2
            y = -250
            turtle.penup()
            turtle.setpos(x, y)
            turtle.pendown()
            turtle.goto(x, y + 500)
        # Horizontal grid lines
        for i in range(self.RowNum):
            x = -250
            y = -250 + i * self.halfDim * 2
            turtle.penup()
            turtle.setpos(x, y)
            turtle.pendown()
            turtle.setpos(x + 500, y)
        if ActionHis is not None:
            self.drawNow(ActionHis)
        turtle.done()

    def action2potxy(self, action):
        # Convert a grid index (action[0], action[1]) to screen coordinates.
        x = -250 + action[0] * self.halfDim * 2
        y = -250 + action[1] * self.halfDim * 2
        return x, y

    def drawNow(self, RunAction):
        # RunAction is a list of (x, y, color) moves in the order played.
        for potsite in RunAction:
            x, y = self.action2potxy((potsite[0], potsite[1]))
            turtle.penup()
            turtle.setpos(x, y)
            turtle.pendown()
            # White stones are drawn in red so they show on the white canvas.
            color = "Black" if potsite[2] == PotColor.Black else "Red"
            turtle.dot(10, color)
            if potsite == RunAction[-1]:
                turtle.dot(20, color)  # draw the most recent move larger

    def drawAction(self):
        pass


def main():
    # Use PotColor members: drawNow compares against PotColor.Black, so plain
    # integers (1, 2) would never match and every stone would be drawn red.
    ActionHis = [
        (0, 1, PotColor.Black),
        (1, 1, PotColor.White),
        (5, 1, PotColor.Black),
    ]
    wuziBoard = WuziBoard(6)
    wuziBoard.drawBoard(ActionHis)


if __name__ == "__main__":
    main()
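The coordinate conversion in action2potxy can be checked without opening a turtle window. The following is a minimal sketch, assuming the same 500-pixel grid centred on the origin as WuziBoard; the names grid_to_pixel, pixel_to_grid, BOARD_PIXELS and ORIGIN are hypothetical and do not appear in the original code, and the inverse mapping is an illustration only.

```python
BOARD_PIXELS = 500   # side length of the drawn grid, as in WuziBoard (assumed name)
ORIGIN = -250        # screen coordinate of grid index 0 (assumed name)


def grid_to_pixel(action, row_num):
    """Map a grid index (action[0], action[1]) to screen coordinates."""
    cell = BOARD_PIXELS / (row_num - 1)   # equals 2 * halfDim in WuziBoard
    return ORIGIN + action[0] * cell, ORIGIN + action[1] * cell


def pixel_to_grid(x, y, row_num):
    """Hypothetical inverse: snap a screen position to the nearest intersection."""
    cell = BOARD_PIXELS / (row_num - 1)
    return round((x - ORIGIN) / cell), round((y - ORIGIN) / cell)


print(grid_to_pixel((0, 1), 6))         # (-250.0, -150.0)
print(grid_to_pixel((5, 5), 6))         # (250.0, 250.0)
print(pixel_to_grid(48.0, -155.0, 6))   # (3, 1)
```

With 6 lines per side the spacing is 100 pixels, so index 0 lands on -250 and index 5 on +250, which matches the moves drawn in main().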

Game code:

# Board dimensions. These constants are not defined in this excerpt;
# 15 x 15 is an assumed value (the usual Gomoku board size).
ROW_NUM = 15
ECO_NUM = 15


class GameFivePot(object):
    def __init__(self):
        self.potCount = 0
        self.AllAction = []
        self.ActionHis = []
        for x in range(ROW_NUM):
            for y in range(ECO_NUM):
                self.AllAction += [(x, y)]
        # Copy the list so that removing played moves leaves AllAction intact.
        self.AvailAction = list(self.AllAction)
        # RunAction[x][y] holds the stone color at (x, y); 0 means empty.
        self.RunAction = [[0 for col in range(ECO_NUM)] for row in range(ROW_NUM)]

    def getActions(self):
        return self.AvailAction

    def getRunAction(self):
        return self.RunAction

    def getActionHis(self):
        return self.ActionHis

    def is_over(self, action, potColor):
        # Returns (isOver, isWin) after potColor has just played `action`.
        x = action[0]
        y = action[1]
        dimCount = [1, 1, 1, 1]  # run length per direction, new stone included

        # Along x: forward
        for x1 in range(x + 1, x + 5):
            if x1 >= ROW_NUM:
                break
            if self.RunAction[x1][y] == potColor:
                dimCount[0] += 1
            else:
                break
        # Along x: backward
        for x1 in range(x - 1, x - 5, -1):
            if x1 < 0:
                break
            if self.RunAction[x1][y] == potColor:
                dimCount[0] += 1
            else:
                break
        if dimCount[0] >= 5:
            return True, True

        # Along y: forward
        for y1 in range(y + 1, y + 5):
            if y1 >= ECO_NUM:
                break
            if self.RunAction[x][y1] == potColor:
                dimCount[1] += 1
            else:
                break
        # Along y: backward
        for y1 in range(y - 1, y - 5, -1):
            if y1 < 0:
                break
            if self.RunAction[x][y1] == potColor:
                dimCount[1] += 1
            else:
                break
        if dimCount[1] >= 5:
            return True, True

        # Diagonal (x and y both increasing): forward
        for offset in range(1, 5):
            x1 = x + offset
            y1 = y + offset
            if y1 >= ECO_NUM or x1 >= ROW_NUM:
                break
            if self.RunAction[x1][y1] == potColor:
                dimCount[2] += 1
            else:
                break
        # Diagonal: backward
        for offset in range(-1, -5, -1):
            x1 = x + offset
            y1 = y + offset
            if y1 < 0 or x1 < 0:
                break
            if self.RunAction[x1][y1] == potColor:
                dimCount[2] += 1
            else:
                break
        if dimCount[2] >= 5:
            return True, True

        # Anti-diagonal (x increasing, y decreasing): forward
        for offset in range(1, 5):
            x1 = x + offset
            y1 = y - offset
            if y1 < 0 or x1 >= ROW_NUM:
                break
            if self.RunAction[x1][y1] == potColor:
                dimCount[3] += 1
            else:
                break
        # Anti-diagonal: backward
        for offset in range(-1, -5, -1):
            x1 = x + offset
            y1 = y - offset
            if y1 >= ECO_NUM or x1 < 0:
                break
            if self.RunAction[x1][y1] == potColor:
                dimCount[3] += 1
            else:
                break
        if dimCount[3] >= 5:
            return True, True

        # No moves left and nobody has five in a row: the game is a draw.
        if len(self.AvailAction) == 0:
            return True, False
        return False, False

    def action(self, action, potColor):
        # Place a stone and report (board, isOver, isWin).
        self.potCount += 1
        self.ActionHis += [(action[0], action[1], potColor)]
        self.AvailAction.remove(action)
        self.RunAction[action[0]][action[1]] = potColor
        isOver, isWin = self.is_over(action, potColor)
        return self.RunAction, isOver, isWin

    def __repr__(self):
        return "Game step count: {}, AvailAction len: {},  ".format(
            self.potCount, len(self.AvailAction))
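The eight scans in is_over all follow one pattern: walk up to four steps away from the new stone along a line and stop at the first point that is off the board or not the same color. As a hedged alternative, the same check can be written once over four direction vectors. This is a minimal sketch, not the author's code; has_five and DIRECTIONS are hypothetical names, and the board is assumed to be a list of lists with 0 for empty points, like RunAction.

```python
# The four lines through a point: along x, along y, and the two diagonals.
DIRECTIONS = ((1, 0), (0, 1), (1, 1), (1, -1))


def has_five(board, x, y, color):
    """Return True if the stone of `color` at (x, y) completes five in a row.

    Assumes board[x][y] already holds `color`; 0 marks an empty point.
    """
    rows, cols = len(board), len(board[0])
    for dx, dy in DIRECTIONS:
        count = 1  # the stone at (x, y) itself
        for sign in (1, -1):              # scan both ways along the line
            for step in range(1, 5):
                x1 = x + sign * step * dx
                y1 = y + sign * step * dy
                if not (0 <= x1 < rows and 0 <= y1 < cols):
                    break
                if board[x1][y1] != color:
                    break
                count += 1
        if count >= 5:
            return True
    return False


board = [[0] * 8 for _ in range(8)]
for i in range(5):
    board[2 + i][6 - i] = 1          # five stones on an anti-diagonal
board[0][0] = 1                      # a lone stone in the corner
print(has_five(board, 4, 4, 1))      # True
print(has_five(board, 0, 0, 1))      # False
```

Taking the bounds from the board itself means the check does not depend on the global ROW_NUM and ECO_NUM, which keeps it usable on non-square boards.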

Player code:

import random


class GamePlayer(object):
    def __init__(self, potColor):
        self.actionHis = []
        self.color = potColor

    def getActionHis(self):
        return self.actionHis

    def play(self, game):
        # Pick one of the available moves and apply it to the game.
        actions = game.getActions()
        action = self.choiceActions(actions)
        self.actionHis = self.actionHis + [action]
        gameInfo, isOver, isWin = game.action(action, self.color)
        return gameInfo, isOver, isWin

    def choiceActions(self, actions):
        # Random policy: every available move is equally likely.
        action = random.choice(actions)
        return action

    def __repr__(self):
        return "color: {}, actionHis: {},  ".format(self.color, self.actionHis)
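The post does not show the loop that lets two players take turns. One way to drive it is sketched below, assuming only the interface used above: the game exposes getActions() and action(), and each player's play(game) returns (gameInfo, isOver, isWin). To keep the example self-contained, StubGame and RandomPlayer are hypothetical stand-ins; StubGame only tracks free points and never reports a win, so it is not a Gomoku implementation. run_game is likewise an assumed name.

```python
import random


class StubGame:
    """Hypothetical stand-in with the same getActions()/action() interface.

    It never declares a win, so every game ends in a draw when the
    board is full.
    """

    def __init__(self, size=2):
        self.avail = [(x, y) for x in range(size) for y in range(size)]
        self.history = []

    def getActions(self):
        return self.avail

    def action(self, action, color):
        self.avail.remove(action)
        self.history.append((action[0], action[1], color))
        is_over = len(self.avail) == 0
        return self.history, is_over, False


class RandomPlayer:
    """Same play() contract as GamePlayer: returns (gameInfo, isOver, isWin)."""

    def __init__(self, color, rng):
        self.color = color
        self.rng = rng

    def play(self, game):
        move = self.rng.choice(game.getActions())
        return game.action(move, self.color)


def run_game(game, players):
    """Alternate turns until the game is over; return (winner, move_count)."""
    turn = 0
    while True:
        player = players[turn % len(players)]
        _, is_over, is_win = player.play(game)
        turn += 1
        if is_over:
            # winner is None when the game ended without a win (a draw)
            return (player if is_win else None), turn


rng = random.Random(0)
game = StubGame(size=2)
winner, moves = run_game(game, [RandomPlayer(1, rng), RandomPlayer(2, rng)])
print(winner, moves)   # None 4 (the stub never reports a win)
```

Because run_game relies only on play(), the same loop should work with GameFivePot and two GamePlayer instances in place of the stand-ins.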

GitHub repository:

https://github.com/rehylas/play_chess

P.S. In the next post, the players will use MCTS to play against each other.

Reposted from: https://www.cnblogs.com/xiaoxuebiye/p/9272364.html
