MCTS Go AI

Simulation_once

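Simulation_once is the rollout function: it plays one game to the end using a fast playout policy and returns the result. result_0 = 1 means a win; any other value means a loss. step indicates that the stone at (i, j) is placed on move number step, and winner_piece_type is the piece type whose win is being tested.

Here the fast playout policy is simply random move selection.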

def Simulation_once(go, piece_type, i, j, step, winner_piece_type): return result_0    # returns the simulation result
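To illustrate the random playout idea, here is a minimal sketch on a toy game rather than on the Go board. The names random_rollout, legal_moves, apply_move and winner are hypothetical helpers introduced for this example only; the sketch assumes piece types 1 and 2 that alternate turns and is not the project's actual Simulation_once.

```python
import random

def random_rollout(state, player, legal_moves, apply_move, winner,
                   winner_piece_type, rng=random):
    """Play uniformly random moves until the game ends.

    Returns 1 if winner_piece_type wins, otherwise 0.
    """
    while winner(state) is None:
        move = rng.choice(legal_moves(state))
        state = apply_move(state, player, move)
        player = 3 - player  # alternate between piece types 1 and 2
    return 1 if winner(state) == winner_piece_type else 0

# Toy game (an assumption for illustration): players alternately remove
# 1 or 2 stones from a pile; whoever removes the last stone wins.
def legal_moves(state):
    stones, _ = state
    return [1, 2] if stones >= 2 else [1]

def apply_move(state, player, move):
    stones, _ = state
    return (stones - move, player)  # remember who moved last

def winner(state):
    stones, last_mover = state
    return last_mover if stones == 0 else None

result_0 = random_rollout((7, None), 1, legal_moves, apply_move, winner,
                          winner_piece_type=1)
```

Averaging result_0 over many such playouts gives the win-rate estimate that the tree search relies on.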

step function

There are four functions for tracking the move count.

def read_step(path="step_storage.txt"):  # read the move count from an external file
def add_step(path="step_storage.txt"):  # increment the move count
def write_step(step, path="step_storage.txt"):  # write the move count
def del_txt(path="step_storage.txt"):  # delete the storage file

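When we detect that we are making our first move as Black or White, we call write_step() to record the current move number. Before each call to the MCTS function we read the move number and pass it in as an argument, and after outputting the action we add 2 to it. We then check if step > 24: once the move number exceeds 24, del_txt() is called to delete the file that stores it.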
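The four helpers could look roughly like the sketch below. The function names and the default path come from the signatures above; the bodies, the plain-text file format and the increment parameter are assumptions made for illustration.

```python
import os

def write_step(step, path="step_storage.txt"):
    """Store the current move number as plain text."""
    with open(path, "w") as f:
        f.write(str(step))

def read_step(path="step_storage.txt"):
    """Return the stored move number."""
    with open(path) as f:
        return int(f.read().strip())

def add_step(path="step_storage.txt", increment=2):
    """Advance the stored move number.

    The default of 2 (our move plus the opponent's reply) is an assumption
    based on the description of adding 2 after each action.
    """
    step = read_step(path) + increment
    write_step(step, path)
    return step

def del_txt(path="step_storage.txt"):
    """Delete the storage file if it exists."""
    if os.path.exists(path):
        os.remove(path)
```

Keeping the counter in a file lets it survive between moves when the player script is started as a fresh process for every move.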

class Go

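Some functions inside class Go() were modified and some new ones were added. Most of them are similar to the functions in host.py, so they are not described again here.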

Time limit

if step > 20:
    time_limit = 9.5
else:
    time_limit = 3

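This code sits inside if __name__ == "__main__":. The first six moves each take less than 1 s; time_limit is then set to 9.5 s for moves 7 through 20 and 3 s after move 20. When playing against the random player, the time can be shortened to 5 s.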

class MCTS_player

MCTS_player has two main functions: sub_process() and main_process().

class MCTS_player():
    def __init__(self):
        self.type = 'MCTS_player'
    def MCT_Best_Choice(self, go, piece_type, step, time_start, time_limit):  # MCTS algorithm
    def sub_process(self, go, piece_type, step, time_start, time_limit):  # MCTS worker process
    def main_process(self, go, piece_type, step, time_start, time_limit):  # MCTS main process

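MCT_Best_Choice() is not used in the later move-selection process; it is left over from an earlier version of the code.

sub_process() and main_process() have identical bodies; they exist as two separate functions so that the computation can later be run with Python multiprocessing. A brief description of their contents follows.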

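Parameters: go: the board class defined earlier; piece_type: our current piece type; step: the move number of our current move; time_start: the time at which the function was called; time_limit: the time limit.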

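Data storage: a master list, name_dictionary, holds the per-round lists. The per-round lists backtracking_0 through backtracking_24 store the nodes of each round. Each node is itself a list: [position of the parent node in the previous round, (current node), number of rollout wins for the current node, total number of rollouts for the current node, UCT value].

UCB formula: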

$$
value = \frac{Q(v')}{N(v')} + c\sqrt{\frac{\ln N(v)}{N(v')}}
$$

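Here v' is the current tree node and v is its parent; Q is the node's number of rollout wins, N is its total number of rollouts, and c is a constant that controls the balance between exploitation and exploration. By choosing c differently, we can bias the search tree toward depth-first or breadth-first search.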
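As a small worked example of the formula, the sketch below evaluates it for two nodes with the same win rate but different visit counts. The helper name ucb_value is hypothetical; the default c = sqrt(2)/2 matches the constant used in the project's code, and returning infinity for an unvisited node is an assumption that mirrors the "unexplored children first" rule.

```python
import math

def ucb_value(wins, visits, parent_visits, c=math.sqrt(2) / 2):
    """UCB value of a child with `wins` out of `visits` rollouts."""
    if visits == 0:
        return math.inf  # assumption: unvisited nodes are tried first
    exploitation = wins / visits
    exploration = c * math.sqrt(math.log(parent_visits) / visits)
    return exploitation + exploration

well_explored = ucb_value(wins=6, visits=10, parent_visits=100)
less_explored = ucb_value(wins=3, visits=5, parent_visits=100)
```

Both nodes have a win rate of 0.6, but the less explored node receives the larger value because its exploration term is larger.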

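Implementation: MCTS consists of four steps. Selection: find the most promising node in the tree to explore. The usual policy is to pick an unexplored child first and, if all children have been explored, the child with the largest UCB value. The main steps are as follows: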

for i in range(length_possible_placement):
    # compute the value of each first-level child
    name_dictionary[1][i][4] = name_dictionary[1][i][2] / name_dictionary[1][i][3] + math.sqrt(2)/2 * math.sqrt(math.log(name_dictionary[0][0][3], variable_e) / name_dictionary[1][i][3])
    # start from the value of the last first-level child; the true maximum is found below
    max_value = name_dictionary[1][i][4]
length_possible_placement = len(name_dictionary[1])
for i in range(length_possible_placement):  # find the true maximum value among the first-level children
    if name_dictionary[1][i][4] >= max_value:
        max_value = name_dictionary[1][i][4]
father_node_locations = []  # empty list for the highest-valued nodes of this level
for i in range(length_possible_placement):
    if name_dictionary[n][i][4] == max_value:
        father_node_locations.append(i)  # store every node that has the maximum value
father_node_location = random.choice(father_node_locations)  # pick one of them at random as the parent for the next level
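The same selection rule can be written more compactly. The sketch below assumes the node layout described earlier, [parent_index, move, wins, visits, value]; select_child and the rng parameter are hypothetical names, and treating unvisited nodes as having infinite value is an assumption.

```python
import math
import random

def select_child(children, parent_visits, c=math.sqrt(2) / 2, rng=random):
    """Return the index of the child to descend into.

    Each child is a list [parent_index, move, wins, visits, value].
    Ties for the best value are broken at random.
    """
    for node in children:
        if node[3] == 0:
            node[4] = math.inf  # assumption: explore unvisited children first
        else:
            node[4] = node[2] / node[3] + c * math.sqrt(
                math.log(parent_visits) / node[3])
    best = max(node[4] for node in children)
    candidates = [i for i, node in enumerate(children) if node[4] == best]
    return rng.choice(candidates)

children = [
    [0, (0, 0), 1, 4, 0],
    [0, (0, 1), 3, 4, 0],
    [0, (1, 1), 3, 4, 0],
]
chosen = select_child(children, parent_visits=12)
```

Here the second and third children tie for the best value, so chosen is 1 or 2. Random tie-breaking keeps the search from always favouring the first point in the list.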

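Expansion: take one move from the node selected above to create a new child node. The usual policy is to perform a random action that does not duplicate an existing child.

Simulation: starting from the newly expanded node, simulate the game until it reaches a terminal state, which yields the score of the expanded node. This is exactly what the Simulation_once function described earlier does.

Backpropagation: feed the score of the expanded node back to all of its ancestors, updating their quality value and visit count so that UCB values can be computed later. The main code is as follows: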

for i in range(n):  # backpropagation: update the rollout and win counts of the ancestors
    if father_node_location == -1:
        break
    name_dictionary[n-1-i][father_node_location][2] += result_0
    name_dictionary[n-1-i][father_node_location][3] += 1
    father_node_location = name_dictionary[n-1-i][father_node_location][0]
father_node_location = father_node_location_1
# update the value of this node, i.e. the random legal move chosen above
name_dictionary[n][0][4] = 1 - name_dictionary[n][0][2] / name_dictionary[n][0][3] + math.sqrt(2)/2 * math.sqrt(math.log(name_dictionary[n-1][father_node_location][3], variable_e) / name_dictionary[n][0][3])
for i in range(n):  # backpropagation: update the values of the ancestors
    if n-i-2 < 0:
        break
    if (n-1-i) % 2 == 1:
        name_dictionary[n-1-i][father_node_location][4] = name_dictionary[n-1-i][father_node_location][2] / name_dictionary[n-1-i][father_node_location][3] + math.sqrt(2)/2 * math.sqrt(math.log(name_dictionary[n-i-2][name_dictionary[n-1-i][father_node_location][0]][3], variable_e) / name_dictionary[n-1-i][father_node_location][3])
    else:
        name_dictionary[n-1-i][father_node_location][4] = 1 - name_dictionary[n-1-i][father_node_location][2] / name_dictionary[n-1-i][father_node_location][3] + math.sqrt(2)/2 * math.sqrt(math.log(name_dictionary[n-i-2][name_dictionary[n-1-i][father_node_location][0]][3], variable_e) / name_dictionary[n-1-i][father_node_location][3])
    father_node_location = name_dictionary[n-1-i][father_node_location][0]
father_node_location = father_node_location_1
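The two passes above can be sketched in a condensed form. This version assumes the tree is a list of levels, each holding nodes of the form [parent_index, move, wins, visits, value], with -1 as the root's parent index; backpropagate and node_value are hypothetical names. As in the code above, odd levels score a node by our win rate and even levels by 1 minus that win rate.

```python
import math

def backpropagate(tree, level, index, result):
    """Add one rollout result to a node and to all of its ancestors."""
    while level >= 0 and index != -1:
        node = tree[level][index]
        node[2] += result   # rollout wins
        node[3] += 1        # rollout count
        index = node[0]     # move to the parent in the previous level
        level -= 1

def node_value(node, parent_visits, level, c=math.sqrt(2) / 2):
    """UCB value; the win rate is flipped on even (opponent) levels."""
    win_rate = node[2] / node[3]
    exploitation = win_rate if level % 2 == 1 else 1 - win_rate
    return exploitation + c * math.sqrt(math.log(parent_visits) / node[3])

tree = [
    [[-1, None, 0, 0, 0]],                          # level 0: root
    [[0, (1, 1), 0, 0, 0], [0, (2, 2), 0, 0, 0]],   # level 1: our moves
    [[1, (3, 3), 0, 0, 0]],                         # level 2: opponent reply
]
backpropagate(tree, level=2, index=0, result=1)
```

After the call, only the nodes on the path from the leaf to the root have changed: the leaf, its parent (2, 2) and the root each record one win out of one rollout.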

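In the code I implement this process with two loops. Because the first-round nodes are a special case, their selection is placed under the first while loop; the second while loop alternates between the opponent's moves and our own.

Multiprocessing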

def MCT_multi_process(go: GO, piece_type, step, time_start, cpu_cores, time_limit):  # multi-process computation
    import multiprocessing
    pool = multiprocessing.Pool(processes=(cpu_cores))
    results = []
    go_copy = deepcopy(go)
    for i in range(cpu_cores):
        result_nodes = pool.apply_async(MCTS_player.sub_process, args=(MCTS_player, go_copy, piece_type, step, time_start, time_limit))
        results.append(result_nodes)
    pool.close()
    result_1 = MCTS_player.main_process(MCTS_player, go, piece_type, step, time_start, time_limit)
    results.append(result_1)
    return results

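Because MCTS needs a long computation time, I use multiprocessing to improve the accuracy of the result. An MCT main process (main_process) and MCT worker processes (sub_process) are created, and MCT_multi_process runs them in parallel and returns the results. The return value is a list with one entry per process: the worker entries are retrieved later with .get(), and the last entry is the main process's result. Each result is itself a list, namely the backtracking_0 list inside main_process().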

Best_option function

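def Best_option(cpu_cores, step, go: GO, piece_type, time_start, time_limit):
    if step == 1:  # the board is empty, i.e. we play Black's first move: return a fixed coordinate
    if step == 2:  # White's strategy for move 2
    if step == 3:  # Black's strategy for move 3
    if step == 4:  # White's strategy for move 4
    if step == 5:  # Black's strategy for move 5
    if step == 6:  # White's strategy for move 6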

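Because MCTS performs poorly in the opening, the first six moves are hard-coded in this function to improve playing strength. The exact placement rules are in the code.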

def Best_option(cpu_cores, step, go: GO, piece_type, time_start, time_limit):
    results = MCT_multi_process(go, piece_type, step, time_start, cpu_cores, time_limit)
    if results[cpu_cores] == 'PASS':
        return "PASS"
    new_results = []
    core_results = []
    simulation_times = 0
    for i in range(cpu_cores):
        core_results = results[i].get()
        new_results.append(core_results)
    new_results.append(results[cpu_cores])

This code collects the results of the main process and the worker processes into a new list, new_results.

    for i in range(length_options):
        for j in range(cpu_cores + 1):
            first_node = new_results[j][i][1]
            result_2 = result_2 + new_results[j][i][2]
            result_3 = result_3 + new_results[j][i][3]
        single_point = [father_node_location, first_node, result_2, result_3, 0]
        final_results.append(single_point)
        result_2 = 0
        result_3 = 0
    for i in range(length_options):
        simulation_times += final_results[i][3]
        max_value = final_results[i][3]
    for i in range(length_options):
        if final_results[i][3] >= max_value:
            max_value = final_results[i][3]
    for i in range(length_options):
        if final_results[i][3] == max_value:
            Best_options.append(i)
    final_option_i = random.choice(Best_options)
    final_option_location = final_results[final_option_i][1]

The point with the most rollouts is selected as the final move.
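The merge-and-pick step can be sketched as a single function. It assumes that every process reports the same candidate points in the same order, each as [parent_index, move, wins, visits, value]; merge_and_pick and the rng parameter are hypothetical names used only for this example.

```python
import random

def merge_and_pick(per_process_results, rng=random):
    """Sum wins and visits per candidate across processes, then pick the
    candidate with the most rollouts (ties broken at random).

    Returns (chosen_move, merged_nodes).
    """
    merged = []
    for nodes in zip(*per_process_results):  # same candidate, all processes
        wins = sum(n[2] for n in nodes)
        visits = sum(n[3] for n in nodes)
        merged.append([nodes[0][0], nodes[0][1], wins, visits, 0])
    most_visits = max(n[3] for n in merged)
    best_moves = [n[1] for n in merged if n[3] == most_visits]
    return rng.choice(best_moves), merged

process_a = [[-1, (0, 0), 2, 5, 0], [-1, (1, 1), 4, 6, 0]]
process_b = [[-1, (0, 0), 1, 3, 0], [-1, (1, 1), 5, 9, 0]]
move, merged = merge_and_pick([process_a, process_b])
```

In this example (1, 1) collects 15 rollouts against 8 for (0, 0), so it is chosen. Picking by rollout count rather than by win rate is a common choice because the most visited move is the one the search trusted most.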

    if step == 22:
        go.place_chess(final_option_location[0], final_option_location[1], 2)
        go.remove_died_pieces(1)
        gas_chess, Gg, Gh = go.find_liberty_ally_opponent_nums(final_option_location[0], final_option_location[1])
        if gas_chess == 1:
            return 'PASS'
    if step == 23:
        go.place_chess(final_option_location[0], final_option_location[1], 1)
        go.remove_died_pieces(2)
        gas_chess, Gg, Gh = go.find_liberty_ally_opponent_nums(final_option_location[0], final_option_location[1])
        if gas_chess == 1:
            return 'PASS'
    return final_option_location

This code handles moves 22 and 23: if the stone we would place on either of those moves is left with only one liberty, we choose PASS instead.
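For readers unfamiliar with liberties, the sketch below shows one common way to count them with a flood fill over the connected group. count_liberties is a hypothetical helper, not the project's find_liberty_ally_opponent_nums, whose exact behaviour may differ; the board encoding (0 empty, 1 and 2 for the two piece types) is an assumption.

```python
def count_liberties(board, i, j):
    """Count the empty points adjacent to the group containing (i, j).

    board is a square list of lists: 0 = empty, 1 and 2 = the two piece types.
    """
    color = board[i][j]
    if color == 0:
        raise ValueError("no stone at the given point")
    size = len(board)
    group = {(i, j)}
    stack = [(i, j)]
    liberties = set()
    while stack:
        x, y = stack.pop()
        for nx, ny in ((x - 1, y), (x + 1, y), (x, y - 1), (x, y + 1)):
            if not (0 <= nx < size and 0 <= ny < size):
                continue
            if board[nx][ny] == 0:
                liberties.add((nx, ny))        # empty neighbour
            elif board[nx][ny] == color and (nx, ny) not in group:
                group.add((nx, ny))            # same-colour stone joins the group
                stack.append((nx, ny))
    return len(liberties)

board = [
    [0, 2, 0, 0, 0],
    [2, 1, 0, 0, 0],
    [0, 2, 0, 0, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
]
liberties_of_black = count_liberties(board, 1, 1)
```

The stone at (1, 1) is surrounded on three sides and has a single liberty at (1, 2), which is the situation in which the code above prefers to pass.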