Image data is structurally complex and large, so to keep the model small enough to run on a machine without serious compute, the camera image is first downscaled to 80×60 before it reaches the network.
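To get a rough sense of the saving, compare the value count per frame at the downscaled resolution against a typical full camera resolution (the 640×480 figure below is an assumed example for illustration, not from the original setup):

```python
small = 80 * 60 * 3    # values per RGB frame at the downscaled resolution: 14,400
full = 640 * 480 * 3   # hypothetical full camera resolution: 921,600 values
print(small, full // small)  # the downscaled input is 64x smaller
```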

For the model to learn sensible parameters, the agent's actions and rewards must be defined. Vehicle control comes down to three main quantities plus a gear flag: throttle ([0, 1]), brake ([0, 1]), steering angle ([-1, 1]), and whether reverse is engaged (True/False). Following ordinary driving habits, these variables are not independent — for example, throttle and brake are normally not pressed at the same time (drifting aside). So that the DQN's output can assign a Q value to each action, the action space is first discretized into a few categories:
1. Straight, accelerate: throttle=1, brake=0, steer=0, reverse=False
2. Full left turn: throttle=0.5, brake=0, steer=-1, reverse=False
3. Full right turn: throttle=0.5, brake=0, steer=1, reverse=False
4. Straight, brake: throttle=0, brake=0.5, steer=0, reverse=False
5. Straight, reverse: throttle=1, brake=0, steer=0, reverse=True
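The five categories can be kept in a small lookup table (a hypothetical helper, not part of the original code; the environment's step() function applies the corresponding carla.VehicleControl for each index directly):

```python
# (throttle, brake, steer, reverse) for each discrete action index 0-4
ACTIONS = {
    0: (1.0, 0.0,  0.0, False),  # straight, accelerate
    1: (0.5, 0.0, -1.0, False),  # full left turn
    2: (0.5, 0.0,  1.0, False),  # full right turn
    3: (0.0, 0.5,  0.0, False),  # straight, brake
    4: (1.0, 0.0,  0.0, True),   # straight, reverse
}
```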

Next we define the driving reward. We pick another random point on the map and use its coordinates as the destination; on every frame the reward is defined as follows:
1. On collision, reward = -200
2. If the car is closer to the destination in the next frame than in the current one, reward = 1
3. If the car is farther from the destination in the next frame than in the current one, reward = -1
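The per-frame rule can be sketched as a standalone function (a hypothetical helper; in the actual environment class this logic lives inside step()). Note that an unchanged distance counts as "not farther" and earns +1, matching the comparison used in step():

```python
def frame_reward(collided, last_dist, new_dist):
    """Return (reward, done) for one frame, following the three rules above."""
    if collided:
        return -200, True   # rule 1: a collision ends the episode
    if new_dist > last_dist:
        return -1, False    # rule 3: farther from the destination
    return 1, False         # rule 2: closer (or unchanged)
```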

With these defined, we wrap the logic in a step() function and add it to the environment class. The updated environment class:

import abc
import glob
import os
import sys
from types import LambdaType
from collections import deque
from collections import namedtuple

try:
    sys.path.append(glob.glob('../carla/dist/carla-*%d.%d-%s.egg' % (
        sys.version_info.major,
        sys.version_info.minor,
        'win-amd64' if os.name == 'nt' else 'linux-x86_64'))[0])
except IndexError:
    pass

import carla
import random
import time
import numpy as np
import cv2
import math

IM_WIDTH = 80
IM_HEIGHT = 60
SHOW_PREVIEW = False
SECOND_PER_EPISODE = 10


class Car_Env():
    SHOW_CAM = SHOW_PREVIEW
    STEER_AMT = 1.0
    im_width = IM_WIDTH
    im_height = IM_HEIGHT
    front_camera = None

    def __init__(self):
        self.client = carla.Client('localhost', 2000)
        self.client.set_timeout(10.0)
        self.world = self.client.get_world()
        self.blueprint_library = self.world.get_blueprint_library()
        self.model_3 = self.blueprint_library.filter('model3')[0]

    def reset(self):
        self.collision_hist = []
        self.radar_hist = []
        self.actor_list = []
        # the map has 265 spawn points; pick one as the car's starting position
        self.transform = self.world.get_map().get_spawn_points()[100]
        self.vehicle = self.world.spawn_actor(self.model_3, self.transform)
        self.actor_list.append(self.vehicle)
        self.rgb_cam = self.blueprint_library.find('sensor.camera.rgb')
        self.rgb_cam.set_attribute('image_size_x', f'{self.im_width}')
        self.rgb_cam.set_attribute('image_size_y', f'{self.im_height}')
        self.rgb_cam.set_attribute('fov', '110')
        transform = carla.Transform(carla.Location(x=2.5, z=0.7))
        self.sensor = self.world.spawn_actor(self.rgb_cam, transform, attach_to=self.vehicle)
        self.actor_list.append(self.sensor)
        self.sensor.listen(lambda data: self.process_img(data))
        self.vehicle.apply_control(carla.VehicleControl(throttle=0.0, brake=0.0))
        time.sleep(4)
        # collision sensor
        colsensor = self.blueprint_library.find('sensor.other.collision')
        self.colsensor = self.world.spawn_actor(colsensor, transform, attach_to=self.vehicle)
        self.actor_list.append(self.colsensor)
        self.colsensor.listen(lambda event: self.collision_data(event))
        # target_transform: coordinates of the driving destination
        self.target_transform = self.world.get_map().get_spawn_points()[101]
        self.target_dis = self.target_transform.location.distance(self.vehicle.get_location())
        while self.front_camera is None:
            time.sleep(0.01)
        self.episode_start = time.time()
        self.vehicle.apply_control(carla.VehicleControl(throttle=0.0, brake=0.0))
        return self.front_camera

    def collision_data(self, event):
        self.collision_hist.append(event)

    def radar_data(self, mesure):
        self.radar_hist.append(mesure)

    def process_img(self, image):
        i = np.array(image.raw_data)
        i2 = i.reshape((self.im_height, self.im_width, 4))
        i3 = i2[:, :, :3]  # drop the alpha channel
        if self.SHOW_CAM:
            cv2.imshow("", i3)
            cv2.waitKey(1)
        self.front_camera = i3
        return i3 / 255.0

    def step(self, action):
        last_dis = self.target_dis
        if action == 0:
            self.vehicle.apply_control(carla.VehicleControl(throttle=1.0, steer=0.0, brake=0.0, hand_brake=False, reverse=False))
        elif action == 1:
            self.vehicle.apply_control(carla.VehicleControl(throttle=0.5, steer=-1.0, brake=0.0, hand_brake=False, reverse=False))
        elif action == 2:
            self.vehicle.apply_control(carla.VehicleControl(throttle=0.5, steer=1.0, brake=0.0, hand_brake=False, reverse=False))
        elif action == 3:  # braking; the original checked action==4 here, which made action 3 fall through to reverse
            self.vehicle.apply_control(carla.VehicleControl(throttle=0.0, steer=0.0, brake=0.5, hand_brake=False, reverse=False))
        else:
            self.vehicle.apply_control(carla.VehicleControl(throttle=1.0, steer=0.0, brake=0.0, hand_brake=False, reverse=True))
        self.target_dis = self.target_transform.location.distance(self.vehicle.get_location())
        v = self.vehicle.get_velocity()
        kmh = int(3.6 * math.sqrt(v.x**2 + v.y**2 + v.z**2))
        if len(self.collision_hist) != 0:
            done = True
            reward = -200
        elif last_dis < self.target_dis:  # moving farther from the destination
            done = False
            reward = -1
        else:
            done = False
            reward = 1
        if self.episode_start + SECOND_PER_EPISODE < time.time():
            done = True
            time.sleep(1)
        return self.front_camera, reward, done, None

With the environment in place, we can define the DQN network itself, using the PyTorch framework. Before training we also need a buffer that stores the data from every interaction between the car and the environment (the push_memory() function); each training step draws a random batch of batch_size samples from this buffer (the get_sample() function).
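Stripped of the tensor handling, the buffer is a plain ring buffer. A minimal sketch (the real push_memory()/get_sample() additionally wrap each sample in a Transition namedtuple):

```python
import random

class ReplayBuffer:
    def __init__(self, capacity):
        self.capacity = capacity
        self.memory = []
        self.position = 0

    def push_memory(self, item):
        # grow the list until full, then overwrite the oldest slot (ring buffer)
        if len(self.memory) < self.capacity:
            self.memory.append(None)
        self.memory[self.position] = item
        self.position = (self.position + 1) % self.capacity

    def get_sample(self, batch_size):
        # uniform random sample without replacement
        return random.sample(self.memory, batch_size)
```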

import torch
import torch.nn as nn
from torch.autograd import Variable
import torch.nn.functional as F
import torch.optim as optim
import torchvision.transforms as T
from torch import FloatTensor, LongTensor, ByteTensor
import random
import numpy as np
from collections import namedtuple

Tensor = FloatTensor

EPSILON = 0.9                         # epsilon used for the epsilon-greedy approach
GAMMA = 0.9
TARGET_NETWORK_REPLACE_FREQ = 100     # how frequently the target network updates
MEMORY_CAPACITY = 200
BATCH_SIZE = 32
LR = 0.01                             # learning rate

Transition = namedtuple('Transition', ('state', 'next_state', 'action', 'reward'))


class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(3, 16, kernel_size=5, stride=2)
        self.bn1 = nn.BatchNorm2d(16)
        self.conv2 = nn.Conv2d(16, 32, kernel_size=5, stride=2)
        self.bn2 = nn.BatchNorm2d(32)
        self.conv3 = nn.Conv2d(32, 32, kernel_size=5, stride=2)
        self.bn3 = nn.BatchNorm2d(32)
        self.head = nn.Linear(896, 5)

    def forward(self, x):
        x = F.relu(self.bn1(self.conv1(x)))  # first conv layer
        x = F.relu(self.bn2(self.conv2(x)))  # second conv layer
        x = F.relu(self.bn3(self.conv3(x)))  # third conv layer
        return self.head(x.view(x.size(0), -1))  # fully connected layer


class DQN(object):
    def __init__(self):
        self.eval_net, self.target_net = Net(), Net()
        # define counter, memory size and loss function
        self.learn_step_counter = 0  # counts the steps of the learning process
        self.memory = []
        self.position = 0            # counter used for the experience replay buffer
        self.capacity = 200
        # ------- define the optimizer ------- #
        self.optimizer = torch.optim.Adam(self.eval_net.parameters(), lr=LR)
        # ------- define the loss function ------- #
        self.loss_func = nn.MSELoss()

    def choose_action(self, x):
        # make a decision based on epsilon-greedy
        x = torch.unsqueeze(torch.FloatTensor(x), 0)  # add a batch dimension to state x
        # reorder the image from [batch, height, width, channel] to [batch, channel, width, height]
        x = x.permute(0, 3, 2, 1)
        if np.random.uniform() < EPSILON:  # greedy
            actions_value = self.eval_net.forward(x)
            # torch.max(tensor, 1) returns the max values along dim=1 and their indices;
            # the index is the chosen action
            action = torch.max(actions_value, 1)[1].data.numpy()
            action = action[0]
        else:
            action = np.random.randint(0, 5)
        return action

    def push_memory(self, s, a, r, s_):
        if len(self.memory) < self.capacity:
            self.memory.append(None)
        self.memory[self.position] = Transition(
            torch.unsqueeze(torch.FloatTensor(s), 0),
            torch.unsqueeze(torch.FloatTensor(s_), 0),
            torch.from_numpy(np.array([a])),
            torch.from_numpy(np.array([r], dtype='int64')))
        self.position = (self.position + 1) % self.capacity

    def get_sample(self, batch_size):
        return random.sample(self.memory, batch_size)

    def learn(self):
        # sample a batch of experiences, periodically update the target network,
        # and run one step of back propagation
        if self.learn_step_counter % TARGET_NETWORK_REPLACE_FREQ == 0:
            # copy the parameters of eval_net into target_net
            self.target_net.load_state_dict(self.eval_net.state_dict())
        self.learn_step_counter += 1

        transitions = self.get_sample(BATCH_SIZE)  # sample from the buffer
        batch = Transition(*zip(*transitions))
        # extract s, a, r, s_ from the batch and convert them to Variables
        # that are convenient for back propagation
        b_s = Variable(torch.cat(batch.state))
        b_a = Variable(torch.cat(batch.action))
        b_r = Variable(torch.cat(batch.reward))
        b_s_ = Variable(torch.cat(batch.next_state))
        # b_s and b_s_ hold the current and next frames as [batch, height, width, channel];
        # the network expects [batch, channel, width, height]
        b_s = b_s.permute(0, 3, 2, 1)
        b_s_ = b_s_.permute(0, 3, 2, 1)
        # Q value of each sampled state-action pair
        q_eval = self.eval_net(b_s).gather(1, b_a.unsqueeze(1))  # (batch_size, 1)
        # Q values of the next state; detach so no gradients flow into target_net
        q_next = self.target_net(b_s_).detach()
        # q_next.max(1) returns the max value along dim=1 and its index;
        # b_r is reshaped to (batch_size, 1) so the addition does not broadcast
        # to (batch_size, batch_size), as the original flat b_r would
        q_target = b_r.view(BATCH_SIZE, 1) + GAMMA * q_next.max(1)[0].view(BATCH_SIZE, 1)
        loss = self.loss_func(q_eval, q_target)
        self.optimizer.zero_grad()  # reset gradients to zero
        loss.backward()
        self.optimizer.step()       # one optimizer step

Finally, add a main function and the model can start training. On each interaction with the environment, the chosen action is the model's output with some probability and a uniformly random action otherwise, controlled by the threshold EPSILON.
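The selection rule itself is tiny; a sketch with plain lists instead of tensors (a hypothetical helper, doing with a list what choose_action() does with the network's output):

```python
import random

def epsilon_greedy(q_values, epsilon=0.9):
    # with probability epsilon exploit the best-valued action,
    # otherwise explore with a uniformly random action
    if random.random() < epsilon:
        return max(range(len(q_values)), key=lambda a: q_values[a])
    return random.randrange(len(q_values))
```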

if __name__ == '__main__':
    env = Car_Env()
    s = env.reset()
    dqn = DQN()
    count = 0
    for i in range(2000):
        a = dqn.choose_action(s)
        s_, r, done, info = env.step(a)
        dqn.push_memory(s, a, r, s_)
        s = s_
        if (dqn.position % (MEMORY_CAPACITY - 1)) == 0:
            dqn.learn()
            count += 1
            print('learned times:', count)

Once the main function is running, the car can be seen repeatedly trying out its various actions on the road as it explores the environment.

The model is still very basic, though, and the network's control of the car is far from intelligent. Thousands of training iterations — or richer sensor and camera data — would be needed before the DQN could drive acceptably.

CARLA's sensors are described in detail in another of my articles.
