Visual Studio Code Python environment setup: visual-pushing-grasping environment setup and reproduction

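0. Hardware

A ThinkPad E580 without a discrete GPU. This laptop is a bit of a trap: some packages simply refuse to install, and it is far worse than the ThinkPad T series (small rant). Here is the GitHub address of VPG:

https://github.com/andyzeng/visual-pushing-grasping

The author's code runs directly on the CPU, provided the installed torch and torchvision match the author's versions, but that is too slow: roughly 150 s per iteration. To run on a GPU you need a newer torch, because the GPU determines which CUDA versions are usable, and the CUDA version in turn constrains the torch version.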

1. Installing PyTorch

Install Anaconda3. The .sh installer can be downloaded from the Tsinghua mirror:

Tsinghua Open Source Mirror: mirrors.tuna.tsinghua.edu.cn

bash <anaconda-installer-filename>.sh

The following commands add the Tsinghua mirror to the Anaconda channels:

conda config --add channels https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/free/
conda config --add channels https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main/
conda config --set show_channel_urls yes

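2. Then use conda to create a Python 3.6 environment; vpg is the environment name

conda create -n vpg python=3.6

3. Install PyTorch. The PyTorch version in the original instructions is too old. The author later updated the install command, but I do not recommend it, because access to overseas servers from mainland China is a major limitation.

[Screenshot: the PyTorch installation method in Andy's VPG project (not recommended)]

I suggest installing with the following command instead: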

pip --default-timeout=3600 install -i https://pypi.tuna.tsinghua.edu.cn/simple torch torchvision

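To install a specific torch version, or to install on Windows or macOS, see this page on the official site (it also covers installation for specific CUDA versions):

PyTorch (s0pytorch0org.icopy.site)

Installed this way, I ended up with torch 1.4.0 and torchvision 0.5.0 (whatever versions the index served at install time).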
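Since the installed versions depend on what the index happens to serve, it is worth checking them right after installation. Below is a minimal sketch of such a check; the helper name report_versions is my own, and it only relies on the `__version__` attribute that torch and torchvision expose. It reports None rather than failing when a package is missing from the active environment.

```python
import importlib


def report_versions(names=("torch", "torchvision")):
    """Return {package name: version string, or None if it cannot be imported}."""
    versions = {}
    for name in names:
        try:
            module = importlib.import_module(name)
        except ImportError:
            versions[name] = None  # not installed in the active environment
        else:
            versions[name] = getattr(module, "__version__", "unknown")
    return versions


if __name__ == "__main__":
    for name, version in report_versions().items():
        print(name, version if version is not None else "not installed")
```

Run it inside the activated vpg environment so that it inspects the same interpreter the project will use.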

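1.1 Supplement: a few pip tips. I have not used them myself, but my gut tells me they should work.

Set the Tsinghua mirror as pip's default index; if this raises an error, upgrade pip first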

pip config set global.index-url https://pypi.tuna.tsinghua.edu.cn/simple
pip install pip -U    # upgrade pip
# installation example
pip install torch===1.3.0 torchvision===0.4.1 -f https://download.pytorch.org/whl/torch_stable.html -i https://pypi.tuna.tsinghua.edu.cn/simple

Fixing the error "Could not find a version that satisfies the requirement torch"

Download from the following mirror instead; you can open it in a browser first to check whether it has the package you need

http://pypi.doubanio.com/simple/

The install command is:

pip install <package-name> -i http://pypi.doubanio.com/simple/ --trusted-host pypi.doubanio.com

1.2 References:

"Could not find a version that satisfies the requirement tensorflow": summary of the problem and fixes

Speeding up PyTorch downloads with the Tsinghua mirror (pip version), WannaSeaU's blog on CSDN

2. Installing the other packages

pip --default-timeout=3600 install -i https://pypi.tuna.tsinghua.edu.cn/simple numpy scipy opencv-python matplotlib

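3. Code flow: training mode

Run main() with the arguments --is_sim --push_rewards --experience_replay --explore_rate_decay --save_visualizations
Restart the simulator: self.restart_sim()
Set up the virtual camera: self.setup_sim_camera()
Add 10 objects to the simulation: self.add_objects(), which places the objects through vrep.simxCallScriptFunction with random position and orientation parameters
Initialize training: trainer = Trainer(method, push_rewards.....
Create the directory for saved data: logger
Check that the simulation environment is healthy: if is_sim: robot.check_sim()
Get the camera images: get_camera_data()
Compute the heightmap: utils.get_heightmap()
Save the images and heightmaps: logger.save_images/heightmap()
Apply a 2x scale to the heightmap
Feed the rotated and transformed color and depth images into the push and grasp networks, obtaining output_prob and interm_feat recorded over the 16 rotations
Detect changes between the two images
Compute the training labels, then backpropagate: trainer.backprop
Sample from experience replay
Save a model snapshot
Synchronize the action thread and the training thread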
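To make the "16 rotations" step more concrete, here is a minimal sketch of how one action could be picked from per-rotation score maps. It assumes the action is the highest-scoring pixel across all rotations and that the rotations are evenly spaced over 360 degrees; both are my reading of the flow, not something the steps above state. The function best_action and the toy score maps are hypothetical, and plain nested lists stand in for the network output.

```python
NUM_ROTATIONS = 16  # number of heightmap rotations mentioned in the steps above


def best_action(score_maps):
    """Return (rotation_index, row, col, angle_deg) of the highest score.

    score_maps: one 2D list of scores per rotation (a hypothetical stand-in
    for the per-rotation network output).
    """
    best = None
    for rot, grid in enumerate(score_maps):
        for row, line in enumerate(grid):
            for col, score in enumerate(line):
                if best is None or score > best[0]:
                    best = (score, rot, row, col)
    _, rot, row, col = best
    # Assumption: rotations are evenly spaced over a full turn.
    angle_deg = rot * 360.0 / len(score_maps)
    return rot, row, col, angle_deg


# Toy input: 16 maps of size 2x3, all zeros except one peak.
maps = [[[0.0] * 3 for _ in range(2)] for _ in range(NUM_ROTATIONS)]
maps[4][1][2] = 0.9
result = best_action(maps)
print(result)
```

With real data the same search would normally be a single argmax over a NumPy array of shape (rotations, height, width); the loops are only there to keep the sketch dependency-free.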

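4. Open questions

Testing with the author's trained model: there are too many ineffective pushes, the grasp points and orientations are not ideal, and even when the objects are well separated the robot fails to grasp them effectively
Running purely on the CPU, pushing and grasping are much more accurate than on the GPU. Is this related to the with torch.no_grad() placed in front of the densenet network?
What is the scope of the random seed? How is the pushing position computed?
Is the grasp point determined per whole object?
The push position error is too large, and some grasp positions are also computed incorrectly and do not land near any object
Which variables hold the model parameters saved from my own training?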

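5. Software and hardware debugging issues

Issue 1: key mismatch when loading the pretrained model, "Missing key(s) in state_dict:...". This is caused by the PyTorch version upgrade; the error is shown in the screenshot below

[Screenshot of the Missing keys error]

Cause: the version upgrade. The error comes down to the difference between norm.1 and norm1. The fix is as follows: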

import re
...
if args.resume:
    print("Loading checkpoint from '{}'".format(args.resume))
    checkpoint = torch.load(args.resume)
    # modify:
    # '.'s are no longer allowed in module names, but previous _DenseLayer
    # has keys 'norm.1', 'relu.1', 'conv.1', 'norm.2', 'relu.2', 'conv.2'.
    # They are also in the checkpoints in model_urls.
    # This pattern is used to find such keys.
    pattern = re.compile(
        r'^(.*denselayer\d+\.(?:norm|relu|conv))\.((?:[12])\.(?:weight|bias|running_mean|running_var))$')
    state_dict = checkpoint['state_dict']
    for key in list(state_dict.keys()):
        res = pattern.match(key)
        if res:
            # e.g. '...denselayer1.norm.1.weight' -> '...denselayer1.norm1.weight'
            new_key = res.group(1) + res.group(2)
            state_dict[new_key] = state_dict[key]
            del state_dict[key]
    model.load_state_dict(state_dict)
    # model.load_state_dict(checkpoint['state_dict'])
    start_epoch = checkpoint['epoch']

References:

https://github.com/KaiyangZhou/deep-person-reid/issues/23

https://github.com/pytorch/vision/blob/50b2f910490a731c4cd50db5813b291860f02237/torchvision/models/densenet.py#L28
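The key renaming for Issue 1 can be tried on a plain dictionary before touching a real checkpoint. The sketch below writes the pattern with its backslash escapes (`\d`, `\.`) and returns a renamed copy instead of editing in place; the function name rename_legacy_keys and the sample keys are made up for illustration.

```python
import re
from collections import OrderedDict

# Matches legacy DenseNet keys such as '...denselayer1.norm.1.weight'.
LEGACY_KEY = re.compile(
    r'^(.*denselayer\d+\.(?:norm|relu|conv))\.'
    r'((?:[12])\.(?:weight|bias|running_mean|running_var))$'
)


def rename_legacy_keys(state_dict):
    """Return a copy of state_dict with 'norm.1'-style keys renamed to 'norm1'."""
    renamed = OrderedDict()
    for key, value in state_dict.items():
        match = LEGACY_KEY.match(key)
        # Drop the dot between the layer name and its index when the key matches.
        new_key = match.group(1) + match.group(2) if match else key
        renamed[new_key] = value
    return renamed


# Hypothetical sample keys; the string values stand in for tensors.
old = OrderedDict([
    ("features.denseblock1.denselayer1.norm.1.weight", "w"),
    ("features.denseblock1.denselayer1.conv.2.weight", "c"),
    ("features.conv0.weight", "x"),
])
new = rename_legacy_keys(old)
print(list(new))
```

Keys that do not match the pattern pass through unchanged, so the helper is safe to apply to a checkpoint that has already been converted.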

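Another cause of Missing keys: the model was saved while wrapped in nn.DataParallel. The fix is as follows:

[Image: two solutions]

Code example for the second solution (see the reference link):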

[solved] KeyError: 'unexpected key "module.encoder.embedding.weight" in state_dict'

# original saved file with DataParallel
state_dict = torch.load('myfile.pth.tar')
# create new OrderedDict that does not contain `module.`
from collections import OrderedDict
new_state_dict = OrderedDict()
for k, v in state_dict.items():
    name = k[7:] # remove `module.`
    new_state_dict[name] = v
# load params
model.load_state_dict(new_state_dict)

Another user has written up this method in more detail; the link is below

https://blog.csdn.net/qq_32998593/article/details/89343507
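One caveat with slicing every key by a fixed 7 characters is that it also truncates keys that never had the `module.` prefix. A slightly more defensive variant, sketched here on a plain dictionary, strips the prefix only where it is present. The helper name strip_module_prefix and the sample keys are illustrative, not taken from the linked posts.

```python
from collections import OrderedDict

PREFIX = "module."  # prefix that nn.DataParallel adds to every parameter name


def strip_module_prefix(state_dict):
    """Return a copy of state_dict with a leading 'module.' removed from keys."""
    cleaned = OrderedDict()
    for key, value in state_dict.items():
        name = key[len(PREFIX):] if key.startswith(PREFIX) else key
        cleaned[name] = value
    return cleaned


# Hypothetical sample keys; the integers stand in for tensors.
saved = OrderedDict([
    ("module.encoder.embedding.weight", 1),
    ("decoder.bias", 2),
])
cleaned = strip_module_prefix(saved)
print(list(cleaned))
```

The result can then be passed to model.load_state_dict in the same way as new_state_dict in the snippet above.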

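Issue 2: after the author's update, the code you can download today no longer works with torch 0.3.1 (with torch.no_grad has been added)

Issue 3: in the robot file, the code async=False raises an error (async is a reserved keyword from Python 3.7 onward). The author's latest code changes it to asynch=False, after which the error disappears. The original post showed the error in a screenshot.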
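The async error in Issue 3 can be reproduced without the robot code. The sketch below assumes the interpreter is Python 3.7 or newer, where async is a reserved keyword; f is a hypothetical function name standing in for the real call in the robot file, and call_is_valid is a helper of my own that only compiles the expression without running it.

```python
import keyword

# True on Python 3.7+, where `async` can no longer be used as a name.
async_is_reserved = keyword.iskeyword("async")


def call_is_valid(source):
    """Return True if `source` compiles as a Python expression."""
    try:
        compile(source, "<demo>", "eval")
    except SyntaxError:
        return False
    return True


# f(...) is a hypothetical call; compiling does not require f to exist.
old_ok = call_is_valid("f(async=False)")
new_ok = call_is_valid("f(asynch=False)")
print(async_is_reserved, old_ok, new_ok)
```

This also explains why the failure shows up as a SyntaxError when the file is loaded, before any robot code runs.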

Issue 4: not enough memory

Error: RuntimeError: $ Torch: not enough memory: you tried to allocate 0GB. Buy new RAM! at /opt/conda/conda-bld/pytorch_1523244252089/work/torch/lib/TH/THGeneral.c:253

Issue 5: error with two GPUs

RuntimeError: Expected tensor for argument #1 'input' to have the same device as tensor for argument #2 'weight'; but device 0 does not equal 1 (while checking arguments for cudnn_convolution)

For now I worked around this by using only one GPU
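One common way to restrict a process to a single GPU is the CUDA_VISIBLE_DEVICES environment variable; this is offered as an assumption about how such a workaround can be done, not as a record of what was actually done here. A minimal sketch, with device index "0" chosen arbitrarily:

```python
import os

# Set this before CUDA is initialised; the simplest rule is to do it
# before importing torch anywhere in the process.
# "0" is an assumed device index; use whichever GPU should stay visible.
os.environ["CUDA_VISIBLE_DEVICES"] = "0"

visible = os.environ["CUDA_VISIBLE_DEVICES"].split(",")
print("GPUs visible to this process:", visible)
```

The same effect can be had from the shell by prefixing the launch command with CUDA_VISIBLE_DEVICES=0, which avoids editing the code at all.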

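Issue 6: configuring the Intel RealSense D435i camera

Install librealsense and the realsense-ros package. For the install locations and procedure, follow tutorials found via Google, and pay attention to version compatibility between the two. My version-matching list is on my personal drive: "Intel realsense device drivers and ROS package version matching list"

Issue 7: places that need changes after upgrading PyTorch (the author's code targets the fairly old torch 0.3.0)

The concept of Variable is gone
Expressions like numpy()[0] need to be changed
Wrapping the densenet forward computation in with torch.no_grad() greatly reduces computation time and avoids running out of memory.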
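The three changes in Issue 7 can be illustrated on a toy model. This is a minimal sketch assuming torch 0.4 or newer is installed; nn.Linear is only a stand-in for the real network, and reading "numpy()[0]" as "extract a scalar from a loss tensor" is my interpretation rather than something the list spells out.

```python
import torch
import torch.nn as nn

model = nn.Linear(4, 2)   # stand-in for the real network
x = torch.randn(3, 4)     # a plain tensor; no Variable wrapper is needed

# Inference only: no autograd graph is built inside this block,
# which saves memory and time.
with torch.no_grad():
    features = model(x)

# Training-style forward pass, outside no_grad so gradients can flow.
loss = model(x).pow(2).mean()

# A loss is now a 0-dimensional tensor, so indexing it with [0] fails;
# .item() returns the Python number instead.
loss_value = loss.item()

print(features.requires_grad, loss.requires_grad, type(loss_value).__name__)
```

Only code whose gradients are never needed belongs inside no_grad; wrapping a forward pass that is later used for backpropagation would silently cut it off from training.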