Running the source code of the paper "A Thorough Examination of the CNN/Daily Mail Reading Comprehension Task"

Errors encountered when running on Windows 10:

1. Installing Theano with Anaconda

Reference tutorial: https://www.cnblogs.com/Sinte-Beuve/p/8597429.html

2. Error: ImportError: cannot import name downsample

Reference tutorial: https://blog.csdn.net/u011361880/article/details/75463441

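3. Error: IOError: [Errno 2] No such file or directory: '/u/nlp/data/deepmind-qa/word-embeddings/glove.6B.100d.txt'

Cause: the word-embedding file cannot be found.

Fix: download GloVe and place it in a local directory. Download link: http://nlp.stanford.edu/data/glove.6B.zip

After unzipping (as shown in the figure), the archive contains four txt files. The suffixes 50d, 100d, 200d, and 300d mean that each word is represented by a 50-, 100-, 200-, or 300-dimensional vector, respectively. The code for this paper uses the 100-dimensional vectors.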
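As a quick sanity check that the downloaded file really holds vectors of the expected dimension, a minimal sketch along the following lines can be used. The helper names `load_glove` and `embedding_dim` are illustrative assumptions and are not part of the original repository; the sketch also assumes the file is UTF-8 text with one word followed by space-separated numbers per line.

```python
import io


def load_glove(path, max_words=None):
    """Read a GloVe text file into a dict mapping word -> list of floats.

    Each line is expected to look like: <word> <v1> <v2> ... <vN>
    (hypothetical helper, not taken from the repository).
    """
    vectors = {}
    with io.open(path, "r", encoding="utf-8") as f:
        for line in f:
            parts = line.rstrip("\n").split(" ")
            if len(parts) < 2:
                continue  # skip blank or malformed lines
            vectors[parts[0]] = [float(x) for x in parts[1:]]
            if max_words is not None and len(vectors) >= max_words:
                break
    return vectors


def embedding_dim(vectors):
    """Return the common vector length, or raise if the lengths disagree."""
    dims = set(len(v) for v in vectors.values())
    if len(dims) != 1:
        raise ValueError("inconsistent vector lengths: %s" % sorted(dims))
    return dims.pop()
```

For example, calling `embedding_dim(load_glove("glove.6B.100d.txt", max_words=1000))` on the unzipped file (the local path is an assumption) should report 100 if the right file was picked.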

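4. Error: IOError: [Errno 2] No such file or directory: '/u/nlp/data/deepmind-qa/cnn/train.txt'

Cause: the training data file cannot be found. The original download link given by the author no longer works, so the file cannot be downloaded.

Fix: following the author's hints and the source code, write a program to process the CNN/Daily Mail dataset yourself.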

5. Error: "SyntaxError: Non-ASCII character '\xe8' in file"

Cause: a comment contains Chinese characters, and Python 2 assumes ASCII source encoding by default, which cannot represent them.

Fix: https://www.cnblogs.com/qinduanyinghua/p/6771494.html
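The usual remedy for this error in Python 2 is to declare the source encoding on the first or second line of the file, as described in PEP 263. The sketch below assumes the source file is saved as UTF-8; the helper name `add_coding_declaration` is made up for illustration. It adds the declaration only when none is present and keeps a shebang line in first position.

```python
# -*- coding: utf-8 -*-
import io
import re

# PEP 263 pattern for an encoding declaration on line 1 or 2.
CODING_RE = re.compile(r"^[ \t\f]*#.*?coding[:=][ \t]*([-_.a-zA-Z0-9]+)")


def add_coding_declaration(path, encoding="utf-8"):
    """Prepend '# -*- coding: ... -*-' to a source file if it has none.

    Returns True if the file was modified, False if a declaration existed.
    (Hypothetical helper; assumes the file is already stored as `encoding`.)
    """
    with io.open(path, "r", encoding=encoding) as f:
        lines = f.readlines()
    if any(CODING_RE.match(line) for line in lines[:2]):
        return False
    declaration = u"# -*- coding: %s -*-\n" % encoding
    # Keep a shebang on line 1; the declaration is still valid on line 2.
    insert_at = 1 if lines and lines[0].startswith(u"#!") else 0
    if insert_at == 1 and not lines[0].endswith(u"\n"):
        lines[0] += u"\n"
    lines.insert(insert_at, declaration)
    with io.open(path, "w", encoding=encoding) as f:
        f.writelines(lines)
    return True
```

Adding the single comment line by hand at the top of the offending file has the same effect; the helper is only convenient when several files are affected.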

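6. Error: AssertionError

Cause: I copied a single example from the CNN dataset to experiment with, and its order was document, question, answer, which triggered this error. Reading the source code shows that, after the author's preprocessing, each example should be ordered question, answer, document (with no blank lines in between), and consecutive examples are separated by one blank line.

Source code: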

import os
import linecache


def gen_new_dataset(rootdir, newdir):
    # rootdir: folder of raw question files; newdir: path of the output text file
    file_list = os.listdir(rootdir)
    for i in range(0, len(file_list)):
        file_path = os.path.join(rootdir, file_list[i])
        print(i)
        if os.path.isfile(file_path):
            # In each raw file, line 3 is the document, line 5 the question, line 7 the answer
            document = linecache.getline(file_path, 3)
            question = linecache.getline(file_path, 5)
            answer = linecache.getline(file_path, 7)
            # Append as question, answer, document, followed by one blank line
            with open(newdir, 'a') as f:
                f.writelines([question, answer, document, "\n"])


gen_new_dataset("F:/NRC/dataset/cnn/cnn/questions/training", "F:/NRC/dataset/cnn_new/train.txt")
gen_new_dataset("F:/NRC/dataset/cnn/cnn/questions/validation", "F:/NRC/dataset/cnn_new/dev.txt")
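The layout described under error 6 (question, answer, document, then one blank line) can be checked before training with a small validator. This is only a sketch under that assumption about the expected format, and it assumes the generated file is UTF-8; `check_examples` is a hypothetical helper, not a function from the repository.

```python
import io


def check_examples(path):
    """Count examples laid out as question / answer / document / blank line.

    Raises ValueError naming the offending line if a block is malformed.
    """
    count = 0
    block = []  # non-empty lines of the example currently being read
    with io.open(path, "r", encoding="utf-8") as f:
        for lineno, raw in enumerate(f, 1):
            line = raw.strip()
            if line:
                block.append(line)
                if len(block) > 3:
                    raise ValueError(
                        "line %d: more than 3 lines in one example" % lineno)
            elif block:
                if len(block) != 3:
                    raise ValueError(
                        "line %d: example has %d lines, expected 3"
                        % (lineno, len(block)))
                count += 1
                block = []
    if block:
        # Last example was not followed by a blank line.
        if len(block) != 3:
            raise ValueError("end of file: incomplete example")
        count += 1
    return count
```

Running it on the generated train.txt and dev.txt gives the number of examples, or points at the first block that does not have exactly three lines.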

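7. Error:

Possible cause:

This is probably because Lasagne was installed directly with pip install lasagne, which installs Lasagne 0.1, whereas this model requires Lasagne 0.2. According to the official site, Lasagne 0.2 has to be installed with pip install --upgrade https://github.com/Lasagne/Lasagne/archive/master.zip
Reference: https://www.imooc.com/article/29397

After installing Lasagne 0.2, the code ran successfully without further errors.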

Running on Linux:

Source code: https://github.com/danqi/rc-cnn-dailymail

Source code walkthrough (reference): https://www.imooc.com/article/29397

Steps:

System: Ubuntu 18.04

(1) Download the source code and the dataset

Source code:

git clone https://github.com/danqi/rc-cnn-dailymail

Dataset: download it from the address provided (a VPN is required)

(2) Set up the environment:

Required environment:

Install with Anaconda:

Create a virtual environment: conda create -n rc-cnn-dailymail python=2.7

(For installing Theano, see the official site: http://www.deeplearning.net/software/theano/install_ubuntu.html)

Install scipy: conda install scipy=0.16
Install Theano: conda install theano
Install Lasagne 0.2.dev1 (same method as above): pip install --upgrade https://github.com/Lasagne/Lasagne/archive/master.zip
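Since several of the errors in these notes come down to mismatched package versions, it can help to print what actually got installed in the environment before training. The sketch below is an assumption-laden convenience script, not part of the repository: `report_versions` is a hypothetical name, and it relies only on each package exposing a `__version__` attribute.

```python
import importlib


def report_versions(names=("numpy", "scipy", "theano", "lasagne")):
    """Return {package: version string}, None if the package is not installed.

    If a package is installed but fails while importing, the value is a
    short 'import failed: ...' string naming the exception type.
    """
    versions = {}
    for name in names:
        try:
            module = importlib.import_module(name)
        except ImportError:
            versions[name] = None
            continue
        except Exception as exc:  # e.g. an AttributeError raised during import
            versions[name] = "import failed: %s" % type(exc).__name__
            continue
        versions[name] = getattr(module, "__version__", "unknown")
    return versions


if __name__ == "__main__":
    for name, version in sorted(report_versions().items()):
        shown = version if version is not None else "NOT INSTALLED"
        print("%-8s %s" % (name, shown))
```

Run inside the activated conda environment, this lists the numpy, scipy, Theano, and Lasagne versions in one place, which makes it easier to compare against the versions mentioned in these notes.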

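(3) Process the dataset (only the CNN dataset was processed): use the code from error 6 above, adjust the paths, and generate the new dataset.

(4) Train the model: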

Error 1:

AttributeError: 'module' object has no attribute '_get_ndarray_c_version'

Cause: https://stackoverflow.com/questions/55046335/numpy-attributeerror-with-theano-module-numpy-core-multiarray-has-no-attribut

My installed versions were numpy 1.11 and Theano 0.9.

My fix: conda install theano=1.0.4

Error 2: ERROR (theano.gof.cmodule): [Errno 12] Cannot allocate memory

Cause: not enough memory.

Fix: run on a server with more memory.

Warning 3: UserWarning: The file scan_perform.c is not available.

Fix: conda uninstall theano

pip install theano

Error 4: IOError: [Errno 13] Permission denied: 'model.pkl.gz'

Fix: switch from the regular user to the root user and run again.
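Errno 13 here means the process was not allowed to write model.pkl.gz. As a possible alternative to running as root, one could first check whether the current user can write to the output location and, if not, run from a directory that user owns. A minimal pre-flight sketch, where `can_write_model` is a hypothetical helper:

```python
import os


def can_write_model(model_path):
    """Return True if the current user can create or overwrite model_path."""
    directory = os.path.dirname(os.path.abspath(model_path))
    if os.path.exists(model_path):
        # The file already exists: it must itself be writable.
        return os.access(model_path, os.W_OK)
    # Otherwise the containing directory must exist and allow new files.
    return os.path.isdir(directory) and os.access(directory, os.W_OK | os.X_OK)
```

Calling `can_write_model("model.pkl.gz")` from the directory where training is launched indicates whether the permission error is likely to occur again.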

Execution process: