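The complete project has been uploaded to GitHub: im2txt
The model has to be downloaded separately, because the free GitHub tier does not accept files larger than 100 MB.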

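1. Download im2txt

tensorflow/models contains many models, but we only need im2txt. Downloading a single subfolder from GitHub is awkward, so clone the whole models repository; the other models may come in handy later: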

git clone https://github.com/tensorflow/models.git

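Once the download finishes, copy the models/research/im2txt/im2txt folder into your workspace.

2. Install the required packages

First, follow the im2txt instructions on GitHub and install all the required packages:

  • Bazel
  • TensorFlow 1.0 or later
  • NumPy
  • Natural Language Toolkit (NLTK)
    • First install NLTK
    • Then download the NLTK data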
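
Before moving on, it can help to confirm that the prerequisites are actually visible from your environment. The following is a minimal sketch and not part of im2txt: the helper name `find_missing` is made up for illustration, and it only checks that the Python modules can be located and that a `bazel` executable is on PATH. It does not check versions or the NLTK data.

```python
import importlib.util
import shutil


def find_missing(modules, executables):
    """Return the names from both lists that cannot be found."""
    missing = []
    for name in modules:
        # find_spec returns None when a top-level module is not installed
        if importlib.util.find_spec(name) is None:
            missing.append(name)
    for exe in executables:
        # shutil.which returns None when the executable is not on PATH
        if shutil.which(exe) is None:
            missing.append(exe)
    return missing


if __name__ == "__main__":
    # Module and executable names assumed from the list above
    print(find_missing(["tensorflow", "numpy", "nltk"], ["bazel"]))
```

An empty list means everything was found; anything printed still needs to be installed.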

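3. Download the model and vocabulary

If you want to train the model yourself, the official instructions say you first need to spend several hours downloading the dataset, then train for 1 to 2 weeks, and finally fine-tune for several more weeks.

Training takes a long time, so use a pre-trained model instead. Download links:

  • Original link (requires a VPN)
  • Network drive mirror, password: 9bun

After downloading, put the files in the im2txt/model folder: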

im2txt/
    ......
    model/
        graph.pbtxt
        model.ckpt-2000000
        model.ckpt-2000000.meta

Also download the vocabulary file word_counts.txt and put it in the data folder:

im2txt/
    ......
    data/
        ......
        word_counts.txt
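
A quick way to confirm the layout is to check that each expected file exists. This is only a sketch: the helper `missing_files` is hypothetical, and the relative paths are the ones listed in this section, so adjust them if your copy is organized differently.

```python
from pathlib import Path

# Relative paths taken from the layout described in this section
EXPECTED_FILES = [
    "model/graph.pbtxt",
    "model/model.ckpt-2000000",
    "model/model.ckpt-2000000.meta",
    "data/word_counts.txt",
]


def missing_files(root, expected=EXPECTED_FILES):
    """Return the expected relative paths that are not regular files under root."""
    root = Path(root)
    return [rel for rel in expected if not (root / rel).is_file()]


if __name__ == "__main__":
    # "im2txt" is assumed to be the folder copied into the workspace
    print(missing_files("im2txt"))
```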

4. Write the script

Create a script file named run.sh in the im2txt folder with the following commands:

CHECKPOINT_PATH="${HOME}/im2txt/model/train"
VOCAB_FILE="${HOME}/im2txt/data/mscoco/word_counts.txt"
IMAGE_FILE="${HOME}/im2txt/data/mscoco/raw-data/val2014/COCO_val2014_000000224477.jpg"

bazel build -c opt //im2txt:run_inference

bazel-bin/im2txt/run_inference \
  --checkpoint_path=${CHECKPOINT_PATH} \
  --vocab_file=${VOCAB_FILE} \
  --input_files=${IMAGE_FILE}

Replace the variables with your own paths. For example, these are the paths I am currently using:

CHECKPOINT_PATH="/home/w/workspace/tensorflow-space/tensorflow-gpu/practices/im2txt/model/model.ckpt-2000000"
VOCAB_FILE="/home/w/workspace/tensorflow-space/tensorflow-gpu/practices/im2txt/data/word_counts.txt"
IMAGE_FILE="/home/w/workspace/tensorflow-space/tensorflow-gpu/practices/im2txt/data/images/1.jpg"

bazel build -c opt //im2txt:run_inference

bazel-bin/im2txt/run_inference \
  --checkpoint_path=${CHECKPOINT_PATH} \
  --vocab_file=${VOCAB_FILE} \
  --input_files=${IMAGE_FILE}

5. Run the script

Change the working directory to im2txt and set the script's permissions:

sudo chmod 777 run.sh

Then change the working directory to the parent of im2txt and run the script:

./im2txt/run.sh

The output is shown below, and the captions look quite reasonable:

INFO: Analysed target //im2txt:run_inference (0 packages loaded).
INFO: Found 1 target...
Target //im2txt:run_inference up-to-date:
  bazel-bin/im2txt/run_inference
INFO: Elapsed time: 0.164s, Critical Path: 0.01s
INFO: Build completed successfully, 1 total action
INFO:tensorflow:Building model.
INFO:tensorflow:Initializing vocabulary from file: /home/widiot/workspace/tensorflow-space/tensorflow-gpu/practices/im2txt/data/word_counts.txt
INFO:tensorflow:Created vocabulary with 11520 words
INFO:tensorflow:Running caption generation on 1 files matching /home/widiot/workspace/tensorflow-space/tensorflow-gpu/practices/im2txt/data/images/1.jpg
INFO:tensorflow:Loading model from checkpoint: /home/widiot/workspace/tensorflow-space/tensorflow-gpu/practices/im2txt/model/newmodel.ckpt-2000000
INFO:tensorflow:Restoring parameters from /home/widiot/workspace/tensorflow-space/tensorflow-gpu/practices/im2txt/model/newmodel.ckpt-2000000
INFO:tensorflow:Successfully loaded checkpoint: newmodel.ckpt-2000000
Captions for image 1.jpg:
  0) a man riding a wave on top of a surfboard . (p=0.035667)
  1) a person riding a surf board on a wave (p=0.016235)
  2) a man on a surfboard riding a wave . (p=0.010144)

The bazel build command also creates several folders in the same directory as WORKSPACE:

bazel-bin/
bazel-genfiles/
bazel-out/
bazel-testlogs/
......

bazel-bin contains the compiled run_inference, which is what run.sh invokes.

6. Error summary

6.1 build error

When running run.sh, the bazel build command fails because it only works inside a workspace:

ERROR: The 'build' command is only supported from within a workspace.

The fix is to create a WORKSPACE file in the directory from which run.sh is executed:

touch WORKSPACE
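
To see which directory would count as the workspace for a given working directory, you can walk upward looking for the marker file. This is a minimal sketch under the assumption that only a file named WORKSPACE matters, as in this post (other Bazel versions may accept additional marker files); the function name `find_workspace_root` is made up for illustration.

```python
from pathlib import Path


def find_workspace_root(start):
    """Return the nearest directory at or above `start` that contains a
    WORKSPACE file, or None if no such directory exists."""
    current = Path(start).resolve()
    for candidate in [current, *current.parents]:
        if (candidate / "WORKSPACE").is_file():
            return candidate
    return None


if __name__ == "__main__":
    # None here corresponds to the "only supported from within a workspace" error
    print(find_workspace_root("."))
```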

6.2 im2txt package not found

When running run.sh, Bazel reports that it cannot find the im2txt package:

ERROR: Skipping '//im2txt:run_inference': no such package 'im2txt': BUILD file not found on package path
WARNING: Target pattern parsing failed.
ERROR: no such package 'im2txt': BUILD file not found on package path
INFO: Elapsed time: 0.107s
FAILED: Build did NOT complete successfully (0 packages loaded)
./run.sh: 9: ./run.sh: bazel-bin/im2txt/run_inference: not found

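This happens because the script was not run from the parent directory of im2txt. The fix is to run run.sh from that parent directory.

Alternatively, add a command to run.sh that changes to the parent directory: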

CHECKPOINT_PATH="/home/w/workspace/tensorflow-space/tensorflow-gpu/practices/im2txt/model/model.ckpt-2000000"
VOCAB_FILE="/home/w/workspace/tensorflow-space/tensorflow-gpu/practices/im2txt/data/word_counts.txt"
IMAGE_FILE="/home/w/workspace/tensorflow-space/tensorflow-gpu/practices/im2txt/data/images/1.jpg"

cd .. # go up to the parent directory

bazel build -c opt //im2txt:run_inference

bazel-bin/im2txt/run_inference \
  --checkpoint_path=${CHECKPOINT_PATH} \
  --vocab_file=${VOCAB_FILE} \
  --input_files=${IMAGE_FILE}

Then run the script directly from the directory that contains run.sh:

./run.sh

6.3 lstm/basic_lstm_cell/××× not found

When running run.sh, TensorFlow cannot find lstm/basic_lstm_cell/××× in the model:

# Error 1
NotFoundError: Tensor name "lstm/basic_lstm_cell/bias" not found in checkpoint files
# Error 2
NotFoundError: Key lstm/basic_lstm_cell/kernel not found in checkpoint

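This is because the LSTM variable names differ between TF1.0 and TF1.2, and the names used before TF1.0 differ from TF1.0 as well, so you need to adapt the renaming to the error message you actually get:

TF1.0                           TF1.2
lstm/basic_lstm_cell/weights    lstm/basic_lstm_cell/kernel
lstm/basic_lstm_cell/biases     lstm/basic_lstm_cell/bias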
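
The table can be read as a lookup from old names to new names, with every other variable left unchanged. The sketch below shows only that mapping step in plain Python, without touching TensorFlow; `plan_renames` and `TF10_TO_TF12` are hypothetical names, and the rules must be replaced with whatever names your own error message and checkpoint actually contain.

```python
# Name pairs taken from the table above (TF1.0 name -> TF1.2 name)
TF10_TO_TF12 = {
    "lstm/basic_lstm_cell/weights": "lstm/basic_lstm_cell/kernel",
    "lstm/basic_lstm_cell/biases": "lstm/basic_lstm_cell/bias",
}


def plan_renames(checkpoint_names, rules=TF10_TO_TF12):
    """Map every checkpoint variable name to the name it should be saved under.

    Names without a matching rule are kept as they are.
    """
    return {name: rules.get(name, name) for name in checkpoint_names}


if __name__ == "__main__":
    # Illustrative variable names only, not read from a real checkpoint
    names = ["lstm/basic_lstm_cell/weights", "global_step"]
    for old, new in plan_renames(names).items():
        print(old, "->", new)
```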

The fix is to create a file named rename_ckpt.py and use the following function to convert the original trained model:

import tensorflow as tf


def rename_ckpt():
    # The names depend on the TensorFlow version, so adjust this mapping
    # according to the error message you actually get
    vars_to_rename = {
        "lstm/BasicLSTMCell/Linear/Bias": "lstm/basic_lstm_cell/bias",
        "lstm/BasicLSTMCell/Linear/Matrix": "lstm/basic_lstm_cell/kernel"
    }
    new_checkpoint_vars = {}
    reader = tf.train.NewCheckpointReader("/home/w/workspace/tensorflow-space/tensorflow-gpu/practices/im2txt/model/model.ckpt-2000000")
    # Copy every variable, renaming the ones listed above
    for old_name in reader.get_variable_to_shape_map():
        if old_name in vars_to_rename:
            new_name = vars_to_rename[old_name]
        else:
            new_name = old_name
        new_checkpoint_vars[new_name] = tf.Variable(reader.get_tensor(old_name))

    init = tf.global_variables_initializer()
    saver = tf.train.Saver(new_checkpoint_vars)

    # Save the renamed variables as a new checkpoint
    with tf.Session() as sess:
        sess.run(init)
        saver.save(sess, "/home/w/workspace/tensorflow-space/tensorflow-gpu/practices/im2txt/model/newmodel.ckpt-2000000")
    print("checkpoint file rename successful... ")


if __name__ == '__main__':
    rename_ckpt()

Run the rename_ckpt.py script. When the renaming succeeds, the output looks like this:

$ python rename_ckpt.py
checkpoint file rename successful...

Several new files now appear in the model folder:

model/
    ......
    checkpoint
    newmodel.ckpt-2000000.data-00000-of-00001
    newmodel.ckpt-2000000.index
    newmodel.ckpt-2000000.meta

You also need to point CHECKPOINT_PATH in run.sh at the renamed checkpoint:

CHECKPOINT_PATH="/home/w/workspace/tensorflow-space/tensorflow-gpu/practices/im2txt/model/newmodel.ckpt-2000000"

6.4 Image reading error

When running run.sh, an encoding error appears, and the traceback shows that it is raised while reading the image:

Traceback (most recent call last):
  File "/home/widiot/workspace/tensorflow-space/tensorflow-gpu/practices/bazel-bin/im2txt/run_inference.runfiles/__main__/im2txt/run_inference.py", line 85, in <module>
    tf.app.run()
  File "/home/widiot/workspace/tensorflow-space/tensorflow-gpu/venv/lib/python3.5/site-packages/tensorflow/python/platform/app.py", line 129, in run
    _sys.exit(main(argv))
  File "/home/widiot/workspace/tensorflow-space/tensorflow-gpu/practices/bazel-bin/im2txt/run_inference.runfiles/__main__/im2txt/run_inference.py", line 74, in main
    image = f.read()
......
'utf-8' codec can't decode byte 0xff in position 0: invalid start byte

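The fix is to change the mode used to open the image so that it is read as bytes rather than decoded as text. In my case the file that raised the error was run_inference.py:

for filename in filenames:
    with tf.gfile.GFile(filename, "r") as f:
        image = f.read()

# Change line 73 to
for filename in filenames:
    with tf.gfile.GFile(filename, "rb") as f:
        image = f.read()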
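
The underlying cause can be reproduced without TensorFlow. The sketch below uses Python's built-in `open` rather than `tf.gfile.GFile`, so it only illustrates the text-versus-binary distinction and is not the im2txt code path; `try_read` is a hypothetical helper, and the four bytes are just the typical start of a JPEG file.

```python
import os
import tempfile

# Typical first bytes of a JPEG file; 0xff is not a valid UTF-8 start byte
JPEG_HEADER = b"\xff\xd8\xff\xe0"


def try_read(path, mode):
    """Read a file in the given mode; return the data, or the error's class
    name if decoding the contents as UTF-8 text fails."""
    kwargs = {} if "b" in mode else {"encoding": "utf-8"}
    try:
        with open(path, mode, **kwargs) as f:
            return f.read()
    except UnicodeDecodeError as err:
        return type(err).__name__


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "fake.jpg")
        with open(path, "wb") as f:
            f.write(JPEG_HEADER)
        print(try_read(path, "r"))   # text mode tries to decode the bytes
        print(try_read(path, "rb"))  # binary mode returns the raw bytes
```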

This error is tracked in the corresponding issue.

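6.5 Identical captions in the output

After running run.sh, every caption in the output is the same, and each one is followed by a long tail of . and <S> tokens: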

......
Captions for image 1.jpg:
  0) a man riding a wave on top of a surfboard . <S> . <S> . <S> <S> . (p=0.001145)
  1) a man riding a wave on top of a surfboard . <S> . <S> <S> . <S> <S> (p=0.000888)
  2) a man riding a wave on top of a surfboard . <S> . <S> <S> . <S> <S> (p=0.000658)

I looked at the code and found that caption_generator.py contains a statement that checks whether a word is the end-of-sentence token </S>:

......
# line 194
if w == self.vocab.end_id:
    if self.length_normalization_factor > 0:
        ......

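This condition always evaluated to False. Comparing the value of w with end_id, I found that w=2 while end_id=3.

I then checked word_counts.txt and found <S> at position 2 and </S> at position 3, which does not match what the model outputs: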

a 969108
<S> 586368
</S> 586368
. 440479
on 213612
of 202290
......

After swapping the positions of these two tokens and running run.sh again, the result is back to normal:

......
Captions for image 1.jpg:
  0) a man riding a wave on top of a surfboard . (p=0.035667)
  1) a person riding a surf board on a wave (p=0.016235)
  2) a man on a surfboard riding a wave . (p=0.010144)
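
The swap can also be scripted instead of edited by hand. The following is a sketch that works on the lines of a word_counts.txt-style file, assuming each line is "word count" and that a word's id is derived from its line position; `token_ids` and `swap_tokens` are hypothetical helpers, and the ids they report are 0-based line indices, which need not match the numbers quoted earlier.

```python
def token_ids(lines):
    """Map each word to its 0-based position among the non-empty lines."""
    words = [line.split()[0] for line in lines if line.strip()]
    return {word: index for index, word in enumerate(words)}


def swap_tokens(lines, first="<S>", second="</S>"):
    """Return the non-empty lines with the two tokens' lines exchanged."""
    result = [line for line in lines if line.strip()]
    ids = token_ids(result)
    i, j = ids[first], ids[second]
    result[i], result[j] = result[j], result[i]
    return result


if __name__ == "__main__":
    # Sample lines copied from the excerpt of word_counts.txt in this post
    sample = ["a 969108", "<S> 586368", "</S> 586368", ". 440479"]
    print(token_ids(sample))
    print(swap_tokens(sample))
```

To apply it to a real file, read the lines, call `swap_tokens`, and write the result to a new file so the original stays available for comparison.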

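This word_counts.txt was one I got from someone else. I did not expect it to contain an error like this; it really changed my idea of what a bug can look like.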
