TensorFlow Exercise 27: CAPTCHA Generator - Generating Images from Text
The earlier post <TensorFlow Exercise 20: Cracking Character CAPTCHAs with Deep Learning> was a CNN-based recognition exercise: predicting the text from an image. This post reverses the problem, synthesizing an image from text, to see whether deep learning can be used to train a CAPTCHA generator.
The model used here is a GAN (generative adversarial network), shown in the figure below:
The figure above generates flowers from textual descriptions; the problem here is simpler, since at least there is no natural language to understand. References:
- http://blog.topspeedsnail.com/archives/10977
- http://bamos.github.io/2016/08/09/deep-completion/
- Generative Adversarial Text to Image Synthesis
- https://github.com/paarthneekhara/text-to-image
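The core idea borrowed from the text-to-image paper above is a matching-aware discriminator: it is trained on three kinds of pairs, (real image, matching text) labeled real, (mismatched image, text) labeled fake, and (generated image, text) labeled fake, while the generator tries to get its fake pair labeled real. A minimal numpy sketch of those loss terms (the logit values are made up for illustration; this is not the post's TensorFlow code):

```python
import numpy as np

def sigmoid_ce(logits, labels):
    # numerically stable sigmoid cross-entropy (the same formula TensorFlow uses)
    return np.maximum(logits, 0) - logits * labels + np.log1p(np.exp(-np.abs(logits)))

# hypothetical discriminator logits for a batch of 4 image/text pairs
d_real  = np.array([2.0, 1.5, 3.0, 0.5])     # (real image, matching text)    -> target 1
d_wrong = np.array([-1.0, -2.0, 0.0, -0.5])  # (wrong image, matching text)   -> target 0
d_fake  = np.array([-1.5, 0.2, -2.0, -1.0])  # (generated image, text)        -> target 0

# discriminator loss: the sum of the three terms
d_loss = (sigmoid_ce(d_real, np.ones(4)).mean()
          + sigmoid_ce(d_wrong, np.zeros(4)).mean()
          + sigmoid_ce(d_fake, np.zeros(4)).mean())
# generator loss: it wants the fake pair classified as real
g_loss = sigmoid_ce(d_fake, np.ones(4)).mean()
print(d_loss, g_loss)
```

The training script later in this post builds exactly these three discriminator terms with `tf.nn.sigmoid_cross_entropy_with_logits`.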
Generating CAPTCHA training samples (taken from TensorFlow Exercise 20):
from captcha.image import ImageCaptcha  # pip install captcha
import numpy as np
import matplotlib.pyplot as plt
from PIL import Image
import random

# characters that may appear in the CAPTCHA; no Chinese characters here
number = ['0', '1', '2', '3', '4', '5', '6', '7', '8', '9']
alphabet = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z']
ALPHABET = ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M', 'N', 'O', 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W', 'X', 'Y', 'Z']

# CAPTCHAs usually ignore case; the CAPTCHA length is 4 characters
def random_captcha_text(char_set=number, captcha_size=4):
    captcha_text = []
    for i in range(captcha_size):
        c = random.choice(char_set)
        captcha_text.append(c)
    return captcha_text

# generate the image for a random CAPTCHA string
def gen_captcha_text_and_image(c_set):
    image = ImageCaptcha()
    captcha_text = random_captcha_text(char_set=c_set)
    captcha_text = ''.join(captcha_text)
    captcha = image.generate(captcha_text)
    # image.write(captcha_text, captcha_text + '.jpg')  # write to a file
    captcha_image = Image.open(captcha)
    captcha_image = np.array(captcha_image)
    return captcha_text, captcha_image

if __name__ == '__main__':
    # quick test; gen_captcha_text_and_image needs a character set
    text, image = gen_captcha_text_and_image(number)
    f = plt.figure()
    ax = f.add_subplot(111)
    ax.text(0.1, 0.9, text, ha='center', va='center', transform=ax.transAxes)
    plt.imshow(image)
    plt.show()
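One detail worth calling out before the training script: the CAPTCHA string is not fed to the network as one-hot vectors but as four normalized character ordinals, `ord(c)/122.0` (122 is `ord('z')`, so every code point used here lands in (0, 1]). A standalone sketch of that encoding:

```python
# Encode a CAPTCHA string as a float vector, one value per character,
# by dividing each code point by 122 (= ord('z')), as the training code does.
def encode_text(text):
    return [ord(c) / 122.0 for c in text]

vec = encode_text('1234')
print(vec)
```

This is a crude conditioning vector (nearby characters get nearby values), but it keeps the text placeholder a simple `[batch_size, 4]` float tensor.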
Training:
import tensorflow as tf
from gen_captcha import gen_captcha_text_and_image
from PIL import Image
import numpy as np
import matplotlib.pyplot as plt
import os

# generate CAPTCHAs of length 4
captcha_len = 4
image_size = [64, 128]  # powers of two work best
batch_size = 64
# noise
z_dim = 20
t_z = tf.placeholder(tf.float32, [batch_size, z_dim], name="noise")
t_dim = 80
# the CAPTCHA string
t_text = tf.placeholder(tf.float32, [batch_size, captcha_len], name="captcha_text")
# the CAPTCHA image that matches t_text
real_image = tf.placeholder(tf.float32, [batch_size, image_size[0], image_size[1], 3], name='real_image')
# a CAPTCHA image that does NOT match t_text
wrong_image = tf.placeholder(tf.float32, [batch_size, image_size[0], image_size[1], 3], name='wrong_image')


class batch_norm(object):
    def __init__(self, epsilon=1e-5, momentum=0.9, name="batch_norm"):
        with tf.variable_scope(name):
            self.epsilon = epsilon
            self.momentum = momentum
            self.name = name

    def __call__(self, x, train=True):
        return tf.contrib.layers.batch_norm(x, decay=self.momentum, updates_collections=None,
                                            epsilon=self.epsilon, scale=True, is_training=train, scope=self.name)


def generator(t_z, t_text, is_training=True):
    with tf.variable_scope("g_captcha_text_embedding"):
        W = tf.get_variable("g_weight", [captcha_len, t_dim], tf.float32, tf.random_normal_initializer(stddev=0.02))
        b = tf.get_variable("g_bias", [t_dim], initializer=tf.constant_initializer(0.0))
        captcha_text_embedding = tf.matmul(t_text, W) + b  # (batch_size, 80)
        # leaky relu
        captcha_text_embedding = tf.maximum(captcha_text_embedding, 0.2*captcha_text_embedding)

    z_concat = tf.concat(1, [t_z, captcha_text_embedding])  # (batch_size, 100); old concat(axis, values) order

    w_2, w_4, w_8, w_16 = int(image_size[0]/2), int(image_size[0]/4), int(image_size[0]/8), int(image_size[0]/16)
    h_2, h_4, h_8, h_16 = int(image_size[1]/2), int(image_size[1]/4), int(image_size[1]/8), int(image_size[1]/16)

    with tf.variable_scope("g_projection_layer"):
        W = tf.get_variable("g_weight", [z_dim+t_dim, 64*8*w_16*h_16], tf.float32, tf.random_normal_initializer(stddev=0.02))
        b = tf.get_variable("g_bias", [64*8*w_16*h_16], initializer=tf.constant_initializer(0.0))
        out = tf.nn.relu(tf.matmul(z_concat, W) + b)
        out = tf.reshape(out, [-1, w_16, h_16, 64*8])
        # (64, 4, 8, 512)

    with tf.variable_scope("g_deconv2d"):
        W1 = tf.get_variable('g_deconv2d_W_1', [5, 5, 64*4, 64*8], initializer=tf.random_normal_initializer(stddev=0.02))
        b1 = tf.get_variable('g_deconv2d_b_1', [64*4], initializer=tf.constant_initializer(0.0))
        deconv_1 = tf.nn.conv2d_transpose(out, W1, output_shape=[batch_size, w_8, h_8, 64*4], strides=[1, 2, 2, 1])
        deconv_1 = tf.nn.bias_add(deconv_1, b1)
        deconv_1_batch_norm_func = batch_norm(name='g_deconv2d_1_bn')
        deconv_1 = tf.nn.relu(deconv_1_batch_norm_func(deconv_1, train=is_training))
        # (?, 8, 16, 256)
        W2 = tf.get_variable('g_deconv2d_W_2', [5, 5, 64*2, 64*4], initializer=tf.random_normal_initializer(stddev=0.02))
        b2 = tf.get_variable('g_deconv2d_b_2', [64*2], initializer=tf.constant_initializer(0.0))
        deconv_2 = tf.nn.conv2d_transpose(deconv_1, W2, output_shape=[batch_size, w_4, h_4, 64*2], strides=[1, 2, 2, 1])
        deconv_2 = tf.nn.bias_add(deconv_2, b2)
        deconv_2_batch_norm_func = batch_norm(name='g_deconv2d_2_bn')
        deconv_2 = tf.nn.relu(deconv_2_batch_norm_func(deconv_2, train=is_training))
        # (?, 16, 32, 128)
        W3 = tf.get_variable('g_deconv2d_W_3', [5, 5, 64, 64*2], initializer=tf.random_normal_initializer(stddev=0.02))
        b3 = tf.get_variable('g_deconv2d_b_3', [64], initializer=tf.constant_initializer(0.0))
        deconv_3 = tf.nn.conv2d_transpose(deconv_2, W3, output_shape=[batch_size, w_2, h_2, 64], strides=[1, 2, 2, 1])
        deconv_3 = tf.nn.bias_add(deconv_3, b3)
        deconv_3_batch_norm_func = batch_norm(name='g_deconv2d_3_bn')
        deconv_3 = tf.nn.relu(deconv_3_batch_norm_func(deconv_3, train=is_training))
        # (?, 32, 64, 64)
        W4 = tf.get_variable('g_deconv2d_W_4', [5, 5, 3, 64], initializer=tf.random_normal_initializer(stddev=0.02))
        b4 = tf.get_variable('g_deconv2d_b_4', [3], initializer=tf.constant_initializer(0.0))
        deconv_4 = tf.nn.conv2d_transpose(deconv_3, W4, output_shape=[batch_size, image_size[0], image_size[1], 3], strides=[1, 2, 2, 1])
        deconv_4 = tf.nn.bias_add(deconv_4, b4)
        # (?, 64, 128, 3)

    return tf.tanh(deconv_4)/2. + 0.5  # tanh outputs (-1, 1); map to (0, 1)


# based on https://github.com/carpedm20/DCGAN-tensorflow/blob/master/model.py
def discriminator(image, t_text, reuse=False):
    if reuse:
        tf.get_variable_scope().reuse_variables()

    with tf.variable_scope("d_conv2d"):
        W1 = tf.get_variable('d_conv2d_W_1', [5, 5, image.get_shape()[-1], 64], initializer=tf.truncated_normal_initializer(stddev=0.02))
        b1 = tf.get_variable('d_conv2d_b_1', [64], initializer=tf.constant_initializer(0.0))
        conv_1 = tf.nn.conv2d(image, W1, strides=[1, 2, 2, 1], padding='SAME')
        conv_1 = tf.nn.bias_add(conv_1, b1)
        conv_1 = tf.maximum(conv_1, 0.2*conv_1)  # leaky relu
        # (64, 32, 64, 64)
        W2 = tf.get_variable('d_conv2d_W_2', [5, 5, conv_1.get_shape()[-1], 64*2], initializer=tf.truncated_normal_initializer(stddev=0.02))
        b2 = tf.get_variable('d_conv2d_b_2', [64*2], initializer=tf.constant_initializer(0.0))
        conv_2 = tf.nn.conv2d(conv_1, W2, strides=[1, 2, 2, 1], padding='SAME')
        conv_2 = tf.nn.bias_add(conv_2, b2)
        conv_2_batch_norm_func = batch_norm(name='d_conv2d_2_bn')
        conv_2 = conv_2_batch_norm_func(conv_2, train=True)
        conv_2 = tf.maximum(conv_2, 0.2*conv_2)
        # (64, 16, 32, 128)
        W3 = tf.get_variable('d_conv2d_W_3', [5, 5, conv_2.get_shape()[-1], 64*4], initializer=tf.truncated_normal_initializer(stddev=0.02))
        b3 = tf.get_variable('d_conv2d_b_3', [64*4], initializer=tf.constant_initializer(0.0))
        conv_3 = tf.nn.conv2d(conv_2, W3, strides=[1, 2, 2, 1], padding='SAME')
        conv_3 = tf.nn.bias_add(conv_3, b3)
        conv_3_batch_norm_func = batch_norm(name='d_conv2d_3_bn')
        conv_3 = conv_3_batch_norm_func(conv_3, train=True)
        conv_3 = tf.maximum(conv_3, 0.2*conv_3)
        # (64, 8, 16, 256)
        W4 = tf.get_variable('d_conv2d_W_4', [5, 5, conv_3.get_shape()[-1], 64*8], initializer=tf.truncated_normal_initializer(stddev=0.02))
        b4 = tf.get_variable('d_conv2d_b_4', [64*8], initializer=tf.constant_initializer(0.0))
        conv_4 = tf.nn.conv2d(conv_3, W4, strides=[1, 2, 2, 1], padding='SAME')
        conv_4 = tf.nn.bias_add(conv_4, b4)
        conv_4_batch_norm_func = batch_norm(name='d_conv2d_4_bn')
        conv_4 = conv_4_batch_norm_func(conv_4, train=True)
        conv_4 = tf.maximum(conv_4, 0.2*conv_4)
        # (64, 4, 8, 512)

    with tf.variable_scope("d_captcha_text_embedding"):
        W = tf.get_variable("d_weight", [captcha_len, t_dim], tf.float32, tf.random_normal_initializer(stddev=0.02))
        b = tf.get_variable("d_bias", [t_dim], initializer=tf.constant_initializer(0.0))
        captcha_text_embedding = tf.matmul(t_text, W) + b  # (batch_size, 80)
        # leaky relu
        captcha_text_embedding = tf.maximum(captcha_text_embedding, 0.2*captcha_text_embedding)
        # tile the embedding across the 4x8 spatial grid and append it as extra channels
        captcha_text_embedding = tf.expand_dims(captcha_text_embedding, 1)
        captcha_text_embedding = tf.expand_dims(captcha_text_embedding, 2)
        tiled_embeddings = tf.tile(captcha_text_embedding, [1, 4, 8, 1], name='d_tiled_embeddings')
        conv_4_concat = tf.concat(3, [conv_4, tiled_embeddings], name='d_conv_4_concat')
        # (64, 4, 8, 592)
        W5 = tf.get_variable('d_conv2d_W_5', [1, 1, conv_4_concat.get_shape()[-1], 64*8], initializer=tf.truncated_normal_initializer(stddev=0.02))
        b5 = tf.get_variable('d_conv2d_b_5', [64*8], initializer=tf.constant_initializer(0.0))
        conv_5 = tf.nn.conv2d(conv_4_concat, W5, strides=[1, 1, 1, 1], padding='SAME')
        conv_5 = tf.nn.bias_add(conv_5, b5)
        conv_5_batch_norm_func = batch_norm(name='d_conv2d_5_bn')
        conv_5 = conv_5_batch_norm_func(conv_5, train=True)
        conv_5 = tf.maximum(conv_5, 0.2*conv_5)
        # (64, 4, 8, 512)

    with tf.variable_scope("d_fully_connect"):
        flat = tf.reshape(conv_5, [batch_size, 4*8*512])
        W = tf.get_variable("d_W", [flat.get_shape().as_list()[1], 1], initializer=tf.random_normal_initializer(stddev=0.02))
        b = tf.get_variable("d_b", [1], initializer=tf.constant_initializer(0.0))
        fc = tf.matmul(flat, W) + b

    return tf.nn.sigmoid(fc), fc


# build one training batch
def get_next_batch(batch_size=64):
    batch_texts = []
    batch_images = []
    batch_wrong_images = []

    # occasionally the generated image is not (60, 160, 3); retry until it is
    def wrap_gen_captcha_text_and_image(c_set):
        while True:
            text, image = gen_captcha_text_and_image(c_set)
            if image.shape == (60, 160, 3):
                return text, image

    for i in range(batch_size):
        text, image = wrap_gen_captcha_text_and_image(c_set=['1', '2', '3', '4'])
        ord_n = []
        for c in text:
            ord_n.append(ord(c)/122.0)
        image = Image.fromarray(image)
        image = image.resize([128, 64])  # PIL resize takes (width, height)
        # image.save(str(i)+'.jpg')
        image = np.array(image) / 255.0
        # the "wrong" image uses characters that can never appear in the real text
        _, wrong_img = wrap_gen_captcha_text_and_image(c_set=['a', 'b', 'c', 'd'])
        wrong_img = Image.fromarray(wrong_img)
        wrong_img = wrong_img.resize([128, 64])
        wrong_img = np.array(wrong_img) / 255.0
        batch_texts.append(ord_n)
        batch_images.append(image)
        batch_wrong_images.append(wrong_img)
    return np.array(batch_texts), np.array(batch_images), np.array(batch_wrong_images)


def train():
    fake_image = generator(t_z, t_text, is_training=True)
    fake_image_disc, fake_image_logits_disc = discriminator(fake_image, t_text)
    wrong_image_disc, wrong_image_logits_disc = discriminator(wrong_image, t_text, reuse=True)
    real_image_disc, real_image_logits_disc = discriminator(real_image, t_text, reuse=True)

    # losses (old positional API: sigmoid_cross_entropy_with_logits(logits, labels))
    g_loss = tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(fake_image_logits_disc, tf.ones_like(fake_image_disc)))
    d_loss1 = tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(real_image_logits_disc, tf.ones_like(real_image_disc)))
    d_loss2 = tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(wrong_image_logits_disc, tf.zeros_like(wrong_image_disc)))
    d_loss3 = tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(fake_image_logits_disc, tf.zeros_like(fake_image_disc)))
    d_loss = d_loss1 + d_loss2 + d_loss3

    # optimizer; use startswith, since a substring test like 'd_' in var.name
    # would wrongly match generator names such as g_deconv2d_W_1
    train_vars = tf.trainable_variables()
    d_vars = [var for var in train_vars if var.name.startswith('d_')]
    g_vars = [var for var in train_vars if var.name.startswith('g_')]
    d_optim = tf.train.AdamOptimizer(0.0002, beta1=0.5).minimize(d_loss, var_list=d_vars)
    g_optim = tf.train.AdamOptimizer(0.0002, beta1=0.5).minimize(g_loss, var_list=g_vars)

    with tf.Session() as sess:
        # initialize once, before the loop; re-initializing every step would
        # throw away all training progress
        sess.run(tf.global_variables_initializer())
        saver = tf.train.Saver()
        # saver.restore(sess, '')
        loop = 0
        while True:
            batch_text, batch_image, batch_wrong_images = get_next_batch(64)
            z_noise = np.random.uniform(-1, 1, [batch_size, z_dim])
            _, _d_loss = sess.run([d_optim, d_loss], feed_dict={real_image: batch_image,
                                                                wrong_image: batch_wrong_images,
                                                                t_z: z_noise,
                                                                t_text: batch_text})
            # update the generator twice per step to keep d_loss from collapsing to 0
            _, _ = sess.run([g_optim, g_loss], feed_dict={real_image: batch_image,
                                                          wrong_image: batch_wrong_images,
                                                          t_z: z_noise,
                                                          t_text: batch_text})
            _, _g_loss = sess.run([g_optim, g_loss], feed_dict={real_image: batch_image,
                                                                wrong_image: batch_wrong_images,
                                                                t_z: z_noise,
                                                                t_text: batch_text})
            print(loop, _d_loss, _g_loss)

            if loop % 50 == 0:
                save_path = saver.save(sess, "captcha.model")
                # sample some generated CAPTCHAs and dump them to disk
                texts, _, _ = get_next_batch(64)
                z_noise = np.random.uniform(-1, 1, [batch_size, z_dim])
                images = sess.run(fake_image, feed_dict={t_z: z_noise, t_text: texts})
                os.mkdir(str(loop))
                i = 0
                for img in images:
                    plt.imsave(str(loop)+'/'+str(i)+'.jpg', img)
                    i += 1
            loop += 1

train()
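As a sanity check on the generator architecture above: the projection layer reshapes to (4, 8, 512) precisely because the four stride-2 transposed convolutions each double the spatial size, ending at the 64x128x3 image the placeholders expect. The shape walk can be verified in plain Python, without TensorFlow:

```python
# Walk the generator's deconvolution stack and track output shapes.
# Each transposed convolution uses stride 2, doubling height and width.
image_size = [64, 128]              # final (height, width)
channels = [512, 256, 128, 64, 3]   # feature maps after projection, then after each deconv

h, w = image_size[0] // 16, image_size[1] // 16  # projection reshapes to (4, 8, 512)
shapes = [(h, w, channels[0])]
for c in channels[1:]:
    h, w = h * 2, w * 2
    shapes.append((h, w, c))

print(shapes)
# the last shape must match the real_image/wrong_image placeholders: (64, 128, 3)
```

This is also why the comment in the code says image sizes should be powers of two: four halvings of 64x128 stay integral (4x8), and doubling four times lands exactly back on the target resolution.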
Not done yet: the author is away on a hiking trip with only a small laptop, so this code will be revised and tested later.
If you reproduce this article, please keep it intact and credit the author @斗大的熊猫 along with the original URL: http://blog.topspeedsnail.com/archives/11150