Overview:

Environment: Python 3.6 + TensorFlow 1.12, with an NVIDIA GTX 1070 GPU (I'll swap in a better card later). Training took four days and nights.

Key techniques: CTC, BRNN, MFCC features, and fully connected neural networks.

CTC (Connectionist Temporal Classification): an algorithm for cases where we don't know how the input and output are aligned (which character corresponds to which stretch of audio), which makes it a good fit for speech recognition and handwritten-character recognition. Traditional speech recognition is built on phonetics and usually consists of separate components such as pronunciation, acoustic, and language models; besides the transcript itself, the training corpus also has to be annotated with time-aligned phonemes, which costs a great deal of manual labour. Speech recognition with neural networks is much simpler: the CTC objective, which performs classification over time, computes the probability of label sequences, where a sequence is any possible transcript of the audio sample. The prediction is then compared with the ground truth, the error is computed, and the network weights are updated again and again. This discards the notion of phonemes, saving a huge amount of manual annotation, and no language model is needed either; with enough samples you can train speech recognition for any language.
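As a minimal sketch (not part of the original post) of how the CTC objective is wired up in TensorFlow 1.x, assuming time-major logits and sparse integer labels; the shapes here are made-up placeholders:

import tensorflow as tf

# Hypothetical shapes: 80 time steps, batch of 2, 10 output classes (including the CTC blank)
logits = tf.placeholder(tf.float32, [80, 2, 10], name='logits')   # time-major [T, B, C]
labels = tf.sparse_placeholder(tf.int32, name='labels')           # sparse transcripts
seq_len = tf.placeholder(tf.int32, [2], name='seq_len')           # valid length per sample

# CTC sums over all possible alignments between the label sequence and the T frames
loss = tf.reduce_mean(tf.nn.ctc_loss(labels, logits, seq_len))
decoded, _ = tf.nn.ctc_beam_search_decoder(logits, seq_len, merge_repeated=False)

The full training script below uses the same two ops, only with the BiRNN's output as logits.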

BRNN (bidirectional recurrent neural network): see my earlier blog post on the topic.

MFCC (mel-frequency cepstral coefficients): put colloquially, the way it is used with TensorFlow is that the audio is converted along the time axis (the conversion itself can be fairly involved) into a sequence of frames, where each frame is a feature vector of 20 or more dimensions; the code in this post uses 26. The ultimate goal of training is to use each frame's feature vector to decide which character it is, or which character it most resembles.
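A minimal sketch of what that framing looks like with python_speech_features (the same library the training code below uses); the file name is just a placeholder:

import scipy.io.wavfile as wav
from python_speech_features import mfcc

fs, audio = wav.read('some_utterance.wav')       # placeholder file name
feats = mfcc(audio, samplerate=fs, numcep=26)    # one 26-dimensional feature vector per frame
print(feats.shape)                               # e.g. (955, 26): 955 frames x 26 MFCC features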

Fully connected neural networks: these need no further introduction.

That's the overview; let's get to work.

Step 1: Collect the data

In a real project this step can take a lot of effort, but here we don't need to: we can train the model on the THCHS-30 data provided by Tsinghua University.

Download: http://www.openslr.org/18/ or http://166.111.134.19:8081/data/thchs30-openslr/README.html

Only the 6.4 GB archive is needed; after downloading and extracting it, the layout looks like this:

data holds the full dataset, train is the training set, and test is the test set.

Each folder contains wav files (the audio) and matching trn files (the transcript of the audio). In train and test, however, the trn file does not hold the transcript itself but the path of the corresponding trn file inside data. So in the code we take the audio from train and look up the matching transcript in data, as sketched below.
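A small sketch of that lookup (the paths are placeholders; the helper functions in the full script below do the same thing):

import os

wav_file = 'data_thchs30/train/A11_0.wav'   # placeholder path
data_dir = 'data_thchs30/data'
trn_file = os.path.join(data_dir, os.path.basename(wav_file) + '.trn')   # data/A11_0.wav.trn
with open(trn_file, 'r', encoding='UTF-8') as fd:
    text = fd.readline().strip().replace(' ', '')   # first line is the transcript; drop the spaces
print(text)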

Step 2: Write the code and train the model

# coding: UTF-8
# Training data download: http://www.openslr.org/18/ or http://166.111.134.19:8081/data/thchs30-openslr/README.html
# The original blog's program kept the spaces inside the labels, so training could never pin down the feature of
# the space character (the MFCC features of the silence between any two sounds are all different), which made
# convergence extremely slow or impossible. After manually stripping the spaces from tran_texts it starts to converge.
# One more point: the trailing "龚" characters in the program's output are simply the rarest character in the
# character table; we can manually append one extra blank entry to the table with words += [""].
import numpy as np
from python_speech_features import mfcc
import scipy.io.wavfile as wav
import os
import time
import tensorflow as tf
from tensorflow.python.ops import ctc_ops
from collections import Counter

# Collect every WAV file under a directory
def get_wav_files(wav_path):
    wav_files = []
    for (dirpath, dirnames, filenames) in os.walk(wav_path):
        for filename in filenames:
            if filename.endswith('.wav') or filename.endswith('.WAV'):
                filename_path = os.path.join(dirpath, filename)
                wav_files.append(filename_path)
    return wav_files

# Get the transcript that corresponds to each wav file
def get_tran_texts(wavfiles, tran_path):
    tran_texts = []
    wav_files = []
    for wav_file in wavfiles:
        (wav_path, wav_filename) = os.path.split(wav_file)
        tran_file = os.path.join(tran_path, wav_filename + '.trn')
        if os.path.exists(tran_file) is False:
            return None
        fd = open(tran_file, 'r', encoding='UTF-8')
        text = fd.readline()
        wav_files.append(wav_file)
        # The original transcripts contain many spaces; strip them out. (Put colloquially: the MFCC features
        # of the silence between any two spoken words are all different, so the model cannot learn what a
        # "space" looks like and converges extremely slowly or not at all.)
        tran_texts.append(text.split('\n')[0].replace(' ', ''))
        fd.close()
    return wav_files, tran_texts

# Get the wav files together with their transcripts
def get_wav_files_and_tran_texts(wav_path, tran_path):
    wavfiles = get_wav_files(wav_path)
    wav_files, tran_texts = get_tran_texts(wavfiles, tran_path)
    return wav_files, tran_texts

# The old training set used this function to get the audio file names and transcripts
def get_wavs_lables(wav_path, label_file):
    wav_files = []
    for (dirpath, dirnames, filenames) in os.walk(wav_path):
        for filename in filenames:
            if filename.endswith('.wav') or filename.endswith('.WAV'):
                filename_path = os.sep.join([dirpath, filename])
                if os.stat(filename_path).st_size < 240000:  # skip files that are too small
                    continue
                wav_files.append(filename_path)
    labels_dict = {}
    with open(label_file, 'rb') as f:
        for label in f:
            label = label.strip(b'\n')
            label_id = label.split(b' ', 1)[0]
            label_text = label.split(b' ', 1)[1]
            labels_dict[label_id.decode('ascii')] = label_text.decode('utf-8')
    labels = []
    new_wav_files = []
    for wav_file in wav_files:
        wav_id = os.path.basename(wav_file).split('.')[0]
        if wav_id in labels_dict:
            labels.append(labels_dict[wav_id])
            new_wav_files.append(wav_file)
    return new_wav_files, labels

# Convert a sparse word-index matrix back to text
# `tuple` is the return value of the sparse_tuple_from function
def sparse_tuple_to_texts_ch(tuple, words):
    indices = tuple[0]
    values = tuple[1]
    results = [''] * tuple[2][0]
    for i in range(len(indices)):
        index = indices[i][0]
        c = words[values[i]]
        results[index] = results[index] + c
    return results

# Convert a dense word-index matrix back to text
def ndarray_to_text_ch(value, words):
    results = ''
    for i in range(len(value)):
        if value[i] == len(words):
            results += ' '
        else:
            results += words[value[i]]
    return results

# Build the sparse representation of a batch of sequences
def sparse_tuple_from(sequences, dtype=np.int32):
    indices = []
    values = []
    for n, seq in enumerate(sequences):
        indices.extend(zip([n] * len(seq), range(len(seq))))
        values.extend(seq)
    indices = np.asarray(indices, dtype=np.int64)
    values = np.asarray(values, dtype=dtype)
    shape = np.asarray([len(sequences), indices.max(0)[1] + 1], dtype=np.int64)
    # return tf.SparseTensor(indices=indices, values=values, shape=shape)
    return indices, values, shape

# Convert the audio into a matrix of time steps (rows) and MFCC features (columns),
# and the transcript into a word-index vector
def get_audio_and_transcriptch(txt_files, wav_files, n_input, n_context, word_num_map, txt_labels=None):
    audio = []
    audio_len = []
    transcript = []
    transcript_len = []
    if txt_files != None:
        txt_labels = txt_files
    for txt_obj, wav_file in zip(txt_labels, wav_files):
        # load audio and convert to features
        audio_data = audiofile_to_input_vector(wav_file, n_input, n_context)
        audio_data = audio_data.astype('float32')
        audio.append(audio_data)
        audio_len.append(np.int32(len(audio_data)))
        # load text transcription and convert to numerical array
        target = []
        if txt_files != None:  # txt_obj is a file path
            target = get_ch_lable_v(txt_obj, word_num_map)
        else:
            target = get_ch_lable_v(None, word_num_map, txt_obj)  # txt_obj is a label string
        transcript.append(target)
        transcript_len.append(len(target))
    audio = np.asarray(audio)
    audio_len = np.asarray(audio_len)
    transcript = np.asarray(transcript)
    transcript_len = np.asarray(transcript_len)
    return audio, audio_len, transcript, transcript_len

# Map characters to indices, i.e. look up each character's position in word_num_map
def get_ch_lable_v(txt_file, word_num_map, txt_label=None):
    words_size = len(word_num_map)
    to_num = lambda word: word_num_map.get(word, words_size)
    if txt_file != None:
        txt_label = get_ch_lable(txt_file)
    labels_vector = list(map(to_num, txt_label))
    return labels_vector

def get_ch_lable(txt_file):
    labels = ""
    with open(txt_file, 'rb') as f:
        for label in f:
            # labels = label.decode('utf-8')
            labels = labels + label.decode('gb2312')
    return labels

# Convert an audio file into MFCC features
# Arguments --- audio_filename: the audio file   numcep: number of mel-frequency cepstral coefficients
#               numcontext: number of context frames to include on each side of every time step
def audiofile_to_input_vector(audio_filename, numcep, numcontext):
    # Load the audio file
    fs, audio = wav.read(audio_filename)
    # Compute the MFCC coefficients
    orig_inputs = mfcc(audio, samplerate=fs, numcep=numcep)
    # The shape is something like (955, 26): 955 time steps, 26 MFCC features per step.
    # The number of time steps differs from file to file, but the feature count is always the same.
    # print(np.shape(orig_inputs))

    # Because we train with a bidirectional RNN, its output contains both the forward and the backward
    # result, which effectively doubles every time step. To keep the total number of time steps unchanged
    # we take every other row with orig_inputs[::2]; the skipped steps are covered by the output the
    # backward RNN produces, so the overall sequence length is preserved.
    orig_inputs = orig_inputs[::2]  # e.g. (478, 26)

    # In this post numcontext = 9, so the comments below assume numcontext = 9.
    # train_inputs is what we return: every time step is combined with its 9 preceding and
    # 9 following steps, giving 19 * 26 = 494 MFCC features per step.
    train_inputs = np.array([], np.float32)
    train_inputs.resize((orig_inputs.shape[0], numcep + 2 * numcep * numcontext))
    # print(np.shape(train_inputs))  # e.g. (478, 494)

    # Prepare pre-fix post fix context
    empty_mfcc = np.array([])
    empty_mfcc.resize((numcep))

    # Prepare train_inputs with past and future contexts
    # time_slices holds the time steps, i.e. how many frames there are
    time_slices = range(train_inputs.shape[0])
    # context_past_min and context_future_max tell us which steps need zero padding
    context_past_min = time_slices[0] + numcontext
    context_future_max = time_slices[-1] - numcontext
    # Walk over every time step
    for time_slice in time_slices:
        # Zero-pad the previous 9 steps where necessary, otherwise take the real previous 9 steps
        need_empty_past = max(0, (context_past_min - time_slice))
        empty_source_past = list(empty_mfcc for empty_slots in range(need_empty_past))
        data_source_past = orig_inputs[max(0, time_slice - numcontext):time_slice]
        assert (len(empty_source_past) + len(data_source_past) == numcontext)

        # Zero-pad the next 9 steps where necessary, otherwise take the real next 9 steps
        need_empty_future = max(0, (time_slice - context_future_max))
        empty_source_future = list(empty_mfcc for empty_slots in range(need_empty_future))
        data_source_future = orig_inputs[time_slice + 1:time_slice + numcontext + 1]
        assert (len(empty_source_future) + len(data_source_future) == numcontext)

        # Features of the previous 9 steps
        if need_empty_past:
            past = np.concatenate((empty_source_past, data_source_past))
        else:
            past = data_source_past
        # Features of the next 9 steps
        if need_empty_future:
            future = np.concatenate((data_source_future, empty_source_future))
        else:
            future = data_source_future

        # Stack the previous 9 steps, the current step and the next 9 steps together
        past = np.reshape(past, numcontext * numcep)
        now = orig_inputs[time_slice]
        future = np.reshape(future, numcontext * numcep)
        train_inputs[time_slice] = np.concatenate((past, now, future))
        assert (len(train_inputs[time_slice]) == numcep + 2 * numcep * numcontext)

    # Standardize the data: subtract the mean and divide by the standard deviation
    train_inputs = (train_inputs - np.mean(train_inputs)) / np.std(train_inputs)
    return train_inputs

# Pad (or truncate) sequences to a common length
def pad_sequences(sequences, maxlen=None, dtype=np.float32,
                  padding='post', truncating='post', value=0.):
    # e.g. [478 512 503 406 481 509 422 465]
    lengths = np.asarray([len(s) for s in sequences], dtype=np.int64)
    nb_samples = len(sequences)
    # maxlen is the length of the longest sequence in this batch
    if maxlen is None:
        maxlen = np.max(lengths)
    # Take the sample shape from the first non-empty sequence to check consistency in the main loop below
    sample_shape = tuple()
    for s in sequences:
        if len(s) > 0:
            sample_shape = np.asarray(s).shape[1:]
            break
    x = (np.ones((nb_samples, maxlen) + sample_shape) * value).astype(dtype)
    for idx, s in enumerate(sequences):
        if len(s) == 0:
            continue  # empty sequence, skip it
        # 'post' pads/truncates at the end, 'pre' at the beginning
        if truncating == 'pre':
            trunc = s[-maxlen:]
        elif truncating == 'post':
            trunc = s[:maxlen]
        else:
            raise ValueError('Truncating type "%s" not understood' % truncating)
        # check `trunc` has expected shape
        trunc = np.asarray(trunc, dtype=dtype)
        if trunc.shape[1:] != sample_shape:
            raise ValueError('Shape of sample %s of sequence at position %s is different from expected shape %s' %
                             (trunc.shape[1:], idx, sample_shape))
        if padding == 'post':
            x[idx, :len(trunc)] = trunc
        elif padding == 'pre':
            x[idx, -len(trunc):] = trunc
        else:
            raise ValueError('Padding type "%s" not understood' % padding)
    return x, lengths

# My download landed in D:\迅雷下载\任务组_20190426_1706\, hence the paths below
wav_path = 'D:\迅雷下载\任务组_20190426_1706\data_thchs30/train'
label_file = 'D:\迅雷下载\任务组_20190426_1706\data_thchs30\data'
# wav_files, labels = get_wavs_lables(wav_path,label_file)
wav_files, labels = get_wav_files_and_tran_texts(wav_path, label_file)

# Build the character table
all_words = []
for label in labels:
    # print(label)
    all_words += [word for word in label]
counter = Counter(all_words)
words = sorted(counter)
words += [""]  # manually append one extra blank entry to the end of the character table
words_size = len(words)
word_num_map = dict(zip(words, range(words_size)))
print('Character table size:', words_size)

# Number of mel-frequency cepstral coefficients
n_input = 26
# Number of context frames to include on each side of every time step
n_context = 9
# Batch size
batch_size = 8

def next_batch(wav_files, labels, start_idx=0, batch_size=1):
    filesize = len(labels)
    # Work out the start and end indices of the slice to fetch
    end_idx = min(filesize, start_idx + batch_size)
    idx_list = range(start_idx, end_idx)
    # Fetch the audio file paths and the matching transcripts
    txt_labels = [labels[i] for i in idx_list]
    wav_files = [wav_files[i] for i in idx_list]
    # Convert the audio files into training data
    (source, audio_len, target, transcript_len) = get_audio_and_transcriptch(None,
                                                                             wav_files,
                                                                             n_input,
                                                                             n_context,
                                                                             word_num_map,
                                                                             txt_labels)
    start_idx += batch_size
    # Verify that the start_idx is not larger than the total available sample size
    if start_idx >= filesize:
        start_idx = -1
    # Pad input to max_time_step of this batch
    # If there are several files, bring them to the same length (truncate to the longest or zero-pad)
    source, source_lengths = pad_sequences(source)
    # Return the sparse representation of the labels
    sparse_labels = sparse_tuple_from(target)
    return start_idx, source, source_lengths, sparse_labels

print('Audio file:  ' + wav_files[0])
print('Transcript:  ' + labels[0])

# Fetch one batch of data
next_idx, source, source_len, sparse_lab = next_batch(wav_files, labels, 0, batch_size)
print(np.shape(source))
# Convert the word-index vectors back to text
t = sparse_tuple_to_texts_ch(sparse_lab, words)
print(t[0])
# Each row of source is now the previous 9 frames (blank-padded where missing) + the frame itself
# + the next 9 frames, 26 features each; the first row is therefore centred on the 10th frame.

b_stddev = 0.046875
h_stddev = 0.046875

n_hidden = 1024
n_hidden_1 = 1024
n_hidden_2 = 1024
n_hidden_5 = 1024
n_cell_dim = 1024
n_hidden_3 = 2 * 1024

keep_dropout_rate = 0.95
relu_clip = 20

"""
used to create a variable in GPU memory.
"""
def variable_on_gpu(name, shape, initializer):
    # Use the /gpu:0 device for scoped operations
    with tf.device('/gpu:0'):
        # Create or get apropos variable
        var = tf.get_variable(name=name, shape=shape, initializer=initializer)
    return var

def BiRNN_model(batch_x, seq_length, n_input, n_context, n_character, keep_dropout):
    # batch_x_shape: [batch_size, amax_stepsize, n_input + 2 * n_input * n_context]
    batch_x_shape = tf.shape(batch_x)
    # Make the input time-major
    batch_x = tf.transpose(batch_x, [1, 0, 2])
    # Then flatten to 2-D before the first layer:
    # [amax_stepsize * batch_size, n_input + 2 * n_input * n_context]
    batch_x = tf.reshape(batch_x, [-1, n_input + 2 * n_input * n_context])

    # Use clipped ReLU activations and dropout.
    # 1st layer
    with tf.name_scope('fc1'):
        b1 = variable_on_gpu('b1', [n_hidden_1], tf.random_normal_initializer(stddev=b_stddev))
        h1 = variable_on_gpu('h1', [n_input + 2 * n_input * n_context, n_hidden_1],
                             tf.random_normal_initializer(stddev=h_stddev))
        layer_1 = tf.minimum(tf.nn.relu(tf.add(tf.matmul(batch_x, h1), b1)), relu_clip)
        layer_1 = tf.nn.dropout(layer_1, keep_dropout)

    # 2nd layer
    with tf.name_scope('fc2'):
        b2 = variable_on_gpu('b2', [n_hidden_2], tf.random_normal_initializer(stddev=b_stddev))
        h2 = variable_on_gpu('h2', [n_hidden_1, n_hidden_2], tf.random_normal_initializer(stddev=h_stddev))
        layer_2 = tf.minimum(tf.nn.relu(tf.add(tf.matmul(layer_1, h2), b2)), relu_clip)
        layer_2 = tf.nn.dropout(layer_2, keep_dropout)

    # 3rd layer
    with tf.name_scope('fc3'):
        b3 = variable_on_gpu('b3', [n_hidden_3], tf.random_normal_initializer(stddev=b_stddev))
        h3 = variable_on_gpu('h3', [n_hidden_2, n_hidden_3], tf.random_normal_initializer(stddev=h_stddev))
        layer_3 = tf.minimum(tf.nn.relu(tf.add(tf.matmul(layer_2, h3), b3)), relu_clip)
        layer_3 = tf.nn.dropout(layer_3, keep_dropout)

    # Bidirectional RNN
    with tf.name_scope('lstm'):
        # Forward direction cell:
        lstm_fw_cell = tf.contrib.rnn.BasicLSTMCell(n_cell_dim, forget_bias=1.0, state_is_tuple=True)
        lstm_fw_cell = tf.contrib.rnn.DropoutWrapper(lstm_fw_cell, input_keep_prob=keep_dropout)
        # Backward direction cell:
        lstm_bw_cell = tf.contrib.rnn.BasicLSTMCell(n_cell_dim, forget_bias=1.0, state_is_tuple=True)
        lstm_bw_cell = tf.contrib.rnn.DropoutWrapper(lstm_bw_cell, input_keep_prob=keep_dropout)

        # Reshape `layer_3` to `[amax_stepsize, batch_size, n_hidden_3]`
        layer_3 = tf.reshape(layer_3, [-1, batch_x_shape[0], n_hidden_3])
        outputs, output_states = tf.nn.bidirectional_dynamic_rnn(cell_fw=lstm_fw_cell,
                                                                 cell_bw=lstm_bw_cell,
                                                                 inputs=layer_3,
                                                                 dtype=tf.float32,
                                                                 time_major=True,
                                                                 sequence_length=seq_length)
        # Concatenate the forward and backward results: [amax_stepsize, batch_size, 2 * n_cell_dim]
        outputs = tf.concat(outputs, 2)
        # to a single tensor of shape [amax_stepsize * batch_size, 2 * n_cell_dim]
        outputs = tf.reshape(outputs, [-1, 2 * n_cell_dim])

    with tf.name_scope('fc5'):
        b5 = variable_on_gpu('b5', [n_hidden_5], tf.random_normal_initializer(stddev=b_stddev))
        h5 = variable_on_gpu('h5', [(2 * n_cell_dim), n_hidden_5], tf.random_normal_initializer(stddev=h_stddev))
        layer_5 = tf.minimum(tf.nn.relu(tf.add(tf.matmul(outputs, h5), b5)), relu_clip)
        layer_5 = tf.nn.dropout(layer_5, keep_dropout)

    with tf.name_scope('fc6'):
        # Fully connected layer feeding the softmax classifier
        b6 = variable_on_gpu('b6', [n_character], tf.random_normal_initializer(stddev=b_stddev))
        h6 = variable_on_gpu('h6', [n_hidden_5, n_character], tf.random_normal_initializer(stddev=h_stddev))
        layer_6 = tf.add(tf.matmul(layer_5, h6), b6)

    # Reshape the 2-D [amax_stepsize * batch_size, n_character] tensor into the
    # time-major 3-D tensor [amax_stepsize, batch_size, n_character].
    layer_6 = tf.reshape(layer_6, [-1, batch_x_shape[0], n_character])
    print('n_character:' + str(n_character))
    # Output shape: [amax_stepsize, batch_size, n_character]
    return layer_6

# input_tensor holds the input audio; as analysed above, its shape is
# [batch_size, amax_stepsize, n_input + (2 * n_input * n_context)],
# where batch_size is the batch length, amax_stepsize is the number of time steps and
# n_input + (2 * n_input * n_context) is the number of MFCC features.
# batch_size varies, so it is set to None; the number of time steps also differs per batch,
# so amax_stepsize is set to None as well.
input_tensor = tf.placeholder(tf.float32, [None, None, n_input + (2 * n_input * n_context)], name='input')
# Use sparse_placeholder; will generate a SparseTensor, required by ctc_loss op.
# targets holds the transcript that corresponds to the audio, so it is created as a sparse tensor
targets = tf.sparse_placeholder(tf.int32, name='targets')
# seq_length holds the sequence lengths of the current batch
seq_length = tf.placeholder(tf.int32, [None], name='seq_length')
# keep_dropout is the dropout keep probability
keep_dropout = tf.placeholder(tf.float32)

# logits is the non-normalized output/activations from the last layer.
# logits will be input for the loss function.
logits = BiRNN_model(input_tensor, tf.to_int64(seq_length), n_input, n_context, words_size + 1, keep_dropout)

# Compute the loss with CTC
avg_loss = tf.reduce_mean(ctc_ops.ctc_loss(targets, logits, seq_length))

# Optimizer
learning_rate = 0.001
optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate).minimize(avg_loss)

# CTC beam search decoder
with tf.name_scope("decode"):
    decoded, log_prob = ctc_ops.ctc_beam_search_decoder(logits, seq_length, merge_repeated=False)

# Edit distance
with tf.name_scope("accuracy"):
    distance = tf.edit_distance(tf.cast(decoded[0], tf.int32), targets)
    # label error rate (accuracy)
    ler = tf.reduce_mean(distance, name='label_error_rate')

# Number of epochs
epochs = 100
# Where to save the model
savedir = "saver/"
# Create the directory if it does not exist
if os.path.exists(savedir) == False:
    os.mkdir(savedir)
# Create the saver
saver = tf.train.Saver(max_to_keep=1)

# Create the session
with tf.Session() as sess:
    # Initialize all variables
    sess.run(tf.global_variables_initializer())
    # If there is no saved model, keep the fresh initialization
    kpt = tf.train.latest_checkpoint(savedir)
    print("kpt:", kpt)
    startepo = 0
    if kpt != None:
        saver.restore(sess, kpt)
        ind = kpt.find("-")
        startepo = int(kpt[ind + 1:])
        print(startepo)

    # Get ready to run the training epochs
    section = '\n{0:=^40}\n'
    print(section.format('Run training epoch'))
    train_start = time.time()
    floss = open("loss_value", 'w', encoding='UTF-8')
    for epoch in range(epochs):  # loop over the whole training set
        epoch_start = time.time()
        if epoch < startepo:
            continue
        print("epoch start:", epoch, "total epochs= ", epochs)
        ##### run batch #####
        n_batches_per_epoch = int(np.ceil(len(labels) / batch_size))
        print("total loop ", n_batches_per_epoch, "in one epoch,", batch_size, "items in one loop")

        train_cost = 0
        train_ler = 0
        next_idx = 0
        for batch in range(n_batches_per_epoch):  # how many batches of batch_size to take
            # Fetch the data
            print('Fetching batch: ' + str(batch))
            next_idx, source, source_lengths, sparse_labels = next_batch(wav_files, labels, next_idx, batch_size)
            print('Batch fetched')
            feed = {input_tensor: source, targets: sparse_labels, seq_length: source_lengths,
                    keep_dropout: keep_dropout_rate}
            # Compute avg_loss and run the optimizer
            batch_cost, _ = sess.run([avg_loss, optimizer], feed_dict=feed)
            train_cost += batch_cost

            # Evaluating the model's accuracy is fairly expensive, so it only runs every 100 batches
            # and training keeps moving the rest of the time
            if (batch + 1) % 100 == 0:
                print('loop:', batch, 'Train cost: ', train_cost / (batch + 1))
                print('loop:', batch, 'Train cost: ', train_cost / (batch + 1), file=floss)
                feed2 = {input_tensor: source, targets: sparse_labels, seq_length: source_lengths,
                         keep_dropout: 1.0}
                d, train_ler = sess.run([decoded[0], ler], feed_dict=feed2)
                dense_decoded = tf.sparse_tensor_to_dense(d, default_value=-1).eval(session=sess)
                dense_labels = sparse_tuple_to_texts_ch(sparse_labels, words)
                counter = 0
                print('Label err rate: ', train_ler)
                for orig, decoded_arr in zip(dense_labels, dense_decoded):
                    # convert to strings
                    decoded_str = ndarray_to_text_ch(decoded_arr, words)
                    print(' file {}'.format(counter))
                    print('Original: {}'.format(orig))
                    print('Decoded:  {}'.format(decoded_str))
                    counter = counter + 1
                    break

            # Save the model every 100 batches
            if (batch + 1) % 100 == 0:
                saver.save(sess, savedir + "saver.cpkt", global_step=epoch)

        epoch_duration = time.time() - epoch_start
        log = 'Epoch {}/{}, train_cost: {:.3f}, train_ler: {:.3f}, time: {:.2f} sec'
        print(log.format(epoch, epochs, train_cost, train_ler, epoch_duration))

    train_duration = time.time() - train_start
    print('Training complete, total duration: {:.2f} min'.format(train_duration / 60))

Step 3: Run the test set through the trained model
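The post does not include the evaluation script, but a minimal sketch might look like this: it reuses the helpers and graph nodes defined above, restores the latest checkpoint from saver/, and decodes one batch from the test folder (the test path is an assumption based on the dataset layout):

# Assumed test path; the transcripts still live in the data folder
test_wav_files, test_labels = get_wav_files_and_tran_texts(
    'D:\迅雷下载\任务组_20190426_1706\data_thchs30/test', label_file)

with tf.Session() as sess:
    saver.restore(sess, tf.train.latest_checkpoint(savedir))
    _, source, source_lengths, sparse_labels = next_batch(test_wav_files, test_labels, 0, batch_size)
    feed = {input_tensor: source, targets: sparse_labels, seq_length: source_lengths, keep_dropout: 1.0}
    d, test_ler = sess.run([decoded[0], ler], feed_dict=feed)
    dense_decoded = tf.sparse_tensor_to_dense(d, default_value=-1).eval(session=sess)
    for decoded_arr in dense_decoded:
        print(ndarray_to_text_ch(decoded_arr, words))
    print('Test label error rate:', test_ler)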

Step 4: Extensions

For other projects, you can collect a speech corpus tailored to your business scenario and then train a model that is more accurate for that domain.
