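How do you use an already-trained model for speech recognition (ASR) in Kaldi?

How do we take a model that has already been trained and actually recognize speech with it? That is the whole point of our work, isn't it?

If you look closely, you will notice that the src directory of the Kaldi source tree contains several online* modules. These are today's main characters!

Kaldi ships two versions of this code, online and online2, the first and second generation. online is no longer maintained and development has moved to online2, but for getting started I still recommend online: start simple, then go deeper!

By default Kaldi does not build the online modules, so how do we tell it to?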

[houwenbin@localhost ~]$ cd ~/kaldi-master/src

[houwenbin@localhost src]$ make ext -j 6

It builds without any problems.
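
If you want to confirm that the build really produced the online decoders, a quick check like the one below can help. It is only a sketch: it assumes Kaldi lives in ~/kaldi-master as in the commands above, the helper name check_online_bins is my own, and the two binary names are simply the ones that run.sh calls later in this post.

```shell
# Assumption: Kaldi was checked out to ~/kaldi-master, as in the commands above.
KALDI_SRC="${KALDI_SRC:-$HOME/kaldi-master/src}"

# Hypothetical helper: print one status line per expected binary and
# return non-zero if any of them is missing or not executable.
check_online_bins() {
    dir="$1"
    missing=0
    for bin in online-gmm-decode-faster online-wav-gmm-decode-faster; do
        if [ -x "$dir/$bin" ]; then
            echo "found:   $dir/$bin"
        else
            echo "missing: $dir/$bin"
            missing=1
        fi
    done
    return "$missing"
}

check_online_bins "$KALDI_SRC/onlinebin" || echo "Try re-running 'make ext' in $KALDI_SRC"
```

If a binary is reported as missing, the output of make ext is the first place to look.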

[houwenbin@localhost online_demo]$ ~/kaldi-master/src/onlinebin/online-gmm-decode-faster
/home/houwenbin/kaldi-master/src/onlinebin/online-gmm-decode-faster
Decode speech, using microphone input(PortAudio)
Utterance segmentation is done on-the-fly.
Feature splicing/LDA transform is used, if the optional(last) argument is given.
Otherwise delta/delta-delta(2-nd order) features are produced.
Usage: online-gmm-decode-faster [options] <model-in><fst-in> <word-symbol-table> <silence-phones> [<lda-matrix-in>]
Example: online-gmm-decode-faster --rt-min=0.3 --rt-max=0.5 --max-active=4000 --beam=12.0 --acoustic-scale=0.0769 model HCLG.fst words.txt '1:2:3:4:5' lda-matrix
Options:
  --acoustic-scale            : Scaling factor for acoustic likelihoods (float, default = 0.1)
  --batch-size                : Number of feature vectors processed w/o interruption (int, default = 27)
  --beam                      : Decoding beam.  Larger->slower, more accurate. (float, default = 16)
  --beam-delta                : Increment used in decoder [obscure setting] (float, default = 0.5)
  --beam-update               : Beam update rate (float, default = 0.01)
  --cmn-window                : Number of feat. vectors used in the running average CMN calculation (int, default = 600)
  --delta-order               : Order of delta computation (int, default = 2)
  --delta-window              : Parameter controlling window for delta computation (actual window size for each delta order is 1 + 2*delta-window-size) (int, default = 2)
  --hash-ratio                : Setting used in decoder to control hash behavior (float, default = 2)
  --inter-utt-sil             : Maximum # of silence frames to trigger new utterance (int, default = 50)
  --left-context              : Number of frames of left context (int, default = 4)
  --max-active                : Decoder max active states.  Larger->slower; more accurate (int, default = 2147483647)
  --max-beam-update           : Max beam update rate (float, default = 0.05)
  --max-utt-length            : If the utterance becomes longer than this number of frames, shorter silence is acceptable as an utterance separator (int, default = 1500)
  --min-active                : Decoder min active states (don't prune if #active less than this). (int, default = 20)
  --min-cmn-window            : Minumum CMN window used at start of decoding (adds latency only at start) (int, default = 100)
  --num-tries                 : Number of successive repetitions of timeout before we terminate stream (int, default = 5)
  --right-context             : Number of frames of right context (int, default = 4)
  --rt-max                    : Approximate maximum decoding run time factor (float, default = 0.75)
  --rt-min                    : Approximate minimum decoding run time factor (float, default = 0.7)
  --update-interval           : Beam update interval in frames (int, default = 3)
Standard options:
  --config                    : Configuration file to read (this option may be repeated) (string, default = "")
  --help                      : Print out usage message (bool, default = false)
  --print-args                : Print the command line arguments (to stderr) (bool, default = true)
  --verbose                   : Verbose level (higher->more logging) (int, default = 0)
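
The usage message lists a --config option, so the long option list does not have to be typed out every time. Here is a minimal sketch, assuming Kaldi's usual config-file layout of one --option=value per line with # comments. The file name online_decode.conf is hypothetical, and the values are copied from the Example line in the usage message.

```shell
# Hypothetical config file location; adjust to taste.
conf_file="${TMPDIR:-/tmp}/online_decode.conf"

# One option per line, same values as the Example line in the usage message.
cat > "$conf_file" <<'EOF'
# Decoder settings for online-gmm-decode-faster
--rt-min=0.3
--rt-max=0.5
--max-active=4000
--beam=12.0
--acoustic-scale=0.0769
EOF

echo "Wrote $(grep -c '^--' "$conf_file") options to $conf_file"

# Then, assuming the model files are in the current directory as in that example:
#   online-gmm-decode-faster --config="$conf_file" model HCLG.fst words.txt '1:2:3:4:5' lda-matrix
```

Keeping the settings in a file makes it easier to compare runs with different beam or real-time-factor values.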

For the recognition itself, look at run.sh in the online_demo directory under egs/voxforge, shown below:

#!/bin/bash

# Copyright 2012 Vassil Panayotov
# Apache 2.0

# Note: you have to do 'make ext' in ../../../src/ before running this.

# Set the paths to the binaries and scripts needed
KALDI_ROOT=`pwd`/../../..
export PATH=$PWD/../s5/utils/:$KALDI_ROOT/src/onlinebin:$KALDI_ROOT/src/bin:$PATH

data_file="online-data"
data_url="http://sourceforge.net/projects/kaldi/files/online-data.tar.bz2"

# Change this to "tri2a" if you like to test using a ML-trained model
ac_model_type=tri2b_mmi

# Alignments and decoding results are saved in this directory(simulated decoding only)
decode_dir="./work"

# Change this to "live" either here or using command line switch like:
# --test-mode live
test_mode="simulated"

. parse_options.sh

ac_model=${data_file}/models/$ac_model_type
trans_matrix=""
audio=${data_file}/audio

if [ ! -s ${data_file}.tar.bz2 ]; then
    echo "Downloading test models and data ..."
    wget -T 10 -t 3 $data_url;

    if [ ! -s ${data_file}.tar.bz2 ]; then
        echo "Download of $data_file has failed!"
        exit 1
    fi
fi

if [ ! -d $ac_model ]; then
    echo "Extracting the models and data ..."
    tar xf ${data_file}.tar.bz2
fi

if [ -s $ac_model/matrix ]; then
    trans_matrix=$ac_model/matrix
fi

case $test_mode in
    live)
        echo
        echo -e "  LIVE DEMO MODE - you can use a microphone and say something\n"
        echo "  The (bigram) language model used to build the decoding graph was"
        echo "  estimated on an audio book's text. The text in question is"
        echo "  \"King Solomon's Mines\" (http://www.gutenberg.org/ebooks/2166)."
        echo "  You may want to read some sentences from this book first ..."
        echo
        online-gmm-decode-faster --rt-min=0.5 --rt-max=0.7 --max-active=4000 \
            --beam=12.0 --acoustic-scale=0.0769 $ac_model/model $ac_model/HCLG.fst \
            $ac_model/words.txt '1:2:3:4:5' $trans_matrix;;

    simulated)
        echo
        echo -e "  SIMULATED ONLINE DECODING - pre-recorded audio is used\n"
        echo "  The (bigram) language model used to build the decoding graph was"
        echo "  estimated on an audio book's text. The text in question is"
        echo "  \"King Solomon's Mines\" (http://www.gutenberg.org/ebooks/2166)."
        echo "  The audio chunks to be decoded were taken from the audio book read"
        echo "  by John Nicholson(http://librivox.org/king-solomons-mines-by-haggard/)"
        echo
        echo "  NOTE: Using utterances from the book, on which the LM was estimated"
        echo "        is considered to be \"cheating\" and we are doing this only for"
        echo "        the purposes of the demo."
        echo
        echo "  You can type \"./run.sh --test-mode live\" to try it using your"
        echo "  own voice!"
        echo
        mkdir -p $decode_dir
        # make an input .scp file
        > $decode_dir/input.scp
        for f in $audio/*.wav; do
            bf=`basename $f`
            bf=${bf%.wav}
            echo $bf $f >> $decode_dir/input.scp
        done
        online-wav-gmm-decode-faster --verbose=1 --rt-min=0.8 --rt-max=0.85 \
            --max-active=4000 --beam=12.0 --acoustic-scale=0.0769 \
            scp:$decode_dir/input.scp $ac_model/model $ac_model/HCLG.fst \
            $ac_model/words.txt '1:2:3:4:5' ark,t:$decode_dir/trans.txt \
            ark,t:$decode_dir/ali.txt $trans_matrix;;

    *)
        echo "Invalid test mode! Should be either \"live\" or \"simulated\"!";
        exit 1;;
esac

# Estimate the error rate for the simulated decoding
if [ $test_mode == "simulated" ]; then
    # Convert the reference transcripts from symbols to word IDs
    sym2int.pl -f 2- $ac_model/words.txt < $audio/trans.txt > $decode_dir/ref.txt

    # Compact the hypotheses belonging to the same test utterance
    cat $decode_dir/trans.txt |\
        sed -e 's/^\(test[0-9]\+\)\([^ ]\+\)\(.*\)/\1 \3/' |\
        gawk '{key=$1; $1=""; arr[key]=arr[key] " " $0; } END { for (k in arr) { print k " " arr[k]} }' > $decode_dir/hyp.txt

    # Finally compute WER
    compute-wer --mode=present ark,t:$decode_dir/ref.txt ark,t:$decode_dir/hyp.txt
fi
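
The sed/gawk step near the end of run.sh is probably the least obvious part: the decoder writes one hypothesis per segment, and the pipeline strips the segment suffix from each key and joins the pieces back into one line per test file. The sketch below replays that step on made-up data. The keys, the suffix format and the word IDs are all invented for illustration, plain awk stands in for gawk, and the sed expression relies on GNU sed just as the original script does. The trailing tr and sort are my additions, only there to make the output tidy and its order predictable.

```shell
# Made-up partial hypotheses: "<utterance><segment-suffix> <word-id> ..."
partial_hyps='test1_0-120 10 20
test1_120-200 30
test2_0-90 40 50'

merged=$(printf '%s\n' "$partial_hyps" |
    # Drop the segment suffix so every line starts with the bare utterance key
    sed -e 's/^\(test[0-9]\+\)\([^ ]\+\)\(.*\)/\1 \3/' |
    # Concatenate the word IDs of all lines that share the same key
    awk '{ key = $1; $1 = ""; arr[key] = arr[key] " " $0 }
         END { for (k in arr) print k " " arr[k] }' |
    # Squeeze repeated spaces and sort, since awk iterates keys in arbitrary order
    tr -s ' ' | sort)

printf '%s\n' "$merged"
```

With this input the two test1 segments end up on a single line and test2 stays on its own, which is the one-line-per-utterance shape that compute-wer compares against ref.txt.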

The script downloads the pretrained models automatically: http://sourceforge.net/projects/kaldi/files/online-data.tar.bz2

Run ./run.sh --test-mode simulated and it recognizes the WAV audio files directly!

[houwenbin@localhost ~]$ cd ~/kaldi-master/egs/voxforge/online_demo
[houwenbin@localhost online_demo]$ ./run.sh --test-mode simulated
  SIMULATED ONLINE DECODING - pre-recorded audio is used
  The (bigram) language model used to build the decoding graph was
  estimated on an audio book's text. The text in question is
  "King Solomon's Mines" (http://www.gutenberg.org/ebooks/2166).
  The audio chunks to be decoded were taken from the audio book read
  by John Nicholson(http://librivox.org/king-solomons-mines-by-haggard/)
  NOTE: Using utterances from the book, on which the LM was estimated
        is considered to be "cheating" and we are doing this only for
        the purposes of the demo.
  You can type "./run.sh --test-mode live" to try it using your
  own voice!
online-wav-gmm-decode-faster --verbose=1 --rt-min=0.8 --rt-max=0.85 --max-active=4000 --beam=12.0 --acoustic-scale=0.0769 scp:./work/input.scp online-data/models/tri2b_mmi/model online-data/models/tri2b_mmi/HCLG.fst online-data/models/tri2b_mmi/words.txt 1:2:3:4:5 ark,t:./work/trans.txt ark,t:./work/ali.txt online-data/models/tri2b_mmi/matrix
File: test1
YOUR WARRIORS MUST GROW WEARY OF RESTING ON THEIR SPEARS INFADOOS MY LORD THERE WAS ONE WAR JUST AFTER WE DESTROYED THE PEOPLE IT CAME DOWN UPON US BUT IT WAS A CIVIL WAR DOG ATE DOG HOW WAS THAT MY LORD THE KING MY HALF BROTHER HOW BROTHER BORN AT THE SAME BIRTH AND OF THE SAME WOMAN IT IS NOT OUR CUSTOM MY LORD TO SUFFER TWINS TO LIVE THE WEAKER ALWAYS BEEN MUST DIE BUT THE MOTHER OF THE KING HID AWAY THE FEEBLER CHILD WHICH WAS BORN THE LAST FOR HER HEART YEARNED OVER IT AND THAT CHILD IS TWALA THE KING
File: test2
I AM HIS YOUNGER BROTHER BORN ANOTHER WIFE WELL MY LORD KAFA OUR FATHER DIED WHEN WE CAME TO MANHOOD IN MY BROTHER IMOTU WAS MADE KING AND HIS PLACE AND FOR A SPACE REIGNED AND HAD A SIGN BY HIS FAVOURITE WIFE WHEN THE BABE WAS THREE YEARS OLD JUST AFTER THE GREAT WAR DURING WHICH NO MAN COULD SOW OR REAP A FAMINE CAME UPON THE LAND AND THE PEOPLE MURMURED BECAUSE OF THE FAMINE AND LOOKED ROUND LIKE A STARVED LION FOR SOMETHING TO REND THAT IT WAS DEAD GO GOLD THE WISE AND TERRIBLE WOMAN WHO DOES NOT DIE MADE A PROCLAMATION TO THE PEOPLE SAYING THE KING IMOTU IS NO GAME AND AT THE TIME IMOTU WAS SICK WAS A WIND AND LAY IN HIS KRAAL NOT ABLE TO MOVE THEN GAGOOL WENT INTO A HUT AND LED OUT TWALA MY HALF BROTHER
File: test3
EVEN IF THIS CHILD IGNOSI HAD LIVED HE WOULD BE THE TRUE KING OF THE KUKUANA PEOPLE I SAW MY LORD THE SACRED SNAKE IS ROUND HIS LITTLE IGNOSI IS KING BUT ALAS HE IS LONG DEAD SEE MY LORDS AND INFADOOS POINTED TO A VAST COLLECTION OF HUTS SURROUNDED BY A FENCE WHICH WAS ITS TURN ENCIRCLED BY A GREAT DITCH THAT LIE ON THE PLAIN BENEATH US THAT IS THE KRAAL WHERE THE WHITE HEARD HIM OR TWO WAS LAST SCENE WITH THE CHILD IGNOSI IT IS THERE THAT WE SHALL SLEEP TO NIGHT IF INDEED HE ADDED DOUBTFULLY MY LORDS SLEEP UPON THIS EARTH
compute-wer --mode=present ark,t:./work/ref.txt ark,t:./work/hyp.txt
%WER 10.11 [ 37 / 366, 5 ins, 10 del, 22 sub ]
%SER 100.00 [ 3 / 3 ]
Scored 3 sentences, 0 not present in hyp.
[houwenbin@localhost online_demo]$
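
To see where the %WER figure comes from, the numbers in the brackets can be plugged in by hand: word error rate is the number of insertions, deletions and substitutions divided by the number of reference words. A small sketch, with the counts copied from the output above (the variable names are my own):

```shell
# Error counts and reference word count copied from the %WER line above
ins=5
del=10
sub=22
ref_words=366

# WER (%) = 100 * (insertions + deletions + substitutions) / reference words
wer=$(awk -v i="$ins" -v d="$del" -v s="$sub" -v n="$ref_words" \
    'BEGIN { printf "%.2f", 100 * (i + d + s) / n }')

echo "WER = ${wer}%"
```

The three error counts add up to the 37 shown in the brackets, and 37 / 366 gives the reported 10.11. The %SER line counts whole utterances instead: 3 / 3 means every one of the three test files contained at least one error.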
