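Speech Recognition in Python: SpeechRecognition

I have recently been learning the basics of speech recognition and looking into the Python libraries that support it. Here are my notes.

Common Python speech recognition libraries

Several ready-made speech recognition packages are available for Python, including: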

apiai

google-cloud-speech

pocketsphinx

SpeechRecognition

watson-developer-cloud

wit

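Of these, google-cloud-speech comes from Google and focuses solely on speech-to-text conversion. SpeechRecognition is an independent open-source library (not a Google product) that wraps several speech engines and APIs behind one interface.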

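wit and apiai offer built-in features that go beyond basic speech recognition, such as natural language processing for identifying the speaker's intent.

Advantages of the SpeechRecognition library

It wraps several mainstream speech APIs, which makes it very flexible.

The Google Web Speech API works with a default API key that is hard-coded into the SpeechRecognition library, so you can use it without signing up.

With SpeechRecognition you do not have to write scripts from scratch to access the microphone and process audio files. It handles audio input for you, and you can be up and running within a few minutes, so it is very easy to use.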

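Recognizers in SpeechRecognition

The core of SpeechRecognition is the Recognizer class. A Recognizer instance offers various settings and features for recognizing speech from an audio source, and has seven recognition methods, each backed by a different API:

recognize_bing(): Microsoft Bing Speech

recognize_google(): Google Web Speech API

recognize_google_cloud(): Google Cloud Speech - requires installation of the google-cloud-speech package

recognize_houndify(): Houndify by SoundHound

recognize_ibm(): IBM Speech to Text

recognize_sphinx(): CMU Sphinx - requires installing PocketSphinx

recognize_wit(): Wit.ai

Of these seven, only recognize_sphinx() works offline, using the CMU Sphinx engine. The other six need an internet connection.

In addition, SpeechRecognition ships with a default API key for the Google Web Speech API, so you can use it right away. The other online APIs require authentication with an API key or a username/password combination (recognize_sphinx() needs no key, but requires PocketSphinx to be installed), so this article uses the Web Speech API.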
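The choice between the offline and the default online recognizer can be sketched roughly as follows. This is only an illustration: pick_method() and transcribe() are hypothetical helper names, not part of the library, and the recognizer argument is assumed to be any object that exposes the recognize_*() methods listed above (for example a speech_recognition.Recognizer instance).

```python
# Sketch only: pick_method() and transcribe() are hypothetical helpers,
# not functions provided by the SpeechRecognition library.

OFFLINE_METHOD = "recognize_sphinx"          # the only method that works offline
DEFAULT_ONLINE_METHOD = "recognize_google"   # usable with the bundled default key


def pick_method(offline: bool) -> str:
    """Return the name of the recognizer method to call."""
    return OFFLINE_METHOD if offline else DEFAULT_ONLINE_METHOD


def transcribe(recognizer, audio, offline: bool = False, **kwargs):
    """Call the chosen recognize_*() method on a recognizer-like object.

    Extra keyword arguments (such as language="zh-CN") are passed through
    unchanged to the underlying method.
    """
    method = getattr(recognizer, pick_method(offline))
    return method(audio, **kwargs)
```

With a real Recognizer, a call such as transcribe(r, audio, language="zh-CN") would go to recognize_google(), while transcribe(r, audio, offline=True) would go to recognize_sphinx() and therefore needs PocketSphinx installed.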

Requirements for SpeechRecognition

To use all of the functionality of the library, you should have:

Python 2.6, 2.7, or 3.3+ (required)

PyAudio 0.2.11+ (required only if you need to use microphone input, Microphone)

PocketSphinx (required only if you need to use the Sphinx recognizer, recognizer_instance.recognize_sphinx)

Google API Client Library for Python (required only if you need to use the Google Cloud Speech API, recognizer_instance.recognize_google_cloud)

FLAC encoder (required only if the system is not x86-based Windows/Linux/OS X)
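A quick way to see which of these optional pieces are present is to probe for them without importing anything heavy. The sketch below uses only the standard library. The import names pyaudio, pocketsphinx and googleapiclient are the usual module names for those packages, and looking for a flac executable on the PATH is an assumption about how the encoder is installed. check_optional_dependencies() is a hypothetical helper name.

```python
import importlib.util
import shutil

# Display label -> importable module name (check_optional_dependencies is a
# hypothetical helper, not part of SpeechRecognition).
OPTIONAL_MODULES = {
    "PyAudio": "pyaudio",
    "PocketSphinx": "pocketsphinx",
    "Google API Client Library": "googleapiclient",
}


def check_optional_dependencies() -> dict:
    """Return a {label: bool} map telling which optional pieces were found."""
    status = {
        label: importlib.util.find_spec(module) is not None
        for label, module in OPTIONAL_MODULES.items()
    }
    # Assumption: a system-wide FLAC encoder shows up as `flac` on the PATH.
    status["FLAC encoder"] = shutil.which("flac") is not None
    return status


if __name__ == "__main__":
    for name, present in check_optional_dependencies().items():
        print(f"{name}: {'found' if present else 'missing'}")
```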

File types supported by SpeechRecognition

The supported file types are:

WAV: must be in PCM/LPCM format

AIFF

AIFF-C

FLAC: must be native FLAC format; OGG-FLAC is not supported
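If you do not have a suitable recording at hand, a PCM WAV file can be produced with Python's standard wave module, which writes uncompressed PCM. The sketch below writes a short sine tone. It is meant only as a way to get a file in a supported format for testing the file-loading steps, and the tone obviously contains no speech to recognize. write_test_tone() and its default values are illustrative choices, not anything the library requires.

```python
import math
import struct
import wave


def write_test_tone(path, seconds=1.0, rate=16000, freq=440.0):
    """Write a mono, 16-bit PCM WAV file containing a sine tone.

    Returns the number of audio frames written.
    """
    n_frames = int(seconds * rate)
    amplitude = 0.3 * 32767  # stay well below the 16-bit maximum
    frames = b"".join(
        struct.pack("<h", int(amplitude * math.sin(2 * math.pi * freq * i / rate)))
        for i in range(n_frames)
    )
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)      # mono
        wav.setsampwidth(2)      # 2 bytes per sample = 16-bit PCM
        wav.setframerate(rate)
        wav.writeframes(frames)
    return n_frames
```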

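Installing SpeechRecognition

The previous part introduced the basic concepts and advantages of SpeechRecognition. This part covers how to install it and try out a demo.

1. Install Python and SpeechRecognition (this walkthrough uses Python 3.7)

Install SpeechRecognition from a terminal with the command pip3 install SpeechRecognition: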

alicedembp:~ alice$ pip3 install SpeechRecognition
Requirement already satisfied: SpeechRecognition in /Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages (3.8.1)
alicedembp:~ alice$ python -m speech_recognition

2. Verify the installation

Once the installation finishes, open an interpreter session and enter the following to verify it:

>>> import speech_recognition as sr
>>> sr.__version__
'3.8.1'
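The same check can be written as a small script, which is handy when you want a clear message instead of a traceback on machines where the package is missing. This is a minimal sketch; installed_version() is a hypothetical helper name.

```python
def installed_version():
    """Return the SpeechRecognition version string, or None if not installed."""
    try:
        import speech_recognition as sr
    except ImportError:
        return None
    # __version__ is the attribute queried in the interpreter session above.
    return getattr(sr, "__version__", None)


if __name__ == "__main__":
    version = installed_version()
    print(version if version else "SpeechRecognition is not installed")
```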

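3. Install portaudio and pyaudio

Next, install the two packages needed for microphone input. The order matters: pyaudio is built against portaudio, so portaudio must be installed first.

brew install portaudio
pip3 install pyaudio

The output looks like this: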

alicedembp:~ alice$ brew install portaudio
Updating Homebrew...
==> Auto-updated Homebrew!
Updated 1 tap (homebrew/core).
==> Downloading https://homebrew.bintray.com/bottles/portaudio-19.6.0.high_sierra.bottle.tar.gz
######################################################################## 100.0%
==> Pouring portaudio-19.6.0.high_sierra.bottle.tar.gz
🍺  /usr/local/Cellar/portaudio/19.6.0: 33 files, 452KB
alicedembp:~ alice$ pip3 install pyaudio
Collecting pyaudio
  Using cached https://files.pythonhosted.org/packages/ab/42/b4f04721c5c5bfc196ce156b3c768998ef8c0ae3654ed29ea5020c749a6b/PyAudio-0.2.11.tar.gz
Building wheels for collected packages: pyaudio
  Building wheel for pyaudio (setup.py) ... done
  Stored in directory: /Users/alice/Library/Caches/pip/wheels/f4/a8/a4/292214166c2917890f85b2f72a8e5f13e1ffa527c4200dcede
Successfully built pyaudio
Installing collected packages: pyaudio
Successfully installed pyaudio-0.2.11
alicedembp:~ alice$

If portaudio is not installed first, building pyaudio fails with the error: src/_portaudiomodule.c:29:10: fatal error: 'portaudio.h' file not found

gcc -fno-strict-aliasing -Wsign-compare -fno-common -dynamic -DNDEBUG -g -fwrapv -O3 -Wall -arch i386 -arch x86_64 -g -DMACOSX=1 -I/Library/Frameworks/Python.framework/Versions/3.7/include/python3.7m -c src/_portaudiomodule.c -o build/temp.macosx-10.6-intel-3.7/src/_portaudiomodule.o
src/_portaudiomodule.c:29:10: fatal error: 'portaudio.h' file not found
#include "portaudio.h"
         ^~~~~~~~~~~~~
1 error generated.
error: command 'gcc' failed with exit status 1

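Trying out the SpeechRecognition demo

import speech_recognition as sr

r = sr.Recognizer()

# Load a local audio file (PCM WAV, AIFF or native FLAC)
test = sr.AudioFile('/Users/alice/Documents/Work/Blog/AI/语音识别/speechrecognition/audiofiles/test1.wav')

# record() reads the whole file into an AudioData object
with test as source:
    audio = r.record(source)

print(type(audio))

# show_all=True returns the full response instead of only the best transcript
print(r.recognize_google(audio, language='zh-CN', show_all=True))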
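With show_all=True, recognize_google() hands back the raw response rather than a single string. The sketch below shows one way to pull the most likely transcript out of it. It assumes the response is a dict with an "alternative" list whose entries carry a "transcript" and sometimes a "confidence", and that an empty list comes back when nothing was recognized. That shape is an assumption about the service's response format and could change. best_transcript() is a hypothetical helper, and the sample data is made up for illustration, not real recognizer output.

```python
def best_transcript(response):
    """Pick the highest-confidence transcript from a show_all=True response.

    Returns None when the response contains no alternatives.
    """
    if not isinstance(response, dict):
        return None  # e.g. an empty list when nothing was recognized
    alternatives = response.get("alternative") or []
    if not alternatives:
        return None
    # Entries without a confidence value are treated as confidence 0.0.
    best = max(alternatives, key=lambda alt: alt.get("confidence", 0.0))
    return best.get("transcript")


# Made-up sample in the assumed shape (not real output from the API).
sample = {
    "alternative": [
        {"transcript": "你好世界", "confidence": 0.92},
        {"transcript": "你好 世界"},
    ],
    "final": True,
}
print(best_transcript(sample))
```

When you only need the text, calling recognize_google() without show_all returns the top transcript directly, so a helper like this is mainly useful when you want to inspect or compare the alternatives.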