Contents

Code Practice 1: torch hub

Code Practice 2: DeepSpeech

Code Practice 3: speech_recognition


Code Practice 1: torch hub

1. Environment setup

Python 3.7 and the following packages:

pip install torch torchaudio omegaconf

2. Download and load the pretrained speech-to-text model

model, decoder, utils = torch.hub.load(repo_or_dir='snakers4/silero-models',
                                       model='silero_stt',
                                       language='en',  # also available: 'de', 'es'
                                       device=device)

3. Download an audio clip to test the model.

torch.hub.download_url_to_file('https://opus-codec.org/static/examples/samples/speech_orig.wav', dst ='speech_orig.wav', progress=True)

Full code

import torch
from glob import glob

device = torch.device('cpu')  # gpu also works, but our models are fast enough for CPU

model, decoder, utils = torch.hub.load(repo_or_dir='snakers4/silero-models',
                                       model='silero_stt',
                                       language='en',  # also available: 'de', 'es'
                                       device=device)
(read_batch, split_into_batches,
 read_audio, prepare_model_input) = utils  # see function signature for details

# download a single file in any format compatible with TorchAudio
torch.hub.download_url_to_file('https://opus-codec.org/static/examples/samples/speech_orig.wav',
                               dst='speech_orig.wav', progress=True)

test_files = glob('speech_orig.wav')
batches = split_into_batches(test_files, batch_size=10)
model_input = prepare_model_input(read_batch(batches[0]), device=device)

output = model(model_input)
for example in output:
    print(decoder(example.cpu()))
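The last loop hands each model output to decoder, which turns per-frame scores into text. As a rough illustration only, assuming the decoder works like a CTC-style greedy decoder (pick the best token per frame, merge repeated tokens, drop the blank symbol), the idea looks like this. The label list and the "_" blank symbol are made up for the example and are not Silero's actual vocabulary or implementation.

```python
from typing import List, Optional, Sequence


def ctc_greedy_decode(frame_scores: Sequence[Sequence[float]],
                      labels: Sequence[str],
                      blank: str = "_") -> str:
    """Greedy CTC decoding: argmax per frame, merge repeats, drop blanks."""
    # index of the highest-scoring label in every frame
    best = [max(range(len(scores)), key=scores.__getitem__) for scores in frame_scores]
    out: List[str] = []
    prev: Optional[int] = None
    for idx in best:
        # a repeated index means the same token held over several frames
        if idx != prev and labels[idx] != blank:
            out.append(labels[idx])
        prev = idx
    return "".join(out)


# hypothetical vocabulary: blank, 'a', 'b', space
labels = ["_", "a", "b", " "]
scores = [
    [0.1, 0.8, 0.05, 0.05],   # a
    [0.1, 0.7, 0.1, 0.1],     # a again: merged with the previous frame
    [0.9, 0.05, 0.03, 0.02],  # blank: separates the two a's
    [0.1, 0.6, 0.2, 0.1],     # a: kept, because a blank came before it
    [0.2, 0.1, 0.6, 0.1],     # b
]
print(ctc_greedy_decode(scores, labels))  # aab
```

The blank symbol is what lets a CTC model emit a genuinely doubled letter: two identical tokens survive only when a blank frame separates them.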

Output

the boch canoe slit on the smooth planks blew the sheet to the dark blue background it's easy to tell a depth of a well four hours of steady work faced us

Preliminary observations

How fast the English is spoken affects the output. You can also record a short English clip yourself and try it.
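To put a number on the speaking rate of your own recording, here is a small standard-library sketch: it reads the duration of a PCM WAV file with the wave module and divides a transcript's word count by it. The helper names and the words-per-minute measure are illustrative assumptions, not part of the original test.

```python
import wave


def wav_duration_seconds(path: str) -> float:
    """Duration of a PCM WAV file: number of frames divided by the frame rate."""
    with wave.open(path, "rb") as wf:
        return wf.getnframes() / wf.getframerate()


def words_per_minute(transcript: str, wav_path: str) -> float:
    """Rough speaking rate: words in the transcript per minute of audio."""
    words = len(transcript.split())
    return words / wav_duration_seconds(wav_path) * 60.0
```

For example, words_per_minute("your transcript here", "my_recording.wav") lets you compare a fast and a slow reading of the same sentence against the recognizer's output.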

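Code Practice 2: DeepSpeech

1. Requirements

Windows, Python 3.7

2. Download the model files into a model folder and the test audio into an audio folder (extracting audio-0.9.3.tar.gz produces an audio folder containing the sample .wav files).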

https://github.com/mozilla/DeepSpeech/releases/download/v0.9.3/deepspeech-0.9.3-models.pbmm
https://github.com/mozilla/DeepSpeech/releases/download/v0.9.3/deepspeech-0.9.3-models.scorer
https://github.com/mozilla/DeepSpeech/releases/download/v0.9.3/audio-0.9.3.tar.gz
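The three downloads can also be scripted. A minimal sketch using only urllib.request and tarfile, assuming the model/ folder layout used in this post and that the audio archive unpacks into an audio/ folder in the current directory; plan_downloads and download_all are hypothetical helper names.

```python
import tarfile
import urllib.request
from pathlib import Path
from typing import List, Tuple

BASE = "https://github.com/mozilla/DeepSpeech/releases/download/v0.9.3/"
AUDIO_ARCHIVE = Path("audio-0.9.3.tar.gz")


def plan_downloads() -> List[Tuple[str, Path]]:
    """(url, destination) pairs matching the folder layout used in this post."""
    return [
        (BASE + "deepspeech-0.9.3-models.pbmm", Path("model/deepspeech-0.9.3-models.pbmm")),
        (BASE + "deepspeech-0.9.3-models.scorer", Path("model/deepspeech-0.9.3-models.scorer")),
        (BASE + AUDIO_ARCHIVE.name, AUDIO_ARCHIVE),
    ]


def download_all() -> None:
    """Fetch the model, scorer and audio archive, then unpack the audio samples."""
    for url, dest in plan_downloads():
        dest.parent.mkdir(parents=True, exist_ok=True)
        if not dest.exists():  # skip files downloaded on an earlier run
            urllib.request.urlretrieve(url, str(dest))
    # assumed: the archive contains an audio/ folder with the sample .wav files
    with tarfile.open(str(AUDIO_ARCHIVE)) as tar:
        tar.extractall(".")
```

Call download_all() once from the project folder; the scorer file is large, so the first run takes a while.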

3. Create a new virtual environment (either Conda or virtualenv works).

4. Install deepspeech (the official DeepSpeech documentation installs it with pip install deepspeech)

conda install deepspeech

5. Run the following command

deepspeech --model model/deepspeech-0.9.3-models.pbmm --scorer model/deepspeech-0.9.3-models.scorer --audio audio/2830-3980-0043.wav
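The same inference can be run from Python instead of the CLI. A minimal sketch, assuming the deepspeech 0.9.x Python package (which provides Model, enableExternalScorer, sampleRate and stt) and numpy are installed; the released models expect 16-bit mono audio at the model's sample rate, and read_pcm16_mono and transcribe are hypothetical helper names.

```python
import wave


def read_pcm16_mono(path: str, expected_rate: int = 16000) -> bytes:
    """Return the raw 16-bit mono samples of a WAV file, rejecting other formats."""
    with wave.open(path, "rb") as wf:
        if wf.getnchannels() != 1 or wf.getsampwidth() != 2:
            raise ValueError("expected 16-bit mono audio")
        if wf.getframerate() != expected_rate:
            raise ValueError("expected %d Hz, got %d Hz" % (expected_rate, wf.getframerate()))
        return wf.readframes(wf.getnframes())


def transcribe(model_path: str, scorer_path: str, wav_path: str) -> str:
    # imported lazily so the WAV helper above works without these packages
    import numpy as np
    import deepspeech

    ds = deepspeech.Model(model_path)
    ds.enableExternalScorer(scorer_path)  # optional language model, as --scorer in the CLI
    audio = np.frombuffer(read_pcm16_mono(wav_path, ds.sampleRate()), dtype=np.int16)
    return ds.stt(audio)
```

Usage: print(transcribe("model/deepspeech-0.9.3-models.pbmm", "model/deepspeech-0.9.3-models.scorer", "audio/2830-3980-0043.wav")). Rejecting other sample rates up front, rather than resampling silently, makes a format mismatch obvious when you test your own recordings.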

Output

(speed2text_tf) PS ...speech2text_tf> deepspeech --model model/deepspeech-0.9.3-models.pbmm --scorer model/deepspeech-0.9.3-models.scorer --audio audio/2830-3980-0043.wav
Loading model from file model/deepspeech-0.9.3-models.pbmm
TensorFlow: v2.3.0-6-g23ad988fcd
DeepSpeech: v0.9.3-0-gf2e9c858
2022-12-24 13:24:09.108387: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN)to use the following CPU instructions in performance-critical operations:  AVX2
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
Loaded model in 0.0119s.
Loading scorer from files model/deepspeech-0.9.3-models.scorer
Loaded scorer in 0.0107s.
Running inference.
experience proves this
Inference took 0.670s for 1.975s audio file.
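The last log line is a quick speed check. Dividing the inference time by the audio duration gives the real-time factor (RTF); a value below 1 means transcription runs faster than the audio plays:

```latex
\mathrm{RTF} = \frac{t_{\text{inference}}}{t_{\text{audio}}} = \frac{0.670\,\text{s}}{1.975\,\text{s}} \approx 0.34
```

So in this run, one second of audio took roughly a third of a second to transcribe; the model and scorer loading times are reported separately in the log and are not included.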

6. Real-time speech-to-text (from microphone to text)

DeepSpeech-examples/mic_vad_streaming at r0.9 · mozilla/DeepSpeech-examples · GitHub

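Download mic_vad_streaming.py and requirements.txt from the GitHub repository above into a mic_vad_streaming folder (the path used in the command below).

Install the required packages with:

pip install -r requirements.txt

Run the following command: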

python mic_vad_streaming/mic_vad_streaming.py -m model/deepspeech-0.9.3-models.pbmm -s model/deepspeech-0.9.3-models.scorer

The result:

(speed2text_tf) PS ...speech2text_tf> python mic_vad_streaming/mic_vad_streaming.py -m model/deepspeech-0.9.3-models.pbmm -s model/deepspeech-0.9.3-models.scorer
Initializing model...
INFO:root:ARGS.model: model/deepspeech-0.9.3-models.pbmm
TensorFlow: v2.3.0-6-g23ad988fcd
DeepSpeech: v0.9.3-0-gf2e9c858
2022-12-24 13:51:58.003296: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN)to use the following CPU instructions in performance-critical operations:  AVX2
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
INFO:root:ARGS.scorer: model/deepspeech-0.9.3-models.scorer
Listening (ctrl-C to exit)...
Recognized: no
Recognized: he
Recognized: hear me
Recognized: hear me
Recognized: to
Recognized: for i think seven
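The script passes audio to DeepSpeech only once a voice activity detector (VAD) decides someone is speaking, which is why the output arrives phrase by phrase. Below is a simplified sketch of ring-buffer triggering in that style, assuming per-frame speech/non-speech flags are already available (in the real script they come from the webrtcvad package); segment_speech is a hypothetical name, and the window size and 0.75 ratio are illustrative parameters, not values read from the script.

```python
from collections import deque
from typing import Deque, Iterable, Iterator, List, Tuple


def segment_speech(frames: Iterable[Tuple[str, bool]],
                   window: int = 10,
                   ratio: float = 0.75) -> Iterator[List[str]]:
    """Group frames into utterances using a sliding window of VAD decisions.

    frames: (frame, is_speech) pairs, e.g. short audio chunks plus a VAD flag.
    An utterance starts when more than `ratio` of the last `window` frames are
    speech, and ends when more than `ratio` of them are non-speech.
    """
    ring: Deque[Tuple[str, bool]] = deque(maxlen=window)
    triggered = False
    utterance: List[str] = []
    for frame, is_speech in frames:
        ring.append((frame, is_speech))
        if not triggered:
            voiced = sum(1 for _, s in ring if s)
            if voiced > ratio * window:
                triggered = True
                # keep the buffered frames so the start of the phrase is not lost
                utterance = [f for f, _ in ring]
                ring.clear()
        else:
            utterance.append(frame)
            unvoiced = sum(1 for _, s in ring if not s)
            if unvoiced > ratio * window:
                triggered = False
                yield utterance
                utterance = []
                ring.clear()
    if triggered and utterance:
        yield utterance  # flush a phrase that is still open when the stream ends
```

The ratio acts as hysteresis: a single noisy frame neither starts nor ends a phrase, and short fragments such as "no" or "to" in the log above are what a brief burst of detected speech can look like.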

Honestly, the results are mediocre. My accent may be the cause; that is the only explanation I can think of.

References

Welcome to DeepSpeech’s documentation! — Mozilla DeepSpeech 0.9.3 documentation

GitHub - mozilla/DeepSpeech: DeepSpeech is an open source embedded (offline, on-device) speech-to-text engine which can run in real time on devices ranging from a Raspberry Pi 4 to high power GPU servers.

Code Practice 3: speech_recognition

1. Install the package

pip install SpeechRecognition

2. Example code

Import the speech_recognition package and create a Recognizer instance:

import speech_recognition as sr
r = sr.Recognizer()

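Each Recognizer instance has seven methods for recognizing speech from an audio source using various APIs (newer releases of the library add more, such as recognize_azure() and recognize_whisper(), which the complete example below also uses):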

  • recognize_bing(): Microsoft Bing Speech
  • recognize_google(): Google Web Speech API
  • recognize_google_cloud(): Google Cloud Speech - requires installation of the google-cloud-speech package
  • recognize_houndify(): Houndify by SoundHound
  • recognize_ibm(): IBM Speech to Text
  • recognize_sphinx(): CMU Sphinx - requires installing PocketSphinx
  • recognize_wit(): Wit.ai

Of these seven, only recognize_sphinx() works offline, using the CMU Sphinx engine; the other six all require an internet connection.
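Because only Sphinx works without a network connection, one way to combine the engines is to try an online engine first and fall back to Sphinx. A minimal sketch: recognize_with_fallback is a hypothetical helper, and it relies only on the recognize_* methods taking an AudioData object and raising sr.UnknownValueError or sr.RequestError on failure, which the complete example below also handles.

```python
from typing import Any, Callable, List, Sequence, Tuple, Type


def recognize_with_fallback(audio: Any,
                            engines: Sequence[Tuple[str, Callable[[Any], str]]],
                            errors: Tuple[Type[BaseException], ...] = (Exception,)) -> Tuple[str, str]:
    """Try each (name, recognize) pair in order; return (name, text) of the first success."""
    failures: List[str] = []
    for name, recognize in engines:
        try:
            return name, recognize(audio)
        except errors as e:
            failures.append("%s: %s" % (name, e))  # remember why this engine failed
    raise RuntimeError("all engines failed; " + "; ".join(failures))


# With speech_recognition (assumed usage, not run here):
# engines = [("Google", r.recognize_google), ("Sphinx", r.recognize_sphinx)]
# name, text = recognize_with_fallback(audio, engines,
#                                      errors=(sr.UnknownValueError, sr.RequestError))
```

Passing the exception types explicitly keeps unrelated bugs (for example a TypeError from a wrong argument) from being silently swallowed as "engine failed".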

Listen to your microphone:

with sr.Microphone() as source:  # requires PyAudio
    print("Say something!")
    audio = r.listen(source)
print(audio)  # an AudioData object holding the captured speech

For the other APIs used in the code, see:

The Ultimate Guide To Speech Recognition With Python – Real Python

The complete example:

#!/usr/bin/env python3
# NOTE: this example requires PyAudio because it uses the Microphone class
import speech_recognition as sr

# obtain audio from the microphone
r = sr.Recognizer()
with sr.Microphone() as source:
    print("Say something!")
    audio = r.listen(source)

# recognize speech using Sphinx
try:
    print("Sphinx thinks you said " + r.recognize_sphinx(audio))
except sr.UnknownValueError:
    print("Sphinx could not understand audio")
except sr.RequestError as e:
    print("Sphinx error; {0}".format(e))

# recognize speech using Google Speech Recognition
try:
    # for testing purposes, we're just using the default API key
    # to use another API key, use `r.recognize_google(audio, key="GOOGLE_SPEECH_RECOGNITION_API_KEY")`
    # instead of `r.recognize_google(audio)`
    print("Google Speech Recognition thinks you said " + r.recognize_google(audio))
except sr.UnknownValueError:
    print("Google Speech Recognition could not understand audio")
except sr.RequestError as e:
    print("Could not request results from Google Speech Recognition service; {0}".format(e))

# recognize speech using Google Cloud Speech
GOOGLE_CLOUD_SPEECH_CREDENTIALS = r"""INSERT THE CONTENTS OF THE GOOGLE CLOUD SPEECH JSON CREDENTIALS FILE HERE"""
try:
    print("Google Cloud Speech thinks you said " + r.recognize_google_cloud(audio, credentials_json=GOOGLE_CLOUD_SPEECH_CREDENTIALS))
except sr.UnknownValueError:
    print("Google Cloud Speech could not understand audio")
except sr.RequestError as e:
    print("Could not request results from Google Cloud Speech service; {0}".format(e))

# recognize speech using Wit.ai
WIT_AI_KEY = "INSERT WIT.AI API KEY HERE"  # Wit.ai keys are 32-character uppercase alphanumeric strings
try:
    print("Wit.ai thinks you said " + r.recognize_wit(audio, key=WIT_AI_KEY))
except sr.UnknownValueError:
    print("Wit.ai could not understand audio")
except sr.RequestError as e:
    print("Could not request results from Wit.ai service; {0}".format(e))

# recognize speech using Microsoft Bing Voice Recognition
BING_KEY = "INSERT BING API KEY HERE"  # Microsoft Bing Voice Recognition API keys are 32-character lowercase hexadecimal strings
try:
    print("Microsoft Bing Voice Recognition thinks you said " + r.recognize_bing(audio, key=BING_KEY))
except sr.UnknownValueError:
    print("Microsoft Bing Voice Recognition could not understand audio")
except sr.RequestError as e:
    print("Could not request results from Microsoft Bing Voice Recognition service; {0}".format(e))

# recognize speech using Microsoft Azure Speech
AZURE_SPEECH_KEY = "INSERT AZURE SPEECH API KEY HERE"  # Microsoft Speech API keys are 32-character lowercase hexadecimal strings
try:
    print("Microsoft Azure Speech thinks you said " + r.recognize_azure(audio, key=AZURE_SPEECH_KEY))
except sr.UnknownValueError:
    print("Microsoft Azure Speech could not understand audio")
except sr.RequestError as e:
    print("Could not request results from Microsoft Azure Speech service; {0}".format(e))

# recognize speech using Houndify
HOUNDIFY_CLIENT_ID = "INSERT HOUNDIFY CLIENT ID HERE"  # Houndify client IDs are Base64-encoded strings
HOUNDIFY_CLIENT_KEY = "INSERT HOUNDIFY CLIENT KEY HERE"  # Houndify client keys are Base64-encoded strings
try:
    print("Houndify thinks you said " + r.recognize_houndify(audio, client_id=HOUNDIFY_CLIENT_ID, client_key=HOUNDIFY_CLIENT_KEY))
except sr.UnknownValueError:
    print("Houndify could not understand audio")
except sr.RequestError as e:
    print("Could not request results from Houndify service; {0}".format(e))

# recognize speech using IBM Speech to Text
IBM_USERNAME = "INSERT IBM SPEECH TO TEXT USERNAME HERE"  # IBM Speech to Text usernames are strings of the form XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX
IBM_PASSWORD = "INSERT IBM SPEECH TO TEXT PASSWORD HERE"  # IBM Speech to Text passwords are mixed-case alphanumeric strings
try:
    print("IBM Speech to Text thinks you said " + r.recognize_ibm(audio, username=IBM_USERNAME, password=IBM_PASSWORD))
except sr.UnknownValueError:
    print("IBM Speech to Text could not understand audio")
except sr.RequestError as e:
    print("Could not request results from IBM Speech to Text service; {0}".format(e))

# recognize speech using Whisper
try:
    print("Whisper thinks you said " + r.recognize_whisper(audio, language="english"))
except sr.UnknownValueError:
    print("Whisper could not understand audio")
except sr.RequestError as e:
    print("Could not request results from Whisper; {0}".format(e))

3. Fixing PyAudio installation problems

If you are on an M1 Mac and run into problems installing PyAudio, see the following blog post:

https://blog.csdn.net/keeppractice/article/details/128484193

References

GitHub - Uberi/speech_recognition: Speech recognition module for Python, supporting several engines and APIs, online and offline.
