“In this 10-year time frame, I believe that we’ll not only be using the keyboard and the mouse to interact but during that time we will have perfected speech recognition and speech output well enough that those will become a standard part of the interface.” — Bill Gates, 1 October 1997

Technology has come a long way, and with each new advancement, the human race becomes more attached to it and longs for these cool new features across all devices.

With the advent of Siri, Alexa, and Google Assistant, users of technology have yearned for speech recognition in their everyday use of the internet. In this post, I’ll be covering how to integrate native speech recognition and speech synthesis in the browser using the JavaScript WebSpeech API.

According to the Mozilla web docs:

The Web Speech API enables you to incorporate voice data into web apps. The Web Speech API has two parts: SpeechSynthesis (Text-to-Speech), and SpeechRecognition (Asynchronous Speech Recognition.)

Requirements we will need to build our application

For this simple speech recognition app, we’ll be working with just three files which will all reside in the same directory:

  • index.html containing the HTML for the app.
  • style.css containing the CSS styles.
  • index.js containing the JavaScript code.

Also, we need to have a few things in place. They are as follows:

  • Basic knowledge of JavaScript.
  • A web server for running the app. The Web Server for Chrome will be sufficient for this purpose.

Setting up our speech recognition app

Let’s get started by setting up the HTML and CSS for the app. Below is the HTML markup:

<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="UTF-8">
  <meta name="viewport" content="width=device-width, initial-scale=1.0">
  <meta http-equiv="X-UA-Compatible" content="ie=edge">
  <title>Speech Recognition</title>
  <link rel="stylesheet" href="style.css">
  <link href="https://fonts.googleapis.com/css?family=Shadows+Into+Light" rel="stylesheet">
  <!-- load font awesome here for icon used on the page -->
</head>
<body>
  <!-- page container -->
  <div class="container">
    <!-- text box which will contain spoken text -->
    <div class="text-box" contenteditable="true"></div>
    <!-- microphone icon to be clicked before speaking -->
    <i class="fa fa-microphone"></i>
  </div>
  <!-- sound to be played when we click icon => http://soundbible.com/1598-Electronic-Chime.html -->
  <audio class="sound" src="chime.mp3"></audio>
  <!-- link to index.js script -->
  <script src="index.js"></script>
</body>
</html>

Here is its accompanying CSS style:

body {
  background: #1e2440;
  color: #f2efe2;
  font-size: 16px;
  font-family: 'Kaushan Script', cursive;
  font-family: 'Shadows Into Light', cursive;
}

.container {
  position: relative;
  border: 1px solid #f2efe2;
  width: 40vw;
  max-width: 60vw;
  margin: 0 auto;
  border-radius: 0.1rem;
  background: #f2efe2;
  padding: 0.2rem 1rem;
  color: #1e2440;
  overflow: scroll;
  margin-top: 10vh;
}

.text-box {
  max-height: 70vh;
  overflow: scroll;
}

.text-box:focus {
  outline: none;
}

.text-box p {
  border-bottom: 1px dotted black;
  margin: 0px !important;
}

.fa {
  color: white;
  background: #1e2440;
  border-radius: 50%;
  cursor: pointer;
  margin-top: 1rem;
  float: right;
  width: 2rem;
  height: 2rem;
  display: flex !important;
  align-items: center;
  justify-content: center;
}

@media (max-width: 768px) {
  .container {
    width: 85vw;
    max-width: 85vw;
  }
  .text-box {
    max-height: 55vh;
  }
}

Copying the code above should give you a page with a bordered text box and a microphone icon beneath it.

Powering up our speech recognition app with the WebSpeech API

As of the time of writing, the WebSpeech API is available only in Firefox and Chrome. Its speech synthesis interface lives on the browser’s window object as speechSynthesis, while its speech recognition interface lives on the window object as SpeechRecognition in Firefox and as webkitSpeechRecognition in Chrome.

We are going to set the recognition interface to SpeechRecognition regardless of the browser we’re on:

window.SpeechRecognition = window.webkitSpeechRecognition || window.SpeechRecognition;
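
Not every browser exposes either constructor, so it helps to guard against that before going any further. A minimal sketch, assuming we just want to notify the user (the alert message is our own, not part of the original app):

if (!window.SpeechRecognition) {
  // neither the prefixed nor the standard constructor exists in this browser
  alert('Sorry, your browser does not support the Web Speech API.');
}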

Next we’ll instantiate the speech recognition interface:

const recognition = new SpeechRecognition();
const icon = document.querySelector('i.fa.fa-microphone');
let paragraph = document.createElement('p');
let container = document.querySelector('.text-box');
container.appendChild(paragraph);
const sound = document.querySelector('.sound');

In the code above, besides instantiating speech recognition, we selected the icon, text-box, and sound elements on the page. We also created a paragraph element to hold the words we say and appended it to the text-box.

Whenever the microphone icon on the page is clicked, we want to play our sound and start the speech recognition service. To achieve this, we add a click event listener to the icon:

icon.addEventListener('click', () => {
  sound.play();
  dictate();
});

const dictate = () => {
  recognition.start();
}

In the event listener, after playing the sound, we create and call a dictate function. The dictate function starts the speech recognition service by calling the start method on the speech recognition instance.
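
The recognition instance also exposes a few configuration properties worth knowing about. A small sketch of the ones you are most likely to touch (set explicitly here for illustration; the app works fine with the defaults):

recognition.lang = 'en-US';         // language to recognize, as a BCP 47 tag
recognition.interimResults = false; // deliver only final results
recognition.maxAlternatives = 1;    // one transcript alternative per result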

To return a result for whatever a user says, we need to add a result event handler to our speech recognition instance. The dictate function will then look like this:

const dictate = () => {
  recognition.start();
  recognition.onresult = (event) => {
    const speechToText = event.results[0][0].transcript;
    paragraph.textContent = speechToText;
  }
}

The result event returns a SpeechRecognitionEvent, which contains a results object. This in turn contains the transcript property holding the recognized speech as text. We save the recognized text in a variable called speechToText and put it in the paragraph element on the page.
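
For orientation, event.results is a list of SpeechRecognitionResult objects, and each result holds one or more SpeechRecognitionAlternative objects. A quick sketch of walking that structure (the logging is our own, not part of the app):

recognition.onresult = (event) => {
  const result = event.results[0];     // first SpeechRecognitionResult
  const alternative = result[0];       // best SpeechRecognitionAlternative
  console.log(alternative.transcript); // the recognized text
  console.log(alternative.confidence); // confidence score between 0 and 1
};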

If we run the app at this point, click the icon and say something, it should pop up on the page.
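
One caveat worth noting (our own addition, not part of the original flow): the recognition service stops by itself once you pause, so each new phrase needs another click on the icon. If you’d rather keep listening, a sketch like this restarts the service whenever it ends:

recognition.onend = () => {
  // start a new recognition session as soon as the previous one ends
  recognition.start();
};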

Wrapping it up with text to speech

To add text-to-speech to our app, we’ll make use of the speechSynthesis interface of the WebSpeech API. We’ll start by instantiating it:

const synth = window.speechSynthesis;

Next, we’ll create a function called speak, which we’ll call whenever we want the app to say something:

const speak = (action) => {
  const utterThis = new SpeechSynthesisUtterance(action());
  synth.speak(utterThis);
};

The speak function takes a function called action as a parameter. The action function returns a string, which is passed to SpeechSynthesisUtterance. SpeechSynthesisUtterance is the WebSpeech API interface that holds the content the speech synthesis service should read aloud. The speechSynthesis speak method is then called on its instance and passed the content to read.
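
SpeechSynthesisUtterance also exposes properties for shaping how the text is read. A small sketch, with arbitrary example values of our own:

const utterance = new SpeechSynthesisUtterance('hello there');
utterance.rate = 0.9;  // speaking rate, default 1
utterance.pitch = 0.8; // voice pitch, default 1
synth.speak(utterance);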

To test this out, we need to know when the user is done speaking and has said a keyword. Luckily, each result exposes a property we can check:

const dictate = () => {
  ...
  if (event.results[0].isFinal) {
    if (speechToText.includes('what is the time')) {
      speak(getTime);
    }
    if (speechToText.includes('what is today\'s date')) {
      speak(getDate);
    }
    if (speechToText.includes('what is the weather in')) {
      getTheWeather(speechToText);
    }
  }
  ...
}

const getTime = () => {
  const time = new Date(Date.now());
  return `the time is ${time.toLocaleString('en-US', { hour: 'numeric', minute: 'numeric', hour12: true })}`;
};

const getDate = () => {
  const time = new Date(Date.now());
  return `today is ${time.toLocaleDateString()}`;
};

const getTheWeather = (speech) => {
  fetch(`http://api.openweathermap.org/data/2.5/weather?q=${speech.split(' ')[5]}&appid=58b6f7c78582bffab3936dac99c31b25&units=metric`)
    .then(function(response) {
      return response.json();
    })
    .then(function(weather) {
      if (weather.cod === '404') {
        const utterThis = new SpeechSynthesisUtterance(`I cannot find the weather for ${speech.split(' ')[5]}`);
        synth.speak(utterThis);
        return;
      }
      const utterThis = new SpeechSynthesisUtterance(`the weather condition in ${weather.name} is mostly full of ${weather.weather[0].description} at a temperature of ${weather.main.temp} degrees Celsius`);
      synth.speak(utterThis);
    });
};

In the code above, we check the isFinal property on our event result, which is true or false depending on whether the user is done speaking.

If the user is done speaking, we check whether the transcript of what was said contains a keyword phrase such as "what is the time", and so on. For the time and date queries, we call our speak function and pass it getTime or getDate, which return the string for the browser to read; for the weather query, we call getTheWeather directly with the transcript, since it needs to fetch the forecast before speaking the result itself.
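
To see why getTheWeather reads speech.split(' ')[5], here is the split for our example phrase:

const speech = 'what is the weather in Lagos';
console.log(speech.split(' '));    // ['what', 'is', 'the', 'weather', 'in', 'Lagos']
console.log(speech.split(' ')[5]); // 'Lagos' — the city passed to the weather API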

Our index.js file should now look like this:

window.SpeechRecognition = window.webkitSpeechRecognition || window.SpeechRecognition;

const synth = window.speechSynthesis;
const recognition = new SpeechRecognition();
const icon = document.querySelector('i.fa.fa-microphone');
let paragraph = document.createElement('p');
let container = document.querySelector('.text-box');
container.appendChild(paragraph);
const sound = document.querySelector('.sound');

icon.addEventListener('click', () => {
  sound.play();
  dictate();
});

const dictate = () => {
  recognition.start();
  recognition.onresult = (event) => {
    const speechToText = event.results[0][0].transcript;
    paragraph.textContent = speechToText;
    if (event.results[0].isFinal) {
      if (speechToText.includes('what is the time')) {
        speak(getTime);
      }
      if (speechToText.includes('what is today\'s date')) {
        speak(getDate);
      }
      if (speechToText.includes('what is the weather in')) {
        getTheWeather(speechToText);
      }
    }
  }
}

const speak = (action) => {
  const utterThis = new SpeechSynthesisUtterance(action());
  synth.speak(utterThis);
};

const getTime = () => {
  const time = new Date(Date.now());
  return `the time is ${time.toLocaleString('en-US', { hour: 'numeric', minute: 'numeric', hour12: true })}`;
};

const getDate = () => {
  const time = new Date(Date.now());
  return `today is ${time.toLocaleDateString()}`;
};

const getTheWeather = (speech) => {
  fetch(`http://api.openweathermap.org/data/2.5/weather?q=${speech.split(' ')[5]}&appid=58b6f7c78582bffab3936dac99c31b25&units=metric`)
    .then(function(response) {
      return response.json();
    })
    .then(function(weather) {
      if (weather.cod === '404') {
        const utterThis = new SpeechSynthesisUtterance(`I cannot find the weather for ${speech.split(' ')[5]}`);
        synth.speak(utterThis);
        return;
      }
      const utterThis = new SpeechSynthesisUtterance(`the weather condition in ${weather.name} is mostly full of ${weather.weather[0].description} at a temperature of ${weather.main.temp} degrees Celsius`);
      synth.speak(utterThis);
    });
};

Let’s click the icon and try one of the following phrases:

  • What is the time?
  • What is today’s date?
  • What is the weather in Lagos?

We should get a reply from the app.

Conclusion

In this article, we’ve been able to build a simple speech recognition app. There are a few more cool things we could do, like selecting a different voice to read to the users (see the sketch below), but I’ll leave the rest for you to explore.
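
As a starting point, here is a sketch of picking a different voice with synth.getVoices(); the voice name is just an example and may not exist on your machine, and the list can load asynchronously, so it may be empty until the browser fires its voiceschanged event:

const voices = synth.getVoices();
const preferred = voices.find((voice) => voice.name === 'Google UK English Female');
const utterance = new SpeechSynthesisUtterance('hello from a different voice');
if (preferred) {
  utterance.voice = preferred; // otherwise the default voice is used
}
synth.speak(utterance);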

If you have questions or feedback, please leave them as a comment below. I can’t wait to see what you build with this. You can hit me up on Twitter @developia_.

Translated from: https://www.freecodecamp.org/news/how-to-build-a-simple-speech-recognition-app-a65860da6108/
