“In this 10-year time frame, I believe that we’ll not only be using the keyboard and the mouse to interact but during that time we will have perfected speech recognition and speech output well enough that those will become a standard part of the interface.” — Bill Gates, 1 October 1997

Technology has come a long way, and with each new advancement, the human race becomes more attached to it and longs for these cool new features across all devices.

With the advent of Siri, Alexa, and Google Assistant, users of technology have yearned for speech recognition in their everyday use of the internet. In this post, I’ll be covering how to integrate native speech recognition and speech synthesis in the browser using the JavaScript WebSpeech API.

According to the Mozilla web docs:

The Web Speech API enables you to incorporate voice data into web apps. The Web Speech API has two parts: SpeechSynthesis (Text-to-Speech), and SpeechRecognition (Asynchronous Speech Recognition.)

Requirements we will need to build our application

For this simple speech recognition app, we’ll be working with just three files which will all reside in the same directory:

  • index.html containing the HTML for the app.
  • style.css containing the CSS styles.
  • index.js containing the JavaScript code.

Also, we need to have a few things in place. They are as follows:

  • Basic knowledge of JavaScript.
  • A web server for running the app. The Web Server for Chrome will be sufficient for this purpose.

Setting up our speech recognition app

Let’s get started by setting up the HTML and CSS for the app. Below is the HTML markup:

<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="UTF-8">
  <meta name="viewport" content="width=device-width, initial-scale=1.0">
  <meta http-equiv="X-UA-Compatible" content="ie=edge">
  <title>Speech Recognition</title>
  <link rel="stylesheet" href="style.css">
  <link href="https://fonts.googleapis.com/css?family=Shadows+Into+Light" rel="stylesheet">
  <!-- load font awesome here for icon used on the page -->
</head>
<body>
  <!-- page container -->
  <div class="container">
    <!-- text box which will contain spoken text -->
    <div class="text-box" contenteditable="true"></div>
    <!-- microphone icon to be clicked before speaking -->
    <i class="fa fa-microphone"></i>
  </div>
  <!-- sound to be played when we click icon => http://soundbible.com/1598-Electronic-Chime.html -->
  <audio class="sound" src="chime.mp3"></audio>
  <!-- link to index.js script -->
  <script src="index.js"></script>
</body>
</html>

Here is its accompanying CSS style:

body {
  background: #1e2440;
  color: #f2efe2;
  font-size: 16px;
  font-family: 'Kaushan Script', cursive;
  font-family: 'Shadows Into Light', cursive;
}

.container {
  position: relative;
  border: 1px solid #f2efe2;
  width: 40vw;
  max-width: 60vw;
  margin: 0 auto;
  border-radius: 0.1rem;
  background: #f2efe2;
  padding: 0.2rem 1rem;
  color: #1e2440;
  overflow: scroll;
  margin-top: 10vh;
}

.text-box {
  max-height: 70vh;
  overflow: scroll;
}

.text-box:focus {
  outline: none;
}

.text-box p {
  border-bottom: 1px dotted black;
  margin: 0px !important;
}

.fa {
  color: white;
  background: #1e2440;
  border-radius: 50%;
  cursor: pointer;
  margin-top: 1rem;
  float: right;
  width: 2rem;
  height: 2rem;
  display: flex !important;
  align-items: center;
  justify-content: center;
}

@media (max-width: 768px) {
  .container {
    width: 85vw;
    max-width: 85vw;
  }
  .text-box {
    max-height: 55vh;
  }
}

Copying the code above should give you a page with a bordered text box and a microphone icon beneath it.

Powering up our speech recognition app with the WebSpeech API

As of the time of writing, the WebSpeech API is available only in Firefox and Chrome. Its speech synthesis interface lives on the browser’s window object as speechSynthesis, while its speech recognition interface lives on the window object as SpeechRecognition in Firefox and as webkitSpeechRecognition in Chrome.

We are going to set the recognition interface to SpeechRecognition regardless of the browser we’re on:

window.SpeechRecognition = window.webkitSpeechRecognition || window.SpeechRecognition;
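
Not every browser exposes either constructor, so it helps to guard against that before going any further. A minimal sketch, assuming we just want to notify the user (the alert message is our own, not part of the original app):

if (!window.SpeechRecognition) {
  // neither the prefixed nor the standard constructor exists in this browser
  alert('Sorry, your browser does not support the Web Speech API.');
}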

Next we’ll instantiate the speech recognition interface:

const recognition = new SpeechRecognition();
const icon = document.querySelector('i.fa.fa-microphone');
let paragraph = document.createElement('p');
let container = document.querySelector('.text-box');
container.appendChild(paragraph);
const sound = document.querySelector('.sound');

In the code above, besides instantiating speech recognition, we selected the icon, text-box, and sound elements on the page. We also created a paragraph element to hold the words we say and appended it to the text-box.

Whenever the microphone icon on the page is clicked, we want to play our sound and start the speech recognition service. To achieve this, we add a click event listener to the icon:

icon.addEventListener('click', () => {
  sound.play();
  dictate();
});

const dictate = () => {
  recognition.start();
}

In the event listener, after playing the sound, we create and call a dictate function. The dictate function starts the speech recognition service by calling the start method on the speech recognition instance.
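
The recognition instance also exposes a few configuration properties worth knowing about. A small sketch of the ones you are most likely to touch (set explicitly here for illustration; the app works fine with the defaults):

recognition.lang = 'en-US';         // language to recognize, as a BCP 47 tag
recognition.interimResults = false; // deliver only final results
recognition.maxAlternatives = 1;    // one transcript alternative per result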

To return a result for whatever a user says, we need to add a result event handler to our speech recognition instance. The dictate function will then look like this:

const dictate = () => {
  recognition.start();
  recognition.onresult = (event) => {
    const speechToText = event.results[0][0].transcript;
    paragraph.textContent = speechToText;
  }
}

The result event returns a SpeechRecognitionEvent, which contains a results object. This in turn contains the transcript property holding the recognized speech as text. We save the recognized text in a variable called speechToText and put it in the paragraph element on the page.
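
For orientation, event.results is a list of SpeechRecognitionResult objects, and each result holds one or more SpeechRecognitionAlternative objects. A quick sketch of walking that structure (the logging is our own, not part of the app):

recognition.onresult = (event) => {
  const result = event.results[0];     // first SpeechRecognitionResult
  const alternative = result[0];       // best SpeechRecognitionAlternative
  console.log(alternative.transcript); // the recognized text
  console.log(alternative.confidence); // confidence score between 0 and 1
};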

If we run the app at this point, click the icon and say something, it should pop up on the page.
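
One caveat worth noting (our own addition, not part of the original flow): the recognition service stops by itself once you pause, so each new phrase needs another click on the icon. If you’d rather keep listening, a sketch like this restarts the service whenever it ends:

recognition.onend = () => {
  // start a new recognition session as soon as the previous one ends
  recognition.start();
};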

Wrapping it up with text to speech

To add text-to-speech to our app, we’ll make use of the speechSynthesis interface of the WebSpeech API. We’ll start by instantiating it:

const synth = window.speechSynthesis;

Next, we’ll create a function called speak, which we’ll call whenever we want the app to say something:

const speak = (action) => {
  const utterThis = new SpeechSynthesisUtterance(action());
  synth.speak(utterThis);
};

The speak function takes a function called action as a parameter. The action function returns a string, which is passed to SpeechSynthesisUtterance. SpeechSynthesisUtterance is the WebSpeech API interface that holds the content the speech synthesis service should read aloud. The speechSynthesis speak method is then called on its instance and passed the content to read.
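
SpeechSynthesisUtterance also exposes properties for shaping how the text is read. A small sketch, with arbitrary example values of our own:

const utterance = new SpeechSynthesisUtterance('hello there');
utterance.rate = 0.9;  // speaking rate, default 1
utterance.pitch = 0.8; // voice pitch, default 1
synth.speak(utterance);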

To test this out, we need to know when the user is done speaking and has said a keyword. Luckily, each result exposes a property we can check:

const dictate = () => {
  ...
  if (event.results[0].isFinal) {
    if (speechToText.includes('what is the time')) {
      speak(getTime);
    }
    if (speechToText.includes('what is today\'s date')) {
      speak(getDate);
    }
    if (speechToText.includes('what is the weather in')) {
      getTheWeather(speechToText);
    }
  }
  ...
}

const getTime = () => {
  const time = new Date(Date.now());
  return `the time is ${time.toLocaleString('en-US', { hour: 'numeric', minute: 'numeric', hour12: true })}`;
};

const getDate = () => {
  const time = new Date(Date.now());
  return `today is ${time.toLocaleDateString()}`;
};

const getTheWeather = (speech) => {
  fetch(`http://api.openweathermap.org/data/2.5/weather?q=${speech.split(' ')[5]}&appid=58b6f7c78582bffab3936dac99c31b25&units=metric`)
    .then(function(response) {
      return response.json();
    })
    .then(function(weather) {
      if (weather.cod === '404') {
        const utterThis = new SpeechSynthesisUtterance(`I cannot find the weather for ${speech.split(' ')[5]}`);
        synth.speak(utterThis);
        return;
      }
      const utterThis = new SpeechSynthesisUtterance(`the weather condition in ${weather.name} is mostly full of ${weather.weather[0].description} at a temperature of ${weather.main.temp} degrees Celsius`);
      synth.speak(utterThis);
    });
};

In the code above, we check the isFinal property on our event result, which is true or false depending on whether the user is done speaking.

If the user is done speaking, we check whether the transcript of what was said contains a keyword phrase such as "what is the time", and so on. For the time and date queries, we call our speak function and pass it getTime or getDate, which return the string for the browser to read; for the weather query, we call getTheWeather directly with the transcript, since it needs to fetch the forecast before speaking the result itself.
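
To see why getTheWeather reads speech.split(' ')[5], here is the split for our example phrase:

const speech = 'what is the weather in Lagos';
console.log(speech.split(' '));    // ['what', 'is', 'the', 'weather', 'in', 'Lagos']
console.log(speech.split(' ')[5]); // 'Lagos' — the city passed to the weather API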

Our index.js file should now look like this:

window.SpeechRecognition = window.webkitSpeechRecognition || window.SpeechRecognition;

const synth = window.speechSynthesis;
const recognition = new SpeechRecognition();
const icon = document.querySelector('i.fa.fa-microphone');
let paragraph = document.createElement('p');
let container = document.querySelector('.text-box');
container.appendChild(paragraph);
const sound = document.querySelector('.sound');

icon.addEventListener('click', () => {
  sound.play();
  dictate();
});

const dictate = () => {
  recognition.start();
  recognition.onresult = (event) => {
    const speechToText = event.results[0][0].transcript;
    paragraph.textContent = speechToText;
    if (event.results[0].isFinal) {
      if (speechToText.includes('what is the time')) {
        speak(getTime);
      }
      if (speechToText.includes('what is today\'s date')) {
        speak(getDate);
      }
      if (speechToText.includes('what is the weather in')) {
        getTheWeather(speechToText);
      }
    }
  }
}

const speak = (action) => {
  const utterThis = new SpeechSynthesisUtterance(action());
  synth.speak(utterThis);
};

const getTime = () => {
  const time = new Date(Date.now());
  return `the time is ${time.toLocaleString('en-US', { hour: 'numeric', minute: 'numeric', hour12: true })}`;
};

const getDate = () => {
  const time = new Date(Date.now());
  return `today is ${time.toLocaleDateString()}`;
};

const getTheWeather = (speech) => {
  fetch(`http://api.openweathermap.org/data/2.5/weather?q=${speech.split(' ')[5]}&appid=58b6f7c78582bffab3936dac99c31b25&units=metric`)
    .then(function(response) {
      return response.json();
    })
    .then(function(weather) {
      if (weather.cod === '404') {
        const utterThis = new SpeechSynthesisUtterance(`I cannot find the weather for ${speech.split(' ')[5]}`);
        synth.speak(utterThis);
        return;
      }
      const utterThis = new SpeechSynthesisUtterance(`the weather condition in ${weather.name} is mostly full of ${weather.weather[0].description} at a temperature of ${weather.main.temp} degrees Celsius`);
      synth.speak(utterThis);
    });
};

Let’s click the icon and try one of the following phrases:

  • What is the time?
  • What is today’s date?
  • What is the weather in Lagos?

We should get a reply from the app.

Conclusion

In this article, we’ve been able to build a simple speech recognition app. There are a few more cool things we could do, like selecting a different voice to read to the users (see the sketch below), but I’ll leave the rest for you to explore.
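
As a starting point, here is a sketch of picking a different voice with synth.getVoices(); the voice name is just an example and may not exist on your machine, and the list can load asynchronously, so it may be empty until the browser fires its voiceschanged event:

const voices = synth.getVoices();
const preferred = voices.find((voice) => voice.name === 'Google UK English Female');
const utterance = new SpeechSynthesisUtterance('hello from a different voice');
if (preferred) {
  utterance.voice = preferred; // otherwise the default voice is used
}
synth.speak(utterance);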

If you have questions or feedback, please leave them as a comment below. I can’t wait to see what you build with this. You can hit me up on Twitter @developia_.

Translated from: https://www.freecodecamp.org/news/how-to-build-a-simple-speech-recognition-app-a65860da6108/
