How to build a speech to emotion converter with the Web Speech API and Node.js

Have you ever wondered - can we make Node.js check to see if what we say is positive or negative?

I got a newsletter which discussed tone detection. The program can check what we write and then tells us if it might be seen as aggressive, confident, or a variety of other feelings.

That got me wondering: how could I build a simplified version, using the browser and Node.js, that would be initiated by speaking?

As a result, I ended up with a small project that detects if what was spoken has positive, neutral, or negative valence.

Here's how I did it.

The plan

When you're starting a project, you should sketch out - at least vaguely - your goal and how to reach it. Before starting my search I noted down that I needed:

  • Voice recording
  • A way to translate the recording to text
  • A way to give the text a score
  • A way to show the result to the user that just spoke

After researching for a while, I discovered that the voice recording and translation to text parts were already done by the Web Speech API that's available in Google Chrome. It has exactly what we need in the SpeechRecognition interface.

As for text scoring, I found AFINN which is a list of words that are already scored. It has a limited scope with "only" 2477 words but it's more than enough for our project.

Since we're already using the browser we can show a different emoji with HTML, JavaScript and CSS depending on the result. So that handles our last step.

Now that we know what we're going to use, we can sum it up:

  • The browser listens to the user and returns some text using the Web Speech API
  • It makes a request to our Node.js server with the text
  • The server evaluates the text using AFINN's list and returns the score
  • The browser shows a different emoji depending on the score

Note: If you're familiar with project setup you can mostly skip the "project files and setup" section below.

Project files and setup

Our project folder and files structure will be as follows:

src/
|- public            // folder with the content that we will feed to the browser
   |- style          // folder for our css and emojis
      |- css         // optional folder, we have only one obvious file
         |- emojis.css
      |- images      // folder for the emojis
   |- index.html
   |- recognition.js
package.json
server.js            // our Node.js server

On the front end side of things, our index.html file will include the JS and CSS:

<html>
  <head>
    <title>Speech to emotion</title>
    <link rel="stylesheet" href="style/css/emojis.css">
  </head>
  <body>
    nothing for now
    <script src="recognition.js"></script>
  </body>
</html>

The recognition.js file will be wrapped in the DOMContentLoaded event so we make sure that the page has loaded before executing our JS:

document.addEventListener('DOMContentLoaded', speechToEmotion, false);

function speechToEmotion() {
  // Web Speech API section code will be added here
}

We leave our emojis.css empty for now.

In our project folder, we will run npm init which will create package.json.

For now, we will need to install two packages to make our life easier. So just npm install both:

  • express - to get an HTTP server running quickly

  • nodemon - so we don't have to constantly type node server.js whenever we make a change in our server.js file

package.json will end up looking something like this:

{
  "name": "speech-to-emotion",
  "version": "1.0.0",
  "description": "We speak and it feels us :o",
  "main": "index.js",
  "scripts": {
    "server": "node server.js",
    "server-debug": "nodemon --inspect server.js"
  },
  "author": "daspinola",
  "license": "MIT",
  "dependencies": {
    "express": "^4.17.1"
  },
  "devDependencies": {
    "nodemon": "^2.0.2"
  }
}

server.js starts like this:

const express = require('express')
const path = require('path')

const port = 3000
const app = express()

app.use(express.static(path.join(__dirname, 'public')))

app.get('/', function(req, res) {
  res.sendFile(path.join(__dirname, 'index.html'))
})

app.get('/emotion', function(req, res) {
  // Valence of emotion section code will be here, for now it returns nothing
  res.send({})
})

app.listen(port, function () {
  console.log(`Listening on port ${port}!`)
})

And with this, we can run npm run server-debug in the command line and open the browser on localhost:3000. Then we'll see our "nothing for now" message that's in the HTML file.

Web Speech API

This API comes out of the box in Chrome and contains SpeechRecognition. This is what will allow us to turn on the microphone, speak, and get the result back as text.

It works with events that can detect, for example, when audio is first and last captured.

For now, we will need the onresult and onend events so we can check what the microphone captured and when it stops working, respectively.

To make our first sound to text capture we just need a dozen lines or so of code in our recognition.js file.

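The snippet itself didn't survive in this copy of the article. A first version, consistent with the final recognition.js listed at the end of this post, looks like this (it runs in Chrome only, inside the speechToEmotion function):

```javascript
// Chrome exposes the API under the webkit prefix
const recognition = new webkitSpeechRecognition()
recognition.lang = 'en-US'

// Fires every time the API has a transcription result for us
recognition.onresult = function(event) {
  const results = event.results;
  const transcript = results[results.length - 1][0].transcript

  console.log(transcript)
}

// Fires when the microphone stops listening
recognition.onend = function() {
  console.log('Microphone disconnected')
}

recognition.start()
```
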
We can find a list of available languages in the Google docs here.

If we want it to stay connected for more than a few seconds (or for when we speak more than once) there is a property called continuous. It is set the same way as the lang property, by simply assigning it true. This will make the microphone listen for audio indefinitely.

If we refresh our page, at first it should ask whether we want to allow the usage of the microphone. After replying yes we can speak and check on the Chrome DevTools console the result of our speech.

Profanity is shown censored and there doesn't seem to be a way to remove the censorship. What this means is that we can't rely on profanity for scoring even though AFINN is uncensored.

Note: At the moment of writing, this API can be found only in Chrome and Android with expected support for Edge in the near future. There are probably polyfills or other tools that give better browser compatibility but I didn't test them out. You can check the compatibility in Can I use.

Making the request

For the request, a simple fetch is enough. We send the transcript as a query parameter which we will call text.

Our onresult function should now look like this:

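The updated function is missing from this copy. Pulling the relevant part out of the final recognition.js listed at the end (the emoji handling comes later, so for now we just log the server's response):

```javascript
recognition.onresult = function(event) {
  const results = event.results;
  const transcript = results[results.length - 1][0].transcript

  console.log(transcript)

  // Send the transcript to our Node.js server as the "text" query parameter
  fetch(`/emotion?text=${transcript}`)
    .then((response) => response.json())
    .then((result) => console.log(result))
    .catch((e) => console.error('Request error -> ', e))
}
```
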
Valence of emotion

Valence can be seen as a way to measure if our emotions are positive or negative and if they create low or high arousal.

For this project, we will use two emotions: happy on the positive side for any score above zero, and upset on the negative side for scores below zero. A score of exactly zero will be seen as indifferent and treated as "what?!".

The AFINN list is scored from -5 to 5 and the file contains words organised like this:

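The sample lines aren't reproduced in this copy. The file is tab-separated, one word and its score per line; using the two words from the example below it looks like:

```
hope	2
horrendous	-3
```
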
As an example, let's say we spoke to the microphone and said "I hope this is not horrendous". That would be 2 points for "hope" and -3 points for "horrendous" which would make our sentence negative with -1 points. All the other words that are not on the list we would ignore for scoring.

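The arithmetic above can be sketched with a tiny hand-rolled subset of the list (only the two words from our sentence; the real project uses the sentiment package instead of this helper):

```javascript
// Tiny illustrative subset of AFINN-style scores, not the full list
const afinn = { hope: 2, horrendous: -3 };

function scoreText(text) {
  // Lowercase, split on non-word characters, and sum the scores of known words
  return text
    .toLowerCase()
    .split(/\W+/)
    .filter(Boolean)
    .reduce((sum, word) => sum + (afinn[word] || 0), 0);
}

console.log(scoreText('I hope this is not horrendous')); // -1
```
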
We could parse the file and convert it into a JSON file that looks similar to this:

{
  <word>: <score>,
  <word1>: <score1>,
  ...
}

And then we could check each word in the text and sum up the scores. But this is something that Andrew Sliwinski has already done with sentiment. So we're going to use that instead of coding everything from scratch.

To install we use npm install sentiment and open server.js so we can import the library with:

const Sentiment = require('sentiment');

Followed by changing the route "/emotion" to:

app.get('/emotion', function(req, res) {
  const sentiment = new Sentiment()
  const text = req.query.text // this returns our request query "text"
  const score = sentiment.analyze(text);

  res.send(score)
})

sentiment.analyze(<our_text_variable>) does the steps described before: it checks each word of our text against AFINN's list and gives us a score at the end.

The variable score will have an object similar to this:

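The example output is missing from this copy. Based on the result shape documented by the sentiment package, for our earlier "I hope this is not horrendous" sentence it would look roughly like this (the exact values are illustrative):

```javascript
{
  score: -1,
  comparative: -0.16666666666666666, // score divided by the number of tokens
  calculation: [ { horrendous: -3 }, { hope: 2 } ],
  tokens: [ 'i', 'hope', 'this', 'is', 'not', 'horrendous' ],
  words: [ 'horrendous', 'hope' ],
  positive: [ 'hope' ],
  negative: [ 'horrendous' ]
}
```
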
Now that we have the score returned, we just have to make it show in our browser.

Note: AFINN is in English. While we can select other languages in the Web Speech API we would have to find a scored list similar to AFINN in our desired language to make the matching work.

Making it smile

For our last step, we will update our index.html to display an area where we can show the emoji. So we change it to the following:

<html>
  <head>
    <title>Speech to emotion</title>
    <link rel="stylesheet" href="style/css/emojis.css">
  </head>
  <body>
    <!-- We replace the "nothing for now" -->
    <div class="emoji">
      <img class="idle">
    </div>
    <!-- And leave the rest alone -->
    <script src="recognition.js"></script>
  </body>
</html>

The emoji used in this project are free for commercial use and can be found here. Kudos to the artist.

We download the icons we like and add them to the images folder. We will be needing emoji for:

  • error - When an error occurs

  • idle - Whenever the microphone is not active

  • listening - When the microphone is connected and waiting for input

  • negative - For negative scores

  • neutral - For when the score is zero

  • positive - For positive scores

  • searching - For when our server request is being done

And in our emojis.css we simply add:

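The stylesheet itself is missing from this copy. A minimal version, assuming the downloaded images were saved under file names matching the class names above, could be:

```css
.emoji img {
  width: 100px;
  height: 100px;
}

/* Each class swaps in a different image; Chrome supports content on img */
.error     { content: url('../images/error.png'); }
.idle      { content: url('../images/idle.png'); }
.listening { content: url('../images/listening.png'); }
.negative  { content: url('../images/negative.png'); }
.neutral   { content: url('../images/neutral.png'); }
.positive  { content: url('../images/positive.png'); }
.searching { content: url('../images/searching.png'); }
```
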
When we reload the page after these changes it'll show the idle emoji. It never changes, though, since we haven't replaced our idle class in the <img> element depending on the scenario.

To fix that, we go one last time to our recognition.js file. There, we're going to add a function to change the emoji:

/**
 * @param {string} type - could be any of the following:
 *   error|idle|listening|negative|positive|searching
 */
function setEmoji(type) {
  const emojiElem = document.querySelector('.emoji img')

  emojiElem.classList = type
}

On the response of our server request, we add the check for positive, negative or neutral score and call our setEmoji function accordingly:

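The check itself is missing from this copy; the relevant fragment of the fetch handler, taken from the final recognition.js listed below, is:

```javascript
fetch(`/emotion?text=${transcript}`)
  .then((response) => response.json())
  .then((result) => {
    if (result.score > 0) {
      setEmoji('positive')
    } else if (result.score < 0) {
      setEmoji('negative')
    } else {
      setEmoji('listening')
    }
  })
  .catch((e) => {
    console.error('Request error -> ', e)
    recognition.abort()
  })
```
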
Finally, we add the onerror and onaudiostart events and change the onend event so that each one sets the proper emoji.

recognition.onerror = function(event) {
  console.error('Recognition error -> ', event.error)
  setEmoji('error')
}

recognition.onaudiostart = function() {
  setEmoji('listening')
}

recognition.onend = function() {
  setEmoji('idle')
}

Our final recognition.js file should look something like this:

document.addEventListener('DOMContentLoaded', speechToEmotion, false);

function speechToEmotion() {
  const recognition = new webkitSpeechRecognition()
  recognition.lang = 'en-US'
  recognition.continuous = true

  recognition.onresult = function(event) {
    const results = event.results;
    const transcript = results[results.length - 1][0].transcript

    console.log(transcript)

    setEmoji('searching')

    fetch(`/emotion?text=${transcript}`)
      .then((response) => response.json())
      .then((result) => {
        if (result.score > 0) {
          setEmoji('positive')
        } else if (result.score < 0) {
          setEmoji('negative')
        } else {
          setEmoji('listening')
        }
      })
      .catch((e) => {
        console.error('Request error -> ', e)
        recognition.abort()
      })
  }

  recognition.onerror = function(event) {
    console.error('Recognition error -> ', event.error)
    setEmoji('error')
  }

  recognition.onaudiostart = function() {
    setEmoji('listening')
  }

  recognition.onend = function() {
    setEmoji('idle')
  }

  recognition.start();

  /**
   * @param {string} type - could be any of the following:
   *   error|idle|listening|negative|positive|searching
   */
  function setEmoji(type) {
    const emojiElem = document.querySelector('.emoji img')

    emojiElem.classList = type
  }
}

And by testing our project we can now see the final results:

Note: Instead of a console.log to check what the recognition understood, we can add an element to our HTML and replace the console.log. That way we always have access to what it understood.

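A rough sketch of that idea (the <p id="transcript"></p> element is hypothetical and would need to be added to index.html first):

```javascript
// Inside onresult, instead of console.log(transcript):
document.querySelector('#transcript').textContent = transcript
```
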
Final remarks

There are some areas where this project can be vastly improved:

  • it can't detect sarcasm
  • there is no way to check if you're enraged, due to the censorship of the speech-to-text API
  • there's probably a way to do this with voice alone, without converting it to text

From what I saw while researching this project, there are implementations that check if your tone and mood will lead to a sale in a call centre. And the newsletter I got was from Grammarly, which is using it to check the tone of what you write. So as you can see there are interesting applications.

Hopefully, this content has helped out in some way. If anybody builds anything using this stack let me know – it's always fun to see what people build.

The code can be found on my GitHub here.

See you in the next one. In the meantime, go code something!

Translated from: https://www.freecodecamp.org/news/speech-to-sentiment-with-chrome-and-nodejs/
