How Deep Learning Can Keep You Safe with Real-Time Crime Alerts

Citizen scans thousands of public first-responder radio frequencies 24 hours a day in major cities across the US. The collected information is used to provide real-time safety alerts about incidents like fires, robberies, and missing persons to more than 5M users. Having humans listen to 1,000+ hours of audio daily made it very challenging for the company to launch new cities. To continue scaling, we built ML models that could discover critical safety incidents from the audio.

Our custom software-defined radios (SDRs) capture large swaths of the radio frequency (RF) spectrum and create optimized audio clips, which are sent to an ML model that flags the relevant ones. The flagged clips go to operations analysts, who create incidents in the app; finally, users near those incidents are notified.

Figure 1. Safety alerts workflow (Image by Author)

Adapting a Public Speech-to-Text Engine to Our Problem Domain

Figure 2. Clip classifier using public speech-to-text engine (Image by Author)

We started with the top-performing speech-to-text engine as measured by word error rate (WER). Police use many special codes that are not part of the normal vernacular; for example, an NYPD officer requests backup units by transmitting a "Signal 13". We customized the vocabulary to our domain using speech contexts.

We also boosted some words to fit our domain. For example, "assault" is rarely used colloquially but is very common in our use case, so we had to bias the model toward detecting "assault" over "a salt".
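
As a concrete illustration, here is a minimal sketch of this kind of vocabulary biasing, assuming Google Cloud Speech-to-Text as the provider (the article does not name one); the phrase list, boost value, and audio settings are hypothetical:

```python
# Minimal sketch of domain vocabulary biasing via speech contexts, assuming
# Google Cloud Speech-to-Text; phrases and boost are hypothetical values.
from google.cloud import speech

client = speech.SpeechClient()

config = speech.RecognitionConfig(
    encoding=speech.RecognitionConfig.AudioEncoding.LINEAR16,
    sample_rate_hertz=16000,
    language_code="en-US",
    speech_contexts=[
        speech.SpeechContext(
            phrases=["Signal 13", "assault", "robbery"],  # domain vocabulary
            boost=15.0,  # bias recognition toward these over homophones like "a salt"
        )
    ],
)

with open("clip.wav", "rb") as f:
    audio = speech.RecognitionAudio(content=f.read())

response = client.recognize(config=config, audio=audio)
for result in response.results:
    print(result.alternatives[0].transcript)
```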

After tuning these parameters, we were able to get reasonable transcription accuracy in some cities. The next step was to take the transcriptions of the audio clips and figure out which clips were relevant to Citizen.

Binary Classifier Based on Transcriptions and Audio Features

We modeled this as a binary classification problem, with the transcriptions as input and a confidence level as output. XGBoost gave us the best performance on our dataset.

We had insight from someone who previously worked in law enforcement that, in some cities, radio transmissions about major incidents are preceded by special alert tones to get the attention of police on the ground. This extra feature helped make our model more reliable, especially in cases of bad transcriptions. Other useful features we found were the police channel and transmission IDs.
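
A minimal sketch of how such a classifier could be wired up, combining bag-of-words features from the transcriptions with the extra signals; the toy data, feature choices, and hyperparameters are illustrative assumptions, not Citizen's actual pipeline:

```python
# Sketch of a binary clip classifier: TF-IDF over transcriptions plus
# extra signals such as the alert tone. All data here is a toy stand-in.
import numpy as np
from scipy.sparse import csr_matrix, hstack
from sklearn.feature_extraction.text import TfidfVectorizer
from xgboost import XGBClassifier

transcripts = [
    "signal 13 officer requests backup",
    "routine traffic stop no further units",
    "reported assault in progress",
    "radio check loud and clear",
]
has_alert_tone = np.array([[1], [0], [1], [0]])  # special tone preceding the clip
labels = np.array([1, 0, 1, 0])                  # 1 = relevant to Citizen

# Text features from the transcriptions.
vectorizer = TfidfVectorizer()
text_features = vectorizer.fit_transform(transcripts)

# Concatenate text features with the audio/metadata features.
features = hstack([text_features, csr_matrix(has_alert_tone)])

model = XGBClassifier(n_estimators=100, max_depth=4, eval_metric="logloss")
model.fit(features, labels)
print(model.predict_proba(features)[:, 1])  # confidence per clip
```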

We A/B tested the ML model in the operations workflow. After a few days of running the test, we noticed no degradation in the incidents created by analysts who were using only the model-flagged clips.

We launched the model in a few cities. Now a single analyst could handle multiple cities at once, which wasn't previously possible! With the freed-up operations capacity, we were able to launch multiple new cities.

Figure 3. Model rollout leading to a significant reduction in audio for analysts (Image by Author)

Beyond a Public Speech-to-Text Engine

The model didn’t turn out to be a panacea for all our problems. We could only use it in a few cities which had good quality audio. Public speech-to-text engines are trained on phone models with different acoustic profile than radios; as a result, the transcription quality was sometimes unreliable. Transcriptions were completely unusable for the older analog systems, which were very noisy.

We tried multiple models from multiple providers, but none of them was trained on an acoustic profile similar to our dataset's, and none could handle noisy audio.

We explored replacing the speech-to-text engine with one trained on our own data while keeping the rest of the pipeline the same. However, that required several hundred hours of transcriptions for our audio, which would have been very slow and expensive to generate. One option was to optimize the process by transcribing only the "important" words defined in our vocabulary and leaving blanks for the irrelevant words, but that was still just an incremental reduction in effort.

Eventually, we decided to build a custom speech processing pipeline for our problem domain.

Convolutional Neural Network for Keyword Spotting

Since we only cared about the presence of keywords, we didn't need to recover the exact order of words and could reduce our problem to keyword spotting, which is much easier to solve. We decided to do so using a convolutional neural network (CNN) trained on our dataset.

Using CNNs rather than recurrent neural network (RNN) or long short-term memory (LSTM) models meant that we could train much faster and iterate more quickly. We also evaluated the Transformer architecture, which is massively parallel but requires a lot of hardware to run. Since we were only looking for short-term dependencies between audio segments to detect words, a computationally simple CNN seemed a better choice than a Transformer, and it freed up hardware for more aggressive hyperparameter tuning.

Figure 4. Clip flagging model with a CNN for keyword spotting (Image by Author)

We split the audio clips into fixed-duration subclips and gave a subclip a positive label if a vocabulary word was present. We then marked an audio clip as useful if any of its subclips was positive. During training, we studied how the subclip duration affected convergence. Long subclips made it much harder for the model to figure out which portion of the clip was useful, and also harder to debug. Short subclips meant that words were split across multiple subclips, which made them harder for the model to identify. We were able to tune this hyperparameter and find a reasonable duration.
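
A sketch of this splitting-and-labeling scheme; the subclip duration and sample rate are hypothetical, since the tuned values aren't published:

```python
import numpy as np

SUBCLIP_SECONDS = 2.0   # hypothetical duration; the tuned value isn't published
SAMPLE_RATE = 16000     # assumed sample rate

def split_into_subclips(samples):
    """Split a clip into fixed-duration subclips, zero-padding the tail."""
    size = int(SUBCLIP_SECONDS * SAMPLE_RATE)
    n = int(np.ceil(len(samples) / size))
    padded = np.pad(samples, (0, n * size - len(samples)))
    return [padded[i * size:(i + 1) * size] for i in range(n)]

def label_subclips(word_spans, n_subclips):
    """Mark a subclip positive if any annotated vocabulary word overlaps it."""
    labels = np.zeros(n_subclips, dtype=int)
    for start_s, end_s in word_spans:   # annotated (start, end) in seconds
        first = int(start_s // SUBCLIP_SECONDS)
        last = min(int(end_s // SUBCLIP_SECONDS), n_subclips - 1)
        labels[first:last + 1] = 1
    return labels

subclips = split_into_subclips(np.random.randn(5 * SAMPLE_RATE))
labels = label_subclips([(1.2, 1.6), (3.9, 4.3)], len(subclips))
clip_is_useful = bool(labels.any())   # the whole clip is flagged if any subclip is
```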

For each subclip, we convert the audio into mel-frequency cepstral coefficients (MFCCs) and add their first- and second-order derivatives. The features are generated with a frame size of 25 ms and a stride of 10 ms, then fed into a neural network built with the Keras Sequential API on a TensorFlow backend. The first layer is a GaussianNoise layer, which makes the model more robust to noise differences between radio channels. We tried an alternative approach of artificially overlaying real noise onto clips, but that slowed down training significantly with no meaningful performance gains.
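
One plausible way to compute these features with librosa, at an assumed 16 kHz sample rate (so a 25 ms frame is 400 samples and a 10 ms stride is 160 samples); the number of coefficients is an assumption:

```python
import librosa
import numpy as np

def subclip_features(path):
    """MFCCs plus first- and second-order deltas, 25 ms frames, 10 ms stride."""
    y, sr = librosa.load(path, sr=16000)           # assumed sample rate
    mfcc = librosa.feature.mfcc(
        y=y, sr=sr,
        n_mfcc=13,                    # hypothetical coefficient count
        n_fft=int(0.025 * sr),        # 25 ms frame -> 400 samples
        hop_length=int(0.010 * sr),   # 10 ms stride -> 160 samples
    )
    delta = librosa.feature.delta(mfcc)            # first-order derivative
    delta2 = librosa.feature.delta(mfcc, order=2)  # second-order derivative
    # Stack and transpose to (time, features) for a Conv1D model.
    return np.concatenate([mfcc, delta, delta2], axis=0).T
```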

We then added layers of Conv1D, BatchNormalization, and MaxPooling1D. Batch normalization helped the model converge, and max pooling made the model more robust to minor variations in speech and to channel noise. We also tried adding dropout layers, but those didn't improve the model meaningfully. Finally, we added a densely connected layer feeding into a single-unit output layer with sigmoid activation.
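
A sketch of the described architecture in Keras; the layer types follow the text, but the filter counts, kernel sizes, and layer depths are assumptions:

```python
import tensorflow as tf
from tensorflow.keras import layers, models

N_FRAMES, N_FEATURES = 200, 39   # e.g. 2 s of 10 ms frames, 13 MFCCs + deltas

model = models.Sequential([
    layers.Input(shape=(N_FRAMES, N_FEATURES)),
    layers.GaussianNoise(0.1),          # robustness to channel-noise differences
    layers.Conv1D(64, kernel_size=5, activation="relu"),
    layers.BatchNormalization(),        # helps convergence
    layers.MaxPooling1D(pool_size=2),   # robustness to minor speech variations
    layers.Conv1D(128, kernel_size=5, activation="relu"),
    layers.BatchNormalization(),
    layers.MaxPooling1D(pool_size=2),
    layers.Flatten(),
    layers.Dense(128, activation="relu"),    # densely connected layer
    layers.Dense(1, activation="sigmoid"),   # word-presence probability
])
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=[tf.keras.metrics.Recall(), tf.keras.metrics.Precision()])
```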

Generating Labeled Data

Figure 5. Labeling process for audio clips (Image by Author)

To label the training data, we gave annotators the list of keywords for our domain and asked them, whenever a vocabulary word was present in a clip, to mark its start and end positions along with the word label.

To ensure the annotations were reliable, we had a 10% overlap across annotators and measured their agreement on the overlapping clips. Once we had ~50 hours of labeled data, we started the training process, and we kept collecting more data while iterating on it.
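
One simple way to score the overlapping assignments, assuming per-clip binary labels; Cohen's kappa is a stand-in here, since the article doesn't say which agreement metric was used:

```python
from sklearn.metrics import cohen_kappa_score

# Per-clip binary labels from two annotators on the same 10% overlap set.
annotator_a = [1, 0, 0, 1, 1, 0, 1, 0]
annotator_b = [1, 0, 1, 1, 1, 0, 1, 0]

kappa = cohen_kappa_score(annotator_a, annotator_b)
print(f"Inter-annotator agreement (Cohen's kappa): {kappa:.2f}")
```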

Since some words in our vocabulary were much more common than others, our model performed reasonably on the common words but struggled with rarer words that had fewer examples. We tried creating artificial examples of those words by overlaying their utterances onto other clips. However, the performance gains were not commensurate with actually collecting labeled data for those words. Eventually, as our model improved on common words, we ran it on unlabeled audio clips and excluded the ones where it found those words. That reduced the redundant words in our future labeling.

Model Launch

After several iterations of data collection and hyperparameter tuning, we were able to train a model with high recall on our vocabulary words and reasonable precision. High recall was very important for capturing critical safety alerts; since the flagged clips are always listened to before an alert is sent, false positives were not a huge concern.
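
Because flagged clips are reviewed by humans anyway, the decision threshold can be pushed low to favor recall. A sketch of picking such a threshold from validation scores; the target recall and toy data are hypothetical:

```python
import numpy as np
from sklearn.metrics import precision_recall_curve

def threshold_for_recall(y_true, y_scores, target_recall=0.95):
    """Highest decision threshold that still meets the recall target."""
    precision, recall, thresholds = precision_recall_curve(y_true, y_scores)
    # recall[:-1] aligns with thresholds and is non-increasing along them.
    ok = np.where(recall[:-1] >= target_recall)[0]
    return thresholds[ok[-1]]

# Toy validation scores; real scores would come from a held-out set.
y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0])
y_scores = np.array([0.9, 0.4, 0.7, 0.55, 0.3, 0.6, 0.8, 0.2])
print(threshold_for_recall(y_true, y_scores))   # flag clips scoring above this
```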

We A/B tested the model in some boroughs of New York City, where it was able to cut audio volume by 50–75% (depending on the channel). It also clearly outperformed our model built on the public speech-to-text engine, since NYC's analog systems produce very noisy audio.

Somewhat surprisingly, we then found that the model transferred well to audio from Chicago even though it was trained on NYC data. After collecting a few hours of Chicago clips, we were able to transfer-learn from the NYC model to get reasonable performance in Chicago.
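
A sketch of the kind of transfer learning this implies, reusing the Keras model from the earlier sketch: freeze the early convolutional layers learned on NYC audio and fine-tune the rest on the small Chicago set. Which layers to freeze and the learning rate are assumptions:

```python
from tensorflow import keras

# `model` is the NYC-trained keyword-spotting CNN from the earlier sketch.
# Freeze the early convolutional feature extractors; how many layers to
# freeze is an assumption, not a published detail.
for layer in model.layers[:-3]:
    layer.trainable = False

# Recompile with a small learning rate (also an assumption) and fine-tune
# on the few hours of labeled Chicago subclip features:
model.compile(optimizer=keras.optimizers.Adam(learning_rate=1e-4),
              loss="binary_crossentropy",
              metrics=[keras.metrics.Recall()])
# model.fit(chicago_features, chicago_labels, epochs=10, validation_split=0.1)
```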

Conclusion

Our speech-processing pipeline with the custom deep neural network was broadly applicable to police audio from major US cities. It discovered critical safety incidents from the audio, allowing Citizen to expand rapidly into cities across the country and serve its mission of keeping communities safe.

Picking a computationally simple CNN architecture over RNNs, LSTMs, or Transformers, and simplifying our labeling process, were major breakthroughs that allowed us to outperform public speech-to-text models in a very short time and with limited resources.

Translated from: https://towardsdatascience.com/how-deep-learning-can-keep-you-safe-with-real-time-crime-alerts-95778aca5e8a
