Noise Suppression Using Deep Learning

Credits: Zoom

Amid the boom of online video conferencing and virtual communication, a platform's ability to suppress background noise plays a crucial role in giving it a leading edge. Platforms like Google Meet constantly use machine learning to perform noise suppression and provide the best possible audio quality. Today I will show you how you can build your own deep learning model to perform noise suppression.

What's the Big Deal with Noise Suppression?

The task of noise suppression can be approached in a few different ways, including Generative Adversarial Networks (GANs), embedding-based models, residual networks, and so on. Irrespective of the approach, there are two major problems with noise suppression:

  1. Handling variable-length audio sequences

  2. Slow processing time, resulting in lag

In this article, I will show you some basic methods for dealing with these problems.

Let's Start

We will first import our libraries. We will be using TensorFlow; you are free to implement a PyTorch version of the same.

import tensorflow as tf
from tensorflow.keras.layers import Conv1D,Conv1DTranspose,Concatenate,Input
import numpy as np
import IPython.display
import glob
from tqdm.notebook import tqdm
import librosa.display
import matplotlib.pyplot as plt

The data we will be using is a combination of clean and noisy audio samples of different sizes. The dataset is provided by the University of Edinburgh and can be downloaded from here.

Loading and Visualising the Data

We will use TensorFlow's tf.audio module to load our data. Using tf.audio.decode_wav() along with tf.io.read_file() has given me about 50% faster loading times compared to librosa.load(), since TensorFlow makes use of the GPU.

clean_sounds = glob.glob('/content/CleanData/*')
noisy_sounds = glob.glob('/content/NoisyData/*')

clean_sounds_list,_ = tf.audio.decode_wav(tf.io.read_file(clean_sounds[0]),desired_channels=1)
for i in tqdm(clean_sounds[1:]):
    so,_ = tf.audio.decode_wav(tf.io.read_file(i),desired_channels=1)
    clean_sounds_list = tf.concat((clean_sounds_list,so),0)

noisy_sounds_list,_ = tf.audio.decode_wav(tf.io.read_file(noisy_sounds[0]),desired_channels=1)
for i in tqdm(noisy_sounds[1:]):
    so,_ = tf.audio.decode_wav(tf.io.read_file(i),desired_channels=1)
    noisy_sounds_list = tf.concat((noisy_sounds_list,so),0)

clean_sounds_list.shape,noisy_sounds_list.shape

Here we load the individual audio files using tf.audio.decode_wav() and concatenate them to get two tensors named clean_sounds_list and noisy_sounds_list. This process takes about 3–4 minutes to complete and is visually tracked using the tqdm loading bar.
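The 50% figure comes from my own runs. If you want to sanity-check it on your machine, a rough timing comparison along these lines works (a sketch only; absolute numbers will vary with hardware, drivers, and file sizes):

import time
import librosa

start = time.time()
_ = tf.audio.decode_wav(tf.io.read_file(clean_sounds[0]), desired_channels=1)
print('tf.audio.decode_wav:', time.time() - start, 's')

start = time.time()
_ = librosa.load(clean_sounds[0], sr=None)  # sr=None keeps the native sample rate
print('librosa.load:', time.time() - start, 's')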

batching_size = 12000

clean_train,noisy_train = [],[]
for i in tqdm(range(0,clean_sounds_list.shape[0]-batching_size,batching_size)):
    clean_train.append(clean_sounds_list[i:i+batching_size])
    noisy_train.append(noisy_sounds_list[i:i+batching_size])

clean_train = tf.stack(clean_train)
noisy_train = tf.stack(noisy_train)

clean_train.shape,noisy_train.shape

After the loading is done, we need to make uniform splits of the one big audio waveform. Although this is not compulsory, our main aim is to convert this model to a TFLite model, which currently does not support variable-length inputs. I chose an arbitrary batching_size of 12000. You are free to change it, but keep it as small as possible; it will be useful later.

Clean Audio
Noisy Audio

For the visualisation, we use librosa's display module, which uses matplotlib in the backend to plot the data. Plotting the data as seen above, we can see that the noise is quite visible. The noise can be anything, ranging from people and cars to dish-washing sounds.
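The plotting snippet itself is not shown above, so here is a minimal sketch of how the two waveform plots could be produced, using the imports from the top of the article. The sample rate is a placeholder (use the rate of your own files), and librosa versions before 0.10 call waveshow by its older name, waveplot:

# A minimal plotting sketch, assuming the clean_train/noisy_train tensors from above
fig, (ax1, ax2) = plt.subplots(2, 1, figsize=(12, 6))
librosa.display.waveshow(np.squeeze(clean_train[0].numpy()), sr=48000, ax=ax1)  # sr is a placeholder
ax1.set_title('Clean Audio')
librosa.display.waveshow(np.squeeze(noisy_train[0].numpy()), sr=48000, ax=ax2)
ax2.set_title('Noisy Audio')
plt.tight_layout()
plt.show()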

Creating a tf.data.Dataset for Pipelining

We will now create a very basic helper function called get_dataset() to generate a tf.data.Dataset. We choose 40000 samples for training and the remaining 5000 for testing. Again, you are free to tweak and extend this, but I will not go into depth here, as pipeline optimization is not the main goal of this article.

def get_dataset(x_train,y_train):
    dataset = tf.data.Dataset.from_tensor_slices((x_train,y_train))
    dataset = dataset.shuffle(100).batch(64,drop_remainder=True)
    return dataset

train_dataset = get_dataset(noisy_train[:40000],clean_train[:40000])
test_dataset = get_dataset(noisy_train[40000:],clean_train[40000:])

Creating the Model

The code for the model architecture is as follows:

inp = Input(shape=(batching_size,1))
c1 = Conv1D(2,32,2,'same',activation='relu')(inp)
c2 = Conv1D(4,32,2,'same',activation='relu')(c1)
c3 = Conv1D(8,32,2,'same',activation='relu')(c2)
c4 = Conv1D(16,32,2,'same',activation='relu')(c3)
c5 = Conv1D(32,32,2,'same',activation='relu')(c4)
dc1 = Conv1DTranspose(32,32,1,padding='same')(c5)
conc = Concatenate()([c5,dc1])
dc2 = Conv1DTranspose(16,32,2,padding='same')(conc)
conc = Concatenate()([c4,dc2])
dc3 = Conv1DTranspose(8,32,2,padding='same')(conc)
conc = Concatenate()([c3,dc3])
dc4 = Conv1DTranspose(4,32,2,padding='same')(conc)
conc = Concatenate()([c2,dc4])
dc5 = Conv1DTranspose(2,32,2,padding='same')(conc)
conc = Concatenate()([c1,dc5])
dc6 = Conv1DTranspose(1,32,2,padding='same')(conc)
conc = Concatenate()([inp,dc6])
dc7 = Conv1DTranspose(1,32,1,padding='same',activation='linear')(conc)
model = tf.keras.models.Model(inp,dc7)
model.summary()

The model is a purely convolutional one. The goal is to learn a bank of filters that help minimize the background noise. To help with this, we add residual (skip) connections that carry context from the original audio sample. The idea behind this model is derived from the implementation of the SEGAN network [Pascual et al.]. It builds on the fact that a purely convolutional network can handle inputs of multiple shapes with ease, leading to more flexibility. The convolutional nature forces the model to focus on temporally close correlations throughout the model. The model can be split into two parts: the convolutions and the de-convolutions (or upsampling layers). The strided convolutional layers behave as an auto-encoder: after N layers, a reduced representation of the input is obtained. The de-convolutions perform exactly the opposite strided procedure to obtain a cleaned representation of the noisy input. The skip connections provide the required context to the de-convolution layers at every step, which results in better overall results.

The model is compiled with a Mean Absolute Error loss.

The choice of optimizer was difficult, as SGD, RMSprop, and Adam all performed very closely. I finally went ahead with Adam because of its slightly more robust nature. Some hyperparameter tweaking gave me a pretty good learning rate of 0.002.
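The compile-and-fit call is not shown above, so here is a minimal sketch using the loss and learning rate just mentioned; the epoch count is my own assumption, not a value from the article:

model.compile(loss=tf.keras.losses.MeanAbsoluteError(),
              optimizer=tf.keras.optimizers.Adam(learning_rate=0.002))
history = model.fit(train_dataset,
                    validation_data=test_dataset,
                    epochs=20)  # epoch count assumed; tune to your compute budget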

The final results:

  1. Training loss: 0.0117
  2. Testing loss: 0.0117

Visible reduction of noise

The results are quite pleasing, but we are not done yet. We still have to handle the inference procedure for variable-sized inputs. We will do that next.

Handling the Variable Input Shape

Our model is trained with a very specific input shape that depends on our batching_size. To allow inputs of multiple shapes, a simple strategy works really well.

We run our model through all the splits up to the (n-1)th split. Consider an example where batching_size is 12000 and your audio array has shape (37500,). In this case we split the audio waveform into floor(37500/12000) = 3 full splits. The remaining part of the array has shape (1500,). To handle this leftover, we sample one more frame of size batching_size, this time from the rear end of the array, so that it overlaps the previous frame. Something like this:

Overlapped Frames

  1. Now, we run all 4 splits through the model to get individual predictions.
  2. From the output predictions, we keep the first three frames as they are and clip the last frame to get only the remaining part.

At this point, some code will help with clarity:

def get_audio(path):
    audio,_ = tf.audio.decode_wav(tf.io.read_file(path),1)
    return audio

def inference_preprocess(path):
    audio = get_audio(path)
    audio_len = audio.shape[0]
    batches = []
    for i in range(0,audio_len-batching_size,batching_size):
        batches.append(audio[i:i+batching_size])
    batches.append(audio[-batching_size:])  # Overlapping frame taken from the rear end
    diff = audio_len - (i + batching_size)  # Length of the remaining waveform
    return tf.stack(batches), diff

def predict(path):
    test_data,diff = inference_preprocess(path)
    predictions = model.predict(test_data)
    final_op = tf.reshape(predictions[:-1],((predictions.shape[0]-1)*predictions.shape[1],1))  # Reshape to get the complete frames
    final_op = tf.concat((final_op,predictions[-1][-diff:]),axis=0)  # Concat the clipped last frame to the rest
    return final_op

Okay, But How Fast Is It?

%%timeit
tf.squeeze(predict(noisy_sounds[3]))

OUTPUT: 10 loops, best of 3: 31.3 ms per loop

If we specify the input shape of the model as (None,1), we can pass a variable-length tensor to the model, which gives even faster results. For now, though, we want to quantize the model for cross-device compatibility.
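That (None,1) variant is not shown in the article, but since the network is fully convolutional it can be sketched as follows: rebuild the same layer stack with an unspecified time dimension and copy the trained weights across. The build_model helper is my own wrapper, not from the article, and inputs must be padded to a multiple of 32 samples (five stride-2 layers) for the skip connections to line up:

def build_model(input_shape):
    # Identical architecture to the model above, parameterized by input shape
    inp = Input(shape=input_shape)
    c1 = Conv1D(2,32,2,'same',activation='relu')(inp)
    c2 = Conv1D(4,32,2,'same',activation='relu')(c1)
    c3 = Conv1D(8,32,2,'same',activation='relu')(c2)
    c4 = Conv1D(16,32,2,'same',activation='relu')(c3)
    c5 = Conv1D(32,32,2,'same',activation='relu')(c4)
    dc1 = Conv1DTranspose(32,32,1,padding='same')(c5)
    conc = Concatenate()([c5,dc1])
    dc2 = Conv1DTranspose(16,32,2,padding='same')(conc)
    conc = Concatenate()([c4,dc2])
    dc3 = Conv1DTranspose(8,32,2,padding='same')(conc)
    conc = Concatenate()([c3,dc3])
    dc4 = Conv1DTranspose(4,32,2,padding='same')(conc)
    conc = Concatenate()([c2,dc4])
    dc5 = Conv1DTranspose(2,32,2,padding='same')(conc)
    conc = Concatenate()([c1,dc5])
    dc6 = Conv1DTranspose(1,32,2,padding='same')(conc)
    conc = Concatenate()([inp,dc6])
    dc7 = Conv1DTranspose(1,32,1,padding='same',activation='linear')(conc)
    return tf.keras.models.Model(inp,dc7)

variable_model = build_model((None,1))
variable_model.set_weights(model.get_weights())  # conv weights are independent of sequence length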

Optimization and Creation of the TFLite Model

Using the TFLiteConverter is pretty straightforward. You pass the Keras model along with an optimization strategy (the TF documentation recommends using DEFAULT only) and write the converted model to a binary file for future use.

lite_model = tf.lite.TFLiteConverter.from_keras_model(model)
lite_model.optimizations = [tf.lite.Optimize.DEFAULT]
tflite_model_quant = lite_model.convert()

with open('TFLiteModel.tflite','wb') as f:
    f.write(tflite_model_quant)

TFLite Model Inference

The preprocessing is similar to that of the Keras model, but since I could not find anything on batching for TFLite models (please let me know if such support exists), I had to use a plain Python for loop to iterate over all the splits. The code below shows the instantiation of the Interpreter and the allocation of tensors, followed by invoking it to get our results.

# Initializing the Interpreter and allocating tensors
interpreter = tf.lite.Interpreter(model_path='/content/TFLiteModel.tflite')
interpreter.allocate_tensors()

def predict_tflite(path):
    test_audio,diff = inference_preprocess(path)
    input_index = interpreter.get_input_details()[0]["index"]
    output_index = interpreter.get_output_details()[0]["index"]
    preds = []
    for i in test_audio:
        interpreter.set_tensor(input_index, tf.expand_dims(i,0))  # Pass splits one by one since TFLite doesn't support batching at the moment
        interpreter.invoke()
        predictions = interpreter.get_tensor(output_index)
        preds.append(predictions)
    predictions = tf.squeeze(tf.stack(preds,axis=1))
    final_op = tf.reshape(predictions[:-1],((predictions.shape[0]-1)*predictions.shape[1],1))
    final_op = tf.concat((tf.squeeze(final_op),predictions[-1][-diff:]),axis=0)
    return final_op

Now the question arises: how much better is this than the Keras model? The answer isn't simple. Since I was unable to process all batches together, the overall inference time was affected, but the TFLite model on its own is faster than the Keras model.

%%timeit
predict_tflite(noisy_sounds[3])

OUTPUT: 10 loops, best of 3: 41.7 ms per loop

Out of all the advantages of the TFLite format, cross-platform deployment is the biggest one. The model can now be ported much more easily than the Keras model, not to mention its super small size of just 346 kB.
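If you want to verify that number yourself, the on-disk size is easy to check (a tiny sketch, assuming the TFLiteModel.tflite file written during the conversion step):

import os

print(f"{os.path.getsize('TFLiteModel.tflite') / 1024:.0f} kB")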

Plot for TFLite Model

That's It for Now!

The model can be improved further by adding more filters, creating a deeper model, and optimizing the pipeline, but that is for next time. As we come to the end of the article, I would like to mention some references and links.

  1. Colab Notebook for the code and audio samples: Here

  2. Dataset: Here

  3. SEGAN paper: Here

Any comments or suggestions would be much appreciated. Thanks for reading!

Source: https://medium.com/analytics-vidhya/noise-suppression-using-deep-learning-6ead8c8a1839