最近在看speech enhancement 内容,看完谱减法部分后,在网上找相应的代码来看,然后将MATLAB代码转成Python代码,顺便学习一下Python的使用。

谱减法的基础实现:

论文《Enhancement of speech corrupted by acoustic noise》提出的实现:

算法流程如下:

效果如下:

这是一段火车站附近的录音,噪声比较平稳;设置的VAD阈值是3dB,一般应用上设置的是6dB

虽然floor值的存在填充了频谱中一定的 “波谷”, 但可以看到还是有一些音乐噪声的存在。

代码段

Python代码如下:

  1. #!/usr/bin/env python
    # encoding: utf-8
    '''
    @author: Kolor
    @license:
    @contact: colorsu1922@163.com
    @software: pycharm
    @file: spec_sub.py
    @time: 2020/12/21 23:44
    @desc:Implements the basic power spectral subtraction algorithm [1].Usage:  specsub(noisyFile, outputFile)noisyFile - noisy speech file in .wav formatoutputFile - enhanced output file in .wav formatAlgorithm uses the first 5 frames to estimate the noise psd, andthen uses a very simple VAD algorithm for updating the noise psd.Alternatively, the VAD in the file ss_rdc.m (this folder) can be used.References:[1] Berouti, M., Schwartz, M., and Makhoul, J. (1979). Enhancement of speechcorrupted by acoustic noise. Proc. IEEE Int. Conf. Acoust., Speech,Signal Processing, 208-211.Author: Philipos C. LoizouCopyright (c) 2006 by Philipos C. Loizou$Revision: 0.0 $  $Date: 10/09/2006 $
    -------------------------------------------------------------------------
    '''
    from typing import Typeimport librosa
    import numpy as np
    import sys
    import matplotlib.pyplot as plt
    import soundfile as sfdef rh_spec_sub(infile, outfile):print(np.__version__)x, samp_rate = librosa.load(infile, sr=8000, mono=True)# sys.exit("pause")# =============== Initialize variables ===============# Frame size in samplesframe_len = int(np.floor(20 * samp_rate / 1000))if np.remainder(frame_len, 2) == 1:frame_len = frame_len + 1# window overlap in percent of frame sizeoverlap_percent = 50len1 = int(np.floor(frame_len * overlap_percent / 100))len2 = int(frame_len - len1)# VAD threshold in dB SNRvad_thresh = 3# power exponentalpha = 2.0FLOOR = 0.002G = 0.9# define windowwin = np.hanning(frame_len+1)[0:frame_len]win_gain = len2 / np.sum(win)  # normalization gain for overlap+add with 50% overlap# Noise magnitude calculations - assuming that the first 5 frames is noise/silenceNFFT = int(2 * 2 ** next_pow2(frame_len))noise_mean = np.zeros(NFFT)j = 0for k in range(0, 5):windowed = np.multiply(win, x[j:j+frame_len])noise_mean = np.add(noise_mean, np.abs(np.fft.fft(windowed, NFFT)))j = j + frame_lennoise_mu = noise_mean / 5# --- allocate memory and initialize various variablesk = 0img = 1j  # np.sqrt(-1)x_old = np.zeros(len1)Nframs = int(np.floor(len(x)/len2) - 1)xfinal = np.zeros(Nframs * len2)# ===============================  Start Processing ==================================for n in range(0, Nframs):insign = np.multiply(win, x[k:k+frame_len])spec = np.fft.fft(insign, NFFT)sig = np.abs(spec)  # compute the magnitude# save the noisy phase informationtheta = np.angle(spec)SNR_seg = 10*np.log10(np.linalg.norm(sig, 2)**2 / np.linalg.norm(noise_mu, 2)**2)if alpha == 1.0:beta = berouti1(SNR_seg)else:beta = berouti(SNR_seg)# &&&&&&&&&&&&&&&sub_speech = sig ** alpha - beta * noise_mu ** alphadiffw = sub_speech - FLOOR * noise_mu ** alpha# floor negative componentz = np.argwhere(diffw < 0)if len(z) is not 0:sub_speech[z] = FLOOR * noise_mu[z] ** alpha# --- implement a simple VAD detector - -------------if SNR_seg < vad_thresh:   # Update noise spectrumnoise_temp = G * noise_mu ** alpha + (1 - G) * sig ** alphanoise_mu = noise_temp ** (1 / alpha)  # new noise spectrum# to ensure conjugate symmetry for real reconstructionsub_speech[int(NFFT/2) + 1: NFFT] = np.flipud(sub_speech[1: int(NFFT/2)])x_phase = (sub_speech**(1/alpha)) * (np.cos(theta) + img * (np.sin(theta)))# take the iFFTxi = np.real(np.fft.ifft(x_phase))# plt.plot(xi)# plt.show()# --- Overlap and add ---------------xfinal[k:k+len2] = x_old + xi[0:len1]x_old = xi[len1:frame_len]k = k + len2# write outputsf.write(outfile, win_gain * xfinal, samp_rate, 'PCM_16')def berouti1(snr):beta = 0if -5.0 <= snr <= 20:beta = 3 - snr * 2 / 20elif snr < -5.0:beta = 4elif snr > 20:beta = 1return betadef berouti(snr):beta = 0if -5.0 <= snr <= 20:beta = 4 - snr * 3 / 20elif snr < -5.0:beta = 5elif snr > 20:beta = 1return betadef next_pow2(n):return np.ceil(np.log2(np.abs(n)))def load_audio(filename, trace=0):"""load wav file using audioread.This is not available in python x,y."""data = np.array([])with audioread.audio_open(filename) as af:trace_n = af.channelsif trace >= trace_n:print('number of traces in file is', trace_n)quit()nsamp = np.ceil(af.samplerate * af.duration)print(f"nsamp =%d" % nsamp)data = np.ascontiguousarray(np.zeros(nsamp, 1))index = 0for buffer in af:full_data = np.fromstring(buffer).reshape(-1, af.channels)n = full_data.shape[0]if index + n > len(data):n = len(data) - indexif n > 0:data[index:index + n] = full_data[:n, trace]index += nelse:breakreturn af.samplerate, data / 2.0 ** 15, 'a.u.'

上面代码是根据speech enhancement中的MATLAB代码改写的,仅用作学习用途。

原始的MATLAB代码如下:

function specsub(filename,outfile)%  Implements the basic power spectral subtraction algorithm [1].
%
%  Usage:  specsub(noisyFile, outputFile)
%
%         noisyFile - noisy speech file in .wav format
%         outputFile - enhanced output file in .wav format
%
%   Algorithm uses the first 5 frames to estimate the noise psd, and
%   then uses a very simple VAD algorithm for updating the noise psd.
%   Alternatively, the VAD in the file ss_rdc.m (this folder) can be used.
%
%  References:
%   [1] Berouti, M., Schwartz, M., and Makhoul, J. (1979). Enhancement of speech
%       corrupted by acoustic noise. Proc. IEEE Int. Conf. Acoust., Speech,
%       Signal Processing, 208-211.
%
% Author: Philipos C. Loizou
%
% Copyright (c) 2006 by Philipos C. Loizou
% $Revision: 0.0 $  $Date: 10/09/2006 $
%-------------------------------------------------------------------------if nargin<2fprintf('Usage: specsub noisyfile.wav outFile.wav \n\n');return;
end[x,Srate]=audioread(filename);% =============== Initialize variables ===============
%len=floor(20*Srate/1000); % Frame size in samples
if rem(len,2)==1, len=len+1; end;
PERC=50; % window overlap in percent of frame size
len1=floor(len*PERC/100);
len2=len-len1; Thres=3; % VAD threshold in dB SNRseg
alpha=2.0; % power exponent
FLOOR=0.002;
G=0.9;win=hanning(len, 'periodic'); %tukey(len,PERC);  % define window
winGain=len2/sum(win); % normalization gain for overlap+add with 50% overlap% Noise magnitude calculations - assuming that the first 5 frames is noise/silence
%
nFFT=2*2^nextpow2(len);
noise_mean=zeros(nFFT,1);
j=1;
for k=1:5windowed = win.*x(j:j+len-1);noise_mean=noise_mean+abs(fft(win.*x(j:j+len-1),nFFT));j=j+len;
end
noise_mu=noise_mean/5;%--- allocate memory and initialize various variablesk=1;
img=sqrt(-1);
x_old=zeros(len1,1);
Nframes=floor(length(x)/len2)-1;
xfinal=zeros(Nframes*len2,1);%===============================  Start Processing ==================================
%
for n=1:Nframes t = x(k:k+len - 1);insign=win.*x(k:k+len-1);     %Windowing  spec=fft(insign,nFFT);     %compute fourier transform of a framesig=abs(spec); % compute the magnitude%save the noisy phase information theta=angle(spec);  SNRseg=10*log10(norm(sig,2)^2/norm(noise_mu,2)^2);if alpha==1.0beta=berouti1(SNRseg);elsebeta=berouti(SNRseg);end%&&&&&&&&&sub_speech=sig.^alpha - beta*noise_mu.^alpha;diffw = sub_speech-FLOOR*noise_mu.^alpha;% Floor negative componentsz=find(diffw <0);  if~isempty(z)sub_speech(z)=FLOOR*noise_mu(z).^alpha;end% --- implement a simple VAD detector --------------%if (SNRseg < Thres)   % Update noise spectrumnoise_temp = G*noise_mu.^alpha+(1-G)*sig.^alpha;   noise_mu=noise_temp.^(1/alpha);   % new noise spectrumendsub_speech(nFFT/2+2:nFFT)=flipud(sub_speech(2:nFFT/2));  % to ensure conjugate symmetry for real reconstructionx_phase=(sub_speech.^(1/alpha)).*(cos(theta)+img*(sin(theta)));% take the IFFT xi=real(ifft(x_phase));% --- Overlap and add ---------------xfinal(k:k+len2-1)=x_old+xi(1:len1);x_old=xi(1+len1:len);k=k+len2;
end
%========================================================================================% wavwrite(winGain*xfinal,Srate,16,outfile);
audiowrite(outfile, winGain*xfinal,Srate);%--------------------------------------------------------------------------function a=berouti1(SNR)if SNR>=-5.0 & SNR<=20a=3-SNR*2/20;
elseif SNR<-5.0a=4;endif SNR>20a=1;endendfunction a=berouti(SNR)if SNR>=-5.0 & SNR<=20a=4-SNR*3/20;
elseif SNR<-5.0a=5;endif SNR>20a=1;endend

语音降噪_标准谱减法(附Python源码) @Learning Speech enhancement__1相关推荐

  1. python 自动化微信小程序_干货 | 微信小程序自动化测试最佳实践(附 Python 源码)...

    原标题:干货 | 微信小程序自动化测试最佳实践(附 Python 源码) 本文为霍格沃兹测试学院测试大咖公开课<微信小程序自动化测试>图文整理精华版. 随着微信小程序的功能和生态日益完善, ...

  2. python和苹果_苹果手机评论情感分析(附python源码和评论数据)

    原标题:苹果手机评论情感分析(附python源码和评论数据) 首先抓取网页上的数据,每一页十条评论,生成为一个txt文件. 数据链接 回复公众号 datadw 关键字"苹果"获取. ...

  3. Agisoft Metashape 照片高程改正 附python源码

    Agisoft Metashape 照片高程改正 附python源码 文章目录 Agisoft Metashape 照片高程改正 附python源码 前言 一.计算高程改正数 二.python源码 三 ...

  4. 华中杯 数学建模 A题简单复盘(附Python源码)

    华中杯 A题简单复盘(附Python 源码) 文章目录 华中杯 A题简单复盘(附Python 源码) 前言 题目简介 问题背景 题目以及思路 分批算法设计 MindMap 遗传算法优缺点 优点 缺点 ...

  5. 用Python和Tableau对母婴商品销量进行数据分析(附Python源码及Tableau文件)

    为减少篇幅,本文将尽量少的配上源码,在文末提供的源码文件中已经有详细注释. 本案例结合Python和Tableau,由于数据量及维度较少,所以更多的是使用Tableau进行可视化处理. 项目介绍 项目 ...

  6. 微信 小程序 python 渲染_干货 | 微信小程序自动化测试最佳实践(附 Python 源码)...

    本文为霍格沃兹测试学院测试大咖公开课<微信小程序自动化测试>图文整理精华版,进阶学习文末加群! 随着微信小程序的功能和生态日益完善,很多公司的产品业务形态逐渐从 App 延升到微信小程序. ...

  7. 微信小程序python自动化测试_干货 | 微信小程序自动化测试最佳实践(附 Python 源码)...

    本文为霍格沃兹测试学院测试大咖公开课<微信小程序自动化测试>图文整理精华版. 随着微信小程序的功能和生态日益完善,很多公司的产品业务形态逐渐从 App 延升到微信小程序.微信公众号等.小程 ...

  8. python 覆盖list_这套python 面试题你还没有?保证让你面试通关《附Python源码+实战项目》...

    随着Python在企业中的应用越来越多,岗位需求越来越大,面试成为了搞定优质职位的快速方式,下面是笔者面试10余家企业总结的面试题,希望对Python从业者有帮助. 介绍一下Python的数据结构,并 ...

  9. 趋势预测:基于期货未平仓合约、展期和FII/DII指标【附Python源码】

    01 引言 2020 年一季度是二战后最具挑战性的时期之一, 由于地缘政治原因导致的油价暴跌和 COVID-19 全球大流行是主要主题.金融市场充当风向标,可以反映世界经济的整体情绪.这些情绪不仅反映 ...

最新文章

  1. debian10 raid5+lvm
  2. 可以查python题的_python练习题 -股票查询
  3. AI入门:不用任何公式把逐步提升讲清楚
  4. 四、卫星定位《苹果iOS实例编程入门教程》
  5. HashMap和ArrayList初始大小和扩容后的大小
  6. 我是如何用Jquery实现网页缩小放大的
  7. ssm java上传图片预览_ssm文件上传_上传图片
  8. Flink 1.10 细粒度资源管理解析
  9. Struts2数据封装
  10. linux下执行java_Linux下运行java项目
  11. html图片上加水印,css给图片添加水印
  12. 【Spring】Spring boot 可以通过集成jolokia来使用HTTP形式访问mbean
  13. Visual Studio 打开程序提示仅我的代码怎么办
  14. pc端rem适配_自适应PC端网页制作使用REM
  15. zabbix3.4+grafana5.0.1数据可视化
  16. 阶段5 3.微服务项目【学成在线】_day02 CMS前端开发_13-webpack研究-webpack入门程序...
  17. 训练过程loss突然增大可能的原因
  18. 程序员必备的七个电脑软件
  19. 发一款增强音效和放大声音的软件
  20. Index out of range using input dim 2; input has only 2 dims

热门文章

  1. AGC与一次调频的区别、AGC与AVC的区别
  2. 2021-06-30软件测试day1学习笔记
  3. Zilliz 荣获「2021 年优秀开源创新企业」称号
  4. 智慧公寓云系统解决方案
  5. 大学物理华南农业大学专版(上)
  6. 达芬奇davinci部署指南
  7. vtk.js、three.js在浏览器展示3d图形
  8. 关于jar包运行报错:Exception in thread “main“ java.lang.UnsupportedClassVersionError的解决办法
  9. APP推荐 | 安卓手机上让人耳目一新的国产应用
  10. 线性表之单链表 图解和代码实现