Methods for Evaluating Speech Enhancement

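For evaluating speech enhancement, I used to know only one classification, the one that speech signal processing textbooks teach: subjective methods and objective methods.
1) Subjective methods: the mean opinion score (MOS).
2) Objective methods: signal-to-noise ratio (SNR), segmental SNR, Itakura distance, PESQ, etc.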
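The two simplest objective measures in 2) can be written down directly. Below is a minimal NumPy sketch. It assumes the clean reference and the processed signal are time-aligned arrays of equal length. The frame length, the hop, and the [-10, 35] dB per-frame clipping range for segmental SNR are commonly used choices, not values given in this post, and the function names are illustrative.

```python
import numpy as np


def snr_db(clean, processed):
    """Global SNR in dB: clean-signal energy over residual-error energy."""
    clean = np.asarray(clean, dtype=float)
    error = clean - np.asarray(processed, dtype=float)
    return float(10.0 * np.log10(np.sum(clean ** 2) / np.sum(error ** 2)))


def segmental_snr_db(clean, processed, frame_len=256, hop=128, lo=-10.0, hi=35.0):
    """Segmental SNR in dB: mean of per-frame SNRs, each clipped to [lo, hi]."""
    clean = np.asarray(clean, dtype=float)
    processed = np.asarray(processed, dtype=float)
    eps = np.finfo(float).eps  # avoids log(0) for silent or perfectly matched frames
    frame_snrs = []
    for start in range(0, len(clean) - frame_len + 1, hop):
        c = clean[start:start + frame_len]
        e = c - processed[start:start + frame_len]
        snr = 10.0 * np.log10((np.sum(c ** 2) + eps) / (np.sum(e ** 2) + eps))
        frame_snrs.append(np.clip(snr, lo, hi))
    return float(np.mean(frame_snrs))
```

Clipping keeps silent frames (very low SNR) and near-perfect frames (very high SNR) from dominating the average. This is one reason segmental SNR is generally reported to track perceived quality better than a single global SNR.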

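More recently I noticed that the literature further divides objective methods into intrusive and non-intrusive ones.
Intrusive methods predict the subjective mean opinion score (MOS) from some form of distance between the reference speech and the test speech. Non-intrusive methods predict speech quality from the test speech alone, which makes them more challenging. Put simply, an intrusive method needs the original clean speech as a reference, whereas a non-intrusive method does not.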

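I came across a document that summarizes a large number of objective measures. A few of them I had heard of, such as SII, but most were completely new to me. It also gives download links for some implementations, for example for the non-intrusive ITU-T P.563, so you can study the source code, which is quite useful. The list follows: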

Intrusive measures: need a reference clean signal (x) to judge the noisy signal (y)
LSP (line spectral pair) based weighting:
Inverse harmonic mean weighting (IHMW)
● higher weights to regions where the LSPs are close to each other → strong resonances
Inverse variance weighting (IVW)
● Euclidean distance between the LSPs extracted from x and y, normalized by variance → approximation of the log spectral distortion
Gardner weighting (GW)
● sensitivity matrix for LSP → approximation of log spectral distortion
Formant bounded weight (FBW)
● combines IHMW and GW
Positions and distance weighting
● the weighted sum of the Euclidean distances taken with regard to the LSP values and their relative positions
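To make the LSP weighting idea concrete, here is a sketch of the inverse harmonic mean weighting. It assumes LPC coefficient vectors a = [1, a1, ..., ap] are already available for each frame of x and y. It converts them to line spectral frequencies (LSFs, in radians) via the roots of the symmetric and antisymmetric polynomials. It then applies the commonly cited IHMW weights w_i = 1/(ω_i - ω_{i-1}) + 1/(ω_{i+1} - ω_i), with ω_0 = 0 and ω_{p+1} = π. The function names and the choice to derive the weights from the reference frame are assumptions, and the surveyed paper's exact normalization may differ.

```python
import numpy as np


def lpc_to_lsf(a):
    """LPC vector a = [1, a1, ..., ap] -> p line spectral frequencies in (0, pi), ascending."""
    a_ext = np.concatenate([np.asarray(a, dtype=float), [0.0]])
    p_poly = a_ext + a_ext[::-1]  # symmetric polynomial P(z)
    q_poly = a_ext - a_ext[::-1]  # antisymmetric polynomial Q(z)
    angles = np.angle(np.concatenate([np.roots(p_poly), np.roots(q_poly)]))
    # Keep one root of each conjugate pair and drop the trivial roots at z = 1 and z = -1.
    tol = 1e-6
    return np.sort(angles[(angles > tol) & (angles < np.pi - tol)])


def ihmw_weights(lsf):
    """Inverse harmonic mean weights: large where neighbouring LSFs are close (formants)."""
    w = np.concatenate([[0.0], lsf, [np.pi]])
    return 1.0 / (w[1:-1] - w[:-2]) + 1.0 / (w[2:] - w[1:-1])


def ihmw_distance(lsf_ref, lsf_test):
    """Weighted squared LSF distance, with weights taken from the reference frame."""
    lsf_ref = np.asarray(lsf_ref, dtype=float)
    lsf_test = np.asarray(lsf_test, dtype=float)
    return float(np.sum(ihmw_weights(lsf_ref) * (lsf_ref - lsf_test) ** 2))
```

Per-frame distances would then be averaged over the utterance. Replacing `ihmw_weights` with per-coefficient inverse variances would give a variant in the spirit of IVW.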

Standards for Quality or Intelligibility of speech
Perceptual Evaluation of Speech Quality (PESQ) [1]
● based mostly on the Perceptual Speech Quality Measure (PSQM) and the Perceptual Analysis Measurement System (PAMS)
● shows high correlation with subjective measures
● works only for sampling frequencies up to 16 kHz
Speech Intelligibility Index (SII) [3]
● a weighted SNR in frequency domain
● it compares the clean signal x with the noise
● the internal representation is the critical band filtered signal
● the weights and bands are defined in the standard
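Below is a heavily simplified sketch of the "weighted SNR in the frequency domain" idea behind SII. It follows the standard's band-audibility mapping, in which each band SNR is clipped to [-15, +15] dB and mapped linearly onto [0, 1]. It does not follow the rest of the standard: the critical-band procedure, masking, and level-distortion terms are replaced with plain FFT band powers. It also uses uniform band-importance weights as a placeholder for the tables defined in the standard. The band edges and the function name are hypothetical. For real SII values, use the implementation in [3].

```python
import numpy as np


def simplified_sii(speech, noise, fs, band_edges, importance=None):
    """Toy SII: importance-weighted band audibility from per-band SNR (not standard-compliant)."""
    speech = np.asarray(speech, dtype=float)
    noise = np.asarray(noise, dtype=float)  # assumed to have the same length as speech
    freqs = np.fft.rfftfreq(len(speech), d=1.0 / fs)
    speech_pow = np.abs(np.fft.rfft(speech)) ** 2
    noise_pow = np.abs(np.fft.rfft(noise)) ** 2
    n_bands = len(band_edges) - 1
    if importance is None:
        importance = np.full(n_bands, 1.0 / n_bands)  # placeholder for the standard's tables
    tiny = 1e-12
    audibility = np.empty(n_bands)
    for k in range(n_bands):
        in_band = (freqs >= band_edges[k]) & (freqs < band_edges[k + 1])
        band_snr = 10.0 * np.log10(
            (speech_pow[in_band].sum() + tiny) / (noise_pow[in_band].sum() + tiny)
        )
        # Map [-15, +15] dB linearly onto [0, 1] audibility, saturating outside that range.
        audibility[k] = np.clip((band_snr + 15.0) / 30.0, 0.0, 1.0)
    return float(np.sum(importance * audibility))
```

With the hypothetical bands `[100, 500, 1000, 2000, 4000, 8000]` Hz, a band at 0 dB SNR contributes half of its importance weight. A band at +15 dB or more contributes all of it.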

Coherence SII (cSII)
● extension of the Speech Intelligibility Index (SII)
● incorporates the coherence into the SNR (SDR) calculation, so it also captures distortion effects
● the coherence is the normalized cross-spectral density, calculated separately for 3 different amplitude-level regions of the signal
● for additive noise cSII == SII
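The core ingredient of cSII is the magnitude-squared coherence (MSC) between x and y. From it, a per-frequency signal-to-distortion ratio SDR = MSC / (1 - MSC) takes the place of the plain SNR. For purely additive, uncorrelated noise, MSC = Sxx / (Sxx + Snn), so SDR reduces to Sxx / Snn, the ordinary SNR. This is why cSII equals SII in that case. The sketch below estimates the MSC with Welch-style averaging over Hann-windowed frames and converts it to SDR in dB. It omits the split into three amplitude regions and the rest of the SII computation. The frame length, the hop, and the function names are assumptions.

```python
import numpy as np


def msc(x, y, nperseg=512, hop=256):
    """Welch-style magnitude-squared coherence |Sxy|^2 / (Sxx * Syy) per frequency bin."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    win = np.hanning(nperseg)
    sxx = np.zeros(nperseg // 2 + 1)
    syy = np.zeros(nperseg // 2 + 1)
    sxy = np.zeros(nperseg // 2 + 1, dtype=complex)
    for start in range(0, len(x) - nperseg + 1, hop):
        X = np.fft.rfft(win * x[start:start + nperseg])
        Y = np.fft.rfft(win * y[start:start + nperseg])
        sxx += np.abs(X) ** 2  # auto-spectra accumulated over frames
        syy += np.abs(Y) ** 2
        sxy += X * np.conj(Y)  # cross-spectrum accumulated over frames
    return np.abs(sxy) ** 2 / (sxx * syy + 1e-20)


def coherence_sdr_db(x, y, **kwargs):
    """Per-bin signal-to-distortion ratio MSC / (1 - MSC), in dB."""
    g = np.clip(msc(x, y, **kwargs), 0.0, 1.0 - 1e-12)  # keep 1 - g strictly positive
    return 10.0 * np.log10((g + 1e-12) / (1.0 - g))
```

A purely linear change such as `y = 2 * x` keeps the MSC near 1, giving a very high SDR. Adding independent noise of equal power pulls the MSC down towards 0.5, which corresponds to an SDR near 0 dB.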

Measures based on perceptual models:
Dau measure [4]
● based on the Dau model for the effective processing in the human auditory system
● calculates time-frequency domain internal representation of x and y
● the internal representation considers:
○ filter banks
○ spectral and temporal masking
○ hair cell transformation
○ nonlinear adaptation → realistic dynamic compression and temporal masking effects
● the measure is the average normalized linear correlation coefficient taken across overlapping frames of the internal representation signals
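Only the final step of the Dau measure is easy to show in isolation. This sketch assumes the internal representations of x and y have already been computed by an auditory model and stored as time × channel arrays. The CASP/Dau model in [4] is one such model; it is not implemented here. The measure then averages the normalized (Pearson) correlation between corresponding overlapping frames. The frame length and hop, counted in samples of the internal representation, are hypothetical, as is the function name.

```python
import numpy as np


def mean_frame_correlation(ref_rep, test_rep, frame_len=50, hop=25):
    """Average Pearson correlation between overlapping frames of two (time x channel) arrays."""
    ref_rep = np.asarray(ref_rep, dtype=float)
    test_rep = np.asarray(test_rep, dtype=float)
    scores = []
    for start in range(0, ref_rep.shape[0] - frame_len + 1, hop):
        a = ref_rep[start:start + frame_len].ravel()
        b = test_rep[start:start + frame_len].ravel()
        a = a - a.mean()
        b = b - b.mean()
        denom = np.linalg.norm(a) * np.linalg.norm(b)
        if denom > 0:  # skip frames where either representation is constant
            scores.append(float(a @ b) / denom)
    return float(np.mean(scores))
```

Because each frame is mean-removed and normalized, the score is insensitive to overall gain and offset in the test representation. Only the shape of the time-frequency pattern matters.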
Glimpse proportion [5]
● based on the Glimpse model
● calculates time-frequency domain internal representation of x and the noise
● the internal representation considers:
○ gammatone filter bank
● the measure is the proportion of time-frequency bins where the clean speech x has a higher energy level than the noise
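Below is a rough sketch of the glimpse proportion. For brevity it substitutes an STFT power spectrogram for the gammatone filter bank of the original model. It counts the fraction of time-frequency bins where the speech energy exceeds the noise energy by more than `threshold_db`. The default of 0 dB reproduces the description above; the original work uses a local SNR criterion of a few dB. The helper `power_spectrogram`, the window and hop sizes, and the parameter names are assumptions.

```python
import numpy as np


def power_spectrogram(x, nperseg=512, hop=256):
    """|STFT|^2 with a Hann window; rows are frames, columns are frequency bins."""
    x = np.asarray(x, dtype=float)
    win = np.hanning(nperseg)
    frames = [np.fft.rfft(win * x[s:s + nperseg]) for s in range(0, len(x) - nperseg + 1, hop)]
    return np.abs(np.array(frames)) ** 2


def glimpse_proportion(speech, noise, nperseg=512, hop=256, threshold_db=0.0):
    """Fraction of time-frequency bins whose local SNR exceeds threshold_db."""
    s_pow = power_spectrogram(speech, nperseg, hop)
    n_pow = power_spectrogram(noise, nperseg, hop)
    tiny = 1e-12
    local_snr_db = 10.0 * np.log10((s_pow + tiny) / (n_pow + tiny))
    return float(np.mean(local_snr_db > threshold_db))
```

Note that the speech and the noise must be available separately, as the bullets above state. The measure therefore suits simulations where the noise is added to clean speech artificially.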

HNS
● based on the Dau model as well, with an extra frequency weighting at the output (higher weights for higher frequencies)
PAR
● calculated in the frequency domain
● it is based on soft frequency masking thresholds
● designed for sinusoidal-type distortions (e.g. from sinusoidal audio coders)
TAA
● based on spectro-temporal masking curves
● computational complexity as low as that of a spectral masking model
● PAR combined with parts of the Dau model (log instead of NL, and the hair cell transformation)

Non-intrusive measures (do not need a clean reference speech signal)
ITU-T P.563 [6]
● vocal tract analysis
● speech reconstruction from corrected vocal tract parameters → reference signal
● parameter extraction and classification of degradations: low static SNR, mutes, low sSNR, unnatural voice, unnatural male voice, unnatural female voice
HMM-based approach for speech synthesis
● the measure is the normalized log-likelihood of features extracted from the synthesized signal, evaluated on an HMM trained with natural speech (the models are gender dependent)

Resources for distance-measure code
[1] P. Loizou, Speech Enhancement: Theory and Practice. Boca Raton: CRC Press, 2007. MATLAB code comes with the book (LLR, IS, CEP, WSS, FWS, nFWS, PESQ).
[2] P. Loizou, COLEA: A MATLAB software tool for speech analysis. Available at: http://www.utdallas.edu/~loizou/speech/colea.htm (LLR, IS, CEP, WSS, SNR).
[3] Implementation of the SII standard, MATLAB and C code. Available at: http://www.sii.to/html/programs.html (SII).
[4] Computational Auditory Signal Processing and Perception (CASP) model. Available on request: http://www.dtu.dk/centre/cahr/English/downloads.aspx (Dau model).
[5] Glimpse proportion measure. Code available on request from Martin Cooke.
[6] Implementation of the standard ITU-T P.563. Available at: http://www.itu.int/rec/T-REC-P.563-200405-I (P.563).
