【Information Technology】【2018.02】Robust Phase-Based Speech Signal Processing

This is a PhD thesis from the University of Sheffield, UK (author: Erfan Loweimi), 304 pages.

Fourier analysis plays a key role in speech signal processing. Since the Fourier spectrum is complex-valued, it can be expressed in polar form using the magnitude and phase spectra. The magnitude spectrum is widely used in almost every area of speech processing. The phase spectrum, however, is not an obviously appealing starting point for processing the speech signal. In contrast to the magnitude spectrum, whose fine and coarse structures have a clear relation to speech perception, the phase spectrum is difficult to interpret and manipulate: it shows no meaningful trend or extrema that could facilitate the modelling process. Nonetheless, the speech phase spectrum has recently gained renewed attention, and an expanding body of work shows that it can be usefully employed in a multitude of speech processing applications. Now that the potential of phase-based speech processing has been established, there is a need for a fundamental model to help understand the way in which phase encodes speech information.

In this thesis, a novel phase-domain source-filter model is proposed that allows the vocal tract (filter) and excitation (source) components of speech to be deconvolved through phase processing. The model utilises the Hilbert transform, shows how the excitation and vocal tract elements mix in the phase domain, and provides a framework for efficiently segregating the source and filter components through phase manipulation. To investigate the efficacy of this approach, a set of features for automatic speech recognition (ASR) is extracted from the filter part of the phase, and the source part of the phase is used for fundamental frequency estimation. Accuracy and robustness in both cases are illustrated and discussed.
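
To make the Hilbert-transform idea more concrete, here is a minimal sketch in Python (standard library only). It relies on the textbook relation for minimum-phase signals: the unwrapped phase of the minimum-phase component is fixed by ln|X(ω)| through a Hilbert transform, which can be computed by folding the real cepstrum. Because the speech spectrum is the product of the excitation and vocal-tract spectra, their log magnitudes add, and since the Hilbert transform is linear, their minimum-phase phases add too; a simple quefrency lifter then splits the phase into a smooth filter part and a fast-varying source part. The function name `split_min_phase`, the `cutoff` lifter parameter and the use of cepstral liftering are illustrative assumptions rather than the thesis's exact procedure, and the sketch ignores any all-pass (non-minimum-phase) remainder.

```python
import cmath
import math


def dft(x):
    """Naive O(N^2) DFT using the exp(-j 2 pi k m / N) convention (fine for short frames)."""
    n = len(x)
    return [sum(x[k] * cmath.exp(-2j * math.pi * k * m / n) for k in range(n))
            for m in range(n)]


def idft(spec):
    """Naive inverse DFT matching dft() above."""
    n = len(spec)
    return [sum(spec[m] * cmath.exp(2j * math.pi * k * m / n) for m in range(n)) / n
            for k in range(n)]


def split_min_phase(frame, cutoff, eps=1e-12):
    """Hypothetical helper: split the minimum-phase phase into filter and source parts.

    The minimum-phase (unwrapped) phase is the Hilbert transform of ln|X|
    (up to sign convention), obtained here from the folded real cepstrum.
    Quefrencies below `cutoff` (an assumed parameter) go to the vocal-tract
    filter; the remaining quefrencies go to the excitation source.
    """
    n = len(frame)
    log_mag = [math.log(abs(s) + eps) for s in dft(frame)]
    ceps = [c.real for c in idft(log_mag)]      # real (even) cepstrum
    folded = [0.0] * n                          # causal cepstrum of the min-phase part
    folded[0] = ceps[0]
    for k in range(1, (n + 1) // 2):
        folded[k] = 2.0 * ceps[k]
    if n % 2 == 0:
        folded[n // 2] = ceps[n // 2]
    low = [c if k < cutoff else 0.0 for k, c in enumerate(folded)]
    high = [c if k >= cutoff else 0.0 for k, c in enumerate(folded)]
    # Im{DFT(causal cepstrum)} is the minimum-phase phase; by linearity the parts add up.
    filter_phase = [z.imag for z in dft(low)]
    source_phase = [z.imag for z in dft(high)]
    return filter_phase, source_phase
```

As a sanity check, for a frame that is already minimum phase, such as `[1.0, 0.5]` zero-padded to 64 samples, `filter_phase[k] + source_phase[k]` reproduces the phase of the frame's DFT to within rounding error. For real speech frames an FFT would replace the naive DFT, and the frame would normally be windowed first.
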
In addition, the proposed approach is improved by replacing the log with the generalised logarithmic function in the Hilbert transform and by computing the group delay via a regression filter.

Furthermore, the statistical distribution of the phase spectrum and of its representations along the feature extraction pipeline is studied. It is shown that the phase spectrum has a bell-shaped distribution. Statistical normalisation methods such as mean-variance normalisation, Laplacianisation, Gaussianisation and histogram equalisation are successfully applied to the phase-based features and lead to a significant robustness improvement.
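
The two refinements and two of the normalisation methods named above can be sketched as follows (Python, standard library only). The generalised logarithm is assumed here to be the Box-Cox form (x^α − 1)/α, which tends to ln x as α → 0; the regression filter is assumed to be the delta-regression formula commonly used for dynamic features, applied across frequency bins instead of frames; Gaussianisation is implemented as rank-based histogram equalisation onto a standard normal. All function names and the `width` parameter are hypothetical. Such normalisations are typically applied to each feature dimension separately, over an utterance or a corpus.

```python
import math
import statistics


def gen_log(x, alpha):
    """Assumed generalised logarithm (x**alpha - 1) / alpha; equals ln(x) when alpha == 0."""
    if alpha == 0:
        return math.log(x)
    return (x ** alpha - 1.0) / alpha


def regression_group_delay(phase, d_omega, width=2):
    """Group delay tau = -d(phase)/d(omega) estimated with a delta-style regression filter.

    `phase` must be unwrapped and sampled every `d_omega` radians. Edge bins are
    handled by repeating the first/last value, so the outermost `width` bins are biased.
    """
    n = len(phase)
    norm = 2.0 * sum(m * m for m in range(1, width + 1))
    tau = []
    for k in range(n):
        acc = 0.0
        for m in range(1, width + 1):
            ahead = phase[min(k + m, n - 1)]
            behind = phase[max(k - m, 0)]
            acc += m * (ahead - behind)
        tau.append(-acc / (norm * d_omega))
    return tau


def mean_variance_normalise(values):
    """Shift to zero mean and scale to unit (population) variance."""
    mu = statistics.fmean(values)
    sd = statistics.pstdev(values)
    if sd == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sd for v in values]


def gaussianise(values):
    """Rank-based histogram equalisation onto a standard normal target distribution."""
    n = len(values)
    target = statistics.NormalDist()
    out = [0.0] * n
    for rank, idx in enumerate(sorted(range(n), key=values.__getitem__)):
        out[idx] = target.inv_cdf((rank + 0.5) / n)
    return out
```

Note that for α > 0 the generalised logarithm is bounded below by −1/α as x → 0, whereas ln x diverges, so near-zero spectral values are compressed far less aggressively. The regression filter trades a little frequency resolution for a smoother derivative than a plain first difference.
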

The robustness gain achieved through statistical normalisation and the generalised logarithmic function encouraged the use of more advanced model-based statistical techniques such as vector Taylor series (VTS). In its original formulation, VTS assumes that the log function is used for compression. To take advantage of both VTS and the generalised logarithmic function, a new formulation is first developed that merges the two into a unified framework called generalised VTS (gVTS). To further leverage the gVTS framework, a novel channel noise estimation method is also developed. The extension of the gVTS framework and the proposed channel estimation to the group delay domain is then explored: the problems it presents are analysed and discussed, some solutions are proposed and finally the corresponding formulae are derived. Moreover, the effects of additive noise and channel distortion in the phase and group delay domains are scrutinised, and the results are used in deriving the gVTS equations. Experimental results on the Aurora-4 ASR task, using an HMM/GMM setup along with a DNN-based bottleneck system in both clean and multi-style training modes, confirm the efficacy of the proposed approach in dealing with both additive and channel noise.
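
The core gVTS idea can be sketched for a single feature dimension as follows (Python, standard library only). This assumes the usual power-domain distortion model Y = X·H + N with the speech/noise cross term neglected, the Box-Cox form ln_α(z) = (z^α − 1)/α as the generalised logarithm, a deterministic channel, and a first-order Taylor expansion about the clean-speech, noise and channel means. The function names are hypothetical, and the thesis's own gVTS derivation may differ in its choice of domain or in how the cross term is handled.

```python
import math


def gen_log(z, alpha):
    """Assumed generalised logarithm (z**alpha - 1) / alpha (natural log when alpha == 0)."""
    if alpha == 0:
        return math.log(z)
    return (z ** alpha - 1.0) / alpha


def inv_gen_log(y, alpha):
    """Inverse of gen_log: maps a compressed feature back to the power domain."""
    if alpha == 0:
        return math.exp(y)
    return (1.0 + alpha * y) ** (1.0 / alpha)


def gvts_mismatch(x, n, h, alpha):
    """Noisy feature y from clean speech x, additive noise n and channel h.

    All inputs are in the compressed domain; assumes Y = X*H + N in the power
    domain with the speech/noise cross term neglected.
    """
    big_x, big_n, big_h = (inv_gen_log(v, alpha) for v in (x, n, h))
    return gen_log(big_x * big_h + big_n, alpha)


def gvts_jacobians(x, n, h, alpha):
    """Analytic partial derivatives (dy/dx, dy/dn, dy/dh) of gvts_mismatch."""
    big_x, big_n, big_h = (inv_gen_log(v, alpha) for v in (x, n, h))
    scale = (big_x * big_h + big_n) ** (alpha - 1.0)
    dy_dx = scale * big_h * big_x ** (1.0 - alpha)
    dy_dn = scale * big_n ** (1.0 - alpha)
    dy_dh = scale * big_x * big_h ** (1.0 - alpha)
    return dy_dx, dy_dn, dy_dh


def compensate(mu_x, var_x, mu_n, var_n, mu_h, alpha):
    """First-order VTS: noisy mean and variance for one diagonal-covariance dimension.

    The channel is treated as deterministic (no variance term).
    """
    mu_y = gvts_mismatch(mu_x, mu_n, mu_h, alpha)
    g_x, g_n, _ = gvts_jacobians(mu_x, mu_n, mu_h, alpha)
    var_y = g_x ** 2 * var_x + g_n ** 2 * var_n
    return mu_y, var_y
```

With `alpha = 0` the mismatch function collapses to the familiar log-domain VTS form y = x + h + ln(1 + e^(n − x − h)), which makes a convenient check. Carrying this over to the group delay domain, as described above, requires further derivation that this sketch does not attempt.
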

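Introduction
Background and Related Work
Phase Information
Source-Filter Separation in the Phase Domain
Generalised VTS in the Phase/Group Delay Domain for Robust ASR
Conclusions and Future Work
Appendix A: The Hilbert Transform
Appendix B: The Generalised Vector Taylor Series (gVTS) Approach for Robust ASR
Appendix C: Channel Noise Estimation Based on Generalised Vector Taylor Series
Appendix D: Deep Neural Networks for ASR
Appendix E: Description of the Databases Used
Appendix F: Review of Feature Extraction Techniques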
