Python: splitting a (speech) signal into blocks and computing short-term energy and zero-crossing rate

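Intended learning outcomes (ILOs):
You should
understand how to split a (speech) signal into blocks (frames) and run an analysis/transformation on these blocks
compute the short-term energy and the zero-crossing rate and visualise them to distinguish voiced from unvoiced speech parts
understand the basics of correlation and be able to implement a correlation estimator
understand the basic usage of the Python commands: enumerate; range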

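As in the previous lab sheets, we first use the curl command to download a wave file s[k] from https://staffwww.dcs.shef.ac.uk/people/S.Goetze/sound/speech_8kHz_murder.wav and load it into the variable s. You will also need the file's sampling frequency later, so store it in the variable fs.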

Task 1: Load and display the speech audio signal
Load the wave file above and display the signal in the time domain.

# Let's do the usual necessary and nice-to-have imports
%matplotlib inline
import matplotlib.pyplot as plt  # plotting
import seaborn as sns; sns.set() # styling (optional)
import numpy as np               # math

# download the speech example file
s_file_name = 'speech_8kHz_murder.wav'
!curl https://staffwww.dcs.shef.ac.uk/people/S.Goetze/sound/{s_file_name} -o {s_file_name}

import soundfile as sf
from IPython import display as ipd

# load speech wave into variable
s, fs = sf.read(s_file_name)
print('File "' + s_file_name + '" loaded. Its sampling rate is ' + str(fs) + ' Hz.')

# listen to the sound file (if you want)
ipd.Audio(s, rate=fs)

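Task 2: Block processing
To illustrate block processing, we first cut out 4096 samples, starting 10 seconds into the audio, i.e. from sample number 80,000 at a sampling rate of fs=8000Hz.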

# let's cut out a piece of the data and visualise it
start_sample = int(10*fs)                         # start at 10 sec
#print(start_sample)
no_of_samples = 4096                              # no of samples we want to cut out
end_sample   = start_sample + no_of_samples       # end of the cut (exclusive)
sample_vec = np.arange(start_sample, end_sample)  # integer sample indices of the piece
x1 = s[start_sample:end_sample]

plt.figure(figsize=(8,5))
plt.subplot(2,1,1)
plt.plot(np.arange(0,len(s)),s)
plt.ylabel('$x_1[k]$')
plt.plot(sample_vec,x1,'r')
plt.subplot(2,1,2)
plt.plot(sample_vec,x1,'r')
plt.xlabel('$k$')
plt.ylabel('$x_1$[' + str(start_sample) + '...' + str(end_sample) + ']')
plt.title('$x_1[k]$ for ' + str(len(x1)) + ' samples between ' + str(start_sample) + ' and ' + str(end_sample) + ' ($f_s$=' + str(fs) +')')
plt.tight_layout() # this allows for some space for the title text.

Why is end_sample computed by adding rather than subtracting?
What does x1=s[start_sample:end_sample] mean?
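
The two questions above can be explored on a toy array. This is only a minimal sketch: the names toy, start, count, end and piece are made up for illustration and stand in for s, start_sample, no_of_samples, end_sample and x1. The end index lies count samples after the start, hence the addition, and a slice a[start:end] returns the elements from index start up to, but not including, index end.

```python
import numpy as np

toy = np.arange(100, 120)   # hypothetical stand-in for the signal s (20 samples)
start = 5                   # plays the role of start_sample
count = 4                   # plays the role of no_of_samples
end = start + count         # plays the role of end_sample: start plus length

piece = toy[start:end]      # elements at indices 5, 6, 7, 8 (index 9 is excluded)
print(piece)                # [105 106 107 108]
print(len(piece) == count)  # True: the slice has exactly `count` elements
```

Because the end index is exclusive, the length of the slice is simply end minus start.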

As speech processing algorithms usually do, we now follow a block-processing (sliding-window) approach.

# let's define the frame parameters
Lw   = 512                    # frame length
Lov  = 1                      # frame overlap factor
Lhop = int(np.round(Lw/Lov))  # frame hop size

# creating grid of axes for subplots
plt.figure(figsize=(12, 6))
ax1 = plt.subplot2grid(shape=(2, 5), loc=(0, 0), colspan=5)
ax2 = plt.subplot2grid(shape=(2, 5), loc=(1, 0), colspan=1)
ax3 = plt.subplot2grid(shape=(2, 5), loc=(1, 1), colspan=1)
ax4 = plt.subplot2grid(shape=(2, 5), loc=(1, 2), colspan=1)
ax5 = plt.subplot2grid(shape=(2, 5), loc=(1, 3), colspan=1)
ax6 = plt.subplot2grid(shape=(2, 5), loc=(1, 4), colspan=1)
ax_blocks = [ax2, ax3, ax4, ax5, ax6]

# plot signal part in upper panel (axis ax1)
ax1.plot(sample_vec,x1,'r')
ax1.set_xlabel('$k$')
ax1.set_ylabel('$x_1$[' + str(start_sample) + '...' + str(end_sample) + ']')
ax1.set_title('A piece between sample ' + str(start_sample) + ' and ' + str(end_sample) + ' (of length ' + str(len(x1)) + ') from the 1st channel ($f_s$=' + str(fs) +')')

# plot single blocks in lower panels
clrs = ['g','y','m','b','c','k'] # define a vector of colours
for ii,k in enumerate(range(start_sample,start_sample+5*Lhop,Lhop)):
    block_k_vec = np.arange(k,k+Lw)          # absolute sample indices of this block
    block_sig_vec = x1[ii*Lhop:ii*Lhop+Lw]   # samples of this block (indices relative to x1)
    ax1.plot(block_k_vec,block_sig_vec,clrs[ii])
    ax_blocks[ii].plot(block_k_vec,block_sig_vec,clrs[ii])
    ax_blocks[ii].set_xlabel('$k$')
    ax_blocks[ii].set_ylim(-0.35, 0.35)
    ax_blocks[ii].set_title('Block ' + str(ii))

# automatically adjust padding horizontally
# as well as vertically.
plt.tight_layout()

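np.round rounds to the nearest value, optionally to a given number of decimals, but values exactly halfway between two candidates are rounded to the nearest even one (e.g. 1.5 and 2.5 both round to 2).
colspan is short for "column span". In HTML the colspan attribute is used in the td tag to specify how many columns a cell spans horizontally; in plt.subplot2grid it plays the same role and sets how many grid columns an axes spans.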
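
The rounding behaviour of np.round noted above can be checked directly. A minimal sketch with made-up input values (halves and rounded are illustrative names); the last lines repeat the hop-size computation from the cell above for Lw = 512 and Lov = 1.

```python
import numpy as np

halves = np.array([0.5, 1.5, 2.5, 3.5])
rounded = np.round(halves)          # exact halves go to the nearest even value
print(rounded)                      # [0. 2. 2. 4.]

print(np.round(3.14159, 2))         # second argument: number of decimals

# Frame hop as computed in the cell above, for Lw = 512 and Lov = 1
Lw, Lov = 512, 1
Lhop = int(np.round(Lw / Lov))
print(Lhop)                         # 512
```

For Lov = 1 the hop equals the frame length, so consecutive frames do not overlap.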

np.arange(k,k+Lw): what is the initial value of k?
block_sig_vec?
ax.set_xlim, ax.set_ylim: set the lower and upper limits of the x and y axes
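
On the question about k: range(start, stop, step) yields start, start+step, and so on while staying below stop, and enumerate pairs each of these values with a counter that starts at 0. A minimal sketch, assuming fs = 8000 Hz so that start_sample = 80000 (pairs is an illustrative name):

```python
start_sample = 80000   # assumed: 10 s at fs = 8000 Hz, as in the text above
Lw = 512               # frame length
Lhop = 512             # hop size (Lov = 1, i.e. no overlap)

pairs = list(enumerate(range(start_sample, start_sample + 5 * Lhop, Lhop)))
for ii, k in pairs:
    # ii counts the blocks (0..4); k is the absolute index of the block's first sample
    print(ii, k, 'slice of x1:', ii * Lhop, 'to', ii * Lhop + Lw)
```

So k starts at start_sample and grows by Lhop per block, while block_sig_vec is cut from x1 with indices relative to the start of x1 (ii*Lhop up to ii*Lhop+Lw).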

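The figure above shows some frames/blocks of length LBl=512 in the lower panels. Given the sampling frequency fs=8000Hz, we can calculate the duration of one block:
LBl = 512 samples / 8000 Hz = 64 ms
Since we know that speech signals are (short-term) stationary within time windows of ≈10…30ms, we can conclude that using shorter frames (i.e. a smaller LBl) is probably a good idea. This can also be seen in the last two frames (blue and cyan), where a change in the statistics can be observed even in the time-domain signal (e.g. the first half of block 3 looks quite different from its second half).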
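
The arithmetic above, and the move to shorter frames, can be sketched as follows. This is an illustration under assumptions, not the lab's code: the 20 ms target, the 50 % overlap, the random placeholder signal x and the helper split_into_frames are all hypothetical choices.

```python
import numpy as np

fs = 8000                                  # sampling frequency in Hz (as in the text)
print(1000 * 512 / fs)                     # duration of a 512-sample block in ms: 64.0

frame_ms = 20                              # assumed target duration inside 10...30 ms
Lw = fs * frame_ms // 1000                 # 160 samples at 8 kHz

def split_into_frames(x, frame_len, hop):
    """Hypothetical helper: stack frames of length frame_len, taken every hop samples."""
    n_frames = 1 + (len(x) - frame_len) // hop
    return np.stack([x[i * hop : i * hop + frame_len] for i in range(n_frames)])

x = np.random.default_rng(0).standard_normal(4096)   # placeholder for x1
frames = split_into_frames(x, Lw, Lw // 2)            # 50 % overlap
print(frames.shape)                                   # (50, 160)
```

Each row of frames is one block, so a per-block analysis becomes an operation along axis 1.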

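Task 3: Voiced/unvoiced analysis
The two example files used below (voiced_unvoiced_e.wav and voiced_unvoiced_z.wav) have different sampling frequencies fs, so we store their sampling frequencies in separate variables fs_e and fs_z.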

file_name = 'voiced_unvoiced_e.wav'
#file_name = 'word_fish.wav'         # another one to play around with (if you like)

# download speech and noise example files
!curl https://staffwww.dcs.shef.ac.uk/people/S.Goetze/sound/{file_name} -o {file_name}
# load speech wave into variable
e, fs_e = sf.read(file_name)
print('File "' + file_name + '" loaded. It has a sampling rate of f_s = ' + str(fs_e) + ' Hz.')

file_name = 'voiced_unvoiced_z.wav'
#file_name = 'word_speech.wav'     # another one to play around with (if you like)

# download speech and noise example files
!curl https://staffwww.dcs.shef.ac.uk/people/S.Goetze/sound/{file_name} -o {file_name}
# load speech wave into variable
z, fs_z = sf.read(file_name)
print('File "' + file_name + '" loaded. It has a sampling rate of f_s = ' + str(fs_z) + ' Hz.')

# listen to the sound file (if you want)
ipd.Audio(e, rate=fs_e)

ipd.Audio(z, rate=fs_z)

Task 4: Short-term energy and zero-crossing rate (ZCR)
Since we will reuse the code several times, we put it into a function.

def calc_STE(signal, sampsPerFrame):
    nFrames = int(len(signal) / sampsPerFrame)        # number of non-overlapping frames
    E = np.zeros(nFrames)
    for frame in range(nFrames):
        startIdx = frame * sampsPerFrame
        stopIdx = startIdx + sampsPerFrame
        E[frame] = np.sum(signal[startIdx:stopIdx] ** 2)
    return E

Does nFrames = LBl-1?
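
On the question above: in calc_STE the number of frames is not tied to LBl-1. It is the signal length divided by the frame length, rounded down, so trailing samples that do not fill a whole frame are dropped. A minimal sketch with a made-up signal of 1000 ones; the vectorised line at the end is an assumption added for comparison, not part of the lab code.

```python
import numpy as np

def calc_STE(signal, sampsPerFrame):
    # same logic as the function above, repeated so the sketch is self-contained
    nFrames = int(len(signal) / sampsPerFrame)
    E = np.zeros(nFrames)
    for frame in range(nFrames):
        startIdx = frame * sampsPerFrame
        E[frame] = np.sum(signal[startIdx:startIdx + sampsPerFrame] ** 2)
    return E

signal = np.ones(1000)          # made-up signal: 1000 samples, all equal to 1
E = calc_STE(signal, 160)       # 160 samples per frame (20 ms at 8 kHz)
print(len(E))                   # 6, since 1000 // 160 = 6; the last 40 samples are unused
print(E)                        # each frame has energy 160 (sum of 160 ones squared)

# Vectorised alternative (an assumption, not the lab's code): reshape, then sum per row
E_vec = np.sum(signal[:6 * 160].reshape(6, 160) ** 2, axis=1)
```

Both variants give the same six energy values here; the reshape version only works because the frames do not overlap.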

signal = e
sampsPerFrame = int(0.02 * fs_e) #20 ms

# creating grid for subplots
plt.figure(figsize=(12, 6))
plt.subplot(2,1,1)
plt.plot(signal)
plt.subplot(2,1,2)
plt.plot(calc_STE(signal, sampsPerFrame))
plt.title(r'(Short-term) Energy per block ($L_{\mathrm{Bl}}=' + str(sampsPerFrame) + '$, which is ' + str(sampsPerFrame/fs_e*1000) + 'ms @ $f_s=' + str(fs_e) +'$)')
#plt.text(18,0.3, 'Short Term Energy is higher for voiced speech parts', style='italic',
#        bbox={'facecolor': 'red', 'alpha': 0.5, 'pad': 10})
plt.tight_layout() # automatically adjust padding to make space for titles

ipd.Audio(e, rate=fs_e) # add possibility here to listen to the sound once again

signal = z
sampsPerFrame = int(0.02 * fs_z) #20 ms

# plot
plt.figure(figsize=(12, 6))
plt.subplot(2,1,1)
plt.plot(signal)
plt.subplot(2,1,2)
plt.plot(calc_STE(signal, sampsPerFrame))
plt.title(r'(Short-term) Energy per block ($L_{\mathrm{Bl}}=' + str(sampsPerFrame) + '$, which is ' + str(sampsPerFrame/fs_z*1000) + 'ms @ $f_s=' + str(fs_z) +'$)')
#plt.text(13,0.09, 'Short Term Energy is higher for voiced speech parts', style='italic',
#        bbox={'facecolor': 'red', 'alpha': 0.5, 'pad': 10})
plt.tight_layout() # automatically adjust padding to make space for titles

ipd.Audio(z, rate=fs_z) # add possibility here to listen to the sound once again

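The signal above is less voiced than the "e" sound we analysed before. We see that the computed (short-term) energy is lower than in the previous example.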

As before, we put the code that actually computes the ZCR into a function so that it can be reused.

def calc_ZCR(signal, sampsPerFrame):
    nFrames = int(len(signal) / sampsPerFrame)        # number of non-overlapping frames
    ZCR = np.zeros(nFrames)
    for frame in range(nFrames):
        startIdx = frame * sampsPerFrame
        stopIdx = startIdx + sampsPerFrame
        signalframe = signal[startIdx:stopIdx]
        for k in range(1, len(signalframe)):
            # a sign change contributes 0.5 * |(+1) - (-1)| = 1 to the count
            ZCR[frame] += 0.5 * abs(np.sign(signalframe[k]) - np.sign(signalframe[k - 1]))
    return ZCR
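
The inner loop of calc_ZCR can also be written with array operations. The following is a minimal sketch of such a variant, not the lab's code: the function name calc_ZCR_vectorised and the two-tone test signal are made up for illustration. Like the loop version, it only counts sign changes between neighbouring samples inside a frame, not across frame boundaries.

```python
import numpy as np

def calc_ZCR_vectorised(signal, sampsPerFrame):
    """Hypothetical vectorised variant: sign changes per non-overlapping frame."""
    nFrames = len(signal) // sampsPerFrame
    frames = np.reshape(signal[:nFrames * sampsPerFrame], (nFrames, sampsPerFrame))
    signs = np.sign(frames)
    # |sign difference| is 2 for a full crossing, so 0.5 * sum counts crossings
    return 0.5 * np.sum(np.abs(np.diff(signs, axis=1)), axis=1)

# Made-up test signal: a 100 Hz sine followed by a 1000 Hz sine, fs = 8000 Hz
fs = 8000
t = np.arange(fs // 10) / fs                      # 0.1 s
sig = np.concatenate([np.sin(2 * np.pi * 100 * t + 0.1),
                      np.sin(2 * np.pi * 1000 * t + 0.1)])
zcr = calc_ZCR_vectorised(sig, 160)               # 20 ms frames
print(zcr)   # low counts for the first five frames, high counts for the last five
```

The low-frequency part produces few zero crossings per frame and the high-frequency part many, which is the property used to tell voiced from unvoiced segments.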
