Speech: MFCC and Fbank Feature Extraction

Extract 12-dimensional MFCC features and 23-dimensional FBank features.
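For reference, the relations the listing relies on can be written out as follows. This is a summary read off the code itself, not an additional derivation; the symbols $N$ (number of samples), $L$ (frame length), and $S$ (frame shift) are notation introduced here for illustration.

```latex
% Pre-emphasis with coefficient \alpha = 0.97
y[0] = x[0], \qquad y[n] = x[n] - \alpha\, x[n-1] \quad (n \ge 1)

% Number of frames for N samples, frame length L, frame shift S
T = \left\lfloor \frac{N - L}{S} \right\rfloor + 1

% Hz-to-mel conversion and its inverse
m = 2595 \log_{10}\!\left(1 + \frac{f}{700}\right), \qquad
f = 700 \left(10^{\,m/2595} - 1\right)
```

With $L = 400$ and $S = 160$, one second of audio at 16 kHz ($N = 16000$) gives $T = 98$ frames.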

import librosa
import numpy as np
import matplotlib.pyplot as plt
import librosa.display
from scipy.fftpack import dct


# Plot a spectrogram
def plot_spectrogram(spec, note):
    fig = plt.figure(figsize=(20, 5))
    heatmap = plt.pcolor(spec)
    fig.colorbar(mappable=heatmap)
    # spec has one column per frame, so the x axis counts frames, not seconds
    plt.xlabel('Frame index')
    plt.ylabel(note)
    plt.tight_layout()


# Pre-emphasis config
alpha = 0.97  # pre-emphasis filter coefficient (α)

# Enframe config
frame_len = 400      # 25ms, fs=16kHz
frame_shift = 160    # 10ms, fs=16kHz
fft_len = 512        # FFT size: each 400-sample frame is zero-padded to 512 points

# Mel filter config
num_filter = 23
num_mfcc = 12

# Read wav file
# librosa resamples to 22050 Hz by default; pass sr=None to keep the file's original sampling rate.
wav, fs = librosa.load('./test.wav', sr=None)
plt.figure()
# waveshow replaces waveplot, which was removed in librosa 0.10
librosa.display.waveshow(wav, sr=fs)
plt.show()
# Pre-Emphasis
def preemphasis(signal, coeff=alpha):
    """perform preemphasis on the input signal.

    :param signal: The signal to filter.
    :param coeff: The preemphasis coefficient. 0 is no filter, default is 0.97.
    :returns: the filtered signal.
    """
    # y[0] = x[0]; y[n] = x[n] - coeff * x[n-1]
    return np.append(signal[0], signal[1:] - coeff * signal[:-1])


# Enframe with Hamming window function
def enframe(signal, frame_len=frame_len, frame_shift=frame_shift, win=np.hamming(frame_len)):
    """Enframe with Hamming window function.

    :param signal: The signal to be enframed
    :param win: window function, default Hamming
    :returns: the enframed signal, num_frames by frame_len array
    """
    num_samples = signal.size
    # Trailing samples that do not fill a whole frame are dropped.
    num_frames = int(np.floor((num_samples - frame_len) / frame_shift) + 1)
    frames = np.zeros((num_frames, frame_len))
    for i in range(num_frames):
        frames[i, :] = signal[i * frame_shift:i * frame_shift + frame_len]
        frames[i, :] = frames[i, :] * win
    return frames


def get_spectrum(frames, fft_len=fft_len):
    """Get spectrum using fft

    :param frames: the enframed signal, num_frames by frame_len array
    :param fft_len: FFT length, default 512
    :returns: spectrum, a num_frames by fft_len/2+1 array (real)
    """
    cFFT = np.fft.fft(frames, n=fft_len)
    # Keep only the non-negative frequencies; the rest mirror them for real input.
    valid_len = fft_len // 2 + 1
    spectrum = np.abs(cFFT[:, 0:valid_len])
    return spectrum


def fbank(spectrum, num_filter=num_filter):
    """Get mel filter bank feature from spectrum

    :param spectrum: a num_frames by fft_len/2+1 array(real)
    :param num_filter: mel filters number, default 23
    :returns: fbank feature, a num_frames by num_filter array
    DON'T FORGET LOG OPERATION AFTER MEL FILTER!
    """
    # Uses the module-level fs set by librosa.load above.
    low_freq_mel = 0
    high_freq_mel = 2595 * np.log10(1 + (fs / 2) / 700)
    # num_filter + 2 points evenly spaced on the mel scale: one extra point on each
    # side supplies the outer edges of the first and last triangular filters.
    mel_points = np.linspace(low_freq_mel, high_freq_mel, num_filter + 2)
    hz_points = 700 * (10 ** (mel_points / 2595) - 1)
    # Weight of each mel filter at every FFT bin of the spectrum
    feats = np.zeros((num_filter, fft_len // 2 + 1))
    # Fractional FFT-bin position of each mel point
    bins = (hz_points / (fs / 2)) * (fft_len / 2)
    for i in range(1, num_filter + 1):
        left = int(bins[i - 1])
        center = int(bins[i])
        right = int(bins[i + 1])
        for j in range(left, center):  # rising slope of the triangle
            feats[i - 1, j + 1] = (j + 1 - bins[i - 1]) / (bins[i] - bins[i - 1])
        for j in range(center, right):  # falling slope of the triangle
            feats[i - 1, j + 1] = (bins[i + 1] - (j + 1)) / (bins[i + 1] - bins[i])
    fbank = np.dot(spectrum, feats.T)
    fbank = np.where(fbank == 0, np.finfo(float).eps, fbank)  # avoid log(0)
    fbank = 20 * np.log10(fbank)  # dB
    plot_spectrogram(fbank.T, 'Fbank')
    plt.show()
    return fbank


def mfcc(fbank, num_mfcc=num_mfcc):
    """Get mfcc feature from fbank feature

    :param fbank: a num_frames by num_filter array(real)
    :param num_mfcc: mfcc number, default 12
    :returns: mfcc feature, a num_frames by num_mfcc array
    """
    # Type-II DCT along the filter axis; drop coefficient 0 and keep 1..num_mfcc.
    feats = dct(fbank, type=2, axis=1, norm='ortho')[:, 1:(num_mfcc + 1)]
    plot_spectrogram(feats.T, 'MFCC')
    plt.show()
    return feats


def write_file(feats, file_name):
    """Write the feature to file

    :param feats: a num_frames by feature_dim array(real)
    :param file_name: name of the file
    """
    (row, col) = feats.shape
    # One frame per line, values separated by spaces and wrapped in brackets
    with open(file_name, 'w') as f:
        for i in range(row):
            f.write('[')
            for j in range(col):
                f.write(str(feats[i, j]) + ' ')
            f.write(']\n')


def main():
    wav, fs = librosa.load('./test.wav', sr=None)
    signal = preemphasis(wav)
    frames = enframe(signal)
    spectrum = get_spectrum(frames)
    fbank_feats = fbank(spectrum)
    mfcc_feats = mfcc(fbank_feats)
    write_file(fbank_feats, './test.fbank')
    write_file(mfcc_feats, './test.mfcc')


if __name__ == '__main__':
    main()
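
The script above needs ./test.wav and opens plot windows, which makes it awkward to check quickly. The following is a minimal sketch, not part of the original post, that runs the same steps on a synthetic 1 kHz tone and prints the feature shapes. The helper name `extract_features` and the test tone are assumptions introduced for illustration; framing is done with index arrays and the spectrum with `np.fft.rfft`, which should give the same result as the loop and the `fft` slice in the listing.

```python
import numpy as np
from scipy.fftpack import dct


def extract_features(signal, fs=16000, frame_len=400, frame_shift=160,
                     fft_len=512, num_filter=23, num_mfcc=12, alpha=0.97):
    """Hypothetical helper: same steps as the listing, without plots or file I/O."""
    # Pre-emphasis: y[n] = x[n] - alpha * x[n-1]
    emphasized = np.append(signal[0], signal[1:] - alpha * signal[:-1])

    # Framing via an index matrix (num_frames x frame_len), then Hamming window
    num_frames = (emphasized.size - frame_len) // frame_shift + 1
    idx = np.arange(frame_len)[None, :] + frame_shift * np.arange(num_frames)[:, None]
    frames = emphasized[idx] * np.hamming(frame_len)

    # Magnitude spectrum of the non-negative frequencies (fft_len // 2 + 1 bins)
    spectrum = np.abs(np.fft.rfft(frames, n=fft_len))

    # Triangular mel filters, built the same way as in the listing
    high_mel = 2595 * np.log10(1 + (fs / 2) / 700)
    mel_points = np.linspace(0, high_mel, num_filter + 2)
    hz_points = 700 * (10 ** (mel_points / 2595) - 1)
    bins = hz_points / (fs / 2) * (fft_len / 2)
    weights = np.zeros((num_filter, fft_len // 2 + 1))
    for i in range(1, num_filter + 1):
        left, center, right = int(bins[i - 1]), int(bins[i]), int(bins[i + 1])
        for j in range(left, center):
            weights[i - 1, j + 1] = (j + 1 - bins[i - 1]) / (bins[i] - bins[i - 1])
        for j in range(center, right):
            weights[i - 1, j + 1] = (bins[i + 1] - (j + 1)) / (bins[i + 1] - bins[i])

    # Log filter-bank outputs in dB, guarding against log(0)
    energies = spectrum @ weights.T
    energies = np.where(energies == 0, np.finfo(float).eps, energies)
    fbank_feats = 20 * np.log10(energies)

    # MFCC: type-II DCT over the filter axis, keeping coefficients 1..num_mfcc
    mfcc_feats = dct(fbank_feats, type=2, axis=1, norm='ortho')[:, 1:num_mfcc + 1]
    return fbank_feats, mfcc_feats


fs = 16000
t = np.arange(fs) / fs                 # one second of samples
tone = np.sin(2 * np.pi * 1000 * t)    # synthetic 1 kHz test tone
fbank_feats, mfcc_feats = extract_features(tone, fs=fs)
print(fbank_feats.shape, mfcc_feats.shape)  # (98, 23) (98, 12)
```

One second at 16 kHz yields 98 frames, so the shapes confirm the 23-dimensional FBank and 12-dimensional MFCC features stated at the top of the post.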
