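Basic concepts

Given an audio file, loading it (for example with librosa.load) yields a one-dimensional signal of shape (M,), for example with sr = 22050 and t = 10.62 s.

Framing turns the one-dimensional signal of length M into a two-dimensional frame matrix:
number of rows = length of a single frame (wlen)
number of columns = number of frames (nf)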

0.1 Total number of samples

$$M = sr \times time = \text{sampling rate} \times \text{duration}$$

0.2 Number of overlapping samples per frame

Overlap = frame length - hop length (frame shift):
$$overlap = wlen - inc$$

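0.3 Number of frames after framing

Number of frames = (number of samples - overlap) / hop length, rounded down when the division is not exact, because an incomplete trailing frame is dropped:
$$nf = \left\lfloor \frac{M - overlap}{inc} \right\rfloor$$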

For example, with frame length = 2048 and hop length = 512:
overlap = 2048 - 512 = 1536

As a worked example, take the audio vector
$$[0, 1, 2, 3, 4, 5]$$
with M = 6 samples.
With a frame length of 4 and a hop length of 2, overlap = 4 - 2 = 2.
The number of frames after framing is then
nf = (6 - 2) / 2 = 2, that is, 2 frames:

$$\begin{bmatrix} 0 & 2 \\ 1 & 3 \\ 2 & 4 \\ 3 & 5 \end{bmatrix}$$
Each frame (column) contains 4 samples.
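The frame-count arithmetic above can be checked with a few lines of plain Python. This is only a sketch: the helper names num_frames and frame_matrix are made up for illustration and are not part of librosa, and 234171 is simply 22050 × 10.62 from the example at the top.

```python
def num_frames(m, wlen, inc):
    """Number of complete frames: floor((m - overlap) / inc), with overlap = wlen - inc."""
    if m < wlen:
        raise ValueError("signal is shorter than one frame")
    overlap = wlen - inc
    return (m - overlap) // inc


def frame_matrix(signal, wlen, inc):
    """Return a wlen x nf matrix as a list of rows; column j holds frame j."""
    nf = num_frames(len(signal), wlen, inc)
    return [[signal[j * inc + i] for j in range(nf)] for i in range(wlen)]


signal = [0, 1, 2, 3, 4, 5]
print(num_frames(len(signal), wlen=4, inc=2))   # 2
print(frame_matrix(signal, wlen=4, inc=2))      # [[0, 2], [1, 3], [2, 4], [3, 5]]

# The 10.62 s clip at sr = 22050 (M = 234171) with wlen = 2048, inc = 512
print(num_frames(234171, wlen=2048, inc=512))   # 454
```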

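Calling the library function

librosa.feature.mfcc(y=None, sr=22050, S=None, n_mfcc=20, dct_type=2, norm='ortho', lifter=0, **kwargs)

Returns a numpy.ndarray of shape (n_mfcc, t). The output is one feature vector per frame; t is the number of frames, which grows with the duration of the audio. Note that the source listing quoted later in this post does not include the lifter parameter shown in this signature.

The dimension of each frame's feature vector is set by the n_mfcc parameter: it is the number of leading coefficients kept after the discrete cosine transform (DCT), the last step of the MFCC computation. After the DCT most of the signal's information is concentrated in the low-order coefficients, so it is enough to keep the first few and discard the rest as redundant.

The default value of n_mfcc is 20; 13 and 24 are other common choices.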

---------------------------- Notes -------------------------------

LibROSA (this post uses version 0.6.3)
https://github.com/librosa/librosa

The mfcc function extracts Mel-Frequency Cepstral Coefficients (MFCCs) from audio. MFCCs are widely used in speech recognition.

The source code of LibROSA's mfcc function is as follows:

# -- Mel spectrogram and MFCCs -- #
def mfcc(y=None, sr=22050, S=None, n_mfcc=20, dct_type=2, norm='ortho', **kwargs):
    if S is None:
        S = power_to_db(melspectrogram(y=y, sr=sr, **kwargs))

    return scipy.fftpack.dct(S, axis=0, type=dct_type, norm=norm)[:n_mfcc]

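The mfcc function itself reveals little, so we need to look at the functions it calls. The sections below work from the bottom of the call chain upward until they reach mfcc again.

1. Audio preprocessing

The classic MFCC extraction pipeline consists of pre-emphasis, framing, windowing, fast Fourier transform (FFT), mel filter bank filtering, taking the logarithm, and discrete cosine transform (DCT).

1.1 Framing

LibROSA's MFCC extraction has no pre-emphasis step; it goes straight to framing. The framing function is librosa.util.frame(), whose source is: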

def frame(y, frame_length=2048, hop_length=512):
    '''Slice a time series into overlapping frames.

    This implementation uses low-level stride manipulation to avoid
    redundant copies of the time series data.

    Parameters
    ----------
    y : np.ndarray [shape=(n,)]
        Time series to frame. Must be one-dimensional and contiguous
        in memory.

    frame_length : int > 0 [scalar]
        Length of the frame in samples

    hop_length : int > 0 [scalar]
        Number of samples to hop between frames

    Returns
    -------
    y_frames : np.ndarray [shape=(frame_length, N_FRAMES)]
        An array of frames sampled from `y`:
        `y_frames[i, j] == y[j * hop_length + i]`

    Raises
    ------
    ParameterError
        If `y` is not contiguous in memory, not an `np.ndarray`, or
        not one-dimensional.  See `np.ascontiguous()` for details.

        If `hop_length < 1`, frames cannot advance.

        If `len(y) < frame_length`.
    '''

    if not isinstance(y, np.ndarray):
        raise ParameterError('Input must be of type numpy.ndarray, '
                             'given type(y)={}'.format(type(y)))

    if y.ndim != 1:
        raise ParameterError('Input must be one-dimensional, '
                             'given y.ndim={}'.format(y.ndim))

    if len(y) < frame_length:
        raise ParameterError('Buffer is too short (n={:d})'
                             ' for frame_length={:d}'.format(len(y), frame_length))

    if hop_length < 1:
        raise ParameterError('Invalid hop_length: {:d}'.format(hop_length))

    if not y.flags['C_CONTIGUOUS']:
        raise ParameterError('Input buffer must be contiguous.')

    # Compute the number of frames that will fit. The end may get truncated.
    n_frames = 1 + int((len(y) - frame_length) / hop_length)

    # Vertical stride is one sample
    # Horizontal stride is `hop_length` samples
    y_frames = as_strided(y, shape=(frame_length, n_frames),
                          strides=(y.itemsize, hop_length * y.itemsize))
    return y_frames

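As the source shows, LibROSA does the framing with NumPy's numpy.lib.stride_tricks.as_strided function (the frame length defaults to 2048 and the hop length to 512 if not specified); the implementation of as_strided is not examined here. frame turns an audio vector of n samples into a frame_length × n_frames matrix, that is, it converts a time series into a sequence of partially overlapping frames. If the samples do not fill a whole number of frames, the leftover samples at the end are dropped (the code comment says "The end may get truncated"); frame itself does no zero-padding. For example, for the audio vector [0, 1, 2, 3, 4, 5] with a frame length of 4 and a hop length of 2, the framed matrix is [[0, 2], [1, 3], [2, 4], [3, 5]], and each frame contains 4 samples.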
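The stride trick can be reproduced in isolation. The following is a minimal sketch that keeps only the core of the function quoted above, with a plain ValueError standing in for librosa's ParameterError; it is not the library function itself. Seven samples are used so that the truncation of the last sample is visible.

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided


def frame(y, frame_length=2048, hop_length=512):
    """Strided view of y with shape (frame_length, n_frames); no data is copied."""
    y = np.ascontiguousarray(y)
    if len(y) < frame_length:
        raise ValueError("buffer is too short for one frame")
    # Only complete frames are kept; leftover samples at the end are dropped.
    n_frames = 1 + (len(y) - frame_length) // hop_length
    # Moving down a column advances one sample; moving right advances one hop.
    return as_strided(
        y,
        shape=(frame_length, n_frames),
        strides=(y.itemsize, hop_length * y.itemsize),
    )


y = np.arange(7)                      # samples 0..6
frames = frame(y, frame_length=4, hop_length=2)
print(frames.shape)                   # (4, 2)
print(frames.tolist())                # [[0, 2], [1, 3], [2, 4], [3, 5]]
```

Sample 6 does not appear in the result: a third frame would need samples 4 to 7, so it is discarded rather than zero-padded. Because the result is a view onto y, it should be treated as read-only.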

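1.2 Windowing

The framed matrix is then windowed. (In the stft code shown in the next step, the comment `# Window the time series.` sits above the call to util.frame, but the window is actually multiplied into the frames afterwards, inside the rfft call, so the order is framing first, then windowing.) The purpose of windowing is to reduce the discontinuities at frame boundaries that framing introduces.

LibROSA builds the window with librosa.filters.get_window, whose source is: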

def get_window(window, Nx, fftbins=True):
    '''Compute a window function.

    This is a wrapper for `scipy.signal.get_window` that additionally
    supports callable or pre-computed windows.

    Parameters
    ----------
    window : string, tuple, number, callable, or list-like
        The window specification:

        - If string, it's the name of the window function (e.g., `'hann'`)
        - If tuple, it's the name of the window function and any parameters
          (e.g., `('kaiser', 4.0)`)
        - If numeric, it is treated as the beta parameter of the `'kaiser'`
          window, as in `scipy.signal.get_window`.
        - If callable, it's a function that accepts one integer argument
          (the window length)
        - If list-like, it's a pre-computed window of the correct length `Nx`

    Nx : int > 0
        The length of the window

    fftbins : bool, optional
        If True (default), create a periodic window for use with FFT
        If False, create a symmetric window for filter design applications.

    Returns
    -------
    get_window : np.ndarray
        A window of length `Nx` and type `window`

    See Also
    --------
    scipy.signal.get_window

    Notes
    -----
    This function caches at level 10.

    Raises
    ------
    ParameterError
        If `window` is supplied as a vector of length != `n_fft`,
        or is otherwise mis-specified.
    '''
    if six.callable(window):
        return window(Nx)

    elif (isinstance(window, (six.string_types, tuple)) or
          np.isscalar(window)):
        # TODO: if we add custom window functions in librosa, call them here

        return scipy.signal.get_window(window, Nx, fftbins=fftbins)

    elif isinstance(window, (np.ndarray, list)):
        if len(window) == Nx:
            return np.asarray(window)

        raise ParameterError('Window size mismatch: '
                             '{:d} != {:d}'.format(len(window), Nx))
    else:
        raise ParameterError('Invalid window specification: {}'.format(window))

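As the source shows, for string, tuple, or scalar window specifications LibROSA simply calls scipy.signal.get_window() to build the window. SciPy provides many window functions, such as the Hamming, Hann, rectangular, and triangular windows (for the full list, see https://docs.scipy.org/doc/scipy/reference/signal.windows.html?highlight=scipy%20signal%20windows#module-scipy.signal.windows).

LibROSA uses the Hann window by default (window='hann'); the details of the individual window functions are not covered here.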
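As a rough illustration of what the default window looks like, the periodic Hann window (the fftbins=True case in the docstring above) can be written out directly with NumPy. This is a sketch of the textbook formula w[n] = 0.5 - 0.5 cos(2πn/N), not a copy of SciPy's implementation, and periodic_hann is a made-up helper name.

```python
import numpy as np


def periodic_hann(n):
    """Periodic Hann window of length n: w[k] = 0.5 - 0.5 * cos(2*pi*k / n)."""
    k = np.arange(n)
    return 0.5 - 0.5 * np.cos(2.0 * np.pi * k / n)


w = periodic_hann(8)
print(np.round(w, 3))

# Windowing a (frame_length, n_frames) matrix: reshape the window into a
# column so it broadcasts across frames, as fft_window.reshape((-1, 1)) does.
frames = np.ones((8, 3))
windowed = w.reshape((-1, 1)) * frames
print(windowed.shape)   # (8, 3)
```

The window is 0 at the first sample and peaks at 1 in the middle, which is what tapers each frame toward its edges.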

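1.3 Fast Fourier transform

After the previous steps, a fast Fourier transform (FFT) is applied frame by frame. Applying the FFT frame by frame is known as the short-time Fourier transform (also called the short-term Fourier transform, STFT). The source of LibROSA's librosa.core.stft() is: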

def stft(y, n_fft=2048, hop_length=None, win_length=None, window='hann',
         center=True, dtype=np.complex64, pad_mode='reflect'):
    """Short-time Fourier transform (STFT)

    Returns a complex-valued matrix D such that
        `np.abs(D[f, t])` is the magnitude of frequency bin `f`
        at frame `t`

        `np.angle(D[f, t])` is the phase of frequency bin `f`
        at frame `t`

    Parameters
    ----------
    y : np.ndarray [shape=(n,)], real-valued
        the input signal (audio time series)

    n_fft : int > 0 [scalar]
        FFT window size

    hop_length : int > 0 [scalar]
        number audio of frames between STFT columns.
        If unspecified, defaults `win_length / 4`.

    win_length : int <= n_fft [scalar]
        Each frame of audio is windowed by `window()`.
        The window will be of length `win_length` and then padded
        with zeros to match `n_fft`.

        If unspecified, defaults to ``win_length = n_fft``.

    window : string, tuple, number, function, or np.ndarray [shape=(n_fft,)]
        - a window specification (string, tuple, or number);
          see `scipy.signal.get_window`
        - a window function, such as `scipy.signal.hanning`
        - a vector or array of length `n_fft`

        .. see also:: `filters.get_window`

    center : boolean
        - If `True`, the signal `y` is padded so that frame
          `D[:, t]` is centered at `y[t * hop_length]`.
        - If `False`, then `D[:, t]` begins at `y[t * hop_length]`

    dtype : numeric type
        Complex numeric type for `D`.  Default is 64-bit complex.

    pad_mode : string
        If `center=True`, the padding mode to use at the edges of the signal.
        By default, STFT uses reflection padding.

    Returns
    -------
    D : np.ndarray [shape=(1 + n_fft/2, t), dtype=dtype]
        STFT matrix

    See Also
    --------
    istft : Inverse STFT
    ifgram : Instantaneous frequency spectrogram
    np.pad : array padding

    Notes
    -----
    This function caches at level 20.
    """

    # By default, use the entire frame
    if win_length is None:
        win_length = n_fft

    # Set the default hop, if it's not already specified
    if hop_length is None:
        hop_length = int(win_length // 4)

    fft_window = get_window(window, win_length, fftbins=True)

    # Pad the window out to n_fft size
    fft_window = util.pad_center(fft_window, n_fft)

    # Reshape so that the window can be broadcast
    fft_window = fft_window.reshape((-1, 1))

    # Check audio is valid
    util.valid_audio(y)

    # Pad the time series so that frames are centered
    if center:
        y = np.pad(y, int(n_fft // 2), mode=pad_mode)

    # Window the time series.
    y_frames = util.frame(y, frame_length=n_fft, hop_length=hop_length)

    # Pre-allocate the STFT matrix
    stft_matrix = np.empty((int(1 + n_fft // 2), y_frames.shape[1]),
                           dtype=dtype,
                           order='F')

    fft = get_fftlib()

    # how many columns can we fit within MAX_MEM_BLOCK?
    n_columns = int(util.MAX_MEM_BLOCK / (stft_matrix.shape[0] *
                                          stft_matrix.itemsize))

    for bl_s in range(0, stft_matrix.shape[1], n_columns):
        bl_t = min(bl_s + n_columns, stft_matrix.shape[1])

        stft_matrix[:, bl_s:bl_t] = fft.rfft(fft_window *
                                             y_frames[:, bl_s:bl_t],
                                             axis=0)
    return stft_matrix

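The stft source shows that LibROSA frames first and windows second: util.frame builds y_frames, and the expression fft_window * y_frames[:, bl_s:bl_t] applies the window to every frame just before rfft is called. The `# Window the time series.` comment above the util.frame call can give the opposite impression. Note also that with center=True the signal is reflection-padded by n_fft // 2 samples on each side before framing.

In the source, the variable fft is the FFT module returned by get_fftlib(), not an ndarray. The per-frame transform is that module's rfft function (NumPy's or SciPy's FFT implementation); rfft is the FFT for real-valued input, and here it fills the 1 + n_fft/2 rows of stft_matrix.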
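The whole pad, frame, window, rfft sequence fits in a few lines of NumPy. The sketch below is a simplified stand-in for the function above, not the library code: simple_stft is a made-up name, the window is hard-coded to a periodic Hann, win_length always equals n_fft, and the memory-blocking loop is omitted.

```python
import numpy as np


def simple_stft(y, n_fft=16, hop_length=4):
    """Centered STFT sketch: reflect-pad, frame, apply a Hann window, rfft each frame."""
    y = np.asarray(y, dtype=float)
    window = 0.5 - 0.5 * np.cos(2.0 * np.pi * np.arange(n_fft) / n_fft)
    # Pad so that frame t is centered on y[t * hop_length]
    y = np.pad(y, n_fft // 2, mode="reflect")
    n_frames = 1 + (len(y) - n_fft) // hop_length
    # idx[i, j] = j * hop_length + i, the same layout as util.frame
    idx = np.arange(n_fft)[:, None] + hop_length * np.arange(n_frames)[None, :]
    frames = y[idx]
    # Framing first, then windowing, then a real-input FFT down each column
    return np.fft.rfft(window[:, None] * frames, axis=0)


# A cosine with exactly 4 cycles per 16-sample frame lands in frequency bin 4
t = np.arange(64)
y = np.cos(2.0 * np.pi * 4 * t / 16)
D = simple_stft(y)
print(D.shape)                        # (9, 17)
print(np.argmax(np.abs(D[:, 8])))     # 4
```

The output has 1 + n_fft/2 = 9 rows and 1 + 64/4 = 17 columns, matching the shape given in the docstring.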

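1.4 Power spectrum

After the short-time Fourier transform, the magnitude (absolute value) is taken and then squared to obtain the power spectrogram.
This step is in librosa.core.spectrum._spectrogram():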

def _spectrogram(y=None, S=None, n_fft=2048, hop_length=512, power=1,
                 win_length=None, window='hann', center=True, pad_mode='reflect'):
    '''Helper function to retrieve a magnitude spectrogram.

    This is primarily used in feature extraction functions that can operate on
    either audio time-series or spectrogram input.

    Parameters
    ----------
    y : None or np.ndarray [ndim=1]
        If provided, an audio time series

    S : None or np.ndarray
        Spectrogram input, optional

    n_fft : int > 0
        STFT window size

    hop_length : int > 0
        STFT hop length

    power : float > 0
        Exponent for the magnitude spectrogram,
        e.g., 1 for energy, 2 for power, etc.

    win_length : int <= n_fft [scalar]
        Each frame of audio is windowed by `window()`.
        The window will be of length `win_length` and then padded
        with zeros to match `n_fft`.

        If unspecified, defaults to ``win_length = n_fft``.

    window : string, tuple, number, function, or np.ndarray [shape=(n_fft,)]
        - a window specification (string, tuple, or number);
          see `scipy.signal.get_window`
        - a window function, such as `scipy.signal.hanning`
        - a vector or array of length `n_fft`

        .. see also:: `filters.get_window`

    center : boolean
        - If `True`, the signal `y` is padded so that frame
          `t` is centered at `y[t * hop_length]`.
        - If `False`, then frame `t` begins at `y[t * hop_length]`

    pad_mode : string
        If `center=True`, the padding mode to use at the edges of the signal.
        By default, STFT uses reflection padding.

    Returns
    -------
    S_out : np.ndarray [dtype=np.float32]
        - If `S` is provided as input, then `S_out == S`
        - Else, `S_out = |stft(y, ...)|**power`

    n_fft : int > 0
        - If `S` is provided, then `n_fft` is inferred from `S`
        - Else, copied from input
    '''

    if S is not None:
        # Infer n_fft from spectrogram shape
        n_fft = 2 * (S.shape[0] - 1)
    else:
        # Otherwise, compute a magnitude spectrogram from input
        S = np.abs(stft(y, n_fft=n_fft, hop_length=hop_length,
                        win_length=win_length, center=center,
                        window=window, pad_mode=pad_mode))**power

    return S, n_fft

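Although power defaults to 1 in _spectrogram, the calling function melspectrogram passes power=2 (its own default is power=2.0, as shown in section 3), so the MFCC pipeline works on a power spectrogram.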


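2. Mel filter bank

Besides the power spectrogram, a mel filter bank has to be constructed and multiplied with the power spectrogram (a matrix product). The mel filter bank maps the power spectrum onto the mel frequency scale, which is closer to how the human ear perceives frequency.

LibROSA implements it in librosa.filters.mel():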

def mel(sr, n_fft, n_mels=128, fmin=0.0, fmax=None, htk=False,
        norm=1, dtype=np.float32):
    """Create a Filterbank matrix to combine FFT bins into Mel-frequency bins

    Parameters
    ----------
    sr        : number > 0 [scalar]
        sampling rate of the incoming signal

    n_fft     : int > 0 [scalar]
        number of FFT components

    n_mels    : int > 0 [scalar]
        number of Mel bands to generate

    fmin      : float >= 0 [scalar]
        lowest frequency (in Hz)

    fmax      : float >= 0 [scalar]
        highest frequency (in Hz).
        If `None`, use `fmax = sr / 2.0`

    htk       : bool [scalar]
        use HTK formula instead of Slaney

    norm : {None, 1, np.inf} [scalar]
        if 1, divide the triangular mel weights by the width of the mel band
        (area normalization).  Otherwise, leave all the triangles aiming for
        a peak value of 1.0

    dtype : np.dtype
        The data type of the output basis.
        By default, uses 32-bit (single-precision) floating point.

    Returns
    -------
    M         : np.ndarray [shape=(n_mels, 1 + n_fft/2)]
        Mel transform matrix

    Notes
    -----
    This function caches at level 10.
    """

    if fmax is None:
        fmax = float(sr) / 2

    if norm is not None and norm != 1 and norm != np.inf:
        raise ParameterError('Unsupported norm: {}'.format(repr(norm)))

    # Initialize the weights
    n_mels = int(n_mels)
    weights = np.zeros((n_mels, int(1 + n_fft // 2)), dtype=dtype)

    # Center freqs of each FFT bin
    fftfreqs = fft_frequencies(sr=sr, n_fft=n_fft)

    # 'Center freqs' of mel bands - uniformly spaced between limits
    mel_f = mel_frequencies(n_mels + 2, fmin=fmin, fmax=fmax, htk=htk)

    fdiff = np.diff(mel_f)
    ramps = np.subtract.outer(mel_f, fftfreqs)

    for i in range(n_mels):
        # lower and upper slopes for all bins
        lower = -ramps[i] / fdiff[i]
        upper = ramps[i+2] / fdiff[i+1]

        # .. then intersect them with each other and zero
        weights[i] = np.maximum(0, np.minimum(lower, upper))

    if norm == 1:
        # Slaney-style mel is scaled to be approx constant energy per channel
        enorm = 2.0 / (mel_f[2:n_mels+2] - mel_f[:n_mels])
        weights *= enorm[:, np.newaxis]

    # Only check weights if f_mel[0] is positive
    if not np.all((mel_f[:-2] == 0) | (weights.max(axis=1) > 0)):
        # This means we have an empty channel somewhere
        warnings.warn('Empty filters detected in mel frequency basis. '
                      'Some channels will produce empty responses. '
                      'Try increasing your sampling rate (and fmax) or '
                      'reducing n_mels.')

    return weights

If not specified, LibROSA uses 128 mel filters by default (n_mels=128).

Mel filters themselves will be covered in a separate post.
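To make the triangle construction concrete, here is a small NumPy sketch of the same ramp logic. It makes several simplifying assumptions that differ from the library defaults: it uses the HTK mel formula mel = 2595 log10(1 + f/700) (the htk=True option, whereas the default is the Slaney formula), it skips the area normalization (so it corresponds to norm=None), and the helper names are made up.

```python
import numpy as np


def hz_to_mel_htk(f):
    """HTK-style Hz -> mel conversion."""
    return 2595.0 * np.log10(1.0 + np.asarray(f, dtype=float) / 700.0)


def mel_to_hz_htk(m):
    """HTK-style mel -> Hz conversion (inverse of hz_to_mel_htk)."""
    return 700.0 * (10.0 ** (np.asarray(m, dtype=float) / 2595.0) - 1.0)


def mel_filterbank(sr, n_fft, n_mels=10):
    """Triangular filters of shape (n_mels, 1 + n_fft // 2), each peaking at most at 1."""
    # Center frequency of each FFT bin
    fftfreqs = np.linspace(0.0, sr / 2.0, 1 + n_fft // 2)
    # n_mels + 2 band edges, evenly spaced on the mel scale, converted back to Hz
    mel_pts = np.linspace(hz_to_mel_htk(0.0), hz_to_mel_htk(sr / 2.0), n_mels + 2)
    mel_f = mel_to_hz_htk(mel_pts)

    fdiff = np.diff(mel_f)
    ramps = np.subtract.outer(mel_f, fftfreqs)
    weights = np.zeros((n_mels, 1 + n_fft // 2))
    for i in range(n_mels):
        lower = -ramps[i] / fdiff[i]         # rising slope from edge i
        upper = ramps[i + 2] / fdiff[i + 1]  # falling slope toward edge i + 2
        weights[i] = np.maximum(0.0, np.minimum(lower, upper))
    return weights


fb = mel_filterbank(sr=22050, n_fft=512, n_mels=10)
print(fb.shape)   # (10, 257)
```

Each row is one triangular filter; the triangles are narrow at low frequencies and wider at high frequencies because the band edges are evenly spaced in mel, not in Hz.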

3. Mel spectrogram

Taking the matrix product of the mel filter bank and the power spectrogram gives the mel spectrogram. This step is in librosa.feature.melspectrogram():

def melspectrogram(y=None, sr=22050, S=None, n_fft=2048, hop_length=512,
                   win_length=None, window='hann', center=True, pad_mode='reflect',
                   power=2.0, **kwargs):
    """Compute a mel-scaled spectrogram.

    If a spectrogram input `S` is provided, then it is mapped directly onto
    the mel basis `mel_f` by `mel_f.dot(S)`.

    If a time-series input `y, sr` is provided, then its magnitude spectrogram
    `S` is first computed, and then mapped onto the mel scale by
    `mel_f.dot(S**power)`.  By default, `power=2` operates on a power spectrum.

    Parameters
    ----------
    y : np.ndarray [shape=(n,)] or None
        audio time-series

    sr : number > 0 [scalar]
        sampling rate of `y`

    S : np.ndarray [shape=(d, t)]
        spectrogram

    n_fft : int > 0 [scalar]
        length of the FFT window

    hop_length : int > 0 [scalar]
        number of samples between successive frames.
        See `librosa.core.stft`

    win_length : int <= n_fft [scalar]
        Each frame of audio is windowed by `window()`.
        The window will be of length `win_length` and then padded
        with zeros to match `n_fft`.

        If unspecified, defaults to ``win_length = n_fft``.

    window : string, tuple, number, function, or np.ndarray [shape=(n_fft,)]
        - a window specification (string, tuple, or number);
          see `scipy.signal.get_window`
        - a window function, such as `scipy.signal.hanning`
        - a vector or array of length `n_fft`

        .. see also:: `filters.get_window`

    center : boolean
        - If `True`, the signal `y` is padded so that frame
          `t` is centered at `y[t * hop_length]`.
        - If `False`, then frame `t` begins at `y[t * hop_length]`

    pad_mode : string
        If `center=True`, the padding mode to use at the edges of the signal.
        By default, STFT uses reflection padding.

    power : float > 0 [scalar]
        Exponent for the magnitude melspectrogram.
        e.g., 1 for energy, 2 for power, etc.

    kwargs : additional keyword arguments
        Mel filter bank parameters.
        See `librosa.filters.mel` for details.

    Returns
    -------
    S : np.ndarray [shape=(n_mels, t)]
        Mel spectrogram

    See Also
    --------
    librosa.filters.mel
        Mel filter bank construction

    librosa.core.stft
        Short-time Fourier Transform
    """

    S, n_fft = _spectrogram(y=y, S=S, n_fft=n_fft, hop_length=hop_length, power=power,
                            win_length=win_length, window=window, center=center,
                            pad_mode=pad_mode)

    # Build a Mel filter
    mel_basis = filters.mel(sr, n_fft, **kwargs)

    return np.dot(mel_basis, S)
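The final np.dot(mel_basis, S) in the listing above is an ordinary matrix product, and the shapes show where the frequency axis shrinks. The toy example below uses placeholder data only: the "filter bank" is a made-up matrix whose rows all average the frequency bins, not a real mel basis, and the sizes are tiny so the result is easy to check by hand.

```python
import numpy as np

n_mels, n_bins, n_frames = 3, 5, 4

# Placeholder "filter bank": every row averages all frequency bins equally
mel_basis = np.full((n_mels, n_bins), 1.0 / n_bins)

# Stand-in power spectrogram of shape (frequency bins, frames)
S = np.arange(n_bins * n_frames, dtype=float).reshape(n_bins, n_frames)

# (n_mels, n_bins) x (n_bins, n_frames) -> (n_mels, n_frames)
mel_S = np.dot(mel_basis, S)
print(mel_S.shape)   # (3, 4)
print(mel_S[0])      # per-frame average over the frequency bins
```

With the defaults discussed in this post the same product maps 1 + 2048/2 = 1025 frequency bins down to 128 mel bands per frame.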

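4. Taking the logarithm of the mel spectrogram

For quiet sounds, the human ear notices even a small increase in loudness; once a sound is already loud, however, even a large further increase produces little change in perceived loudness. This property of hearing is referred to as its "logarithmic" characteristic.

The logarithm of the mel spectrogram is therefore taken to model this logarithmic characteristic of the ear.

LibROSA implements this step in librosa.core.power_to_db():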

def power_to_db(S, ref=1.0, amin=1e-10, top_db=80.0):
    """Convert a power spectrogram (amplitude squared) to decibel (dB) units

    This computes the scaling ``10 * log10(S / ref)`` in a numerically
    stable way.

    Parameters
    ----------
    S : np.ndarray
        input power

    ref : scalar or callable
        If scalar, the amplitude `abs(S)` is scaled relative to `ref`:
        `10 * log10(S / ref)`.
        Zeros in the output correspond to positions where `S == ref`.

        If callable, the reference value is computed as `ref(S)`.

    amin : float > 0 [scalar]
        minimum threshold for `abs(S)` and `ref`

    top_db : float >= 0 [scalar]
        threshold the output at `top_db` below the peak:
        ``max(10 * log10(S)) - top_db``

    Returns
    -------
    S_db : np.ndarray
        ``S_db ~= 10 * log10(S) - 10 * log10(ref)``

    See Also
    --------
    perceptual_weighting
    db_to_power
    amplitude_to_db
    db_to_amplitude

    Notes
    -----
    This function caches at level 30.
    """

    S = np.asarray(S)

    if amin <= 0:
        raise ParameterError('amin must be strictly positive')

    if np.issubdtype(S.dtype, np.complexfloating):
        warnings.warn('power_to_db was called on complex input so phase '
                      'information will be discarded. To suppress this warning, '
                      'call power_to_db(np.abs(D)**2) instead.')
        magnitude = np.abs(S)
    else:
        magnitude = S

    if six.callable(ref):
        # User supplied a function to calculate reference power
        ref_value = ref(magnitude)
    else:
        ref_value = np.abs(ref)

    log_spec = 10.0 * np.log10(np.maximum(amin, magnitude))
    log_spec -= 10.0 * np.log10(np.maximum(amin, ref_value))

    if top_db is not None:
        if top_db < 0:
            raise ParameterError('top_db must be non-negative')
        log_spec = np.maximum(log_spec, log_spec.max() - top_db)

    return log_spec
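Stripped of input validation, the conversion above comes down to three lines. The following sketch keeps only that core and assumes real-valued, non-negative input and a scalar ref; power_to_db_sketch is a made-up name, not the library function.

```python
import numpy as np


def power_to_db_sketch(S, ref=1.0, amin=1e-10, top_db=80.0):
    """10 * log10(S / ref), with a floor at amin and at top_db below the peak."""
    S = np.asarray(S, dtype=float)
    # Flooring at amin avoids log10(0) for silent bins
    log_spec = 10.0 * np.log10(np.maximum(amin, S))
    log_spec -= 10.0 * np.log10(max(amin, abs(ref)))
    if top_db is not None:
        # Limit the dynamic range to top_db below the loudest value
        log_spec = np.maximum(log_spec, log_spec.max() - top_db)
    return log_spec


print(power_to_db_sketch([1.0, 10.0, 100.0]))   # about 0, 10 and 20 dB
print(power_to_db_sketch([1e-12, 1.0]))         # about -80 and 0 dB
```

In the second call the tiny value is first floored to amin (which would give -100 dB) and then raised to -80 dB by the top_db limit.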

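5. Discrete cosine transform

The last step is the discrete cosine transform (DCT). Its purpose is to decorrelate the data and separate out the redundant part: after the transform, most of the signal's information is concentrated in the low-order coefficients, so usually only the first few transformed values need to be kept (LibROSA's mfcc function keeps the first 20 by default).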

# -- Mel spectrogram and MFCCs -- #
def mfcc(y=None, sr=22050, S=None, n_mfcc=20, dct_type=2, norm='ortho', **kwargs):
    """Mel-frequency cepstral coefficients (MFCCs)

    Parameters
    ----------
    y : np.ndarray [shape=(n,)] or None
        audio time series

    sr : number > 0 [scalar]
        sampling rate of `y`

    S : np.ndarray [shape=(d, t)] or None
        log-power Mel spectrogram

    n_mfcc: int > 0 [scalar]
        number of MFCCs to return

    dct_type : None, or {1, 2, 3}
        Discrete cosine transform (DCT) type.
        By default, DCT type-2 is used.

    norm : None or 'ortho'
        If `dct_type` is `2 or 3`, setting `norm='ortho'` uses an ortho-normal
        DCT basis.

        Normalization is not supported for `dct_type=1`.

    kwargs : additional keyword arguments
        Arguments to `melspectrogram`, if operating
        on time series input

    Returns
    -------
    M : np.ndarray [shape=(n_mfcc, t)]
        MFCC sequence

    See Also
    --------
    melspectrogram
    scipy.fftpack.dct
    """

    if S is None:
        S = power_to_db(melspectrogram(y=y, sr=sr, **kwargs))

    return scipy.fftpack.dct(S, axis=0, type=dct_type, norm=norm)[:n_mfcc]

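In this step LibROSA calls scipy.fftpack.dct() to compute the DCT along the mel axis (axis=0) and keeps the first n_mfcc values of every frame. This completes the MFCC extraction.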
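For readers who want to see what the DCT call does, here is a hand-written orthonormal DCT-II in NumPy followed by the same [:n_mfcc] truncation. It is a sketch under stated assumptions: dct2_ortho is a made-up helper that follows the standard DCT-II definition with orthonormal scaling (the dct_type=2, norm='ortho' defaults), and the input is random numbers standing in for a log-mel spectrogram with 128 bands and 5 frames.

```python
import numpy as np


def dct2_ortho(x):
    """Orthonormal DCT-II along axis 0 of a 2-D array."""
    x = np.asarray(x, dtype=float)
    n = x.shape[0]
    k = np.arange(n)[:, None]   # output coefficient index
    m = np.arange(n)[None, :]   # input sample index
    basis = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * m + 1) * k / (2.0 * n))
    basis[0] /= np.sqrt(2.0)    # extra scaling that makes row 0 unit length
    return basis @ x


# Stand-in for a log-mel spectrogram: 128 mel bands, 5 frames of random data
log_mel = np.random.default_rng(0).normal(size=(128, 5))

n_mfcc = 20
mfcc = dct2_ortho(log_mel)[:n_mfcc]   # keep the first n_mfcc coefficients per frame
print(mfcc.shape)                     # (20, 5)
```

Because the basis is orthonormal, the full transform preserves the energy of each column; truncating to n_mfcc rows is what actually reduces the feature dimension.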

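Summary: the function call chain LibROSA uses to extract MFCCs is shown in the figure below:

[Figure: the function call chain LibROSA uses to extract MFCC features from audio]

Following the MFCC extraction steps and walking the call chain from the bottom up reveals two characteristics of LibROSA's implementation:

There is no pre-emphasis step.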

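Unlike other libraries, which finish reducing the dimensionality at the STFT → mel step, LibROSA leaves the final reduction to the very end: all mel bands are kept through the logarithm and the DCT, and only then is the result truncated to n_mfcc coefficients.

For the second point, I have not yet looked into why it is done this way or whether it affects performance.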
