Contents

[(Parts 1 to 4) librosa notes: click here](https://blog.csdn.net/qq_44250700/article/details/119685358)
5. Spectral representations
- (1) Short-time Fourier transform (STFT)
- (2) Inverse short-time Fourier transform (ISTFT)
- (3) Instantaneous frequency: ifgram()
- (4) The constant-Q transform (CQT), widely used for music
- (5) icqt()
- (6) hybrid_cqt()
- (7) pseudo_cqt()
- (8) Fast Mellin transform: fmt()
- (8) Energy of harmonics in a time-frequency representation: interp_harmonics()
- (9) Harmonic salience function: salience()
- (10) Phase vocoder: phase_vocoder()
- (11) Magnitude and phase: magphase()
- (12) Time-frequency representation using IIR filters: iirt()
6. Magnitude scaling
- (1) amplitude_to_db()
- (2) db_to_amplitude()
- (3) power_to_db()
- (4) db_to_power(S_db, ref=1.0)
- (5) perceptual_weighting()
- (6) A_weighting()
- (7) pcen()
7. Time and frequency conversion
- (1) frames_to_samples()
- (2) frames_to_time()
- (3) samples_to_frames()
- (4) samples_to_time()
- (5) time_to_frames()
- (6) time_to_samples()
- (7) hz_to_note()
- (8) hz_to_midi()
- (9) midi_to_hz()
- (10) midi_to_note()
- (11) note_to_hz()
- (12) note_to_midi()
- (13) hz_to_mel()
- (14) hz_to_octs()
- (15) mel_to_hz()
- (16) octs_to_hz()
- (17) fft_frequencies()
- (18) cqt_frequencies()
- (19) mel_frequencies()
- (20) tempo_frequencies()
- (21) samples_like()
- (22) times_like()
8. librosa.effects
- (1) librosa.effects.split
- (2) librosa.effects.hpss(y)
9. librosa.filters
- Mel filter bank
10. librosa.onset
11. librosa.segment
12. librosa.sequence
13. librosa.util
- (1) librosa.util.frame()
- (2) librosa.util.pad_center()
- (3) librosa.util.fix_length()
- (4) librosa.util.fix_frames()
- (5) librosa.util.index_to_slice()
- (6) librosa.util.softmask()
- (7) librosa.util.sync()
- (8) librosa.util.axis_sort()
- (9) librosa.util.normalize()
- (9) librosa.util.roll_sparse()
- (10) librosa.util.sparsify_rows()
- (11) librosa.util.buf_to_float()
- (12) librosa.util.tiny()
- (9) Dynamic range compression (DRC)
14. Deprecated (moved)
- (1) dtw(): dynamic time warping
- (2) fill_off_diagonal()
15. Rhythm features
- (1) tempogram()
16. Feature manipulation
- (1) delta()
- (2) stack_memory()
17. Spectrogram decomposition
- (1) librosa.decompose.decompose(): decompose a feature matrix
- (2) librosa.decompose.hpss()
- (3) librosa.decompose.nn_filter(): nearest-neighbor filtering
Matching
- match_intervals(): matches one set of time intervals to another.
- match_events(): matches one set of events to another.
Miscellaneous
- localmax(): finds local maxima in an array x.
- peak_pick(): uses a flexible heuristic to pick peaks in a signal.
Input Validation
- valid_audio(): verifies that a variable contains valid mono audio data.
- valid_int(): ensures that an input value is integer-typed.
- valid_intervals(): ensures that an array is a valid representation of time intervals.
File operations
- example_audio_file(): gets the path to an included audio example file.
- find_files(): gets a sorted list of (audio) files in a directory or directory sub-tree.
magphase

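(Parts 1 to 4) librosa notes: click here

5. Spectral representations

(1) Short-time Fourier transform (STFT)

Comparison of torch.stft() and librosa.stft()

librosa.stft(y, n_fft=2048, hop_length=None, win_length=None, window='hann', center=True, pad_mode='reflect')

Magnitude of the complex value: np.abs(D[f, t]) is the amplitude of frequency bin f at frame t
Phase of the complex value: np.angle(D[f, t]) is the phase of frequency bin f at frame t
Parameters:

y: the audio time series
n_fft: FFT window size. Each frame spans n_fft samples, so n_fft = hop_length + the overlap between adjacent frames
hop_length: the hop (frame shift) in samples.
spectrum = np.abs(librosa.stft(frame, n_fft=self.nfft)): when hop_length is not given, it defaults to win_length // 4
spectrum = np.abs(librosa.stft(frame, n_fft=self.nfft, hop_length=len(frame))): when the hop equals the frame length and is smaller than the number of FFT points, the output has len(frame) // hop_length + 1 = 2 frames
spectrum = np.abs(librosa.stft(frame, n_fft=self.nfft, hop_length=self.nfft)): the output has a single frame whether win_length is set to the frame length or to nfft (this holds when the frame is shorter than nfft)
The conclusion is that, with center=True, librosa.stft returns speech_length // hop_length + 1 frames
win_length: each audio frame is windowed by window() with length win_length, then zero-padded to match n_fft.
The default is win_length=n_fft.
window: string, tuple, number, function, or array of shape (n_fft,)
a window specification (string, tuple or number)
a window function, e.g. scipy.signal.windows.hann
a vector or array of length n_fft
center: bool
If True, the signal y is padded so that frame D[:, t] is centered at y[t * hop_length]
If False, D[:, t] begins at y[t * hop_length]
dtype: complex numeric type of D. The default is 64-bit complex (np.complex64)
pad_mode: if center=True, the padding mode used at the edges of the signal. By default, the STFT uses reflection padding

Returns a complex matrix D(f, t)
• STFT matrix of shape (1 + n_fft/2, t)
• Only 1 + n_fft/2 rows are kept because the FFT of a real signal is conjugate-symmetric: half of the spectrum is enough for analysis, and returning all of it would be redundant.
• n_frames: n_frames = speech_len // hop_len + 1. Drawing it out helps: the signal is padded first and then split into frames, and the drawing shows that the frame count really depends on hop_len.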
Code:

# Use the STFT to convert an audio signal into a time-frequency representation
import librosa
import librosa.display
import matplotlib.pyplot as plt
import numpy as np
# Path to the audio file
audio_path = 'D:/My life/music/some music/sweeter.mp3'
# Load the audio
x, sr = librosa.load(audio_path, sr=None, offset=0)  # sr=None keeps the native sampling rate; if omitted, the default 22.05 kHz is used
# Apply the STFT to get the time-frequency representation
X = librosa.stft(x)
# Save the real part of the STFT as an npy file, for use in the ISTFT later
np.save("D:/My life/music/some music/real.npy", np.real(X))
# Save the imaginary part of the STFT as an npy file, for use in the ISTFT later
np.save("D:/My life/music/some music/imag.npy", np.imag(X))
# Convert amplitude to dB
Xdb = librosa.amplitude_to_db(np.abs(X))
print(Xdb.shape)
# Save the dB array for use in training
np.save("D:/My life/music/some music/test.npy", Xdb)
# Plot the spectrogram
librosa.display.specshow(Xdb, sr=sr, x_axis='time', y_axis='hz')
# Add a colorbar
plt.colorbar()
# Restrict the y-axis range
plt.ylim(19800, 20200)
# Show the figure
plt.show()

# Reading the STFT code in the librosa library (with annotations)

def stft(y, n_fft=2048, hop_length=None, win_length=None, window='hann',
         center=True, dtype=np.complex64, pad_mode='reflect'):
    """
    Use left-aligned frames, instead of centered frames
    >>> D_left = np.abs(librosa.stft(y, center=False))

    Use a shorter hop length
    >>> D_short = np.abs(librosa.stft(y, hop_length=64))

    Display a spectrogram
    >>> import matplotlib.pyplot as plt
    >>> librosa.display.specshow(librosa.amplitude_to_db(D,
    ...                                                  ref=np.max),
    ...                          y_axis='log', x_axis='time')
    >>> plt.title('Power spectrogram')
    >>> plt.colorbar(format='%+2.0f dB')
    >>> plt.tight_layout()
    """
    # By default, use the entire frame
    if win_length is None:
        win_length = n_fft

    # Set the default hop, if it's not already specified
    if hop_length is None:
        hop_length = int(win_length // 4)

    fft_window = get_window(window, win_length, fftbins=True)

    # Pad the window out to n_fft size
    fft_window = util.pad_center(fft_window, n_fft)

    # Reshape so that the window can be broadcast
    fft_window = fft_window.reshape((-1, 1))

    # Check audio is valid
    util.valid_audio(y)

    # Pad the time series so that frames are centered
    if center:
        # Reflect-pad n_fft // 2 samples on each side of y,
        # e.g. padding [1, 2, 3, 4, 5] by two samples gives [3, 2, 1, 2, 3, 4, 5, 4, 3]
        y = np.pad(y, int(n_fft // 2), mode=pad_mode)

    # Window the time series: split the signal into frames
    y_frames = util.frame(y, frame_length=n_fft, hop_length=hop_length)

    # Pre-allocate the STFT matrix; allocating the output up front should help speed
    stft_matrix = np.empty((int(1 + n_fft // 2), y_frames.shape[1]),
                           dtype=dtype,
                           order='F')

    # How many columns can we fit within MAX_MEM_BLOCK (set to 256 KB in librosa)?
    # That is, how many frames (columns) of FFT output fit under the memory limit
    n_columns = int(util.MAX_MEM_BLOCK / (stft_matrix.shape[0] *
                                          stft_matrix.itemsize))

    for bl_s in range(0, stft_matrix.shape[1], n_columns):
        bl_t = min(bl_s + n_columns, stft_matrix.shape[1])
        # If n_columns exceeds the number of frames, the loop runs once with bl_s = 0;
        # otherwise n_columns frames are transformed per iteration and written into place
        stft_matrix[:, bl_s:bl_t] = fft.fft(fft_window *
                                            y_frames[:, bl_s:bl_t],
                                            axis=0)[:stft_matrix.shape[0]]

    return stft_matrix


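(2) Inverse short-time Fourier transform (ISTFT)

librosa.istft(stft_matrix, hop_length=None, win_length=None, window='hann', center=True, length=None)

Converts a complex-valued spectrogram D(f, t) back into a time series y. The window function, hop length and other parameters should match those used for the stft.
The ISTFT is used here to turn a time-frequency representation back into audio and save it as a wav file.
Parameters:

stft_matrix: the matrix produced by the STFT
hop_length: hop length, defaults to win_length // 4
win_length: window length, defaults to n_fft
window: string, tuple, number, function, or array of shape (n_fft,)
a window specification (string, tuple or number)
a window function, e.g. scipy.signal.windows.hann
a vector or array of length n_fft
center: bool
If True, D is assumed to have centered frames
If False, D is assumed to have left-aligned frames
length: if provided, the output y is zero-padded or clipped to exactly this many samples

Returns:

y: the time-domain signal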

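Example:

import librosa
import librosa.display
import matplotlib.pyplot as plt
import numpy as np
import soundfile
# Load the npy files
Xreal = np.load('D:/My life/music/some music/real.npy')
Ximag = np.load('D:/My life/music/some music/imag.npy')
# Recombine the real and imaginary parts into the complex STFT matrix
result = Xreal + 1j * Ximag
# Convert back to an audio signal with the ISTFT
Y = librosa.istft(result)
# Plot the original waveform
y, sr = librosa.load("D:/My life/music/some music/EG DT.wav", sr=None)
librosa.display.waveplot(y, sr=sr)  # newer librosa releases replace waveplot with waveshow
plt.title("audio_raw")
plt.show()
# Plot the waveform after the ISTFT
librosa.display.waveplot(Y, sr=sr)
plt.title("audio after ISTFT")
plt.show()
# Write the signal to a wav file. soundfile names this argument samplerate,
# and it should match the rate of the audio the STFT was computed from
soundfile.write('D:/My life/music/some music/EG DT_istft.wav', Y, samplerate=sr)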

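(3) Instantaneous frequency: ifgram()

Computes the instantaneous frequency (as a proportion of the sampling rate), obtained as the time derivative of the phase of the complex spectrum. To process audio, librosa.ifgram can be used to obtain the STFT matrix, which is then modified or shifted, and an istft turns the result back into the processed audio signal. Recent librosa releases no longer ship ifgram; librosa.reassigned_spectrogram covers similar ground.

Parameters:

norm: STFT normalization
ref_power: minimum power threshold for estimating the instantaneous frequency

Returns:

if_gram: the instantaneous frequencies
D: the short-time Fourier transform

Example:

y, sr = librosa.load(path)
frequencies, D = librosa.ifgram(y, sr=sr)
y = librosa.istft(D)
D is the STFT matrix: the x-axis is time, the y-axis is frequency (the matching coordinates are in frequencies), and the entries are complex values whose magnitude is the amplitude.
Since D is a numpy.ndarray, the matrix can be manipulated with numpy.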

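(4) The constant-Q transform (CQT), widely used for music

Computes the constant-Q transform of an audio signal. Like the short-time Fourier transform, the constant-Q transform is an important time-frequency analysis tool, and it is particularly well suited to music signals. The most distinctive feature of the resulting spectrum is that the frequency axis is on a log scale rather than a linear scale, and that the window length varies with frequency.

librosa.cqt(y, sr, fmin, n_bins, bins_per_octave, tuning)

Parameters:

fmin: minimum frequency
n_bins: number of frequency bins, starting at the minimum frequency
bins_per_octave: number of bins per octave
tuning: tuning offset of the bins
…

CQT = librosa.amplitude_to_db(np.abs(librosa.cqt(y, sr=16000)), ref=np.max)
plt.subplot(4, 2, 3)
librosa.display.specshow(CQT, y_axis='cqt_note')
plt.colorbar(format='%+2.0f dB')
plt.title('Constant-Q power spectrogram (note)')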
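The log-spaced frequency axis can be made concrete with a few lines of NumPy. This is a sketch of the bin layout only, assuming zero tuning offset and a lowest bin near C1 (about 32.70 Hz); the helper name `cqt_center_frequencies` is hypothetical and is not a librosa function.

```python
import numpy as np

def cqt_center_frequencies(n_bins=84, fmin=32.70, bins_per_octave=12):
    """Geometrically spaced center frequencies: f_k = fmin * 2**(k / bins_per_octave)."""
    k = np.arange(n_bins)
    return fmin * 2.0 ** (k / bins_per_octave)

freqs = cqt_center_frequencies()
# The spacing to the next bin grows in proportion to the frequency itself,
# so Q = f_k / bandwidth_k is the same for every bin, hence "constant-Q"
bandwidths = freqs * (2.0 ** (1.0 / 12) - 1.0)
Q = freqs / bandwidths
print(freqs[:3], freqs[12] / freqs[0], Q[0])
```

Twelve bins up the frequency doubles, which matches one octave of semitones. That is why this layout suits music.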

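(5) icqt()

Inverse constant-Q transform.

(6) hybrid_cqt()

Hybrid CQT.

(7) pseudo_cqt()

Computes the pseudo constant-Q transform of an audio signal.

(8) Fast Mellin transform: fmt()

fmt(y, t_min=0.5, n_fmt=None, kind='cubic', beta=0.5, over_sample=1, axis=-1)

Parameters:

y: np.ndarray, real-valued. The input signal, which may be multi-dimensional. The target axis must contain at least 3 samples.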
t_min: float > 0
The minimum time spacing (in samples).
This value should generally be less than 1 to preserve as much information as
possible.
n_fmt: int > 2 or None
The number of scale transform bins to use.
If None, then n_bins = over_sample * ceil(n * log((n-1)/t_min)) is taken, where n = y.shape[axis]
kind: str
The type of interpolation to use when re-sampling the input.
See scipy.interpolate.interp1d for possible values.
Note that the default is to use high-precision (cubic) interpolation.
This can be slow in practice; if speed is preferred over accuracy,
then consider using kind='linear'.
beta: float
The Mellin parameter. beta=0.5 provides the scale transform.
over_sample: float >= 1
Over-sampling factor for exponential resampling.
axis: int
The axis along which to transform y

Returns:

x_scale: np.ndarray [dtype=complex]
The scale transform of y along the axis dimension.

Raises:

ParameterError:

if n_fmt < 2 or t_min <= 0
or if y is not finite
or if y.shape[axis] < 3.

Notes:

This function caches at level 30.

Example:

# Generate a signal and time-stretch it (with energy normalization)
import librosa
import numpy as np
scale = 1.25
freq = 3.0
x1 = np.linspace(0, 1, num=1024, endpoint=False)
x2 = np.linspace(0, 1, num=int(scale * len(x1)), endpoint=False)  # num must be an integer
y1 = np.sin(2 * np.pi * freq * x1)
y2 = np.sin(2 * np.pi * freq * x2) / np.sqrt(scale)
# Verify that the two signals have the same energy
np.sum(np.abs(y1)**2), np.sum(np.abs(y2)**2)
# both sums come out to about 512
scale1 = librosa.fmt(y1, n_fmt=512)
scale2 = librosa.fmt(y2, n_fmt=512)
# And plot the results
import matplotlib.pyplot as plt
plt.figure(figsize=(8, 4))
plt.subplot(1, 2, 1)
plt.plot(y1, label='Original')
plt.plot(y2, linestyle='--', label='Stretched')
plt.xlabel('time (samples)')
plt.title('Input signals')
plt.legend(frameon=True)
plt.axis('tight')
plt.subplot(1, 2, 2)
plt.semilogy(np.abs(scale1), label='Original')
plt.semilogy(np.abs(scale2), linestyle='--', label='Stretched')
plt.xlabel('scale coefficients')
plt.title('Scale transform magnitude')
plt.legend(frameon=True)
plt.axis('tight')
plt.tight_layout()
# Plot the scale transform of an onset strength autocorrelation
y, sr = librosa.load(librosa.util.example_audio_file(),offset=10.0, duration=30.0)
odf = librosa.onset.onset_strength(y=y, sr=sr)
# Auto-correlate with up to 10 seconds lag
odf_ac = librosa.autocorrelate(odf, max_size=10 * sr // 512)
# Normalize
odf_ac = librosa.util.normalize(odf_ac, norm=np.inf)
# Compute the scale transform
odf_ac_scale = librosa.fmt(librosa.util.normalize(odf_ac), n_fmt=512)
# Plot the results
plt.figure()
plt.subplot(3, 1, 1)
plt.plot(odf, label='Onset strength')
plt.axis('tight')
plt.xlabel('Time (frames)')
plt.xticks([])
plt.legend(frameon=True)
plt.subplot(3, 1, 2)
plt.plot(odf_ac, label='Onset autocorrelation')
plt.axis('tight')
plt.xlabel('Lag (frames)')
plt.xticks([])
plt.legend(frameon=True)
plt.subplot(3, 1, 3)
plt.semilogy(np.abs(odf_ac_scale), label='Scale transform magnitude')
plt.axis('tight')
plt.xlabel('scale coefficients')
plt.legend(frameon=True)
plt.tight_layout()

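(8) Energy of harmonics in a time-frequency representation: interp_harmonics()

(9) Harmonic salience function: salience()

(10) Phase vocoder: phase_vocoder()

Given an STFT matrix D, speeds it up by a factor of rate (a rate below 1 slows it down).

(11) Magnitude and phase: magphase()

Separates a complex-valued spectrogram into its magnitude and phase components.

(12) Time-frequency representation using IIR filters: iirt()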

6. Magnitude scaling

(1) amplitude_to_db()

librosa.amplitude_to_db(S, ref=1.0, amin=1e-5, top_db=80.0)

Converts an amplitude spectrogram to a dB-scaled spectrogram, i.e. takes the logarithm of S.
Parameters:

S: input amplitude
ref: scalar or callable. The reference value.
If scalar, the amplitude abs(S) is scaled relative to ref:
20 * log10(S / ref).
Zeros in the output correspond to positions where S == ref.
If callable, the reference value is computed as ref(S).

amin: float > 0 [scalar]
minimum threshold for S and ref
top_db: float >= 0 [scalar]
threshold the output at top_db below the peak:
max(20 * log10(S)) - top_db

Returns:

S expressed in dB

(2) db_to_amplitude()

Converts a dB-scaled spectrogram back to a linear amplitude spectrogram (db_to_amplitude(S_db) = ref * 10.0**(S_db / 20)):

librosa.db_to_amplitude(S_db, ref=1.0)

Parameters:

S_db: np.ndarray. dB-scaled spectrogram
ref: number > 0. Optional reference amplitude; the output is scaled by this value.

Returns:

S: np.ndarray
Linear magnitude spectrogram
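The two amplitude conversions can be sketched in plain NumPy to show what amin and top_db do. The functions `amplitude_to_db_sketch` and `db_to_amplitude_sketch` are illustrative stand-ins that only handle a scalar ref; they are not librosa's code.

```python
import numpy as np

def amplitude_to_db_sketch(S, ref=1.0, amin=1e-5, top_db=80.0):
    magnitude = np.abs(S)
    # Floor tiny values at amin before taking the logarithm
    log_spec = 20.0 * np.log10(np.maximum(amin, magnitude))
    log_spec -= 20.0 * np.log10(np.maximum(amin, ref))
    if top_db is not None:
        # Clip everything that lies more than top_db below the peak
        log_spec = np.maximum(log_spec, log_spec.max() - top_db)
    return log_spec

def db_to_amplitude_sketch(S_db, ref=1.0):
    return ref * 10.0 ** (S_db / 20.0)

S = np.array([1.0, 0.1, 1e-7])
S_db = amplitude_to_db_sketch(S)
print(S_db)                          # approximately [0, -20, -80]
print(db_to_amplitude_sketch(S_db))  # approximately [1, 0.1, 1e-4]
```

The last value does not survive the round trip: it was floored at amin and then clipped by top_db, so converting back recovers 1e-4 rather than 1e-7.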

(3) power_to_db()

librosa.core.power_to_db(S, ref=1.0, amin=1e-10, top_db=80.0, ref_power=Deprecated())

The inverse of this function is librosa.db_to_power(S).
Parameters:

S: input power
ref: reference value; the power abs(S) is scaled relative to ref: 10 * log10(S / ref)
amin: float > 0 [scalar]. Minimum threshold for abs(S) and ref
top_db: float >= 0 [scalar].
Threshold the output at top_db below the peak: max(10 * log10(S)) - top_db
ref_power: scalar or callable

Warning: This parameter name was deprecated in librosa 0.5.0.
Use the ref parameter instead.
The ref_power parameter will be removed in librosa 0.6.0.

Returns:

S_dB: the power spectrogram (squared magnitude) converted to decibel (dB) units

Example:

import librosa
import librosa.display
import numpy as np
import matplotlib.pyplot as plt
y, sr = librosa.load('D:/My life/music/some music/sweeter.mp3')
S = np.abs(librosa.stft(y))
print(librosa.power_to_db(S ** 2))
plt.figure()
plt.subplot(2, 1, 1)
librosa.display.specshow(S ** 2, sr=sr, y_axis='log')  # power spectrogram computed from the waveform
plt.colorbar()
plt.title('Power spectrogram')
plt.subplot(2, 1, 2)
# dB is computed relative to the peak power, so every other value is negative; note the colormap set below
librosa.display.specshow(librosa.power_to_db(S ** 2, ref=np.max),
                         sr=sr, y_axis='log', x_axis='time')
plt.colorbar(format='%+2.0f dB')
plt.title('Log-Power spectrogram')
plt.set_cmap("autumn")
plt.tight_layout()
plt.show()

(4) db_to_power(S_db, ref=1.0)

Parameters:

S_db: np.ndarray. dB-scaled spectrogram
ref: number > 0. Reference power: output will be scaled by this value

Returns:

S: np.ndarray
Power spectrogram, computed as
ref * np.power(10.0, 0.1 * S_db)
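As a quick numerical check of how the power and amplitude scales relate, the sketch below uses made-up amplitude values and plain NumPy instead of the librosa functions. It shows that 10 * log10 of a power equals 20 * log10 of the matching amplitude, and that the db_to_power formula undoes the conversion.

```python
import numpy as np

amplitude = np.array([2.0, 1.0, 0.5])  # illustrative values
power = amplitude ** 2

power_db = 10.0 * np.log10(power)          # dB computed from power
amplitude_db = 20.0 * np.log10(amplitude)  # dB computed from amplitude

# db_to_power with ref=1.0: ref * 10**(0.1 * S_db)
recovered_power = 1.0 * np.power(10.0, 0.1 * power_db)
print(power_db, recovered_power)
```

This is why power_to_db(S ** 2) and amplitude_to_db(S) describe the same spectrogram on the same dB scale.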

(5) perceptual_weighting()

Perceptual weighting of a power spectrogram (S_p[f] = A_weighting(f) + 10 * log10(S[f] / ref)):

librosa.perceptual_weighting(S, frequencies, **kwargs)

Parameters:

S: np.ndarray [shape=(d, t)]. Power spectrogram
frequencies: np.ndarray [shape=(d,)]. Center frequency of each row of S
kwargs: additional keyword arguments

Returns:

S_p: np.ndarray [shape=(d, t)]

Example:

# Re-weight a CQT power spectrum, using peak power as reference
import librosa
import librosa.display
import numpy as np
y, sr = librosa.load('D:/My life/music/some music/sweeter.mp3')
CQT = np.abs(librosa.cqt(y, sr=sr, fmin=librosa.note_to_hz('A1')))  # magnitude of the complex CQT
freqs = librosa.cqt_frequencies(CQT.shape[0], fmin=librosa.note_to_hz('A1'))
perceptual_CQT = librosa.perceptual_weighting(CQT**2, freqs, ref=np.max)
perceptual_CQT

import matplotlib.pyplot as plt
plt.figure()
plt.subplot(2, 1, 1)
librosa.display.specshow(librosa.amplitude_to_db(CQT, ref=np.max), fmin=librosa.note_to_hz('A1'), y_axis='cqt_hz')
plt.title('Log CQT power')
plt.colorbar(format='%+2.0f dB')
plt.subplot(2, 1, 2)
librosa.display.specshow(perceptual_CQT, y_axis='cqt_hz', fmin=librosa.note_to_hz('A1'), x_axis='time')
plt.title('Perceptually weighted log CQT')
plt.colorbar(format='%+2.0f dB')
plt.tight_layout()

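(6) A_weighting()

Computes the A-weighting of a set of frequencies.

(7) pcen()

This function normalizes a time-frequency representation S with automatic gain control, followed by nonlinear compression.

7. Time and frequency conversion

(1) frames_to_samples()

Converts frame indices to audio sample indices.

(2) frames_to_time()

Converts frame counts to time (seconds).

(3) samples_to_frames()

Converts sample indices to STFT frames.

(4) samples_to_time()

Converts sample indices to time (seconds).

(5) time_to_frames()

Converts timestamps to STFT frames.

(6) time_to_samples()

Converts timestamps (in seconds) to sample indices.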
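The six conversions above all come down to multiplying or dividing by hop_length and sr. The sketch below spells out that arithmetic in NumPy. It assumes sr=22050, hop_length=512 and no extra n_fft offset, and rounds on the way back to guard against floating point error; it does not call librosa.

```python
import numpy as np

sr = 22050
hop_length = 512

frames = np.array([0, 1, 10, 43])
samples = frames * hop_length  # frames -> sample indices
times = samples / sr           # sample indices -> seconds

# Going back: seconds -> sample indices -> frames
samples_back = np.round(times * sr).astype(int)
frames_back = samples_back // hop_length

print(samples, times, frames_back)
```

Frame 43 starts at sample 22016, just under one second at this sampling rate.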

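(7) hz_to_note()

Converts one or more frequencies (in Hz) to the nearest note names.

(8) hz_to_midi()

Gets the MIDI note number for a given frequency.

(9) midi_to_hz()

Gets the frequency (Hz) of a MIDI note.

(10) midi_to_note()

Converts one or more MIDI numbers to note strings.

(11) note_to_hz()

Converts one or more note names to frequencies (Hz).

(12) note_to_midi()

Converts one or more spelled notes to MIDI numbers.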
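The pitch conversions rest on the equal-tempered relation between frequency and MIDI number. The sketch below assumes the usual A4 = 440 Hz = MIDI 69 reference and ASCII sharps; the `_sketch` functions are illustrative and are not librosa's own.

```python
import numpy as np

def hz_to_midi_sketch(freq):
    # 12 semitones per octave, anchored at A4 = 440 Hz = MIDI 69
    return 12.0 * (np.log2(freq) - np.log2(440.0)) + 69.0

def midi_to_hz_sketch(midi):
    return 440.0 * 2.0 ** ((midi - 69.0) / 12.0)

NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

def midi_to_note_sketch(midi):
    midi = int(round(midi))
    # MIDI 60 is C4, so the octave number is midi // 12 - 1
    return f"{NOTE_NAMES[midi % 12]}{midi // 12 - 1}"

print(hz_to_midi_sketch(440.0), midi_to_hz_sketch(60), midi_to_note_sketch(69))
```

Chaining the helpers gives the remaining conversions. For example, hz_to_note amounts to hz_to_midi followed by midi_to_note.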

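(13) hz_to_mel()

Converts Hz to Mels.

(14) hz_to_octs()

Converts frequencies (Hz) to (fractional) octave numbers.

(15) mel_to_hz()

Converts mel values to frequencies (Hz).

(16) octs_to_hz()

Converts octave numbers to frequencies.

(17) fft_frequencies()

An alternative implementation of np.fft.fftfreq that returns only the 1 + n_fft // 2 non-negative frequencies.

(18) cqt_frequencies()

Computes the center frequencies of the constant-Q bins.

(19) mel_frequencies()

Computes an array of acoustic frequencies tuned to the mel scale.

(20) tempo_frequencies()

Computes the frequencies (in beats per minute) corresponding to an onset autocorrelation or tempogram matrix.

(21) samples_like()

Returns an array of sample indices to match the time axis of a feature matrix.

(22) times_like()

Returns an array of time values to match the time axis of a feature matrix.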

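8. librosa.effects

Time-domain audio processing, such as pitch shifting and time stretching. This submodule also provides time-domain wrappers for the decompose submodule.

(1) librosa.effects.split

librosa.effects.split(y, top_db=60, ref=np.max, frame_length=2048, hop_length=512) splits an audio signal into non-silent intervals.

Parameters:

y: np.ndarray, shape=(n,) or (2, n). The audio signal
top_db: number > 0. The threshold (in decibels) below the reference at which a frame is considered silent
ref: reference power. By default, np.max is used, which compares against the peak power in the signal.
frame_length: int > 0. Number of samples per frame
hop_length: int > 0. Number of samples between frames

Returns:

intervals: np.ndarray, shape=(m, 2)
intervals[i] == (start_i, end_i) are the start and end (in samples) of non-silent interval i.
intervals = librosa.effects.split(utter, top_db=20)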

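(2) librosa.effects.hpss(y)

Separates the harmonic and percussive components of a signal. On a song this roughly splits the pitched part (vocals, melody) from the rhythm.

import librosa
import IPython.display as ipd
audio_path = 'D:/My life/music/some music/sweeter.mp3'  # path to the audio file
y, sr = librosa.load(audio_path, sr=44100)
# Play the original audio
ipd.Audio(audio_path)
# Extract the harmonic and percussive parts separately
y_harmonic, y_percussive = librosa.effects.hpss(y)
ipd.Audio(data=y_harmonic, rate=sr)  # harmonic part (vocals)
ipd.Audio(data=y_percussive, rate=sr)  # percussive part (rhythm)

Spectrograms of the two parts after separation:

# Spectrogram of the harmonic part
import librosa.display
import matplotlib.pyplot as plt
A = librosa.stft(y_harmonic)
Adb = librosa.amplitude_to_db(abs(A))
plt.figure(figsize=(14, 5))
librosa.display.specshow(Adb, sr=sr, x_axis='time', y_axis='log')

# Spectrogram of the percussive part
A = librosa.stft(y_percussive)
Adb = librosa.amplitude_to_db(abs(A))
plt.figure(figsize=(14, 5))
librosa.display.specshow(Adb, sr=sr, x_axis='time', y_axis='log')

The difference between the two spectrograms is clear: the spectrogram of the percussive audio consists of vertical stripes.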

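9. librosa.filters

Filter bank generation (chroma, pseudo-CQT, CQT, etc.). These are mainly internal functions used by other parts of librosa.

Mel filter bank

librosa.filters.mel(sr, n_fft, n_mels=128, fmin=0.0, fmax=None, htk=False, norm=1)

Creates a filter bank matrix that combines FFT bins into Mel-frequency bins.
Parameters:

sr: sampling rate of the input signal
n_fft: number of FFT components
n_mels: number of Mel bands to generate
fmin: lowest frequency (in Hz)
fmax: highest frequency (in Hz). If None, fmax = sr / 2.0 is used
norm: {None, 1, np.inf} [scalar]
If 1, the triangular mel weights are divided by the width of the mel band (area normalization). Otherwise, all triangles keep a peak value of 1.0

Returns:

The Mel transform matrix

Example:

import librosa
import librosa.display
import matplotlib.pyplot as plt
melfb = librosa.filters.mel(sr=22050, n_fft=2048)
plt.figure()
librosa.display.specshow(melfb, x_axis='linear')
plt.ylabel('Mel filter')
plt.title('Mel filter bank')
plt.colorbar()
plt.tight_layout()
plt.show()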

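10. librosa.onset

Onset detection and onset strength computation.

11. librosa.segment

Functions for structural segmentation, such as recurrence matrix construction, time-lag representation and sequentially constrained clustering.

12. librosa.sequence

Functions for sequential modeling: various forms of Viterbi decoding, and helper functions for constructing transition matrices.

13. librosa.util

Helper utilities (normalization, padding, centering, etc.)

(1) librosa.util.frame()

Slices a time series into overlapping frames.

(2) librosa.util.pad_center()

Pads an array to a target length so that the original data is centered.

(3) librosa.util.fix_length()

Fixes the length of an array to exactly the given size.

(4) librosa.util.fix_frames()

Fixes a list of frame indices so that they lie between a minimum and a maximum value.

(5) librosa.util.index_to_slice()

Generates an array of slices from an index array. To understand what this function does, it helps to read up on slicing in numpy.

(6) librosa.util.softmask()

Robustly computes a soft-mask operation.

(7) librosa.util.sync()

Synchronous aggregation of a multi-dimensional array between boundaries.

(8) librosa.util.axis_sort()

Sorts the rows or columns of an array.

(9) librosa.util.normalize()

Normalizes an array along a chosen axis.

(9) librosa.util.roll_sparse()

Rolls a sparse matrix.

(10) librosa.util.sparsify_rows()

Returns a row-sparse matrix approximating the input x.

(11) librosa.util.buf_to_float()

Converts an integer buffer to floating point values.

(12) librosa.util.tiny()

Computes the tiny value corresponding to the input's data type, i.e. the smallest positive usable number for that type. For float32 input this is the smallest positive normal float32; integer input falls back to the float32 value. It is mainly useful as a threshold against numerical underflow in divisions and multiplications.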
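A small NumPy sketch shows what such a tiny value looks like and where it helps. The helper `tiny_sketch` is a hypothetical stand-in built on np.finfo, written under the assumption that non-float inputs fall back to float32; it is not the librosa function itself.

```python
import numpy as np

def tiny_sketch(x):
    """Smallest positive normal number for x's dtype; non-float inputs fall back to float32."""
    x = np.asarray(x)
    if np.issubdtype(x.dtype, np.floating) or np.issubdtype(x.dtype, np.complexfloating):
        dtype = x.dtype
    else:
        dtype = np.dtype(np.float32)
    return np.finfo(dtype).tiny

# Typical use: guard a division against a zero denominator
norms = np.array([0.0, 2.0], dtype=np.float32)
safe = 1.0 / np.maximum(norms, tiny_sketch(norms))
print(tiny_sketch(norms), tiny_sketch(np.array([1, 2], dtype=np.int8)))
```

Both calls print the same float32 value, since the int8 input has no floating point precision of its own to report.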

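(9) Dynamic range compression (DRC)

Compression, in short, is an audio signal processing operation that lowers the volume of loud sounds and amplifies quiet ones, thereby reducing (compressing) the dynamic range of the audio signal. It is commonly used in sound recording and reproduction, broadcasting, live sound reinforcement and some instrument amplifiers.

import torch

def dynamic_range_compression(x, C=1, clip_val=1e-5):
    # Clamp at clip_val to avoid log(0), then compress logarithmically
    return torch.log(torch.clamp(x, min=clip_val) * C)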
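The same log compression can be sketched in NumPy together with its inverse, which makes the effect of clip_val visible. The function names `compress` and `decompress` and the sample magnitudes are illustrative assumptions.

```python
import numpy as np

def compress(x, C=1.0, clip_val=1e-5):
    # Floor the input at clip_val so the logarithm stays finite
    return np.log(np.clip(x, clip_val, None) * C)

def decompress(x, C=1.0):
    # Inverse of compress for inputs that were not clipped
    return np.exp(x) / C

mag = np.array([0.0, 1e-3, 1.0, 100.0])
compressed = compress(mag)
restored = decompress(compressed)
print(compressed, restored)
```

Values at or above clip_val are restored exactly, while the zero comes back as clip_val. The compressed values span a far narrower range than the original magnitudes.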

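14. Deprecated (moved)

(1) dtw(): dynamic time warping

Dynamic Time Warping (DTW) is a method for measuring the similarity between two time series. It is mainly used in speech recognition, to decide whether two utterances correspond to the same text.
Because speaking rate and articulation speed differ from person to person, two recordings can be shifted in time relative to each other. DTW stretches and shrinks the time series in order to compute the similarity between them.

Computing the similarity: given two time series X and Y with lengths |X| and |Y|, DTW builds a cost matrix of pairwise distances.

from numpy.linalg import norm
from dtw import dtw
x = ...
y = ...
dist, cost, acc, path = dtw(x, y, dist=lambda x, y: norm(x - y, ord=1))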
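The cost matrix idea can be written out directly. The following is a minimal NumPy sketch of the classic DTW recursion with an absolute-difference local cost. It is not the implementation of the dtw package or of librosa, and `dtw_distance` is a hypothetical name.

```python
import numpy as np

def dtw_distance(x, y):
    """Return the DTW distance and the accumulated cost matrix for 1-D sequences."""
    n, m = len(x), len(y)
    cost = np.abs(np.subtract.outer(x, y))  # pairwise local costs, shape (n, m)
    acc = np.full((n + 1, m + 1), np.inf)   # accumulated cost with a border of inf
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Extend the cheapest of the three allowed predecessor steps
            acc[i, j] = cost[i - 1, j - 1] + min(acc[i - 1, j], acc[i, j - 1], acc[i - 1, j - 1])
    return acc[n, m], acc[1:, 1:]

x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([0.0, 1.0, 1.0, 2.0, 3.0])  # same shape as x, with one value held longer
dist, acc = dtw_distance(x, y)
print(dist)  # 0.0
```

The distance is zero even though the sequences differ in length, because the warping path can match the repeated value in y to a single element of x.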

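(2) fill_off_diagonal()

Sets all cells of a matrix to a given value if they lie outside a constraint region.

15. Rhythm features

(1) tempogram()

Computes the tempogram: the local autocorrelation of the onset strength envelope.

16. Feature manipulation

(1) delta()

Computes delta features: a local estimate of the derivative of the input data along the selected axis. The delta features are computed with Savitzky-Golay filtering.

(2) stack_memory()

Short-term history embedding: vertically concatenates a data vector or matrix with delayed copies of itself.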

17. Spectrogram decomposition

(1) librosa.decompose.decompose(): decompose a feature matrix

(2) librosa.decompose.hpss()

Median-filtering harmonic percussive source separation

(3) librosa.decompose.nn_filter(): nearest-neighbor filtering

Filtering by nearest-neighbors

comps, acts = librosa.decompose.decompose(S, n_components=None, transformer=None, sort=False, fit=True, **kwargs)

Decomposes a feature matrix: given a spectrogram S, it produces a matrix of components and a matrix of activations such that S ~= components.dot(activations).
By default this uses non-negative matrix factorization (NMF) from sklearn.decomposition.
Parameters:

S: np.ndarray [shape=(n_features, n_samples), dtype=float]. The input feature matrix (e.g. a magnitude spectrogram)
n_components: int > 0 [scalar] or None. Number of components to extract; if None, n_features is used
transformer: None or object. The decomposition to use; if None, it defaults to sklearn.decomposition.NMF. Otherwise, any object with an interface similar to NMF works. The transformer must follow the scikit-learn convention, where input data is (n_samples, n_features).
transformer.fit_transform() is run on the transpose S.T, and its return value is stored (transposed) as activations.
The components are returned as transformer.components_.T.
S ~= np.dot(activations, transformer.components_).T
or
S ~= np.dot(transformer.components_.T, activations.T)
(in these two expressions, activations denotes the raw output of fit_transform, before it is transposed)
sort: bool
If True, components are sorted by ascending peak frequency.
If used together with a transformer, sorting is applied to copies of the decomposition parameters, not to the transformer's internal parameters.
fit: bool
If True, the components are estimated from the input S.
If False, the components are assumed to be pre-computed and stored in transformer, and are not changed.
kwargs: additional keyword arguments to the default transformer

Returns:

components: np.ndarray [shape=(n_features, n_components)]. Matrix of components (basis elements).
activations: np.ndarray [shape=(n_components, n_samples)]
The transformed matrix (activations)

Example:

import librosa
import numpy as np
y, sr = librosa.load(librosa.ex('choice'), duration=5)
S = np.abs(librosa.stft(y))
comps, acts = librosa.decompose.decompose(S, n_components=8)

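Matching

match_intervals(): matches one set of time intervals to another.

match_events(): matches one set of events to another.

Miscellaneous

localmax(): finds local maxima in an array x.

peak_pick(): uses a flexible heuristic to pick peaks in a signal.

Input Validation

valid_audio(): verifies that a variable contains valid mono audio data.

valid_int(): ensures that an input value is integer-typed.

valid_intervals(): ensures that an array is a valid representation of time intervals.

File operations

example_audio_file(): gets the path to an included audio example file.

find_files(): gets a sorted list of (audio) files in a directory or directory sub-tree.

magphase

librosa.magphase(D, power=1)

librosa provides a dedicated function that separates a complex matrix D(F, T) into a magnitude S and a phase P such that D = S * P (for power=1).
Parameters:

D: the complex matrix obtained from the stft
power: exponent for the magnitude spectrogram, e.g. 1 for energy, 2 for power, etc.

Returns:

D_mag: the magnitude S
D_phase: the phase P, phase = exp(1.j * phi), where phi is the phase angle of the complex matrix, np.angle(D)

Example:

import librosa
y, sr = librosa.load('D:/My life/music/some music/sweeter.mp3')
D = librosa.stft(y)
magnitude, phase = librosa.magphase(D)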
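The decomposition D = S * P is easy to verify with NumPy alone. The sketch below uses a small random complex matrix as a stand-in for an STFT (an assumption, so that no audio is needed) and rebuilds it from magnitude and phase.

```python
import numpy as np

rng = np.random.default_rng(0)
# Stand-in for an STFT matrix: 5 frequency bins by 3 frames
D = rng.standard_normal((5, 3)) + 1j * rng.standard_normal((5, 3))

magnitude = np.abs(D)
phase = np.exp(1j * np.angle(D))  # unit-modulus complex numbers

print(np.allclose(D, magnitude * phase))  # True
```

Keeping the phase around is what allows a modified magnitude to be turned back into audio: multiply the new magnitude by the original phase and pass the result to istft.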
