torch.stft

stft(self, n_fft, hop_length=None, win_length=None, window=None, center=True, pad_mode='reflect', normalized=False, onesided=True)

Parameters:
----------

input (Tensor) – the input tensor
n_fft (int) – size of Fourier transform
hop_length (int, optional) – the distance between neighboring sliding window frames. Default: None (treated as equal to floor(n_fft / 4))
win_length (int, optional) – the size of window frame and STFT filter. Default: None (treated as equal to n_fft)
window (Tensor, optional) – the optional window function. Default: None (treated as window of all 1s)
center (bool, optional) – whether to pad input on both sides so that the t-th frame is centered at time t × hop_length. Default: True
pad_mode (string, optional) – controls the padding method used when center is True. Default: "reflect"
normalized (bool, optional) – controls whether to return the normalized STFT results. Default: False
onesided (bool, optional) – controls whether to return half of results to avoid redundancy. Default: True

Returns the real and the imaginary parts together as one tensor of size (* × N × T × 2), where * is the optional batch size of input, N is the number of frequencies where STFT is applied, T is the total number of frames used, and each pair in the last dimension represents a complex number as the real part and the imaginary part.
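The output size (N × T × 2) can be estimated before running the transform. The sketch below is a hypothetical helper (`stft_output_shape` is not part of PyTorch) and assumes, as I understand the PyTorch behavior, that `onesided=True` keeps `n_fft // 2 + 1` frequencies and that `center=True` pads `n_fft // 2` samples on each side before framing.

```python
def stft_output_shape(signal_len, n_fft, hop_length=None, center=True, onesided=True):
    """Expected (N, T, 2) shape for a 1-D input of length signal_len (illustrative helper)."""
    if hop_length is None:
        hop_length = n_fft // 4  # documented default: floor(n_fft / 4)
    # Assumption: center=True pads n_fft // 2 samples on both sides before framing
    padded_len = signal_len + 2 * (n_fft // 2) if center else signal_len
    # Assumption: onesided=True keeps only the non-redundant half of the spectrum
    n_freqs = n_fft // 2 + 1 if onesided else n_fft
    n_frames = 1 + (padded_len - n_fft) // hop_length
    return (n_freqs, n_frames, 2)


# One second of audio at 16 kHz, 512-point FFT, hop of 128 samples
shape_centered = stft_output_shape(16000, n_fft=512, hop_length=128)
shape_uncentered = stft_output_shape(16000, n_fft=512, hop_length=128, center=False)
```

For a batched input, the batch size is simply prepended to this shape.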

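The input is a one-dimensional or two-dimensional time series (a single signal or a batch of signals).
The return value is a tensor whose first dimension is the batch size of the input, whose second dimension is the number of frequencies at which the STFT is computed, and whose third dimension is the total number of frames. The last dimension holds the real and imaginary parts of the complex result.
The magnitude and phase are obtained as follows: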

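import torch

# Recent PyTorch releases require return_complex for real inputs;
# torch.view_as_real restores the (..., 2) real/imaginary layout described above.
spec = torch.view_as_real(torch.stft(mono, n_fft=len_frame, hop_length=len_hop, return_complex=True))
rea = spec[..., 0]   # real part
imag = spec[..., 1]  # imaginary part
mag = torch.sqrt(torch.pow(rea, 2) + torch.pow(imag, 2))
pha = torch.atan2(imag, rea)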
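The real/imaginary arithmetic above is the same as taking the modulus and argument of a complex number. As a minimal sketch, assuming a PyTorch version that supports `return_complex=True`, the following compares the manual computation with `torch.abs` and `torch.angle` on the complex output. The random signal, the frame sizes, and the explicit Hann window are illustrative assumptions, not values from the original post.

```python
import torch

torch.manual_seed(0)
mono = torch.randn(4000)        # stand-in for a real mono signal (assumption)
len_frame, len_hop = 512, 128   # illustrative frame and hop sizes

# Complex STFT of shape (N, T); an explicit Hann window is passed here for illustration
spec_c = torch.stft(mono, n_fft=len_frame, hop_length=len_hop,
                    window=torch.hann_window(len_frame), return_complex=True)
spec_ri = torch.view_as_real(spec_c)  # (N, T, 2): real and imaginary parts

# Manual magnitude and phase from the real/imaginary layout
mag_manual = torch.sqrt(spec_ri[..., 0] ** 2 + spec_ri[..., 1] ** 2)
pha_manual = torch.atan2(spec_ri[..., 1], spec_ri[..., 0])

# The same quantities taken directly from the complex tensor
mag = torch.abs(spec_c)
pha = torch.angle(spec_c)
```

Working on the complex tensor avoids indexing the last dimension and handles batched input without any change.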

librosa.stft

stft(y, n_fft=2048, hop_length=None, win_length=None, window='hann', center=True, dtype=np.complex64, pad_mode='reflect')

Parameters
----------

y : np.ndarray [shape=(n,)], real-valued
    input signal

n_fft : int > 0 [scalar]
    length of the windowed signal after padding with zeros.
    The number of rows in the STFT matrix `D` is (1 + n_fft/2).
    The default value, n_fft=2048 samples, corresponds to a physical
    duration of 93 milliseconds at a sample rate of 22050 Hz, i.e. the
    default sample rate in librosa. This value is well adapted for music
    signals. However, in speech processing, the recommended value is 512,
    corresponding to 23 milliseconds at a sample rate of 22050 Hz.
    In any case, we recommend setting `n_fft` to a power of two for
    optimizing the speed of the fast Fourier transform (FFT) algorithm.

hop_length : int > 0 [scalar]
    number of audio samples between adjacent STFT columns.
    Smaller values increase the number of columns in `D` without
    affecting the frequency resolution of the STFT.
    If unspecified, defaults to `win_length / 4` (see below).

win_length : int <= n_fft [scalar]
    Each frame of audio is windowed by `window()` of length `win_length`
    and then padded with zeros to match `n_fft`.
    Smaller values improve the temporal resolution of the STFT (i.e. the
    ability to discriminate impulses that are closely spaced in time)
    at the expense of frequency resolution (i.e. the ability to discriminate
    pure tones that are closely spaced in frequency). This effect is known
    as the time-frequency localization tradeoff and needs to be adjusted
    according to the properties of the input signal `y`.
    If unspecified, defaults to ``win_length = n_fft``.

window : string, tuple, number, function, or np.ndarray [shape=(n_fft,)]
    Either:
    - a window specification (string, tuple, or number);
      see `scipy.signal.get_window`
    - a window function, such as `scipy.signal.hanning`
    - a vector or array of length `n_fft`
    Defaults to a raised cosine window ("hann"), which is adequate for
    most applications in audio signal processing.
    .. see also:: `filters.get_window`

center : boolean
    If `True`, the signal `y` is padded so that frame
    `D[:, t]` is centered at `y[t * hop_length]`.
    If `False`, then `D[:, t]` begins at `y[t * hop_length]`.
    Defaults to `True`, which simplifies the alignment of `D` onto a
    time grid by means of `librosa.core.frames_to_samples`.
    Note, however, that `center` must be set to `False` when analyzing
    signals with `librosa.stream`.
    .. see also:: `stream`

dtype : numeric type
    Complex numeric type for `D`. Default is single-precision
    floating-point complex (`np.complex64`).

pad_mode : string or function
    If `center=True`, this argument is passed to `np.pad` for padding
    the edges of the signal `y`. By default (`pad_mode="reflect"`),
    `y` is padded on both sides with its own reflection, mirrored around
    its first and last sample respectively.
    If `center=False`, this argument is ignored.
    .. see also:: `np.pad`
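How these parameters interact can be sketched with a few lines of NumPy. The function below (`naive_stft`, a hypothetical name) is not librosa's implementation. It is a simplified illustration that assumes `win_length == n_fft`, a periodic Hann window, and reflect padding when `center=True`.

```python
import numpy as np


def naive_stft(y, n_fft=2048, hop_length=None, center=True):
    """Simplified STFT sketch: periodic Hann window with win_length == n_fft."""
    if hop_length is None:
        hop_length = n_fft // 4  # win_length / 4 with win_length == n_fft
    if center:
        # Pad both edges with a reflection so frame t is centered at y[t * hop_length]
        y = np.pad(y, n_fft // 2, mode="reflect")
    window = 0.5 - 0.5 * np.cos(2.0 * np.pi * np.arange(n_fft) / n_fft)
    n_frames = 1 + (len(y) - n_fft) // hop_length
    # Each column is one windowed frame of n_fft samples
    frames = np.stack(
        [y[t * hop_length : t * hop_length + n_fft] for t in range(n_frames)], axis=1
    )
    # rfft keeps the 1 + n_fft/2 non-redundant rows of a real signal's spectrum
    return np.fft.rfft(frames * window[:, None], axis=0)


# A 1000 Hz tone sampled at 8000 Hz falls into bin 1000 * 256 / 8000 = 32
sr = 8000
tone = np.sin(2 * np.pi * 1000 * np.arange(sr) / sr)
D = naive_stft(tone, n_fft=256, hop_length=64)
peak_bin = int(np.argmax(np.abs(D[:, D.shape[1] // 2])))
```

Halving `hop_length` in this sketch roughly doubles the number of columns of `D` while the number of rows stays at `1 + n_fft/2`, which matches the description of `hop_length` above.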

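The STFT represents a signal in the time-frequency domain by computing discrete Fourier transforms (DFT) over short overlapping windows. The return value is a complex-valued matrix D, where np.abs(D) is the magnitude and np.angle(D) is the phase.
The magnitude and phase are obtained as follows: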

import numpy as np
import librosa

spec = librosa.stft(mono, n_fft=len_frame, hop_length=len_hop)
mag = np.abs(spec)
pha = np.angle(spec)

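Alternatively, use the helper function provided in librosa.core. Note that its second return value is the complex unit phasor exp(1j * angle), not the angle itself, so np.angle is still needed to recover the phase:

spec = librosa.stft(mono, n_fft=len_frame, hop_length=len_hop)
# magphase returns (magnitude, phasor) such that spec == mag * phasor
mag, phasor = librosa.core.magphase(spec)
pha = np.angle(phasor)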


Comparison of torch.stft() and librosa.stft()

A comparison of how torch.stft and librosa.stft express the magnitude and phase of a speech signal.
