
  • torch.stft
  • librosa.stft


torch.stft(input, n_fft, hop_length=None, win_length=None, window=None, center=True, pad_mode='reflect', normalized=False, onesided=True)


  • input (Tensor) – the input tensor.
  • n_fft (int) – size of the Fourier transform.
  • hop_length (int, optional) – the distance between neighboring sliding-window frames. Default: None (treated as equal to floor(n_fft / 4)).
  • win_length (int, optional) – the size of the window frame and STFT filter. Default: None (treated as equal to n_fft).
  • window (Tensor, optional) – the optional window function. Default: None (treated as a window of all 1s).
  • center (bool, optional) – whether to pad input on both sides so that the $t$-th frame is centered at time $t \times \text{hop\_length}$. Default: True.
  • pad_mode (string, optional) – controls the padding method used when center is True. Default: "reflect".
  • normalized (bool, optional) – controls whether to return the normalized STFT results. Default: False.
  • onesided (bool, optional) – controls whether to return only half of the results to avoid redundancy. Default: True.

Returns the real and the imaginary parts together as one tensor of size $(* \times N \times T \times 2)$, where $*$ is the optional batch size of input, $N$ is the number of frequencies where the STFT is applied, $T$ is the total number of frames used, and each pair in the last dimension represents a complex number as its real and imaginary parts.
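To see how these defaults combine, here is a small pure-Python sketch that computes the output shape $(N, T)$ for a single-channel signal. It assumes that `center=True` pads `n_fft // 2` samples on each side (which is what makes frame $t$ land on $t \times \text{hop\_length}$); the helper `stft_output_shape` is hypothetical and not part of PyTorch.

```python
def stft_output_shape(num_samples, n_fft, hop_length=None, center=True, onesided=True):
    """Return (num_freqs, num_frames) for a single-channel STFT.

    Hypothetical helper mirroring the documented defaults of the older
    torch.stft signature: hop_length falls back to floor(n_fft / 4).
    """
    if hop_length is None:
        hop_length = n_fft // 4  # default: floor(n_fft / 4)
    if center:
        num_samples += 2 * (n_fft // 2)  # assumed padding on both sides
    if num_samples < n_fft:
        raise ValueError("signal is shorter than one frame")
    num_frames = 1 + (num_samples - n_fft) // hop_length
    # A real signal's spectrum is conjugate-symmetric, so onesided keeps
    # only the non-negative frequencies: n_fft // 2 + 1 bins.
    num_freqs = n_fft // 2 + 1 if onesided else n_fft
    return num_freqs, num_frames


# One second of audio at 22050 Hz, n_fft = 2048, default hop = 512.
print(stft_output_shape(22050, 2048))                   # (1025, 44)
print(stft_output_shape(22050, 2048, center=False))     # (1025, 40)
print(stft_output_shape(22050, 2048, onesided=False))   # (2048, 44)
```

With centering, the frame count simplifies to `1 + num_samples // hop_length` for even `n_fft`, which is why centered spectrograms line up neatly with a time grid.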

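The return value is a tensor whose first dimension is the batch size of the input (present only for batched input), whose second dimension is the number of frequencies at which the STFT is evaluated, whose third dimension is the total number of frames, and whose last dimension holds the real and imaginary parts of each complex value. For a 1-D mono signal there is no batch dimension, so the result has shape (N, T, 2), which is why a mono spectrogram is indexed as spec[:, :, 0] and spec[:, :, 1].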
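To make the (N, T, 2) layout concrete, here is a NumPy sketch of what the older real-valued torch.stft output contains, assuming its documented defaults: a window of all ones, hop_length = n_fft // 4, reflect padding of n_fft // 2 samples when center=True, a one-sided spectrum and no normalization. The function `stft_real_imag` is hypothetical and is not PyTorch's implementation.

```python
import numpy as np


def stft_real_imag(x, n_fft, hop_length=None):
    """Hypothetical NumPy re-creation of the old torch.stft layout for 1-D x.

    Returns an array of shape (n_fft // 2 + 1, num_frames, 2) whose last
    axis holds (real, imaginary).
    """
    if hop_length is None:
        hop_length = n_fft // 4
    pad = n_fft // 2
    x = np.pad(x, (pad, pad), mode="reflect")  # center=True, pad_mode='reflect'
    num_frames = 1 + (len(x) - n_fft) // hop_length
    window = np.ones(n_fft)  # default window: all ones
    frames = np.stack([x[t * hop_length:t * hop_length + n_fft] * window
                       for t in range(num_frames)], axis=1)  # (n_fft, T)
    spec = np.fft.rfft(frames, axis=0)  # onesided: n_fft // 2 + 1 rows
    return np.stack([spec.real, spec.imag], axis=-1)


x = np.random.default_rng(0).standard_normal(1000)
out = stft_real_imag(x, n_fft=256)
print(out.shape)  # (129, 16, 2)
```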

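import torch

# mono: 1-D float tensor; len_frame / len_hop: FFT size and hop length in samples.
# Recent PyTorch requires return_complex for real input; False keeps the (N, T, 2) layout.
spec = torch.stft(mono, n_fft=len_frame, hop_length=len_hop, return_complex=False)
rea = spec[:, :, 0]  # real part
imag = spec[:, :, 1]  # imaginary part
mag = torch.sqrt(torch.pow(rea, 2) + torch.pow(imag, 2))  # magnitude
pha = torch.atan2(imag, rea)  # phase in radians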
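The magnitude and phase formulas above are the polar form of each complex bin: $|z| = \sqrt{\Re(z)^2 + \Im(z)^2}$ and $\arg z = \operatorname{atan2}(\Im z, \Re z)$. The NumPy check below uses random numbers in place of a real spectrogram (an assumption for illustration) to confirm that the formulas agree with `np.abs` / `np.angle` and that magnitude and phase together reconstruct the spectrum. On recent PyTorch the same quantities are available from complex output (`return_complex=True`) via `spec.abs()` and `spec.angle()`, and `torch.view_as_real` recovers the (N, T, 2) layout.

```python
import numpy as np

# Stand-in for an (N, T, 2) spectrogram: random numbers, not real audio.
rng = np.random.default_rng(1)
spec = rng.standard_normal((129, 16, 2))

rea, imag = spec[..., 0], spec[..., 1]
mag = np.sqrt(rea ** 2 + imag ** 2)  # same formula as the torch code
pha = np.arctan2(imag, rea)          # phase in radians, within [-pi, pi]

z = rea + 1j * imag                  # pack each pair back into a complex number
assert np.allclose(mag, np.abs(z))
assert np.allclose(pha, np.angle(z))
assert np.allclose(mag * np.exp(1j * pha), z)  # polar form reconstructs z
```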


librosa.stft(y, n_fft=2048, hop_length=None, win_length=None, window='hann', center=True, dtype=np.complex64, pad_mode='reflect')


  • y : np.ndarray [shape=(n,)], real-valued. Input signal.
  • n_fft : int > 0 [scalar]. Length of the windowed signal after padding with zeros. The number of rows in the STFT matrix `D` is (1 + n_fft/2). The default value, n_fft=2048 samples, corresponds to a physical duration of 93 milliseconds at a sample rate of 22050 Hz, i.e. the default sample rate in librosa. This value is well adapted for music signals. However, in speech processing, the recommended value is 512, corresponding to 23 milliseconds at a sample rate of 22050 Hz. In any case, we recommend setting `n_fft` to a power of two for optimizing the speed of the fast Fourier transform (FFT) algorithm.
  • hop_length : int > 0 [scalar]. Number of audio samples between adjacent STFT columns. Smaller values increase the number of columns in `D` without affecting the frequency resolution of the STFT. If unspecified, defaults to `win_length // 4` (see below).
  • win_length : int <= n_fft [scalar]. Each frame of audio is windowed by `window()` of length `win_length` and then padded with zeros to match `n_fft`. Smaller values improve the temporal resolution of the STFT (i.e. the ability to discriminate impulses that are closely spaced in time) at the expense of frequency resolution (i.e. the ability to discriminate pure tones that are closely spaced in frequency). This effect is known as the time-frequency localization tradeoff and needs to be adjusted according to the properties of the input signal `y`. If unspecified, defaults to `win_length = n_fft`.
  • window : string, tuple, number, function, or np.ndarray [shape=(n_fft,)]. Either a window specification (string, tuple, or number; see `scipy.signal.get_window`), a window function such as `scipy.signal.windows.hann`, or a vector or array of length `n_fft`. Defaults to a raised cosine window ("hann"), which is adequate for most applications in audio signal processing. See also: `filters.get_window`.
  • center : boolean. If `True`, the signal `y` is padded so that frame `D[:, t]` is centered at `y[t * hop_length]`. If `False`, then `D[:, t]` begins at `y[t * hop_length]`. Defaults to `True`, which simplifies the alignment of `D` onto a time grid by means of `librosa.core.frames_to_samples`. Note, however, that `center` must be set to `False` when analyzing signals with `librosa.stream`. See also: `stream`.
  • dtype : numeric type. Complex numeric type for `D`. Default is single-precision floating-point complex (`np.complex64`).
  • pad_mode : string or function. If `center=True`, this argument is passed to `np.pad` for padding the edges of the signal `y`. By default (`pad_mode="reflect"`), `y` is padded on both sides with its own reflection, mirrored around its first and last sample respectively. If `center=False`, this argument is ignored. See also: `np.pad`.
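The n_fft and win_length descriptions can be made concrete with a little arithmetic. The NumPy sketch below converts frame lengths to milliseconds at 22050 Hz and builds a Hann window of length `win_length` zero-padded to `n_fft`. It assumes the periodic Hann form (what `scipy.signal.get_window('hann', win_length)` returns by default) and that the shorter window is centered inside the frame; the helper `padded_hann` is hypothetical, not part of librosa.

```python
import numpy as np


def padded_hann(win_length, n_fft):
    """Periodic Hann window of length win_length, zero-padded to n_fft.

    Hypothetical helper: the window is centered inside the n_fft frame,
    with any odd leftover zero going to the right-hand side.
    """
    if win_length > n_fft:
        raise ValueError("win_length must be <= n_fft")
    n = np.arange(win_length)
    window = 0.5 - 0.5 * np.cos(2 * np.pi * n / win_length)  # periodic Hann
    left = (n_fft - win_length) // 2
    return np.pad(window, (left, n_fft - win_length - left))


sr = 22050
for n_fft in (2048, 512):
    print(n_fft, "samples =", round(1000 * n_fft / sr, 1), "ms")  # 92.9 ms, then 23.2 ms

w = padded_hann(win_length=1024, n_fft=2048)
print(w.shape, w.argmax())  # (2048,) 1024
```

Only the central `win_length` samples of each frame contribute, which is how a shorter window trades frequency resolution for temporal resolution while `n_fft` (and hence the number of rows in `D`) stays fixed.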


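import numpy as np
import librosa

spec = librosa.stft(mono, n_fft=len_frame, hop_length=len_hop)  # complex, shape (1 + len_frame // 2, T)
mag = np.abs(spec)    # magnitude
pha = np.angle(spec)  # phase in radians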
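Unlike the older torch.stft, `librosa.stft` returns a complex matrix of shape `(1 + n_fft // 2, T)` with no trailing real/imaginary axis. The NumPy sketch below uses random data standing in for a real spectrogram (an assumption for illustration) to convert between the two layouts, which helps when comparing the libraries side by side. Note that with default arguments the numbers will differ anyway: torch.stft defaults to a window of all ones, whereas librosa.stft defaults to a Hann window, so passing something like `window=torch.hann_window(len_frame)` to torch.stft is a reasonable first step toward aligning them (not verified here).

```python
import numpy as np

# Stand-in complex spectrogram in librosa's (freq, frames) layout.
rng = np.random.default_rng(2)
D = (rng.standard_normal((1025, 44)) + 1j * rng.standard_normal((1025, 44))).astype(np.complex64)

stacked = np.stack([D.real, D.imag], axis=-1)  # (1025, 44, 2): old torch layout
back = stacked[..., 0] + 1j * stacked[..., 1]  # and back to complex values
assert np.array_equal(back, D)

mag, pha = np.abs(D), np.angle(D)  # magnitude and phase in radians
print(stacked.shape, mag.dtype)  # (1025, 44, 2) float32
```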


spec = librosa.stft(mono, n_fft=len_frame, hop_length=len_hop)
# mag = |spec|; pha = exp(1j * angle(spec)) is a complex unit phasor, not an angle in radians
mag, pha = librosa.magphase(spec)
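The two librosa variants are not interchangeable: `np.angle` yields the phase in radians, while `magphase` factors the spectrogram as `D = S * P`, where `S = |D| ** power` and `P` is a complex number of unit magnitude. A NumPy sketch of that documented behaviour follows; `magphase_sketch` is a hypothetical local re-implementation for illustration, not librosa's code.

```python
import numpy as np


def magphase_sketch(D, power=1):
    """Return (S, P) with S = |D| ** power and P = exp(1j * angle(D)).

    P is a unit-magnitude complex phasor; np.angle(P) gives radians.
    Zero entries get phase 1, since np.angle(0) is 0.
    """
    mag = np.abs(D)
    phase = np.exp(1j * np.angle(D))
    return mag ** power, phase


D = np.array([3 + 4j, -2j, -1 - 1j])
mag, pha = magphase_sketch(D)
assert np.allclose(mag, [5.0, 2.0, np.sqrt(2)])
assert np.allclose(np.abs(pha), 1.0)               # unit phasors
assert np.allclose(mag * pha, D)                   # D = S * P when power = 1
assert np.allclose(np.angle(pha), np.angle(D))     # radians, if needed
```

So if the phase in radians is what you need after `magphase`, apply `np.angle` to the returned phasor.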



