Comparing how torch.stft and librosa.stft expose the magnitude and phase of a speech signal

  • torch.stft
  • librosa.stft

torch.stft

stft(self, n_fft, hop_length=None, win_length=None, window=None, center=True, pad_mode='reflect', normalized=False, onesided=True)

Parameters:
----------

input (Tensor) – the input tensor
n_fft (int) – size of Fourier transform
hop_length (int, optional) – the distance between neighboring sliding window frames. Default: None (treated as equal to floor(n_fft / 4))
win_length (int, optional) – the size of window frame and STFT filter. Default: None (treated as equal to n_fft)
window (Tensor, optional) – the optional window function. Default: None (treated as window of all 1s)
center (bool, optional) – whether to pad input on both sides so that the t-th frame is centered at time t × hop_length. Default: True
pad_mode (string, optional) – controls the padding method used when center is True. Default: "reflect"
normalized (bool, optional) – controls whether to return the normalized STFT results. Default: False
onesided (bool, optional) – controls whether to return half of results to avoid redundancy. Default: True

Returns:
----------

The real and the imaginary parts together as one tensor of size (* × N × T × 2), where * is the optional batch size of input, N is the number of frequencies where STFT is applied, T is the total number of frames used, and each pair in the last dimension represents a complex number as the real part and the imaginary part.
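The defaults above determine the size of the output. As a rough sketch (the helper name `stft_output_shape` is made up for illustration and is not part of PyTorch), N and T can be worked out from the signal length, assuming that `center=True` pads `n_fft // 2` samples on each side:

```python
def stft_output_shape(n_samples, n_fft, hop_length=None, center=True, onesided=True):
    """Return (n_freq, n_frames) for a 1-D signal of n_samples samples.

    Hypothetical helper for illustration only; not a PyTorch function.
    """
    if hop_length is None:
        hop_length = n_fft // 4  # documented default: floor(n_fft / 4)
    # Assumption: center=True pads n_fft // 2 samples on both sides before framing.
    padded = n_samples + 2 * (n_fft // 2) if center else n_samples
    n_frames = 1 + (padded - n_fft) // hop_length
    # onesided=True keeps only the non-redundant half of the spectrum.
    n_freq = n_fft // 2 + 1 if onesided else n_fft
    return n_freq, n_frames


# One second of 16 kHz audio, 512-point FFT, hop of 128 samples.
print(stft_output_shape(16000, 512, 128))
```

With these numbers the one-sided spectrum has 257 frequency bins, and centering yields one more frame than `n_samples // hop_length`.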

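The input is a 1-D or 2-D time series (a single signal or a batch of signals).
The return value is a tensor. Its first dimension is the batch size of the input (absent when the input is 1-D), the second is the number of frequency bins the STFT is evaluated at, the third is the total number of frames, and the last holds the real and imaginary parts of each complex value.
The magnitude and phase are obtained as follows: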

import torch

spec = torch.stft(mono, n_fft=len_frame, hop_length=len_hop)
rea = spec[..., 0]  # real part (also works when a batch dimension is present)
imag = spec[..., 1]  # imaginary part
mag = torch.sqrt(torch.pow(rea, 2) + torch.pow(imag, 2))  # sqrt is already non-negative
pha = torch.atan2(imag, rea)
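The signature quoted above has no `return_complex` argument. More recent PyTorch releases add one and can return a complex tensor directly, which makes the code look much closer to the librosa version. The following is a minimal sketch under that assumption: the synthetic 440 Hz tone, the 16 kHz sample rate, and the Hann window are illustrative choices that do not come from the original snippet.

```python
import math
import torch

sr = 16000                                   # assumed sample rate
t = torch.arange(sr) / sr                    # one second of time stamps
mono = torch.sin(2 * math.pi * 440.0 * t)    # synthetic stand-in for a speech signal
len_frame, len_hop = 512, 128

# Ask for a complex tensor of shape (N, T) instead of a real (N, T, 2) tensor.
spec = torch.stft(mono, n_fft=len_frame, hop_length=len_hop,
                  window=torch.hann_window(len_frame), return_complex=True)

mag = torch.abs(spec)      # magnitude
pha = torch.angle(spec)    # phase angle in radians

# The older (real, imag) layout is still available as a view of the same data.
ri = torch.view_as_real(spec)                               # shape (N, T, 2)
mag_manual = torch.sqrt(ri[..., 0] ** 2 + ri[..., 1] ** 2)
pha_manual = torch.atan2(ri[..., 1], ri[..., 0])
```

Both routes give the same magnitude and phase; the complex form simply hides the bookkeeping of the trailing dimension.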

librosa.stft

stft(y, n_fft=2048, hop_length=None, win_length=None, window='hann', center=True, dtype=np.complex64, pad_mode='reflect')

Parameters
----------

y : np.ndarray [shape=(n,)], real-valued
    input signal

n_fft : int > 0 [scalar]
    length of the windowed signal after padding with zeros. The number of rows in the STFT matrix `D` is (1 + n_fft/2). The default value, n_fft=2048 samples, corresponds to a physical duration of 93 milliseconds at a sample rate of 22050 Hz, i.e. the default sample rate in librosa. This value is well adapted for music signals. However, in speech processing, the recommended value is 512, corresponding to 23 milliseconds at a sample rate of 22050 Hz. In any case, we recommend setting `n_fft` to a power of two for optimizing the speed of the fast Fourier transform (FFT) algorithm.

hop_length : int > 0 [scalar]
    number of audio samples between adjacent STFT columns. Smaller values increase the number of columns in `D` without affecting the frequency resolution of the STFT. If unspecified, defaults to `win_length / 4` (see below).

win_length : int <= n_fft [scalar]
    Each frame of audio is windowed by `window()` of length `win_length` and then padded with zeros to match `n_fft`. Smaller values improve the temporal resolution of the STFT (i.e. the ability to discriminate impulses that are closely spaced in time) at the expense of frequency resolution (i.e. the ability to discriminate pure tones that are closely spaced in frequency). This effect is known as the time-frequency localization tradeoff and needs to be adjusted according to the properties of the input signal `y`. If unspecified, defaults to ``win_length = n_fft``.

window : string, tuple, number, function, or np.ndarray [shape=(n_fft,)]
    Either:
    - a window specification (string, tuple, or number); see `scipy.signal.get_window`
    - a window function, such as `scipy.signal.hanning`
    - a vector or array of length `n_fft`
    Defaults to a raised cosine window ("hann"), which is adequate for most applications in audio signal processing.
    See also: `filters.get_window`

center : boolean
    If `True`, the signal `y` is padded so that frame `D[:, t]` is centered at `y[t * hop_length]`. If `False`, then `D[:, t]` begins at `y[t * hop_length]`. Defaults to `True`, which simplifies the alignment of `D` onto a time grid by means of `librosa.core.frames_to_samples`. Note, however, that `center` must be set to `False` when analyzing signals with `librosa.stream`.
    See also: `stream`

dtype : numeric type
    Complex numeric type for `D`. Default is single-precision floating-point complex (`np.complex64`).

pad_mode : string or function
    If `center=True`, this argument is passed to `np.pad` for padding the edges of the signal `y`. By default (`pad_mode="reflect"`), `y` is padded on both sides with its own reflection, mirrored around its first and last sample respectively. If `center=False`, this argument is ignored.
    See also: `np.pad`
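The window durations and row count quoted in the `n_fft` description follow from simple arithmetic. A quick check, with `window_duration_ms` being a made-up helper name used only here:

```python
def window_duration_ms(n_fft, sr=22050):
    """Duration of an n_fft-sample window in milliseconds (illustrative helper)."""
    return 1000.0 * n_fft / sr


for n_fft in (2048, 512):
    rows = 1 + n_fft // 2  # number of rows in the STFT matrix D
    print(n_fft, "samples:", round(window_duration_ms(n_fft)), "ms,", rows, "rows")
```

At a different sample rate the same `n_fft` covers a different duration, so for 16 kHz speech a 512-sample window spans 32 ms rather than 23 ms.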

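librosa.stft represents a signal in the time-frequency domain by computing discrete Fourier transforms (DFT) over short, overlapping windows. It returns a complex-valued matrix D, where np.abs(D) is the magnitude and np.angle(D) is the phase.
The magnitude and phase are obtained as follows: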

import numpy as np
import librosa

spec = librosa.stft(mono, n_fft=len_frame, hop_length=len_hop)
mag = np.abs(spec)  # magnitude
pha = np.angle(spec)  # phase angle in radians
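To make "DFTs over short overlapping windows" concrete, here is a bare-bones NumPy sketch of the same idea. It is not librosa's implementation: `naive_stft` is a hypothetical name, the periodic Hann window and reflect padding are assumptions chosen to mimic the documented defaults, and the test signal is synthetic.

```python
import numpy as np


def naive_stft(y, n_fft=512, hop_length=128):
    """Centered, Hann-windowed STFT; returns complex D of shape (1 + n_fft // 2, n_frames)."""
    window = np.hanning(n_fft + 1)[:-1]          # periodic Hann window (assumption)
    y = np.pad(y, n_fft // 2, mode="reflect")    # center the frames, as center=True does
    n_frames = 1 + (len(y) - n_fft) // hop_length
    # Slice overlapping frames, apply the window, and stack them as columns.
    frames = np.stack(
        [y[i * hop_length: i * hop_length + n_fft] * window for i in range(n_frames)],
        axis=1,
    )
    return np.fft.rfft(frames, axis=0)           # one-sided DFT of every frame


sr = 22050
t = np.arange(sr) / sr
y = np.sin(2 * np.pi * 440.0 * t)                # one second of a 440 Hz tone

D = naive_stft(y)
mag = np.abs(D)
pha = np.angle(D)
print(D.shape, int(mag[:, 50].argmax()))         # shape and the strongest bin of one frame
```

The strongest bin sits near 440 × 512 / 22050 ≈ 10.2, that is bin 10, which is a handy sanity check that the rows really are frequencies and the columns are frames.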

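Alternatively, use the helper that librosa.core already provides. Note that it returns the phase as a complex, unit-modulus array exp(1j * angle) rather than as an angle, so np.angle is still needed to recover the phase angle:

spec = librosa.stft(mono, n_fft=len_frame, hop_length=len_hop)
mag, phase = librosa.core.magphase(spec)  # phase is complex: exp(1j * angle)
pha = np.angle(phase)  # phase angle in radians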
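The two librosa routes differ only in how the phase is stored: np.angle gives an angle in radians, whereas magphase gives the unit-modulus complex factor exp(1j * angle). A NumPy-only sketch of that relationship, using a random complex matrix as a stand-in for a real STFT (an assumption made purely so the example runs without audio):

```python
import numpy as np

rng = np.random.default_rng(0)
# Stand-in for an STFT matrix: 257 frequency bins by 10 frames.
spec = rng.standard_normal((257, 10)) + 1j * rng.standard_normal((257, 10))

mag = np.abs(spec)               # magnitude
angle = np.angle(spec)           # phase as an angle in radians
phasor = np.exp(1j * angle)      # phase as a unit-modulus complex number

# Either representation reconstructs the original spectrogram.
recon = mag * phasor
print(np.allclose(recon, spec), np.allclose(np.abs(phasor), 1.0))
```

Storing the phase as a phasor is convenient when a modified magnitude is to be recombined with the original phase, since the product `new_mag * phasor` is already a valid complex spectrogram.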
