Generating Chirps with Neural Networks

The sound of birdsong is varied, beautiful, and relaxing. In the pre-Covid times, I made a focus timer which would play some recorded bird sounds during breaks, and I always wondered whether such sounds could be generated. After some trial and error, I landed on a proof-of-concept architecture which can both successfully reproduce a single chirp and has parameters which can be adjusted to alter the generated sound.

Since generating bird sounds seems like a somewhat novel application, I think it is worth sharing this approach. Along the way, I also learned how to take TensorFlow models apart and graft parts of them together. The code blocks below show how this is done. The full code can be found here.

The approach in theory

The generator will be composed of two parts. The first part will take the entire sound and encode key pieces of information about its overall shape in a small number of parameters.

The second part will take a small bit of sound, along with the information about the overall shape, and predict the next little bit of sound.

The second part can be called iteratively on itself with adjusted parameters to produce an entirely new chirp!

Encoding the parameters

An autoencoder structure is used for deriving the key parameters of the sound. This structure takes the entire soundwave and reduces it, through a series of (encoding) layers, down to a small number of components (the waist), before reproducing the sound in full from a series of expanding (decoding) layers. Once trained, the autoencoder model is cut off at the waist so that all it does is reduce the full sound down to the key parameters.

For the proof of concept, a single chirp was used; this chirp:

Soundwave representation of employed chirp.

It comes from the Cornell Guide to Bird Sounds: Essential Set for North America, the same set used for the Bird Sounds Chrome Experiment.

One problem with using just a single sound is that the autoencoder might simply hide all the information about the sound in the biases of the decoding layers, leaving the waist with all zero weights. To mitigate this, the sound was morphed during training by altering its amplitude and shifting it around a little.
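
The morphing step can be sketched with a small helper. The scaling range and maximum shift below are assumptions for illustration, not the article's exact values:

```python
import numpy as np

def morph(sound, rng, max_shift=100):
    """Return a randomly rescaled and slightly shifted copy of a 1-D waveform."""
    amp = rng.uniform(0.6, 1.0)                           # alter the amplitude
    shift = int(rng.integers(-max_shift, max_shift + 1))  # shift it around a little
    return amp * np.roll(sound, shift)

rng = np.random.default_rng(0)
chirp = np.sin(np.linspace(0, 40 * np.pi, 3000))          # stand-in waveform
batch = np.stack([morph(chirp, rng) for _ in range(32)])  # one morphed training batch
```

Note that `np.roll` wraps the waveform around; if the chirp starts and ends in near-silence, the wrap-around is a harmless stand-in for a plain shift.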

The encoder portion of the autoencoder consists of a series of convolutional layers which compress a sound wave roughly 3000 samples long down to around 20 numbers, hopefully retaining important information along the way. Since sounds are composed of many different sine waves, allowing many convolutional filters of different sizes to pass over the sound can in theory capture key information about the composite waves. A waist size of 20 was chosen mainly because this seems like a somewhat surmountable number of adjustable parameters.

In this first approach, the layers are stacked sequentially. In a future version, it may be advantageous to use a structure akin to inception-net blocks to run convolutions of different sizes in parallel.

The decoder portion of the model consists of two dense layers, one of length 400, and one of length 3000 — the same length as the input sound. The activation function of the final layer is tanh, as the sound wave representations have values between -1 and 1.

Here is what this looks like visualized:

Made with PlotNeuralNet.

And here is the code:
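
A minimal Keras sketch of the autoencoder described above; the convolution counts, filter sizes, and strides are illustrative assumptions, and the exact architecture is in the full code linked earlier:

```python
import tensorflow as tf
from tensorflow.keras import layers, models

SOUND_LEN = 3000  # approximate length of the chirp waveform
WAIST = 20        # number of encoded parameters

# Encoder: stacked 1-D convolutions squeeze the waveform down to the waist.
inp = layers.Input(shape=(SOUND_LEN, 1))
x = layers.Conv1D(16, 64, strides=4, padding="same", activation="relu")(inp)
x = layers.Conv1D(8, 32, strides=4, padding="same", activation="relu")(x)
x = layers.Conv1D(4, 16, strides=4, padding="same", activation="relu")(x)
x = layers.Flatten()(x)
waist = layers.Dense(WAIST, activation="relu", name="waist")(x)

# Decoder: two dense layers (400, then 3000); tanh keeps outputs in [-1, 1].
x = layers.Dense(400, activation="relu")(waist)
out = layers.Dense(SOUND_LEN, activation="tanh")(x)

autoencoder = models.Model(inp, out)
autoencoder.compile(optimizer="adam", loss="mse")
```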

Training the Generator

The structure of the generator begins with the encoding portion of the autoencoder network. The output at the waist is combined with some fresh input representing the bit of the sound wave immediately preceding that which is to be predicted. In this case, the previous 200 values of the sound wave are used as input, and the next 10 are predicted.

The combined inputs are fed into a series of dense layers. The sequential dense layers allow the network to learn the relationship between the previous values, information on the overall shape of the sound, and the following values. The final dense layer is of length 10 and activated with a tanh function.

Here is what this network looks like:

Made with PlotNeuralNet.

The layers coming from the autoencoder network are frozen so that additional training resources are not spent on them.
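
As a sketch, the generator can be assembled by grafting the fresh context input onto the encoder; the layer sizes here are assumptions, and the encoder below stands in for the trained autoencoder's encoding half:

```python
import tensorflow as tf
from tensorflow.keras import layers, models

SOUND_LEN, WAIST, CONTEXT, STEP = 3000, 20, 200, 10

# Encoding half of the (trained) autoencoder, frozen so no further
# training resources are spent on it.
sound_in = layers.Input(shape=(SOUND_LEN, 1), name="full_sound")
x = layers.Conv1D(16, 64, strides=4, padding="same", activation="relu")(sound_in)
x = layers.Flatten()(x)
waist = layers.Dense(WAIST, activation="relu", name="waist")(x)
encoder = models.Model(sound_in, waist)
encoder.trainable = False

# Fresh input: the 200 waveform values preceding those to be predicted.
context_in = layers.Input(shape=(CONTEXT,), name="previous_values")
h = layers.Concatenate()([waist, context_in])
h = layers.Dense(256, activation="relu")(h)
h = layers.Dense(64, activation="relu")(h)
next_bit = layers.Dense(STEP, activation="tanh")(h)  # the next 10 values

generator = models.Model([sound_in, context_in], next_bit)
generator.compile(optimizer="adam", loss="mse")
```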

Generating some sounds

Training this network takes only a couple of minutes as the data is not very varied and therefore relatively easy to learn, particularly for the autoencoder network. One final flourish is to produce two new networks from the trained models.

The first is simply the encoder portion of the autoencoder, but now separated. We need this part to produce some initial good parameters.

The second model is the same as the generator network, but with the parts from the autoencoder network replaced with a new input source. This is done so that the trained generator no longer requires the entire soundwave as input, but only the encoded parameters capturing the key information about the sound. With these separated out as a new input, we can freely manipulate them when generating chirps.
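
A sketch of this graft, with stand-in dense layers playing the role of the generator's trained layers (sizes and names are assumptions):

```python
import tensorflow as tf
from tensorflow.keras import layers, models

WAIST, CONTEXT, STEP = 20, 200, 10

# Stand-ins for the trained generator's dense stack; in practice these
# layer objects are taken from the trained model so the weights carry over.
trained_dense = [
    layers.Dense(256, activation="relu"),
    layers.Dense(64, activation="relu"),
    layers.Dense(STEP, activation="tanh"),
]

# New input source replacing the encoder: the 20 encoded parameters are
# now fed in (and freely tweaked) directly.
params_in = layers.Input(shape=(WAIST,), name="encoded_params")
context_in = layers.Input(shape=(CONTEXT,), name="previous_values")
h = layers.Concatenate()([params_in, context_in])
for lyr in trained_dense:
    h = lyr(h)

chirp_model = models.Model([params_in, context_in], h)
```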

The following sounds were generated without modifying the parameters; they are very close to the original sound, but are not perfect reproductions. The generator network is only able to reach an accuracy of between 60% and 70%, so some variability is to be expected.
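
Generation itself is an iterative loop: predict the next 10 values, append them, and slide the context window forward. A self-contained sketch, using a tiny stand-in for the trained model:

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers, models

WAIST, CONTEXT, STEP = 20, 200, 10

# Tiny stand-in for the trained chirp model (parameters + context -> next 10).
params_in = layers.Input(shape=(WAIST,))
context_in = layers.Input(shape=(CONTEXT,))
h = layers.Concatenate()([params_in, context_in])
h = layers.Dense(64, activation="relu")(h)
out = layers.Dense(STEP, activation="tanh")(h)
chirp_model = models.Model([params_in, context_in], out)

params = np.random.uniform(0.0, 1.0, (1, WAIST)).astype("float32")  # tweak these
context = np.zeros((1, CONTEXT), dtype="float32")                   # start from silence
pieces = []
for _ in range(30):  # 30 steps of 10 samples each
    nxt = chirp_model([params, context], training=False).numpy()
    pieces.append(nxt[0])
    # Slide the window: drop the oldest 10 values, append the new ones.
    context = np.concatenate([context[:, STEP:], nxt], axis=1)

wave = np.concatenate(pieces)  # the generated waveform, 300 samples here
```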

Sounds generated without modifying the encoded parameters.

Modifying the parameters

The advantage of generating bird sounds is in part that new variations on a theme can be produced. This can be done by modifying the parameters produced by the encoder network. In the above case, the encoder produced these parameters:

Not all of the 20 nodes produced non-zero parameters, but there are enough of them to experiment with. There is a lot of complexity to be explored with 12 adjustable parameters that all can be adjusted to arbitrary degrees in both directions. Since this is a proof of concept, it will suffice to present some choice sounds generated by adjusting just a single parameter in each case:

Sounds generated after modifying one of the encoded parameters in each case.

Here are the soundwave representations of the three examples:

Soundwave representation of generated chirps.

Next Steps

It seems that generating bird sounds using a neural network is possible, although it remains to be seen how practicable it is. The above approach uses just a single sound, so a natural next step would be to attempt to train the model on multiple different sounds. It is not clear from the outset that this would work. However, if the model as constructed fails on multiple sounds, it would still be possible to train different models on different sounds and simply stack them to produce different sounds.

A larger problem is that not all produced sounds are viable, particularly when modifying the parameters. A fair share of produced sounds are more akin to computer beeps than bird song. Some sound like an angry computer that really doesn’t want you to do what you just tried to do. One way to mitigate this would be to train a separate model to detect bird sounds (perhaps along these lines), and use that to reject or accept generated output.

Computational costs are also a constraint with the current approach; generating a chirp takes an order of magnitude longer than playing the sound, which is not ideal if the idea is to generate beautiful soundscapes on the fly. The main mitigation which comes to mind here is to increase the length of each prediction, possibly at the cost of accuracy. One could also, of course, simply spend the time to pre-generate acceptable soundscapes.

Conclusion

An autoencoder network and a short-term prediction network can be grafted together to produce a bird sound generator with some adjustable parameters, which can be manipulated to create new and interesting bird sounds.

As with many projects, part of the motivation is to learn in the process. In particular, I did not know how to pull apart trained models and graft parts of them together. The models used above can be used as an example to guide other learners who want to experiment with such approaches.

Translated from: https://towardsdatascience.com/generating-chirps-with-neural-networks-41628e72efb2
