Modeling Animal Vocalizations through Synthesizers

Title

Modeling Animal Vocalizations through Synthesizers

Authors

Masato Hagiwara, Maddie Cusimano, Jen-Yu Liu

Abstract

Modeling real-world sound is a fundamental problem in the creative use of machine learning and many other fields, including human speech processing and bioacoustics. Transformer-based generative models and some prior work (e.g., DDSP) are known to produce realistic sound, although they have limited control and are hard to interpret. As an alternative, we aim to use modular synthesizers, i.e., compositional, parametric electronic musical instruments, for modeling non-music sounds. However, inferring synthesizer parameters given a target sound, i.e., the parameter inference task, is not trivial for general sounds, and past research has typically focused on musical sound. In this work, we optimize a differentiable synthesizer from TorchSynth in order to model, emulate, and creatively generate animal vocalizations. We compare an array of optimization methods, from gradient-based search to genetic algorithms, for inferring its parameters, and then demonstrate how one can control and interpret the parameters for modeling non-music sounds.
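A small sketch can make the parameter inference task described above concrete. It does not use TorchSynth and is not the authors' method. It assumes a hypothetical one-voice synthesizer: a decaying 440 Hz sine with two knobs, amplitude and decay, each normalized to [0, 1]. Candidates are scored with a time-domain mean squared error against the target, and the sketch contrasts projected gradient descent with a basic elitist genetic algorithm. All names, constants, and the choice of loss are illustrative assumptions.

```python
import math
import random

# Hypothetical one-voice synth, NOT TorchSynth: a decaying sine oscillator.
SR = 4000          # sample rate in Hz (assumed)
N = 400            # number of samples, i.e. 0.1 s of audio
FREQ = 440.0       # oscillator pitch, held fixed in this sketch
MAX_DECAY = 50.0   # decay rate (1/s) when the normalized decay knob is 1.0

TIMES = [n / SR for n in range(N)]
CARRIER = [math.sin(2 * math.pi * FREQ * t) for t in TIMES]


def synth(params):
    """Render audio from normalized knobs (amp, decay), each in [0, 1]."""
    amp, decay = params
    d = MAX_DECAY * decay
    return [amp * math.exp(-d * t) * c for t, c in zip(TIMES, CARRIER)]


def loss_and_grad(params, target):
    """Time-domain mean squared error and its analytic gradient w.r.t. the knobs."""
    amp, decay = params
    d = MAX_DECAY * decay
    loss = g_amp = g_decay = 0.0
    for t, c, y_ref in zip(TIMES, CARRIER, target):
        env = math.exp(-d * t) * c           # d(output)/d(amp)
        r = amp * env - y_ref                # residual for this sample
        loss += r * r
        g_amp += 2 * r * env
        g_decay += 2 * r * (-amp * t * MAX_DECAY * env)  # chain rule through d
    return loss / N, (g_amp / N, g_decay / N)


def clip01(x):
    """Keep a knob inside its normalized [0, 1] range."""
    return min(1.0, max(0.0, x))


def gradient_search(target, init=(0.5, 0.5), lr=0.5, steps=1000):
    """Projected gradient descent: step downhill, then clip to [0, 1]."""
    p = list(init)
    for _ in range(steps):
        _, g = loss_and_grad(p, target)
        p = [clip01(pi - lr * gi) for pi, gi in zip(p, g)]
    return tuple(p)


def genetic_search(target, pop_size=40, generations=80, seed=0):
    """Gradient-free search: elitist selection, uniform crossover, Gaussian mutation."""
    rng = random.Random(seed)

    def fitness(p):
        return loss_and_grad(p, target)[0]

    pop = [(rng.random(), rng.random()) for _ in range(pop_size)]
    for gen in range(generations):
        pop.sort(key=fitness)
        elite = pop[: pop_size // 4]                   # keep the best quarter
        sigma = 0.1 * (1 - gen / generations) + 0.005  # shrink mutations over time
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.sample(elite, 2)
            child = tuple(clip01(rng.choice(pair) + rng.gauss(0, sigma))
                          for pair in zip(a, b))
            children.append(child)
        pop = elite + children
    return min(pop, key=fitness)


# Usage: infer the knobs that produced a "recorded" target sound.
target = synth((0.8, 0.4))   # amplitude 0.8, decay 20 per second
print("gradient:", gradient_search(target))
print("genetic: ", genetic_search(target))
```

The gradient search here uses a hand-derived gradient. A differentiable synthesizer built on PyTorch would typically obtain it from autograd instead. It converges quickly when started near the answer. The genetic search needs only loss evaluations, which is what makes such methods attractive when gradients are unreliable or the loss landscape has many local minima.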
