MTCRNN: A multi-scale RNN for directed audio texture synthesis

Paper title

MTCRNN: A multi-scale RNN for directed audio texture synthesis

Authors

Huzaifah, M., Wyse, L.

Abstract

Audio textures are a subset of environmental sounds, often defined as having stable statistical characteristics within an adequately large window of time while possibly being unstructured locally. They include common everyday sounds such as rain, wind, and engines. Because these complex sounds contain patterns on multiple timescales, they are challenging to model with traditional methods. We introduce a novel modelling approach for textures, combining recurrent neural networks trained at different levels of abstraction with a conditioning strategy that allows for user-directed synthesis. We demonstrate the model's performance on a variety of datasets, evaluate it on various metrics, and discuss some potential applications.
