Polyphonic Piano Transcription Using Autoregressive Multi-State Note Model

Paper Title

Polyphonic Piano Transcription Using Autoregressive Multi-State Note Model

Authors

Taegyun Kwon, Dasaem Jeong, Juhan Nam

Abstract

Recent advances in polyphonic piano transcription have come primarily from the deliberate design of neural network architectures that detect different note states, such as onset or sustain, and model the temporal evolution of those states. Most of them, however, use a separate neural network for each note state, thereby optimizing multiple loss functions, and they handle the temporal evolution of note states either through abstract connections between the state-wise neural networks or with a post-processing module. In this paper, we propose a unified neural network architecture in which multiple note states are predicted as a softmax output with a single loss function, and the temporal order is learned by an autoregressive connection within the single neural network. This compact model allows the number of note states to be increased without added architectural complexity. Using the MAESTRO dataset, we examine various combinations of multiple note states, including on, onset, sustain, re-onset, offset, and off. We also show that the autoregressive module effectively learns the inter-state dependency of notes. Finally, we show that our proposed model achieves performance comparable to state-of-the-art models with fewer parameters.
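
The three ideas in the abstract (one categorical state per frame, a single softmax loss, and an autoregressive link to the previous state) can be illustrated with a minimal plain-Python sketch. This is not the authors' implementation. The state set, the frame-level labeling rules, and the names `label_states`, `softmax_cross_entropy`, `greedy_decode`, and `step_fn` are all assumptions made for illustration, since the abstract does not define them.

```python
import math

# Hypothetical five-state scheme; the paper compares several combinations.
OFF, ONSET, SUSTAIN, REONSET, OFFSET = range(5)

def label_states(notes, n_frames):
    """Turn (start, end) frame intervals for ONE pitch into per-frame states.

    Assumed rules: `end` is exclusive, the frame at `end` is labeled OFFSET,
    and a note that starts while the pitch is still sounding is a REONSET.
    """
    states = [OFF] * n_frames
    for start, end in sorted(notes):
        if start >= n_frames or end <= start:
            continue  # skip notes outside the excerpt or with no duration
        sounding = start > 0 and states[start - 1] in (ONSET, SUSTAIN, REONSET)
        states[start] = REONSET if sounding else ONSET
        for t in range(start + 1, min(end, n_frames)):
            states[t] = SUSTAIN
        if end < n_frames:
            states[end] = OFFSET
    return states

def softmax_cross_entropy(logits, target):
    """Single loss over all states: -log softmax(logits)[target]."""
    m = max(logits)  # subtract the max for numerical stability
    log_z = m + math.log(sum(math.exp(v - m) for v in logits))
    return log_z - logits[target]

def greedy_decode(step_fn, features):
    """Autoregressive decoding: each step sees the previously predicted state.

    `step_fn(frame_features, prev_state) -> list of logits` is a hypothetical
    stand-in for the network.
    """
    prev, out = OFF, []
    for x in features:
        logits = step_fn(x, prev)
        prev = max(range(len(logits)), key=logits.__getitem__)
        out.append(prev)
    return out
```

For example, `label_states([(1, 4), (4, 6)], 8)` labels frame 4 as a re-onset because the second note begins the moment the first one ends. Because every frame and pitch carries exactly one label, adding or removing a state only changes the size of the softmax, which is the compactness the abstract refers to.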
