A Domain-Knowledge-Inspired Music Embedding Space and a Novel Attention Mechanism for Symbolic Music Modeling

Paper Title

A Domain-Knowledge-Inspired Music Embedding Space and a Novel Attention Mechanism for Symbolic Music Modeling

Authors

Guo, Z., Kang, J., Herremans, D.

Abstract

Following the success of the transformer architecture in the natural language domain, transformer-like architectures have recently been widely applied to symbolic music. Symbolic music and text, however, are two different modalities. Symbolic music contains multiple attributes, both absolute (e.g., pitch) and relative (e.g., pitch interval). The relative attributes shape human perception of musical motifs, yet they are mostly ignored in existing symbolic music modeling methods, mainly because there is no musically meaningful embedding space in which both the absolute and the relative embeddings of symbolic music tokens can be represented efficiently. In this paper, we propose the Fundamental Music Embedding (FME) for symbolic music, based on a bias-adjusted sinusoidal encoding within which both absolute and relative attributes can be embedded and fundamental musical properties (e.g., translational invariance) are explicitly preserved. Taking advantage of the proposed FME, we further propose a novel attention mechanism based on relative index, pitch and onset embeddings (RIPO attention), so that musical domain knowledge can be fully utilized for symbolic music modeling. Experimental results show that our proposed model, the RIPO transformer, which utilizes FME and RIPO attention, outperforms state-of-the-art transformers (i.e., music transformer, linear transformer) in a melody completion task. Moreover, when the RIPO transformer is used in a downstream music generation task, the notorious degeneration phenomenon no longer occurs, and the music it generates outperforms the music generated by state-of-the-art transformer models in both subjective and objective evaluations.
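The translational invariance mentioned in the abstract can be illustrated with a plain sinusoidal encoding. The sketch below is not the authors' FME implementation; the function names, the embedding size `dim`, the frequency `base`, and the constant `bias` are all assumptions chosen for illustration. It only shows the general property that, for sin/cos pairs with a shared additive bias, the distance between two embeddings depends on the interval between the inputs rather than on their absolute values.

```python
import math

def sinusoidal_embedding(value, dim=8, base=10000.0, bias=0.0):
    """Embed a scalar (e.g. a MIDI pitch) as sin/cos pairs plus a constant bias.

    dim, base and bias are illustrative assumptions, not values from the paper.
    """
    emb = []
    for k in range(dim // 2):
        freq = base ** (-2.0 * k / dim)  # one frequency per sin/cos pair
        emb.append(math.sin(freq * value) + bias)
        emb.append(math.cos(freq * value) + bias)
    return emb

def distance(a, b):
    """Euclidean distance between two embeddings of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Two perfect fifths (7 semitones) starting from different absolute pitches.
d_low = distance(sinusoidal_embedding(60, bias=0.5),
                 sinusoidal_embedding(67, bias=0.5))
d_high = distance(sinusoidal_embedding(72, bias=0.5),
                  sinusoidal_embedding(79, bias=0.5))

# A different interval (major third, 4 semitones) for comparison.
d_third = distance(sinusoidal_embedding(60, bias=0.5),
                   sinusoidal_embedding(64, bias=0.5))

print(d_low, d_high, d_third)
```

The two fifths give the same distance up to floating-point error because, for each sin/cos pair, the squared difference reduces to 2 − 2·cos(ω·Δ), where Δ is the interval; the shared bias cancels. The major third gives a different distance.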

Paper Title