Paper Title
Iranian Modal Music (Dastgah) detection using deep neural networks
Paper Authors
Paper Abstract
Music classification and genre detection are topics in music information retrieval (MIR) on which many articles have been published regarding their utility in the modern world. However, such contributions are scarce for non-Western music, such as Iranian modal music. In this work, we implemented several deep neural networks to recognize Iranian modal music across seven highly correlated categories. The best model, BiLGNet, which achieved 92 percent overall accuracy, uses an architecture inspired by autoencoders, including bidirectional LSTM and GRU layers. We trained the models on the Nava dataset, which comprises 1,786 recordings and up to 55 hours of music played solo on the Kamanche, Tar, Setar, Reed, and Santoor (dulcimer). We considered multiple features, such as MFCC, Chroma CENS, and Mel spectrogram, as input. The results indicate that MFCC carries more valuable information for detecting Iranian modal music (Dastgah) than the other sound representations. Moreover, the autoencoder-inspired architecture is robust in distinguishing highly correlated data such as Dastgahs. The results also show that, because of the precise ordering in Iranian Dastgah music, bidirectional recurrent networks are more efficient than any of the other networks implemented in this study.
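As a rough illustration of the MFCC features the abstract identifies as most informative, the sketch below computes MFCCs from scratch with NumPy (framing, windowing, power spectrum, mel filterbank, log, DCT-II). This is not the paper's pipeline; the frame size, hop length, filter count, and coefficient count are illustrative defaults, and a library such as librosa would normally be used instead.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sr):
    # Triangular filters spaced evenly on the mel scale.
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        l, c, r = bins[i - 1], bins[i], bins[i + 1]
        for k in range(l, c):
            fb[i - 1, k] = (k - l) / max(c - l, 1)
        for k in range(c, r):
            fb[i - 1, k] = (r - k) / max(r - c, 1)
    return fb

def mfcc(signal, sr=22050, n_fft=2048, hop=512, n_mels=40, n_mfcc=13):
    # Frame the signal, apply a Hann window, take the power spectrum.
    frames = [signal[s:s + n_fft] * np.hanning(n_fft)
              for s in range(0, len(signal) - n_fft + 1, hop)]
    spec = np.abs(np.fft.rfft(np.array(frames), n_fft)) ** 2
    # Map to the mel scale and take the log (small offset avoids log(0)).
    mel_energy = np.log(spec @ mel_filterbank(n_mels, n_fft, sr).T + 1e-10)
    # DCT-II decorrelates the log-mel energies; keep the first n_mfcc coefficients.
    n = mel_energy.shape[1]
    basis = np.cos(np.pi / n * (np.arange(n) + 0.5)[None, :]
                   * np.arange(n_mfcc)[:, None])
    return mel_energy @ basis.T  # shape: (n_frames, n_mfcc)

# Example: one second of a 440 Hz sine at 22.05 kHz.
t = np.arange(22050) / 22050.0
feats = mfcc(np.sin(2 * np.pi * 440 * t))
print(feats.shape)  # (40, 13): 40 frames, 13 coefficients per frame
```

A matrix of this shape (time frames by coefficients) is the kind of sequential input that the bidirectional LSTM/GRU layers described above consume frame by frame.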