Paper Title
Remixing Music with Visual Conditioning
Paper Authors
Paper Abstract
We propose a visually conditioned music remixing system that combines deep visual and audio models. The method builds on a state-of-the-art audio-visual source separation model that performs musical instrument source separation using video information. We modify the model to accept user-selected images instead of videos as the visual input at inference time, enabling separation of audio-only content. Furthermore, we propose a remixing engine that generalizes source separation to music remixing. The proposed method achieves improved audio quality compared to remixing performed with the separate-and-add method using a state-of-the-art audio-visual source separation model.
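For context, the separate-and-add baseline mentioned above rescales an estimated source and adds it back to the residual of the mixture. The following is a minimal sketch of that baseline, assuming a hypothetical separate_target function standing in for an image-conditioned separation model; it is not the paper's implementation.

import numpy as np

def separate_target(mixture: np.ndarray, instrument_image) -> np.ndarray:
    # Hypothetical placeholder: return the estimated waveform of the
    # instrument selected via the conditioning image.
    raise NotImplementedError

def separate_and_add_remix(mixture: np.ndarray, instrument_image, gain: float) -> np.ndarray:
    # Remix by separating the target source, rescaling it, and adding it
    # back to the residual. gain is a linear factor (1.0 = unchanged).
    target = separate_target(mixture, instrument_image)  # estimated source
    accompaniment = mixture - target                     # everything else
    return accompaniment + gain * target                 # re-add rescaled source

Any artifacts introduced by the separation step (interference, distortion) remain in both terms of the sum, which is the audio-quality limitation the proposed remixing engine is reported to improve upon.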