Title
Role of Audio in Audio-Visual Video Summarization
Authors
Abstract
Video summarization attracts attention for efficient video representation, retrieval, and browsing, as a way to ease data-volume and traffic-surge problems. Although video summarization mostly relies on the visual channel for compaction, the benefits of audio-visual modeling have appeared in recent literature. The information coming from the audio channel can result from audio-visual correlation in the video content. In this study, we propose a new audio-visual video summarization framework that integrates four audio-visual information fusion strategies with GRU-based and attention-based networks. Furthermore, we investigate a new explainability methodology using audio-visual canonical correlation analysis (CCA) to better understand and explain the role of audio in the video summarization task. Experimental evaluations on the TVSum dataset show F1 score and Kendall's tau improvements for audio-visual video summarization. In addition, splitting the videos of the TVSum and COGNIMUSE datasets into positively and negatively correlated groups based on audio-visual CCA yields a strong performance improvement on the positively correlated videos for audio-only and audio-visual video summarization.
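The abstract does not spell out how the CCA-based split into positively and negatively correlated videos is computed. Below is a minimal sketch of one plausible reading. It is not the authors' code, and every choice in it is an assumption: per-frame audio and visual feature matrices are time-aligned; a single regularized CCA is fitted on frames pooled across all videos; and a video is labeled positive or negative by the sign of the Pearson correlation between its first canonical audio and visual projections. The names `fit_cca`, `video_correlation`, and `split_by_correlation`, and the regularization constant `reg`, are hypothetical.

```python
import numpy as np


def _inv_sqrt(c: np.ndarray) -> np.ndarray:
    """Inverse square root of a symmetric positive-definite matrix."""
    vals, vecs = np.linalg.eigh(c)
    return vecs @ np.diag(1.0 / np.sqrt(vals)) @ vecs.T


def fit_cca(audio: np.ndarray, visual: np.ndarray, reg: float = 1e-3):
    """Fit the first pair of canonical directions (hypothetical helper).

    audio: (n_frames, d_a), visual: (n_frames, d_v), frames pooled over videos.
    Assumption: a small ridge term `reg` keeps the covariances invertible.
    """
    mean_a, mean_v = audio.mean(axis=0), visual.mean(axis=0)
    a, v = audio - mean_a, visual - mean_v
    n = a.shape[0]
    c_aa = a.T @ a / (n - 1) + reg * np.eye(a.shape[1])
    c_vv = v.T @ v / (n - 1) + reg * np.eye(v.shape[1])
    c_av = a.T @ v / (n - 1)
    k_a, k_v = _inv_sqrt(c_aa), _inv_sqrt(c_vv)
    # Singular values of the whitened cross-covariance are the canonical
    # correlations; the leading singular vectors give the first directions.
    u, _, vt = np.linalg.svd(k_a @ c_av @ k_v)
    w_a = k_a @ u[:, 0]
    w_v = k_v @ vt[0]
    return w_a, w_v, mean_a, mean_v


def video_correlation(audio, visual, w_a, w_v, mean_a, mean_v) -> float:
    """Pearson correlation between one video's first canonical projections."""
    pa = (audio - mean_a) @ w_a
    pv = (visual - mean_v) @ w_v
    return float(np.corrcoef(pa, pv)[0, 1])


def split_by_correlation(videos, w_a, w_v, mean_a, mean_v):
    """Split video ids into positively / negatively correlated groups.

    Assumption: the threshold between the two groups is a correlation of 0.
    """
    pos, neg = [], []
    for vid, (audio, visual) in videos.items():
        r = video_correlation(audio, visual, w_a, w_v, mean_a, mean_v)
        (pos if r >= 0 else neg).append(vid)
    return pos, neg
```

Flipping the sign of both canonical directions leaves each video's correlation unchanged, so the split does not depend on the arbitrary sign returned by the SVD. Following the abstract, summarization performance would then be evaluated separately on the `pos` and `neg` groups.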