Title
PodSumm -- Podcast Audio Summarization
Authors
Abstract
The diverse nature, scale, and specificity of podcasts present a unique challenge to content discovery systems. Listeners often rely on text descriptions of episodes, provided by the podcast creators, to discover new content. Some factors, such as the narrator's presentation style and the production quality, are significant indicators of subjective user preference, but they are difficult to quantify and are not reflected in the text descriptions provided by the podcast creators. We propose the automated creation of podcast audio summaries to aid content discovery and to help listeners quickly preview podcast content before investing time in listening to an entire episode. In this paper, we present a method that automatically constructs a podcast summary via guidance from the text domain. Our method performs two key steps: audio-to-text transcription and text summary generation. Motivated by a lack of datasets for this task, we curate an internal dataset, find an effective scheme for data augmentation, and design a protocol to gather summaries from annotators. We fine-tune a PreSumm [10] model with our augmented dataset and perform an ablation study. Our method achieves ROUGE-F(1/2/L) scores of 0.63/0.53/0.63 on our dataset. We hope these results will inspire future research in this direction.