Paper Title

SemEval 2022 Task 12: Symlink - Linking Mathematical Symbols to their Descriptions

Authors

Viet Dac Lai, Amir Pouran Ben Veyseh, Franck Dernoncourt, Thien Huu Nguyen

Abstract

Given the increasing number of livestreaming videos, automatic speech recognition and post-processing of livestreaming video transcripts are crucial for efficient data management and knowledge mining. A key step in this process is punctuation restoration, which recovers fundamental text structures such as phrase and sentence boundaries from the video transcripts. This work presents a new human-annotated corpus, called BehancePR, for punctuation restoration in livestreaming video transcripts. Our experiments on BehancePR demonstrate the challenges of punctuation restoration in this domain. Furthermore, we show that popular natural language processing toolkits are incapable of detecting sentence boundaries in non-punctuated transcripts of livestreaming videos, calling for more research effort to develop robust models for this area.
