Paper Title

BER: Balanced Error Rate For Speaker Diarization

Authors

Tao Liu, Kai Yu

Abstract

DER is the primary metric for evaluating diarization performance, but it faces a dilemma: errors in short utterances or segments tend to be overwhelmed by errors in longer ones. Short segments, e.g., "yes" or "no", still carry semantic information. Besides, DER overlooks errors made on speakers who talk less. Although JER balances errors across speakers, it suffers from the same dilemma. Considering that duration error, segment error, and speaker-weighted error together constitute a complete diarization evaluation, we propose a Balanced Error Rate (BER) to evaluate speaker diarization. First, we propose a segment-level error rate (SER) that uses connected sub-graphs and an adaptive IoU threshold to obtain accurate segment matching. Second, to evaluate diarization in a unified way, we adopt a speaker-specific harmonic mean between duration and segment errors, followed by a speaker-weighted average. Third, we analyze our metric with a modularized system, EEND, and a multi-modal method on real datasets. SER and BER are publicly available at https://github.com/X-LANCE/BER.
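
The dilemma described in the abstract can be illustrated with a small sketch. It is not the paper's definition of SER or BER: the function names, the fixed IoU threshold of 0.5, and the toy segments are all assumptions made for illustration, and only missed speech for a single speaker is considered. It shows how dropping one short segment barely changes a duration-based error while a segment-level count registers it clearly.

```python
def interval_iou(a, b):
    """Temporal IoU of two (start, end) intervals, in seconds."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0


def missed_duration_rate(reference, hypothesis):
    """Fraction of reference speech time not covered by the hypothesis.

    Simplification: hypothesis segments are assumed not to overlap each other.
    """
    total = sum(end - start for start, end in reference)
    covered = 0.0
    for r_start, r_end in reference:
        for h_start, h_end in hypothesis:
            covered += max(0.0, min(r_end, h_end) - max(r_start, h_start))
    return (total - covered) / total


def missed_segment_rate(reference, hypothesis, iou_threshold=0.5):
    """Fraction of reference segments with no hypothesis segment above the IoU threshold.

    The fixed threshold is an assumption; the paper describes an adaptive one.
    """
    missed = sum(
        1
        for ref in reference
        if all(interval_iou(ref, hyp) < iou_threshold for hyp in hypothesis)
    )
    return missed / len(reference)


# Hypothetical single-speaker example: the short 0.5 s reply is missed entirely.
reference = [(0.0, 30.0), (30.5, 31.0), (32.0, 60.0)]
hypothesis = [(0.0, 30.0), (32.0, 60.0)]

print(f"duration-based miss rate: {missed_duration_rate(reference, hypothesis):.4f}")
print(f"segment-based miss rate:  {missed_segment_rate(reference, hypothesis):.4f}")
```

In this toy case the missed reply accounts for 0.5 s out of 58.5 s of speech, so the duration-based rate stays below 1%, whereas one of three segments is lost at the segment level.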
