BER: Balanced Error Rate for Speaker Diarization

Paper Title

BER: Balanced Error Rate For Speaker Diarization

Authors

Liu, Tao; Yu, Kai

Abstract

Diarization Error Rate (DER) is the primary metric for evaluating diarization performance, but it faces a dilemma: errors in short utterances or segments tend to be overwhelmed by errors in longer ones. Short segments, e.g., "yes" or "no," still carry semantic information. Moreover, DER overlooks errors for speakers who talk less. Although the Jaccard Error Rate (JER) balances speaker errors, it suffers from the same dilemma. Considering that duration error, segment error, and speaker-weighted error together constitute a complete diarization evaluation, we propose a Balanced Error Rate (BER) to evaluate speaker diarization. First, we propose a segment-level error rate (SER) that uses connected sub-graphs and an adaptive IoU threshold to obtain accurate segment matching. Second, to evaluate diarization in a unified way, we adopt a speaker-specific harmonic mean between duration and segment errors, followed by a speaker-weighted average. Third, we analyze our metric with a modularized system, EEND, and a multi-modal method on real datasets. SER and BER are publicly available at https://github.com/X-LANCE/BER.
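
As a rough illustration of two ingredients named in the abstract, the sketch below computes the intersection over union (IoU) of two time segments and combines per-speaker duration and segment errors with a harmonic mean before averaging over speakers. It is not the authors' implementation: the function names, the uniform speaker weights, and the choice to return 0 when both errors are 0 are assumptions made for this example, and the connected sub-graph matching and adaptive IoU threshold are not reproduced. The reference code is at the repository linked above.

```python
def interval_iou(a, b):
    """IoU of two (start, end) time segments, in seconds."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0


def harmonic_mean_2(x, y):
    """Harmonic mean of two non-negative values; 0 if both are 0 (assumption)."""
    return 2.0 * x * y / (x + y) if (x + y) > 0 else 0.0


def balanced_error_sketch(per_speaker, weights=None):
    """Combine per-speaker (duration_error, segment_error) pairs.

    Each speaker's two errors are merged with a harmonic mean, then the
    per-speaker scores are averaged. Uniform weights are an assumption here.
    """
    if weights is None:
        weights = [1.0] * len(per_speaker)
    scores = [harmonic_mean_2(d, s) for d, s in per_speaker]
    return sum(w * v for w, v in zip(weights, scores)) / sum(weights)


# Hypothetical numbers: two speakers with (duration error, segment error).
example = balanced_error_sketch([(0.2, 0.2), (0.4, 0.4)])
```

Note that a plain harmonic mean collapses to 0 whenever either error is 0; the abstract does not say how the metric treats that case, so this sketch should not be read as matching the published definition on such inputs.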
