Personalized Filled-pause Generation with Group-wise Prediction Models

Paper Title

Personalized Filled-pause Generation with Group-wise Prediction Models

Authors

Matsunaga, Yuta; Saeki, Takaaki; Takamichi, Shinnosuke; Saruwatari, Hiroshi

Abstract

In this paper, we propose a method for generating personalized filled pauses (FPs) with group-wise prediction models. Compared with fluent text generation, disfluent text generation has not been widely explored. To generate more human-like texts, we address disfluent text generation. The usage of disfluencies, such as FPs, rephrases, and word fragments, differs from speaker to speaker, so personalized FP generation is required. However, FPs are difficult to predict because their positions are sparse and because frequently used FPs occur far more often than less frequently used ones. Moreover, it is sometimes difficult to adapt an FP prediction model to each individual speaker because of the large variation in FP usage tendency within each speaker. To address these issues, we propose a method that builds group-dependent prediction models by grouping speakers on the basis of their tendency to use FPs. This method does not require a large amount of data or training time for each speaker model. We further introduce a loss function and a word embedding model suitable for FP prediction. Our experimental results demonstrate that the group-dependent models predict FPs with higher scores than a non-personalized model, and that the introduced loss function and word embedding model improve prediction performance.
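The speaker-grouping step described in the abstract can be illustrated with a minimal sketch. The abstract does not state which features or clustering algorithm are used, so everything below is an assumption: each speaker is represented by the relative frequency of each FP type they use, and speakers are clustered with plain k-means (deterministic farthest-point initialization). The FP inventory `FP_TYPES`, the speaker IDs, and the helper names `fp_profile`, `kmeans`, and `group_speakers` are hypothetical. This profile captures only which FPs a speaker prefers, not where they occur; a position-tendency feature could be concatenated in the same way. The group-dependent prediction model itself is not shown.

```python
from collections import Counter
from typing import Dict, List, Sequence

# Hypothetical FP inventory (illustrative filler tokens only, not from the paper).
FP_TYPES = ["ee", "ano", "maa", "etto", "sono"]


def fp_profile(fps: Sequence[str], fp_types: Sequence[str] = FP_TYPES) -> List[float]:
    """Relative frequency of each FP type among one speaker's FPs (assumed feature)."""
    counts = Counter(fp for fp in fps if fp in fp_types)
    total = sum(counts.values())
    if total == 0:
        return [0.0] * len(fp_types)
    return [counts[t] / total for t in fp_types]


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points: List[List[float]], k: int, iters: int = 50) -> List[int]:
    """Plain k-means with deterministic farthest-point initialization."""
    centroids = [points[0]]
    while len(centroids) < k:
        # Add the point that is farthest from every existing centroid.
        far = max(points, key=lambda p: min(squared_distance(p, c) for c in centroids))
        centroids.append(far)
    labels: List[int] = []
    for _ in range(iters):
        # Assign each speaker profile to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing: converged
        labels = new_labels
        # Move each centroid to the mean of its assigned profiles.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def group_speakers(speaker_fps: Dict[str, Sequence[str]], k: int) -> Dict[str, int]:
    """Map each (hypothetical) speaker ID to a group index based on FP usage."""
    speakers = sorted(speaker_fps)
    profiles = [fp_profile(speaker_fps[s]) for s in speakers]
    return dict(zip(speakers, kmeans(profiles, k)))


if __name__ == "__main__":
    data = {
        "spk_a": ["ee", "ee", "ano", "ee"],
        "spk_b": ["ee", "ee", "ee", "ano"],
        "spk_c": ["maa", "sono", "maa", "maa"],
        "spk_d": ["maa", "maa", "sono", "sono"],
    }
    print(group_speakers(data, k=2))
    # prints {'spk_a': 0, 'spk_b': 0, 'spk_c': 1, 'spk_d': 1}
```

Under this assumption, one prediction model would then be trained per group on the pooled data of its speakers, which is why no per-speaker training data or time is needed beyond computing the profile.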
