A Comparison of Natural Language Understanding Platforms for Chatbots in Software Engineering

Title

A Comparison of Natural Language Understanding Platforms for Chatbots in Software Engineering

Authors

Abdellatif, Ahmad; Badran, Khaled; Costa, Diego Elias; Shihab, Emad

Abstract

Chatbots are envisioned to dramatically change the future of Software Engineering, allowing practitioners to chat and inquire about their software projects and to interact with different services using natural language. At the heart of every chatbot is a Natural Language Understanding (NLU) component that enables the chatbot to understand natural language input. Recently, many NLU platforms have become available as off-the-shelf NLU components for chatbots; however, selecting the best NLU for Software Engineering chatbots remains an open challenge. Therefore, in this paper, we evaluate four of the most commonly used NLUs, namely IBM Watson, Google Dialogflow, Rasa, and Microsoft LUIS, to shed light on which NLU should be used in Software Engineering-based chatbots. Specifically, we examine the NLUs' performance in classifying intents, the stability of their confidence scores, and extracting entities. To evaluate the NLUs, we use two datasets that reflect two common tasks performed by Software Engineering practitioners: 1) chatting with a chatbot to ask questions about software repositories, and 2) asking development questions on Q&A forums (e.g., Stack Overflow). According to our findings, IBM Watson is the best-performing NLU when all three aspects (intent classification, confidence scores, and entity extraction) are considered together. However, the results for each individual aspect show that IBM Watson performs best in intent classification, with an F1-measure > 84%, whereas Rasa comes out on top in confidence scores, with a median confidence score higher than 0.91. Our results also show that all NLUs except Dialogflow generally provide trustworthy confidence scores. For entity extraction, Microsoft LUIS and IBM Watson outperform the other NLUs in both SE tasks. Our results provide guidance to software engineering practitioners when deciding which NLU to use in their chatbots.
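The abstract summarizes intent classification as an F1-measure and confidence as a median score. As a minimal sketch of how such figures can be computed from an NLU's predictions, the Python below averages per-intent F1 weighted by each intent's support. The abstract does not say which averaging the paper uses, so the weighted averaging, the function names `weighted_f1` and `median_confidence`, and the intent labels in the usage example are hypothetical assumptions, not the authors' evaluation code.

```python
from statistics import median


def weighted_f1(gold, predicted):
    """Support-weighted F1 over the gold intents (assumed averaging scheme)."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    total = len(gold)
    score = 0.0
    for label in set(gold):
        pairs = list(zip(gold, predicted))
        # Count true positives, false positives, and false negatives for this intent.
        tp = sum(1 for g, p in pairs if g == label and p == label)
        fp = sum(1 for g, p in pairs if g != label and p == label)
        fn = sum(1 for g, p in pairs if g == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        # Weight each intent's F1 by how often it occurs in the gold labels.
        score += f1 * gold.count(label) / total
    return score


def median_confidence(confidences):
    """Median of the confidence scores an NLU returned for its top intent."""
    return median(confidences)


if __name__ == "__main__":
    # Hypothetical intents and NLU outputs, for illustration only.
    gold = ["commits", "commits", "bugs", "author"]
    predicted = ["commits", "bugs", "bugs", "author"]
    print(weighted_f1(gold, predicted))                    # 0.75
    print(median_confidence([0.95, 0.80, 0.91, 0.60]))     # about 0.855
```

Weighting by support keeps frequent intents from being drowned out by rare ones; a macro average (equal weight per intent) is an equally plausible choice and can give a noticeably different number on imbalanced datasets.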
