Paper title
UGIF: UI Grounded Instruction Following
Authors
Abstract
Smartphone users often find it difficult to navigate myriad menus to perform common tasks such as "How to block calls from unknown numbers?". Currently, help documents with step-by-step instructions are manually written to aid the user. The user experience can be further enhanced by grounding the instructions in the help document to the UI and overlaying a tutorial on the phone UI. To build such tutorials, several natural language processing components including retrieval, parsing, and grounding are necessary, but there isn't any relevant dataset for such a task. Thus, we introduce UGIF-DataSet, a multi-lingual, multi-modal UI grounded dataset for step-by-step task completion on the smartphone containing 4,184 tasks across 8 languages. As an initial approach to this problem, we propose retrieving the relevant instruction steps based on the user's query and parsing the steps using Large Language Models (LLMs) to generate macros that can be executed on-device. The instruction steps are often available only in English, so the challenge includes cross-modal, cross-lingual retrieval of English how-to pages from user queries in many languages and mapping English instruction steps to UI in a potentially different language. We compare the performance of different LLMs including PaLM and GPT-3 and find that the end-to-end task completion rate is 48% for English UI but the performance drops to 32% for other languages. We analyze the common failure modes of existing models on this task and point out areas for improvement.
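The retrieve, parse, and ground stages named in the abstract can be illustrated with a minimal English-only sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the `Macro` type, the function names, the sample how-to pages, token overlap as the retrieval score, and a small regular expression standing in for the LLM parser. It does not attempt the cross-lingual retrieval or grounding that the abstract describes.

```python
import re
from dataclasses import dataclass


@dataclass
class Macro:
    action: str  # hypothetical action name: "tap" or "toggle"
    target: str  # text of the UI element the step refers to


def tokenize(text):
    """Lowercase the text and return its set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(query, howto_titles):
    """Return the how-to title sharing the most tokens with the query."""
    q = tokenize(query)
    return max(howto_titles, key=lambda title: len(q & tokenize(title)))


def parse_step(step):
    """Rule-based stand-in for the LLM parser: one instruction step -> Macro."""
    m = re.match(r"(?i)\s*(tap|open|turn on|turn off)\s+(.+?)\.?\s*$", step)
    if m is None:
        raise ValueError(f"cannot parse step: {step!r}")
    verb, target = m.group(1).lower(), m.group(2)
    action = "toggle" if verb.startswith("turn") else "tap"
    return Macro(action, target)


def ground(macro, screen_labels):
    """Pick the on-screen label overlapping most with the macro target.

    Returns None when no label shares a token with the target.
    """
    t = tokenize(macro.target)
    best = max(screen_labels, key=lambda label: len(t & tokenize(label)))
    return best if t & tokenize(best) else None


# Hypothetical how-to pages, invented for this example.
PAGES = {
    "Block calls from unknown numbers": [
        "Open the Phone app.",
        "Tap Settings.",
        "Tap Blocked numbers.",
        "Turn on Unknown.",
    ],
    "Change your ringtone": ["Open the Settings app.", "Tap Sound."],
}

if __name__ == "__main__":
    title = retrieve("How to block calls from unknown numbers?", PAGES)
    macros = [parse_step(step) for step in PAGES[title]]
    screen = ["Call history", "Blocked numbers", "Voicemail"]
    print(title)
    print(macros)
    print(ground(macros[2], screen))
```

Token overlap only works when the query, the instructions, and the UI share a language, which is exactly the assumption that breaks in the multilingual setting the abstract highlights; a multilingual retriever and grounding model would have to replace `retrieve` and `ground` there.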