On-Device Sentence Similarity for SMS Dataset

Paper Title

On-Device Sentence Similarity for SMS Dataset

Authors

Prabhu, Arun D; Arora, Nikhil; Vatsal, Shubham; Ramena, Gopi; Moharana, Sukumar; Purre, Naresh

Abstract

Determining the similarity between Short Message Service (SMS) texts plays a significant role in the mobile device industry. Gauging the similarity between SMS data is thus necessary for various applications, such as enhanced searching and navigation, or clubbing together SMS of a similar type, irrespective of their sender, when the user provides a custom label or tag. The problem with SMS data is its incomplete structure and grammatical inconsistencies. In this paper, we propose a unique pipeline for evaluating the text similarity between SMS texts. We use a Part of Speech (POS) model for keyword extraction, taking advantage of the partial structure embedded in SMS texts, and carry out similarity comparisons using statistical methods. The proposed pipeline deals with major semantic variations across SMS data and is effective for on-device (mobile phone) use. To showcase the capabilities of our work, the pipeline has been designed with an inclination towards one of the possible applications of SMS text similarity, discussed in a later section, but it nonetheless guarantees scalability to other applications as well.
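
As a rough illustration of the two stages named in the abstract (keyword extraction followed by a statistical comparison), the sketch below may help. It is not the authors' pipeline. The abstract does not say which POS model or which statistic is used, so both are assumptions here: a small hand-written function-word list (the hypothetical `FUNCTION_WORDS`) stands in for a trained POS tagger, and Jaccard overlap stands in for the unspecified statistical method. The names `extract_keywords`, `jaccard`, and `sms_similarity` are likewise illustrative.

```python
import re

# ASSUMPTION: stand-in for a POS model. A real system would tag each token and
# keep content-bearing classes (e.g. nouns, verbs); here we simply drop a few
# common function words.
FUNCTION_WORDS = {
    "a", "an", "the", "is", "are", "was", "be", "been", "has", "have",
    "your", "you", "to", "of", "for", "on", "in", "at", "with", "and",
    "by", "from", "will",
}


def extract_keywords(sms):
    """Return the set of lowercase keyword tokens in an SMS text."""
    tokens = re.findall(r"[a-z0-9]+", sms.lower())
    # Pure numbers (order ids, OTP codes) vary between otherwise similar
    # messages, so they are ignored in this sketch.
    return {t for t in tokens if t not in FUNCTION_WORDS and not t.isdigit()}


def jaccard(a, b):
    """Jaccard overlap |a & b| / |a | b|; defined as 0.0 when both are empty."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def sms_similarity(x, y):
    """ASSUMPTION: similarity is the Jaccard overlap of the keyword sets."""
    return jaccard(extract_keywords(x), extract_keywords(y))


if __name__ == "__main__":
    shipped_1 = "Your order 12345 has been shipped"
    shipped_2 = "Order 987 shipped today"
    otp = "Your OTP is 4521"
    print(sms_similarity(shipped_1, shipped_2))
    print(sms_similarity(shipped_1, otp))
```

Set operations of this kind are cheap in both memory and compute, which fits the on-device constraint the abstract emphasises, though the actual method may differ substantially.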
