Title
A Cognitive Study on Semantic Similarity Analysis of Large Corpora: A Transformer-based Approach
Authors
Abstract
Semantic similarity analysis and modeling are foundational tasks in many pioneering applications of natural language processing today. Owing to their aptitude for sequential pattern recognition, neural networks such as RNNs and LSTMs have achieved satisfactory results in semantic similarity modeling. However, these solutions are considered inefficient because they cannot process information non-sequentially, which leads to improper extraction of context. Transformers have become the state-of-the-art architecture thanks to advantages such as non-sequential data processing and self-attention. In this paper, we perform semantic similarity analysis and modeling on the U.S. Patent Phrase to Phrase Matching dataset using both traditional and transformer-based techniques. We experiment with four different variants of the Decoding-enhanced BERT (DeBERTa) and enhance their performance through K-fold cross-validation. The experimental results demonstrate that our methodology outperforms traditional techniques, achieving an average Pearson correlation score of 0.79.
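The evaluation protocol described above — K-fold cross-validation with Pearson correlation averaged over folds — can be sketched as follows. This is an illustrative toy, not the paper's code: a one-dimensional least-squares fit on synthetic data stands in for the fine-tuned DeBERTa model, and the function name `kfold_pearson` is hypothetical.

```python
import numpy as np

def kfold_pearson(X, y, k=4, seed=0):
    """Average Pearson correlation across K cross-validation folds.

    Sketch only: a 1-D least-squares fit plays the role of the model;
    the paper instead fine-tunes DeBERTa variants on each fold.
    """
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))          # shuffle before splitting
    folds = np.array_split(idx, k)         # k roughly equal folds
    scores = []
    for i in range(k):
        val = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        # "Train": fit a line on the training folds' feature.
        slope, intercept = np.polyfit(X[train], y[train], 1)
        preds = slope * X[val] + intercept
        # Pearson correlation between predictions and gold scores.
        scores.append(np.corrcoef(preds, y[val])[0, 1])
    return float(np.mean(scores))

# Toy data: one noisy feature correlated with a similarity score.
rng = np.random.default_rng(42)
X = rng.uniform(0, 1, 200)
y = 0.8 * X + rng.normal(0, 0.05, 200)
print(round(kfold_pearson(X, y), 3))
```

Averaging the per-fold correlations, as above, is how a single score such as the reported 0.79 summarizes performance across all validation splits.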