Federated Neural Topic Models

Paper Title

Federated Neural Topic Models

Authors

Lorena Calvo-Bartolomé, Jerónimo Arenas-García

Abstract

Over the last years, topic modeling has emerged as a powerful technique for organizing and summarizing big collections of documents or searching for particular patterns in them. However, privacy concerns may arise when cross-analyzing data from different sources. Federated topic modeling solves this issue by allowing multiple parties to jointly train a topic model without sharing their data. While several federated approximations of classical topic models do exist, no research has been conducted on their application for neural topic models. To fill this gap, we propose and analyze a federated implementation based on state-of-the-art neural topic modeling implementations, showing its benefits when there is a diversity of topics across the nodes' documents and the need to build a joint model. In practice, our approach is equivalent to a centralized model training, but preserves the privacy of the nodes. Advantages of this federated scenario are illustrated by means of experiments using both synthetic and real data scenarios.
