MEG: Multi-Evidence GNN for Multimodal Semantic Forensics

Paper title

MEG: Multi-Evidence GNN for Multimodal Semantic Forensics

Authors

Ekraam Sabir, Ayush Jaiswal, Wael AbdAlmageed, Prem Natarajan

Abstract

Fake news often involves semantic manipulations across modalities such as image, text, and location, and detecting it requires multimodal semantic forensics. Recent research has centered the problem on images, calling it image repurposing, in which a digitally unmanipulated image is semantically misrepresented by its accompanying multimodal metadata, such as captions and location. The image and metadata together comprise a multimedia package. The problem setup requires algorithms to perform multimodal semantic forensics to authenticate a query multimedia package, using a reference dataset of potentially related packages as evidence. Existing methods are limited to a single piece of evidence (one retrieved package), which ignores the potential performance improvement from using multiple pieces of evidence. In this work, we introduce a novel graph neural network based model for multimodal semantic forensics that effectively uses multiple retrieved packages as evidence and scales with the number of retrieved packages. We compare the scalability and performance of our model against existing methods. Experimental results show that the proposed model outperforms existing state-of-the-art algorithms, with an error reduction of up to 25%.
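The idea of pooling several retrieved packages into one decision can be illustrated with a toy example. The sketch below is not the architecture from the paper; it is a minimal, untrained illustration that assumes each package has already been encoded as a fixed-length feature vector, and it arranges the query and its evidence as a star graph with one round of message passing. The names `message`, `mean_aggregate`, and `score_query` are hypothetical, and the hand-written message and readout functions stand in for what would be learned layers in a real graph neural network.

```python
import math
from typing import List

Vector = List[float]


def mean_aggregate(messages: List[Vector]) -> Vector:
    """Average a non-empty list of equal-length vectors (order does not matter)."""
    count = len(messages)
    return [sum(column) / count for column in zip(*messages)]


def message(query: Vector, evidence: Vector) -> Vector:
    """Hypothetical message function: element-wise agreement between two nodes.

    Assumption: a trained model would use learned layers here instead.
    """
    return [q * e for q, e in zip(query, evidence)]


def score_query(query: Vector, evidences: List[Vector]) -> float:
    """Run one round of message passing on a star graph centred on the query.

    Returns a value in (0, 1); higher means the evidence agrees more with
    the query package.
    """
    if not evidences:
        raise ValueError("at least one evidence package is required")
    # Each evidence node sends one message to the query node.
    messages = [message(query, evidence) for evidence in evidences]
    # Mean pooling accepts any number of evidence nodes.
    pooled = mean_aggregate(messages)
    # Readout: sum the pooled agreement and squash it with a sigmoid.
    return 1.0 / (1.0 + math.exp(-sum(pooled)))


# Illustrative, made-up feature vectors.
query = [1.0, 0.0, 1.0]
consistent = [[1.0, 0.0, 1.0], [0.9, 0.1, 0.8]]
conflicting = [[-1.0, 0.0, -1.0]]
print(score_query(query, consistent) > score_query(query, conflicting))
```

Because the messages are mean-pooled, the same function accepts one evidence vector or many, and the order of the evidence does not change the score. That is the property a multi-evidence model needs in order to scale with the number of retrieved packages.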
