Paper title
Artificial intelligence for topic modelling in Hindu philosophy: mapping themes between the Upanishads and the Bhagavad Gita
Paper authors
Paper abstract
A distinct feature of Hindu religious and philosophical texts is that they come from a library of texts rather than a single source. The Upanishads are known as among the oldest philosophical texts in the world and form the foundation of Hindu philosophy. The Bhagavad Gita is a core text of Hindu philosophy and is known as a text that summarises the key philosophies of the Upanishads, with a major focus on the philosophy of karma. These texts have been translated into many languages, and there exist studies of their prominent themes and topics; however, there has been little work on topic modelling using language models powered by deep learning. In this paper, we use advanced language models such as BERT to provide topic modelling of the key texts of the Upanishads and the Bhagavad Gita. We analyse the distinct and overlapping topics amongst the texts and visualise the link between selected texts of the Upanishads and the Bhagavad Gita. Our results show a very high similarity between the topics of these two texts, with a mean cosine similarity of 73%. We find that out of the fourteen topics extracted from the Bhagavad Gita, nine have a cosine similarity of more than 70% with the topics of the Upanishads. We also find that topics generated by the BERT-based models show very high coherence compared with those of conventional models. Our best performing model gives a coherence score of 73% on the Bhagavad Gita and 69% on the Upanishads. The visualisation of the low-dimensional embeddings of these texts shows a very clear overlap among their topics, adding another level of validation to our results.
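The overlap statistics reported in the abstract (a mean cosine similarity and a count of topics above 70%) can be illustrated with a minimal sketch. It assumes that each topic is represented by an embedding vector and that each Bhagavad Gita topic is matched to its most similar Upanishads topic. The abstract does not state how the topic vectors are built or how the similarities are aggregated, so the function names, the best-match rule, and the toy vectors below are hypothetical and are not the authors' implementation.

```python
import numpy as np


def cosine_similarity_matrix(a, b):
    """Pairwise cosine similarity between the rows of a and the rows of b."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    # Normalise every row to unit length so the dot product equals the cosine.
    a_unit = a / np.linalg.norm(a, axis=1, keepdims=True)
    b_unit = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a_unit @ b_unit.T


def summarise_topic_overlap(gita_topics, upanishad_topics, threshold=0.70):
    """Match each Gita topic to its closest Upanishads topic (assumed rule)."""
    sims = cosine_similarity_matrix(gita_topics, upanishad_topics)
    best = sims.max(axis=1)  # best similarity for each Gita topic
    return {
        "best_match_index": sims.argmax(axis=1),
        "best_match_similarity": best,
        "mean_similarity": float(best.mean()),
        "count_above_threshold": int((best > threshold).sum()),
    }


# Toy 3-dimensional vectors for illustration only; real topic embeddings
# would come from a BERT-based model and have hundreds of dimensions.
gita = np.array([[1.0, 0.0, 0.0],
                 [0.0, 1.0, 0.0]])
upanishads = np.array([[1.0, 0.0, 0.0],
                       [1.0, 1.0, 0.0],
                       [0.0, 0.0, 1.0]])

summary = summarise_topic_overlap(gita, upanishads)
print(summary["mean_similarity"], summary["count_above_threshold"])
```

Taking the best match per topic is only one reasonable way to aggregate the similarities. Averaging over all topic pairs would generally give a lower figure, so the aggregation rule should be stated alongside any reported mean.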