Mining User Privacy Concern Topics from App Reviews

Paper Title

Mining User Privacy Concern Topics from App Reviews

Authors

Jianzhang Zhang, Jinping Hua, Yiyang Chen, Nan Niu, Chuang Liu

Abstract

Context: As mobile applications (apps) spread through society and daily life, they constantly demand various kinds of personal information in exchange for more intelligent and customized functionality. An increasing number of users are voicing their privacy concerns through app reviews on app stores. Objective: The main challenge in effectively mining privacy concerns from user reviews is that reviews expressing privacy concerns are overwhelmed by a large number of reviews expressing more generic themes and noisy content. In this work, we propose a novel automated approach to overcome that challenge. Method: Our approach first employs information retrieval and document embeddings to extract candidate privacy reviews in an unsupervised manner; these candidates are then labeled to build the annotated dataset. Then, supervised classifiers are trained to automatically identify privacy reviews. Finally, we design an interpretable topic mining algorithm to detect privacy concern topics contained in the privacy reviews. Results: Experimental results show that the best-performing document embedding achieves an average precision of 96.80% in the top 100 retrieved candidate privacy reviews. All of the trained privacy review classifiers achieve an F1 value of more than 91%, outperforming the recent keyword-matching baseline by a maximum F1 margin of 7.5%. For detecting privacy concern topics from privacy reviews, our proposed algorithm achieves both better topic coherence and diversity than three strong topic modeling baselines, including LDA. Conclusion: Empirical evaluation results demonstrate the effectiveness of our approach in identifying privacy reviews and detecting user privacy concerns expressed in app reviews.
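
The candidate-retrieval step described in the Method can be illustrated with a minimal sketch. It is not the authors' implementation: plain bag-of-words count vectors stand in for the document embeddings, and the query string, the sample reviews, and the function names (`rank_candidates`, `precision_at_k`) are all hypothetical, chosen only to show how reviews might be ranked against a privacy query and how precision in the top k results could be measured.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens only."""
    return re.findall(r"[a-z]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_candidates(reviews, query, top_k=100):
    """Return the top_k (score, review) pairs most similar to the query.

    Assumption: bag-of-words vectors replace the learned document
    embeddings mentioned in the abstract.
    """
    query_vec = Counter(tokenize(query))
    scored = [(cosine(Counter(tokenize(r)), query_vec), r) for r in reviews]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


def precision_at_k(ranked, labels, k):
    """Fraction of the top k ranked reviews labeled as privacy reviews."""
    top = ranked[:k]
    if not top:
        return 0.0
    return sum(1 for _, review in top if labels[review]) / len(top)


# Hypothetical sample data, for illustration only.
reviews = [
    "This app collects my personal data without permission",
    "Great game, love the graphics",
    "Why does a flashlight need access to my contacts and location data",
    "Crashes every time I open it",
]
query = "privacy personal data permission access location contacts"
labels = {reviews[0]: True, reviews[1]: False, reviews[2]: True, reviews[3]: False}

ranked = rank_candidates(reviews, query, top_k=2)
```

In a pipeline like the one the abstract outlines, the top-ranked candidates would then be labeled by hand and used to train the supervised classifiers; here `precision_at_k(ranked, labels, 2)` simply checks how many of the two retrieved reviews are privacy-related.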

Paper Title