Paper Title
Mixed Signals: Analyzing Software Attribution Challenges in the Android Ecosystem
Paper Authors
Paper Abstract
The ability to identify the author responsible for a given software object is critical for many research studies and for enhancing software transparency and accountability. However, unlike in other app ecosystems such as iOS, attribution in the Android ecosystem is known to be hard. Prior research has leveraged market metadata and signing certificates to identify software authors without questioning the validity and accuracy of these attribution signals. Yet Android app authors can, either intentionally or by mistake, hide their true identity due to: (1) the lack of policy enforcement by markets to ensure the accuracy and correctness of the information that developers disclose in their market profiles during the app release process, and (2) the use of self-signed certificates for signing apps instead of certificates issued by trusted CAs. In this paper, we perform the first empirical analysis of the availability, volatility, and overall aptness of publicly available metadata for author attribution in Android app markets. To that end, we analyze a dataset of over 2.5 million market entries and apps collected from five Android markets over more than two years. Our results show that widely used attribution signals are often missing from market profiles and that they change over time. We also invalidate the general belief that signing certificates are a valid signal for author attribution. For instance, we find that apps from different authors share signing certificates due to the proliferation of app building frameworks and software factories. Finally, we introduce the concept of an attribution graph and apply it to evaluate the validity of existing attribution signals on the Google Play Store. Our results confirm that the lack of control over publicly available signals can confuse the attribution process.
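The abstract does not define the attribution graph, so the following is only an illustrative sketch under one assumed reading: apps and the values of their attribution signals are nodes, and an app is linked to each signal value it exposes. Apps that end up in the same connected component share, directly or transitively, a developer name or a signing certificate. The field names (`package`, `developer_name`, `cert_sha256`), the choice of signals, and the function `attribution_components` are all hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def attribution_components(apps):
    """Group apps linked, directly or transitively, by a shared signal value.

    Assumption: each app is a dict with a "package" key and optional
    "developer_name" and "cert_sha256" keys (hypothetical field names).
    """
    parent = {}

    def find(node):
        # Union-find lookup with path halving.
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]
            node = parent[node]
        return node

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b

    for app in apps:
        app_node = ("app", app["package"])
        find(app_node)  # register the app even if it has no signals
        for signal in ("developer_name", "cert_sha256"):
            value = app.get(signal)
            if value:  # a signal may be missing from the market profile
                union(app_node, (signal, value))

    groups = defaultdict(set)
    for node in list(parent):
        if node[0] == "app":
            groups[find(node)].add(node[1])
    return sorted(sorted(group) for group in groups.values())


# Made-up example: "a" and "b" share a certificate, "b" and "c" share a name.
apps = [
    {"package": "com.example.a", "developer_name": "Alice", "cert_sha256": "c1"},
    {"package": "com.example.b", "developer_name": "Bob", "cert_sha256": "c1"},
    {"package": "com.example.c", "developer_name": "Bob", "cert_sha256": "c2"},
    {"package": "com.example.d", "developer_name": None, "cert_sha256": "c3"},
]
components = attribution_components(apps)
```

In this made-up data, the first three apps fall into one component even though two different developer names appear, which is the kind of ambiguity the abstract describes when apps from different authors share a signing certificate. A component that mixes several names or certificates would need further evidence before being attributed to a single author.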