Paper title
Bayesian item response models for citizen science ecological data
Paper authors
Paper abstract
So-called 'citizen science' data elicited from crowds has become increasingly popular in many fields, including ecology. However, the quality of this information is frequently debated within the scientific community. Modern citizen science implementations therefore require measures of users' proficiency that account for the difficulty of the tasks. We introduce a new methodological framework of item response and linear logistic test models, with application to citizen science data used in ecology research. This approach accommodates spatial autocorrelation within the item difficulties and produces ecologically relevant measures of species- and site-related difficulty, discriminatory power, and guessing behavior. These, along with estimates of the subjects' abilities, allow better management of these programs and provide deeper insights. This paper also highlights how item response models can be fitted to big data via divide-and-conquer. We found that the suggested methods outperform traditional item response models in terms of RMSE, accuracy, and WAIC, based on leave-one-out cross-validation on simulated and empirical data. We present a comprehensive implementation using a case study of species identification in the Serengeti, Tanzania. R and Stan code is provided for full reproducibility. Multiple statistical illustrations and visualizations are given, allowing practitioners to extrapolate to a wide range of citizen science ecological problems.
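To make the quantities named in the abstract concrete, the following is a minimal sketch of a three-parameter logistic (3PL) item response function, in which a user's ability is compared against an item's difficulty, scaled by a discrimination parameter, with a lower bound given by a guessing parameter. The abstract does not state the exact parameterization used in the paper, so the textbook 3PL form below is an assumption, and the function and argument names (`three_pl_probability`, `additive_difficulty`, `species_effect`, `site_effect`) are hypothetical. The additive species-plus-site difficulty only illustrates the linear logistic test model idea and omits the spatial autocorrelation the paper accounts for.

```python
import math


def three_pl_probability(ability, difficulty, discrimination=1.0, guessing=0.0):
    """Probability of a correct response under the textbook 3PL model.

    Assumed form (not taken from the paper):
        P = guessing + (1 - guessing) * logistic(discrimination * (ability - difficulty))
    """
    logistic = 1.0 / (1.0 + math.exp(-discrimination * (ability - difficulty)))
    return guessing + (1.0 - guessing) * logistic


def additive_difficulty(species_effect, site_effect):
    """Illustrative LLTM-style difficulty: a sum of item feature effects.

    Both effects are hypothetical inputs; no spatial structure is modeled here.
    """
    return species_effect + site_effect


if __name__ == "__main__":
    # Hypothetical values chosen only for illustration.
    difficulty = additive_difficulty(species_effect=0.5, site_effect=-0.2)
    for ability in (-1.0, 0.0, 1.0):
        p = three_pl_probability(ability, difficulty, discrimination=1.2, guessing=0.1)
        print(f"ability={ability:+.1f}  P(correct)={p:.3f}")
```

With the guessing parameter set to zero and discrimination set to one, the function reduces to the one-parameter (Rasch) model, which is one way to read the "traditional item response models" used as a baseline.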