Title
Subjective data models in bioinformatics: Do wet-lab and computational biologists comprehend data differently?
Authors
Abstract
Biological science produces large amounts of data in a variety of formats, which necessitates the use of computational tools to process, integrate, analyse, and glean insights from the data. Researchers who use computational biology tools range from those who use computers primarily for communication and data lookup to those who write complex software programs in order to analyse data or make it easier for others to do so. This research examines how people differ in the way they conceptualise the same data, for which we coin the term "subjective data models". We interviewed 22 people with biological experience and varied levels of computational experience to elicit their perceptions of the same subset of biological data entities. The results suggest that many people had fluid subjective data models that would change depending on the circumstance or tool they were using. Surprisingly, results generally did not seem to cluster around a participant's computational experience or education level, or the lack thereof. We further found that people did not consistently map entities from an abstract data model to the same identifiers in real-world files, and that participants could infer meaning more easily from certain data identifier formats than from others. These findings have two real-world implications: 1) software engineers should design interfaces for task performance and emulate other related popular user interfaces, rather than targeting a person's professional background; 2) when insufficient context is provided, people may guess what data means, whether or not their guesses are correct, which emphasises the importance of providing contextual metadata when preparing data for re-use by others, to remove the need for potentially erroneous guesswork.