Paper title
Toward Enabling Reproducibility for Data-Intensive Research using the Whole Tale Platform
Paper authors
Abstract
Whole Tale (http://wholetale.org) is a web-based, open-source platform for reproducible research supporting the creation, sharing, execution, and verification of "Tales" for the scientific research community. Tales are executable research objects that capture the code, data, and environment along with narrative and workflow information needed to re-create computational results from scientific studies. Creating reproducible research objects that enable reproducibility, transparency, and re-execution for computational experiments requiring significant compute resources or utilizing massive data is an especially challenging open problem. We describe opportunities, challenges, and solutions for facilitating reproducibility for data- and compute-intensive research, which we call "Tales at Scale," using the Whole Tale computing platform. We highlight challenges and solutions in frontend responsiveness needs, gaps in current middleware design and implementation, network restrictions, containerization, and data access. Finally, we discuss challenges in packaging computational experiment implementations for portable data-intensive Tales and outline future work.