Paper Title

A Framework to capture and reproduce the Absolute State of Jupyter Notebooks

Authors

Dimuthu Wannipurage, Suresh Marru, Marlon Pierce

Abstract

Jupyter Notebooks are an enormously popular tool for creating and narrating computational research projects. They also have great potential as reproducible scientific research artifacts. Capturing the complete state of a notebook has additional benefits; for instance, notebook execution may be split between local and remote resources, where the latter may offer more processing power or host large or access-restricted data. On closer examination, making notebooks fully reproducible poses several challenges. The notebook code must be replicated exactly, and the underlying Python runtime environment must be identical. More subtle problems arise in replicating referenced data, external library dependencies, and runtime variable states. This paper presents solutions to these problems that use Jupyter's standard extension mechanisms to create an archivable system state for a running notebook. We show that these additional mechanisms, which involve interacting with the underlying Linux kernel, do not introduce substantial execution-time overhead, demonstrating the feasibility of the approach.
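The abstract lists several pieces a complete notebook snapshot must contain. The paper's own mechanism (a Jupyter extension that interacts with the Linux kernel) is not reproduced here. As a rough illustration only, the following is a minimal Python sketch of capturing two of those pieces, the Python runtime environment and the runtime variable state, assuming the notebook's variables are available as a plain dict (as IPython's user namespace is). The helper names `capture_environment`, `capture_variables`, and `restore_variables` are hypothetical, not the framework's API, and the sketch does not address referenced data files or native library dependencies.

```python
import importlib.metadata
import pickle
import platform
import sys
import types


def capture_environment() -> dict:
    """Hypothetical helper: record interpreter version and installed distributions."""
    packages = {}
    for dist in importlib.metadata.distributions():
        name = dist.metadata.get("Name")
        if name:  # skip distributions with missing or broken metadata
            packages[name] = dist.version
    return {
        "python_version": platform.python_version(),
        "implementation": platform.python_implementation(),
        "platform": sys.platform,
        "packages": packages,
    }


def capture_variables(namespace: dict) -> tuple:
    """Hypothetical helper: pickle user variables; return (blobs, names that failed)."""
    saved, skipped = {}, []
    for name, value in namespace.items():
        # Skip private/IPython-internal names and imported modules.
        if name.startswith("_") or isinstance(value, types.ModuleType):
            continue
        try:
            saved[name] = pickle.dumps(value)
        except Exception:
            # Generators, open files, sockets, etc. cannot be pickled.
            skipped.append(name)
    return saved, skipped


def restore_variables(saved: dict) -> dict:
    """Hypothetical helper: rebuild a namespace from the pickled blobs."""
    return {name: pickle.loads(blob) for name, blob in saved.items()}


# Usage with a stand-in for a notebook's user namespace (assumed for illustration).
user_ns = {"x": 42, "data": [1, 2, 3], "gen": (i for i in range(3)), "_hidden": 1}
saved, skipped = capture_variables(user_ns)
restored = restore_variables(saved)
print(restored)  # {'x': 42, 'data': [1, 2, 3]}
print(skipped)   # ['gen']
```

Names that fail to pickle are reported rather than silently dropped, since an unrecorded variable is exactly the kind of subtle gap the abstract warns can break reproduction; a pure-Python approach like this cannot capture such state, which is one reason a deeper, OS-level mechanism may be needed.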
