Paper title
A multi-language toolkit for the semi-automated checking of research outputs
Paper authors
Paper abstract
This article presents a free and open source toolkit that supports the semi-automated checking of research outputs (SACRO) for privacy disclosure within secure data environments. SACRO is a framework that applies best-practice, principles-based statistical disclosure control (SDC) techniques on-the-fly as researchers conduct their analyses. SACRO is designed to assist human checkers rather than to replace them, as current automated rules-based approaches seek to do. The toolkit is composed of a lightweight Python package that sits over well-known analysis tools that produce outputs such as tables, plots, and statistical models. This package adds functionality to (i) automatically identify potentially disclosive outputs against a range of commonly used disclosure tests; (ii) apply optional disclosure mitigation strategies as requested; (iii) report reasons for applying SDC; and (iv) produce simple summary documents that trusted research environment staff can use to streamline their workflow and maintain auditable records. This creates an explicit change in the dynamics so that SDC is something done with researchers rather than to them, and enables more efficient communication with checkers. A graphical user interface supports human checkers by displaying the requested output and the results of the checks in an immediately accessible format, highlighting identified issues and potential mitigation options, and tracking decisions made. The major analytical programming languages used by researchers (Python, R, and Stata) are supported by providing front-end packages that interface with the core Python back-end. Source code, packages, and documentation are available under the MIT license at https://github.com/AI-SDC/ACRO
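As a rough illustration of steps (i) to (iii) in the abstract, the sketch below wraps a frequency table with a minimum cell count test, optional suppression, and a short report of reasons. It is not the ACRO API: the function name `checked_crosstab`, the threshold value of 10, the report fields, and the sample records are all hypothetical, chosen only to show the general idea of checking an output as it is produced.

```python
from collections import Counter

THRESHOLD = 10  # hypothetical minimum cell count; each environment sets its own


def checked_crosstab(rows, row_key, col_key, threshold=THRESHOLD, suppress=False):
    """Build a frequency table and flag cells that fail a minimum-count test."""
    counts = Counter((r[row_key], r[col_key]) for r in rows)
    # A populated cell below the threshold is treated as potentially disclosive.
    flagged = sorted(cell for cell, n in counts.items() if 0 < n < threshold)
    # Optional mitigation: replace flagged cells with None (suppression).
    table = {
        cell: (None if suppress and cell in flagged else n)
        for cell, n in counts.items()
    }
    # Summary a human checker could read alongside the output.
    report = {
        "status": "review" if flagged else "pass",
        "reasons": [
            f"cell {cell} has count {counts[cell]} < {threshold}" for cell in flagged
        ],
        "mitigation": "suppression" if suppress and flagged else "none",
    }
    return table, report


# Made-up records for illustration only.
records = (
    [{"region": "north", "grade": "A"}] * 12
    + [{"region": "north", "grade": "B"}] * 3
    + [{"region": "south", "grade": "A"}] * 15
)
table, report = checked_crosstab(records, "region", "grade", suppress=True)
```

In this sketch the researcher receives both the (possibly suppressed) table and the report, so the reason for any mitigation is visible at the point of analysis and can be passed on to the output checker.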