Paper Title
Causality-Guided Adaptive Interventional Debugging
Paper Authors
Paper Abstract
Runtime nondeterminism is a fact of life in modern database applications. Previous research has shown that nondeterminism can cause applications to intermittently crash, become unresponsive, or experience data corruption. We propose Adaptive Interventional Debugging (AID) for debugging such intermittent failures. AID combines existing statistical debugging, causal analysis, fault injection, and group testing techniques in a novel way to (1) pinpoint the root cause of an application's intermittent failure and (2) generate an explanation of how the root cause triggers the failure. AID works by first identifying a set of runtime behaviors (called predicates) that are strongly correlated with the failure. It then uses temporal properties of the predicates to over-approximate their causal relationships. Finally, it uses fault injection to execute a sequence of interventions on the predicates and discover their true causal relationships. This enables AID to identify the true root cause and its causal relationship to the failure. We theoretically analyze how quickly AID converges to this identification. We evaluate AID with six real-world applications that intermittently fail under specific inputs. In each case, AID was able to identify the root cause and explain how the root cause triggered the failure, much faster than group testing and more precisely than statistical debugging. We also evaluate AID with many synthetically generated applications with known root causes and confirm that the benefits also hold for them.
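The intervention step described in the abstract can be illustrated with a minimal sketch of adaptive group testing over predicates. This is not AID's actual algorithm: it omits the temporal-precedence pruning and assumes a deterministic toy oracle, whereas real intermittent failures would need repeated runs per intervention. The names `find_causal_predicates`, `fails_when_suppressing`, and the predicate labels `p0`..`p7` are hypothetical and chosen only for illustration.

```python
from typing import Callable, FrozenSet, List, Sequence


def find_causal_predicates(
    candidates: Sequence[str],
    fails_when_suppressing: Callable[[FrozenSet[str]], bool],
) -> List[str]:
    """Return the candidates whose suppression prevents the failure.

    `fails_when_suppressing(group)` stands in for one fault-injection run:
    it suppresses every predicate in `group`, reruns the application, and
    reports whether the failure still occurs (an assumed interface).
    """
    causal: List[str] = []

    def search(group: List[str]) -> None:
        if not group:
            return
        # One intervention: suppress the whole group at once.
        if fails_when_suppressing(frozenset(group)):
            # Failure persists, so no predicate in this group is causal.
            return
        if len(group) == 1:
            # Suppressing this single predicate removed the failure.
            causal.append(group[0])
            return
        # Failure disappeared: some member is causal, so split and recurse.
        mid = len(group) // 2
        search(group[:mid])
        search(group[mid:])

    search(list(candidates))
    return causal


# Toy stand-in for an application: the failure occurs unless at least one
# predicate from a hidden causal set is suppressed (an assumption of this demo).
HIDDEN_CAUSAL = frozenset({"p2", "p5"})


def toy_oracle(suppressed: FrozenSet[str]) -> bool:
    return not (suppressed & HIDDEN_CAUSAL)


predicates = [f"p{i}" for i in range(8)]
print(find_causal_predicates(predicates, toy_oracle))
```

When only a few of many correlated predicates are truly causal, discarding whole groups in a single run tends to need fewer interventions than testing predicates one at a time. The abstract reports that AID is in turn much faster than plain group testing.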