Paper title

Runtime reliability monitoring for complex fault-tolerance policies

Authors

Alessandro Fantechi, Gloria Gori, Marco Papini

Abstract

Reliability of complex Cyber-Physical Systems is necessary to guarantee the availability and/or safety of the services they provide. Diverse and complex fault-tolerance policies are adopted to enhance reliability; these include a varied mix of redundancy and dynamic reconfiguration to address hardware reliability, as well as specific software reliability techniques such as diversity or software rejuvenation. Such complex policies call for flexible runtime health checks of system executions that go beyond conventional runtime monitoring of pre-programmed health conditions, also in order to minimize maintenance costs. Defining a suitable monitoring model for applying this approach to complex systems is still a challenge. In this paper we propose a novel approach, Reliability Based Monitoring (RBM), for flexible runtime monitoring of reliability in complex systems. RBM exploits a hierarchical reliability model that is periodically applied to runtime diagnostics data, which makes it possible to plan maintenance activities dynamically in order to prevent failures. As a proof of concept, we show how to apply RBM to a 2oo3 (two-out-of-three) software system implementing different fault-tolerance policies.
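
The abstract describes RBM only at a high level. As a rough illustration of what periodically evaluating a hierarchical reliability model over a 2oo3 architecture could look like, the sketch below computes each channel's reliability as a series of components, combines the three channels with the standard two-out-of-three formula, and flags when maintenance should be planned. Everything here is an assumption, not the authors' method: the constant-rate exponential failure model, the per-component failure rates (imagined as estimates derived from runtime diagnostics), the maintenance threshold, and the names `Component`, `series`, `two_out_of_three` and `check_reliability` are all hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Component:
    """Hypothetical leaf of the reliability hierarchy."""
    name: str
    failure_rate: float  # assumed constant, failures per hour (e.g. estimated from diagnostics)

    def reliability(self, t: float) -> float:
        # Exponential model: probability of surviving t hours without failure.
        return math.exp(-self.failure_rate * t)


def series(reliabilities) -> float:
    # A series block works only if all of its (independent) elements work.
    return math.prod(reliabilities)


def two_out_of_three(r1: float, r2: float, r3: float) -> float:
    # The 2oo3 block works if at least two of three independent channels work:
    # P = r1*r2 + r1*r3 + r2*r3 - 2*r1*r2*r3 (equals 3R^2 - 2R^3 for identical channels).
    return r1 * r2 + r1 * r3 + r2 * r3 - 2 * r1 * r2 * r3


def check_reliability(channels, mission_time: float, threshold: float):
    """Evaluate the hierarchy bottom-up and flag when maintenance should be planned.

    channels: three lists of Component, one list per redundant channel.
    Returns (system_reliability, maintenance_needed).
    """
    if len(channels) != 3:
        raise ValueError("a 2oo3 architecture needs exactly three channels")
    channel_r = [series(c.reliability(mission_time) for c in ch) for ch in channels]
    system_r = two_out_of_three(*channel_r)
    return system_r, system_r < threshold


# Usage: one monitoring cycle with made-up failure-rate estimates.
healthy = [[Component("cpu", 1e-5), Component("io", 2e-5)] for _ in range(3)]
degraded = healthy[:2] + [[Component("cpu", 1e-3), Component("io", 2e-5)]]

for label, config in (("healthy", healthy), ("degraded", degraded)):
    r, plan = check_reliability(config, mission_time=1000.0, threshold=0.99)
    print(f"{label}: R_sys={r:.4f}, plan maintenance: {plan}")
```

With these illustrative numbers the healthy configuration stays above the threshold, while a single degraded channel pushes the system below it and triggers a maintenance flag before any actual failure occurs. In a real monitor of this kind the rates would presumably be re-estimated from fresh diagnostics on each cycle; how RBM actually builds and updates its model is described in the paper, not here.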