Title: An automated framework for selectively tolerating SDC errors based on rigorous instruction-level vulnerability assessment
Authors: HUSSIN ALHAJ AHMAD, Yasser Sedaghat
Abstract
The recent trend in most processor manufacturing technologies has significantly increased the vulnerability of embedded systems operating in harsh environments against soft errors. These errors can cause Silent Data Corruptions (SDCs) that produce erroneous execution results silently, disturbing the system’s execution and potentially leading to severe financial, human or environmental disasters. The use of fault tolerance techniques that take into account the performance and constraints of safety-critical systems is therefore essential to improve system reliability efficiently. Given the significant overhead imposed by conventional techniques, e.g., performance loss, increased memory usage, and additional hardware costs, researchers have developed cost-effective software-based techniques for fault tolerance. However, as detection rates grow, these techniques can increase code size and execution time significantly, which creates a challenge. This paper proposes an automated framework for selective fault tolerance of SDCs in software running on different architectures. The framework comprises a sequence of several consecutive techniques executed automatically. It offers a software-based technique that operates at the microarchitecture level and evaluates the vulnerability of program instructions against SDC errors. The framework conducts vulnerability assessment at the binary code level using a non-intrusive, runtime fault injection mechanism. It can inject faults at different granularity levels to maximize fault activation, including fine-grained injection at specific instruction fields or encoding bits, and coarse-grained injection into the entire software system. The framework makes minor modifications to the software being tested, enabling it to run at near-native speed. 
When SDC vulnerable instructions are identified, the framework selectively protects them automatically using a compiler extension, achieving a more appropriate trade-off between SDC detection and overhead by avoiding overprotection. Our framework was evaluated by conducting a large number of fault injection-based experiments on real-world benchmark programs using the cycle-accurate Gem5 simulator. Leveraging the accurate vulnerability assessment results provided by our framework, the proposed selective technique reduces SDC errors by up to 99% by selectively protecting only 45% of the program’s static instructions, with a performance overhead ranging from 8% to 35%.
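The assess-then-protect flow described in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not the authors' framework: the toy 32-bit instruction encoding (8-bit opcode, 24-bit immediate), the three opcodes, and the helper names `encode`, `run`, `flip_bit`, `classify`, `assess`, and `select_for_protection` are all hypothetical. The sketch flips every single bit of every static instruction, compares the result with a fault-free (golden) run, and ranks instructions by how many flips end in silent data corruption.

```python
from collections import Counter

# Hypothetical toy ISA: opcode in bits 31..24, immediate in bits 23..0.
OP_ADD, OP_AND, OP_NOP = 0x01, 0x02, 0x03


def encode(opcode, imm):
    """Pack an opcode and a 24-bit immediate into one 32-bit word."""
    return ((opcode & 0xFF) << 24) | (imm & 0xFFFFFF)


def run(program):
    """Execute the toy program and return the final accumulator value."""
    acc = 0
    for word in program:
        opcode, imm = word >> 24, word & 0xFFFFFF
        if opcode == OP_ADD:
            acc = (acc + imm) & 0xFFFFFFFF
        elif opcode == OP_AND:
            acc &= imm
        elif opcode == OP_NOP:
            pass
        else:
            # An illegal opcode stands in for a fault the hardware would trap.
            raise ValueError("illegal opcode")
    return acc


def flip_bit(word, bit):
    """Model a single-bit soft error in an instruction encoding."""
    return word ^ (1 << bit)


def classify(program, index, bit, golden):
    """Inject one bit flip and label the outcome of the faulty run."""
    faulty = list(program)
    faulty[index] = flip_bit(faulty[index], bit)
    try:
        result = run(faulty)
    except ValueError:
        return "detected"
    return "benign" if result == golden else "sdc"


def assess(program):
    """Return one Counter of outcomes per static instruction (32 flips each)."""
    golden = run(program)
    return [
        Counter(classify(program, index, bit, golden) for bit in range(32))
        for index in range(len(program))
    ]


def select_for_protection(report, budget):
    """Pick the `budget` instructions with the most SDC outcomes."""
    ranked = sorted(range(len(report)), key=lambda i: report[i]["sdc"], reverse=True)
    return ranked[:budget]


program = [encode(OP_ADD, 5), encode(OP_NOP, 0), encode(OP_AND, 0xFF)]
report = assess(program)
for index, outcomes in enumerate(report):
    print(index, dict(outcomes))
print("protect:", select_for_protection(report, budget=1))
```

In this toy program the `ADD` instruction accounts for most SDC outcomes, so a budget of one instruction selects it first. Exhaustive single-bit injection is feasible only because the example is tiny; a real campaign on benchmark binaries would sample fault sites instead.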
Keywords
Fault tolerance, Transient hardware fault, Silent data corruptions, Vulnerability assessment, Fault injection
@article{paperid:1099224,
author = {ALHAJ AHMAD, HUSSIN and Sedaghat, Yasser},
title = {An automated framework for selectively tolerating SDC errors based on rigorous instruction-level vulnerability assessment},
journal = {Future Generation Computer Systems},
year = {2024},
volume = {157},
number = {1},
month = {August},
issn = {0167-739X},
pages = {392--407},
numpages = {15},
keywords = {Fault tolerance; Transient hardware fault; Silent data corruptions; Vulnerability assessment; Fault injection},
}
%0 Journal Article
%T An automated framework for selectively tolerating SDC errors based on rigorous instruction-level vulnerability assessment
%A ALHAJ AHMAD, HUSSIN
%A Sedaghat, Yasser
%J Future Generation Computer Systems
%@ 0167-739X
%D 2024