Computers and Electrical Engineering, Volume (116), No (1), Year (2024-5) , Pages (109222-17)

Title : ( BiGResi: Robust bit-level fault injection framework for assessing intrinsic software resilience against soft errors )

Authors: HUSSIN ALHAJ AHMAD, Yasser Sedaghat

Citation: BibTeX | EndNote

Abstract

Radiation-induced soft errors, although rare, pose a significant threat to system reliability. Assessing the intrinsic resilience of software to soft errors is therefore essential for building fault-tolerant systems cost-effectively. Analytical models, while fast, can be imprecise. In contrast, Fault Injection (FI) is a mature method that has been applied successfully to reliability assessment. High-level FI offers lower accuracy, whereas existing low-level techniques improve assessment accuracy at the expense of desirable properties such as fault coverage and low intrusiveness. Furthermore, these techniques are often driven by random FI campaigns, which makes it difficult to establish a clear correlation between application characteristics and resilience. This paper presents BiGResi, a versatile software-based framework for assessing software resilience. BiGResi overcomes the limitations of random, instruction type-agnostic FI techniques by evaluating resilience at a fine granularity that accounts for instruction type and bit location. Furthermore, it targets the instruction set architecture (ISA), enhancing assessment accuracy by revealing architecturally visible faults. BiGResi employs a timing-based FI mechanism that requires negligible modifications to the target software, minimizing intrusiveness and ensuring near-native speed. BiGResi’s accuracy is empirically evaluated through many FI campaigns targeting benchmarks with diverse characteristics. We observed that instruction type, ISA encoding bits, and bit location are key factors to consider when assessing software resilience. Finally, BiGResi’s effectiveness is demonstrated by selectively applying instruction protection, which reduces silent data corruptions (SDCs) by 73.80% on average, with a performance overhead of 15.46%. Furthermore, allowing a slightly higher overhead of 22% can improve the SDC detection rate by up to 93.83%.

Keywords

Fault tolerance, Transient hardware faults, Silent data corruptions, Software resilience assessment, Fault injection

@article{paperid:1099222,
author = {ALHAJ AHMAD, HUSSIN and Sedaghat, Yasser},
title = {BiGResi: Robust bit-level fault injection framework for assessing intrinsic software resilience against soft errors},
journal = {Computers and Electrical Engineering},
year = {2024},
volume = {116},
number = {1},
month = {May},
issn = {0045-7906},
pages = {109222--17},
keywords = {Fault tolerance; Transient hardware faults; Silent data corruptions; Software resilience assessment; Fault injection},
}


%0 Journal Article
%T BiGResi: Robust bit-level fault injection framework for assessing intrinsic software resilience against soft errors
%A ALHAJ AHMAD, HUSSIN
%A Sedaghat, Yasser
%J Computers and Electrical Engineering
%@ 0045-7906
%D 2024
