Title: An FSM-based monitoring technique to differentiate between follow-up and original errors in safety-critical distributed embedded systems
Authors: Yasser Sedaghat, Seyed Ghassem Miremadi
Abstract
Nowadays, distributed embedded systems are employed in many safety-critical applications such as X-by-Wire. These systems are composed of several nodes interconnected by a network. Studies show that a transient fault in the communication controller of a network node can lead to errors in the fault-site node (called original errors) and/or in neighboring nodes (called follow-up errors). The communication controller of a network node can be halted by an error, which may be a follow-up error. In this situation, a follow-up error halts the correct operation of a fault-free controller while the fault-site node, i.e., the faulty controller, continues to operate. In this paper, an analysis shows that the occurrence probability of follow-up errors in communication protocols is non-negligible. Consequently, it is important to provide a technique that recognizes the nature of an error, i.e., original or follow-up, in each node. This paper proposes a novel low-cost monitoring technique to differentiate follow-up errors from original errors. The proposed technique is based on monitoring the operational states of a communication controller. In this paper, the technique has been applied to the FlexRay protocol; however, it is applicable to all communication protocols that have an FSM-based description, such as FlexRay, TTP/C, and TT-Ethernet. To evaluate the monitoring technique, a FlexRay-based network of 4 nodes was designed and implemented, and the low-cost monitoring technique was implemented inside each node of the network. A total of 135,600 transient bit-flip faults were injected into the communication controller of one node. The results showed that about 6.0% of the injected faults led to original errors and about 6.1% led to follow-up errors. The results also showed that the accuracy of the proposed technique in differentiating between follow-up and original errors is about 97%, at merely 1.4% hardware overhead. This level of accuracy and cost makes the proposed technique a feasible solution for enhancing the reliability of communication controllers.
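The idea of monitoring a controller's operational states can be illustrated with a minimal sketch. It is not the authors' implementation. The state names loosely follow FlexRay protocol operation control states, but the transition table `ALLOWED` is a simplified, hypothetical one, and the classification rule in `classify` (an illegal transition suggests an original error; a legal path that ends in a halt suggests a follow-up error) is an assumption made purely for illustration.

```python
from enum import Enum, auto


class State(Enum):
    CONFIG = auto()
    READY = auto()
    STARTUP = auto()
    NORMAL_ACTIVE = auto()
    NORMAL_PASSIVE = auto()
    HALT = auto()


# Hypothetical, simplified transition table (NOT taken from the FlexRay
# specification or from the paper): for each state, the set of states
# the controller may legally occupy next.
ALLOWED = {
    State.CONFIG: {State.CONFIG, State.READY},
    State.READY: {State.READY, State.STARTUP, State.CONFIG},
    State.STARTUP: {State.STARTUP, State.NORMAL_ACTIVE, State.READY, State.HALT},
    State.NORMAL_ACTIVE: {State.NORMAL_ACTIVE, State.NORMAL_PASSIVE, State.HALT},
    State.NORMAL_PASSIVE: {State.NORMAL_PASSIVE, State.NORMAL_ACTIVE, State.HALT},
    State.HALT: {State.HALT, State.CONFIG},
}


def classify(trace):
    """Classify an observed sequence of controller states.

    Assumed rule (illustrative only):
      - any transition outside ALLOWED -> "original"  (the FSM itself misbehaved)
      - only legal transitions, ending in HALT -> "follow_up"
      - otherwise -> "no_error"
    """
    for prev, cur in zip(trace, trace[1:]):
        if cur not in ALLOWED[prev]:
            return "original"
    if trace and trace[-1] is State.HALT:
        return "follow_up"
    return "no_error"


if __name__ == "__main__":
    legal_halt = [State.NORMAL_ACTIVE, State.NORMAL_PASSIVE, State.HALT]
    illegal_jump = [State.NORMAL_ACTIVE, State.CONFIG]
    print(classify(legal_halt))
    print(classify(illegal_jump))
```

A table-driven check of this kind needs only a small lookup structure and a comparison per transition, which matches the low-overhead flavor of state monitoring described in the abstract; the actual decision logic used in the paper may differ.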
Keywords
Distributed embedded systems, Error propagation, Follow-up errors, FlexRay protocol, Transient faults, FSM-based monitoring
@article{paperid:1026816,
author = {Sedaghat, Yasser and Miremadi, Seyed Ghassem},
title = {An FSM-based monitoring technique to differentiate between follow-up and original errors in safety-critical distributed embedded systems},
journal = {Microelectronics Journal},
year = {2011},
volume = {42},
number = {6},
month = {April},
issn = {1879-2391},
pages = {863--873},
numpages = {11},
keywords = {Distributed embedded systems; Error propagation; Follow-up errors; FlexRay protocol; Transient faults; FSM-based monitoring},
}
%0 Journal Article
%T An FSM-based monitoring technique to differentiate between follow-up and original errors in safety-critical distributed embedded systems
%A Sedaghat, Yasser
%A Miremadi, Seyed Ghassem
%J Microelectronics Journal
%@ 1879-2391
%D 2011