Practical issues in the use of ABFT and a new failure model. Silva, J., Prata, P., Rela, M., & Madeira, H. In Twenty-Eighth Annual International Symposium on Fault-Tolerant Computing, 1998. Digest of Papers, pages 26--35, June, 1998.
We study the behavior of algorithm based fault tolerance (ABFT) techniques under faults injected according to a quite general fault model. Besides the problem of roundoff error in floating point arithmetic we identify two further weakpoints, namely lack of protection of data during input and output, and incorrect execution of the correctness checks. We propose the robust ABFT technique to handle those weakpoints. We then generalize it to programs that use assertions, where similar problems arise, leading to the technique of robust assertions, whose effectiveness is shown by fault injection experiments on a realistic control application. With this technique a system follows a new failure model, that we call fail-bounded, where with high probability all results produced are either correct or, if wrong, they are within a certain bound of the correct value, whose exact value depends on the output assertions used. We claim that this failure model is very useful to describe the behavior of many low redundancy systems.
@inproceedings{ silva_practical_1998,
  title = {Practical issues in the use of {ABFT} and a new failure model},
  doi = {10.1109/FTCS.1998.689452},
  abstract = {We study the behavior of algorithm based fault tolerance (ABFT) techniques under faults injected according to a quite general fault model. Besides the problem of roundoff error in floating point arithmetic we identify two further weakpoints, namely lack of protection of data during input and output, and incorrect execution of the correctness checks. We propose the robust ABFT technique to handle those weakpoints. We then generalize it to programs that use assertions, where similar problems arise, leading to the technique of robust assertions, whose effectiveness is shown by fault injection experiments on a realistic control application. With this technique a system follows a new failure model, that we call fail-bounded, where with high probability all results produced are either correct or, if wrong, they are within a certain bound of the correct value, whose exact value depends on the output assertions used. We claim that this failure model is very useful to describe the behavior of many low redundancy systems.},
  booktitle = {Twenty-{Eighth} {Annual} {International} {Symposium} on {Fault}-{Tolerant} {Computing}, 1998. {Digest} of {Papers}},
  author = {Silva, J.G. and Prata, P. and Rela, M. and Madeira, H.},
  month = {June},
  year = {1998},
  keywords = {ABFT, Computer crashes, Control systems, Electrical capacitance tomography, Fault detection, Fault model, Floating-point arithmetic, Identity-based encryption, Protection, Robustness, Testing, _done, _model_of_failures, _model_of_faults, algorithm based fault tolerance, computerised control, control application, correctness check, fail bounded model, failure model, fault injection, floating point arithmetic, low redundancy systems, probability, redundancy, robust assertions, roundoff error, roundoff errors, software fault tolerance},
  pages = {26--35}
}
