FT-Grid: A Fault-Tolerance System for e-Science. Townend, P., Groth, P., Looker, N., & Xu, J. In Proceedings of the UK OST e-Science Fourth All Hands Meeting (AHM05), September, 2005.
The size and complexity of many e-Science applications suggest that they may be very prone to errors and failures; the cost of recovering from failures may also be high. The FT-Grid system, developed as part of the e-Demand project at the University of Leeds [1], introduces a replication-based fault tolerance scheme that allows faults occurring in service-based systems to be tolerated, thus increasing the dependability of such systems. This paper details the progress that has been made in the development of FT-Grid, including both a GUI client and an FT-Grid web service interface. We show empirical evidence of the dependability benefits offered by FT-Grid by performing a dependability analysis on the results of fault injection testing performed with the WS-FIT tool at the University of Durham. We then illustrate a potential problem with voting-based fault tolerance approaches in the service-oriented paradigm: namely, that individual channels within fault-tolerant systems may invoke common services as part of their workflow, thus increasing the potential for common-mode failure. We propose a solution to this issue by using the technique of provenance to provide FT-Grid with topological awareness. We implement a large test system and, using the PreServ provenance system developed as part of the PASOA e-Science project at the University of Southampton, perform a large number of experiments which show that a provenance-aware FT-Grid results in a much more dependable system than any of the other configurations tested, whilst imposing a negligible timing overhead.
@inproceedings{ Townend2005a,
  author    = {Paul Townend and Paul Groth and Nik Looker and Jie Xu},
  title     = {FT-Grid: A Fault-Tolerance System for e-Science}, 
  abstract   = {The size and complexity of many e-Science applications suggest that they may be very prone to errors and failures; the cost of recovering from failures may also be high. The FT-Grid system, developed as part of the e-Demand project at the University of Leeds [1], introduces a replication-based fault tolerance scheme that allows faults occurring in service-based systems to be tolerated, thus increasing the dependability of such systems. This paper details the progress that has been made in the development of FT-Grid, including both a GUI client and an FT-Grid web service interface. We show empirical evidence of the dependability benefits offered by FT-Grid by performing a dependability analysis on the results of fault injection testing performed with the WS-FIT tool at the University of Durham. We then illustrate a potential problem with voting-based fault tolerance approaches in the service-oriented paradigm: namely, that individual channels within fault-tolerant systems may invoke common services as part of their workflow, thus increasing the potential for common-mode failure. We propose a solution to this issue by using the technique of provenance to provide FT-Grid with topological awareness. We implement a large test system and, using the PreServ provenance system developed as part of the PASOA e-Science project at the University of Southampton, perform a large number of experiments which show that a provenance-aware FT-Grid results in a much more dependable system than any of the other configurations tested, whilst imposing a negligible timing overhead.},
  booktitle   = {Proceedings of the UK OST e-Science Fourth All Hands Meeting (AHM05)},
  month   = {September},
  year   = {2005}
}
