Rx: treating bugs as allergies—a safe method to survive software failures. Qin, F., Tucek, J., Sundaresan, J., & Zhou, Y. In Proceedings of the twentieth ACM symposium on Operating systems principles, pages 235–248.
@InProceedings{qin05rx,
  author    = {Qin, Feng and Tucek, Joseph and Sundaresan, Jagadeesan and Zhou, Yuanyuan},
  booktitle = {Proceedings of the twentieth ACM symposium on Operating systems principles},
  date      = {2005},
  title     = {Rx: treating bugs as allergies---a safe method to survive software failures},
  pages     = {235--248},
  comment   = {* context: generic recovery from software bugs
* approach: rollback and re-execution with modified environment upon
  error
* observation: many bugs are triggered by the execution environment

  * diverse references/citations regarding types of software failures
  * categorize approaches to survive software failures

    * (micro-)reboot / software rejuvenation
    * checkpoint, rollback, re-execute
    * application-specific approaches

      * e.g., exception handling, multiple processes

    * non-conventional approaches

* few details on the checkpoint-and-rollback component

  * handles memory state, files, file pointers

* implements wrappers for functions that access the environment

  * malloc, I/O (incl. network system calls), scheduling, IPC, process
    signaling, etc.
  * modify environment during re-execution via such wrappers

* extensive discussion of checkpoint management
* a proxy to handle failures during communication

  * e.g., so that a client's series of HTTP requests can be
    replayed

* detailed discussion on implementation limitations

  * checkpoints for multi-threaded applications
  * in distributed systems, all nodes would need to run Rx, and the
    instances would then need to be coordinated (e.g., to checkpoint
    and roll back distributed state)

* case study on MySQL, Squid, Apache and CVS servers},
  file      = {:qin05rx - Rx_ treating bugs as allergies---a safe method to survive software failures.pdf:PDF},
  groups    = {dependability by default / dependability wrap},
  timestamp = {2021-08-30},
}
