Co-evolving Tracing and Fault Injection with Box of Pain. Bittman, D., Miller, E. L., Cui, M., Alvaro, P., He, M., Edupuganti, S., Nayak, N., Sukhomlinov, V., Raphael, R., Shlomo, R., & others. In 11th USENIX Workshop on Hot Topics in Cloud Computing (HotCloud 19), 2019.
@InProceedings{bittman19co,
  author    = {Bittman, Daniel and Miller, Ethan L and Cui, Michael and Alvaro, Peter and He, Michael and Edupuganti, Saikrishna and Nayak, Naren and Sukhomlinov, Vadim and Raphael, Roger and Shlomo, Roee and others},
  title     = {Co-evolving Tracing and Fault Injection with Box of Pain},
  booktitle = {11th {USENIX} Workshop on Hot Topics in Cloud Computing (HotCloud 19)},
  year      = {2019},
  comment   = {* fault injection in distributed systems to enhance fault tolerance

  * i.e., not only find problems but also help improve fault tolerance

* try to find an approach where systems need injection and tracing
  infrastructure
* so they built a tracing framework

  * by observing, e.g., communication, crashes, and exit conditions
  * assume that any failures and failure conditions can be observed
    in the distributed systems' communication

    * hence, they trace and analyze communication only

      * to form a communication graph, a partial order, etc.
      * on system-call level

    * argue that this is a good trade-off

      * between generality
      * ease of use
      * understanding application-level semantics

* focus on simulating the effects of partial failures on communication

  * instead of the failures themselves

* describe other tracing approaches in related work
* fault model: timing and crash
* early experiments with Redis

  * injecting in GET/SET requests
  * plot distribution of unique "runs"

    * a "run" is one execution of the experiment, identified
      by its trace of system calls

      * \# I guess

    * i.e., "how often did a certain sequence of syscalls occur"

      * sequence of syscalls maps to communication pattern

        * in distributed system / between nodes

      * within 2000 iterations

  * it shows that most of the time, the system behaves consistently

    * i.e., same injection tends to lead to same sequence of syscalls},
  file      = {:bittman19co - Co-evolving Tracing and Fault Injection with Box of Pain.pdf:PDF},
  groups    = {fault injection, fault injection tools},
  timestamp = {2019-06-20},
}
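
The notes above describe plotting how often each unique "run" occurred, where a run is identified by its sequence of system calls. A minimal sketch of that counting step, assuming each trace is available as a list of syscall names: the traces, the syscall names, and the helper `run_distribution` are hypothetical and chosen for illustration. This is not Box of Pain's trace format or API.

```python
from collections import Counter


def run_distribution(traces):
    """Count how often each distinct syscall sequence occurred.

    Each trace is a list of syscall names; converting it to a tuple
    makes it hashable so identical sequences land in the same bucket.
    """
    return Counter(tuple(trace) for trace in traces)


# Hypothetical traces from three iterations of the same experiment.
traces = [
    ["connect", "write", "read"],
    ["connect", "write", "read"],
    ["connect", "write", "close"],
]

dist = run_distribution(traces)

# List the unique runs, most frequent first.
for signature, count in dist.most_common():
    print(count, " -> ".join(signature))
```

If one sequence dominates the counts, that would match the note that the same injection tends to lead to the same sequence of syscalls.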
