Co-evolving Tracing and Fault Injection with Box of Pain. Bittman, D., Miller, E. L., Cui, M., Alvaro, P., He, M., Edupuganti, S., Nayak, N., Sukhomlinov, V., Raphael, R., Shlomo, R., & others. In 11th USENIX Workshop on Hot Topics in Cloud Computing (HotCloud 19), 2019.

@InProceedings{bittman19co,
author = {Bittman, Daniel and Miller, Ethan L and Cui, Michael and Alvaro, Peter and He, Michael and Edupuganti, Saikrishna and Nayak, Naren and Sukhomlinov, Vadim and Raphael, Roger and Shlomo, Roee and others},
title = {Co-evolving Tracing and Fault Injection with Box of Pain},
booktitle = {11th {USENIX} Workshop on Hot Topics in Cloud Computing (HotCloud 19)},
year = {2019},
comment = {* fault injection in distributed systems to enhance fault tolerance
  * i.e., not only find but support improving fault tolerance
* try to find an approach where systems need injection and tracing
  infrastructure
* so they built a tracing framework
  * by observing, e.g., communication, crashes, and exit conditions
  * assume that any failures and failure conditions can be observed
    in the distributed system's communication
    * hence, they trace and analyze communication only
      * to form a communication graph, a partial order, etc.
      * on system-call level
        * argue that this is a good trade-off
          * between generality
          * ease of use
          * understanding application-level semantics
* focus on simulating partial failures
  * instead of failures themselves
* describe other tracing approaches in related work
* fault model: timing and crash
* early experiments with Redis
  * injecting in GET/SET requests
  * plot distribution of unique "runs"
    * a "run" is one execution of the experiment, identified
      by its trace of system calls
      * \# I guess
    * i.e., "how often did a certain sequence of syscalls occur"
      * sequence of syscalls maps to communication pattern
        * in distributed system / between nodes
    * within 2000 iterations
  * it shows that most of the time, the system behaves consistently
    * i.e., same injection tends to lead to same sequence of syscalls},
file = {:bittman19co - Co-evolving Tracing and Fault Injection with Box of Pain.pdf:PDF},
groups = {fault injection, fault injection tools},
timestamp = {2019-06-20},
}
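
Note on the "distribution of unique runs" idea from the comment above: a minimal sketch of how such a distribution could be computed, assuming each run is recorded as an ordered list of syscall names. The helper names (`run_signature`, `run_distribution`) and the sample traces are hypothetical and invented for illustration; they are not Box of Pain's API or data from the paper.

```python
from collections import Counter


def run_signature(trace):
    """Identify a run by its ordered syscall names (hypothetical encoding)."""
    return tuple(trace)


def run_distribution(traces):
    """Count how often each distinct syscall sequence occurred."""
    return Counter(run_signature(t) for t in traces)


# Made-up traces for illustration only; not results from the paper.
traces = [
    ["connect", "write", "read", "close"],
    ["connect", "write", "read", "close"],
    ["connect", "write", "close"],
]

dist = run_distribution(traces)
for signature, count in dist.most_common():
    print(count, " -> ".join(signature))
```

If the same injection mostly leads to the same syscall sequence, as the notes report, a distribution like this would be dominated by a single signature.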