Rethread: A Low-cost Transient Fault Recovery Scheme for Multithreaded Processors. Fu, J., Yang, Q., Poss, R., Jesshope, C., & Zhang, C. In Proc. 9th International Conference on Availability, Reliability and Security (ARES'14), pages 88–93, University of Fribourg, Switzerland, September, 2014. IEEE.
Transient fault recovery is important in processor availability. However, significant silicon or performance overheads are characteristics of existing techniques. We uncover an opportunity to reduce the overheads dramatically in modern processors that appears as a side-effect of introducing hardware multithreading to improve performance. We observe that threads are usually short code sequences with no branches and few memory side-effects, which means that the number of checkpoints is small and constant. In addition, the state structures of a thread already presented in hardware can be reused to provide checkpointing. In this paper, we demonstrate this principle of using a hardware/software co-design called Rethread, which features compiler-generated code annotations and automatic recovery in hardware by restarting threads. This approach provides the ability to recover from transient faults without dedicated hardware. Moreover, results show performance degradation under both fault-free condition (<5%) and as a function of fault rate.
@inproceedings{fu14ares,
	Abstract = {Transient fault recovery is important in processor availability. However, significant silicon or performance overheads are characteristics of existing techniques. We uncover an opportunity to reduce the overheads dramatically in modern processors that appears as a side-effect of introducing hardware multithreading to improve performance. We observe that threads are usually short code sequences with no branches and few memory side-effects, which means that the number of checkpoints is small and constant. In addition, the state structures of a thread already presented in hardware can be reused to provide checkpointing. In this paper, we demonstrate this principle of using a hardware/software co-design called Rethread, which features compiler-generated code annotations and automatic recovery in hardware by restarting threads. This approach provides the ability to recover from transient faults without dedicated hardware. Moreover, results show performance degradation under both fault-free condition ($<$5\%) and as a function of fault rate.},
	Address = {University of Fribourg, Switzerland},
	Author = {Jian Fu and Qiang Yang and Raphael Poss and Chris Jesshope and Chunyuan Zhang},
	Booktitle = {Proc. 9th International Conference on Availability, Reliability and Security (ARES'14)},


	Doi = {10.1109/ARES.2014.18},
	Urldoi = {http://dx.doi.org/10.1109/ARES.2014.18},
	Month = {September},
	Pages = {88--93},
	Publisher = {IEEE},
	Title = {Rethread: A Low-cost Transient Fault Recovery Scheme for Multithreaded Processors},
	Year = {2014},
	}
