Layered approach for runtime fault recovery in NOC-Based MPSOCS.
Wachter, E. W.
Ph.D. Thesis, Faculdade de Informática, Pontifícia Universidade Católica do Rio Grande do Sul (PUCRS), Porto Alegre, Brasil, 2015.
@phdthesis{wachter2015,
  title = {{Layered approach for runtime fault recovery in NOC-Based MPSOCS}},
  school = {Faculdade de Informática, {Pontifícia Universidade Católica do Rio Grande do Sul (PUCRS)}},
  author = {Wachter, Eduardo Weber},
  year = {2015},
  address = {Porto Alegre, Brasil},
  keywords = {GAPH,MPSoC, fault tolerance, coadvised},
  url_link = {http://hdl.handle.net/10923/7538},
  abstract = {Mechanisms for fault tolerance in MPSoCs are mandatory to cope with defects during fabrication or faults during product lifetime. For instance, permanent faults on the interconnect network can stall or crash applications, even though the MPSoC’s network has alternative fault-free paths to a given destination. Runtime fault tolerance provides self-organization mechanisms that allow the system to continue delivering its processing services despite defective cores caused by permanent and/or transient faults throughout its lifetime. This Thesis presents a runtime layered approach to a fault-tolerant MPSoC, where each layer is responsible for solving one part of the problem. The approach is built on top of a novel small specialized network used to search fault-free paths. The first layer, named the physical layer, is responsible for fault detection and the isolation of defective routers. The second layer, named the network layer, is responsible for replacing the original faulty path with an alternative fault-free path. A fault-tolerant routing method executes a path search mechanism and reconfigures the network to use the fault-free path. The third layer, named the transport layer, implements a fault-tolerant communication protocol that triggers the path search in the network layer when a packet does not reach its destination. The last layer, the application layer, is responsible for moving tasks from a defective processing element (PE) to a healthy PE, saving the task’s internal state, and restoring it if a fault occurs while the task is executing. Results at the network layer show a fast path-finding method. The entire process of finding alternative paths typically takes less than 2000 clock cycles, or 20 microseconds. In the transport layer, different approaches capable of detecting a lost message and starting its retransmission were evaluated. The results show that the overhead to retransmit a message is 2.46X compared to the time to transmit a message without faults, while all other messages are transmitted with no overhead. For the DTW, MPEG, and synthetic applications, the average-case application execution overhead was 0.17%, 0.09%, and 0.42%, respectively. This represents less than 5% of the application execution overhead worst case. At the application layer, the entire fault recovery protocol executes fast, with a low execution time overhead with no faults (5.67%) and with faults (17.33% - 28.34%).}
}

Integration of a multi-agent system into a robotic framework: a case study of a cooperative fault diagnosis application.
Morais, M. G.
Master's thesis, Faculdade de Informática, Pontifícia Universidade Católica do Rio Grande do Sul (PUCRS), Porto Alegre, Brasil, 2015.
@mastersthesis{morais2015,
  author = {Morais, Márcio Godoy},
  title = {{Integration of a multi-agent system into a robotic framework: a case study of a cooperative fault diagnosis application}},
  school = {Faculdade de Informática, {Pontifícia Universidade Católica do Rio Grande do Sul (PUCRS)}},
  year = {2015},
  address = {Porto Alegre, Brasil},
  url_link = {http://hdl.handle.net/10923/7687},
  keywords = {LSA,robotics, amory},
  abstract = {Programming multi-robot autonomous systems can be extremely complex without appropriate software development techniques to abstract hardware faults, and it can be hard to deal with the complexity of the software required for coordinated autonomous behavior. Real environments are dynamic and unexpected events may occur, leading a robot to unforeseen situations or even fault situations. This work presents a method for integrating the Jason multi-agent system into the ROS robotic framework. Through this integration, it becomes easier to describe complex missions by using the Jason agent language and its resources, as well as to abstract hardware details from the decision-making process. Moreover, software modules related to hardware control and modules which have a high CPU cost are separated from the planning and decision-making process in software layers, allowing plans and software modules to be reused in different missions and robots. Through this integration, Jason resources such as plan reconsideration and contingency plans can be used to enable the robot to reconsider its actions and strategies in order to reach its goals, or to take actions to deal with unforeseen situations due to the unpredictability of the environment or even a robot hardware fault. The presented integration method also allows cooperation between multiple robots through a standardized language of communication between agents. The proposed method is validated by a case study on real robots in which a robot can detect a fault in its hardware and diagnose it with the help of another robot, in a highly abstract method of cooperative diagnosis.}
}
