@book{svaco_reinforcement_2018,
title = {A {Reinforcement} {Learning} {Based} {Algorithm} for {Robot} {Action} {Planning}},
	abstract = {The learning process that arises in response to the visual perception of the environment is the starting point for much research in applied and cognitive robotics. In this research, we propose a reinforcement learning based action planning algorithm for the assembly of spatial structures with an autonomous robot in an unstructured environment. We have developed an algorithm based on temporal difference learning, using linear basis functions to approximate the state-value function because of the large number of discrete states that the autonomous robot can encounter. The aim is to find the optimal sequence of actions that the agent (robot) needs to take to move objects in a 2D environment until they reach the predefined target state. The algorithm is divided into two parts. In the first part, the goal is to learn the parameters that properly approximate the Q function. In the second part of the algorithm, the obtained parameters are used to define the sequence of actions for a UR3 robot arm. We present a preliminary validation of the algorithm in an experimental laboratory scenario.},
author = {Svaco, Marko and Jerbic, Bojan and Polančec, Mateo and Šuligoj, Filip},
month = jun,
year = {2018},
}
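
The abstract above names temporal-difference learning with linear basis functions to approximate the Q function. As an illustrative sketch only, not the paper's implementation, a minimal semi-gradient Q-learning loop with one-hot linear features on a toy 1-D reaching task might look like the following (all names, the task, and the hyperparameters are this sketch's own assumptions):

```python
# Illustrative sketch (not the authors' code): semi-gradient Q-learning with
# a linear approximation Q(s, a) = w_a . phi(s), as described in the abstract
# ("temporal difference learning using linear basis functions").
# Toy task: an agent on a 1-D line of N cells must reach the rightmost cell.
import numpy as np

N_STATES = 8
ACTIONS = (-1, +1)          # move left / move right
GOAL = N_STATES - 1

def phi(s):
    """One-hot state features (a simple choice of linear basis)."""
    f = np.zeros(N_STATES)
    f[s] = 1.0
    return f

def q(w, s, a_idx):
    """Linear action-value estimate: dot product of weights and features."""
    return w[a_idx] @ phi(s)

def train(episodes=500, alpha=0.1, gamma=0.95, eps=0.1, seed=0):
    rng = np.random.default_rng(seed)
    w = np.zeros((len(ACTIONS), N_STATES))   # one weight vector per action
    for _ in range(episodes):
        s = 0
        while s != GOAL:
            # epsilon-greedy action selection
            if rng.random() < eps:
                a_idx = int(rng.integers(len(ACTIONS)))
            else:
                a_idx = int(np.argmax([q(w, s, i) for i in range(len(ACTIONS))]))
            s2 = min(max(s + ACTIONS[a_idx], 0), N_STATES - 1)
            r = 1.0 if s2 == GOAL else -0.01          # small step cost
            target = r if s2 == GOAL else r + gamma * max(
                q(w, s2, i) for i in range(len(ACTIONS)))
            # Semi-gradient TD update: w_a += alpha * (target - Q(s,a)) * phi(s)
            w[a_idx] += alpha * (target - q(w, s, a_idx)) * phi(s)
            s = s2
    return w

def greedy_path(w):
    """Roll out the learned greedy policy from state 0 (capped length)."""
    s, path = 0, [0]
    while s != GOAL and len(path) < 4 * N_STATES:
        a_idx = int(np.argmax([q(w, s, i) for i in range(len(ACTIONS))]))
        s = min(max(s + ACTIONS[a_idx], 0), N_STATES - 1)
        path.append(s)
    return path

w = train()
print(greedy_path(w))  # the learned greedy policy should walk from 0 to 7
```

With one-hot features the approximation is exact (tabular), so the sketch mainly shows the shape of the update; the paper's setting would replace `phi` with features over a much larger assembly state space.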