Environment Curriculum Generation via Large Language Models.
Liang, W., Wang, S., Wang, H., Bastani, O., Jayaraman*, D., & Ma*, Y. J.
CORL. 2024.
Paper: https://eureka-research.github.io/eurekaverse/
@article{liang2024eurekaverse,
  title   = {Environment Curriculum Generation via Large Language Models},
  author  = {William Liang and Sam Wang and Hungju Wang and Osbert Bastani and Dinesh Jayaraman* and Yecheng Jason Ma*},
  journal = {CORL},
  year    = {2024},
  url     = {https://eureka-research.github.io/eurekaverse/}
}
Abstract: Recent work has demonstrated that a promising strategy for teaching robots a wide range of complex skills is by training them on a curriculum of progressively more challenging environments. However, developing an effective curriculum of environment distributions currently requires significant expertise, which must be repeated for every new domain. Our key insight is that environments are often naturally represented as code. Thus, we probe whether effective environment curriculum design can be achieved and automated via code generation by large language models (LLM). In this paper, we introduce Eurekaverse, an unsupervised environment design algorithm that uses LLMs to sample progressively more challenging, diverse, and learnable environments for skill training. We validate Eurekaverse's effectiveness in the domain of quadrupedal parkour learning, in which a quadruped robot must traverse through a variety of obstacle courses. The automatic curriculum designed by Eurekaverse enables gradual learning of complex parkour skills in simulation and can successfully transfer to the real-world, outperforming manual training courses designed by humans.
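To make the loop described in the abstract concrete, here is a minimal Python sketch of an LLM-driven environment-curriculum loop: sample candidate environment programs, train on each, and keep the most learnable one to seed the next round. This is an illustration of the general recipe, not Eurekaverse's actual implementation; query_llm and train_and_score are hypothetical stubs.

# Illustrative sketch only: the LLM client and trainer below are dummy stubs.
import random

def query_llm(prompt):
    # Stub: a real system would return environment code written by an LLM.
    return f"# environment program generated from prompt of length {len(prompt)}"

def train_and_score(policy, env_code):
    # Stub: train a copy of `policy` in the generated environment and return
    # (updated policy, learnability score such as improvement in success rate).
    return policy, random.random()

def environment_curriculum(base_prompt, n_rounds=5, n_samples=4):
    policy, kept = None, []
    for _ in range(n_rounds):
        # 1) Sample several candidate environment programs, conditioning the
        #    LLM on environments that worked well in earlier rounds.
        context = "\n".join(kept[-2:])
        candidates = [query_llm(base_prompt + context) for _ in range(n_samples)]
        # 2) Train in each candidate and measure how learnable it is.
        scored = [(train_and_score(policy, code), code) for code in candidates]
        # 3) Keep the best candidate; it seeds the next, harder round.
        (policy, _), best_code = max(scored, key=lambda item: item[0][1])
        kept.append(best_code)
    return policy, kept

if __name__ == "__main__":
    _, curriculum = environment_curriculum("Write a parkour terrain generator.")
    print(f"generated a curriculum of {len(curriculum)} environments")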

Task-Oriented Hierarchical Object Decomposition for Visuomotor Control.
Qian, J., Bucher, B., & Jayaraman, D.
CORL. 2024.
Paper: https://sites.google.com/view/hodor-corl24
@article{qian2024hodor,
  title   = {Task-Oriented Hierarchical Object Decomposition for Visuomotor Control},
  author  = {Jianing Qian and Bernadette Bucher and Dinesh Jayaraman},
  journal = {CORL},
  year    = {2024},
  url     = {https://sites.google.com/view/hodor-corl24}
}
Abstract: Good pre-trained visual representations could enable robots to learn visuomotor policies efficiently. Still, existing representations take a one-size-fits-all-tasks approach that comes with two important drawbacks: (1) Being completely task-agnostic, these representations cannot effectively ignore any task-irrelevant information in the scene, and (2) They often lack the representational capacity to handle unconstrained/complex real-world scenes. Instead, we propose to train a large combinatorial family of representations organized by scene entities: objects and object parts. This hierarchical object decomposition for task-oriented representations (HODOR) permits selectively assembling different representations specific to each task while scaling in representational capacity with the complexity of the scene and the task. In our experiments, we find that HODOR outperforms prior pre-trained representations, both scene vector representations and object-centric representations, for sample-efficient imitation learning across 5 simulated and 5 real-world manipulation tasks. We further find that the invariances captured in HODOR are inherited into downstream policies, which can robustly generalize to out-of-distribution test conditions, permitting zero-shot skill chaining. Appendix and videos: https://sites.google.com/view/hodor-corl24.
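As a rough illustration of "selectively assembling" entity-level representations per task (a sketch under our own assumptions, not the HODOR codebase), the snippet below concatenates only the object and part features a given task needs; the entity names and feature sizes are made up.

# Minimal sketch: task-specific policy inputs assembled from per-entity features.
import numpy as np

def assemble_representation(entity_features, selected):
    """Concatenate features of only the entities relevant to the task."""
    return np.concatenate([entity_features[name] for name in selected])

# In the real system these would come from pre-trained object/part encoders;
# here they are random stand-ins.
entity_features = {
    "mug": np.random.randn(32),
    "mug/handle": np.random.randn(32),
    "drawer": np.random.randn(32),
    "table": np.random.randn(32),
}

# A mug-picking policy can ignore the drawer; a drawer-opening policy can ignore the mug.
pick_mug_input = assemble_representation(entity_features, ["mug", "mug/handle"])
open_drawer_input = assemble_representation(entity_features, ["drawer"])
print(pick_mug_input.shape, open_drawer_input.shape)  # (64,) (32,)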

Open X-Embodiment: Robotic Learning Datasets and RT-X Models.
Large collaboration.
ICRA. 2024.
Paper: https://robotics-transformer-x.github.io/
@article{open_x_embodiment_rt_x_2024,
  title   = {Open {X-E}mbodiment: Robotic Learning Datasets and {RT-X} Models},
  author  = {Large collaboration},
  journal = {ICRA},
  year    = {2024},
  url     = {https://robotics-transformer-x.github.io/}
}

DrEureka: Language Model Guided Sim-To-Real Transfer.
Ma, Y. J., Liang, W., Wang, H., Wang, S., Zhu, Y., Fan, L., Bastani, O., & Jayaraman, D.
RSS. 2024.
@article{ma2024dreureka,
  title   = {DrEureka: Language Model Guided Sim-To-Real Transfer},
  author  = {Yecheng Jason Ma and William Liang and Hungju Wang and Sam Wang and Yuke Zhu and Linxi Fan and Osbert Bastani and Dinesh Jayaraman},
  journal = {RSS},
  year    = {2024}
}

DROID: A Large-Scale In-The-Wild Robot Manipulation Dataset.
Khazatsky, A., Pertsch, K., Nair, S., Balakrishna, A., Dasari, S., Karamcheti, S., Nasiriany, S., Srirama, M. K., Chen, L. Y., Ellis, K., Fagan, P. D., Hejna, J., Itkina, M., Lepert, M., Ma, Y. J., Miller, P. T., Wu, J., Belkhale, S., Dass, S., Ha, H., Jain, A., Lee, A., Lee, Y., Memmel, M., Park, S., Radosavovic, I., Wang, K., Zhan, A., Black, K., Chi, C., Hatch, K. B., Lin, S., Lu, J., Mercat, J., Rehman, A., Sanketi, P. R., Sharma, A., Simpson, C., Vuong, Q., Walke, H. R., Wulfe, B., Xiao, T., Yang, J. H., Yavary, A., Zhao, T. Z., Agia, C., Baijal, R., Castro, M. G., Chen, D., Chen, Q., Chung, T., Drake, J., Foster, E. P., Gao, J., Herrera, D. A., Heo, M., Hsu, K., Hu, J., Jackson, D., Le, C., Li, Y., Lin, K., Lin, R., Ma, Z., Maddukuri, A., Mirchandani, S., Morton, D., Nguyen, T., O'Neill, A., Scalise, R., Seale, D., Son, V., Tian, S., Tran, E., Wang, A. E., Wu, Y., Xie, A., Yang, J., Yin, P., Zhang, Y., Bastani, O., Berseth, G., Bohg, J., Goldberg, K., Gupta, A., Gupta, A., Jayaraman, D., Lim, J. J., Malik, J., Martín-Martín, R., Ramamoorthy, S., Sadigh, D., Song, S., Wu, J., Yip, M. C., Zhu, Y., Kollar, T., Levine, S., & Finn, C.
RSS. 2024.
@article{khazatsky2024droid,
  title   = {DROID: A Large-Scale In-The-Wild Robot Manipulation Dataset},
  author  = {Alexander Khazatsky and Karl Pertsch and Suraj Nair and Ashwin Balakrishna and Sudeep Dasari and Siddharth Karamcheti and Soroush Nasiriany and Mohan Kumar Srirama and Lawrence Yunliang Chen and Kirsty Ellis and Peter David Fagan and Joey Hejna and Masha Itkina and Marion Lepert and Yecheng Jason Ma and Patrick Tree Miller and Jimmy Wu and Suneel Belkhale and Shivin Dass and Huy Ha and Arhan Jain and Abraham Lee and Youngwoon Lee and Marius Memmel and Sungjae Park and Ilija Radosavovic and Kaiyuan Wang and Albert Zhan and Kevin Black and Cheng Chi and Kyle Beltran Hatch and Shan Lin and Jingpei Lu and Jean Mercat and Abdul Rehman and Pannag R Sanketi and Archit Sharma and Cody Simpson and Quan Vuong and Homer Rich Walke and Blake Wulfe and Ted Xiao and Jonathan Heewon Yang and Arefeh Yavary and Tony Z. Zhao and Christopher Agia and Rohan Baijal and Mateo Guaman Castro and Daphne Chen and Qiuyu Chen and Trinity Chung and Jaimyn Drake and Ethan Paul Foster and Jensen Gao and David Antonio Herrera and Minho Heo and Kyle Hsu and Jiaheng Hu and Donovon Jackson and Charlotte Le and Yunshuang Li and Kevin Lin and Roy Lin and Zehan Ma and Abhiram Maddukuri and Suvir Mirchandani and Daniel Morton and Tony Nguyen and Abigail O'Neill and Rosario Scalise and Derick Seale and Victor Son and Stephen Tian and Emi Tran and Andrew E. Wang and Yilin Wu and Annie Xie and Jingyun Yang and Patrick Yin and Yunchu Zhang and Osbert Bastani and Glen Berseth and Jeannette Bohg and Ken Goldberg and Abhinav Gupta and Abhishek Gupta and Dinesh Jayaraman and Joseph J Lim and Jitendra Malik and Roberto Martín-Martín and Subramanian Ramamoorthy and Dorsa Sadigh and Shuran Song and Jiajun Wu and Michael C. Yip and Yuke Zhu and Thomas Kollar and Sergey Levine and Chelsea Finn},
  journal = {RSS},
  year    = {2024}
}

Training self-learning circuits for power-efficient solutions.
Stern, M., Dillavou, S., Jayaraman, D., Durian, D. J., & Liu, A. J.
Applied Physics Letters (APL) Machine Learning. 2024.
@article{stern2024physical,
  title   = {Training self-learning circuits for power-efficient solutions},
  author  = {Stern, Menachem and Dillavou, Sam and Jayaraman, Dinesh and Durian, Douglas J and Liu, Andrea J},
  journal = {Applied Physics Letters (APL) Machine Learning},
  year    = {2024}
}

Universal Visual Decomposer: Long-Horizon Manipulation Made Easy.
Zhang, Z., Li, Y., Bastani, O., Gupta, A., Jayaraman, D., Ma, Y. J., & Weihs, L.
ICRA. 2024.
@article{zhang2024universal,
  title   = {Universal Visual Decomposer: Long-Horizon Manipulation Made Easy},
  author  = {Zichen Zhang and Yunshuang Li and Osbert Bastani and Abhishek Gupta and Dinesh Jayaraman and Yecheng Jason Ma and Luca Weihs},
  journal = {ICRA},
  year    = {2024}
}

Composing Pre-Trained Object-Centric Representations for Robotics From “What” and “Where” Foundation Models.
Shi*, J., Qian*, J., Ma, Y. J., & Jayaraman, D.
ICRA. 2024.
Paper: https://sites.google.com/view/pocr
@article{shi2024plug,
  title   = {Composing Pre-Trained Object-Centric Representations for Robotics From “What” and “Where” Foundation Models},
  author  = {Shi*, Junyao and Qian*, Jianing and Ma, Yecheng Jason and Jayaraman, Dinesh},
  journal = {ICRA},
  year    = {2024},
  url     = {https://sites.google.com/view/pocr}
}
Abstract: There have recently been large advances both in pre-training visual representations for robotic control and segmenting unknown category objects in general images. To leverage these for improved robot learning, we propose POCR, a new framework for building pre-trained object-centric representations for robotic control. Building on theories of “what-where” representations in psychology and computer vision, we use segmentations from a pre-trained model to stably locate various entities in the scene across timesteps, capturing “where” information. To each such segmented entity, we apply other pre-trained models that build vector descriptions suitable for robotic control tasks, thus capturing “what” the entity is. Thus, our pre-trained object-centric representations for control are constructed by appropriately combining the outputs of off-the-shelf pre-trained models, with no new training. On various simulated and real robotic tasks, we show that imitation policies for robotic manipulators trained on POCR achieve better performance and systematic generalization than state-of-the-art pre-trained representations for robotics, as well as prior object-centric representations that are typically trained from scratch.
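The composition described above can be pictured with a short sketch: a pre-trained segmenter supplies "where" (masks), a pre-trained encoder supplies "what" (per-entity features), and the control representation is their concatenation. The segment and encode_patch functions here are toy stand-ins for the off-the-shelf models, not POCR's actual components.

# Toy sketch of a "what"/"where" composed, object-centric representation.
import numpy as np

def segment(image):
    """'Where': return a list of binary masks, one per scene entity (stub)."""
    h, w = image.shape[:2]
    masks = [np.zeros((h, w), dtype=bool) for _ in range(3)]
    for i, m in enumerate(masks):
        m[i * h // 3:(i + 1) * h // 3, :] = True  # three horizontal bands
    return masks

def encode_patch(image, mask):
    """'What': pooled features of the masked region (stub for a pre-trained encoder)."""
    pixels = image[mask]
    return np.array([pixels.mean(), pixels.std(), mask.mean()])

def composed_representation(image):
    masks = segment(image)                             # where each entity is
    feats = [encode_patch(image, m) for m in masks]    # what each entity is
    return np.concatenate(feats)                       # per-slot features, concatenated

obs = np.random.rand(96, 96, 3)
print(composed_representation(obs).shape)  # (9,)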

Long-HOT: A Modular Hierarchical Approach for Long-Horizon Object Transport.
Narayanan, S., Jayaraman, D., & Chandraker, M.
ICRA. 2024.
@article{narayanan2024long,
  title   = {Long-HOT: A Modular Hierarchical Approach for Long-Horizon Object Transport},
  author  = {Narayanan, Sriram and Jayaraman, Dinesh and Chandraker, Manmohan},
  journal = {ICRA},
  year    = {2024}
}
Abstract: We aim to address key challenges in long-horizon embodied exploration and navigation by proposing a long-horizon object transport task called Long-HOT and a novel modular framework for temporally extended navigation. Agents in Long-HOT need to efficiently find and pick up target objects that are scattered in the environment, carry them to a goal location with load constraints, and optionally have access to a container. We propose a modular topological graph-based transport policy (HTP) that explores efficiently with the help of weighted frontiers. Our approach uses a combination of motion planning to reach point goals within explored locations and object navigation policies for moving towards semantic targets at unknown locations. Experiments on both our proposed Habitat transport task and on MultiOn benchmarks show that our method outperforms baselines and prior works. Further, we analyze the agent's behavior for the usage of the container and demonstrate meaningful generalization to much harder transport scenes with training only on simpler versions of the task. We will release all the code and data.
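As a toy illustration of the weighted-frontier idea (our own simplification; the paper's graph construction and weighting are richer), the snippet below picks the next exploration goal by trading off estimated utility against travel distance. The graph format and the weighting rule are assumptions for illustration only.

# Toy weighted-frontier selection on a small topological graph.
import math

# node -> (position, is_frontier, estimated utility of exploring it)
graph = {
    "start":      ((0.0, 0.0), False, 0.0),
    "hall_end":   ((4.0, 0.0), True,  0.3),   # unexplored corridor
    "room_door":  ((1.0, 3.0), True,  0.8),   # likely contains a target object
    "explored_a": ((2.0, 1.0), False, 0.0),
}

def frontier_weight(agent_pos, node):
    pos, is_frontier, utility = node
    if not is_frontier:
        return -math.inf
    # Prefer nearby frontiers with high expected utility.
    return utility - 0.1 * math.dist(agent_pos, pos)

agent_pos = (0.5, 0.5)
best = max(graph, key=lambda name: frontier_weight(agent_pos, graph[name]))
print("next exploration goal:", best)  # -> "room_door"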

Privileged Sensing Scaffolds Reinforcement Learning.
Hu, E., Springer, J., Rybkin, O., & Jayaraman, D.
ICLR. 2024.
@article{hu2024scaffolder,
  title   = {Privileged Sensing Scaffolds Reinforcement Learning},
  author  = {Edward Hu and James Springer and Oleh Rybkin and Dinesh Jayaraman},
  journal = {ICLR},
  year    = {2024}
}
Abstract: We need to look at our shoelaces as we first learn to tie them but having mastered this skill, can do it from touch alone. We call this phenomenon “sensory scaffolding”: observation streams that are not needed by a master might yet aid a novice learner. We consider such sensory scaffolding setups for training artificial agents. For example, a robot arm may need to be deployed with just a low-cost, robust, general-purpose camera; yet its performance may improve by having privileged training-time-only access to informative albeit expensive and unwieldy motion capture rigs or fragile tactile sensors. For these settings, we propose Scaffolder, a reinforcement learning approach which effectively exploits privileged sensing in critics, world models, reward estimators, and other such auxiliary components that are only used at training time, to improve the target policy. For evaluating sensory scaffolding agents, we design a new “S3” suite of ten diverse simulated robotic tasks that explore a wide range of practical sensor setups. Agents must use privileged camera sensing to train blind hurdlers, privileged active visual perception to help robot arms overcome visual occlusions, privileged touch sensors to train robot hands, and more. Scaffolder easily outperforms relevant prior baselines and frequently performs comparably even to policies that have test-time access to the privileged sensors.
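The core asymmetry can be sketched in a few lines: training-time-only ("privileged") observations feed the critic, while the policy consumes only the sensors available at deployment. The architecture below is a generic asymmetric actor-critic stand-in, not Scaffolder's actual design (which also scaffolds world models and reward estimators); all dimensions are made up.

# Sketch: critic sees privileged + deployment observations; policy sees deployment only.
import torch
import torch.nn as nn

obs_dim, priv_dim, act_dim = 16, 8, 4

# Policy: restricted to the observations it will have at test time.
policy = nn.Sequential(nn.Linear(obs_dim, 64), nn.Tanh(), nn.Linear(64, act_dim))

# Critic: also receives the privileged stream, and is discarded after training.
critic = nn.Sequential(nn.Linear(obs_dim + priv_dim + act_dim, 64),
                       nn.Tanh(), nn.Linear(64, 1))

obs = torch.randn(32, obs_dim)    # e.g., onboard camera features
priv = torch.randn(32, priv_dim)  # e.g., motion capture / tactile, train-time only
act = policy(obs)
q_value = critic(torch.cat([obs, priv, act], dim=-1))
print(act.shape, q_value.shape)   # torch.Size([32, 4]) torch.Size([32, 1])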

Eureka: Human-Level Reward Design via Coding Large Language Models.
Ma, Y. J., Liang, W., Wang, G., Huang, D., Bastani, O., Jayaraman, D., Zhu, Y., Fan, L., & Anandkumar, A.
ICLR. 2024.
@article{ma2024eureka,
  title   = {Eureka: Human-Level Reward Design via Coding Large Language Models},
  author  = {Yecheng Jason Ma and William Liang and Guanzhi Wang and De-An Huang and Osbert Bastani and Dinesh Jayaraman and Yuke Zhu and Linxi Fan and Anima Anandkumar},
  journal = {ICLR},
  year    = {2024}
}

Memory-Consistent Neural Networks for Imitation Learning.
Sridhar, K., Dutta, S., Jayaraman, D., Weimer, J., & Lee, I.
ICLR. 2024.
@article{sridhar2024memoryconsistent,
  title   = {Memory-Consistent Neural Networks for Imitation Learning},
  author  = {Kaustubh Sridhar and Souradeep Dutta and Dinesh Jayaraman and James Weimer and Insup Lee},
  journal = {ICLR},
  year    = {2024}
}

ZeroFlow: Fast Zero Label Scene Flow via Distillation.
Vedder, K., Peri, N., Chodosh, N., Khatri, I., Eaton, E., Jayaraman, D., Liu, Y., Ramanan, D., & Hays, J.
ICLR. 2024.
@article{vedder2024zeroflow,
  title   = {ZeroFlow: Fast Zero Label Scene Flow via Distillation},
  author  = {Vedder, Kyle and Peri, Neehar and Chodosh, Nathaniel and Khatri, Ishan and Eaton, Eric and Jayaraman, Dinesh and Liu, Yang and Ramanan, Deva and Hays, James},
  journal = {ICLR},
  year    = {2024}
}

TLControl: Trajectory and Language Control for Human Motion Synthesis.
Wan, W., Dou, Z., Komura, T., Wang, W., Jayaraman, D., & Liu, L.
ECCV. 2024.
@article{wan2024tlcontrol,
  title   = {TLControl: Trajectory and Language Control for Human Motion Synthesis},
  author  = {Weilin Wan and Zhiyang Dou and Taku Komura and Wenping Wang and Dinesh Jayaraman and Lingjie Liu},
  journal = {ECCV},
  year    = {2024}
}
Abstract: Controllable human motion synthesis is essential for applications in AR/VR, gaming and embodied AI. Existing methods often focus solely on either language or full trajectory control, lacking precision in synthesizing motions aligned with user-specified trajectories, especially for multi-joint control. To address these issues, we present TLControl, a novel method for realistic human motion synthesis, incorporating both low-level Trajectory and high-level Language semantics controls, through the integration of neural-based and optimization-based techniques. Specifically, we begin with training a VQ-VAE for a compact and well-structured latent motion space organized by body parts. We then propose a Masked Trajectories Transformer (MTT) for predicting a motion distribution conditioned on language and trajectory. Once trained, we use MTT to sample initial motion predictions given user-specified partial trajectories and text descriptions as conditioning. Finally, we introduce a test-time optimization to refine these coarse predictions for precise trajectory control, which offers flexibility by allowing users to specify various optimization goals and ensures high runtime efficiency. Comprehensive experiments show that TLControl significantly outperforms the state-of-the-art in trajectory accuracy and time efficiency, making it practical for interactive and high-quality animation generation.
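The test-time refinement stage described above can be sketched as a small optimization over the predicted motion: pull the controlled joint onto the user trajectory while staying close to the initial prediction and keeping the motion smooth. The tensors, loss terms, and weights below are illustrative assumptions, not the paper's exact objective.

# Sketch of test-time trajectory refinement over a coarse motion prediction.
import torch

T, J = 60, 22                          # frames, joints
initial_motion = torch.randn(T, J, 3)  # coarse prediction (e.g., from a masked transformer)
target_traj = torch.randn(T, 3)        # user-specified trajectory for one joint
ctrl_joint = 0                         # which joint the trajectory constrains

motion = initial_motion.clone().requires_grad_(True)
opt = torch.optim.Adam([motion], lr=0.05)

for step in range(200):
    traj_loss = ((motion[:, ctrl_joint] - target_traj) ** 2).mean()   # follow the trajectory
    prior_loss = ((motion - initial_motion) ** 2).mean()              # stay near the prediction
    smooth_loss = ((motion[1:] - motion[:-1]) ** 2).mean()            # temporal smoothness
    loss = traj_loss + 0.1 * prior_loss + 0.1 * smooth_loss
    opt.zero_grad()
    loss.backward()
    opt.step()

print(float(((motion[:, ctrl_joint] - target_traj) ** 2).mean()))  # residual tracking error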