eScholarship
Open Access Publications from the University of California

UC San Diego Electronic Theses and Dissertations

Learning Embodied AI Agents with Task Decomposition

Abstract

One of the ultimate goals of artificial intelligence (AI) is to build autonomous agents that can perceive, reason, and interact with their surroundings (i.e., Embodied AI). Despite recent advances in deep learning and simulators, acquiring embodied agents via robot learning remains extremely demanding. In this dissertation, we present three methods, each addressing a different aspect of the problem, that adopt task decomposition to improve robot learning.

Training agents that follow human instructions to complete long-horizon household tasks, especially solely from offline data, poses several technical challenges, such as scene understanding and compositionally generalizable task execution (usually over high-level actions). We propose to decompose such tasks into an exploration phase and an execution phase. In the first phase, we utilize multimodal signals to explore the scene in a task-driven manner and obtain an affordance-aware semantic map. In the second, we adopt a hierarchical task execution system that completes the sub-task sequences (another level of task decomposition) according to the map and the instructions.
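The explore-then-execute decomposition above can be sketched as a two-phase pipeline. This is a minimal illustrative sketch, not the dissertation's implementation: every class and function name here is hypothetical, and the string-based "planner" is a stand-in for the learned hierarchical execution system.

```python
class SemanticMap:
    """Affordance-aware semantic map built during the exploration phase."""

    def __init__(self):
        self.objects = {}  # object name -> (location, affordances)

    def add(self, name, location, affordances):
        self.objects[name] = (location, affordances)

    def locate(self, name):
        return self.objects.get(name)


def explore(scene_observations):
    """Phase 1: task-driven exploration produces the semantic map."""
    semantic_map = SemanticMap()
    for obs in scene_observations:
        semantic_map.add(obs["name"], obs["location"], obs["affordances"])
    return semantic_map


def execute(instruction, semantic_map):
    """Phase 2: hierarchical execution splits the instruction into sub-tasks,
    then grounds each sub-task in the map."""
    sub_tasks = instruction.split(", then ")  # stand-in for a learned planner
    plan = []
    for sub_task in sub_tasks:
        verb, _, target = sub_task.partition(" ")
        entry = semantic_map.locate(target)
        if entry is not None:
            location, _affordances = entry
            plan.append((verb, target, location))
    return plan
```

For example, `execute("pick mug, then place mug", semantic_map)` would yield one grounded step per sub-task, each resolved against the map built in the first phase.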

Short-horizon tasks involving low-level motor control are usually harder for AI (Moravec's paradox). To solve challenging contact-rich object manipulation tasks that demand high precision and/or handle object variations, we develop a novel hierarchical imitation learning method that utilizes scalable, albeit sub-optimal, demonstrations by further decomposing short-horizon tasks into subskills. We first propose an observation-space-agnostic method that discovers, without supervision, a multi-step subskill decomposition (sequences of key observations) from demonstrations. We then propose a Transformer-based design that learns to dynamically predict this subskill decomposition, serving as high-level future guidance for low-level actions.
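To make the idea of segmenting a demonstration into key observations concrete, here is a deliberately simple sketch using a distance-based change-point heuristic. The dissertation's unsupervised discovery method is more sophisticated; the function name, the heuristic, and the threshold parameter are all illustrative assumptions.

```python
def discover_key_observations(trajectory, threshold=1.0):
    """Return indices of 'key observations' that segment a demo into subskills.

    trajectory: list of state vectors (lists of floats).
    A new segment starts whenever the state moves farther than `threshold`
    (Euclidean distance) from the most recent key observation.
    """
    if not trajectory:
        return []
    keys = [0]  # the first observation always anchors the first subskill
    for i, state in enumerate(trajectory[1:], start=1):
        anchor = trajectory[keys[-1]]
        dist = sum((a - b) ** 2 for a, b in zip(state, anchor)) ** 0.5
        if dist > threshold:
            keys.append(i)
    return keys
```

In a full system along the lines described above, a high-level model (here, a Transformer) would be trained to predict the next key observation, which then conditions the low-level action policy.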

Beyond these hierarchical principles, task decomposition can also mean decomposing the task space itself. While large-scale reinforcement learning (RL) over diverse environment variations poses great optimization challenges, we find that launching a population of agents (the specialists), each trained on a subset of the task variations, drastically eases policy optimization. We therefore propose a meta-framework that generally improves online RL methods on complex tasks by combining distributed and joint training in a principled manner. Our approach strikes a strong balance between efficiency and effectiveness in large-scale policy learning, which we verify with extensive ablation studies and a diverse set of benchmark tasks.
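The specialist population idea can be sketched as partitioning the set of task variations across agents and training one policy per subset. This is a minimal sketch under assumed names; the round-robin partition and the `train_fn` callback are placeholders, not the dissertation's actual meta-framework, which further combines distributed and joint training.

```python
def partition_variations(variations, num_specialists):
    """Assign each task variation to one specialist, round-robin."""
    groups = [[] for _ in range(num_specialists)]
    for i, variation in enumerate(variations):
        groups[i % num_specialists].append(variation)
    return groups


def train_population(variations, num_specialists, train_fn):
    """Train one specialist per variation subset; return the resulting policies.

    train_fn: callable that takes a subset of variations and returns a policy
    trained only on that subset (each specialist sees an easier problem).
    """
    groups = partition_variations(variations, num_specialists)
    return [train_fn(group) for group in groups]
```

A subsequent joint-training or distillation stage would then combine the specialists' experience into a single generalist policy covering all variations.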
