Whether it’s a dog chasing after a ball or a horse jumping over obstacles, animals can effortlessly perform an incredibly rich repertoire of agile skills. Developing robots that are able to replicate these agile behaviors can open opportunities to deploy robots for sophisticated tasks in the real world. But designing controllers that enable legged robots to perform these agile behaviors can be a very challenging task. While reinforcement learning (RL) is an approach often used for automating development of robotic skills, a number of technical hurdles remain and, in practice, there is still substantial manual overhead. Designing reward functions that lead to effective skills can itself require a great deal of expert insight, and often involves a lengthy reward tuning process for each desired skill. Furthermore, applying RL to legged robots requires not only efficient algorithms, but also mechanisms to enable the robots to remain safe and recover after falling, without frequent human assistance.
In this post, we will discuss two of our recent projects aimed at addressing these challenges. First, we describe how robots can learn agile behaviors by imitating motions from real animals, producing fast and fluent movements like trotting and hopping. Then, we discuss a system for automating the training of locomotion skills in the real world, which allows robots to learn to walk on their own, with minimal human assistance.
Learning Agile Robotic Locomotion Skills by Imitating Animals
In “Learning Agile Robotic Locomotion Skills by Imitating Animals”, we present a framework that takes a reference motion clip recorded from an animal (a dog, in this case) and uses RL to train a control policy that enables a robot to imitate the motion in the real world. By providing the system with different reference motions, we are able to train a quadruped robot to perform a diverse set of agile behaviors, ranging from fast walking gaits to dynamic hops and turns. The policies are trained primarily in simulation, and then transferred to the real world using a latent space adaptation technique that can efficiently adapt a policy using only a few minutes of data from the real robot.
We start by collecting motion capture clips of a real dog performing various locomotion skills. Then, we use RL to train a control policy to imitate the dog’s motions. The policies are trained in a physics simulation to track the pose of the reference motion at each timestep. Then, by using different reference motions in the reward function, we can train a simulated robot to imitate a variety of different skills.
|Reinforcement learning is used to train a simulated robot to imitate the reference motions from a dog. All simulations are performed using PyBullet.|
First, to encourage the policy to learn behaviors that are robust to variations in the dynamics, we randomize the dynamics of the simulation by varying physical quantities, such as the robot’s mass and friction. Since we have access to the values of these parameters during training in simulation, we can also map them to a low-dimensional representation using a learned encoder. This encoding is then passed as an additional input to the policy during training. Since the physical parameters of the real robot are not known a priori, when deploying the policy to a real robot, we remove the encoder and directly search for a set of parameters in the latent space that enables the robot to successfully execute the desired skills in the real world. This technique is often able to adapt a policy to the real world using less than 8 minutes of real-world data.
|Comparison of policies before and after adaptation on the real robot. Before adaptation, the robot is prone to falling. But after adaptation, the policies are able to more consistently execute the desired skills.|
Using this approach, the robot learns to imitate various locomotion skills from a dog, including different walking gaits, such as pacing and trotting, as well as an agile spinning motion.
|Robot imitating various skills from a dog.|
|Skills learned by imitating artist-animated keyframe motions: side-steps, turn, and hop-turn.|
Learning to Walk in the Real World with Minimal Human Effort
The above approach is able to train policies in simulation and then adapt them to the real world. However, when the task involves complex and diverse physical phenomena, it is also necessary to directly learn from real-world experience. Although learning on real robots has achieved state-of-the-art performance for manipulation tasks (e.g., QT-Opt), applying the same methods to legged robots is difficult since the robot may fall and damage itself, or leave the training area, which can then require human intervention.
|An automated learning system for legged robots must resolve safety and automation challenges.|
For each roll-out, the scheduler selects a task in which the desired walking direction is pointing towards the center. For instance, assuming we have two tasks, forward and backward walking, the scheduler will select the forward task if the robot is at the back of the workspace, and vice-versa for the backward task. In the middle of the episode, the learner takes dual gradient descent steps to iteratively optimize both the task objective and safety constraints, rather than treating them as a single goal. If the robot has fallen, we invoke an automated get-up controller and proceed to the next episode.
|We solve automation and safety challenges with multi-task learning, a safety-constrained SAC algorithm, and an automatic reset controller.|
This framework successfully trains policies from scratch to walk in different directions without any human intervention.
|Snapshots of the training process on the flat surface with zero human resets.|
|We train locomotion policies to walk in four directions, which allow us to interactively control the robot with a game controller.|
|Learned locomotion gaits on challenging terrains.|
In these two papers, we present methods to reproduce a diverse corpus of behaviors with quadruped robots. Extending this line of work to learn skills from videos would also be an exciting direction, which can substantially increase the volume of data from which robots can learn. We are also interested in applying the automated training system to more complex real-world environments and tasks.
We would like to thank our coauthors, Erwin Coumans, Tingnan Zhang, Tsang-Wei Lee, Jie Tan, Sergey Levine, Peng Xu and Zhenyu Tan. We would also like to thank Julian Ibarz, Byron David, Thinh Nguyen, Gus Kouretas, Krista Reymann, and Bonny Ho for their support and contributions to this work.