Fall Recovery for Quadrupeds

Introduction

Legged robots operating in real-world environments are inevitably exposed to disturbances, uneven terrain, and interaction forces that can lead to loss of balance and falls. Robust fall recovery is therefore a critical capability for enabling truly autonomous deployment of quadruped robots in unstructured settings. While traditional approaches rely on hand-crafted reflexes or carefully engineered controllers, these methods often struggle to generalize beyond a narrow set of scenarios.

In this thesis, we propose to investigate Reinforcement Learning (RL) methods for fall recovery in quadruped robots. The core idea is to learn feedback policies that, given a post-fall state, possibly following a high-impact fall (e.g., lying on the side, upside down, or entangled with the environment), generate whole-body motions that safely return the robot to a nominal standing configuration from which locomotion can resume. The policies will be trained primarily in simulation, leveraging randomized environments and disturbance models to achieve robustness, and subsequently transferred to hardware using appropriate sim-to-real strategies.

Objectives

The main objectives of the thesis are:

  • Formulate fall recovery as an RL problem, including state representation, action space (e.g., joint torques or low-level motion primitives), and task-specific reward functions that encourage safety, energy efficiency, and reliability.
  • Design a training pipeline in simulation that exposes the quadruped to a diverse set of fall conditions and external disturbances, possibly using curriculum learning or domain randomization to improve convergence and robustness.
  • Integrate the learned fall-recovery policy with an existing locomotion stack, enabling seamless switching between nominal gait control and recovery behaviors based on fall detection or instability metrics.
  • Validate the learned policies both in simulation and, where possible, on a real quadruped platform (Unitree Go2), evaluating success rate, recovery time, impact forces, and generalization to unseen disturbances and terrains.
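
To make the first and third objectives more concrete, the following Python sketch shows one possible shape for a recovery reward and for a switch between nominal gait control and recovery. It is an illustration only, not the formulation the thesis will adopt: the RobotState fields, the reward terms and weights, the target height, the force limit, and the hysteresis thresholds are all hypothetical placeholders to be chosen and tuned during the project.

```python
import math
from dataclasses import dataclass


@dataclass
class RobotState:
    """Hypothetical subset of the observations relevant to fall recovery."""
    base_height: float         # trunk height above ground [m]
    gravity_body: tuple        # unit gravity vector in the body frame; (0, 0, -1) when upright
    joint_torques: tuple       # [N m]
    joint_velocities: tuple    # [rad/s]
    peak_contact_force: float  # largest contact force in the current step [N]


# Placeholder values, not taken from any robot datasheet.
TARGET_HEIGHT = 0.30   # nominal standing height [m]
FORCE_LIMIT = 400.0    # contact force above which impacts are penalized [N]


def uprightness(state: RobotState) -> float:
    """Return 1.0 when upright, 0.0 when on the side, -1.0 when upside down."""
    return -state.gravity_body[2]


def recovery_reward(state: RobotState, w_up=1.0, w_height=1.0,
                    w_energy=1e-3, w_impact=1e-2) -> float:
    """Reward orientation and height; penalize mechanical power and hard impacts."""
    up = 0.5 * (uprightness(state) + 1.0)            # mapped to [0, 1]
    height_err = state.base_height - TARGET_HEIGHT
    height = math.exp(-(height_err ** 2) / 0.01)     # 1.0 at the target height
    power = sum(abs(t * v) for t, v in
                zip(state.joint_torques, state.joint_velocities))
    excess_force = max(0.0, state.peak_contact_force - FORCE_LIMIT)
    # The height term is gated by `up` so that height only counts once upright.
    return (w_up * up + w_height * up * height
            - w_energy * power - w_impact * excess_force)


class FallDetector:
    """Hysteresis switch between the nominal controller and the recovery policy."""

    def __init__(self, enter_below=0.5, exit_above=0.9):
        self.enter_below = enter_below  # start recovery when uprightness drops below this
        self.exit_above = exit_above    # end recovery when uprightness rises above this
        self.recovering = False

    def update(self, state: RobotState) -> bool:
        """Return True while the recovery policy should be in control."""
        u = uprightness(state)
        if not self.recovering and u < self.enter_below:
            self.recovering = True
        elif (self.recovering and u > self.exit_above
              and state.base_height > 0.8 * TARGET_HEIGHT):
            self.recovering = False
        return self.recovering
```

The two thresholds in FallDetector are deliberately different, so that small orientation oscillations near the switching point do not cause rapid toggling between controllers. In practice, the instability metric could be richer than a single orientation measure (e.g., including angular velocity or contact information).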

Contact

Georges Jetti: georges.jetti@polimi.it
Michael Khayyat: michael.khayyat@polimi.it
Stefano Arrigoni: stefano.arrigoni@polimi.it