Introduction

Humanoid robots are designed to operate in human-centered environments, where unexpected contacts, uneven terrain, and external disturbances frequently lead to loss of balance and falls. Reliable fall recovery is therefore a key capability for long-term autonomous operation and human–robot coexistence. Classical approaches rely on carefully engineered whole-body controllers and pre-defined get-up sequences; these are often tailored to a specific robot morphology and a limited set of fall configurations, which makes generalization difficult.

In this thesis, we propose to investigate Reinforcement Learning (RL) methods for fall recovery in humanoid robots. The core idea is to learn whole-body control policies that, starting from a wide range of post-fall configurations (e.g., lying supine, prone, or sideways, with limbs entangled or in contact with obstacles), generate contact-rich motions that safely bring the robot back to a nominal standing posture. Policies will be trained primarily in physics simulation, using extensive environment and model randomization to promote robustness, and then transferred to hardware via sim-to-real techniques and safety-aware execution strategies.
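To make the learning objective concrete, a shaped reward for fall recovery can be sketched as below. This is a minimal illustration, not the reward that will be used in the thesis: all weights, thresholds, and the choice of terms (standing-height progress, upright orientation, control effort) are illustrative assumptions.

```python
import math

def recovery_reward(torso_height, torso_pitch, joint_torques,
                    target_height=0.9, w_up=1.0, w_tilt=0.5, w_effort=1e-3):
    """Toy shaped reward for fall recovery (illustrative weights).

    torso_height  -- base height above the ground [m]
    torso_pitch   -- torso pitch angle from vertical [rad]
    joint_torques -- iterable of commanded joint torques [Nm]
    """
    # Reward progress toward the nominal standing height (capped at 1).
    height_term = w_up * min(torso_height / target_height, 1.0)
    # Penalize deviation from the upright orientation.
    tilt_term = -w_tilt * abs(torso_pitch) / math.pi
    # Penalize control effort to encourage smooth, low-impact motions.
    effort_term = -w_effort * sum(t * t for t in joint_torques)
    return height_term + tilt_term + effort_term
```

In practice, terms encoding safety (e.g., penalties on peak contact forces) and efficiency (e.g., time-to-stand bonuses) would be added, as listed in the objectives below.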

Objectives

The main objectives of the thesis are:

  • Formulate humanoid fall recovery as an RL problem, including suitable state and observation spaces (e.g., joint states, contact information, inertial measurements), action representations (joint torques, whole-body task commands, or motion primitives), and reward functions that encode safety, stability, and efficiency.
  • Develop a simulation-based training pipeline that exposes the humanoid to a diverse set of fall scenarios and disturbances, potentially using curriculum learning and domain randomization to improve convergence, robustness, and sim-to-real transfer.
  • Integrate the learned fall-recovery policy with an existing locomotion and balance control stack, enabling automatic detection of falls or unrecoverable disturbances and seamless switching to recovery behaviors.
  • Validate the approach in simulation on a humanoid model of choice, evaluating success rate, recovery time, peak impact forces, and generalization to unseen initial conditions and environments.
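The domain-randomization component of the training pipeline can be sketched as a per-episode sampler over physics parameters. The parameter names and ranges below are illustrative assumptions; real values would be tuned to the chosen humanoid model and simulator.

```python
import random

# Illustrative randomization ranges; real values would be tuned per robot.
RANDOMIZATION = {
    "ground_friction": (0.4, 1.2),     # Coulomb friction coefficient
    "link_mass_scale": (0.8, 1.2),     # multiplicative mass perturbation
    "motor_strength_scale": (0.8, 1.1),
    "control_latency_s": (0.0, 0.04),  # sensor-to-actuation delay [s]
    "push_force_n": (0.0, 50.0),       # random external push magnitude [N]
}

def sample_episode_params(rng=random):
    """Draw one set of physics parameters for a training episode."""
    return {name: rng.uniform(lo, hi)
            for name, (lo, hi) in RANDOMIZATION.items()}
```

A curriculum could then be layered on top, e.g., by starting from narrow ranges and widening them as the policy's success rate improves.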
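The integration with an existing control stack hinges on a fall-detection and mode-switching supervisor. A minimal sketch is given below, assuming tilt- and height-based detection; the thresholds and controller names are hypothetical placeholders, not part of the actual stack.

```python
import math

def detect_fall(torso_roll, torso_pitch, base_height,
                tilt_limit=math.radians(60), height_limit=0.4):
    """Flag a fall when the torso tilts past a limit or the base
    drops below a threshold (both thresholds are illustrative)."""
    tilted = max(abs(torso_roll), abs(torso_pitch)) > tilt_limit
    collapsed = base_height < height_limit
    return tilted or collapsed

def select_controller(torso_roll, torso_pitch, base_height):
    """Hypothetical supervisor: switch to the recovery policy after a
    fall, otherwise keep the nominal locomotion/balance controller."""
    if detect_fall(torso_roll, torso_pitch, base_height):
        return "recovery_policy"
    return "locomotion_stack"
```

A production supervisor would additionally debounce the detection over several control cycles and hand control back only once a stable standing posture is verified.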

Contact

Georges Jetti: georges.jetti@polimi.it
Michael Khayyat: michael.khayyat@polimi.it
Stefano Arrigoni: stefano.arrigoni@polimi.it