Introduction

Reinforcement Learning (RL) has recently enabled agile and dynamic behaviors in legged robots. However, RL alone is not always the most suitable tool, especially for long-horizon tasks with sparse rewards that involve complex object manipulation. Even with careful problem formulation, reward design, and curriculum learning, such tasks can remain difficult to solve with standard RL methods.

Imitation-based approaches such as Behavioral Cloning and Adversarial Motion Priors (AMP) have emerged as powerful alternatives for these scenarios, as they can leverage existing datasets that demonstrate the desired behavior. These datasets may come from human demonstrations (e.g., videos or teleoperation), optimal controllers, diffusion models, or combinations thereof. Integrating RL with such imitation-based frameworks can guide policy learning toward task success while improving robustness and generalization.

Objectives

The goal of this project is to develop a modular teleoperation framework that enables a quadruped robot to manipulate objects using its feet. A human operator will be able to interact with the robot during task execution by sending retargeted commands (e.g., specifying desired foot poses or motions). To ensure stability and safety, the teleoperation controller will be based on an optimal control formulation that tracks the human commands while maintaining balance and feasibility of the robot’s motion. This controller should enable the robot to perform manipulation tasks without compromising its locomotion stability.
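
As a purely illustrative aid, the idea of tracking an operator command subject to feasibility limits can be sketched as a one-step constrained tracking problem for a single foot. This is a toy stand-in, not the project's controller: it ignores robot dynamics, contacts, and balance, and the names FootLimits and track_command, the workspace box, and the per-axis speed limit are all hypothetical assumptions introduced here. Because the quadratic cost and the interval constraints are separable per axis, the minimizer is simply the command clipped to the feasible interval on each axis.

```python
from dataclasses import dataclass


@dataclass
class FootLimits:
    """Hypothetical limits for one foot (illustrative values only)."""
    lower: tuple       # minimum reachable position per axis [m]
    upper: tuple       # maximum reachable position per axis [m]
    max_speed: float   # maximum per-axis foot speed [m/s]


def clip(value, low, high):
    """Clamp value to the interval [low, high]."""
    return max(low, min(value, high))


def track_command(current, command, limits, dt):
    """Return the next foot position closest to the operator command.

    Solves min ||p_next - command||^2 subject to
      lower <= p_next <= upper           (workspace box)
      |p_next - current| <= max_speed*dt (per-axis step limit)
    Assumes `current` already lies inside the workspace box, so the
    feasible interval on every axis is non-empty.
    """
    max_step = limits.max_speed * dt
    next_pos = []
    for p, c, lo, hi in zip(current, command, limits.lower, limits.upper):
        low = max(lo, p - max_step)    # tightest lower bound on this axis
        high = min(hi, p + max_step)   # tightest upper bound on this axis
        next_pos.append(clip(c, low, high))
    return tuple(next_pos)


# Example: the operator commands a pose outside the reachable box.
limits = FootLimits(lower=(-0.2, -0.2, 0.0), upper=(0.2, 0.2, 0.3), max_speed=0.5)
foot = (0.0, 0.0, 0.1)
for _ in range(100):
    foot = track_command(foot, (0.5, 0.005, 0.1), limits, dt=0.02)
print(foot)
```

In this sketch the foot moves toward the command at a bounded rate and stops at the workspace boundary instead of following an unreachable command. A full optimal control formulation would replace the box and step limits with the robot's dynamics, contact, and stability constraints over a prediction horizon.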

The student will be responsible for designing and implementing this teleoperation system and collecting demonstration data for a set of challenging manipulation tasks, such as:

  • Grasping an object with the hind-limb end-effectors and executing an overhead throwing motion.
  • Repositioning and reorienting a grasped object with the feet while maintaining whole-body stability.
  • Interacting with articulated objects (e.g., opening a pull-type door).
  • Lifting and transporting a container-like object (e.g., basket or backpack) using the feet.
  • Achieving bipedal locomotion on the hind limbs while maintaining a stable grasp of an object with the fore-limb end-effectors.

We do not expect all of these tasks to be completed, as each may require a different Model Predictive Control (MPC) formulation and varying levels of system complexity. The specific subset of tasks to be addressed will be selected and refined in discussion with the student.

Contact

Georges Jetti: georges.jetti@polimi.it
Michael Khayyat: michael.khayyat@polimi.it
Stefano Arrigoni: stefano.arrigoni@polimi.it