Vision-based policies for human-robot collaboration

This thesis investigates vision-based policies (e.g., vision-language-action (VLA) models, diffusion policies, flow matching) for human-robot collaboration. Unlike standard collaborative controllers, which use only the robot state and (if available) the human state, vision-based policies are additionally trained on images captured during task execution. This allows the robot to embed its perception of the environment directly into its controller. However, such policies depend strongly on the training dataset (e.g., on the background seen during training) and are commonly available only for autonomous tasks in static environments. In this thesis, we aim to extend vision-based policies to dynamic environments and, in particular, to human-robot collaboration. The student will review the state of the art in vision-based policies, define a framework for vision-based policies in human-robot collaboration, gather data, train the policies, and test the proposed framework on a real collaborative manipulator.


Contact: loris.roveda@polimi.it