Autonomous Humanoid Mobility | Course Project

Project Overview

This course project was completed under the guidance of Prof. Subrahmanya Swamy Peruru, in collaboration with my teammates Yash Verma and Paritosh Pankaj. Our primary focus was to develop a deep understanding of Trust Region Policy Optimization (TRPO) and to implement a practical algorithm based on this method.
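To make the "trust region" idea concrete: TRPO accepts a policy update only if it improves a surrogate objective while keeping the KL divergence between the old and new policies below a small threshold. The following is a minimal sketch of that acceptance test in plain Python, not the project's actual implementation. It assumes a diagonal Gaussian policy with fixed standard deviations; the function names, the toy surrogate, and the threshold `max_kl=0.01` are hypothetical choices for illustration. A full TRPO step would also compute the proposed direction with conjugate gradient on Fisher-vector products, which is omitted here.

```python
import math


def gaussian_kl(mu_old, std_old, mu_new, std_new):
    """KL(old || new) between two diagonal Gaussians, summed over dimensions."""
    kl = 0.0
    for m0, s0, m1, s1 in zip(mu_old, std_old, mu_new, std_new):
        kl += math.log(s1 / s0) + (s0 ** 2 + (m0 - m1) ** 2) / (2.0 * s1 ** 2) - 0.5
    return kl


def backtracking_step(mu_old, std, full_step, surrogate,
                      max_kl=0.01, shrink=0.5, max_tries=10):
    """Shrink a proposed mean update until it improves the surrogate
    and stays inside the trust region KL <= max_kl.

    `surrogate` is any callable scoring a candidate mean (higher is better);
    it stands in for the importance-weighted advantage estimate.
    """
    base = surrogate(mu_old)
    frac = 1.0
    for _ in range(max_tries):
        mu_new = [m + frac * d for m, d in zip(mu_old, full_step)]
        kl = gaussian_kl(mu_old, std, mu_new, std)
        if kl <= max_kl and surrogate(mu_new) > base:
            return mu_new, kl  # accepted: improves and respects the constraint
        frac *= shrink  # too large a step: halve it and retry
    return list(mu_old), 0.0  # no acceptable step found: keep the old policy


if __name__ == "__main__":
    # Toy surrogate (an assumption): prefers means close to 1.0 in each dimension.
    toy = lambda m: -sum((x - 1.0) ** 2 for x in m)
    mu, kl = backtracking_step([0.0, 0.0], [1.0, 1.0], [1.0, 1.0], toy)
    print(mu, kl)
```

In this toy run the full step violates the KL limit, so the step is halved repeatedly until the candidate falls inside the trust region. This conservative acceptance rule is what separates TRPO from an unconstrained policy-gradient update.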

In the later stages of the project, we used the MuJoCo physics simulator to model a bipedal humanoid robot that imitates human walking. The environment exposes a continuous 376-dimensional state space in which each component is unbounded, ranging from minus infinity to infinity.

We trained the robot’s walking motion using TRPO and compared it against other algorithms, namely Proximal Policy Optimization (PPO) and Soft Actor-Critic (SAC). In our experiments, TRPO outperformed both at training a humanoid walk. These findings point to TRPO's efficiency on complex tasks, especially those with large state spaces and intricate dynamics.

Resources

This post is licensed under CC BY 4.0 by the author.