Autonomous Humanoid Mobility | Course Project

Posted Apr 29, 2024 Updated Jan 25, 2025

By Praneat Data


Project Overview

This course project was completed under the guidance of Prof. Subrahmanya Swamy Peruru, in collaboration with my teammates Yash Verma and Paritosh Pankaj. Our primary focus was to develop a deep understanding of Trust Region Policy Optimization (TRPO) and to implement a practical algorithm based on this method.
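As a rough illustration of the trust-region idea behind TRPO (not the implementation we used in the project), the sketch below shows the backtracking line search that accepts a policy update only if it improves a surrogate objective while keeping the KL divergence from the old policy under a threshold. Everything here is an assumption made for the example: the function names `gaussian_kl` and `backtracking_step`, the one-dimensional Gaussian policy with fixed standard deviation, the toy surrogate, and the threshold `max_kl=0.01`. A full TRPO implementation would also compute the step direction with conjugate gradient on Fisher-vector products, which is omitted here.

```python
import math


def gaussian_kl(mu_old, std_old, mu_new, std_new):
    """KL(old || new) between two diagonal Gaussians, summed over dimensions."""
    kl = 0.0
    for m0, s0, m1, s1 in zip(mu_old, std_old, mu_new, std_new):
        kl += math.log(s1 / s0) + (s0 ** 2 + (m0 - m1) ** 2) / (2.0 * s1 ** 2) - 0.5
    return kl


def backtracking_step(theta, full_step, surrogate, kl,
                      max_kl=0.01, shrink=0.5, max_tries=10):
    """Shrink the proposed step until it improves the surrogate and the KL
    divergence from the old parameters is at most max_kl.
    Returns the old parameters unchanged if no scaled step qualifies."""
    base = surrogate(theta)
    scale = 1.0
    for _ in range(max_tries):
        candidate = [t + scale * d for t, d in zip(theta, full_step)]
        if kl(theta, candidate) <= max_kl and surrogate(candidate) > base:
            return candidate
        scale *= shrink
    return list(theta)


# Toy setup (hypothetical): the policy is N(theta[0], 1) and the surrogate
# objective peaks at a mean of 3.0.
def toy_surrogate(theta):
    return -(theta[0] - 3.0) ** 2


def toy_kl(theta_old, theta_new):
    return gaussian_kl(theta_old, [1.0], theta_new, [1.0])


new_theta = backtracking_step([0.0], [1.0], toy_surrogate, toy_kl)
print(new_theta)
```

In this toy case the full step of 1.0 violates the KL limit, so the search halves it until the constraint holds. The same accept-or-shrink logic is what keeps each TRPO update close to the previous policy.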

In the later stages of the project, we used the MuJoCo physics simulation environment, which provided a continuous 376-dimensional state space in which each component is unbounded, ranging from minus infinity to infinity. We used this environment to model a bipedal robot designed to simulate human walking.

Our experiments involved training the robot’s walking motion using TRPO. In these experiments, TRPO outperformed other algorithms, such as Proximal Policy Optimization (PPO) and Soft Actor-Critic (SAC), particularly at training a humanoid walk. The findings point to TRPO's efficiency on complex tasks, especially those with large state spaces and intricate dynamics.

Resources

You can find the assignments covered in the course on my GitHub.


This post is licensed under CC BY 4.0 by the author.
