Reward Function Generation using Large Language Models

A reinforcement learning system in which the reward functions are generated by large language models.

Objective

To explore the use of large language models for automatically generating reward functions in reinforcement learning tasks.

My Role

  • Integrated LLMs with RL training pipelines
  • Trained a Panda robot in the Panda-Gym environment

Reward Engineering with LLaMA2

I provide the environment code as plain text and prompt LLaMA2 to generate a description of a reward function.
The generated description is then implemented explicitly as code inside the environment.

Prompt:
reward_prompt_llama2.txt
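
As a rough illustration of the prompting step described above, the prompt can be assembled by reading the prompt file and the environment source as plain text and combining them. This is only a sketch: the helper names `build_reward_prompt` and `load_reward_prompt` and the `{env_code}` placeholder are assumptions made for illustration, and the actual layout of reward_prompt_llama2.txt may differ.

```python
from pathlib import Path


def build_reward_prompt(template: str, env_source: str) -> str:
    """Insert the environment source code into the prompt template.

    Assumption: the template marks the insertion point with the literal
    text {env_code}. If the marker is absent, the source is appended.
    """
    placeholder = "{env_code}"
    if placeholder in template:
        return template.replace(placeholder, env_source)
    return f"{template.rstrip()}\n\n{env_source}"


def load_reward_prompt(template_path: str, env_path: str) -> str:
    """Read both files as plain text and build the final prompt string."""
    template = Path(template_path).read_text(encoding="utf-8")
    env_source = Path(env_path).read_text(encoding="utf-8")
    return build_reward_prompt(template, env_source)
```

The resulting string would then be passed to LLaMA2 through whichever inference interface is in use; that call is left out here because it depends on the deployment.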

Reward function signature

def compute_reward(self, achieved_goal, desired_goal, info):
    ...
    return reward
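
To make the signature concrete, here is a minimal sketch of what a body for it could look like for a pushing task. It is not the reward that LLaMA2 produced in this project. It is a generic distance-based reward, and the class name `PushRewardSketch`, the `distance_threshold` value, and the `reward_type` switch are all hypothetical. It also assumes a single unbatched goal (one position vector per call).

```python
import math


class PushRewardSketch:
    """Hypothetical stand-in for the task class that owns compute_reward."""

    def __init__(self, distance_threshold=0.05, reward_type="dense"):
        # Both parameters are illustrative assumptions, not project values.
        self.distance_threshold = distance_threshold
        self.reward_type = reward_type

    def compute_reward(self, achieved_goal, desired_goal, info):
        # Euclidean distance between the object position (achieved_goal)
        # and the target position (desired_goal).
        distance = math.dist(achieved_goal, desired_goal)
        if self.reward_type == "sparse":
            # 0 once the object is within the threshold, otherwise -1.
            reward = 0.0 if distance <= self.distance_threshold else -1.0
        else:
            # Dense: the reward increases as the object nears the target.
            reward = -distance
        return reward
```

A dense reward of this kind gives the agent a learning signal at every step, while the sparse variant only distinguishes success from failure. Which form a generated description calls for depends on the prompt and the task.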

Images

The Panda robot pushing an object, trained with the reward function generated by LLaMA2