Reward Function Generation using Large Language Models

Objective

To explore the use of large language models for automatically generating reward functions in reinforcement learning tasks.

My Role

Integrated LLMs with RL training pipelines
Trained a Panda robot in the Panda-Gym environment

Reward Engineering with LLaMA2

I pass the environment code to LLaMA2 as plain text and prompt it to generate a description of a reward function.
The generated description is then implemented explicitly, as code, inside the environment.

Prompt:
reward_prompt_llama2.txt
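A minimal sketch of how the environment code could be placed into the prompt. The template text, the `{env_code}` placeholder, and the `build_reward_prompt` helper are assumptions for illustration. The actual prompt is the one in reward_prompt_llama2.txt, and the call to LLaMA2 is left out because it depends on how the model is served.

```python
# Hypothetical template; the real prompt lives in reward_prompt_llama2.txt.
TEMPLATE = (
    "You are designing a reward function for a reinforcement learning task.\n"
    "Environment code:\n"
    "{env_code}\n"
    "Describe a reward function with the signature "
    "compute_reward(self, achieved_goal, desired_goal, info)."
)


def build_reward_prompt(env_source: str, template: str = TEMPLATE) -> str:
    """Insert the environment source (plain text) into the prompt template."""
    # str.replace is used instead of str.format so that braces inside the
    # pasted environment code are left untouched.
    return template.replace("{env_code}", env_source)


# Stand-in environment source; in practice this would be the text of the
# environment file.
env_source = "class ReachTask:\n    def reset(self):\n        return {}\n"
prompt = build_reward_prompt(env_source)
```

The resulting `prompt` string is what would be sent to the model. The description that comes back is plain text, not code, so it still has to be written into the environment by hand.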

Reward function signature

def compute_reward(self, achieved_goal, desired_goal, info):
    ...
    return reward
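For illustration, one way the body could be filled in is a dense reward equal to the negative Euclidean distance between the achieved and desired goal. This is an assumed example of a common reaching reward, not the reward LLaMA2 generated in this project. The `ReachTaskSketch` class is hypothetical and uses only the standard library.

```python
import math


class ReachTaskSketch:
    """Hypothetical stand-in for the environment's task class."""

    def compute_reward(self, achieved_goal, desired_goal, info):
        # Assumption: dense reward, the closer the end effector is to the
        # target, the higher (less negative) the reward.
        distance = math.dist(achieved_goal, desired_goal)
        reward = -distance
        return reward


task = ReachTaskSketch()
reward = task.compute_reward([0.0, 0.0, 0.0], [0.3, 0.4, 0.0], info={})
```

A batched version would typically compute the distance with array operations rather than `math.dist`, since goal-conditioned training code often passes many goals at once.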
