Reward Function Generation using Large Language Models
A reinforcement learning system in which the reward functions are generated by large language models.
Objective
To explore the use of large language models for automatically generating reward functions in reinforcement learning tasks.
My Role
- Integrated LLMs with RL training pipelines
- Trained a Panda robot in the Panda-Gym environment
Reward Engineering with LLaMA2
I provide the environment code as plain text and prompt LLaMA2 to generate a reward function description.
The generated description is then implemented explicitly inside the environment.
Prompt:
reward_prompt_llama2.txt
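As a rough sketch of the prompting step, the environment source can be embedded as plain text into a prompt template before it is sent to the model. The template wording, `PROMPT_TEMPLATE`, and `build_reward_prompt` are illustrative assumptions, not the contents of reward_prompt_llama2.txt, and the call to LLaMA2 itself is left out.

```python
# Hypothetical template: the real wording lives in reward_prompt_llama2.txt.
PROMPT_TEMPLATE = (
    "You are given the source code of a reinforcement learning environment.\n"
    "Describe a reward function with the signature\n"
    "compute_reward(self, achieved_goal, desired_goal, info).\n\n"
    "Environment code:\n{env_code}\n"
)


def build_reward_prompt(env_code: str, template: str = PROMPT_TEMPLATE) -> str:
    """Embed the environment source, as plain text, into the prompt template.

    str.replace is used instead of str.format so that braces appearing in
    the template or in the environment code are left untouched.
    """
    return template.replace("{env_code}", env_code)
```

In practice `env_code` would be read from the environment's source file, and the returned string passed to whichever LLaMA2 interface is in use.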
Reward function signature
def compute_reward(self, achieved_goal, desired_goal, info):
    ...
    return reward
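To make the signature concrete, here is a minimal sketch of one common way to fill it in for a goal-reaching task: a distance-based reward, either dense or sparse. This is not the reward generated by LLaMA2 in this project; the class `ReachTaskSketch`, the `distance_threshold` default, and the `sparse` flag are assumptions made for illustration.

```python
import math


class ReachTaskSketch:
    """Hypothetical stand-in for an environment task (not the project's class)."""

    def __init__(self, distance_threshold=0.05, sparse=False):
        # Assumed values, chosen only for illustration.
        self.distance_threshold = distance_threshold
        self.sparse = sparse

    def compute_reward(self, achieved_goal, desired_goal, info):
        # Euclidean distance between where the arm is and where it should be.
        distance = math.dist(achieved_goal, desired_goal)
        if self.sparse:
            # Sparse variant: -1 until the goal is within the threshold, then 0.
            return -1.0 if distance > self.distance_threshold else 0.0
        # Dense variant: the closer to the goal, the higher (less negative) the reward.
        reward = -distance
        return reward
```

For example, `ReachTaskSketch().compute_reward([0, 0, 0], [3, 4, 0], {})` returns `-5.0`, while the sparse variant returns `0.0` only once the achieved goal lies within the threshold. A real environment would typically operate on NumPy arrays rather than plain lists.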
Links
- GitHub: Code