This repository was archived by the owner on Apr 25, 2023. It is now read-only.

Question: Is this some form of reward engineering? #34

@WorksWellWithOthers

Description

The custom reward computation quoted below would break in any environment whose state does not unpack into exactly 4 values.

  1. If it's not essential, can we just remove it?
  2. If it is essential, would someone explain why and/or reference the paper it comes from?
    This seems specific to CartPole, and I wasn't sure whether the implementation's goal was only to solve CartPole.
r1 = (env.x_threshold - abs(x)) / env.x_threshold - 0.8  
r2 = (env.theta_threshold_radians - abs(theta)) / env.theta_threshold_radians - 0.5  
reward = r1 + r2
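
To make the concern concrete, here is a minimal sketch of how the shaping could be guarded so it only applies when the state really has 4 values. The function name `shaped_reward` and the fallback to the environment's own reward are my assumptions, not something in the repository. The default thresholds are the values I believe CartPole uses (2.4 for `x_threshold`, 12 degrees in radians for `theta_threshold_radians`). The constants 0.8 and 0.5 are copied from the snippet above.

```python
import math


def shaped_reward(obs, env_reward,
                  x_threshold=2.4,
                  theta_threshold_radians=12 * 2 * math.pi / 360):
    """Hypothetical helper: apply the CartPole-specific shaping only when
    the observation has exactly 4 values; otherwise keep the env reward."""
    if len(obs) != 4:
        # Not a CartPole-style state, so leave the reward untouched.
        return env_reward

    # Assumed CartPole layout: position, velocity, pole angle, angular velocity.
    x, _x_dot, theta, _theta_dot = obs

    # r1 is largest when the cart is at the center of the track.
    r1 = (x_threshold - abs(x)) / x_threshold - 0.8
    # r2 is largest when the pole is upright.
    r2 = (theta_threshold_radians - abs(theta)) / theta_threshold_radians - 0.5
    return r1 + r2


# A centered cart with an upright pole gets the maximum shaped reward.
print(shaped_reward([0.0, 0.0, 0.0, 0.0], env_reward=1.0))
# A 3-value state falls back to the environment's own reward.
print(shaped_reward([0.1, 0.2, 0.3], env_reward=1.0))
```

With a guard like this the rest of the training loop would not need to know which environment it is running on, though whether the shaping should stay at all is still the open question.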
