Question: Is this some form of reward engineering?

This would break in environments whose state has more or fewer than 4 values, since the unpacking below expects exactly 4.
1. If not essential can we just remove this?
2. If it's essential, would someone explain why and/or reference the paper for this?
This seems specific to CartPole, and I wasn't sure whether the implementation's goal was only to solve CartPole.
 
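```python
# Unpack CartPole's 4-value observation: cart position, cart velocity, pole angle, pole angular velocity
x, x_dot, theta, theta_dot = next_state
# r1: larger when the cart is closer to the center of the track
r1 = (env.x_threshold - abs(x)) / env.x_threshold - 0.8
# r2: larger when the pole is closer to upright
r2 = (env.theta_threshold_radians - abs(theta)) / env.theta_threshold_radians - 0.5
# Overwrites the reward returned by the environment with the shaped value
reward = r1 + r2
```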
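For illustration, here is a minimal sketch of how the same shaping could be isolated so it only runs for CartPole and fails clearly elsewhere. `shaped_cartpole_reward`, `choose_reward`, the `use_shaping` flag and the `env_id.startswith("CartPole")` check are all hypothetical names, not part of this repository. The default thresholds assume Gym's CartPole values (`x_threshold = 2.4`, `theta_threshold_radians = 12 * 2 * math.pi / 360`).

```python
import math
from typing import Sequence


# Hypothetical helper (not in the repository): same shaping as the snippet above,
# but with an explicit check on the observation size.
# Default thresholds assume Gym's CartPole values (2.4 for x, 12 degrees for theta).
def shaped_cartpole_reward(
    next_state: Sequence[float],
    x_threshold: float = 2.4,
    theta_threshold_radians: float = 12 * 2 * math.pi / 360,
) -> float:
    if len(next_state) != 4:
        # Fail loudly instead of raising a confusing unpacking error in other envs.
        raise ValueError(
            f"expected a 4-value CartPole observation, got {len(next_state)} values"
        )
    x, _x_dot, theta, _theta_dot = next_state
    # r1 is in [-0.8, 0.2] while |x| <= x_threshold; highest with the cart centered.
    r1 = (x_threshold - abs(x)) / x_threshold - 0.8
    # r2 is in [-0.5, 0.5] while |theta| <= theta_threshold_radians; highest when upright.
    r2 = (theta_threshold_radians - abs(theta)) / theta_threshold_radians - 0.5
    return r1 + r2


# Hypothetical opt-in switch: shape only CartPole, otherwise keep the env's own reward.
def choose_reward(
    env_id: str,
    next_state: Sequence[float],
    env_reward: float,
    use_shaping: bool = False,
) -> float:
    if use_shaping and env_id.startswith("CartPole"):
        return shaped_cartpole_reward(next_state)
    return env_reward
```

Within the termination bounds, the shaped reward therefore ranges over [-1.3, 0.7], whereas CartPole's own reward is +1 for every step, so this does look like reward engineering rather than the environment's reward.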

Question: Is this some form of reward engineering? #34

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Question: Is this some form of reward engineering? #34

Description

Metadata

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Issue actions