These agents learn using standard Q-learning.
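For reference, the standard tabular Q-learning update is usually written as below. The symbols are the conventional textbook ones and are an assumption here, not taken from the playground itself: \alpha is the learning rate, \gamma the discount factor, r_{t+1} the reward received after taking action a_t in state s_t.

```latex
Q(s_t, a_t) \leftarrow Q(s_t, a_t)
  + \alpha \left[ r_{t+1} + \gamma \max_{a'} Q(s_{t+1}, a') - Q(s_t, a_t) \right]
```

The bracketed term is the temporal-difference error: the gap between the bootstrapped target and the current estimate.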
In this playground you can:
- Change all the learning/environment parameters in real time
- Click anywhere in the top GridWorld to position the agent there
- Watch the cell colours change to reflect the agent's current value estimate
- Transfer values between the top/bottom agents
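
The features above can be approximated offline with a minimal tabular sketch. Everything in it is an assumption made for illustration: the grid size, the single goal reward of 1.0, the start cell, and all function names (`make_q`, `step`, `q_update`, `train`, `state_values`, `transfer_values`) are hypothetical and do not come from the playground's own code.

```python
import random

ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right

def make_q(rows, cols):
    """Q-table: one list of action values per (row, col) cell, initialised to 0."""
    return {(r, c): [0.0] * len(ACTIONS) for r in range(rows) for c in range(cols)}

def step(state, action, rows, cols, goal):
    """Move one cell, staying in place at walls; reward 1.0 on reaching the goal."""
    dr, dc = ACTIONS[action]
    r = min(max(state[0] + dr, 0), rows - 1)
    c = min(max(state[1] + dc, 0), cols - 1)
    nxt = (r, c)
    done = nxt == goal
    return nxt, (1.0 if done else 0.0), done

def q_update(q, s, a, reward, s_next, done, alpha, gamma):
    """Standard Q-learning update toward reward + gamma * max_a' Q(s', a')."""
    target = reward if done else reward + gamma * max(q[s_next])
    q[s][a] += alpha * (target - q[s][a])

def choose_action(q, s, epsilon, rng):
    """Epsilon-greedy: random action with probability epsilon, else a greedy one."""
    if rng.random() < epsilon:
        return rng.randrange(len(ACTIONS))
    best = max(q[s])
    return rng.choice([i for i, v in enumerate(q[s]) if v == best])

def train(q, rows, cols, goal, episodes, alpha, gamma, epsilon, seed=0, max_steps=200):
    """Run episodes from the (assumed) start cell (0, 0), updating q in place."""
    rng = random.Random(seed)
    for _ in range(episodes):
        s = (0, 0)
        for _ in range(max_steps):
            a = choose_action(q, s, epsilon, rng)
            s_next, reward, done = step(s, a, rows, cols, goal)
            q_update(q, s, a, reward, s_next, done, alpha, gamma)
            s = s_next
            if done:
                break
    return q

def state_values(q):
    """Value estimate per cell (max over actions), the kind of number a colour map could show."""
    return {s: max(vals) for s, vals in q.items()}

def transfer_values(src, dst):
    """Copy one agent's Q-values into another agent's table."""
    for s, vals in src.items():
        dst[s] = list(vals)

if __name__ == "__main__":
    top = train(make_q(3, 3), 3, 3, goal=(2, 2), episodes=200,
                alpha=0.5, gamma=0.9, epsilon=0.2)
    bottom = make_q(3, 3)
    transfer_values(top, bottom)  # the bottom agent now starts from the top agent's estimates
    print(state_values(bottom))
```

Changing `alpha`, `gamma`, or `epsilon` between calls to `train` mimics adjusting parameters mid-run, and `state_values` gives one number per cell that a colour scale could be mapped onto.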