eisDNV commented Apr 30, 2026
- Improvements to q_agent
- Addition of parameter report function in pendulum environment.
- Trained start-pendulum model added.
    with path.open(encoding="utf-8") as _f:
        from_dump = json.load(_f)
    self.previous_steps = int(from_dump["q_agent"]["steps"])
    self.epsilon = float(from_dump["q_agent"].get("epsilon", 1.0))
@eisDNV `read_dumped()` restores `epsilon` (so exploration picks up where it left off) and `previous_steps`, but it silently ignores the saved `epsilon-decay` and `final-epsilon` entries. Those two values come from whatever was passed to the constructor instead.
So if someone trained with `epsilon_decay=0.0005`, saved, and then loaded the dump to continue, the continuation would run with the constructor default (1e-3) unless they remembered to pass the same value again. The saved value is right there in the file but unused.
That is true. We can change that so the saved epsilon_decay is also used when continuing a training run.
Feel free to implement this change before you approve.
@eisDNV I am not sure that I can commit to your PR, but the change is an addition of two lines after the existing line 313:
```python
# line 313 (existing)
self.epsilon = float(from_dump["q_agent"].get("epsilon", 1.0))
# add after:
self.epsilon_decay = float(from_dump["q_agent"].get("epsilon-decay", self.epsilon_decay))
self.final_epsilon = float(from_dump["q_agent"].get("final-epsilon", self.final_epsilon))
```
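For anyone who wants to try the behaviour in isolation, here is a minimal, self-contained sketch of the proposed restore logic. `AgentSketch` and `demo()` are hypothetical stand-ins rather than the PR's actual class, the `final_epsilon` default of 0.1 is an assumption, and the JSON layout is only inferred from the keys quoted above.

```python
import json
import tempfile
from pathlib import Path


class AgentSketch:
    """Hypothetical stand-in for the agent; holds only the fields discussed here."""

    def __init__(self, epsilon_decay: float = 1e-3, final_epsilon: float = 0.1) -> None:
        # final_epsilon default is an assumption made for this sketch.
        self.epsilon = 1.0
        self.epsilon_decay = epsilon_decay
        self.final_epsilon = final_epsilon
        self.previous_steps = 0

    def read_dumped(self, path: Path) -> None:
        with path.open(encoding="utf-8") as _f:
            from_dump = json.load(_f)
        saved = from_dump["q_agent"]
        self.previous_steps = int(saved["steps"])
        self.epsilon = float(saved.get("epsilon", 1.0))
        # Prefer the saved schedule; fall back to the constructor values
        # when the dump does not contain the key.
        self.epsilon_decay = float(saved.get("epsilon-decay", self.epsilon_decay))
        self.final_epsilon = float(saved.get("final-epsilon", self.final_epsilon))


def demo() -> AgentSketch:
    """Write a small dump to a temporary file and load it into a fresh agent."""
    dump = {"q_agent": {"steps": 2000, "epsilon": 0.4, "epsilon-decay": 0.0005}}
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "agent.json"
        path.write_text(json.dumps(dump), encoding="utf-8")
        agent = AgentSketch()  # constructor default epsilon_decay=1e-3
        agent.read_dumped(path)
    return agent
```

In this sketch the saved `epsilon-decay` overrides the constructor default, while `final-epsilon`, which is missing from the example dump, keeps the constructor value.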
        The environment to be trained. Must provide `.reset()` and `.step()` methods.
    learning_rate : float, optional
        How quickly to update Q-values, in the range (0, 1] (default 0.1).
    initial_epsilon : float, optional
@eisDNV Should this be replaced with epsilon_decay?
Maybe I do not quite understand. Fresh training starts at initial_epsilon and reduces down to final_epsilon over the episodes (using epsilon_decay). The learning rate relates to how the q_values are updated (independent of epsilon_decay).
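To make the distinction concrete, here is a rough sketch of the two mechanisms. It assumes a linear schedule that subtracts `epsilon_decay` once per episode, which may not match how the agent in this PR applies it. The update is the textbook tabular Q-learning rule, and the function names and the `discount` default of 0.95 are hypothetical.

```python
def decay_epsilon(epsilon: float, epsilon_decay: float, final_epsilon: float) -> float:
    """Assumed linear schedule: subtract epsilon_decay, never going below final_epsilon."""
    return max(final_epsilon, epsilon - epsilon_decay)


def q_update(
    q_old: float,
    reward: float,
    best_next_q: float,
    learning_rate: float = 0.1,
    discount: float = 0.95,  # assumed value, not taken from the PR
) -> float:
    """Textbook tabular Q-learning update; epsilon plays no role here."""
    td_target = reward + discount * best_next_q
    return q_old + learning_rate * (td_target - q_old)


def episodes_until_final(
    initial_epsilon: float, epsilon_decay: float, final_epsilon: float
) -> int:
    """Count the episodes needed for epsilon to reach final_epsilon."""
    if epsilon_decay <= 0:
        raise ValueError("epsilon_decay must be positive")
    epsilon = initial_epsilon
    episodes = 0
    while epsilon > final_epsilon:
        epsilon = decay_epsilon(epsilon, epsilon_decay, final_epsilon)
        episodes += 1
    return episodes
```

Under these assumptions, `epsilon_decay` only controls how quickly exploration shrinks, while `learning_rate` only scales the step taken towards the temporal-difference target.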
The constructor no longer has initial_epsilon as a parameter — it was replaced by epsilon_decay in this PR. The docstring still references it, so it needs updating to epsilon_decay.
…e value from the (new) agent. epsilon_decay default changed to 1e-4. New results from q-learning included.