
Eis #12

Open

eisDNV wants to merge 5 commits into main from eis

Conversation


eisDNV (Collaborator) commented Apr 30, 2026

  • Improvements to q_agent
  • Addition of a parameter-report function to the pendulum environment.
  • Trained start-pendulum model added.

Comment thread: src/crane_controller/q_agent.py

```python
with path.open(encoding="utf-8") as _f:
    from_dump = json.load(_f)
self.previous_steps = int(from_dump["q_agent"]["steps"])
self.epsilon = float(from_dump["q_agent"].get("epsilon", 1.0))
```

aleksandarbabicdnv (Collaborator) commented Apr 30, 2026


@eisDNV read_dumped() restores epsilon (so exploration picks up where it left off) and previous_steps — but silently ignores epsilon-decay and final-epsilon. Those come from whatever was passed to the constructor instead.

So if someone trained with epsilon_decay=0.0005, saved, then loaded to continue — the continuation would run with the constructor default (1e-3) unless they remembered to pass the same value again. The saved value is right there in the file but unused.

eisDNV (Collaborator, Author)


That is true. We can change that to also use the saved epsilon_decay (and final_epsilon) when continuing a training run.

eisDNV (Collaborator, Author)


Feel free to implement this change before you approve.

aleksandarbabicdnv (Collaborator)


@eisDNV I'm not sure that I can commit to your PR, but it is an addition of two lines after existing line 313:

```python
# line 313 (existing)
self.epsilon = float(from_dump["q_agent"].get("epsilon", 1.0))

# add after:
self.epsilon_decay = float(from_dump["q_agent"].get("epsilon-decay", self.epsilon_decay))
self.final_epsilon = float(from_dump["q_agent"].get("final-epsilon", self.final_epsilon))
```
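To sanity-check the two added lines, here is a minimal round-trip sketch. The `restore_hyperparameters` helper is hypothetical, a stand-in for the relevant part of `read_dumped()`; the dump keys (`steps`, `epsilon`, `epsilon-decay`, `final-epsilon`) mirror the snippets in this thread, and the exact layout of the real dump file is an assumption.

```python
import json
import tempfile
from pathlib import Path

def restore_hyperparameters(path: Path, defaults: dict) -> dict:
    """Hypothetical stand-in for the restore logic in read_dumped()."""
    with path.open(encoding="utf-8") as _f:
        from_dump = json.load(_f)
    q = from_dump["q_agent"]
    return {
        "previous_steps": int(q["steps"]),
        "epsilon": float(q.get("epsilon", 1.0)),
        # The two added lines: fall back to the constructor values only
        # when the dump predates these keys.
        "epsilon_decay": float(q.get("epsilon-decay", defaults["epsilon_decay"])),
        "final_epsilon": float(q.get("final-epsilon", defaults["final_epsilon"])),
    }

with tempfile.TemporaryDirectory() as tmp:
    dump = Path(tmp) / "agent.json"
    dump.write_text(json.dumps({"q_agent": {
        "steps": 50000, "epsilon": 0.12,
        "epsilon-decay": 0.0005, "final-epsilon": 0.05,
    }}), encoding="utf-8")
    restored = restore_hyperparameters(
        dump, {"epsilon_decay": 1e-3, "final_epsilon": 0.1}
    )
    # The saved decay wins over the constructor default, which is the
    # behaviour the fix above is meant to guarantee.
    assert restored["epsilon_decay"] == 0.0005
```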

Comment thread: src/crane_controller/q_agent.py (docstring)

```python
    The environment to be trained. Must provide `.reset()` and `.step()` methods.
learning_rate : float, optional
    How quickly to update Q-values, in the range (0, 1] (default 0.1).
initial_epsilon : float, optional
```
aleksandarbabicdnv (Collaborator)


@eisDNV Should this be replaced with epsilon_decay?

eisDNV (Collaborator, Author)


Maybe I do not quite understand. A fresh training run starts at initial_epsilon and is reduced down to final_epsilon over the episodes (using epsilon_decay). The learning rate relates to how the Q-values are updated, independently of epsilon_decay.
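The schedule described above can be sketched as follows. The linear-decay form is an illustration, not necessarily the agent's actual update rule, and the parameter names are taken from this PR's discussion:

```python
def decayed_epsilon(initial_epsilon: float, epsilon_decay: float,
                    final_epsilon: float, episode: int) -> float:
    # Exploration starts at initial_epsilon and is reduced each episode,
    # never dropping below final_epsilon. (Linear decay assumed here;
    # the real agent may decay multiplicatively instead.)
    return max(final_epsilon, initial_epsilon - epsilon_decay * episode)

# The learning rate is a separate knob: it scales the Q-value update, e.g.
#   q[s, a] += learning_rate * (reward + gamma * max_q_next - q[s, a])
# and has no interaction with the epsilon schedule.
```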

aleksandarbabicdnv (Collaborator)


The constructor no longer has initial_epsilon as a parameter — it was replaced by epsilon_decay in this PR. The docstring still references it, so it needs updating to epsilon_decay.

…e value from the (new) agent. epsilon_decay default changed to 1e-4. New results from q-learning included.
