Update ddqn_agent.py to prevent RuntimeError with newer pytorch version by atlevesque · Pull Request #3 · philtabor/Deep-Q-Learning-Paper-To-Code

atlevesque · 2020-04-25T22:12:17Z

When running the DDQN agent on PyTorch v1.5.0, I get the following RuntimeError:

RuntimeError: range.second - range.first == t.size() INTERNAL ASSERT FAILED at ..\torch\csrc\autograd\generated\Functions.cpp:57, please report a bug to PyTorch. inconsistent range for TensorList output (copy_range at ..\torch\csrc\autograd\generated\Functions.cpp:57)
(no backtrace available)

My guess is that there is a diamond-shaped dependency when running the backward method, as the `self.q_eval` network parameters affect the loss via both `q_pred` and `q_eval`.

I fixed the issue by explicitly detaching the `max_actions` tensor from the computational graph. It holds discrete action indices, so small changes in the `self.q_eval` network parameters should not change the actions selected. The derivative of the loss with respect to the `self.q_eval` network parameters thus comes only from the `q_pred` calculation.
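To make the idea concrete, here is a minimal sketch of a double-DQN loss computation with the detach in place. It is not the code from `ddqn_agent.py`: the two `nn.Linear` layers, the tensor shapes, and the random batch are hypothetical stand-ins, and wrapping the target computation in `torch.no_grad()` is an assumption of this sketch rather than something the patch does.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical stand-ins for the online (q_eval) and target (q_next) networks.
n_obs, n_actions, batch = 4, 3, 8
q_eval = nn.Linear(n_obs, n_actions)
q_next = nn.Linear(n_obs, n_actions)

# A made-up batch of transitions, only for illustration.
states = torch.randn(batch, n_obs)
next_states = torch.randn(batch, n_obs)
actions = torch.randint(0, n_actions, (batch,))
rewards = torch.randn(batch)
dones = torch.zeros(batch, dtype=torch.bool)
gamma = 0.99
indices = torch.arange(batch)

# Q(s, a) for the actions actually taken; gradients flow through this term.
q_pred = q_eval(states)[indices, actions]

# The online network picks the greedy next action. The result is a tensor of
# discrete indices, so it is explicitly detached from the graph.
max_actions = torch.argmax(q_eval(next_states), dim=1).detach()

# The target network evaluates the chosen actions. Disabling gradient
# tracking here is an assumption of this sketch, not part of the patch.
with torch.no_grad():
    q_next_vals = q_next(next_states)
    q_next_vals[dones] = 0.0  # terminal states contribute no future value
    q_target = rewards + gamma * q_next_vals[indices, max_actions]

loss = nn.functional.mse_loss(q_pred, q_target)
loss.backward()
```

With this arrangement, the only path from the loss back to the `q_eval` parameters goes through `q_pred`, which matches the reasoning above.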

I tested this change on my computer and got good performance and (more importantly) didn't get the RuntimeError.

atlevesque · 2020-04-25T22:17:06Z

Here are the results I got when running the same Pong test case as you did in the course. It is marginally better than my run with the DQN algorithm and slightly worse than the score you had in your demo, as I had to dramatically decrease the ReplayMemory size to fit in my old PC with 6 GB of RAM 😞

srikanthkb · 2020-05-06T15:44:05Z

> Here are the results I got when running the same Pong test case as you did in the course. It is marginally better than my run with the DQN algorithm and slightly worse than the score you had in your demo, as I had to dramatically decrease the ReplayMemory size to fit in my old PC with 6 GB of RAM 😞

Hi,
Did you make any other changes before running main_ddqn.py?
When I tried to run it, the agent did not learn and the average scores stayed around -17.0. Can you let me know how you were able to obtain appropriate results?

Thanks in advance!

atlevesque wants to merge 1 commit into philtabor:master from atlevesque:patch-1
