diff --git a/docs/rl/dqn/index.html b/docs/rl/dqn/index.html
index 5042dadf..3ff95797 100644
--- a/docs/rl/dqn/index.html
+++ b/docs/rl/dqn/index.html
@@ -100,7 +100,7 @@
We want to find the optimal action-value function $Q^*(s, a)$.
-In order to improve stability we use experience replay that randomly sample from previous experience $U(D)$. We also use a Q network with a separate set of paramters $\theta_i^-$ to calculate the target. $\theta_i^-$ is updated periodically. This is according to paper Human Level Control Through Deep Reinforcement Learning.
+In order to improve stability we use experience replay, which samples uniformly at random from past experience, $U(D)$. We also use a Q network with a separate set of parameters $\theta_i^-$ to calculate the target; $\theta_i^-$ is updated periodically. This follows the paper Human-level Control Through Deep Reinforcement Learning.
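As a rough illustration of these two tricks, here is a minimal sketch under assumed names; `q_net`, `replay`, `sample_batch`, and `maybe_sync_target` are invented for this example and are not the code this page documents:

```python
# Minimal sketch of the two stabilisation tricks described above.
# All names here are illustrative, not identifiers from this repository.
import copy
import random
from collections import deque

import torch.nn as nn

q_net = nn.Sequential(nn.Linear(4, 64), nn.ReLU(), nn.Linear(64, 2))
target_net = copy.deepcopy(q_net)   # separate parameters, theta_i^-
replay = deque(maxlen=100_000)      # experience replay memory D

def sample_batch(batch_size=32):
    # Uniform random sample U(D) over stored (s, a, r, s', done) tuples
    return random.sample(replay, batch_size)

def maybe_sync_target(step, target_update=1_000):
    # theta_i^- is updated only periodically, every `target_update` steps
    if step % target_update == 0:
        target_net.load_state_dict(q_net.state_dict())
```

Sampling uniformly breaks the correlation between consecutive transitions, while the periodically frozen copy keeps the regression target from chasing its own updates.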
So the loss function is,
$$\mathcal{L}_i(\theta_i) = \mathbb{E}_{(s,a,r,s') \sim U(D)} \Big[ \big( r + \gamma \max_{a'} Q(s', a'; \theta_i^-) - Q(s, a; \theta_i) \big)^2 \Big]$$
The max operator in the above calculation uses the same network both to select the best action and to evaluate its value; that is,
$$\max_{a'} Q(s', a'; \theta) = Q\big(s', \operatorname{argmax}_{a'} Q(s', a'; \theta); \theta\big)$$
We therefore use double Q-learning, where the $\operatorname{argmax}$ is taken from the online parameters $\theta_i$ and the value is taken from the target parameters $\theta_i^-$.
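A hedged sketch of that target computation, reusing the illustrative `q_net`/`target_net` from the snippet above (`dqn_loss` is likewise an invented name, not this repository's API):

```python
import torch
import torch.nn.functional as F

def dqn_loss(s, a, r, s_next, done, gamma=0.99):
    # Double Q-learning target: argmax from the online parameters theta_i,
    # value read from the target parameters theta_i^-.
    with torch.no_grad():
        # Plain DQN would instead use: target_net(s_next).max(1).values
        best_a = q_net(s_next).argmax(dim=1, keepdim=True)        # from theta_i
        next_q = target_net(s_next).gather(1, best_a).squeeze(1)  # from theta_i^-
        target = r + gamma * next_q * (1.0 - done)
    q = q_net(s).gather(1, a.unsqueeze(1)).squeeze(1)
    return F.mse_loss(q, target)  # squared error, matching the loss above
```

The `(1 - done)` factor zeroes the bootstrapped term at episode boundaries, a standard detail the equation leaves implicit.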
@@ -302,4 +302,4 @@ handleImages()