From 1ee5cd68719780ef1d7e28de63ea70ad033d4999 Mon Sep 17 00:00:00 2001
From: Ikko Ashimine
Date: Mon, 26 Dec 2022 17:47:44 +0900
Subject: [PATCH] fix typo in rl/dqn/index.html

paramters -> parameters
---
 docs/rl/dqn/index.html | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/docs/rl/dqn/index.html b/docs/rl/dqn/index.html
index 5042dadf..3ff95797 100644
--- a/docs/rl/dqn/index.html
+++ b/docs/rl/dqn/index.html
@@ -100,7 +100,7 @@

Train the model

We want to find the optimal action-value function:

$$Q^*(s, a) = \max_\pi \mathbb{E} \left[ r_t + \gamma r_{t+1} + \gamma^2 r_{t+2} + \cdots \,\middle|\, s_t = s, a_t = a, \pi \right]$$

Target network 🎯

-In order to improve stability we use experience replay that randomly samples from previous experience, $U(D)$. We also use a Q network with a separate set of paramters $\theta_i^{-}$ to calculate the target. $\theta_i^{-}$ is updated periodically. This is according to the paper Human Level Control Through Deep Reinforcement Learning.

+In order to improve stability we use experience replay that randomly samples from previous experience, $U(D)$. We also use a Q network with a separate set of parameters $\theta_i^{-}$ to calculate the target. $\theta_i^{-}$ is updated periodically. This is according to the paper Human Level Control Through Deep Reinforcement Learning.

So the loss function is,

$$\mathcal{L}_i(\theta_i) = \mathbb{E}_{(s, a, r, s') \sim U(D)} \left[ \left( r + \gamma \max_{a'} Q(s', a'; \theta_i^{-}) - Q(s, a; \theta_i) \right)^2 \right]$$
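As a concrete illustration, here is a minimal PyTorch sketch of this loss together with the uniform replay sampling and the periodic target-network update described above. The names (`q_net`, `target_net`) and the batch layout are hypothetical, not taken from the labml source:

```python
import torch
import torch.nn.functional as F

def dqn_loss(q_net, target_net, batch, gamma=0.99):
    # batch is sampled uniformly from the replay buffer, i.e. U(D);
    # `done` is a 0/1 float mask marking terminal transitions.
    s, a, r, s_next, done = batch

    # Q(s, a; theta_i) for the actions that were actually taken
    q = q_net(s).gather(1, a.unsqueeze(1)).squeeze(1)

    with torch.no_grad():
        # max_a' Q(s', a'; theta_i^-) from the frozen target network
        q_next = target_net(s_next).max(dim=1).values
        target = r + gamma * (1.0 - done) * q_next

    # (r + gamma max_a' Q(s', a'; theta_i^-) - Q(s, a; theta_i))^2
    return F.mse_loss(q, target)

# The periodic update of theta_i^- is typically a plain copy every N steps:
#   target_net.load_state_dict(q_net.state_dict())
```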

Double $Q$-Learning

The max operator in the above calculation uses the same network for both selecting the best action and for evaluating its value. That is,

$$\max_{a'} Q(s', a'; \theta) = Q\Big(s', \operatorname*{argmax}_{a'} Q(s', a'; \theta); \theta\Big)$$

We use double Q-learning, where the $\operatorname{argmax}$ is taken from $\theta_i$ and the value is taken from $\theta_i^{-}$.
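A sketch of how the target computation changes under double Q-learning, reusing the hypothetical names from the block above: the action is selected with the online parameters $\theta_i$ and evaluated with the target parameters $\theta_i^{-}$:

```python
with torch.no_grad():
    # argmax_a' Q(s', a'; theta_i): action selection by the online network
    best_action = q_net(s_next).argmax(dim=1, keepdim=True)
    # Q(s', argmax_a' Q(s', a'; theta_i); theta_i^-): evaluation by the target network
    q_next = target_net(s_next).gather(1, best_action).squeeze(1)
    target = r + gamma * (1.0 - done) * q_next
```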

@@ -302,4 +302,4 @@
     handleImages()
-
\ No newline at end of file
+