capsnet readme

Varuna Jayasiri
2021-03-05 15:17:54 +05:30
parent bfe089fd80
commit 38f994a2c0
7 changed files with 156 additions and 9 deletions


@@ -123,7 +123,7 @@
 \hat{A_t^{(\infty)}} &= r_t + \gamma r_{t+1} + \gamma^2 r_{t+2} + ... - V(s)
 \end{align}</script>
 </p>
-<p>$\hat{A_t^{(1)}}$ is high bias, low variance whilst
+<p>$\hat{A_t^{(1)}}$ is high bias, low variance, whilst
 $\hat{A_t^{(\infty)}}$ is unbiased, high variance.</p>
 <p>We take a weighted average of $\hat{A_t^{(k)}}$ to balance bias and variance.
 This is called Generalized Advantage Estimation.
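For reference, the weighted average of the $\hat{A_t^{(k)}}$ estimators described in this hunk is usually computed with the recursive $\lambda$-return form. A minimal sketch under that assumption; the function name, tensor shapes, and default coefficients are illustrative, not taken from the files in this commit:

```python
import torch

def gae(rewards: torch.Tensor, values: torch.Tensor, done: torch.Tensor,
        gamma: float = 0.99, lambda_: float = 0.95) -> torch.Tensor:
    """Generalized Advantage Estimation.

    rewards, done: shape [T]; values: shape [T + 1]
    (values[T] is the bootstrap value of the state after the last step).
    """
    T = rewards.shape[0]
    advantages = torch.zeros(T)
    last_advantage = 0.0
    for t in reversed(range(T)):
        mask = 1.0 - done[t].float()  # stop bootstrapping at episode ends
        # TD error: delta_t = r_t + gamma * V(s_{t+1}) - V(s_t)
        delta = rewards[t] + gamma * values[t + 1] * mask - values[t]
        # GAE recursion: A_t = delta_t + gamma * lambda * A_{t+1}
        last_advantage = delta + gamma * lambda_ * mask * last_advantage
        advantages[t] = last_advantage
    return advantages

# Example: adv = gae(torch.rand(5), torch.rand(6), torch.zeros(5))
```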


@@ -76,9 +76,9 @@
 <p>This is a <a href="https://pytorch.org">PyTorch</a> implementation of
 <a href="https://arxiv.org/abs/1707.06347">Proximal Policy Optimization - PPO</a>.</p>
 <p>PPO is a policy gradient method for reinforcement learning.
-Simple policy gradient methods one do a single gradient update per sample (or a set of samples).
-Doing multiple gradient steps for a singe sample causes problems
-because the policy deviates too much producing a bad policy.
+Simple policy gradient methods do a single gradient update per sample (or a set of samples).
+Doing multiple gradient steps for a single sample causes problems
+because the policy deviates too much, producing a bad policy.
 PPO lets us do multiple gradient updates per sample by trying to keep the
 policy close to the policy that was used to sample data.
 It does so by clipping gradient flow if the updated policy
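As a rough illustration of the clipping idea this hunk describes (keeping the updated policy close to the policy that sampled the data), here is a sketch of the standard PPO clipped surrogate objective; the function and argument names are assumptions for illustration, not taken from the file being diffed:

```python
import torch

def ppo_clip_loss(log_pi: torch.Tensor, old_log_pi: torch.Tensor,
                  advantage: torch.Tensor, clip_eps: float = 0.2) -> torch.Tensor:
    """Clipped surrogate objective, returned as a loss (negated for minimization)."""
    # Probability ratio r_t(theta) = pi_theta(a|s) / pi_theta_old(a|s)
    ratio = torch.exp(log_pi - old_log_pi)
    # Unclipped and clipped surrogate terms
    surr1 = ratio * advantage
    surr2 = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantage
    # Taking the minimum removes the incentive to push the ratio outside [1 - eps, 1 + eps]
    return -torch.min(surr1, surr2).mean()
```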
@@ -172,7 +172,7 @@ J(\pi_\theta) - J(\pi_{\theta_{OLD}})
 </p>
 <p>Then we assume $d^\pi_\theta(s)$ and $d^\pi_{\theta_{OLD}}(s)$ are similar.
 The error we introduce to $J(\pi_\theta) - J(\pi_{\theta_{OLD}})$
-by this assumtion is bound by the KL divergence between
+by this assumption is bound by the KL divergence between
 $\pi_\theta$ and $\pi_{\theta_{OLD}}$.
 <a href="https://arxiv.org/abs/1705.10528">Constrained Policy Optimization</a>
 shows the proof of this. I haven&rsquo;t read it.</p>
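Since the error discussed in this hunk is bounded by the KL divergence between $\pi_\theta$ and $\pi_{\theta_{OLD}}$, many PPO implementations monitor a Monte Carlo estimate of that divergence during the inner update epochs and stop early if it grows too large. A minimal sketch of this common practice; the estimator and the stopping threshold are assumptions, not taken from the paper or from this commit:

```python
import torch

def approx_kl(old_log_pi: torch.Tensor, log_pi: torch.Tensor) -> torch.Tensor:
    """Estimate KL(pi_old || pi) from log-probabilities of the sampled actions."""
    # E_{a ~ pi_old}[log pi_old(a|s) - log pi(a|s)]
    return (old_log_pi - log_pi).mean()

# During the multiple gradient updates per batch, one might stop early, e.g.:
# if approx_kl(old_log_pi, log_pi) > 0.015:   # threshold is an illustrative value
#     break
```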