From 38f994a2c034fabac70bd8b197c55b6cf8b05068 Mon Sep 17 00:00:00 2001
From: Varuna Jayasiri

Capsule network is a neural network architecture that embeds features
as capsules and routes them with a voting mechanism to the next layer of capsules.
Unlike in other implementations of models, we’ve included a sample, because
-it is difficult to understand some of the concepts with just the modules.
+it is difficult to understand some concepts with just the modules.
This is the annotated code for a model that uses capsules to classify the MNIST dataset.
This file holds the implementations of the core modules of Capsule Networks.
I used jindongwang/Pytorch-CapsuleNet to clarify some confusions I had with the paper.
diff --git a/docs/capsule_networks/readme.html b/docs/capsule_networks/readme.html
new file mode 100644
index 00000000..7cb46a86
--- /dev/null
+++ b/docs/capsule_networks/readme.html
@@ -0,0 +1,126 @@
+
+
+
+ This is a PyTorch implementation/tutorial of
+Dynamic Routing Between Capsules. Capsule network is a neural network architecture that embeds features
+as capsules and routes them with a voting mechanism to the next layer of capsules. Unlike in other implementations of models, we’ve included a sample, because
+it is difficult to understand some concepts with just the modules.
+This is the annotated code for a model that uses capsules to classify the MNIST dataset.
+This file holds the implementations of the core modules of Capsule Networks. I used jindongwang/Pytorch-CapsuleNet to clarify some
+confusions I had with the paper. Here’s a notebook for training a Capsule Network on the MNIST dataset.
+Capsule Networks
+
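The voting-and-routing mechanism described above can be sketched in a few lines of PyTorch. This is only an illustration of dynamic routing by agreement, not the modules from this repository; the `squash` helper, the tensor shapes, and the default of three routing iterations are assumptions made for the example.

```python
import torch
import torch.nn.functional as F


def squash(s: torch.Tensor, dim: int = -1, eps: float = 1e-8) -> torch.Tensor:
    """Squash vectors so their length lies in (0, 1) while keeping their direction."""
    sq_norm = (s ** 2).sum(dim=dim, keepdim=True)
    return (sq_norm / (1.0 + sq_norm)) * s / torch.sqrt(sq_norm + eps)


def route(u_hat: torch.Tensor, num_iterations: int = 3) -> torch.Tensor:
    """Dynamic routing by agreement (illustrative sketch).

    u_hat: votes from lower capsules for higher capsules,
           shape [batch, n_lower, n_higher, d_higher]
    returns higher-level capsules, shape [batch, n_higher, d_higher]
    """
    batch, n_lower, n_higher, _ = u_hat.shape
    # Routing logits start at zero: every lower capsule votes equally.
    b = u_hat.new_zeros(batch, n_lower, n_higher)
    for _ in range(num_iterations):
        c = F.softmax(b, dim=-1)                  # coupling coefficients
        s = (c.unsqueeze(-1) * u_hat).sum(dim=1)  # weighted sum of votes
        v = squash(s)                             # higher-level capsules
        # Increase logits where a vote agrees with the output capsule.
        b = b + (u_hat * v.unsqueeze(1)).sum(dim=-1)
    return v


# Example: 1152 primary capsules voting for 10 digit capsules of dimension 16.
votes = torch.randn(2, 1152, 10, 16)
digit_caps = route(votes)
print(digit_caps.shape)  # torch.Size([2, 10, 16])
```

The softmax over the routing logits is taken across the higher-level capsules, so each lower-level capsule distributes its vote among them.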
-$\hat{A_t^{(1)}}$ is high bias, low variance whilst
+$\hat{A_t^{(1)}}$ is high bias, low variance, whilst
 $\hat{A_t^{(\infty)}}$ is unbiased, high variance.
 We take a weighted average of $\hat{A_t^{(k)}}$ to balance bias and variance.
 This is called Generalized Advantage Estimation.
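As a rough sketch of the weighted average over $\hat{A_t^{(k)}}$, the sum with weights proportional to $\lambda^{k-1}$ collapses into a single backward recursion over one-step TD errors. This is not necessarily how the repository's GAE module is written; the array layout and the `dones` mask are assumptions for the example.

```python
import numpy as np


def gae(rewards: np.ndarray, values: np.ndarray, dones: np.ndarray,
        gamma: float = 0.99, lam: float = 0.95) -> np.ndarray:
    """Generalized Advantage Estimation (illustrative sketch).

    rewards, dones: shape [T]; values: shape [T + 1] (includes the bootstrap value).
    Equivalent to the exponentially weighted average of k-step advantage
    estimates with weights (1 - lambda) * lambda^(k - 1).
    """
    T = len(rewards)
    advantages = np.zeros(T, dtype=np.float32)
    last_adv = 0.0
    for t in reversed(range(T)):
        mask = 1.0 - dones[t]
        # One-step TD error: delta_t = r_t + gamma * V(s_{t+1}) - V(s_t)
        delta = rewards[t] + gamma * values[t + 1] * mask - values[t]
        # Recursive form: A_t = delta_t + gamma * lambda * A_{t+1}
        last_adv = delta + gamma * lam * mask * last_adv
        advantages[t] = last_adv
    return advantages
```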
diff --git a/docs/rl/ppo/index.html b/docs/rl/ppo/index.html
index 85f78aeb..19aaf7c2 100644
--- a/docs/rl/ppo/index.html
+++ b/docs/rl/ppo/index.html
@@ -76,9 +76,9 @@
 This is a PyTorch implementation of Proximal Policy Optimization - PPO.
 PPO is a policy gradient method for reinforcement learning.
-Simple policy gradient methods one do a single gradient update per sample (or a set of samples).
-Doing multiple gradient steps for a singe sample causes problems
-because the policy deviates too much producing a bad policy.
+Simple policy gradient methods do a single gradient update per sample (or a set of samples).
+Doing multiple gradient steps for a single sample causes problems
+because the policy deviates too much, producing a bad policy.
 PPO lets us do multiple gradient updates per sample by trying to keep
 the policy close to the policy that was used to sample data.
 It does so by clipping gradient flow if the updated policy
@@ -172,7 +172,7 @@ J(\pi_\theta) - J(\pi_{\theta_{OLD}})
 Then we assume $d^{\pi_\theta}(s)$ and $d^{\pi_{\theta_{OLD}}}(s)$ are similar.
 The error we introduce to $J(\pi_\theta) - J(\pi_{\theta_{OLD}})$
- by this assumtion is bound by the KL divergence between
+ by this assumption is bound by the KL divergence between
 $\pi_\theta$ and $\pi_{\theta_{OLD}}$.
 Constrained Policy Optimization shows the proof of this. I haven’t read it.
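For illustration, a minimal version of the clipped surrogate objective that keeps the updated policy close to the sampling policy might look like the following. The function name, the `clip_eps` default, and the sign convention (returning a loss to minimize) are assumptions, not this repository's API.

```python
import torch


def ppo_clip_loss(log_pi: torch.Tensor, log_pi_old: torch.Tensor,
                  advantages: torch.Tensor, clip_eps: float = 0.2) -> torch.Tensor:
    """Clipped PPO surrogate loss (illustrative sketch, to be minimized).

    log_pi:     log-probabilities of the taken actions under the current policy
    log_pi_old: log-probabilities under the policy that sampled the data
    """
    # Importance sampling ratio r_t(theta) = pi_theta(a|s) / pi_theta_old(a|s)
    ratio = torch.exp(log_pi - log_pi_old)
    # Clipping removes the incentive (and the gradient) for pushing the ratio
    # outside [1 - eps, 1 + eps] in the direction that increases the objective.
    clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps)
    objective = torch.min(ratio * advantages, clipped * advantages)
    return -objective.mean()
```

Taking the minimum of the clipped and unclipped terms means the gradient vanishes once the probability ratio leaves $[1-\epsilon, 1+\epsilon]$ in the favourable direction, which is the clipping of gradient flow mentioned above.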
diff --git a/docs/sitemap.xml b/docs/sitemap.xml
index 0448a6f9..76bfc8d3 100644
--- a/docs/sitemap.xml
+++ b/docs/sitemap.xml
@@ -659,14 +659,14 @@