paper url fix

This commit is contained in:
Varuna Jayasiri
2024-06-21 19:01:16 +05:30
parent 09d09379c2
commit f00ba4a70f
318 changed files with 378 additions and 378 deletions

File diff suppressed because one or more lines are too long

File diff suppressed because one or more lines are too long

File diff suppressed because one or more lines are too long


@@ -1,5 +1,5 @@
{
"<h1>Parity Task</h1>\n<p>This creates data for Parity Task from the paper <a href=\"https://papers.labml.ai/paper/1603.08983\">Adaptive Computation Time for Recurrent Neural Networks</a>.</p>\n<p>The input of the parity task is a vector with <span translate=no>_^_0_^_</span>&#x27;s <span translate=no>_^_1_^_</span>&#x27;s and <span translate=no>_^_2_^_</span>&#x27;s. The output is the parity of <span translate=no>_^_3_^_</span>&#x27;s - one if there is an odd number of <span translate=no>_^_4_^_</span>&#x27;s and zero otherwise. The input is generated by making a random number of elements in the vector either <span translate=no>_^_5_^_</span> or <span translate=no>_^_6_^_</span>&#x27;s.</p>\n": "<h1>\u30d1\u30ea\u30c6\u30a3\u30bf\u30b9\u30af</h1>\n<p>\u3053\u308c\u306b\u3088\u308a\u3001\u8ad6\u6587\u300c<a href=\"https://papers.labml.ai/paper/1603.08983\">\u30ea\u30ab\u30ec\u30f3\u30c8\u30cb\u30e5\u30fc\u30e9\u30eb\u30cd\u30c3\u30c8\u30ef\u30fc\u30af\u306e\u9069\u5fdc\u7684\u8a08\u7b97\u6642\u9593</a>\u300d\u304b\u3089\u30d1\u30ea\u30c6\u30a3\u30bf\u30b9\u30af\u306e\u30c7\u30fc\u30bf\u304c\u4f5c\u6210\u3055\u308c\u307e\u3059\u3002</p>\n<p>\u30d1\u30ea\u30c6\u30a3\u30bf\u30b9\u30af\u306e\u5165\u529b\u306f\u3001\u3068 <span translate=no>_^_2_^_</span> s <span translate=no>_^_0_^_</span> <span translate=no>_^_1_^_</span> \u306e\u4ed8\u3044\u305f\u30d9\u30af\u30c8\u30eb\u3067\u3001\u51fa\u529b\u306f s <span translate=no>_^_3_^_</span> \u306e\u30d1\u30ea\u30c6\u30a3\u3067\u3059\u3002s <span translate=no>_^_4_^_</span> \u306e\u6570\u304c\u5947\u6570\u306e\u5834\u5408\u306f 1\u3001\u305d\u308c\u4ee5\u5916\u306e\u5834\u5408\u306f 0 \u3067\u3059\u3002\u5165\u529b\u306f\u3001<span translate=no>_^_5_^_</span><span translate=no>_^_6_^_</span>\u30d9\u30af\u30c8\u30eb\u5185\u306e\u30e9\u30f3\u30c0\u30e0\u306a\u6570\u306e\u8981\u7d20\u3092\u307e\u305f\u306f\u306e\u3044\u305a\u308c\u304b\u306b\u3059\u308b\u3053\u3068\u306b\u3088\u3063\u3066\u751f\u6210\u3055\u308c\u307e\u3059\u3002</p>\n", "<h1>Parity Task</h1>\n<p>This creates data for Parity Task from the paper <a href=\"https://arxiv.org/abs/1603.08983\">Adaptive Computation Time for Recurrent Neural Networks</a>.</p>\n<p>The input of the parity task is a vector with <span translate=no>_^_0_^_</span>&#x27;s <span translate=no>_^_1_^_</span>&#x27;s and <span translate=no>_^_2_^_</span>&#x27;s. The output is the parity of <span translate=no>_^_3_^_</span>&#x27;s - one if there is an odd number of <span translate=no>_^_4_^_</span>&#x27;s and zero otherwise. 
The input is generated by making a random number of elements in the vector either <span translate=no>_^_5_^_</span> or <span translate=no>_^_6_^_</span>&#x27;s.</p>\n": "<h1>\u30d1\u30ea\u30c6\u30a3\u30bf\u30b9\u30af</h1>\n<p>\u3053\u308c\u306b\u3088\u308a\u3001\u8ad6\u6587\u300c<a href=\"https://arxiv.org/abs/1603.08983\">\u30ea\u30ab\u30ec\u30f3\u30c8\u30cb\u30e5\u30fc\u30e9\u30eb\u30cd\u30c3\u30c8\u30ef\u30fc\u30af\u306e\u9069\u5fdc\u7684\u8a08\u7b97\u6642\u9593</a>\u300d\u304b\u3089\u30d1\u30ea\u30c6\u30a3\u30bf\u30b9\u30af\u306e\u30c7\u30fc\u30bf\u304c\u4f5c\u6210\u3055\u308c\u307e\u3059\u3002</p>\n<p>\u30d1\u30ea\u30c6\u30a3\u30bf\u30b9\u30af\u306e\u5165\u529b\u306f\u3001\u3068 <span translate=no>_^_2_^_</span> s <span translate=no>_^_0_^_</span> <span translate=no>_^_1_^_</span> \u306e\u4ed8\u3044\u305f\u30d9\u30af\u30c8\u30eb\u3067\u3001\u51fa\u529b\u306f s <span translate=no>_^_3_^_</span> \u306e\u30d1\u30ea\u30c6\u30a3\u3067\u3059\u3002s <span translate=no>_^_4_^_</span> \u306e\u6570\u304c\u5947\u6570\u306e\u5834\u5408\u306f 1\u3001\u305d\u308c\u4ee5\u5916\u306e\u5834\u5408\u306f 0 \u3067\u3059\u3002\u5165\u529b\u306f\u3001<span translate=no>_^_5_^_</span><span translate=no>_^_6_^_</span>\u30d9\u30af\u30c8\u30eb\u5185\u306e\u30e9\u30f3\u30c0\u30e0\u306a\u6570\u306e\u8981\u7d20\u3092\u307e\u305f\u306f\u306e\u3044\u305a\u308c\u304b\u306b\u3059\u308b\u3053\u3068\u306b\u3088\u3063\u3066\u751f\u6210\u3055\u308c\u307e\u3059\u3002</p>\n",
"<h3>Parity dataset</h3>\n": "<h3>\u30d1\u30ea\u30c6\u30a3\u30c7\u30fc\u30bf\u30bb\u30c3\u30c8</h3>\n", "<h3>Parity dataset</h3>\n": "<h3>\u30d1\u30ea\u30c6\u30a3\u30c7\u30fc\u30bf\u30bb\u30c3\u30c8</h3>\n",
"<p> </p>\n": "<p></p>\n", "<p> </p>\n": "<p></p>\n",
"<p> Generate a sample</p>\n": "<p>\u30b5\u30f3\u30d7\u30eb\u3092\u751f\u6210</p>\n", "<p> Generate a sample</p>\n": "<p>\u30b5\u30f3\u30d7\u30eb\u3092\u751f\u6210</p>\n",


@@ -1,5 +1,5 @@
{
"<h1>Parity Task</h1>\n<p>This creates data for Parity Task from the paper <a href=\"https://papers.labml.ai/paper/1603.08983\">Adaptive Computation Time for Recurrent Neural Networks</a>.</p>\n<p>The input of the parity task is a vector with <span translate=no>_^_0_^_</span>&#x27;s <span translate=no>_^_1_^_</span>&#x27;s and <span translate=no>_^_2_^_</span>&#x27;s. The output is the parity of <span translate=no>_^_3_^_</span>&#x27;s - one if there is an odd number of <span translate=no>_^_4_^_</span>&#x27;s and zero otherwise. The input is generated by making a random number of elements in the vector either <span translate=no>_^_5_^_</span> or <span translate=no>_^_6_^_</span>&#x27;s.</p>\n": "<h1>\u0dc3\u0db8\u0dcf\u0db1\u0dcf\u0dad\u0dca\u0db8\u0dad\u0dcf\u0d9a\u0dcf\u0dbb\u0dca\u0dba\u0dba</h1>\n<p>\u0db8\u0dd9\u0dba\u0d9a\u0da9\u0daf\u0dcf\u0dc3\u0dd2 \u0dc0\u0dbd\u0dd2\u0db1\u0dca Parity Task \u0dc3\u0db3\u0dc4\u0dcf \u0daf\u0dad\u0dca\u0dad \u0db1\u0dd2\u0dbb\u0dca\u0db8\u0dcf\u0dab\u0dba \u0d9a\u0dbb\u0dba\u0dd2 <a href=\"https://papers.labml.ai/paper/1603.08983\">\u0d85\u0db1\u0dd4\u0dc0\u0dbb\u0dca\u0dad\u0dd3 \u0d9c\u0dab\u0db1\u0dba \u0d9a\u0dd2\u0dbb\u0dd3\u0db8\u0dda \u0d9a\u0dcf\u0dbd\u0dba \u0db4\u0dd4\u0db1\u0dbb\u0dcf\u0dc0\u0dbb\u0dca\u0dad\u0db1 \u0dc3\u0dca\u0db1\u0dcf\u0dba\u0dd4\u0d9a \u0da2\u0dcf\u0dbd \u0dc3\u0db3\u0dc4\u0dcf</a>. </p>\n<p>\u0dc3\u0db8\u0dcf\u0db1\u0dcf\u0dad\u0dca\u0db8\u0dad\u0dcf\u0d9a\u0dbb\u0dca\u0dad\u0dc0\u0dca\u0dba\u0dba\u0dda \u0d86\u0daf\u0dcf\u0db1\u0dba \u0dba\u0db1\u0dd4 <span translate=no>_^_0_^_</span><span translate=no>_^_1_^_</span><span translate=no>_^_2_^_</span>\u0d9c\u0dda \u0dc4\u0dcf \u0dc3\u0db8\u0d9f \u0daf\u0ddb\u0dc1\u0dd2\u0d9a\u0dba\u0d9a\u0dd2. \u0db4\u0dca\u0dbb\u0dad\u0dd2\u0daf\u0dcf\u0db1\u0dba \u0dba\u0db1\u0dd4 <span translate=no>_^_3_^_</span>\u0dc3\u0db8\u0dcf\u0db1\u0dcf\u0dad\u0dca\u0db8\u0dad\u0dcf\u0dc0\u0dba\u0dba\u0dd2 - \u0d91\u0dc4\u0dd2 \u0db1\u0db8\u0dca \u0d91\u0d9a\u0dca \u0dba\u0db1\u0dd4 \u0d94\u0dad\u0dca\u0dad\u0dda \u0dc3\u0d82\u0d9b\u0dca\u0dba\u0dcf\u0dc0\u0d9a\u0dca \u0dc0\u0db1 <span translate=no>_^_4_^_</span>\u0d85\u0dad\u0dbb \u0dc0\u0dd9\u0db1\u0dad\u0dca \u0d86\u0d9a\u0dcf\u0dbb\u0dba\u0d9a\u0dd2\u0db1\u0dca \u0dc1\u0dd4\u0db1\u0dca\u0dba \u0dc0\u0dda. \u0d86\u0daf\u0dcf\u0db1\u0dba \u0da2\u0db1\u0db1\u0dba \u0d9a\u0dbb\u0db1\u0dd4 \u0dbd\u0db6\u0db1\u0dca\u0db1\u0dda \u0daf\u0ddb\u0dc1\u0dd2\u0d9a\u0dba\u0dda \u0d85\u0dc4\u0db9\u0dd4 \u0db8\u0dd6\u0dbd\u0daf\u0dca\u0dbb\u0dc0\u0dca\u0dba \u0dc3\u0d82\u0d9b\u0dca\u0dba\u0dcf\u0dc0\u0d9a\u0dca <span translate=no>_^_5_^_</span> \u0dc4\u0ddd <span translate=no>_^_6_^_</span>\u0dba.</p>\n", "<h1>Parity Task</h1>\n<p>This creates data for Parity Task from the paper <a href=\"https://arxiv.org/abs/1603.08983\">Adaptive Computation Time for Recurrent Neural Networks</a>.</p>\n<p>The input of the parity task is a vector with <span translate=no>_^_0_^_</span>&#x27;s <span translate=no>_^_1_^_</span>&#x27;s and <span translate=no>_^_2_^_</span>&#x27;s. The output is the parity of <span translate=no>_^_3_^_</span>&#x27;s - one if there is an odd number of <span translate=no>_^_4_^_</span>&#x27;s and zero otherwise. 
The input is generated by making a random number of elements in the vector either <span translate=no>_^_5_^_</span> or <span translate=no>_^_6_^_</span>&#x27;s.</p>\n": "<h1>\u0dc3\u0db8\u0dcf\u0db1\u0dcf\u0dad\u0dca\u0db8\u0dad\u0dcf\u0d9a\u0dcf\u0dbb\u0dca\u0dba\u0dba</h1>\n<p>\u0db8\u0dd9\u0dba\u0d9a\u0da9\u0daf\u0dcf\u0dc3\u0dd2 \u0dc0\u0dbd\u0dd2\u0db1\u0dca Parity Task \u0dc3\u0db3\u0dc4\u0dcf \u0daf\u0dad\u0dca\u0dad \u0db1\u0dd2\u0dbb\u0dca\u0db8\u0dcf\u0dab\u0dba \u0d9a\u0dbb\u0dba\u0dd2 <a href=\"https://arxiv.org/abs/1603.08983\">\u0d85\u0db1\u0dd4\u0dc0\u0dbb\u0dca\u0dad\u0dd3 \u0d9c\u0dab\u0db1\u0dba \u0d9a\u0dd2\u0dbb\u0dd3\u0db8\u0dda \u0d9a\u0dcf\u0dbd\u0dba \u0db4\u0dd4\u0db1\u0dbb\u0dcf\u0dc0\u0dbb\u0dca\u0dad\u0db1 \u0dc3\u0dca\u0db1\u0dcf\u0dba\u0dd4\u0d9a \u0da2\u0dcf\u0dbd \u0dc3\u0db3\u0dc4\u0dcf</a>. </p>\n<p>\u0dc3\u0db8\u0dcf\u0db1\u0dcf\u0dad\u0dca\u0db8\u0dad\u0dcf\u0d9a\u0dbb\u0dca\u0dad\u0dc0\u0dca\u0dba\u0dba\u0dda \u0d86\u0daf\u0dcf\u0db1\u0dba \u0dba\u0db1\u0dd4 <span translate=no>_^_0_^_</span><span translate=no>_^_1_^_</span><span translate=no>_^_2_^_</span>\u0d9c\u0dda \u0dc4\u0dcf \u0dc3\u0db8\u0d9f \u0daf\u0ddb\u0dc1\u0dd2\u0d9a\u0dba\u0d9a\u0dd2. \u0db4\u0dca\u0dbb\u0dad\u0dd2\u0daf\u0dcf\u0db1\u0dba \u0dba\u0db1\u0dd4 <span translate=no>_^_3_^_</span>\u0dc3\u0db8\u0dcf\u0db1\u0dcf\u0dad\u0dca\u0db8\u0dad\u0dcf\u0dc0\u0dba\u0dba\u0dd2 - \u0d91\u0dc4\u0dd2 \u0db1\u0db8\u0dca \u0d91\u0d9a\u0dca \u0dba\u0db1\u0dd4 \u0d94\u0dad\u0dca\u0dad\u0dda \u0dc3\u0d82\u0d9b\u0dca\u0dba\u0dcf\u0dc0\u0d9a\u0dca \u0dc0\u0db1 <span translate=no>_^_4_^_</span>\u0d85\u0dad\u0dbb \u0dc0\u0dd9\u0db1\u0dad\u0dca \u0d86\u0d9a\u0dcf\u0dbb\u0dba\u0d9a\u0dd2\u0db1\u0dca \u0dc1\u0dd4\u0db1\u0dca\u0dba \u0dc0\u0dda. \u0d86\u0daf\u0dcf\u0db1\u0dba \u0da2\u0db1\u0db1\u0dba \u0d9a\u0dbb\u0db1\u0dd4 \u0dbd\u0db6\u0db1\u0dca\u0db1\u0dda \u0daf\u0ddb\u0dc1\u0dd2\u0d9a\u0dba\u0dda \u0d85\u0dc4\u0db9\u0dd4 \u0db8\u0dd6\u0dbd\u0daf\u0dca\u0dbb\u0dc0\u0dca\u0dba \u0dc3\u0d82\u0d9b\u0dca\u0dba\u0dcf\u0dc0\u0d9a\u0dca <span translate=no>_^_5_^_</span> \u0dc4\u0ddd <span translate=no>_^_6_^_</span>\u0dba.</p>\n",
"<h3>Parity dataset</h3>\n": "<h3>\u0dc3\u0db8\u0dcf\u0db1\u0dcf\u0dad\u0dca\u0db8\u0dad\u0dcf\u0daf\u0dad\u0dca\u0dad \u0d9a\u0da7\u0dca\u0da7\u0dbd\u0dba</h3>\n", "<h3>Parity dataset</h3>\n": "<h3>\u0dc3\u0db8\u0dcf\u0db1\u0dcf\u0dad\u0dca\u0db8\u0dad\u0dcf\u0daf\u0dad\u0dca\u0dad \u0d9a\u0da7\u0dca\u0da7\u0dbd\u0dba</h3>\n",
"<p> </p>\n": "<p> </p>\n", "<p> </p>\n": "<p> </p>\n",
"<p> Generate a sample</p>\n": "<p> \u0db1\u0dd2\u0dba\u0dd0\u0daf\u0dd2\u0dba\u0d9a\u0dca\u0da2\u0db1\u0db1\u0dba \u0d9a\u0dbb\u0db1\u0dca\u0db1</p>\n", "<p> Generate a sample</p>\n": "<p> \u0db1\u0dd2\u0dba\u0dd0\u0daf\u0dd2\u0dba\u0d9a\u0dca\u0da2\u0db1\u0db1\u0dba \u0d9a\u0dbb\u0db1\u0dca\u0db1</p>\n",


@@ -1,5 +1,5 @@
{
"<h1>Parity Task</h1>\n<p>This creates data for Parity Task from the paper <a href=\"https://papers.labml.ai/paper/1603.08983\">Adaptive Computation Time for Recurrent Neural Networks</a>.</p>\n<p>The input of the parity task is a vector with <span translate=no>_^_0_^_</span>&#x27;s <span translate=no>_^_1_^_</span>&#x27;s and <span translate=no>_^_2_^_</span>&#x27;s. The output is the parity of <span translate=no>_^_3_^_</span>&#x27;s - one if there is an odd number of <span translate=no>_^_4_^_</span>&#x27;s and zero otherwise. The input is generated by making a random number of elements in the vector either <span translate=no>_^_5_^_</span> or <span translate=no>_^_6_^_</span>&#x27;s.</p>\n": "<h1>\u5947\u5076\u6821\u9a8c\u4efb\u52a1</h1>\n<p>\u8fd9\u5c06\u4ece\u8bba\u6587\u300a<a href=\"https://papers.labml.ai/paper/1603.08983\">\u5faa\u73af\u795e\u7ecf\u7f51\u7edc\u7684\u81ea\u9002\u5e94\u8ba1\u7b97\u65f6\u95f4\u300b\u4e2d\u4e3a</a>\u5947\u5076\u6821\u9a8c\u4efb\u52a1\u521b\u5efa\u6570\u636e\u3002</p>\n<p>\u5947\u5076\u6821\u9a8c\u4efb\u52a1\u7684\u8f93\u5165\u662f\u4e00\u4e2a\u5e26\u6709<span translate=no>_^_0_^_</span>'s \u548c<span translate=no>_^_1_^_</span>'s \u7684\u5411\u91cf\u3002\u8f93\u51fa\u662f<span translate=no>_^_2_^_</span>'s \u7684<span translate=no>_^_3_^_</span>\u5947\u5076\u6821\u9a8c\u2014\u2014\u5982\u679c\u6709\uff0c\u5219\u4e3a 1\u662f\u7684\u5947\u6570<span translate=no>_^_4_^_</span>\uff0c\u5426\u5219\u4e3a\u96f6\u3002\u8f93\u5165\u662f\u901a\u8fc7\u4f7f\u77e2\u91cf\u4e2d\u7684\u968f\u673a\u6570\u91cf\u7684\u5143\u7d20\u4e3a<span translate=no>_^_5_^_</span>\u6216\u800c\u751f\u6210<span translate=no>_^_6_^_</span>\u7684\u3002</p>\n", "<h1>Parity Task</h1>\n<p>This creates data for Parity Task from the paper <a href=\"https://arxiv.org/abs/1603.08983\">Adaptive Computation Time for Recurrent Neural Networks</a>.</p>\n<p>The input of the parity task is a vector with <span translate=no>_^_0_^_</span>&#x27;s <span translate=no>_^_1_^_</span>&#x27;s and <span translate=no>_^_2_^_</span>&#x27;s. The output is the parity of <span translate=no>_^_3_^_</span>&#x27;s - one if there is an odd number of <span translate=no>_^_4_^_</span>&#x27;s and zero otherwise. The input is generated by making a random number of elements in the vector either <span translate=no>_^_5_^_</span> or <span translate=no>_^_6_^_</span>&#x27;s.</p>\n": "<h1>\u5947\u5076\u6821\u9a8c\u4efb\u52a1</h1>\n<p>\u8fd9\u5c06\u4ece\u8bba\u6587\u300a<a href=\"https://arxiv.org/abs/1603.08983\">\u5faa\u73af\u795e\u7ecf\u7f51\u7edc\u7684\u81ea\u9002\u5e94\u8ba1\u7b97\u65f6\u95f4\u300b\u4e2d\u4e3a</a>\u5947\u5076\u6821\u9a8c\u4efb\u52a1\u521b\u5efa\u6570\u636e\u3002</p>\n<p>\u5947\u5076\u6821\u9a8c\u4efb\u52a1\u7684\u8f93\u5165\u662f\u4e00\u4e2a\u5e26\u6709<span translate=no>_^_0_^_</span>'s \u548c<span translate=no>_^_1_^_</span>'s \u7684\u5411\u91cf\u3002\u8f93\u51fa\u662f<span translate=no>_^_2_^_</span>'s \u7684<span translate=no>_^_3_^_</span>\u5947\u5076\u6821\u9a8c\u2014\u2014\u5982\u679c\u6709\uff0c\u5219\u4e3a 1\u662f\u7684\u5947\u6570<span translate=no>_^_4_^_</span>\uff0c\u5426\u5219\u4e3a\u96f6\u3002\u8f93\u5165\u662f\u901a\u8fc7\u4f7f\u77e2\u91cf\u4e2d\u7684\u968f\u673a\u6570\u91cf\u7684\u5143\u7d20\u4e3a<span translate=no>_^_5_^_</span>\u6216\u800c\u751f\u6210<span translate=no>_^_6_^_</span>\u7684\u3002</p>\n",
"<h3>Parity dataset</h3>\n": "<h3>\u5947\u5076\u6821\u9a8c\u6570\u636e</h3>\n", "<h3>Parity dataset</h3>\n": "<h3>\u5947\u5076\u6821\u9a8c\u6570\u636e</h3>\n",
"<p> </p>\n": "<p></p>\n", "<p> </p>\n": "<p></p>\n",
"<p> Generate a sample</p>\n": "<p>\u751f\u6210\u6837\u672c</p>\n", "<p> Generate a sample</p>\n": "<p>\u751f\u6210\u6837\u672c</p>\n",

File diff suppressed because one or more lines are too long

File diff suppressed because one or more lines are too long

File diff suppressed because one or more lines are too long


@@ -1,4 +1,4 @@
{
"<h1><a href=\"https://nn.labml.ai/adaptive_computation/ponder_net/index.html\">PonderNet: Learning to Ponder</a></h1>\n<p>This is a <a href=\"https://pytorch.org\">PyTorch</a> implementation of the paper <a href=\"https://papers.labml.ai/paper/2107.05407\">PonderNet: Learning to Ponder</a>.</p>\n<p>PonderNet adapts the computation based on the input. It changes the number of steps to take on a recurrent network based on the input. PonderNet learns this with end-to-end gradient descent. </p>\n": "<h1><a href=\"https://nn.labml.ai/adaptive_computation/ponder_net/index.html\">PonderNet: \u719f\u8003\u3059\u308b\u3053\u3068\u3092\u5b66\u3076</a></h1>\n<p>\u3053\u308c\u306f\u3001\u8ad6\u6587\u300c<a href=\"https://papers.labml.ai/paper/2107.05407\">PonderNet: \u719f\u8003\u3092\u5b66\u307c\u3046</a>\u300d<a href=\"https://pytorch.org\">\u3092PyTorch\u3067\u5b9f\u88c5\u3057\u305f\u3082\u306e\u3067\u3059</a>\u3002</p>\n<p>PonderNet \u306f\u5165\u529b\u306b\u57fa\u3065\u3044\u3066\u8a08\u7b97\u3092\u8abf\u6574\u3057\u307e\u3059\u3002\u5165\u529b\u306b\u57fa\u3065\u3044\u3066\u30ea\u30ab\u30ec\u30f3\u30c8\u30cd\u30c3\u30c8\u30ef\u30fc\u30af\u3067\u5b9f\u884c\u3059\u308b\u30b9\u30c6\u30c3\u30d7\u306e\u6570\u3092\u5909\u66f4\u3057\u307e\u3059\u3002PonderNet\u306f\u3053\u308c\u3092\u7aef\u304b\u3089\u7aef\u307e\u3067\u306e\u52fe\u914d\u964d\u4e0b\u6cd5\u3067\u5b66\u7fd2\u3057\u307e\u3059</p>\u3002\n", "<h1><a href=\"https://nn.labml.ai/adaptive_computation/ponder_net/index.html\">PonderNet: Learning to Ponder</a></h1>\n<p>This is a <a href=\"https://pytorch.org\">PyTorch</a> implementation of the paper <a href=\"https://arxiv.org/abs/2107.05407\">PonderNet: Learning to Ponder</a>.</p>\n<p>PonderNet adapts the computation based on the input. It changes the number of steps to take on a recurrent network based on the input. PonderNet learns this with end-to-end gradient descent. </p>\n": "<h1><a href=\"https://nn.labml.ai/adaptive_computation/ponder_net/index.html\">PonderNet: \u719f\u8003\u3059\u308b\u3053\u3068\u3092\u5b66\u3076</a></h1>\n<p>\u3053\u308c\u306f\u3001\u8ad6\u6587\u300c<a href=\"https://arxiv.org/abs/2107.05407\">PonderNet: \u719f\u8003\u3092\u5b66\u307c\u3046</a>\u300d<a href=\"https://pytorch.org\">\u3092PyTorch\u3067\u5b9f\u88c5\u3057\u305f\u3082\u306e\u3067\u3059</a>\u3002</p>\n<p>PonderNet \u306f\u5165\u529b\u306b\u57fa\u3065\u3044\u3066\u8a08\u7b97\u3092\u8abf\u6574\u3057\u307e\u3059\u3002\u5165\u529b\u306b\u57fa\u3065\u3044\u3066\u30ea\u30ab\u30ec\u30f3\u30c8\u30cd\u30c3\u30c8\u30ef\u30fc\u30af\u3067\u5b9f\u884c\u3059\u308b\u30b9\u30c6\u30c3\u30d7\u306e\u6570\u3092\u5909\u66f4\u3057\u307e\u3059\u3002PonderNet\u306f\u3053\u308c\u3092\u7aef\u304b\u3089\u7aef\u307e\u3067\u306e\u52fe\u914d\u964d\u4e0b\u6cd5\u3067\u5b66\u7fd2\u3057\u307e\u3059</p>\u3002\n",
"PonderNet: Learning to Ponder": "PonderNet: \u719f\u8003\u3059\u308b\u3053\u3068\u3092\u5b66\u3076" "PonderNet: Learning to Ponder": "PonderNet: \u719f\u8003\u3059\u308b\u3053\u3068\u3092\u5b66\u3076"
} }


@@ -1,4 +1,4 @@
{
"<h1><a href=\"https://nn.labml.ai/adaptive_computation/ponder_net/index.html\">PonderNet: Learning to Ponder</a></h1>\n<p>This is a <a href=\"https://pytorch.org\">PyTorch</a> implementation of the paper <a href=\"https://papers.labml.ai/paper/2107.05407\">PonderNet: Learning to Ponder</a>.</p>\n<p>PonderNet adapts the computation based on the input. It changes the number of steps to take on a recurrent network based on the input. PonderNet learns this with end-to-end gradient descent. </p>\n": "<h1><a href=\"https://nn.labml.ai/adaptive_computation/ponder_net/index.html\">\u0db4\u0ddc\u0db1\u0dca\u0da9\u0dbb\u0dca\u0db1\u0dd9\u0da7\u0dca: \u0db8\u0dd9\u0db1\u0dd9\u0dc4\u0dd2 \u0d9a\u0dd2\u0dbb\u0dd3\u0db8\u0da7 \u0d89\u0d9c\u0dd9\u0db1\u0dd3\u0db8</a></h1>\n<p>\u0db8\u0dd9\u0dba <a href=\"https://papers.labml.ai/paper/2107.05407\">PonderNet \u0d9a\u0da9\u0daf\u0dcf\u0dc3\u0dd2 <a href=\"https://pytorch.org\">PyTorch</a> \u0d9a\u0dca\u0dbb\u0dd2\u0dba\u0dcf\u0dad\u0dca\u0db8\u0d9a \u0d9a\u0dd2\u0dbb\u0dd3\u0db8: \u0db4\u0ddc\u0db1\u0dca\u0da9\u0dbb\u0dca \u0dc0\u0dd9\u0dad \u0d89\u0d9c\u0dd9\u0db1\u0dd3\u0db8</a> . </p>\n<p>PonderNet\u0d86\u0daf\u0dcf\u0db1\u0dba \u0db8\u0dad \u0db4\u0daf\u0db1\u0db8\u0dca\u0dc0 \u0d9c\u0dab\u0db1\u0dba \u0d9a\u0dd2\u0dbb\u0dd3\u0db8 \u0d85\u0db1\u0dd4\u0dc0\u0dbb\u0dca\u0dad\u0db1\u0dba \u0d9a\u0dbb\u0dba\u0dd2. \u0d86\u0daf\u0dcf\u0db1\u0dba \u0db8\u0dad \u0db4\u0daf\u0db1\u0db8\u0dca\u0dc0 \u0db4\u0dd4\u0db1\u0dbb\u0dcf\u0dc0\u0dbb\u0dca\u0dad\u0db1 \u0da2\u0dcf\u0dbd\u0dba\u0d9a\u0dca \u0dc3\u0db3\u0dc4\u0dcf \u0d9c\u0dad \u0dba\u0dd4\u0dad\u0dd4 \u0db4\u0dd2\u0dba\u0dc0\u0dbb \u0d9c\u0dab\u0db1 \u0d91\u0dba \u0dc0\u0dd9\u0db1\u0dc3\u0dca \u0d9a\u0dbb\u0dba\u0dd2. \u0db4\u0ddc\u0db1\u0dca\u0da9\u0dbb\u0dca\u0db1\u0dd9\u0da7\u0dca \u0db8\u0dd9\u0dba \u0d89\u0d9c\u0dd9\u0db1 \u0d9c\u0db1\u0dca\u0db1\u0dda \u0d85\u0dc0\u0dc3\u0dcf\u0db1\u0dba \u0dc3\u0dd2\u0da7 \u0d85\u0dc0\u0dc3\u0dcf\u0db1\u0dba \u0daf\u0d9a\u0dca\u0dc0\u0dcf \u0dc0\u0dd6 \u0dc1\u0dca\u0dbb\u0dda\u0dab\u0dd2\u0dba\u0dda \u0dc3\u0db8\u0dca\u0db7\u0dc0\u0dba\u0d9a\u0dca \u0dc3\u0db8\u0d9f\u0dba. </p>\n", "<h1><a href=\"https://nn.labml.ai/adaptive_computation/ponder_net/index.html\">PonderNet: Learning to Ponder</a></h1>\n<p>This is a <a href=\"https://pytorch.org\">PyTorch</a> implementation of the paper <a href=\"https://arxiv.org/abs/2107.05407\">PonderNet: Learning to Ponder</a>.</p>\n<p>PonderNet adapts the computation based on the input. It changes the number of steps to take on a recurrent network based on the input. PonderNet learns this with end-to-end gradient descent. </p>\n": "<h1><a href=\"https://nn.labml.ai/adaptive_computation/ponder_net/index.html\">\u0db4\u0ddc\u0db1\u0dca\u0da9\u0dbb\u0dca\u0db1\u0dd9\u0da7\u0dca: \u0db8\u0dd9\u0db1\u0dd9\u0dc4\u0dd2 \u0d9a\u0dd2\u0dbb\u0dd3\u0db8\u0da7 \u0d89\u0d9c\u0dd9\u0db1\u0dd3\u0db8</a></h1>\n<p>\u0db8\u0dd9\u0dba <a href=\"https://arxiv.org/abs/2107.05407\">PonderNet \u0d9a\u0da9\u0daf\u0dcf\u0dc3\u0dd2 <a href=\"https://pytorch.org\">PyTorch</a> \u0d9a\u0dca\u0dbb\u0dd2\u0dba\u0dcf\u0dad\u0dca\u0db8\u0d9a \u0d9a\u0dd2\u0dbb\u0dd3\u0db8: \u0db4\u0ddc\u0db1\u0dca\u0da9\u0dbb\u0dca \u0dc0\u0dd9\u0dad \u0d89\u0d9c\u0dd9\u0db1\u0dd3\u0db8</a> . </p>\n<p>PonderNet\u0d86\u0daf\u0dcf\u0db1\u0dba \u0db8\u0dad \u0db4\u0daf\u0db1\u0db8\u0dca\u0dc0 \u0d9c\u0dab\u0db1\u0dba \u0d9a\u0dd2\u0dbb\u0dd3\u0db8 \u0d85\u0db1\u0dd4\u0dc0\u0dbb\u0dca\u0dad\u0db1\u0dba \u0d9a\u0dbb\u0dba\u0dd2. 
\u0d86\u0daf\u0dcf\u0db1\u0dba \u0db8\u0dad \u0db4\u0daf\u0db1\u0db8\u0dca\u0dc0 \u0db4\u0dd4\u0db1\u0dbb\u0dcf\u0dc0\u0dbb\u0dca\u0dad\u0db1 \u0da2\u0dcf\u0dbd\u0dba\u0d9a\u0dca \u0dc3\u0db3\u0dc4\u0dcf \u0d9c\u0dad \u0dba\u0dd4\u0dad\u0dd4 \u0db4\u0dd2\u0dba\u0dc0\u0dbb \u0d9c\u0dab\u0db1 \u0d91\u0dba \u0dc0\u0dd9\u0db1\u0dc3\u0dca \u0d9a\u0dbb\u0dba\u0dd2. \u0db4\u0ddc\u0db1\u0dca\u0da9\u0dbb\u0dca\u0db1\u0dd9\u0da7\u0dca \u0db8\u0dd9\u0dba \u0d89\u0d9c\u0dd9\u0db1 \u0d9c\u0db1\u0dca\u0db1\u0dda \u0d85\u0dc0\u0dc3\u0dcf\u0db1\u0dba \u0dc3\u0dd2\u0da7 \u0d85\u0dc0\u0dc3\u0dcf\u0db1\u0dba \u0daf\u0d9a\u0dca\u0dc0\u0dcf \u0dc0\u0dd6 \u0dc1\u0dca\u0dbb\u0dda\u0dab\u0dd2\u0dba\u0dda \u0dc3\u0db8\u0dca\u0db7\u0dc0\u0dba\u0d9a\u0dca \u0dc3\u0db8\u0d9f\u0dba. </p>\n",
"PonderNet: Learning to Ponder": "\u0db4\u0ddc\u0db1\u0dca\u0da9\u0dbb\u0dca\u0db1\u0dd9\u0da7\u0dca: \u0db8\u0dd9\u0db1\u0dd9\u0dc4\u0dd2 \u0d9a\u0dd2\u0dbb\u0dd3\u0db8\u0da7 \u0d89\u0d9c\u0dd9\u0db1\u0dd3\u0db8" "PonderNet: Learning to Ponder": "\u0db4\u0ddc\u0db1\u0dca\u0da9\u0dbb\u0dca\u0db1\u0dd9\u0da7\u0dca: \u0db8\u0dd9\u0db1\u0dd9\u0dc4\u0dd2 \u0d9a\u0dd2\u0dbb\u0dd3\u0db8\u0da7 \u0d89\u0d9c\u0dd9\u0db1\u0dd3\u0db8"
} }


@@ -1,4 +1,4 @@
{
"<h1><a href=\"https://nn.labml.ai/adaptive_computation/ponder_net/index.html\">PonderNet: Learning to Ponder</a></h1>\n<p>This is a <a href=\"https://pytorch.org\">PyTorch</a> implementation of the paper <a href=\"https://papers.labml.ai/paper/2107.05407\">PonderNet: Learning to Ponder</a>.</p>\n<p>PonderNet adapts the computation based on the input. It changes the number of steps to take on a recurrent network based on the input. PonderNet learns this with end-to-end gradient descent. </p>\n": "<h1><a href=\"https://nn.labml.ai/adaptive_computation/ponder_net/index.html\">PonderNet\uff1a\u5b66\u4f1a\u601d\u8003</a></h1>\n<p>\u8fd9\u662f <a href=\"https://pytorch.org\">P <a href=\"https://papers.labml.ai/paper/2107.05407\">onderNet\uff1a\u5b66\u4f1a\u601d\u8003</a>\u8bba\u6587\u7684 PyTorch</a> \u5b9e\u73b0\u3002</p>\n<p>PonderNet \u6839\u636e\u8f93\u5165\u8c03\u6574\u8ba1\u7b97\u3002\u5b83\u4f1a\u6839\u636e\u8f93\u5165\u66f4\u6539\u5faa\u73af\u7f51\u7edc\u4e0a\u8981\u6267\u884c\u7684\u6b65\u9aa4\u6570\u3002PonderNet \u901a\u8fc7\u7aef\u5230\u7aef\u68af\u5ea6\u4e0b\u964d\u6765\u5b66\u4e60\u8fd9\u4e00\u70b9\u3002</p>\n", "<h1><a href=\"https://nn.labml.ai/adaptive_computation/ponder_net/index.html\">PonderNet: Learning to Ponder</a></h1>\n<p>This is a <a href=\"https://pytorch.org\">PyTorch</a> implementation of the paper <a href=\"https://arxiv.org/abs/2107.05407\">PonderNet: Learning to Ponder</a>.</p>\n<p>PonderNet adapts the computation based on the input. It changes the number of steps to take on a recurrent network based on the input. PonderNet learns this with end-to-end gradient descent. </p>\n": "<h1><a href=\"https://nn.labml.ai/adaptive_computation/ponder_net/index.html\">PonderNet\uff1a\u5b66\u4f1a\u601d\u8003</a></h1>\n<p>\u8fd9\u662f <a href=\"https://pytorch.org\">P <a href=\"https://arxiv.org/abs/2107.05407\">onderNet\uff1a\u5b66\u4f1a\u601d\u8003</a>\u8bba\u6587\u7684 PyTorch</a> \u5b9e\u73b0\u3002</p>\n<p>PonderNet \u6839\u636e\u8f93\u5165\u8c03\u6574\u8ba1\u7b97\u3002\u5b83\u4f1a\u6839\u636e\u8f93\u5165\u66f4\u6539\u5faa\u73af\u7f51\u7edc\u4e0a\u8981\u6267\u884c\u7684\u6b65\u9aa4\u6570\u3002PonderNet \u901a\u8fc7\u7aef\u5230\u7aef\u68af\u5ea6\u4e0b\u964d\u6765\u5b66\u4e60\u8fd9\u4e00\u70b9\u3002</p>\n",
"PonderNet: Learning to Ponder": "PonderNet\uff1a\u5b66\u4f1a\u601d\u8003" "PonderNet: Learning to Ponder": "PonderNet\uff1a\u5b66\u4f1a\u601d\u8003"
} }
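The PonderNet hunks above again change only the arXiv link. The description they carry ("changes the number of steps to take on a recurrent network based on the input") boils down to a per-step halting probability; a minimal sketch of that loop, assuming a GRU cell and a scalar output head (illustrative only, not the repository's PonderNet module):

    import torch
    from torch import nn

    class PonderSketch(nn.Module):
        # Illustrative pondering loop; module names and sizes are assumptions.
        def __init__(self, d_in: int, d_hidden: int, max_steps: int = 20):
            super().__init__()
            self.cell = nn.GRUCell(d_in, d_hidden)  # recurrent state update
            self.out = nn.Linear(d_hidden, 1)       # per-step prediction y_n
            self.halt = nn.Linear(d_hidden, 1)      # per-step halting probability lambda_n
            self.max_steps = max_steps

        def forward(self, x: torch.Tensor):
            h = x.new_zeros(x.shape[0], self.cell.hidden_size)
            not_halted = x.new_ones(x.shape[0])      # probability of still running
            p, y = [], []
            for _ in range(self.max_steps):
                h = self.cell(x, h)
                lam = torch.sigmoid(self.halt(h))[:, 0]
                p.append(not_halted * lam)           # p_n = lambda_n * prod_{m<n}(1 - lambda_m)
                y.append(self.out(h)[:, 0])
                not_halted = not_halted * (1 - lam)
            # Training weights a per-step loss by p_n (plus a regulariser);
            # at inference the network halts stochastically using lambda_n.
            return torch.stack(p, 1), torch.stack(y, 1)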


@@ -1,5 +1,5 @@
{
"<h1>Capsule Networks</h1>\n<p>This is a <a href=\"https://pytorch.org\">PyTorch</a> implementation/tutorial of <a href=\"https://papers.labml.ai/paper/1710.09829\">Dynamic Routing Between Capsules</a>.</p>\n<p>Capsule network is a neural network architecture that embeds features as capsules and routes them with a voting mechanism to next layer of capsules.</p>\n<p>Unlike in other implementations of models, we&#x27;ve included a sample, because it is difficult to understand some concepts with just the modules. <a href=\"mnist.html\">This is the annotated code for a model that uses capsules to classify MNIST dataset</a></p>\n<p>This file holds the implementations of the core modules of Capsule Networks.</p>\n<p>I used <a href=\"https://github.com/jindongwang/Pytorch-CapsuleNet\">jindongwang/Pytorch-CapsuleNet</a> to clarify some confusions I had with the paper.</p>\n<p>Here&#x27;s a notebook for training a Capsule Network on MNIST dataset.</p>\n<p><a href=\"https://colab.research.google.com/github/labmlai/annotated_deep_learning_paper_implementations/blob/master/labml_nn/capsule_networks/mnist.ipynb\"><span translate=no>_^_0_^_</span></a></p>\n": "<h1>\u30ab\u30d7\u30bb\u30eb\u30cd\u30c3\u30c8\u30ef\u30fc\u30af</h1>\n<p><a href=\"https://papers.labml.ai/paper/1710.09829\">\u3053\u308c\u306f\u3001<a href=\"https://pytorch.org\">\u30ab\u30d7\u30bb\u30eb\u9593\u306e\u52d5\u7684\u30eb\u30fc\u30c6\u30a3\u30f3\u30b0\u306ePyTorch\u5b9f\u88c5/\u30c1\u30e5\u30fc\u30c8\u30ea\u30a2\u30eb\u3067\u3059</a>\u3002</a></p>\n<p>\u30ab\u30d7\u30bb\u30eb\u30cd\u30c3\u30c8\u30ef\u30fc\u30af\u306f\u3001\u30d5\u30a3\u30fc\u30c1\u30e3\u3092\u30ab\u30d7\u30bb\u30eb\u3068\u3057\u3066\u57cb\u3081\u8fbc\u307f\u3001\u6295\u7968\u30e1\u30ab\u30cb\u30ba\u30e0\u3092\u4f7f\u7528\u3057\u3066\u6b21\u306e\u30ab\u30d7\u30bb\u30eb\u5c64\u306b\u30eb\u30fc\u30c6\u30a3\u30f3\u30b0\u3059\u308b\u30cb\u30e5\u30fc\u30e9\u30eb\u30cd\u30c3\u30c8\u30ef\u30fc\u30af\u30a2\u30fc\u30ad\u30c6\u30af\u30c1\u30e3\u3067\u3059\u3002</p>\n<p>\u4ed6\u306e\u30e2\u30c7\u30eb\u306e\u5b9f\u88c5\u3068\u306f\u7570\u306a\u308a\u3001\u30e2\u30b8\u30e5\u30fc\u30eb\u3060\u3051\u3067\u306f\u4e00\u90e8\u306e\u6982\u5ff5\u3092\u7406\u89e3\u3059\u308b\u306e\u304c\u96e3\u3057\u3044\u305f\u3081\u3001\u30b5\u30f3\u30d7\u30eb\u3092\u7528\u610f\u3057\u3066\u3044\u307e\u3059\u3002</p><a href=\"mnist.html\">\u3053\u308c\u306f\u3001\u30ab\u30d7\u30bb\u30eb\u3092\u4f7f\u7528\u3057\u3066 MNIST \u30c7\u30fc\u30bf\u30bb\u30c3\u30c8\u3092\u5206\u985e\u3059\u308b\u30e2\u30c7\u30eb\u306e\u6ce8\u91c8\u4ed8\u304d\u30b3\u30fc\u30c9\u3067\u3059\u3002</a>\n<p>\u3053\u306e\u30d5\u30a1\u30a4\u30eb\u306b\u306f\u3001Capsule Networks \u306e\u30b3\u30a2\u30e2\u30b8\u30e5\u30fc\u30eb\u306e\u5b9f\u88c5\u304c\u683c\u7d0d\u3055\u308c\u3066\u3044\u307e\u3059\u3002</p>\n<p><a href=\"https://github.com/jindongwang/Pytorch-CapsuleNet\">Jindongwang/Pytorch-Capsulenet\u3092\u4f7f\u3063\u3066</a>\u3001\u8ad6\u6587\u306b\u95a2\u3059\u308b\u6df7\u4e71\u3092\u89e3\u6d88\u3057\u307e\u3057\u305f\u3002</p>\n<p>\u3053\u308c\u306f\u3001MNIST\u30c7\u30fc\u30bf\u30bb\u30c3\u30c8\u3067\u30ab\u30d7\u30bb\u30eb\u30cd\u30c3\u30c8\u30ef\u30fc\u30af\u3092\u30c8\u30ec\u30fc\u30cb\u30f3\u30b0\u3059\u308b\u305f\u3081\u306e\u30ce\u30fc\u30c8\u30d6\u30c3\u30af\u3067\u3059\u3002</p>\n<p><a href=\"https://colab.research.google.com/github/labmlai/annotated_deep_learning_paper_implementations/blob/master/labml_nn/capsule_networks/mnist.ipynb\"><span translate=no>_^_0_^_</span></a></p>\n", "<h1>Capsule Networks</h1>\n<p>This 
is a <a href=\"https://pytorch.org\">PyTorch</a> implementation/tutorial of <a href=\"https://arxiv.org/abs/1710.09829\">Dynamic Routing Between Capsules</a>.</p>\n<p>Capsule network is a neural network architecture that embeds features as capsules and routes them with a voting mechanism to next layer of capsules.</p>\n<p>Unlike in other implementations of models, we&#x27;ve included a sample, because it is difficult to understand some concepts with just the modules. <a href=\"mnist.html\">This is the annotated code for a model that uses capsules to classify MNIST dataset</a></p>\n<p>This file holds the implementations of the core modules of Capsule Networks.</p>\n<p>I used <a href=\"https://github.com/jindongwang/Pytorch-CapsuleNet\">jindongwang/Pytorch-CapsuleNet</a> to clarify some confusions I had with the paper.</p>\n<p>Here&#x27;s a notebook for training a Capsule Network on MNIST dataset.</p>\n<p><a href=\"https://colab.research.google.com/github/labmlai/annotated_deep_learning_paper_implementations/blob/master/labml_nn/capsule_networks/mnist.ipynb\"><span translate=no>_^_0_^_</span></a></p>\n": "<h1>\u30ab\u30d7\u30bb\u30eb\u30cd\u30c3\u30c8\u30ef\u30fc\u30af</h1>\n<p><a href=\"https://arxiv.org/abs/1710.09829\">\u3053\u308c\u306f\u3001<a href=\"https://pytorch.org\">\u30ab\u30d7\u30bb\u30eb\u9593\u306e\u52d5\u7684\u30eb\u30fc\u30c6\u30a3\u30f3\u30b0\u306ePyTorch\u5b9f\u88c5/\u30c1\u30e5\u30fc\u30c8\u30ea\u30a2\u30eb\u3067\u3059</a>\u3002</a></p>\n<p>\u30ab\u30d7\u30bb\u30eb\u30cd\u30c3\u30c8\u30ef\u30fc\u30af\u306f\u3001\u30d5\u30a3\u30fc\u30c1\u30e3\u3092\u30ab\u30d7\u30bb\u30eb\u3068\u3057\u3066\u57cb\u3081\u8fbc\u307f\u3001\u6295\u7968\u30e1\u30ab\u30cb\u30ba\u30e0\u3092\u4f7f\u7528\u3057\u3066\u6b21\u306e\u30ab\u30d7\u30bb\u30eb\u5c64\u306b\u30eb\u30fc\u30c6\u30a3\u30f3\u30b0\u3059\u308b\u30cb\u30e5\u30fc\u30e9\u30eb\u30cd\u30c3\u30c8\u30ef\u30fc\u30af\u30a2\u30fc\u30ad\u30c6\u30af\u30c1\u30e3\u3067\u3059\u3002</p>\n<p>\u4ed6\u306e\u30e2\u30c7\u30eb\u306e\u5b9f\u88c5\u3068\u306f\u7570\u306a\u308a\u3001\u30e2\u30b8\u30e5\u30fc\u30eb\u3060\u3051\u3067\u306f\u4e00\u90e8\u306e\u6982\u5ff5\u3092\u7406\u89e3\u3059\u308b\u306e\u304c\u96e3\u3057\u3044\u305f\u3081\u3001\u30b5\u30f3\u30d7\u30eb\u3092\u7528\u610f\u3057\u3066\u3044\u307e\u3059\u3002</p><a href=\"mnist.html\">\u3053\u308c\u306f\u3001\u30ab\u30d7\u30bb\u30eb\u3092\u4f7f\u7528\u3057\u3066 MNIST \u30c7\u30fc\u30bf\u30bb\u30c3\u30c8\u3092\u5206\u985e\u3059\u308b\u30e2\u30c7\u30eb\u306e\u6ce8\u91c8\u4ed8\u304d\u30b3\u30fc\u30c9\u3067\u3059\u3002</a>\n<p>\u3053\u306e\u30d5\u30a1\u30a4\u30eb\u306b\u306f\u3001Capsule Networks \u306e\u30b3\u30a2\u30e2\u30b8\u30e5\u30fc\u30eb\u306e\u5b9f\u88c5\u304c\u683c\u7d0d\u3055\u308c\u3066\u3044\u307e\u3059\u3002</p>\n<p><a href=\"https://github.com/jindongwang/Pytorch-CapsuleNet\">Jindongwang/Pytorch-Capsulenet\u3092\u4f7f\u3063\u3066</a>\u3001\u8ad6\u6587\u306b\u95a2\u3059\u308b\u6df7\u4e71\u3092\u89e3\u6d88\u3057\u307e\u3057\u305f\u3002</p>\n<p>\u3053\u308c\u306f\u3001MNIST\u30c7\u30fc\u30bf\u30bb\u30c3\u30c8\u3067\u30ab\u30d7\u30bb\u30eb\u30cd\u30c3\u30c8\u30ef\u30fc\u30af\u3092\u30c8\u30ec\u30fc\u30cb\u30f3\u30b0\u3059\u308b\u305f\u3081\u306e\u30ce\u30fc\u30c8\u30d6\u30c3\u30af\u3067\u3059\u3002</p>\n<p><a href=\"https://colab.research.google.com/github/labmlai/annotated_deep_learning_paper_implementations/blob/master/labml_nn/capsule_networks/mnist.ipynb\"><span translate=no>_^_0_^_</span></a></p>\n",
"<h2>Margin loss for class existence</h2>\n<p>A separate margin loss is used for each output capsule and the total loss is the sum of them. The length of each output capsule is the probability that class is present in the input.</p>\n<p>Loss for each output capsule or class <span translate=no>_^_0_^_</span> is, <span translate=no>_^_1_^_</span></p>\n<p><span translate=no>_^_2_^_</span> is <span translate=no>_^_3_^_</span> if the class <span translate=no>_^_4_^_</span> is present and <span translate=no>_^_5_^_</span> otherwise. The first component of the loss is <span translate=no>_^_6_^_</span> when the class is not present, and the second component is <span translate=no>_^_7_^_</span> if the class is present. The <span translate=no>_^_8_^_</span> is used to avoid predictions going to extremes. <span translate=no>_^_9_^_</span> is set to be <span translate=no>_^_10_^_</span> and <span translate=no>_^_11_^_</span> to be <span translate=no>_^_12_^_</span> in the paper.</p>\n<p>The <span translate=no>_^_13_^_</span> down-weighting is used to stop the length of all capsules from falling during the initial phase of training.</p>\n": "<h2>\u30af\u30e9\u30b9\u5b58\u5728\u306b\u3088\u308b\u30de\u30fc\u30b8\u30f3\u30ed\u30b9</h2>\n<p>\u51fa\u529b\u30ab\u30d7\u30bb\u30eb\u3054\u3068\u306b\u500b\u5225\u306e\u30de\u30fc\u30b8\u30f3\u30ed\u30b9\u304c\u4f7f\u7528\u3055\u308c\u3001\u5408\u8a08\u640d\u5931\u306f\u305d\u308c\u3089\u306e\u5408\u8a08\u306b\u306a\u308a\u307e\u3059\u3002\u5404\u51fa\u529b\u30ab\u30d7\u30bb\u30eb\u306e\u9577\u3055\u306f\u3001\u5165\u529b\u306b\u30af\u30e9\u30b9\u304c\u5b58\u5728\u3059\u308b\u78ba\u7387\u3067\u3059\u3002</p>\n<p><span translate=no>_^_0_^_</span>\u5404\u51fa\u529b\u30ab\u30d7\u30bb\u30eb\u307e\u305f\u306f\u30af\u30e9\u30b9\u306e\u640d\u5931\u306f\u3001<span translate=no>_^_1_^_</span></p>\n<p><span translate=no>_^_2_^_</span><span translate=no>_^_3_^_</span>\u30af\u30e9\u30b9\u304c\u5b58\u5728\u3059\u308b\u304b\u3069\u3046\u304b\u3001<span translate=no>_^_4_^_</span><span translate=no>_^_5_^_</span>\u305d\u3046\u3067\u306a\u3044\u5834\u5408\u3067\u3059\u3002<span translate=no>_^_6_^_</span>\u640d\u5931\u306e\u6700\u521d\u306e\u8981\u7d20\u306f\u30af\u30e9\u30b9\u304c\u5b58\u5728\u3057\u306a\u3044\u5834\u5408\u3067\u3001<span translate=no>_^_7_^_</span> 2\u756a\u76ee\u306e\u8981\u7d20\u306f\u30af\u30e9\u30b9\u304c\u5b58\u5728\u3059\u308b\u5834\u5408\u3067\u3059\u3002<span translate=no>_^_8_^_</span>\u4e88\u6e2c\u304c\u6975\u7aef\u306b\u306a\u308b\u306e\u3092\u9632\u3050\u305f\u3081\u306b\u4f7f\u7528\u3055\u308c\u307e\u3059\u3002<span translate=no>_^_9_^_</span><span translate=no>_^_10_^_</span><span translate=no>_^_11_^_</span><span translate=no>_^_12_^_</span>\u65b0\u805e\u306b\u63b2\u8f09\u3055\u308c\u308b\u4e88\u5b9a\u3067\u3001\u63b2\u8f09\u3055\u308c\u308b\u4e88\u5b9a\u3067\u3059\u3002</p>\n<p><span translate=no>_^_13_^_</span>\u30c0\u30a6\u30f3\u30a6\u30a8\u30a4\u30c8\u306f\u3001\u30c8\u30ec\u30fc\u30cb\u30f3\u30b0\u306e\u521d\u671f\u6bb5\u968e\u3067\u3059\u3079\u3066\u306e\u30ab\u30d7\u30bb\u30eb\u306e\u9577\u3055\u304c\u843d\u3061\u308b\u306e\u3092\u9632\u3050\u305f\u3081\u306b\u4f7f\u7528\u3055\u308c\u307e\u3059\u3002</p>\n", "<h2>Margin loss for class existence</h2>\n<p>A separate margin loss is used for each output capsule and the total loss is the sum of them. 
The length of each output capsule is the probability that class is present in the input.</p>\n<p>Loss for each output capsule or class <span translate=no>_^_0_^_</span> is, <span translate=no>_^_1_^_</span></p>\n<p><span translate=no>_^_2_^_</span> is <span translate=no>_^_3_^_</span> if the class <span translate=no>_^_4_^_</span> is present and <span translate=no>_^_5_^_</span> otherwise. The first component of the loss is <span translate=no>_^_6_^_</span> when the class is not present, and the second component is <span translate=no>_^_7_^_</span> if the class is present. The <span translate=no>_^_8_^_</span> is used to avoid predictions going to extremes. <span translate=no>_^_9_^_</span> is set to be <span translate=no>_^_10_^_</span> and <span translate=no>_^_11_^_</span> to be <span translate=no>_^_12_^_</span> in the paper.</p>\n<p>The <span translate=no>_^_13_^_</span> down-weighting is used to stop the length of all capsules from falling during the initial phase of training.</p>\n": "<h2>\u30af\u30e9\u30b9\u5b58\u5728\u306b\u3088\u308b\u30de\u30fc\u30b8\u30f3\u30ed\u30b9</h2>\n<p>\u51fa\u529b\u30ab\u30d7\u30bb\u30eb\u3054\u3068\u306b\u500b\u5225\u306e\u30de\u30fc\u30b8\u30f3\u30ed\u30b9\u304c\u4f7f\u7528\u3055\u308c\u3001\u5408\u8a08\u640d\u5931\u306f\u305d\u308c\u3089\u306e\u5408\u8a08\u306b\u306a\u308a\u307e\u3059\u3002\u5404\u51fa\u529b\u30ab\u30d7\u30bb\u30eb\u306e\u9577\u3055\u306f\u3001\u5165\u529b\u306b\u30af\u30e9\u30b9\u304c\u5b58\u5728\u3059\u308b\u78ba\u7387\u3067\u3059\u3002</p>\n<p><span translate=no>_^_0_^_</span>\u5404\u51fa\u529b\u30ab\u30d7\u30bb\u30eb\u307e\u305f\u306f\u30af\u30e9\u30b9\u306e\u640d\u5931\u306f\u3001<span translate=no>_^_1_^_</span></p>\n<p><span translate=no>_^_2_^_</span><span translate=no>_^_3_^_</span>\u30af\u30e9\u30b9\u304c\u5b58\u5728\u3059\u308b\u304b\u3069\u3046\u304b\u3001<span translate=no>_^_4_^_</span><span translate=no>_^_5_^_</span>\u305d\u3046\u3067\u306a\u3044\u5834\u5408\u3067\u3059\u3002<span translate=no>_^_6_^_</span>\u640d\u5931\u306e\u6700\u521d\u306e\u8981\u7d20\u306f\u30af\u30e9\u30b9\u304c\u5b58\u5728\u3057\u306a\u3044\u5834\u5408\u3067\u3001<span translate=no>_^_7_^_</span> 2\u756a\u76ee\u306e\u8981\u7d20\u306f\u30af\u30e9\u30b9\u304c\u5b58\u5728\u3059\u308b\u5834\u5408\u3067\u3059\u3002<span translate=no>_^_8_^_</span>\u4e88\u6e2c\u304c\u6975\u7aef\u306b\u306a\u308b\u306e\u3092\u9632\u3050\u305f\u3081\u306b\u4f7f\u7528\u3055\u308c\u307e\u3059\u3002<span translate=no>_^_9_^_</span><span translate=no>_^_10_^_</span><span translate=no>_^_11_^_</span><span translate=no>_^_12_^_</span>\u65b0\u805e\u306b\u63b2\u8f09\u3055\u308c\u308b\u4e88\u5b9a\u3067\u3001\u63b2\u8f09\u3055\u308c\u308b\u4e88\u5b9a\u3067\u3059\u3002</p>\n<p><span translate=no>_^_13_^_</span>\u30c0\u30a6\u30f3\u30a6\u30a8\u30a4\u30c8\u306f\u3001\u30c8\u30ec\u30fc\u30cb\u30f3\u30b0\u306e\u521d\u671f\u6bb5\u968e\u3067\u3059\u3079\u3066\u306e\u30ab\u30d7\u30bb\u30eb\u306e\u9577\u3055\u304c\u843d\u3061\u308b\u306e\u3092\u9632\u3050\u305f\u3081\u306b\u4f7f\u7528\u3055\u308c\u307e\u3059\u3002</p>\n",
"<h2>Routing Algorithm</h2>\n<p>This is the routing mechanism described in the paper. You can use multiple routing layers in your models.</p>\n<p>This combines calculating <span translate=no>_^_0_^_</span> for this layer and the routing algorithm described in <em>Procedure 1</em>.</p>\n": "<h2>\u30eb\u30fc\u30c6\u30a3\u30f3\u30b0\u30a2\u30eb\u30b4\u30ea\u30ba\u30e0</h2>\n<p>\u3053\u308c\u306f\u3001\u3053\u306e\u30db\u30ef\u30a4\u30c8\u30da\u30fc\u30d1\u30fc\u3067\u8aac\u660e\u3055\u308c\u3066\u3044\u308b\u30eb\u30fc\u30c6\u30a3\u30f3\u30b0\u30e1\u30ab\u30cb\u30ba\u30e0\u3067\u3059\u3002\u30e2\u30c7\u30eb\u3067\u306f\u8907\u6570\u306e\u30eb\u30fc\u30c6\u30a3\u30f3\u30b0\u30ec\u30a4\u30e4\u30fc\u3092\u4f7f\u7528\u3067\u304d\u307e\u3059\u3002</p>\n<p>\u3053\u308c\u306f\u3001<span translate=no>_^_0_^_</span><em>\u3053\u306e\u30ec\u30a4\u30e4\u30fc\u306e\u8a08\u7b97\u3068\u624b\u98061\u3067\u8aac\u660e\u3057\u305f\u30eb\u30fc\u30c6\u30a3\u30f3\u30b0\u30a2\u30eb\u30b4\u30ea\u30ba\u30e0\u3092\u7d44\u307f\u5408\u308f\u305b\u305f\u3082\u306e\u3067\u3059</em>\u3002</p>\n", "<h2>Routing Algorithm</h2>\n<p>This is the routing mechanism described in the paper. You can use multiple routing layers in your models.</p>\n<p>This combines calculating <span translate=no>_^_0_^_</span> for this layer and the routing algorithm described in <em>Procedure 1</em>.</p>\n": "<h2>\u30eb\u30fc\u30c6\u30a3\u30f3\u30b0\u30a2\u30eb\u30b4\u30ea\u30ba\u30e0</h2>\n<p>\u3053\u308c\u306f\u3001\u3053\u306e\u30db\u30ef\u30a4\u30c8\u30da\u30fc\u30d1\u30fc\u3067\u8aac\u660e\u3055\u308c\u3066\u3044\u308b\u30eb\u30fc\u30c6\u30a3\u30f3\u30b0\u30e1\u30ab\u30cb\u30ba\u30e0\u3067\u3059\u3002\u30e2\u30c7\u30eb\u3067\u306f\u8907\u6570\u306e\u30eb\u30fc\u30c6\u30a3\u30f3\u30b0\u30ec\u30a4\u30e4\u30fc\u3092\u4f7f\u7528\u3067\u304d\u307e\u3059\u3002</p>\n<p>\u3053\u308c\u306f\u3001<span translate=no>_^_0_^_</span><em>\u3053\u306e\u30ec\u30a4\u30e4\u30fc\u306e\u8a08\u7b97\u3068\u624b\u98061\u3067\u8aac\u660e\u3057\u305f\u30eb\u30fc\u30c6\u30a3\u30f3\u30b0\u30a2\u30eb\u30b4\u30ea\u30ba\u30e0\u3092\u7d44\u307f\u5408\u308f\u305b\u305f\u3082\u306e\u3067\u3059</em>\u3002</p>\n",
"<h2>Squash</h2>\n<p>This is <strong>squashing</strong> function from paper, given by equation <span translate=no>_^_0_^_</span>.</p>\n<p><span translate=no>_^_1_^_</span></p>\n<p><span translate=no>_^_2_^_</span> normalizes the length of all the capsules, whilst <span translate=no>_^_3_^_</span> shrinks the capsules that have a length smaller than one .</p>\n": "<h2>\u30b9\u30ab\u30c3\u30b7\u30e5</h2>\n<p>\u3053\u308c\u306f\u3001<strong>\u65b9\u7a0b\u5f0f\u3067\u4e0e\u3048\u3089\u308c\u308b\u7d19\u304b\u3089\u306e\u62bc\u3057\u3064\u3076\u3057\u95a2\u6570\u3067\u3059</strong>\u3002<span translate=no>_^_0_^_</span></p>\n<p><span translate=no>_^_1_^_</span></p>\n<p><span translate=no>_^_2_^_</span>\u3059\u3079\u3066\u306e\u30ab\u30d7\u30bb\u30eb\u306e\u9577\u3055\u3092\u6b63\u898f\u5316\u3057\u3001\u9577\u3055\u304c 1 <span translate=no>_^_3_^_</span> \u3088\u308a\u77ed\u3044\u30ab\u30d7\u30bb\u30eb\u3092\u7e2e\u5c0f\u3057\u307e\u3059\u3002</p>\n", "<h2>Squash</h2>\n<p>This is <strong>squashing</strong> function from paper, given by equation <span translate=no>_^_0_^_</span>.</p>\n<p><span translate=no>_^_1_^_</span></p>\n<p><span translate=no>_^_2_^_</span> normalizes the length of all the capsules, whilst <span translate=no>_^_3_^_</span> shrinks the capsules that have a length smaller than one .</p>\n": "<h2>\u30b9\u30ab\u30c3\u30b7\u30e5</h2>\n<p>\u3053\u308c\u306f\u3001<strong>\u65b9\u7a0b\u5f0f\u3067\u4e0e\u3048\u3089\u308c\u308b\u7d19\u304b\u3089\u306e\u62bc\u3057\u3064\u3076\u3057\u95a2\u6570\u3067\u3059</strong>\u3002<span translate=no>_^_0_^_</span></p>\n<p><span translate=no>_^_1_^_</span></p>\n<p><span translate=no>_^_2_^_</span>\u3059\u3079\u3066\u306e\u30ab\u30d7\u30bb\u30eb\u306e\u9577\u3055\u3092\u6b63\u898f\u5316\u3057\u3001\u9577\u3055\u304c 1 <span translate=no>_^_3_^_</span> \u3088\u308a\u77ed\u3044\u30ab\u30d7\u30bb\u30eb\u3092\u7e2e\u5c0f\u3057\u307e\u3059\u3002</p>\n",

File diff suppressed because one or more lines are too long


@@ -1,5 +1,5 @@
{
"<h1>Capsule Networks</h1>\n<p>This is a <a href=\"https://pytorch.org\">PyTorch</a> implementation/tutorial of <a href=\"https://papers.labml.ai/paper/1710.09829\">Dynamic Routing Between Capsules</a>.</p>\n<p>Capsule network is a neural network architecture that embeds features as capsules and routes them with a voting mechanism to next layer of capsules.</p>\n<p>Unlike in other implementations of models, we&#x27;ve included a sample, because it is difficult to understand some concepts with just the modules. <a href=\"mnist.html\">This is the annotated code for a model that uses capsules to classify MNIST dataset</a></p>\n<p>This file holds the implementations of the core modules of Capsule Networks.</p>\n<p>I used <a href=\"https://github.com/jindongwang/Pytorch-CapsuleNet\">jindongwang/Pytorch-CapsuleNet</a> to clarify some confusions I had with the paper.</p>\n<p>Here&#x27;s a notebook for training a Capsule Network on MNIST dataset.</p>\n<p><a href=\"https://colab.research.google.com/github/labmlai/annotated_deep_learning_paper_implementations/blob/master/labml_nn/capsule_networks/mnist.ipynb\"><span translate=no>_^_0_^_</span></a></p>\n": "<h1>\u80f6\u56ca\u7f51\u7edc</h1>\n<p>\u8fd9\u662f<a href=\"https://papers.labml.ai/paper/1710.09829\">\u80f6\u56ca\u95f4\u52a8\u6001\u8def\u7531</a>\u7684 <a href=\"https://pytorch.org\">PyTorch</a> \u5b9e\u73b0/\u6559\u7a0b\u3002</p>\n<p>Capsule \u7f51\u7edc\u662f\u4e00\u79cd\u795e\u7ecf\u7f51\u7edc\u67b6\u6784\uff0c\u5b83\u4ee5\u80f6\u56ca\u7684\u5f62\u5f0f\u5d4c\u5165\u7279\u5f81\uff0c\u5e76\u901a\u8fc7\u6295\u7968\u673a\u5236\u5c06\u5b83\u4eec\u8def\u7531\u5230\u4e0b\u4e00\u5c42\u80f6\u56ca\u3002</p>\n<p>\u4e0e\u5176\u4ed6\u6a21\u578b\u5b9e\u73b0\u4e0d\u540c\uff0c\u6211\u4eec\u63d0\u4f9b\u4e86\u4e00\u4e2a\u793a\u4f8b\uff0c\u56e0\u4e3a\u4ec5\u4f7f\u7528\u6a21\u5757\u5f88\u96be\u7406\u89e3\u67d0\u4e9b\u6982\u5ff5\u3002<a href=\"mnist.html\">\u8fd9\u662f\u4f7f\u7528\u80f6\u56ca\u5bf9 MNIST \u6570\u636e\u96c6\u8fdb\u884c\u5206\u7c7b\u7684\u6a21\u578b\u7684\u5e26\u6ce8\u91ca\u7684\u4ee3\u7801</a></p>\n<p>\u8be5\u6587\u4ef6\u5305\u542b\u4e86 Capsule Networks \u6838\u5fc3\u6a21\u5757\u7684\u5b9e\u73b0\u3002</p>\n<p>\u6211\u7528 <a href=\"https://github.com/jindongwang/Pytorch-CapsuleNet\">jindongwang/pytorch-CapsuleNet</a> \u6765\u6f84\u6e05\u6211\u5bf9\u8fd9\u7bc7\u8bba\u6587\u7684\u4e00\u4e9b\u56f0\u60d1\u3002</p>\n<p>\u8fd9\u662f\u4e00\u672c\u5728 MNIST \u6570\u636e\u96c6\u4e0a\u8bad\u7ec3 Capsule \u7f51\u7edc\u7684\u7b14\u8bb0\u672c\u3002</p>\n<p><a href=\"https://colab.research.google.com/github/labmlai/annotated_deep_learning_paper_implementations/blob/master/labml_nn/capsule_networks/mnist.ipynb\"><span translate=no>_^_0_^_</span></a></p>\n", "<h1>Capsule Networks</h1>\n<p>This is a <a href=\"https://pytorch.org\">PyTorch</a> implementation/tutorial of <a href=\"https://arxiv.org/abs/1710.09829\">Dynamic Routing Between Capsules</a>.</p>\n<p>Capsule network is a neural network architecture that embeds features as capsules and routes them with a voting mechanism to next layer of capsules.</p>\n<p>Unlike in other implementations of models, we&#x27;ve included a sample, because it is difficult to understand some concepts with just the modules. 
<a href=\"mnist.html\">This is the annotated code for a model that uses capsules to classify MNIST dataset</a></p>\n<p>This file holds the implementations of the core modules of Capsule Networks.</p>\n<p>I used <a href=\"https://github.com/jindongwang/Pytorch-CapsuleNet\">jindongwang/Pytorch-CapsuleNet</a> to clarify some confusions I had with the paper.</p>\n<p>Here&#x27;s a notebook for training a Capsule Network on MNIST dataset.</p>\n<p><a href=\"https://colab.research.google.com/github/labmlai/annotated_deep_learning_paper_implementations/blob/master/labml_nn/capsule_networks/mnist.ipynb\"><span translate=no>_^_0_^_</span></a></p>\n": "<h1>\u80f6\u56ca\u7f51\u7edc</h1>\n<p>\u8fd9\u662f<a href=\"https://arxiv.org/abs/1710.09829\">\u80f6\u56ca\u95f4\u52a8\u6001\u8def\u7531</a>\u7684 <a href=\"https://pytorch.org\">PyTorch</a> \u5b9e\u73b0/\u6559\u7a0b\u3002</p>\n<p>Capsule \u7f51\u7edc\u662f\u4e00\u79cd\u795e\u7ecf\u7f51\u7edc\u67b6\u6784\uff0c\u5b83\u4ee5\u80f6\u56ca\u7684\u5f62\u5f0f\u5d4c\u5165\u7279\u5f81\uff0c\u5e76\u901a\u8fc7\u6295\u7968\u673a\u5236\u5c06\u5b83\u4eec\u8def\u7531\u5230\u4e0b\u4e00\u5c42\u80f6\u56ca\u3002</p>\n<p>\u4e0e\u5176\u4ed6\u6a21\u578b\u5b9e\u73b0\u4e0d\u540c\uff0c\u6211\u4eec\u63d0\u4f9b\u4e86\u4e00\u4e2a\u793a\u4f8b\uff0c\u56e0\u4e3a\u4ec5\u4f7f\u7528\u6a21\u5757\u5f88\u96be\u7406\u89e3\u67d0\u4e9b\u6982\u5ff5\u3002<a href=\"mnist.html\">\u8fd9\u662f\u4f7f\u7528\u80f6\u56ca\u5bf9 MNIST \u6570\u636e\u96c6\u8fdb\u884c\u5206\u7c7b\u7684\u6a21\u578b\u7684\u5e26\u6ce8\u91ca\u7684\u4ee3\u7801</a></p>\n<p>\u8be5\u6587\u4ef6\u5305\u542b\u4e86 Capsule Networks \u6838\u5fc3\u6a21\u5757\u7684\u5b9e\u73b0\u3002</p>\n<p>\u6211\u7528 <a href=\"https://github.com/jindongwang/Pytorch-CapsuleNet\">jindongwang/pytorch-CapsuleNet</a> \u6765\u6f84\u6e05\u6211\u5bf9\u8fd9\u7bc7\u8bba\u6587\u7684\u4e00\u4e9b\u56f0\u60d1\u3002</p>\n<p>\u8fd9\u662f\u4e00\u672c\u5728 MNIST \u6570\u636e\u96c6\u4e0a\u8bad\u7ec3 Capsule \u7f51\u7edc\u7684\u7b14\u8bb0\u672c\u3002</p>\n<p><a href=\"https://colab.research.google.com/github/labmlai/annotated_deep_learning_paper_implementations/blob/master/labml_nn/capsule_networks/mnist.ipynb\"><span translate=no>_^_0_^_</span></a></p>\n",
"<h2>Margin loss for class existence</h2>\n<p>A separate margin loss is used for each output capsule and the total loss is the sum of them. The length of each output capsule is the probability that class is present in the input.</p>\n<p>Loss for each output capsule or class <span translate=no>_^_0_^_</span> is, <span translate=no>_^_1_^_</span></p>\n<p><span translate=no>_^_2_^_</span> is <span translate=no>_^_3_^_</span> if the class <span translate=no>_^_4_^_</span> is present and <span translate=no>_^_5_^_</span> otherwise. The first component of the loss is <span translate=no>_^_6_^_</span> when the class is not present, and the second component is <span translate=no>_^_7_^_</span> if the class is present. The <span translate=no>_^_8_^_</span> is used to avoid predictions going to extremes. <span translate=no>_^_9_^_</span> is set to be <span translate=no>_^_10_^_</span> and <span translate=no>_^_11_^_</span> to be <span translate=no>_^_12_^_</span> in the paper.</p>\n<p>The <span translate=no>_^_13_^_</span> down-weighting is used to stop the length of all capsules from falling during the initial phase of training.</p>\n": "<h2>\u9636\u7ea7\u5b58\u5728\u7684\u4fdd\u8bc1\u91d1\u635f\u5931</h2>\n<p>\u6bcf\u4e2a\u8f93\u51fa\u80f6\u56ca\u4f7f\u7528\u5355\u72ec\u7684\u4fdd\u8bc1\u91d1\u635f\u5931\uff0c\u603b\u4e8f\u635f\u662f\u5b83\u4eec\u7684\u603b\u548c\u3002\u6bcf\u4e2a\u8f93\u51fa\u80f6\u56ca\u7684\u957f\u5ea6\u662f\u8f93\u5165\u4e2d\u5b58\u5728\u7c7b\u7684\u6982\u7387\u3002</p>\n<p>\u6bcf\u4e2a\u8f93\u51fa\u80f6\u56ca\u6216\u7c7b\u7684\u635f\u5931<span translate=no>_^_0_^_</span>\u4e3a\uff0c<span translate=no>_^_1_^_</span></p>\n<p><span translate=no>_^_2_^_</span><span translate=no>_^_4_^_</span>\u662f\u7c7b<span translate=no>_^_3_^_</span>\u662f\u5426\u5b58\u5728\uff0c<span translate=no>_^_5_^_</span>\u5426\u5219\u3002\u635f\u5931\u7684\u7b2c\u4e00\u4e2a\u7ec4\u6210\u90e8\u5206\u662f<span translate=no>_^_6_^_</span>\u5f53\u7c7b\u4e0d\u5b58\u5728\u65f6\uff0c\u7b2c\u4e8c\u4e2a\u7ec4\u6210\u90e8\u5206\u662f\u7c7b<span translate=no>_^_7_^_</span>\u662f\u5426\u5b58\u5728\u3002<span translate=no>_^_8_^_</span>\u7528\u4e8e\u907f\u514d\u9884\u6d4b\u8d70\u5411\u6781\u7aef\u3002<span translate=no>_^_9_^_</span>\u88ab\u8bbe\u7f6e<span translate=no>_^_11_^_</span>\u4e3a<span translate=no>_^_10_^_</span>\u548c\u5c06\u5728<span translate=no>_^_12_^_</span>\u62a5\u7eb8\u4e0a\u3002</p>\n<p>\u5728\u8bad\u7ec3<span translate=no>_^_13_^_</span>\u7684\u521d\u59cb\u9636\u6bb5\uff0c\u51cf\u91cd\u7528\u4e8e\u9632\u6b62\u6240\u6709\u80f6\u56ca\u7684\u957f\u5ea6\u6389\u843d\u3002</p>\n", "<h2>Margin loss for class existence</h2>\n<p>A separate margin loss is used for each output capsule and the total loss is the sum of them. The length of each output capsule is the probability that class is present in the input.</p>\n<p>Loss for each output capsule or class <span translate=no>_^_0_^_</span> is, <span translate=no>_^_1_^_</span></p>\n<p><span translate=no>_^_2_^_</span> is <span translate=no>_^_3_^_</span> if the class <span translate=no>_^_4_^_</span> is present and <span translate=no>_^_5_^_</span> otherwise. The first component of the loss is <span translate=no>_^_6_^_</span> when the class is not present, and the second component is <span translate=no>_^_7_^_</span> if the class is present. The <span translate=no>_^_8_^_</span> is used to avoid predictions going to extremes. 
<span translate=no>_^_9_^_</span> is set to be <span translate=no>_^_10_^_</span> and <span translate=no>_^_11_^_</span> to be <span translate=no>_^_12_^_</span> in the paper.</p>\n<p>The <span translate=no>_^_13_^_</span> down-weighting is used to stop the length of all capsules from falling during the initial phase of training.</p>\n": "<h2>\u9636\u7ea7\u5b58\u5728\u7684\u4fdd\u8bc1\u91d1\u635f\u5931</h2>\n<p>\u6bcf\u4e2a\u8f93\u51fa\u80f6\u56ca\u4f7f\u7528\u5355\u72ec\u7684\u4fdd\u8bc1\u91d1\u635f\u5931\uff0c\u603b\u4e8f\u635f\u662f\u5b83\u4eec\u7684\u603b\u548c\u3002\u6bcf\u4e2a\u8f93\u51fa\u80f6\u56ca\u7684\u957f\u5ea6\u662f\u8f93\u5165\u4e2d\u5b58\u5728\u7c7b\u7684\u6982\u7387\u3002</p>\n<p>\u6bcf\u4e2a\u8f93\u51fa\u80f6\u56ca\u6216\u7c7b\u7684\u635f\u5931<span translate=no>_^_0_^_</span>\u4e3a\uff0c<span translate=no>_^_1_^_</span></p>\n<p><span translate=no>_^_2_^_</span><span translate=no>_^_4_^_</span>\u662f\u7c7b<span translate=no>_^_3_^_</span>\u662f\u5426\u5b58\u5728\uff0c<span translate=no>_^_5_^_</span>\u5426\u5219\u3002\u635f\u5931\u7684\u7b2c\u4e00\u4e2a\u7ec4\u6210\u90e8\u5206\u662f<span translate=no>_^_6_^_</span>\u5f53\u7c7b\u4e0d\u5b58\u5728\u65f6\uff0c\u7b2c\u4e8c\u4e2a\u7ec4\u6210\u90e8\u5206\u662f\u7c7b<span translate=no>_^_7_^_</span>\u662f\u5426\u5b58\u5728\u3002<span translate=no>_^_8_^_</span>\u7528\u4e8e\u907f\u514d\u9884\u6d4b\u8d70\u5411\u6781\u7aef\u3002<span translate=no>_^_9_^_</span>\u88ab\u8bbe\u7f6e<span translate=no>_^_11_^_</span>\u4e3a<span translate=no>_^_10_^_</span>\u548c\u5c06\u5728<span translate=no>_^_12_^_</span>\u62a5\u7eb8\u4e0a\u3002</p>\n<p>\u5728\u8bad\u7ec3<span translate=no>_^_13_^_</span>\u7684\u521d\u59cb\u9636\u6bb5\uff0c\u51cf\u91cd\u7528\u4e8e\u9632\u6b62\u6240\u6709\u80f6\u56ca\u7684\u957f\u5ea6\u6389\u843d\u3002</p>\n",
"<h2>Routing Algorithm</h2>\n<p>This is the routing mechanism described in the paper. You can use multiple routing layers in your models.</p>\n<p>This combines calculating <span translate=no>_^_0_^_</span> for this layer and the routing algorithm described in <em>Procedure 1</em>.</p>\n": "<h2>\u8def\u7531\u7b97\u6cd5</h2>\n<p>\u8fd9\u662f\u767d\u76ae\u4e66\u4e2d\u63cf\u8ff0\u7684\u8def\u7531\u673a\u5236\u3002\u53ef\u4ee5\u5728\u6a21\u578b\u4e2d\u4f7f\u7528\u591a\u4e2a\u5e03\u7ebf\u5c42\u3002</p>\n<p>\u8fd9\u7ed3\u5408\u4e86\u6b64\u5c42<span translate=no>_^_0_^_</span>\u7684\u8ba1\u7b97\u548c<em>\u8fc7\u7a0b 1</em> \u4e2d\u63cf\u8ff0\u7684\u8def\u7531\u7b97\u6cd5\u3002</p>\n", "<h2>Routing Algorithm</h2>\n<p>This is the routing mechanism described in the paper. You can use multiple routing layers in your models.</p>\n<p>This combines calculating <span translate=no>_^_0_^_</span> for this layer and the routing algorithm described in <em>Procedure 1</em>.</p>\n": "<h2>\u8def\u7531\u7b97\u6cd5</h2>\n<p>\u8fd9\u662f\u767d\u76ae\u4e66\u4e2d\u63cf\u8ff0\u7684\u8def\u7531\u673a\u5236\u3002\u53ef\u4ee5\u5728\u6a21\u578b\u4e2d\u4f7f\u7528\u591a\u4e2a\u5e03\u7ebf\u5c42\u3002</p>\n<p>\u8fd9\u7ed3\u5408\u4e86\u6b64\u5c42<span translate=no>_^_0_^_</span>\u7684\u8ba1\u7b97\u548c<em>\u8fc7\u7a0b 1</em> \u4e2d\u63cf\u8ff0\u7684\u8def\u7531\u7b97\u6cd5\u3002</p>\n",
"<h2>Squash</h2>\n<p>This is <strong>squashing</strong> function from paper, given by equation <span translate=no>_^_0_^_</span>.</p>\n<p><span translate=no>_^_1_^_</span></p>\n<p><span translate=no>_^_2_^_</span> normalizes the length of all the capsules, whilst <span translate=no>_^_3_^_</span> shrinks the capsules that have a length smaller than one .</p>\n": "<h2>\u58c1\u7403</h2>\n<p>\u8fd9\u662f\u6765\u81ea\u7eb8\u5f20\u7684<strong>\u6324\u538b</strong>\u51fd\u6570\uff0c\u7531\u65b9\u7a0b\u7ed9\u51fa<span translate=no>_^_0_^_</span>\u3002</p>\n<p><span translate=no>_^_1_^_</span></p>\n<p><span translate=no>_^_2_^_</span>\u6807\u51c6\u5316\u6240\u6709\u80f6\u56ca\u7684\u957f\u5ea6\uff0c\u540c\u65f6<span translate=no>_^_3_^_</span>\u7f29\u5c0f\u957f\u5ea6\u5c0f\u4e8e\u4e00\u4e2a\u7684\u80f6\u56ca\u3002</p>\n", "<h2>Squash</h2>\n<p>This is <strong>squashing</strong> function from paper, given by equation <span translate=no>_^_0_^_</span>.</p>\n<p><span translate=no>_^_1_^_</span></p>\n<p><span translate=no>_^_2_^_</span> normalizes the length of all the capsules, whilst <span translate=no>_^_3_^_</span> shrinks the capsules that have a length smaller than one .</p>\n": "<h2>\u58c1\u7403</h2>\n<p>\u8fd9\u662f\u6765\u81ea\u7eb8\u5f20\u7684<strong>\u6324\u538b</strong>\u51fd\u6570\uff0c\u7531\u65b9\u7a0b\u7ed9\u51fa<span translate=no>_^_0_^_</span>\u3002</p>\n<p><span translate=no>_^_1_^_</span></p>\n<p><span translate=no>_^_2_^_</span>\u6807\u51c6\u5316\u6240\u6709\u80f6\u56ca\u7684\u957f\u5ea6\uff0c\u540c\u65f6<span translate=no>_^_3_^_</span>\u7f29\u5c0f\u957f\u5ea6\u5c0f\u4e8e\u4e00\u4e2a\u7684\u80f6\u56ca\u3002</p>\n",


@@ -1,5 +1,5 @@
{
"<h1>Classify MNIST digits with Capsule Networks</h1>\n<p>This is an annotated PyTorch code to classify MNIST digits with PyTorch.</p>\n<p>This paper implements the experiment described in paper <a href=\"https://papers.labml.ai/paper/1710.09829\">Dynamic Routing Between Capsules</a>.</p>\n": "<h1>\u30ab\u30d7\u30bb\u30eb\u30cd\u30c3\u30c8\u30ef\u30fc\u30af\u306b\u3088\u308b MNIST \u30c7\u30a3\u30b8\u30c3\u30c8\u306e\u5206\u985e</h1>\n<p>\u3053\u308c\u306f\u3001MNIST\u306e\u6570\u5b57\u3092PyTorch\u3067\u5206\u985e\u3059\u308b\u305f\u3081\u306e\u30a2\u30ce\u30c6\u30fc\u30b7\u30e7\u30f3\u4ed8\u304d\u306ePyTorch\u30b3\u30fc\u30c9\u3067\u3059\u3002</p>\n<p>\u3053\u306e\u8ad6\u6587\u3067\u306f\u3001\u8ad6\u6587\u300c<a href=\"https://papers.labml.ai/paper/1710.09829\">\u30ab\u30d7\u30bb\u30eb\u9593\u306e\u52d5\u7684\u30eb\u30fc\u30c6\u30a3\u30f3\u30b0</a>\u300d\u3067\u8aac\u660e\u3055\u308c\u3066\u3044\u308b\u5b9f\u9a13\u3092\u5b9f\u88c5\u3057\u3066\u3044\u307e\u3059\u3002</p>\n", "<h1>Classify MNIST digits with Capsule Networks</h1>\n<p>This is an annotated PyTorch code to classify MNIST digits with PyTorch.</p>\n<p>This paper implements the experiment described in paper <a href=\"https://arxiv.org/abs/1710.09829\">Dynamic Routing Between Capsules</a>.</p>\n": "<h1>\u30ab\u30d7\u30bb\u30eb\u30cd\u30c3\u30c8\u30ef\u30fc\u30af\u306b\u3088\u308b MNIST \u30c7\u30a3\u30b8\u30c3\u30c8\u306e\u5206\u985e</h1>\n<p>\u3053\u308c\u306f\u3001MNIST\u306e\u6570\u5b57\u3092PyTorch\u3067\u5206\u985e\u3059\u308b\u305f\u3081\u306e\u30a2\u30ce\u30c6\u30fc\u30b7\u30e7\u30f3\u4ed8\u304d\u306ePyTorch\u30b3\u30fc\u30c9\u3067\u3059\u3002</p>\n<p>\u3053\u306e\u8ad6\u6587\u3067\u306f\u3001\u8ad6\u6587\u300c<a href=\"https://arxiv.org/abs/1710.09829\">\u30ab\u30d7\u30bb\u30eb\u9593\u306e\u52d5\u7684\u30eb\u30fc\u30c6\u30a3\u30f3\u30b0</a>\u300d\u3067\u8aac\u660e\u3055\u308c\u3066\u3044\u308b\u5b9f\u9a13\u3092\u5b9f\u88c5\u3057\u3066\u3044\u307e\u3059\u3002</p>\n",
"<h2>Model for classifying MNIST digits</h2>\n": "<h2>MNIST \u30c7\u30a3\u30b8\u30c3\u30c8\u3092\u5206\u985e\u3059\u308b\u305f\u3081\u306e\u30e2\u30c7\u30eb</h2>\n", "<h2>Model for classifying MNIST digits</h2>\n": "<h2>MNIST \u30c7\u30a3\u30b8\u30c3\u30c8\u3092\u5206\u985e\u3059\u308b\u305f\u3081\u306e\u30e2\u30c7\u30eb</h2>\n",
"<p> <span translate=no>_^_0_^_</span> are the MNIST images, with shape <span translate=no>_^_1_^_</span></p>\n": "<p><span translate=no>_^_0_^_</span>MNIST \u306e\u753b\u50cf\u306f\u5f62\u72b6\u4ed8\u304d\u3067\u3059 <span translate=no>_^_1_^_</span></p>\n", "<p> <span translate=no>_^_0_^_</span> are the MNIST images, with shape <span translate=no>_^_1_^_</span></p>\n": "<p><span translate=no>_^_0_^_</span>MNIST \u306e\u753b\u50cf\u306f\u5f62\u72b6\u4ed8\u304d\u3067\u3059 <span translate=no>_^_1_^_</span></p>\n",
"<p> Configurations with MNIST data and Train &amp; Validation setup</p>\n": "<p>MNIST\u30c7\u30fc\u30bf\u3068\u30c8\u30ec\u30fc\u30cb\u30f3\u30b0\u3068\u691c\u8a3c\u306e\u30bb\u30c3\u30c8\u30a2\u30c3\u30d7\u3092\u542b\u3080\u69cb\u6210</p>\n", "<p> Configurations with MNIST data and Train &amp; Validation setup</p>\n": "<p>MNIST\u30c7\u30fc\u30bf\u3068\u30c8\u30ec\u30fc\u30cb\u30f3\u30b0\u3068\u691c\u8a3c\u306e\u30bb\u30c3\u30c8\u30a2\u30c3\u30d7\u3092\u542b\u3080\u69cb\u6210</p>\n",

View File

@ -1,5 +1,5 @@
{ {
"<h1>Classify MNIST digits with Capsule Networks</h1>\n<p>This is an annotated PyTorch code to classify MNIST digits with PyTorch.</p>\n<p>This paper implements the experiment described in paper <a href=\"https://papers.labml.ai/paper/1710.09829\">Dynamic Routing Between Capsules</a>.</p>\n": "<h1>\u0d9a\u0dd0\u0db4\u0dca\u0dc3\u0dd2\u0dba\u0dd4\u0dbd\u0da2\u0dcf\u0dbd \u0dc3\u0db8\u0d9f MNIST \u0d89\u0dbd\u0d9a\u0dca\u0d9a\u0db8\u0dca \u0dc0\u0dbb\u0dca\u0d9c\u0dd3\u0d9a\u0dbb\u0dab\u0dba \u0d9a\u0dbb\u0db1\u0dca\u0db1</h1>\n<p>\u0db8\u0dd9\u0dbaPyTorch \u0dc3\u0db8\u0d9f MNIST \u0d89\u0dbd\u0d9a\u0dca\u0d9a\u0db8\u0dca \u0dc0\u0dbb\u0dca\u0d9c\u0dd3\u0d9a\u0dbb\u0dab\u0dba \u0d9a\u0dd2\u0dbb\u0dd3\u0db8 \u0dc3\u0db3\u0dc4\u0dcf \u0dc0\u0dd2\u0db1\u0dd3\u0dad \u0db4\u0dba\u0dd2\u0da7\u0ddd\u0da0\u0dca \u0d9a\u0dda\u0dad\u0dba\u0d9a\u0dd2. </p>\n<p>\u0db8\u0dd9\u0db8\u0dbd\u0dd2\u0db4\u0dd2\u0dba \u0d9a\u0da9\u0daf\u0dcf\u0dc3\u0dd2 \u0dc0\u0dd2\u0dc3\u0dca\u0dad\u0dbb \u0d9a\u0dbb \u0d87\u0dad\u0dd2 \u0d85\u0dad\u0dca\u0dc4\u0daf\u0dcf \u0db6\u0dd0\u0dbd\u0dd3\u0db8 \u0d9a\u0dca\u0dbb\u0dd2\u0dba\u0dcf\u0dad\u0dca\u0db8\u0d9a \u0d9a\u0dbb\u0dba\u0dd2 <a href=\"https://papers.labml.ai/paper/1710.09829\">\u0da9\u0dba\u0dd2\u0db1\u0db8\u0dd2\u0d9a\u0dca \u0dbb\u0dc0\u0dd4\u0da7\u0dd2\u0db1\u0dca \u0d9a\u0dd0\u0db4\u0dca\u0dc3\u0dd2\u0dba\u0dd4\u0dbd \u0d85\u0dad\u0dbb</a>. </p>\n", "<h1>Classify MNIST digits with Capsule Networks</h1>\n<p>This is an annotated PyTorch code to classify MNIST digits with PyTorch.</p>\n<p>This paper implements the experiment described in paper <a href=\"https://arxiv.org/abs/1710.09829\">Dynamic Routing Between Capsules</a>.</p>\n": "<h1>\u0d9a\u0dd0\u0db4\u0dca\u0dc3\u0dd2\u0dba\u0dd4\u0dbd\u0da2\u0dcf\u0dbd \u0dc3\u0db8\u0d9f MNIST \u0d89\u0dbd\u0d9a\u0dca\u0d9a\u0db8\u0dca \u0dc0\u0dbb\u0dca\u0d9c\u0dd3\u0d9a\u0dbb\u0dab\u0dba \u0d9a\u0dbb\u0db1\u0dca\u0db1</h1>\n<p>\u0db8\u0dd9\u0dbaPyTorch \u0dc3\u0db8\u0d9f MNIST \u0d89\u0dbd\u0d9a\u0dca\u0d9a\u0db8\u0dca \u0dc0\u0dbb\u0dca\u0d9c\u0dd3\u0d9a\u0dbb\u0dab\u0dba \u0d9a\u0dd2\u0dbb\u0dd3\u0db8 \u0dc3\u0db3\u0dc4\u0dcf \u0dc0\u0dd2\u0db1\u0dd3\u0dad \u0db4\u0dba\u0dd2\u0da7\u0ddd\u0da0\u0dca \u0d9a\u0dda\u0dad\u0dba\u0d9a\u0dd2. </p>\n<p>\u0db8\u0dd9\u0db8\u0dbd\u0dd2\u0db4\u0dd2\u0dba \u0d9a\u0da9\u0daf\u0dcf\u0dc3\u0dd2 \u0dc0\u0dd2\u0dc3\u0dca\u0dad\u0dbb \u0d9a\u0dbb \u0d87\u0dad\u0dd2 \u0d85\u0dad\u0dca\u0dc4\u0daf\u0dcf \u0db6\u0dd0\u0dbd\u0dd3\u0db8 \u0d9a\u0dca\u0dbb\u0dd2\u0dba\u0dcf\u0dad\u0dca\u0db8\u0d9a \u0d9a\u0dbb\u0dba\u0dd2 <a href=\"https://arxiv.org/abs/1710.09829\">\u0da9\u0dba\u0dd2\u0db1\u0db8\u0dd2\u0d9a\u0dca \u0dbb\u0dc0\u0dd4\u0da7\u0dd2\u0db1\u0dca \u0d9a\u0dd0\u0db4\u0dca\u0dc3\u0dd2\u0dba\u0dd4\u0dbd \u0d85\u0dad\u0dbb</a>. </p>\n",
"<h2>Model for classifying MNIST digits</h2>\n": "<h2>MNIST\u0d89\u0dbd\u0d9a\u0dca\u0d9a\u0db8\u0dca \u0dc0\u0dbb\u0dca\u0d9c\u0dd3\u0d9a\u0dbb\u0dab\u0dba \u0d9a\u0dd2\u0dbb\u0dd3\u0db8\u0dda \u0d86\u0d9a\u0dd8\u0dad\u0dd2\u0dba</h2>\n", "<h2>Model for classifying MNIST digits</h2>\n": "<h2>MNIST\u0d89\u0dbd\u0d9a\u0dca\u0d9a\u0db8\u0dca \u0dc0\u0dbb\u0dca\u0d9c\u0dd3\u0d9a\u0dbb\u0dab\u0dba \u0d9a\u0dd2\u0dbb\u0dd3\u0db8\u0dda \u0d86\u0d9a\u0dd8\u0dad\u0dd2\u0dba</h2>\n",
"<p> <span translate=no>_^_0_^_</span> are the MNIST images, with shape <span translate=no>_^_1_^_</span></p>\n": "<p> <span translate=no>_^_0_^_</span> \u0dc4\u0dd0\u0da9\u0dba \u0dc3\u0dc4\u0dd2\u0dad MNIST \u0dbb\u0dd6\u0db4 <span translate=no>_^_1_^_</span></p>\n", "<p> <span translate=no>_^_0_^_</span> are the MNIST images, with shape <span translate=no>_^_1_^_</span></p>\n": "<p> <span translate=no>_^_0_^_</span> \u0dc4\u0dd0\u0da9\u0dba \u0dc3\u0dc4\u0dd2\u0dad MNIST \u0dbb\u0dd6\u0db4 <span translate=no>_^_1_^_</span></p>\n",
"<p> Configurations with MNIST data and Train &amp; Validation setup</p>\n": "<p> MNIST\u0daf\u0dad\u0dca\u0dad \u0dc3\u0dc4 \u0daf\u0dd4\u0db8\u0dca\u0dbb\u0dd2\u0dba \u0dc3\u0dc4 \u0dc0\u0dbd\u0d82\u0d9c\u0dd4 \u0d9a\u0dd2\u0dbb\u0dd3\u0db8\u0dda \u0dc3\u0dd0\u0d9a\u0dc3\u0dd4\u0db8 \u0dc3\u0db8\u0d9f \u0dc0\u0dd2\u0db1\u0dca\u0dba\u0dcf\u0dc3 \u0d9a\u0dd2\u0dbb\u0dd3\u0db8\u0dca</p>\n", "<p> Configurations with MNIST data and Train &amp; Validation setup</p>\n": "<p> MNIST\u0daf\u0dad\u0dca\u0dad \u0dc3\u0dc4 \u0daf\u0dd4\u0db8\u0dca\u0dbb\u0dd2\u0dba \u0dc3\u0dc4 \u0dc0\u0dbd\u0d82\u0d9c\u0dd4 \u0d9a\u0dd2\u0dbb\u0dd3\u0db8\u0dda \u0dc3\u0dd0\u0d9a\u0dc3\u0dd4\u0db8 \u0dc3\u0db8\u0d9f \u0dc0\u0dd2\u0db1\u0dca\u0dba\u0dcf\u0dc3 \u0d9a\u0dd2\u0dbb\u0dd3\u0db8\u0dca</p>\n",

View File

@ -1,5 +1,5 @@
{ {
"<h1>Classify MNIST digits with Capsule Networks</h1>\n<p>This is an annotated PyTorch code to classify MNIST digits with PyTorch.</p>\n<p>This paper implements the experiment described in paper <a href=\"https://papers.labml.ai/paper/1710.09829\">Dynamic Routing Between Capsules</a>.</p>\n": "<h1>\u4f7f\u7528\u80f6\u56ca\u7f51\u7edc\u5bf9 MNIST \u6570\u5b57\u8fdb\u884c\u5206\u7c7b</h1>\n<p>\u8fd9\u662f\u4e00\u4e2a\u5e26\u6ce8\u91ca\u7684 PyTorch \u4ee3\u7801\uff0c\u7528\u4e8e\u4f7f\u7528 PyTorch \u5bf9 MNIST \u6570\u5b57\u8fdb\u884c\u5206\u7c7b\u3002</p>\n<p>\u672c\u6587\u5b9e\u65bd\u4e86\u8bba\u6587\u300a<a href=\"https://papers.labml.ai/paper/1710.09829\">\u80f6\u56ca\u95f4\u52a8\u6001\u8def\u7531</a>\u300b\u4e2d\u63cf\u8ff0\u7684\u5b9e\u9a8c\u3002</p>\n", "<h1>Classify MNIST digits with Capsule Networks</h1>\n<p>This is an annotated PyTorch code to classify MNIST digits with PyTorch.</p>\n<p>This paper implements the experiment described in paper <a href=\"https://arxiv.org/abs/1710.09829\">Dynamic Routing Between Capsules</a>.</p>\n": "<h1>\u4f7f\u7528\u80f6\u56ca\u7f51\u7edc\u5bf9 MNIST \u6570\u5b57\u8fdb\u884c\u5206\u7c7b</h1>\n<p>\u8fd9\u662f\u4e00\u4e2a\u5e26\u6ce8\u91ca\u7684 PyTorch \u4ee3\u7801\uff0c\u7528\u4e8e\u4f7f\u7528 PyTorch \u5bf9 MNIST \u6570\u5b57\u8fdb\u884c\u5206\u7c7b\u3002</p>\n<p>\u672c\u6587\u5b9e\u65bd\u4e86\u8bba\u6587\u300a<a href=\"https://arxiv.org/abs/1710.09829\">\u80f6\u56ca\u95f4\u52a8\u6001\u8def\u7531</a>\u300b\u4e2d\u63cf\u8ff0\u7684\u5b9e\u9a8c\u3002</p>\n",
"<h2>Model for classifying MNIST digits</h2>\n": "<h2>\u7528\u4e8e\u5bf9 MNIST \u6570\u5b57\u8fdb\u884c\u5206\u7c7b\u7684\u6a21\u578b</h2>\n", "<h2>Model for classifying MNIST digits</h2>\n": "<h2>\u7528\u4e8e\u5bf9 MNIST \u6570\u5b57\u8fdb\u884c\u5206\u7c7b\u7684\u6a21\u578b</h2>\n",
"<p> <span translate=no>_^_0_^_</span> are the MNIST images, with shape <span translate=no>_^_1_^_</span></p>\n": "<p><span translate=no>_^_0_^_</span>\u662f MNIST \u56fe\u50cf\uff0c\u6709\u5f62\u72b6<span translate=no>_^_1_^_</span></p>\n", "<p> <span translate=no>_^_0_^_</span> are the MNIST images, with shape <span translate=no>_^_1_^_</span></p>\n": "<p><span translate=no>_^_0_^_</span>\u662f MNIST \u56fe\u50cf\uff0c\u6709\u5f62\u72b6<span translate=no>_^_1_^_</span></p>\n",
"<p> Configurations with MNIST data and Train &amp; Validation setup</p>\n": "<p>\u4f7f\u7528 MNIST \u6570\u636e\u548c\u8bad\u7ec3\u4e0e\u9a8c\u8bc1\u8bbe\u7f6e\u7684\u914d\u7f6e</p>\n", "<p> Configurations with MNIST data and Train &amp; Validation setup</p>\n": "<p>\u4f7f\u7528 MNIST \u6570\u636e\u548c\u8bad\u7ec3\u4e0e\u9a8c\u8bc1\u8bbe\u7f6e\u7684\u914d\u7f6e</p>\n",

View File

@ -1,4 +1,4 @@
{ {
"<h1><a href=\"https://nn.labml.ai/capsule_networks/index.html\">Capsule Networks</a></h1>\n<p>This is a <a href=\"https://pytorch.org\">PyTorch</a> implementation/tutorial of <a href=\"https://papers.labml.ai/paper/1710.09829\">Dynamic Routing Between Capsules</a>.</p>\n<p>Capsule network is a neural network architecture that embeds features as capsules and routes them with a voting mechanism to next layer of capsules.</p>\n<p>Unlike in other implementations of models, we&#x27;ve included a sample, because it is difficult to understand some concepts with just the modules. <a href=\"mnist.html\">This is the annotated code for a model that uses capsules to classify MNIST dataset</a></p>\n<p>This file holds the implementations of the core modules of Capsule Networks.</p>\n<p>I used <a href=\"https://github.com/jindongwang/Pytorch-CapsuleNet\">jindongwang/Pytorch-CapsuleNet</a> to clarify some confusions I had with the paper.</p>\n<p>Here&#x27;s a notebook for training a Capsule Network on MNIST dataset.</p>\n<p><a href=\"https://colab.research.google.com/github/labmlai/annotated_deep_learning_paper_implementations/blob/master/labml_nn/capsule_networks/mnist.ipynb\"><span translate=no>_^_0_^_</span></a> </p>\n": "<h1><a href=\"https://nn.labml.ai/capsule_networks/index.html\">\u30ab\u30d7\u30bb\u30eb\u30cd\u30c3\u30c8\u30ef\u30fc\u30af</a></h1>\n<p><a href=\"https://papers.labml.ai/paper/1710.09829\">\u3053\u308c\u306f\u3001<a href=\"https://pytorch.org\">\u30ab\u30d7\u30bb\u30eb\u9593\u306e\u52d5\u7684\u30eb\u30fc\u30c6\u30a3\u30f3\u30b0\u306ePyTorch\u5b9f\u88c5/\u30c1\u30e5\u30fc\u30c8\u30ea\u30a2\u30eb\u3067\u3059</a>\u3002</a></p>\n<p>\u30ab\u30d7\u30bb\u30eb\u30cd\u30c3\u30c8\u30ef\u30fc\u30af\u306f\u3001\u30d5\u30a3\u30fc\u30c1\u30e3\u3092\u30ab\u30d7\u30bb\u30eb\u3068\u3057\u3066\u57cb\u3081\u8fbc\u307f\u3001\u6295\u7968\u30e1\u30ab\u30cb\u30ba\u30e0\u3092\u4f7f\u7528\u3057\u3066\u6b21\u306e\u30ab\u30d7\u30bb\u30eb\u5c64\u306b\u30eb\u30fc\u30c6\u30a3\u30f3\u30b0\u3059\u308b\u30cb\u30e5\u30fc\u30e9\u30eb\u30cd\u30c3\u30c8\u30ef\u30fc\u30af\u30a2\u30fc\u30ad\u30c6\u30af\u30c1\u30e3\u3067\u3059\u3002</p>\n<p>\u4ed6\u306e\u30e2\u30c7\u30eb\u306e\u5b9f\u88c5\u3068\u306f\u7570\u306a\u308a\u3001\u30e2\u30b8\u30e5\u30fc\u30eb\u3060\u3051\u3067\u306f\u4e00\u90e8\u306e\u6982\u5ff5\u3092\u7406\u89e3\u3059\u308b\u306e\u304c\u96e3\u3057\u3044\u305f\u3081\u3001\u30b5\u30f3\u30d7\u30eb\u3092\u7528\u610f\u3057\u3066\u3044\u307e\u3059\u3002</p><a href=\"mnist.html\">\u3053\u308c\u306f\u3001\u30ab\u30d7\u30bb\u30eb\u3092\u4f7f\u7528\u3057\u3066 MNIST \u30c7\u30fc\u30bf\u30bb\u30c3\u30c8\u3092\u5206\u985e\u3059\u308b\u30e2\u30c7\u30eb\u306e\u6ce8\u91c8\u4ed8\u304d\u30b3\u30fc\u30c9\u3067\u3059\u3002</a>\n<p>\u3053\u306e\u30d5\u30a1\u30a4\u30eb\u306b\u306f\u3001Capsule Networks \u306e\u30b3\u30a2\u30e2\u30b8\u30e5\u30fc\u30eb\u306e\u5b9f\u88c5\u304c\u683c\u7d0d\u3055\u308c\u3066\u3044\u307e\u3059\u3002</p>\n<p><a href=\"https://github.com/jindongwang/Pytorch-CapsuleNet\">Jindongwang/Pytorch-Capsulenet\u3092\u4f7f\u3063\u3066</a>\u3001\u8ad6\u6587\u306b\u95a2\u3059\u308b\u6df7\u4e71\u3092\u89e3\u6d88\u3057\u307e\u3057\u305f\u3002</p>\n<p>\u3053\u308c\u306f\u3001MNIST\u30c7\u30fc\u30bf\u30bb\u30c3\u30c8\u3067\u30ab\u30d7\u30bb\u30eb\u30cd\u30c3\u30c8\u30ef\u30fc\u30af\u3092\u30c8\u30ec\u30fc\u30cb\u30f3\u30b0\u3059\u308b\u305f\u3081\u306e\u30ce\u30fc\u30c8\u30d6\u30c3\u30af\u3067\u3059\u3002</p>\n<p><a 
href=\"https://colab.research.google.com/github/labmlai/annotated_deep_learning_paper_implementations/blob/master/labml_nn/capsule_networks/mnist.ipynb\"><span translate=no>_^_0_^_</span></a></p>\n", "<h1><a href=\"https://nn.labml.ai/capsule_networks/index.html\">Capsule Networks</a></h1>\n<p>This is a <a href=\"https://pytorch.org\">PyTorch</a> implementation/tutorial of <a href=\"https://arxiv.org/abs/1710.09829\">Dynamic Routing Between Capsules</a>.</p>\n<p>Capsule network is a neural network architecture that embeds features as capsules and routes them with a voting mechanism to next layer of capsules.</p>\n<p>Unlike in other implementations of models, we&#x27;ve included a sample, because it is difficult to understand some concepts with just the modules. <a href=\"mnist.html\">This is the annotated code for a model that uses capsules to classify MNIST dataset</a></p>\n<p>This file holds the implementations of the core modules of Capsule Networks.</p>\n<p>I used <a href=\"https://github.com/jindongwang/Pytorch-CapsuleNet\">jindongwang/Pytorch-CapsuleNet</a> to clarify some confusions I had with the paper.</p>\n<p>Here&#x27;s a notebook for training a Capsule Network on MNIST dataset.</p>\n<p><a href=\"https://colab.research.google.com/github/labmlai/annotated_deep_learning_paper_implementations/blob/master/labml_nn/capsule_networks/mnist.ipynb\"><span translate=no>_^_0_^_</span></a> </p>\n": "<h1><a href=\"https://nn.labml.ai/capsule_networks/index.html\">\u30ab\u30d7\u30bb\u30eb\u30cd\u30c3\u30c8\u30ef\u30fc\u30af</a></h1>\n<p><a href=\"https://arxiv.org/abs/1710.09829\">\u3053\u308c\u306f\u3001<a href=\"https://pytorch.org\">\u30ab\u30d7\u30bb\u30eb\u9593\u306e\u52d5\u7684\u30eb\u30fc\u30c6\u30a3\u30f3\u30b0\u306ePyTorch\u5b9f\u88c5/\u30c1\u30e5\u30fc\u30c8\u30ea\u30a2\u30eb\u3067\u3059</a>\u3002</a></p>\n<p>\u30ab\u30d7\u30bb\u30eb\u30cd\u30c3\u30c8\u30ef\u30fc\u30af\u306f\u3001\u30d5\u30a3\u30fc\u30c1\u30e3\u3092\u30ab\u30d7\u30bb\u30eb\u3068\u3057\u3066\u57cb\u3081\u8fbc\u307f\u3001\u6295\u7968\u30e1\u30ab\u30cb\u30ba\u30e0\u3092\u4f7f\u7528\u3057\u3066\u6b21\u306e\u30ab\u30d7\u30bb\u30eb\u5c64\u306b\u30eb\u30fc\u30c6\u30a3\u30f3\u30b0\u3059\u308b\u30cb\u30e5\u30fc\u30e9\u30eb\u30cd\u30c3\u30c8\u30ef\u30fc\u30af\u30a2\u30fc\u30ad\u30c6\u30af\u30c1\u30e3\u3067\u3059\u3002</p>\n<p>\u4ed6\u306e\u30e2\u30c7\u30eb\u306e\u5b9f\u88c5\u3068\u306f\u7570\u306a\u308a\u3001\u30e2\u30b8\u30e5\u30fc\u30eb\u3060\u3051\u3067\u306f\u4e00\u90e8\u306e\u6982\u5ff5\u3092\u7406\u89e3\u3059\u308b\u306e\u304c\u96e3\u3057\u3044\u305f\u3081\u3001\u30b5\u30f3\u30d7\u30eb\u3092\u7528\u610f\u3057\u3066\u3044\u307e\u3059\u3002</p><a href=\"mnist.html\">\u3053\u308c\u306f\u3001\u30ab\u30d7\u30bb\u30eb\u3092\u4f7f\u7528\u3057\u3066 MNIST \u30c7\u30fc\u30bf\u30bb\u30c3\u30c8\u3092\u5206\u985e\u3059\u308b\u30e2\u30c7\u30eb\u306e\u6ce8\u91c8\u4ed8\u304d\u30b3\u30fc\u30c9\u3067\u3059\u3002</a>\n<p>\u3053\u306e\u30d5\u30a1\u30a4\u30eb\u306b\u306f\u3001Capsule Networks \u306e\u30b3\u30a2\u30e2\u30b8\u30e5\u30fc\u30eb\u306e\u5b9f\u88c5\u304c\u683c\u7d0d\u3055\u308c\u3066\u3044\u307e\u3059\u3002</p>\n<p><a 
href=\"https://github.com/jindongwang/Pytorch-CapsuleNet\">Jindongwang/Pytorch-Capsulenet\u3092\u4f7f\u3063\u3066</a>\u3001\u8ad6\u6587\u306b\u95a2\u3059\u308b\u6df7\u4e71\u3092\u89e3\u6d88\u3057\u307e\u3057\u305f\u3002</p>\n<p>\u3053\u308c\u306f\u3001MNIST\u30c7\u30fc\u30bf\u30bb\u30c3\u30c8\u3067\u30ab\u30d7\u30bb\u30eb\u30cd\u30c3\u30c8\u30ef\u30fc\u30af\u3092\u30c8\u30ec\u30fc\u30cb\u30f3\u30b0\u3059\u308b\u305f\u3081\u306e\u30ce\u30fc\u30c8\u30d6\u30c3\u30af\u3067\u3059\u3002</p>\n<p><a href=\"https://colab.research.google.com/github/labmlai/annotated_deep_learning_paper_implementations/blob/master/labml_nn/capsule_networks/mnist.ipynb\"><span translate=no>_^_0_^_</span></a></p>\n",
"Capsule Networks": "\u30ab\u30d7\u30bb\u30eb\u30cd\u30c3\u30c8\u30ef\u30fc\u30af" "Capsule Networks": "\u30ab\u30d7\u30bb\u30eb\u30cd\u30c3\u30c8\u30ef\u30fc\u30af"
} }
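The entries above describe capsules being routed to the next layer "with a voting mechanism". As a rough illustration only (not the repository's code; the tensor shapes, helper names, and iteration count are assumptions), routing-by-agreement can be sketched as:

```python
import torch
import torch.nn.functional as F

def squash(s: torch.Tensor, dim: int = -1) -> torch.Tensor:
    # Scale vectors so long capsules approach unit length and short ones shrink toward zero.
    sq_norm = (s ** 2).sum(dim=dim, keepdim=True)
    return (sq_norm / (1.0 + sq_norm)) * s / torch.sqrt(sq_norm + 1e-8)

def dynamic_routing(u_hat: torch.Tensor, iterations: int = 3) -> torch.Tensor:
    """u_hat: votes from lower capsules, shape [batch, n_in, n_out, d_out].
    Returns the output capsules v, shape [batch, n_out, d_out]."""
    b = torch.zeros(u_hat.shape[:3], device=u_hat.device)   # routing logits [batch, n_in, n_out]
    for _ in range(iterations):
        c = F.softmax(b, dim=-1)                             # coupling coefficients over output capsules
        s = (c.unsqueeze(-1) * u_hat).sum(dim=1)             # weighted sum of votes -> [batch, n_out, d_out]
        v = squash(s)                                        # squashed output capsules
        b = b + (u_hat * v.unsqueeze(1)).sum(dim=-1)         # increase logits where votes agree with outputs
    return v
```

Lower capsules whose votes agree with an output capsule get larger coupling coefficients on the next iteration, which is the "voting" the text refers to.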

File diff suppressed because one or more lines are too long

View File

@ -1,4 +1,4 @@
{ {
"<h1><a href=\"https://nn.labml.ai/capsule_networks/index.html\">Capsule Networks</a></h1>\n<p>This is a <a href=\"https://pytorch.org\">PyTorch</a> implementation/tutorial of <a href=\"https://papers.labml.ai/paper/1710.09829\">Dynamic Routing Between Capsules</a>.</p>\n<p>Capsule network is a neural network architecture that embeds features as capsules and routes them with a voting mechanism to next layer of capsules.</p>\n<p>Unlike in other implementations of models, we&#x27;ve included a sample, because it is difficult to understand some concepts with just the modules. <a href=\"mnist.html\">This is the annotated code for a model that uses capsules to classify MNIST dataset</a></p>\n<p>This file holds the implementations of the core modules of Capsule Networks.</p>\n<p>I used <a href=\"https://github.com/jindongwang/Pytorch-CapsuleNet\">jindongwang/Pytorch-CapsuleNet</a> to clarify some confusions I had with the paper.</p>\n<p>Here&#x27;s a notebook for training a Capsule Network on MNIST dataset.</p>\n<p><a href=\"https://colab.research.google.com/github/labmlai/annotated_deep_learning_paper_implementations/blob/master/labml_nn/capsule_networks/mnist.ipynb\"><span translate=no>_^_0_^_</span></a> </p>\n": "<h1><a href=\"https://nn.labml.ai/capsule_networks/index.html\">\u80f6\u56ca\u7f51\u7edc</a></h1>\n<p>\u8fd9\u662f<a href=\"https://papers.labml.ai/paper/1710.09829\">\u80f6\u56ca\u95f4\u52a8\u6001\u8def\u7531</a>\u7684 <a href=\"https://pytorch.org\">PyTorch</a> \u5b9e\u73b0/\u6559\u7a0b\u3002</p>\n<p>Capsule \u7f51\u7edc\u662f\u4e00\u79cd\u795e\u7ecf\u7f51\u7edc\u67b6\u6784\uff0c\u5b83\u4ee5\u80f6\u56ca\u7684\u5f62\u5f0f\u5d4c\u5165\u7279\u5f81\uff0c\u5e76\u901a\u8fc7\u6295\u7968\u673a\u5236\u5c06\u5b83\u4eec\u8def\u7531\u5230\u4e0b\u4e00\u5c42\u80f6\u56ca\u3002</p>\n<p>\u4e0e\u5176\u4ed6\u6a21\u578b\u5b9e\u73b0\u4e0d\u540c\uff0c\u6211\u4eec\u63d0\u4f9b\u4e86\u4e00\u4e2a\u793a\u4f8b\uff0c\u56e0\u4e3a\u4ec5\u4f7f\u7528\u6a21\u5757\u5f88\u96be\u7406\u89e3\u67d0\u4e9b\u6982\u5ff5\u3002<a href=\"mnist.html\">\u8fd9\u662f\u4f7f\u7528\u80f6\u56ca\u5bf9 MNIST \u6570\u636e\u96c6\u8fdb\u884c\u5206\u7c7b\u7684\u6a21\u578b\u7684\u5e26\u6ce8\u91ca\u7684\u4ee3\u7801</a></p>\n<p>\u8be5\u6587\u4ef6\u5305\u542b\u4e86 Capsule Networks \u6838\u5fc3\u6a21\u5757\u7684\u5b9e\u73b0\u3002</p>\n<p>\u6211\u7528 <a href=\"https://github.com/jindongwang/Pytorch-CapsuleNet\">jindongwang/pytorch-CapsuleNet</a> \u6765\u6f84\u6e05\u6211\u5bf9\u8fd9\u7bc7\u8bba\u6587\u7684\u4e00\u4e9b\u56f0\u60d1\u3002</p>\n<p>\u8fd9\u662f\u4e00\u672c\u5728 MNIST \u6570\u636e\u96c6\u4e0a\u8bad\u7ec3 Capsule \u7f51\u7edc\u7684\u7b14\u8bb0\u672c\u3002</p>\n<p><a href=\"https://colab.research.google.com/github/labmlai/annotated_deep_learning_paper_implementations/blob/master/labml_nn/capsule_networks/mnist.ipynb\"><span translate=no>_^_0_^_</span></a></p>\n", "<h1><a href=\"https://nn.labml.ai/capsule_networks/index.html\">Capsule Networks</a></h1>\n<p>This is a <a href=\"https://pytorch.org\">PyTorch</a> implementation/tutorial of <a href=\"https://arxiv.org/abs/1710.09829\">Dynamic Routing Between Capsules</a>.</p>\n<p>Capsule network is a neural network architecture that embeds features as capsules and routes them with a voting mechanism to next layer of capsules.</p>\n<p>Unlike in other implementations of models, we&#x27;ve included a sample, because it is difficult to understand some concepts with just the modules. 
<a href=\"mnist.html\">This is the annotated code for a model that uses capsules to classify MNIST dataset</a></p>\n<p>This file holds the implementations of the core modules of Capsule Networks.</p>\n<p>I used <a href=\"https://github.com/jindongwang/Pytorch-CapsuleNet\">jindongwang/Pytorch-CapsuleNet</a> to clarify some confusions I had with the paper.</p>\n<p>Here&#x27;s a notebook for training a Capsule Network on MNIST dataset.</p>\n<p><a href=\"https://colab.research.google.com/github/labmlai/annotated_deep_learning_paper_implementations/blob/master/labml_nn/capsule_networks/mnist.ipynb\"><span translate=no>_^_0_^_</span></a> </p>\n": "<h1><a href=\"https://nn.labml.ai/capsule_networks/index.html\">\u80f6\u56ca\u7f51\u7edc</a></h1>\n<p>\u8fd9\u662f<a href=\"https://arxiv.org/abs/1710.09829\">\u80f6\u56ca\u95f4\u52a8\u6001\u8def\u7531</a>\u7684 <a href=\"https://pytorch.org\">PyTorch</a> \u5b9e\u73b0/\u6559\u7a0b\u3002</p>\n<p>Capsule \u7f51\u7edc\u662f\u4e00\u79cd\u795e\u7ecf\u7f51\u7edc\u67b6\u6784\uff0c\u5b83\u4ee5\u80f6\u56ca\u7684\u5f62\u5f0f\u5d4c\u5165\u7279\u5f81\uff0c\u5e76\u901a\u8fc7\u6295\u7968\u673a\u5236\u5c06\u5b83\u4eec\u8def\u7531\u5230\u4e0b\u4e00\u5c42\u80f6\u56ca\u3002</p>\n<p>\u4e0e\u5176\u4ed6\u6a21\u578b\u5b9e\u73b0\u4e0d\u540c\uff0c\u6211\u4eec\u63d0\u4f9b\u4e86\u4e00\u4e2a\u793a\u4f8b\uff0c\u56e0\u4e3a\u4ec5\u4f7f\u7528\u6a21\u5757\u5f88\u96be\u7406\u89e3\u67d0\u4e9b\u6982\u5ff5\u3002<a href=\"mnist.html\">\u8fd9\u662f\u4f7f\u7528\u80f6\u56ca\u5bf9 MNIST \u6570\u636e\u96c6\u8fdb\u884c\u5206\u7c7b\u7684\u6a21\u578b\u7684\u5e26\u6ce8\u91ca\u7684\u4ee3\u7801</a></p>\n<p>\u8be5\u6587\u4ef6\u5305\u542b\u4e86 Capsule Networks \u6838\u5fc3\u6a21\u5757\u7684\u5b9e\u73b0\u3002</p>\n<p>\u6211\u7528 <a href=\"https://github.com/jindongwang/Pytorch-CapsuleNet\">jindongwang/pytorch-CapsuleNet</a> \u6765\u6f84\u6e05\u6211\u5bf9\u8fd9\u7bc7\u8bba\u6587\u7684\u4e00\u4e9b\u56f0\u60d1\u3002</p>\n<p>\u8fd9\u662f\u4e00\u672c\u5728 MNIST \u6570\u636e\u96c6\u4e0a\u8bad\u7ec3 Capsule \u7f51\u7edc\u7684\u7b14\u8bb0\u672c\u3002</p>\n<p><a href=\"https://colab.research.google.com/github/labmlai/annotated_deep_learning_paper_implementations/blob/master/labml_nn/capsule_networks/mnist.ipynb\"><span translate=no>_^_0_^_</span></a></p>\n",
"Capsule Networks": "\u80f6\u56ca\u7f51\u7edc" "Capsule Networks": "\u80f6\u56ca\u7f51\u7edc"
} }
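Every changed line in this commit follows the same pattern: links of the form https://papers.labml.ai/paper/<id> are rewritten to https://arxiv.org/abs/<id> inside the cached translation JSON. A hypothetical sketch of how such a bulk substitution could be scripted follows; the directory name, file glob, and regex are assumptions for illustration, not the tooling actually used:

```python
import re
from pathlib import Path

# Match the old paper links and capture the arXiv-style id (e.g. 1710.09829).
PATTERN = re.compile(r"https://papers\.labml\.ai/paper/([\w.\-]+)")

def fix_file(path: Path) -> None:
    text = path.read_text(encoding="utf-8")
    fixed = PATTERN.sub(r"https://arxiv.org/abs/\1", text)
    if fixed != text:
        path.write_text(fixed, encoding="utf-8")

if __name__ == "__main__":
    # Assumed location of the cached translation JSON files.
    for json_file in Path("translate_cache").rglob("*.json"):
        fix_file(json_file)
```

Because the URLs appear verbatim inside the JSON strings, a plain textual substitution preserves the rest of each entry, including the escaped translations.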

View File

@ -1,5 +1,5 @@
{ {
"<h1>Patches Are All You Need? (ConvMixer)</h1>\n<p>This is a <a href=\"https://pytorch.org\">PyTorch</a> implementation of the paper <a href=\"https://papers.labml.ai/paper/2201.09792\">Patches Are All You Need?</a>.</p>\n<p><span translate=no>_^_0_^_</span></p>\n<p>ConvMixer is Similar to <a href=\"../transformers/mlp_mixer/index.html\">MLP-Mixer</a>. MLP-Mixer separates mixing of spatial and channel dimensions, by applying an MLP across spatial dimension and then an MLP across the channel dimension (spatial MLP replaces the <a href=\"../transformers/vit/index.html\">ViT</a> attention and channel MLP is the <a href=\"../transformers/feed_forward.html\">FFN</a> of ViT).</p>\n<p>ConvMixer uses a <span translate=no>_^_1_^_</span> convolution for channel mixing and a depth-wise convolution for spatial mixing. Since it&#x27;s a convolution instead of a full MLP across the space, it mixes only the nearby batches in contrast to ViT or MLP-Mixer. Also, the MLP-mixer uses MLPs of two layers for each mixing and ConvMixer uses a single layer for each mixing.</p>\n<p>The paper recommends removing the residual connection across the channel mixing (point-wise convolution) and having only a residual connection over the spatial mixing (depth-wise convolution). They also use <a href=\"../normalization/batch_norm/index.html\">Batch normalization</a> instead of <a href=\"../normalization/layer_norm/index.html\">Layer normalization</a>.</p>\n<p>Here&#x27;s <a href=\"experiment.html\">an experiment</a> that trains ConvMixer on CIFAR-10.</p>\n": "<h1>\u5fc5\u8981\u306a\u306e\u306f\u30d1\u30c3\u30c1\u3060\u3051\uff1f(\u30b3\u30f3\u30d0\u30fc\u30b8\u30e7\u30f3\u30df\u30ad\u30b5\u30fc</h1>)\n<p><a href=\"https://pytorch.org\">\u3053\u308c\u306f\u7d19\u306e\u30d1\u30c3\u30c1\u3092PyTorch\u3067\u5b9f\u88c5\u3057\u305f\u3082\u306e\u3067\u3059</a><a href=\"https://papers.labml.ai/paper/2201.09792\">\u3002\u5fc5\u8981\u306a\u306e\u306f\u30d1\u30c3\u30c1\u3060\u3051\u3067\u3059\u304b</a>\uff1f</p>\u3002\n<p><span translate=no>_^_0_^_</span></p>\n<p><a href=\"../transformers/mlp_mixer/index.html\">ConvMixer\u306fMLP\u30df\u30ad\u30b5\u30fc\u306b\u4f3c\u3066\u3044\u307e\u3059\u3002</a></p><a href=\"../transformers/feed_forward.html\">MLP-Mixer\u306f\u3001\u7a7a\u9593\u6b21\u5143\u5168\u4f53\u306bMLP\u3092\u9069\u7528\u3057\u3001\u6b21\u306b\u30c1\u30e3\u30cd\u30eb\u6b21\u5143\u5168\u4f53\u306bMLP\u3092\u9069\u7528\u3059\u308b\u3053\u3068\u3067\u3001\u7a7a\u9593\u6b21\u5143\u3068\u30c1\u30e3\u30cd\u30eb\u6b21\u5143\u306e\u6df7\u5408\u3092\u5206\u96e2\u3057\u307e\u3059\uff08\u7a7a\u9593MLP\u306fvIT\u306e\u30a2\u30c6\u30f3\u30b7\u30e7\u30f3\u306b\u4ee3\u308f\u308a\u3001<a href=\"../transformers/vit/index.html\">\u30c1\u30e3\u30cd\u30ebMLP\u306fVIT\u306eFFN\u3067\u3059</a>\uff09\u3002</a>\n<p>ConvMixer\u306f\u3001<span 
translate=no>_^_1_^_</span>\u30c1\u30e3\u30f3\u30cd\u30eb\u30df\u30ad\u30b7\u30f3\u30b0\u306b\u306f\u30b3\u30f3\u30dc\u30ea\u30e5\u30fc\u30b7\u30e7\u30f3\u3092\u4f7f\u7528\u3057\u3001\u7a7a\u9593\u30df\u30ad\u30b7\u30f3\u30b0\u306b\u306f\u5965\u884c\u304d\u30b3\u30f3\u30dc\u30ea\u30e5\u30fc\u30b7\u30e7\u30f3\u3092\u4f7f\u7528\u3057\u307e\u3059\u3002\u30b9\u30da\u30fc\u30b9\u5168\u4f53\u3067\u30d5\u30ebMLP\u3067\u306f\u306a\u304f\u7573\u307f\u8fbc\u307f\u306a\u306e\u3067\u3001VIT\u3084MLP\u30df\u30ad\u30b5\u30fc\u3068\u306f\u5bfe\u7167\u7684\u306b\u3001\u8fd1\u304f\u306e\u30d0\u30c3\u30c1\u306e\u307f\u3092\u30df\u30ad\u30b7\u30f3\u30b0\u3057\u307e\u3059\u3002\u307e\u305f\u3001MLP\u30df\u30ad\u30b5\u30fc\u306f\u30df\u30ad\u30b7\u30f3\u30b0\u3054\u3068\u306b2\u5c64\u306eMLP\u3092\u4f7f\u7528\u3057\u3001ConvMixer\u306f\u30df\u30ad\u30b7\u30f3\u30b0\u3054\u3068\u306b1\u5c64\u306eMLP\u3092\u4f7f\u7528\u3057\u307e\u3059</p>\u3002\n<p>\u3053\u306e\u8ad6\u6587\u3067\u306f\u3001\u30c1\u30e3\u30cd\u30eb\u30df\u30ad\u30b7\u30f3\u30b0\u5168\u4f53\u306e\u6b8b\u7559\u63a5\u7d9a\u3092\u524a\u9664\u3057\uff08\u70b9\u5358\u4f4d\u306e\u7573\u307f\u8fbc\u307f\uff09\u3001\u7a7a\u9593\u30df\u30ad\u30b7\u30f3\u30b0\u3067\u306f\u6b8b\u7559\u63a5\u7d9a\u306e\u307f\u306b\u3059\u308b\uff08\u6df1\u3055\u65b9\u5411\u306e\u7573\u307f\u8fbc\u307f\uff09\u3053\u3068\u3092\u63a8\u5968\u3057\u3066\u3044\u307e\u3059\u3002\u307e\u305f\u3001</p><a href=\"../normalization/batch_norm/index.html\"><a href=\"../normalization/layer_norm/index.html\">\u30ec\u30a4\u30e4\u30fc\u6b63\u898f\u5316\u306e\u4ee3\u308f\u308a\u306b\u30d0\u30c3\u30c1\u6b63\u898f\u5316\u3092\u4f7f\u7528\u3057\u307e\u3059</a></a>\u3002\n<p>\u3053\u308c\u306f<a href=\"experiment.html\">\u3001CIFAR-10 \u3067 ConvMixer \u3092\u30c8\u30ec\u30fc\u30cb\u30f3\u30b0\u3059\u308b\u5b9f\u9a13\u3067\u3059</a>\u3002</p>\n", "<h1>Patches Are All You Need? (ConvMixer)</h1>\n<p>This is a <a href=\"https://pytorch.org\">PyTorch</a> implementation of the paper <a href=\"https://arxiv.org/abs/2201.09792\">Patches Are All You Need?</a>.</p>\n<p><span translate=no>_^_0_^_</span></p>\n<p>ConvMixer is Similar to <a href=\"../transformers/mlp_mixer/index.html\">MLP-Mixer</a>. MLP-Mixer separates mixing of spatial and channel dimensions, by applying an MLP across spatial dimension and then an MLP across the channel dimension (spatial MLP replaces the <a href=\"../transformers/vit/index.html\">ViT</a> attention and channel MLP is the <a href=\"../transformers/feed_forward.html\">FFN</a> of ViT).</p>\n<p>ConvMixer uses a <span translate=no>_^_1_^_</span> convolution for channel mixing and a depth-wise convolution for spatial mixing. Since it&#x27;s a convolution instead of a full MLP across the space, it mixes only the nearby batches in contrast to ViT or MLP-Mixer. Also, the MLP-mixer uses MLPs of two layers for each mixing and ConvMixer uses a single layer for each mixing.</p>\n<p>The paper recommends removing the residual connection across the channel mixing (point-wise convolution) and having only a residual connection over the spatial mixing (depth-wise convolution). 
They also use <a href=\"../normalization/batch_norm/index.html\">Batch normalization</a> instead of <a href=\"../normalization/layer_norm/index.html\">Layer normalization</a>.</p>\n<p>Here&#x27;s <a href=\"experiment.html\">an experiment</a> that trains ConvMixer on CIFAR-10.</p>\n": "<h1>\u5fc5\u8981\u306a\u306e\u306f\u30d1\u30c3\u30c1\u3060\u3051\uff1f(\u30b3\u30f3\u30d0\u30fc\u30b8\u30e7\u30f3\u30df\u30ad\u30b5\u30fc</h1>)\n<p><a href=\"https://pytorch.org\">\u3053\u308c\u306f\u7d19\u306e\u30d1\u30c3\u30c1\u3092PyTorch\u3067\u5b9f\u88c5\u3057\u305f\u3082\u306e\u3067\u3059</a><a href=\"https://arxiv.org/abs/2201.09792\">\u3002\u5fc5\u8981\u306a\u306e\u306f\u30d1\u30c3\u30c1\u3060\u3051\u3067\u3059\u304b</a>\uff1f</p>\u3002\n<p><span translate=no>_^_0_^_</span></p>\n<p><a href=\"../transformers/mlp_mixer/index.html\">ConvMixer\u306fMLP\u30df\u30ad\u30b5\u30fc\u306b\u4f3c\u3066\u3044\u307e\u3059\u3002</a></p><a href=\"../transformers/feed_forward.html\">MLP-Mixer\u306f\u3001\u7a7a\u9593\u6b21\u5143\u5168\u4f53\u306bMLP\u3092\u9069\u7528\u3057\u3001\u6b21\u306b\u30c1\u30e3\u30cd\u30eb\u6b21\u5143\u5168\u4f53\u306bMLP\u3092\u9069\u7528\u3059\u308b\u3053\u3068\u3067\u3001\u7a7a\u9593\u6b21\u5143\u3068\u30c1\u30e3\u30cd\u30eb\u6b21\u5143\u306e\u6df7\u5408\u3092\u5206\u96e2\u3057\u307e\u3059\uff08\u7a7a\u9593MLP\u306fvIT\u306e\u30a2\u30c6\u30f3\u30b7\u30e7\u30f3\u306b\u4ee3\u308f\u308a\u3001<a href=\"../transformers/vit/index.html\">\u30c1\u30e3\u30cd\u30ebMLP\u306fVIT\u306eFFN\u3067\u3059</a>\uff09\u3002</a>\n<p>ConvMixer\u306f\u3001<span translate=no>_^_1_^_</span>\u30c1\u30e3\u30f3\u30cd\u30eb\u30df\u30ad\u30b7\u30f3\u30b0\u306b\u306f\u30b3\u30f3\u30dc\u30ea\u30e5\u30fc\u30b7\u30e7\u30f3\u3092\u4f7f\u7528\u3057\u3001\u7a7a\u9593\u30df\u30ad\u30b7\u30f3\u30b0\u306b\u306f\u5965\u884c\u304d\u30b3\u30f3\u30dc\u30ea\u30e5\u30fc\u30b7\u30e7\u30f3\u3092\u4f7f\u7528\u3057\u307e\u3059\u3002\u30b9\u30da\u30fc\u30b9\u5168\u4f53\u3067\u30d5\u30ebMLP\u3067\u306f\u306a\u304f\u7573\u307f\u8fbc\u307f\u306a\u306e\u3067\u3001VIT\u3084MLP\u30df\u30ad\u30b5\u30fc\u3068\u306f\u5bfe\u7167\u7684\u306b\u3001\u8fd1\u304f\u306e\u30d0\u30c3\u30c1\u306e\u307f\u3092\u30df\u30ad\u30b7\u30f3\u30b0\u3057\u307e\u3059\u3002\u307e\u305f\u3001MLP\u30df\u30ad\u30b5\u30fc\u306f\u30df\u30ad\u30b7\u30f3\u30b0\u3054\u3068\u306b2\u5c64\u306eMLP\u3092\u4f7f\u7528\u3057\u3001ConvMixer\u306f\u30df\u30ad\u30b7\u30f3\u30b0\u3054\u3068\u306b1\u5c64\u306eMLP\u3092\u4f7f\u7528\u3057\u307e\u3059</p>\u3002\n<p>\u3053\u306e\u8ad6\u6587\u3067\u306f\u3001\u30c1\u30e3\u30cd\u30eb\u30df\u30ad\u30b7\u30f3\u30b0\u5168\u4f53\u306e\u6b8b\u7559\u63a5\u7d9a\u3092\u524a\u9664\u3057\uff08\u70b9\u5358\u4f4d\u306e\u7573\u307f\u8fbc\u307f\uff09\u3001\u7a7a\u9593\u30df\u30ad\u30b7\u30f3\u30b0\u3067\u306f\u6b8b\u7559\u63a5\u7d9a\u306e\u307f\u306b\u3059\u308b\uff08\u6df1\u3055\u65b9\u5411\u306e\u7573\u307f\u8fbc\u307f\uff09\u3053\u3068\u3092\u63a8\u5968\u3057\u3066\u3044\u307e\u3059\u3002\u307e\u305f\u3001</p><a href=\"../normalization/batch_norm/index.html\"><a href=\"../normalization/layer_norm/index.html\">\u30ec\u30a4\u30e4\u30fc\u6b63\u898f\u5316\u306e\u4ee3\u308f\u308a\u306b\u30d0\u30c3\u30c1\u6b63\u898f\u5316\u3092\u4f7f\u7528\u3057\u307e\u3059</a></a>\u3002\n<p>\u3053\u308c\u306f<a href=\"experiment.html\">\u3001CIFAR-10 \u3067 ConvMixer \u3092\u30c8\u30ec\u30fc\u30cb\u30f3\u30b0\u3059\u308b\u5b9f\u9a13\u3067\u3059</a>\u3002</p>\n",
"<h2>ConvMixer</h2>\n<p>This combines the patch embeddings block, a number of ConvMixer layers and a classification head.</p>\n": "<h2>\u30b3\u30f3\u30d0\u30fc\u30b8\u30e7\u30f3\u30df\u30ad\u30b5\u30fc</h2>\n<p>\u3053\u308c\u306b\u3088\u308a\u3001\u30d1\u30c3\u30c1\u57cb\u3081\u8fbc\u307f\u30d6\u30ed\u30c3\u30af\u3001\u591a\u6570\u306e ConvMixer \u30ec\u30a4\u30e4\u30fc\u3001\u304a\u3088\u3073\u5206\u985e\u30d8\u30c3\u30c9\u304c\u7d44\u307f\u5408\u308f\u3055\u308c\u307e\u3059\u3002</p>\n", "<h2>ConvMixer</h2>\n<p>This combines the patch embeddings block, a number of ConvMixer layers and a classification head.</p>\n": "<h2>\u30b3\u30f3\u30d0\u30fc\u30b8\u30e7\u30f3\u30df\u30ad\u30b5\u30fc</h2>\n<p>\u3053\u308c\u306b\u3088\u308a\u3001\u30d1\u30c3\u30c1\u57cb\u3081\u8fbc\u307f\u30d6\u30ed\u30c3\u30af\u3001\u591a\u6570\u306e ConvMixer \u30ec\u30a4\u30e4\u30fc\u3001\u304a\u3088\u3073\u5206\u985e\u30d8\u30c3\u30c9\u304c\u7d44\u307f\u5408\u308f\u3055\u308c\u307e\u3059\u3002</p>\n",
"<p> </p>\n": "<p></p>\n", "<p> </p>\n": "<p></p>\n",
"<p> <a id=\"ClassificationHead\"></a></p>\n<h2>Classification Head</h2>\n<p>They do average pooling (taking the mean of all patch embeddings) and a final linear transformation to predict the log-probabilities of the image classes.</p>\n": "<p><a id=\"ClassificationHead\"></a></p>\n<h2>\u5206\u985e\u8cac\u4efb\u8005</h2>\n<p>\u5e73\u5747\u30d7\u30fc\u30ea\u30f3\u30b0\uff08\u3059\u3079\u3066\u306e\u30d1\u30c3\u30c1\u57cb\u3081\u8fbc\u307f\u306e\u5e73\u5747\u3092\u53d6\u308b\uff09\u3068\u6700\u7d42\u7684\u306a\u7dda\u5f62\u5909\u63db\u3092\u884c\u3063\u3066\u3001\u753b\u50cf\u30af\u30e9\u30b9\u306e\u5bfe\u6570\u78ba\u7387\u3092\u4e88\u6e2c\u3057\u307e\u3059\u3002</p>\n", "<p> <a id=\"ClassificationHead\"></a></p>\n<h2>Classification Head</h2>\n<p>They do average pooling (taking the mean of all patch embeddings) and a final linear transformation to predict the log-probabilities of the image classes.</p>\n": "<p><a id=\"ClassificationHead\"></a></p>\n<h2>\u5206\u985e\u8cac\u4efb\u8005</h2>\n<p>\u5e73\u5747\u30d7\u30fc\u30ea\u30f3\u30b0\uff08\u3059\u3079\u3066\u306e\u30d1\u30c3\u30c1\u57cb\u3081\u8fbc\u307f\u306e\u5e73\u5747\u3092\u53d6\u308b\uff09\u3068\u6700\u7d42\u7684\u306a\u7dda\u5f62\u5909\u63db\u3092\u884c\u3063\u3066\u3001\u753b\u50cf\u30af\u30e9\u30b9\u306e\u5bfe\u6570\u78ba\u7387\u3092\u4e88\u6e2c\u3057\u307e\u3059\u3002</p>\n",

File diff suppressed because one or more lines are too long

View File

@ -1,5 +1,5 @@
{ {
"<h1>Patches Are All You Need? (ConvMixer)</h1>\n<p>This is a <a href=\"https://pytorch.org\">PyTorch</a> implementation of the paper <a href=\"https://papers.labml.ai/paper/2201.09792\">Patches Are All You Need?</a>.</p>\n<p><span translate=no>_^_0_^_</span></p>\n<p>ConvMixer is Similar to <a href=\"../transformers/mlp_mixer/index.html\">MLP-Mixer</a>. MLP-Mixer separates mixing of spatial and channel dimensions, by applying an MLP across spatial dimension and then an MLP across the channel dimension (spatial MLP replaces the <a href=\"../transformers/vit/index.html\">ViT</a> attention and channel MLP is the <a href=\"../transformers/feed_forward.html\">FFN</a> of ViT).</p>\n<p>ConvMixer uses a <span translate=no>_^_1_^_</span> convolution for channel mixing and a depth-wise convolution for spatial mixing. Since it&#x27;s a convolution instead of a full MLP across the space, it mixes only the nearby batches in contrast to ViT or MLP-Mixer. Also, the MLP-mixer uses MLPs of two layers for each mixing and ConvMixer uses a single layer for each mixing.</p>\n<p>The paper recommends removing the residual connection across the channel mixing (point-wise convolution) and having only a residual connection over the spatial mixing (depth-wise convolution). They also use <a href=\"../normalization/batch_norm/index.html\">Batch normalization</a> instead of <a href=\"../normalization/layer_norm/index.html\">Layer normalization</a>.</p>\n<p>Here&#x27;s <a href=\"experiment.html\">an experiment</a> that trains ConvMixer on CIFAR-10.</p>\n": "<h1>\u4f60\u53ea\u9700\u8981\u8865\u4e01\u5417\uff1f\uff08convMixer\uff09</h1>\n<p>\u8fd9\u662f <a href=\"https://pytorch.org\">PyTorch</a> \u5bf9\u8bba\u6587\u300a<a href=\"https://papers.labml.ai/paper/2201.09792\">\u8865\u4e01\u5c31\u662f\u4f60\u6240\u9700\u8981\u7684\uff1f</a>\u300b\u7684\u5b9e\u73b0</p>\u3002\n<p><span translate=no>_^_0_^_</span></p>\n<p>convMixer \u7c7b\u4f3c\u4e8e <a href=\"../transformers/mlp_mixer/index.html\">MLP \u6df7\u97f3\u5668</a>\u3002MLP-Mixer \u901a\u8fc7\u5728\u7a7a\u95f4\u7ef4\u5ea6\u4e0a\u5e94\u7528 MLP\uff0c\u7136\u540e\u5728\u4fe1\u9053\u7ef4\u5ea6\u4e0a\u5e94\u7528 MLP \u6765\u5206\u79bb\u7a7a\u95f4\u7ef4\u5ea6\u548c\u4fe1\u9053\u7ef4\u5ea6\u7684\u6df7\u97f3\uff08\u7a7a\u95f4 MLP \u53d6\u4ee3 <a href=\"../transformers/vit/index.html\">vIT</a> \u6ce8\u610f\u529b\uff0c\u4fe1\u9053 MLP \u662f ViT \u7684 <a href=\"../transformers/feed_forward.html\">FFN</a>\uff09\u3002</p>\n<p>ConvMixer \u4f7f\u7528<span translate=no>_^_1_^_</span>\u5377\u79ef\u8fdb\u884c\u901a\u9053\u6df7\u5408\uff0c\u4f7f\u7528\u6df1\u5ea6\u5377\u79ef\u8fdb\u884c\u7a7a\u95f4\u6df7\u5408\u3002\u7531\u4e8e\u5b83\u662f\u5377\u79ef\u800c\u4e0d\u662f\u6574\u4e2a\u7a7a\u95f4\u7684\u5b8c\u6574\u7684 MLP\uff0c\u56e0\u6b64\u4e0e vIT \u6216 MLP-Mixer \u76f8\u6bd4\uff0c\u5b83\u53ea\u6df7\u5408\u9644\u8fd1\u7684\u6279\u6b21\u3002\u6b64\u5916\uff0cMLP-Mixer \u6bcf\u6b21\u6df7\u5408\u4f7f\u7528\u4e24\u5c42 MLP\uff0cConvMixer \u6bcf\u6b21\u6df7\u5408\u4f7f\u7528\u5355\u5c42\u3002</p>\n<p>\u8be5\u8bba\u6587\u5efa\u8bae\u5220\u9664\u4fe1\u9053\u6df7\u5408\uff08\u9010\u70b9\u5377\u79ef\uff09\u4e0a\u7684\u5269\u4f59\u8fde\u63a5\uff0c\u5728\u7a7a\u95f4\u6df7\u5408\uff08\u6df1\u5ea6\u5377\u79ef\uff09\u4e0a\u4ec5\u4f7f\u7528\u6b8b\u5dee\u8fde\u63a5\u3002\u4ed6\u4eec\u8fd8\u4f7f\u7528<a href=\"../normalization/batch_norm/index.html\">\u6279\u91cf\u6807\u51c6\u5316</a>\u800c\u4e0d\u662f<a 
href=\"../normalization/layer_norm/index.html\">\u56fe\u5c42\u6807\u51c6\u5316</a>\u3002</p>\n<p>\u8fd9\u662f<a href=\"experiment.html\">\u4e00\u9879\u5728 CIFAR-10 \u4e0a\u8bad\u7ec3 ConvMixer \u7684\u5b9e\u9a8c</a>\u3002</p>\n", "<h1>Patches Are All You Need? (ConvMixer)</h1>\n<p>This is a <a href=\"https://pytorch.org\">PyTorch</a> implementation of the paper <a href=\"https://arxiv.org/abs/2201.09792\">Patches Are All You Need?</a>.</p>\n<p><span translate=no>_^_0_^_</span></p>\n<p>ConvMixer is Similar to <a href=\"../transformers/mlp_mixer/index.html\">MLP-Mixer</a>. MLP-Mixer separates mixing of spatial and channel dimensions, by applying an MLP across spatial dimension and then an MLP across the channel dimension (spatial MLP replaces the <a href=\"../transformers/vit/index.html\">ViT</a> attention and channel MLP is the <a href=\"../transformers/feed_forward.html\">FFN</a> of ViT).</p>\n<p>ConvMixer uses a <span translate=no>_^_1_^_</span> convolution for channel mixing and a depth-wise convolution for spatial mixing. Since it&#x27;s a convolution instead of a full MLP across the space, it mixes only the nearby batches in contrast to ViT or MLP-Mixer. Also, the MLP-mixer uses MLPs of two layers for each mixing and ConvMixer uses a single layer for each mixing.</p>\n<p>The paper recommends removing the residual connection across the channel mixing (point-wise convolution) and having only a residual connection over the spatial mixing (depth-wise convolution). They also use <a href=\"../normalization/batch_norm/index.html\">Batch normalization</a> instead of <a href=\"../normalization/layer_norm/index.html\">Layer normalization</a>.</p>\n<p>Here&#x27;s <a href=\"experiment.html\">an experiment</a> that trains ConvMixer on CIFAR-10.</p>\n": "<h1>\u4f60\u53ea\u9700\u8981\u8865\u4e01\u5417\uff1f\uff08convMixer\uff09</h1>\n<p>\u8fd9\u662f <a href=\"https://pytorch.org\">PyTorch</a> \u5bf9\u8bba\u6587\u300a<a href=\"https://arxiv.org/abs/2201.09792\">\u8865\u4e01\u5c31\u662f\u4f60\u6240\u9700\u8981\u7684\uff1f</a>\u300b\u7684\u5b9e\u73b0</p>\u3002\n<p><span translate=no>_^_0_^_</span></p>\n<p>convMixer \u7c7b\u4f3c\u4e8e <a href=\"../transformers/mlp_mixer/index.html\">MLP \u6df7\u97f3\u5668</a>\u3002MLP-Mixer \u901a\u8fc7\u5728\u7a7a\u95f4\u7ef4\u5ea6\u4e0a\u5e94\u7528 MLP\uff0c\u7136\u540e\u5728\u4fe1\u9053\u7ef4\u5ea6\u4e0a\u5e94\u7528 MLP \u6765\u5206\u79bb\u7a7a\u95f4\u7ef4\u5ea6\u548c\u4fe1\u9053\u7ef4\u5ea6\u7684\u6df7\u97f3\uff08\u7a7a\u95f4 MLP \u53d6\u4ee3 <a href=\"../transformers/vit/index.html\">vIT</a> \u6ce8\u610f\u529b\uff0c\u4fe1\u9053 MLP \u662f ViT \u7684 <a href=\"../transformers/feed_forward.html\">FFN</a>\uff09\u3002</p>\n<p>ConvMixer \u4f7f\u7528<span translate=no>_^_1_^_</span>\u5377\u79ef\u8fdb\u884c\u901a\u9053\u6df7\u5408\uff0c\u4f7f\u7528\u6df1\u5ea6\u5377\u79ef\u8fdb\u884c\u7a7a\u95f4\u6df7\u5408\u3002\u7531\u4e8e\u5b83\u662f\u5377\u79ef\u800c\u4e0d\u662f\u6574\u4e2a\u7a7a\u95f4\u7684\u5b8c\u6574\u7684 MLP\uff0c\u56e0\u6b64\u4e0e vIT \u6216 MLP-Mixer \u76f8\u6bd4\uff0c\u5b83\u53ea\u6df7\u5408\u9644\u8fd1\u7684\u6279\u6b21\u3002\u6b64\u5916\uff0cMLP-Mixer \u6bcf\u6b21\u6df7\u5408\u4f7f\u7528\u4e24\u5c42 MLP\uff0cConvMixer 
\u6bcf\u6b21\u6df7\u5408\u4f7f\u7528\u5355\u5c42\u3002</p>\n<p>\u8be5\u8bba\u6587\u5efa\u8bae\u5220\u9664\u4fe1\u9053\u6df7\u5408\uff08\u9010\u70b9\u5377\u79ef\uff09\u4e0a\u7684\u5269\u4f59\u8fde\u63a5\uff0c\u5728\u7a7a\u95f4\u6df7\u5408\uff08\u6df1\u5ea6\u5377\u79ef\uff09\u4e0a\u4ec5\u4f7f\u7528\u6b8b\u5dee\u8fde\u63a5\u3002\u4ed6\u4eec\u8fd8\u4f7f\u7528<a href=\"../normalization/batch_norm/index.html\">\u6279\u91cf\u6807\u51c6\u5316</a>\u800c\u4e0d\u662f<a href=\"../normalization/layer_norm/index.html\">\u56fe\u5c42\u6807\u51c6\u5316</a>\u3002</p>\n<p>\u8fd9\u662f<a href=\"experiment.html\">\u4e00\u9879\u5728 CIFAR-10 \u4e0a\u8bad\u7ec3 ConvMixer \u7684\u5b9e\u9a8c</a>\u3002</p>\n",
"<h2>ConvMixer</h2>\n<p>This combines the patch embeddings block, a number of ConvMixer layers and a classification head.</p>\n": "<h2>\u6df7\u97f3\u5668</h2>\n<p>\u5b83\u7ed3\u5408\u4e86\u8865\u4e01\u5d4c\u5165\u5757\u3001\u8bb8\u591a ConvMixer \u5c42\u548c\u4e00\u4e2a\u5206\u7c7b\u5934\u3002</p>\n", "<h2>ConvMixer</h2>\n<p>This combines the patch embeddings block, a number of ConvMixer layers and a classification head.</p>\n": "<h2>\u6df7\u97f3\u5668</h2>\n<p>\u5b83\u7ed3\u5408\u4e86\u8865\u4e01\u5d4c\u5165\u5757\u3001\u8bb8\u591a ConvMixer \u5c42\u548c\u4e00\u4e2a\u5206\u7c7b\u5934\u3002</p>\n",
"<p> </p>\n": "<p></p>\n", "<p> </p>\n": "<p></p>\n",
"<p> <a id=\"ClassificationHead\"></a></p>\n<h2>Classification Head</h2>\n<p>They do average pooling (taking the mean of all patch embeddings) and a final linear transformation to predict the log-probabilities of the image classes.</p>\n": "<p><a id=\"ClassificationHead\"></a></p>\n<h2>\u5206\u7c7b\u4e3b\u7ba1</h2>\n<p>\u5b83\u4eec\u8fdb\u884c\u5e73\u5747\u6c60\uff08\u53d6\u6240\u6709\u8865\u4e01\u5d4c\u5165\u7684\u5747\u503c\uff09\u548c\u6700\u7ec8\u7684\u7ebf\u6027\u53d8\u6362\u6765\u9884\u6d4b\u5f71\u50cf\u7c7b\u7684\u5bf9\u6570\u6982\u7387\u3002</p>\n", "<p> <a id=\"ClassificationHead\"></a></p>\n<h2>Classification Head</h2>\n<p>They do average pooling (taking the mean of all patch embeddings) and a final linear transformation to predict the log-probabilities of the image classes.</p>\n": "<p><a id=\"ClassificationHead\"></a></p>\n<h2>\u5206\u7c7b\u4e3b\u7ba1</h2>\n<p>\u5b83\u4eec\u8fdb\u884c\u5e73\u5747\u6c60\uff08\u53d6\u6240\u6709\u8865\u4e01\u5d4c\u5165\u7684\u5747\u503c\uff09\u548c\u6700\u7ec8\u7684\u7ebf\u6027\u53d8\u6362\u6765\u9884\u6d4b\u5f71\u50cf\u7c7b\u7684\u5bf9\u6570\u6982\u7387\u3002</p>\n",

View File

@ -1,4 +1,4 @@
{ {
" Patches Are All You Need?": " \u5fc5\u8981\u306a\u306e\u306f\u30d1\u30c3\u30c1\u3060\u3051\uff1f", " Patches Are All You Need?": " \u5fc5\u8981\u306a\u306e\u306f\u30d1\u30c3\u30c1\u3060\u3051\uff1f",
"<h1><a href=\"https://nn.labml.ai/conv_mixer/index.html\">Patches Are All You Need?</a></h1>\n<p>This is a <a href=\"https://pytorch.org\">PyTorch</a> implementation of the paper <a href=\"https://papers.labml.ai/paper/2201.09792\">Patches Are All You Need?</a>.</p>\n<p>ConvMixer is Similar to <a href=\"https://nn.labml.ai/transformers/mlp_mixer/index.html\">MLP-Mixer</a>. MLP-Mixer separates mixing of spatial and channel dimensions, by applying an MLP across spatial dimension and then an MLP across the channel dimension (spatial MLP replaces the <a href=\"https://nn.labml.ai/transformers/vit/index.html\">ViT</a> attention and channel MLP is the <a href=\"https://nn.labml.ai/transformers/feed_forward.html\">FFN</a> of ViT).</p>\n<p>ConvMixer uses a 1x1 convolution for channel mixing and a depth-wise convolution for spatial mixing. Since it&#x27;s a convolution instead of a full MLP across the space, it mixes only the nearby batches in contrast to ViT or MLP-Mixer. Also, the MLP-mixer uses MLPs of two layers for each mixing and ConvMixer uses a single layer for each mixing.</p>\n<p>The paper recommends removing the residual connection across the channel mixing (point-wise convolution) and having only a residual connection over the spatial mixing (depth-wise convolution). They also use <a href=\"https://nn.labml.ai/normalization/batch_norm/index.html\">Batch normalization</a> instead of <a href=\"../normalization/layer_norm/index.html\">Layer normalization</a>.</p>\n<p>Here&#x27;s <a href=\"https://nn.labml.ai/conv_mixer/experiment.html\">an experiment</a> that trains ConvMixer on CIFAR-10. </p>\n": "<h1><a href=\"https://nn.labml.ai/conv_mixer/index.html\">\u5fc5\u8981\u306a\u306e\u306f\u30d1\u30c3\u30c1\u3060\u3051\uff1f</a></h1>\n<p><a href=\"https://pytorch.org\">\u3053\u308c\u306f\u7d19\u306e\u30d1\u30c3\u30c1\u3092PyTorch\u3067\u5b9f\u88c5\u3057\u305f\u3082\u306e\u3067\u3059</a><a href=\"https://papers.labml.ai/paper/2201.09792\">\u3002\u5fc5\u8981\u306a\u306e\u306f\u30d1\u30c3\u30c1\u3060\u3051\u3067\u3059\u304b</a>\uff1f</p>\u3002\n<p><a href=\"https://nn.labml.ai/transformers/mlp_mixer/index.html\">ConvMixer\u306fMLP\u30df\u30ad\u30b5\u30fc\u306b\u4f3c\u3066\u3044\u307e\u3059\u3002</a></p><a href=\"https://nn.labml.ai/transformers/feed_forward.html\">MLP-Mixer\u306f\u3001\u7a7a\u9593\u6b21\u5143\u5168\u4f53\u306bMLP\u3092\u9069\u7528\u3057\u3001\u6b21\u306b\u30c1\u30e3\u30cd\u30eb\u6b21\u5143\u5168\u4f53\u306bMLP\u3092\u9069\u7528\u3059\u308b\u3053\u3068\u3067\u3001\u7a7a\u9593\u6b21\u5143\u3068\u30c1\u30e3\u30cd\u30eb\u6b21\u5143\u306e\u6df7\u5408\u3092\u5206\u96e2\u3057\u307e\u3059\uff08\u7a7a\u9593MLP\u306fvIT\u306e\u30a2\u30c6\u30f3\u30b7\u30e7\u30f3\u306b\u4ee3\u308f\u308a\u3001<a 
href=\"https://nn.labml.ai/transformers/vit/index.html\">\u30c1\u30e3\u30cd\u30ebMLP\u306fVIT\u306eFFN\u3067\u3059</a>\uff09\u3002</a>\n<p>ConvMixer\u306f\u3001\u30c1\u30e3\u30f3\u30cd\u30eb\u30df\u30ad\u30b7\u30f3\u30b0\u306b1x1\u306e\u30b3\u30f3\u30dc\u30ea\u30e5\u30fc\u30b7\u30e7\u30f3\u3092\u4f7f\u7528\u3057\u3001\u7a7a\u9593\u30df\u30ad\u30b7\u30f3\u30b0\u306b\u5965\u884c\u304d\u30b3\u30f3\u30dc\u30ea\u30e5\u30fc\u30b7\u30e7\u30f3\u3092\u4f7f\u7528\u3057\u307e\u3059\u3002\u30b9\u30da\u30fc\u30b9\u5168\u4f53\u3067\u30d5\u30ebMLP\u3067\u306f\u306a\u304f\u7573\u307f\u8fbc\u307f\u306a\u306e\u3067\u3001VIT\u3084MLP\u30df\u30ad\u30b5\u30fc\u3068\u306f\u5bfe\u7167\u7684\u306b\u3001\u8fd1\u304f\u306e\u30d0\u30c3\u30c1\u306e\u307f\u3092\u30df\u30ad\u30b7\u30f3\u30b0\u3057\u307e\u3059\u3002\u307e\u305f\u3001MLP\u30df\u30ad\u30b5\u30fc\u306f\u30df\u30ad\u30b7\u30f3\u30b0\u3054\u3068\u306b2\u5c64\u306eMLP\u3092\u4f7f\u7528\u3057\u3001ConvMixer\u306f\u30df\u30ad\u30b7\u30f3\u30b0\u3054\u3068\u306b1\u5c64\u306eMLP\u3092\u4f7f\u7528\u3057\u307e\u3059</p>\u3002\n<p>\u3053\u306e\u8ad6\u6587\u3067\u306f\u3001\u30c1\u30e3\u30cd\u30eb\u30df\u30ad\u30b7\u30f3\u30b0\u5168\u4f53\u306e\u6b8b\u7559\u63a5\u7d9a\u3092\u524a\u9664\u3057\uff08\u70b9\u5358\u4f4d\u306e\u7573\u307f\u8fbc\u307f\uff09\u3001\u7a7a\u9593\u30df\u30ad\u30b7\u30f3\u30b0\u3067\u306f\u6b8b\u7559\u63a5\u7d9a\u306e\u307f\u306b\u3059\u308b\uff08\u6df1\u3055\u65b9\u5411\u306e\u7573\u307f\u8fbc\u307f\uff09\u3053\u3068\u3092\u63a8\u5968\u3057\u3066\u3044\u307e\u3059\u3002\u307e\u305f\u3001</p><a href=\"https://nn.labml.ai/normalization/batch_norm/index.html\"><a href=\"../normalization/layer_norm/index.html\">\u30ec\u30a4\u30e4\u30fc\u6b63\u898f\u5316\u306e\u4ee3\u308f\u308a\u306b\u30d0\u30c3\u30c1\u6b63\u898f\u5316\u3092\u4f7f\u7528\u3057\u307e\u3059</a></a>\u3002\n<p>\u3053\u308c\u306f<a href=\"https://nn.labml.ai/conv_mixer/experiment.html\">\u3001CIFAR-10 \u3067 ConvMixer \u3092\u30c8\u30ec\u30fc\u30cb\u30f3\u30b0\u3059\u308b\u5b9f\u9a13\u3067\u3059</a>\u3002</p>\n" "<h1><a href=\"https://nn.labml.ai/conv_mixer/index.html\">Patches Are All You Need?</a></h1>\n<p>This is a <a href=\"https://pytorch.org\">PyTorch</a> implementation of the paper <a href=\"https://arxiv.org/abs/2201.09792\">Patches Are All You Need?</a>.</p>\n<p>ConvMixer is Similar to <a href=\"https://nn.labml.ai/transformers/mlp_mixer/index.html\">MLP-Mixer</a>. MLP-Mixer separates mixing of spatial and channel dimensions, by applying an MLP across spatial dimension and then an MLP across the channel dimension (spatial MLP replaces the <a href=\"https://nn.labml.ai/transformers/vit/index.html\">ViT</a> attention and channel MLP is the <a href=\"https://nn.labml.ai/transformers/feed_forward.html\">FFN</a> of ViT).</p>\n<p>ConvMixer uses a 1x1 convolution for channel mixing and a depth-wise convolution for spatial mixing. Since it&#x27;s a convolution instead of a full MLP across the space, it mixes only the nearby batches in contrast to ViT or MLP-Mixer. Also, the MLP-mixer uses MLPs of two layers for each mixing and ConvMixer uses a single layer for each mixing.</p>\n<p>The paper recommends removing the residual connection across the channel mixing (point-wise convolution) and having only a residual connection over the spatial mixing (depth-wise convolution). 
They also use <a href=\"https://nn.labml.ai/normalization/batch_norm/index.html\">Batch normalization</a> instead of <a href=\"../normalization/layer_norm/index.html\">Layer normalization</a>.</p>\n<p>Here&#x27;s <a href=\"https://nn.labml.ai/conv_mixer/experiment.html\">an experiment</a> that trains ConvMixer on CIFAR-10. </p>\n": "<h1><a href=\"https://nn.labml.ai/conv_mixer/index.html\">\u5fc5\u8981\u306a\u306e\u306f\u30d1\u30c3\u30c1\u3060\u3051\uff1f</a></h1>\n<p><a href=\"https://pytorch.org\">\u3053\u308c\u306f\u7d19\u306e\u30d1\u30c3\u30c1\u3092PyTorch\u3067\u5b9f\u88c5\u3057\u305f\u3082\u306e\u3067\u3059</a><a href=\"https://arxiv.org/abs/2201.09792\">\u3002\u5fc5\u8981\u306a\u306e\u306f\u30d1\u30c3\u30c1\u3060\u3051\u3067\u3059\u304b</a>\uff1f</p>\u3002\n<p><a href=\"https://nn.labml.ai/transformers/mlp_mixer/index.html\">ConvMixer\u306fMLP\u30df\u30ad\u30b5\u30fc\u306b\u4f3c\u3066\u3044\u307e\u3059\u3002</a></p><a href=\"https://nn.labml.ai/transformers/feed_forward.html\">MLP-Mixer\u306f\u3001\u7a7a\u9593\u6b21\u5143\u5168\u4f53\u306bMLP\u3092\u9069\u7528\u3057\u3001\u6b21\u306b\u30c1\u30e3\u30cd\u30eb\u6b21\u5143\u5168\u4f53\u306bMLP\u3092\u9069\u7528\u3059\u308b\u3053\u3068\u3067\u3001\u7a7a\u9593\u6b21\u5143\u3068\u30c1\u30e3\u30cd\u30eb\u6b21\u5143\u306e\u6df7\u5408\u3092\u5206\u96e2\u3057\u307e\u3059\uff08\u7a7a\u9593MLP\u306fvIT\u306e\u30a2\u30c6\u30f3\u30b7\u30e7\u30f3\u306b\u4ee3\u308f\u308a\u3001<a href=\"https://nn.labml.ai/transformers/vit/index.html\">\u30c1\u30e3\u30cd\u30ebMLP\u306fVIT\u306eFFN\u3067\u3059</a>\uff09\u3002</a>\n<p>ConvMixer\u306f\u3001\u30c1\u30e3\u30f3\u30cd\u30eb\u30df\u30ad\u30b7\u30f3\u30b0\u306b1x1\u306e\u30b3\u30f3\u30dc\u30ea\u30e5\u30fc\u30b7\u30e7\u30f3\u3092\u4f7f\u7528\u3057\u3001\u7a7a\u9593\u30df\u30ad\u30b7\u30f3\u30b0\u306b\u5965\u884c\u304d\u30b3\u30f3\u30dc\u30ea\u30e5\u30fc\u30b7\u30e7\u30f3\u3092\u4f7f\u7528\u3057\u307e\u3059\u3002\u30b9\u30da\u30fc\u30b9\u5168\u4f53\u3067\u30d5\u30ebMLP\u3067\u306f\u306a\u304f\u7573\u307f\u8fbc\u307f\u306a\u306e\u3067\u3001VIT\u3084MLP\u30df\u30ad\u30b5\u30fc\u3068\u306f\u5bfe\u7167\u7684\u306b\u3001\u8fd1\u304f\u306e\u30d0\u30c3\u30c1\u306e\u307f\u3092\u30df\u30ad\u30b7\u30f3\u30b0\u3057\u307e\u3059\u3002\u307e\u305f\u3001MLP\u30df\u30ad\u30b5\u30fc\u306f\u30df\u30ad\u30b7\u30f3\u30b0\u3054\u3068\u306b2\u5c64\u306eMLP\u3092\u4f7f\u7528\u3057\u3001ConvMixer\u306f\u30df\u30ad\u30b7\u30f3\u30b0\u3054\u3068\u306b1\u5c64\u306eMLP\u3092\u4f7f\u7528\u3057\u307e\u3059</p>\u3002\n<p>\u3053\u306e\u8ad6\u6587\u3067\u306f\u3001\u30c1\u30e3\u30cd\u30eb\u30df\u30ad\u30b7\u30f3\u30b0\u5168\u4f53\u306e\u6b8b\u7559\u63a5\u7d9a\u3092\u524a\u9664\u3057\uff08\u70b9\u5358\u4f4d\u306e\u7573\u307f\u8fbc\u307f\uff09\u3001\u7a7a\u9593\u30df\u30ad\u30b7\u30f3\u30b0\u3067\u306f\u6b8b\u7559\u63a5\u7d9a\u306e\u307f\u306b\u3059\u308b\uff08\u6df1\u3055\u65b9\u5411\u306e\u7573\u307f\u8fbc\u307f\uff09\u3053\u3068\u3092\u63a8\u5968\u3057\u3066\u3044\u307e\u3059\u3002\u307e\u305f\u3001</p><a href=\"https://nn.labml.ai/normalization/batch_norm/index.html\"><a href=\"../normalization/layer_norm/index.html\">\u30ec\u30a4\u30e4\u30fc\u6b63\u898f\u5316\u306e\u4ee3\u308f\u308a\u306b\u30d0\u30c3\u30c1\u6b63\u898f\u5316\u3092\u4f7f\u7528\u3057\u307e\u3059</a></a>\u3002\n<p>\u3053\u308c\u306f<a href=\"https://nn.labml.ai/conv_mixer/experiment.html\">\u3001CIFAR-10 \u3067 ConvMixer \u3092\u30c8\u30ec\u30fc\u30cb\u30f3\u30b0\u3059\u308b\u5b9f\u9a13\u3067\u3059</a>\u3002</p>\n"
} }

File diff suppressed because one or more lines are too long

View File

@ -1,4 +1,4 @@
{ {
" Patches Are All You Need?": " \u8865\u4e01\u662f\u4f60\u6240\u9700\u8981\u7684\u5417\uff1f", " Patches Are All You Need?": " \u8865\u4e01\u662f\u4f60\u6240\u9700\u8981\u7684\u5417\uff1f",
"<h1><a href=\"https://nn.labml.ai/conv_mixer/index.html\">Patches Are All You Need?</a></h1>\n<p>This is a <a href=\"https://pytorch.org\">PyTorch</a> implementation of the paper <a href=\"https://papers.labml.ai/paper/2201.09792\">Patches Are All You Need?</a>.</p>\n<p>ConvMixer is Similar to <a href=\"https://nn.labml.ai/transformers/mlp_mixer/index.html\">MLP-Mixer</a>. MLP-Mixer separates mixing of spatial and channel dimensions, by applying an MLP across spatial dimension and then an MLP across the channel dimension (spatial MLP replaces the <a href=\"https://nn.labml.ai/transformers/vit/index.html\">ViT</a> attention and channel MLP is the <a href=\"https://nn.labml.ai/transformers/feed_forward.html\">FFN</a> of ViT).</p>\n<p>ConvMixer uses a 1x1 convolution for channel mixing and a depth-wise convolution for spatial mixing. Since it&#x27;s a convolution instead of a full MLP across the space, it mixes only the nearby batches in contrast to ViT or MLP-Mixer. Also, the MLP-mixer uses MLPs of two layers for each mixing and ConvMixer uses a single layer for each mixing.</p>\n<p>The paper recommends removing the residual connection across the channel mixing (point-wise convolution) and having only a residual connection over the spatial mixing (depth-wise convolution). They also use <a href=\"https://nn.labml.ai/normalization/batch_norm/index.html\">Batch normalization</a> instead of <a href=\"../normalization/layer_norm/index.html\">Layer normalization</a>.</p>\n<p>Here&#x27;s <a href=\"https://nn.labml.ai/conv_mixer/experiment.html\">an experiment</a> that trains ConvMixer on CIFAR-10. </p>\n": "<h1><a href=\"https://nn.labml.ai/conv_mixer/index.html\">\u4f60\u53ea\u9700\u8981\u8865\u4e01\u5417\uff1f</a></h1>\n<p>\u8fd9\u662f <a href=\"https://pytorch.org\">PyTorch</a> \u5bf9\u8bba\u6587\u300a<a href=\"https://papers.labml.ai/paper/2201.09792\">\u8865\u4e01\u5c31\u662f\u4f60\u6240\u9700\u8981\u7684\uff1f</a>\u300b\u7684\u5b9e\u73b0</p>\u3002\n<p>convMixer \u7c7b\u4f3c\u4e8e <a href=\"https://nn.labml.ai/transformers/mlp_mixer/index.html\">MLP \u6df7\u97f3\u5668</a>\u3002MLP-Mixer \u901a\u8fc7\u5728\u7a7a\u95f4\u7ef4\u5ea6\u4e0a\u5e94\u7528 MLP\uff0c\u7136\u540e\u5728\u4fe1\u9053\u7ef4\u5ea6\u4e0a\u5e94\u7528 MLP \u6765\u5206\u79bb\u7a7a\u95f4\u7ef4\u5ea6\u548c\u4fe1\u9053\u7ef4\u5ea6\u7684\u6df7\u97f3\uff08\u7a7a\u95f4 MLP \u53d6\u4ee3 <a href=\"https://nn.labml.ai/transformers/vit/index.html\">vIT</a> \u6ce8\u610f\u529b\uff0c\u4fe1\u9053 MLP \u662f ViT \u7684 <a href=\"https://nn.labml.ai/transformers/feed_forward.html\">FFN</a>\uff09\u3002</p>\n<p>ConvMixer \u4f7f\u7528 1x1 \u5377\u79ef\u8fdb\u884c\u901a\u9053\u6df7\u5408\uff0c\u4f7f\u7528\u6df1\u5ea6\u5377\u79ef\u8fdb\u884c\u7a7a\u95f4\u6df7\u5408\u3002\u7531\u4e8e\u5b83\u662f\u5377\u79ef\u800c\u4e0d\u662f\u6574\u4e2a\u7a7a\u95f4\u7684\u5b8c\u6574\u7684 MLP\uff0c\u56e0\u6b64\u4e0e vIT \u6216 MLP-Mixer \u76f8\u6bd4\uff0c\u5b83\u53ea\u6df7\u5408\u9644\u8fd1\u7684\u6279\u6b21\u3002\u6b64\u5916\uff0cMLP-Mixer \u6bcf\u6b21\u6df7\u5408\u4f7f\u7528\u4e24\u5c42 MLP\uff0cConvMixer \u6bcf\u6b21\u6df7\u5408\u4f7f\u7528\u5355\u5c42\u3002</p>\n<p>\u8be5\u8bba\u6587\u5efa\u8bae\u5220\u9664\u4fe1\u9053\u6df7\u5408\uff08\u9010\u70b9\u5377\u79ef\uff09\u4e0a\u7684\u5269\u4f59\u8fde\u63a5\uff0c\u5728\u7a7a\u95f4\u6df7\u5408\uff08\u6df1\u5ea6\u5377\u79ef\uff09\u4e0a\u4ec5\u4f7f\u7528\u6b8b\u5dee\u8fde\u63a5\u3002\u4ed6\u4eec\u8fd8\u4f7f\u7528<a 
href=\"https://nn.labml.ai/normalization/batch_norm/index.html\">\u6279\u91cf\u6807\u51c6\u5316</a>\u800c\u4e0d\u662f<a href=\"../normalization/layer_norm/index.html\">\u56fe\u5c42\u6807\u51c6\u5316</a>\u3002</p>\n<p>\u8fd9\u662f<a href=\"https://nn.labml.ai/conv_mixer/experiment.html\">\u4e00\u9879\u5728 CIFAR-10 \u4e0a\u8bad\u7ec3 ConvMixer \u7684\u5b9e\u9a8c</a>\u3002</p>\n" "<h1><a href=\"https://nn.labml.ai/conv_mixer/index.html\">Patches Are All You Need?</a></h1>\n<p>This is a <a href=\"https://pytorch.org\">PyTorch</a> implementation of the paper <a href=\"https://arxiv.org/abs/2201.09792\">Patches Are All You Need?</a>.</p>\n<p>ConvMixer is Similar to <a href=\"https://nn.labml.ai/transformers/mlp_mixer/index.html\">MLP-Mixer</a>. MLP-Mixer separates mixing of spatial and channel dimensions, by applying an MLP across spatial dimension and then an MLP across the channel dimension (spatial MLP replaces the <a href=\"https://nn.labml.ai/transformers/vit/index.html\">ViT</a> attention and channel MLP is the <a href=\"https://nn.labml.ai/transformers/feed_forward.html\">FFN</a> of ViT).</p>\n<p>ConvMixer uses a 1x1 convolution for channel mixing and a depth-wise convolution for spatial mixing. Since it&#x27;s a convolution instead of a full MLP across the space, it mixes only the nearby batches in contrast to ViT or MLP-Mixer. Also, the MLP-mixer uses MLPs of two layers for each mixing and ConvMixer uses a single layer for each mixing.</p>\n<p>The paper recommends removing the residual connection across the channel mixing (point-wise convolution) and having only a residual connection over the spatial mixing (depth-wise convolution). They also use <a href=\"https://nn.labml.ai/normalization/batch_norm/index.html\">Batch normalization</a> instead of <a href=\"../normalization/layer_norm/index.html\">Layer normalization</a>.</p>\n<p>Here&#x27;s <a href=\"https://nn.labml.ai/conv_mixer/experiment.html\">an experiment</a> that trains ConvMixer on CIFAR-10. 
</p>\n": "<h1><a href=\"https://nn.labml.ai/conv_mixer/index.html\">\u4f60\u53ea\u9700\u8981\u8865\u4e01\u5417\uff1f</a></h1>\n<p>\u8fd9\u662f <a href=\"https://pytorch.org\">PyTorch</a> \u5bf9\u8bba\u6587\u300a<a href=\"https://arxiv.org/abs/2201.09792\">\u8865\u4e01\u5c31\u662f\u4f60\u6240\u9700\u8981\u7684\uff1f</a>\u300b\u7684\u5b9e\u73b0</p>\u3002\n<p>convMixer \u7c7b\u4f3c\u4e8e <a href=\"https://nn.labml.ai/transformers/mlp_mixer/index.html\">MLP \u6df7\u97f3\u5668</a>\u3002MLP-Mixer \u901a\u8fc7\u5728\u7a7a\u95f4\u7ef4\u5ea6\u4e0a\u5e94\u7528 MLP\uff0c\u7136\u540e\u5728\u4fe1\u9053\u7ef4\u5ea6\u4e0a\u5e94\u7528 MLP \u6765\u5206\u79bb\u7a7a\u95f4\u7ef4\u5ea6\u548c\u4fe1\u9053\u7ef4\u5ea6\u7684\u6df7\u97f3\uff08\u7a7a\u95f4 MLP \u53d6\u4ee3 <a href=\"https://nn.labml.ai/transformers/vit/index.html\">vIT</a> \u6ce8\u610f\u529b\uff0c\u4fe1\u9053 MLP \u662f ViT \u7684 <a href=\"https://nn.labml.ai/transformers/feed_forward.html\">FFN</a>\uff09\u3002</p>\n<p>ConvMixer \u4f7f\u7528 1x1 \u5377\u79ef\u8fdb\u884c\u901a\u9053\u6df7\u5408\uff0c\u4f7f\u7528\u6df1\u5ea6\u5377\u79ef\u8fdb\u884c\u7a7a\u95f4\u6df7\u5408\u3002\u7531\u4e8e\u5b83\u662f\u5377\u79ef\u800c\u4e0d\u662f\u6574\u4e2a\u7a7a\u95f4\u7684\u5b8c\u6574\u7684 MLP\uff0c\u56e0\u6b64\u4e0e vIT \u6216 MLP-Mixer \u76f8\u6bd4\uff0c\u5b83\u53ea\u6df7\u5408\u9644\u8fd1\u7684\u6279\u6b21\u3002\u6b64\u5916\uff0cMLP-Mixer \u6bcf\u6b21\u6df7\u5408\u4f7f\u7528\u4e24\u5c42 MLP\uff0cConvMixer \u6bcf\u6b21\u6df7\u5408\u4f7f\u7528\u5355\u5c42\u3002</p>\n<p>\u8be5\u8bba\u6587\u5efa\u8bae\u5220\u9664\u4fe1\u9053\u6df7\u5408\uff08\u9010\u70b9\u5377\u79ef\uff09\u4e0a\u7684\u5269\u4f59\u8fde\u63a5\uff0c\u5728\u7a7a\u95f4\u6df7\u5408\uff08\u6df1\u5ea6\u5377\u79ef\uff09\u4e0a\u4ec5\u4f7f\u7528\u6b8b\u5dee\u8fde\u63a5\u3002\u4ed6\u4eec\u8fd8\u4f7f\u7528<a href=\"https://nn.labml.ai/normalization/batch_norm/index.html\">\u6279\u91cf\u6807\u51c6\u5316</a>\u800c\u4e0d\u662f<a href=\"../normalization/layer_norm/index.html\">\u56fe\u5c42\u6807\u51c6\u5316</a>\u3002</p>\n<p>\u8fd9\u662f<a href=\"https://nn.labml.ai/conv_mixer/experiment.html\">\u4e00\u9879\u5728 CIFAR-10 \u4e0a\u8bad\u7ec3 ConvMixer \u7684\u5b9e\u9a8c</a>\u3002</p>\n"
} }
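For illustration, a minimal PyTorch sketch of one ConvMixer block as described in the entry above: a depth-wise convolution for spatial mixing wrapped in a residual connection, followed by a residual-free 1x1 (point-wise) convolution for channel mixing, each with GELU and Batch normalization. The class name and hyper-parameters here are illustrative stand-ins, not the repository's code.

    import torch
    import torch.nn as nn

    class ConvMixerBlock(nn.Module):
        def __init__(self, dim: int, kernel_size: int = 7):
            super().__init__()
            # Spatial mixing: depth-wise convolution (groups=dim)
            self.depthwise = nn.Conv2d(dim, dim, kernel_size, groups=dim, padding='same')
            self.norm1 = nn.BatchNorm2d(dim)
            # Channel mixing: point-wise (1x1) convolution
            self.pointwise = nn.Conv2d(dim, dim, kernel_size=1)
            self.norm2 = nn.BatchNorm2d(dim)
            self.act = nn.GELU()

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            # Residual connection only over the spatial mixing
            x = x + self.norm1(self.act(self.depthwise(x)))
            # No residual connection over the channel mixing
            return self.norm2(self.act(self.pointwise(x)))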

File diff suppressed because one or more lines are too long

File diff suppressed because one or more lines are too long

File diff suppressed because one or more lines are too long

View File

@@ -1,4 +1,4 @@
{ {
"<h1><a href=\"https://nn.labml.ai/diffusion/ddpm/index.html\">Denoising Diffusion Probabilistic Models (DDPM)</a></h1>\n<p><a href=\"https://colab.research.google.com/github/labmlai/annotated_deep_learning_paper_implementations/blob/master/labml_nn/diffusion/ddpm/experiment.ipynb\"><span translate=no>_^_0_^_</span></a></p>\n<p>This is a <a href=\"https://pytorch.org\">PyTorch</a> implementation/tutorial of the paper <a href=\"https://papers.labml.ai/paper/2006.11239\">Denoising Diffusion Probabilistic Models</a>.</p>\n<p>In simple terms, we get an image from data and add noise step by step. Then We train a model to predict that noise at each step and use the model to generate images.</p>\n<p>Here is the <a href=\"https://nn.labml.ai/diffusion/ddpm/unet.html\">UNet model</a> that predicts the noise and <a href=\"https://nn.labml.ai/diffusion/ddpm/experiment.html\">training code</a>. <a href=\"https://nn.labml.ai/diffusion/ddpm/evaluate.html\">This file</a> can generate samples and interpolations from a trained model. </p>\n": "<h1><a href=\"https://nn.labml.ai/diffusion/ddpm/index.html\">\u30ce\u30a4\u30ba\u9664\u53bb\u62e1\u6563\u78ba\u7387\u30e2\u30c7\u30eb (DDPM)</a></h1>\n<p><a href=\"https://colab.research.google.com/github/labmlai/annotated_deep_learning_paper_implementations/blob/master/labml_nn/diffusion/ddpm/experiment.ipynb\"><span translate=no>_^_0_^_</span></a></p>\n<p><a href=\"https://papers.labml.ai/paper/2006.11239\">\u3053\u308c\u306f\u3001\u8ad6\u6587\u300c\u30ce\u30a4\u30ba\u9664\u53bb\u62e1\u6563\u78ba\u7387\u30e2\u30c7\u30eb\u300d<a href=\"https://pytorch.org\">\u306ePyTorch\u5b9f\u88c5/\u30c1\u30e5\u30fc\u30c8\u30ea\u30a2\u30eb\u3067\u3059</a>\u3002</a></p>\n<p>\u7c21\u5358\u306b\u8a00\u3046\u3068\u3001\u30c7\u30fc\u30bf\u304b\u3089\u753b\u50cf\u3092\u53d6\u5f97\u3057\u3001\u6bb5\u968e\u7684\u306b\u30ce\u30a4\u30ba\u3092\u8ffd\u52a0\u3057\u307e\u3059\u3002\u6b21\u306b\u3001\u30e2\u30c7\u30eb\u3092\u30c8\u30ec\u30fc\u30cb\u30f3\u30b0\u3057\u3066\u5404\u30b9\u30c6\u30c3\u30d7\u3067\u305d\u306e\u30ce\u30a4\u30ba\u3092\u4e88\u6e2c\u3057\u3001\u305d\u306e\u30e2\u30c7\u30eb\u3092\u4f7f\u7528\u3057\u3066\u753b\u50cf\u3092\u751f\u6210\u3057\u307e\u3059\u3002</p>\n<p><a href=\"https://nn.labml.ai/diffusion/ddpm/unet.html\"><a href=\"https://nn.labml.ai/diffusion/ddpm/experiment.html\">\u30ce\u30a4\u30ba\u3068\u30c8\u30ec\u30fc\u30cb\u30f3\u30b0\u30b3\u30fc\u30c9\u3092\u4e88\u6e2c\u3059\u308b</a> uNet \u30e2\u30c7\u30eb\u3092\u6b21\u306b\u793a\u3057\u307e\u3059</a>\u3002<a href=\"https://nn.labml.ai/diffusion/ddpm/evaluate.html\">\u3053\u306e\u30d5\u30a1\u30a4\u30eb\u3067\u306f</a>\u3001\u30c8\u30ec\u30fc\u30cb\u30f3\u30b0\u6e08\u307f\u306e\u30e2\u30c7\u30eb\u304b\u3089\u30b5\u30f3\u30d7\u30eb\u3068\u88dc\u9593\u3092\u751f\u6210\u3067\u304d\u307e\u3059</p>\u3002\n", "<h1><a href=\"https://nn.labml.ai/diffusion/ddpm/index.html\">Denoising Diffusion Probabilistic Models (DDPM)</a></h1>\n<p><a href=\"https://colab.research.google.com/github/labmlai/annotated_deep_learning_paper_implementations/blob/master/labml_nn/diffusion/ddpm/experiment.ipynb\"><span translate=no>_^_0_^_</span></a></p>\n<p>This is a <a href=\"https://pytorch.org\">PyTorch</a> implementation/tutorial of the paper <a href=\"https://arxiv.org/abs/2006.11239\">Denoising Diffusion Probabilistic Models</a>.</p>\n<p>In simple terms, we get an image from data and add noise step by step. 
Then We train a model to predict that noise at each step and use the model to generate images.</p>\n<p>Here is the <a href=\"https://nn.labml.ai/diffusion/ddpm/unet.html\">UNet model</a> that predicts the noise and <a href=\"https://nn.labml.ai/diffusion/ddpm/experiment.html\">training code</a>. <a href=\"https://nn.labml.ai/diffusion/ddpm/evaluate.html\">This file</a> can generate samples and interpolations from a trained model. </p>\n": "<h1><a href=\"https://nn.labml.ai/diffusion/ddpm/index.html\">\u30ce\u30a4\u30ba\u9664\u53bb\u62e1\u6563\u78ba\u7387\u30e2\u30c7\u30eb (DDPM)</a></h1>\n<p><a href=\"https://colab.research.google.com/github/labmlai/annotated_deep_learning_paper_implementations/blob/master/labml_nn/diffusion/ddpm/experiment.ipynb\"><span translate=no>_^_0_^_</span></a></p>\n<p><a href=\"https://arxiv.org/abs/2006.11239\">\u3053\u308c\u306f\u3001\u8ad6\u6587\u300c\u30ce\u30a4\u30ba\u9664\u53bb\u62e1\u6563\u78ba\u7387\u30e2\u30c7\u30eb\u300d<a href=\"https://pytorch.org\">\u306ePyTorch\u5b9f\u88c5/\u30c1\u30e5\u30fc\u30c8\u30ea\u30a2\u30eb\u3067\u3059</a>\u3002</a></p>\n<p>\u7c21\u5358\u306b\u8a00\u3046\u3068\u3001\u30c7\u30fc\u30bf\u304b\u3089\u753b\u50cf\u3092\u53d6\u5f97\u3057\u3001\u6bb5\u968e\u7684\u306b\u30ce\u30a4\u30ba\u3092\u8ffd\u52a0\u3057\u307e\u3059\u3002\u6b21\u306b\u3001\u30e2\u30c7\u30eb\u3092\u30c8\u30ec\u30fc\u30cb\u30f3\u30b0\u3057\u3066\u5404\u30b9\u30c6\u30c3\u30d7\u3067\u305d\u306e\u30ce\u30a4\u30ba\u3092\u4e88\u6e2c\u3057\u3001\u305d\u306e\u30e2\u30c7\u30eb\u3092\u4f7f\u7528\u3057\u3066\u753b\u50cf\u3092\u751f\u6210\u3057\u307e\u3059\u3002</p>\n<p><a href=\"https://nn.labml.ai/diffusion/ddpm/unet.html\"><a href=\"https://nn.labml.ai/diffusion/ddpm/experiment.html\">\u30ce\u30a4\u30ba\u3068\u30c8\u30ec\u30fc\u30cb\u30f3\u30b0\u30b3\u30fc\u30c9\u3092\u4e88\u6e2c\u3059\u308b</a> uNet \u30e2\u30c7\u30eb\u3092\u6b21\u306b\u793a\u3057\u307e\u3059</a>\u3002<a href=\"https://nn.labml.ai/diffusion/ddpm/evaluate.html\">\u3053\u306e\u30d5\u30a1\u30a4\u30eb\u3067\u306f</a>\u3001\u30c8\u30ec\u30fc\u30cb\u30f3\u30b0\u6e08\u307f\u306e\u30e2\u30c7\u30eb\u304b\u3089\u30b5\u30f3\u30d7\u30eb\u3068\u88dc\u9593\u3092\u751f\u6210\u3067\u304d\u307e\u3059</p>\u3002\n",
"Denoising Diffusion Probabilistic Models (DDPM)": "\u30ce\u30a4\u30ba\u9664\u53bb\u62e1\u6563\u78ba\u7387\u30e2\u30c7\u30eb (DDPM)" "Denoising Diffusion Probabilistic Models (DDPM)": "\u30ce\u30a4\u30ba\u9664\u53bb\u62e1\u6563\u78ba\u7387\u30e2\u30c7\u30eb (DDPM)"
} }
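As a rough illustration of the idea in the entry above (noise an image step by step and train a model to predict that noise), here is a minimal sketch assuming a linear beta schedule; `model` stands in for a hypothetical noise-prediction network such as a U-Net and is not the repository's API.

    import torch
    import torch.nn.functional as F

    T = 1000
    beta = torch.linspace(1e-4, 0.02, T)            # assumed linear noise schedule
    alpha_bar = torch.cumprod(1.0 - beta, dim=0)    # cumulative product of (1 - beta)

    def ddpm_loss(model, x0: torch.Tensor) -> torch.Tensor:
        """Sample a random step t, noise x0 to x_t, and predict the added noise."""
        t = torch.randint(0, T, (x0.shape[0],), device=x0.device)
        eps = torch.randn_like(x0)
        a_bar = alpha_bar.to(x0.device)[t].view(-1, 1, 1, 1)
        x_t = a_bar.sqrt() * x0 + (1.0 - a_bar).sqrt() * eps
        return F.mse_loss(model(x_t, t), eps)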

View File

@@ -1,4 +1,4 @@
{ {
"<h1><a href=\"https://nn.labml.ai/diffusion/ddpm/index.html\">Denoising Diffusion Probabilistic Models (DDPM)</a></h1>\n<p><a href=\"https://colab.research.google.com/github/labmlai/annotated_deep_learning_paper_implementations/blob/master/labml_nn/diffusion/ddpm/experiment.ipynb\"><span translate=no>_^_0_^_</span></a></p>\n<p>This is a <a href=\"https://pytorch.org\">PyTorch</a> implementation/tutorial of the paper <a href=\"https://papers.labml.ai/paper/2006.11239\">Denoising Diffusion Probabilistic Models</a>.</p>\n<p>In simple terms, we get an image from data and add noise step by step. Then We train a model to predict that noise at each step and use the model to generate images.</p>\n<p>Here is the <a href=\"https://nn.labml.ai/diffusion/ddpm/unet.html\">UNet model</a> that predicts the noise and <a href=\"https://nn.labml.ai/diffusion/ddpm/experiment.html\">training code</a>. <a href=\"https://nn.labml.ai/diffusion/ddpm/evaluate.html\">This file</a> can generate samples and interpolations from a trained model. </p>\n": "<h1><a href=\"https://nn.labml.ai/diffusion/ddpm/index.html\">\u0dc0\u0dd2\u0dc3\u0dbb\u0dab \u0dc3\u0db8\u0dca\u0db7\u0dcf\u0dc0\u0dd2\u0dad\u0dcf\u0dc0 \u0d86\u0d9a\u0dd8\u0dad\u0dd2 \u0db1\u0dd2\u0dbb\u0dd6\u0db4\u0dab\u0dba \u0d9a\u0dd2\u0dbb\u0dd3\u0db8 (DDPM)</a></h1>\n<p><a href=\"https://colab.research.google.com/github/labmlai/annotated_deep_learning_paper_implementations/blob/master/labml_nn/diffusion/ddpm/experiment.ipynb\"><span translate=no>_^_0_^_</span></a></p>\n<p>\u0db8\u0dd9\u0dba <a href=\"https://pytorch.org\">PyTorch</a> \u0d9a\u0dca\u0dbb\u0dd2\u0dba\u0dcf\u0dad\u0dca\u0db8\u0d9a \u0d9a\u0dd2\u0dbb\u0dd3\u0db8/\u0db1\u0dd2\u0db6\u0db1\u0dca\u0db0\u0db1\u0dba\u0d9a\u0dd2 \u0d9a\u0da9\u0daf\u0dcf\u0dc3\u0dd2 <a href=\"https://papers.labml.ai/paper/2006.11239\">Denoising Diffusion Probilistic \u0d86\u0d9a\u0dd8\u0dad\u0dd2</a>.</p>\n<p>\u0dc3\u0dbb\u0dc5\u0dc0 \u0d9a\u0dd2\u0dc0\u0dc4\u0ddc\u0dad\u0dca, \u0d85\u0db4\u0dd2 \u0daf\u0dad\u0dca\u0dad \u0dc0\u0dbd\u0dd2\u0db1\u0dca \u0dbb\u0dd6\u0db4\u0dba\u0d9a\u0dca \u0dbd\u0db6\u0dcf\u0d9c\u0dd9\u0db1 \u0db4\u0dd2\u0dba\u0dc0\u0dbb\u0dd9\u0db1\u0dca \u0db4\u0dd2\u0dba\u0dc0\u0dbb \u0dc1\u0db6\u0dca\u0daf\u0dba \u0d91\u0d9a\u0dca \u0d9a\u0dbb\u0db8\u0dd4. \u0d89\u0db1\u0dca\u0db4\u0dc3\u0dd4 \u0d85\u0db4\u0dd2 \u0dc3\u0dd1\u0db8 \u0db4\u0dd2\u0dba\u0dc0\u0dbb\u0d9a\u0daf\u0dd3\u0db8 \u0d91\u0db8 \u0dc1\u0db6\u0dca\u0daf\u0dba \u0db4\u0dd4\u0dbb\u0ddd\u0d9a\u0dae\u0db1\u0dba \u0d9a\u0dd2\u0dbb\u0dd3\u0db8\u0da7 \u0d86\u0d9a\u0dd8\u0dad\u0dd2\u0dba\u0d9a\u0dca \u0db4\u0dd4\u0dc4\u0dd4\u0dab\u0dd4 \u0d9a\u0dbb \u0dbb\u0dd6\u0db4 \u0da2\u0db1\u0db1\u0dba \u0d9a\u0dd2\u0dbb\u0dd3\u0db8\u0da7 \u0d86\u0d9a\u0dd8\u0dad\u0dd2\u0dba \u0db7\u0dcf\u0dc0\u0dd2\u0dad\u0dcf \u0d9a\u0dbb\u0db8\u0dd4.</p>\n<p>\u0dc1\u0db6\u0dca\u0daf\u0dba \u0dc3\u0dc4 <a href=\"https://nn.labml.ai/diffusion/ddpm/experiment.html\">\u0db4\u0dd4\u0dc4\u0dd4\u0dab\u0dd4 \u0d9a\u0dda\u0dad\u0dba</a> \u0db4\u0dd4\u0dbb\u0ddd\u0d9a\u0dae\u0db1\u0dba \u0d9a\u0dbb\u0db1 <a href=\"https://nn.labml.ai/diffusion/ddpm/unet.html\">UNET \u0d86\u0d9a\u0dd8\u0dad\u0dd2\u0dba</a> \u0db8\u0dd9\u0db1\u0dca\u0db1. 
<a href=\"https://nn.labml.ai/diffusion/ddpm/evaluate.html\">\u0db8\u0dd9\u0db8 \u0d9c\u0ddc\u0db1\u0dd4\u0dc0\u0da7</a> \u0db4\u0dd4\u0dc4\u0dd4\u0dab\u0dd4 \u0d86\u0d9a\u0dd8\u0dad\u0dd2\u0dba\u0d9a\u0dd2\u0db1\u0dca \u0dc3\u0dcf\u0db8\u0dca\u0db4\u0dbd \u0dc3\u0dc4 \u0d85\u0db1\u0dca\u0dad\u0dbb\u0dca\u0db1\u0dd2\u0dc0\u0dda\u0dc1\u0db1\u0dba\u0db1\u0dca \u0da2\u0db1\u0db1\u0dba \u0d9a\u0dc5 \u0dc4\u0dd0\u0d9a\u0dd2\u0dba.</p>\n", "<h1><a href=\"https://nn.labml.ai/diffusion/ddpm/index.html\">Denoising Diffusion Probabilistic Models (DDPM)</a></h1>\n<p><a href=\"https://colab.research.google.com/github/labmlai/annotated_deep_learning_paper_implementations/blob/master/labml_nn/diffusion/ddpm/experiment.ipynb\"><span translate=no>_^_0_^_</span></a></p>\n<p>This is a <a href=\"https://pytorch.org\">PyTorch</a> implementation/tutorial of the paper <a href=\"https://arxiv.org/abs/2006.11239\">Denoising Diffusion Probabilistic Models</a>.</p>\n<p>In simple terms, we get an image from data and add noise step by step. Then We train a model to predict that noise at each step and use the model to generate images.</p>\n<p>Here is the <a href=\"https://nn.labml.ai/diffusion/ddpm/unet.html\">UNet model</a> that predicts the noise and <a href=\"https://nn.labml.ai/diffusion/ddpm/experiment.html\">training code</a>. <a href=\"https://nn.labml.ai/diffusion/ddpm/evaluate.html\">This file</a> can generate samples and interpolations from a trained model. </p>\n": "<h1><a href=\"https://nn.labml.ai/diffusion/ddpm/index.html\">\u0dc0\u0dd2\u0dc3\u0dbb\u0dab \u0dc3\u0db8\u0dca\u0db7\u0dcf\u0dc0\u0dd2\u0dad\u0dcf\u0dc0 \u0d86\u0d9a\u0dd8\u0dad\u0dd2 \u0db1\u0dd2\u0dbb\u0dd6\u0db4\u0dab\u0dba \u0d9a\u0dd2\u0dbb\u0dd3\u0db8 (DDPM)</a></h1>\n<p><a href=\"https://colab.research.google.com/github/labmlai/annotated_deep_learning_paper_implementations/blob/master/labml_nn/diffusion/ddpm/experiment.ipynb\"><span translate=no>_^_0_^_</span></a></p>\n<p>\u0db8\u0dd9\u0dba <a href=\"https://pytorch.org\">PyTorch</a> \u0d9a\u0dca\u0dbb\u0dd2\u0dba\u0dcf\u0dad\u0dca\u0db8\u0d9a \u0d9a\u0dd2\u0dbb\u0dd3\u0db8/\u0db1\u0dd2\u0db6\u0db1\u0dca\u0db0\u0db1\u0dba\u0d9a\u0dd2 \u0d9a\u0da9\u0daf\u0dcf\u0dc3\u0dd2 <a href=\"https://arxiv.org/abs/2006.11239\">Denoising Diffusion Probilistic \u0d86\u0d9a\u0dd8\u0dad\u0dd2</a>.</p>\n<p>\u0dc3\u0dbb\u0dc5\u0dc0 \u0d9a\u0dd2\u0dc0\u0dc4\u0ddc\u0dad\u0dca, \u0d85\u0db4\u0dd2 \u0daf\u0dad\u0dca\u0dad \u0dc0\u0dbd\u0dd2\u0db1\u0dca \u0dbb\u0dd6\u0db4\u0dba\u0d9a\u0dca \u0dbd\u0db6\u0dcf\u0d9c\u0dd9\u0db1 \u0db4\u0dd2\u0dba\u0dc0\u0dbb\u0dd9\u0db1\u0dca \u0db4\u0dd2\u0dba\u0dc0\u0dbb \u0dc1\u0db6\u0dca\u0daf\u0dba \u0d91\u0d9a\u0dca \u0d9a\u0dbb\u0db8\u0dd4. 
\u0d89\u0db1\u0dca\u0db4\u0dc3\u0dd4 \u0d85\u0db4\u0dd2 \u0dc3\u0dd1\u0db8 \u0db4\u0dd2\u0dba\u0dc0\u0dbb\u0d9a\u0daf\u0dd3\u0db8 \u0d91\u0db8 \u0dc1\u0db6\u0dca\u0daf\u0dba \u0db4\u0dd4\u0dbb\u0ddd\u0d9a\u0dae\u0db1\u0dba \u0d9a\u0dd2\u0dbb\u0dd3\u0db8\u0da7 \u0d86\u0d9a\u0dd8\u0dad\u0dd2\u0dba\u0d9a\u0dca \u0db4\u0dd4\u0dc4\u0dd4\u0dab\u0dd4 \u0d9a\u0dbb \u0dbb\u0dd6\u0db4 \u0da2\u0db1\u0db1\u0dba \u0d9a\u0dd2\u0dbb\u0dd3\u0db8\u0da7 \u0d86\u0d9a\u0dd8\u0dad\u0dd2\u0dba \u0db7\u0dcf\u0dc0\u0dd2\u0dad\u0dcf \u0d9a\u0dbb\u0db8\u0dd4.</p>\n<p>\u0dc1\u0db6\u0dca\u0daf\u0dba \u0dc3\u0dc4 <a href=\"https://nn.labml.ai/diffusion/ddpm/experiment.html\">\u0db4\u0dd4\u0dc4\u0dd4\u0dab\u0dd4 \u0d9a\u0dda\u0dad\u0dba</a> \u0db4\u0dd4\u0dbb\u0ddd\u0d9a\u0dae\u0db1\u0dba \u0d9a\u0dbb\u0db1 <a href=\"https://nn.labml.ai/diffusion/ddpm/unet.html\">UNET \u0d86\u0d9a\u0dd8\u0dad\u0dd2\u0dba</a> \u0db8\u0dd9\u0db1\u0dca\u0db1. <a href=\"https://nn.labml.ai/diffusion/ddpm/evaluate.html\">\u0db8\u0dd9\u0db8 \u0d9c\u0ddc\u0db1\u0dd4\u0dc0\u0da7</a> \u0db4\u0dd4\u0dc4\u0dd4\u0dab\u0dd4 \u0d86\u0d9a\u0dd8\u0dad\u0dd2\u0dba\u0d9a\u0dd2\u0db1\u0dca \u0dc3\u0dcf\u0db8\u0dca\u0db4\u0dbd \u0dc3\u0dc4 \u0d85\u0db1\u0dca\u0dad\u0dbb\u0dca\u0db1\u0dd2\u0dc0\u0dda\u0dc1\u0db1\u0dba\u0db1\u0dca \u0da2\u0db1\u0db1\u0dba \u0d9a\u0dc5 \u0dc4\u0dd0\u0d9a\u0dd2\u0dba.</p>\n",
"Denoising Diffusion Probabilistic Models (DDPM)": "\u0dc0\u0dd2\u0dc3\u0dbb\u0dab \u0dc3\u0db8\u0dca\u0db7\u0dcf\u0dc0\u0dd2\u0dad\u0dcf\u0dc0 \u0d86\u0d9a\u0dd8\u0dad\u0dd2 \u0db1\u0dd2\u0dbb\u0dd6\u0db4\u0dab\u0dba \u0d9a\u0dd2\u0dbb\u0dd3\u0db8 (DDPM)" "Denoising Diffusion Probabilistic Models (DDPM)": "\u0dc0\u0dd2\u0dc3\u0dbb\u0dab \u0dc3\u0db8\u0dca\u0db7\u0dcf\u0dc0\u0dd2\u0dad\u0dcf\u0dc0 \u0d86\u0d9a\u0dd8\u0dad\u0dd2 \u0db1\u0dd2\u0dbb\u0dd6\u0db4\u0dab\u0dba \u0d9a\u0dd2\u0dbb\u0dd3\u0db8 (DDPM)"
} }

View File

@@ -1,4 +1,4 @@
{ {
"<h1><a href=\"https://nn.labml.ai/diffusion/ddpm/index.html\">Denoising Diffusion Probabilistic Models (DDPM)</a></h1>\n<p><a href=\"https://colab.research.google.com/github/labmlai/annotated_deep_learning_paper_implementations/blob/master/labml_nn/diffusion/ddpm/experiment.ipynb\"><span translate=no>_^_0_^_</span></a></p>\n<p>This is a <a href=\"https://pytorch.org\">PyTorch</a> implementation/tutorial of the paper <a href=\"https://papers.labml.ai/paper/2006.11239\">Denoising Diffusion Probabilistic Models</a>.</p>\n<p>In simple terms, we get an image from data and add noise step by step. Then We train a model to predict that noise at each step and use the model to generate images.</p>\n<p>Here is the <a href=\"https://nn.labml.ai/diffusion/ddpm/unet.html\">UNet model</a> that predicts the noise and <a href=\"https://nn.labml.ai/diffusion/ddpm/experiment.html\">training code</a>. <a href=\"https://nn.labml.ai/diffusion/ddpm/evaluate.html\">This file</a> can generate samples and interpolations from a trained model. </p>\n": "<h1><a href=\"https://nn.labml.ai/diffusion/ddpm/index.html\">\u53bb\u566a\u6269\u6563\u6982\u7387\u6a21\u578b (DDPM)</a></h1>\n<p><a href=\"https://colab.research.google.com/github/labmlai/annotated_deep_learning_paper_implementations/blob/master/labml_nn/diffusion/ddpm/experiment.ipynb\"><span translate=no>_^_0_^_</span></a></p>\n<p>\u8fd9\u662f\u300a<a href=\"https://papers.labml.ai/paper/2006.11239\">\u53bb\u566a\u6269\u6563\u6982\u7387\u6a21\u578b</a>\u300b\u8bba\u6587\u7684 <a href=\"https://pytorch.org\">PyTorch</a> \u5b9e\u73b0/\u6559\u7a0b\u3002</p>\n<p>\u7b80\u800c\u8a00\u4e4b\uff0c\u6211\u4eec\u4ece\u6570\u636e\u4e2d\u83b7\u53d6\u56fe\u50cf\u5e76\u9010\u6b65\u6dfb\u52a0\u566a\u70b9\u3002\u7136\u540e\uff0c\u6211\u4eec\u8bad\u7ec3\u4e00\u4e2a\u6a21\u578b\u6765\u9884\u6d4b\u6bcf\u4e2a\u6b65\u9aa4\u7684\u566a\u58f0\uff0c\u5e76\u4f7f\u7528\u8be5\u6a21\u578b\u751f\u6210\u56fe\u50cf\u3002</p>\n<p>\u8fd9\u662f\u9884\u6d4b\u566a\u58f0\u548c<a href=\"https://nn.labml.ai/diffusion/ddpm/experiment.html\">\u8bad\u7ec3\u4ee3\u7801</a>\u7684 <a href=\"https://nn.labml.ai/diffusion/ddpm/unet.html\">UNet \u6a21\u578b</a>\u3002<a href=\"https://nn.labml.ai/diffusion/ddpm/evaluate.html\">\u6b64\u6587\u4ef6</a>\u53ef\u4ee5\u4ece\u7ecf\u8fc7\u8bad\u7ec3\u7684\u6a21\u578b\u751f\u6210\u6837\u672c\u548c\u63d2\u503c\u3002</p>\n", "<h1><a href=\"https://nn.labml.ai/diffusion/ddpm/index.html\">Denoising Diffusion Probabilistic Models (DDPM)</a></h1>\n<p><a href=\"https://colab.research.google.com/github/labmlai/annotated_deep_learning_paper_implementations/blob/master/labml_nn/diffusion/ddpm/experiment.ipynb\"><span translate=no>_^_0_^_</span></a></p>\n<p>This is a <a href=\"https://pytorch.org\">PyTorch</a> implementation/tutorial of the paper <a href=\"https://arxiv.org/abs/2006.11239\">Denoising Diffusion Probabilistic Models</a>.</p>\n<p>In simple terms, we get an image from data and add noise step by step. Then We train a model to predict that noise at each step and use the model to generate images.</p>\n<p>Here is the <a href=\"https://nn.labml.ai/diffusion/ddpm/unet.html\">UNet model</a> that predicts the noise and <a href=\"https://nn.labml.ai/diffusion/ddpm/experiment.html\">training code</a>. <a href=\"https://nn.labml.ai/diffusion/ddpm/evaluate.html\">This file</a> can generate samples and interpolations from a trained model. 
</p>\n": "<h1><a href=\"https://nn.labml.ai/diffusion/ddpm/index.html\">\u53bb\u566a\u6269\u6563\u6982\u7387\u6a21\u578b (DDPM)</a></h1>\n<p><a href=\"https://colab.research.google.com/github/labmlai/annotated_deep_learning_paper_implementations/blob/master/labml_nn/diffusion/ddpm/experiment.ipynb\"><span translate=no>_^_0_^_</span></a></p>\n<p>\u8fd9\u662f\u300a<a href=\"https://arxiv.org/abs/2006.11239\">\u53bb\u566a\u6269\u6563\u6982\u7387\u6a21\u578b</a>\u300b\u8bba\u6587\u7684 <a href=\"https://pytorch.org\">PyTorch</a> \u5b9e\u73b0/\u6559\u7a0b\u3002</p>\n<p>\u7b80\u800c\u8a00\u4e4b\uff0c\u6211\u4eec\u4ece\u6570\u636e\u4e2d\u83b7\u53d6\u56fe\u50cf\u5e76\u9010\u6b65\u6dfb\u52a0\u566a\u70b9\u3002\u7136\u540e\uff0c\u6211\u4eec\u8bad\u7ec3\u4e00\u4e2a\u6a21\u578b\u6765\u9884\u6d4b\u6bcf\u4e2a\u6b65\u9aa4\u7684\u566a\u58f0\uff0c\u5e76\u4f7f\u7528\u8be5\u6a21\u578b\u751f\u6210\u56fe\u50cf\u3002</p>\n<p>\u8fd9\u662f\u9884\u6d4b\u566a\u58f0\u548c<a href=\"https://nn.labml.ai/diffusion/ddpm/experiment.html\">\u8bad\u7ec3\u4ee3\u7801</a>\u7684 <a href=\"https://nn.labml.ai/diffusion/ddpm/unet.html\">UNet \u6a21\u578b</a>\u3002<a href=\"https://nn.labml.ai/diffusion/ddpm/evaluate.html\">\u6b64\u6587\u4ef6</a>\u53ef\u4ee5\u4ece\u7ecf\u8fc7\u8bad\u7ec3\u7684\u6a21\u578b\u751f\u6210\u6837\u672c\u548c\u63d2\u503c\u3002</p>\n",
"Denoising Diffusion Probabilistic Models (DDPM)": "\u53bb\u566a\u6269\u6563\u6982\u7387\u6a21\u578b (DDPM)" "Denoising Diffusion Probabilistic Models (DDPM)": "\u53bb\u566a\u6269\u6563\u6982\u7387\u6a21\u578b (DDPM)"
} }

View File

@@ -1,5 +1,5 @@
{ {
"<h1>Latent Diffusion Models</h1>\n<p>Latent diffusion models use an auto-encoder to map between image space and latent space. The diffusion model works on the latent space, which makes it a lot easier to train. It is based on paper <a href=\"https://papers.labml.ai/paper/2112.10752\">High-Resolution Image Synthesis with Latent Diffusion Models</a>.</p>\n<p>They use a pre-trained auto-encoder and train the diffusion U-Net on the latent space of the pre-trained auto-encoder.</p>\n<p>For a simpler diffusion implementation refer to our <a href=\"../ddpm/index.html\">DDPM implementation</a>. We use same notations for <span translate=no>_^_0_^_</span>, <span translate=no>_^_1_^_</span> schedules, etc.</p>\n": "<h1>\u6f5c\u5728\u62e1\u6563\u30e2\u30c7\u30eb</h1>\n<p>\u6f5c\u5728\u62e1\u6563\u30e2\u30c7\u30eb\u3067\u306f\u3001\u30aa\u30fc\u30c8\u30a8\u30f3\u30b3\u30fc\u30c0\u30fc\u3092\u4f7f\u7528\u3057\u3066\u753b\u50cf\u7a7a\u9593\u3068\u6f5c\u5728\u7a7a\u9593\u3092\u30de\u30c3\u30d4\u30f3\u30b0\u3057\u307e\u3059\u3002\u62e1\u6563\u30e2\u30c7\u30eb\u306f\u6f5c\u5728\u7a7a\u9593\u3067\u6a5f\u80fd\u3059\u308b\u305f\u3081\u3001\u30c8\u30ec\u30fc\u30cb\u30f3\u30b0\u304c\u306f\u308b\u304b\u306b\u7c21\u5358\u306b\u306a\u308a\u307e\u3059\u3002\u3053\u308c\u306f\u3001<a href=\"https://papers.labml.ai/paper/2112.10752\">\u6f5c\u5728\u62e1\u6563\u30e2\u30c7\u30eb\u3092\u7528\u3044\u305f\u8ad6\u6587\u306e\u9ad8\u89e3\u50cf\u5ea6\u753b\u50cf\u5408\u6210\u306b\u57fa\u3065\u3044\u3066\u3044\u307e\u3059</a></p>\u3002\n<p>\u4e8b\u524d\u306b\u30c8\u30ec\u30fc\u30cb\u30f3\u30b0\u3055\u308c\u305f\u30aa\u30fc\u30c8\u30a8\u30f3\u30b3\u30fc\u30c0\u30fc\u3092\u4f7f\u7528\u3057\u3001\u4e8b\u524d\u30c8\u30ec\u30fc\u30cb\u30f3\u30b0\u6e08\u307f\u306e\u30aa\u30fc\u30c8\u30a8\u30f3\u30b3\u30fc\u30c0\u30fc\u306e\u6f5c\u5728\u7a7a\u9593\u3067\u62e1\u6563 U-Net \u3092\u30c8\u30ec\u30fc\u30cb\u30f3\u30b0\u3057\u307e\u3059\u3002</p>\n<p><a href=\"../ddpm/index.html\">\u3088\u308a\u5358\u7d14\u306a\u62e1\u6563\u5b9f\u88c5\u306b\u3064\u3044\u3066\u306f\u3001DDPM \u5b9f\u88c5\u3092\u53c2\u7167\u3057\u3066\u304f\u3060\u3055\u3044\u3002</a><span translate=no>_^_1_^_</span>\u30b9\u30b1\u30b8\u30e5\u30fc\u30eb\u306a\u3069\u306b\u3082\u540c\u3058\u8868\u8a18\u3092\u4f7f\u3044\u307e\u3059</p>\u3002<span translate=no>_^_0_^_</span>\n", "<h1>Latent Diffusion Models</h1>\n<p>Latent diffusion models use an auto-encoder to map between image space and latent space. The diffusion model works on the latent space, which makes it a lot easier to train. It is based on paper <a href=\"https://arxiv.org/abs/2112.10752\">High-Resolution Image Synthesis with Latent Diffusion Models</a>.</p>\n<p>They use a pre-trained auto-encoder and train the diffusion U-Net on the latent space of the pre-trained auto-encoder.</p>\n<p>For a simpler diffusion implementation refer to our <a href=\"../ddpm/index.html\">DDPM implementation</a>. 
We use same notations for <span translate=no>_^_0_^_</span>, <span translate=no>_^_1_^_</span> schedules, etc.</p>\n": "<h1>\u6f5c\u5728\u62e1\u6563\u30e2\u30c7\u30eb</h1>\n<p>\u6f5c\u5728\u62e1\u6563\u30e2\u30c7\u30eb\u3067\u306f\u3001\u30aa\u30fc\u30c8\u30a8\u30f3\u30b3\u30fc\u30c0\u30fc\u3092\u4f7f\u7528\u3057\u3066\u753b\u50cf\u7a7a\u9593\u3068\u6f5c\u5728\u7a7a\u9593\u3092\u30de\u30c3\u30d4\u30f3\u30b0\u3057\u307e\u3059\u3002\u62e1\u6563\u30e2\u30c7\u30eb\u306f\u6f5c\u5728\u7a7a\u9593\u3067\u6a5f\u80fd\u3059\u308b\u305f\u3081\u3001\u30c8\u30ec\u30fc\u30cb\u30f3\u30b0\u304c\u306f\u308b\u304b\u306b\u7c21\u5358\u306b\u306a\u308a\u307e\u3059\u3002\u3053\u308c\u306f\u3001<a href=\"https://arxiv.org/abs/2112.10752\">\u6f5c\u5728\u62e1\u6563\u30e2\u30c7\u30eb\u3092\u7528\u3044\u305f\u8ad6\u6587\u306e\u9ad8\u89e3\u50cf\u5ea6\u753b\u50cf\u5408\u6210\u306b\u57fa\u3065\u3044\u3066\u3044\u307e\u3059</a></p>\u3002\n<p>\u4e8b\u524d\u306b\u30c8\u30ec\u30fc\u30cb\u30f3\u30b0\u3055\u308c\u305f\u30aa\u30fc\u30c8\u30a8\u30f3\u30b3\u30fc\u30c0\u30fc\u3092\u4f7f\u7528\u3057\u3001\u4e8b\u524d\u30c8\u30ec\u30fc\u30cb\u30f3\u30b0\u6e08\u307f\u306e\u30aa\u30fc\u30c8\u30a8\u30f3\u30b3\u30fc\u30c0\u30fc\u306e\u6f5c\u5728\u7a7a\u9593\u3067\u62e1\u6563 U-Net \u3092\u30c8\u30ec\u30fc\u30cb\u30f3\u30b0\u3057\u307e\u3059\u3002</p>\n<p><a href=\"../ddpm/index.html\">\u3088\u308a\u5358\u7d14\u306a\u62e1\u6563\u5b9f\u88c5\u306b\u3064\u3044\u3066\u306f\u3001DDPM \u5b9f\u88c5\u3092\u53c2\u7167\u3057\u3066\u304f\u3060\u3055\u3044\u3002</a><span translate=no>_^_1_^_</span>\u30b9\u30b1\u30b8\u30e5\u30fc\u30eb\u306a\u3069\u306b\u3082\u540c\u3058\u8868\u8a18\u3092\u4f7f\u3044\u307e\u3059</p>\u3002<span translate=no>_^_0_^_</span>\n",
"<h2>Latent diffusion model</h2>\n<p>This contains following components:</p>\n<ul><li><a href=\"model/autoencoder.html\">AutoEncoder</a> </li>\n<li><a href=\"model/unet.html\">U-Net</a> with <a href=\"model/unet_attention.html\">attention</a> </li>\n<li><a href=\"model/clip_embedder.html\">CLIP embeddings generator</a></li></ul>\n": "<h2>\u6f5c\u4f0f\u62e1\u6563\u30e2\u30c7\u30eb</h2>\n<p>\u3053\u308c\u306b\u306f\u4ee5\u4e0b\u306e\u30b3\u30f3\u30dd\u30fc\u30cd\u30f3\u30c8\u304c\u542b\u307e\u308c\u307e\u3059\u3002</p>\n<ul><li><a href=\"model/autoencoder.html\">\u30aa\u30fc\u30c8\u30a8\u30f3\u30b3\u30fc\u30c0</a></li>\n<li><a href=\"model/unet.html\"><a href=\"model/unet_attention.html\">\u6ce8\u610f\u3092\u5411\u3051\u305fU-Net</a></a></li>\n<li><a href=\"model/clip_embedder.html\">CLIP \u57cb\u3081\u8fbc\u307f\u30b8\u30a7\u30cd\u30ec\u30fc\u30bf\u30fc</a></li></ul>\n", "<h2>Latent diffusion model</h2>\n<p>This contains following components:</p>\n<ul><li><a href=\"model/autoencoder.html\">AutoEncoder</a> </li>\n<li><a href=\"model/unet.html\">U-Net</a> with <a href=\"model/unet_attention.html\">attention</a> </li>\n<li><a href=\"model/clip_embedder.html\">CLIP embeddings generator</a></li></ul>\n": "<h2>\u6f5c\u4f0f\u62e1\u6563\u30e2\u30c7\u30eb</h2>\n<p>\u3053\u308c\u306b\u306f\u4ee5\u4e0b\u306e\u30b3\u30f3\u30dd\u30fc\u30cd\u30f3\u30c8\u304c\u542b\u307e\u308c\u307e\u3059\u3002</p>\n<ul><li><a href=\"model/autoencoder.html\">\u30aa\u30fc\u30c8\u30a8\u30f3\u30b3\u30fc\u30c0</a></li>\n<li><a href=\"model/unet.html\"><a href=\"model/unet_attention.html\">\u6ce8\u610f\u3092\u5411\u3051\u305fU-Net</a></a></li>\n<li><a href=\"model/clip_embedder.html\">CLIP \u57cb\u3081\u8fbc\u307f\u30b8\u30a7\u30cd\u30ec\u30fc\u30bf\u30fc</a></li></ul>\n",
"<h3>Get <a href=\"model/clip_embedder.html\">CLIP embeddings</a> for a list of text prompts</h3>\n": "<h3>\u30c6\u30ad\u30b9\u30c8\u30d7\u30ed\u30f3\u30d7\u30c8\u306e\u30ea\u30b9\u30c8\u306e <a href=\"model/clip_embedder.html\">CLIP \u57cb\u3081\u8fbc\u307f\u3092\u53d6\u5f97\u3059\u308b</a></h3>\n", "<h3>Get <a href=\"model/clip_embedder.html\">CLIP embeddings</a> for a list of text prompts</h3>\n": "<h3>\u30c6\u30ad\u30b9\u30c8\u30d7\u30ed\u30f3\u30d7\u30c8\u306e\u30ea\u30b9\u30c8\u306e <a href=\"model/clip_embedder.html\">CLIP \u57cb\u3081\u8fbc\u307f\u3092\u53d6\u5f97\u3059\u308b</a></h3>\n",
"<h3>Get image from the latent representation</h3>\n<p>We scale down by the scaling factor and then decode.</p>\n": "<h3>\u6f5c\u5728\u8868\u73fe\u304b\u3089\u753b\u50cf\u3092\u53d6\u5f97</h3>\n<p>\u30b9\u30b1\u30fc\u30ea\u30f3\u30b0\u4fc2\u6570\u3067\u30b9\u30b1\u30fc\u30eb\u30c0\u30a6\u30f3\u3057\u3066\u304b\u3089\u30c7\u30b3\u30fc\u30c9\u3057\u307e\u3059\u3002</p>\n", "<h3>Get image from the latent representation</h3>\n<p>We scale down by the scaling factor and then decode.</p>\n": "<h3>\u6f5c\u5728\u8868\u73fe\u304b\u3089\u753b\u50cf\u3092\u53d6\u5f97</h3>\n<p>\u30b9\u30b1\u30fc\u30ea\u30f3\u30b0\u4fc2\u6570\u3067\u30b9\u30b1\u30fc\u30eb\u30c0\u30a6\u30f3\u3057\u3066\u304b\u3089\u30c7\u30b3\u30fc\u30c9\u3057\u307e\u3059\u3002</p>\n",

View File

@@ -1,5 +1,5 @@
{ {
"<h1>Latent Diffusion Models</h1>\n<p>Latent diffusion models use an auto-encoder to map between image space and latent space. The diffusion model works on the latent space, which makes it a lot easier to train. It is based on paper <a href=\"https://papers.labml.ai/paper/2112.10752\">High-Resolution Image Synthesis with Latent Diffusion Models</a>.</p>\n<p>They use a pre-trained auto-encoder and train the diffusion U-Net on the latent space of the pre-trained auto-encoder.</p>\n<p>For a simpler diffusion implementation refer to our <a href=\"../ddpm/index.html\">DDPM implementation</a>. We use same notations for <span translate=no>_^_0_^_</span>, <span translate=no>_^_1_^_</span> schedules, etc.</p>\n": "<h1>\u0d9c\u0dd4\u0db4\u0dca\u0dad \u0dc0\u0dd2\u0dc3\u0dbb\u0dab \u0d86\u0d9a\u0dd8\u0dad\u0dd2</h1>\n<p>\u0d9c\u0dd4\u0db4\u0dca\u0dad \u0dc0\u0dd2\u0dc3\u0dbb\u0dab\u0dba \u0d86\u0d9a\u0dd8\u0dad\u0dd2 \u0dbb\u0dd6\u0db4 \u0d85\u0dc0\u0d9a\u0dcf\u0dc1\u0dba \u0dc3\u0dc4 \u0d9c\u0dd4\u0db4\u0dca\u0dad \u0d85\u0dc0\u0d9a\u0dcf\u0dc1\u0dba \u0d85\u0dad\u0dbb \u0dc3\u0dd2\u0dad\u0dd2\u0dba\u0db8\u0dca \u0d9c\u0dad \u0d9a\u0dd2\u0dbb\u0dd3\u0db8 \u0dc3\u0db3\u0dc4\u0dcf \u0dc3\u0dca\u0dc0\u0dba\u0d82\u0d9a\u0dca\u0dbb\u0dd3\u0dba \u0d91\u0db1\u0dca\u0d9a\u0ddd\u0da9\u0dbb\u0dba\u0d9a\u0dca \u0db7\u0dcf\u0dc0\u0dd2\u0dad\u0dcf \u0d9a\u0dbb\u0dba\u0dd2. \u0dc0\u0dd2\u0dc3\u0dbb\u0dab \u0d86\u0d9a\u0dd8\u0dad\u0dd2\u0dba \u0dc3\u0dd0\u0dc4\u0dd0\u0dbd\u0dca\u0dbd\u0dd4 \u0d85\u0dc0\u0d9a\u0dcf\u0dc1\u0dba \u0db8\u0dad \u0d9a\u0dca\u0dbb\u0dd2\u0dba\u0dcf \u0d9a\u0dbb\u0dba\u0dd2, \u0d91\u0dba \u0db4\u0dd4\u0dc4\u0dd4\u0dab\u0dd4 \u0d9a\u0dd2\u0dbb\u0dd3\u0db8 \u0db4\u0dc4\u0dc3\u0dd4 \u0d9a\u0dbb\u0dba\u0dd2. \u0d91\u0dba \u0db4\u0daf\u0db1\u0db8\u0dca \u0dc0\u0dd3 \u0d87\u0dad\u0dca\u0dad\u0dda <a href=\"https://papers.labml.ai/paper/2112.10752\">\u0d9c\u0dd4\u0db4\u0dca\u0dad \u0dc0\u0dd2\u0dc3\u0dbb\u0dab \u0d86\u0d9a\u0dd8\u0dad\u0dd2 \u0dc3\u0db8\u0d9f \u0d9a\u0da9\u0daf\u0dcf\u0dc3\u0dd2 \u0d85\u0db0\u0dd2-\u0dc0\u0dd2\u0db7\u0dda\u0daf\u0db1 \u0dbb\u0dd6\u0db4 \u0dc3\u0d82\u0dc1\u0dca\u0dbd\u0dda\u0dc2\u0dab\u0dba</a> \u0db8\u0dad \u0dba.</p>\n<p>\u0d94\u0dc0\u0dd4\u0db1\u0dca \u0db4\u0dd9\u0dbb \u0db4\u0dd4\u0dc4\u0dd4\u0dab\u0dd4 \u0d9a\u0dbb\u0db1 \u0dbd\u0daf \u0dc3\u0dca\u0dc0\u0dba\u0d82\u0d9a\u0dca\u0dbb\u0dd3\u0dba \u0d91\u0db1\u0dca\u0d9a\u0ddd\u0da9\u0dbb\u0dba\u0d9a\u0dca \u0db7\u0dcf\u0dc0\u0dd2\u0dad\u0dcf \u0d9a\u0dbb\u0db1 \u0d85\u0dad\u0dbb \u0db4\u0dd9\u0dbb \u0db4\u0dd4\u0dc4\u0dd4\u0dab\u0dd4 \u0d9a\u0dbb\u0db1 \u0dbd\u0daf \u0dc3\u0dca\u0dc0\u0dba\u0d82\u0d9a\u0dca\u0dbb\u0dd3\u0dba \u0d91\u0db1\u0dca\u0d9a\u0ddd\u0da9\u0dbb\u0dba\u0dda \u0d9c\u0dd4\u0db4\u0dca\u0dad \u0d85\u0dc0\u0d9a\u0dcf\u0dc1\u0dba \u0db8\u0dad \u0dc0\u0dd2\u0dc3\u0dbb\u0dab\u0dba \u0dba\u0dd6-\u0db1\u0dd9\u0da7\u0dca \u0db4\u0dd4\u0dc4\u0dd4\u0dab\u0dd4 \u0d9a\u0dbb\u0dba\u0dd2.</p>\n<p>\u0dc3\u0dbb\u0dbd \u0dc0\u0dd2\u0dc3\u0dbb\u0dab\u0dba \u0d9a\u0dca\u0dbb\u0dd2\u0dba\u0dcf\u0dad\u0dca\u0db8\u0d9a \u0d9a\u0dd2\u0dbb\u0dd3\u0db8 \u0dc3\u0db3\u0dc4\u0dcf \u0d85\u0db4\u0d9c\u0dda <a href=\"../ddpm/index.html\">DDPM \u0d9a\u0dca\u0dbb\u0dd2\u0dba\u0dcf\u0dad\u0dca\u0db8\u0d9a \u0d9a\u0dd2\u0dbb\u0dd3\u0db8</a> \u0dc0\u0dd9\u0dad \u0dba\u0ddc\u0db8\u0dd4 \u0dc0\u0db1\u0dca\u0db1. 
<span translate=no>_^_1_^_</span>\u0d9a\u0dcf\u0dbd\u0dc3\u0da7\u0dc4\u0db1\u0dca<span translate=no>_^_0_^_</span> \u0d86\u0daf\u0dd2\u0dba \u0dc3\u0db3\u0dc4\u0dcf \u0d85\u0db4\u0dd2 \u0d91\u0d9a\u0db8 \u0d85\u0d82\u0d9a\u0db1 \u0db7\u0dcf\u0dc0\u0dd2\u0dad\u0dcf \u0d9a\u0dbb\u0db8\u0dd4.</p>\n", "<h1>Latent Diffusion Models</h1>\n<p>Latent diffusion models use an auto-encoder to map between image space and latent space. The diffusion model works on the latent space, which makes it a lot easier to train. It is based on paper <a href=\"https://arxiv.org/abs/2112.10752\">High-Resolution Image Synthesis with Latent Diffusion Models</a>.</p>\n<p>They use a pre-trained auto-encoder and train the diffusion U-Net on the latent space of the pre-trained auto-encoder.</p>\n<p>For a simpler diffusion implementation refer to our <a href=\"../ddpm/index.html\">DDPM implementation</a>. We use same notations for <span translate=no>_^_0_^_</span>, <span translate=no>_^_1_^_</span> schedules, etc.</p>\n": "<h1>\u0d9c\u0dd4\u0db4\u0dca\u0dad \u0dc0\u0dd2\u0dc3\u0dbb\u0dab \u0d86\u0d9a\u0dd8\u0dad\u0dd2</h1>\n<p>\u0d9c\u0dd4\u0db4\u0dca\u0dad \u0dc0\u0dd2\u0dc3\u0dbb\u0dab\u0dba \u0d86\u0d9a\u0dd8\u0dad\u0dd2 \u0dbb\u0dd6\u0db4 \u0d85\u0dc0\u0d9a\u0dcf\u0dc1\u0dba \u0dc3\u0dc4 \u0d9c\u0dd4\u0db4\u0dca\u0dad \u0d85\u0dc0\u0d9a\u0dcf\u0dc1\u0dba \u0d85\u0dad\u0dbb \u0dc3\u0dd2\u0dad\u0dd2\u0dba\u0db8\u0dca \u0d9c\u0dad \u0d9a\u0dd2\u0dbb\u0dd3\u0db8 \u0dc3\u0db3\u0dc4\u0dcf \u0dc3\u0dca\u0dc0\u0dba\u0d82\u0d9a\u0dca\u0dbb\u0dd3\u0dba \u0d91\u0db1\u0dca\u0d9a\u0ddd\u0da9\u0dbb\u0dba\u0d9a\u0dca \u0db7\u0dcf\u0dc0\u0dd2\u0dad\u0dcf \u0d9a\u0dbb\u0dba\u0dd2. \u0dc0\u0dd2\u0dc3\u0dbb\u0dab \u0d86\u0d9a\u0dd8\u0dad\u0dd2\u0dba \u0dc3\u0dd0\u0dc4\u0dd0\u0dbd\u0dca\u0dbd\u0dd4 \u0d85\u0dc0\u0d9a\u0dcf\u0dc1\u0dba \u0db8\u0dad \u0d9a\u0dca\u0dbb\u0dd2\u0dba\u0dcf \u0d9a\u0dbb\u0dba\u0dd2, \u0d91\u0dba \u0db4\u0dd4\u0dc4\u0dd4\u0dab\u0dd4 \u0d9a\u0dd2\u0dbb\u0dd3\u0db8 \u0db4\u0dc4\u0dc3\u0dd4 \u0d9a\u0dbb\u0dba\u0dd2. 
\u0d91\u0dba \u0db4\u0daf\u0db1\u0db8\u0dca \u0dc0\u0dd3 \u0d87\u0dad\u0dca\u0dad\u0dda <a href=\"https://arxiv.org/abs/2112.10752\">\u0d9c\u0dd4\u0db4\u0dca\u0dad \u0dc0\u0dd2\u0dc3\u0dbb\u0dab \u0d86\u0d9a\u0dd8\u0dad\u0dd2 \u0dc3\u0db8\u0d9f \u0d9a\u0da9\u0daf\u0dcf\u0dc3\u0dd2 \u0d85\u0db0\u0dd2-\u0dc0\u0dd2\u0db7\u0dda\u0daf\u0db1 \u0dbb\u0dd6\u0db4 \u0dc3\u0d82\u0dc1\u0dca\u0dbd\u0dda\u0dc2\u0dab\u0dba</a> \u0db8\u0dad \u0dba.</p>\n<p>\u0d94\u0dc0\u0dd4\u0db1\u0dca \u0db4\u0dd9\u0dbb \u0db4\u0dd4\u0dc4\u0dd4\u0dab\u0dd4 \u0d9a\u0dbb\u0db1 \u0dbd\u0daf \u0dc3\u0dca\u0dc0\u0dba\u0d82\u0d9a\u0dca\u0dbb\u0dd3\u0dba \u0d91\u0db1\u0dca\u0d9a\u0ddd\u0da9\u0dbb\u0dba\u0d9a\u0dca \u0db7\u0dcf\u0dc0\u0dd2\u0dad\u0dcf \u0d9a\u0dbb\u0db1 \u0d85\u0dad\u0dbb \u0db4\u0dd9\u0dbb \u0db4\u0dd4\u0dc4\u0dd4\u0dab\u0dd4 \u0d9a\u0dbb\u0db1 \u0dbd\u0daf \u0dc3\u0dca\u0dc0\u0dba\u0d82\u0d9a\u0dca\u0dbb\u0dd3\u0dba \u0d91\u0db1\u0dca\u0d9a\u0ddd\u0da9\u0dbb\u0dba\u0dda \u0d9c\u0dd4\u0db4\u0dca\u0dad \u0d85\u0dc0\u0d9a\u0dcf\u0dc1\u0dba \u0db8\u0dad \u0dc0\u0dd2\u0dc3\u0dbb\u0dab\u0dba \u0dba\u0dd6-\u0db1\u0dd9\u0da7\u0dca \u0db4\u0dd4\u0dc4\u0dd4\u0dab\u0dd4 \u0d9a\u0dbb\u0dba\u0dd2.</p>\n<p>\u0dc3\u0dbb\u0dbd \u0dc0\u0dd2\u0dc3\u0dbb\u0dab\u0dba \u0d9a\u0dca\u0dbb\u0dd2\u0dba\u0dcf\u0dad\u0dca\u0db8\u0d9a \u0d9a\u0dd2\u0dbb\u0dd3\u0db8 \u0dc3\u0db3\u0dc4\u0dcf \u0d85\u0db4\u0d9c\u0dda <a href=\"../ddpm/index.html\">DDPM \u0d9a\u0dca\u0dbb\u0dd2\u0dba\u0dcf\u0dad\u0dca\u0db8\u0d9a \u0d9a\u0dd2\u0dbb\u0dd3\u0db8</a> \u0dc0\u0dd9\u0dad \u0dba\u0ddc\u0db8\u0dd4 \u0dc0\u0db1\u0dca\u0db1. <span translate=no>_^_1_^_</span>\u0d9a\u0dcf\u0dbd\u0dc3\u0da7\u0dc4\u0db1\u0dca<span translate=no>_^_0_^_</span> \u0d86\u0daf\u0dd2\u0dba \u0dc3\u0db3\u0dc4\u0dcf \u0d85\u0db4\u0dd2 \u0d91\u0d9a\u0db8 \u0d85\u0d82\u0d9a\u0db1 \u0db7\u0dcf\u0dc0\u0dd2\u0dad\u0dcf \u0d9a\u0dbb\u0db8\u0dd4.</p>\n",
"<h2>Latent diffusion model</h2>\n<p>This contains following components:</p>\n<ul><li><a href=\"model/autoencoder.html\">AutoEncoder</a> </li>\n<li><a href=\"model/unet.html\">U-Net</a> with <a href=\"model/unet_attention.html\">attention</a> </li>\n<li><a href=\"model/clip_embedder.html\">CLIP embeddings generator</a></li></ul>\n": "<h2>\u0d9c\u0dd4\u0db4\u0dca\u0dad \u0dc0\u0dd2\u0dc3\u0dbb\u0dab\u0dba \u0d86\u0d9a\u0dd8\u0dad\u0dd2\u0dba</h2>\n<p>\u0db4\u0dc4\u0dad \u0dc3\u0db3\u0dc4\u0db1\u0dca \u0dc3\u0d82\u0dbb\u0da0\u0d9a \u0d85\u0da9\u0d82\u0d9c\u0dd4 \u0dc0\u0dda:</p>\n<ul><li><a href=\"model/autoencoder.html\">\u0dc3\u0dca\u0dc0\u0dba\u0d82 \u0d86\u0d9a\u0dda\u0dad\u0d9a\u0dba</a></li>\n<li><a href=\"model/unet.html\">U-Net</a> <a href=\"model/unet_attention.html\">\u0d85\u0dc0\u0db0\u0dcf\u0db1\u0dba\u0dd9\u0db1\u0dca</a></li>\n<li><a href=\"model/clip_embedder.html\">CLIP \u0d9a\u0dcf\u0dc0\u0dd0\u0daf\u0dca\u0daf\u0dd3\u0db8\u0dca \u0d8b\u0dad\u0dca\u0db4\u0dcf\u0daf\u0d9a \u0dba\u0db1\u0dca\u0dad\u0dca\u0dbb\u0dba</a></li></ul>\n", "<h2>Latent diffusion model</h2>\n<p>This contains following components:</p>\n<ul><li><a href=\"model/autoencoder.html\">AutoEncoder</a> </li>\n<li><a href=\"model/unet.html\">U-Net</a> with <a href=\"model/unet_attention.html\">attention</a> </li>\n<li><a href=\"model/clip_embedder.html\">CLIP embeddings generator</a></li></ul>\n": "<h2>\u0d9c\u0dd4\u0db4\u0dca\u0dad \u0dc0\u0dd2\u0dc3\u0dbb\u0dab\u0dba \u0d86\u0d9a\u0dd8\u0dad\u0dd2\u0dba</h2>\n<p>\u0db4\u0dc4\u0dad \u0dc3\u0db3\u0dc4\u0db1\u0dca \u0dc3\u0d82\u0dbb\u0da0\u0d9a \u0d85\u0da9\u0d82\u0d9c\u0dd4 \u0dc0\u0dda:</p>\n<ul><li><a href=\"model/autoencoder.html\">\u0dc3\u0dca\u0dc0\u0dba\u0d82 \u0d86\u0d9a\u0dda\u0dad\u0d9a\u0dba</a></li>\n<li><a href=\"model/unet.html\">U-Net</a> <a href=\"model/unet_attention.html\">\u0d85\u0dc0\u0db0\u0dcf\u0db1\u0dba\u0dd9\u0db1\u0dca</a></li>\n<li><a href=\"model/clip_embedder.html\">CLIP \u0d9a\u0dcf\u0dc0\u0dd0\u0daf\u0dca\u0daf\u0dd3\u0db8\u0dca \u0d8b\u0dad\u0dca\u0db4\u0dcf\u0daf\u0d9a \u0dba\u0db1\u0dca\u0dad\u0dca\u0dbb\u0dba</a></li></ul>\n",
"<h3>Get <a href=\"model/clip_embedder.html\">CLIP embeddings</a> for a list of text prompts</h3>\n": "<h3>\u0db4\u0dd9\u0dc5 \u0dc0\u0dd2\u0db8\u0dc3\u0dd4\u0db8\u0dca \u0dbd\u0dd0\u0dba\u0dd2\u0dc3\u0dca\u0dad\u0dd4\u0dc0\u0d9a\u0dca <a href=\"model/clip_embedder.html\">\u0dc3\u0db3\u0dc4\u0dcf CLIP \u0d9a\u0dcf\u0dc0\u0dd0\u0daf\u0dca\u0daf\u0dd3\u0db8\u0dca</a> \u0dbd\u0db6\u0dcf \u0d9c\u0db1\u0dca\u0db1</h3>\n", "<h3>Get <a href=\"model/clip_embedder.html\">CLIP embeddings</a> for a list of text prompts</h3>\n": "<h3>\u0db4\u0dd9\u0dc5 \u0dc0\u0dd2\u0db8\u0dc3\u0dd4\u0db8\u0dca \u0dbd\u0dd0\u0dba\u0dd2\u0dc3\u0dca\u0dad\u0dd4\u0dc0\u0d9a\u0dca <a href=\"model/clip_embedder.html\">\u0dc3\u0db3\u0dc4\u0dcf CLIP \u0d9a\u0dcf\u0dc0\u0dd0\u0daf\u0dca\u0daf\u0dd3\u0db8\u0dca</a> \u0dbd\u0db6\u0dcf \u0d9c\u0db1\u0dca\u0db1</h3>\n",
"<h3>Get image from the latent representation</h3>\n<p>We scale down by the scaling factor and then decode.</p>\n": "<h3>\u0d9c\u0dd4\u0db4\u0dca\u0dad \u0db1\u0dd2\u0dbb\u0dd6\u0db4\u0dab\u0dba\u0dd9\u0db1\u0dca \u0dbb\u0dd6\u0db4\u0dba \u0dbd\u0db6\u0dcf \u0d9c\u0db1\u0dca\u0db1</h3>\n<p>\u0d85\u0db4\u0dd2 \u0db4\u0dbb\u0dd2\u0db8\u0dcf\u0dab \u0dc3\u0dcf\u0db0\u0d9a\u0dba \u0d85\u0db1\u0dd4\u0dc0 \u0db4\u0dbb\u0dd2\u0db8\u0dcf\u0dab\u0dba \u0d9a\u0dbb \u0dc0\u0dd2\u0d9a\u0dda\u0dad\u0db1\u0dba \u0d9a\u0dbb\u0db8\u0dd4.</p>\n", "<h3>Get image from the latent representation</h3>\n<p>We scale down by the scaling factor and then decode.</p>\n": "<h3>\u0d9c\u0dd4\u0db4\u0dca\u0dad \u0db1\u0dd2\u0dbb\u0dd6\u0db4\u0dab\u0dba\u0dd9\u0db1\u0dca \u0dbb\u0dd6\u0db4\u0dba \u0dbd\u0db6\u0dcf \u0d9c\u0db1\u0dca\u0db1</h3>\n<p>\u0d85\u0db4\u0dd2 \u0db4\u0dbb\u0dd2\u0db8\u0dcf\u0dab \u0dc3\u0dcf\u0db0\u0d9a\u0dba \u0d85\u0db1\u0dd4\u0dc0 \u0db4\u0dbb\u0dd2\u0db8\u0dcf\u0dab\u0dba \u0d9a\u0dbb \u0dc0\u0dd2\u0d9a\u0dda\u0dad\u0db1\u0dba \u0d9a\u0dbb\u0db8\u0dd4.</p>\n",

View File

@@ -1,5 +1,5 @@
{ {
"<h1>Latent Diffusion Models</h1>\n<p>Latent diffusion models use an auto-encoder to map between image space and latent space. The diffusion model works on the latent space, which makes it a lot easier to train. It is based on paper <a href=\"https://papers.labml.ai/paper/2112.10752\">High-Resolution Image Synthesis with Latent Diffusion Models</a>.</p>\n<p>They use a pre-trained auto-encoder and train the diffusion U-Net on the latent space of the pre-trained auto-encoder.</p>\n<p>For a simpler diffusion implementation refer to our <a href=\"../ddpm/index.html\">DDPM implementation</a>. We use same notations for <span translate=no>_^_0_^_</span>, <span translate=no>_^_1_^_</span> schedules, etc.</p>\n": "<h1>\u6f5c\u5728\u6269\u6563\u6a21\u578b</h1>\n<p>\u6f5c\u5728\u6269\u6563\u6a21\u578b\u4f7f\u7528\u81ea\u52a8\u7f16\u7801\u5668\u5728\u56fe\u50cf\u7a7a\u95f4\u548c\u6f5c\u5728\u7a7a\u95f4\u4e4b\u95f4\u8fdb\u884c\u6620\u5c04\u3002\u6269\u6563\u6a21\u578b\u9002\u7528\u4e8e\u6f5c\u5728\u7a7a\u95f4\uff0c\u8fd9\u4f7f\u5f97\u8bad\u7ec3\u53d8\u5f97\u5bb9\u6613\u5f97\u591a\u3002\u5b83\u57fa\u4e8e<a href=\"https://papers.labml.ai/paper/2112.10752\">\u5e26\u6709\u6f5c\u5728\u6269\u6563\u6a21\u578b\u7684\u7eb8\u8d28\u9ad8\u5206\u8fa8\u7387\u56fe\u50cf\u5408\u6210</a>\u3002</p>\n<p>\u5b83\u4eec\u4f7f\u7528\u9884\u8bad\u7ec3\u7684\u81ea\u52a8\u7f16\u7801\u5668\uff0c\u5728\u9884\u8bad\u7ec3\u7684\u81ea\u52a8\u7f16\u7801\u5668\u7684\u6f5c\u5728\u7a7a\u95f4\u4e0a\u8bad\u7ec3\u6269\u6563 U-Net\u3002</p>\n<p>\u6709\u5173\u66f4\u7b80\u5355\u7684\u6269\u6563\u5b9e\u73b0\uff0c\u8bf7\u53c2\u9605\u6211\u4eec\u7684 <a href=\"../ddpm/index.html\">DDPM \u5b9e\u73b0</a>\u3002\u6211\u4eec\u5bf9<span translate=no>_^_0_^_</span><span translate=no>_^_1_^_</span>\u65f6\u95f4\u8868\u7b49\u4f7f\u7528\u76f8\u540c\u7684\u7b26\u53f7\u3002</p>\n", "<h1>Latent Diffusion Models</h1>\n<p>Latent diffusion models use an auto-encoder to map between image space and latent space. The diffusion model works on the latent space, which makes it a lot easier to train. It is based on paper <a href=\"https://arxiv.org/abs/2112.10752\">High-Resolution Image Synthesis with Latent Diffusion Models</a>.</p>\n<p>They use a pre-trained auto-encoder and train the diffusion U-Net on the latent space of the pre-trained auto-encoder.</p>\n<p>For a simpler diffusion implementation refer to our <a href=\"../ddpm/index.html\">DDPM implementation</a>. 
We use same notations for <span translate=no>_^_0_^_</span>, <span translate=no>_^_1_^_</span> schedules, etc.</p>\n": "<h1>\u6f5c\u5728\u6269\u6563\u6a21\u578b</h1>\n<p>\u6f5c\u5728\u6269\u6563\u6a21\u578b\u4f7f\u7528\u81ea\u52a8\u7f16\u7801\u5668\u5728\u56fe\u50cf\u7a7a\u95f4\u548c\u6f5c\u5728\u7a7a\u95f4\u4e4b\u95f4\u8fdb\u884c\u6620\u5c04\u3002\u6269\u6563\u6a21\u578b\u9002\u7528\u4e8e\u6f5c\u5728\u7a7a\u95f4\uff0c\u8fd9\u4f7f\u5f97\u8bad\u7ec3\u53d8\u5f97\u5bb9\u6613\u5f97\u591a\u3002\u5b83\u57fa\u4e8e<a href=\"https://arxiv.org/abs/2112.10752\">\u5e26\u6709\u6f5c\u5728\u6269\u6563\u6a21\u578b\u7684\u7eb8\u8d28\u9ad8\u5206\u8fa8\u7387\u56fe\u50cf\u5408\u6210</a>\u3002</p>\n<p>\u5b83\u4eec\u4f7f\u7528\u9884\u8bad\u7ec3\u7684\u81ea\u52a8\u7f16\u7801\u5668\uff0c\u5728\u9884\u8bad\u7ec3\u7684\u81ea\u52a8\u7f16\u7801\u5668\u7684\u6f5c\u5728\u7a7a\u95f4\u4e0a\u8bad\u7ec3\u6269\u6563 U-Net\u3002</p>\n<p>\u6709\u5173\u66f4\u7b80\u5355\u7684\u6269\u6563\u5b9e\u73b0\uff0c\u8bf7\u53c2\u9605\u6211\u4eec\u7684 <a href=\"../ddpm/index.html\">DDPM \u5b9e\u73b0</a>\u3002\u6211\u4eec\u5bf9<span translate=no>_^_0_^_</span><span translate=no>_^_1_^_</span>\u65f6\u95f4\u8868\u7b49\u4f7f\u7528\u76f8\u540c\u7684\u7b26\u53f7\u3002</p>\n",
"<h2>Latent diffusion model</h2>\n<p>This contains following components:</p>\n<ul><li><a href=\"model/autoencoder.html\">AutoEncoder</a> </li>\n<li><a href=\"model/unet.html\">U-Net</a> with <a href=\"model/unet_attention.html\">attention</a> </li>\n<li><a href=\"model/clip_embedder.html\">CLIP embeddings generator</a></li></ul>\n": "<h2>\u6f5c\u5728\u6269\u6563\u6a21\u578b</h2>\n<p>\u5b83\u5305\u542b\u4ee5\u4e0b\u7ec4\u4ef6\uff1a</p>\n<ul><li><a href=\"model/autoencoder.html\">\u81ea\u52a8\u7f16\u7801\u5668</a></li>\n<li><a href=\"model/unet_attention.html\">\u5907\u53d7\u5173\u6ce8</a>\u7684 <a href=\"model/unet.html\">U-Net</a></li>\n<li><a href=\"model/clip_embedder.html\">CLIP \u5d4c\u5165\u5f0f\u751f\u6210\u5668</a></li></ul>\n", "<h2>Latent diffusion model</h2>\n<p>This contains following components:</p>\n<ul><li><a href=\"model/autoencoder.html\">AutoEncoder</a> </li>\n<li><a href=\"model/unet.html\">U-Net</a> with <a href=\"model/unet_attention.html\">attention</a> </li>\n<li><a href=\"model/clip_embedder.html\">CLIP embeddings generator</a></li></ul>\n": "<h2>\u6f5c\u5728\u6269\u6563\u6a21\u578b</h2>\n<p>\u5b83\u5305\u542b\u4ee5\u4e0b\u7ec4\u4ef6\uff1a</p>\n<ul><li><a href=\"model/autoencoder.html\">\u81ea\u52a8\u7f16\u7801\u5668</a></li>\n<li><a href=\"model/unet_attention.html\">\u5907\u53d7\u5173\u6ce8</a>\u7684 <a href=\"model/unet.html\">U-Net</a></li>\n<li><a href=\"model/clip_embedder.html\">CLIP \u5d4c\u5165\u5f0f\u751f\u6210\u5668</a></li></ul>\n",
"<h3>Get <a href=\"model/clip_embedder.html\">CLIP embeddings</a> for a list of text prompts</h3>\n": "<h3>\u83b7\u53d6 <a href=\"model/clip_embedder.html\">CLIP \u5d4c\u5165</a>\u4ee5\u83b7\u53d6\u6587\u672c\u63d0\u793a\u5217\u8868</h3>\n", "<h3>Get <a href=\"model/clip_embedder.html\">CLIP embeddings</a> for a list of text prompts</h3>\n": "<h3>\u83b7\u53d6 <a href=\"model/clip_embedder.html\">CLIP \u5d4c\u5165</a>\u4ee5\u83b7\u53d6\u6587\u672c\u63d0\u793a\u5217\u8868</h3>\n",
"<h3>Get image from the latent representation</h3>\n<p>We scale down by the scaling factor and then decode.</p>\n": "<h3>\u4ece\u6f5c\u5728\u8868\u793a\u4e2d\u83b7\u53d6\u56fe\u50cf</h3>\n<p>\u6211\u4eec\u6309\u7f29\u653e\u7cfb\u6570\u5411\u4e0b\u7f29\u653e\uff0c\u7136\u540e\u89e3\u7801\u3002</p>\n", "<h3>Get image from the latent representation</h3>\n<p>We scale down by the scaling factor and then decode.</p>\n": "<h3>\u4ece\u6f5c\u5728\u8868\u793a\u4e2d\u83b7\u53d6\u56fe\u50cf</h3>\n<p>\u6211\u4eec\u6309\u7f29\u653e\u7cfb\u6570\u5411\u4e0b\u7f29\u653e\uff0c\u7136\u540e\u89e3\u7801\u3002</p>\n",

View File

@@ -1,5 +1,5 @@
{ {
"<h1>Denoising Diffusion Implicit Models (DDIM) Sampling</h1>\n<p>This implements DDIM sampling from the paper <a href=\"https://papers.labml.ai/paper/2010.02502\">Denoising Diffusion Implicit Models</a></p>\n": "<h1>\u30ce\u30a4\u30ba\u9664\u53bb\u62e1\u6563\u6697\u9ed9\u30e2\u30c7\u30eb (DDIM) \u30b5\u30f3\u30d7\u30ea\u30b7\u30c3\u30c8\u30b5\u30f3\u30d7\u30ea\u30b7\u30c3\u30c8</h1>\n<p>\u3053\u308c\u306f\u3001\u8ad6\u6587\u300c<a href=\"https://papers.labml.ai/paper/2010.02502\">\u30ce\u30a4\u30ba\u9664\u53bb\u62e1\u6563\u6697\u9ed9\u30e2\u30c7\u30eb</a>\u300d\u304b\u3089\u306eDDIM\u30b5\u30f3\u30d7\u30ea\u30f3\u30b0\u3092\u5b9f\u88c5\u3057\u3066\u3044\u307e\u3059\u3002</p>\n", "<h1>Denoising Diffusion Implicit Models (DDIM) Sampling</h1>\n<p>This implements DDIM sampling from the paper <a href=\"https://arxiv.org/abs/2010.02502\">Denoising Diffusion Implicit Models</a></p>\n": "<h1>\u30ce\u30a4\u30ba\u9664\u53bb\u62e1\u6563\u6697\u9ed9\u30e2\u30c7\u30eb (DDIM) \u30b5\u30f3\u30d7\u30ea\u30b7\u30c3\u30c8\u30b5\u30f3\u30d7\u30ea\u30b7\u30c3\u30c8</h1>\n<p>\u3053\u308c\u306f\u3001\u8ad6\u6587\u300c<a href=\"https://arxiv.org/abs/2010.02502\">\u30ce\u30a4\u30ba\u9664\u53bb\u62e1\u6563\u6697\u9ed9\u30e2\u30c7\u30eb</a>\u300d\u304b\u3089\u306eDDIM\u30b5\u30f3\u30d7\u30ea\u30f3\u30b0\u3092\u5b9f\u88c5\u3057\u3066\u3044\u307e\u3059\u3002</p>\n",
"<h2>DDIM Sampler</h2>\n<p>This extends the <a href=\"index.html\"><span translate=no>_^_0_^_</span> base class</a>.</p>\n<p>DDPM samples images by repeatedly removing noise by sampling step by step using,</p>\n<span translate=no>_^_1_^_</span><p>where <span translate=no>_^_2_^_</span> is random noise, <span translate=no>_^_3_^_</span> is a subsequence of <span translate=no>_^_4_^_</span> of length <span translate=no>_^_5_^_</span>, and <span translate=no>_^_6_^_</span></p>\n<p>Note that, <span translate=no>_^_7_^_</span> in DDIM paper refers to <span translate=no>_^_8_^_</span> from <a href=\"ddpm.html\">DDPM</a>.</p>\n": "<h2>DDIM \u30b5\u30f3\u30d7\u30e9\u30fc</h2>\n<p><a href=\"index.html\"><span translate=no>_^_0_^_</span>\u3053\u308c\u306f\u57fa\u672c\u30af\u30e9\u30b9\u3092\u62e1\u5f35\u3057\u307e\u3059</a>\u3002</p>\n<p>DDPM\u306f\u3001\u4ee5\u4e0b\u3092\u4f7f\u7528\u3057\u3066\u6bb5\u968e\u7684\u306b\u30b5\u30f3\u30d7\u30ea\u30f3\u30b0\u3059\u308b\u3053\u3068\u306b\u3088\u308a\u3001\u30ce\u30a4\u30ba\u3092\u7e70\u308a\u8fd4\u3057\u9664\u53bb\u3059\u308b\u3053\u3068\u306b\u3088\u3063\u3066\u753b\u50cf\u3092\u30b5\u30f3\u30d7\u30ea\u30f3\u30b0\u3057\u307e\u3059\u3002</p>\n<span translate=no>_^_1_^_</span><p>\u3053\u3053\u3067\u3001<span translate=no>_^_2_^_</span>\u306f\u30e9\u30f3\u30c0\u30e0\u30ce\u30a4\u30ba\u3001<span translate=no>_^_3_^_</span><span translate=no>_^_4_^_</span>\u306f\u9577\u3055\u306e\u30b5\u30d6\u30b7\u30fc\u30b1\u30f3\u30b9<span translate=no>_^_5_^_</span>\u3001<span translate=no>_^_6_^_</span></p>\n<p><a href=\"ddpm.html\">\u306a\u304a\u3001<span translate=no>_^_7_^_</span> <span translate=no>_^_8_^_</span> DDIM\u306e\u8ad6\u6587\u3067\u306fDDPM\u306e\u3082\u306e\u3092\u6307\u3057\u3066\u3044\u307e\u3059\u3002</a></p>\n", "<h2>DDIM Sampler</h2>\n<p>This extends the <a href=\"index.html\"><span translate=no>_^_0_^_</span> base class</a>.</p>\n<p>DDPM samples images by repeatedly removing noise by sampling step by step using,</p>\n<span translate=no>_^_1_^_</span><p>where <span translate=no>_^_2_^_</span> is random noise, <span translate=no>_^_3_^_</span> is a subsequence of <span translate=no>_^_4_^_</span> of length <span translate=no>_^_5_^_</span>, and <span translate=no>_^_6_^_</span></p>\n<p>Note that, <span translate=no>_^_7_^_</span> in DDIM paper refers to <span translate=no>_^_8_^_</span> from <a href=\"ddpm.html\">DDPM</a>.</p>\n": "<h2>DDIM \u30b5\u30f3\u30d7\u30e9\u30fc</h2>\n<p><a href=\"index.html\"><span translate=no>_^_0_^_</span>\u3053\u308c\u306f\u57fa\u672c\u30af\u30e9\u30b9\u3092\u62e1\u5f35\u3057\u307e\u3059</a>\u3002</p>\n<p>DDPM\u306f\u3001\u4ee5\u4e0b\u3092\u4f7f\u7528\u3057\u3066\u6bb5\u968e\u7684\u306b\u30b5\u30f3\u30d7\u30ea\u30f3\u30b0\u3059\u308b\u3053\u3068\u306b\u3088\u308a\u3001\u30ce\u30a4\u30ba\u3092\u7e70\u308a\u8fd4\u3057\u9664\u53bb\u3059\u308b\u3053\u3068\u306b\u3088\u3063\u3066\u753b\u50cf\u3092\u30b5\u30f3\u30d7\u30ea\u30f3\u30b0\u3057\u307e\u3059\u3002</p>\n<span translate=no>_^_1_^_</span><p>\u3053\u3053\u3067\u3001<span translate=no>_^_2_^_</span>\u306f\u30e9\u30f3\u30c0\u30e0\u30ce\u30a4\u30ba\u3001<span translate=no>_^_3_^_</span><span translate=no>_^_4_^_</span>\u306f\u9577\u3055\u306e\u30b5\u30d6\u30b7\u30fc\u30b1\u30f3\u30b9<span translate=no>_^_5_^_</span>\u3001<span translate=no>_^_6_^_</span></p>\n<p><a href=\"ddpm.html\">\u306a\u304a\u3001<span translate=no>_^_7_^_</span> <span translate=no>_^_8_^_</span> 
DDIM\u306e\u8ad6\u6587\u3067\u306fDDPM\u306e\u3082\u306e\u3092\u6307\u3057\u3066\u3044\u307e\u3059\u3002</a></p>\n",
"<h3>Painting Loop</h3>\n<ul><li><span translate=no>_^_0_^_</span> is <span translate=no>_^_1_^_</span> of shape <span translate=no>_^_2_^_</span> </li>\n<li><span translate=no>_^_3_^_</span> is the conditional embeddings <span translate=no>_^_4_^_</span> </li>\n<li><span translate=no>_^_5_^_</span> is the sampling step to start from, <span translate=no>_^_6_^_</span> </li>\n<li><span translate=no>_^_7_^_</span> is the original image in latent page which we are in paining. If this is not provided, it&#x27;ll be an image to image transformation. </li>\n<li><span translate=no>_^_8_^_</span> is the mask to keep the original image. </li>\n<li><span translate=no>_^_9_^_</span> is fixed noise to be added to the original image. </li>\n<li><span translate=no>_^_10_^_</span> is the unconditional guidance scale <span translate=no>_^_11_^_</span>. This is used for <span translate=no>_^_12_^_</span> </li>\n<li><span translate=no>_^_13_^_</span> is the conditional embedding for empty prompt <span translate=no>_^_14_^_</span></li></ul>\n": "<h3>\u30da\u30a4\u30f3\u30c6\u30a3\u30f3\u30b0\u30eb\u30fc\u30d7</h3>\n<ul><li><span translate=no>_^_0_^_</span><span translate=no>_^_1_^_</span>\u5f62\u304c\u5408\u3063\u3066\u3044\u308b <span translate=no>_^_2_^_</span></li>\n<li><span translate=no>_^_3_^_</span>\u6761\u4ef6\u4ed8\u304d\u57cb\u3081\u8fbc\u307f\u3067\u3059 <span translate=no>_^_4_^_</span></li>\n<li><span translate=no>_^_5_^_</span>\u958b\u59cb\u3059\u308b\u30b5\u30f3\u30d7\u30ea\u30f3\u30b0\u30b9\u30c6\u30c3\u30d7\u3067\u3059 <span translate=no>_^_6_^_</span></li>\n<li><span translate=no>_^_7_^_</span>\u73fe\u5728\u30da\u30a4\u30f3\u30c8\u4e2d\u306e\u6f5c\u5728\u30da\u30fc\u30b8\u306e\u30aa\u30ea\u30b8\u30ca\u30eb\u753b\u50cf\u3067\u3059\u3002\u3053\u308c\u304c\u6307\u5b9a\u3055\u308c\u3066\u3044\u306a\u3044\u5834\u5408\u306f\u3001\u753b\u50cf\u304b\u3089\u753b\u50cf\u3078\u306e\u5909\u63db\u306b\u306a\u308a\u307e\u3059\u3002</li>\n<li><span translate=no>_^_8_^_</span>\u5143\u306e\u753b\u50cf\u3092\u6b8b\u3059\u305f\u3081\u306e\u30de\u30b9\u30af\u3067\u3059\u3002</li>\n<li><span translate=no>_^_9_^_</span>\u5143\u306e\u753b\u50cf\u306b\u8ffd\u52a0\u3055\u308c\u308b\u56fa\u5b9a\u30ce\u30a4\u30ba\u3067\u3059\u3002</li>\n<li><span translate=no>_^_10_^_</span>\u7121\u6761\u4ef6\u30ac\u30a4\u30c0\u30f3\u30b9\u30b9\u30b1\u30fc\u30eb\u3067\u3059 <span translate=no>_^_11_^_</span>\u3053\u308c\u306f\u6b21\u306e\u7528\u9014\u306b\u4f7f\u7528\u3055\u308c\u307e\u3059 <span translate=no>_^_12_^_</span></li>\n<li><span translate=no>_^_13_^_</span>\u7a7a\u306e\u30d7\u30ed\u30f3\u30d7\u30c8\u306e\u6761\u4ef6\u4ed8\u304d\u57cb\u3081\u8fbc\u307f\u3067\u3059 <span translate=no>_^_14_^_</span></li></ul>\n", "<h3>Painting Loop</h3>\n<ul><li><span translate=no>_^_0_^_</span> is <span translate=no>_^_1_^_</span> of shape <span translate=no>_^_2_^_</span> </li>\n<li><span translate=no>_^_3_^_</span> is the conditional embeddings <span translate=no>_^_4_^_</span> </li>\n<li><span translate=no>_^_5_^_</span> is the sampling step to start from, <span translate=no>_^_6_^_</span> </li>\n<li><span translate=no>_^_7_^_</span> is the original image in latent page which we are in paining. If this is not provided, it&#x27;ll be an image to image transformation. </li>\n<li><span translate=no>_^_8_^_</span> is the mask to keep the original image. </li>\n<li><span translate=no>_^_9_^_</span> is fixed noise to be added to the original image. 
</li>\n<li><span translate=no>_^_10_^_</span> is the unconditional guidance scale <span translate=no>_^_11_^_</span>. This is used for <span translate=no>_^_12_^_</span> </li>\n<li><span translate=no>_^_13_^_</span> is the conditional embedding for empty prompt <span translate=no>_^_14_^_</span></li></ul>\n": "<h3>\u30da\u30a4\u30f3\u30c6\u30a3\u30f3\u30b0\u30eb\u30fc\u30d7</h3>\n<ul><li><span translate=no>_^_0_^_</span><span translate=no>_^_1_^_</span>\u5f62\u304c\u5408\u3063\u3066\u3044\u308b <span translate=no>_^_2_^_</span></li>\n<li><span translate=no>_^_3_^_</span>\u6761\u4ef6\u4ed8\u304d\u57cb\u3081\u8fbc\u307f\u3067\u3059 <span translate=no>_^_4_^_</span></li>\n<li><span translate=no>_^_5_^_</span>\u958b\u59cb\u3059\u308b\u30b5\u30f3\u30d7\u30ea\u30f3\u30b0\u30b9\u30c6\u30c3\u30d7\u3067\u3059 <span translate=no>_^_6_^_</span></li>\n<li><span translate=no>_^_7_^_</span>\u73fe\u5728\u30da\u30a4\u30f3\u30c8\u4e2d\u306e\u6f5c\u5728\u30da\u30fc\u30b8\u306e\u30aa\u30ea\u30b8\u30ca\u30eb\u753b\u50cf\u3067\u3059\u3002\u3053\u308c\u304c\u6307\u5b9a\u3055\u308c\u3066\u3044\u306a\u3044\u5834\u5408\u306f\u3001\u753b\u50cf\u304b\u3089\u753b\u50cf\u3078\u306e\u5909\u63db\u306b\u306a\u308a\u307e\u3059\u3002</li>\n<li><span translate=no>_^_8_^_</span>\u5143\u306e\u753b\u50cf\u3092\u6b8b\u3059\u305f\u3081\u306e\u30de\u30b9\u30af\u3067\u3059\u3002</li>\n<li><span translate=no>_^_9_^_</span>\u5143\u306e\u753b\u50cf\u306b\u8ffd\u52a0\u3055\u308c\u308b\u56fa\u5b9a\u30ce\u30a4\u30ba\u3067\u3059\u3002</li>\n<li><span translate=no>_^_10_^_</span>\u7121\u6761\u4ef6\u30ac\u30a4\u30c0\u30f3\u30b9\u30b9\u30b1\u30fc\u30eb\u3067\u3059 <span translate=no>_^_11_^_</span>\u3053\u308c\u306f\u6b21\u306e\u7528\u9014\u306b\u4f7f\u7528\u3055\u308c\u307e\u3059 <span translate=no>_^_12_^_</span></li>\n<li><span translate=no>_^_13_^_</span>\u7a7a\u306e\u30d7\u30ed\u30f3\u30d7\u30c8\u306e\u6761\u4ef6\u4ed8\u304d\u57cb\u3081\u8fbc\u307f\u3067\u3059 <span translate=no>_^_14_^_</span></li></ul>\n",
"<h3>Sample <span translate=no>_^_0_^_</span> given <span translate=no>_^_1_^_</span></h3>\n": "<h3><span translate=no>_^_0_^_</span>\u30b5\u30f3\u30d7\u30eb\u63d0\u4f9b <span translate=no>_^_1_^_</span></h3>\n", "<h3>Sample <span translate=no>_^_0_^_</span> given <span translate=no>_^_1_^_</span></h3>\n": "<h3><span translate=no>_^_0_^_</span>\u30b5\u30f3\u30d7\u30eb\u63d0\u4f9b <span translate=no>_^_1_^_</span></h3>\n",

View File

@@ -1,5 +1,5 @@
{ {
"<h1>Denoising Diffusion Implicit Models (DDIM) Sampling</h1>\n<p>This implements DDIM sampling from the paper <a href=\"https://papers.labml.ai/paper/2010.02502\">Denoising Diffusion Implicit Models</a></p>\n": "<h1>Denoising \u0dc0\u0dd2\u0dc3\u0dbb\u0dab \u0dc0\u0dca\u0dba\u0d82\u0d9c \u0d86\u0d9a\u0dd8\u0dad\u0dd2 (DDIM) \u0db1\u0dd2\u0dba\u0dd0\u0daf\u0dd3\u0db8</h1>\n<p>\u0db8\u0dd9\u0dba \u0d9a\u0da9\u0daf\u0dcf\u0dc3\u0dd2 \u0dc0\u0dbd\u0dd2\u0db1\u0dca DDIM \u0db1\u0dd2\u0dba\u0dd0\u0daf\u0dd3\u0db8 \u0d9a\u0dca\u0dbb\u0dd2\u0dba\u0dcf\u0dad\u0dca\u0db8\u0d9a \u0d9a\u0dbb\u0dba\u0dd2 <a href=\"https://papers.labml.ai/paper/2010.02502\">Denoising Diffusion Implicit \u0d86\u0d9a\u0dd8\u0dad\u0dd2</a> Denoising</p>\n", "<h1>Denoising Diffusion Implicit Models (DDIM) Sampling</h1>\n<p>This implements DDIM sampling from the paper <a href=\"https://arxiv.org/abs/2010.02502\">Denoising Diffusion Implicit Models</a></p>\n": "<h1>Denoising \u0dc0\u0dd2\u0dc3\u0dbb\u0dab \u0dc0\u0dca\u0dba\u0d82\u0d9c \u0d86\u0d9a\u0dd8\u0dad\u0dd2 (DDIM) \u0db1\u0dd2\u0dba\u0dd0\u0daf\u0dd3\u0db8</h1>\n<p>\u0db8\u0dd9\u0dba \u0d9a\u0da9\u0daf\u0dcf\u0dc3\u0dd2 \u0dc0\u0dbd\u0dd2\u0db1\u0dca DDIM \u0db1\u0dd2\u0dba\u0dd0\u0daf\u0dd3\u0db8 \u0d9a\u0dca\u0dbb\u0dd2\u0dba\u0dcf\u0dad\u0dca\u0db8\u0d9a \u0d9a\u0dbb\u0dba\u0dd2 <a href=\"https://arxiv.org/abs/2010.02502\">Denoising Diffusion Implicit \u0d86\u0d9a\u0dd8\u0dad\u0dd2</a> Denoising</p>\n",
"<h2>DDIM Sampler</h2>\n<p>This extends the <a href=\"index.html\"><span translate=no>_^_0_^_</span> base class</a>.</p>\n<p>DDPM samples images by repeatedly removing noise by sampling step by step using,</p>\n<span translate=no>_^_1_^_</span><p>where <span translate=no>_^_2_^_</span> is random noise, <span translate=no>_^_3_^_</span> is a subsequence of <span translate=no>_^_4_^_</span> of length <span translate=no>_^_5_^_</span>, and <span translate=no>_^_6_^_</span></p>\n<p>Note that, <span translate=no>_^_7_^_</span> in DDIM paper refers to <span translate=no>_^_8_^_</span> from <a href=\"ddpm.html\">DDPM</a>.</p>\n": "<h2>\u0da9\u0dd3\u0da9\u0dd3\u0d85\u0dba\u0dd2\u0d91\u0db8\u0dca \u0db1\u0dd2\u0dba\u0dd0\u0daf\u0dd2\u0d9a\u0dbb\u0dd4</h2>\n<p>\u0db8\u0dd9\u0dba <a href=\"index.html\"><span translate=no>_^_0_^_</span>\u0db8\u0dd6\u0dbd\u0dd2\u0d9a \u0db4\u0db1\u0dca\u0dad\u0dd2\u0dba</a> \u0db4\u0dd4\u0dc5\u0dd4\u0dbd\u0dca \u0d9a\u0dbb\u0dba\u0dd2.</p>\n<p>\u0da9\u0dd3\u0da9\u0dd3\u0db4\u0dd3\u0d91\u0db8\u0dca \u0dc3\u0dcf\u0db8\u0dca\u0db4\u0dbd \u0dbb\u0dd6\u0db4 \u0db7\u0dcf\u0dc0\u0dd2\u0dad\u0dcf \u0d9a\u0dbb\u0db8\u0dd2\u0db1\u0dca \u0db4\u0dd2\u0dba\u0dc0\u0dbb\u0dd9\u0db1\u0dca \u0db4\u0dd2\u0dba\u0dc0\u0dbb \u0db1\u0dd2\u0dba\u0dd0\u0daf\u0dd3\u0db8\u0dd9\u0db1\u0dca \u0dc1\u0db6\u0dca\u0daf\u0dba \u0db1\u0dd0\u0dc0\u0dad \u0db1\u0dd0\u0dc0\u0dad\u0dad\u0dca \u0d89\u0dc0\u0dad\u0dca \u0d9a\u0dd2\u0dbb\u0dd3\u0db8\u0dd9\u0db1\u0dca,</p>\n<span translate=no>_^_1_^_</span><p>\u0d85\u0dc4\u0db9\u0dd4 \u0dc1\u0db6\u0dca\u0daf\u0dba<span translate=no>_^_2_^_</span> \u0dba\u0db1\u0dd4 \u0d9a\u0ddc\u0dad\u0dd0\u0db1\u0daf, \u0daf\u0dd2\u0d9c<span translate=no>_^_3_^_</span> \u0d85\u0db1\u0dd4\u0d9a\u0dca\u0dbb\u0db8\u0dba\u0d9a\u0dd2<span translate=no>_^_5_^_</span>, \u0dc3\u0dc4<span translate=no>_^_4_^_</span><span translate=no>_^_6_^_</span></p>\n<p>\u0da9\u0dd3\u0da9\u0dd3\u0d85\u0dba\u0dd2\u0d91\u0db8\u0dca \u0d9a\u0da9\u0daf\u0dcf\u0dc3\u0dd2<span translate=no>_^_7_^_</span> \u0dc0\u0dbd <a href=\"ddpm.html\">\u0da9\u0dd3\u0da9\u0dd3\u0db4\u0dd3\u0d91\u0db8\u0dca<span translate=no>_^_8_^_</span></a> \u0dc0\u0dd9\u0dad\u0dd2\u0db1\u0dca \u0dc3\u0db3\u0dc4\u0db1\u0dca \u0dc0\u0db1 \u0db6\u0dc0 \u0dc3\u0dbd\u0d9a\u0db1\u0dca\u0db1.</p>\n", "<h2>DDIM Sampler</h2>\n<p>This extends the <a href=\"index.html\"><span translate=no>_^_0_^_</span> base class</a>.</p>\n<p>DDPM samples images by repeatedly removing noise by sampling step by step using,</p>\n<span translate=no>_^_1_^_</span><p>where <span translate=no>_^_2_^_</span> is random noise, <span translate=no>_^_3_^_</span> is a subsequence of <span translate=no>_^_4_^_</span> of length <span translate=no>_^_5_^_</span>, and <span translate=no>_^_6_^_</span></p>\n<p>Note that, <span translate=no>_^_7_^_</span> in DDIM paper refers to <span translate=no>_^_8_^_</span> from <a href=\"ddpm.html\">DDPM</a>.</p>\n": "<h2>\u0da9\u0dd3\u0da9\u0dd3\u0d85\u0dba\u0dd2\u0d91\u0db8\u0dca \u0db1\u0dd2\u0dba\u0dd0\u0daf\u0dd2\u0d9a\u0dbb\u0dd4</h2>\n<p>\u0db8\u0dd9\u0dba <a href=\"index.html\"><span translate=no>_^_0_^_</span>\u0db8\u0dd6\u0dbd\u0dd2\u0d9a \u0db4\u0db1\u0dca\u0dad\u0dd2\u0dba</a> \u0db4\u0dd4\u0dc5\u0dd4\u0dbd\u0dca \u0d9a\u0dbb\u0dba\u0dd2.</p>\n<p>\u0da9\u0dd3\u0da9\u0dd3\u0db4\u0dd3\u0d91\u0db8\u0dca \u0dc3\u0dcf\u0db8\u0dca\u0db4\u0dbd \u0dbb\u0dd6\u0db4 \u0db7\u0dcf\u0dc0\u0dd2\u0dad\u0dcf \u0d9a\u0dbb\u0db8\u0dd2\u0db1\u0dca \u0db4\u0dd2\u0dba\u0dc0\u0dbb\u0dd9\u0db1\u0dca \u0db4\u0dd2\u0dba\u0dc0\u0dbb 
\u0db1\u0dd2\u0dba\u0dd0\u0daf\u0dd3\u0db8\u0dd9\u0db1\u0dca \u0dc1\u0db6\u0dca\u0daf\u0dba \u0db1\u0dd0\u0dc0\u0dad \u0db1\u0dd0\u0dc0\u0dad\u0dad\u0dca \u0d89\u0dc0\u0dad\u0dca \u0d9a\u0dd2\u0dbb\u0dd3\u0db8\u0dd9\u0db1\u0dca,</p>\n<span translate=no>_^_1_^_</span><p>\u0d85\u0dc4\u0db9\u0dd4 \u0dc1\u0db6\u0dca\u0daf\u0dba<span translate=no>_^_2_^_</span> \u0dba\u0db1\u0dd4 \u0d9a\u0ddc\u0dad\u0dd0\u0db1\u0daf, \u0daf\u0dd2\u0d9c<span translate=no>_^_3_^_</span> \u0d85\u0db1\u0dd4\u0d9a\u0dca\u0dbb\u0db8\u0dba\u0d9a\u0dd2<span translate=no>_^_5_^_</span>, \u0dc3\u0dc4<span translate=no>_^_4_^_</span><span translate=no>_^_6_^_</span></p>\n<p>\u0da9\u0dd3\u0da9\u0dd3\u0d85\u0dba\u0dd2\u0d91\u0db8\u0dca \u0d9a\u0da9\u0daf\u0dcf\u0dc3\u0dd2<span translate=no>_^_7_^_</span> \u0dc0\u0dbd <a href=\"ddpm.html\">\u0da9\u0dd3\u0da9\u0dd3\u0db4\u0dd3\u0d91\u0db8\u0dca<span translate=no>_^_8_^_</span></a> \u0dc0\u0dd9\u0dad\u0dd2\u0db1\u0dca \u0dc3\u0db3\u0dc4\u0db1\u0dca \u0dc0\u0db1 \u0db6\u0dc0 \u0dc3\u0dbd\u0d9a\u0db1\u0dca\u0db1.</p>\n",
"<h3>Painting Loop</h3>\n<ul><li><span translate=no>_^_0_^_</span> is <span translate=no>_^_1_^_</span> of shape <span translate=no>_^_2_^_</span> </li>\n<li><span translate=no>_^_3_^_</span> is the conditional embeddings <span translate=no>_^_4_^_</span> </li>\n<li><span translate=no>_^_5_^_</span> is the sampling step to start from, <span translate=no>_^_6_^_</span> </li>\n<li><span translate=no>_^_7_^_</span> is the original image in latent page which we are in paining. If this is not provided, it&#x27;ll be an image to image transformation. </li>\n<li><span translate=no>_^_8_^_</span> is the mask to keep the original image. </li>\n<li><span translate=no>_^_9_^_</span> is fixed noise to be added to the original image. </li>\n<li><span translate=no>_^_10_^_</span> is the unconditional guidance scale <span translate=no>_^_11_^_</span>. This is used for <span translate=no>_^_12_^_</span> </li>\n<li><span translate=no>_^_13_^_</span> is the conditional embedding for empty prompt <span translate=no>_^_14_^_</span></li></ul>\n": "<h3>\u0db4\u0dd2\u0db1\u0dca\u0dad\u0dcf\u0dbb\u0dd4 \u0dbd\u0dd6\u0db4</h3>\n<ul><li><span translate=no>_^_0_^_</span>\u0dc4\u0dd0\u0da9\u0dba\u0dd9\u0db1\u0dca<span translate=no>_^_1_^_</span> \u0dba\u0dd4\u0d9a\u0dca\u0dad \u0dc0\u0dda<span translate=no>_^_2_^_</span></li>\n<li><span translate=no>_^_3_^_</span>\u0d9a\u0ddc\u0db1\u0dca\u0daf\u0dda\u0dc3\u0dd2 \u0dc3\u0dc4\u0dd2\u0dad \u0d9a\u0dcf\u0dc0\u0dd0\u0daf\u0dca\u0daf\u0dd3\u0db8\u0dca \u0dc0\u0dda<span translate=no>_^_4_^_</span></li>\n<li><span translate=no>_^_5_^_</span>\u0dc3\u0dd2\u0da7 \u0d86\u0dbb\u0db8\u0dca\u0db7 \u0d9a\u0dd2\u0dbb\u0dd3\u0db8\u0da7 \u0db1\u0dd2\u0dba\u0dd0\u0daf\u0dd2 \u0db4\u0dd2\u0dba\u0dc0\u0dbb \u0dc0\u0dda,<span translate=no>_^_6_^_</span></li>\n<li><span translate=no>_^_7_^_</span>\u0dba\u0db1\u0dd4 \u0db8\u0dd4\u0dbd\u0dca \u0dbb\u0dd6\u0db4\u0dba\u0dba\u0dd2 \u0d9c\u0dd4\u0db4\u0dca\u0dad \u0db4\u0dd2\u0da7\u0dd4\u0dc0 \u0d85\u0db4\u0dd2 \u0db4\u0dd0\u0dbd\u0dca\u0dbd\u0db8\u0dca \u0d9a\u0dbb\u0db1. \u0db8\u0dd9\u0dba \u0dc3\u0db4\u0dba\u0dcf \u0db1\u0ddc\u0db8\u0dd0\u0dad\u0dd2 \u0db1\u0db8\u0dca, \u0d91\u0dba \u0dbb\u0dd6\u0db4 \u0db4\u0dbb\u0dd2\u0dc0\u0dbb\u0dca\u0dad\u0db1\u0dba\u0da7 \u0dbb\u0dd6\u0db4\u0dba\u0d9a\u0dca \u0dc0\u0db1\u0dd4 \u0d87\u0dad.</li>\n<li><span translate=no>_^_8_^_</span>\u0db8\u0dd4\u0dbd\u0dca \u0dbb\u0dd6\u0db4\u0dba \u0dad\u0db6\u0dcf \u0d9c\u0dd0\u0db1\u0dd3\u0db8 \u0dc3\u0db3\u0dc4\u0dcf \u0dc0\u0dd9\u0dc3\u0dca\u0db8\u0dd4\u0dc4\u0dd4\u0dab \u0dc0\u0dda.</li>\n<li><span translate=no>_^_9_^_</span>\u0db8\u0dd4\u0dbd\u0dca \u0dbb\u0dd6\u0db4\u0dba\u0da7 \u0d91\u0d9a\u0dad\u0dd4 \u0d9a\u0dc5 \u0dba\u0dd4\u0dad\u0dd4 \u0dc3\u0dca\u0dae\u0dcf\u0dc0\u0dbb \u0dc1\u0db6\u0dca\u0daf\u0dba.</li>\n<li><span translate=no>_^_10_^_</span>\u0dba\u0db1\u0dd4 \u0d9a\u0ddc\u0db1\u0dca\u0daf\u0dda\u0dc3\u0dd2 \u0dc0\u0dd2\u0dbb\u0dc4\u0dd2\u0dad \u0db8\u0dcf\u0dbb\u0dca\u0d9c\u0ddd\u0db4\u0daf\u0dda\u0dc1<span translate=no>_^_11_^_</span> \u0db4\u0dbb\u0dd2\u0db8\u0dcf\u0dab\u0dba\u0dba\u0dd2. 
\u0db8\u0dd9\u0dba \u0db7\u0dcf\u0dc0\u0dd2\u0dad\u0dcf \u0dc0\u0dda<span translate=no>_^_12_^_</span></li>\n<li><span translate=no>_^_13_^_</span>\u0dc4\u0dd2\u0dc3\u0dca \u0dc0\u0dd2\u0db8\u0dc3\u0dd4\u0db8\u0d9a\u0dca \u0dc3\u0db3\u0dc4\u0dcf \u0d9a\u0ddc\u0db1\u0dca\u0daf\u0dda\u0dc3\u0dd2 \u0dc3\u0dc4\u0dd2\u0dad \u0d9a\u0dcf\u0dc0\u0dd0\u0daf\u0dca\u0daf\u0dd3\u0db8 \u0dc0\u0dda<span translate=no>_^_14_^_</span></li></ul>\n", "<h3>Painting Loop</h3>\n<ul><li><span translate=no>_^_0_^_</span> is <span translate=no>_^_1_^_</span> of shape <span translate=no>_^_2_^_</span> </li>\n<li><span translate=no>_^_3_^_</span> is the conditional embeddings <span translate=no>_^_4_^_</span> </li>\n<li><span translate=no>_^_5_^_</span> is the sampling step to start from, <span translate=no>_^_6_^_</span> </li>\n<li><span translate=no>_^_7_^_</span> is the original image in latent page which we are in paining. If this is not provided, it&#x27;ll be an image to image transformation. </li>\n<li><span translate=no>_^_8_^_</span> is the mask to keep the original image. </li>\n<li><span translate=no>_^_9_^_</span> is fixed noise to be added to the original image. </li>\n<li><span translate=no>_^_10_^_</span> is the unconditional guidance scale <span translate=no>_^_11_^_</span>. This is used for <span translate=no>_^_12_^_</span> </li>\n<li><span translate=no>_^_13_^_</span> is the conditional embedding for empty prompt <span translate=no>_^_14_^_</span></li></ul>\n": "<h3>\u0db4\u0dd2\u0db1\u0dca\u0dad\u0dcf\u0dbb\u0dd4 \u0dbd\u0dd6\u0db4</h3>\n<ul><li><span translate=no>_^_0_^_</span>\u0dc4\u0dd0\u0da9\u0dba\u0dd9\u0db1\u0dca<span translate=no>_^_1_^_</span> \u0dba\u0dd4\u0d9a\u0dca\u0dad \u0dc0\u0dda<span translate=no>_^_2_^_</span></li>\n<li><span translate=no>_^_3_^_</span>\u0d9a\u0ddc\u0db1\u0dca\u0daf\u0dda\u0dc3\u0dd2 \u0dc3\u0dc4\u0dd2\u0dad \u0d9a\u0dcf\u0dc0\u0dd0\u0daf\u0dca\u0daf\u0dd3\u0db8\u0dca \u0dc0\u0dda<span translate=no>_^_4_^_</span></li>\n<li><span translate=no>_^_5_^_</span>\u0dc3\u0dd2\u0da7 \u0d86\u0dbb\u0db8\u0dca\u0db7 \u0d9a\u0dd2\u0dbb\u0dd3\u0db8\u0da7 \u0db1\u0dd2\u0dba\u0dd0\u0daf\u0dd2 \u0db4\u0dd2\u0dba\u0dc0\u0dbb \u0dc0\u0dda,<span translate=no>_^_6_^_</span></li>\n<li><span translate=no>_^_7_^_</span>\u0dba\u0db1\u0dd4 \u0db8\u0dd4\u0dbd\u0dca \u0dbb\u0dd6\u0db4\u0dba\u0dba\u0dd2 \u0d9c\u0dd4\u0db4\u0dca\u0dad \u0db4\u0dd2\u0da7\u0dd4\u0dc0 \u0d85\u0db4\u0dd2 \u0db4\u0dd0\u0dbd\u0dca\u0dbd\u0db8\u0dca \u0d9a\u0dbb\u0db1. \u0db8\u0dd9\u0dba \u0dc3\u0db4\u0dba\u0dcf \u0db1\u0ddc\u0db8\u0dd0\u0dad\u0dd2 \u0db1\u0db8\u0dca, \u0d91\u0dba \u0dbb\u0dd6\u0db4 \u0db4\u0dbb\u0dd2\u0dc0\u0dbb\u0dca\u0dad\u0db1\u0dba\u0da7 \u0dbb\u0dd6\u0db4\u0dba\u0d9a\u0dca \u0dc0\u0db1\u0dd4 \u0d87\u0dad.</li>\n<li><span translate=no>_^_8_^_</span>\u0db8\u0dd4\u0dbd\u0dca \u0dbb\u0dd6\u0db4\u0dba \u0dad\u0db6\u0dcf \u0d9c\u0dd0\u0db1\u0dd3\u0db8 \u0dc3\u0db3\u0dc4\u0dcf \u0dc0\u0dd9\u0dc3\u0dca\u0db8\u0dd4\u0dc4\u0dd4\u0dab \u0dc0\u0dda.</li>\n<li><span translate=no>_^_9_^_</span>\u0db8\u0dd4\u0dbd\u0dca \u0dbb\u0dd6\u0db4\u0dba\u0da7 \u0d91\u0d9a\u0dad\u0dd4 \u0d9a\u0dc5 \u0dba\u0dd4\u0dad\u0dd4 \u0dc3\u0dca\u0dae\u0dcf\u0dc0\u0dbb \u0dc1\u0db6\u0dca\u0daf\u0dba.</li>\n<li><span translate=no>_^_10_^_</span>\u0dba\u0db1\u0dd4 \u0d9a\u0ddc\u0db1\u0dca\u0daf\u0dda\u0dc3\u0dd2 \u0dc0\u0dd2\u0dbb\u0dc4\u0dd2\u0dad \u0db8\u0dcf\u0dbb\u0dca\u0d9c\u0ddd\u0db4\u0daf\u0dda\u0dc1<span translate=no>_^_11_^_</span> \u0db4\u0dbb\u0dd2\u0db8\u0dcf\u0dab\u0dba\u0dba\u0dd2. 
\u0db8\u0dd9\u0dba \u0db7\u0dcf\u0dc0\u0dd2\u0dad\u0dcf \u0dc0\u0dda<span translate=no>_^_12_^_</span></li>\n<li><span translate=no>_^_13_^_</span>\u0dc4\u0dd2\u0dc3\u0dca \u0dc0\u0dd2\u0db8\u0dc3\u0dd4\u0db8\u0d9a\u0dca \u0dc3\u0db3\u0dc4\u0dcf \u0d9a\u0ddc\u0db1\u0dca\u0daf\u0dda\u0dc3\u0dd2 \u0dc3\u0dc4\u0dd2\u0dad \u0d9a\u0dcf\u0dc0\u0dd0\u0daf\u0dca\u0daf\u0dd3\u0db8 \u0dc0\u0dda<span translate=no>_^_14_^_</span></li></ul>\n",
"<h3>Sample <span translate=no>_^_0_^_</span> given <span translate=no>_^_1_^_</span></h3>\n": "<h3><span translate=no>_^_0_^_</span>\u0dbd\u0db6\u0dcf \u0daf\u0dd3 \u0d87\u0dad\u0dd2 \u0db1\u0dd2\u0dba\u0dd0\u0daf\u0dd2\u0dba<span translate=no>_^_1_^_</span></h3>\n", "<h3>Sample <span translate=no>_^_0_^_</span> given <span translate=no>_^_1_^_</span></h3>\n": "<h3><span translate=no>_^_0_^_</span>\u0dbd\u0db6\u0dcf \u0daf\u0dd3 \u0d87\u0dad\u0dd2 \u0db1\u0dd2\u0dba\u0dd0\u0daf\u0dd2\u0dba<span translate=no>_^_1_^_</span></h3>\n",

View File

@@ -1,5 +1,5 @@
{ {
"<h1>Denoising Diffusion Implicit Models (DDIM) Sampling</h1>\n<p>This implements DDIM sampling from the paper <a href=\"https://papers.labml.ai/paper/2010.02502\">Denoising Diffusion Implicit Models</a></p>\n": "<h1>\u964d\u566a\u6269\u6563\u9690\u542b\u6a21\u578b (DDIM) \u91c7\u6837</h1>\n<p>\u8fd9\u5b9e\u73b0\u4e86\u6765\u81ea\u8bba\u6587 \u201c<a href=\"https://papers.labml.ai/paper/2010.02502\">\u964d\u566a\u6269\u6563\u9690\u5f0f\u6a21\u578b</a>\u201d \u7684 DDIM \u91c7\u6837</p>\n", "<h1>Denoising Diffusion Implicit Models (DDIM) Sampling</h1>\n<p>This implements DDIM sampling from the paper <a href=\"https://arxiv.org/abs/2010.02502\">Denoising Diffusion Implicit Models</a></p>\n": "<h1>\u964d\u566a\u6269\u6563\u9690\u542b\u6a21\u578b (DDIM) \u91c7\u6837</h1>\n<p>\u8fd9\u5b9e\u73b0\u4e86\u6765\u81ea\u8bba\u6587 \u201c<a href=\"https://arxiv.org/abs/2010.02502\">\u964d\u566a\u6269\u6563\u9690\u5f0f\u6a21\u578b</a>\u201d \u7684 DDIM \u91c7\u6837</p>\n",
"<h2>DDIM Sampler</h2>\n<p>This extends the <a href=\"index.html\"><span translate=no>_^_0_^_</span> base class</a>.</p>\n<p>DDPM samples images by repeatedly removing noise by sampling step by step using,</p>\n<span translate=no>_^_1_^_</span><p>where <span translate=no>_^_2_^_</span> is random noise, <span translate=no>_^_3_^_</span> is a subsequence of <span translate=no>_^_4_^_</span> of length <span translate=no>_^_5_^_</span>, and <span translate=no>_^_6_^_</span></p>\n<p>Note that, <span translate=no>_^_7_^_</span> in DDIM paper refers to <span translate=no>_^_8_^_</span> from <a href=\"ddpm.html\">DDPM</a>.</p>\n": "<h2>DDIM \u91c7\u6837\u5668</h2>\n<p>\u8fd9\u6269\u5c55\u4e86<a href=\"index.html\"><span translate=no>_^_0_^_</span>\u57fa\u7c7b</a>\u3002</p>\n<p>DDPM \u901a\u8fc7\u9010\u6b65\u91c7\u6837\u6765\u53cd\u590d\u6d88\u9664\u566a\u70b9\u6765\u5bf9\u56fe\u50cf\u8fdb\u884c\u91c7\u6837\uff0c</p>\n<span translate=no>_^_1_^_</span><p>\u5176\u4e2d<span translate=no>_^_2_^_</span>\uff0c\u662f\u968f\u673a\u566a\u58f0\uff0c<span translate=no>_^_3_^_</span>\u662f\u957f\u5ea6\u4e3a<span translate=no>_^_4_^_</span>\u7684\u5b50\u5e8f\u5217<span translate=no>_^_5_^_</span>\uff0c<span translate=no>_^_6_^_</span></p>\n<p>\u8bf7\u6ce8\u610f\uff0c<span translate=no>_^_7_^_</span>\u5728 DDIM \u8bba\u6587\u4e2d\uff0c\u6307\u7684\u662f\u6765<span translate=no>_^_8_^_</span>\u81ea <a href=\"ddpm.html\">DDPM</a> \u7684\u8bba\u6587\u3002</p>\n", "<h2>DDIM Sampler</h2>\n<p>This extends the <a href=\"index.html\"><span translate=no>_^_0_^_</span> base class</a>.</p>\n<p>DDPM samples images by repeatedly removing noise by sampling step by step using,</p>\n<span translate=no>_^_1_^_</span><p>where <span translate=no>_^_2_^_</span> is random noise, <span translate=no>_^_3_^_</span> is a subsequence of <span translate=no>_^_4_^_</span> of length <span translate=no>_^_5_^_</span>, and <span translate=no>_^_6_^_</span></p>\n<p>Note that, <span translate=no>_^_7_^_</span> in DDIM paper refers to <span translate=no>_^_8_^_</span> from <a href=\"ddpm.html\">DDPM</a>.</p>\n": "<h2>DDIM \u91c7\u6837\u5668</h2>\n<p>\u8fd9\u6269\u5c55\u4e86<a href=\"index.html\"><span translate=no>_^_0_^_</span>\u57fa\u7c7b</a>\u3002</p>\n<p>DDPM \u901a\u8fc7\u9010\u6b65\u91c7\u6837\u6765\u53cd\u590d\u6d88\u9664\u566a\u70b9\u6765\u5bf9\u56fe\u50cf\u8fdb\u884c\u91c7\u6837\uff0c</p>\n<span translate=no>_^_1_^_</span><p>\u5176\u4e2d<span translate=no>_^_2_^_</span>\uff0c\u662f\u968f\u673a\u566a\u58f0\uff0c<span translate=no>_^_3_^_</span>\u662f\u957f\u5ea6\u4e3a<span translate=no>_^_4_^_</span>\u7684\u5b50\u5e8f\u5217<span translate=no>_^_5_^_</span>\uff0c<span translate=no>_^_6_^_</span></p>\n<p>\u8bf7\u6ce8\u610f\uff0c<span translate=no>_^_7_^_</span>\u5728 DDIM \u8bba\u6587\u4e2d\uff0c\u6307\u7684\u662f\u6765<span translate=no>_^_8_^_</span>\u81ea <a href=\"ddpm.html\">DDPM</a> \u7684\u8bba\u6587\u3002</p>\n",
"<h3>Painting Loop</h3>\n<ul><li><span translate=no>_^_0_^_</span> is <span translate=no>_^_1_^_</span> of shape <span translate=no>_^_2_^_</span> </li>\n<li><span translate=no>_^_3_^_</span> is the conditional embeddings <span translate=no>_^_4_^_</span> </li>\n<li><span translate=no>_^_5_^_</span> is the sampling step to start from, <span translate=no>_^_6_^_</span> </li>\n<li><span translate=no>_^_7_^_</span> is the original image in latent page which we are in paining. If this is not provided, it&#x27;ll be an image to image transformation. </li>\n<li><span translate=no>_^_8_^_</span> is the mask to keep the original image. </li>\n<li><span translate=no>_^_9_^_</span> is fixed noise to be added to the original image. </li>\n<li><span translate=no>_^_10_^_</span> is the unconditional guidance scale <span translate=no>_^_11_^_</span>. This is used for <span translate=no>_^_12_^_</span> </li>\n<li><span translate=no>_^_13_^_</span> is the conditional embedding for empty prompt <span translate=no>_^_14_^_</span></li></ul>\n": "<h3>\u7ed8\u753b\u5faa\u73af</h3>\n<ul><li><span translate=no>_^_0_^_</span>\u662f\u5f62<span translate=no>_^_1_^_</span>\u72b6\u7684<span translate=no>_^_2_^_</span></li>\n<li><span translate=no>_^_3_^_</span>\u662f\u6761\u4ef6\u5d4c\u5165<span translate=no>_^_4_^_</span></li>\n<li><span translate=no>_^_5_^_</span>\u662f\u5f00\u59cb\u65f6\u7684\u91c7\u6837\u6b65\u9aa4\uff0c<span translate=no>_^_6_^_</span></li>\n<li><span translate=no>_^_7_^_</span>\u662f\u6211\u4eec\u6b63\u5728\u7ed8\u5236\u7684\u6f5c\u5728\u9875\u9762\u4e2d\u7684\u539f\u59cb\u56fe\u50cf\u3002\u5982\u679c\u672a\u63d0\u4f9b\uff0c\u5219\u5c06\u662f\u56fe\u50cf\u5230\u56fe\u50cf\u7684\u8f6c\u6362\u3002</li>\n<li><span translate=no>_^_8_^_</span>\u662f\u4fdd\u7559\u539f\u59cb\u56fe\u50cf\u7684\u63a9\u7801\u3002</li>\n<li><span translate=no>_^_9_^_</span>\u662f\u8981\u6dfb\u52a0\u5230\u539f\u59cb\u56fe\u50cf\u7684\u56fa\u5b9a\u566a\u70b9\u3002</li>\n<li><span translate=no>_^_10_^_</span>\u662f\u65e0\u6761\u4ef6\u6307\u5bfc\u91cf\u8868<span translate=no>_^_11_^_</span>\u3002\u8fd9\u7528\u4e8e<span translate=no>_^_12_^_</span></li>\n<li><span translate=no>_^_13_^_</span>\u662f\u7a7a\u63d0\u793a\u7684\u6761\u4ef6\u5d4c\u5165<span translate=no>_^_14_^_</span></li></ul>\n", "<h3>Painting Loop</h3>\n<ul><li><span translate=no>_^_0_^_</span> is <span translate=no>_^_1_^_</span> of shape <span translate=no>_^_2_^_</span> </li>\n<li><span translate=no>_^_3_^_</span> is the conditional embeddings <span translate=no>_^_4_^_</span> </li>\n<li><span translate=no>_^_5_^_</span> is the sampling step to start from, <span translate=no>_^_6_^_</span> </li>\n<li><span translate=no>_^_7_^_</span> is the original image in latent page which we are in paining. If this is not provided, it&#x27;ll be an image to image transformation. </li>\n<li><span translate=no>_^_8_^_</span> is the mask to keep the original image. </li>\n<li><span translate=no>_^_9_^_</span> is fixed noise to be added to the original image. </li>\n<li><span translate=no>_^_10_^_</span> is the unconditional guidance scale <span translate=no>_^_11_^_</span>. 
This is used for <span translate=no>_^_12_^_</span> </li>\n<li><span translate=no>_^_13_^_</span> is the conditional embedding for empty prompt <span translate=no>_^_14_^_</span></li></ul>\n": "<h3>\u7ed8\u753b\u5faa\u73af</h3>\n<ul><li><span translate=no>_^_0_^_</span>\u662f\u5f62<span translate=no>_^_1_^_</span>\u72b6\u7684<span translate=no>_^_2_^_</span></li>\n<li><span translate=no>_^_3_^_</span>\u662f\u6761\u4ef6\u5d4c\u5165<span translate=no>_^_4_^_</span></li>\n<li><span translate=no>_^_5_^_</span>\u662f\u5f00\u59cb\u65f6\u7684\u91c7\u6837\u6b65\u9aa4\uff0c<span translate=no>_^_6_^_</span></li>\n<li><span translate=no>_^_7_^_</span>\u662f\u6211\u4eec\u6b63\u5728\u7ed8\u5236\u7684\u6f5c\u5728\u9875\u9762\u4e2d\u7684\u539f\u59cb\u56fe\u50cf\u3002\u5982\u679c\u672a\u63d0\u4f9b\uff0c\u5219\u5c06\u662f\u56fe\u50cf\u5230\u56fe\u50cf\u7684\u8f6c\u6362\u3002</li>\n<li><span translate=no>_^_8_^_</span>\u662f\u4fdd\u7559\u539f\u59cb\u56fe\u50cf\u7684\u63a9\u7801\u3002</li>\n<li><span translate=no>_^_9_^_</span>\u662f\u8981\u6dfb\u52a0\u5230\u539f\u59cb\u56fe\u50cf\u7684\u56fa\u5b9a\u566a\u70b9\u3002</li>\n<li><span translate=no>_^_10_^_</span>\u662f\u65e0\u6761\u4ef6\u6307\u5bfc\u91cf\u8868<span translate=no>_^_11_^_</span>\u3002\u8fd9\u7528\u4e8e<span translate=no>_^_12_^_</span></li>\n<li><span translate=no>_^_13_^_</span>\u662f\u7a7a\u63d0\u793a\u7684\u6761\u4ef6\u5d4c\u5165<span translate=no>_^_14_^_</span></li></ul>\n",
"<h3>Sample <span translate=no>_^_0_^_</span> given <span translate=no>_^_1_^_</span></h3>\n": "<h3><span translate=no>_^_0_^_</span>\u7ed9\u51fa\u7684\u6837\u672c<span translate=no>_^_1_^_</span></h3>\n", "<h3>Sample <span translate=no>_^_0_^_</span> given <span translate=no>_^_1_^_</span></h3>\n": "<h3><span translate=no>_^_0_^_</span>\u7ed9\u51fa\u7684\u6837\u672c<span translate=no>_^_1_^_</span></h3>\n",

File diff suppressed because one or more lines are too long

File diff suppressed because one or more lines are too long

File diff suppressed because one or more lines are too long

View File

@@ -1,4 +1,4 @@
{ {
"<h1><a href=\"https://nn.labml.ai/distillation/index.html\">Distilling the Knowledge in a Neural Network</a></h1>\n<p>This is a <a href=\"https://pytorch.org\">PyTorch</a> implementation/tutorial of the paper <a href=\"https://papers.labml.ai/paper/1503.02531\">Distilling the Knowledge in a Neural Network</a>.</p>\n<p>It&#x27;s a way of training a small network using the knowledge in a trained larger network; i.e. distilling the knowledge from the large network.</p>\n<p>A large model with regularization or an ensemble of models (using dropout) generalizes better than a small model when trained directly on the data and labels. However, a small model can be trained to generalize better with help of a large model. Smaller models are better in production: faster, less compute, less memory.</p>\n<p>The output probabilities of a trained model give more information than the labels because it assigns non-zero probabilities to incorrect classes as well. These probabilities tell us that a sample has a chance of belonging to certain classes. For instance, when classifying digits, when given an image of digit <em>7</em>, a generalized model will give a high probability to 7 and a small but non-zero probability to 2, while assigning almost zero probability to other digits. Distillation uses this information to train a small model better. </p>\n": "<h1><a href=\"https://nn.labml.ai/distillation/index.html\">\u30cb\u30e5\u30fc\u30e9\u30eb\u30cd\u30c3\u30c8\u30ef\u30fc\u30af\u3067\u306e\u77e5\u8b58\u306e\u62bd\u51fa</a></h1>\n<p>\u3053\u308c\u306f\u3001<a href=\"https://papers.labml.ai/paper/1503.02531\">\u8ad6\u6587\u300c\u30cb\u30e5\u30fc\u30e9\u30eb\u30cd\u30c3\u30c8\u30ef\u30fc\u30af\u306b\u304a\u3051\u308b\u77e5\u8b58\u306e\u62bd\u51fa</a>\u300d<a href=\"https://pytorch.org\">\u306ePyTorch\u5b9f\u88c5/\u30c1\u30e5\u30fc\u30c8\u30ea\u30a2\u30eb\u3067\u3059</a>\u3002</p>\n<p>\u3053\u308c\u306f\u3001\u30c8\u30ec\u30fc\u30cb\u30f3\u30b0\u6e08\u307f\u306e\u5927\u898f\u6a21\u306a\u30cd\u30c3\u30c8\u30ef\u30fc\u30af\u306e\u77e5\u8b58\u3092\u4f7f\u7528\u3057\u3066\u5c0f\u898f\u6a21\u30cd\u30c3\u30c8\u30ef\u30fc\u30af\u3092\u30c8\u30ec\u30fc\u30cb\u30f3\u30b0\u3059\u308b\u65b9\u6cd5\u3067\u3059\u3002\u3064\u307e\u308a\u3001\u5927\u898f\u6a21\u306a\u30cd\u30c3\u30c8\u30ef\u30fc\u30af\u304b\u3089\u77e5\u8b58\u3092\u62bd\u51fa\u3059\u308b\u65b9\u6cd5\u3067\u3059\u3002</p>\n<p>\u30c7\u30fc\u30bf\u3084\u30e9\u30d9\u30eb\u3067\u76f4\u63a5\u30c8\u30ec\u30fc\u30cb\u30f3\u30b0\u3057\u305f\u5834\u5408\u3001\u6b63\u5247\u5316\u3092\u884c\u3063\u305f\u5927\u898f\u6a21\u306a\u30e2\u30c7\u30eb\u3084 (\u30c9\u30ed\u30c3\u30d7\u30a2\u30a6\u30c8\u3092\u4f7f\u7528\u3057\u305f) 
\u30e2\u30c7\u30eb\u306e\u30a2\u30f3\u30b5\u30f3\u30d6\u30eb\u306f\u3001\u5c0f\u3055\u306a\u30e2\u30c7\u30eb\u3088\u308a\u3082\u4e00\u822c\u5316\u304c\u5bb9\u6613\u3067\u3059\u3002\u305f\u3060\u3057\u3001\u5c0f\u3055\u3044\u30e2\u30c7\u30eb\u3067\u3082\u3001\u5927\u304d\u306a\u30e2\u30c7\u30eb\u306e\u52a9\u3051\u3092\u501f\u308a\u3066\u3088\u308a\u4e00\u822c\u5316\u3057\u3084\u3059\u3044\u3088\u3046\u306b\u30c8\u30ec\u30fc\u30cb\u30f3\u30b0\u3067\u304d\u307e\u3059\u3002\u672c\u756a\u74b0\u5883\u3067\u306f\u3001\u30e2\u30c7\u30eb\u304c\u5c0f\u3055\u3044\u307b\u3069\u901f\u304f\u3001\u51e6\u7406\u80fd\u529b\u304c\u5c11\u306a\u304f\u3001\u30e1\u30e2\u30ea\u3082\u5c11\u306a\u304f\u3066\u6e08\u307f\u307e\u3059\u3002</p>\n<p>\u30c8\u30ec\u30fc\u30cb\u30f3\u30b0\u6e08\u307f\u30e2\u30c7\u30eb\u306e\u51fa\u529b\u78ba\u7387\u306f\u3001\u8aa4\u3063\u305f\u30af\u30e9\u30b9\u306b\u3082\u30bc\u30ed\u4ee5\u5916\u306e\u78ba\u7387\u3092\u5272\u308a\u5f53\u3066\u308b\u305f\u3081\u3001\u30e9\u30d9\u30eb\u3088\u308a\u3082\u591a\u304f\u306e\u60c5\u5831\u3092\u63d0\u4f9b\u3057\u307e\u3059\u3002\u3053\u308c\u3089\u306e\u78ba\u7387\u304b\u3089\u3001\u30b5\u30f3\u30d7\u30eb\u304c\u7279\u5b9a\u306e\u30af\u30e9\u30b9\u306b\u5c5e\u3057\u3066\u3044\u308b\u53ef\u80fd\u6027\u304c\u3042\u308b\u3053\u3068\u304c\u308f\u304b\u308a\u307e\u3059\u3002\u305f\u3068\u3048\u3070\u3001\u6570\u5b57\u3092\u5206\u985e\u3059\u308b\u969b\u3001<em>7 \u6841\u306e\u753b\u50cf\u304c\u4e0e\u3048\u3089\u308c\u305f\u5834\u5408\u3001\u4e00\u822c\u5316\u30e2\u30c7\u30eb\u3067\u306f 7</em> \u306b\u306f\u9ad8\u3044\u78ba\u7387\u30012 \u306b\u306f\u5c0f\u3055\u3044\u306a\u304c\u3089\u3082\u30bc\u30ed\u3067\u306f\u306a\u3044\u78ba\u7387\u304c\u4e0e\u3048\u3089\u308c\u3001\u4ed6\u306e\u6570\u5b57\u306b\u306f\u307b\u307c\u30bc\u30ed\u306e\u78ba\u7387\u3092\u5272\u308a\u5f53\u3066\u307e\u3059\u3002\u84b8\u7559\u3067\u306f\u3001\u3053\u306e\u60c5\u5831\u3092\u5229\u7528\u3057\u3066\u5c0f\u578b\u30e2\u30c7\u30eb\u306e\u5b66\u7fd2\u52b9\u679c\u3092\u9ad8\u3081\u307e\u3059</p>\u3002\n", "<h1><a href=\"https://nn.labml.ai/distillation/index.html\">Distilling the Knowledge in a Neural Network</a></h1>\n<p>This is a <a href=\"https://pytorch.org\">PyTorch</a> implementation/tutorial of the paper <a href=\"https://arxiv.org/abs/1503.02531\">Distilling the Knowledge in a Neural Network</a>.</p>\n<p>It&#x27;s a way of training a small network using the knowledge in a trained larger network; i.e. distilling the knowledge from the large network.</p>\n<p>A large model with regularization or an ensemble of models (using dropout) generalizes better than a small model when trained directly on the data and labels. However, a small model can be trained to generalize better with help of a large model. Smaller models are better in production: faster, less compute, less memory.</p>\n<p>The output probabilities of a trained model give more information than the labels because it assigns non-zero probabilities to incorrect classes as well. These probabilities tell us that a sample has a chance of belonging to certain classes. For instance, when classifying digits, when given an image of digit <em>7</em>, a generalized model will give a high probability to 7 and a small but non-zero probability to 2, while assigning almost zero probability to other digits. Distillation uses this information to train a small model better. 
</p>\n": "<h1><a href=\"https://nn.labml.ai/distillation/index.html\">\u30cb\u30e5\u30fc\u30e9\u30eb\u30cd\u30c3\u30c8\u30ef\u30fc\u30af\u3067\u306e\u77e5\u8b58\u306e\u62bd\u51fa</a></h1>\n<p>\u3053\u308c\u306f\u3001<a href=\"https://arxiv.org/abs/1503.02531\">\u8ad6\u6587\u300c\u30cb\u30e5\u30fc\u30e9\u30eb\u30cd\u30c3\u30c8\u30ef\u30fc\u30af\u306b\u304a\u3051\u308b\u77e5\u8b58\u306e\u62bd\u51fa</a>\u300d<a href=\"https://pytorch.org\">\u306ePyTorch\u5b9f\u88c5/\u30c1\u30e5\u30fc\u30c8\u30ea\u30a2\u30eb\u3067\u3059</a>\u3002</p>\n<p>\u3053\u308c\u306f\u3001\u30c8\u30ec\u30fc\u30cb\u30f3\u30b0\u6e08\u307f\u306e\u5927\u898f\u6a21\u306a\u30cd\u30c3\u30c8\u30ef\u30fc\u30af\u306e\u77e5\u8b58\u3092\u4f7f\u7528\u3057\u3066\u5c0f\u898f\u6a21\u30cd\u30c3\u30c8\u30ef\u30fc\u30af\u3092\u30c8\u30ec\u30fc\u30cb\u30f3\u30b0\u3059\u308b\u65b9\u6cd5\u3067\u3059\u3002\u3064\u307e\u308a\u3001\u5927\u898f\u6a21\u306a\u30cd\u30c3\u30c8\u30ef\u30fc\u30af\u304b\u3089\u77e5\u8b58\u3092\u62bd\u51fa\u3059\u308b\u65b9\u6cd5\u3067\u3059\u3002</p>\n<p>\u30c7\u30fc\u30bf\u3084\u30e9\u30d9\u30eb\u3067\u76f4\u63a5\u30c8\u30ec\u30fc\u30cb\u30f3\u30b0\u3057\u305f\u5834\u5408\u3001\u6b63\u5247\u5316\u3092\u884c\u3063\u305f\u5927\u898f\u6a21\u306a\u30e2\u30c7\u30eb\u3084 (\u30c9\u30ed\u30c3\u30d7\u30a2\u30a6\u30c8\u3092\u4f7f\u7528\u3057\u305f) \u30e2\u30c7\u30eb\u306e\u30a2\u30f3\u30b5\u30f3\u30d6\u30eb\u306f\u3001\u5c0f\u3055\u306a\u30e2\u30c7\u30eb\u3088\u308a\u3082\u4e00\u822c\u5316\u304c\u5bb9\u6613\u3067\u3059\u3002\u305f\u3060\u3057\u3001\u5c0f\u3055\u3044\u30e2\u30c7\u30eb\u3067\u3082\u3001\u5927\u304d\u306a\u30e2\u30c7\u30eb\u306e\u52a9\u3051\u3092\u501f\u308a\u3066\u3088\u308a\u4e00\u822c\u5316\u3057\u3084\u3059\u3044\u3088\u3046\u306b\u30c8\u30ec\u30fc\u30cb\u30f3\u30b0\u3067\u304d\u307e\u3059\u3002\u672c\u756a\u74b0\u5883\u3067\u306f\u3001\u30e2\u30c7\u30eb\u304c\u5c0f\u3055\u3044\u307b\u3069\u901f\u304f\u3001\u51e6\u7406\u80fd\u529b\u304c\u5c11\u306a\u304f\u3001\u30e1\u30e2\u30ea\u3082\u5c11\u306a\u304f\u3066\u6e08\u307f\u307e\u3059\u3002</p>\n<p>\u30c8\u30ec\u30fc\u30cb\u30f3\u30b0\u6e08\u307f\u30e2\u30c7\u30eb\u306e\u51fa\u529b\u78ba\u7387\u306f\u3001\u8aa4\u3063\u305f\u30af\u30e9\u30b9\u306b\u3082\u30bc\u30ed\u4ee5\u5916\u306e\u78ba\u7387\u3092\u5272\u308a\u5f53\u3066\u308b\u305f\u3081\u3001\u30e9\u30d9\u30eb\u3088\u308a\u3082\u591a\u304f\u306e\u60c5\u5831\u3092\u63d0\u4f9b\u3057\u307e\u3059\u3002\u3053\u308c\u3089\u306e\u78ba\u7387\u304b\u3089\u3001\u30b5\u30f3\u30d7\u30eb\u304c\u7279\u5b9a\u306e\u30af\u30e9\u30b9\u306b\u5c5e\u3057\u3066\u3044\u308b\u53ef\u80fd\u6027\u304c\u3042\u308b\u3053\u3068\u304c\u308f\u304b\u308a\u307e\u3059\u3002\u305f\u3068\u3048\u3070\u3001\u6570\u5b57\u3092\u5206\u985e\u3059\u308b\u969b\u3001<em>7 \u6841\u306e\u753b\u50cf\u304c\u4e0e\u3048\u3089\u308c\u305f\u5834\u5408\u3001\u4e00\u822c\u5316\u30e2\u30c7\u30eb\u3067\u306f 7</em> \u306b\u306f\u9ad8\u3044\u78ba\u7387\u30012 \u306b\u306f\u5c0f\u3055\u3044\u306a\u304c\u3089\u3082\u30bc\u30ed\u3067\u306f\u306a\u3044\u78ba\u7387\u304c\u4e0e\u3048\u3089\u308c\u3001\u4ed6\u306e\u6570\u5b57\u306b\u306f\u307b\u307c\u30bc\u30ed\u306e\u78ba\u7387\u3092\u5272\u308a\u5f53\u3066\u307e\u3059\u3002\u84b8\u7559\u3067\u306f\u3001\u3053\u306e\u60c5\u5831\u3092\u5229\u7528\u3057\u3066\u5c0f\u578b\u30e2\u30c7\u30eb\u306e\u5b66\u7fd2\u52b9\u679c\u3092\u9ad8\u3081\u307e\u3059</p>\u3002\n",
"Distilling the Knowledge in a Neural Network": "\u30cb\u30e5\u30fc\u30e9\u30eb\u30cd\u30c3\u30c8\u30ef\u30fc\u30af\u3067\u306e\u77e5\u8b58\u306e\u62bd\u51fa" "Distilling the Knowledge in a Neural Network": "\u30cb\u30e5\u30fc\u30e9\u30eb\u30cd\u30c3\u30c8\u30ef\u30fc\u30af\u3067\u306e\u77e5\u8b58\u306e\u62bd\u51fa"
} }

File diff suppressed because one or more lines are too long

View File

@@ -1,4 +1,4 @@
{ {
"<h1><a href=\"https://nn.labml.ai/distillation/index.html\">Distilling the Knowledge in a Neural Network</a></h1>\n<p>This is a <a href=\"https://pytorch.org\">PyTorch</a> implementation/tutorial of the paper <a href=\"https://papers.labml.ai/paper/1503.02531\">Distilling the Knowledge in a Neural Network</a>.</p>\n<p>It&#x27;s a way of training a small network using the knowledge in a trained larger network; i.e. distilling the knowledge from the large network.</p>\n<p>A large model with regularization or an ensemble of models (using dropout) generalizes better than a small model when trained directly on the data and labels. However, a small model can be trained to generalize better with help of a large model. Smaller models are better in production: faster, less compute, less memory.</p>\n<p>The output probabilities of a trained model give more information than the labels because it assigns non-zero probabilities to incorrect classes as well. These probabilities tell us that a sample has a chance of belonging to certain classes. For instance, when classifying digits, when given an image of digit <em>7</em>, a generalized model will give a high probability to 7 and a small but non-zero probability to 2, while assigning almost zero probability to other digits. Distillation uses this information to train a small model better. </p>\n": "<h1><a href=\"https://nn.labml.ai/distillation/index.html\">\u5728\u795e\u7ecf\u7f51\u7edc\u4e2d\u63d0\u70bc\u77e5\u8bc6</a></h1>\n<p>\u8fd9\u662f\u8bba\u6587\u300a<a href=\"https://papers.labml.ai/paper/1503.02531\">\u5728\u795e\u7ecf\u7f51\u7edc\u4e2d\u63d0\u70bc\u77e5\u8bc6\u300b\u7684 PyT</a> <a href=\"https://pytorch.org\">orch</a> \u5b9e\u73b0/\u6559\u7a0b\u3002</p>\n<p>\u8fd9\u662f\u4e00\u79cd\u4f7f\u7528\u7ecf\u8fc7\u8bad\u7ec3\u7684\u5927\u578b\u7f51\u7edc\u4e2d\u7684\u77e5\u8bc6\u6765\u8bad\u7ec3\u5c0f\u578b\u7f51\u7edc\u7684\u65b9\u6cd5\uff1b\u5373\u4ece\u5927\u578b\u7f51\u7edc\u4e2d\u63d0\u70bc\u77e5\u8bc6\u3002</p>\n<p>\u76f4\u63a5\u5728\u6570\u636e\u548c\u6807\u7b7e\u4e0a\u8bad\u7ec3\u65f6\uff0c\u5177\u6709\u6b63\u5219\u5316\u6216\u6a21\u578b\u96c6\u5408\uff08\u4f7f\u7528 dropout\uff09\u7684\u5927\u578b\u6a21\u578b\u6bd4\u5c0f\u578b\u6a21\u578b\u7684\u6982\u5316\u6548\u679c\u66f4\u597d\u3002\u4f46\u662f\uff0c\u5728\u5927\u578b\u6a21\u578b\u7684\u5e2e\u52a9\u4e0b\uff0c\u53ef\u4ee5\u8bad\u7ec3\u5c0f\u6a21\u578b\u4ee5\u66f4\u597d\u5730\u8fdb\u884c\u6982\u62ec\u3002\u8f83\u5c0f\u7684\u6a21\u578b\u5728\u751f\u4ea7\u4e2d\u66f4\u597d\uff1a\u901f\u5ea6\u66f4\u5feb\u3001\u8ba1\u7b97\u66f4\u5c11\u3001\u5185\u5b58\u66f4\u5c11\u3002</p>\n<p>\u7ecf\u8fc7\u8bad\u7ec3\u7684\u6a21\u578b\u7684\u8f93\u51fa\u6982\u7387\u6bd4\u6807\u7b7e\u63d0\u4f9b\u7684\u4fe1\u606f\u66f4\u591a\uff0c\u56e0\u4e3a\u5b83\u4e5f\u4f1a\u4e3a\u9519\u8bef\u7684\u7c7b\u5206\u914d\u975e\u96f6\u6982\u7387\u3002\u8fd9\u4e9b\u6982\u7387\u544a\u8bc9\u6211\u4eec\uff0c\u6837\u672c\u6709\u53ef\u80fd\u5c5e\u4e8e\u67d0\u4e9b\u7c7b\u522b\u3002\u4f8b\u5982\uff0c\u5728\u5bf9\u6570\u5b57\u8fdb\u884c\u5206\u7c7b\u65f6\uff0c\u5f53\u7ed9\u5b9a\u6570\u5b57 <em>7</em> \u7684\u56fe\u50cf\u65f6\uff0c\u5e7f\u4e49\u6a21\u578b\u4f1a\u7ed9\u51fa7\u7684\u9ad8\u6982\u7387\uff0c\u7ed92\u7684\u6982\u7387\u5f88\u5c0f\u4f46\u4e0d\u662f\u96f6\uff0c\u800c\u7ed9\u5176\u4ed6\u6570\u5b57\u5206\u914d\u51e0\u4e4e\u4e3a\u96f6\u7684\u6982\u7387\u3002\u84b8\u998f\u5229\u7528\u8fd9\u4e9b\u4fe1\u606f\u6765\u66f4\u597d\u5730\u8bad\u7ec3\u5c0f\u578b\u6a21\u578b\u3002</p>\n", "<h1><a 
href=\"https://nn.labml.ai/distillation/index.html\">Distilling the Knowledge in a Neural Network</a></h1>\n<p>This is a <a href=\"https://pytorch.org\">PyTorch</a> implementation/tutorial of the paper <a href=\"https://arxiv.org/abs/1503.02531\">Distilling the Knowledge in a Neural Network</a>.</p>\n<p>It&#x27;s a way of training a small network using the knowledge in a trained larger network; i.e. distilling the knowledge from the large network.</p>\n<p>A large model with regularization or an ensemble of models (using dropout) generalizes better than a small model when trained directly on the data and labels. However, a small model can be trained to generalize better with help of a large model. Smaller models are better in production: faster, less compute, less memory.</p>\n<p>The output probabilities of a trained model give more information than the labels because it assigns non-zero probabilities to incorrect classes as well. These probabilities tell us that a sample has a chance of belonging to certain classes. For instance, when classifying digits, when given an image of digit <em>7</em>, a generalized model will give a high probability to 7 and a small but non-zero probability to 2, while assigning almost zero probability to other digits. Distillation uses this information to train a small model better. </p>\n": "<h1><a href=\"https://nn.labml.ai/distillation/index.html\">\u5728\u795e\u7ecf\u7f51\u7edc\u4e2d\u63d0\u70bc\u77e5\u8bc6</a></h1>\n<p>\u8fd9\u662f\u8bba\u6587\u300a<a href=\"https://arxiv.org/abs/1503.02531\">\u5728\u795e\u7ecf\u7f51\u7edc\u4e2d\u63d0\u70bc\u77e5\u8bc6\u300b\u7684 PyT</a> <a href=\"https://pytorch.org\">orch</a> \u5b9e\u73b0/\u6559\u7a0b\u3002</p>\n<p>\u8fd9\u662f\u4e00\u79cd\u4f7f\u7528\u7ecf\u8fc7\u8bad\u7ec3\u7684\u5927\u578b\u7f51\u7edc\u4e2d\u7684\u77e5\u8bc6\u6765\u8bad\u7ec3\u5c0f\u578b\u7f51\u7edc\u7684\u65b9\u6cd5\uff1b\u5373\u4ece\u5927\u578b\u7f51\u7edc\u4e2d\u63d0\u70bc\u77e5\u8bc6\u3002</p>\n<p>\u76f4\u63a5\u5728\u6570\u636e\u548c\u6807\u7b7e\u4e0a\u8bad\u7ec3\u65f6\uff0c\u5177\u6709\u6b63\u5219\u5316\u6216\u6a21\u578b\u96c6\u5408\uff08\u4f7f\u7528 dropout\uff09\u7684\u5927\u578b\u6a21\u578b\u6bd4\u5c0f\u578b\u6a21\u578b\u7684\u6982\u5316\u6548\u679c\u66f4\u597d\u3002\u4f46\u662f\uff0c\u5728\u5927\u578b\u6a21\u578b\u7684\u5e2e\u52a9\u4e0b\uff0c\u53ef\u4ee5\u8bad\u7ec3\u5c0f\u6a21\u578b\u4ee5\u66f4\u597d\u5730\u8fdb\u884c\u6982\u62ec\u3002\u8f83\u5c0f\u7684\u6a21\u578b\u5728\u751f\u4ea7\u4e2d\u66f4\u597d\uff1a\u901f\u5ea6\u66f4\u5feb\u3001\u8ba1\u7b97\u66f4\u5c11\u3001\u5185\u5b58\u66f4\u5c11\u3002</p>\n<p>\u7ecf\u8fc7\u8bad\u7ec3\u7684\u6a21\u578b\u7684\u8f93\u51fa\u6982\u7387\u6bd4\u6807\u7b7e\u63d0\u4f9b\u7684\u4fe1\u606f\u66f4\u591a\uff0c\u56e0\u4e3a\u5b83\u4e5f\u4f1a\u4e3a\u9519\u8bef\u7684\u7c7b\u5206\u914d\u975e\u96f6\u6982\u7387\u3002\u8fd9\u4e9b\u6982\u7387\u544a\u8bc9\u6211\u4eec\uff0c\u6837\u672c\u6709\u53ef\u80fd\u5c5e\u4e8e\u67d0\u4e9b\u7c7b\u522b\u3002\u4f8b\u5982\uff0c\u5728\u5bf9\u6570\u5b57\u8fdb\u884c\u5206\u7c7b\u65f6\uff0c\u5f53\u7ed9\u5b9a\u6570\u5b57 <em>7</em> \u7684\u56fe\u50cf\u65f6\uff0c\u5e7f\u4e49\u6a21\u578b\u4f1a\u7ed9\u51fa7\u7684\u9ad8\u6982\u7387\uff0c\u7ed92\u7684\u6982\u7387\u5f88\u5c0f\u4f46\u4e0d\u662f\u96f6\uff0c\u800c\u7ed9\u5176\u4ed6\u6570\u5b57\u5206\u914d\u51e0\u4e4e\u4e3a\u96f6\u7684\u6982\u7387\u3002\u84b8\u998f\u5229\u7528\u8fd9\u4e9b\u4fe1\u606f\u6765\u66f4\u597d\u5730\u8bad\u7ec3\u5c0f\u578b\u6a21\u578b\u3002</p>\n",
"Distilling the Knowledge in a Neural Network": "\u5728\u795e\u7ecf\u7f51\u7edc\u4e2d\u63d0\u70bc\u77e5\u8bc6" "Distilling the Knowledge in a Neural Network": "\u5728\u795e\u7ecf\u7f51\u7edc\u4e2d\u63d0\u70bc\u77e5\u8bc6"
} }

File diff suppressed because one or more lines are too long

File diff suppressed because one or more lines are too long

View File

@@ -1,5 +1,5 @@
{ {
"<h1>Cycle GAN</h1>\n<p>This is a <a href=\"https://pytorch.org\">PyTorch</a> implementation/tutorial of the paper <a href=\"https://papers.labml.ai/paper/1703.10593\">Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks</a>.</p>\n<p>I&#x27;ve taken pieces of code from <a href=\"https://github.com/eriklindernoren/PyTorch-GAN\">eriklindernoren/PyTorch-GAN</a>. It is a very good resource if you want to checkout other GAN variations too.</p>\n<p>Cycle GAN does image-to-image translation. It trains a model to translate an image from given distribution to another, say, images of class A and B. Images of a certain distribution could be things like images of a certain style, or nature. The models do not need paired images between A and B. Just a set of images of each class is enough. This works very well on changing between image styles, lighting changes, pattern changes, etc. For example, changing summer to winter, painting style to photos, and horses to zebras.</p>\n<p>Cycle GAN trains two generator models and two discriminator models. One generator translates images from A to B and the other from B to A. The discriminators test whether the generated images look real.</p>\n<p>This file contains the model code as well as the training code. We also have a Google Colab notebook.</p>\n<p><a href=\"https://colab.research.google.com/github/labmlai/annotated_deep_learning_paper_implementations/blob/master/labml_nn/gan/cycle_gan/experiment.ipynb\"><span translate=no>_^_0_^_</span></a></p>\n": "<h1>\u5faa\u73af GAN</h1>\n<p>\u8fd9\u662f <a href=\"https://pytorch.org\">PyTorch \u7684 PyTorch</a> \u5b9e\u73b0/\u6559\u7a0b\uff0c\u8be5\u8bba\u6587<a href=\"https://papers.labml.ai/paper/1703.10593\">\u4f7f\u7528\u5468\u671f\u4e00\u81f4\u6027\u5bf9\u6297\u7f51\u7edc\u8fdb\u884c\u56fe\u50cf\u95f4\u7684\u975e\u914d\u5bf9\u8f6c\u6362</a>\u3002</p>\n<p>\u6211\u4ece <a href=\"https://github.com/eriklindernoren/PyTorch-GAN\">eriklindernoren/pytorch-Gan</a> \u90a3\u91cc\u62ff\u4e86\u4e00\u4e9b\u4ee3\u7801\u3002\u5982\u679c\u4f60\u4e5f\u60f3\u67e5\u770b\u5176\u4ed6 GAN \u53d8\u4f53\uff0c\u8fd9\u662f\u4e00\u4e2a\u975e\u5e38\u597d\u7684\u8d44\u6e90\u3002</p>\nCyc@@ <p>le GAN \u8fdb\u884c\u56fe\u50cf\u5230\u56fe\u50cf\u7684\u8f6c\u6362\u3002\u5b83\u8bad\u7ec3\u6a21\u578b\u5c06\u56fe\u50cf\u4ece\u7ed9\u5b9a\u5206\u5e03\u8f6c\u6362\u5230\u53e6\u4e00\u4e2a\u5206\u5e03\uff0c\u6bd4\u5982A\u7c7b\u548cB\u7c7b\u7684\u56fe\u50cf\uff0c\u67d0\u4e2a\u5206\u5e03\u7684\u56fe\u50cf\u53ef\u4ee5\u662f\u67d0\u79cd\u98ce\u683c\u6216\u81ea\u7136\u7684\u56fe\u50cf\u3002\u6a21\u578b\u4e0d\u9700\u8981 A \u548c B \u4e4b\u95f4\u7684\u914d\u5bf9\u56fe\u50cf\uff0c\u6bcf\u4e2a\u7c7b\u522b\u7684\u4e00\u7ec4\u56fe\u50cf\u5c31\u8db3\u591f\u4e86\u3002\u8fd9\u975e\u5e38\u9002\u5408\u5728\u56fe\u50cf\u98ce\u683c\u3001\u5149\u7167\u53d8\u5316\u3001\u56fe\u6848\u53d8\u5316\u7b49\u4e4b\u95f4\u8fdb\u884c\u5207\u6362\u3002\u4f8b\u5982\uff0c\u5c06\u590f\u5929\u6539\u4e3a\u51ac\u5929\uff0c\u5c06\u7ed8\u753b\u98ce\u683c\u6539\u4e3a\u7167\u7247\uff0c\u5c06\u9a6c\u6539\u4e3a\u6591\u9a6c\u3002</p>\n<p>Cycle GAN \u53ef\u8bad\u7ec3\u4e24\u4e2a\u53d1\u7535\u673a\u6a21\u578b\u548c\u4e24\u4e2a\u9274\u522b\u5668\u6a21\u578b\u3002\u4e00\u4e2a\u751f\u6210\u5668\u5c06\u56fe\u50cf\u4ece A \u8f6c\u6362\u5230 B\uff0c\u53e6\u4e00\u4e2a\u4ece B \u8f6c\u6362\u5230 
A\u3002\u5224\u522b\u5668\u6d4b\u8bd5\u751f\u6210\u7684\u56fe\u50cf\u662f\u5426\u771f\u5b9e\u3002</p>\n<p>\u6b64\u6587\u4ef6\u5305\u542b\u6a21\u578b\u4ee3\u7801\u548c\u8bad\u7ec3\u4ee3\u7801\u3002\u6211\u4eec\u8fd8\u6709\u4e00\u53f0\u8c37\u6b4c Colab \u7b14\u8bb0\u672c\u7535\u8111\u3002</p>\n<p><a href=\"https://colab.research.google.com/github/labmlai/annotated_deep_learning_paper_implementations/blob/master/labml_nn/gan/cycle_gan/experiment.ipynb\"><span translate=no>_^_0_^_</span></a></p>\n", "<h1>Cycle GAN</h1>\n<p>This is a <a href=\"https://pytorch.org\">PyTorch</a> implementation/tutorial of the paper <a href=\"https://arxiv.org/abs/1703.10593\">Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks</a>.</p>\n<p>I&#x27;ve taken pieces of code from <a href=\"https://github.com/eriklindernoren/PyTorch-GAN\">eriklindernoren/PyTorch-GAN</a>. It is a very good resource if you want to checkout other GAN variations too.</p>\n<p>Cycle GAN does image-to-image translation. It trains a model to translate an image from given distribution to another, say, images of class A and B. Images of a certain distribution could be things like images of a certain style, or nature. The models do not need paired images between A and B. Just a set of images of each class is enough. This works very well on changing between image styles, lighting changes, pattern changes, etc. For example, changing summer to winter, painting style to photos, and horses to zebras.</p>\n<p>Cycle GAN trains two generator models and two discriminator models. One generator translates images from A to B and the other from B to A. The discriminators test whether the generated images look real.</p>\n<p>This file contains the model code as well as the training code. We also have a Google Colab notebook.</p>\n<p><a href=\"https://colab.research.google.com/github/labmlai/annotated_deep_learning_paper_implementations/blob/master/labml_nn/gan/cycle_gan/experiment.ipynb\"><span translate=no>_^_0_^_</span></a></p>\n": "<h1>\u5faa\u73af GAN</h1>\n<p>\u8fd9\u662f <a href=\"https://pytorch.org\">PyTorch \u7684 PyTorch</a> \u5b9e\u73b0/\u6559\u7a0b\uff0c\u8be5\u8bba\u6587<a href=\"https://arxiv.org/abs/1703.10593\">\u4f7f\u7528\u5468\u671f\u4e00\u81f4\u6027\u5bf9\u6297\u7f51\u7edc\u8fdb\u884c\u56fe\u50cf\u95f4\u7684\u975e\u914d\u5bf9\u8f6c\u6362</a>\u3002</p>\n<p>\u6211\u4ece <a href=\"https://github.com/eriklindernoren/PyTorch-GAN\">eriklindernoren/pytorch-Gan</a> \u90a3\u91cc\u62ff\u4e86\u4e00\u4e9b\u4ee3\u7801\u3002\u5982\u679c\u4f60\u4e5f\u60f3\u67e5\u770b\u5176\u4ed6 GAN \u53d8\u4f53\uff0c\u8fd9\u662f\u4e00\u4e2a\u975e\u5e38\u597d\u7684\u8d44\u6e90\u3002</p>\nCyc@@ <p>le GAN \u8fdb\u884c\u56fe\u50cf\u5230\u56fe\u50cf\u7684\u8f6c\u6362\u3002\u5b83\u8bad\u7ec3\u6a21\u578b\u5c06\u56fe\u50cf\u4ece\u7ed9\u5b9a\u5206\u5e03\u8f6c\u6362\u5230\u53e6\u4e00\u4e2a\u5206\u5e03\uff0c\u6bd4\u5982A\u7c7b\u548cB\u7c7b\u7684\u56fe\u50cf\uff0c\u67d0\u4e2a\u5206\u5e03\u7684\u56fe\u50cf\u53ef\u4ee5\u662f\u67d0\u79cd\u98ce\u683c\u6216\u81ea\u7136\u7684\u56fe\u50cf\u3002\u6a21\u578b\u4e0d\u9700\u8981 A \u548c B 
\u4e4b\u95f4\u7684\u914d\u5bf9\u56fe\u50cf\uff0c\u6bcf\u4e2a\u7c7b\u522b\u7684\u4e00\u7ec4\u56fe\u50cf\u5c31\u8db3\u591f\u4e86\u3002\u8fd9\u975e\u5e38\u9002\u5408\u5728\u56fe\u50cf\u98ce\u683c\u3001\u5149\u7167\u53d8\u5316\u3001\u56fe\u6848\u53d8\u5316\u7b49\u4e4b\u95f4\u8fdb\u884c\u5207\u6362\u3002\u4f8b\u5982\uff0c\u5c06\u590f\u5929\u6539\u4e3a\u51ac\u5929\uff0c\u5c06\u7ed8\u753b\u98ce\u683c\u6539\u4e3a\u7167\u7247\uff0c\u5c06\u9a6c\u6539\u4e3a\u6591\u9a6c\u3002</p>\n<p>Cycle GAN \u53ef\u8bad\u7ec3\u4e24\u4e2a\u53d1\u7535\u673a\u6a21\u578b\u548c\u4e24\u4e2a\u9274\u522b\u5668\u6a21\u578b\u3002\u4e00\u4e2a\u751f\u6210\u5668\u5c06\u56fe\u50cf\u4ece A \u8f6c\u6362\u5230 B\uff0c\u53e6\u4e00\u4e2a\u4ece B \u8f6c\u6362\u5230 A\u3002\u5224\u522b\u5668\u6d4b\u8bd5\u751f\u6210\u7684\u56fe\u50cf\u662f\u5426\u771f\u5b9e\u3002</p>\n<p>\u6b64\u6587\u4ef6\u5305\u542b\u6a21\u578b\u4ee3\u7801\u548c\u8bad\u7ec3\u4ee3\u7801\u3002\u6211\u4eec\u8fd8\u6709\u4e00\u53f0\u8c37\u6b4c Colab \u7b14\u8bb0\u672c\u7535\u8111\u3002</p>\n<p><a href=\"https://colab.research.google.com/github/labmlai/annotated_deep_learning_paper_implementations/blob/master/labml_nn/gan/cycle_gan/experiment.ipynb\"><span translate=no>_^_0_^_</span></a></p>\n",
"<h2>Configurations</h2>\n": "<h2>\u914d\u7f6e</h2>\n", "<h2>Configurations</h2>\n": "<h2>\u914d\u7f6e</h2>\n",
"<h2>Evaluate trained Cycle GAN</h2>\n": "<h2>\u8bc4\u4f30\u8bad\u7ec3\u8fc7\u7684\u5faa\u73af GAN</h2>\n", "<h2>Evaluate trained Cycle GAN</h2>\n": "<h2>\u8bc4\u4f30\u8bad\u7ec3\u8fc7\u7684\u5faa\u73af GAN</h2>\n",
"<h2>Initialize models and data loaders</h2>\n": "<h2>\u521d\u59cb\u5316\u6a21\u578b\u548c\u6570\u636e\u52a0\u8f7d\u5668</h2>\n", "<h2>Initialize models and data loaders</h2>\n": "<h2>\u521d\u59cb\u5316\u6a21\u578b\u548c\u6570\u636e\u52a0\u8f7d\u5668</h2>\n",

View File

@@ -1,4 +1,4 @@
{ {
"<h1><a href=\"https://nn.labml.ai/gan/cycle_gan/index.html\">Cycle GAN</a></h1>\n<p>This is a <a href=\"https://pytorch.org\">PyTorch</a> implementation/tutorial of the paper <a href=\"https://papers.labml.ai/paper/1703.10593\">Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks</a>. </p>\n": "<h1><a href=\"https://nn.labml.ai/gan/cycle_gan/index.html\">\u30b5\u30a4\u30af\u30eb GAN</a></h1>\n<p>\u3053\u308c\u306f\u3001<a href=\"https://pytorch.org\"><a href=\"https://papers.labml.ai/paper/1703.10593\">\u30b5\u30a4\u30af\u30eb\u30b3\u30f3\u30b7\u30b9\u30c6\u30f3\u30c8\u306a\u6575\u5bfe\u7684\u30cd\u30c3\u30c8\u30ef\u30fc\u30af\u3092\u4f7f\u7528\u3057\u305f\u30da\u30a2\u30ea\u30f3\u30b0\u3055\u308c\u3066\u3044\u306a\u3044\u753b\u50cf\u304b\u3089\u753b\u50cf\u3078\u306e\u7ffb\u8a33\u3068\u3044\u3046\u8ad6\u6587\u306ePyTorch\u5b9f\u88c5/\u30c1\u30e5\u30fc\u30c8\u30ea\u30a2\u30eb\u3067\u3059</a></a>\u3002</p>\n", "<h1><a href=\"https://nn.labml.ai/gan/cycle_gan/index.html\">Cycle GAN</a></h1>\n<p>This is a <a href=\"https://pytorch.org\">PyTorch</a> implementation/tutorial of the paper <a href=\"https://arxiv.org/abs/1703.10593\">Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks</a>. </p>\n": "<h1><a href=\"https://nn.labml.ai/gan/cycle_gan/index.html\">\u30b5\u30a4\u30af\u30eb GAN</a></h1>\n<p>\u3053\u308c\u306f\u3001<a href=\"https://pytorch.org\"><a href=\"https://arxiv.org/abs/1703.10593\">\u30b5\u30a4\u30af\u30eb\u30b3\u30f3\u30b7\u30b9\u30c6\u30f3\u30c8\u306a\u6575\u5bfe\u7684\u30cd\u30c3\u30c8\u30ef\u30fc\u30af\u3092\u4f7f\u7528\u3057\u305f\u30da\u30a2\u30ea\u30f3\u30b0\u3055\u308c\u3066\u3044\u306a\u3044\u753b\u50cf\u304b\u3089\u753b\u50cf\u3078\u306e\u7ffb\u8a33\u3068\u3044\u3046\u8ad6\u6587\u306ePyTorch\u5b9f\u88c5/\u30c1\u30e5\u30fc\u30c8\u30ea\u30a2\u30eb\u3067\u3059</a></a>\u3002</p>\n",
"Cycle GAN": "\u30b5\u30a4\u30af\u30eb GAN" "Cycle GAN": "\u30b5\u30a4\u30af\u30eb GAN"
} }

View File

@@ -1,4 +1,4 @@
{ {
"<h1><a href=\"https://nn.labml.ai/gan/cycle_gan/index.html\">Cycle GAN</a></h1>\n<p>This is a <a href=\"https://pytorch.org\">PyTorch</a> implementation/tutorial of the paper <a href=\"https://papers.labml.ai/paper/1703.10593\">Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks</a>. </p>\n": "<h1><a href=\"https://nn.labml.ai/gan/cycle_gan/index.html\">\u0da0\u0d9a\u0dca\u0dbb\u0dba GAN</a></h1>\n<p>\u0db8\u0dd9\u0dba <a href=\"https://pytorch.org\">PyTorch</a> \u0d9a\u0dca\u0dbb\u0dd2\u0dba\u0dcf\u0dad\u0dca\u0db8\u0d9a \u0d9a\u0dd2\u0dbb\u0dd3\u0db8/\u0db1\u0dd2\u0db6\u0db1\u0dca\u0db0\u0db1\u0dba\u0d9a\u0dca \u0dc0\u0db1 \u0d85\u0dad\u0dbb \u0d91\u0dba <a href=\"https://papers.labml.ai/paper/1703.10593\">Cycle-Consistent adversarial Networks \u0db7\u0dcf\u0dc0\u0dd2\u0dad\u0dcf \u0d9a\u0dbb\u0db8\u0dd2\u0db1\u0dca PyTorch Image \u0db4\u0dbb\u0dd2\u0dc0\u0dbb\u0dca\u0dad\u0db1\u0dba \u0db1\u0ddc\u0d9a\u0dc5 \u0d85\u0db1\u0dd4\u0dbb\u0dd6\u0db4\u0dba-\u0dbb\u0dd6\u0db4 \u0db4\u0dbb\u0dd2\u0dc0\u0dbb\u0dca\u0dad\u0db1\u0dba</a> . </p>\n", "<h1><a href=\"https://nn.labml.ai/gan/cycle_gan/index.html\">Cycle GAN</a></h1>\n<p>This is a <a href=\"https://pytorch.org\">PyTorch</a> implementation/tutorial of the paper <a href=\"https://arxiv.org/abs/1703.10593\">Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks</a>. </p>\n": "<h1><a href=\"https://nn.labml.ai/gan/cycle_gan/index.html\">\u0da0\u0d9a\u0dca\u0dbb\u0dba GAN</a></h1>\n<p>\u0db8\u0dd9\u0dba <a href=\"https://pytorch.org\">PyTorch</a> \u0d9a\u0dca\u0dbb\u0dd2\u0dba\u0dcf\u0dad\u0dca\u0db8\u0d9a \u0d9a\u0dd2\u0dbb\u0dd3\u0db8/\u0db1\u0dd2\u0db6\u0db1\u0dca\u0db0\u0db1\u0dba\u0d9a\u0dca \u0dc0\u0db1 \u0d85\u0dad\u0dbb \u0d91\u0dba <a href=\"https://arxiv.org/abs/1703.10593\">Cycle-Consistent adversarial Networks \u0db7\u0dcf\u0dc0\u0dd2\u0dad\u0dcf \u0d9a\u0dbb\u0db8\u0dd2\u0db1\u0dca PyTorch Image \u0db4\u0dbb\u0dd2\u0dc0\u0dbb\u0dca\u0dad\u0db1\u0dba \u0db1\u0ddc\u0d9a\u0dc5 \u0d85\u0db1\u0dd4\u0dbb\u0dd6\u0db4\u0dba-\u0dbb\u0dd6\u0db4 \u0db4\u0dbb\u0dd2\u0dc0\u0dbb\u0dca\u0dad\u0db1\u0dba</a> . </p>\n",
"Cycle GAN": "\u0da0\u0d9a\u0dca\u0dbb\u0dba GAN" "Cycle GAN": "\u0da0\u0d9a\u0dca\u0dbb\u0dba GAN"
} }

View File

@@ -1,4 +1,4 @@
{ {
"<h1><a href=\"https://nn.labml.ai/gan/cycle_gan/index.html\">Cycle GAN</a></h1>\n<p>This is a <a href=\"https://pytorch.org\">PyTorch</a> implementation/tutorial of the paper <a href=\"https://papers.labml.ai/paper/1703.10593\">Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks</a>. </p>\n": "<h1><a href=\"https://nn.labml.ai/gan/cycle_gan/index.html\">\u5faa\u73af\u589e\u76ca</a></h1>\n<p>\u8fd9\u662f\u8bba\u6587\u300a<a href=\"https://papers.labml.ai/paper/1703.10593\">\u4f7f\u7528\u5468\u671f\u4e00\u81f4\u7684\u5bf9\u6297\u7f51\u7edc\u8fdb\u884c\u672a\u914d\u5bf9\u7684\u56fe\u50cf\u5230\u56fe\u50cf\u8f6c\u6362\u300b\u7684 Py</a> <a href=\"https://pytorch.org\">Torch</a> \u5b9e\u73b0/\u6559\u7a0b\u3002</p>\n", "<h1><a href=\"https://nn.labml.ai/gan/cycle_gan/index.html\">Cycle GAN</a></h1>\n<p>This is a <a href=\"https://pytorch.org\">PyTorch</a> implementation/tutorial of the paper <a href=\"https://arxiv.org/abs/1703.10593\">Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks</a>. </p>\n": "<h1><a href=\"https://nn.labml.ai/gan/cycle_gan/index.html\">\u5faa\u73af\u589e\u76ca</a></h1>\n<p>\u8fd9\u662f\u8bba\u6587\u300a<a href=\"https://arxiv.org/abs/1703.10593\">\u4f7f\u7528\u5468\u671f\u4e00\u81f4\u7684\u5bf9\u6297\u7f51\u7edc\u8fdb\u884c\u672a\u914d\u5bf9\u7684\u56fe\u50cf\u5230\u56fe\u50cf\u8f6c\u6362\u300b\u7684 Py</a> <a href=\"https://pytorch.org\">Torch</a> \u5b9e\u73b0/\u6559\u7a0b\u3002</p>\n",
"Cycle GAN": "\u5faa\u73af\u589e\u76ca" "Cycle GAN": "\u5faa\u73af\u589e\u76ca"
} }

View File

@@ -1,5 +1,5 @@
{ {
"<h1>Deep Convolutional Generative Adversarial Networks (DCGAN)</h1>\n<p>This is a <a href=\"https://pytorch.org\">PyTorch</a> implementation of paper <a href=\"https://papers.labml.ai/paper/1511.06434\">Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks</a>.</p>\n<p>This implementation is based on the <a href=\"https://pytorch.org/tutorials/beginner/dcgan_faces_tutorial.html\">PyTorch DCGAN Tutorial</a>.</p>\n": "<h1>\u6df1\u5c64\u7573\u307f\u8fbc\u307f\u578b\u6575\u5bfe\u7684\u751f\u6210\u30cd\u30c3\u30c8\u30ef\u30fc\u30af (DCGAN)</h1>\n<p>\u3053\u308c\u306f\u3001<a href=\"https://pytorch.org\"><a href=\"https://papers.labml.ai/paper/1511.06434\">\u6df1\u5c64\u7573\u307f\u8fbc\u307f\u751f\u6210\u578b\u6575\u5bfe\u7684\u30cd\u30c3\u30c8\u30ef\u30fc\u30af\u3092\u7528\u3044\u305f\u6559\u5e2b\u306a\u3057\u8868\u73fe\u5b66\u7fd2\u306ePyTorch\u5b9f\u88c5\u3067\u3059</a></a>\u3002</p>\n<p>\u3053\u306e\u5b9f\u88c5\u306f <a href=\"https://pytorch.org/tutorials/beginner/dcgan_faces_tutorial.html\">PyTorch DCGAN</a> \u30c1\u30e5\u30fc\u30c8\u30ea\u30a2\u30eb\u306b\u57fa\u3065\u3044\u3066\u3044\u307e\u3059\u3002</p>\n", "<h1>Deep Convolutional Generative Adversarial Networks (DCGAN)</h1>\n<p>This is a <a href=\"https://pytorch.org\">PyTorch</a> implementation of paper <a href=\"https://arxiv.org/abs/1511.06434\">Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks</a>.</p>\n<p>This implementation is based on the <a href=\"https://pytorch.org/tutorials/beginner/dcgan_faces_tutorial.html\">PyTorch DCGAN Tutorial</a>.</p>\n": "<h1>\u6df1\u5c64\u7573\u307f\u8fbc\u307f\u578b\u6575\u5bfe\u7684\u751f\u6210\u30cd\u30c3\u30c8\u30ef\u30fc\u30af (DCGAN)</h1>\n<p>\u3053\u308c\u306f\u3001<a href=\"https://pytorch.org\"><a href=\"https://arxiv.org/abs/1511.06434\">\u6df1\u5c64\u7573\u307f\u8fbc\u307f\u751f\u6210\u578b\u6575\u5bfe\u7684\u30cd\u30c3\u30c8\u30ef\u30fc\u30af\u3092\u7528\u3044\u305f\u6559\u5e2b\u306a\u3057\u8868\u73fe\u5b66\u7fd2\u306ePyTorch\u5b9f\u88c5\u3067\u3059</a></a>\u3002</p>\n<p>\u3053\u306e\u5b9f\u88c5\u306f <a href=\"https://pytorch.org/tutorials/beginner/dcgan_faces_tutorial.html\">PyTorch DCGAN</a> \u30c1\u30e5\u30fc\u30c8\u30ea\u30a2\u30eb\u306b\u57fa\u3065\u3044\u3066\u3044\u307e\u3059\u3002</p>\n",
"<h3>Convolutional Discriminator Network</h3>\n": "<h3>\u7573\u307f\u8fbc\u307f\u5f01\u5225\u30cd\u30c3\u30c8\u30ef\u30fc\u30af</h3>\n", "<h3>Convolutional Discriminator Network</h3>\n": "<h3>\u7573\u307f\u8fbc\u307f\u5f01\u5225\u30cd\u30c3\u30c8\u30ef\u30fc\u30af</h3>\n",
"<h3>Convolutional Generator Network</h3>\n<p>This is similar to the de-convolutional network used for CelebA faces, but modified for MNIST images.</p>\n<p><span translate=no>_^_0_^_</span></p>\n": "<h3>\u7573\u307f\u8fbc\u307f\u30b8\u30a7\u30cd\u30ec\u30fc\u30bf\u30cd\u30c3\u30c8\u30ef\u30fc\u30af</h3>\n<p>\u3053\u308c\u306f CeleBA \u30d5\u30a7\u30fc\u30b9\u306b\u4f7f\u7528\u3055\u308c\u3066\u3044\u308b\u30c7\u30b3\u30f3\u30dc\u30ea\u30e5\u30fc\u30b7\u30e7\u30ca\u30eb\u30cd\u30c3\u30c8\u30ef\u30fc\u30af\u306b\u4f3c\u3066\u3044\u307e\u3059\u304c\u3001MNIST \u30a4\u30e1\u30fc\u30b8\u7528\u306b\u5909\u66f4\u3055\u308c\u3066\u3044\u307e\u3059\u3002</p>\n<p><span translate=no>_^_0_^_</span></p>\n", "<h3>Convolutional Generator Network</h3>\n<p>This is similar to the de-convolutional network used for CelebA faces, but modified for MNIST images.</p>\n<p><span translate=no>_^_0_^_</span></p>\n": "<h3>\u7573\u307f\u8fbc\u307f\u30b8\u30a7\u30cd\u30ec\u30fc\u30bf\u30cd\u30c3\u30c8\u30ef\u30fc\u30af</h3>\n<p>\u3053\u308c\u306f CeleBA \u30d5\u30a7\u30fc\u30b9\u306b\u4f7f\u7528\u3055\u308c\u3066\u3044\u308b\u30c7\u30b3\u30f3\u30dc\u30ea\u30e5\u30fc\u30b7\u30e7\u30ca\u30eb\u30cd\u30c3\u30c8\u30ef\u30fc\u30af\u306b\u4f3c\u3066\u3044\u307e\u3059\u304c\u3001MNIST \u30a4\u30e1\u30fc\u30b8\u7528\u306b\u5909\u66f4\u3055\u308c\u3066\u3044\u307e\u3059\u3002</p>\n<p><span translate=no>_^_0_^_</span></p>\n",
"<p>Change from shape <span translate=no>_^_0_^_</span> to <span translate=no>_^_1_^_</span> </p>\n": "<p><span translate=no>_^_0_^_</span>\u5f62\u72b6\u3092\u6b21\u306e\u3088\u3046\u306b\u5909\u66f4 <span translate=no>_^_1_^_</span></p>\n", "<p>Change from shape <span translate=no>_^_0_^_</span> to <span translate=no>_^_1_^_</span> </p>\n": "<p><span translate=no>_^_0_^_</span>\u5f62\u72b6\u3092\u6b21\u306e\u3088\u3046\u306b\u5909\u66f4 <span translate=no>_^_1_^_</span></p>\n",

@ -1,5 +1,5 @@
{ {
"<h1>Deep Convolutional Generative Adversarial Networks (DCGAN)</h1>\n<p>This is a <a href=\"https://pytorch.org\">PyTorch</a> implementation of paper <a href=\"https://papers.labml.ai/paper/1511.06434\">Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks</a>.</p>\n<p>This implementation is based on the <a href=\"https://pytorch.org/tutorials/beginner/dcgan_faces_tutorial.html\">PyTorch DCGAN Tutorial</a>.</p>\n": "<h1>\u0d9c\u0dd0\u0db9\u0dd4\u0dbb\u0dd4\u0dc3\u0d82\u0dc0\u0dc4\u0db1 \u0d8b\u0dad\u0dca\u0db4\u0dcf\u0daf\u0d9a \u0d85\u0dc4\u0dd2\u0dad\u0d9a\u0dbb \u0da2\u0dcf\u0dbd (DCGAN)</h1>\n<p>\u0db8\u0dd9\u0dba <a href=\"https://pytorch.org\">PyTorch</a> <a href=\"https://papers.labml.ai/paper/1511.06434\">\u0d9c\u0dd0\u0db9\u0dd4\u0dbb\u0dd4 \u0dc3\u0d82\u0d9a\u0ddd\u0da0\u0db1 \u0d8b\u0dad\u0dca\u0db4\u0dcf\u0daf\u0d9a \u0d85\u0dc4\u0dd2\u0dad\u0d9a\u0dbb \u0da2\u0dcf\u0dbd\u0dba\u0db1\u0dca \u0dc3\u0db8\u0d9f \u0d9a\u0da9\u0daf\u0dcf\u0dc3\u0dd2 \u0d85\u0db0\u0dd3\u0d9a\u0dca\u0dc2\u0dab\u0dba \u0db1\u0ddc\u0d9a\u0dc5 \u0db1\u0dd2\u0dba\u0ddd\u0da2\u0db1 \u0d89\u0d9c\u0dd9\u0db1\u0dd4\u0db8\u0dca</a> \u0d9a\u0dca\u0dbb\u0dd2\u0dba\u0dcf\u0dad\u0dca\u0db8\u0d9a \u0d9a\u0dd2\u0dbb\u0dd3\u0db8\u0dba\u0dd2. </p>\n<p>\u0db8\u0dd9\u0db8\u0d9a\u0dca\u0dbb\u0dd2\u0dba\u0dcf\u0dad\u0dca\u0db8\u0d9a \u0d9a\u0dd2\u0dbb\u0dd3\u0db8 <a href=\"https://pytorch.org/tutorials/beginner/dcgan_faces_tutorial.html\">PyTorch DCGAN \u0db1\u0dd2\u0db6\u0db1\u0dca\u0db0\u0db1\u0dba</a>\u0db8\u0dad \u0db4\u0daf\u0db1\u0db8\u0dca \u0dc0\u0dda. </p>\n", "<h1>Deep Convolutional Generative Adversarial Networks (DCGAN)</h1>\n<p>This is a <a href=\"https://pytorch.org\">PyTorch</a> implementation of paper <a href=\"https://arxiv.org/abs/1511.06434\">Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks</a>.</p>\n<p>This implementation is based on the <a href=\"https://pytorch.org/tutorials/beginner/dcgan_faces_tutorial.html\">PyTorch DCGAN Tutorial</a>.</p>\n": "<h1>\u0d9c\u0dd0\u0db9\u0dd4\u0dbb\u0dd4\u0dc3\u0d82\u0dc0\u0dc4\u0db1 \u0d8b\u0dad\u0dca\u0db4\u0dcf\u0daf\u0d9a \u0d85\u0dc4\u0dd2\u0dad\u0d9a\u0dbb \u0da2\u0dcf\u0dbd (DCGAN)</h1>\n<p>\u0db8\u0dd9\u0dba <a href=\"https://pytorch.org\">PyTorch</a> <a href=\"https://arxiv.org/abs/1511.06434\">\u0d9c\u0dd0\u0db9\u0dd4\u0dbb\u0dd4 \u0dc3\u0d82\u0d9a\u0ddd\u0da0\u0db1 \u0d8b\u0dad\u0dca\u0db4\u0dcf\u0daf\u0d9a \u0d85\u0dc4\u0dd2\u0dad\u0d9a\u0dbb \u0da2\u0dcf\u0dbd\u0dba\u0db1\u0dca \u0dc3\u0db8\u0d9f \u0d9a\u0da9\u0daf\u0dcf\u0dc3\u0dd2 \u0d85\u0db0\u0dd3\u0d9a\u0dca\u0dc2\u0dab\u0dba \u0db1\u0ddc\u0d9a\u0dc5 \u0db1\u0dd2\u0dba\u0ddd\u0da2\u0db1 \u0d89\u0d9c\u0dd9\u0db1\u0dd4\u0db8\u0dca</a> \u0d9a\u0dca\u0dbb\u0dd2\u0dba\u0dcf\u0dad\u0dca\u0db8\u0d9a \u0d9a\u0dd2\u0dbb\u0dd3\u0db8\u0dba\u0dd2. </p>\n<p>\u0db8\u0dd9\u0db8\u0d9a\u0dca\u0dbb\u0dd2\u0dba\u0dcf\u0dad\u0dca\u0db8\u0d9a \u0d9a\u0dd2\u0dbb\u0dd3\u0db8 <a href=\"https://pytorch.org/tutorials/beginner/dcgan_faces_tutorial.html\">PyTorch DCGAN \u0db1\u0dd2\u0db6\u0db1\u0dca\u0db0\u0db1\u0dba</a>\u0db8\u0dad \u0db4\u0daf\u0db1\u0db8\u0dca \u0dc0\u0dda. </p>\n",
"<h3>Convolutional Discriminator Network</h3>\n": "<h3>\u0dc3\u0d82\u0dc0\u0dd2\u0da0\u0dca\u0da1\u0dda\u0daf\u0d9a\u0dc0\u0dd2\u0dc3\u0d82\u0dc0\u0dcf\u0daf\u0dd3 \u0da2\u0dcf\u0dbd\u0dba</h3>\n", "<h3>Convolutional Discriminator Network</h3>\n": "<h3>\u0dc3\u0d82\u0dc0\u0dd2\u0da0\u0dca\u0da1\u0dda\u0daf\u0d9a\u0dc0\u0dd2\u0dc3\u0d82\u0dc0\u0dcf\u0daf\u0dd3 \u0da2\u0dcf\u0dbd\u0dba</h3>\n",
"<h3>Convolutional Generator Network</h3>\n<p>This is similar to the de-convolutional network used for CelebA faces, but modified for MNIST images.</p>\n<p><span translate=no>_^_0_^_</span></p>\n": "<h3>\u0dc3\u0d82\u0dc0\u0dbb\u0dca\u0dad\u0da2\u0dcf\u0dbd \u0d8b\u0dad\u0dca\u0db4\u0dcf\u0daf\u0d9a \u0dba\u0db1\u0dca\u0dad\u0dca\u0dbb\u0dba</h3>\n<p>\u0db8\u0dd9\u0dba\u0dc3\u0dd9\u0dbd\u0dd9\u0db6\u0dcf \u0db8\u0dd4\u0dc4\u0dd4\u0dab\u0dd4 \u0dc3\u0db3\u0dc4\u0dcf \u0db7\u0dcf\u0dc0\u0dd2\u0dad\u0dcf \u0d9a\u0dbb\u0db1 \u0daf-\u0dc3\u0d82\u0dc0\u0dc4\u0db1 \u0da2\u0dcf\u0dbd\u0dba\u0da7 \u0dc3\u0db8\u0dcf\u0db1 \u0dc0\u0db1 \u0db1\u0db8\u0dd4\u0dad\u0dca MNIST \u0dbb\u0dd6\u0db4 \u0dc3\u0db3\u0dc4\u0dcf \u0dc0\u0dd9\u0db1\u0dc3\u0dca \u0d9a\u0dbb \u0d87\u0dad. </p>\n<p><span translate=no>_^_0_^_</span></p>\n", "<h3>Convolutional Generator Network</h3>\n<p>This is similar to the de-convolutional network used for CelebA faces, but modified for MNIST images.</p>\n<p><span translate=no>_^_0_^_</span></p>\n": "<h3>\u0dc3\u0d82\u0dc0\u0dbb\u0dca\u0dad\u0da2\u0dcf\u0dbd \u0d8b\u0dad\u0dca\u0db4\u0dcf\u0daf\u0d9a \u0dba\u0db1\u0dca\u0dad\u0dca\u0dbb\u0dba</h3>\n<p>\u0db8\u0dd9\u0dba\u0dc3\u0dd9\u0dbd\u0dd9\u0db6\u0dcf \u0db8\u0dd4\u0dc4\u0dd4\u0dab\u0dd4 \u0dc3\u0db3\u0dc4\u0dcf \u0db7\u0dcf\u0dc0\u0dd2\u0dad\u0dcf \u0d9a\u0dbb\u0db1 \u0daf-\u0dc3\u0d82\u0dc0\u0dc4\u0db1 \u0da2\u0dcf\u0dbd\u0dba\u0da7 \u0dc3\u0db8\u0dcf\u0db1 \u0dc0\u0db1 \u0db1\u0db8\u0dd4\u0dad\u0dca MNIST \u0dbb\u0dd6\u0db4 \u0dc3\u0db3\u0dc4\u0dcf \u0dc0\u0dd9\u0db1\u0dc3\u0dca \u0d9a\u0dbb \u0d87\u0dad. </p>\n<p><span translate=no>_^_0_^_</span></p>\n",
"<p>Change from shape <span translate=no>_^_0_^_</span> to <span translate=no>_^_1_^_</span> </p>\n": "<p>\u0dc4\u0dd0\u0da9\u0dba\u0dd9\u0db1\u0dca\u0dc0\u0dd9\u0db1\u0dc3\u0dca <span translate=no>_^_0_^_</span> \u0d9a\u0dbb\u0db1\u0dca\u0db1 <span translate=no>_^_1_^_</span> </p>\n", "<p>Change from shape <span translate=no>_^_0_^_</span> to <span translate=no>_^_1_^_</span> </p>\n": "<p>\u0dc4\u0dd0\u0da9\u0dba\u0dd9\u0db1\u0dca\u0dc0\u0dd9\u0db1\u0dc3\u0dca <span translate=no>_^_0_^_</span> \u0d9a\u0dbb\u0db1\u0dca\u0db1 <span translate=no>_^_1_^_</span> </p>\n",

@ -1,5 +1,5 @@
{ {
"<h1>Deep Convolutional Generative Adversarial Networks (DCGAN)</h1>\n<p>This is a <a href=\"https://pytorch.org\">PyTorch</a> implementation of paper <a href=\"https://papers.labml.ai/paper/1511.06434\">Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks</a>.</p>\n<p>This implementation is based on the <a href=\"https://pytorch.org/tutorials/beginner/dcgan_faces_tutorial.html\">PyTorch DCGAN Tutorial</a>.</p>\n": "<h1>\u6df1\u5ea6\u5377\u79ef\u751f\u6210\u5bf9\u6297\u7f51\u7edc (DCGAN)</h1>\n<p>\u8fd9\u662f <a href=\"https://pytorch.org\">PyTorch</a> \u5b9e\u73b0\u7684\u8bba\u6587\u300a<a href=\"https://papers.labml.ai/paper/1511.06434\">\u4f7f\u7528\u6df1\u5ea6\u5377\u79ef\u751f\u6210\u5bf9\u6297\u7f51\u7edc\u8fdb\u884c\u65e0\u76d1\u7763\u8868\u793a\u5b66\u4e60</a>\u300b\u3002</p>\n<p>\u6b64\u5b9e\u73b0\u57fa\u4e8e <a href=\"https://pytorch.org/tutorials/beginner/dcgan_faces_tutorial.html\">PyTorch DCGAN \u6559\u7a0b</a>\u3002</p>\n", "<h1>Deep Convolutional Generative Adversarial Networks (DCGAN)</h1>\n<p>This is a <a href=\"https://pytorch.org\">PyTorch</a> implementation of paper <a href=\"https://arxiv.org/abs/1511.06434\">Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks</a>.</p>\n<p>This implementation is based on the <a href=\"https://pytorch.org/tutorials/beginner/dcgan_faces_tutorial.html\">PyTorch DCGAN Tutorial</a>.</p>\n": "<h1>\u6df1\u5ea6\u5377\u79ef\u751f\u6210\u5bf9\u6297\u7f51\u7edc (DCGAN)</h1>\n<p>\u8fd9\u662f <a href=\"https://pytorch.org\">PyTorch</a> \u5b9e\u73b0\u7684\u8bba\u6587\u300a<a href=\"https://arxiv.org/abs/1511.06434\">\u4f7f\u7528\u6df1\u5ea6\u5377\u79ef\u751f\u6210\u5bf9\u6297\u7f51\u7edc\u8fdb\u884c\u65e0\u76d1\u7763\u8868\u793a\u5b66\u4e60</a>\u300b\u3002</p>\n<p>\u6b64\u5b9e\u73b0\u57fa\u4e8e <a href=\"https://pytorch.org/tutorials/beginner/dcgan_faces_tutorial.html\">PyTorch DCGAN \u6559\u7a0b</a>\u3002</p>\n",
"<h3>Convolutional Discriminator Network</h3>\n": "<h3>\u5377\u79ef\u9274\u522b\u5668\u7f51\u7edc</h3>\n", "<h3>Convolutional Discriminator Network</h3>\n": "<h3>\u5377\u79ef\u9274\u522b\u5668\u7f51\u7edc</h3>\n",
"<h3>Convolutional Generator Network</h3>\n<p>This is similar to the de-convolutional network used for CelebA faces, but modified for MNIST images.</p>\n<p><span translate=no>_^_0_^_</span></p>\n": "<h3>\u5377\u79ef\u751f\u6210\u5668\u7f51\u7edc</h3>\n<p>\u8fd9\u7c7b\u4f3c\u4e8e\u7528\u4e8e CeleBA \u4eba\u8138\u7684\u53cd\u5377\u79ef\u7f51\u7edc\uff0c\u4f46\u9488\u5bf9 MNIST \u56fe\u50cf\u8fdb\u884c\u4e86\u4fee\u6539\u3002</p>\n<p><span translate=no>_^_0_^_</span></p>\n", "<h3>Convolutional Generator Network</h3>\n<p>This is similar to the de-convolutional network used for CelebA faces, but modified for MNIST images.</p>\n<p><span translate=no>_^_0_^_</span></p>\n": "<h3>\u5377\u79ef\u751f\u6210\u5668\u7f51\u7edc</h3>\n<p>\u8fd9\u7c7b\u4f3c\u4e8e\u7528\u4e8e CeleBA \u4eba\u8138\u7684\u53cd\u5377\u79ef\u7f51\u7edc\uff0c\u4f46\u9488\u5bf9 MNIST \u56fe\u50cf\u8fdb\u884c\u4e86\u4fee\u6539\u3002</p>\n<p><span translate=no>_^_0_^_</span></p>\n",
"<p>Change from shape <span translate=no>_^_0_^_</span> to <span translate=no>_^_1_^_</span> </p>\n": "<p>\u4ece\u5f62\u72b6\u6539<span translate=no>_^_0_^_</span>\u4e3a<span translate=no>_^_1_^_</span></p>\n", "<p>Change from shape <span translate=no>_^_0_^_</span> to <span translate=no>_^_1_^_</span> </p>\n": "<p>\u4ece\u5f62\u72b6\u6539<span translate=no>_^_0_^_</span>\u4e3a<span translate=no>_^_1_^_</span></p>\n",

@ -1,4 +1,4 @@
{ {
"<h1><a href=\"https://nn.labml.ai/gan/dcgan/index.html\">Deep Convolutional Generative Adversarial Networks - DCGAN</a></h1>\n<p>This is a <a href=\"https://pytorch.org\">PyTorch</a> implementation of paper <a href=\"https://papers.labml.ai/paper/1511.06434\">Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks</a>. </p>\n": "<h1><a href=\"https://nn.labml.ai/gan/dcgan/index.html\">\u6df1\u5c64\u7573\u307f\u8fbc\u307f\u751f\u6210\u578b\u6575\u5bfe\u30cd\u30c3\u30c8\u30ef\u30fc\u30af-DCGAN</a></h1>\n<p>\u3053\u308c\u306f\u3001<a href=\"https://pytorch.org\"><a href=\"https://papers.labml.ai/paper/1511.06434\">\u6df1\u5c64\u7573\u307f\u8fbc\u307f\u751f\u6210\u578b\u6575\u5bfe\u7684\u30cd\u30c3\u30c8\u30ef\u30fc\u30af\u3092\u7528\u3044\u305f\u6559\u5e2b\u306a\u3057\u8868\u73fe\u5b66\u7fd2\u306ePyTorch\u5b9f\u88c5\u3067\u3059</a></a>\u3002</p>\n", "<h1><a href=\"https://nn.labml.ai/gan/dcgan/index.html\">Deep Convolutional Generative Adversarial Networks - DCGAN</a></h1>\n<p>This is a <a href=\"https://pytorch.org\">PyTorch</a> implementation of paper <a href=\"https://arxiv.org/abs/1511.06434\">Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks</a>. </p>\n": "<h1><a href=\"https://nn.labml.ai/gan/dcgan/index.html\">\u6df1\u5c64\u7573\u307f\u8fbc\u307f\u751f\u6210\u578b\u6575\u5bfe\u30cd\u30c3\u30c8\u30ef\u30fc\u30af-DCGAN</a></h1>\n<p>\u3053\u308c\u306f\u3001<a href=\"https://pytorch.org\"><a href=\"https://arxiv.org/abs/1511.06434\">\u6df1\u5c64\u7573\u307f\u8fbc\u307f\u751f\u6210\u578b\u6575\u5bfe\u7684\u30cd\u30c3\u30c8\u30ef\u30fc\u30af\u3092\u7528\u3044\u305f\u6559\u5e2b\u306a\u3057\u8868\u73fe\u5b66\u7fd2\u306ePyTorch\u5b9f\u88c5\u3067\u3059</a></a>\u3002</p>\n",
"Deep Convolutional Generative Adversarial Networks - DCGAN": "\u6df1\u5c64\u7573\u307f\u8fbc\u307f\u751f\u6210\u578b\u6575\u5bfe\u30cd\u30c3\u30c8\u30ef\u30fc\u30af-DCGAN" "Deep Convolutional Generative Adversarial Networks - DCGAN": "\u6df1\u5c64\u7573\u307f\u8fbc\u307f\u751f\u6210\u578b\u6575\u5bfe\u30cd\u30c3\u30c8\u30ef\u30fc\u30af-DCGAN"
} }

@ -1,4 +1,4 @@
{ {
"<h1><a href=\"https://nn.labml.ai/gan/dcgan/index.html\">Deep Convolutional Generative Adversarial Networks - DCGAN</a></h1>\n<p>This is a <a href=\"https://pytorch.org\">PyTorch</a> implementation of paper <a href=\"https://papers.labml.ai/paper/1511.06434\">Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks</a>. </p>\n": "<h1><a href=\"https://nn.labml.ai/gan/dcgan/index.html\">\u0d9c\u0dd0\u0db9\u0dd4\u0dbb\u0dd4 \u0dc3\u0d82\u0dc0\u0dbb\u0dca\u0dad \u0d8b\u0dad\u0dca\u0db4\u0dcf\u0daf\u0d9a \u0d85\u0dc4\u0dd2\u0dad\u0d9a\u0dbb \u0da2\u0dcf\u0dbd - DCGAN</a></h1>\n<p>\u0db8\u0dd9\u0dba <a href=\"https://pytorch.org\">PyTorch</a> <a href=\"https://papers.labml.ai/paper/1511.06434\">\u0d9c\u0dd0\u0db9\u0dd4\u0dbb\u0dd4 \u0dc3\u0d82\u0d9a\u0ddd\u0da0\u0db1 \u0d8b\u0dad\u0dca\u0db4\u0dcf\u0daf\u0d9a \u0d85\u0dc4\u0dd2\u0dad\u0d9a\u0dbb \u0da2\u0dcf\u0dbd\u0dba\u0db1\u0dca \u0dc3\u0db8\u0d9f \u0d9a\u0da9\u0daf\u0dcf\u0dc3\u0dd2 \u0d85\u0db0\u0dd3\u0d9a\u0dca\u0dc2\u0dab\u0dba \u0db1\u0ddc\u0d9a\u0dc5 \u0db1\u0dd2\u0dba\u0ddd\u0da2\u0db1 \u0d89\u0d9c\u0dd9\u0db1\u0dd4\u0db8\u0dca</a> \u0d9a\u0dca\u0dbb\u0dd2\u0dba\u0dcf\u0dad\u0dca\u0db8\u0d9a \u0d9a\u0dd2\u0dbb\u0dd3\u0db8\u0dba\u0dd2. </p>\n", "<h1><a href=\"https://nn.labml.ai/gan/dcgan/index.html\">Deep Convolutional Generative Adversarial Networks - DCGAN</a></h1>\n<p>This is a <a href=\"https://pytorch.org\">PyTorch</a> implementation of paper <a href=\"https://arxiv.org/abs/1511.06434\">Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks</a>. </p>\n": "<h1><a href=\"https://nn.labml.ai/gan/dcgan/index.html\">\u0d9c\u0dd0\u0db9\u0dd4\u0dbb\u0dd4 \u0dc3\u0d82\u0dc0\u0dbb\u0dca\u0dad \u0d8b\u0dad\u0dca\u0db4\u0dcf\u0daf\u0d9a \u0d85\u0dc4\u0dd2\u0dad\u0d9a\u0dbb \u0da2\u0dcf\u0dbd - DCGAN</a></h1>\n<p>\u0db8\u0dd9\u0dba <a href=\"https://pytorch.org\">PyTorch</a> <a href=\"https://arxiv.org/abs/1511.06434\">\u0d9c\u0dd0\u0db9\u0dd4\u0dbb\u0dd4 \u0dc3\u0d82\u0d9a\u0ddd\u0da0\u0db1 \u0d8b\u0dad\u0dca\u0db4\u0dcf\u0daf\u0d9a \u0d85\u0dc4\u0dd2\u0dad\u0d9a\u0dbb \u0da2\u0dcf\u0dbd\u0dba\u0db1\u0dca \u0dc3\u0db8\u0d9f \u0d9a\u0da9\u0daf\u0dcf\u0dc3\u0dd2 \u0d85\u0db0\u0dd3\u0d9a\u0dca\u0dc2\u0dab\u0dba \u0db1\u0ddc\u0d9a\u0dc5 \u0db1\u0dd2\u0dba\u0ddd\u0da2\u0db1 \u0d89\u0d9c\u0dd9\u0db1\u0dd4\u0db8\u0dca</a> \u0d9a\u0dca\u0dbb\u0dd2\u0dba\u0dcf\u0dad\u0dca\u0db8\u0d9a \u0d9a\u0dd2\u0dbb\u0dd3\u0db8\u0dba\u0dd2. </p>\n",
"Deep Convolutional Generative Adversarial Networks - DCGAN": "\u0d9c\u0dd0\u0db9\u0dd4\u0dbb\u0dd4 \u0dc3\u0d82\u0dc0\u0dbb\u0dca\u0dad \u0d8b\u0dad\u0dca\u0db4\u0dcf\u0daf\u0d9a \u0d85\u0dc4\u0dd2\u0dad\u0d9a\u0dbb \u0da2\u0dcf\u0dbd - DCGAN" "Deep Convolutional Generative Adversarial Networks - DCGAN": "\u0d9c\u0dd0\u0db9\u0dd4\u0dbb\u0dd4 \u0dc3\u0d82\u0dc0\u0dbb\u0dca\u0dad \u0d8b\u0dad\u0dca\u0db4\u0dcf\u0daf\u0d9a \u0d85\u0dc4\u0dd2\u0dad\u0d9a\u0dbb \u0da2\u0dcf\u0dbd - DCGAN"
} }

@ -1,4 +1,4 @@
{ {
"<h1><a href=\"https://nn.labml.ai/gan/dcgan/index.html\">Deep Convolutional Generative Adversarial Networks - DCGAN</a></h1>\n<p>This is a <a href=\"https://pytorch.org\">PyTorch</a> implementation of paper <a href=\"https://papers.labml.ai/paper/1511.06434\">Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks</a>. </p>\n": "<h1><a href=\"https://nn.labml.ai/gan/dcgan/index.html\">\u6df1\u5ea6\u5377\u79ef\u751f\u6210\u5bf9\u6297\u7f51\u7edc-DCGAN</a></h1>\n<p>\u8fd9\u662f <a href=\"https://pytorch.org\">PyTorch</a> \u5b9e\u73b0\u7684\u8bba\u6587\u300a<a href=\"https://papers.labml.ai/paper/1511.06434\">\u4f7f\u7528\u6df1\u5ea6\u5377\u79ef\u751f\u6210\u5bf9\u6297\u7f51\u7edc\u8fdb\u884c\u65e0\u76d1\u7763\u8868\u793a\u5b66\u4e60</a>\u300b\u3002</p>\n", "<h1><a href=\"https://nn.labml.ai/gan/dcgan/index.html\">Deep Convolutional Generative Adversarial Networks - DCGAN</a></h1>\n<p>This is a <a href=\"https://pytorch.org\">PyTorch</a> implementation of paper <a href=\"https://arxiv.org/abs/1511.06434\">Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks</a>. </p>\n": "<h1><a href=\"https://nn.labml.ai/gan/dcgan/index.html\">\u6df1\u5ea6\u5377\u79ef\u751f\u6210\u5bf9\u6297\u7f51\u7edc-DCGAN</a></h1>\n<p>\u8fd9\u662f <a href=\"https://pytorch.org\">PyTorch</a> \u5b9e\u73b0\u7684\u8bba\u6587\u300a<a href=\"https://arxiv.org/abs/1511.06434\">\u4f7f\u7528\u6df1\u5ea6\u5377\u79ef\u751f\u6210\u5bf9\u6297\u7f51\u7edc\u8fdb\u884c\u65e0\u76d1\u7763\u8868\u793a\u5b66\u4e60</a>\u300b\u3002</p>\n",
"Deep Convolutional Generative Adversarial Networks - DCGAN": "\u6df1\u5ea6\u5377\u79ef\u751f\u6210\u5bf9\u6297\u7f51\u7edc-DCGAN" "Deep Convolutional Generative Adversarial Networks - DCGAN": "\u6df1\u5ea6\u5377\u79ef\u751f\u6210\u5bf9\u6297\u7f51\u7edc-DCGAN"
} }

@ -1,5 +1,5 @@
{ {
"<h1>Generative Adversarial Networks (GAN)</h1>\n<p>This is an implementation of <a href=\"https://papers.labml.ai/paper/1406.2661\">Generative Adversarial Networks</a>.</p>\n<p>The generator, <span translate=no>_^_0_^_</span> generates samples that match the distribution of data, while the discriminator, <span translate=no>_^_1_^_</span> gives the probability that <span translate=no>_^_2_^_</span> came from data rather than <span translate=no>_^_3_^_</span>.</p>\n<p>We train <span translate=no>_^_4_^_</span> and <span translate=no>_^_5_^_</span> simultaneously on a two-player min-max game with value function <span translate=no>_^_6_^_</span>.</p>\n<p><span translate=no>_^_7_^_</span></p>\n<p><span translate=no>_^_8_^_</span> is the probability distribution over data, whilst <span translate=no>_^_9_^_</span> probability distribution of <span translate=no>_^_10_^_</span>, which is set to gaussian noise.</p>\n<p>This file defines the loss functions. <a href=\"experiment.html\">Here</a> is an MNIST example with two multilayer perceptron for the generator and discriminator.</p>\n": "<h1>\u30b8\u30a7\u30cd\u30ec\u30fc\u30c6\u30a3\u30d6\u30fb\u30a2\u30c9\u30d0\u30fc\u30b5\u30ea\u30a2\u30eb\u30fb\u30cd\u30c3\u30c8\u30ef\u30fc\u30af (GAN)</h1>\n<p><a href=\"https://papers.labml.ai/paper/1406.2661\">\u3053\u308c\u306f\u30b8\u30a7\u30cd\u30ec\u30fc\u30c6\u30a3\u30d6\u30fb\u30a2\u30c9\u30d0\u30fc\u30b5\u30ea\u30a2\u30eb\u30fb\u30cd\u30c3\u30c8\u30ef\u30fc\u30af\u306e\u5b9f\u88c5\u3067\u3059</a>\u3002</p>\n<p><span translate=no>_^_0_^_</span>\u30b8\u30a7\u30cd\u30ec\u30fc\u30bf\u30fc\u306f\u30c7\u30fc\u30bf\u306e\u5206\u5e03\u306b\u4e00\u81f4\u3059\u308b\u30b5\u30f3\u30d7\u30eb\u3092\u751f\u6210\u3057\u3001<span translate=no>_^_1_^_</span>\u30c7\u30a3\u30b9\u30af\u30ea\u30df\u30cd\u30fc\u30bf\u30fc\u306f\u30c7\u30fc\u30bf\u304b\u3089\u5f97\u3089\u308c\u308b\u78ba\u7387\u3067\u306f\u306a\u304f\u3001<span translate=no>_^_2_^_</span>\u30c7\u30fc\u30bf\u304b\u3089\u5f97\u3089\u308c\u308b\u78ba\u7387\u3092\u8fd4\u3057\u307e\u3059\u3002<span translate=no>_^_3_^_</span></p>\n<p><span translate=no>_^_4_^_</span><span translate=no>_^_5_^_</span>\u30d0\u30ea\u30e5\u30fc\u6a5f\u80fd\u3092\u5099\u3048\u305f2\u4eba\u7528\u306e\u30df\u30cb\u30de\u30c3\u30af\u30b9\u30b2\u30fc\u30e0\u3067\u540c\u6642\u306b\u30c8\u30ec\u30fc\u30cb\u30f3\u30b0\u3057\u307e\u3059\u3002<span translate=no>_^_6_^_</span></p>\n<p><span translate=no>_^_7_^_</span></p>\n<p><span translate=no>_^_8_^_</span>\u306f\u30c7\u30fc\u30bf\u5168\u4f53\u306e\u78ba\u7387\u5206\u5e03\u3067<span translate=no>_^_10_^_</span>\u3001<span translate=no>_^_9_^_</span>\u306e\u78ba\u7387\u5206\u5e03\u306f\u30ac\u30a6\u30b9\u30ce\u30a4\u30ba\u306b\u8a2d\u5b9a\u3055\u308c\u307e\u3059\u3002</p>\n<p>\u3053\u306e\u30d5\u30a1\u30a4\u30eb\u306f\u640d\u5931\u95a2\u6570\u3092\u5b9a\u7fa9\u3057\u307e\u3059\u3002<a href=\"experiment.html\">\u3053\u308c\u306f</a>\u3001\u30b8\u30a7\u30cd\u30ec\u30fc\u30bf\u30fc\u3068\u30c7\u30a3\u30b9\u30af\u30ea\u30df\u30cd\u30fc\u30bf\u30fc\u306b2\u3064\u306e\u591a\u5c64\u30d1\u30fc\u30bb\u30d7\u30c8\u30ed\u30f3\u3092\u4f7f\u3063\u305fMNIST\u306e\u4f8b\u3067\u3059</p>\u3002\n", "<h1>Generative Adversarial Networks (GAN)</h1>\n<p>This is an implementation of <a href=\"https://arxiv.org/abs/1406.2661\">Generative Adversarial Networks</a>.</p>\n<p>The generator, <span translate=no>_^_0_^_</span> generates samples that match the distribution of data, while the discriminator, <span translate=no>_^_1_^_</span> gives the probability that <span 
translate=no>_^_2_^_</span> came from data rather than <span translate=no>_^_3_^_</span>.</p>\n<p>We train <span translate=no>_^_4_^_</span> and <span translate=no>_^_5_^_</span> simultaneously on a two-player min-max game with value function <span translate=no>_^_6_^_</span>.</p>\n<p><span translate=no>_^_7_^_</span></p>\n<p><span translate=no>_^_8_^_</span> is the probability distribution over data, whilst <span translate=no>_^_9_^_</span> probability distribution of <span translate=no>_^_10_^_</span>, which is set to gaussian noise.</p>\n<p>This file defines the loss functions. <a href=\"experiment.html\">Here</a> is an MNIST example with two multilayer perceptron for the generator and discriminator.</p>\n": "<h1>\u30b8\u30a7\u30cd\u30ec\u30fc\u30c6\u30a3\u30d6\u30fb\u30a2\u30c9\u30d0\u30fc\u30b5\u30ea\u30a2\u30eb\u30fb\u30cd\u30c3\u30c8\u30ef\u30fc\u30af (GAN)</h1>\n<p><a href=\"https://arxiv.org/abs/1406.2661\">\u3053\u308c\u306f\u30b8\u30a7\u30cd\u30ec\u30fc\u30c6\u30a3\u30d6\u30fb\u30a2\u30c9\u30d0\u30fc\u30b5\u30ea\u30a2\u30eb\u30fb\u30cd\u30c3\u30c8\u30ef\u30fc\u30af\u306e\u5b9f\u88c5\u3067\u3059</a>\u3002</p>\n<p><span translate=no>_^_0_^_</span>\u30b8\u30a7\u30cd\u30ec\u30fc\u30bf\u30fc\u306f\u30c7\u30fc\u30bf\u306e\u5206\u5e03\u306b\u4e00\u81f4\u3059\u308b\u30b5\u30f3\u30d7\u30eb\u3092\u751f\u6210\u3057\u3001<span translate=no>_^_1_^_</span>\u30c7\u30a3\u30b9\u30af\u30ea\u30df\u30cd\u30fc\u30bf\u30fc\u306f\u30c7\u30fc\u30bf\u304b\u3089\u5f97\u3089\u308c\u308b\u78ba\u7387\u3067\u306f\u306a\u304f\u3001<span translate=no>_^_2_^_</span>\u30c7\u30fc\u30bf\u304b\u3089\u5f97\u3089\u308c\u308b\u78ba\u7387\u3092\u8fd4\u3057\u307e\u3059\u3002<span translate=no>_^_3_^_</span></p>\n<p><span translate=no>_^_4_^_</span><span translate=no>_^_5_^_</span>\u30d0\u30ea\u30e5\u30fc\u6a5f\u80fd\u3092\u5099\u3048\u305f2\u4eba\u7528\u306e\u30df\u30cb\u30de\u30c3\u30af\u30b9\u30b2\u30fc\u30e0\u3067\u540c\u6642\u306b\u30c8\u30ec\u30fc\u30cb\u30f3\u30b0\u3057\u307e\u3059\u3002<span translate=no>_^_6_^_</span></p>\n<p><span translate=no>_^_7_^_</span></p>\n<p><span translate=no>_^_8_^_</span>\u306f\u30c7\u30fc\u30bf\u5168\u4f53\u306e\u78ba\u7387\u5206\u5e03\u3067<span translate=no>_^_10_^_</span>\u3001<span translate=no>_^_9_^_</span>\u306e\u78ba\u7387\u5206\u5e03\u306f\u30ac\u30a6\u30b9\u30ce\u30a4\u30ba\u306b\u8a2d\u5b9a\u3055\u308c\u307e\u3059\u3002</p>\n<p>\u3053\u306e\u30d5\u30a1\u30a4\u30eb\u306f\u640d\u5931\u95a2\u6570\u3092\u5b9a\u7fa9\u3057\u307e\u3059\u3002<a href=\"experiment.html\">\u3053\u308c\u306f</a>\u3001\u30b8\u30a7\u30cd\u30ec\u30fc\u30bf\u30fc\u3068\u30c7\u30a3\u30b9\u30af\u30ea\u30df\u30cd\u30fc\u30bf\u30fc\u306b2\u3064\u306e\u591a\u5c64\u30d1\u30fc\u30bb\u30d7\u30c8\u30ed\u30f3\u3092\u4f7f\u3063\u305fMNIST\u306e\u4f8b\u3067\u3059</p>\u3002\n",
"<h2>Discriminator Loss</h2>\n<p>Discriminator should <strong>ascend</strong> on the gradient,</p>\n<p><span translate=no>_^_0_^_</span></p>\n<p><span translate=no>_^_1_^_</span> is the mini-batch size and <span translate=no>_^_2_^_</span> is used to index samples in the mini-batch. <span translate=no>_^_3_^_</span> are samples from <span translate=no>_^_4_^_</span> and <span translate=no>_^_5_^_</span> are samples from <span translate=no>_^_6_^_</span>.</p>\n": "<h2>\u30c7\u30a3\u30b9\u30af\u30ea\u30df\u30cd\u30fc\u30bf\u30fc\u30ed\u30b9</h2>\n<p><strong>\u30c7\u30a3\u30b9\u30af\u30ea\u30df\u30cd\u30fc\u30bf\u30fc\u306f\u52fe\u914d\u306e\u4e0a\u3092\u5411\u3044\u3066\u3044\u308b\u306f\u305a\u3067\u3059\u304c</strong></p>\n<p><span translate=no>_^_0_^_</span></p>\n<p><span translate=no>_^_1_^_</span>\u306f\u30df\u30cb\u30d0\u30c3\u30c1\u30b5\u30a4\u30ba\u3067\u3001<span translate=no>_^_2_^_</span>\u30df\u30cb\u30d0\u30c3\u30c1\u5185\u306e\u30b5\u30f3\u30d7\u30eb\u306e\u30a4\u30f3\u30c7\u30c3\u30af\u30b9\u306b\u4f7f\u7528\u3055\u308c\u307e\u3059\u3002<span translate=no>_^_3_^_</span><span translate=no>_^_4_^_</span><span translate=no>_^_5_^_</span>\u304b\u3089\u306e\u30b5\u30f3\u30d7\u30eb\u3067\u3042\u308a\u3001<span translate=no>_^_6_^_</span>\u304b\u3089\u306e\u30b5\u30f3\u30d7\u30eb\u3067\u3059\u3002</p>\n", "<h2>Discriminator Loss</h2>\n<p>Discriminator should <strong>ascend</strong> on the gradient,</p>\n<p><span translate=no>_^_0_^_</span></p>\n<p><span translate=no>_^_1_^_</span> is the mini-batch size and <span translate=no>_^_2_^_</span> is used to index samples in the mini-batch. <span translate=no>_^_3_^_</span> are samples from <span translate=no>_^_4_^_</span> and <span translate=no>_^_5_^_</span> are samples from <span translate=no>_^_6_^_</span>.</p>\n": "<h2>\u30c7\u30a3\u30b9\u30af\u30ea\u30df\u30cd\u30fc\u30bf\u30fc\u30ed\u30b9</h2>\n<p><strong>\u30c7\u30a3\u30b9\u30af\u30ea\u30df\u30cd\u30fc\u30bf\u30fc\u306f\u52fe\u914d\u306e\u4e0a\u3092\u5411\u3044\u3066\u3044\u308b\u306f\u305a\u3067\u3059\u304c</strong></p>\n<p><span translate=no>_^_0_^_</span></p>\n<p><span translate=no>_^_1_^_</span>\u306f\u30df\u30cb\u30d0\u30c3\u30c1\u30b5\u30a4\u30ba\u3067\u3001<span translate=no>_^_2_^_</span>\u30df\u30cb\u30d0\u30c3\u30c1\u5185\u306e\u30b5\u30f3\u30d7\u30eb\u306e\u30a4\u30f3\u30c7\u30c3\u30af\u30b9\u306b\u4f7f\u7528\u3055\u308c\u307e\u3059\u3002<span translate=no>_^_3_^_</span><span translate=no>_^_4_^_</span><span translate=no>_^_5_^_</span>\u304b\u3089\u306e\u30b5\u30f3\u30d7\u30eb\u3067\u3042\u308a\u3001<span translate=no>_^_6_^_</span>\u304b\u3089\u306e\u30b5\u30f3\u30d7\u30eb\u3067\u3059\u3002</p>\n",
"<h2>Generator Loss</h2>\n<p>Generator should <strong>descend</strong> on the gradient,</p>\n<p><span translate=no>_^_0_^_</span></p>\n": "<h2>\u767a\u96fb\u6a5f\u640d\u5931</h2>\n<p><strong>\u30b8\u30a7\u30cd\u30ec\u30fc\u30bf\u306f\u52fe\u914d\u306b\u6cbf\u3063\u3066\u4e0b\u964d\u3059\u308b\u306f\u305a\u3067\u3059\u304c</strong>\u3001</p>\n<p><span translate=no>_^_0_^_</span></p>\n", "<h2>Generator Loss</h2>\n<p>Generator should <strong>descend</strong> on the gradient,</p>\n<p><span translate=no>_^_0_^_</span></p>\n": "<h2>\u767a\u96fb\u6a5f\u640d\u5931</h2>\n<p><strong>\u30b8\u30a7\u30cd\u30ec\u30fc\u30bf\u306f\u52fe\u914d\u306b\u6cbf\u3063\u3066\u4e0b\u964d\u3059\u308b\u306f\u305a\u3067\u3059\u304c</strong>\u3001</p>\n<p><span translate=no>_^_0_^_</span></p>\n",
"<p> <span translate=no>_^_0_^_</span> are logits from <span translate=no>_^_1_^_</span> and <span translate=no>_^_2_^_</span> are logits from <span translate=no>_^_3_^_</span></p>\n": "<p><span translate=no>_^_0_^_</span><span translate=no>_^_1_^_</span><span translate=no>_^_2_^_</span>\u5143\u306e\u30ed\u30b8\u30c3\u30c8\u3068\u5143\u306e\u30ed\u30b8\u30c3\u30c8 <span translate=no>_^_3_^_</span></p>\n", "<p> <span translate=no>_^_0_^_</span> are logits from <span translate=no>_^_1_^_</span> and <span translate=no>_^_2_^_</span> are logits from <span translate=no>_^_3_^_</span></p>\n": "<p><span translate=no>_^_0_^_</span><span translate=no>_^_1_^_</span><span translate=no>_^_2_^_</span>\u5143\u306e\u30ed\u30b8\u30c3\u30c8\u3068\u5143\u306e\u30ed\u30b8\u30c3\u30c8 <span translate=no>_^_3_^_</span></p>\n",

@ -1,5 +1,5 @@
{ {
"<h1>Generative Adversarial Networks (GAN)</h1>\n<p>This is an implementation of <a href=\"https://papers.labml.ai/paper/1406.2661\">Generative Adversarial Networks</a>.</p>\n<p>The generator, <span translate=no>_^_0_^_</span> generates samples that match the distribution of data, while the discriminator, <span translate=no>_^_1_^_</span> gives the probability that <span translate=no>_^_2_^_</span> came from data rather than <span translate=no>_^_3_^_</span>.</p>\n<p>We train <span translate=no>_^_4_^_</span> and <span translate=no>_^_5_^_</span> simultaneously on a two-player min-max game with value function <span translate=no>_^_6_^_</span>.</p>\n<p><span translate=no>_^_7_^_</span></p>\n<p><span translate=no>_^_8_^_</span> is the probability distribution over data, whilst <span translate=no>_^_9_^_</span> probability distribution of <span translate=no>_^_10_^_</span>, which is set to gaussian noise.</p>\n<p>This file defines the loss functions. <a href=\"experiment.html\">Here</a> is an MNIST example with two multilayer perceptron for the generator and discriminator.</p>\n": "<h1>\u0d8b\u0dad\u0dca\u0db4\u0dcf\u0daf\u0d9a\u0d85\u0dc4\u0dd2\u0dad\u0d9a\u0dbb \u0da2\u0dcf\u0dbd (GAN)</h1>\n<p>\u0db8\u0dd9\u0dba <a href=\"https://papers.labml.ai/paper/1406.2661\">Generative Aversarial Network</a>\u0d9a\u0dca\u0dbb\u0dd2\u0dba\u0dcf\u0dad\u0dca\u0db8\u0d9a \u0d9a\u0dd2\u0dbb\u0dd3\u0db8\u0dba\u0dd2. </p>\n<p>\u0d8b\u0dad\u0dca\u0db4\u0dcf\u0daf\u0d9a\u0dba\u0db1\u0dca\u0dad\u0dca\u0dbb\u0dba, \u0daf\u0dad\u0dca\u0dad \u0db6\u0dd9\u0daf\u0dcf \u0dc4\u0dd0\u0dbb\u0dd3\u0db8\u0da7 \u0d9c\u0dd0\u0dbd\u0db4\u0dd9\u0db1 \u0dc3\u0dcf\u0db8\u0dca\u0db4\u0dbd <span translate=no>_^_0_^_</span> \u0da2\u0db1\u0db1\u0dba \u0d9a\u0dbb\u0db1 \u0d85\u0dad\u0dbb \u0dc0\u0dd9\u0db1\u0dc3\u0dca\u0d9a\u0db8\u0dca \u0d9a\u0dbb\u0db1\u0dca\u0db1\u0dcf, \u0dc0\u0da9\u0dcf \u0daf\u0dad\u0dca\u0dad \u0dc0\u0dbd\u0dd2\u0db1\u0dca <span translate=no>_^_2_^_</span> \u0db4\u0dd0\u0db8\u0dd2\u0dab\u0dd2 \u0dc3\u0db8\u0dca\u0db7\u0dcf\u0dc0\u0dd2\u0dad\u0dcf\u0dc0 <span translate=no>_^_1_^_</span> \u0dbd\u0db6\u0dcf \u0daf\u0dd9\u0dba\u0dd2 <span translate=no>_^_3_^_</span>. </p>\n<p>\u0dc0\u0da7\u0dd2\u0db1\u0dcf\u0d9a\u0db8\u0dca\u0d9a\u0dca\u0dbb\u0dd2\u0dba\u0dcf\u0d9a\u0dcf\u0dbb\u0dd2\u0dad\u0dca\u0dc0\u0dba \u0dc3\u0dc4\u0dd2\u0dad \u0d9a\u0dca\u0dbb\u0dd3\u0da9\u0d9a \u0daf\u0dd9\u0d9a\u0d9a \u0db8\u0dd2\u0db1\u0dd2-\u0db8\u0dd0\u0d9a\u0dca\u0dc3\u0dca \u0d9a\u0dca\u0dbb\u0dd3\u0da9\u0dcf\u0dc0\u0d9a\u0dca \u0dc3\u0db3\u0dc4\u0dcf \u0d85\u0db4\u0dd2 \u0db4\u0dd4\u0dc4\u0dd4\u0dab\u0dd4 <span translate=no>_^_4_^_</span> \u0d9a\u0dbb\u0db8\u0dd4 <span translate=no>_^_6_^_</span>. <span translate=no>_^_5_^_</span> </p>\n<p><span translate=no>_^_7_^_</span></p>\n<p><span translate=no>_^_8_^_</span> \u0dba\u0db1\u0dd4 \u0daf\u0dad\u0dca\u0dad \u0dc0\u0dbd\u0da7 \u0dc0\u0da9\u0dcf \u0dc3\u0db8\u0dca\u0db7\u0dcf\u0dc0\u0dd2\u0dad\u0dcf \u0db6\u0dd9\u0daf\u0dcf \u0dc4\u0dd0\u0dbb\u0dd3\u0db8 \u0dc0\u0db1 \u0d85\u0dad\u0dbb <span translate=no>_^_9_^_</span> \u0dc3\u0db8\u0dca\u0db7\u0dcf\u0dc0\u0dd2\u0dad\u0dcf \u0dc0\u0dca\u0dba\u0dcf\u0db4\u0dca\u0dad\u0dd2\u0dba <span translate=no>_^_10_^_</span>\u0dc0\u0db1 \u0d85\u0dad\u0dbb \u0d91\u0dba \u0d9c\u0dc0\u0dd4\u0dc3\u0dd2\u0dba\u0dcf\u0db1\u0dd4 \u0dc1\u0db6\u0dca\u0daf\u0dba\u0da7 \u0dc3\u0d9a\u0dc3\u0dcf \u0d87\u0dad. 
</p>\n<p>\u0db8\u0dd9\u0db8\u0d9c\u0ddc\u0db1\u0dd4\u0dc0 \u0db4\u0dcf\u0da9\u0dd4 \u0d9a\u0dcf\u0dbb\u0dca\u0dba\u0dba\u0db1\u0dca \u0d85\u0dbb\u0dca\u0dae \u0daf\u0d9a\u0dca\u0dc0\u0dba\u0dd2. <a href=\"experiment.html\">\u0db8\u0dd9\u0db1\u0dca\u0db1</a> \u0d8b\u0dad\u0dca\u0db4\u0dcf\u0daf\u0d9a \u0dba\u0db1\u0dca\u0dad\u0dca\u0dbb\u0dba \u0dc3\u0dc4 \u0dc0\u0dd9\u0db1\u0dc3\u0dca\u0d9a\u0db8\u0dca \u0d9a\u0dbb\u0db1\u0dca\u0db1\u0dcf \u0dc3\u0db3\u0dc4\u0dcf \u0db6\u0dc4\u0dd4 \u0dc3\u0dca\u0dae\u0dbb perceptron \u0daf\u0dd9\u0d9a\u0d9a\u0dca \u0dc3\u0dc4\u0dd2\u0dad MNIST \u0d8b\u0daf\u0dcf\u0dc4\u0dbb\u0dab\u0dba\u0d9a\u0dd2. </p>\n", "<h1>Generative Adversarial Networks (GAN)</h1>\n<p>This is an implementation of <a href=\"https://arxiv.org/abs/1406.2661\">Generative Adversarial Networks</a>.</p>\n<p>The generator, <span translate=no>_^_0_^_</span> generates samples that match the distribution of data, while the discriminator, <span translate=no>_^_1_^_</span> gives the probability that <span translate=no>_^_2_^_</span> came from data rather than <span translate=no>_^_3_^_</span>.</p>\n<p>We train <span translate=no>_^_4_^_</span> and <span translate=no>_^_5_^_</span> simultaneously on a two-player min-max game with value function <span translate=no>_^_6_^_</span>.</p>\n<p><span translate=no>_^_7_^_</span></p>\n<p><span translate=no>_^_8_^_</span> is the probability distribution over data, whilst <span translate=no>_^_9_^_</span> probability distribution of <span translate=no>_^_10_^_</span>, which is set to gaussian noise.</p>\n<p>This file defines the loss functions. <a href=\"experiment.html\">Here</a> is an MNIST example with two multilayer perceptron for the generator and discriminator.</p>\n": "<h1>\u0d8b\u0dad\u0dca\u0db4\u0dcf\u0daf\u0d9a\u0d85\u0dc4\u0dd2\u0dad\u0d9a\u0dbb \u0da2\u0dcf\u0dbd (GAN)</h1>\n<p>\u0db8\u0dd9\u0dba <a href=\"https://arxiv.org/abs/1406.2661\">Generative Aversarial Network</a>\u0d9a\u0dca\u0dbb\u0dd2\u0dba\u0dcf\u0dad\u0dca\u0db8\u0d9a \u0d9a\u0dd2\u0dbb\u0dd3\u0db8\u0dba\u0dd2. </p>\n<p>\u0d8b\u0dad\u0dca\u0db4\u0dcf\u0daf\u0d9a\u0dba\u0db1\u0dca\u0dad\u0dca\u0dbb\u0dba, \u0daf\u0dad\u0dca\u0dad \u0db6\u0dd9\u0daf\u0dcf \u0dc4\u0dd0\u0dbb\u0dd3\u0db8\u0da7 \u0d9c\u0dd0\u0dbd\u0db4\u0dd9\u0db1 \u0dc3\u0dcf\u0db8\u0dca\u0db4\u0dbd <span translate=no>_^_0_^_</span> \u0da2\u0db1\u0db1\u0dba \u0d9a\u0dbb\u0db1 \u0d85\u0dad\u0dbb \u0dc0\u0dd9\u0db1\u0dc3\u0dca\u0d9a\u0db8\u0dca \u0d9a\u0dbb\u0db1\u0dca\u0db1\u0dcf, \u0dc0\u0da9\u0dcf \u0daf\u0dad\u0dca\u0dad \u0dc0\u0dbd\u0dd2\u0db1\u0dca <span translate=no>_^_2_^_</span> \u0db4\u0dd0\u0db8\u0dd2\u0dab\u0dd2 \u0dc3\u0db8\u0dca\u0db7\u0dcf\u0dc0\u0dd2\u0dad\u0dcf\u0dc0 <span translate=no>_^_1_^_</span> \u0dbd\u0db6\u0dcf \u0daf\u0dd9\u0dba\u0dd2 <span translate=no>_^_3_^_</span>. </p>\n<p>\u0dc0\u0da7\u0dd2\u0db1\u0dcf\u0d9a\u0db8\u0dca\u0d9a\u0dca\u0dbb\u0dd2\u0dba\u0dcf\u0d9a\u0dcf\u0dbb\u0dd2\u0dad\u0dca\u0dc0\u0dba \u0dc3\u0dc4\u0dd2\u0dad \u0d9a\u0dca\u0dbb\u0dd3\u0da9\u0d9a \u0daf\u0dd9\u0d9a\u0d9a \u0db8\u0dd2\u0db1\u0dd2-\u0db8\u0dd0\u0d9a\u0dca\u0dc3\u0dca \u0d9a\u0dca\u0dbb\u0dd3\u0da9\u0dcf\u0dc0\u0d9a\u0dca \u0dc3\u0db3\u0dc4\u0dcf \u0d85\u0db4\u0dd2 \u0db4\u0dd4\u0dc4\u0dd4\u0dab\u0dd4 <span translate=no>_^_4_^_</span> \u0d9a\u0dbb\u0db8\u0dd4 <span translate=no>_^_6_^_</span>. 
<span translate=no>_^_5_^_</span> </p>\n<p><span translate=no>_^_7_^_</span></p>\n<p><span translate=no>_^_8_^_</span> \u0dba\u0db1\u0dd4 \u0daf\u0dad\u0dca\u0dad \u0dc0\u0dbd\u0da7 \u0dc0\u0da9\u0dcf \u0dc3\u0db8\u0dca\u0db7\u0dcf\u0dc0\u0dd2\u0dad\u0dcf \u0db6\u0dd9\u0daf\u0dcf \u0dc4\u0dd0\u0dbb\u0dd3\u0db8 \u0dc0\u0db1 \u0d85\u0dad\u0dbb <span translate=no>_^_9_^_</span> \u0dc3\u0db8\u0dca\u0db7\u0dcf\u0dc0\u0dd2\u0dad\u0dcf \u0dc0\u0dca\u0dba\u0dcf\u0db4\u0dca\u0dad\u0dd2\u0dba <span translate=no>_^_10_^_</span>\u0dc0\u0db1 \u0d85\u0dad\u0dbb \u0d91\u0dba \u0d9c\u0dc0\u0dd4\u0dc3\u0dd2\u0dba\u0dcf\u0db1\u0dd4 \u0dc1\u0db6\u0dca\u0daf\u0dba\u0da7 \u0dc3\u0d9a\u0dc3\u0dcf \u0d87\u0dad. </p>\n<p>\u0db8\u0dd9\u0db8\u0d9c\u0ddc\u0db1\u0dd4\u0dc0 \u0db4\u0dcf\u0da9\u0dd4 \u0d9a\u0dcf\u0dbb\u0dca\u0dba\u0dba\u0db1\u0dca \u0d85\u0dbb\u0dca\u0dae \u0daf\u0d9a\u0dca\u0dc0\u0dba\u0dd2. <a href=\"experiment.html\">\u0db8\u0dd9\u0db1\u0dca\u0db1</a> \u0d8b\u0dad\u0dca\u0db4\u0dcf\u0daf\u0d9a \u0dba\u0db1\u0dca\u0dad\u0dca\u0dbb\u0dba \u0dc3\u0dc4 \u0dc0\u0dd9\u0db1\u0dc3\u0dca\u0d9a\u0db8\u0dca \u0d9a\u0dbb\u0db1\u0dca\u0db1\u0dcf \u0dc3\u0db3\u0dc4\u0dcf \u0db6\u0dc4\u0dd4 \u0dc3\u0dca\u0dae\u0dbb perceptron \u0daf\u0dd9\u0d9a\u0d9a\u0dca \u0dc3\u0dc4\u0dd2\u0dad MNIST \u0d8b\u0daf\u0dcf\u0dc4\u0dbb\u0dab\u0dba\u0d9a\u0dd2. </p>\n",
"<h2>Discriminator Loss</h2>\n<p>Discriminator should <strong>ascend</strong> on the gradient,</p>\n<p><span translate=no>_^_0_^_</span></p>\n<p><span translate=no>_^_1_^_</span> is the mini-batch size and <span translate=no>_^_2_^_</span> is used to index samples in the mini-batch. <span translate=no>_^_3_^_</span> are samples from <span translate=no>_^_4_^_</span> and <span translate=no>_^_5_^_</span> are samples from <span translate=no>_^_6_^_</span>.</p>\n": "<h2>\u0dc0\u0dd9\u0db1\u0dc3\u0dca\u0d9a\u0db8\u0dca\u0d9a\u0dbb\u0db1\u0dca\u0db1\u0dcf\u0d9c\u0dda \u0db4\u0dcf\u0da9\u0dd4\u0dc0</h2>\n<p>\u0dc0\u0dd9\u0db1\u0dc3\u0dca\u0d9a\u0db8\u0dca\u0d9a\u0dbb\u0db1\u0dca\u0db1\u0dcf \u0dc1\u0dca\u0dbb\u0dda\u0dab\u0dd2\u0dba <strong>\u0db8\u0dad\u0da7 \u0db1\u0dd0\u0d9c\u0dca\u0dc0\u0dd2\u0dba</strong> \u0dba\u0dd4\u0dad\u0dd4\u0dba,</p>\n<p><span translate=no>_^_0_^_</span></p>\n<p><span translate=no>_^_1_^_</span> \u0d9a\u0dd4\u0da9\u0dcf \u0d9a\u0dab\u0dca\u0da9\u0dcf\u0dba\u0db8\u0dca \u0db4\u0dca\u0dbb\u0db8\u0dcf\u0dab\u0dba <span translate=no>_^_2_^_</span> \u0dc0\u0db1 \u0d85\u0dad\u0dbb \u0d9a\u0dd4\u0da9\u0dcf \u0d9a\u0dab\u0dca\u0da9\u0dcf\u0dba\u0db8\u0dda \u0dc3\u0dcf\u0db8\u0dca\u0db4\u0dbd \u0daf\u0dbb\u0dca\u0dc1\u0d9a\u0dba \u0dc3\u0db3\u0dc4\u0dcf \u0db7\u0dcf\u0dc0\u0dd2\u0dad\u0dcf \u0d9a\u0dbb\u0dba\u0dd2. <span translate=no>_^_3_^_</span> \u0dc0\u0dd9\u0dad\u0dd2\u0db1\u0dca \u0dc3\u0dcf\u0db8\u0dca\u0db4\u0dbd \u0dc0\u0db1 <span translate=no>_^_4_^_</span> \u0d85\u0dad\u0dbb <span translate=no>_^_5_^_</span> \u0d92\u0dc0\u0dcf \u0dc3\u0dcf\u0db8\u0dca\u0db4\u0dbd <span translate=no>_^_6_^_</span>\u0dc0\u0dda. </p>\n", "<h2>Discriminator Loss</h2>\n<p>Discriminator should <strong>ascend</strong> on the gradient,</p>\n<p><span translate=no>_^_0_^_</span></p>\n<p><span translate=no>_^_1_^_</span> is the mini-batch size and <span translate=no>_^_2_^_</span> is used to index samples in the mini-batch. <span translate=no>_^_3_^_</span> are samples from <span translate=no>_^_4_^_</span> and <span translate=no>_^_5_^_</span> are samples from <span translate=no>_^_6_^_</span>.</p>\n": "<h2>\u0dc0\u0dd9\u0db1\u0dc3\u0dca\u0d9a\u0db8\u0dca\u0d9a\u0dbb\u0db1\u0dca\u0db1\u0dcf\u0d9c\u0dda \u0db4\u0dcf\u0da9\u0dd4\u0dc0</h2>\n<p>\u0dc0\u0dd9\u0db1\u0dc3\u0dca\u0d9a\u0db8\u0dca\u0d9a\u0dbb\u0db1\u0dca\u0db1\u0dcf \u0dc1\u0dca\u0dbb\u0dda\u0dab\u0dd2\u0dba <strong>\u0db8\u0dad\u0da7 \u0db1\u0dd0\u0d9c\u0dca\u0dc0\u0dd2\u0dba</strong> \u0dba\u0dd4\u0dad\u0dd4\u0dba,</p>\n<p><span translate=no>_^_0_^_</span></p>\n<p><span translate=no>_^_1_^_</span> \u0d9a\u0dd4\u0da9\u0dcf \u0d9a\u0dab\u0dca\u0da9\u0dcf\u0dba\u0db8\u0dca \u0db4\u0dca\u0dbb\u0db8\u0dcf\u0dab\u0dba <span translate=no>_^_2_^_</span> \u0dc0\u0db1 \u0d85\u0dad\u0dbb \u0d9a\u0dd4\u0da9\u0dcf \u0d9a\u0dab\u0dca\u0da9\u0dcf\u0dba\u0db8\u0dda \u0dc3\u0dcf\u0db8\u0dca\u0db4\u0dbd \u0daf\u0dbb\u0dca\u0dc1\u0d9a\u0dba \u0dc3\u0db3\u0dc4\u0dcf \u0db7\u0dcf\u0dc0\u0dd2\u0dad\u0dcf \u0d9a\u0dbb\u0dba\u0dd2. <span translate=no>_^_3_^_</span> \u0dc0\u0dd9\u0dad\u0dd2\u0db1\u0dca \u0dc3\u0dcf\u0db8\u0dca\u0db4\u0dbd \u0dc0\u0db1 <span translate=no>_^_4_^_</span> \u0d85\u0dad\u0dbb <span translate=no>_^_5_^_</span> \u0d92\u0dc0\u0dcf \u0dc3\u0dcf\u0db8\u0dca\u0db4\u0dbd <span translate=no>_^_6_^_</span>\u0dc0\u0dda. </p>\n",
"<h2>Generator Loss</h2>\n<p>Generator should <strong>descend</strong> on the gradient,</p>\n<p><span translate=no>_^_0_^_</span></p>\n": "<h2>\u0d8b\u0dad\u0dca\u0db4\u0dcf\u0daf\u0d9a\u0db1\u0dd0\u0dad\u0dd2\u0dc0\u0dd3\u0db8</h2>\n<p>\u0d8b\u0dad\u0dca\u0db4\u0dcf\u0daf\u0d9a\u0dba\u0db1\u0dca\u0dad\u0dca\u0dbb\u0dba \u0d85\u0db1\u0dd4\u0d9a\u0dca\u0dbb\u0db8\u0dd2\u0d9a \u0db8\u0dad\u0da7 <strong>\u0db6\u0dd0\u0dc3</strong> \u0dba\u0dcf \u0dba\u0dd4\u0dad\u0dd4\u0dba,</p>\n<p><span translate=no>_^_0_^_</span></p>\n", "<h2>Generator Loss</h2>\n<p>Generator should <strong>descend</strong> on the gradient,</p>\n<p><span translate=no>_^_0_^_</span></p>\n": "<h2>\u0d8b\u0dad\u0dca\u0db4\u0dcf\u0daf\u0d9a\u0db1\u0dd0\u0dad\u0dd2\u0dc0\u0dd3\u0db8</h2>\n<p>\u0d8b\u0dad\u0dca\u0db4\u0dcf\u0daf\u0d9a\u0dba\u0db1\u0dca\u0dad\u0dca\u0dbb\u0dba \u0d85\u0db1\u0dd4\u0d9a\u0dca\u0dbb\u0db8\u0dd2\u0d9a \u0db8\u0dad\u0da7 <strong>\u0db6\u0dd0\u0dc3</strong> \u0dba\u0dcf \u0dba\u0dd4\u0dad\u0dd4\u0dba,</p>\n<p><span translate=no>_^_0_^_</span></p>\n",
"<p> <span translate=no>_^_0_^_</span> are logits from <span translate=no>_^_1_^_</span> and <span translate=no>_^_2_^_</span> are logits from <span translate=no>_^_3_^_</span></p>\n": "<p> <span translate=no>_^_0_^_</span> \u0dc3\u0dd2\u0da7 \u0db4\u0dd2\u0dc0\u0dd2\u0dc3\u0dd4\u0db8\u0dca <span translate=no>_^_2_^_</span> \u0dc0\u0db1 <span translate=no>_^_1_^_</span> \u0d85\u0dad\u0dbb \u0dc3\u0dd2\u0da7 \u0db4\u0dd2\u0dc0\u0dd2\u0dc3\u0dd4\u0db8\u0dca \u0dc0\u0dda <span translate=no>_^_3_^_</span></p>\n", "<p> <span translate=no>_^_0_^_</span> are logits from <span translate=no>_^_1_^_</span> and <span translate=no>_^_2_^_</span> are logits from <span translate=no>_^_3_^_</span></p>\n": "<p> <span translate=no>_^_0_^_</span> \u0dc3\u0dd2\u0da7 \u0db4\u0dd2\u0dc0\u0dd2\u0dc3\u0dd4\u0db8\u0dca <span translate=no>_^_2_^_</span> \u0dc0\u0db1 <span translate=no>_^_1_^_</span> \u0d85\u0dad\u0dbb \u0dc3\u0dd2\u0da7 \u0db4\u0dd2\u0dc0\u0dd2\u0dc3\u0dd4\u0db8\u0dca \u0dc0\u0dda <span translate=no>_^_3_^_</span></p>\n",

@ -1,5 +1,5 @@
{ {
"<h1>Generative Adversarial Networks (GAN)</h1>\n<p>This is an implementation of <a href=\"https://papers.labml.ai/paper/1406.2661\">Generative Adversarial Networks</a>.</p>\n<p>The generator, <span translate=no>_^_0_^_</span> generates samples that match the distribution of data, while the discriminator, <span translate=no>_^_1_^_</span> gives the probability that <span translate=no>_^_2_^_</span> came from data rather than <span translate=no>_^_3_^_</span>.</p>\n<p>We train <span translate=no>_^_4_^_</span> and <span translate=no>_^_5_^_</span> simultaneously on a two-player min-max game with value function <span translate=no>_^_6_^_</span>.</p>\n<p><span translate=no>_^_7_^_</span></p>\n<p><span translate=no>_^_8_^_</span> is the probability distribution over data, whilst <span translate=no>_^_9_^_</span> probability distribution of <span translate=no>_^_10_^_</span>, which is set to gaussian noise.</p>\n<p>This file defines the loss functions. <a href=\"experiment.html\">Here</a> is an MNIST example with two multilayer perceptron for the generator and discriminator.</p>\n": "<h1>\u751f\u6210\u5bf9\u6297\u7f51\u7edc (GAN)</h1>\n<p>\u8fd9\u662f<a href=\"https://papers.labml.ai/paper/1406.2661\">\u751f\u6210\u5bf9\u6297\u7f51\u7edc</a>\u7684\u5b9e\u73b0\u3002</p>\n\u751f\u6210@@ <p>\u5668<span translate=no>_^_0_^_</span>\u751f\u6210\u4e0e\u6570\u636e\u5206\u5e03\u76f8\u5339\u914d\u7684\u6837\u672c\uff0c\u800c\u9274\u522b\u5668\u5219<span translate=no>_^_1_^_</span>\u7ed9\u51fa\u6765\u81ea\u6570\u636e\u800c\u4e0d\u662f<span translate=no>_^_2_^_</span>\u6765\u81ea\u6570\u636e\u7684\u6982\u7387<span translate=no>_^_3_^_</span>\u3002</p>\n<p>\u6211\u4eec\u5728\u5177\u6709\u503c\u529f\u80fd\u7684\u53cc\u4eba\u6700\u5c0f\u6700\u5927\u6e38\u620f\u4e2d<span translate=no>_^_5_^_</span>\u540c\u65f6\u8fdb\u884c\u8bad\u7ec3<span translate=no>_^_4_^_</span><span translate=no>_^_6_^_</span>\u3002</p>\n<p><span translate=no>_^_7_^_</span></p>\n<p><span translate=no>_^_8_^_</span>\u662f\u6570\u636e\u7684\u6982\u7387\u5206\u5e03\uff0c\u800c<span translate=no>_^_9_^_</span>\u6982\u7387\u5206<span translate=no>_^_10_^_</span>\u5e03\u5219\u8bbe\u7f6e\u4e3a\u9ad8\u65af\u566a\u58f0\u3002</p>\n<p>\u8fd9\u4e2a\u6587\u4ef6\u5b9a\u4e49\u4e86\u635f\u5931\u51fd\u6570\u3002<a href=\"experiment.html\">\u8fd9\u662f</a>\u4e00\u4e2a MNIST \u793a\u4f8b\uff0c\u5176\u4e2d\u5305\u542b\u4e24\u4e2a\u7528\u4e8e\u751f\u6210\u5668\u548c\u9274\u522b\u5668\u7684\u591a\u5c42\u611f\u77e5\u5668\u3002</p>\n", "<h1>Generative Adversarial Networks (GAN)</h1>\n<p>This is an implementation of <a href=\"https://arxiv.org/abs/1406.2661\">Generative Adversarial Networks</a>.</p>\n<p>The generator, <span translate=no>_^_0_^_</span> generates samples that match the distribution of data, while the discriminator, <span translate=no>_^_1_^_</span> gives the probability that <span translate=no>_^_2_^_</span> came from data rather than <span translate=no>_^_3_^_</span>.</p>\n<p>We train <span translate=no>_^_4_^_</span> and <span translate=no>_^_5_^_</span> simultaneously on a two-player min-max game with value function <span translate=no>_^_6_^_</span>.</p>\n<p><span translate=no>_^_7_^_</span></p>\n<p><span translate=no>_^_8_^_</span> is the probability distribution over data, whilst <span translate=no>_^_9_^_</span> probability distribution of <span translate=no>_^_10_^_</span>, which is set to gaussian noise.</p>\n<p>This file defines the loss functions. 
<a href=\"experiment.html\">Here</a> is an MNIST example with two multilayer perceptron for the generator and discriminator.</p>\n": "<h1>\u751f\u6210\u5bf9\u6297\u7f51\u7edc (GAN)</h1>\n<p>\u8fd9\u662f<a href=\"https://arxiv.org/abs/1406.2661\">\u751f\u6210\u5bf9\u6297\u7f51\u7edc</a>\u7684\u5b9e\u73b0\u3002</p>\n\u751f\u6210@@ <p>\u5668<span translate=no>_^_0_^_</span>\u751f\u6210\u4e0e\u6570\u636e\u5206\u5e03\u76f8\u5339\u914d\u7684\u6837\u672c\uff0c\u800c\u9274\u522b\u5668\u5219<span translate=no>_^_1_^_</span>\u7ed9\u51fa\u6765\u81ea\u6570\u636e\u800c\u4e0d\u662f<span translate=no>_^_2_^_</span>\u6765\u81ea\u6570\u636e\u7684\u6982\u7387<span translate=no>_^_3_^_</span>\u3002</p>\n<p>\u6211\u4eec\u5728\u5177\u6709\u503c\u529f\u80fd\u7684\u53cc\u4eba\u6700\u5c0f\u6700\u5927\u6e38\u620f\u4e2d<span translate=no>_^_5_^_</span>\u540c\u65f6\u8fdb\u884c\u8bad\u7ec3<span translate=no>_^_4_^_</span><span translate=no>_^_6_^_</span>\u3002</p>\n<p><span translate=no>_^_7_^_</span></p>\n<p><span translate=no>_^_8_^_</span>\u662f\u6570\u636e\u7684\u6982\u7387\u5206\u5e03\uff0c\u800c<span translate=no>_^_9_^_</span>\u6982\u7387\u5206<span translate=no>_^_10_^_</span>\u5e03\u5219\u8bbe\u7f6e\u4e3a\u9ad8\u65af\u566a\u58f0\u3002</p>\n<p>\u8fd9\u4e2a\u6587\u4ef6\u5b9a\u4e49\u4e86\u635f\u5931\u51fd\u6570\u3002<a href=\"experiment.html\">\u8fd9\u662f</a>\u4e00\u4e2a MNIST \u793a\u4f8b\uff0c\u5176\u4e2d\u5305\u542b\u4e24\u4e2a\u7528\u4e8e\u751f\u6210\u5668\u548c\u9274\u522b\u5668\u7684\u591a\u5c42\u611f\u77e5\u5668\u3002</p>\n",
"<h2>Discriminator Loss</h2>\n<p>Discriminator should <strong>ascend</strong> on the gradient,</p>\n<p><span translate=no>_^_0_^_</span></p>\n<p><span translate=no>_^_1_^_</span> is the mini-batch size and <span translate=no>_^_2_^_</span> is used to index samples in the mini-batch. <span translate=no>_^_3_^_</span> are samples from <span translate=no>_^_4_^_</span> and <span translate=no>_^_5_^_</span> are samples from <span translate=no>_^_6_^_</span>.</p>\n": "<h2>\u9274\u522b\u5668\u4e22\u5931</h2>\n<p>\u9274\u522b\u5668\u5e94\u8be5\u5728\u68af\u5ea6\u4e0a<strong>\u5347</strong>\uff0c</p>\n<p><span translate=no>_^_0_^_</span></p>\n<p><span translate=no>_^_1_^_</span>\u662f\u5fae\u578b\u6279\u6b21\u5927\u5c0f\uff0c<span translate=no>_^_2_^_</span>\u7528\u4e8e\u7d22\u5f15\u5fae\u578b\u6279\u6b21\u4e2d\u7684\u6837\u672c\u3002<span translate=no>_^_3_^_</span>\u662f\u6765\u81ea\u7684\u6837\u672c<span translate=no>_^_4_^_</span>\uff0c<span translate=no>_^_5_^_</span>\u4e5f\u662f\u6765\u81ea\u7684\u6837\u672c<span translate=no>_^_6_^_</span>\u3002</p>\n", "<h2>Discriminator Loss</h2>\n<p>Discriminator should <strong>ascend</strong> on the gradient,</p>\n<p><span translate=no>_^_0_^_</span></p>\n<p><span translate=no>_^_1_^_</span> is the mini-batch size and <span translate=no>_^_2_^_</span> is used to index samples in the mini-batch. <span translate=no>_^_3_^_</span> are samples from <span translate=no>_^_4_^_</span> and <span translate=no>_^_5_^_</span> are samples from <span translate=no>_^_6_^_</span>.</p>\n": "<h2>\u9274\u522b\u5668\u4e22\u5931</h2>\n<p>\u9274\u522b\u5668\u5e94\u8be5\u5728\u68af\u5ea6\u4e0a<strong>\u5347</strong>\uff0c</p>\n<p><span translate=no>_^_0_^_</span></p>\n<p><span translate=no>_^_1_^_</span>\u662f\u5fae\u578b\u6279\u6b21\u5927\u5c0f\uff0c<span translate=no>_^_2_^_</span>\u7528\u4e8e\u7d22\u5f15\u5fae\u578b\u6279\u6b21\u4e2d\u7684\u6837\u672c\u3002<span translate=no>_^_3_^_</span>\u662f\u6765\u81ea\u7684\u6837\u672c<span translate=no>_^_4_^_</span>\uff0c<span translate=no>_^_5_^_</span>\u4e5f\u662f\u6765\u81ea\u7684\u6837\u672c<span translate=no>_^_6_^_</span>\u3002</p>\n",
"<h2>Generator Loss</h2>\n<p>Generator should <strong>descend</strong> on the gradient,</p>\n<p><span translate=no>_^_0_^_</span></p>\n": "<h2>\u53d1\u7535\u673a\u635f\u5931</h2>\n<p>\u53d1\u7535\u673a\u5e94\u8be5<strong>\u4e0b\u964d\u5230</strong>\u68af\u5ea6\u4e0a\uff0c</p>\n<p><span translate=no>_^_0_^_</span></p>\n", "<h2>Generator Loss</h2>\n<p>Generator should <strong>descend</strong> on the gradient,</p>\n<p><span translate=no>_^_0_^_</span></p>\n": "<h2>\u53d1\u7535\u673a\u635f\u5931</h2>\n<p>\u53d1\u7535\u673a\u5e94\u8be5<strong>\u4e0b\u964d\u5230</strong>\u68af\u5ea6\u4e0a\uff0c</p>\n<p><span translate=no>_^_0_^_</span></p>\n",
"<p> <span translate=no>_^_0_^_</span> are logits from <span translate=no>_^_1_^_</span> and <span translate=no>_^_2_^_</span> are logits from <span translate=no>_^_3_^_</span></p>\n": "<p><span translate=no>_^_0_^_</span>\u662f logits \u6765\u81ea<span translate=no>_^_1_^_</span>\uff0c<span translate=no>_^_2_^_</span>logits \u6765\u81ea<span translate=no>_^_3_^_</span></p>\n", "<p> <span translate=no>_^_0_^_</span> are logits from <span translate=no>_^_1_^_</span> and <span translate=no>_^_2_^_</span> are logits from <span translate=no>_^_3_^_</span></p>\n": "<p><span translate=no>_^_0_^_</span>\u662f logits \u6765\u81ea<span translate=no>_^_1_^_</span>\uff0c<span translate=no>_^_2_^_</span>logits \u6765\u81ea<span translate=no>_^_3_^_</span></p>\n",

@ -1,4 +1,4 @@
{ {
"<h1><a href=\"https://nn.labml.ai/gan/original/index.html\">Generative Adversarial Networks - GAN</a></h1>\n<p>This is an annotated implementation of <a href=\"https://papers.labml.ai/paper/1406.2661\">Generative Adversarial Networks</a>. </p>\n": "<h1><a href=\"https://nn.labml.ai/gan/original/index.html\">\u30b8\u30a7\u30cd\u30ec\u30fc\u30c6\u30a3\u30d6\u30fb\u30a2\u30c9\u30d0\u30fc\u30b5\u30ea\u30a2\u30eb\u30fb\u30cd\u30c3\u30c8\u30ef\u30fc\u30af-GAN</a></h1>\n<p>\u3053\u308c\u306f\u3001<a href=\"https://papers.labml.ai/paper/1406.2661\">\u30b8\u30a7\u30cd\u30ec\u30fc\u30c6\u30a3\u30d6\u30fb\u30a2\u30c9\u30d0\u30fc\u30b5\u30ea\u30a2\u30eb\u30fb\u30cd\u30c3\u30c8\u30ef\u30fc\u30af\u306e\u6ce8\u91c8\u4ed8\u304d\u5b9f\u88c5\u3067\u3059</a>\u3002</p>\n", "<h1><a href=\"https://nn.labml.ai/gan/original/index.html\">Generative Adversarial Networks - GAN</a></h1>\n<p>This is an annotated implementation of <a href=\"https://arxiv.org/abs/1406.2661\">Generative Adversarial Networks</a>. </p>\n": "<h1><a href=\"https://nn.labml.ai/gan/original/index.html\">\u30b8\u30a7\u30cd\u30ec\u30fc\u30c6\u30a3\u30d6\u30fb\u30a2\u30c9\u30d0\u30fc\u30b5\u30ea\u30a2\u30eb\u30fb\u30cd\u30c3\u30c8\u30ef\u30fc\u30af-GAN</a></h1>\n<p>\u3053\u308c\u306f\u3001<a href=\"https://arxiv.org/abs/1406.2661\">\u30b8\u30a7\u30cd\u30ec\u30fc\u30c6\u30a3\u30d6\u30fb\u30a2\u30c9\u30d0\u30fc\u30b5\u30ea\u30a2\u30eb\u30fb\u30cd\u30c3\u30c8\u30ef\u30fc\u30af\u306e\u6ce8\u91c8\u4ed8\u304d\u5b9f\u88c5\u3067\u3059</a>\u3002</p>\n",
"Generative Adversarial Networks - GAN": "\u30b8\u30a7\u30cd\u30ec\u30fc\u30c6\u30a3\u30d6\u30fb\u30a2\u30c9\u30d0\u30fc\u30b5\u30ea\u30a2\u30eb\u30fb\u30cd\u30c3\u30c8\u30ef\u30fc\u30af-GAN" "Generative Adversarial Networks - GAN": "\u30b8\u30a7\u30cd\u30ec\u30fc\u30c6\u30a3\u30d6\u30fb\u30a2\u30c9\u30d0\u30fc\u30b5\u30ea\u30a2\u30eb\u30fb\u30cd\u30c3\u30c8\u30ef\u30fc\u30af-GAN"
} }

@ -1,4 +1,4 @@
{ {
"<h1><a href=\"https://nn.labml.ai/gan/original/index.html\">Generative Adversarial Networks - GAN</a></h1>\n<p>This is an annotated implementation of <a href=\"https://papers.labml.ai/paper/1406.2661\">Generative Adversarial Networks</a>. </p>\n": "<h1><a href=\"https://nn.labml.ai/gan/original/index.html\">\u0d8b\u0dad\u0dca\u0db4\u0dcf\u0daf\u0d9a \u0d85\u0dc4\u0dd2\u0dad\u0d9a\u0dbb \u0da2\u0dcf\u0dbd - GAN</a></h1>\n<p>\u0db8\u0dd9\u0dba <a href=\"https://papers.labml.ai/paper/1406.2661\">Generative Aversarial Network</a>\u0dc4\u0dd2 \u0dc0\u0dd2\u0dc0\u0dbb\u0dab\u0dba \u0d9a\u0dbb\u0db1 \u0dbd\u0daf \u0d9a\u0dca\u0dbb\u0dd2\u0dba\u0dcf\u0dad\u0dca\u0db8\u0d9a \u0dc0\u0dd3\u0db8\u0d9a\u0dd2. </p>\n", "<h1><a href=\"https://nn.labml.ai/gan/original/index.html\">Generative Adversarial Networks - GAN</a></h1>\n<p>This is an annotated implementation of <a href=\"https://arxiv.org/abs/1406.2661\">Generative Adversarial Networks</a>. </p>\n": "<h1><a href=\"https://nn.labml.ai/gan/original/index.html\">\u0d8b\u0dad\u0dca\u0db4\u0dcf\u0daf\u0d9a \u0d85\u0dc4\u0dd2\u0dad\u0d9a\u0dbb \u0da2\u0dcf\u0dbd - GAN</a></h1>\n<p>\u0db8\u0dd9\u0dba <a href=\"https://arxiv.org/abs/1406.2661\">Generative Aversarial Network</a>\u0dc4\u0dd2 \u0dc0\u0dd2\u0dc0\u0dbb\u0dab\u0dba \u0d9a\u0dbb\u0db1 \u0dbd\u0daf \u0d9a\u0dca\u0dbb\u0dd2\u0dba\u0dcf\u0dad\u0dca\u0db8\u0d9a \u0dc0\u0dd3\u0db8\u0d9a\u0dd2. </p>\n",
"Generative Adversarial Networks - GAN": "\u0d8b\u0dad\u0dca\u0db4\u0dcf\u0daf\u0d9a \u0d85\u0dc4\u0dd2\u0dad\u0d9a\u0dbb \u0da2\u0dcf\u0dbd - GAN" "Generative Adversarial Networks - GAN": "\u0d8b\u0dad\u0dca\u0db4\u0dcf\u0daf\u0d9a \u0d85\u0dc4\u0dd2\u0dad\u0d9a\u0dbb \u0da2\u0dcf\u0dbd - GAN"
} }

@ -1,4 +1,4 @@
{ {
"<h1><a href=\"https://nn.labml.ai/gan/original/index.html\">Generative Adversarial Networks - GAN</a></h1>\n<p>This is an annotated implementation of <a href=\"https://papers.labml.ai/paper/1406.2661\">Generative Adversarial Networks</a>. </p>\n": "<h1><a href=\"https://nn.labml.ai/gan/original/index.html\">\u751f\u6210\u5f0f\u5bf9\u6297\u7f51\u7edc-GAN</a></h1>\n<p>\u8fd9\u662f<a href=\"https://papers.labml.ai/paper/1406.2661\">\u751f\u6210\u5bf9\u6297\u7f51\u7edc</a>\u7684\u5e26\u6ce8\u91ca\u7684\u5b9e\u73b0\u3002</p>\n", "<h1><a href=\"https://nn.labml.ai/gan/original/index.html\">Generative Adversarial Networks - GAN</a></h1>\n<p>This is an annotated implementation of <a href=\"https://arxiv.org/abs/1406.2661\">Generative Adversarial Networks</a>. </p>\n": "<h1><a href=\"https://nn.labml.ai/gan/original/index.html\">\u751f\u6210\u5f0f\u5bf9\u6297\u7f51\u7edc-GAN</a></h1>\n<p>\u8fd9\u662f<a href=\"https://arxiv.org/abs/1406.2661\">\u751f\u6210\u5bf9\u6297\u7f51\u7edc</a>\u7684\u5e26\u6ce8\u91ca\u7684\u5b9e\u73b0\u3002</p>\n",
"Generative Adversarial Networks - GAN": "\u751f\u6210\u5f0f\u5bf9\u6297\u7f51\u7edc-GAN" "Generative Adversarial Networks - GAN": "\u751f\u6210\u5f0f\u5bf9\u6297\u7f51\u7edc-GAN"
} }

@ -15,20 +15,20 @@
"<h4>Weight Modulation and Demodulation</h4>\n": "<h4>\u91cd\u307f\u5909\u8abf\u3068\u5fa9\u8abf</h4>\n", "<h4>Weight Modulation and Demodulation</h4>\n": "<h4>\u91cd\u307f\u5909\u8abf\u3068\u5fa9\u8abf</h4>\n",
"<p> <a id=\"discriminator\"></a></p>\n<h2>StyleGAN 2 Discriminator</h2>\n<p><span translate=no>_^_0_^_</span></p>\n<p>Discriminator first transforms the image to a feature map of the same resolution and then runs it through a series of blocks with residual connections. The resolution is down-sampled by <span translate=no>_^_1_^_</span> at each block while doubling the number of features.</p>\n": "<p><a id=\"discriminator\"></a></p>\n<h2>\u30b9\u30bf\u30a4\u30eb\u30ac\u30f3 2 \u30c7\u30a3\u30b9\u30af\u30ea\u30df\u30cd\u30fc\u30bf\u30fc</h2>\n<p><span translate=no>_^_0_^_</span></p>\n<p>\u30c7\u30a3\u30b9\u30af\u30ea\u30df\u30cd\u30fc\u30bf\u30fc\u306f\u3001\u307e\u305a\u753b\u50cf\u3092\u540c\u3058\u89e3\u50cf\u5ea6\u306e\u7279\u5fb4\u30de\u30c3\u30d7\u306b\u5909\u63db\u3057\u3066\u304b\u3089\u3001\u6b8b\u7559\u63a5\u7d9a\u306e\u3042\u308b\u4e00\u9023\u306e\u30d6\u30ed\u30c3\u30af\u3092\u51e6\u7406\u3057\u307e\u3059\u3002\u89e3\u50cf\u5ea6\u306f\u30d6\u30ed\u30c3\u30af\u3054\u3068\u306b\u30c0\u30a6\u30f3\u30b5\u30f3\u30d7\u30ea\u30f3\u30b0\u3055\u308c\u3001<span translate=no>_^_1_^_</span>\u30d5\u30a3\u30fc\u30c1\u30e3\u306e\u6570\u306f 2 \u500d\u306b\u306a\u308a\u307e\u3059\u3002</p>\n", "<p> <a id=\"discriminator\"></a></p>\n<h2>StyleGAN 2 Discriminator</h2>\n<p><span translate=no>_^_0_^_</span></p>\n<p>Discriminator first transforms the image to a feature map of the same resolution and then runs it through a series of blocks with residual connections. The resolution is down-sampled by <span translate=no>_^_1_^_</span> at each block while doubling the number of features.</p>\n": "<p><a id=\"discriminator\"></a></p>\n<h2>\u30b9\u30bf\u30a4\u30eb\u30ac\u30f3 2 \u30c7\u30a3\u30b9\u30af\u30ea\u30df\u30cd\u30fc\u30bf\u30fc</h2>\n<p><span translate=no>_^_0_^_</span></p>\n<p>\u30c7\u30a3\u30b9\u30af\u30ea\u30df\u30cd\u30fc\u30bf\u30fc\u306f\u3001\u307e\u305a\u753b\u50cf\u3092\u540c\u3058\u89e3\u50cf\u5ea6\u306e\u7279\u5fb4\u30de\u30c3\u30d7\u306b\u5909\u63db\u3057\u3066\u304b\u3089\u3001\u6b8b\u7559\u63a5\u7d9a\u306e\u3042\u308b\u4e00\u9023\u306e\u30d6\u30ed\u30c3\u30af\u3092\u51e6\u7406\u3057\u307e\u3059\u3002\u89e3\u50cf\u5ea6\u306f\u30d6\u30ed\u30c3\u30af\u3054\u3068\u306b\u30c0\u30a6\u30f3\u30b5\u30f3\u30d7\u30ea\u30f3\u30b0\u3055\u308c\u3001<span translate=no>_^_1_^_</span>\u30d5\u30a3\u30fc\u30c1\u30e3\u306e\u6570\u306f 2 \u500d\u306b\u306a\u308a\u307e\u3059\u3002</p>\n",
"<p> <a id=\"discriminator_black\"></a></p>\n<h3>Discriminator Block</h3>\n<p><span translate=no>_^_0_^_</span></p>\n<p>Discriminator block consists of two <span translate=no>_^_1_^_</span> convolutions with a residual connection.</p>\n": "<p><a id=\"discriminator_black\"></a></p>\n<h3>\u30c7\u30a3\u30b9\u30af\u30ea\u30df\u30cd\u30fc\u30bf\u30fc\u30d6\u30ed\u30c3\u30af</h3>\n<p><span translate=no>_^_0_^_</span></p>\n<p>\u30c7\u30a3\u30b9\u30af\u30ea\u30df\u30cd\u30fc\u30bf\u30fc\u30d6\u30ed\u30c3\u30af\u306f\u3001\u6b8b\u5dee\u7d50\u5408\u3092\u3082\u3064 2 <span translate=no>_^_1_^_</span> \u3064\u306e\u7573\u307f\u8fbc\u307f\u3067\u69cb\u6210\u3055\u308c\u307e\u3059\u3002</p>\n", "<p> <a id=\"discriminator_black\"></a></p>\n<h3>Discriminator Block</h3>\n<p><span translate=no>_^_0_^_</span></p>\n<p>Discriminator block consists of two <span translate=no>_^_1_^_</span> convolutions with a residual connection.</p>\n": "<p><a id=\"discriminator_black\"></a></p>\n<h3>\u30c7\u30a3\u30b9\u30af\u30ea\u30df\u30cd\u30fc\u30bf\u30fc\u30d6\u30ed\u30c3\u30af</h3>\n<p><span translate=no>_^_0_^_</span></p>\n<p>\u30c7\u30a3\u30b9\u30af\u30ea\u30df\u30cd\u30fc\u30bf\u30fc\u30d6\u30ed\u30c3\u30af\u306f\u3001\u6b8b\u5dee\u7d50\u5408\u3092\u3082\u3064 2 <span translate=no>_^_1_^_</span> \u3064\u306e\u7573\u307f\u8fbc\u307f\u3067\u69cb\u6210\u3055\u308c\u307e\u3059\u3002</p>\n",
"<p> <a id=\"down_sample\"></a></p>\n<h3>Down-sample</h3>\n<p>The down-sample operation <a href=\"#smooth\">smoothens</a> each feature channel and scale <span translate=no>_^_0_^_</span> using bilinear interpolation. This is based on the paper <a href=\"https://papers.labml.ai/paper/1904.11486\">Making Convolutional Networks Shift-Invariant Again</a>.</p>\n": "<p><a id=\"down_sample\"></a></p>\n<h3>\u30c0\u30a6\u30f3\u30b5\u30f3\u30d7\u30eb</h3>\n<p>\u30c0\u30a6\u30f3\u30b5\u30f3\u30d7\u30eb\u64cd\u4f5c\u3067\u306f\u3001<a href=\"#smooth\"><span translate=no>_^_0_^_</span>\u30d0\u30a4\u30ea\u30cb\u30a2\u88dc\u9593\u3092\u4f7f\u7528\u3057\u3066\u5404\u30d5\u30a3\u30fc\u30c1\u30e3\u30c1\u30e3\u30cd\u30eb\u3068\u30b9\u30b1\u30fc\u30eb\u304c\u6ed1\u3089\u304b\u306b\u306a\u308a\u307e\u3059</a>\u3002\u3053\u308c\u306f\u3001\u300c<a href=\"https://papers.labml.ai/paper/1904.11486\">\u7573\u307f\u8fbc\u307f\u30cd\u30c3\u30c8\u30ef\u30fc\u30af\u3092\u518d\u3073\u30b7\u30d5\u30c8\u4e0d\u5909\u306b\u3059\u308b</a>\u300d\u3068\u3044\u3046\u8ad6\u6587\u306b\u57fa\u3065\u3044\u3066\u3044\u307e\u3059</p>\u3002\n", "<p> <a id=\"down_sample\"></a></p>\n<h3>Down-sample</h3>\n<p>The down-sample operation <a href=\"#smooth\">smoothens</a> each feature channel and scale <span translate=no>_^_0_^_</span> using bilinear interpolation. This is based on the paper <a href=\"https://arxiv.org/abs/1904.11486\">Making Convolutional Networks Shift-Invariant Again</a>.</p>\n": "<p><a id=\"down_sample\"></a></p>\n<h3>\u30c0\u30a6\u30f3\u30b5\u30f3\u30d7\u30eb</h3>\n<p>\u30c0\u30a6\u30f3\u30b5\u30f3\u30d7\u30eb\u64cd\u4f5c\u3067\u306f\u3001<a href=\"#smooth\"><span translate=no>_^_0_^_</span>\u30d0\u30a4\u30ea\u30cb\u30a2\u88dc\u9593\u3092\u4f7f\u7528\u3057\u3066\u5404\u30d5\u30a3\u30fc\u30c1\u30e3\u30c1\u30e3\u30cd\u30eb\u3068\u30b9\u30b1\u30fc\u30eb\u304c\u6ed1\u3089\u304b\u306b\u306a\u308a\u307e\u3059</a>\u3002\u3053\u308c\u306f\u3001\u300c<a href=\"https://arxiv.org/abs/1904.11486\">\u7573\u307f\u8fbc\u307f\u30cd\u30c3\u30c8\u30ef\u30fc\u30af\u3092\u518d\u3073\u30b7\u30d5\u30c8\u4e0d\u5909\u306b\u3059\u308b</a>\u300d\u3068\u3044\u3046\u8ad6\u6587\u306b\u57fa\u3065\u3044\u3066\u3044\u307e\u3059</p>\u3002\n",
"<p> <a id=\"equalized_conv2d\"></a></p>\n<h2>Learning-rate Equalized 2D Convolution Layer</h2>\n<p>This uses <a href=\"#equalized_weights\">learning-rate equalized weights</a> for a convolution layer.</p>\n": "<p><a id=\"equalized_conv2d\"></a></p>\n<h2>\u5b66\u7fd2\u7387\u5747\u7b49\u53162D\u30b3\u30f3\u30dc\u30ea\u30e5\u30fc\u30b7\u30e7\u30f3\u30ec\u30a4\u30e4\u30fc</h2>\n<p>\u3053\u308c\u306f\u3001<a href=\"#equalized_weights\">\u7573\u307f\u8fbc\u307f\u5c64\u306b\u5b66\u7fd2\u7387\u304c\u5747\u7b49\u5316\u3055\u308c\u305f\u91cd\u307f\u3092\u4f7f\u7528\u3057\u307e\u3059</a>\u3002</p>\n", "<p> <a id=\"equalized_conv2d\"></a></p>\n<h2>Learning-rate Equalized 2D Convolution Layer</h2>\n<p>This uses <a href=\"#equalized_weights\">learning-rate equalized weights</a> for a convolution layer.</p>\n": "<p><a id=\"equalized_conv2d\"></a></p>\n<h2>\u5b66\u7fd2\u7387\u5747\u7b49\u53162D\u30b3\u30f3\u30dc\u30ea\u30e5\u30fc\u30b7\u30e7\u30f3\u30ec\u30a4\u30e4\u30fc</h2>\n<p>\u3053\u308c\u306f\u3001<a href=\"#equalized_weights\">\u7573\u307f\u8fbc\u307f\u5c64\u306b\u5b66\u7fd2\u7387\u304c\u5747\u7b49\u5316\u3055\u308c\u305f\u91cd\u307f\u3092\u4f7f\u7528\u3057\u307e\u3059</a>\u3002</p>\n",
"<p> <a id=\"equalized_linear\"></a></p>\n<h2>Learning-rate Equalized Linear Layer</h2>\n<p>This uses <a href=\"#equalized_weights\">learning-rate equalized weights</a> for a linear layer.</p>\n": "<p><a id=\"equalized_linear\"></a></p>\n<h2>\u5b66\u7fd2\u7387\u5747\u7b49\u5316\u7dda\u5f62\u5c64</h2>\n<p>\u3053\u308c\u306f\u3001<a href=\"#equalized_weights\">\u7dda\u5f62\u30ec\u30a4\u30e4\u30fc\u306e\u5b66\u7fd2\u7387\u304c\u5747\u7b49\u5316\u3055\u308c\u305f\u91cd\u307f\u3092\u4f7f\u7528\u3057\u307e\u3059</a>\u3002</p>\n", "<p> <a id=\"equalized_linear\"></a></p>\n<h2>Learning-rate Equalized Linear Layer</h2>\n<p>This uses <a href=\"#equalized_weights\">learning-rate equalized weights</a> for a linear layer.</p>\n": "<p><a id=\"equalized_linear\"></a></p>\n<h2>\u5b66\u7fd2\u7387\u5747\u7b49\u5316\u7dda\u5f62\u5c64</h2>\n<p>\u3053\u308c\u306f\u3001<a href=\"#equalized_weights\">\u7dda\u5f62\u30ec\u30a4\u30e4\u30fc\u306e\u5b66\u7fd2\u7387\u304c\u5747\u7b49\u5316\u3055\u308c\u305f\u91cd\u307f\u3092\u4f7f\u7528\u3057\u307e\u3059</a>\u3002</p>\n",
"<p> <a id=\"equalized_weight\"></a></p>\n<h2>Learning-rate Equalized Weights Parameter</h2>\n<p>This is based on equalized learning rate introduced in the Progressive GAN paper. Instead of initializing weights at <span translate=no>_^_0_^_</span> they initialize weights to <span translate=no>_^_1_^_</span> and then multiply them by <span translate=no>_^_2_^_</span> when using it. <span translate=no>_^_3_^_</span></p>\n<p>The gradients on stored parameters <span translate=no>_^_4_^_</span> get multiplied by <span translate=no>_^_5_^_</span> but this doesn&#x27;t have an affect since optimizers such as Adam normalize them by a running mean of the squared gradients.</p>\n<p>The optimizer updates on <span translate=no>_^_6_^_</span> are proportionate to the learning rate <span translate=no>_^_7_^_</span>. But the effective weights <span translate=no>_^_8_^_</span> get updated proportionately to <span translate=no>_^_9_^_</span>. Without equalized learning rate, the effective weights will get updated proportionately to just <span translate=no>_^_10_^_</span>.</p>\n<p>So we are effectively scaling the learning rate by <span translate=no>_^_11_^_</span> for these weight parameters.</p>\n": "<p><a id=\"equalized_weight\"></a></p>\n<h2>\u5b66\u7fd2\u7387\u5747\u7b49\u5316\u91cd\u307f\u30d1\u30e9\u30e1\u30fc\u30bf\u30fc</h2>\n<p>\u3053\u308c\u306f\u3001\u30d7\u30ed\u30b0\u30ec\u30c3\u30b7\u30d6GAN\u306e\u8ad6\u6587\u3067\u7d39\u4ecb\u3055\u308c\u305f\u5b66\u7fd2\u7387\u306e\u5747\u7b49\u5316\u306b\u57fa\u3065\u3044\u3066\u3044\u307e\u3059\u3002\u30a6\u30a7\u30a4\u30c8\u3092\u3067\u521d\u671f\u5316\u3059\u308b\u4ee3\u308f\u308a\u306b\u3001\u30a6\u30a7\u30a4\u30c8\u3092\u306b\u521d\u671f\u5316\u3057\u3001<span translate=no>_^_0_^_</span><span translate=no>_^_1_^_</span>\u4f7f\u7528\u6642\u306b\u305d\u306e\u30a6\u30a7\u30a4\u30c8\u3092\u4e57\u7b97\u3057\u307e\u3059\u3002<span translate=no>_^_2_^_</span><span translate=no>_^_3_^_</span></p>\n<p><span translate=no>_^_4_^_</span><span translate=no>_^_5_^_</span>\u4fdd\u5b58\u3055\u308c\u3066\u3044\u308b\u30d1\u30e9\u30e1\u30fc\u30bf\u30fc\u306e\u52fe\u914d\u306f\u4e57\u7b97\u3055\u308c\u307e\u3059\u304c\u3001Adam \u306a\u3069\u306e\u30aa\u30d7\u30c6\u30a3\u30de\u30a4\u30b6\u30fc\u306f\u52fe\u914d\u306e 2 \u4e57\u5e73\u5747\u3067\u6b63\u898f\u5316\u3059\u308b\u305f\u3081\u3001\u5f71\u97ff\u306f\u3042\u308a\u307e\u305b\u3093\u3002</p>\n<p><span translate=no>_^_6_^_</span>\u30aa\u30d7\u30c6\u30a3\u30de\u30a4\u30b6\u30fc\u306e\u66f4\u65b0\u306f\u5b66\u7fd2\u7387\u306b\u6bd4\u4f8b\u3057\u307e\u3059\u3002<span translate=no>_^_7_^_</span>\u305f\u3060\u3057\u3001<span translate=no>_^_8_^_</span>\u6709\u52b9\u91cd\u307f\u306f\u305d\u308c\u306b\u6bd4\u4f8b\u3057\u3066\u66f4\u65b0\u3055\u308c\u307e\u3059\u3002<span translate=no>_^_9_^_</span>\u5b66\u7fd2\u7387\u304c\u5747\u7b49\u5316\u3055\u308c\u3066\u3044\u306a\u3044\u3068\u3001\u6709\u52b9\u91cd\u307f\u306f\u6b63\u306b\u6bd4\u4f8b\u3057\u3066\u66f4\u65b0\u3055\u308c\u307e\u3059</p>\u3002<span translate=no>_^_10_^_</span>\n<p>\u305d\u3053\u3067\u3001<span translate=no>_^_11_^_</span>\u3053\u308c\u3089\u306e\u91cd\u307f\u30d1\u30e9\u30e1\u30fc\u30bf\u306b\u3088\u3063\u3066\u5b66\u7fd2\u7387\u3092\u52b9\u679c\u7684\u306b\u30b9\u30b1\u30fc\u30ea\u30f3\u30b0\u3057\u3066\u3044\u307e\u3059\u3002</p>\n", "<p> <a id=\"equalized_weight\"></a></p>\n<h2>Learning-rate Equalized Weights Parameter</h2>\n<p>This is based on equalized learning rate introduced in the Progressive GAN paper. 
Instead of initializing weights at <span translate=no>_^_0_^_</span> they initialize weights to <span translate=no>_^_1_^_</span> and then multiply them by <span translate=no>_^_2_^_</span> when using it. <span translate=no>_^_3_^_</span></p>\n<p>The gradients on stored parameters <span translate=no>_^_4_^_</span> get multiplied by <span translate=no>_^_5_^_</span> but this doesn&#x27;t have an affect since optimizers such as Adam normalize them by a running mean of the squared gradients.</p>\n<p>The optimizer updates on <span translate=no>_^_6_^_</span> are proportionate to the learning rate <span translate=no>_^_7_^_</span>. But the effective weights <span translate=no>_^_8_^_</span> get updated proportionately to <span translate=no>_^_9_^_</span>. Without equalized learning rate, the effective weights will get updated proportionately to just <span translate=no>_^_10_^_</span>.</p>\n<p>So we are effectively scaling the learning rate by <span translate=no>_^_11_^_</span> for these weight parameters.</p>\n": "<p><a id=\"equalized_weight\"></a></p>\n<h2>\u5b66\u7fd2\u7387\u5747\u7b49\u5316\u91cd\u307f\u30d1\u30e9\u30e1\u30fc\u30bf\u30fc</h2>\n<p>\u3053\u308c\u306f\u3001\u30d7\u30ed\u30b0\u30ec\u30c3\u30b7\u30d6GAN\u306e\u8ad6\u6587\u3067\u7d39\u4ecb\u3055\u308c\u305f\u5b66\u7fd2\u7387\u306e\u5747\u7b49\u5316\u306b\u57fa\u3065\u3044\u3066\u3044\u307e\u3059\u3002\u30a6\u30a7\u30a4\u30c8\u3092\u3067\u521d\u671f\u5316\u3059\u308b\u4ee3\u308f\u308a\u306b\u3001\u30a6\u30a7\u30a4\u30c8\u3092\u306b\u521d\u671f\u5316\u3057\u3001<span translate=no>_^_0_^_</span><span translate=no>_^_1_^_</span>\u4f7f\u7528\u6642\u306b\u305d\u306e\u30a6\u30a7\u30a4\u30c8\u3092\u4e57\u7b97\u3057\u307e\u3059\u3002<span translate=no>_^_2_^_</span><span translate=no>_^_3_^_</span></p>\n<p><span translate=no>_^_4_^_</span><span translate=no>_^_5_^_</span>\u4fdd\u5b58\u3055\u308c\u3066\u3044\u308b\u30d1\u30e9\u30e1\u30fc\u30bf\u30fc\u306e\u52fe\u914d\u306f\u4e57\u7b97\u3055\u308c\u307e\u3059\u304c\u3001Adam \u306a\u3069\u306e\u30aa\u30d7\u30c6\u30a3\u30de\u30a4\u30b6\u30fc\u306f\u52fe\u914d\u306e 2 \u4e57\u5e73\u5747\u3067\u6b63\u898f\u5316\u3059\u308b\u305f\u3081\u3001\u5f71\u97ff\u306f\u3042\u308a\u307e\u305b\u3093\u3002</p>\n<p><span translate=no>_^_6_^_</span>\u30aa\u30d7\u30c6\u30a3\u30de\u30a4\u30b6\u30fc\u306e\u66f4\u65b0\u306f\u5b66\u7fd2\u7387\u306b\u6bd4\u4f8b\u3057\u307e\u3059\u3002<span translate=no>_^_7_^_</span>\u305f\u3060\u3057\u3001<span translate=no>_^_8_^_</span>\u6709\u52b9\u91cd\u307f\u306f\u305d\u308c\u306b\u6bd4\u4f8b\u3057\u3066\u66f4\u65b0\u3055\u308c\u307e\u3059\u3002<span translate=no>_^_9_^_</span>\u5b66\u7fd2\u7387\u304c\u5747\u7b49\u5316\u3055\u308c\u3066\u3044\u306a\u3044\u3068\u3001\u6709\u52b9\u91cd\u307f\u306f\u6b63\u306b\u6bd4\u4f8b\u3057\u3066\u66f4\u65b0\u3055\u308c\u307e\u3059</p>\u3002<span translate=no>_^_10_^_</span>\n<p>\u305d\u3053\u3067\u3001<span translate=no>_^_11_^_</span>\u3053\u308c\u3089\u306e\u91cd\u307f\u30d1\u30e9\u30e1\u30fc\u30bf\u306b\u3088\u3063\u3066\u5b66\u7fd2\u7387\u3092\u52b9\u679c\u7684\u306b\u30b9\u30b1\u30fc\u30ea\u30f3\u30b0\u3057\u3066\u3044\u307e\u3059\u3002</p>\n",
"<p> <a id=\"generator\"></a></p>\n<h2>StyleGAN2 Generator</h2>\n<p><span translate=no>_^_0_^_</span></p>\n<p><small><em><span translate=no>_^_1_^_</span> denotes a linear layer. <span translate=no>_^_2_^_</span> denotes a broadcast and scaling operation (noise is a single channel). <a href=\"#to_rgb\"><span translate=no>_^_3_^_</span></a> also has a style modulation which is not shown in the diagram to keep it simple.</em></small></p>\n<p>The generator starts with a learned constant. Then it has a series of blocks. The feature map resolution is doubled at each block Each block outputs an RGB image and they are scaled up and summed to get the final RGB image.</p>\n": "<p><a id=\"generator\"></a></p>\n<h2>\u30b9\u30bf\u30a4\u30eb GAN2 \u30b8\u30a7\u30cd\u30ec\u30fc\u30bf</h2>\n<p><span translate=no>_^_0_^_</span></p>\n<p><small><em><span translate=no>_^_1_^_</span>\u7dda\u5f62\u30ec\u30a4\u30e4\u30fc\u3092\u793a\u3057\u307e\u3059\u3002<span translate=no>_^_2_^_</span>\u30d6\u30ed\u30fc\u30c9\u30ad\u30e3\u30b9\u30c8\u3068\u30b9\u30b1\u30fc\u30ea\u30f3\u30b0\u64cd\u4f5c\u3092\u8868\u3057\u307e\u3059\uff08\u30ce\u30a4\u30ba\u306f\u5358\u4e00\u30c1\u30e3\u30cd\u30eb\uff09\u3002<a href=\"#to_rgb\"><span translate=no>_^_3_^_</span></a>\u307e\u305f\u3001\u56f3\u306b\u306f\u793a\u3055\u308c\u3066\u3044\u306a\u3044\u30b9\u30bf\u30a4\u30eb\u30e2\u30b8\u30e5\u30ec\u30fc\u30b7\u30e7\u30f3\u3082\u4ed8\u3044\u3066\u304a\u308a\u3001\u30b7\u30f3\u30d7\u30eb\u3055\u3092\u4fdd\u3063\u3066\u3044\u307e\u3059</em></small></p>\u3002\n<p>\u30b8\u30a7\u30cd\u30ec\u30fc\u30bf\u306f\u5b66\u7fd2\u3057\u305f\u5b9a\u6570\u304b\u3089\u59cb\u307e\u308a\u307e\u3059\u3002\u6b21\u306b\u3001\u4e00\u9023\u306e\u30d6\u30ed\u30c3\u30af\u304c\u3042\u308a\u307e\u3059\u3002\u7279\u5fb4\u30de\u30c3\u30d7\u306e\u89e3\u50cf\u5ea6\u306f\u5404\u30d6\u30ed\u30c3\u30af\u3067 2 \u500d\u306b\u306a\u308a\u307e\u3059\u3002\u5404\u30d6\u30ed\u30c3\u30af\u306f RGB \u753b\u50cf\u3092\u51fa\u529b\u3057\u3001\u305d\u308c\u3089\u3092\u62e1\u5927\u3057\u3066\u5408\u8a08\u3057\u3066\u6700\u7d42\u7684\u306a RGB \u753b\u50cf\u306b\u306a\u308a\u307e\u3059</p>\u3002\n", "<p> <a id=\"generator\"></a></p>\n<h2>StyleGAN2 Generator</h2>\n<p><span translate=no>_^_0_^_</span></p>\n<p><small><em><span translate=no>_^_1_^_</span> denotes a linear layer. <span translate=no>_^_2_^_</span> denotes a broadcast and scaling operation (noise is a single channel). <a href=\"#to_rgb\"><span translate=no>_^_3_^_</span></a> also has a style modulation which is not shown in the diagram to keep it simple.</em></small></p>\n<p>The generator starts with a learned constant. Then it has a series of blocks. 
The feature map resolution is doubled at each block Each block outputs an RGB image and they are scaled up and summed to get the final RGB image.</p>\n": "<p><a id=\"generator\"></a></p>\n<h2>\u30b9\u30bf\u30a4\u30eb GAN2 \u30b8\u30a7\u30cd\u30ec\u30fc\u30bf</h2>\n<p><span translate=no>_^_0_^_</span></p>\n<p><small><em><span translate=no>_^_1_^_</span>\u7dda\u5f62\u30ec\u30a4\u30e4\u30fc\u3092\u793a\u3057\u307e\u3059\u3002<span translate=no>_^_2_^_</span>\u30d6\u30ed\u30fc\u30c9\u30ad\u30e3\u30b9\u30c8\u3068\u30b9\u30b1\u30fc\u30ea\u30f3\u30b0\u64cd\u4f5c\u3092\u8868\u3057\u307e\u3059\uff08\u30ce\u30a4\u30ba\u306f\u5358\u4e00\u30c1\u30e3\u30cd\u30eb\uff09\u3002<a href=\"#to_rgb\"><span translate=no>_^_3_^_</span></a>\u307e\u305f\u3001\u56f3\u306b\u306f\u793a\u3055\u308c\u3066\u3044\u306a\u3044\u30b9\u30bf\u30a4\u30eb\u30e2\u30b8\u30e5\u30ec\u30fc\u30b7\u30e7\u30f3\u3082\u4ed8\u3044\u3066\u304a\u308a\u3001\u30b7\u30f3\u30d7\u30eb\u3055\u3092\u4fdd\u3063\u3066\u3044\u307e\u3059</em></small></p>\u3002\n<p>\u30b8\u30a7\u30cd\u30ec\u30fc\u30bf\u306f\u5b66\u7fd2\u3057\u305f\u5b9a\u6570\u304b\u3089\u59cb\u307e\u308a\u307e\u3059\u3002\u6b21\u306b\u3001\u4e00\u9023\u306e\u30d6\u30ed\u30c3\u30af\u304c\u3042\u308a\u307e\u3059\u3002\u7279\u5fb4\u30de\u30c3\u30d7\u306e\u89e3\u50cf\u5ea6\u306f\u5404\u30d6\u30ed\u30c3\u30af\u3067 2 \u500d\u306b\u306a\u308a\u307e\u3059\u3002\u5404\u30d6\u30ed\u30c3\u30af\u306f RGB \u753b\u50cf\u3092\u51fa\u529b\u3057\u3001\u305d\u308c\u3089\u3092\u62e1\u5927\u3057\u3066\u5408\u8a08\u3057\u3066\u6700\u7d42\u7684\u306a RGB \u753b\u50cf\u306b\u306a\u308a\u307e\u3059</p>\u3002\n",
"<p> <a id=\"generator_block\"></a></p>\n<h3>Generator Block</h3>\n<p><span translate=no>_^_0_^_</span></p>\n<p><small><em><span translate=no>_^_1_^_</span> denotes a linear layer. <span translate=no>_^_2_^_</span> denotes a broadcast and scaling operation (noise is a single channel). <a href=\"#to_rgb\"><span translate=no>_^_3_^_</span></a> also has a style modulation which is not shown in the diagram to keep it simple.</em></small></p>\n<p>The generator block consists of two <a href=\"#style_block\">style blocks</a> (<span translate=no>_^_4_^_</span> convolutions with style modulation) and an RGB output.</p>\n": "<p><a id=\"generator_block\"></a></p>\n<h3>\u30b8\u30a7\u30cd\u30ec\u30fc\u30bf\u30d6\u30ed\u30c3\u30af</h3>\n<p><span translate=no>_^_0_^_</span></p>\n<p><small><em><span translate=no>_^_1_^_</span>\u7dda\u5f62\u30ec\u30a4\u30e4\u30fc\u3092\u793a\u3057\u307e\u3059\u3002<span translate=no>_^_2_^_</span>\u30d6\u30ed\u30fc\u30c9\u30ad\u30e3\u30b9\u30c8\u3068\u30b9\u30b1\u30fc\u30ea\u30f3\u30b0\u64cd\u4f5c\u3092\u8868\u3057\u307e\u3059\uff08\u30ce\u30a4\u30ba\u306f\u5358\u4e00\u30c1\u30e3\u30cd\u30eb\uff09\u3002<a href=\"#to_rgb\"><span translate=no>_^_3_^_</span></a>\u307e\u305f\u3001\u56f3\u306b\u306f\u793a\u3055\u308c\u3066\u3044\u306a\u3044\u30b9\u30bf\u30a4\u30eb\u30e2\u30b8\u30e5\u30ec\u30fc\u30b7\u30e7\u30f3\u3082\u4ed8\u3044\u3066\u304a\u308a\u3001\u30b7\u30f3\u30d7\u30eb\u3055\u3092\u4fdd\u3063\u3066\u3044\u307e\u3059</em></small></p>\u3002\n<p>\u30b8\u30a7\u30cd\u30ec\u30fc\u30bf\u30fc\u30d6\u30ed\u30c3\u30af\u306f\u30012 <a href=\"#style_block\">\u3064\u306e\u30b9\u30bf\u30a4\u30eb\u30d6\u30ed\u30c3\u30af (<span translate=no>_^_4_^_</span>\u30b9\u30bf\u30a4\u30eb\u5909\u8abf\u306b\u3088\u308b\u30b3\u30f3\u30dc\u30ea\u30e5\u30fc\u30b7\u30e7\u30f3</a>) \u3068 1 \u3064\u306e RGB \u51fa\u529b\u3067\u69cb\u6210\u3055\u308c\u3066\u3044\u307e\u3059\u3002</p>\n", "<p> <a id=\"generator_block\"></a></p>\n<h3>Generator Block</h3>\n<p><span translate=no>_^_0_^_</span></p>\n<p><small><em><span translate=no>_^_1_^_</span> denotes a linear layer. <span translate=no>_^_2_^_</span> denotes a broadcast and scaling operation (noise is a single channel). 
<a href=\"#to_rgb\"><span translate=no>_^_3_^_</span></a> also has a style modulation which is not shown in the diagram to keep it simple.</em></small></p>\n<p>The generator block consists of two <a href=\"#style_block\">style blocks</a> (<span translate=no>_^_4_^_</span> convolutions with style modulation) and an RGB output.</p>\n": "<p><a id=\"generator_block\"></a></p>\n<h3>\u30b8\u30a7\u30cd\u30ec\u30fc\u30bf\u30d6\u30ed\u30c3\u30af</h3>\n<p><span translate=no>_^_0_^_</span></p>\n<p><small><em><span translate=no>_^_1_^_</span>\u7dda\u5f62\u30ec\u30a4\u30e4\u30fc\u3092\u793a\u3057\u307e\u3059\u3002<span translate=no>_^_2_^_</span>\u30d6\u30ed\u30fc\u30c9\u30ad\u30e3\u30b9\u30c8\u3068\u30b9\u30b1\u30fc\u30ea\u30f3\u30b0\u64cd\u4f5c\u3092\u8868\u3057\u307e\u3059\uff08\u30ce\u30a4\u30ba\u306f\u5358\u4e00\u30c1\u30e3\u30cd\u30eb\uff09\u3002<a href=\"#to_rgb\"><span translate=no>_^_3_^_</span></a>\u307e\u305f\u3001\u56f3\u306b\u306f\u793a\u3055\u308c\u3066\u3044\u306a\u3044\u30b9\u30bf\u30a4\u30eb\u30e2\u30b8\u30e5\u30ec\u30fc\u30b7\u30e7\u30f3\u3082\u4ed8\u3044\u3066\u304a\u308a\u3001\u30b7\u30f3\u30d7\u30eb\u3055\u3092\u4fdd\u3063\u3066\u3044\u307e\u3059</em></small></p>\u3002\n<p>\u30b8\u30a7\u30cd\u30ec\u30fc\u30bf\u30fc\u30d6\u30ed\u30c3\u30af\u306f\u30012 <a href=\"#style_block\">\u3064\u306e\u30b9\u30bf\u30a4\u30eb\u30d6\u30ed\u30c3\u30af (<span translate=no>_^_4_^_</span>\u30b9\u30bf\u30a4\u30eb\u5909\u8abf\u306b\u3088\u308b\u30b3\u30f3\u30dc\u30ea\u30e5\u30fc\u30b7\u30e7\u30f3</a>) \u3068 1 \u3064\u306e RGB \u51fa\u529b\u3067\u69cb\u6210\u3055\u308c\u3066\u3044\u307e\u3059\u3002</p>\n",
"<p> <a id=\"gradient_penalty\"></a></p>\n<h2>Gradient Penalty</h2>\n<p>This is the <span translate=no>_^_0_^_</span> regularization penality from the paper <a href=\"https://papers.labml.ai/paper/1801.04406\">Which Training Methods for GANs do actually Converge?</a>.</p>\n<p><span translate=no>_^_1_^_</span></p>\n<p>That is we try to reduce the L2 norm of gradients of the discriminator with respect to images, for real images (<span translate=no>_^_2_^_</span>).</p>\n": "<p><a id=\"gradient_penalty\"></a></p>\n<h2>\u30b0\u30e9\u30c7\u30fc\u30b7\u30e7\u30f3\u30da\u30ca\u30eb\u30c6\u30a3</h2>\n<p>\u3053\u308c\u306f\u3001\u300c<a href=\"https://papers.labml.ai/paper/1801.04406\">GAN\u306e\u3069\u306e\u30c8\u30ec\u30fc\u30cb\u30f3\u30b0\u65b9\u6cd5\u304c\u5b9f\u969b\u306b\u53ce\u675f\u3059\u308b\u306e\u304b\u300d<span translate=no>_^_0_^_</span> \u3068\u3044\u3046\u8ad6\u6587\u306e\u6b63\u5247\u5316\u306e\u30da\u30ca\u30eb\u30c6\u30a3\u3067\u3059</a>\u3002</p>\u3002\n<p><span translate=no>_^_1_^_</span></p>\n<p>\u3064\u307e\u308a\u3001\u5b9f\u969b\u306e\u753b\u50cf () \u306b\u3064\u3044\u3066\u3001\u753b\u50cf\u306b\u5bfe\u3059\u308b\u30c7\u30a3\u30b9\u30af\u30ea\u30df\u30cd\u30fc\u30bf\u30fc\u306e\u52fe\u914d\u306eL2\u30ce\u30eb\u30e0\u3092\u5c0f\u3055\u304f\u3057\u3088\u3046\u3068\u3057\u3066\u3044\u307e\u3059\u3002<span translate=no>_^_2_^_</span></p>\n", "<p> <a id=\"gradient_penalty\"></a></p>\n<h2>Gradient Penalty</h2>\n<p>This is the <span translate=no>_^_0_^_</span> regularization penality from the paper <a href=\"https://arxiv.org/abs/1801.04406\">Which Training Methods for GANs do actually Converge?</a>.</p>\n<p><span translate=no>_^_1_^_</span></p>\n<p>That is we try to reduce the L2 norm of gradients of the discriminator with respect to images, for real images (<span translate=no>_^_2_^_</span>).</p>\n": "<p><a id=\"gradient_penalty\"></a></p>\n<h2>\u30b0\u30e9\u30c7\u30fc\u30b7\u30e7\u30f3\u30da\u30ca\u30eb\u30c6\u30a3</h2>\n<p>\u3053\u308c\u306f\u3001\u300c<a href=\"https://arxiv.org/abs/1801.04406\">GAN\u306e\u3069\u306e\u30c8\u30ec\u30fc\u30cb\u30f3\u30b0\u65b9\u6cd5\u304c\u5b9f\u969b\u306b\u53ce\u675f\u3059\u308b\u306e\u304b\u300d<span translate=no>_^_0_^_</span> \u3068\u3044\u3046\u8ad6\u6587\u306e\u6b63\u5247\u5316\u306e\u30da\u30ca\u30eb\u30c6\u30a3\u3067\u3059</a>\u3002</p>\u3002\n<p><span translate=no>_^_1_^_</span></p>\n<p>\u3064\u307e\u308a\u3001\u5b9f\u969b\u306e\u753b\u50cf () \u306b\u3064\u3044\u3066\u3001\u753b\u50cf\u306b\u5bfe\u3059\u308b\u30c7\u30a3\u30b9\u30af\u30ea\u30df\u30cd\u30fc\u30bf\u30fc\u306e\u52fe\u914d\u306eL2\u30ce\u30eb\u30e0\u3092\u5c0f\u3055\u304f\u3057\u3088\u3046\u3068\u3057\u3066\u3044\u307e\u3059\u3002<span translate=no>_^_2_^_</span></p>\n",
"<p> <a id=\"mapping_network\"></a></p>\n<h2>Mapping Network</h2>\n<p><span translate=no>_^_0_^_</span></p>\n<p>This is an MLP with 8 linear layers. The mapping network maps the latent vector <span translate=no>_^_1_^_</span> to an intermediate latent space <span translate=no>_^_2_^_</span>. <span translate=no>_^_3_^_</span> space will be disentangled from the image space where the factors of variation become more linear.</p>\n": "<p><a id=\"mapping_network\"></a></p>\n<h2>\u30de\u30c3\u30d4\u30f3\u30b0\u30cd\u30c3\u30c8\u30ef\u30fc\u30af</h2>\n<p><span translate=no>_^_0_^_</span></p>\n<p>\u3053\u308c\u306f8\u3064\u306e\u7dda\u5f62\u30ec\u30a4\u30e4\u30fc\u3092\u5099\u3048\u305fMLP\u3067\u3059\u3002\u30de\u30c3\u30d4\u30f3\u30b0\u30cd\u30c3\u30c8\u30ef\u30fc\u30af\u306f\u3001<span translate=no>_^_1_^_</span>\u6f5c\u5728\u30d9\u30af\u30c8\u30eb\u3092\u4e2d\u9593\u6f5c\u5728\u7a7a\u9593\u306b\u30de\u30c3\u30d4\u30f3\u30b0\u3057\u307e\u3059\u3002<span translate=no>_^_2_^_</span><span translate=no>_^_3_^_</span>\u7a7a\u9593\u306f\u753b\u50cf\u7a7a\u9593\u304b\u3089\u5207\u308a\u96e2\u3055\u308c\u3001\u5909\u5316\u306e\u8981\u56e0\u304c\u3088\u308a\u76f4\u7dda\u7684\u306b\u306a\u308a\u307e\u3059</p>\u3002\n", "<p> <a id=\"mapping_network\"></a></p>\n<h2>Mapping Network</h2>\n<p><span translate=no>_^_0_^_</span></p>\n<p>This is an MLP with 8 linear layers. The mapping network maps the latent vector <span translate=no>_^_1_^_</span> to an intermediate latent space <span translate=no>_^_2_^_</span>. <span translate=no>_^_3_^_</span> space will be disentangled from the image space where the factors of variation become more linear.</p>\n": "<p><a id=\"mapping_network\"></a></p>\n<h2>\u30de\u30c3\u30d4\u30f3\u30b0\u30cd\u30c3\u30c8\u30ef\u30fc\u30af</h2>\n<p><span translate=no>_^_0_^_</span></p>\n<p>\u3053\u308c\u306f8\u3064\u306e\u7dda\u5f62\u30ec\u30a4\u30e4\u30fc\u3092\u5099\u3048\u305fMLP\u3067\u3059\u3002\u30de\u30c3\u30d4\u30f3\u30b0\u30cd\u30c3\u30c8\u30ef\u30fc\u30af\u306f\u3001<span translate=no>_^_1_^_</span>\u6f5c\u5728\u30d9\u30af\u30c8\u30eb\u3092\u4e2d\u9593\u6f5c\u5728\u7a7a\u9593\u306b\u30de\u30c3\u30d4\u30f3\u30b0\u3057\u307e\u3059\u3002<span translate=no>_^_2_^_</span><span translate=no>_^_3_^_</span>\u7a7a\u9593\u306f\u753b\u50cf\u7a7a\u9593\u304b\u3089\u5207\u308a\u96e2\u3055\u308c\u3001\u5909\u5316\u306e\u8981\u56e0\u304c\u3088\u308a\u76f4\u7dda\u7684\u306b\u306a\u308a\u307e\u3059</p>\u3002\n",
"<p> <a id=\"mini_batch_std_dev\"></a></p>\n<h3>Mini-batch Standard Deviation</h3>\n<p>Mini-batch standard deviation calculates the standard deviation across a mini-batch (or a subgroups within the mini-batch) for each feature in the feature map. Then it takes the mean of all the standard deviations and appends it to the feature map as one extra feature.</p>\n": "<p><a id=\"mini_batch_std_dev\"></a></p>\n<h3>\u30df\u30cb\u30d0\u30c3\u30c1\u6a19\u6e96\u504f\u5dee</h3>\n<p>\u30df\u30cb\u30d0\u30c3\u30c1\u6a19\u6e96\u504f\u5dee\u306f\u3001\u7279\u5fb4\u30de\u30c3\u30d7\u5185\u306e\u5404\u30d5\u30a3\u30fc\u30c1\u30e3\u306b\u3064\u3044\u3066\u3001\u30df\u30cb\u30d0\u30c3\u30c1 (\u307e\u305f\u306f\u30df\u30cb\u30d0\u30c3\u30c1\u5185\u306e\u30b5\u30d6\u30b0\u30eb\u30fc\u30d7) \u5168\u4f53\u306e\u6a19\u6e96\u504f\u5dee\u3092\u8a08\u7b97\u3057\u307e\u3059\u3002\u6b21\u306b\u3001\u3059\u3079\u3066\u306e\u6a19\u6e96\u504f\u5dee\u306e\u5e73\u5747\u3092\u53d6\u5f97\u3057\u3001\u305d\u308c\u3092 1 \u3064\u306e\u7279\u5fb4\u3068\u3057\u3066\u7279\u5fb4\u30de\u30c3\u30d7\u306b\u8ffd\u52a0\u3057\u307e\u3059\u3002</p>\n", "<p> <a id=\"mini_batch_std_dev\"></a></p>\n<h3>Mini-batch Standard Deviation</h3>\n<p>Mini-batch standard deviation calculates the standard deviation across a mini-batch (or a subgroups within the mini-batch) for each feature in the feature map. Then it takes the mean of all the standard deviations and appends it to the feature map as one extra feature.</p>\n": "<p><a id=\"mini_batch_std_dev\"></a></p>\n<h3>\u30df\u30cb\u30d0\u30c3\u30c1\u6a19\u6e96\u504f\u5dee</h3>\n<p>\u30df\u30cb\u30d0\u30c3\u30c1\u6a19\u6e96\u504f\u5dee\u306f\u3001\u7279\u5fb4\u30de\u30c3\u30d7\u5185\u306e\u5404\u30d5\u30a3\u30fc\u30c1\u30e3\u306b\u3064\u3044\u3066\u3001\u30df\u30cb\u30d0\u30c3\u30c1 (\u307e\u305f\u306f\u30df\u30cb\u30d0\u30c3\u30c1\u5185\u306e\u30b5\u30d6\u30b0\u30eb\u30fc\u30d7) \u5168\u4f53\u306e\u6a19\u6e96\u504f\u5dee\u3092\u8a08\u7b97\u3057\u307e\u3059\u3002\u6b21\u306b\u3001\u3059\u3079\u3066\u306e\u6a19\u6e96\u504f\u5dee\u306e\u5e73\u5747\u3092\u53d6\u5f97\u3057\u3001\u305d\u308c\u3092 1 \u3064\u306e\u7279\u5fb4\u3068\u3057\u3066\u7279\u5fb4\u30de\u30c3\u30d7\u306b\u8ffd\u52a0\u3057\u307e\u3059\u3002</p>\n",
"<p> <a id=\"path_length_penalty\"></a></p>\n<h2>Path Length Penalty</h2>\n<p>This regularization encourages a fixed-size step in <span translate=no>_^_0_^_</span> to result in a fixed-magnitude change in the image.</p>\n<p><span translate=no>_^_1_^_</span></p>\n<p>where <span translate=no>_^_2_^_</span> is the Jacobian <span translate=no>_^_3_^_</span>, <span translate=no>_^_4_^_</span> are sampled from <span translate=no>_^_5_^_</span> from the mapping network, and <span translate=no>_^_6_^_</span> are images with noise <span translate=no>_^_7_^_</span>.</p>\n<p><span translate=no>_^_8_^_</span> is the exponential moving average of <span translate=no>_^_9_^_</span> as the training progresses.</p>\n<p><span translate=no>_^_10_^_</span> is calculated without explicitly calculating the Jacobian using <span translate=no>_^_11_^_</span></p>\n": "<p><a id=\"path_length_penalty\"></a></p>\n<h2>\u7d4c\u8def\u9577\u30da\u30ca\u30eb\u30c6\u30a3</h2>\n<p>\u3053\u306e\u6b63\u5247\u5316\u306b\u3088\u308a\u3001<span translate=no>_^_0_^_</span>\u56fa\u5b9a\u30b5\u30a4\u30ba\u306e\u30b9\u30c6\u30c3\u30d7\u30a4\u30f3\u304c\u4fc3\u9032\u3055\u308c\u3001\u753b\u50cf\u306e\u5927\u304d\u3055\u304c\u56fa\u5b9a\u3055\u308c\u305f\u5909\u5316\u304c\u751f\u3058\u307e\u3059\u3002</p>\n<p><span translate=no>_^_1_^_</span></p>\n<p>\u3053\u3053\u3067\u3001<span translate=no>_^_2_^_</span><span translate=no>_^_5_^_</span>\u306f\u30de\u30c3\u30d4\u30f3\u30b0\u30cd\u30c3\u30c8\u30ef\u30fc\u30af\u304b\u3089\u30b5\u30f3\u30d7\u30ea\u30f3\u30b0\u3055\u308c\u305f\u30e4\u30b3\u30d3\u30a2\u30f3\u3067\u3001<span translate=no>_^_3_^_</span><span translate=no>_^_6_^_</span>\u30ce\u30a4\u30ba\u306e\u5165\u3063\u305f\u753b\u50cf\u3067\u3059\u3002<span translate=no>_^_4_^_</span> <span translate=no>_^_7_^_</span></p>\n<p><span translate=no>_^_8_^_</span>\u306f\u3001<span translate=no>_^_9_^_</span>\u30c8\u30ec\u30fc\u30cb\u30f3\u30b0\u306e\u9032\u884c\u306b\u4f34\u3046\u6307\u6570\u79fb\u52d5\u5e73\u5747\u3067\u3059\u3002</p>\n<p><span translate=no>_^_10_^_</span>\u3092\u4f7f\u7528\u3057\u3066\u30e4\u30b3\u30d3\u30a2\u30f3\u3092\u660e\u793a\u7684\u306b\u8a08\u7b97\u305b\u305a\u306b\u8a08\u7b97\u3055\u308c\u307e\u3059 <span translate=no>_^_11_^_</span></p>\n", "<p> <a id=\"path_length_penalty\"></a></p>\n<h2>Path Length Penalty</h2>\n<p>This regularization encourages a fixed-size step in <span translate=no>_^_0_^_</span> to result in a fixed-magnitude change in the image.</p>\n<p><span translate=no>_^_1_^_</span></p>\n<p>where <span translate=no>_^_2_^_</span> is the Jacobian <span translate=no>_^_3_^_</span>, <span translate=no>_^_4_^_</span> are sampled from <span translate=no>_^_5_^_</span> from the mapping network, and <span translate=no>_^_6_^_</span> are images with noise <span translate=no>_^_7_^_</span>.</p>\n<p><span translate=no>_^_8_^_</span> is the exponential moving average of <span translate=no>_^_9_^_</span> as the training progresses.</p>\n<p><span translate=no>_^_10_^_</span> is calculated without explicitly calculating the Jacobian using <span translate=no>_^_11_^_</span></p>\n": "<p><a id=\"path_length_penalty\"></a></p>\n<h2>\u7d4c\u8def\u9577\u30da\u30ca\u30eb\u30c6\u30a3</h2>\n<p>\u3053\u306e\u6b63\u5247\u5316\u306b\u3088\u308a\u3001<span translate=no>_^_0_^_</span>\u56fa\u5b9a\u30b5\u30a4\u30ba\u306e\u30b9\u30c6\u30c3\u30d7\u30a4\u30f3\u304c\u4fc3\u9032\u3055\u308c\u3001\u753b\u50cf\u306e\u5927\u304d\u3055\u304c\u56fa\u5b9a\u3055\u308c\u305f\u5909\u5316\u304c\u751f\u3058\u307e\u3059\u3002</p>\n<p><span 
translate=no>_^_1_^_</span></p>\n<p>\u3053\u3053\u3067\u3001<span translate=no>_^_2_^_</span><span translate=no>_^_5_^_</span>\u306f\u30de\u30c3\u30d4\u30f3\u30b0\u30cd\u30c3\u30c8\u30ef\u30fc\u30af\u304b\u3089\u30b5\u30f3\u30d7\u30ea\u30f3\u30b0\u3055\u308c\u305f\u30e4\u30b3\u30d3\u30a2\u30f3\u3067\u3001<span translate=no>_^_3_^_</span><span translate=no>_^_6_^_</span>\u30ce\u30a4\u30ba\u306e\u5165\u3063\u305f\u753b\u50cf\u3067\u3059\u3002<span translate=no>_^_4_^_</span> <span translate=no>_^_7_^_</span></p>\n<p><span translate=no>_^_8_^_</span>\u306f\u3001<span translate=no>_^_9_^_</span>\u30c8\u30ec\u30fc\u30cb\u30f3\u30b0\u306e\u9032\u884c\u306b\u4f34\u3046\u6307\u6570\u79fb\u52d5\u5e73\u5747\u3067\u3059\u3002</p>\n<p><span translate=no>_^_10_^_</span>\u3092\u4f7f\u7528\u3057\u3066\u30e4\u30b3\u30d3\u30a2\u30f3\u3092\u660e\u793a\u7684\u306b\u8a08\u7b97\u305b\u305a\u306b\u8a08\u7b97\u3055\u308c\u307e\u3059 <span translate=no>_^_11_^_</span></p>\n",
"<p> <a id=\"smooth\"></a></p>\n<h3>Smoothing Layer</h3>\n<p>This layer blurs each channel</p>\n": "<p><a id=\"smooth\"></a></p>\n<h3>\u30b9\u30e0\u30fc\u30b8\u30f3\u30b0\u30ec\u30a4\u30e4\u30fc</h3>\n<p>\u3053\u306e\u30ec\u30a4\u30e4\u30fc\u306f\u5404\u30c1\u30e3\u30f3\u30cd\u30eb\u3092\u307c\u304b\u3057\u307e\u3059</p>\n", "<p> <a id=\"smooth\"></a></p>\n<h3>Smoothing Layer</h3>\n<p>This layer blurs each channel</p>\n": "<p><a id=\"smooth\"></a></p>\n<h3>\u30b9\u30e0\u30fc\u30b8\u30f3\u30b0\u30ec\u30a4\u30e4\u30fc</h3>\n<p>\u3053\u306e\u30ec\u30a4\u30e4\u30fc\u306f\u5404\u30c1\u30e3\u30f3\u30cd\u30eb\u3092\u307c\u304b\u3057\u307e\u3059</p>\n",
"<p> <a id=\"style_block\"></a></p>\n<h3>Style Block</h3>\n<p><span translate=no>_^_0_^_</span></p>\n<p><small><em><span translate=no>_^_1_^_</span> denotes a linear layer. <span translate=no>_^_2_^_</span> denotes a broadcast and scaling operation (noise is single channel).</em></small></p>\n<p>Style block has a weight modulation convolution layer.</p>\n": "<p><a id=\"style_block\"></a></p>\n<h3>\u30b9\u30bf\u30a4\u30eb\u30d6\u30ed\u30c3\u30af</h3>\n<p><span translate=no>_^_0_^_</span></p>\n<p><small><em><span translate=no>_^_1_^_</span>\u7dda\u5f62\u30ec\u30a4\u30e4\u30fc\u3092\u793a\u3057\u307e\u3059\u3002<span translate=no>_^_2_^_</span>\u30d6\u30ed\u30fc\u30c9\u30ad\u30e3\u30b9\u30c8\u3068\u30b9\u30b1\u30fc\u30ea\u30f3\u30b0\u64cd\u4f5c\u3092\u8868\u3057\u307e\u3059\uff08\u30ce\u30a4\u30ba\u306f\u30b7\u30f3\u30b0\u30eb\u30c1\u30e3\u30cd\u30eb</em></small></p>\uff09\u3002\n<p>\u30b9\u30bf\u30a4\u30eb\u30d6\u30ed\u30c3\u30af\u306b\u306f\u30a6\u30a7\u30a4\u30c8\u30e2\u30b8\u30e5\u30ec\u30fc\u30b7\u30e7\u30f3\u30b3\u30f3\u30dc\u30ea\u30e5\u30fc\u30b7\u30e7\u30f3\u30ec\u30a4\u30e4\u30fc\u304c\u3042\u308a\u307e\u3059\u3002</p>\n", "<p> <a id=\"style_block\"></a></p>\n<h3>Style Block</h3>\n<p><span translate=no>_^_0_^_</span></p>\n<p><small><em><span translate=no>_^_1_^_</span> denotes a linear layer. <span translate=no>_^_2_^_</span> denotes a broadcast and scaling operation (noise is single channel).</em></small></p>\n<p>Style block has a weight modulation convolution layer.</p>\n": "<p><a id=\"style_block\"></a></p>\n<h3>\u30b9\u30bf\u30a4\u30eb\u30d6\u30ed\u30c3\u30af</h3>\n<p><span translate=no>_^_0_^_</span></p>\n<p><small><em><span translate=no>_^_1_^_</span>\u7dda\u5f62\u30ec\u30a4\u30e4\u30fc\u3092\u793a\u3057\u307e\u3059\u3002<span translate=no>_^_2_^_</span>\u30d6\u30ed\u30fc\u30c9\u30ad\u30e3\u30b9\u30c8\u3068\u30b9\u30b1\u30fc\u30ea\u30f3\u30b0\u64cd\u4f5c\u3092\u8868\u3057\u307e\u3059\uff08\u30ce\u30a4\u30ba\u306f\u30b7\u30f3\u30b0\u30eb\u30c1\u30e3\u30cd\u30eb</em></small></p>\uff09\u3002\n<p>\u30b9\u30bf\u30a4\u30eb\u30d6\u30ed\u30c3\u30af\u306b\u306f\u30a6\u30a7\u30a4\u30c8\u30e2\u30b8\u30e5\u30ec\u30fc\u30b7\u30e7\u30f3\u30b3\u30f3\u30dc\u30ea\u30e5\u30fc\u30b7\u30e7\u30f3\u30ec\u30a4\u30e4\u30fc\u304c\u3042\u308a\u307e\u3059\u3002</p>\n",
"<p> <a id=\"to_rgb\"></a></p>\n<h3>To RGB</h3>\n<p><span translate=no>_^_0_^_</span></p>\n<p><small><em><span translate=no>_^_1_^_</span> denotes a linear layer.</em></small></p>\n<p>Generates an RGB image from a feature map using <span translate=no>_^_2_^_</span> convolution.</p>\n": "<p><a id=\"to_rgb\"></a></p>\n<h3>RGB \u3078</h3>\n<p><span translate=no>_^_0_^_</span></p>\n<p><small><em><span translate=no>_^_1_^_</span>\u7dda\u5f62\u30ec\u30a4\u30e4\u30fc\u3092\u793a\u3057\u307e\u3059\u3002</em></small></p>\n<p><span translate=no>_^_2_^_</span>\u7573\u307f\u8fbc\u307f\u3092\u4f7f\u7528\u3057\u3066\u3001\u7279\u5fb4\u30de\u30c3\u30d7\u304b\u3089 RGB \u753b\u50cf\u3092\u751f\u6210\u3057\u307e\u3059\u3002</p>\n", "<p> <a id=\"to_rgb\"></a></p>\n<h3>To RGB</h3>\n<p><span translate=no>_^_0_^_</span></p>\n<p><small><em><span translate=no>_^_1_^_</span> denotes a linear layer.</em></small></p>\n<p>Generates an RGB image from a feature map using <span translate=no>_^_2_^_</span> convolution.</p>\n": "<p><a id=\"to_rgb\"></a></p>\n<h3>RGB \u3078</h3>\n<p><span translate=no>_^_0_^_</span></p>\n<p><small><em><span translate=no>_^_1_^_</span>\u7dda\u5f62\u30ec\u30a4\u30e4\u30fc\u3092\u793a\u3057\u307e\u3059\u3002</em></small></p>\n<p><span translate=no>_^_2_^_</span>\u7573\u307f\u8fbc\u307f\u3092\u4f7f\u7528\u3057\u3066\u3001\u7279\u5fb4\u30de\u30c3\u30d7\u304b\u3089 RGB \u753b\u50cf\u3092\u751f\u6210\u3057\u307e\u3059\u3002</p>\n",
"<p> <a id=\"up_sample\"></a></p>\n<h3>Up-sample</h3>\n<p>The up-sample operation scales the image up by <span translate=no>_^_0_^_</span> and <a href=\"#smooth\">smoothens</a> each feature channel. This is based on the paper <a href=\"https://papers.labml.ai/paper/1904.11486\">Making Convolutional Networks Shift-Invariant Again</a>.</p>\n": "<p><a id=\"up_sample\"></a></p>\n<h3>\u30a2\u30c3\u30d7\u30b5\u30f3\u30d7\u30eb</h3>\n<p>\u30a2\u30c3\u30d7\u30b5\u30f3\u30d7\u30eb\u64cd\u4f5c\u3067\u306f\u3001<span translate=no>_^_0_^_</span>\u753b\u50cf\u304c\u5404\u30d5\u30a3\u30fc\u30c1\u30e3\u30c1\u30e3\u30cd\u30eb\u3054\u3068\u306b\u62e1\u5927\u3055\u308c\u3001<a href=\"#smooth\">\u6ed1\u3089\u304b\u306b\u306a\u308a\u307e\u3059</a>\u3002\u3053\u308c\u306f\u3001\u300c<a href=\"https://papers.labml.ai/paper/1904.11486\">\u7573\u307f\u8fbc\u307f\u30cd\u30c3\u30c8\u30ef\u30fc\u30af\u3092\u518d\u3073\u30b7\u30d5\u30c8\u4e0d\u5909\u306b\u3059\u308b</a>\u300d\u3068\u3044\u3046\u8ad6\u6587\u306b\u57fa\u3065\u3044\u3066\u3044\u307e\u3059</p>\u3002\n", "<p> <a id=\"up_sample\"></a></p>\n<h3>Up-sample</h3>\n<p>The up-sample operation scales the image up by <span translate=no>_^_0_^_</span> and <a href=\"#smooth\">smoothens</a> each feature channel. This is based on the paper <a href=\"https://arxiv.org/abs/1904.11486\">Making Convolutional Networks Shift-Invariant Again</a>.</p>\n": "<p><a id=\"up_sample\"></a></p>\n<h3>\u30a2\u30c3\u30d7\u30b5\u30f3\u30d7\u30eb</h3>\n<p>\u30a2\u30c3\u30d7\u30b5\u30f3\u30d7\u30eb\u64cd\u4f5c\u3067\u306f\u3001<span translate=no>_^_0_^_</span>\u753b\u50cf\u304c\u5404\u30d5\u30a3\u30fc\u30c1\u30e3\u30c1\u30e3\u30cd\u30eb\u3054\u3068\u306b\u62e1\u5927\u3055\u308c\u3001<a href=\"#smooth\">\u6ed1\u3089\u304b\u306b\u306a\u308a\u307e\u3059</a>\u3002\u3053\u308c\u306f\u3001\u300c<a href=\"https://arxiv.org/abs/1904.11486\">\u7573\u307f\u8fbc\u307f\u30cd\u30c3\u30c8\u30ef\u30fc\u30af\u3092\u518d\u3073\u30b7\u30d5\u30c8\u4e0d\u5909\u306b\u3059\u308b</a>\u300d\u3068\u3044\u3046\u8ad6\u6587\u306b\u57fa\u3065\u3044\u3066\u3044\u307e\u3059</p>\u3002\n",
"<p><a href=\"#equalized_linear\">Equalized learning-rate linear layers</a> </p>\n": "<p><a href=\"#equalized_linear\">\u5b66\u7fd2\u7387\u306e\u5747\u7b49\u5316\u30ea\u30cb\u30a2\u30ec\u30a4\u30e4\u30fc</a></p>\n", "<p><a href=\"#equalized_linear\">Equalized learning-rate linear layers</a> </p>\n": "<p><a href=\"#equalized_linear\">\u5b66\u7fd2\u7387\u306e\u5747\u7b49\u5316\u30ea\u30cb\u30a2\u30ec\u30a4\u30e4\u30fc</a></p>\n",
"<p><a href=\"#equalized_weight\">Weights parameter with equalized learning rate</a> </p>\n": "<p><a href=\"#equalized_weight\">\u5b66\u7fd2\u7387\u304c\u5747\u7b49\u5316\u3055\u308c\u305f\u91cd\u307f\u30d1\u30e9\u30e1\u30fc\u30bf</a></p>\n", "<p><a href=\"#equalized_weight\">Weights parameter with equalized learning rate</a> </p>\n": "<p><a href=\"#equalized_weight\">\u5b66\u7fd2\u7387\u304c\u5747\u7b49\u5316\u3055\u308c\u305f\u91cd\u307f\u30d1\u30e9\u30e1\u30fc\u30bf</a></p>\n",
"<p><a href=\"#equalized_weights\">Learning-rate equalized weights</a> </p>\n": "<p><a href=\"#equalized_weights\">\u5b66\u7fd2\u7387\u5747\u7b49\u5316\u30a6\u30a7\u30a4\u30c8</a></p>\n", "<p><a href=\"#equalized_weights\">Learning-rate equalized weights</a> </p>\n": "<p><a href=\"#equalized_weights\">\u5b66\u7fd2\u7387\u5747\u7b49\u5316\u30a6\u30a7\u30a4\u30c8</a></p>\n",
@@ -159,7 +159,7 @@
"<p>Then the convolution weights <span translate=no>_^_0_^_</span> are modulated as follows. (<span translate=no>_^_1_^_</span> here on refers to weights not intermediate latent space, we are sticking to the same notation as the paper.)</p>\n": "<p>\u6b21\u306b\u3001<span translate=no>_^_0_^_</span>\u30b3\u30f3\u30dc\u30ea\u30e5\u30fc\u30b7\u30e7\u30f3\u306e\u91cd\u307f\u306f\u6b21\u306e\u3088\u3046\u306b\u5909\u8abf\u3055\u308c\u307e\u3059\u3002\uff08<span translate=no>_^_1_^_</span>\u3053\u3053\u3067\u306f\u4e2d\u9593\u306e\u6f5c\u5728\u7a7a\u9593\u3067\u306f\u306a\u304f\u91cd\u307f\u3092\u6307\u3057\u307e\u3059\u3002\u8ad6\u6587\u3068\u540c\u3058\u8868\u8a18\u6cd5\u306b\u3053\u3060\u308f\u3063\u3066\u3044\u307e\u3059</p>\u3002\uff09\n", "<p>Then the convolution weights <span translate=no>_^_0_^_</span> are modulated as follows. (<span translate=no>_^_1_^_</span> here on refers to weights not intermediate latent space, we are sticking to the same notation as the paper.)</p>\n": "<p>\u6b21\u306b\u3001<span translate=no>_^_0_^_</span>\u30b3\u30f3\u30dc\u30ea\u30e5\u30fc\u30b7\u30e7\u30f3\u306e\u91cd\u307f\u306f\u6b21\u306e\u3088\u3046\u306b\u5909\u8abf\u3055\u308c\u307e\u3059\u3002\uff08<span translate=no>_^_1_^_</span>\u3053\u3053\u3067\u306f\u4e2d\u9593\u306e\u6f5c\u5728\u7a7a\u9593\u3067\u306f\u306a\u304f\u91cd\u307f\u3092\u6307\u3057\u307e\u3059\u3002\u8ad6\u6587\u3068\u540c\u3058\u8868\u8a18\u6cd5\u306b\u3053\u3060\u308f\u3063\u3066\u3044\u307e\u3059</p>\u3002\uff09\n",
"<p>They remove the <span translate=no>_^_0_^_</span> operator and replace it with the weight modulation and demodulation step. This is supposed to improve what they call droplet artifacts that are present in generated images, which are caused by the normalization in <span translate=no>_^_1_^_</span> operator. Style vector per layer is calculated from <span translate=no>_^_2_^_</span> as <span translate=no>_^_3_^_</span>.</p>\n": "<p><span translate=no>_^_0_^_</span>\u30aa\u30da\u30ec\u30fc\u30bf\u3092\u53d6\u308a\u5916\u3057\u3066\u3001\u91cd\u307f\u5909\u8abf\u3068\u5fa9\u8abf\u306e\u30b9\u30c6\u30c3\u30d7\u306b\u7f6e\u304d\u63db\u3048\u307e\u3059\u3002\u3053\u308c\u306f\u3001\u6f14\u7b97\u5b50\u306e\u6b63\u898f\u5316\u306b\u3088\u3063\u3066\u751f\u6210\u3055\u308c\u308b\u753b\u50cf\u306b\u5b58\u5728\u3059\u308b\u3001\u3044\u308f\u3086\u308b\u30c9\u30ed\u30c3\u30d7\u30ec\u30c3\u30c8\u30a2\u30fc\u30c6\u30a3\u30d5\u30a1\u30af\u30c8\u3092\u6539\u5584\u3059\u308b\u305f\u3081\u306e\u3082\u306e\u3067\u3059\u3002<span translate=no>_^_1_^_</span>\u30ec\u30a4\u30e4\u30fc\u3054\u3068\u306e\u30b9\u30bf\u30a4\u30eb\u30d9\u30af\u30c8\u30eb\u306f\u3001<span translate=no>_^_2_^_</span>\u304b\u3089\u8a08\u7b97\u3055\u308c\u307e\u3059<span translate=no>_^_3_^_</span>\u3002</p>\n", "<p>They remove the <span translate=no>_^_0_^_</span> operator and replace it with the weight modulation and demodulation step. This is supposed to improve what they call droplet artifacts that are present in generated images, which are caused by the normalization in <span translate=no>_^_1_^_</span> operator. Style vector per layer is calculated from <span translate=no>_^_2_^_</span> as <span translate=no>_^_3_^_</span>.</p>\n": "<p><span translate=no>_^_0_^_</span>\u30aa\u30da\u30ec\u30fc\u30bf\u3092\u53d6\u308a\u5916\u3057\u3066\u3001\u91cd\u307f\u5909\u8abf\u3068\u5fa9\u8abf\u306e\u30b9\u30c6\u30c3\u30d7\u306b\u7f6e\u304d\u63db\u3048\u307e\u3059\u3002\u3053\u308c\u306f\u3001\u6f14\u7b97\u5b50\u306e\u6b63\u898f\u5316\u306b\u3088\u3063\u3066\u751f\u6210\u3055\u308c\u308b\u753b\u50cf\u306b\u5b58\u5728\u3059\u308b\u3001\u3044\u308f\u3086\u308b\u30c9\u30ed\u30c3\u30d7\u30ec\u30c3\u30c8\u30a2\u30fc\u30c6\u30a3\u30d5\u30a1\u30af\u30c8\u3092\u6539\u5584\u3059\u308b\u305f\u3081\u306e\u3082\u306e\u3067\u3059\u3002<span translate=no>_^_1_^_</span>\u30ec\u30a4\u30e4\u30fc\u3054\u3068\u306e\u30b9\u30bf\u30a4\u30eb\u30d9\u30af\u30c8\u30eb\u306f\u3001<span translate=no>_^_2_^_</span>\u304b\u3089\u8a08\u7b97\u3055\u308c\u307e\u3059<span translate=no>_^_3_^_</span>\u3002</p>\n",
"<p>They use <strong>minibatch standard deviation</strong> to increase variation and <strong>equalized learning rate</strong> which we discussed below in the implementation. They also use <strong>pixel-wise normalization</strong> where at each pixel the feature vector is normalized. They apply this to all the convolution layer outputs (except RGB).</p>\n": "<p><strong>\u30df\u30cb\u30d0\u30c3\u30c1\u6a19\u6e96\u504f\u5dee\u3092\u4f7f\u7528\u3057\u3066\u5909\u52d5\u3092\u5897\u3084\u3057</strong>\u3001<strong>\u5b66\u7fd2\u7387\u3092\u5747\u7b49\u5316\u3057\u307e\u3059</strong>\u3002\u3053\u308c\u306b\u3064\u3044\u3066\u306f\u3001\u5b9f\u88c5\u3067\u5f8c\u8ff0\u3057\u307e\u3059\u3002\u307e\u305f\u3001<strong>\u30d4\u30af\u30bb\u30eb\u5358\u4f4d\u306e\u6b63\u898f\u5316\u3082\u4f7f\u7528\u3057\u3066\u304a\u308a</strong>\u3001\u5404\u30d4\u30af\u30bb\u30eb\u3067\u7279\u5fb4\u30d9\u30af\u30c8\u30eb\u304c\u6b63\u898f\u5316\u3055\u308c\u307e\u3059\u3002\u3053\u308c\u3092\u3059\u3079\u3066\u306e\u7573\u307f\u8fbc\u307f\u5c64\u51fa\u529b (RGB \u3092\u9664\u304f) \u306b\u9069\u7528\u3057\u307e\u3059</p>\u3002\n", "<p>They use <strong>minibatch standard deviation</strong> to increase variation and <strong>equalized learning rate</strong> which we discussed below in the implementation. They also use <strong>pixel-wise normalization</strong> where at each pixel the feature vector is normalized. They apply this to all the convolution layer outputs (except RGB).</p>\n": "<p><strong>\u30df\u30cb\u30d0\u30c3\u30c1\u6a19\u6e96\u504f\u5dee\u3092\u4f7f\u7528\u3057\u3066\u5909\u52d5\u3092\u5897\u3084\u3057</strong>\u3001<strong>\u5b66\u7fd2\u7387\u3092\u5747\u7b49\u5316\u3057\u307e\u3059</strong>\u3002\u3053\u308c\u306b\u3064\u3044\u3066\u306f\u3001\u5b9f\u88c5\u3067\u5f8c\u8ff0\u3057\u307e\u3059\u3002\u307e\u305f\u3001<strong>\u30d4\u30af\u30bb\u30eb\u5358\u4f4d\u306e\u6b63\u898f\u5316\u3082\u4f7f\u7528\u3057\u3066\u304a\u308a</strong>\u3001\u5404\u30d4\u30af\u30bb\u30eb\u3067\u7279\u5fb4\u30d9\u30af\u30c8\u30eb\u304c\u6b63\u898f\u5316\u3055\u308c\u307e\u3059\u3002\u3053\u308c\u3092\u3059\u3079\u3066\u306e\u7573\u307f\u8fbc\u307f\u5c64\u51fa\u529b (RGB \u3092\u9664\u304f) \u306b\u9069\u7528\u3057\u307e\u3059</p>\u3002\n",
"<p>This is a <a href=\"https://pytorch.org\">PyTorch</a> implementation of the paper <a href=\"https://papers.labml.ai/paper/1912.04958\">Analyzing and Improving the Image Quality of StyleGAN</a> which introduces <strong>StyleGAN 2</strong>. StyleGAN 2 is an improvement over <strong>StyleGAN</strong> from the paper <a href=\"https://papers.labml.ai/paper/1812.04948\">A Style-Based Generator Architecture for Generative Adversarial Networks</a>. And StyleGAN is based on <strong>Progressive GAN</strong> from the paper <a href=\"https://papers.labml.ai/paper/1710.10196\">Progressive Growing of GANs for Improved Quality, Stability, and Variation</a>. All three papers are from the same authors from <a href=\"https://twitter.com/NVIDIAAI\">NVIDIA AI</a>.</p>\n": "<p><strong>\u3053\u308c\u306f\u3001<a href=\"https://papers.labml.ai/paper/1912.04958\">StyleGan 2\u3092\u7d39\u4ecb\u3059\u308b\u8ad6\u6587\u300c\u30b9\u30bf\u30a4\u30eb\u30ac\u30f3\u306e\u753b\u8cea\u306e\u5206\u6790\u3068\u6539\u5584\u300d<a href=\"https://pytorch.org\">\u3092PyTorch\u3067\u5b9f\u88c5\u3057\u305f\u3082\u306e\u3067\u3059</a></a>\u3002</strong>StyleGan 2\u306f\u3001\u8ad6\u6587\u300c<strong><a href=\"https://papers.labml.ai/paper/1812.04948\">\u6575\u5bfe\u7684\u751f\u6210\u30cd\u30c3\u30c8\u30ef\u30fc\u30af\u306e\u305f\u3081\u306e\u30b9\u30bf\u30a4\u30eb\u30d9\u30fc\u30b9\u306e\u30b8\u30a7\u30cd\u30ec\u30fc\u30bf\u30fc\u30a2\u30fc\u30ad\u30c6\u30af\u30c1\u30e3\u300d\u306eStyleGAN\u3092\u6539\u826f\u3057\u305f\u3082\u306e\u3067\u3059</a></strong>\u3002\u307e\u305f\u3001StyleGan\u306f\u8ad6\u6587\u300c<strong>GAN\u306e\u6f38\u9032\u7684\u6210\u9577\u306b\u3088\u308b\u54c1\u8cea</strong><a href=\"https://papers.labml.ai/paper/1710.10196\">\u3001\u5b89\u5b9a\u6027\u3001\u30d0\u30ea\u30a8\u30fc\u30b7\u30e7\u30f3\u306e\u5411\u4e0a\u300d\u306e\u30d7\u30ed\u30b0\u30ec\u30c3\u30b7\u30d6GAN\u3092\u30d9\u30fc\u30b9\u306b\u3057\u3066\u3044\u307e\u3059</a>\u30023 \u3064\u306e\u8ad6\u6587\u306f\u3059\u3079\u3066 <a href=\"https://twitter.com/NVIDIAAI\">NVIDIA</a> AI \u306e\u540c\u3058\u8457\u8005\u306b\u3088\u308b\u3082\u306e\u3067\u3059</p>\u3002\n", "<p>This is a <a href=\"https://pytorch.org\">PyTorch</a> implementation of the paper <a href=\"https://arxiv.org/abs/1912.04958\">Analyzing and Improving the Image Quality of StyleGAN</a> which introduces <strong>StyleGAN 2</strong>. StyleGAN 2 is an improvement over <strong>StyleGAN</strong> from the paper <a href=\"https://arxiv.org/abs/1812.04948\">A Style-Based Generator Architecture for Generative Adversarial Networks</a>. And StyleGAN is based on <strong>Progressive GAN</strong> from the paper <a href=\"https://arxiv.org/abs/1710.10196\">Progressive Growing of GANs for Improved Quality, Stability, and Variation</a>. 
All three papers are from the same authors from <a href=\"https://twitter.com/NVIDIAAI\">NVIDIA AI</a>.</p>\n": "<p><strong>\u3053\u308c\u306f\u3001<a href=\"https://arxiv.org/abs/1912.04958\">StyleGan 2\u3092\u7d39\u4ecb\u3059\u308b\u8ad6\u6587\u300c\u30b9\u30bf\u30a4\u30eb\u30ac\u30f3\u306e\u753b\u8cea\u306e\u5206\u6790\u3068\u6539\u5584\u300d<a href=\"https://pytorch.org\">\u3092PyTorch\u3067\u5b9f\u88c5\u3057\u305f\u3082\u306e\u3067\u3059</a></a>\u3002</strong>StyleGan 2\u306f\u3001\u8ad6\u6587\u300c<strong><a href=\"https://arxiv.org/abs/1812.04948\">\u6575\u5bfe\u7684\u751f\u6210\u30cd\u30c3\u30c8\u30ef\u30fc\u30af\u306e\u305f\u3081\u306e\u30b9\u30bf\u30a4\u30eb\u30d9\u30fc\u30b9\u306e\u30b8\u30a7\u30cd\u30ec\u30fc\u30bf\u30fc\u30a2\u30fc\u30ad\u30c6\u30af\u30c1\u30e3\u300d\u306eStyleGAN\u3092\u6539\u826f\u3057\u305f\u3082\u306e\u3067\u3059</a></strong>\u3002\u307e\u305f\u3001StyleGan\u306f\u8ad6\u6587\u300c<strong>GAN\u306e\u6f38\u9032\u7684\u6210\u9577\u306b\u3088\u308b\u54c1\u8cea</strong><a href=\"https://arxiv.org/abs/1710.10196\">\u3001\u5b89\u5b9a\u6027\u3001\u30d0\u30ea\u30a8\u30fc\u30b7\u30e7\u30f3\u306e\u5411\u4e0a\u300d\u306e\u30d7\u30ed\u30b0\u30ec\u30c3\u30b7\u30d6GAN\u3092\u30d9\u30fc\u30b9\u306b\u3057\u3066\u3044\u307e\u3059</a>\u30023 \u3064\u306e\u8ad6\u6587\u306f\u3059\u3079\u3066 <a href=\"https://twitter.com/NVIDIAAI\">NVIDIA</a> AI \u306e\u540c\u3058\u8457\u8005\u306b\u3088\u308b\u3082\u306e\u3067\u3059</p>\u3002\n",
"<p>To prevent the generator from assuming adjacent styles are correlated, they randomly use different styles for different blocks. That is, they sample two latent vectors <span translate=no>_^_0_^_</span> and corresponding <span translate=no>_^_1_^_</span> and use <span translate=no>_^_2_^_</span> based styles for some blocks and <span translate=no>_^_3_^_</span> based styles for some blacks randomly.</p>\n": "<p>\u30b8\u30a7\u30cd\u30ec\u30fc\u30bf\u304c\u96a3\u63a5\u3059\u308b\u30b9\u30bf\u30a4\u30eb\u304c\u76f8\u4e92\u306b\u95a2\u9023\u3057\u3066\u3044\u308b\u3068\u898b\u306a\u3055\u306a\u3044\u3088\u3046\u306b\u3001\u30b8\u30a7\u30cd\u30ec\u30fc\u30bf\u306f\u30d6\u30ed\u30c3\u30af\u3054\u3068\u306b\u7570\u306a\u308b\u30b9\u30bf\u30a4\u30eb\u3092\u30e9\u30f3\u30c0\u30e0\u306b\u4f7f\u7528\u3057\u307e\u3059\u3002\u3064\u307e\u308a\u3001<span translate=no>_^_0_^_</span> <span translate=no>_^_1_^_</span> 2\u3064\u306e\u6f5c\u5728\u30d9\u30af\u30c8\u30eb\u3068\u305d\u308c\u306b\u5bfe\u5fdc\u3059\u308b\u3082\u306e\u3092\u30b5\u30f3\u30d7\u30ea\u30f3\u30b0\u3057\u3001<span translate=no>_^_2_^_</span>\u4e00\u90e8\u306e\u30d6\u30ed\u30c3\u30af\u306b\u306f\u30d9\u30fc\u30b9\u30b9\u30bf\u30a4\u30eb\u3092\u4f7f\u7528\u3057\u3001<span translate=no>_^_3_^_</span>\u4e00\u90e8\u306e\u9ed2\u4eba\u306b\u306f\u30d9\u30fc\u30b9\u30b9\u30bf\u30a4\u30eb\u3092\u30e9\u30f3\u30c0\u30e0\u306b\u4f7f\u7528\u3057\u307e\u3059</p>\u3002\n", "<p>To prevent the generator from assuming adjacent styles are correlated, they randomly use different styles for different blocks. That is, they sample two latent vectors <span translate=no>_^_0_^_</span> and corresponding <span translate=no>_^_1_^_</span> and use <span translate=no>_^_2_^_</span> based styles for some blocks and <span translate=no>_^_3_^_</span> based styles for some blacks randomly.</p>\n": "<p>\u30b8\u30a7\u30cd\u30ec\u30fc\u30bf\u304c\u96a3\u63a5\u3059\u308b\u30b9\u30bf\u30a4\u30eb\u304c\u76f8\u4e92\u306b\u95a2\u9023\u3057\u3066\u3044\u308b\u3068\u898b\u306a\u3055\u306a\u3044\u3088\u3046\u306b\u3001\u30b8\u30a7\u30cd\u30ec\u30fc\u30bf\u306f\u30d6\u30ed\u30c3\u30af\u3054\u3068\u306b\u7570\u306a\u308b\u30b9\u30bf\u30a4\u30eb\u3092\u30e9\u30f3\u30c0\u30e0\u306b\u4f7f\u7528\u3057\u307e\u3059\u3002\u3064\u307e\u308a\u3001<span translate=no>_^_0_^_</span> <span translate=no>_^_1_^_</span> 2\u3064\u306e\u6f5c\u5728\u30d9\u30af\u30c8\u30eb\u3068\u305d\u308c\u306b\u5bfe\u5fdc\u3059\u308b\u3082\u306e\u3092\u30b5\u30f3\u30d7\u30ea\u30f3\u30b0\u3057\u3001<span translate=no>_^_2_^_</span>\u4e00\u90e8\u306e\u30d6\u30ed\u30c3\u30af\u306b\u306f\u30d9\u30fc\u30b9\u30b9\u30bf\u30a4\u30eb\u3092\u4f7f\u7528\u3057\u3001<span translate=no>_^_3_^_</span>\u4e00\u90e8\u306e\u9ed2\u4eba\u306b\u306f\u30d9\u30fc\u30b9\u30b9\u30bf\u30a4\u30eb\u3092\u30e9\u30f3\u30c0\u30e0\u306b\u4f7f\u7528\u3057\u307e\u3059</p>\u3002\n",
"<p>Trainable <span translate=no>_^_0_^_</span> constant </p>\n": "<p>\u30c8\u30ec\u30fc\u30cb\u30f3\u30b0\u53ef\u80fd\u306a\u5b9a\u6570 <span translate=no>_^_0_^_</span></p>\n", "<p>Trainable <span translate=no>_^_0_^_</span> constant </p>\n": "<p>\u30c8\u30ec\u30fc\u30cb\u30f3\u30b0\u53ef\u80fd\u306a\u5b9a\u6570 <span translate=no>_^_0_^_</span></p>\n",
"<p>Try to normalize the image (this is totally optional, but sped up the early training a little) </p>\n": "<p>\u753b\u50cf\u3092\u6b63\u898f\u5316\u3057\u3066\u307f\u3066\u304f\u3060\u3055\u3044\uff08\u3053\u308c\u306f\u5b8c\u5168\u306b\u30aa\u30d7\u30b7\u30e7\u30f3\u3067\u3059\u304c\u3001\u521d\u671f\u306e\u30c8\u30ec\u30fc\u30cb\u30f3\u30b0\u3092\u5c11\u3057\u30b9\u30d4\u30fc\u30c9\u30a2\u30c3\u30d7\u3067\u304d\u307e\u3059\uff09</p>\n", "<p>Try to normalize the image (this is totally optional, but sped up the early training a little) </p>\n": "<p>\u753b\u50cf\u3092\u6b63\u898f\u5316\u3057\u3066\u307f\u3066\u304f\u3060\u3055\u3044\uff08\u3053\u308c\u306f\u5b8c\u5168\u306b\u30aa\u30d7\u30b7\u30e7\u30f3\u3067\u3059\u304c\u3001\u521d\u671f\u306e\u30c8\u30ec\u30fc\u30cb\u30f3\u30b0\u3092\u5c11\u3057\u30b9\u30d4\u30fc\u30c9\u30a2\u30c3\u30d7\u3067\u304d\u307e\u3059\uff09</p>\n",

View File

@@ -15,20 +15,20 @@
"<h4>Weight Modulation and Demodulation</h4>\n": "<h4>\u0db6\u0dbb\u0db8\u0ddc\u0da9\u0dd2\u0dba\u0dd4\u0dbd\u0dda\u0dc2\u0db1\u0dca \u0dc3\u0dc4 \u0da9\u0dd2\u0db8\u0ddc\u0da9\u0dd2\u0dba\u0dd4\u0dbd\u0dda\u0dc2\u0db1\u0dca</h4>\n", "<h4>Weight Modulation and Demodulation</h4>\n": "<h4>\u0db6\u0dbb\u0db8\u0ddc\u0da9\u0dd2\u0dba\u0dd4\u0dbd\u0dda\u0dc2\u0db1\u0dca \u0dc3\u0dc4 \u0da9\u0dd2\u0db8\u0ddc\u0da9\u0dd2\u0dba\u0dd4\u0dbd\u0dda\u0dc2\u0db1\u0dca</h4>\n",
"<p> <a id=\"discriminator\"></a></p>\n<h2>StyleGAN 2 Discriminator</h2>\n<p><span translate=no>_^_0_^_</span></p>\n<p>Discriminator first transforms the image to a feature map of the same resolution and then runs it through a series of blocks with residual connections. The resolution is down-sampled by <span translate=no>_^_1_^_</span> at each block while doubling the number of features.</p>\n": "<p> <a id=\"discriminator\"></a></p>\n<h2>StyleGan2 \u0dc0\u0dd9\u0db1\u0dc3\u0dca\u0d9a\u0db8\u0dca \u0d9a\u0dbb\u0db1\u0dca\u0db1\u0dcf</h2>\n<p><span translate=no>_^_0_^_</span></p>\n<p>\u0dc0\u0dd9\u0db1\u0dc3\u0dca\u0d9a\u0db8\u0dca\u0d9a\u0dbb\u0db1\u0dca\u0db1\u0dcf \u0db4\u0dc5\u0db8\u0dd4\u0dc0 \u0dbb\u0dd6\u0db4\u0dba \u0d91\u0d9a\u0db8 \u0dc0\u0dd2\u0db7\u0dda\u0daf\u0db1\u0dba\u0dda \u0dc0\u0dd2\u0dc1\u0dda\u0dc2\u0dcf\u0d82\u0d9c \u0dc3\u0dd2\u0dad\u0dd2\u0dba\u0db8\u0d9a\u0da7 \u0db4\u0dbb\u0dd2\u0dc0\u0dbb\u0dca\u0dad\u0db1\u0dba \u0d9a\u0dbb \u0db4\u0dc3\u0dd4\u0dc0 \u0d85\u0dc0\u0dc1\u0dda\u0dc2 \u0dc3\u0db8\u0dca\u0db6\u0db1\u0dca\u0db0\u0dad\u0dcf \u0dc3\u0dc4\u0dd2\u0dad \u0d9a\u0dd4\u0da7\u0dca\u0da7\u0dd2 \u0db8\u0dcf\u0dbd\u0dcf\u0dc0\u0d9a\u0dca \u0dc4\u0dbb\u0dc4\u0dcf \u0d91\u0dba \u0db0\u0dcf\u0dc0\u0db1\u0dba \u0d9a\u0dbb\u0dba\u0dd2. \u0dc0\u0dd2\u0dc1\u0dda\u0dc2\u0dcf\u0d82\u0d9c \u0d9c\u0dab\u0db1 \u0daf\u0dd9\u0d9c\u0dd4\u0dab \u0d9a\u0dbb\u0db1 \u0d85\u0dad\u0dbb \u0dc0\u0dd2\u0db7\u0dda\u0daf\u0db1\u0dba \u0d91\u0d9a\u0dca \u0d91\u0d9a\u0dca \u0db6\u0dca\u0dbd\u0ddc\u0d9a\u0dca <span translate=no>_^_1_^_</span> \u0d91\u0d9a\u0dda \u0db4\u0dc4\u0dc5-\u0db1\u0dd2\u0dba\u0dd0\u0daf\u0dd2 \u0d9a\u0dbb \u0d87\u0dad. </p>\n", "<p> <a id=\"discriminator\"></a></p>\n<h2>StyleGAN 2 Discriminator</h2>\n<p><span translate=no>_^_0_^_</span></p>\n<p>Discriminator first transforms the image to a feature map of the same resolution and then runs it through a series of blocks with residual connections. The resolution is down-sampled by <span translate=no>_^_1_^_</span> at each block while doubling the number of features.</p>\n": "<p> <a id=\"discriminator\"></a></p>\n<h2>StyleGan2 \u0dc0\u0dd9\u0db1\u0dc3\u0dca\u0d9a\u0db8\u0dca \u0d9a\u0dbb\u0db1\u0dca\u0db1\u0dcf</h2>\n<p><span translate=no>_^_0_^_</span></p>\n<p>\u0dc0\u0dd9\u0db1\u0dc3\u0dca\u0d9a\u0db8\u0dca\u0d9a\u0dbb\u0db1\u0dca\u0db1\u0dcf \u0db4\u0dc5\u0db8\u0dd4\u0dc0 \u0dbb\u0dd6\u0db4\u0dba \u0d91\u0d9a\u0db8 \u0dc0\u0dd2\u0db7\u0dda\u0daf\u0db1\u0dba\u0dda \u0dc0\u0dd2\u0dc1\u0dda\u0dc2\u0dcf\u0d82\u0d9c \u0dc3\u0dd2\u0dad\u0dd2\u0dba\u0db8\u0d9a\u0da7 \u0db4\u0dbb\u0dd2\u0dc0\u0dbb\u0dca\u0dad\u0db1\u0dba \u0d9a\u0dbb \u0db4\u0dc3\u0dd4\u0dc0 \u0d85\u0dc0\u0dc1\u0dda\u0dc2 \u0dc3\u0db8\u0dca\u0db6\u0db1\u0dca\u0db0\u0dad\u0dcf \u0dc3\u0dc4\u0dd2\u0dad \u0d9a\u0dd4\u0da7\u0dca\u0da7\u0dd2 \u0db8\u0dcf\u0dbd\u0dcf\u0dc0\u0d9a\u0dca \u0dc4\u0dbb\u0dc4\u0dcf \u0d91\u0dba \u0db0\u0dcf\u0dc0\u0db1\u0dba \u0d9a\u0dbb\u0dba\u0dd2. \u0dc0\u0dd2\u0dc1\u0dda\u0dc2\u0dcf\u0d82\u0d9c \u0d9c\u0dab\u0db1 \u0daf\u0dd9\u0d9c\u0dd4\u0dab \u0d9a\u0dbb\u0db1 \u0d85\u0dad\u0dbb \u0dc0\u0dd2\u0db7\u0dda\u0daf\u0db1\u0dba \u0d91\u0d9a\u0dca \u0d91\u0d9a\u0dca \u0db6\u0dca\u0dbd\u0ddc\u0d9a\u0dca <span translate=no>_^_1_^_</span> \u0d91\u0d9a\u0dda \u0db4\u0dc4\u0dc5-\u0db1\u0dd2\u0dba\u0dd0\u0daf\u0dd2 \u0d9a\u0dbb \u0d87\u0dad. </p>\n",
"<p> <a id=\"discriminator_black\"></a></p>\n<h3>Discriminator Block</h3>\n<p><span translate=no>_^_0_^_</span></p>\n<p>Discriminator block consists of two <span translate=no>_^_1_^_</span> convolutions with a residual connection.</p>\n": "<p> <a id=\"discriminator_black\"></a></p>\n<h3>\u0dc0\u0dd9\u0db1\u0dc3\u0dca\u0d9a\u0db8\u0dca\u0d9a\u0dbb\u0db1\u0dca\u0db1\u0dcf \u0db6\u0dca\u0dbd\u0ddc\u0d9a\u0dca</h3>\n<p><span translate=no>_^_0_^_</span></p>\n<p>Disturistator\u0db6\u0dca\u0dbd\u0ddc\u0d9a\u0dca \u0d85\u0dc0\u0dc1\u0dda\u0dc2 \u0dc3\u0db8\u0dca\u0db6\u0db1\u0dca\u0db0\u0dad\u0dcf\u0dc0\u0dba\u0d9a\u0dca \u0dc3\u0dc4\u0dd2\u0dad <span translate=no>_^_1_^_</span> \u0dc0\u0dca\u0dba\u0dcf\u0d82\u0da2\u0db1 \u0daf\u0dd9\u0d9a\u0d9a\u0dd2\u0db1\u0dca \u0dc3\u0db8\u0db1\u0dca\u0dc0\u0dd2\u0dad \u0dc0\u0dda. </p>\n", "<p> <a id=\"discriminator_black\"></a></p>\n<h3>Discriminator Block</h3>\n<p><span translate=no>_^_0_^_</span></p>\n<p>Discriminator block consists of two <span translate=no>_^_1_^_</span> convolutions with a residual connection.</p>\n": "<p> <a id=\"discriminator_black\"></a></p>\n<h3>\u0dc0\u0dd9\u0db1\u0dc3\u0dca\u0d9a\u0db8\u0dca\u0d9a\u0dbb\u0db1\u0dca\u0db1\u0dcf \u0db6\u0dca\u0dbd\u0ddc\u0d9a\u0dca</h3>\n<p><span translate=no>_^_0_^_</span></p>\n<p>Disturistator\u0db6\u0dca\u0dbd\u0ddc\u0d9a\u0dca \u0d85\u0dc0\u0dc1\u0dda\u0dc2 \u0dc3\u0db8\u0dca\u0db6\u0db1\u0dca\u0db0\u0dad\u0dcf\u0dc0\u0dba\u0d9a\u0dca \u0dc3\u0dc4\u0dd2\u0dad <span translate=no>_^_1_^_</span> \u0dc0\u0dca\u0dba\u0dcf\u0d82\u0da2\u0db1 \u0daf\u0dd9\u0d9a\u0d9a\u0dd2\u0db1\u0dca \u0dc3\u0db8\u0db1\u0dca\u0dc0\u0dd2\u0dad \u0dc0\u0dda. </p>\n",
"<p> <a id=\"down_sample\"></a></p>\n<h3>Down-sample</h3>\n<p>The down-sample operation <a href=\"#smooth\">smoothens</a> each feature channel and scale <span translate=no>_^_0_^_</span> using bilinear interpolation. This is based on the paper <a href=\"https://papers.labml.ai/paper/1904.11486\">Making Convolutional Networks Shift-Invariant Again</a>.</p>\n": "<p> <a id=\"down_sample\"></a></p>\n<h3>\u0db4\u0dc4\u0dc5-\u0db1\u0dd2\u0dba\u0dd0\u0daf\u0dd2\u0dba</h3>\n<p>\u0db4\u0dc4\u0dc5-\u0db1\u0dd2\u0dba\u0dd0\u0daf\u0dd2\u0db8\u0dd9\u0dc4\u0dd9\u0dba\u0dd4\u0db8 \u0daf\u0dca\u0dc0\u0dd2\u0dbd\u0dd3\u0db1 <a href=\"#smooth\">\u0d85\u0db1\u0dca\u0dad\u0dbb\u0dca\u0db1\u0dd2\u0dc0\u0dda\u0dc2\u0dab\u0dba <span translate=no>_^_0_^_</span> \u0db7\u0dcf\u0dc0\u0dd2\u0dad\u0dcf \u0d9a\u0dbb\u0db8\u0dd2\u0db1\u0dca \u0d91\u0d9a\u0dca \u0d91\u0d9a\u0dca \u0dc0\u0dd2\u0dc1\u0dda\u0dc2\u0dcf\u0d82\u0d9c \u0db1\u0dcf\u0dbd\u0dd2\u0d9a\u0dcf\u0dc0 \u0dc3\u0dc4 \u0db4\u0dbb\u0dd2\u0db8\u0dcf\u0dab\u0dba \u0dc3\u0dd4\u0db8\u0da7\u0db1\u0dba</a> \u0d9a\u0dbb\u0dba\u0dd2. \u0db8\u0dd9\u0dba \u0db4\u0daf\u0db1\u0db8\u0dca \u0dc0\u0dd3 \u0d87\u0dad\u0dca\u0dad\u0dda <a href=\"https://papers.labml.ai/paper/1904.11486\">\u0d9a\u0da9\u0daf\u0dcf\u0dc3\u0dd2 \u0db8\u0dad \u0dba Convolutional Networks Shift-Invariant \u0db1\u0dd0\u0dc0\u0dad\u0dad\u0dca</a>. </p>\n", "<p> <a id=\"down_sample\"></a></p>\n<h3>Down-sample</h3>\n<p>The down-sample operation <a href=\"#smooth\">smoothens</a> each feature channel and scale <span translate=no>_^_0_^_</span> using bilinear interpolation. This is based on the paper <a href=\"https://arxiv.org/abs/1904.11486\">Making Convolutional Networks Shift-Invariant Again</a>.</p>\n": "<p> <a id=\"down_sample\"></a></p>\n<h3>\u0db4\u0dc4\u0dc5-\u0db1\u0dd2\u0dba\u0dd0\u0daf\u0dd2\u0dba</h3>\n<p>\u0db4\u0dc4\u0dc5-\u0db1\u0dd2\u0dba\u0dd0\u0daf\u0dd2\u0db8\u0dd9\u0dc4\u0dd9\u0dba\u0dd4\u0db8 \u0daf\u0dca\u0dc0\u0dd2\u0dbd\u0dd3\u0db1 <a href=\"#smooth\">\u0d85\u0db1\u0dca\u0dad\u0dbb\u0dca\u0db1\u0dd2\u0dc0\u0dda\u0dc2\u0dab\u0dba <span translate=no>_^_0_^_</span> \u0db7\u0dcf\u0dc0\u0dd2\u0dad\u0dcf \u0d9a\u0dbb\u0db8\u0dd2\u0db1\u0dca \u0d91\u0d9a\u0dca \u0d91\u0d9a\u0dca \u0dc0\u0dd2\u0dc1\u0dda\u0dc2\u0dcf\u0d82\u0d9c \u0db1\u0dcf\u0dbd\u0dd2\u0d9a\u0dcf\u0dc0 \u0dc3\u0dc4 \u0db4\u0dbb\u0dd2\u0db8\u0dcf\u0dab\u0dba \u0dc3\u0dd4\u0db8\u0da7\u0db1\u0dba</a> \u0d9a\u0dbb\u0dba\u0dd2. \u0db8\u0dd9\u0dba \u0db4\u0daf\u0db1\u0db8\u0dca \u0dc0\u0dd3 \u0d87\u0dad\u0dca\u0dad\u0dda <a href=\"https://arxiv.org/abs/1904.11486\">\u0d9a\u0da9\u0daf\u0dcf\u0dc3\u0dd2 \u0db8\u0dad \u0dba Convolutional Networks Shift-Invariant \u0db1\u0dd0\u0dc0\u0dad\u0dad\u0dca</a>. </p>\n",
"<p> <a id=\"equalized_conv2d\"></a></p>\n<h2>Learning-rate Equalized 2D Convolution Layer</h2>\n<p>This uses <a href=\"#equalized_weights\">learning-rate equalized weights</a> for a convolution layer.</p>\n": "<p> <a id=\"equalized_conv2d\"></a></p>\n<h2>\u0d89\u0d9c\u0dd9\u0db1\u0dd4\u0db8\u0dca-\u0d85\u0db1\u0dd4\u0db4\u0dcf\u0dad\u0dc3\u0db8\u0dcf\u0db1 2D \u0dc3\u0db8\u0dca\u0db8\u0dd4\u0dad\u0dd2\u0dba \u0dc3\u0dca\u0dae\u0dbb\u0dba</h2>\n<p>\u0db8\u0dd9\u0dba\u0d9a\u0dd0\u0da7\u0dd2 \u0d9c\u0dd0\u0dc3\u0dd4\u0dab\u0dd4 \u0dc3\u0dca\u0dae\u0dbb\u0dba\u0d9a\u0dca \u0dc3\u0db3\u0dc4\u0dcf <a href=\"#equalized_weights\">\u0d89\u0d9c\u0dd9\u0db1\u0dd3\u0db8-\u0d85\u0db1\u0dd4\u0db4\u0dcf\u0dad \u0dc3\u0db8\u0dcf\u0db1 \u0db6\u0dbb</a> \u0db7\u0dcf\u0dc0\u0dd2\u0dad\u0dcf \u0d9a\u0dbb\u0dba\u0dd2. </p>\n", "<p> <a id=\"equalized_conv2d\"></a></p>\n<h2>Learning-rate Equalized 2D Convolution Layer</h2>\n<p>This uses <a href=\"#equalized_weights\">learning-rate equalized weights</a> for a convolution layer.</p>\n": "<p> <a id=\"equalized_conv2d\"></a></p>\n<h2>\u0d89\u0d9c\u0dd9\u0db1\u0dd4\u0db8\u0dca-\u0d85\u0db1\u0dd4\u0db4\u0dcf\u0dad\u0dc3\u0db8\u0dcf\u0db1 2D \u0dc3\u0db8\u0dca\u0db8\u0dd4\u0dad\u0dd2\u0dba \u0dc3\u0dca\u0dae\u0dbb\u0dba</h2>\n<p>\u0db8\u0dd9\u0dba\u0d9a\u0dd0\u0da7\u0dd2 \u0d9c\u0dd0\u0dc3\u0dd4\u0dab\u0dd4 \u0dc3\u0dca\u0dae\u0dbb\u0dba\u0d9a\u0dca \u0dc3\u0db3\u0dc4\u0dcf <a href=\"#equalized_weights\">\u0d89\u0d9c\u0dd9\u0db1\u0dd3\u0db8-\u0d85\u0db1\u0dd4\u0db4\u0dcf\u0dad \u0dc3\u0db8\u0dcf\u0db1 \u0db6\u0dbb</a> \u0db7\u0dcf\u0dc0\u0dd2\u0dad\u0dcf \u0d9a\u0dbb\u0dba\u0dd2. </p>\n",
"<p> <a id=\"equalized_linear\"></a></p>\n<h2>Learning-rate Equalized Linear Layer</h2>\n<p>This uses <a href=\"#equalized_weights\">learning-rate equalized weights</a> for a linear layer.</p>\n": "<p> <a id=\"equalized_linear\"></a></p>\n<h2>\u0d89\u0d9c\u0dd9\u0db1\u0dd3\u0db8-\u0d85\u0db1\u0dd4\u0db4\u0dcf\u0dad\u0dc3\u0db8\u0dcf\u0db1 \u0dbb\u0dda\u0d9b\u0dd3\u0dba \u0dc3\u0dca\u0dae\u0dbb\u0dba</h2>\n<p>\u0db8\u0dd9\u0dba\u0dbb\u0dda\u0d9b\u0dd3\u0dba \u0dc3\u0dca\u0dad\u0dbb\u0dba\u0d9a\u0dca \u0dc3\u0db3\u0dc4\u0dcf <a href=\"#equalized_weights\">\u0d89\u0d9c\u0dd9\u0db1\u0dd3\u0db8\u0dda \u0d85\u0db1\u0dd4\u0db4\u0dcf\u0dad \u0dc3\u0db8\u0dcf\u0db1 \u0db6\u0dbb</a> \u0db7\u0dcf\u0dc0\u0dd2\u0dad\u0dcf \u0d9a\u0dbb\u0dba\u0dd2. </p>\n", "<p> <a id=\"equalized_linear\"></a></p>\n<h2>Learning-rate Equalized Linear Layer</h2>\n<p>This uses <a href=\"#equalized_weights\">learning-rate equalized weights</a> for a linear layer.</p>\n": "<p> <a id=\"equalized_linear\"></a></p>\n<h2>\u0d89\u0d9c\u0dd9\u0db1\u0dd3\u0db8-\u0d85\u0db1\u0dd4\u0db4\u0dcf\u0dad\u0dc3\u0db8\u0dcf\u0db1 \u0dbb\u0dda\u0d9b\u0dd3\u0dba \u0dc3\u0dca\u0dae\u0dbb\u0dba</h2>\n<p>\u0db8\u0dd9\u0dba\u0dbb\u0dda\u0d9b\u0dd3\u0dba \u0dc3\u0dca\u0dad\u0dbb\u0dba\u0d9a\u0dca \u0dc3\u0db3\u0dc4\u0dcf <a href=\"#equalized_weights\">\u0d89\u0d9c\u0dd9\u0db1\u0dd3\u0db8\u0dda \u0d85\u0db1\u0dd4\u0db4\u0dcf\u0dad \u0dc3\u0db8\u0dcf\u0db1 \u0db6\u0dbb</a> \u0db7\u0dcf\u0dc0\u0dd2\u0dad\u0dcf \u0d9a\u0dbb\u0dba\u0dd2. </p>\n",
"<p> <a id=\"equalized_weight\"></a></p>\n<h2>Learning-rate Equalized Weights Parameter</h2>\n<p>This is based on equalized learning rate introduced in the Progressive GAN paper. Instead of initializing weights at <span translate=no>_^_0_^_</span> they initialize weights to <span translate=no>_^_1_^_</span> and then multiply them by <span translate=no>_^_2_^_</span> when using it. <span translate=no>_^_3_^_</span></p>\n<p>The gradients on stored parameters <span translate=no>_^_4_^_</span> get multiplied by <span translate=no>_^_5_^_</span> but this doesn&#x27;t have an affect since optimizers such as Adam normalize them by a running mean of the squared gradients.</p>\n<p>The optimizer updates on <span translate=no>_^_6_^_</span> are proportionate to the learning rate <span translate=no>_^_7_^_</span>. But the effective weights <span translate=no>_^_8_^_</span> get updated proportionately to <span translate=no>_^_9_^_</span>. Without equalized learning rate, the effective weights will get updated proportionately to just <span translate=no>_^_10_^_</span>.</p>\n<p>So we are effectively scaling the learning rate by <span translate=no>_^_11_^_</span> for these weight parameters.</p>\n": "<p> <a id=\"equalized_weight\"></a></p>\n<h2>\u0d89\u0d9c\u0dd9\u0db1\u0dd3\u0db8-\u0d85\u0db1\u0dd4\u0db4\u0dcf\u0dad\u0dba\u0dc3\u0db8\u0dcf\u0db1 \u0db6\u0dbb \u0db4\u0dbb\u0dcf\u0db8\u0dd2\u0dad\u0dd2\u0dba</h2>\n<p>\u0db8\u0dd9\u0dba\u0db4\u0daf\u0db1\u0db8\u0dca \u0dc0\u0dd3 \u0d87\u0dad\u0dca\u0dad\u0dda \u0db4\u0dca\u0dbb\u0d9c\u0dad\u0dd2\u0dc1\u0dd3\u0dbd\u0dd3 GAN \u0d9a\u0da9\u0daf\u0dcf\u0dc3\u0dd2 \u0dc0\u0dbd \u0dc4\u0db3\u0dd4\u0db1\u0dca\u0dc0\u0dcf \u0daf\u0dd3 \u0d87\u0dad\u0dd2 \u0dc3\u0db8\u0dcf\u0db1 \u0d89\u0d9c\u0dd9\u0db1\u0dd4\u0db8\u0dca \u0d85\u0db1\u0dd4\u0db4\u0dcf\u0dad\u0dba \u0db8\u0dad \u0dba. <span translate=no>_^_0_^_</span> \u0d94\u0dc0\u0dd4\u0db1\u0dca \u0db6\u0dbb \u0d86\u0dbb\u0db8\u0dca\u0db7 \u0d9a\u0dbb\u0db1\u0dc0\u0dcf \u0dc0\u0dd9\u0db1\u0dd4\u0dc0\u0da7 \u0db6\u0dbb \u0d86\u0dbb\u0db8\u0dca\u0db7 \u0d9a\u0dbb <span translate=no>_^_1_^_</span> \u0d91\u0dba \u0db7\u0dcf\u0dc0\u0dd2\u0dad\u0dcf \u0d9a\u0dbb\u0db1 <span translate=no>_^_2_^_</span> \u0dc0\u0dd2\u0da7 \u0d92\u0dc0\u0dcf \u0d9c\u0dd4\u0dab \u0d9a\u0dbb\u0db1\u0dca\u0db1. <span translate=no>_^_3_^_</span></p>\n<p>\u0d9c\u0db6\u0da9\u0dcf\u0d9a\u0dbb\u0db1 \u0dbd\u0daf <span translate=no>_^_4_^_</span> \u0db4\u0dbb\u0dcf\u0db8\u0dd2\u0dad\u0dd3\u0db1\u0dca\u0dc4\u0dd2 \u0d85\u0db1\u0dd4\u0d9a\u0dca\u0dbb\u0db8\u0dd2\u0d9a \u0d9c\u0dd4\u0dab \u0d9a\u0dbb\u0db1 <span translate=no>_^_5_^_</span> \u0db1\u0db8\u0dd4\u0dad\u0dca \u0d86\u0daf\u0db8\u0dca \u0dc0\u0dd0\u0db1\u0dd2 \u0db4\u0dca\u0dbb\u0dc1\u0dc3\u0dca\u0dad\u0dd2\u0d9a\u0dba\u0db1\u0dca \u0dc0\u0dbb\u0dca\u0d9c \u0d9a\u0dc5 \u0dc1\u0dca\u0dbb\u0dda\u0dab\u0dd2\u0dba\u0d9a \u0db0\u0dcf\u0dc0\u0db1 \u0db8\u0db0\u0dca\u0dba\u0db1\u0dca\u0dba\u0dba\u0d9a\u0dd2\u0db1\u0dca \u0d92\u0dc0\u0dcf \u0dc3\u0dcf\u0db8\u0dcf\u0db1\u0dca\u0dba\u0d9a\u0dbb\u0dab\u0dba \u0d9a\u0dbb\u0db1 \u0db6\u0dd0\u0dc0\u0dd2\u0db1\u0dca \u0db8\u0dd9\u0dba\u0da7 \u0db6\u0dbd\u0db4\u0dd1\u0db8\u0d9a\u0dca \u0d87\u0dad\u0dd2 \u0db1\u0ddc\u0d9a\u0dbb\u0dba\u0dd2. 
</p>\n<p>\u0db4\u0dca\u0dbb\u0dc1\u0dc3\u0dca\u0dad\u0dd2\u0d9a\u0dbb\u0dab\u0dba\u0dcf\u0dc0\u0dad\u0dca\u0d9a\u0dcf\u0dbd\u0dd3\u0db1 <span translate=no>_^_6_^_</span> \u0d9a\u0dd2\u0dbb\u0dd3\u0db8\u0dca \u0d89\u0d9c\u0dd9\u0db1\u0dd4\u0db8\u0dca \u0d85\u0db1\u0dd4\u0db4\u0dcf\u0dad\u0dba\u0da7 \u0dc3\u0db8\u0dcf\u0db1\u0dd4\u0db4\u0dcf\u0dad\u0dd2\u0d9a <span translate=no>_^_7_^_</span>\u0dc0\u0dda. \u0db1\u0db8\u0dd4\u0dad\u0dca <span translate=no>_^_8_^_</span> \u0db5\u0dbd\u0daf\u0dcf\u0dba\u0dd3 \u0db4\u0da9\u0dd2 \u0dc3\u0db8\u0dcf\u0db1\u0dd4\u0db4\u0dcf\u0dad\u0dd2\u0d9a\u0dc0 \u0dba\u0dcf\u0dc0\u0dad\u0dca\u0d9a\u0dcf\u0dbd\u0dd3\u0db1 \u0dc0\u0dda <span translate=no>_^_9_^_</span>. \u0dc3\u0db8\u0dcf\u0db1 \u0d89\u0d9c\u0dd9\u0db1\u0dd4\u0db8\u0dca \u0d85\u0db1\u0dd4\u0db4\u0dcf\u0dad\u0dba \u0dad\u0ddc\u0dbb\u0dc0, \u0db5\u0dbd\u0daf\u0dcf\u0dba\u0dd3 \u0db4\u0da9\u0dd2 \u0db4\u0db8\u0dab\u0d9a\u0dca \u0dc3\u0db8\u0dcf\u0db1\u0dd4\u0db4\u0dcf\u0dad\u0dd2\u0d9a\u0dc0 \u0dba\u0dcf\u0dc0\u0dad\u0dca\u0d9a\u0dcf\u0dbd\u0dd3\u0db1 \u0dbd\u0dd0\u0db6\u0dd9\u0db1\u0dd4 \u0d87\u0dad <span translate=no>_^_10_^_</span>. </p>\n<p>\u0d91\u0db6\u0dd0\u0dc0\u0dd2\u0db1\u0dca\u0d85\u0db4\u0dd2 \u0db8\u0dd9\u0db8 \u0db6\u0dbb \u0db4\u0dbb\u0dcf\u0db8\u0dd2\u0dad\u0dd3\u0db1\u0dca <span translate=no>_^_11_^_</span> \u0dc3\u0db3\u0dc4\u0dcf \u0d89\u0d9c\u0dd9\u0db1\u0dd4\u0db8\u0dca \u0d85\u0db1\u0dd4\u0db4\u0dcf\u0dad\u0dba effectively \u0dbd\u0daf\u0dcf\u0dba\u0dd3 \u0dbd\u0dd9\u0dc3 \u0db4\u0dbb\u0dd2\u0db8\u0dcf\u0dab\u0dba \u0d9a\u0dbb\u0db8\u0dd4. </p>\n", "<p> <a id=\"equalized_weight\"></a></p>\n<h2>Learning-rate Equalized Weights Parameter</h2>\n<p>This is based on equalized learning rate introduced in the Progressive GAN paper. Instead of initializing weights at <span translate=no>_^_0_^_</span> they initialize weights to <span translate=no>_^_1_^_</span> and then multiply them by <span translate=no>_^_2_^_</span> when using it. <span translate=no>_^_3_^_</span></p>\n<p>The gradients on stored parameters <span translate=no>_^_4_^_</span> get multiplied by <span translate=no>_^_5_^_</span> but this doesn&#x27;t have an affect since optimizers such as Adam normalize them by a running mean of the squared gradients.</p>\n<p>The optimizer updates on <span translate=no>_^_6_^_</span> are proportionate to the learning rate <span translate=no>_^_7_^_</span>. But the effective weights <span translate=no>_^_8_^_</span> get updated proportionately to <span translate=no>_^_9_^_</span>. Without equalized learning rate, the effective weights will get updated proportionately to just <span translate=no>_^_10_^_</span>.</p>\n<p>So we are effectively scaling the learning rate by <span translate=no>_^_11_^_</span> for these weight parameters.</p>\n": "<p> <a id=\"equalized_weight\"></a></p>\n<h2>\u0d89\u0d9c\u0dd9\u0db1\u0dd3\u0db8-\u0d85\u0db1\u0dd4\u0db4\u0dcf\u0dad\u0dba\u0dc3\u0db8\u0dcf\u0db1 \u0db6\u0dbb \u0db4\u0dbb\u0dcf\u0db8\u0dd2\u0dad\u0dd2\u0dba</h2>\n<p>\u0db8\u0dd9\u0dba\u0db4\u0daf\u0db1\u0db8\u0dca \u0dc0\u0dd3 \u0d87\u0dad\u0dca\u0dad\u0dda \u0db4\u0dca\u0dbb\u0d9c\u0dad\u0dd2\u0dc1\u0dd3\u0dbd\u0dd3 GAN \u0d9a\u0da9\u0daf\u0dcf\u0dc3\u0dd2 \u0dc0\u0dbd \u0dc4\u0db3\u0dd4\u0db1\u0dca\u0dc0\u0dcf \u0daf\u0dd3 \u0d87\u0dad\u0dd2 \u0dc3\u0db8\u0dcf\u0db1 \u0d89\u0d9c\u0dd9\u0db1\u0dd4\u0db8\u0dca \u0d85\u0db1\u0dd4\u0db4\u0dcf\u0dad\u0dba \u0db8\u0dad \u0dba. 
<span translate=no>_^_0_^_</span> \u0d94\u0dc0\u0dd4\u0db1\u0dca \u0db6\u0dbb \u0d86\u0dbb\u0db8\u0dca\u0db7 \u0d9a\u0dbb\u0db1\u0dc0\u0dcf \u0dc0\u0dd9\u0db1\u0dd4\u0dc0\u0da7 \u0db6\u0dbb \u0d86\u0dbb\u0db8\u0dca\u0db7 \u0d9a\u0dbb <span translate=no>_^_1_^_</span> \u0d91\u0dba \u0db7\u0dcf\u0dc0\u0dd2\u0dad\u0dcf \u0d9a\u0dbb\u0db1 <span translate=no>_^_2_^_</span> \u0dc0\u0dd2\u0da7 \u0d92\u0dc0\u0dcf \u0d9c\u0dd4\u0dab \u0d9a\u0dbb\u0db1\u0dca\u0db1. <span translate=no>_^_3_^_</span></p>\n<p>\u0d9c\u0db6\u0da9\u0dcf\u0d9a\u0dbb\u0db1 \u0dbd\u0daf <span translate=no>_^_4_^_</span> \u0db4\u0dbb\u0dcf\u0db8\u0dd2\u0dad\u0dd3\u0db1\u0dca\u0dc4\u0dd2 \u0d85\u0db1\u0dd4\u0d9a\u0dca\u0dbb\u0db8\u0dd2\u0d9a \u0d9c\u0dd4\u0dab \u0d9a\u0dbb\u0db1 <span translate=no>_^_5_^_</span> \u0db1\u0db8\u0dd4\u0dad\u0dca \u0d86\u0daf\u0db8\u0dca \u0dc0\u0dd0\u0db1\u0dd2 \u0db4\u0dca\u0dbb\u0dc1\u0dc3\u0dca\u0dad\u0dd2\u0d9a\u0dba\u0db1\u0dca \u0dc0\u0dbb\u0dca\u0d9c \u0d9a\u0dc5 \u0dc1\u0dca\u0dbb\u0dda\u0dab\u0dd2\u0dba\u0d9a \u0db0\u0dcf\u0dc0\u0db1 \u0db8\u0db0\u0dca\u0dba\u0db1\u0dca\u0dba\u0dba\u0d9a\u0dd2\u0db1\u0dca \u0d92\u0dc0\u0dcf \u0dc3\u0dcf\u0db8\u0dcf\u0db1\u0dca\u0dba\u0d9a\u0dbb\u0dab\u0dba \u0d9a\u0dbb\u0db1 \u0db6\u0dd0\u0dc0\u0dd2\u0db1\u0dca \u0db8\u0dd9\u0dba\u0da7 \u0db6\u0dbd\u0db4\u0dd1\u0db8\u0d9a\u0dca \u0d87\u0dad\u0dd2 \u0db1\u0ddc\u0d9a\u0dbb\u0dba\u0dd2. </p>\n<p>\u0db4\u0dca\u0dbb\u0dc1\u0dc3\u0dca\u0dad\u0dd2\u0d9a\u0dbb\u0dab\u0dba\u0dcf\u0dc0\u0dad\u0dca\u0d9a\u0dcf\u0dbd\u0dd3\u0db1 <span translate=no>_^_6_^_</span> \u0d9a\u0dd2\u0dbb\u0dd3\u0db8\u0dca \u0d89\u0d9c\u0dd9\u0db1\u0dd4\u0db8\u0dca \u0d85\u0db1\u0dd4\u0db4\u0dcf\u0dad\u0dba\u0da7 \u0dc3\u0db8\u0dcf\u0db1\u0dd4\u0db4\u0dcf\u0dad\u0dd2\u0d9a <span translate=no>_^_7_^_</span>\u0dc0\u0dda. \u0db1\u0db8\u0dd4\u0dad\u0dca <span translate=no>_^_8_^_</span> \u0db5\u0dbd\u0daf\u0dcf\u0dba\u0dd3 \u0db4\u0da9\u0dd2 \u0dc3\u0db8\u0dcf\u0db1\u0dd4\u0db4\u0dcf\u0dad\u0dd2\u0d9a\u0dc0 \u0dba\u0dcf\u0dc0\u0dad\u0dca\u0d9a\u0dcf\u0dbd\u0dd3\u0db1 \u0dc0\u0dda <span translate=no>_^_9_^_</span>. \u0dc3\u0db8\u0dcf\u0db1 \u0d89\u0d9c\u0dd9\u0db1\u0dd4\u0db8\u0dca \u0d85\u0db1\u0dd4\u0db4\u0dcf\u0dad\u0dba \u0dad\u0ddc\u0dbb\u0dc0, \u0db5\u0dbd\u0daf\u0dcf\u0dba\u0dd3 \u0db4\u0da9\u0dd2 \u0db4\u0db8\u0dab\u0d9a\u0dca \u0dc3\u0db8\u0dcf\u0db1\u0dd4\u0db4\u0dcf\u0dad\u0dd2\u0d9a\u0dc0 \u0dba\u0dcf\u0dc0\u0dad\u0dca\u0d9a\u0dcf\u0dbd\u0dd3\u0db1 \u0dbd\u0dd0\u0db6\u0dd9\u0db1\u0dd4 \u0d87\u0dad <span translate=no>_^_10_^_</span>. </p>\n<p>\u0d91\u0db6\u0dd0\u0dc0\u0dd2\u0db1\u0dca\u0d85\u0db4\u0dd2 \u0db8\u0dd9\u0db8 \u0db6\u0dbb \u0db4\u0dbb\u0dcf\u0db8\u0dd2\u0dad\u0dd3\u0db1\u0dca <span translate=no>_^_11_^_</span> \u0dc3\u0db3\u0dc4\u0dcf \u0d89\u0d9c\u0dd9\u0db1\u0dd4\u0db8\u0dca \u0d85\u0db1\u0dd4\u0db4\u0dcf\u0dad\u0dba effectively \u0dbd\u0daf\u0dcf\u0dba\u0dd3 \u0dbd\u0dd9\u0dc3 \u0db4\u0dbb\u0dd2\u0db8\u0dcf\u0dab\u0dba \u0d9a\u0dbb\u0db8\u0dd4. </p>\n",
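The equalized learning-rate trick described in the entry above is compact enough to sketch. The following PyTorch-style snippet is only an illustration, not the repository's code; the class name, the `shape` argument, and storing the raw weights in a plain `nn.Parameter` are assumptions:

    import math
    import torch
    import torch.nn as nn

    class EqualizedWeight(nn.Module):
        """Stores weights drawn from N(0, 1) and rescales them by c at use time."""
        def __init__(self, shape):
            super().__init__()
            # He-style constant: fan-in is the product of all dimensions except the output one
            self.c = 1 / math.sqrt(math.prod(shape[1:]))
            self.weight = nn.Parameter(torch.randn(shape))

        def forward(self):
            # Effective weight = c * w, so optimizer steps on w act on the effective
            # weight scaled by c, which is the learning-rate equalization effect
            return self.weight * self.c

    # Example: weights for a linear layer with 512 inputs and 512 outputs
    w = EqualizedWeight([512, 512])()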
"<p> <a id=\"generator\"></a></p>\n<h2>StyleGAN2 Generator</h2>\n<p><span translate=no>_^_0_^_</span></p>\n<p><small><em><span translate=no>_^_1_^_</span> denotes a linear layer. <span translate=no>_^_2_^_</span> denotes a broadcast and scaling operation (noise is a single channel). <a href=\"#to_rgb\"><span translate=no>_^_3_^_</span></a> also has a style modulation which is not shown in the diagram to keep it simple.</em></small></p>\n<p>The generator starts with a learned constant. Then it has a series of blocks. The feature map resolution is doubled at each block Each block outputs an RGB image and they are scaled up and summed to get the final RGB image.</p>\n": "<p> <a id=\"generator\"></a></p>\n<h2>StyleGan2\u0d8b\u0dad\u0dca\u0db4\u0dcf\u0daf\u0d9a \u0dba\u0db1\u0dca\u0dad\u0dca\u0dbb\u0dba</h2>\n<p><span translate=no>_^_0_^_</span></p>\n<p><small><em><span translate=no>_^_1_^_</span> \u0dbb\u0dda\u0d9b\u0dd3\u0dba \u0dc3\u0dca\u0dae\u0dbb\u0dba\u0d9a\u0dca \u0daf\u0d9a\u0dca\u0dc0\u0dba\u0dd2. <span translate=no>_^_2_^_</span> \u0dc0\u0dd2\u0d9a\u0dcf\u0dc1\u0db1 \u0dc4\u0dcf \u0db4\u0dbb\u0dd2\u0db8\u0dcf\u0dab \u0db8\u0dd9\u0dc4\u0dd9\u0dba\u0dd4\u0db8\u0d9a\u0dca \u0daf\u0d9a\u0dca\u0dc0\u0dba\u0dd2 (\u0dc1\u0db6\u0dca\u0daf\u0dba \u0dad\u0db1\u0dd2 \u0db1\u0dcf\u0dbd\u0dd2\u0d9a\u0dcf\u0dc0\u0d9a\u0dd2). <a href=\"#to_rgb\"><span translate=no>_^_3_^_</span></a> \u0dc1\u0ddb\u0dbd\u0dd3\u0dba \u0db8\u0ddc\u0da9\u0dd2\u0dba\u0dd4\u0dbd\u0dda\u0dc2\u0db1\u0dca \u0d91\u0d9a\u0d9a\u0dca \u0daf \u0d87\u0dad\u0dd2 \u0d85\u0dad\u0dbb \u0d91\u0dba \u0dc3\u0dbb\u0dbd \u0dbd\u0dd9\u0dc3 \u0dad\u0db6\u0dcf \u0d9c\u0dd0\u0db1\u0dd3\u0db8 \u0dc3\u0db3\u0dc4\u0dcf \u0dbb\u0dd6\u0db4 \u0dc3\u0da7\u0dc4\u0db1\u0dda \u0db4\u0dd9\u0db1\u0dca\u0dc0\u0dcf \u0db1\u0dd0\u0dad. </em></small></p>\n<p>\u0d8b\u0dad\u0dca\u0db4\u0dcf\u0daf\u0d9a\u0dba\u0db1\u0dca\u0dad\u0dca\u0dbb\u0dba \u0d89\u0d9c\u0dd9\u0db1 \u0d9c\u0dad\u0dca \u0db1\u0dd2\u0dba\u0dad\u0dba\u0d9a\u0dd2\u0db1\u0dca \u0d86\u0dbb\u0db8\u0dca\u0db7 \u0dc0\u0dda. \u0d91\u0dc0\u0dd2\u0da7 \u0d91\u0dba \u0d9a\u0dd4\u0da7\u0dca\u0da7\u0dd2 \u0db8\u0dcf\u0dbd\u0dcf\u0dc0\u0d9a\u0dca \u0d87\u0dad. \u0dc0\u0dd2\u0dc1\u0dda\u0dc2\u0dcf\u0d82\u0d9c \u0dc3\u0dd2\u0dad\u0dd2\u0dba\u0db8\u0dca \u0dc0\u0dd2\u0db7\u0dda\u0daf\u0db1\u0dba \u0dc3\u0dd1\u0db8 \u0db6\u0dca\u0dbd\u0ddc\u0d9a\u0dca \u0d91\u0d9a\u0d9a\u0db8 \u0daf\u0dd9\u0d9c\u0dd4\u0dab \u0dc0\u0dda \u0dc3\u0dd1\u0db8 \u0db6\u0dca\u0dbd\u0ddc\u0d9a\u0dca \u0d91\u0d9a\u0d9a\u0dca\u0db8 RGB \u0dbb\u0dd6\u0db4\u0dba\u0d9a\u0dca \u0db4\u0dca\u0dbb\u0dad\u0dd2\u0daf\u0dcf\u0db1\u0dba \u0d9a\u0dbb\u0db1 \u0d85\u0dad\u0dbb \u0d85\u0dc0\u0dc3\u0dcf\u0db1 RGB \u0dbb\u0dd6\u0db4\u0dba \u0dbd\u0db6\u0dcf \u0d9c\u0dd0\u0db1\u0dd3\u0db8 \u0dc3\u0db3\u0dc4\u0dcf \u0d92\u0dc0\u0dcf \u0db4\u0dbb\u0dd2\u0db8\u0dcf\u0dab\u0dba \u0d9a\u0dbb \u0dc3\u0dcf\u0dbb\u0dcf\u0d82\u0dc1\u0d9c\u0dad \u0d9a\u0dbb \u0d87\u0dad. </p>\n", "<p> <a id=\"generator\"></a></p>\n<h2>StyleGAN2 Generator</h2>\n<p><span translate=no>_^_0_^_</span></p>\n<p><small><em><span translate=no>_^_1_^_</span> denotes a linear layer. <span translate=no>_^_2_^_</span> denotes a broadcast and scaling operation (noise is a single channel). <a href=\"#to_rgb\"><span translate=no>_^_3_^_</span></a> also has a style modulation which is not shown in the diagram to keep it simple.</em></small></p>\n<p>The generator starts with a learned constant. Then it has a series of blocks. 
The feature map resolution is doubled at each block Each block outputs an RGB image and they are scaled up and summed to get the final RGB image.</p>\n": "<p> <a id=\"generator\"></a></p>\n<h2>StyleGan2\u0d8b\u0dad\u0dca\u0db4\u0dcf\u0daf\u0d9a \u0dba\u0db1\u0dca\u0dad\u0dca\u0dbb\u0dba</h2>\n<p><span translate=no>_^_0_^_</span></p>\n<p><small><em><span translate=no>_^_1_^_</span> \u0dbb\u0dda\u0d9b\u0dd3\u0dba \u0dc3\u0dca\u0dae\u0dbb\u0dba\u0d9a\u0dca \u0daf\u0d9a\u0dca\u0dc0\u0dba\u0dd2. <span translate=no>_^_2_^_</span> \u0dc0\u0dd2\u0d9a\u0dcf\u0dc1\u0db1 \u0dc4\u0dcf \u0db4\u0dbb\u0dd2\u0db8\u0dcf\u0dab \u0db8\u0dd9\u0dc4\u0dd9\u0dba\u0dd4\u0db8\u0d9a\u0dca \u0daf\u0d9a\u0dca\u0dc0\u0dba\u0dd2 (\u0dc1\u0db6\u0dca\u0daf\u0dba \u0dad\u0db1\u0dd2 \u0db1\u0dcf\u0dbd\u0dd2\u0d9a\u0dcf\u0dc0\u0d9a\u0dd2). <a href=\"#to_rgb\"><span translate=no>_^_3_^_</span></a> \u0dc1\u0ddb\u0dbd\u0dd3\u0dba \u0db8\u0ddc\u0da9\u0dd2\u0dba\u0dd4\u0dbd\u0dda\u0dc2\u0db1\u0dca \u0d91\u0d9a\u0d9a\u0dca \u0daf \u0d87\u0dad\u0dd2 \u0d85\u0dad\u0dbb \u0d91\u0dba \u0dc3\u0dbb\u0dbd \u0dbd\u0dd9\u0dc3 \u0dad\u0db6\u0dcf \u0d9c\u0dd0\u0db1\u0dd3\u0db8 \u0dc3\u0db3\u0dc4\u0dcf \u0dbb\u0dd6\u0db4 \u0dc3\u0da7\u0dc4\u0db1\u0dda \u0db4\u0dd9\u0db1\u0dca\u0dc0\u0dcf \u0db1\u0dd0\u0dad. </em></small></p>\n<p>\u0d8b\u0dad\u0dca\u0db4\u0dcf\u0daf\u0d9a\u0dba\u0db1\u0dca\u0dad\u0dca\u0dbb\u0dba \u0d89\u0d9c\u0dd9\u0db1 \u0d9c\u0dad\u0dca \u0db1\u0dd2\u0dba\u0dad\u0dba\u0d9a\u0dd2\u0db1\u0dca \u0d86\u0dbb\u0db8\u0dca\u0db7 \u0dc0\u0dda. \u0d91\u0dc0\u0dd2\u0da7 \u0d91\u0dba \u0d9a\u0dd4\u0da7\u0dca\u0da7\u0dd2 \u0db8\u0dcf\u0dbd\u0dcf\u0dc0\u0d9a\u0dca \u0d87\u0dad. \u0dc0\u0dd2\u0dc1\u0dda\u0dc2\u0dcf\u0d82\u0d9c \u0dc3\u0dd2\u0dad\u0dd2\u0dba\u0db8\u0dca \u0dc0\u0dd2\u0db7\u0dda\u0daf\u0db1\u0dba \u0dc3\u0dd1\u0db8 \u0db6\u0dca\u0dbd\u0ddc\u0d9a\u0dca \u0d91\u0d9a\u0d9a\u0db8 \u0daf\u0dd9\u0d9c\u0dd4\u0dab \u0dc0\u0dda \u0dc3\u0dd1\u0db8 \u0db6\u0dca\u0dbd\u0ddc\u0d9a\u0dca \u0d91\u0d9a\u0d9a\u0dca\u0db8 RGB \u0dbb\u0dd6\u0db4\u0dba\u0d9a\u0dca \u0db4\u0dca\u0dbb\u0dad\u0dd2\u0daf\u0dcf\u0db1\u0dba \u0d9a\u0dbb\u0db1 \u0d85\u0dad\u0dbb \u0d85\u0dc0\u0dc3\u0dcf\u0db1 RGB \u0dbb\u0dd6\u0db4\u0dba \u0dbd\u0db6\u0dcf \u0d9c\u0dd0\u0db1\u0dd3\u0db8 \u0dc3\u0db3\u0dc4\u0dcf \u0d92\u0dc0\u0dcf \u0db4\u0dbb\u0dd2\u0db8\u0dcf\u0dab\u0dba \u0d9a\u0dbb \u0dc3\u0dcf\u0dbb\u0dcf\u0d82\u0dc1\u0d9c\u0dad \u0d9a\u0dbb \u0d87\u0dad. </p>\n",
"<p> <a id=\"generator_block\"></a></p>\n<h3>Generator Block</h3>\n<p><span translate=no>_^_0_^_</span></p>\n<p><small><em><span translate=no>_^_1_^_</span> denotes a linear layer. <span translate=no>_^_2_^_</span> denotes a broadcast and scaling operation (noise is a single channel). <a href=\"#to_rgb\"><span translate=no>_^_3_^_</span></a> also has a style modulation which is not shown in the diagram to keep it simple.</em></small></p>\n<p>The generator block consists of two <a href=\"#style_block\">style blocks</a> (<span translate=no>_^_4_^_</span> convolutions with style modulation) and an RGB output.</p>\n": "<p> <a id=\"generator_block\"></a></p>\n<h3>\u0d8b\u0dad\u0dca\u0db4\u0dcf\u0daf\u0d9a\u0db6\u0dca\u0dbd\u0ddc\u0d9a\u0dca</h3>\n<p><span translate=no>_^_0_^_</span></p>\n<p><small><em><span translate=no>_^_1_^_</span> \u0dbb\u0dda\u0d9b\u0dd3\u0dba \u0dc3\u0dca\u0dae\u0dbb\u0dba\u0d9a\u0dca \u0daf\u0d9a\u0dca\u0dc0\u0dba\u0dd2. <span translate=no>_^_2_^_</span> \u0dc0\u0dd2\u0d9a\u0dcf\u0dc1\u0db1 \u0dc4\u0dcf \u0db4\u0dbb\u0dd2\u0db8\u0dcf\u0dab \u0db8\u0dd9\u0dc4\u0dd9\u0dba\u0dd4\u0db8\u0d9a\u0dca \u0daf\u0d9a\u0dca\u0dc0\u0dba\u0dd2 (\u0dc1\u0db6\u0dca\u0daf\u0dba \u0dad\u0db1\u0dd2 \u0db1\u0dcf\u0dbd\u0dd2\u0d9a\u0dcf\u0dc0\u0d9a\u0dd2). <a href=\"#to_rgb\"><span translate=no>_^_3_^_</span></a> \u0dc1\u0ddb\u0dbd\u0dd3\u0dba \u0db8\u0ddc\u0da9\u0dd2\u0dba\u0dd4\u0dbd\u0dda\u0dc2\u0db1\u0dca \u0d91\u0d9a\u0d9a\u0dca \u0daf \u0d87\u0dad\u0dd2 \u0d85\u0dad\u0dbb \u0d91\u0dba \u0dc3\u0dbb\u0dbd \u0dbd\u0dd9\u0dc3 \u0dad\u0db6\u0dcf \u0d9c\u0dd0\u0db1\u0dd3\u0db8 \u0dc3\u0db3\u0dc4\u0dcf \u0dbb\u0dd6\u0db4 \u0dc3\u0da7\u0dc4\u0db1\u0dda \u0db4\u0dd9\u0db1\u0dca\u0dc0\u0dcf \u0db1\u0dd0\u0dad. </em></small></p>\n<p>\u0d8b\u0dad\u0dca\u0db4\u0dcf\u0daf\u0d9a\u0d9a\u0ddc\u0da7\u0dc3 <a href=\"#style_block\">\u0dc1\u0ddb\u0dbd\u0dd3\u0dba \u0d9a\u0dd4\u0da7\u0dca\u0da7\u0dd2 \u0daf\u0dd9\u0d9a\u0d9a\u0dd2\u0db1\u0dca (\u0dc1\u0ddb\u0dbd\u0dd3\u0dba</a> \u0db8\u0ddc\u0da9\u0dd2\u0dba\u0dd4\u0dbd\u0dda\u0dc2\u0db1\u0dca \u0dc3\u0dc4\u0dd2\u0dad<span translate=no>_^_4_^_</span> \u0d9a\u0dd0\u0da7\u0dd2 \u0d9a\u0dd2\u0dbb\u0dd3\u0db8\u0dca) \u0dc3\u0dc4 RGB \u0db4\u0dca\u0dbb\u0dad\u0dd2\u0daf\u0dcf\u0db1\u0dba\u0d9a\u0dd2\u0db1\u0dca \u0dc3\u0db8\u0db1\u0dca\u0dc0\u0dd2\u0dad \u0dc0\u0dda. </p>\n", "<p> <a id=\"generator_block\"></a></p>\n<h3>Generator Block</h3>\n<p><span translate=no>_^_0_^_</span></p>\n<p><small><em><span translate=no>_^_1_^_</span> denotes a linear layer. <span translate=no>_^_2_^_</span> denotes a broadcast and scaling operation (noise is a single channel). <a href=\"#to_rgb\"><span translate=no>_^_3_^_</span></a> also has a style modulation which is not shown in the diagram to keep it simple.</em></small></p>\n<p>The generator block consists of two <a href=\"#style_block\">style blocks</a> (<span translate=no>_^_4_^_</span> convolutions with style modulation) and an RGB output.</p>\n": "<p> <a id=\"generator_block\"></a></p>\n<h3>\u0d8b\u0dad\u0dca\u0db4\u0dcf\u0daf\u0d9a\u0db6\u0dca\u0dbd\u0ddc\u0d9a\u0dca</h3>\n<p><span translate=no>_^_0_^_</span></p>\n<p><small><em><span translate=no>_^_1_^_</span> \u0dbb\u0dda\u0d9b\u0dd3\u0dba \u0dc3\u0dca\u0dae\u0dbb\u0dba\u0d9a\u0dca \u0daf\u0d9a\u0dca\u0dc0\u0dba\u0dd2. 
<span translate=no>_^_2_^_</span> \u0dc0\u0dd2\u0d9a\u0dcf\u0dc1\u0db1 \u0dc4\u0dcf \u0db4\u0dbb\u0dd2\u0db8\u0dcf\u0dab \u0db8\u0dd9\u0dc4\u0dd9\u0dba\u0dd4\u0db8\u0d9a\u0dca \u0daf\u0d9a\u0dca\u0dc0\u0dba\u0dd2 (\u0dc1\u0db6\u0dca\u0daf\u0dba \u0dad\u0db1\u0dd2 \u0db1\u0dcf\u0dbd\u0dd2\u0d9a\u0dcf\u0dc0\u0d9a\u0dd2). <a href=\"#to_rgb\"><span translate=no>_^_3_^_</span></a> \u0dc1\u0ddb\u0dbd\u0dd3\u0dba \u0db8\u0ddc\u0da9\u0dd2\u0dba\u0dd4\u0dbd\u0dda\u0dc2\u0db1\u0dca \u0d91\u0d9a\u0d9a\u0dca \u0daf \u0d87\u0dad\u0dd2 \u0d85\u0dad\u0dbb \u0d91\u0dba \u0dc3\u0dbb\u0dbd \u0dbd\u0dd9\u0dc3 \u0dad\u0db6\u0dcf \u0d9c\u0dd0\u0db1\u0dd3\u0db8 \u0dc3\u0db3\u0dc4\u0dcf \u0dbb\u0dd6\u0db4 \u0dc3\u0da7\u0dc4\u0db1\u0dda \u0db4\u0dd9\u0db1\u0dca\u0dc0\u0dcf \u0db1\u0dd0\u0dad. </em></small></p>\n<p>\u0d8b\u0dad\u0dca\u0db4\u0dcf\u0daf\u0d9a\u0d9a\u0ddc\u0da7\u0dc3 <a href=\"#style_block\">\u0dc1\u0ddb\u0dbd\u0dd3\u0dba \u0d9a\u0dd4\u0da7\u0dca\u0da7\u0dd2 \u0daf\u0dd9\u0d9a\u0d9a\u0dd2\u0db1\u0dca (\u0dc1\u0ddb\u0dbd\u0dd3\u0dba</a> \u0db8\u0ddc\u0da9\u0dd2\u0dba\u0dd4\u0dbd\u0dda\u0dc2\u0db1\u0dca \u0dc3\u0dc4\u0dd2\u0dad<span translate=no>_^_4_^_</span> \u0d9a\u0dd0\u0da7\u0dd2 \u0d9a\u0dd2\u0dbb\u0dd3\u0db8\u0dca) \u0dc3\u0dc4 RGB \u0db4\u0dca\u0dbb\u0dad\u0dd2\u0daf\u0dcf\u0db1\u0dba\u0d9a\u0dd2\u0db1\u0dca \u0dc3\u0db8\u0db1\u0dca\u0dc0\u0dd2\u0dad \u0dc0\u0dda. </p>\n",
"<p> <a id=\"gradient_penalty\"></a></p>\n<h2>Gradient Penalty</h2>\n<p>This is the <span translate=no>_^_0_^_</span> regularization penality from the paper <a href=\"https://papers.labml.ai/paper/1801.04406\">Which Training Methods for GANs do actually Converge?</a>.</p>\n<p><span translate=no>_^_1_^_</span></p>\n<p>That is we try to reduce the L2 norm of gradients of the discriminator with respect to images, for real images (<span translate=no>_^_2_^_</span>).</p>\n": "<p> <a id=\"gradient_penalty\"></a></p>\n<h2>\u0d9c\u0dca\u0dbb\u0dda\u0da9\u0dd2\u0dba\u0db1\u0dca\u0da7\u0dca\u0daf\u0dac\u0dd4\u0dc0\u0db8</h2>\n<p>\u0d9a\u0da9\u0daf\u0dcf\u0dc3\u0dd2\u0dc0\u0dbd\u0dd2\u0db1\u0dca <span translate=no>_^_0_^_</span> \u0dc0\u0dd2\u0db0\u0dd2\u0db8\u0dad\u0dca \u0d9a\u0dd2\u0dbb\u0dd3\u0db8\u0dda \u0daf ality \u0dd4\u0dc0\u0db8 \u0db8\u0dd9\u0dba\u0dba\u0dd2 <a href=\"https://papers.labml.ai/paper/1801.04406\">GANs \u0dc3\u0db3\u0dc4\u0dcf \u0db4\u0dd4\u0dc4\u0dd4\u0dab\u0dd4 \u0d9a\u0dca\u0dbb\u0db8 \u0d87\u0dad\u0dca\u0dad \u0dc0\u0dc1\u0dba\u0dd9\u0db1\u0dca\u0db8 \u0d85\u0db7\u0dd2\u0dc3\u0dcf\u0dbb\u0dd3 \u0dc0\u0db1\u0dca\u0db1\u0dda \u0d9a\u0dd4\u0db8\u0d9a\u0dca\u0daf? </a>. </p>\n<p><span translate=no>_^_1_^_</span></p>\n<p>\u0d92\u0dad\u0db8\u0dba\u0dd2 \u0d85\u0db4\u0dd2 \u0dbb\u0dd6\u0db4 \u0dc3\u0db8\u0dca\u0db6\u0db1\u0dca\u0db0\u0dba\u0dd9\u0db1\u0dca \u0dc0\u0dd9\u0db1\u0dc3\u0dca\u0d9a\u0db8\u0dca \u0d9a\u0dbb\u0db1\u0dca\u0db1\u0dcf\u0d9c\u0dda L2 \u0dc3\u0db8\u0dca\u0db8\u0dad\u0dba \u0d85\u0da9\u0dd4 \u0d9a\u0dd2\u0dbb\u0dd3\u0db8\u0da7 \u0d8b\u0dad\u0dca\u0dc3\u0dcf\u0dc4 \u0d9a\u0dbb\u0db1\u0dca\u0db1\u0dda \u0dc3\u0dd0\u0db6\u0dd1 \u0dbb\u0dd6\u0db4 \u0dc3\u0db3\u0dc4\u0dcf (<span translate=no>_^_2_^_</span>). </p>\n", "<p> <a id=\"gradient_penalty\"></a></p>\n<h2>Gradient Penalty</h2>\n<p>This is the <span translate=no>_^_0_^_</span> regularization penality from the paper <a href=\"https://arxiv.org/abs/1801.04406\">Which Training Methods for GANs do actually Converge?</a>.</p>\n<p><span translate=no>_^_1_^_</span></p>\n<p>That is we try to reduce the L2 norm of gradients of the discriminator with respect to images, for real images (<span translate=no>_^_2_^_</span>).</p>\n": "<p> <a id=\"gradient_penalty\"></a></p>\n<h2>\u0d9c\u0dca\u0dbb\u0dda\u0da9\u0dd2\u0dba\u0db1\u0dca\u0da7\u0dca\u0daf\u0dac\u0dd4\u0dc0\u0db8</h2>\n<p>\u0d9a\u0da9\u0daf\u0dcf\u0dc3\u0dd2\u0dc0\u0dbd\u0dd2\u0db1\u0dca <span translate=no>_^_0_^_</span> \u0dc0\u0dd2\u0db0\u0dd2\u0db8\u0dad\u0dca \u0d9a\u0dd2\u0dbb\u0dd3\u0db8\u0dda \u0daf ality \u0dd4\u0dc0\u0db8 \u0db8\u0dd9\u0dba\u0dba\u0dd2 <a href=\"https://arxiv.org/abs/1801.04406\">GANs \u0dc3\u0db3\u0dc4\u0dcf \u0db4\u0dd4\u0dc4\u0dd4\u0dab\u0dd4 \u0d9a\u0dca\u0dbb\u0db8 \u0d87\u0dad\u0dca\u0dad \u0dc0\u0dc1\u0dba\u0dd9\u0db1\u0dca\u0db8 \u0d85\u0db7\u0dd2\u0dc3\u0dcf\u0dbb\u0dd3 \u0dc0\u0db1\u0dca\u0db1\u0dda \u0d9a\u0dd4\u0db8\u0d9a\u0dca\u0daf? </a>. </p>\n<p><span translate=no>_^_1_^_</span></p>\n<p>\u0d92\u0dad\u0db8\u0dba\u0dd2 \u0d85\u0db4\u0dd2 \u0dbb\u0dd6\u0db4 \u0dc3\u0db8\u0dca\u0db6\u0db1\u0dca\u0db0\u0dba\u0dd9\u0db1\u0dca \u0dc0\u0dd9\u0db1\u0dc3\u0dca\u0d9a\u0db8\u0dca \u0d9a\u0dbb\u0db1\u0dca\u0db1\u0dcf\u0d9c\u0dda L2 \u0dc3\u0db8\u0dca\u0db8\u0dad\u0dba \u0d85\u0da9\u0dd4 \u0d9a\u0dd2\u0dbb\u0dd3\u0db8\u0da7 \u0d8b\u0dad\u0dca\u0dc3\u0dcf\u0dc4 \u0d9a\u0dbb\u0db1\u0dca\u0db1\u0dda \u0dc3\u0dd0\u0db6\u0dd1 \u0dbb\u0dd6\u0db4 \u0dc3\u0db3\u0dc4\u0dcf (<span translate=no>_^_2_^_</span>). </p>\n",
"<p> <a id=\"mapping_network\"></a></p>\n<h2>Mapping Network</h2>\n<p><span translate=no>_^_0_^_</span></p>\n<p>This is an MLP with 8 linear layers. The mapping network maps the latent vector <span translate=no>_^_1_^_</span> to an intermediate latent space <span translate=no>_^_2_^_</span>. <span translate=no>_^_3_^_</span> space will be disentangled from the image space where the factors of variation become more linear.</p>\n": "<p> <a id=\"mapping_network\"></a></p>\n<h2>\u0da2\u0dcf\u0dbd\u0dc3\u0dd2\u0dad\u0dd2\u0dba\u0db8\u0dca\u0d9a\u0dbb\u0dab\u0dba</h2>\n<p><span translate=no>_^_0_^_</span></p>\n<p>\u0db8\u0dd9\u0dba\u0dbb\u0dda\u0d9b\u0dd3\u0dba \u0dc3\u0dca\u0dae\u0dbb 8 \u0d9a\u0dca \u0dc3\u0dc4\u0dd2\u0dad MLP \u0dc0\u0dda. \u0dc3\u0dd2\u0dad\u0dd2\u0dba\u0db8\u0dca\u0d9a\u0dbb\u0dab \u0da2\u0dcf\u0dbd\u0dba \u0d9c\u0dd4\u0db4\u0dca\u0dad \u0daf\u0ddb\u0dc1\u0dd2\u0d9a\u0dba \u0d85\u0dad\u0dbb\u0db8\u0dd0\u0daf\u0dd2 \u0d9c\u0dd4\u0db4\u0dca\u0dad <span translate=no>_^_1_^_</span> \u0d85\u0dc0\u0d9a\u0dcf\u0dc1\u0dba\u0d9a\u0da7 \u0dc3\u0dd2\u0dad\u0dd2\u0dba\u0db8\u0dca \u0d9c\u0dad <span translate=no>_^_2_^_</span>\u0d9a\u0dbb\u0dba\u0dd2. <span translate=no>_^_3_^_</span> \u0dc0\u0dd2\u0da0\u0dbd\u0dca\u0dba\u0dad\u0dcf\u0dc0\u0dba\u0dda \u0dc3\u0dcf\u0db0\u0d9a \u0dc0\u0da9\u0dcf\u0dad\u0dca \u0dbb\u0dda\u0d9b\u0dd3\u0dba \u0dc0\u0db1 \u0dbb\u0dd6\u0db4 \u0d85\u0dc0\u0d9a\u0dcf\u0dc1\u0dba\u0dd9\u0db1\u0dca \u0d85\u0dc0\u0d9a\u0dcf\u0dc1\u0dba \u0dc0\u0dd2\u0dc3\u0dd4\u0dbb\u0dd4\u0dc0\u0dcf \u0dc4\u0dbb\u0dd2\u0db1\u0dd4 \u0d87\u0dad. </p>\n", "<p> <a id=\"mapping_network\"></a></p>\n<h2>Mapping Network</h2>\n<p><span translate=no>_^_0_^_</span></p>\n<p>This is an MLP with 8 linear layers. The mapping network maps the latent vector <span translate=no>_^_1_^_</span> to an intermediate latent space <span translate=no>_^_2_^_</span>. <span translate=no>_^_3_^_</span> space will be disentangled from the image space where the factors of variation become more linear.</p>\n": "<p> <a id=\"mapping_network\"></a></p>\n<h2>\u0da2\u0dcf\u0dbd\u0dc3\u0dd2\u0dad\u0dd2\u0dba\u0db8\u0dca\u0d9a\u0dbb\u0dab\u0dba</h2>\n<p><span translate=no>_^_0_^_</span></p>\n<p>\u0db8\u0dd9\u0dba\u0dbb\u0dda\u0d9b\u0dd3\u0dba \u0dc3\u0dca\u0dae\u0dbb 8 \u0d9a\u0dca \u0dc3\u0dc4\u0dd2\u0dad MLP \u0dc0\u0dda. \u0dc3\u0dd2\u0dad\u0dd2\u0dba\u0db8\u0dca\u0d9a\u0dbb\u0dab \u0da2\u0dcf\u0dbd\u0dba \u0d9c\u0dd4\u0db4\u0dca\u0dad \u0daf\u0ddb\u0dc1\u0dd2\u0d9a\u0dba \u0d85\u0dad\u0dbb\u0db8\u0dd0\u0daf\u0dd2 \u0d9c\u0dd4\u0db4\u0dca\u0dad <span translate=no>_^_1_^_</span> \u0d85\u0dc0\u0d9a\u0dcf\u0dc1\u0dba\u0d9a\u0da7 \u0dc3\u0dd2\u0dad\u0dd2\u0dba\u0db8\u0dca \u0d9c\u0dad <span translate=no>_^_2_^_</span>\u0d9a\u0dbb\u0dba\u0dd2. <span translate=no>_^_3_^_</span> \u0dc0\u0dd2\u0da0\u0dbd\u0dca\u0dba\u0dad\u0dcf\u0dc0\u0dba\u0dda \u0dc3\u0dcf\u0db0\u0d9a \u0dc0\u0da9\u0dcf\u0dad\u0dca \u0dbb\u0dda\u0d9b\u0dd3\u0dba \u0dc0\u0db1 \u0dbb\u0dd6\u0db4 \u0d85\u0dc0\u0d9a\u0dcf\u0dc1\u0dba\u0dd9\u0db1\u0dca \u0d85\u0dc0\u0d9a\u0dcf\u0dc1\u0dba \u0dc0\u0dd2\u0dc3\u0dd4\u0dbb\u0dd4\u0dc0\u0dcf \u0dc4\u0dbb\u0dd2\u0db1\u0dd4 \u0d87\u0dad. </p>\n",
"<p> <a id=\"mini_batch_std_dev\"></a></p>\n<h3>Mini-batch Standard Deviation</h3>\n<p>Mini-batch standard deviation calculates the standard deviation across a mini-batch (or a subgroups within the mini-batch) for each feature in the feature map. Then it takes the mean of all the standard deviations and appends it to the feature map as one extra feature.</p>\n": "<p> <a id=\"mini_batch_std_dev\"></a></p>\n<h3>\u0d9a\u0dd4\u0da9\u0dcf\u0d9a\u0dab\u0dca\u0da9\u0dcf\u0dba\u0db8 \u0dc3\u0db8\u0dca\u0db8\u0dad \u0d85\u0db4\u0d9c\u0db8\u0db1\u0dba</h3>\n<p>\u0d9a\u0dd4\u0da9\u0dcf\u0d9a\u0dab\u0dca\u0da9\u0dcf\u0dba\u0db8\u0dca \u0dc3\u0db8\u0dca\u0db8\u0dad \u0d85\u0db4\u0d9c\u0db8\u0db1\u0dba \u0dc0\u0dd2\u0dc1\u0dda\u0dc2\u0dcf\u0d82\u0d9c \u0dc3\u0dd2\u0dad\u0dd2\u0dba\u0db8\u0dda \u0d87\u0dad\u0dd2 \u0d91\u0d9a\u0dca \u0d91\u0d9a\u0dca \u0dbd\u0d9a\u0dca\u0dc2\u0dab\u0dba \u0dc3\u0db3\u0dc4\u0dcf \u0d9a\u0dd4\u0da9\u0dcf \u0d9a\u0dab\u0dca\u0da9\u0dcf\u0dba\u0db8\u0d9a\u0dca (\u0dc4\u0ddd \u0d9a\u0dd4\u0da9\u0dcf \u0d9a\u0dab\u0dca\u0da9\u0dcf\u0dba\u0db8 \u0dad\u0dd4\u0dc5 \u0d87\u0dad\u0dd2 \u0d8b\u0db4 \u0d9a\u0dab\u0dca\u0da9\u0dcf\u0dba\u0db8\u0dca) \u0dc4\u0dbb\u0dc4\u0dcf \u0dc3\u0db8\u0dca\u0db8\u0dad \u0d85\u0db4\u0d9c\u0db8\u0db1\u0dba \u0d9c\u0dab\u0db1\u0dba \u0d9a\u0dbb\u0dba\u0dd2. \u0d91\u0dc0\u0dd2\u0da7 \u0d91\u0dba \u0dc3\u0dd2\u0dba\u0dbd\u0dd4 \u0dc3\u0db8\u0dca\u0db8\u0dad \u0d85\u0db4\u0d9c\u0db8\u0db1\u0dba\u0db1\u0dca\u0dc4\u0dd2 \u0db8\u0db0\u0dca\u0dba\u0db1\u0dca\u0dba\u0dba \u0d9c\u0dd9\u0db1 \u0d91\u0dba \u0d91\u0d9a\u0dca \u0d85\u0db8\u0dad\u0dbb \u0d85\u0d82\u0d9c\u0dba\u0d9a\u0dca \u0dbd\u0dd9\u0dc3 \u0dc0\u0dd2\u0dc1\u0dda\u0dc2\u0dcf\u0d82\u0d9c \u0dc3\u0dd2\u0dad\u0dd2\u0dba\u0db8\u0da7 \u0d91\u0d9a\u0dad\u0dd4 \u0d9a\u0dbb\u0dba\u0dd2. </p>\n", "<p> <a id=\"mini_batch_std_dev\"></a></p>\n<h3>Mini-batch Standard Deviation</h3>\n<p>Mini-batch standard deviation calculates the standard deviation across a mini-batch (or a subgroups within the mini-batch) for each feature in the feature map. Then it takes the mean of all the standard deviations and appends it to the feature map as one extra feature.</p>\n": "<p> <a id=\"mini_batch_std_dev\"></a></p>\n<h3>\u0d9a\u0dd4\u0da9\u0dcf\u0d9a\u0dab\u0dca\u0da9\u0dcf\u0dba\u0db8 \u0dc3\u0db8\u0dca\u0db8\u0dad \u0d85\u0db4\u0d9c\u0db8\u0db1\u0dba</h3>\n<p>\u0d9a\u0dd4\u0da9\u0dcf\u0d9a\u0dab\u0dca\u0da9\u0dcf\u0dba\u0db8\u0dca \u0dc3\u0db8\u0dca\u0db8\u0dad \u0d85\u0db4\u0d9c\u0db8\u0db1\u0dba \u0dc0\u0dd2\u0dc1\u0dda\u0dc2\u0dcf\u0d82\u0d9c \u0dc3\u0dd2\u0dad\u0dd2\u0dba\u0db8\u0dda \u0d87\u0dad\u0dd2 \u0d91\u0d9a\u0dca \u0d91\u0d9a\u0dca \u0dbd\u0d9a\u0dca\u0dc2\u0dab\u0dba \u0dc3\u0db3\u0dc4\u0dcf \u0d9a\u0dd4\u0da9\u0dcf \u0d9a\u0dab\u0dca\u0da9\u0dcf\u0dba\u0db8\u0d9a\u0dca (\u0dc4\u0ddd \u0d9a\u0dd4\u0da9\u0dcf \u0d9a\u0dab\u0dca\u0da9\u0dcf\u0dba\u0db8 \u0dad\u0dd4\u0dc5 \u0d87\u0dad\u0dd2 \u0d8b\u0db4 \u0d9a\u0dab\u0dca\u0da9\u0dcf\u0dba\u0db8\u0dca) \u0dc4\u0dbb\u0dc4\u0dcf \u0dc3\u0db8\u0dca\u0db8\u0dad \u0d85\u0db4\u0d9c\u0db8\u0db1\u0dba \u0d9c\u0dab\u0db1\u0dba \u0d9a\u0dbb\u0dba\u0dd2. 
\u0d91\u0dc0\u0dd2\u0da7 \u0d91\u0dba \u0dc3\u0dd2\u0dba\u0dbd\u0dd4 \u0dc3\u0db8\u0dca\u0db8\u0dad \u0d85\u0db4\u0d9c\u0db8\u0db1\u0dba\u0db1\u0dca\u0dc4\u0dd2 \u0db8\u0db0\u0dca\u0dba\u0db1\u0dca\u0dba\u0dba \u0d9c\u0dd9\u0db1 \u0d91\u0dba \u0d91\u0d9a\u0dca \u0d85\u0db8\u0dad\u0dbb \u0d85\u0d82\u0d9c\u0dba\u0d9a\u0dca \u0dbd\u0dd9\u0dc3 \u0dc0\u0dd2\u0dc1\u0dda\u0dc2\u0dcf\u0d82\u0d9c \u0dc3\u0dd2\u0dad\u0dd2\u0dba\u0db8\u0da7 \u0d91\u0d9a\u0dad\u0dd4 \u0d9a\u0dbb\u0dba\u0dd2. </p>\n",
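The mini-batch standard deviation feature can be illustrated with a short sketch that ignores the sub-group variant mentioned above; tensor shapes and naming are assumptions:

    import torch

    def minibatch_std(x: torch.Tensor) -> torch.Tensor:
        """x has shape [batch, channels, height, width]."""
        # Standard deviation of each feature across the batch, then its overall mean
        std = x.std(dim=0).mean()
        # Broadcast that scalar to one extra feature map and append it to the input
        extra = std.expand(x.shape[0], 1, x.shape[2], x.shape[3])
        return torch.cat([x, extra], dim=1)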
"<p> <a id=\"path_length_penalty\"></a></p>\n<h2>Path Length Penalty</h2>\n<p>This regularization encourages a fixed-size step in <span translate=no>_^_0_^_</span> to result in a fixed-magnitude change in the image.</p>\n<p><span translate=no>_^_1_^_</span></p>\n<p>where <span translate=no>_^_2_^_</span> is the Jacobian <span translate=no>_^_3_^_</span>, <span translate=no>_^_4_^_</span> are sampled from <span translate=no>_^_5_^_</span> from the mapping network, and <span translate=no>_^_6_^_</span> are images with noise <span translate=no>_^_7_^_</span>.</p>\n<p><span translate=no>_^_8_^_</span> is the exponential moving average of <span translate=no>_^_9_^_</span> as the training progresses.</p>\n<p><span translate=no>_^_10_^_</span> is calculated without explicitly calculating the Jacobian using <span translate=no>_^_11_^_</span></p>\n": "<p> <a id=\"path_length_penalty\"></a></p>\n<h2>\u0db8\u0dcf\u0dbb\u0dca\u0d9c\u0dba\u0daf\u0dd2\u0d9c \u0daf\u0dab\u0dca\u0da9\u0db1</h2>\n<p>\u0db8\u0dd9\u0db8\u0db1\u0dd2\u0dba\u0dcf\u0db8\u0db1\u0dba \u0db8\u0d9f\u0dd2\u0db1\u0dca \u0dbb\u0dd6\u0db4\u0dba\u0dda \u0dc3\u0dca\u0dae\u0dcf\u0dc0\u0dbb \u0db4\u0dca\u0dbb\u0db8\u0dcf\u0dab\u0dba\u0dda \u0dc0\u0dd9\u0db1\u0dc3\u0d9a\u0dca \u0d87\u0dad\u0dd2 \u0d9a\u0dd2\u0dbb\u0dd3\u0db8 <span translate=no>_^_0_^_</span> \u0dc3\u0db3\u0dc4\u0dcf \u0dc3\u0dca\u0dae\u0dcf\u0dc0\u0dbb \u0db4\u0dca\u0dbb\u0db8\u0dcf\u0dab\u0dba\u0dda \u0db4\u0dd2\u0dba\u0dc0\u0dbb\u0d9a\u0dca \u0daf\u0dd2\u0dbb\u0dd2\u0db8\u0dad\u0dca \u0d9a\u0dbb\u0dba\u0dd2. </p>\n<p><span translate=no>_^_1_^_</span></p>\n<p>\u0da2\u0dd0\u0d9a\u0ddd\u0db6\u0dd2\u0dba\u0db1\u0dca <span translate=no>_^_2_^_</span> \u0d9a\u0ddc\u0dc4\u0dd9\u0daf <span translate=no>_^_3_^_</span>, <span translate=no>_^_4_^_</span> \u0dc3\u0dd2\u0dad\u0dd2\u0dba\u0db8\u0dca\u0d9a\u0dbb\u0dab <span translate=no>_^_5_^_</span> \u0da2\u0dcf\u0dbd\u0dba\u0dd9\u0db1\u0dca \u0dc3\u0dcf\u0db8\u0dca\u0db4\u0dbd \u0dbd\u0db6\u0dcf \u0d87\u0dad, \u0dc3\u0dc4 <span translate=no>_^_6_^_</span> \u0d92\u0dc0\u0dcf \u0dc1\u0db6\u0dca\u0daf\u0dba \u0dc3\u0dc4\u0dd2\u0dad \u0dbb\u0dd6\u0db4 <span translate=no>_^_7_^_</span>. </p>\n<p><span translate=no>_^_8_^_</span> \u0db4\u0dd4\u0dc4\u0dd4\u0dab\u0dd4\u0dc0 \u0d89\u0daf\u0dd2\u0dbb\u0dd2\u0dba\u0da7 \u0dba\u0dad\u0dca\u0db8 \u0d9d\u0dcf\u0dad\u0dd3\u0dba \u0da0\u0dbd\u0db1\u0dba \u0dc0\u0db1 \u0dc3\u0dcf\u0db8\u0dcf\u0db1\u0dca\u0dba\u0dba \u0dc0\u0dda. 
<span translate=no>_^_9_^_</span> </p>\n<p><span translate=no>_^_10_^_</span> \u0da2\u0dd0\u0d9a\u0ddc\u0db6\u0dd2\u0dba\u0db1\u0dca \u0db7\u0dcf\u0dc0\u0dd2\u0dad\u0dba\u0dd9\u0db1\u0dca \u0db4\u0dd0\u0dc4\u0dd0\u0daf\u0dd2\u0dbd\u0dd2\u0dc0 \u0d9c\u0dab\u0db1\u0dba \u0db1\u0ddc\u0d9a\u0dbb \u0d9c\u0dab\u0db1\u0dba \u0d9a\u0dbb\u0db1\u0dd4 \u0dbd\u0dd0\u0db6\u0dda <span translate=no>_^_11_^_</span></p>\n", "<p> <a id=\"path_length_penalty\"></a></p>\n<h2>Path Length Penalty</h2>\n<p>This regularization encourages a fixed-size step in <span translate=no>_^_0_^_</span> to result in a fixed-magnitude change in the image.</p>\n<p><span translate=no>_^_1_^_</span></p>\n<p>where <span translate=no>_^_2_^_</span> is the Jacobian <span translate=no>_^_3_^_</span>, <span translate=no>_^_4_^_</span> are sampled from <span translate=no>_^_5_^_</span> from the mapping network, and <span translate=no>_^_6_^_</span> are images with noise <span translate=no>_^_7_^_</span>.</p>\n<p><span translate=no>_^_8_^_</span> is the exponential moving average of <span translate=no>_^_9_^_</span> as the training progresses.</p>\n<p><span translate=no>_^_10_^_</span> is calculated without explicitly calculating the Jacobian using <span translate=no>_^_11_^_</span></p>\n": "<p> <a id=\"path_length_penalty\"></a></p>\n<h2>\u0db8\u0dcf\u0dbb\u0dca\u0d9c\u0dba\u0daf\u0dd2\u0d9c \u0daf\u0dab\u0dca\u0da9\u0db1</h2>\n<p>\u0db8\u0dd9\u0db8\u0db1\u0dd2\u0dba\u0dcf\u0db8\u0db1\u0dba \u0db8\u0d9f\u0dd2\u0db1\u0dca \u0dbb\u0dd6\u0db4\u0dba\u0dda \u0dc3\u0dca\u0dae\u0dcf\u0dc0\u0dbb \u0db4\u0dca\u0dbb\u0db8\u0dcf\u0dab\u0dba\u0dda \u0dc0\u0dd9\u0db1\u0dc3\u0d9a\u0dca \u0d87\u0dad\u0dd2 \u0d9a\u0dd2\u0dbb\u0dd3\u0db8 <span translate=no>_^_0_^_</span> \u0dc3\u0db3\u0dc4\u0dcf \u0dc3\u0dca\u0dae\u0dcf\u0dc0\u0dbb \u0db4\u0dca\u0dbb\u0db8\u0dcf\u0dab\u0dba\u0dda \u0db4\u0dd2\u0dba\u0dc0\u0dbb\u0d9a\u0dca \u0daf\u0dd2\u0dbb\u0dd2\u0db8\u0dad\u0dca \u0d9a\u0dbb\u0dba\u0dd2. </p>\n<p><span translate=no>_^_1_^_</span></p>\n<p>\u0da2\u0dd0\u0d9a\u0ddd\u0db6\u0dd2\u0dba\u0db1\u0dca <span translate=no>_^_2_^_</span> \u0d9a\u0ddc\u0dc4\u0dd9\u0daf <span translate=no>_^_3_^_</span>, <span translate=no>_^_4_^_</span> \u0dc3\u0dd2\u0dad\u0dd2\u0dba\u0db8\u0dca\u0d9a\u0dbb\u0dab <span translate=no>_^_5_^_</span> \u0da2\u0dcf\u0dbd\u0dba\u0dd9\u0db1\u0dca \u0dc3\u0dcf\u0db8\u0dca\u0db4\u0dbd \u0dbd\u0db6\u0dcf \u0d87\u0dad, \u0dc3\u0dc4 <span translate=no>_^_6_^_</span> \u0d92\u0dc0\u0dcf \u0dc1\u0db6\u0dca\u0daf\u0dba \u0dc3\u0dc4\u0dd2\u0dad \u0dbb\u0dd6\u0db4 <span translate=no>_^_7_^_</span>. </p>\n<p><span translate=no>_^_8_^_</span> \u0db4\u0dd4\u0dc4\u0dd4\u0dab\u0dd4\u0dc0 \u0d89\u0daf\u0dd2\u0dbb\u0dd2\u0dba\u0da7 \u0dba\u0dad\u0dca\u0db8 \u0d9d\u0dcf\u0dad\u0dd3\u0dba \u0da0\u0dbd\u0db1\u0dba \u0dc0\u0db1 \u0dc3\u0dcf\u0db8\u0dcf\u0db1\u0dca\u0dba\u0dba \u0dc0\u0dda. <span translate=no>_^_9_^_</span> </p>\n<p><span translate=no>_^_10_^_</span> \u0da2\u0dd0\u0d9a\u0ddc\u0db6\u0dd2\u0dba\u0db1\u0dca \u0db7\u0dcf\u0dc0\u0dd2\u0dad\u0dba\u0dd9\u0db1\u0dca \u0db4\u0dd0\u0dc4\u0dd0\u0daf\u0dd2\u0dbd\u0dd2\u0dc0 \u0d9c\u0dab\u0db1\u0dba \u0db1\u0ddc\u0d9a\u0dbb \u0d9c\u0dab\u0db1\u0dba \u0d9a\u0dbb\u0db1\u0dd4 \u0dbd\u0dd0\u0db6\u0dda <span translate=no>_^_11_^_</span></p>\n",
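The Jacobian-vector quantity described here can be obtained with one backward pass instead of materializing the Jacobian. A rough sketch follows; the exponential moving average and the final squared-difference penalty are left out, and `w` is assumed to be the [batch, d_latent] tensor that produced `images` in the current graph:

    import math
    import torch

    def path_length_norms(w: torch.Tensor, images: torch.Tensor) -> torch.Tensor:
        """Per-sample norm of J^T y, where y is random image-space noise."""
        # Noise scaled so the inner product has a resolution-independent magnitude
        y = torch.randn_like(images) / math.sqrt(images.shape[2] * images.shape[3])
        # (images * y).sum() backpropagated to w gives J^T y without building J
        (grad_w,) = torch.autograd.grad(outputs=(images * y).sum(),
                                        inputs=w,
                                        create_graph=True)
        return grad_w.norm(dim=1)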
"<p> <a id=\"smooth\"></a></p>\n<h3>Smoothing Layer</h3>\n<p>This layer blurs each channel</p>\n": "<p> <a id=\"smooth\"></a></p>\n<h3>\u0dc3\u0dca\u0dae\u0dbb\u0dba\u0dc3\u0dd4\u0db8\u0da7\u0db1\u0dba</h3>\n<p>\u0db8\u0dd9\u0db8\u0dc3\u0dca\u0dad\u0dbb\u0dba \u0d91\u0d9a\u0dca \u0d91\u0d9a\u0dca \u0db1\u0dcf\u0dbd\u0dd2\u0d9a\u0dcf\u0dc0 \u0db6\u0ddc\u0db3 \u0d9a\u0dbb\u0dba\u0dd2</p>\n", "<p> <a id=\"smooth\"></a></p>\n<h3>Smoothing Layer</h3>\n<p>This layer blurs each channel</p>\n": "<p> <a id=\"smooth\"></a></p>\n<h3>\u0dc3\u0dca\u0dae\u0dbb\u0dba\u0dc3\u0dd4\u0db8\u0da7\u0db1\u0dba</h3>\n<p>\u0db8\u0dd9\u0db8\u0dc3\u0dca\u0dad\u0dbb\u0dba \u0d91\u0d9a\u0dca \u0d91\u0d9a\u0dca \u0db1\u0dcf\u0dbd\u0dd2\u0d9a\u0dcf\u0dc0 \u0db6\u0ddc\u0db3 \u0d9a\u0dbb\u0dba\u0dd2</p>\n",
"<p> <a id=\"style_block\"></a></p>\n<h3>Style Block</h3>\n<p><span translate=no>_^_0_^_</span></p>\n<p><small><em><span translate=no>_^_1_^_</span> denotes a linear layer. <span translate=no>_^_2_^_</span> denotes a broadcast and scaling operation (noise is single channel).</em></small></p>\n<p>Style block has a weight modulation convolution layer.</p>\n": "<p> <a id=\"style_block\"></a></p>\n<h3>\u0db6\u0dca\u0dbd\u0ddc\u0d9a\u0dca\u0dc3\u0dca\u0da7\u0dba\u0dd2\u0dbd\u0dca</h3>\n<p><span translate=no>_^_0_^_</span></p>\n<p><small><em><span translate=no>_^_1_^_</span> \u0dbb\u0dda\u0d9b\u0dd3\u0dba \u0dc3\u0dca\u0dae\u0dbb\u0dba\u0d9a\u0dca \u0daf\u0d9a\u0dca\u0dc0\u0dba\u0dd2. <span translate=no>_^_2_^_</span> \u0dc0\u0dd2\u0d9a\u0dcf\u0dc1\u0db1 \u0dc4\u0dcf \u0db4\u0dbb\u0dd2\u0db8\u0dcf\u0dab \u0db8\u0dd9\u0dc4\u0dd9\u0dba\u0dd4\u0db8\u0d9a\u0dca \u0daf\u0d9a\u0dca\u0dc0\u0dba\u0dd2 (\u0dc1\u0db6\u0dca\u0daf\u0dba \u0dad\u0db1\u0dd2 \u0db1\u0dcf\u0dbd\u0dd2\u0d9a\u0dcf\u0dc0\u0d9a\u0dd2). </em></small></p>\n<p>\u0dc3\u0dca\u0da7\u0dba\u0dd2\u0dbd\u0dca\u0db6\u0dca\u0dbd\u0ddc\u0d9a\u0dca \u0db6\u0dbb \u0db8\u0ddc\u0da9\u0dd2\u0dba\u0dd4\u0dbd\u0dda\u0dc2\u0db1\u0dca \u0d9a\u0dd0\u0da7\u0dd2 \u0d9a\u0dd2\u0dbb\u0dd3\u0db8\u0dda \u0dad\u0da7\u0dca\u0da7\u0dd4\u0dc0\u0d9a\u0dca \u0d87\u0dad. </p>\n", "<p> <a id=\"style_block\"></a></p>\n<h3>Style Block</h3>\n<p><span translate=no>_^_0_^_</span></p>\n<p><small><em><span translate=no>_^_1_^_</span> denotes a linear layer. <span translate=no>_^_2_^_</span> denotes a broadcast and scaling operation (noise is single channel).</em></small></p>\n<p>Style block has a weight modulation convolution layer.</p>\n": "<p> <a id=\"style_block\"></a></p>\n<h3>\u0db6\u0dca\u0dbd\u0ddc\u0d9a\u0dca\u0dc3\u0dca\u0da7\u0dba\u0dd2\u0dbd\u0dca</h3>\n<p><span translate=no>_^_0_^_</span></p>\n<p><small><em><span translate=no>_^_1_^_</span> \u0dbb\u0dda\u0d9b\u0dd3\u0dba \u0dc3\u0dca\u0dae\u0dbb\u0dba\u0d9a\u0dca \u0daf\u0d9a\u0dca\u0dc0\u0dba\u0dd2. <span translate=no>_^_2_^_</span> \u0dc0\u0dd2\u0d9a\u0dcf\u0dc1\u0db1 \u0dc4\u0dcf \u0db4\u0dbb\u0dd2\u0db8\u0dcf\u0dab \u0db8\u0dd9\u0dc4\u0dd9\u0dba\u0dd4\u0db8\u0d9a\u0dca \u0daf\u0d9a\u0dca\u0dc0\u0dba\u0dd2 (\u0dc1\u0db6\u0dca\u0daf\u0dba \u0dad\u0db1\u0dd2 \u0db1\u0dcf\u0dbd\u0dd2\u0d9a\u0dcf\u0dc0\u0d9a\u0dd2). </em></small></p>\n<p>\u0dc3\u0dca\u0da7\u0dba\u0dd2\u0dbd\u0dca\u0db6\u0dca\u0dbd\u0ddc\u0d9a\u0dca \u0db6\u0dbb \u0db8\u0ddc\u0da9\u0dd2\u0dba\u0dd4\u0dbd\u0dda\u0dc2\u0db1\u0dca \u0d9a\u0dd0\u0da7\u0dd2 \u0d9a\u0dd2\u0dbb\u0dd3\u0db8\u0dda \u0dad\u0da7\u0dca\u0da7\u0dd4\u0dc0\u0d9a\u0dca \u0d87\u0dad. </p>\n",
"<p> <a id=\"to_rgb\"></a></p>\n<h3>To RGB</h3>\n<p><span translate=no>_^_0_^_</span></p>\n<p><small><em><span translate=no>_^_1_^_</span> denotes a linear layer.</em></small></p>\n<p>Generates an RGB image from a feature map using <span translate=no>_^_2_^_</span> convolution.</p>\n": "<p> <a id=\"to_rgb\"></a></p>\n<h3>RGB\u0dc0\u0dd9\u0dad</h3>\n<p><span translate=no>_^_0_^_</span></p>\n<p><small><em><span translate=no>_^_1_^_</span> \u0dbb\u0dda\u0d9b\u0dd3\u0dba \u0dc3\u0dca\u0dae\u0dbb\u0dba\u0d9a\u0dca \u0daf\u0d9a\u0dca\u0dc0\u0dba\u0dd2. </em></small></p>\n<p><span translate=no>_^_2_^_</span> \u0dc3\u0d82\u0dc0\u0dbd\u0dd2\u0dad \u0db7\u0dcf\u0dc0\u0dd2\u0dad\u0dba\u0dd9\u0db1\u0dca \u0dc0\u0dd2\u0dc1\u0dda\u0dc2\u0dcf\u0d82\u0d9c \u0dc3\u0dd2\u0dad\u0dd2\u0dba\u0db8\u0d9a\u0dd2\u0db1\u0dca RGB \u0dbb\u0dd6\u0db4\u0dba\u0d9a\u0dca \u0da2\u0db1\u0db1\u0dba \u0d9a\u0dbb\u0dba\u0dd2. </p>\n", "<p> <a id=\"to_rgb\"></a></p>\n<h3>To RGB</h3>\n<p><span translate=no>_^_0_^_</span></p>\n<p><small><em><span translate=no>_^_1_^_</span> denotes a linear layer.</em></small></p>\n<p>Generates an RGB image from a feature map using <span translate=no>_^_2_^_</span> convolution.</p>\n": "<p> <a id=\"to_rgb\"></a></p>\n<h3>RGB\u0dc0\u0dd9\u0dad</h3>\n<p><span translate=no>_^_0_^_</span></p>\n<p><small><em><span translate=no>_^_1_^_</span> \u0dbb\u0dda\u0d9b\u0dd3\u0dba \u0dc3\u0dca\u0dae\u0dbb\u0dba\u0d9a\u0dca \u0daf\u0d9a\u0dca\u0dc0\u0dba\u0dd2. </em></small></p>\n<p><span translate=no>_^_2_^_</span> \u0dc3\u0d82\u0dc0\u0dbd\u0dd2\u0dad \u0db7\u0dcf\u0dc0\u0dd2\u0dad\u0dba\u0dd9\u0db1\u0dca \u0dc0\u0dd2\u0dc1\u0dda\u0dc2\u0dcf\u0d82\u0d9c \u0dc3\u0dd2\u0dad\u0dd2\u0dba\u0db8\u0d9a\u0dd2\u0db1\u0dca RGB \u0dbb\u0dd6\u0db4\u0dba\u0d9a\u0dca \u0da2\u0db1\u0db1\u0dba \u0d9a\u0dbb\u0dba\u0dd2. </p>\n",
"<p> <a id=\"up_sample\"></a></p>\n<h3>Up-sample</h3>\n<p>The up-sample operation scales the image up by <span translate=no>_^_0_^_</span> and <a href=\"#smooth\">smoothens</a> each feature channel. This is based on the paper <a href=\"https://papers.labml.ai/paper/1904.11486\">Making Convolutional Networks Shift-Invariant Again</a>.</p>\n": "<p> <a id=\"up_sample\"></a></p>\n<h3>\u0d89\u0dc4\u0dc5\u0db1\u0dd2\u0dba\u0dd0\u0daf\u0dd2\u0dba</h3>\n<p>\u0daf\u0d9a\u0dca\u0dc0\u0dcf-\u0db1\u0dd2\u0dba\u0dd0\u0daf\u0dd2\u0db8\u0dd9\u0dc4\u0dd9\u0dba\u0dd4\u0db8 \u0db8\u0d9f\u0dd2\u0db1\u0dca \u0dbb\u0dd6\u0db4\u0dba \u0d89\u0dc4\u0dc5\u0da7 \u0d9c\u0dd9\u0db1 \u0dba\u0db1 <span translate=no>_^_0_^_</span> \u0d85\u0dad\u0dbb \u0d91\u0d9a\u0dca \u0d91\u0d9a\u0dca \u0dc0\u0dd2\u0dc1\u0dda\u0dc2\u0dcf\u0d82\u0d9c \u0db1\u0dcf\u0dbd\u0dd2\u0d9a\u0dcf\u0dc0 <a href=\"#smooth\">\u0dc3\u0dd4\u0db8\u0da7\u0db1\u0dba \u0d9a\u0dbb\u0dba\u0dd2</a> . \u0db8\u0dd9\u0dba \u0db4\u0daf\u0db1\u0db8\u0dca \u0dc0\u0dd3 \u0d87\u0dad\u0dca\u0dad\u0dda <a href=\"https://papers.labml.ai/paper/1904.11486\">\u0d9a\u0da9\u0daf\u0dcf\u0dc3\u0dd2 \u0db8\u0dad \u0dba Convolutional Networks Shift-Invariant \u0db1\u0dd0\u0dc0\u0dad\u0dad\u0dca</a>. </p>\n", "<p> <a id=\"up_sample\"></a></p>\n<h3>Up-sample</h3>\n<p>The up-sample operation scales the image up by <span translate=no>_^_0_^_</span> and <a href=\"#smooth\">smoothens</a> each feature channel. This is based on the paper <a href=\"https://arxiv.org/abs/1904.11486\">Making Convolutional Networks Shift-Invariant Again</a>.</p>\n": "<p> <a id=\"up_sample\"></a></p>\n<h3>\u0d89\u0dc4\u0dc5\u0db1\u0dd2\u0dba\u0dd0\u0daf\u0dd2\u0dba</h3>\n<p>\u0daf\u0d9a\u0dca\u0dc0\u0dcf-\u0db1\u0dd2\u0dba\u0dd0\u0daf\u0dd2\u0db8\u0dd9\u0dc4\u0dd9\u0dba\u0dd4\u0db8 \u0db8\u0d9f\u0dd2\u0db1\u0dca \u0dbb\u0dd6\u0db4\u0dba \u0d89\u0dc4\u0dc5\u0da7 \u0d9c\u0dd9\u0db1 \u0dba\u0db1 <span translate=no>_^_0_^_</span> \u0d85\u0dad\u0dbb \u0d91\u0d9a\u0dca \u0d91\u0d9a\u0dca \u0dc0\u0dd2\u0dc1\u0dda\u0dc2\u0dcf\u0d82\u0d9c \u0db1\u0dcf\u0dbd\u0dd2\u0d9a\u0dcf\u0dc0 <a href=\"#smooth\">\u0dc3\u0dd4\u0db8\u0da7\u0db1\u0dba \u0d9a\u0dbb\u0dba\u0dd2</a> . \u0db8\u0dd9\u0dba \u0db4\u0daf\u0db1\u0db8\u0dca \u0dc0\u0dd3 \u0d87\u0dad\u0dca\u0dad\u0dda <a href=\"https://arxiv.org/abs/1904.11486\">\u0d9a\u0da9\u0daf\u0dcf\u0dc3\u0dd2 \u0db8\u0dad \u0dba Convolutional Networks Shift-Invariant \u0db1\u0dd0\u0dc0\u0dad\u0dad\u0dca</a>. </p>\n",
"<p><a href=\"#equalized_linear\">Equalized learning-rate linear layers</a> </p>\n": "<p><a href=\"#equalized_linear\">\u0dc3\u0db8\u0dcf\u0db1 \u0d89\u0d9c\u0dd9\u0db1\u0dd4\u0db8\u0dca-\u0d85\u0db1\u0dd4\u0db4\u0dcf\u0dad \u0dbb\u0dda\u0d9b\u0dd3\u0dba \u0dc3\u0dca\u0dae\u0dbb</a> </p>\n", "<p><a href=\"#equalized_linear\">Equalized learning-rate linear layers</a> </p>\n": "<p><a href=\"#equalized_linear\">\u0dc3\u0db8\u0dcf\u0db1 \u0d89\u0d9c\u0dd9\u0db1\u0dd4\u0db8\u0dca-\u0d85\u0db1\u0dd4\u0db4\u0dcf\u0dad \u0dbb\u0dda\u0d9b\u0dd3\u0dba \u0dc3\u0dca\u0dae\u0dbb</a> </p>\n",
"<p><a href=\"#equalized_weight\">Weights parameter with equalized learning rate</a> </p>\n": "<p><a href=\"#equalized_weight\">\u0dc3\u0db8\u0dcf\u0db1 \u0d89\u0d9c\u0dd9\u0db1\u0dd4\u0db8\u0dca \u0d85\u0db1\u0dd4\u0db4\u0dcf\u0dad\u0dba \u0dc3\u0dc4\u0dd2\u0dad \u0db6\u0dbb \u0db4\u0dbb\u0dcf\u0db8\u0dd2\u0dad\u0dd2\u0dba</a> </p>\n", "<p><a href=\"#equalized_weight\">Weights parameter with equalized learning rate</a> </p>\n": "<p><a href=\"#equalized_weight\">\u0dc3\u0db8\u0dcf\u0db1 \u0d89\u0d9c\u0dd9\u0db1\u0dd4\u0db8\u0dca \u0d85\u0db1\u0dd4\u0db4\u0dcf\u0dad\u0dba \u0dc3\u0dc4\u0dd2\u0dad \u0db6\u0dbb \u0db4\u0dbb\u0dcf\u0db8\u0dd2\u0dad\u0dd2\u0dba</a> </p>\n",
"<p><a href=\"#equalized_weights\">Learning-rate equalized weights</a> </p>\n": "<p><a href=\"#equalized_weights\">\u0d89\u0d9c\u0dd9\u0db1\u0dd3\u0db8-\u0d85\u0db1\u0dd4\u0db4\u0dcf\u0dad \u0dc3\u0db8\u0dcf\u0db1 \u0db6\u0dbb</a> </p>\n", "<p><a href=\"#equalized_weights\">Learning-rate equalized weights</a> </p>\n": "<p><a href=\"#equalized_weights\">\u0d89\u0d9c\u0dd9\u0db1\u0dd3\u0db8-\u0d85\u0db1\u0dd4\u0db4\u0dcf\u0dad \u0dc3\u0db8\u0dcf\u0db1 \u0db6\u0dbb</a> </p>\n",
@@ -159,7 +159,7 @@
"<p>Then the convolution weights <span translate=no>_^_0_^_</span> are modulated as follows. (<span translate=no>_^_1_^_</span> here on refers to weights not intermediate latent space, we are sticking to the same notation as the paper.)</p>\n": "<p>\u0d91\u0dc0\u0dd2\u0da7\u0d9a\u0dd0\u0da7\u0dd2 \u0d9c\u0dd0\u0dc3\u0dd4\u0dab\u0dd4 \u0db6\u0dbb <span translate=no>_^_0_^_</span> \u0db4\u0dc4\u0dad \u0db4\u0dbb\u0dd2\u0daf\u0dd2 \u0db8\u0ddc\u0da9\u0dd2\u0dba\u0dd4\u0dbd\u0dda\u0da7\u0dca \u0dc0\u0dda. (<span translate=no>_^_1_^_</span> \u0db8\u0dd9\u0dc4\u0dd2 \u0daf\u0dd3 \u0d85\u0dad\u0dbb\u0db8\u0dd0\u0daf\u0dd2 \u0d9c\u0dd4\u0db4\u0dca\u0dad \u0d85\u0dc0\u0d9a\u0dcf\u0dc1\u0dba \u0db1\u0ddc\u0dc0\u0db1 \u0db4\u0da9\u0dd2 \u0d85\u0daf\u0dc4\u0dc3\u0dca, \u0d85\u0db4\u0dd2 \u0d9a\u0da9\u0daf\u0dcf\u0dc3\u0dd2 \u0dbd\u0dd9\u0dc3 \u0d91\u0db8 \u0d85\u0d82\u0d9a\u0db1\u0dba \u0daf\u0dd0\u0da9\u0dd2\u0dc0 \u0db6\u0dd0\u0db3\u0dd3 \u0dc3\u0dd2\u0da7\u0dd2\u0db1.) </p>\n", "<p>Then the convolution weights <span translate=no>_^_0_^_</span> are modulated as follows. (<span translate=no>_^_1_^_</span> here on refers to weights not intermediate latent space, we are sticking to the same notation as the paper.)</p>\n": "<p>\u0d91\u0dc0\u0dd2\u0da7\u0d9a\u0dd0\u0da7\u0dd2 \u0d9c\u0dd0\u0dc3\u0dd4\u0dab\u0dd4 \u0db6\u0dbb <span translate=no>_^_0_^_</span> \u0db4\u0dc4\u0dad \u0db4\u0dbb\u0dd2\u0daf\u0dd2 \u0db8\u0ddc\u0da9\u0dd2\u0dba\u0dd4\u0dbd\u0dda\u0da7\u0dca \u0dc0\u0dda. (<span translate=no>_^_1_^_</span> \u0db8\u0dd9\u0dc4\u0dd2 \u0daf\u0dd3 \u0d85\u0dad\u0dbb\u0db8\u0dd0\u0daf\u0dd2 \u0d9c\u0dd4\u0db4\u0dca\u0dad \u0d85\u0dc0\u0d9a\u0dcf\u0dc1\u0dba \u0db1\u0ddc\u0dc0\u0db1 \u0db4\u0da9\u0dd2 \u0d85\u0daf\u0dc4\u0dc3\u0dca, \u0d85\u0db4\u0dd2 \u0d9a\u0da9\u0daf\u0dcf\u0dc3\u0dd2 \u0dbd\u0dd9\u0dc3 \u0d91\u0db8 \u0d85\u0d82\u0d9a\u0db1\u0dba \u0daf\u0dd0\u0da9\u0dd2\u0dc0 \u0db6\u0dd0\u0db3\u0dd3 \u0dc3\u0dd2\u0da7\u0dd2\u0db1.) </p>\n",
"<p>They remove the <span translate=no>_^_0_^_</span> operator and replace it with the weight modulation and demodulation step. This is supposed to improve what they call droplet artifacts that are present in generated images, which are caused by the normalization in <span translate=no>_^_1_^_</span> operator. Style vector per layer is calculated from <span translate=no>_^_2_^_</span> as <span translate=no>_^_3_^_</span>.</p>\n": "<p>\u0d94\u0dc0\u0dd4\u0db1\u0dca <span translate=no>_^_0_^_</span> \u0d9a\u0dca\u0dbb\u0dd2\u0dba\u0dcf\u0d9a\u0dbb\u0dd4 \u0d89\u0dc0\u0dad\u0dca \u0d9a\u0dbb \u0db6\u0dbb \u0db8\u0ddc\u0da9\u0dd2\u0dba\u0dd4\u0dbd\u0dda\u0dc2\u0db1\u0dca \u0dc3\u0dc4 \u0da9\u0dd2\u0db8\u0ddc\u0da9\u0dd2\u0dba\u0dd4\u0dbd\u0dda\u0dc2\u0db1\u0dca \u0db4\u0dd2\u0dba\u0dc0\u0dbb\u0dd9\u0db1\u0dca \u0d91\u0dba \u0db4\u0dca\u0dbb\u0dad\u0dd2\u0dc3\u0dca\u0dae\u0dcf\u0db4\u0db1\u0dba \u0d9a\u0dbb\u0dba\u0dd2. \u0db8\u0dd9\u0db8\u0d9f\u0dd2\u0db1\u0dca \u0da2\u0db1\u0db1\u0dba \u0d9a\u0dbb\u0db1 \u0dbd\u0daf \u0dbb\u0dd6\u0db4\u0dc0\u0dbd \u0d87\u0dad\u0dd2 \u0da2\u0dbd \u0db6\u0dd2\u0db3\u0dd2\u0dad\u0dd2 \u0d9a\u0dde\u0dad\u0dd4\u0d9a \u0dc0\u0dc3\u0dca\u0dad\u0dd4 \u0dbd\u0dd9\u0dc3 \u0d94\u0dc0\u0dd4\u0db1\u0dca \u0dc4\u0db3\u0dd4\u0db1\u0dca\u0dc0\u0db1 \u0daf\u0dda \u0dc0\u0dd0\u0da9\u0dd2 \u0daf\u0dd2\u0dba\u0dd4\u0dab\u0dd4 \u0d9a\u0dc5 \u0dba\u0dd4\u0dad\u0dd4 \u0d85\u0dad\u0dbb \u0d92\u0dc0\u0dcf <span translate=no>_^_1_^_</span> \u0d9a\u0dca\u0dbb\u0dd2\u0dba\u0dcf\u0d9a\u0dbb\u0dd4 \u0dad\u0dd4\u0dc5 \u0dc3\u0dcf\u0db8\u0dcf\u0db1\u0dca\u0dba\u0d9a\u0dbb\u0dab\u0dba \u0dc0\u0dd3\u0db8 \u0db1\u0dd2\u0dc3\u0dcf \u0d87\u0dad\u0dd2\u0dc0\u0dda. \u0dc3\u0dca\u0dae\u0dbb\u0dba\u0d9a\u0da7 \u0dc1\u0ddb\u0dbd\u0dd3\u0dba \u0daf\u0ddb\u0dc1\u0dd2\u0d9a\u0dba \u0d9c\u0dab\u0db1\u0dba <span translate=no>_^_2_^_</span> \u0d9a\u0dbb\u0db1\u0dd4 \u0dbd\u0dd0\u0db6\u0dda <span translate=no>_^_3_^_</span>. </p>\n", "<p>They remove the <span translate=no>_^_0_^_</span> operator and replace it with the weight modulation and demodulation step. This is supposed to improve what they call droplet artifacts that are present in generated images, which are caused by the normalization in <span translate=no>_^_1_^_</span> operator. Style vector per layer is calculated from <span translate=no>_^_2_^_</span> as <span translate=no>_^_3_^_</span>.</p>\n": "<p>\u0d94\u0dc0\u0dd4\u0db1\u0dca <span translate=no>_^_0_^_</span> \u0d9a\u0dca\u0dbb\u0dd2\u0dba\u0dcf\u0d9a\u0dbb\u0dd4 \u0d89\u0dc0\u0dad\u0dca \u0d9a\u0dbb \u0db6\u0dbb \u0db8\u0ddc\u0da9\u0dd2\u0dba\u0dd4\u0dbd\u0dda\u0dc2\u0db1\u0dca \u0dc3\u0dc4 \u0da9\u0dd2\u0db8\u0ddc\u0da9\u0dd2\u0dba\u0dd4\u0dbd\u0dda\u0dc2\u0db1\u0dca \u0db4\u0dd2\u0dba\u0dc0\u0dbb\u0dd9\u0db1\u0dca \u0d91\u0dba \u0db4\u0dca\u0dbb\u0dad\u0dd2\u0dc3\u0dca\u0dae\u0dcf\u0db4\u0db1\u0dba \u0d9a\u0dbb\u0dba\u0dd2. 
\u0db8\u0dd9\u0db8\u0d9f\u0dd2\u0db1\u0dca \u0da2\u0db1\u0db1\u0dba \u0d9a\u0dbb\u0db1 \u0dbd\u0daf \u0dbb\u0dd6\u0db4\u0dc0\u0dbd \u0d87\u0dad\u0dd2 \u0da2\u0dbd \u0db6\u0dd2\u0db3\u0dd2\u0dad\u0dd2 \u0d9a\u0dde\u0dad\u0dd4\u0d9a \u0dc0\u0dc3\u0dca\u0dad\u0dd4 \u0dbd\u0dd9\u0dc3 \u0d94\u0dc0\u0dd4\u0db1\u0dca \u0dc4\u0db3\u0dd4\u0db1\u0dca\u0dc0\u0db1 \u0daf\u0dda \u0dc0\u0dd0\u0da9\u0dd2 \u0daf\u0dd2\u0dba\u0dd4\u0dab\u0dd4 \u0d9a\u0dc5 \u0dba\u0dd4\u0dad\u0dd4 \u0d85\u0dad\u0dbb \u0d92\u0dc0\u0dcf <span translate=no>_^_1_^_</span> \u0d9a\u0dca\u0dbb\u0dd2\u0dba\u0dcf\u0d9a\u0dbb\u0dd4 \u0dad\u0dd4\u0dc5 \u0dc3\u0dcf\u0db8\u0dcf\u0db1\u0dca\u0dba\u0d9a\u0dbb\u0dab\u0dba \u0dc0\u0dd3\u0db8 \u0db1\u0dd2\u0dc3\u0dcf \u0d87\u0dad\u0dd2\u0dc0\u0dda. \u0dc3\u0dca\u0dae\u0dbb\u0dba\u0d9a\u0da7 \u0dc1\u0ddb\u0dbd\u0dd3\u0dba \u0daf\u0ddb\u0dc1\u0dd2\u0d9a\u0dba \u0d9c\u0dab\u0db1\u0dba <span translate=no>_^_2_^_</span> \u0d9a\u0dbb\u0db1\u0dd4 \u0dbd\u0dd0\u0db6\u0dda <span translate=no>_^_3_^_</span>. </p>\n",
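
The modulation/demodulation entries above replace AdaIN: the per-layer style s scales each input channel of the convolution weights, and the result is rescaled so every output feature map keeps unit expected standard deviation. A minimal PyTorch sketch of that step, with illustrative tensor sizes and without the repository's actual class names:

    import torch
    import torch.nn.functional as F

    def modulate_demodulate(weight, style, eps=1e-8):
        # weight: [out_ch, in_ch, k, k] -- learned convolution weights
        # style:  [batch, in_ch]        -- per-layer style s = A(w)
        w = weight[None] * style[:, None, :, None, None]                  # modulate
        sigma_inv = torch.rsqrt((w ** 2).sum(dim=(2, 3, 4), keepdim=True) + eps)
        return w * sigma_inv                                              # demodulate

    def styled_conv2d(x, weight, style):
        # Apply the per-sample modulated weights as one grouped convolution.
        b, in_ch, h, wd = x.shape
        w = modulate_demodulate(weight, style)                            # [b, out_ch, in_ch, k, k]
        out_ch, k = w.shape[1], w.shape[-1]
        y = F.conv2d(x.reshape(1, b * in_ch, h, wd),
                     w.reshape(b * out_ch, in_ch, k, k),
                     padding=k // 2, groups=b)
        return y.reshape(b, out_ch, h, wd)

    # Toy sizes, purely illustrative: 8 -> 16 channels, 3x3 kernel, 32x32 maps.
    weight = torch.randn(16, 8, 3, 3)
    style = torch.randn(4, 8)
    x = torch.randn(4, 8, 32, 32)
    print(styled_conv2d(x, weight, style).shape)   # torch.Size([4, 16, 32, 32])
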
"<p>They use <strong>minibatch standard deviation</strong> to increase variation and <strong>equalized learning rate</strong> which we discussed below in the implementation. They also use <strong>pixel-wise normalization</strong> where at each pixel the feature vector is normalized. They apply this to all the convolution layer outputs (except RGB).</p>\n": "<p>\u0d9a\u0dca\u0dbb\u0dd2\u0dba\u0dcf\u0dad\u0dca\u0db8\u0d9a\u0d9a\u0dd2\u0dbb\u0dd3\u0db8\u0dda\u0daf\u0dd3 \u0d85\u0db4 \u0db4\u0dc4\u0dad \u0dc3\u0dcf\u0d9a\u0da0\u0dca\u0da1\u0dcf <strong>\u0d9a\u0dc5 \u0dc0\u0dd2\u0da0\u0dbd\u0db1\u0dba \u0dc3\u0dc4 \u0dc3\u0db8\u0dcf\u0db1 \u0d89\u0d9c\u0dd9\u0db1\u0dd4\u0db8\u0dca \u0d85\u0db1\u0dd4\u0db4\u0dcf\u0dad\u0dba</strong> \u0dc0\u0dd0\u0da9\u0dd2 \u0d9a\u0dd2\u0dbb\u0dd3\u0db8 \u0dc3\u0db3\u0dc4\u0dcf \u0d94\u0dc0\u0dd4\u0db1\u0dca <strong>\u0db8\u0dd2\u0db1\u0dd2\u0db6\u0dd0\u0da0\u0dca \u0dc3\u0db8\u0dca\u0db8\u0dad \u0d85\u0db4\u0d9c\u0db8\u0db1\u0dba</strong> \u0db7\u0dcf\u0dc0\u0dd2\u0dad\u0dcf \u0d9a\u0dbb\u0dba\u0dd2. \u0d94\u0dc0\u0dd4\u0db1\u0dca <strong>\u0db4\u0dd2\u0d9a\u0dca\u0dc3\u0dbd\u0dca \u0d85\u0db1\u0dd4\u0dc0 \u0dc3\u0dcf\u0db8\u0dcf\u0db1\u0dca\u0dba\u0d9a\u0dbb\u0dab\u0dba</strong> \u0db7\u0dcf\u0dc0\u0dd2\u0dad\u0dcf \u0d9a\u0dbb\u0db1 \u0d85\u0dad\u0dbb \u0d91\u0dc4\u0dd2\u0daf\u0dd3 \u0dc3\u0dd1\u0db8 \u0db4\u0dd2\u0d9a\u0dca\u0dc3\u0dd9\u0dbd\u0dca \u0d91\u0d9a\u0d9a\u0db8 \u0dc0\u0dd2\u0dc1\u0dda\u0dc2\u0dcf\u0d82\u0d9c \u0daf\u0ddb\u0dc1\u0dd2\u0d9a\u0dba \u0dc3\u0dcf\u0db8\u0dcf\u0db1\u0dca\u0dba\u0d9a\u0dbb\u0dab\u0dba \u0dc0\u0dda. \u0d94\u0dc0\u0dd4\u0db1\u0dca \u0dc3\u0dd2\u0dba\u0dbd\u0dd4 \u0d9a\u0dd0\u0da7\u0dd2 \u0d9c\u0dd0\u0dc3\u0dd4\u0dab\u0dd4 \u0dc3\u0dca\u0dae\u0dbb \u0db4\u0dca\u0dbb\u0dad\u0dd2\u0daf\u0dcf\u0db1\u0dba\u0db1\u0dca \u0dc3\u0db3\u0dc4\u0dcf \u0db8\u0dd9\u0dba \u0d85\u0daf\u0dcf\u0dc5 \u0dc0\u0dda (RGB \u0dc4\u0dd0\u0dbb). </p>\n", "<p>They use <strong>minibatch standard deviation</strong> to increase variation and <strong>equalized learning rate</strong> which we discussed below in the implementation. They also use <strong>pixel-wise normalization</strong> where at each pixel the feature vector is normalized. They apply this to all the convolution layer outputs (except RGB).</p>\n": "<p>\u0d9a\u0dca\u0dbb\u0dd2\u0dba\u0dcf\u0dad\u0dca\u0db8\u0d9a\u0d9a\u0dd2\u0dbb\u0dd3\u0db8\u0dda\u0daf\u0dd3 \u0d85\u0db4 \u0db4\u0dc4\u0dad \u0dc3\u0dcf\u0d9a\u0da0\u0dca\u0da1\u0dcf <strong>\u0d9a\u0dc5 \u0dc0\u0dd2\u0da0\u0dbd\u0db1\u0dba \u0dc3\u0dc4 \u0dc3\u0db8\u0dcf\u0db1 \u0d89\u0d9c\u0dd9\u0db1\u0dd4\u0db8\u0dca \u0d85\u0db1\u0dd4\u0db4\u0dcf\u0dad\u0dba</strong> \u0dc0\u0dd0\u0da9\u0dd2 \u0d9a\u0dd2\u0dbb\u0dd3\u0db8 \u0dc3\u0db3\u0dc4\u0dcf \u0d94\u0dc0\u0dd4\u0db1\u0dca <strong>\u0db8\u0dd2\u0db1\u0dd2\u0db6\u0dd0\u0da0\u0dca \u0dc3\u0db8\u0dca\u0db8\u0dad \u0d85\u0db4\u0d9c\u0db8\u0db1\u0dba</strong> \u0db7\u0dcf\u0dc0\u0dd2\u0dad\u0dcf \u0d9a\u0dbb\u0dba\u0dd2. \u0d94\u0dc0\u0dd4\u0db1\u0dca <strong>\u0db4\u0dd2\u0d9a\u0dca\u0dc3\u0dbd\u0dca \u0d85\u0db1\u0dd4\u0dc0 \u0dc3\u0dcf\u0db8\u0dcf\u0db1\u0dca\u0dba\u0d9a\u0dbb\u0dab\u0dba</strong> \u0db7\u0dcf\u0dc0\u0dd2\u0dad\u0dcf \u0d9a\u0dbb\u0db1 \u0d85\u0dad\u0dbb \u0d91\u0dc4\u0dd2\u0daf\u0dd3 \u0dc3\u0dd1\u0db8 \u0db4\u0dd2\u0d9a\u0dca\u0dc3\u0dd9\u0dbd\u0dca \u0d91\u0d9a\u0d9a\u0db8 \u0dc0\u0dd2\u0dc1\u0dda\u0dc2\u0dcf\u0d82\u0d9c \u0daf\u0ddb\u0dc1\u0dd2\u0d9a\u0dba \u0dc3\u0dcf\u0db8\u0dcf\u0db1\u0dca\u0dba\u0d9a\u0dbb\u0dab\u0dba \u0dc0\u0dda. 
\u0d94\u0dc0\u0dd4\u0db1\u0dca \u0dc3\u0dd2\u0dba\u0dbd\u0dd4 \u0d9a\u0dd0\u0da7\u0dd2 \u0d9c\u0dd0\u0dc3\u0dd4\u0dab\u0dd4 \u0dc3\u0dca\u0dae\u0dbb \u0db4\u0dca\u0dbb\u0dad\u0dd2\u0daf\u0dcf\u0db1\u0dba\u0db1\u0dca \u0dc3\u0db3\u0dc4\u0dcf \u0db8\u0dd9\u0dba \u0d85\u0daf\u0dcf\u0dc5 \u0dc0\u0dda (RGB \u0dc4\u0dd0\u0dbb). </p>\n",
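
Of the tricks listed in the entry above, pixel-wise normalization is the simplest to write down: at every pixel the feature vector is divided by its RMS over the channel dimension. A small sketch (not the repository's code; the epsilon is an arbitrary choice):

    import torch

    def pixel_norm(x, eps=1e-8):
        # x: [batch, channels, H, W]; normalize the channel vector at each pixel.
        return x * torch.rsqrt(x.pow(2).mean(dim=1, keepdim=True) + eps)

    print(pixel_norm(torch.randn(2, 16, 8, 8)).shape)   # torch.Size([2, 16, 8, 8])
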
"<p>This is a <a href=\"https://pytorch.org\">PyTorch</a> implementation of the paper <a href=\"https://papers.labml.ai/paper/1912.04958\">Analyzing and Improving the Image Quality of StyleGAN</a> which introduces <strong>StyleGAN 2</strong>. StyleGAN 2 is an improvement over <strong>StyleGAN</strong> from the paper <a href=\"https://papers.labml.ai/paper/1812.04948\">A Style-Based Generator Architecture for Generative Adversarial Networks</a>. And StyleGAN is based on <strong>Progressive GAN</strong> from the paper <a href=\"https://papers.labml.ai/paper/1710.10196\">Progressive Growing of GANs for Improved Quality, Stability, and Variation</a>. All three papers are from the same authors from <a href=\"https://twitter.com/NVIDIAAI\">NVIDIA AI</a>.</p>\n": "<p>\u0db8\u0dd9\u0dba <a href=\"https://pytorch.org\">PyTorch</a> \u0d9a\u0da9\u0daf\u0dcf\u0dc3\u0dd2 \u0d9a\u0dca\u0dbb\u0dd2\u0dba\u0dcf\u0dad\u0dca\u0db8\u0d9a \u0d9a\u0dd2\u0dbb\u0dd3\u0db8\u0d9a\u0dd2 StyleGan <strong>2</strong>\u0dc4\u0db3\u0dd4\u0db1\u0dca\u0dc0\u0dcf \u0daf\u0dd9\u0db1 <a href=\"https://papers.labml.ai/paper/1912.04958\">StyleGan \u0dc4\u0dd2 \u0dbb\u0dd6\u0db4\u0dba\u0dda \u0d9c\u0dd4\u0dab\u0dcf\u0dad\u0dca\u0db8\u0d9a\u0db7\u0dcf\u0dc0\u0dba \u0dc0\u0dd2\u0dc1\u0dca\u0dbd\u0dda\u0dc2\u0dab\u0dba \u0d9a\u0dd2\u0dbb\u0dd3\u0db8 \u0dc3\u0dc4 \u0dc0\u0dd0\u0da9\u0dd2 \u0daf\u0dd2\u0dba\u0dd4\u0dab\u0dd4 \u0d9a\u0dd2\u0dbb\u0dd3\u0db8</a> . StyleGan 2 \u0dba\u0db1\u0dd4 \u0dc0\u0dd0\u0da9\u0dd2 \u0daf\u0dd2\u0dba\u0dd4\u0dab\u0dd4 \u0d9a\u0dd2\u0dbb\u0dd3\u0db8\u0d9a\u0dd2 <strong>StyleGan</strong> <a href=\"https://papers.labml.ai/paper/1812.04948\">\u0d9a\u0da9\u0daf\u0dcf\u0dc3\u0dd2 \u0dc0\u0dbd\u0dd2\u0db1\u0dca \u0d8b\u0dad\u0dca\u0db4\u0dcf\u0daf\u0d9a \u0d85\u0dc4\u0dd2\u0dad\u0d9a\u0dbb \u0da2\u0dcf\u0dbd \u0dc3\u0db3\u0dc4\u0dcf \u0dc1\u0ddb\u0dbd\u0dd2\u0dba \u0db8\u0dad \u0db4\u0daf\u0db1\u0db8\u0dca \u0dc0\u0dd6 \u0d8b\u0dad\u0dca\u0db4\u0dcf\u0daf\u0d9a \u0d9c\u0dd8\u0dc4 \u0db1\u0dd2\u0dbb\u0dca\u0db8\u0dcf\u0dab \u0dc1\u0dd2\u0dbd\u0dca\u0db4\u0dba</a>. \u0dc3\u0dc4 StyleGan \u0db4\u0daf\u0db1\u0db8\u0dca \u0dc0\u0dd3 \u0d87\u0dad\u0dca\u0dad\u0dda \u0d9a\u0da9\u0daf\u0dcf\u0dc3\u0dd2 \u0dc0\u0dbd\u0dd2\u0db1\u0dca <strong>\u0db4\u0dca\u0dbb\u0d9c\u0dad\u0dd2\u0dc1\u0dd3\u0dbd\u0dd3 GAN</strong> \u0db8\u0dad \u0dba <a href=\"https://papers.labml.ai/paper/1710.10196\">\u0dc0\u0dd0\u0da9\u0dd2 \u0daf\u0dd2\u0dba\u0dd4\u0dab\u0dd4 \u0d9a\u0dc5 \u0d9c\u0dd4\u0dab\u0dcf\u0dad\u0dca\u0db8\u0d9a\u0db7\u0dcf\u0dc0\u0dba, \u0dc3\u0dca\u0dae\u0dcf\u0dba\u0dd2\u0dad\u0dcf\u0dc0 \u0dc3\u0dc4 \u0dc0\u0dd2\u0da0\u0dbd\u0db1\u0dba \u0dc3\u0db3\u0dc4\u0dcf GANs \u0db4\u0dca\u0dbb\u0d9c\u0dad\u0dd2\u0dc1\u0dd3\u0dbd\u0dd3 \u0dc0\u0dbb\u0dca\u0db0\u0db1\u0dba</a>. \u0db8\u0dd9\u0db8 \u0db4\u0dad\u0dca\u0dbb\u0dd2\u0d9a\u0dcf \u0dad\u0dd4\u0db1\u0db8 <a href=\"https://twitter.com/NVIDIAAI\">NVIDIA AI</a>\u0dc0\u0dd9\u0dad\u0dd2\u0db1\u0dca \u0d91\u0d9a\u0db8 \u0d9a\u0dad\u0dd4\u0dc0\u0dbb\u0dd4\u0db1\u0dca\u0d9c\u0dd9\u0db1\u0dca \u0dc0\u0dda. </p>\n", "<p>This is a <a href=\"https://pytorch.org\">PyTorch</a> implementation of the paper <a href=\"https://arxiv.org/abs/1912.04958\">Analyzing and Improving the Image Quality of StyleGAN</a> which introduces <strong>StyleGAN 2</strong>. StyleGAN 2 is an improvement over <strong>StyleGAN</strong> from the paper <a href=\"https://arxiv.org/abs/1812.04948\">A Style-Based Generator Architecture for Generative Adversarial Networks</a>. 
And StyleGAN is based on <strong>Progressive GAN</strong> from the paper <a href=\"https://arxiv.org/abs/1710.10196\">Progressive Growing of GANs for Improved Quality, Stability, and Variation</a>. All three papers are from the same authors from <a href=\"https://twitter.com/NVIDIAAI\">NVIDIA AI</a>.</p>\n": "<p>\u0db8\u0dd9\u0dba <a href=\"https://pytorch.org\">PyTorch</a> \u0d9a\u0da9\u0daf\u0dcf\u0dc3\u0dd2 \u0d9a\u0dca\u0dbb\u0dd2\u0dba\u0dcf\u0dad\u0dca\u0db8\u0d9a \u0d9a\u0dd2\u0dbb\u0dd3\u0db8\u0d9a\u0dd2 StyleGan <strong>2</strong>\u0dc4\u0db3\u0dd4\u0db1\u0dca\u0dc0\u0dcf \u0daf\u0dd9\u0db1 <a href=\"https://arxiv.org/abs/1912.04958\">StyleGan \u0dc4\u0dd2 \u0dbb\u0dd6\u0db4\u0dba\u0dda \u0d9c\u0dd4\u0dab\u0dcf\u0dad\u0dca\u0db8\u0d9a\u0db7\u0dcf\u0dc0\u0dba \u0dc0\u0dd2\u0dc1\u0dca\u0dbd\u0dda\u0dc2\u0dab\u0dba \u0d9a\u0dd2\u0dbb\u0dd3\u0db8 \u0dc3\u0dc4 \u0dc0\u0dd0\u0da9\u0dd2 \u0daf\u0dd2\u0dba\u0dd4\u0dab\u0dd4 \u0d9a\u0dd2\u0dbb\u0dd3\u0db8</a> . StyleGan 2 \u0dba\u0db1\u0dd4 \u0dc0\u0dd0\u0da9\u0dd2 \u0daf\u0dd2\u0dba\u0dd4\u0dab\u0dd4 \u0d9a\u0dd2\u0dbb\u0dd3\u0db8\u0d9a\u0dd2 <strong>StyleGan</strong> <a href=\"https://arxiv.org/abs/1812.04948\">\u0d9a\u0da9\u0daf\u0dcf\u0dc3\u0dd2 \u0dc0\u0dbd\u0dd2\u0db1\u0dca \u0d8b\u0dad\u0dca\u0db4\u0dcf\u0daf\u0d9a \u0d85\u0dc4\u0dd2\u0dad\u0d9a\u0dbb \u0da2\u0dcf\u0dbd \u0dc3\u0db3\u0dc4\u0dcf \u0dc1\u0ddb\u0dbd\u0dd2\u0dba \u0db8\u0dad \u0db4\u0daf\u0db1\u0db8\u0dca \u0dc0\u0dd6 \u0d8b\u0dad\u0dca\u0db4\u0dcf\u0daf\u0d9a \u0d9c\u0dd8\u0dc4 \u0db1\u0dd2\u0dbb\u0dca\u0db8\u0dcf\u0dab \u0dc1\u0dd2\u0dbd\u0dca\u0db4\u0dba</a>. \u0dc3\u0dc4 StyleGan \u0db4\u0daf\u0db1\u0db8\u0dca \u0dc0\u0dd3 \u0d87\u0dad\u0dca\u0dad\u0dda \u0d9a\u0da9\u0daf\u0dcf\u0dc3\u0dd2 \u0dc0\u0dbd\u0dd2\u0db1\u0dca <strong>\u0db4\u0dca\u0dbb\u0d9c\u0dad\u0dd2\u0dc1\u0dd3\u0dbd\u0dd3 GAN</strong> \u0db8\u0dad \u0dba <a href=\"https://arxiv.org/abs/1710.10196\">\u0dc0\u0dd0\u0da9\u0dd2 \u0daf\u0dd2\u0dba\u0dd4\u0dab\u0dd4 \u0d9a\u0dc5 \u0d9c\u0dd4\u0dab\u0dcf\u0dad\u0dca\u0db8\u0d9a\u0db7\u0dcf\u0dc0\u0dba, \u0dc3\u0dca\u0dae\u0dcf\u0dba\u0dd2\u0dad\u0dcf\u0dc0 \u0dc3\u0dc4 \u0dc0\u0dd2\u0da0\u0dbd\u0db1\u0dba \u0dc3\u0db3\u0dc4\u0dcf GANs \u0db4\u0dca\u0dbb\u0d9c\u0dad\u0dd2\u0dc1\u0dd3\u0dbd\u0dd3 \u0dc0\u0dbb\u0dca\u0db0\u0db1\u0dba</a>. \u0db8\u0dd9\u0db8 \u0db4\u0dad\u0dca\u0dbb\u0dd2\u0d9a\u0dcf \u0dad\u0dd4\u0db1\u0db8 <a href=\"https://twitter.com/NVIDIAAI\">NVIDIA AI</a>\u0dc0\u0dd9\u0dad\u0dd2\u0db1\u0dca \u0d91\u0d9a\u0db8 \u0d9a\u0dad\u0dd4\u0dc0\u0dbb\u0dd4\u0db1\u0dca\u0d9c\u0dd9\u0db1\u0dca \u0dc0\u0dda. </p>\n",
"<p>To prevent the generator from assuming adjacent styles are correlated, they randomly use different styles for different blocks. That is, they sample two latent vectors <span translate=no>_^_0_^_</span> and corresponding <span translate=no>_^_1_^_</span> and use <span translate=no>_^_2_^_</span> based styles for some blocks and <span translate=no>_^_3_^_</span> based styles for some blacks randomly.</p>\n": "<p>\u0d8b\u0dad\u0dca\u0db4\u0dcf\u0daf\u0d9a\u0dba\u0db1\u0dca\u0dad\u0dca\u0dbb\u0dba \u0dba\u0dcf\u0db6\u0daf \u0db8\u0ddd\u0dc3\u0dca\u0dad\u0dbb \u0dc3\u0dc4\u0dc3\u0db8\u0dca\u0db6\u0db1\u0dca\u0db0\u0dd2\u0dad \u0dba\u0dd0\u0dba\u0dd2 \u0d8b\u0db4\u0d9a\u0dbd\u0dca\u0db4\u0db1\u0dba \u0d9a\u0dd2\u0dbb\u0dd3\u0db8 \u0dc0\u0dd0\u0dc5\u0dd0\u0d9a\u0dca\u0dc0\u0dd3\u0db8 \u0dc3\u0db3\u0dc4\u0dcf, \u0d94\u0dc0\u0dd4\u0db1\u0dca \u0d85\u0dc4\u0db9\u0dd4 \u0dbd\u0dd9\u0dc3 \u0dc0\u0dd2\u0dc0\u0dd2\u0db0 \u0db6\u0dca\u0dbd\u0ddc\u0d9a\u0dca \u0dc3\u0db3\u0dc4\u0dcf \u0dc0\u0dd2\u0dc0\u0dd2\u0db0 \u0db8\u0ddd\u0dc3\u0dca\u0dad\u0dbb \u0db7\u0dcf\u0dc0\u0dd2\u0dad\u0dcf \u0d9a\u0dbb\u0dba\u0dd2. \u0d91\u0db1\u0db8\u0dca, \u0d92\u0dc0\u0dcf \u0d9c\u0dd4\u0db4\u0dca\u0dad \u0daf\u0ddb\u0dc1\u0dd2\u0d9a \u0daf\u0dd9\u0d9a\u0d9a\u0dca <span translate=no>_^_0_^_</span> \u0dc3\u0dc4 \u0d85\u0db1\u0dd4\u0dbb\u0dd6\u0db4 <span translate=no>_^_1_^_</span> \u0db1\u0dd2\u0dba\u0dd0\u0daf\u0dd2 \u0d9a\u0dbb \u0dc3\u0db8\u0dc4\u0dbb \u0db6\u0dca\u0dbd\u0ddc\u0d9a\u0dca \u0dc3\u0db3\u0dc4\u0dcf <span translate=no>_^_2_^_</span> \u0db4\u0daf\u0db1\u0db8\u0dca \u0dc0\u0dd6 \u0db8\u0ddd\u0dc3\u0dca\u0dad\u0dbb \u0dc3\u0dc4 \u0dc3\u0db8\u0dc4\u0dbb\u0d9a\u0dca \u0dc3\u0db3\u0dc4\u0dcf <span translate=no>_^_3_^_</span> \u0db4\u0daf\u0db1\u0db8\u0dca \u0dc0\u0dd6 \u0db8\u0ddd\u0dc3\u0dca\u0dad\u0dbb \u0db7\u0dcf\u0dc0\u0dd2\u0dad\u0dcf \u0d9a\u0dbb\u0dba\u0dd2 \u0d85\u0dc4\u0db9\u0dd4 \u0dbd\u0dd9\u0dc3 \u0d9a\u0dc5\u0dd4 \u0db4\u0dd0\u0dc4\u0dd0\u0dba\u0d9a\u0dca \u0d9c\u0db1\u0dd3. </p>\n", "<p>To prevent the generator from assuming adjacent styles are correlated, they randomly use different styles for different blocks. That is, they sample two latent vectors <span translate=no>_^_0_^_</span> and corresponding <span translate=no>_^_1_^_</span> and use <span translate=no>_^_2_^_</span> based styles for some blocks and <span translate=no>_^_3_^_</span> based styles for some blacks randomly.</p>\n": "<p>\u0d8b\u0dad\u0dca\u0db4\u0dcf\u0daf\u0d9a\u0dba\u0db1\u0dca\u0dad\u0dca\u0dbb\u0dba \u0dba\u0dcf\u0db6\u0daf \u0db8\u0ddd\u0dc3\u0dca\u0dad\u0dbb \u0dc3\u0dc4\u0dc3\u0db8\u0dca\u0db6\u0db1\u0dca\u0db0\u0dd2\u0dad \u0dba\u0dd0\u0dba\u0dd2 \u0d8b\u0db4\u0d9a\u0dbd\u0dca\u0db4\u0db1\u0dba \u0d9a\u0dd2\u0dbb\u0dd3\u0db8 \u0dc0\u0dd0\u0dc5\u0dd0\u0d9a\u0dca\u0dc0\u0dd3\u0db8 \u0dc3\u0db3\u0dc4\u0dcf, \u0d94\u0dc0\u0dd4\u0db1\u0dca \u0d85\u0dc4\u0db9\u0dd4 \u0dbd\u0dd9\u0dc3 \u0dc0\u0dd2\u0dc0\u0dd2\u0db0 \u0db6\u0dca\u0dbd\u0ddc\u0d9a\u0dca \u0dc3\u0db3\u0dc4\u0dcf \u0dc0\u0dd2\u0dc0\u0dd2\u0db0 \u0db8\u0ddd\u0dc3\u0dca\u0dad\u0dbb \u0db7\u0dcf\u0dc0\u0dd2\u0dad\u0dcf \u0d9a\u0dbb\u0dba\u0dd2. 
\u0d91\u0db1\u0db8\u0dca, \u0d92\u0dc0\u0dcf \u0d9c\u0dd4\u0db4\u0dca\u0dad \u0daf\u0ddb\u0dc1\u0dd2\u0d9a \u0daf\u0dd9\u0d9a\u0d9a\u0dca <span translate=no>_^_0_^_</span> \u0dc3\u0dc4 \u0d85\u0db1\u0dd4\u0dbb\u0dd6\u0db4 <span translate=no>_^_1_^_</span> \u0db1\u0dd2\u0dba\u0dd0\u0daf\u0dd2 \u0d9a\u0dbb \u0dc3\u0db8\u0dc4\u0dbb \u0db6\u0dca\u0dbd\u0ddc\u0d9a\u0dca \u0dc3\u0db3\u0dc4\u0dcf <span translate=no>_^_2_^_</span> \u0db4\u0daf\u0db1\u0db8\u0dca \u0dc0\u0dd6 \u0db8\u0ddd\u0dc3\u0dca\u0dad\u0dbb \u0dc3\u0dc4 \u0dc3\u0db8\u0dc4\u0dbb\u0d9a\u0dca \u0dc3\u0db3\u0dc4\u0dcf <span translate=no>_^_3_^_</span> \u0db4\u0daf\u0db1\u0db8\u0dca \u0dc0\u0dd6 \u0db8\u0ddd\u0dc3\u0dca\u0dad\u0dbb \u0db7\u0dcf\u0dc0\u0dd2\u0dad\u0dcf \u0d9a\u0dbb\u0dba\u0dd2 \u0d85\u0dc4\u0db9\u0dd4 \u0dbd\u0dd9\u0dc3 \u0d9a\u0dc5\u0dd4 \u0db4\u0dd0\u0dc4\u0dd0\u0dba\u0d9a\u0dca \u0d9c\u0db1\u0dd3. </p>\n",
"<p>Trainable <span translate=no>_^_0_^_</span> constant </p>\n": "<p>\u0db4\u0dd4\u0dc4\u0dd4\u0dab\u0dd4\u0d9a\u0dc5 \u0dc4\u0dd0\u0d9a\u0dd2 <span translate=no>_^_0_^_</span> \u0db1\u0dd2\u0dba\u0dad\u0dba </p>\n", "<p>Trainable <span translate=no>_^_0_^_</span> constant </p>\n": "<p>\u0db4\u0dd4\u0dc4\u0dd4\u0dab\u0dd4\u0d9a\u0dc5 \u0dc4\u0dd0\u0d9a\u0dd2 <span translate=no>_^_0_^_</span> \u0db1\u0dd2\u0dba\u0dad\u0dba </p>\n",
"<p>Try to normalize the image (this is totally optional, but sped up the early training a little) </p>\n": "<p>\u0dbb\u0dd6\u0db4\u0dba\u0dc3\u0dcf\u0db8\u0dcf\u0db1\u0dca\u0dba\u0d9a\u0dbb\u0dab\u0dba \u0d9a\u0dd2\u0dbb\u0dd3\u0db8\u0da7 \u0d8b\u0dad\u0dca\u0dc3\u0dcf\u0dc4 \u0d9a\u0dbb\u0db1\u0dca\u0db1 (\u0db8\u0dd9\u0dba \u0db8\u0dd4\u0dc5\u0dd4\u0db8\u0db1\u0dd2\u0db1\u0dca\u0db8 \u0dc0\u0dd2\u0d9a\u0dbd\u0dca\u0db4\u0dba\u0d9a\u0dd2, \u0db1\u0db8\u0dd4\u0dad\u0dca \u0db8\u0dd4\u0dbd\u0dca \u0db4\u0dd4\u0dc4\u0dd4\u0dab\u0dd4\u0dc0 \u0da7\u0dd2\u0d9a\u0d9a\u0dca \u0dc0\u0dda\u0d9c\u0dc0\u0dad\u0dca \u0d9a\u0dbb\u0db1\u0dca\u0db1) </p>\n", "<p>Try to normalize the image (this is totally optional, but sped up the early training a little) </p>\n": "<p>\u0dbb\u0dd6\u0db4\u0dba\u0dc3\u0dcf\u0db8\u0dcf\u0db1\u0dca\u0dba\u0d9a\u0dbb\u0dab\u0dba \u0d9a\u0dd2\u0dbb\u0dd3\u0db8\u0da7 \u0d8b\u0dad\u0dca\u0dc3\u0dcf\u0dc4 \u0d9a\u0dbb\u0db1\u0dca\u0db1 (\u0db8\u0dd9\u0dba \u0db8\u0dd4\u0dc5\u0dd4\u0db8\u0db1\u0dd2\u0db1\u0dca\u0db8 \u0dc0\u0dd2\u0d9a\u0dbd\u0dca\u0db4\u0dba\u0d9a\u0dd2, \u0db1\u0db8\u0dd4\u0dad\u0dca \u0db8\u0dd4\u0dbd\u0dca \u0db4\u0dd4\u0dc4\u0dd4\u0dab\u0dd4\u0dc0 \u0da7\u0dd2\u0d9a\u0d9a\u0dca \u0dc0\u0dda\u0d9c\u0dc0\u0dad\u0dca \u0d9a\u0dbb\u0db1\u0dca\u0db1) </p>\n",

View File

@@ -15,20 +15,20 @@
"<h4>Weight Modulation and Demodulation</h4>\n": "<h4>\u6743\u91cd\u8c03\u5236\u548c\u89e3\u8c03</h4>\n", "<h4>Weight Modulation and Demodulation</h4>\n": "<h4>\u6743\u91cd\u8c03\u5236\u548c\u89e3\u8c03</h4>\n",
"<p> <a id=\"discriminator\"></a></p>\n<h2>StyleGAN 2 Discriminator</h2>\n<p><span translate=no>_^_0_^_</span></p>\n<p>Discriminator first transforms the image to a feature map of the same resolution and then runs it through a series of blocks with residual connections. The resolution is down-sampled by <span translate=no>_^_1_^_</span> at each block while doubling the number of features.</p>\n": "<p><a id=\"discriminator\"></a></p>\n<h2>StyleGan 2 \u9274\u522b\u5668</h2>\n<p><span translate=no>_^_0_^_</span></p>\n<p>\u9274\u522b\u5668\u9996\u5148\u5c06\u56fe\u50cf\u8f6c\u6362\u4e3a\u5177\u6709\u76f8\u540c\u5206\u8fa8\u7387\u7684\u7279\u5f81\u56fe\uff0c\u7136\u540e\u901a\u8fc7\u4e00\u7cfb\u5217\u5177\u6709\u5269\u4f59\u8fde\u63a5\u7684\u5757\u8fdb\u884c\u8fd0\u884c\u3002\u5728\u6bcf\u4e2a\u533a\u5757<span translate=no>_^_1_^_</span>\u5904\u5bf9\u5206\u8fa8\u7387\u8fdb\u884c\u4e0b\u91c7\u6837\uff0c\u540c\u65f6\u5c06\u8981\u7d20\u6570\u91cf\u589e\u52a0\u4e00\u500d\u3002</p>\n", "<p> <a id=\"discriminator\"></a></p>\n<h2>StyleGAN 2 Discriminator</h2>\n<p><span translate=no>_^_0_^_</span></p>\n<p>Discriminator first transforms the image to a feature map of the same resolution and then runs it through a series of blocks with residual connections. The resolution is down-sampled by <span translate=no>_^_1_^_</span> at each block while doubling the number of features.</p>\n": "<p><a id=\"discriminator\"></a></p>\n<h2>StyleGan 2 \u9274\u522b\u5668</h2>\n<p><span translate=no>_^_0_^_</span></p>\n<p>\u9274\u522b\u5668\u9996\u5148\u5c06\u56fe\u50cf\u8f6c\u6362\u4e3a\u5177\u6709\u76f8\u540c\u5206\u8fa8\u7387\u7684\u7279\u5f81\u56fe\uff0c\u7136\u540e\u901a\u8fc7\u4e00\u7cfb\u5217\u5177\u6709\u5269\u4f59\u8fde\u63a5\u7684\u5757\u8fdb\u884c\u8fd0\u884c\u3002\u5728\u6bcf\u4e2a\u533a\u5757<span translate=no>_^_1_^_</span>\u5904\u5bf9\u5206\u8fa8\u7387\u8fdb\u884c\u4e0b\u91c7\u6837\uff0c\u540c\u65f6\u5c06\u8981\u7d20\u6570\u91cf\u589e\u52a0\u4e00\u500d\u3002</p>\n",
"<p> <a id=\"discriminator_black\"></a></p>\n<h3>Discriminator Block</h3>\n<p><span translate=no>_^_0_^_</span></p>\n<p>Discriminator block consists of two <span translate=no>_^_1_^_</span> convolutions with a residual connection.</p>\n": "<p><a id=\"discriminator_black\"></a></p>\n<h3>\u9274\u522b\u5668\u5757</h3>\n<p><span translate=no>_^_0_^_</span></p>\n<p>\u9274\u522b\u5668\u6a21\u5757\u7531\u4e24\u4e2a\u5e26\u6709\u5269\u4f59\u8fde\u63a5\u7684<span translate=no>_^_1_^_</span>\u5377\u79ef\u7ec4\u6210\u3002</p>\n", "<p> <a id=\"discriminator_black\"></a></p>\n<h3>Discriminator Block</h3>\n<p><span translate=no>_^_0_^_</span></p>\n<p>Discriminator block consists of two <span translate=no>_^_1_^_</span> convolutions with a residual connection.</p>\n": "<p><a id=\"discriminator_black\"></a></p>\n<h3>\u9274\u522b\u5668\u5757</h3>\n<p><span translate=no>_^_0_^_</span></p>\n<p>\u9274\u522b\u5668\u6a21\u5757\u7531\u4e24\u4e2a\u5e26\u6709\u5269\u4f59\u8fde\u63a5\u7684<span translate=no>_^_1_^_</span>\u5377\u79ef\u7ec4\u6210\u3002</p>\n",
"<p> <a id=\"down_sample\"></a></p>\n<h3>Down-sample</h3>\n<p>The down-sample operation <a href=\"#smooth\">smoothens</a> each feature channel and scale <span translate=no>_^_0_^_</span> using bilinear interpolation. This is based on the paper <a href=\"https://papers.labml.ai/paper/1904.11486\">Making Convolutional Networks Shift-Invariant Again</a>.</p>\n": "<p><a id=\"down_sample\"></a></p>\n<h3>\u5411\u4e0b\u91c7\u6837</h3>\n<p>\u4e0b\u91c7\u6837\u64cd\u4f5c<span translate=no>_^_0_^_</span>\u4f7f\u7528\u53cc\u7ebf\u6027\u63d2\u503c\u6cd5<a href=\"#smooth\">\u5e73\u6ed1</a>\u6bcf\u4e2a\u7279\u5f81\u901a\u9053\u548c\u7f29\u653e\u3002\u8fd9\u662f\u57fa\u4e8e\u8bba\u6587\u300a<a href=\"https://papers.labml.ai/paper/1904.11486\">\u8ba9\u5377\u79ef\u7f51\u7edc\u518d\u6b21\u79fb\u4f4d\u4e0d\u53d8</a>\u300b\u3002</p>\n", "<p> <a id=\"down_sample\"></a></p>\n<h3>Down-sample</h3>\n<p>The down-sample operation <a href=\"#smooth\">smoothens</a> each feature channel and scale <span translate=no>_^_0_^_</span> using bilinear interpolation. This is based on the paper <a href=\"https://arxiv.org/abs/1904.11486\">Making Convolutional Networks Shift-Invariant Again</a>.</p>\n": "<p><a id=\"down_sample\"></a></p>\n<h3>\u5411\u4e0b\u91c7\u6837</h3>\n<p>\u4e0b\u91c7\u6837\u64cd\u4f5c<span translate=no>_^_0_^_</span>\u4f7f\u7528\u53cc\u7ebf\u6027\u63d2\u503c\u6cd5<a href=\"#smooth\">\u5e73\u6ed1</a>\u6bcf\u4e2a\u7279\u5f81\u901a\u9053\u548c\u7f29\u653e\u3002\u8fd9\u662f\u57fa\u4e8e\u8bba\u6587\u300a<a href=\"https://arxiv.org/abs/1904.11486\">\u8ba9\u5377\u79ef\u7f51\u7edc\u518d\u6b21\u79fb\u4f4d\u4e0d\u53d8</a>\u300b\u3002</p>\n",
"<p> <a id=\"equalized_conv2d\"></a></p>\n<h2>Learning-rate Equalized 2D Convolution Layer</h2>\n<p>This uses <a href=\"#equalized_weights\">learning-rate equalized weights</a> for a convolution layer.</p>\n": "<p><a id=\"equalized_conv2d\"></a></p>\n<h2>\u5b66\u4e60\u901f\u7387\u5747\u8861\u7684 2D \u5377\u79ef\u5c42</h2>\n<p>\u8fd9\u4f7f\u7528\u5377\u79ef\u5c42\u7684<a href=\"#equalized_weights\">\u5b66\u4e60\u901f\u7387\u5747\u8861\u6743\u91cd</a>\u3002</p>\n", "<p> <a id=\"equalized_conv2d\"></a></p>\n<h2>Learning-rate Equalized 2D Convolution Layer</h2>\n<p>This uses <a href=\"#equalized_weights\">learning-rate equalized weights</a> for a convolution layer.</p>\n": "<p><a id=\"equalized_conv2d\"></a></p>\n<h2>\u5b66\u4e60\u901f\u7387\u5747\u8861\u7684 2D \u5377\u79ef\u5c42</h2>\n<p>\u8fd9\u4f7f\u7528\u5377\u79ef\u5c42\u7684<a href=\"#equalized_weights\">\u5b66\u4e60\u901f\u7387\u5747\u8861\u6743\u91cd</a>\u3002</p>\n",
"<p> <a id=\"equalized_linear\"></a></p>\n<h2>Learning-rate Equalized Linear Layer</h2>\n<p>This uses <a href=\"#equalized_weights\">learning-rate equalized weights</a> for a linear layer.</p>\n": "<p><a id=\"equalized_linear\"></a></p>\n<h2>\u5b66\u4e60\u901f\u7387\u5747\u8861\u7ebf\u6027\u5c42</h2>\n<p>\u8fd9\u4f7f\u7528\u7ebf\u6027\u56fe\u5c42\u7684<a href=\"#equalized_weights\">\u5b66\u4e60\u901f\u7387\u5747\u8861\u6743\u91cd</a>\u3002</p>\n", "<p> <a id=\"equalized_linear\"></a></p>\n<h2>Learning-rate Equalized Linear Layer</h2>\n<p>This uses <a href=\"#equalized_weights\">learning-rate equalized weights</a> for a linear layer.</p>\n": "<p><a id=\"equalized_linear\"></a></p>\n<h2>\u5b66\u4e60\u901f\u7387\u5747\u8861\u7ebf\u6027\u5c42</h2>\n<p>\u8fd9\u4f7f\u7528\u7ebf\u6027\u56fe\u5c42\u7684<a href=\"#equalized_weights\">\u5b66\u4e60\u901f\u7387\u5747\u8861\u6743\u91cd</a>\u3002</p>\n",
"<p> <a id=\"equalized_weight\"></a></p>\n<h2>Learning-rate Equalized Weights Parameter</h2>\n<p>This is based on equalized learning rate introduced in the Progressive GAN paper. Instead of initializing weights at <span translate=no>_^_0_^_</span> they initialize weights to <span translate=no>_^_1_^_</span> and then multiply them by <span translate=no>_^_2_^_</span> when using it. <span translate=no>_^_3_^_</span></p>\n<p>The gradients on stored parameters <span translate=no>_^_4_^_</span> get multiplied by <span translate=no>_^_5_^_</span> but this doesn&#x27;t have an affect since optimizers such as Adam normalize them by a running mean of the squared gradients.</p>\n<p>The optimizer updates on <span translate=no>_^_6_^_</span> are proportionate to the learning rate <span translate=no>_^_7_^_</span>. But the effective weights <span translate=no>_^_8_^_</span> get updated proportionately to <span translate=no>_^_9_^_</span>. Without equalized learning rate, the effective weights will get updated proportionately to just <span translate=no>_^_10_^_</span>.</p>\n<p>So we are effectively scaling the learning rate by <span translate=no>_^_11_^_</span> for these weight parameters.</p>\n": "<p><a id=\"equalized_weight\"></a></p>\n<h2>\u5b66\u4e60\u901f\u7387\u5747\u8861\u6743\u91cd\u53c2\u6570</h2>\n<p>\u8fd9\u662f\u57fa\u4e8e Progressive GAN \u8bba\u6587\u4e2d\u4ecb\u7ecd\u7684\u5747\u8861\u5b66\u4e60\u7387\u3002<span translate=no>_^_0_^_</span>\u5b83\u4eec\u4e0d\u662f\u5728\u521d\u59cb\u5316\u6743\u91cd\uff0c\u800c\u662f\u5c06\u6743\u91cd\u521d\u59cb\u5316\u4e3a\uff0c<span translate=no>_^_1_^_</span>\u7136\u540e\u5728\u4f7f\u7528<span translate=no>_^_2_^_</span>\u65f6\u5c06\u5176\u4e58\u4ee5\u3002<span translate=no>_^_3_^_</span></p>\n<p>\u5b58\u50a8\u53c2\u6570\u7684\u68af\u5ea6\u4f1a\u88ab<span translate=no>_^_4_^_</span>\u4e58\u4ee5\uff0c<span translate=no>_^_5_^_</span>\u4f46\u8fd9\u4e0d\u4f1a\u4ea7\u751f\u5f71\u54cd\uff0c\u56e0\u4e3a\u50cf Adam \u8fd9\u6837\u7684\u4f18\u5316\u5668\u5c06\u5b83\u4eec\u5f52\u4e00\u5316\u4e3a\u68af\u5ea6\u7684\u5e73\u65b9\u3002</p>\n<p>\u4e0a\u7684\u4f18\u5316\u5668\u66f4\u65b0\u4e0e\u5b66\u4e60\u901f\u7387\u6210<span translate=no>_^_6_^_</span>\u6b63\u6bd4<span translate=no>_^_7_^_</span>\u3002\u4f46\u662f\u6709\u6548\u6743\u91cd<span translate=no>_^_8_^_</span>\u4f1a\u6309\u6bd4\u4f8b\u66f4\u65b0<span translate=no>_^_9_^_</span>\u3002\u5982\u679c\u6ca1\u6709\u5747\u8861\u7684\u5b66\u4e60\u7387\uff0c\u6709\u6548\u6743\u91cd\u5c06\u6309\u6bd4\u4f8b\u66f4\u65b0\u4e3a just<span translate=no>_^_10_^_</span>\u3002</p>\n<p>\u56e0\u6b64\uff0c\u6211\u4eec\u6b63\u5728\u6709\u6548\u5730\u7f29\u653e\u8fd9\u4e9b\u6743\u91cd\u53c2\u6570<span translate=no>_^_11_^_</span>\u7684\u5b66\u4e60\u901f\u7387\u3002</p>\n", "<p> <a id=\"equalized_weight\"></a></p>\n<h2>Learning-rate Equalized Weights Parameter</h2>\n<p>This is based on equalized learning rate introduced in the Progressive GAN paper. Instead of initializing weights at <span translate=no>_^_0_^_</span> they initialize weights to <span translate=no>_^_1_^_</span> and then multiply them by <span translate=no>_^_2_^_</span> when using it. 
<span translate=no>_^_3_^_</span></p>\n<p>The gradients on stored parameters <span translate=no>_^_4_^_</span> get multiplied by <span translate=no>_^_5_^_</span> but this doesn&#x27;t have an affect since optimizers such as Adam normalize them by a running mean of the squared gradients.</p>\n<p>The optimizer updates on <span translate=no>_^_6_^_</span> are proportionate to the learning rate <span translate=no>_^_7_^_</span>. But the effective weights <span translate=no>_^_8_^_</span> get updated proportionately to <span translate=no>_^_9_^_</span>. Without equalized learning rate, the effective weights will get updated proportionately to just <span translate=no>_^_10_^_</span>.</p>\n<p>So we are effectively scaling the learning rate by <span translate=no>_^_11_^_</span> for these weight parameters.</p>\n": "<p><a id=\"equalized_weight\"></a></p>\n<h2>\u5b66\u4e60\u901f\u7387\u5747\u8861\u6743\u91cd\u53c2\u6570</h2>\n<p>\u8fd9\u662f\u57fa\u4e8e Progressive GAN \u8bba\u6587\u4e2d\u4ecb\u7ecd\u7684\u5747\u8861\u5b66\u4e60\u7387\u3002<span translate=no>_^_0_^_</span>\u5b83\u4eec\u4e0d\u662f\u5728\u521d\u59cb\u5316\u6743\u91cd\uff0c\u800c\u662f\u5c06\u6743\u91cd\u521d\u59cb\u5316\u4e3a\uff0c<span translate=no>_^_1_^_</span>\u7136\u540e\u5728\u4f7f\u7528<span translate=no>_^_2_^_</span>\u65f6\u5c06\u5176\u4e58\u4ee5\u3002<span translate=no>_^_3_^_</span></p>\n<p>\u5b58\u50a8\u53c2\u6570\u7684\u68af\u5ea6\u4f1a\u88ab<span translate=no>_^_4_^_</span>\u4e58\u4ee5\uff0c<span translate=no>_^_5_^_</span>\u4f46\u8fd9\u4e0d\u4f1a\u4ea7\u751f\u5f71\u54cd\uff0c\u56e0\u4e3a\u50cf Adam \u8fd9\u6837\u7684\u4f18\u5316\u5668\u5c06\u5b83\u4eec\u5f52\u4e00\u5316\u4e3a\u68af\u5ea6\u7684\u5e73\u65b9\u3002</p>\n<p>\u4e0a\u7684\u4f18\u5316\u5668\u66f4\u65b0\u4e0e\u5b66\u4e60\u901f\u7387\u6210<span translate=no>_^_6_^_</span>\u6b63\u6bd4<span translate=no>_^_7_^_</span>\u3002\u4f46\u662f\u6709\u6548\u6743\u91cd<span translate=no>_^_8_^_</span>\u4f1a\u6309\u6bd4\u4f8b\u66f4\u65b0<span translate=no>_^_9_^_</span>\u3002\u5982\u679c\u6ca1\u6709\u5747\u8861\u7684\u5b66\u4e60\u7387\uff0c\u6709\u6548\u6743\u91cd\u5c06\u6309\u6bd4\u4f8b\u66f4\u65b0\u4e3a just<span translate=no>_^_10_^_</span>\u3002</p>\n<p>\u56e0\u6b64\uff0c\u6211\u4eec\u6b63\u5728\u6709\u6548\u5730\u7f29\u653e\u8fd9\u4e9b\u6743\u91cd\u53c2\u6570<span translate=no>_^_11_^_</span>\u7684\u5b66\u4e60\u901f\u7387\u3002</p>\n",
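
The equalized learning-rate entries above come down to storing weights as N(0, 1) and multiplying them by the constant c = 1 / sqrt(fan_in) at run time, so that with Adam the effective update is scaled by c. A minimal sketch with assumed parameter names (the He gain factor is left out for brevity):

    import math
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class EqualizedLinear(nn.Module):
        # Weights are kept at unit variance and scaled by c in every forward pass,
        # instead of baking c into the initialization.
        def __init__(self, in_features, out_features, bias=0.0):
            super().__init__()
            self.weight = nn.Parameter(torch.randn(out_features, in_features))
            self.bias = nn.Parameter(torch.full((out_features,), float(bias)))
            self.c = 1.0 / math.sqrt(in_features)

        def forward(self, x):
            return F.linear(x, self.weight * self.c, self.bias)

    print(EqualizedLinear(512, 512)(torch.randn(2, 512)).shape)   # torch.Size([2, 512])
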
"<p> <a id=\"generator\"></a></p>\n<h2>StyleGAN2 Generator</h2>\n<p><span translate=no>_^_0_^_</span></p>\n<p><small><em><span translate=no>_^_1_^_</span> denotes a linear layer. <span translate=no>_^_2_^_</span> denotes a broadcast and scaling operation (noise is a single channel). <a href=\"#to_rgb\"><span translate=no>_^_3_^_</span></a> also has a style modulation which is not shown in the diagram to keep it simple.</em></small></p>\n<p>The generator starts with a learned constant. Then it has a series of blocks. The feature map resolution is doubled at each block Each block outputs an RGB image and they are scaled up and summed to get the final RGB image.</p>\n": "<p><a id=\"generator\"></a></p>\n<h2>StyleGan2 \u751f\u6210\u5668</h2>\n<p><span translate=no>_^_0_^_</span></p>\n<p><small><em><span translate=no>_^_1_^_</span>\u8868\u793a\u7ebf\u6027\u5c42\u3002<span translate=no>_^_2_^_</span>\u8868\u793a\u5e7f\u64ad\u548c\u7f29\u653e\u64cd\u4f5c\uff08\u566a\u58f0\u662f\u5355\u4e2a\u4fe1\u9053\uff09\u3002<a href=\"#to_rgb\"><span translate=no>_^_3_^_</span></a>\u8fd8\u6709\u4e00\u79cd\u98ce\u683c\u8c03\u5236\uff0c\u4e3a\u4e86\u7b80\u5355\u8d77\u89c1\uff0c\u56fe\u4e2d\u6ca1\u6709\u663e\u793a\u8fd9\u79cd\u8c03\u5236\u3002</em></small></p>\n<p>\u751f\u6210\u5668\u4ee5\u5b66\u4e60\u7684\u5e38\u6570\u5f00\u59cb\u3002\u7136\u540e\u5b83\u6709\u4e00\u7cfb\u5217\u65b9\u5757\u3002\u6bcf\u4e2a\u533a\u5757\u7684\u8981\u7d20\u56fe\u5206\u8fa8\u7387\u52a0\u500d\u3002\u6bcf\u4e2a\u6a21\u5757\u8f93\u51fa\u4e00\u4e2a RGB \u56fe\u50cf\uff0c\u7136\u540e\u653e\u5927\u548c\u6c42\u548c\u4ee5\u83b7\u5f97\u6700\u7ec8\u7684 RGB \u56fe\u50cf\u3002</p>\n", "<p> <a id=\"generator\"></a></p>\n<h2>StyleGAN2 Generator</h2>\n<p><span translate=no>_^_0_^_</span></p>\n<p><small><em><span translate=no>_^_1_^_</span> denotes a linear layer. <span translate=no>_^_2_^_</span> denotes a broadcast and scaling operation (noise is a single channel). <a href=\"#to_rgb\"><span translate=no>_^_3_^_</span></a> also has a style modulation which is not shown in the diagram to keep it simple.</em></small></p>\n<p>The generator starts with a learned constant. Then it has a series of blocks. The feature map resolution is doubled at each block Each block outputs an RGB image and they are scaled up and summed to get the final RGB image.</p>\n": "<p><a id=\"generator\"></a></p>\n<h2>StyleGan2 \u751f\u6210\u5668</h2>\n<p><span translate=no>_^_0_^_</span></p>\n<p><small><em><span translate=no>_^_1_^_</span>\u8868\u793a\u7ebf\u6027\u5c42\u3002<span translate=no>_^_2_^_</span>\u8868\u793a\u5e7f\u64ad\u548c\u7f29\u653e\u64cd\u4f5c\uff08\u566a\u58f0\u662f\u5355\u4e2a\u4fe1\u9053\uff09\u3002<a href=\"#to_rgb\"><span translate=no>_^_3_^_</span></a>\u8fd8\u6709\u4e00\u79cd\u98ce\u683c\u8c03\u5236\uff0c\u4e3a\u4e86\u7b80\u5355\u8d77\u89c1\uff0c\u56fe\u4e2d\u6ca1\u6709\u663e\u793a\u8fd9\u79cd\u8c03\u5236\u3002</em></small></p>\n<p>\u751f\u6210\u5668\u4ee5\u5b66\u4e60\u7684\u5e38\u6570\u5f00\u59cb\u3002\u7136\u540e\u5b83\u6709\u4e00\u7cfb\u5217\u65b9\u5757\u3002\u6bcf\u4e2a\u533a\u5757\u7684\u8981\u7d20\u56fe\u5206\u8fa8\u7387\u52a0\u500d\u3002\u6bcf\u4e2a\u6a21\u5757\u8f93\u51fa\u4e00\u4e2a RGB \u56fe\u50cf\uff0c\u7136\u540e\u653e\u5927\u548c\u6c42\u548c\u4ee5\u83b7\u5f97\u6700\u7ec8\u7684 RGB \u56fe\u50cf\u3002</p>\n",
"<p> <a id=\"generator_block\"></a></p>\n<h3>Generator Block</h3>\n<p><span translate=no>_^_0_^_</span></p>\n<p><small><em><span translate=no>_^_1_^_</span> denotes a linear layer. <span translate=no>_^_2_^_</span> denotes a broadcast and scaling operation (noise is a single channel). <a href=\"#to_rgb\"><span translate=no>_^_3_^_</span></a> also has a style modulation which is not shown in the diagram to keep it simple.</em></small></p>\n<p>The generator block consists of two <a href=\"#style_block\">style blocks</a> (<span translate=no>_^_4_^_</span> convolutions with style modulation) and an RGB output.</p>\n": "<p><a id=\"generator_block\"></a></p>\n<h3>\u53d1\u7535\u673a\u7ec4</h3>\n<p><span translate=no>_^_0_^_</span></p>\n<p><small><em><span translate=no>_^_1_^_</span>\u8868\u793a\u7ebf\u6027\u5c42\u3002<span translate=no>_^_2_^_</span>\u8868\u793a\u5e7f\u64ad\u548c\u7f29\u653e\u64cd\u4f5c\uff08\u566a\u58f0\u662f\u5355\u4e2a\u4fe1\u9053\uff09\u3002<a href=\"#to_rgb\"><span translate=no>_^_3_^_</span></a>\u8fd8\u6709\u4e00\u79cd\u98ce\u683c\u8c03\u5236\uff0c\u4e3a\u4e86\u7b80\u5355\u8d77\u89c1\uff0c\u56fe\u4e2d\u6ca1\u6709\u663e\u793a\u8fd9\u79cd\u8c03\u5236\u3002</em></small></p>\n<p>\u751f\u6210\u5668\u6a21\u5757\u7531\u4e24\u4e2a<a href=\"#style_block\">\u6837\u5f0f\u5757</a>\uff08\u5e26\u6837\u5f0f\u8c03\u5236\u7684<span translate=no>_^_4_^_</span>\u5377\u79ef\uff09\u548c\u4e00\u4e2a RGB \u8f93\u51fa\u7ec4\u6210\u3002</p>\n", "<p> <a id=\"generator_block\"></a></p>\n<h3>Generator Block</h3>\n<p><span translate=no>_^_0_^_</span></p>\n<p><small><em><span translate=no>_^_1_^_</span> denotes a linear layer. <span translate=no>_^_2_^_</span> denotes a broadcast and scaling operation (noise is a single channel). <a href=\"#to_rgb\"><span translate=no>_^_3_^_</span></a> also has a style modulation which is not shown in the diagram to keep it simple.</em></small></p>\n<p>The generator block consists of two <a href=\"#style_block\">style blocks</a> (<span translate=no>_^_4_^_</span> convolutions with style modulation) and an RGB output.</p>\n": "<p><a id=\"generator_block\"></a></p>\n<h3>\u53d1\u7535\u673a\u7ec4</h3>\n<p><span translate=no>_^_0_^_</span></p>\n<p><small><em><span translate=no>_^_1_^_</span>\u8868\u793a\u7ebf\u6027\u5c42\u3002<span translate=no>_^_2_^_</span>\u8868\u793a\u5e7f\u64ad\u548c\u7f29\u653e\u64cd\u4f5c\uff08\u566a\u58f0\u662f\u5355\u4e2a\u4fe1\u9053\uff09\u3002<a href=\"#to_rgb\"><span translate=no>_^_3_^_</span></a>\u8fd8\u6709\u4e00\u79cd\u98ce\u683c\u8c03\u5236\uff0c\u4e3a\u4e86\u7b80\u5355\u8d77\u89c1\uff0c\u56fe\u4e2d\u6ca1\u6709\u663e\u793a\u8fd9\u79cd\u8c03\u5236\u3002</em></small></p>\n<p>\u751f\u6210\u5668\u6a21\u5757\u7531\u4e24\u4e2a<a href=\"#style_block\">\u6837\u5f0f\u5757</a>\uff08\u5e26\u6837\u5f0f\u8c03\u5236\u7684<span translate=no>_^_4_^_</span>\u5377\u79ef\uff09\u548c\u4e00\u4e2a RGB \u8f93\u51fa\u7ec4\u6210\u3002</p>\n",
"<p> <a id=\"gradient_penalty\"></a></p>\n<h2>Gradient Penalty</h2>\n<p>This is the <span translate=no>_^_0_^_</span> regularization penality from the paper <a href=\"https://papers.labml.ai/paper/1801.04406\">Which Training Methods for GANs do actually Converge?</a>.</p>\n<p><span translate=no>_^_1_^_</span></p>\n<p>That is we try to reduce the L2 norm of gradients of the discriminator with respect to images, for real images (<span translate=no>_^_2_^_</span>).</p>\n": "<p><a id=\"gradient_penalty\"></a></p>\n<h2>\u68af\u5ea6\u60e9\u7f5a</h2>\n<p>\u8fd9\u662f\u8bba\u6587\u300a<a href=\"https://papers.labml.ai/paper/1801.04406\">\u54ea\u79cd\u9488\u5bf9 GAN \u7684\u8bad\u7ec3\u65b9\u6cd5\u5b9e\u9645\u4e0a\u4f1a\u6536\u655b\uff1f\u300b\u4e2d\u7684<span translate=no>_^_0_^_</span>\u6b63\u5219\u5316\u60e9\u7f5a</a>\u3002</p>\n<p><span translate=no>_^_1_^_</span></p>\n<p>\u4e5f\u5c31\u662f\u8bf4\uff0c\u5bf9\u4e8e\u771f\u5b9e\u56fe\u50cf\uff0c\u6211\u4eec\u5c1d\u8bd5\u51cf\u5c11\u9274\u522b\u5668\u76f8\u5bf9\u4e8e\u56fe\u50cf\u7684\u68af\u5ea6\u7684 L2 \u8303\u6570 (<span translate=no>_^_2_^_</span>)\u3002</p>\n", "<p> <a id=\"gradient_penalty\"></a></p>\n<h2>Gradient Penalty</h2>\n<p>This is the <span translate=no>_^_0_^_</span> regularization penality from the paper <a href=\"https://arxiv.org/abs/1801.04406\">Which Training Methods for GANs do actually Converge?</a>.</p>\n<p><span translate=no>_^_1_^_</span></p>\n<p>That is we try to reduce the L2 norm of gradients of the discriminator with respect to images, for real images (<span translate=no>_^_2_^_</span>).</p>\n": "<p><a id=\"gradient_penalty\"></a></p>\n<h2>\u68af\u5ea6\u60e9\u7f5a</h2>\n<p>\u8fd9\u662f\u8bba\u6587\u300a<a href=\"https://arxiv.org/abs/1801.04406\">\u54ea\u79cd\u9488\u5bf9 GAN \u7684\u8bad\u7ec3\u65b9\u6cd5\u5b9e\u9645\u4e0a\u4f1a\u6536\u655b\uff1f\u300b\u4e2d\u7684<span translate=no>_^_0_^_</span>\u6b63\u5219\u5316\u60e9\u7f5a</a>\u3002</p>\n<p><span translate=no>_^_1_^_</span></p>\n<p>\u4e5f\u5c31\u662f\u8bf4\uff0c\u5bf9\u4e8e\u771f\u5b9e\u56fe\u50cf\uff0c\u6211\u4eec\u5c1d\u8bd5\u51cf\u5c11\u9274\u522b\u5668\u76f8\u5bf9\u4e8e\u56fe\u50cf\u7684\u68af\u5ea6\u7684 L2 \u8303\u6570 (<span translate=no>_^_2_^_</span>)\u3002</p>\n",
"<p> <a id=\"mapping_network\"></a></p>\n<h2>Mapping Network</h2>\n<p><span translate=no>_^_0_^_</span></p>\n<p>This is an MLP with 8 linear layers. The mapping network maps the latent vector <span translate=no>_^_1_^_</span> to an intermediate latent space <span translate=no>_^_2_^_</span>. <span translate=no>_^_3_^_</span> space will be disentangled from the image space where the factors of variation become more linear.</p>\n": "<p><a id=\"mapping_network\"></a></p>\n<h2>\u6620\u5c04\u7f51\u7edc</h2>\n<p><span translate=no>_^_0_^_</span></p>\n<p>\u8fd9\u662f\u4e00\u4e2a\u5305\u542b 8 \u4e2a\u7ebf\u6027\u5c42\u7684 MLP\u3002\u6620\u5c04\u7f51\u7edc\u5c06\u6f5c\u5728\u5411\u91cf\u6620\u5c04<span translate=no>_^_1_^_</span>\u5230\u4e2d\u95f4\u6f5c\u7a7a\u95f4<span translate=no>_^_2_^_</span>\u3002<span translate=no>_^_3_^_</span>\u7a7a\u95f4\u5c06\u4e0e\u56fe\u50cf\u7a7a\u95f4\u5206\u5f00\uff0c\u5728\u56fe\u50cf\u7a7a\u95f4\u4e2d\uff0c\u53d8\u5f02\u56e0\u5b50\u53d8\u5f97\u66f4\u52a0\u7ebf\u6027\u3002</p>\n", "<p> <a id=\"mapping_network\"></a></p>\n<h2>Mapping Network</h2>\n<p><span translate=no>_^_0_^_</span></p>\n<p>This is an MLP with 8 linear layers. The mapping network maps the latent vector <span translate=no>_^_1_^_</span> to an intermediate latent space <span translate=no>_^_2_^_</span>. <span translate=no>_^_3_^_</span> space will be disentangled from the image space where the factors of variation become more linear.</p>\n": "<p><a id=\"mapping_network\"></a></p>\n<h2>\u6620\u5c04\u7f51\u7edc</h2>\n<p><span translate=no>_^_0_^_</span></p>\n<p>\u8fd9\u662f\u4e00\u4e2a\u5305\u542b 8 \u4e2a\u7ebf\u6027\u5c42\u7684 MLP\u3002\u6620\u5c04\u7f51\u7edc\u5c06\u6f5c\u5728\u5411\u91cf\u6620\u5c04<span translate=no>_^_1_^_</span>\u5230\u4e2d\u95f4\u6f5c\u7a7a\u95f4<span translate=no>_^_2_^_</span>\u3002<span translate=no>_^_3_^_</span>\u7a7a\u95f4\u5c06\u4e0e\u56fe\u50cf\u7a7a\u95f4\u5206\u5f00\uff0c\u5728\u56fe\u50cf\u7a7a\u95f4\u4e2d\uff0c\u53d8\u5f02\u56e0\u5b50\u53d8\u5f97\u66f4\u52a0\u7ebf\u6027\u3002</p>\n",
"<p> <a id=\"mini_batch_std_dev\"></a></p>\n<h3>Mini-batch Standard Deviation</h3>\n<p>Mini-batch standard deviation calculates the standard deviation across a mini-batch (or a subgroups within the mini-batch) for each feature in the feature map. Then it takes the mean of all the standard deviations and appends it to the feature map as one extra feature.</p>\n": "<p><a id=\"mini_batch_std_dev\"></a></p>\n<h3>\u5c0f\u6279\u91cf\u6807\u51c6\u5dee</h3>\n<p>\u5c0f\u6279\u91cf\u6807\u51c6\u5dee\u8ba1\u7b97\u8981\u7d20\u6620\u5c04\u4e2d\u6bcf\u4e2a\u8981\u7d20\u7684\u5c0f\u6279\u6b21\uff08\u6216\u5fae\u578b\u6279\u6b21\u4e2d\u7684\u5b50\u7ec4\uff09\u7684\u6807\u51c6\u5dee\u3002\u7136\u540e\uff0c\u5b83\u53d6\u6240\u6709\u6807\u51c6\u5dee\u7684\u5e73\u5747\u503c\uff0c\u5e76\u5c06\u5176\u4f5c\u4e3a\u4e00\u9879\u989d\u5916\u8981\u7d20\u9644\u52a0\u5230\u8981\u7d20\u5730\u56fe\u4e2d\u3002</p>\n", "<p> <a id=\"mini_batch_std_dev\"></a></p>\n<h3>Mini-batch Standard Deviation</h3>\n<p>Mini-batch standard deviation calculates the standard deviation across a mini-batch (or a subgroups within the mini-batch) for each feature in the feature map. Then it takes the mean of all the standard deviations and appends it to the feature map as one extra feature.</p>\n": "<p><a id=\"mini_batch_std_dev\"></a></p>\n<h3>\u5c0f\u6279\u91cf\u6807\u51c6\u5dee</h3>\n<p>\u5c0f\u6279\u91cf\u6807\u51c6\u5dee\u8ba1\u7b97\u8981\u7d20\u6620\u5c04\u4e2d\u6bcf\u4e2a\u8981\u7d20\u7684\u5c0f\u6279\u6b21\uff08\u6216\u5fae\u578b\u6279\u6b21\u4e2d\u7684\u5b50\u7ec4\uff09\u7684\u6807\u51c6\u5dee\u3002\u7136\u540e\uff0c\u5b83\u53d6\u6240\u6709\u6807\u51c6\u5dee\u7684\u5e73\u5747\u503c\uff0c\u5e76\u5c06\u5176\u4f5c\u4e3a\u4e00\u9879\u989d\u5916\u8981\u7d20\u9644\u52a0\u5230\u8981\u7d20\u5730\u56fe\u4e2d\u3002</p>\n",
"<p> <a id=\"path_length_penalty\"></a></p>\n<h2>Path Length Penalty</h2>\n<p>This regularization encourages a fixed-size step in <span translate=no>_^_0_^_</span> to result in a fixed-magnitude change in the image.</p>\n<p><span translate=no>_^_1_^_</span></p>\n<p>where <span translate=no>_^_2_^_</span> is the Jacobian <span translate=no>_^_3_^_</span>, <span translate=no>_^_4_^_</span> are sampled from <span translate=no>_^_5_^_</span> from the mapping network, and <span translate=no>_^_6_^_</span> are images with noise <span translate=no>_^_7_^_</span>.</p>\n<p><span translate=no>_^_8_^_</span> is the exponential moving average of <span translate=no>_^_9_^_</span> as the training progresses.</p>\n<p><span translate=no>_^_10_^_</span> is calculated without explicitly calculating the Jacobian using <span translate=no>_^_11_^_</span></p>\n": "<p><a id=\"path_length_penalty\"></a></p>\n<h2>\u8def\u5f84\u957f\u5ea6\u60e9\u7f5a</h2>\n<p>\u8fd9\u79cd\u6b63\u5219\u5316\u9f13\u52b1\u91c7\u7528\u56fa\u5b9a\u5927\u5c0f\u7684\u6b65\u8fdb\uff0c<span translate=no>_^_0_^_</span>\u4ece\u800c\u5bfc\u81f4\u56fe\u50cf\u4e2d\u7684\u56fa\u5b9a\u5e45\u5ea6\u53d8\u5316\u3002</p>\n<p><span translate=no>_^_1_^_</span></p>\n<p>\u5176\u4e2d<span translate=no>_^_2_^_</span>\u662f Jacobian<span translate=no>_^_3_^_</span>\uff0c<span translate=no>_^_4_^_</span>\u662f<span translate=no>_^_5_^_</span>\u4ece\u6d4b\u7ed8\u7f51\u7edc\u4e2d\u91c7\u6837\u7684\uff0c\u5e76\u4e14<span translate=no>_^_6_^_</span>\u662f\u6709\u566a\u70b9\u7684\u56fe\u50cf<span translate=no>_^_7_^_</span>\u3002</p>\n<p><span translate=no>_^_8_^_</span>\u662f\u8bad\u7ec3\u8fdb\u884c\u65f6\u7684<span translate=no>_^_9_^_</span>\u6307\u6570\u79fb\u52a8\u5e73\u5747\u7ebf\u3002</p>\n<p><span translate=no>_^_10_^_</span>\u8ba1\u7b97\u65f6\u672a\u4f7f\u7528\u663e\u5f0f\u8ba1\u7b97\u96c5\u53ef\u6bd4\u5f0f<span translate=no>_^_11_^_</span></p>\n", "<p> <a id=\"path_length_penalty\"></a></p>\n<h2>Path Length Penalty</h2>\n<p>This regularization encourages a fixed-size step in <span translate=no>_^_0_^_</span> to result in a fixed-magnitude change in the image.</p>\n<p><span translate=no>_^_1_^_</span></p>\n<p>where <span translate=no>_^_2_^_</span> is the Jacobian <span translate=no>_^_3_^_</span>, <span translate=no>_^_4_^_</span> are sampled from <span translate=no>_^_5_^_</span> from the mapping network, and <span translate=no>_^_6_^_</span> are images with noise <span translate=no>_^_7_^_</span>.</p>\n<p><span translate=no>_^_8_^_</span> is the exponential moving average of <span translate=no>_^_9_^_</span> as the training progresses.</p>\n<p><span translate=no>_^_10_^_</span> is calculated without explicitly calculating the Jacobian using <span translate=no>_^_11_^_</span></p>\n": "<p><a id=\"path_length_penalty\"></a></p>\n<h2>\u8def\u5f84\u957f\u5ea6\u60e9\u7f5a</h2>\n<p>\u8fd9\u79cd\u6b63\u5219\u5316\u9f13\u52b1\u91c7\u7528\u56fa\u5b9a\u5927\u5c0f\u7684\u6b65\u8fdb\uff0c<span translate=no>_^_0_^_</span>\u4ece\u800c\u5bfc\u81f4\u56fe\u50cf\u4e2d\u7684\u56fa\u5b9a\u5e45\u5ea6\u53d8\u5316\u3002</p>\n<p><span translate=no>_^_1_^_</span></p>\n<p>\u5176\u4e2d<span translate=no>_^_2_^_</span>\u662f Jacobian<span translate=no>_^_3_^_</span>\uff0c<span translate=no>_^_4_^_</span>\u662f<span translate=no>_^_5_^_</span>\u4ece\u6d4b\u7ed8\u7f51\u7edc\u4e2d\u91c7\u6837\u7684\uff0c\u5e76\u4e14<span translate=no>_^_6_^_</span>\u662f\u6709\u566a\u70b9\u7684\u56fe\u50cf<span translate=no>_^_7_^_</span>\u3002</p>\n<p><span 
translate=no>_^_8_^_</span>\u662f\u8bad\u7ec3\u8fdb\u884c\u65f6\u7684<span translate=no>_^_9_^_</span>\u6307\u6570\u79fb\u52a8\u5e73\u5747\u7ebf\u3002</p>\n<p><span translate=no>_^_10_^_</span>\u8ba1\u7b97\u65f6\u672a\u4f7f\u7528\u663e\u5f0f\u8ba1\u7b97\u96c5\u53ef\u6bd4\u5f0f<span translate=no>_^_11_^_</span></p>\n",
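
The path length penalty entry above can be computed without forming the Jacobian explicitly: backpropagate (images * noise).sum() to the w latents. A sketch with assumed shapes (w of shape [batch, d_latent]; the exponential moving average a is kept by the caller):

    import math
    import torch

    def path_length_penalty(fake_images, w, ema_a):
        # Per-sample |J_w^T y| where y is noise scaled by 1 / sqrt(H * W).
        h, wd = fake_images.shape[2], fake_images.shape[3]
        y = torch.randn_like(fake_images) / math.sqrt(h * wd)
        grad, = torch.autograd.grad(outputs=(fake_images * y).sum(), inputs=w,
                                    create_graph=True)
        lengths = grad.pow(2).sum(dim=-1).sqrt()
        penalty = (lengths - ema_a).pow(2).mean()
        return penalty, lengths.detach().mean()   # caller updates ema_a with the mean

    # Toy usage: a stand-in "generator" that is just a linear map to 3x8x8 images.
    w = torch.randn(4, 512, requires_grad=True)
    generator = torch.nn.Linear(512, 3 * 8 * 8)
    images = generator(w).reshape(4, 3, 8, 8)
    penalty, avg_len = path_length_penalty(images, w, ema_a=torch.tensor(0.0))
    penalty.backward()
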
"<p> <a id=\"smooth\"></a></p>\n<h3>Smoothing Layer</h3>\n<p>This layer blurs each channel</p>\n": "<p><a id=\"smooth\"></a></p>\n<h3>\u5e73\u6ed1\u5c42</h3>\n<p>\u8be5\u56fe\u5c42\u6a21\u7cca\u4e86\u6bcf\u4e2a\u901a\u9053</p>\n", "<p> <a id=\"smooth\"></a></p>\n<h3>Smoothing Layer</h3>\n<p>This layer blurs each channel</p>\n": "<p><a id=\"smooth\"></a></p>\n<h3>\u5e73\u6ed1\u5c42</h3>\n<p>\u8be5\u56fe\u5c42\u6a21\u7cca\u4e86\u6bcf\u4e2a\u901a\u9053</p>\n",
"<p> <a id=\"style_block\"></a></p>\n<h3>Style Block</h3>\n<p><span translate=no>_^_0_^_</span></p>\n<p><small><em><span translate=no>_^_1_^_</span> denotes a linear layer. <span translate=no>_^_2_^_</span> denotes a broadcast and scaling operation (noise is single channel).</em></small></p>\n<p>Style block has a weight modulation convolution layer.</p>\n": "<p><a id=\"style_block\"></a></p>\n<h3>\u6837\u5f0f\u65b9\u5757</h3>\n<p><span translate=no>_^_0_^_</span></p>\n<p><small><em><span translate=no>_^_1_^_</span>\u8868\u793a\u7ebf\u6027\u5c42\u3002<span translate=no>_^_2_^_</span>\u8868\u793a\u5e7f\u64ad\u548c\u7f29\u653e\u64cd\u4f5c\uff08\u566a\u58f0\u662f\u5355\u58f0\u9053\uff09\u3002</em></small></p>\n<p>\u6837\u5f0f\u5757\u5177\u6709\u6743\u91cd\u8c03\u5236\u5377\u79ef\u5c42\u3002</p>\n", "<p> <a id=\"style_block\"></a></p>\n<h3>Style Block</h3>\n<p><span translate=no>_^_0_^_</span></p>\n<p><small><em><span translate=no>_^_1_^_</span> denotes a linear layer. <span translate=no>_^_2_^_</span> denotes a broadcast and scaling operation (noise is single channel).</em></small></p>\n<p>Style block has a weight modulation convolution layer.</p>\n": "<p><a id=\"style_block\"></a></p>\n<h3>\u6837\u5f0f\u65b9\u5757</h3>\n<p><span translate=no>_^_0_^_</span></p>\n<p><small><em><span translate=no>_^_1_^_</span>\u8868\u793a\u7ebf\u6027\u5c42\u3002<span translate=no>_^_2_^_</span>\u8868\u793a\u5e7f\u64ad\u548c\u7f29\u653e\u64cd\u4f5c\uff08\u566a\u58f0\u662f\u5355\u58f0\u9053\uff09\u3002</em></small></p>\n<p>\u6837\u5f0f\u5757\u5177\u6709\u6743\u91cd\u8c03\u5236\u5377\u79ef\u5c42\u3002</p>\n",
"<p> <a id=\"to_rgb\"></a></p>\n<h3>To RGB</h3>\n<p><span translate=no>_^_0_^_</span></p>\n<p><small><em><span translate=no>_^_1_^_</span> denotes a linear layer.</em></small></p>\n<p>Generates an RGB image from a feature map using <span translate=no>_^_2_^_</span> convolution.</p>\n": "<p><a id=\"to_rgb\"></a></p>\n<h3>\u5230 RGB</h3>\n<p><span translate=no>_^_0_^_</span></p>\n<p><small><em><span translate=no>_^_1_^_</span>\u8868\u793a\u7ebf\u6027\u5c42\u3002</em></small></p>\n<p>\u4f7f\u7528<span translate=no>_^_2_^_</span>\u5377\u79ef\u4ece\u8981\u7d20\u5730\u56fe\u751f\u6210 RGB \u56fe\u50cf\u3002</p>\n", "<p> <a id=\"to_rgb\"></a></p>\n<h3>To RGB</h3>\n<p><span translate=no>_^_0_^_</span></p>\n<p><small><em><span translate=no>_^_1_^_</span> denotes a linear layer.</em></small></p>\n<p>Generates an RGB image from a feature map using <span translate=no>_^_2_^_</span> convolution.</p>\n": "<p><a id=\"to_rgb\"></a></p>\n<h3>\u5230 RGB</h3>\n<p><span translate=no>_^_0_^_</span></p>\n<p><small><em><span translate=no>_^_1_^_</span>\u8868\u793a\u7ebf\u6027\u5c42\u3002</em></small></p>\n<p>\u4f7f\u7528<span translate=no>_^_2_^_</span>\u5377\u79ef\u4ece\u8981\u7d20\u5730\u56fe\u751f\u6210 RGB \u56fe\u50cf\u3002</p>\n",
"<p> <a id=\"up_sample\"></a></p>\n<h3>Up-sample</h3>\n<p>The up-sample operation scales the image up by <span translate=no>_^_0_^_</span> and <a href=\"#smooth\">smoothens</a> each feature channel. This is based on the paper <a href=\"https://papers.labml.ai/paper/1904.11486\">Making Convolutional Networks Shift-Invariant Again</a>.</p>\n": "<p><a id=\"up_sample\"></a></p>\n<h3>\u5411\u4e0a\u91c7\u6837</h3>\n<p>\u4e0a\u91c7\u6837\u64cd\u4f5c\u5c06\u56fe\u50cf\u5411\u4e0a\u7f29\u653e<span translate=no>_^_0_^_</span>\u5e76<a href=\"#smooth\">\u5e73\u6ed1</a>\u6bcf\u4e2a\u7279\u5f81\u901a\u9053\u3002\u8fd9\u662f\u57fa\u4e8e\u8bba\u6587\u300a<a href=\"https://papers.labml.ai/paper/1904.11486\">\u8ba9\u5377\u79ef\u7f51\u7edc\u518d\u6b21\u79fb\u4f4d\u4e0d\u53d8</a>\u300b\u3002</p>\n", "<p> <a id=\"up_sample\"></a></p>\n<h3>Up-sample</h3>\n<p>The up-sample operation scales the image up by <span translate=no>_^_0_^_</span> and <a href=\"#smooth\">smoothens</a> each feature channel. This is based on the paper <a href=\"https://arxiv.org/abs/1904.11486\">Making Convolutional Networks Shift-Invariant Again</a>.</p>\n": "<p><a id=\"up_sample\"></a></p>\n<h3>\u5411\u4e0a\u91c7\u6837</h3>\n<p>\u4e0a\u91c7\u6837\u64cd\u4f5c\u5c06\u56fe\u50cf\u5411\u4e0a\u7f29\u653e<span translate=no>_^_0_^_</span>\u5e76<a href=\"#smooth\">\u5e73\u6ed1</a>\u6bcf\u4e2a\u7279\u5f81\u901a\u9053\u3002\u8fd9\u662f\u57fa\u4e8e\u8bba\u6587\u300a<a href=\"https://arxiv.org/abs/1904.11486\">\u8ba9\u5377\u79ef\u7f51\u7edc\u518d\u6b21\u79fb\u4f4d\u4e0d\u53d8</a>\u300b\u3002</p>\n",
"<p><a href=\"#equalized_linear\">Equalized learning-rate linear layers</a> </p>\n": "<p><a href=\"#equalized_linear\">\u5747\u8861\u5b66\u4e60\u901f\u7387\u7ebf\u6027\u5c42</a></p>\n", "<p><a href=\"#equalized_linear\">Equalized learning-rate linear layers</a> </p>\n": "<p><a href=\"#equalized_linear\">\u5747\u8861\u5b66\u4e60\u901f\u7387\u7ebf\u6027\u5c42</a></p>\n",
"<p><a href=\"#equalized_weight\">Weights parameter with equalized learning rate</a> </p>\n": "<p><a href=\"#equalized_weight\">\u5177\u6709\u5747\u8861\u5b66\u4e60\u901f\u7387\u7684\u6743\u91cd\u53c2\u6570</a></p>\n", "<p><a href=\"#equalized_weight\">Weights parameter with equalized learning rate</a> </p>\n": "<p><a href=\"#equalized_weight\">\u5177\u6709\u5747\u8861\u5b66\u4e60\u901f\u7387\u7684\u6743\u91cd\u53c2\u6570</a></p>\n",
"<p><a href=\"#equalized_weights\">Learning-rate equalized weights</a> </p>\n": "<p><a href=\"#equalized_weights\">\u5b66\u4e60\u901f\u7387\u5747\u8861\u6743\u91cd</a></p>\n", "<p><a href=\"#equalized_weights\">Learning-rate equalized weights</a> </p>\n": "<p><a href=\"#equalized_weights\">\u5b66\u4e60\u901f\u7387\u5747\u8861\u6743\u91cd</a></p>\n",
@@ -159,7 +159,7 @@
"<p>Then the convolution weights <span translate=no>_^_0_^_</span> are modulated as follows. (<span translate=no>_^_1_^_</span> here on refers to weights not intermediate latent space, we are sticking to the same notation as the paper.)</p>\n": "<p>\u7136\u540e\u6309\u5982\u4e0b\u65b9\u5f0f\u8c03<span translate=no>_^_0_^_</span>\u5236\u5377\u79ef\u6743\u91cd\u3002\uff08<span translate=no>_^_1_^_</span>\u8fd9\u91cc\u6307\u7684\u662f\u6743\u91cd\u800c\u4e0d\u662f\u4e2d\u95f4\u7684\u6f5c\u5728\u7a7a\u95f4\uff0c\u6211\u4eec\u575a\u6301\u4f7f\u7528\u4e0e\u7eb8\u5f20\u76f8\u540c\u7684\u7b26\u53f7\u3002\uff09</p>\n", "<p>Then the convolution weights <span translate=no>_^_0_^_</span> are modulated as follows. (<span translate=no>_^_1_^_</span> here on refers to weights not intermediate latent space, we are sticking to the same notation as the paper.)</p>\n": "<p>\u7136\u540e\u6309\u5982\u4e0b\u65b9\u5f0f\u8c03<span translate=no>_^_0_^_</span>\u5236\u5377\u79ef\u6743\u91cd\u3002\uff08<span translate=no>_^_1_^_</span>\u8fd9\u91cc\u6307\u7684\u662f\u6743\u91cd\u800c\u4e0d\u662f\u4e2d\u95f4\u7684\u6f5c\u5728\u7a7a\u95f4\uff0c\u6211\u4eec\u575a\u6301\u4f7f\u7528\u4e0e\u7eb8\u5f20\u76f8\u540c\u7684\u7b26\u53f7\u3002\uff09</p>\n",
"<p>They remove the <span translate=no>_^_0_^_</span> operator and replace it with the weight modulation and demodulation step. This is supposed to improve what they call droplet artifacts that are present in generated images, which are caused by the normalization in <span translate=no>_^_1_^_</span> operator. Style vector per layer is calculated from <span translate=no>_^_2_^_</span> as <span translate=no>_^_3_^_</span>.</p>\n": "<p>\u4ed6\u4eec\u5c06<span translate=no>_^_0_^_</span>\u64cd\u4f5c\u5458\u79fb\u9664\uff0c\u5e76\u7528\u6743\u91cd\u8c03\u5236\u548c\u89e3\u8c03\u6b65\u9aa4\u4ee3\u66ff\u5b83\u3002\u8fd9\u5e94\u8be5\u6539\u5584\u4ed6\u4eec\u6240\u8c13\u7684\u6db2\u6ef4\u4f2a\u50cf\uff0c\u8fd9\u4e9b\u4f2a\u5f71\u5b58\u5728\u4e8e\u751f\u6210\u7684\u56fe\u50cf\u4e2d\uff0c\u8fd9\u662f\u7531<span translate=no>_^_1_^_</span>\u8fd0\u7b97\u7b26\u4e2d\u7684\u5f52\u4e00\u5316\u5f15\u8d77\u7684\u3002\u6bcf\u4e2a\u56fe\u5c42\u7684\u6837\u5f0f\u5411\u91cf\u662f\u6839\u636e\u8ba1\u7b97\u5f97\u51fa<span translate=no>_^_2_^_</span>\u7684<span translate=no>_^_3_^_</span>\u3002</p>\n", "<p>They remove the <span translate=no>_^_0_^_</span> operator and replace it with the weight modulation and demodulation step. This is supposed to improve what they call droplet artifacts that are present in generated images, which are caused by the normalization in <span translate=no>_^_1_^_</span> operator. Style vector per layer is calculated from <span translate=no>_^_2_^_</span> as <span translate=no>_^_3_^_</span>.</p>\n": "<p>\u4ed6\u4eec\u5c06<span translate=no>_^_0_^_</span>\u64cd\u4f5c\u5458\u79fb\u9664\uff0c\u5e76\u7528\u6743\u91cd\u8c03\u5236\u548c\u89e3\u8c03\u6b65\u9aa4\u4ee3\u66ff\u5b83\u3002\u8fd9\u5e94\u8be5\u6539\u5584\u4ed6\u4eec\u6240\u8c13\u7684\u6db2\u6ef4\u4f2a\u50cf\uff0c\u8fd9\u4e9b\u4f2a\u5f71\u5b58\u5728\u4e8e\u751f\u6210\u7684\u56fe\u50cf\u4e2d\uff0c\u8fd9\u662f\u7531<span translate=no>_^_1_^_</span>\u8fd0\u7b97\u7b26\u4e2d\u7684\u5f52\u4e00\u5316\u5f15\u8d77\u7684\u3002\u6bcf\u4e2a\u56fe\u5c42\u7684\u6837\u5f0f\u5411\u91cf\u662f\u6839\u636e\u8ba1\u7b97\u5f97\u51fa<span translate=no>_^_2_^_</span>\u7684<span translate=no>_^_3_^_</span>\u3002</p>\n",
"<p>They use <strong>minibatch standard deviation</strong> to increase variation and <strong>equalized learning rate</strong> which we discussed below in the implementation. They also use <strong>pixel-wise normalization</strong> where at each pixel the feature vector is normalized. They apply this to all the convolution layer outputs (except RGB).</p>\n": "<p>\u4ed6\u4eec\u4f7f\u7528 <strong>minibatch\u6807\u51c6\u5dee</strong>\u6765\u589e\u52a0\u53d8\u5f02\u548c<strong>\u5747\u8861\u5b66\u4e60\u7387</strong>\uff0c\u6211\u4eec\u5728\u4e0b\u6587\u7684\u5b9e\u73b0\u4e2d\u5bf9\u6b64\u8fdb\u884c\u4e86\u8ba8\u8bba\u3002\u5b83\u4eec\u8fd8\u4f7f\u7528<strong>\u9010\u50cf\u7d20\u5f52\u4e00\u5316</strong>\uff0c\u5176\u4e2d\u7279\u5f81\u5411\u91cf\u5728\u6bcf\u4e2a\u50cf\u7d20\u5904\u8fdb\u884c\u5f52\u4e00\u5316\u3002\u5b83\u4eec\u5c06\u5176\u5e94\u7528\u4e8e\u6240\u6709\u5377\u79ef\u5c42\u8f93\u51fa\uff08RGB \u9664\u5916\uff09\u3002</p>\n", "<p>They use <strong>minibatch standard deviation</strong> to increase variation and <strong>equalized learning rate</strong> which we discussed below in the implementation. They also use <strong>pixel-wise normalization</strong> where at each pixel the feature vector is normalized. They apply this to all the convolution layer outputs (except RGB).</p>\n": "<p>\u4ed6\u4eec\u4f7f\u7528 <strong>minibatch\u6807\u51c6\u5dee</strong>\u6765\u589e\u52a0\u53d8\u5f02\u548c<strong>\u5747\u8861\u5b66\u4e60\u7387</strong>\uff0c\u6211\u4eec\u5728\u4e0b\u6587\u7684\u5b9e\u73b0\u4e2d\u5bf9\u6b64\u8fdb\u884c\u4e86\u8ba8\u8bba\u3002\u5b83\u4eec\u8fd8\u4f7f\u7528<strong>\u9010\u50cf\u7d20\u5f52\u4e00\u5316</strong>\uff0c\u5176\u4e2d\u7279\u5f81\u5411\u91cf\u5728\u6bcf\u4e2a\u50cf\u7d20\u5904\u8fdb\u884c\u5f52\u4e00\u5316\u3002\u5b83\u4eec\u5c06\u5176\u5e94\u7528\u4e8e\u6240\u6709\u5377\u79ef\u5c42\u8f93\u51fa\uff08RGB \u9664\u5916\uff09\u3002</p>\n",
"<p>This is a <a href=\"https://pytorch.org\">PyTorch</a> implementation of the paper <a href=\"https://papers.labml.ai/paper/1912.04958\">Analyzing and Improving the Image Quality of StyleGAN</a> which introduces <strong>StyleGAN 2</strong>. StyleGAN 2 is an improvement over <strong>StyleGAN</strong> from the paper <a href=\"https://papers.labml.ai/paper/1812.04948\">A Style-Based Generator Architecture for Generative Adversarial Networks</a>. And StyleGAN is based on <strong>Progressive GAN</strong> from the paper <a href=\"https://papers.labml.ai/paper/1710.10196\">Progressive Growing of GANs for Improved Quality, Stability, and Variation</a>. All three papers are from the same authors from <a href=\"https://twitter.com/NVIDIAAI\">NVIDIA AI</a>.</p>\n": "<p>\u8fd9\u662f\u300a<a href=\"https://papers.labml.ai/paper/1912.04958\">\u5206\u6790\u548c\u63d0\u9ad8 StyleGan \u7684\u56fe\u50cf\u8d28\u91cf\u300b</a><a href=\"https://pytorch.org\">\u4e00\u6587\u7684 PyTorch</a> \u5b9e\u73b0\uff0c\u8be5\u8bba\u6587\u4ecb\u7ecd\u4e86 <strong>StyleGan 2</strong>\u3002StyleGan 2 \u662f\u5bf9\u8bba\u6587\u300a\u751f\u6210<a href=\"https://papers.labml.ai/paper/1812.04948\">\u5bf9\u6297\u7f51\u7edc\u7684\u57fa\u4e8e\u6837\u5f0f\u7684\u751f\u6210\u5668\u67b6\u6784\u300b\u4e2d\u5bf9</a> <strong>StyleG</strong> an \u7684\u6539\u8fdb\u3002StyleG <strong>an \u57fa\u4e8e\u8bba\u6587\u300a\u9010\u6b65</strong><a href=\"https://papers.labml.ai/paper/1710.10196\">\u751f\u957f GaN \u4ee5\u63d0\u9ad8\u8d28\u91cf\u3001\u7a33\u5b9a\u6027\u548c\u53d8\u5f02\u6027\u300b\u4e2d\u7684\u6e10\u8fdb\u5f0f GAN</a>\u3002\u8fd9\u4e09\u7bc7\u8bba\u6587\u5747\u51fa\u81ea <a href=\"https://twitter.com/NVIDIAAI\">NVIDIA AI</a> \u7684\u540c\u4e00\u4f4d\u4f5c\u8005\u3002</p>\n", "<p>This is a <a href=\"https://pytorch.org\">PyTorch</a> implementation of the paper <a href=\"https://arxiv.org/abs/1912.04958\">Analyzing and Improving the Image Quality of StyleGAN</a> which introduces <strong>StyleGAN 2</strong>. StyleGAN 2 is an improvement over <strong>StyleGAN</strong> from the paper <a href=\"https://arxiv.org/abs/1812.04948\">A Style-Based Generator Architecture for Generative Adversarial Networks</a>. And StyleGAN is based on <strong>Progressive GAN</strong> from the paper <a href=\"https://arxiv.org/abs/1710.10196\">Progressive Growing of GANs for Improved Quality, Stability, and Variation</a>. All three papers are from the same authors from <a href=\"https://twitter.com/NVIDIAAI\">NVIDIA AI</a>.</p>\n": "<p>\u8fd9\u662f\u300a<a href=\"https://arxiv.org/abs/1912.04958\">\u5206\u6790\u548c\u63d0\u9ad8 StyleGan \u7684\u56fe\u50cf\u8d28\u91cf\u300b</a><a href=\"https://pytorch.org\">\u4e00\u6587\u7684 PyTorch</a> \u5b9e\u73b0\uff0c\u8be5\u8bba\u6587\u4ecb\u7ecd\u4e86 <strong>StyleGan 2</strong>\u3002StyleGan 2 \u662f\u5bf9\u8bba\u6587\u300a\u751f\u6210<a href=\"https://arxiv.org/abs/1812.04948\">\u5bf9\u6297\u7f51\u7edc\u7684\u57fa\u4e8e\u6837\u5f0f\u7684\u751f\u6210\u5668\u67b6\u6784\u300b\u4e2d\u5bf9</a> <strong>StyleG</strong> an \u7684\u6539\u8fdb\u3002StyleG <strong>an \u57fa\u4e8e\u8bba\u6587\u300a\u9010\u6b65</strong><a href=\"https://arxiv.org/abs/1710.10196\">\u751f\u957f GaN \u4ee5\u63d0\u9ad8\u8d28\u91cf\u3001\u7a33\u5b9a\u6027\u548c\u53d8\u5f02\u6027\u300b\u4e2d\u7684\u6e10\u8fdb\u5f0f GAN</a>\u3002\u8fd9\u4e09\u7bc7\u8bba\u6587\u5747\u51fa\u81ea <a href=\"https://twitter.com/NVIDIAAI\">NVIDIA AI</a> \u7684\u540c\u4e00\u4f4d\u4f5c\u8005\u3002</p>\n",
"<p>To prevent the generator from assuming adjacent styles are correlated, they randomly use different styles for different blocks. That is, they sample two latent vectors <span translate=no>_^_0_^_</span> and corresponding <span translate=no>_^_1_^_</span> and use <span translate=no>_^_2_^_</span> based styles for some blocks and <span translate=no>_^_3_^_</span> based styles for some blacks randomly.</p>\n": "<p>\u4e3a\u4e86\u9632\u6b62\u751f\u6210\u5668\u5047\u8bbe\u76f8\u90bb\u6837\u5f0f\u662f\u76f8\u5173\u7684\uff0c\u5b83\u4eec\u4f1a\u968f\u673a\u5bf9\u4e0d\u540c\u7684\u5757\u4f7f\u7528\u4e0d\u540c\u7684\u6837\u5f0f\u3002\u4e5f\u5c31\u662f\u8bf4\uff0c\u4ed6\u4eec\u5bf9\u4e24\u4e2a\u6f5c\u5728\u5411\u91cf\u8fdb\u884c\u91c7\u6837\uff0c\u5bf9\u67d0\u4e9b\u5757\u8fdb\u884c\u5bf9\u5e94<span translate=no>_^_0_^_</span><span translate=no>_^_1_^_</span>\u548c\u4f7f\u7528<span translate=no>_^_2_^_</span>\u57fa\u4e8e\u6837\u5f0f\uff0c\u5bf9\u67d0\u4e9b\u5757\u4f7f\u7528<span translate=no>_^_3_^_</span>\u57fa\u4e8e\u6837\u5f0f\u968f\u673a\u9ed1\u4eba\u3002</p>\n", "<p>To prevent the generator from assuming adjacent styles are correlated, they randomly use different styles for different blocks. That is, they sample two latent vectors <span translate=no>_^_0_^_</span> and corresponding <span translate=no>_^_1_^_</span> and use <span translate=no>_^_2_^_</span> based styles for some blocks and <span translate=no>_^_3_^_</span> based styles for some blacks randomly.</p>\n": "<p>\u4e3a\u4e86\u9632\u6b62\u751f\u6210\u5668\u5047\u8bbe\u76f8\u90bb\u6837\u5f0f\u662f\u76f8\u5173\u7684\uff0c\u5b83\u4eec\u4f1a\u968f\u673a\u5bf9\u4e0d\u540c\u7684\u5757\u4f7f\u7528\u4e0d\u540c\u7684\u6837\u5f0f\u3002\u4e5f\u5c31\u662f\u8bf4\uff0c\u4ed6\u4eec\u5bf9\u4e24\u4e2a\u6f5c\u5728\u5411\u91cf\u8fdb\u884c\u91c7\u6837\uff0c\u5bf9\u67d0\u4e9b\u5757\u8fdb\u884c\u5bf9\u5e94<span translate=no>_^_0_^_</span><span translate=no>_^_1_^_</span>\u548c\u4f7f\u7528<span translate=no>_^_2_^_</span>\u57fa\u4e8e\u6837\u5f0f\uff0c\u5bf9\u67d0\u4e9b\u5757\u4f7f\u7528<span translate=no>_^_3_^_</span>\u57fa\u4e8e\u6837\u5f0f\u968f\u673a\u9ed1\u4eba\u3002</p>\n",
"<p>Trainable <span translate=no>_^_0_^_</span> constant </p>\n": "<p>\u53ef\u8bad\u7ec3<span translate=no>_^_0_^_</span>\u5e38\u6570</p>\n", "<p>Trainable <span translate=no>_^_0_^_</span> constant </p>\n": "<p>\u53ef\u8bad\u7ec3<span translate=no>_^_0_^_</span>\u5e38\u6570</p>\n",
"<p>Try to normalize the image (this is totally optional, but sped up the early training a little) </p>\n": "<p>\u5c1d\u8bd5\u89c4\u8303\u5316\u56fe\u50cf\uff08\u8fd9\u5b8c\u5168\u662f\u53ef\u9009\u7684\uff0c\u4f46\u7a0d\u5fae\u52a0\u5feb\u4e86\u65e9\u671f\u8bad\u7ec3\uff09</p>\n", "<p>Try to normalize the image (this is totally optional, but sped up the early training a little) </p>\n": "<p>\u5c1d\u8bd5\u89c4\u8303\u5316\u56fe\u50cf\uff08\u8fd9\u5b8c\u5168\u662f\u53ef\u9009\u7684\uff0c\u4f46\u7a0d\u5fae\u52a0\u5feb\u4e86\u65e9\u671f\u8bad\u7ec3\uff09</p>\n",

@@ -1,4 +1,4 @@
{
"<h1><a href=\"https://nn.labml.ai/gan/stylegan/index.html\">StyleGAN 2</a></h1>\n<p>This is a <a href=\"https://pytorch.org\">PyTorch</a> implementation of the paper <a href=\"https://papers.labml.ai/paper/1912.04958\">Analyzing and Improving the Image Quality of StyleGAN</a> which introduces <strong>StyleGAN2</strong>. StyleGAN 2 is an improvement over <strong>StyleGAN</strong> from the paper <a href=\"https://papers.labml.ai/paper/1812.04948\">A Style-Based Generator Architecture for Generative Adversarial Networks</a>. And StyleGAN is based on <strong>Progressive GAN</strong> from the paper <a href=\"https://papers.labml.ai/paper/1710.10196\">Progressive Growing of GANs for Improved Quality, Stability, and Variation</a>. All three papers are from the same authors from <a href=\"https://twitter.com/NVIDIAAI\">NVIDIA AI</a>. </p>\n": "<h1><a href=\"https://nn.labml.ai/gan/stylegan/index.html\">\u30b9\u30bf\u30a4\u30eb\u30ac\u30f3 2</a></h1>\n<p><strong>\u3053\u308c\u306f\u3001StyleGAN2\u3092\u7d39\u4ecb\u3059\u308b\u8ad6\u6587\u300c<a href=\"https://papers.labml.ai/paper/1912.04958\">StyleGAN\u306e\u753b\u8cea\u306e\u5206\u6790\u3068\u6539\u5584\u300d<a href=\"https://pytorch.org\">\u3092PyTorch\u3067\u5b9f\u88c5\u3057\u305f\u3082\u306e\u3067\u3059</a></a>\u3002</strong>StyleGan 2\u306f\u3001\u8ad6\u6587\u300c<strong><a href=\"https://papers.labml.ai/paper/1812.04948\">\u6575\u5bfe\u7684\u751f\u6210\u30cd\u30c3\u30c8\u30ef\u30fc\u30af\u306e\u305f\u3081\u306e\u30b9\u30bf\u30a4\u30eb\u30d9\u30fc\u30b9\u306e\u30b8\u30a7\u30cd\u30ec\u30fc\u30bf\u30fc\u30a2\u30fc\u30ad\u30c6\u30af\u30c1\u30e3\u300d\u306eStyleGAN\u3092\u6539\u826f\u3057\u305f\u3082\u306e\u3067\u3059</a></strong>\u3002\u307e\u305f\u3001StyleGan\u306f\u8ad6\u6587\u300c<strong>GAN\u306e\u6f38\u9032\u7684\u6210\u9577\u306b\u3088\u308b\u54c1\u8cea</strong><a href=\"https://papers.labml.ai/paper/1710.10196\">\u3001\u5b89\u5b9a\u6027\u3001\u30d0\u30ea\u30a8\u30fc\u30b7\u30e7\u30f3\u306e\u5411\u4e0a\u300d\u306e\u30d7\u30ed\u30b0\u30ec\u30c3\u30b7\u30d6GAN\u3092\u30d9\u30fc\u30b9\u306b\u3057\u3066\u3044\u307e\u3059</a>\u30023 \u3064\u306e\u8ad6\u6587\u306f\u3059\u3079\u3066 <a href=\"https://twitter.com/NVIDIAAI\">NVIDIA</a> AI \u306e\u540c\u3058\u8457\u8005\u306b\u3088\u308b\u3082\u306e\u3067\u3059</p>\u3002\n", "<h1><a href=\"https://nn.labml.ai/gan/stylegan/index.html\">StyleGAN 2</a></h1>\n<p>This is a <a href=\"https://pytorch.org\">PyTorch</a> implementation of the paper <a href=\"https://arxiv.org/abs/1912.04958\">Analyzing and Improving the Image Quality of StyleGAN</a> which introduces <strong>StyleGAN2</strong>. StyleGAN 2 is an improvement over <strong>StyleGAN</strong> from the paper <a href=\"https://arxiv.org/abs/1812.04948\">A Style-Based Generator Architecture for Generative Adversarial Networks</a>. And StyleGAN is based on <strong>Progressive GAN</strong> from the paper <a href=\"https://arxiv.org/abs/1710.10196\">Progressive Growing of GANs for Improved Quality, Stability, and Variation</a>. All three papers are from the same authors from <a href=\"https://twitter.com/NVIDIAAI\">NVIDIA AI</a>. 
</p>\n": "<h1><a href=\"https://nn.labml.ai/gan/stylegan/index.html\">\u30b9\u30bf\u30a4\u30eb\u30ac\u30f3 2</a></h1>\n<p><strong>\u3053\u308c\u306f\u3001StyleGAN2\u3092\u7d39\u4ecb\u3059\u308b\u8ad6\u6587\u300c<a href=\"https://arxiv.org/abs/1912.04958\">StyleGAN\u306e\u753b\u8cea\u306e\u5206\u6790\u3068\u6539\u5584\u300d<a href=\"https://pytorch.org\">\u3092PyTorch\u3067\u5b9f\u88c5\u3057\u305f\u3082\u306e\u3067\u3059</a></a>\u3002</strong>StyleGan 2\u306f\u3001\u8ad6\u6587\u300c<strong><a href=\"https://arxiv.org/abs/1812.04948\">\u6575\u5bfe\u7684\u751f\u6210\u30cd\u30c3\u30c8\u30ef\u30fc\u30af\u306e\u305f\u3081\u306e\u30b9\u30bf\u30a4\u30eb\u30d9\u30fc\u30b9\u306e\u30b8\u30a7\u30cd\u30ec\u30fc\u30bf\u30fc\u30a2\u30fc\u30ad\u30c6\u30af\u30c1\u30e3\u300d\u306eStyleGAN\u3092\u6539\u826f\u3057\u305f\u3082\u306e\u3067\u3059</a></strong>\u3002\u307e\u305f\u3001StyleGan\u306f\u8ad6\u6587\u300c<strong>GAN\u306e\u6f38\u9032\u7684\u6210\u9577\u306b\u3088\u308b\u54c1\u8cea</strong><a href=\"https://arxiv.org/abs/1710.10196\">\u3001\u5b89\u5b9a\u6027\u3001\u30d0\u30ea\u30a8\u30fc\u30b7\u30e7\u30f3\u306e\u5411\u4e0a\u300d\u306e\u30d7\u30ed\u30b0\u30ec\u30c3\u30b7\u30d6GAN\u3092\u30d9\u30fc\u30b9\u306b\u3057\u3066\u3044\u307e\u3059</a>\u30023 \u3064\u306e\u8ad6\u6587\u306f\u3059\u3079\u3066 <a href=\"https://twitter.com/NVIDIAAI\">NVIDIA</a> AI \u306e\u540c\u3058\u8457\u8005\u306b\u3088\u308b\u3082\u306e\u3067\u3059</p>\u3002\n",
"StyleGAN 2": "\u30b9\u30bf\u30a4\u30eb\u30ac\u30f3 2" "StyleGAN 2": "\u30b9\u30bf\u30a4\u30eb\u30ac\u30f3 2"
} }

@@ -1,4 +1,4 @@
{
"<h1><a href=\"https://nn.labml.ai/gan/stylegan/index.html\">StyleGAN 2</a></h1>\n<p>This is a <a href=\"https://pytorch.org\">PyTorch</a> implementation of the paper <a href=\"https://papers.labml.ai/paper/1912.04958\">Analyzing and Improving the Image Quality of StyleGAN</a> which introduces <strong>StyleGAN2</strong>. StyleGAN 2 is an improvement over <strong>StyleGAN</strong> from the paper <a href=\"https://papers.labml.ai/paper/1812.04948\">A Style-Based Generator Architecture for Generative Adversarial Networks</a>. And StyleGAN is based on <strong>Progressive GAN</strong> from the paper <a href=\"https://papers.labml.ai/paper/1710.10196\">Progressive Growing of GANs for Improved Quality, Stability, and Variation</a>. All three papers are from the same authors from <a href=\"https://twitter.com/NVIDIAAI\">NVIDIA AI</a>. </p>\n": "<h1><a href=\"https://nn.labml.ai/gan/stylegan/index.html\">Style\u0d9c\u0db1\u0dca 2</a></h1>\n<p>\u0db8\u0dd9\u0dba\u0d85 <a href=\"https://pytorch.org\">PyTorch</a> \u0d9a\u0da9\u0daf\u0dcf\u0dc3\u0dd2 <a href=\"https://papers.labml.ai/paper/1912.04958\">\u0dc0\u0dd2\u0dc1\u0dca\u0dbd\u0dda\u0dc2\u0dab\u0dba \u0d9a\u0dd2\u0dbb\u0dd3\u0db8 \u0dc3\u0dc4 \u0dbb\u0dd6\u0db4\u0dba\u0dda \u0d9c\u0dd4\u0dab\u0dcf\u0dad\u0dca\u0db8\u0d9a\u0db7\u0dcf\u0dc0\u0dba \u0dc0\u0dd0\u0da9\u0dd2 \u0daf\u0dd2\u0dba\u0dd4\u0dab\u0dd4 \u0d9a\u0dd2\u0dbb\u0dd3\u0db8 StyleGan</a> \u0dc4\u0db3\u0dd4\u0db1\u0dca\u0dc0\u0dcf \u0daf\u0dd9\u0db1 <strong>StyleGan 2</strong>. StyleGan 2 \u0dba\u0db1\u0dd4 \u0dc0\u0dd0\u0da9\u0dd2 \u0daf\u0dd2\u0dba\u0dd4\u0dab\u0dd4 \u0d9a\u0dd2\u0dbb\u0dd3\u0db8\u0d9a\u0dd2 <strong>StyleGan</strong> <a href=\"https://papers.labml.ai/paper/1812.04948\">\u0d9a\u0da9\u0daf\u0dcf\u0dc3\u0dd2 \u0dc0\u0dbd\u0dd2\u0db1\u0dca \u0d8b\u0dad\u0dca\u0db4\u0dcf\u0daf\u0d9a \u0d85\u0dc4\u0dd2\u0dad\u0d9a\u0dbb \u0da2\u0dcf\u0dbd \u0dc3\u0db3\u0dc4\u0dcf \u0dc1\u0ddb\u0dbd\u0dd2\u0dba \u0db8\u0dad \u0db4\u0daf\u0db1\u0db8\u0dca \u0dc0\u0dd6 \u0d8b\u0dad\u0dca\u0db4\u0dcf\u0daf\u0d9a \u0d9c\u0dd8\u0dc4 \u0db1\u0dd2\u0dbb\u0dca\u0db8\u0dcf\u0dab \u0dc1\u0dd2\u0dbd\u0dca\u0db4\u0dba</a>. \u0dc3\u0dc4 StyleGan \u0db4\u0daf\u0db1\u0db8\u0dca \u0dc0\u0dd3 \u0d87\u0dad\u0dca\u0dad\u0dda \u0d9a\u0da9\u0daf\u0dcf\u0dc3\u0dd2 \u0dc0\u0dbd\u0dd2\u0db1\u0dca <strong>\u0db4\u0dca\u0dbb\u0d9c\u0dad\u0dd2\u0dc1\u0dd3\u0dbd\u0dd3 GAN</strong> \u0db8\u0dad \u0dba <a href=\"https://papers.labml.ai/paper/1710.10196\">\u0dc0\u0dd0\u0da9\u0dd2 \u0daf\u0dd2\u0dba\u0dd4\u0dab\u0dd4 \u0d9a\u0dc5 \u0d9c\u0dd4\u0dab\u0dcf\u0dad\u0dca\u0db8\u0d9a\u0db7\u0dcf\u0dc0\u0dba, \u0dc3\u0dca\u0dae\u0dcf\u0dba\u0dd2\u0dad\u0dcf\u0dc0 \u0dc3\u0dc4 \u0dc0\u0dd2\u0da0\u0dbd\u0db1\u0dba \u0dc3\u0db3\u0dc4\u0dcf GANs \u0db4\u0dca\u0dbb\u0d9c\u0dad\u0dd2\u0dc1\u0dd3\u0dbd\u0dd3 \u0dc0\u0dbb\u0dca\u0db0\u0db1\u0dba</a>. \u0db8\u0dd9\u0db8 \u0db4\u0dad\u0dca\u0dbb\u0dd2\u0d9a\u0dcf \u0dad\u0dd4\u0db1\u0db8 <a href=\"https://twitter.com/NVIDIAAI\">NVIDIA AI</a>\u0dc0\u0dd9\u0dad\u0dd2\u0db1\u0dca \u0d91\u0d9a\u0db8 \u0d9a\u0dad\u0dd4\u0dc0\u0dbb\u0dd4\u0db1\u0dca\u0d9c\u0dd9\u0db1\u0dca \u0dc0\u0dda. </p>\n", "<h1><a href=\"https://nn.labml.ai/gan/stylegan/index.html\">StyleGAN 2</a></h1>\n<p>This is a <a href=\"https://pytorch.org\">PyTorch</a> implementation of the paper <a href=\"https://arxiv.org/abs/1912.04958\">Analyzing and Improving the Image Quality of StyleGAN</a> which introduces <strong>StyleGAN2</strong>. 
StyleGAN 2 is an improvement over <strong>StyleGAN</strong> from the paper <a href=\"https://arxiv.org/abs/1812.04948\">A Style-Based Generator Architecture for Generative Adversarial Networks</a>. And StyleGAN is based on <strong>Progressive GAN</strong> from the paper <a href=\"https://arxiv.org/abs/1710.10196\">Progressive Growing of GANs for Improved Quality, Stability, and Variation</a>. All three papers are from the same authors from <a href=\"https://twitter.com/NVIDIAAI\">NVIDIA AI</a>. </p>\n": "<h1><a href=\"https://nn.labml.ai/gan/stylegan/index.html\">Style\u0d9c\u0db1\u0dca 2</a></h1>\n<p>\u0db8\u0dd9\u0dba\u0d85 <a href=\"https://pytorch.org\">PyTorch</a> \u0d9a\u0da9\u0daf\u0dcf\u0dc3\u0dd2 <a href=\"https://arxiv.org/abs/1912.04958\">\u0dc0\u0dd2\u0dc1\u0dca\u0dbd\u0dda\u0dc2\u0dab\u0dba \u0d9a\u0dd2\u0dbb\u0dd3\u0db8 \u0dc3\u0dc4 \u0dbb\u0dd6\u0db4\u0dba\u0dda \u0d9c\u0dd4\u0dab\u0dcf\u0dad\u0dca\u0db8\u0d9a\u0db7\u0dcf\u0dc0\u0dba \u0dc0\u0dd0\u0da9\u0dd2 \u0daf\u0dd2\u0dba\u0dd4\u0dab\u0dd4 \u0d9a\u0dd2\u0dbb\u0dd3\u0db8 StyleGan</a> \u0dc4\u0db3\u0dd4\u0db1\u0dca\u0dc0\u0dcf \u0daf\u0dd9\u0db1 <strong>StyleGan 2</strong>. StyleGan 2 \u0dba\u0db1\u0dd4 \u0dc0\u0dd0\u0da9\u0dd2 \u0daf\u0dd2\u0dba\u0dd4\u0dab\u0dd4 \u0d9a\u0dd2\u0dbb\u0dd3\u0db8\u0d9a\u0dd2 <strong>StyleGan</strong> <a href=\"https://arxiv.org/abs/1812.04948\">\u0d9a\u0da9\u0daf\u0dcf\u0dc3\u0dd2 \u0dc0\u0dbd\u0dd2\u0db1\u0dca \u0d8b\u0dad\u0dca\u0db4\u0dcf\u0daf\u0d9a \u0d85\u0dc4\u0dd2\u0dad\u0d9a\u0dbb \u0da2\u0dcf\u0dbd \u0dc3\u0db3\u0dc4\u0dcf \u0dc1\u0ddb\u0dbd\u0dd2\u0dba \u0db8\u0dad \u0db4\u0daf\u0db1\u0db8\u0dca \u0dc0\u0dd6 \u0d8b\u0dad\u0dca\u0db4\u0dcf\u0daf\u0d9a \u0d9c\u0dd8\u0dc4 \u0db1\u0dd2\u0dbb\u0dca\u0db8\u0dcf\u0dab \u0dc1\u0dd2\u0dbd\u0dca\u0db4\u0dba</a>. \u0dc3\u0dc4 StyleGan \u0db4\u0daf\u0db1\u0db8\u0dca \u0dc0\u0dd3 \u0d87\u0dad\u0dca\u0dad\u0dda \u0d9a\u0da9\u0daf\u0dcf\u0dc3\u0dd2 \u0dc0\u0dbd\u0dd2\u0db1\u0dca <strong>\u0db4\u0dca\u0dbb\u0d9c\u0dad\u0dd2\u0dc1\u0dd3\u0dbd\u0dd3 GAN</strong> \u0db8\u0dad \u0dba <a href=\"https://arxiv.org/abs/1710.10196\">\u0dc0\u0dd0\u0da9\u0dd2 \u0daf\u0dd2\u0dba\u0dd4\u0dab\u0dd4 \u0d9a\u0dc5 \u0d9c\u0dd4\u0dab\u0dcf\u0dad\u0dca\u0db8\u0d9a\u0db7\u0dcf\u0dc0\u0dba, \u0dc3\u0dca\u0dae\u0dcf\u0dba\u0dd2\u0dad\u0dcf\u0dc0 \u0dc3\u0dc4 \u0dc0\u0dd2\u0da0\u0dbd\u0db1\u0dba \u0dc3\u0db3\u0dc4\u0dcf GANs \u0db4\u0dca\u0dbb\u0d9c\u0dad\u0dd2\u0dc1\u0dd3\u0dbd\u0dd3 \u0dc0\u0dbb\u0dca\u0db0\u0db1\u0dba</a>. \u0db8\u0dd9\u0db8 \u0db4\u0dad\u0dca\u0dbb\u0dd2\u0d9a\u0dcf \u0dad\u0dd4\u0db1\u0db8 <a href=\"https://twitter.com/NVIDIAAI\">NVIDIA AI</a>\u0dc0\u0dd9\u0dad\u0dd2\u0db1\u0dca \u0d91\u0d9a\u0db8 \u0d9a\u0dad\u0dd4\u0dc0\u0dbb\u0dd4\u0db1\u0dca\u0d9c\u0dd9\u0db1\u0dca \u0dc0\u0dda. </p>\n",
"StyleGAN 2": "Style\u0d9c\u0db1\u0dca 2" "StyleGAN 2": "Style\u0d9c\u0db1\u0dca 2"
} }

@@ -1,4 +1,4 @@
{
"<h1><a href=\"https://nn.labml.ai/gan/stylegan/index.html\">StyleGAN 2</a></h1>\n<p>This is a <a href=\"https://pytorch.org\">PyTorch</a> implementation of the paper <a href=\"https://papers.labml.ai/paper/1912.04958\">Analyzing and Improving the Image Quality of StyleGAN</a> which introduces <strong>StyleGAN2</strong>. StyleGAN 2 is an improvement over <strong>StyleGAN</strong> from the paper <a href=\"https://papers.labml.ai/paper/1812.04948\">A Style-Based Generator Architecture for Generative Adversarial Networks</a>. And StyleGAN is based on <strong>Progressive GAN</strong> from the paper <a href=\"https://papers.labml.ai/paper/1710.10196\">Progressive Growing of GANs for Improved Quality, Stability, and Variation</a>. All three papers are from the same authors from <a href=\"https://twitter.com/NVIDIAAI\">NVIDIA AI</a>. </p>\n": "<h1><a href=\"https://nn.labml.ai/gan/stylegan/index.html\">StyleGan 2</a></h1>\n<p>\u8fd9\u662f\u300a<a href=\"https://papers.labml.ai/paper/1912.04958\">\u5206\u6790\u548c\u63d0\u9ad8 StyleGan \u7684\u56fe\u50cf\u8d28\u91cf\u300b</a><a href=\"https://pytorch.org\">\u4e00\u6587\u7684 PyTorch</a> \u5b9e\u73b0\uff0c\u8be5\u8bba\u6587\u4ecb\u7ecd\u4e86 <strong>StyleGan2</strong>\u3002StyleGan 2 \u662f\u5bf9\u8bba\u6587\u300a\u751f\u6210<a href=\"https://papers.labml.ai/paper/1812.04948\">\u5bf9\u6297\u7f51\u7edc\u7684\u57fa\u4e8e\u6837\u5f0f\u7684\u751f\u6210\u5668\u67b6\u6784\u300b\u4e2d\u5bf9</a> <strong>StyleG</strong> an \u7684\u6539\u8fdb\u3002StyleG <strong>an \u57fa\u4e8e\u8bba\u6587\u300a\u9010\u6b65</strong><a href=\"https://papers.labml.ai/paper/1710.10196\">\u751f\u957f GaN \u4ee5\u63d0\u9ad8\u8d28\u91cf\u3001\u7a33\u5b9a\u6027\u548c\u53d8\u5f02\u6027\u300b\u4e2d\u7684\u6e10\u8fdb\u5f0f GAN</a>\u3002\u8fd9\u4e09\u7bc7\u8bba\u6587\u5747\u51fa\u81ea <a href=\"https://twitter.com/NVIDIAAI\">NVIDIA AI</a> \u7684\u540c\u4e00\u4f4d\u4f5c\u8005\u3002</p>\n", "<h1><a href=\"https://nn.labml.ai/gan/stylegan/index.html\">StyleGAN 2</a></h1>\n<p>This is a <a href=\"https://pytorch.org\">PyTorch</a> implementation of the paper <a href=\"https://arxiv.org/abs/1912.04958\">Analyzing and Improving the Image Quality of StyleGAN</a> which introduces <strong>StyleGAN2</strong>. StyleGAN 2 is an improvement over <strong>StyleGAN</strong> from the paper <a href=\"https://arxiv.org/abs/1812.04948\">A Style-Based Generator Architecture for Generative Adversarial Networks</a>. And StyleGAN is based on <strong>Progressive GAN</strong> from the paper <a href=\"https://arxiv.org/abs/1710.10196\">Progressive Growing of GANs for Improved Quality, Stability, and Variation</a>. All three papers are from the same authors from <a href=\"https://twitter.com/NVIDIAAI\">NVIDIA AI</a>. 
</p>\n": "<h1><a href=\"https://nn.labml.ai/gan/stylegan/index.html\">StyleGan 2</a></h1>\n<p>\u8fd9\u662f\u300a<a href=\"https://arxiv.org/abs/1912.04958\">\u5206\u6790\u548c\u63d0\u9ad8 StyleGan \u7684\u56fe\u50cf\u8d28\u91cf\u300b</a><a href=\"https://pytorch.org\">\u4e00\u6587\u7684 PyTorch</a> \u5b9e\u73b0\uff0c\u8be5\u8bba\u6587\u4ecb\u7ecd\u4e86 <strong>StyleGan2</strong>\u3002StyleGan 2 \u662f\u5bf9\u8bba\u6587\u300a\u751f\u6210<a href=\"https://arxiv.org/abs/1812.04948\">\u5bf9\u6297\u7f51\u7edc\u7684\u57fa\u4e8e\u6837\u5f0f\u7684\u751f\u6210\u5668\u67b6\u6784\u300b\u4e2d\u5bf9</a> <strong>StyleG</strong> an \u7684\u6539\u8fdb\u3002StyleG <strong>an \u57fa\u4e8e\u8bba\u6587\u300a\u9010\u6b65</strong><a href=\"https://arxiv.org/abs/1710.10196\">\u751f\u957f GaN \u4ee5\u63d0\u9ad8\u8d28\u91cf\u3001\u7a33\u5b9a\u6027\u548c\u53d8\u5f02\u6027\u300b\u4e2d\u7684\u6e10\u8fdb\u5f0f GAN</a>\u3002\u8fd9\u4e09\u7bc7\u8bba\u6587\u5747\u51fa\u81ea <a href=\"https://twitter.com/NVIDIAAI\">NVIDIA AI</a> \u7684\u540c\u4e00\u4f4d\u4f5c\u8005\u3002</p>\n",
"StyleGAN 2": "StyleGan 2" "StyleGAN 2": "StyleGan 2"
} }

@@ -1,5 +1,5 @@
{
"<h1>Gradient Penalty for Wasserstein GAN (WGAN-GP)</h1>\n<p>This is an implementation of <a href=\"https://papers.labml.ai/paper/1704.00028\">Improved Training of Wasserstein GANs</a>.</p>\n<p><a href=\"../index.html\">WGAN</a> suggests clipping weights to enforce Lipschitz constraint on the discriminator network (critic). This and other weight constraints like L2 norm clipping, weight normalization, L1, L2 weight decay have problems:</p>\n<p>1. Limiting the capacity of the discriminator 2. Exploding and vanishing gradients (without <a href=\"../../../normalization/batch_norm/index.html\">Batch Normalization</a>).</p>\n<p>The paper <a href=\"https://papers.labml.ai/paper/1704.00028\">Improved Training of Wasserstein GANs</a> proposal a better way to improve Lipschitz constraint, a gradient penalty.</p>\n<p><span translate=no>_^_0_^_</span></p>\n<p>where <span translate=no>_^_1_^_</span> is the penalty weight and</p>\n<span translate=no>_^_2_^_</span><p>That is we try to keep the gradient norm <span translate=no>_^_3_^_</span> close to <span translate=no>_^_4_^_</span>.</p>\n<p>In this implementation we set <span translate=no>_^_5_^_</span>.</p>\n<p>Here is the <a href=\"experiment.html\">code for an experiment</a> that uses gradient penalty.</p>\n": "<h1>\u30ef\u30c3\u30b5\u30fc\u30b9\u30bf\u30a4\u30f3 GAN (WGAN-GP) \u306e\u30b0\u30e9\u30c7\u30fc\u30b7\u30e7\u30f3\u30da\u30ca\u30eb\u30c6\u30a3</h1>\n<p>\u3053\u308c\u306f\u3001<a href=\"https://papers.labml.ai/paper/1704.00028\">\u30f4\u30a1\u30c3\u30b5\u30fc\u30b9\u30bf\u30a4\u30f3GAN\u306e\u6539\u826f\u578b\u30c8\u30ec\u30fc\u30cb\u30f3\u30b0\u306e\u5b9f\u88c5\u3067\u3059</a>\u3002</p>\n<p><a href=\"../index.html\">WGAN\u306f</a>\u3001\u30c7\u30a3\u30b9\u30af\u30ea\u30df\u30cd\u30fc\u30bf\u30fc\u30fb\u30cd\u30c3\u30c8\u30ef\u30fc\u30af\u306b\u30ea\u30c3\u30d7\u30b7\u30c3\u30c4\u5236\u7d04\u3092\u9069\u7528\u3059\u308b\u305f\u3081\u306b\u30a6\u30a7\u30a4\u30c8\u3092\u30af\u30ea\u30c3\u30d4\u30f3\u30b0\u3059\u308b\u3053\u3068\u3092\u63d0\u6848\u3057\u3066\u3044\u308b\uff08\u8a55\u8ad6\u5bb6\uff09\u3002\u3053\u308c\u306b\u52a0\u3048\u3066\u3001L2 \u30ce\u30eb\u30e0\u30af\u30ea\u30c3\u30d4\u30f3\u30b0\u3001\u30a6\u30a7\u30a4\u30c8\u6b63\u898f\u5316\u3001L1\u3001L2 \u30a6\u30a7\u30a4\u30c8\u6e1b\u8870\u306a\u3069\u306e\u4ed6\u306e\u30a6\u30a7\u30a4\u30c8\u5236\u7d04\u306b\u306f\u554f\u984c\u304c\u3042\u308a\u307e\u3059</p>\u3002\n<p>1\u3002\u30c7\u30a3\u30b9\u30af\u30ea\u30df\u30cd\u30fc\u30bf\u30fc\u306e\u5bb9\u91cf\u5236\u9650 2.<a href=\"../../../normalization/batch_norm/index.html\">\u30b0\u30e9\u30c7\u30fc\u30b7\u30e7\u30f3\u304c\u7206\u767a\u3057\u305f\u308a\u6d88\u3048\u305f\u308a\u3059\u308b (\u30d0\u30c3\u30c1\u6b63\u898f\u5316\u306a\u3057)</a></p>\n<p>\u8ad6\u6587\u300c<a href=\"https://papers.labml.ai/paper/1704.00028\">Wasserstein GAN\u306e\u30c8\u30ec\u30fc\u30cb\u30f3\u30b0\u306e\u6539\u5584</a>\u300d\u306f\u3001\u52fe\u914d\u30da\u30ca\u30eb\u30c6\u30a3\u3067\u3042\u308b\u30ea\u30c3\u30d7\u30b7\u30c3\u30c4\u5236\u7d04\u3092\u6539\u5584\u3059\u308b\u3088\u308a\u826f\u3044\u65b9\u6cd5\u3092\u63d0\u6848\u3057\u3066\u3044\u307e\u3059\u3002</p>\n<p><span translate=no>_^_0_^_</span></p>\n<p><span translate=no>_^_1_^_</span>\u30da\u30ca\u30eb\u30c6\u30a3\u30a6\u30a7\u30a4\u30c8\u306f\u3069\u3053\u3067</p>\n<span translate=no>_^_2_^_</span><p>\u3064\u307e\u308a\u3001<span translate=no>_^_3_^_</span>\u52fe\u914d\u306e\u30ce\u30eb\u30e0\u3092\u8fd1\u304f\u306b\u4fdd\u3064\u3088\u3046\u306b\u3057\u3066\u3044\u307e\u3059\u3002<span 
translate=no>_^_4_^_</span></p>\n<p><span translate=no>_^_5_^_</span>\u3053\u306e\u5b9f\u88c5\u3067\u306f\u8a2d\u5b9a\u3057\u307e\u3057\u305f\u3002</p>\n<p><a href=\"experiment.html\">\u30b0\u30e9\u30c7\u30fc\u30b7\u30e7\u30f3\u30da\u30ca\u30eb\u30c6\u30a3\u3092\u4f7f\u7528\u3059\u308b\u5b9f\u9a13\u306e\u30b3\u30fc\u30c9\u306f\u6b21\u306e\u3068\u304a\u308a\u3067\u3059</a>\u3002</p>\n", "<h1>Gradient Penalty for Wasserstein GAN (WGAN-GP)</h1>\n<p>This is an implementation of <a href=\"https://arxiv.org/abs/1704.00028\">Improved Training of Wasserstein GANs</a>.</p>\n<p><a href=\"../index.html\">WGAN</a> suggests clipping weights to enforce Lipschitz constraint on the discriminator network (critic). This and other weight constraints like L2 norm clipping, weight normalization, L1, L2 weight decay have problems:</p>\n<p>1. Limiting the capacity of the discriminator 2. Exploding and vanishing gradients (without <a href=\"../../../normalization/batch_norm/index.html\">Batch Normalization</a>).</p>\n<p>The paper <a href=\"https://arxiv.org/abs/1704.00028\">Improved Training of Wasserstein GANs</a> proposal a better way to improve Lipschitz constraint, a gradient penalty.</p>\n<p><span translate=no>_^_0_^_</span></p>\n<p>where <span translate=no>_^_1_^_</span> is the penalty weight and</p>\n<span translate=no>_^_2_^_</span><p>That is we try to keep the gradient norm <span translate=no>_^_3_^_</span> close to <span translate=no>_^_4_^_</span>.</p>\n<p>In this implementation we set <span translate=no>_^_5_^_</span>.</p>\n<p>Here is the <a href=\"experiment.html\">code for an experiment</a> that uses gradient penalty.</p>\n": "<h1>\u30ef\u30c3\u30b5\u30fc\u30b9\u30bf\u30a4\u30f3 GAN (WGAN-GP) \u306e\u30b0\u30e9\u30c7\u30fc\u30b7\u30e7\u30f3\u30da\u30ca\u30eb\u30c6\u30a3</h1>\n<p>\u3053\u308c\u306f\u3001<a href=\"https://arxiv.org/abs/1704.00028\">\u30f4\u30a1\u30c3\u30b5\u30fc\u30b9\u30bf\u30a4\u30f3GAN\u306e\u6539\u826f\u578b\u30c8\u30ec\u30fc\u30cb\u30f3\u30b0\u306e\u5b9f\u88c5\u3067\u3059</a>\u3002</p>\n<p><a href=\"../index.html\">WGAN\u306f</a>\u3001\u30c7\u30a3\u30b9\u30af\u30ea\u30df\u30cd\u30fc\u30bf\u30fc\u30fb\u30cd\u30c3\u30c8\u30ef\u30fc\u30af\u306b\u30ea\u30c3\u30d7\u30b7\u30c3\u30c4\u5236\u7d04\u3092\u9069\u7528\u3059\u308b\u305f\u3081\u306b\u30a6\u30a7\u30a4\u30c8\u3092\u30af\u30ea\u30c3\u30d4\u30f3\u30b0\u3059\u308b\u3053\u3068\u3092\u63d0\u6848\u3057\u3066\u3044\u308b\uff08\u8a55\u8ad6\u5bb6\uff09\u3002\u3053\u308c\u306b\u52a0\u3048\u3066\u3001L2 \u30ce\u30eb\u30e0\u30af\u30ea\u30c3\u30d4\u30f3\u30b0\u3001\u30a6\u30a7\u30a4\u30c8\u6b63\u898f\u5316\u3001L1\u3001L2 \u30a6\u30a7\u30a4\u30c8\u6e1b\u8870\u306a\u3069\u306e\u4ed6\u306e\u30a6\u30a7\u30a4\u30c8\u5236\u7d04\u306b\u306f\u554f\u984c\u304c\u3042\u308a\u307e\u3059</p>\u3002\n<p>1\u3002\u30c7\u30a3\u30b9\u30af\u30ea\u30df\u30cd\u30fc\u30bf\u30fc\u306e\u5bb9\u91cf\u5236\u9650 2.<a href=\"../../../normalization/batch_norm/index.html\">\u30b0\u30e9\u30c7\u30fc\u30b7\u30e7\u30f3\u304c\u7206\u767a\u3057\u305f\u308a\u6d88\u3048\u305f\u308a\u3059\u308b (\u30d0\u30c3\u30c1\u6b63\u898f\u5316\u306a\u3057)</a></p>\n<p>\u8ad6\u6587\u300c<a href=\"https://arxiv.org/abs/1704.00028\">Wasserstein GAN\u306e\u30c8\u30ec\u30fc\u30cb\u30f3\u30b0\u306e\u6539\u5584</a>\u300d\u306f\u3001\u52fe\u914d\u30da\u30ca\u30eb\u30c6\u30a3\u3067\u3042\u308b\u30ea\u30c3\u30d7\u30b7\u30c3\u30c4\u5236\u7d04\u3092\u6539\u5584\u3059\u308b\u3088\u308a\u826f\u3044\u65b9\u6cd5\u3092\u63d0\u6848\u3057\u3066\u3044\u307e\u3059\u3002</p>\n<p><span 
translate=no>_^_0_^_</span></p>\n<p><span translate=no>_^_1_^_</span>\u30da\u30ca\u30eb\u30c6\u30a3\u30a6\u30a7\u30a4\u30c8\u306f\u3069\u3053\u3067</p>\n<span translate=no>_^_2_^_</span><p>\u3064\u307e\u308a\u3001<span translate=no>_^_3_^_</span>\u52fe\u914d\u306e\u30ce\u30eb\u30e0\u3092\u8fd1\u304f\u306b\u4fdd\u3064\u3088\u3046\u306b\u3057\u3066\u3044\u307e\u3059\u3002<span translate=no>_^_4_^_</span></p>\n<p><span translate=no>_^_5_^_</span>\u3053\u306e\u5b9f\u88c5\u3067\u306f\u8a2d\u5b9a\u3057\u307e\u3057\u305f\u3002</p>\n<p><a href=\"experiment.html\">\u30b0\u30e9\u30c7\u30fc\u30b7\u30e7\u30f3\u30da\u30ca\u30eb\u30c6\u30a3\u3092\u4f7f\u7528\u3059\u308b\u5b9f\u9a13\u306e\u30b3\u30fc\u30c9\u306f\u6b21\u306e\u3068\u304a\u308a\u3067\u3059</a>\u3002</p>\n",
"<h2>Gradient Penalty</h2>\n": "<h2>\u30b0\u30e9\u30c7\u30fc\u30b7\u30e7\u30f3\u30da\u30ca\u30eb\u30c6\u30a3</h2>\n", "<h2>Gradient Penalty</h2>\n": "<h2>\u30b0\u30e9\u30c7\u30fc\u30b7\u30e7\u30f3\u30da\u30ca\u30eb\u30c6\u30a3</h2>\n",
"<p>Calculate gradients of <span translate=no>_^_0_^_</span> with respect to <span translate=no>_^_1_^_</span>. <span translate=no>_^_2_^_</span> is set to ones since we want the gradients of <span translate=no>_^_3_^_</span>, and we need to create and retain graph since we have to compute gradients with respect to weight on this loss. </p>\n": "<p><span translate=no>_^_0_^_</span>\u3092\u57fa\u6e96\u3068\u3057\u305f\u52fe\u914d\u3092\u8a08\u7b97\u3057\u307e\u3059\u3002<span translate=no>_^_1_^_</span><span translate=no>_^_2_^_</span>\u306e\u52fe\u914d\u3092\u6c42\u3081\u3066\u3044\u308b\u306e\u30671\u306b\u8a2d\u5b9a\u3057<span translate=no>_^_3_^_</span>\u3001\u3053\u306e\u640d\u5931\u306e\u91cd\u307f\u306b\u5bfe\u3059\u308b\u52fe\u914d\u3092\u8a08\u7b97\u3059\u308b\u5fc5\u8981\u304c\u3042\u308b\u305f\u3081\u3001\u30b0\u30e9\u30d5\u3092\u4f5c\u6210\u3057\u3066\u4fdd\u6301\u3059\u308b\u5fc5\u8981\u304c\u3042\u308a\u307e\u3059</p>\u3002\n", "<p>Calculate gradients of <span translate=no>_^_0_^_</span> with respect to <span translate=no>_^_1_^_</span>. <span translate=no>_^_2_^_</span> is set to ones since we want the gradients of <span translate=no>_^_3_^_</span>, and we need to create and retain graph since we have to compute gradients with respect to weight on this loss. </p>\n": "<p><span translate=no>_^_0_^_</span>\u3092\u57fa\u6e96\u3068\u3057\u305f\u52fe\u914d\u3092\u8a08\u7b97\u3057\u307e\u3059\u3002<span translate=no>_^_1_^_</span><span translate=no>_^_2_^_</span>\u306e\u52fe\u914d\u3092\u6c42\u3081\u3066\u3044\u308b\u306e\u30671\u306b\u8a2d\u5b9a\u3057<span translate=no>_^_3_^_</span>\u3001\u3053\u306e\u640d\u5931\u306e\u91cd\u307f\u306b\u5bfe\u3059\u308b\u52fe\u914d\u3092\u8a08\u7b97\u3059\u308b\u5fc5\u8981\u304c\u3042\u308b\u305f\u3081\u3001\u30b0\u30e9\u30d5\u3092\u4f5c\u6210\u3057\u3066\u4fdd\u6301\u3059\u308b\u5fc5\u8981\u304c\u3042\u308a\u307e\u3059</p>\u3002\n",
"<p>Calculate the norm <span translate=no>_^_0_^_</span> </p>\n": "<p>\u30ce\u30eb\u30e0\u306e\u8a08\u7b97 <span translate=no>_^_0_^_</span></p>\n", "<p>Calculate the norm <span translate=no>_^_0_^_</span> </p>\n": "<p>\u30ce\u30eb\u30e0\u306e\u8a08\u7b97 <span translate=no>_^_0_^_</span></p>\n",

@@ -1,5 +1,5 @@
{
"<h1>Gradient Penalty for Wasserstein GAN (WGAN-GP)</h1>\n<p>This is an implementation of <a href=\"https://papers.labml.ai/paper/1704.00028\">Improved Training of Wasserstein GANs</a>.</p>\n<p><a href=\"../index.html\">WGAN</a> suggests clipping weights to enforce Lipschitz constraint on the discriminator network (critic). This and other weight constraints like L2 norm clipping, weight normalization, L1, L2 weight decay have problems:</p>\n<p>1. Limiting the capacity of the discriminator 2. Exploding and vanishing gradients (without <a href=\"../../../normalization/batch_norm/index.html\">Batch Normalization</a>).</p>\n<p>The paper <a href=\"https://papers.labml.ai/paper/1704.00028\">Improved Training of Wasserstein GANs</a> proposal a better way to improve Lipschitz constraint, a gradient penalty.</p>\n<p><span translate=no>_^_0_^_</span></p>\n<p>where <span translate=no>_^_1_^_</span> is the penalty weight and</p>\n<span translate=no>_^_2_^_</span><p>That is we try to keep the gradient norm <span translate=no>_^_3_^_</span> close to <span translate=no>_^_4_^_</span>.</p>\n<p>In this implementation we set <span translate=no>_^_5_^_</span>.</p>\n<p>Here is the <a href=\"experiment.html\">code for an experiment</a> that uses gradient penalty.</p>\n": "<h1>Wasserstein GAN (WGAN-GP) \u7684\u68af\u5ea6\u60e9\u7f5a</h1>\n<p>\u8fd9\u662f <a href=\"https://papers.labml.ai/paper/1704.00028\">Wasserstein GAN \u6539\u8fdb\u8bad\u7ec3\u7684</a>\u5b9e\u73b0\u3002</p>\n<p><a href=\"../index.html\">WGAN</a> \u5efa\u8bae\u524a\u51cf\u6743\u91cd\u4ee5\u5bf9\u9274\u522b\u5668\u7f51\u7edc\u5f3a\u5236\u6267\u884c Lipschitz \u9650\u5236\uff08\u8bc4\u8bba\u5bb6\uff09\u3002\u8fd9\u4e2a\u548c\u5176\u4ed6\u6743\u91cd\u7ea6\u675f\uff0c\u5982L2\u6807\u51c6\u524a\u51cf\u3001\u6743\u91cd\u6807\u51c6\u5316\u3001L1\u3001L2\u6743\u91cd\u8870\u51cf\u90fd\u6709\u95ee\u9898\uff1a</p>\n<p>1.\u9650\u5236\u9274\u522b\u5668\u7684\u5bb9\u91cf 2.\u5206\u89e3\u548c\u6d88\u5931\u6e10\u53d8\uff08\u4e0d\u5e26<a href=\"../../../normalization/batch_norm/index.html\">\u6279\u91cf\u5f52\u4e00\u5316</a>\uff09\u3002</p>\n<p>\u8bba\u6587\u300a<a href=\"https://papers.labml.ai/paper/1704.00028\">\u6539\u8fdb\u4e86 Wasserstein GaN \u7684\u8bad\u7ec3\u300b</a>\u63d0\u51fa\u4e86\u6539\u8fdb Lipschitz \u7ea6\u675f\u7684\u66f4\u597d\u65b9\u6cd5\uff0c\u5373\u68af\u5ea6\u60e9\u7f5a\u3002</p>\n<p><span translate=no>_^_0_^_</span></p>\n<p>\u60e9\u7f5a\u91cd<span translate=no>_^_1_^_</span>\u91cf\u5728\u54ea\u91cc</p>\n<span translate=no>_^_2_^_</span><p>\u4e5f\u5c31\u662f\u8bf4\uff0c\u6211\u4eec\u5c3d\u91cf\u4fdd\u6301\u68af\u5ea6\u8303<span translate=no>_^_3_^_</span>\u6570\u63a5\u8fd1<span translate=no>_^_4_^_</span>\u3002</p>\n<p>\u5728\u8fd9\u4e2a\u5b9e\u73b0\u4e2d\uff0c\u6211\u4eec\u8bbe\u7f6e<span translate=no>_^_5_^_</span>\u3002</p>\n<p>\u4ee5\u4e0b\u662f\u4f7f\u7528\u68af\u5ea6\u60e9\u7f5a<a href=\"experiment.html\">\u7684\u5b9e\u9a8c\u7684\u4ee3\u7801</a>\u3002</p>\n", "<h1>Gradient Penalty for Wasserstein GAN (WGAN-GP)</h1>\n<p>This is an implementation of <a href=\"https://arxiv.org/abs/1704.00028\">Improved Training of Wasserstein GANs</a>.</p>\n<p><a href=\"../index.html\">WGAN</a> suggests clipping weights to enforce Lipschitz constraint on the discriminator network (critic). This and other weight constraints like L2 norm clipping, weight normalization, L1, L2 weight decay have problems:</p>\n<p>1. Limiting the capacity of the discriminator 2. 
Exploding and vanishing gradients (without <a href=\"../../../normalization/batch_norm/index.html\">Batch Normalization</a>).</p>\n<p>The paper <a href=\"https://arxiv.org/abs/1704.00028\">Improved Training of Wasserstein GANs</a> proposal a better way to improve Lipschitz constraint, a gradient penalty.</p>\n<p><span translate=no>_^_0_^_</span></p>\n<p>where <span translate=no>_^_1_^_</span> is the penalty weight and</p>\n<span translate=no>_^_2_^_</span><p>That is we try to keep the gradient norm <span translate=no>_^_3_^_</span> close to <span translate=no>_^_4_^_</span>.</p>\n<p>In this implementation we set <span translate=no>_^_5_^_</span>.</p>\n<p>Here is the <a href=\"experiment.html\">code for an experiment</a> that uses gradient penalty.</p>\n": "<h1>Wasserstein GAN (WGAN-GP) \u7684\u68af\u5ea6\u60e9\u7f5a</h1>\n<p>\u8fd9\u662f <a href=\"https://arxiv.org/abs/1704.00028\">Wasserstein GAN \u6539\u8fdb\u8bad\u7ec3\u7684</a>\u5b9e\u73b0\u3002</p>\n<p><a href=\"../index.html\">WGAN</a> \u5efa\u8bae\u524a\u51cf\u6743\u91cd\u4ee5\u5bf9\u9274\u522b\u5668\u7f51\u7edc\u5f3a\u5236\u6267\u884c Lipschitz \u9650\u5236\uff08\u8bc4\u8bba\u5bb6\uff09\u3002\u8fd9\u4e2a\u548c\u5176\u4ed6\u6743\u91cd\u7ea6\u675f\uff0c\u5982L2\u6807\u51c6\u524a\u51cf\u3001\u6743\u91cd\u6807\u51c6\u5316\u3001L1\u3001L2\u6743\u91cd\u8870\u51cf\u90fd\u6709\u95ee\u9898\uff1a</p>\n<p>1.\u9650\u5236\u9274\u522b\u5668\u7684\u5bb9\u91cf 2.\u5206\u89e3\u548c\u6d88\u5931\u6e10\u53d8\uff08\u4e0d\u5e26<a href=\"../../../normalization/batch_norm/index.html\">\u6279\u91cf\u5f52\u4e00\u5316</a>\uff09\u3002</p>\n<p>\u8bba\u6587\u300a<a href=\"https://arxiv.org/abs/1704.00028\">\u6539\u8fdb\u4e86 Wasserstein GaN \u7684\u8bad\u7ec3\u300b</a>\u63d0\u51fa\u4e86\u6539\u8fdb Lipschitz \u7ea6\u675f\u7684\u66f4\u597d\u65b9\u6cd5\uff0c\u5373\u68af\u5ea6\u60e9\u7f5a\u3002</p>\n<p><span translate=no>_^_0_^_</span></p>\n<p>\u60e9\u7f5a\u91cd<span translate=no>_^_1_^_</span>\u91cf\u5728\u54ea\u91cc</p>\n<span translate=no>_^_2_^_</span><p>\u4e5f\u5c31\u662f\u8bf4\uff0c\u6211\u4eec\u5c3d\u91cf\u4fdd\u6301\u68af\u5ea6\u8303<span translate=no>_^_3_^_</span>\u6570\u63a5\u8fd1<span translate=no>_^_4_^_</span>\u3002</p>\n<p>\u5728\u8fd9\u4e2a\u5b9e\u73b0\u4e2d\uff0c\u6211\u4eec\u8bbe\u7f6e<span translate=no>_^_5_^_</span>\u3002</p>\n<p>\u4ee5\u4e0b\u662f\u4f7f\u7528\u68af\u5ea6\u60e9\u7f5a<a href=\"experiment.html\">\u7684\u5b9e\u9a8c\u7684\u4ee3\u7801</a>\u3002</p>\n",
"<h2>Gradient Penalty</h2>\n": "<h2>\u68af\u5ea6\u60e9\u7f5a</h2>\n", "<h2>Gradient Penalty</h2>\n": "<h2>\u68af\u5ea6\u60e9\u7f5a</h2>\n",
"<p>Calculate gradients of <span translate=no>_^_0_^_</span> with respect to <span translate=no>_^_1_^_</span>. <span translate=no>_^_2_^_</span> is set to ones since we want the gradients of <span translate=no>_^_3_^_</span>, and we need to create and retain graph since we have to compute gradients with respect to weight on this loss. </p>\n": "<p>\u8ba1\u7b97<span translate=no>_^_0_^_</span>\u76f8\u5bf9\u4e8e\u7684\u68af\u5ea6<span translate=no>_^_1_^_</span>\u3002<span translate=no>_^_2_^_</span>\u8bbe\u7f6e\u4e3a 1\uff0c\u56e0\u4e3a\u6211\u4eec\u60f3\u8981\u68af\u5ea6<span translate=no>_^_3_^_</span>\uff0c\u6211\u4eec\u9700\u8981\u521b\u5efa\u548c\u4fdd\u7559\u56fe\u5f62\uff0c\u56e0\u4e3a\u6211\u4eec\u5fc5\u987b\u8ba1\u7b97\u76f8\u5bf9\u4e8e\u6b64\u635f\u5931\u7684\u6743\u91cd\u7684\u68af\u5ea6\u3002</p>\n", "<p>Calculate gradients of <span translate=no>_^_0_^_</span> with respect to <span translate=no>_^_1_^_</span>. <span translate=no>_^_2_^_</span> is set to ones since we want the gradients of <span translate=no>_^_3_^_</span>, and we need to create and retain graph since we have to compute gradients with respect to weight on this loss. </p>\n": "<p>\u8ba1\u7b97<span translate=no>_^_0_^_</span>\u76f8\u5bf9\u4e8e\u7684\u68af\u5ea6<span translate=no>_^_1_^_</span>\u3002<span translate=no>_^_2_^_</span>\u8bbe\u7f6e\u4e3a 1\uff0c\u56e0\u4e3a\u6211\u4eec\u60f3\u8981\u68af\u5ea6<span translate=no>_^_3_^_</span>\uff0c\u6211\u4eec\u9700\u8981\u521b\u5efa\u548c\u4fdd\u7559\u56fe\u5f62\uff0c\u56e0\u4e3a\u6211\u4eec\u5fc5\u987b\u8ba1\u7b97\u76f8\u5bf9\u4e8e\u6b64\u635f\u5931\u7684\u6743\u91cd\u7684\u68af\u5ea6\u3002</p>\n",
"<p>Calculate the norm <span translate=no>_^_0_^_</span> </p>\n": "<p>\u8ba1\u7b97\u5e38\u6570<span translate=no>_^_0_^_</span></p>\n", "<p>Calculate the norm <span translate=no>_^_0_^_</span> </p>\n": "<p>\u8ba1\u7b97\u5e38\u6570<span translate=no>_^_0_^_</span></p>\n",

@@ -1,4 +1,4 @@
{
"<h1><a href=\"https://nn.labml.ai/gan/wasserstein/gradient_penalty/index.html\">Gradient Penalty for Wasserstein GAN (WGAN-GP)</a></h1>\n<p>This is an implementation of <a href=\"https://papers.labml.ai/paper/1704.00028\">Improved Training of Wasserstein GANs</a>.</p>\n<p><a href=\"https://nn.labml.ai/gan/wasserstein/index.html\">WGAN</a> suggests clipping weights to enforce Lipschitz constraint on the discriminator network (critic). This and other weight constraints like L2 norm clipping, weight normalization, L1, L2 weight decay have problems:</p>\n<p>1. Limiting the capacity of the discriminator 2. Exploding and vanishing gradients (without <a href=\"https://nn.labml.ai/normalization/batch_norm/index.html\">Batch Normalization</a>).</p>\n<p>The paper <a href=\"https://papers.labml.ai/paper/1704.00028\">Improved Training of Wasserstein GANs</a> proposal a better way to improve Lipschitz constraint, a gradient penalty. </p>\n": "<h1><a href=\"https://nn.labml.ai/gan/wasserstein/gradient_penalty/index.html\">\u30ef\u30c3\u30b5\u30fc\u30b9\u30bf\u30a4\u30f3 GAN (WGAN-GP) \u306e\u30b0\u30e9\u30c7\u30fc\u30b7\u30e7\u30f3\u30da\u30ca\u30eb\u30c6\u30a3</a></h1>\n<p>\u3053\u308c\u306f\u3001<a href=\"https://papers.labml.ai/paper/1704.00028\">\u30f4\u30a1\u30c3\u30b5\u30fc\u30b9\u30bf\u30a4\u30f3GAN\u306e\u6539\u826f\u578b\u30c8\u30ec\u30fc\u30cb\u30f3\u30b0\u306e\u5b9f\u88c5\u3067\u3059</a>\u3002</p>\n<p><a href=\"https://nn.labml.ai/gan/wasserstein/index.html\">WGAN\u306f</a>\u3001\u30c7\u30a3\u30b9\u30af\u30ea\u30df\u30cd\u30fc\u30bf\u30fc\u30fb\u30cd\u30c3\u30c8\u30ef\u30fc\u30af\u306b\u30ea\u30c3\u30d7\u30b7\u30c3\u30c4\u5236\u7d04\u3092\u9069\u7528\u3059\u308b\u305f\u3081\u306b\u30a6\u30a7\u30a4\u30c8\u3092\u30af\u30ea\u30c3\u30d4\u30f3\u30b0\u3059\u308b\u3053\u3068\u3092\u63d0\u6848\u3057\u3066\u3044\u308b\uff08\u8a55\u8ad6\u5bb6\uff09\u3002\u3053\u308c\u306b\u52a0\u3048\u3066\u3001L2 \u30ce\u30eb\u30e0\u30af\u30ea\u30c3\u30d4\u30f3\u30b0\u3001\u30a6\u30a7\u30a4\u30c8\u6b63\u898f\u5316\u3001L1\u3001L2 \u30a6\u30a7\u30a4\u30c8\u6e1b\u8870\u306a\u3069\u306e\u4ed6\u306e\u30a6\u30a7\u30a4\u30c8\u5236\u7d04\u306b\u306f\u554f\u984c\u304c\u3042\u308a\u307e\u3059</p>\u3002\n<p>1\u3002\u30c7\u30a3\u30b9\u30af\u30ea\u30df\u30cd\u30fc\u30bf\u30fc\u306e\u5bb9\u91cf\u5236\u9650 2.<a href=\"https://nn.labml.ai/normalization/batch_norm/index.html\">\u30b0\u30e9\u30c7\u30fc\u30b7\u30e7\u30f3\u304c\u7206\u767a\u3057\u305f\u308a\u6d88\u3048\u305f\u308a\u3059\u308b (\u30d0\u30c3\u30c1\u6b63\u898f\u5316\u306a\u3057)</a></p>\n<p>\u8ad6\u6587\u300c<a href=\"https://papers.labml.ai/paper/1704.00028\">Wasserstein GAN\u306e\u30c8\u30ec\u30fc\u30cb\u30f3\u30b0\u306e\u6539\u5584</a>\u300d\u306f\u3001\u52fe\u914d\u30da\u30ca\u30eb\u30c6\u30a3\u3067\u3042\u308b\u30ea\u30c3\u30d7\u30b7\u30c3\u30c4\u5236\u7d04\u3092\u6539\u5584\u3059\u308b\u3088\u308a\u826f\u3044\u65b9\u6cd5\u3092\u63d0\u6848\u3057\u3066\u3044\u307e\u3059\u3002</p>\n", "<h1><a href=\"https://nn.labml.ai/gan/wasserstein/gradient_penalty/index.html\">Gradient Penalty for Wasserstein GAN (WGAN-GP)</a></h1>\n<p>This is an implementation of <a href=\"https://arxiv.org/abs/1704.00028\">Improved Training of Wasserstein GANs</a>.</p>\n<p><a href=\"https://nn.labml.ai/gan/wasserstein/index.html\">WGAN</a> suggests clipping weights to enforce Lipschitz constraint on the discriminator network (critic). This and other weight constraints like L2 norm clipping, weight normalization, L1, L2 weight decay have problems:</p>\n<p>1. 
Limiting the capacity of the discriminator 2. Exploding and vanishing gradients (without <a href=\"https://nn.labml.ai/normalization/batch_norm/index.html\">Batch Normalization</a>).</p>\n<p>The paper <a href=\"https://arxiv.org/abs/1704.00028\">Improved Training of Wasserstein GANs</a> proposal a better way to improve Lipschitz constraint, a gradient penalty. </p>\n": "<h1><a href=\"https://nn.labml.ai/gan/wasserstein/gradient_penalty/index.html\">\u30ef\u30c3\u30b5\u30fc\u30b9\u30bf\u30a4\u30f3 GAN (WGAN-GP) \u306e\u30b0\u30e9\u30c7\u30fc\u30b7\u30e7\u30f3\u30da\u30ca\u30eb\u30c6\u30a3</a></h1>\n<p>\u3053\u308c\u306f\u3001<a href=\"https://arxiv.org/abs/1704.00028\">\u30f4\u30a1\u30c3\u30b5\u30fc\u30b9\u30bf\u30a4\u30f3GAN\u306e\u6539\u826f\u578b\u30c8\u30ec\u30fc\u30cb\u30f3\u30b0\u306e\u5b9f\u88c5\u3067\u3059</a>\u3002</p>\n<p><a href=\"https://nn.labml.ai/gan/wasserstein/index.html\">WGAN\u306f</a>\u3001\u30c7\u30a3\u30b9\u30af\u30ea\u30df\u30cd\u30fc\u30bf\u30fc\u30fb\u30cd\u30c3\u30c8\u30ef\u30fc\u30af\u306b\u30ea\u30c3\u30d7\u30b7\u30c3\u30c4\u5236\u7d04\u3092\u9069\u7528\u3059\u308b\u305f\u3081\u306b\u30a6\u30a7\u30a4\u30c8\u3092\u30af\u30ea\u30c3\u30d4\u30f3\u30b0\u3059\u308b\u3053\u3068\u3092\u63d0\u6848\u3057\u3066\u3044\u308b\uff08\u8a55\u8ad6\u5bb6\uff09\u3002\u3053\u308c\u306b\u52a0\u3048\u3066\u3001L2 \u30ce\u30eb\u30e0\u30af\u30ea\u30c3\u30d4\u30f3\u30b0\u3001\u30a6\u30a7\u30a4\u30c8\u6b63\u898f\u5316\u3001L1\u3001L2 \u30a6\u30a7\u30a4\u30c8\u6e1b\u8870\u306a\u3069\u306e\u4ed6\u306e\u30a6\u30a7\u30a4\u30c8\u5236\u7d04\u306b\u306f\u554f\u984c\u304c\u3042\u308a\u307e\u3059</p>\u3002\n<p>1\u3002\u30c7\u30a3\u30b9\u30af\u30ea\u30df\u30cd\u30fc\u30bf\u30fc\u306e\u5bb9\u91cf\u5236\u9650 2.<a href=\"https://nn.labml.ai/normalization/batch_norm/index.html\">\u30b0\u30e9\u30c7\u30fc\u30b7\u30e7\u30f3\u304c\u7206\u767a\u3057\u305f\u308a\u6d88\u3048\u305f\u308a\u3059\u308b (\u30d0\u30c3\u30c1\u6b63\u898f\u5316\u306a\u3057)</a></p>\n<p>\u8ad6\u6587\u300c<a href=\"https://arxiv.org/abs/1704.00028\">Wasserstein GAN\u306e\u30c8\u30ec\u30fc\u30cb\u30f3\u30b0\u306e\u6539\u5584</a>\u300d\u306f\u3001\u52fe\u914d\u30da\u30ca\u30eb\u30c6\u30a3\u3067\u3042\u308b\u30ea\u30c3\u30d7\u30b7\u30c3\u30c4\u5236\u7d04\u3092\u6539\u5584\u3059\u308b\u3088\u308a\u826f\u3044\u65b9\u6cd5\u3092\u63d0\u6848\u3057\u3066\u3044\u307e\u3059\u3002</p>\n",
"Gradient Penalty for Wasserstein GAN (WGAN-GP)": "\u30ef\u30c3\u30b5\u30fc\u30b9\u30bf\u30a4\u30f3 GAN (WGAN-GP) \u306e\u30b0\u30e9\u30c7\u30fc\u30b7\u30e7\u30f3\u30da\u30ca\u30eb\u30c6\u30a3" "Gradient Penalty for Wasserstein GAN (WGAN-GP)": "\u30ef\u30c3\u30b5\u30fc\u30b9\u30bf\u30a4\u30f3 GAN (WGAN-GP) \u306e\u30b0\u30e9\u30c7\u30fc\u30b7\u30e7\u30f3\u30da\u30ca\u30eb\u30c6\u30a3"
} }

@@ -1,4 +1,4 @@
{
"<h1><a href=\"https://nn.labml.ai/gan/wasserstein/gradient_penalty/index.html\">Gradient Penalty for Wasserstein GAN (WGAN-GP)</a></h1>\n<p>This is an implementation of <a href=\"https://papers.labml.ai/paper/1704.00028\">Improved Training of Wasserstein GANs</a>.</p>\n<p><a href=\"https://nn.labml.ai/gan/wasserstein/index.html\">WGAN</a> suggests clipping weights to enforce Lipschitz constraint on the discriminator network (critic). This and other weight constraints like L2 norm clipping, weight normalization, L1, L2 weight decay have problems:</p>\n<p>1. Limiting the capacity of the discriminator 2. Exploding and vanishing gradients (without <a href=\"https://nn.labml.ai/normalization/batch_norm/index.html\">Batch Normalization</a>).</p>\n<p>The paper <a href=\"https://papers.labml.ai/paper/1704.00028\">Improved Training of Wasserstein GANs</a> proposal a better way to improve Lipschitz constraint, a gradient penalty. </p>\n": "<h1><a href=\"https://nn.labml.ai/gan/wasserstein/gradient_penalty/index.html\">\u0dc0\u0ddc\u0dc3\u0dbb\u0dca\u0dc3\u0dca\u0da7\u0dba\u0dd2\u0db1\u0dca GAN (WGAN-GP) \u0dc3\u0db3\u0dc4\u0dcf \u0d9c\u0dca\u0dbb\u0dda\u0da9\u0dd2\u0dba\u0db1\u0dca\u0da7\u0dca \u0daf penalty \u0dd4\u0dc0\u0db8</a></h1>\n<p>\u0db8\u0dd9\u0dba <a href=\"https://papers.labml.ai/paper/1704.00028\">\u0dc0\u0ddc\u0dc3\u0dbb\u0dca\u0dc3\u0dca\u0da7\u0dba\u0dd2\u0db1\u0dca GANs \u0dc0\u0dd0\u0da9\u0dd2\u0daf\u0dd2\u0dba\u0dd4\u0dab\u0dd4 \u0d9a\u0dc5 \u0db4\u0dd4\u0dc4\u0dd4\u0dab\u0dd4\u0dc0</a>\u0d9a\u0dca\u0dbb\u0dd2\u0dba\u0dcf\u0dad\u0dca\u0db8\u0d9a \u0d9a\u0dd2\u0dbb\u0dd3\u0db8\u0dba\u0dd2. </p>\n<p>\u0dc0\u0dd9\u0db1\u0dc3\u0dca\u0d9a\u0ddc\u0da7 \u0dc3\u0dd0\u0dbd\u0d9a\u0dd3\u0db8\u0dda \u0da2\u0dcf\u0dbd\u0dba\u0dda (\u0dc0\u0dd2\u0da0\u0dcf\u0dbb\u0d9a)<a href=\"https://nn.labml.ai/gan/wasserstein/index.html\">\u0dbd\u0dd2\u0db4\u0dca\u0dc3\u0dca\u0da0\u0dd2\u0da7\u0dca\u0dc3\u0dca \u0d85\u0dc0\u0dc4\u0dd2\u0dbb\u0dad\u0dcf \u0db6\u0dbd\u0dcf\u0dad\u0dca\u0db8\u0d9a \u0d9a\u0dd2\u0dbb\u0dd3\u0db8 \u0dc3\u0db3\u0dc4\u0dcf WGAN</a> \u0dc0\u0dd2\u0dc3\u0dd2\u0db1\u0dca \u0d9a\u0dca\u0dbd\u0dd2\u0db4\u0dd2\u0db1\u0dca \u0db6\u0dbb \u0dba\u0ddd\u0da2\u0db1\u0dcf \u0d9a\u0dbb\u0dba\u0dd2. \u0db8\u0dd9\u0dba \u0dc3\u0dc4 L2 \u0dc3\u0db8\u0dca\u0db8\u0dad \u0d9a\u0dca\u0dbd\u0dd2\u0db4\u0dd2\u0db1\u0dca \u0d9a\u0dd2\u0dbb\u0dd3\u0db8, \u0db6\u0dbb \u0dc3\u0dcf\u0db8\u0dcf\u0db1\u0dca\u0dba\u0d9a\u0dbb\u0dab\u0dba \u0d9a\u0dd2\u0dbb\u0dd3\u0db8, L1, L2 \u0db6\u0dbb \u0d9a\u0dca\u0dc2\u0dba \u0dc0\u0dd3\u0db8 \u0dc0\u0dd0\u0db1\u0dd2 \u0dc0\u0dd9\u0db1\u0dad\u0dca \u0db6\u0dbb \u0db6\u0dcf\u0db0\u0d9a \u0d9c\u0dd0\u0da7\u0dc5\u0dd4 \u0d87\u0dad:</p>\n<p>1. \u0dc0\u0dd9\u0db1\u0dc3\u0dca\u0d9a\u0db8\u0dca \u0d9a\u0dbb\u0db1\u0dca\u0db1\u0dcf\u0d9c\u0dda \u0db0\u0dcf\u0dbb\u0dd2\u0dad\u0dcf\u0dc0 \u0dc3\u0dd3\u0db8\u0dcf \u0d9a\u0dd2\u0dbb\u0dd3\u0db8 2. \u0db4\u0dd4\u0db4\u0dd4\u0dbb\u0dcf \u0dba\u0dcf\u0db8 \u0dc3\u0dc4 \u0d85\u0dad\u0dd4\u0dbb\u0dd4\u0daf\u0dc4\u0db1\u0dca \u0dc0\u0dd3\u0db8 ( <a href=\"https://nn.labml.ai/normalization/batch_norm/index.html\">\u0d9a\u0dab\u0dca\u0da9\u0dcf\u0dba\u0db8\u0dca \u0dc3\u0dcf\u0db8\u0dcf\u0db1\u0dca\u0dba\u0d9a\u0dbb\u0dab\u0dba</a>\u0db1\u0ddc\u0db8\u0dd0\u0dad\u0dd2\u0dc0). 
</p>\n<p>\u0d9a\u0da9\u0daf\u0dcf\u0dc3\u0dd2 <a href=\"https://papers.labml.ai/paper/1704.00028\">\u0dc0\u0dd0\u0da9\u0dd2\u0daf\u0dd2\u0dba\u0dd4\u0dab\u0dd4 \u0d9a\u0dbb\u0db1 \u0dbd\u0daf \u0dc0\u0ddc\u0dc3\u0dbb\u0dca\u0dc3\u0dca\u0da7\u0dba\u0dd2\u0db1\u0dca GANS \u0db4\u0dd4\u0dc4\u0dd4\u0dab\u0dd4\u0dc0</a> Lipschitz \u0d85\u0dc0\u0dc4\u0dd2\u0dbb\u0dad\u0dcf \u0dc0\u0dd0\u0da9\u0dd2 \u0daf\u0dd2\u0dba\u0dd4\u0dab\u0dd4 \u0d9a\u0dd2\u0dbb\u0dd3\u0db8 \u0dc3\u0db3\u0dc4\u0dcf \u0dc0\u0da9\u0dcf \u0dc4\u0ddc\u0db3 \u0d9a\u0dca\u0dbb\u0db8\u0dba\u0d9a\u0dca \u0dba\u0ddd\u0da2\u0db1\u0dcf \u0d9a\u0dbb\u0dba\u0dd2. </p>\n", "<h1><a href=\"https://nn.labml.ai/gan/wasserstein/gradient_penalty/index.html\">Gradient Penalty for Wasserstein GAN (WGAN-GP)</a></h1>\n<p>This is an implementation of <a href=\"https://arxiv.org/abs/1704.00028\">Improved Training of Wasserstein GANs</a>.</p>\n<p><a href=\"https://nn.labml.ai/gan/wasserstein/index.html\">WGAN</a> suggests clipping weights to enforce Lipschitz constraint on the discriminator network (critic). This and other weight constraints like L2 norm clipping, weight normalization, L1, L2 weight decay have problems:</p>\n<p>1. Limiting the capacity of the discriminator 2. Exploding and vanishing gradients (without <a href=\"https://nn.labml.ai/normalization/batch_norm/index.html\">Batch Normalization</a>).</p>\n<p>The paper <a href=\"https://arxiv.org/abs/1704.00028\">Improved Training of Wasserstein GANs</a> proposal a better way to improve Lipschitz constraint, a gradient penalty. </p>\n": "<h1><a href=\"https://nn.labml.ai/gan/wasserstein/gradient_penalty/index.html\">\u0dc0\u0ddc\u0dc3\u0dbb\u0dca\u0dc3\u0dca\u0da7\u0dba\u0dd2\u0db1\u0dca GAN (WGAN-GP) \u0dc3\u0db3\u0dc4\u0dcf \u0d9c\u0dca\u0dbb\u0dda\u0da9\u0dd2\u0dba\u0db1\u0dca\u0da7\u0dca \u0daf penalty \u0dd4\u0dc0\u0db8</a></h1>\n<p>\u0db8\u0dd9\u0dba <a href=\"https://arxiv.org/abs/1704.00028\">\u0dc0\u0ddc\u0dc3\u0dbb\u0dca\u0dc3\u0dca\u0da7\u0dba\u0dd2\u0db1\u0dca GANs \u0dc0\u0dd0\u0da9\u0dd2\u0daf\u0dd2\u0dba\u0dd4\u0dab\u0dd4 \u0d9a\u0dc5 \u0db4\u0dd4\u0dc4\u0dd4\u0dab\u0dd4\u0dc0</a>\u0d9a\u0dca\u0dbb\u0dd2\u0dba\u0dcf\u0dad\u0dca\u0db8\u0d9a \u0d9a\u0dd2\u0dbb\u0dd3\u0db8\u0dba\u0dd2. </p>\n<p>\u0dc0\u0dd9\u0db1\u0dc3\u0dca\u0d9a\u0ddc\u0da7 \u0dc3\u0dd0\u0dbd\u0d9a\u0dd3\u0db8\u0dda \u0da2\u0dcf\u0dbd\u0dba\u0dda (\u0dc0\u0dd2\u0da0\u0dcf\u0dbb\u0d9a)<a href=\"https://nn.labml.ai/gan/wasserstein/index.html\">\u0dbd\u0dd2\u0db4\u0dca\u0dc3\u0dca\u0da0\u0dd2\u0da7\u0dca\u0dc3\u0dca \u0d85\u0dc0\u0dc4\u0dd2\u0dbb\u0dad\u0dcf \u0db6\u0dbd\u0dcf\u0dad\u0dca\u0db8\u0d9a \u0d9a\u0dd2\u0dbb\u0dd3\u0db8 \u0dc3\u0db3\u0dc4\u0dcf WGAN</a> \u0dc0\u0dd2\u0dc3\u0dd2\u0db1\u0dca \u0d9a\u0dca\u0dbd\u0dd2\u0db4\u0dd2\u0db1\u0dca \u0db6\u0dbb \u0dba\u0ddd\u0da2\u0db1\u0dcf \u0d9a\u0dbb\u0dba\u0dd2. \u0db8\u0dd9\u0dba \u0dc3\u0dc4 L2 \u0dc3\u0db8\u0dca\u0db8\u0dad \u0d9a\u0dca\u0dbd\u0dd2\u0db4\u0dd2\u0db1\u0dca \u0d9a\u0dd2\u0dbb\u0dd3\u0db8, \u0db6\u0dbb \u0dc3\u0dcf\u0db8\u0dcf\u0db1\u0dca\u0dba\u0d9a\u0dbb\u0dab\u0dba \u0d9a\u0dd2\u0dbb\u0dd3\u0db8, L1, L2 \u0db6\u0dbb \u0d9a\u0dca\u0dc2\u0dba \u0dc0\u0dd3\u0db8 \u0dc0\u0dd0\u0db1\u0dd2 \u0dc0\u0dd9\u0db1\u0dad\u0dca \u0db6\u0dbb \u0db6\u0dcf\u0db0\u0d9a \u0d9c\u0dd0\u0da7\u0dc5\u0dd4 \u0d87\u0dad:</p>\n<p>1. \u0dc0\u0dd9\u0db1\u0dc3\u0dca\u0d9a\u0db8\u0dca \u0d9a\u0dbb\u0db1\u0dca\u0db1\u0dcf\u0d9c\u0dda \u0db0\u0dcf\u0dbb\u0dd2\u0dad\u0dcf\u0dc0 \u0dc3\u0dd3\u0db8\u0dcf \u0d9a\u0dd2\u0dbb\u0dd3\u0db8 2. 
\u0db4\u0dd4\u0db4\u0dd4\u0dbb\u0dcf \u0dba\u0dcf\u0db8 \u0dc3\u0dc4 \u0d85\u0dad\u0dd4\u0dbb\u0dd4\u0daf\u0dc4\u0db1\u0dca \u0dc0\u0dd3\u0db8 ( <a href=\"https://nn.labml.ai/normalization/batch_norm/index.html\">\u0d9a\u0dab\u0dca\u0da9\u0dcf\u0dba\u0db8\u0dca \u0dc3\u0dcf\u0db8\u0dcf\u0db1\u0dca\u0dba\u0d9a\u0dbb\u0dab\u0dba</a>\u0db1\u0ddc\u0db8\u0dd0\u0dad\u0dd2\u0dc0). </p>\n<p>\u0d9a\u0da9\u0daf\u0dcf\u0dc3\u0dd2 <a href=\"https://arxiv.org/abs/1704.00028\">\u0dc0\u0dd0\u0da9\u0dd2\u0daf\u0dd2\u0dba\u0dd4\u0dab\u0dd4 \u0d9a\u0dbb\u0db1 \u0dbd\u0daf \u0dc0\u0ddc\u0dc3\u0dbb\u0dca\u0dc3\u0dca\u0da7\u0dba\u0dd2\u0db1\u0dca GANS \u0db4\u0dd4\u0dc4\u0dd4\u0dab\u0dd4\u0dc0</a> Lipschitz \u0d85\u0dc0\u0dc4\u0dd2\u0dbb\u0dad\u0dcf \u0dc0\u0dd0\u0da9\u0dd2 \u0daf\u0dd2\u0dba\u0dd4\u0dab\u0dd4 \u0d9a\u0dd2\u0dbb\u0dd3\u0db8 \u0dc3\u0db3\u0dc4\u0dcf \u0dc0\u0da9\u0dcf \u0dc4\u0ddc\u0db3 \u0d9a\u0dca\u0dbb\u0db8\u0dba\u0d9a\u0dca \u0dba\u0ddd\u0da2\u0db1\u0dcf \u0d9a\u0dbb\u0dba\u0dd2. </p>\n",
"Gradient Penalty for Wasserstein GAN (WGAN-GP)": "\u0dc0\u0ddc\u0dc3\u0dbb\u0dca\u0dc3\u0dca\u0da7\u0dba\u0dd2\u0db1\u0dca GAN (WGAN-GP) \u0dc3\u0db3\u0dc4\u0dcf \u0d9c\u0dca\u0dbb\u0dda\u0da9\u0dd2\u0dba\u0db1\u0dca\u0da7\u0dca \u0daf penalty \u0dd4\u0dc0\u0db8" "Gradient Penalty for Wasserstein GAN (WGAN-GP)": "\u0dc0\u0ddc\u0dc3\u0dbb\u0dca\u0dc3\u0dca\u0da7\u0dba\u0dd2\u0db1\u0dca GAN (WGAN-GP) \u0dc3\u0db3\u0dc4\u0dcf \u0d9c\u0dca\u0dbb\u0dda\u0da9\u0dd2\u0dba\u0db1\u0dca\u0da7\u0dca \u0daf penalty \u0dd4\u0dc0\u0db8"
} }

@@ -1,4 +1,4 @@
{
"<h1><a href=\"https://nn.labml.ai/gan/wasserstein/gradient_penalty/index.html\">Gradient Penalty for Wasserstein GAN (WGAN-GP)</a></h1>\n<p>This is an implementation of <a href=\"https://papers.labml.ai/paper/1704.00028\">Improved Training of Wasserstein GANs</a>.</p>\n<p><a href=\"https://nn.labml.ai/gan/wasserstein/index.html\">WGAN</a> suggests clipping weights to enforce Lipschitz constraint on the discriminator network (critic). This and other weight constraints like L2 norm clipping, weight normalization, L1, L2 weight decay have problems:</p>\n<p>1. Limiting the capacity of the discriminator 2. Exploding and vanishing gradients (without <a href=\"https://nn.labml.ai/normalization/batch_norm/index.html\">Batch Normalization</a>).</p>\n<p>The paper <a href=\"https://papers.labml.ai/paper/1704.00028\">Improved Training of Wasserstein GANs</a> proposal a better way to improve Lipschitz constraint, a gradient penalty. </p>\n": "<h1><a href=\"https://nn.labml.ai/gan/wasserstein/gradient_penalty/index.html\">Wasserstein GAN (WGAN-GP) \u7684\u68af\u5ea6\u60e9\u7f5a</a></h1>\n<p>\u8fd9\u662f <a href=\"https://papers.labml.ai/paper/1704.00028\">Wasserstein GAN \u6539\u8fdb\u8bad\u7ec3\u7684</a>\u5b9e\u73b0\u3002</p>\n<p><a href=\"https://nn.labml.ai/gan/wasserstein/index.html\">WGAN</a> \u5efa\u8bae\u524a\u51cf\u6743\u91cd\u4ee5\u5bf9\u9274\u522b\u5668\u7f51\u7edc\u5f3a\u5236\u6267\u884c Lipschitz \u9650\u5236\uff08\u8bc4\u8bba\u5bb6\uff09\u3002\u8fd9\u4e2a\u548c\u5176\u4ed6\u6743\u91cd\u7ea6\u675f\uff0c\u5982L2\u6807\u51c6\u524a\u51cf\u3001\u6743\u91cd\u6807\u51c6\u5316\u3001L1\u3001L2\u6743\u91cd\u8870\u51cf\u90fd\u6709\u95ee\u9898\uff1a</p>\n<p>1.\u9650\u5236\u9274\u522b\u5668\u7684\u5bb9\u91cf 2.\u5206\u89e3\u548c\u6d88\u5931\u6e10\u53d8\uff08\u4e0d\u5e26<a href=\"https://nn.labml.ai/normalization/batch_norm/index.html\">\u6279\u91cf\u5f52\u4e00\u5316</a>\uff09\u3002</p>\n<p>\u8bba\u6587\u300a<a href=\"https://papers.labml.ai/paper/1704.00028\">\u6539\u8fdb\u4e86 Wasserstein GaN \u7684\u8bad\u7ec3\u300b</a>\u63d0\u51fa\u4e86\u6539\u8fdb Lipschitz \u7ea6\u675f\u7684\u66f4\u597d\u65b9\u6cd5\uff0c\u5373\u68af\u5ea6\u60e9\u7f5a\u3002</p>\n", "<h1><a href=\"https://nn.labml.ai/gan/wasserstein/gradient_penalty/index.html\">Gradient Penalty for Wasserstein GAN (WGAN-GP)</a></h1>\n<p>This is an implementation of <a href=\"https://arxiv.org/abs/1704.00028\">Improved Training of Wasserstein GANs</a>.</p>\n<p><a href=\"https://nn.labml.ai/gan/wasserstein/index.html\">WGAN</a> suggests clipping weights to enforce Lipschitz constraint on the discriminator network (critic). This and other weight constraints like L2 norm clipping, weight normalization, L1, L2 weight decay have problems:</p>\n<p>1. Limiting the capacity of the discriminator 2. Exploding and vanishing gradients (without <a href=\"https://nn.labml.ai/normalization/batch_norm/index.html\">Batch Normalization</a>).</p>\n<p>The paper <a href=\"https://arxiv.org/abs/1704.00028\">Improved Training of Wasserstein GANs</a> proposal a better way to improve Lipschitz constraint, a gradient penalty. 
</p>\n": "<h1><a href=\"https://nn.labml.ai/gan/wasserstein/gradient_penalty/index.html\">Wasserstein GAN (WGAN-GP) \u7684\u68af\u5ea6\u60e9\u7f5a</a></h1>\n<p>\u8fd9\u662f <a href=\"https://arxiv.org/abs/1704.00028\">Wasserstein GAN \u6539\u8fdb\u8bad\u7ec3\u7684</a>\u5b9e\u73b0\u3002</p>\n<p><a href=\"https://nn.labml.ai/gan/wasserstein/index.html\">WGAN</a> \u5efa\u8bae\u524a\u51cf\u6743\u91cd\u4ee5\u5bf9\u9274\u522b\u5668\u7f51\u7edc\u5f3a\u5236\u6267\u884c Lipschitz \u9650\u5236\uff08\u8bc4\u8bba\u5bb6\uff09\u3002\u8fd9\u4e2a\u548c\u5176\u4ed6\u6743\u91cd\u7ea6\u675f\uff0c\u5982L2\u6807\u51c6\u524a\u51cf\u3001\u6743\u91cd\u6807\u51c6\u5316\u3001L1\u3001L2\u6743\u91cd\u8870\u51cf\u90fd\u6709\u95ee\u9898\uff1a</p>\n<p>1.\u9650\u5236\u9274\u522b\u5668\u7684\u5bb9\u91cf 2.\u5206\u89e3\u548c\u6d88\u5931\u6e10\u53d8\uff08\u4e0d\u5e26<a href=\"https://nn.labml.ai/normalization/batch_norm/index.html\">\u6279\u91cf\u5f52\u4e00\u5316</a>\uff09\u3002</p>\n<p>\u8bba\u6587\u300a<a href=\"https://arxiv.org/abs/1704.00028\">\u6539\u8fdb\u4e86 Wasserstein GaN \u7684\u8bad\u7ec3\u300b</a>\u63d0\u51fa\u4e86\u6539\u8fdb Lipschitz \u7ea6\u675f\u7684\u66f4\u597d\u65b9\u6cd5\uff0c\u5373\u68af\u5ea6\u60e9\u7f5a\u3002</p>\n",
"Gradient Penalty for Wasserstein GAN (WGAN-GP)": "Wasserstein GAN (WGAN-GP) \u7684\u68af\u5ea6\u60e9\u7f5a" "Gradient Penalty for Wasserstein GAN (WGAN-GP)": "Wasserstein GAN (WGAN-GP) \u7684\u68af\u5ea6\u60e9\u7f5a"
} }

@@ -1,4 +1,4 @@
{
"<h1><a href=\"https://nn.labml.ai/gan/wasserstein/index.html\">Wasserstein GAN - WGAN</a></h1>\n<p>This is an implementation of <a href=\"https://papers.labml.ai/paper/1701.07875\">Wasserstein GAN</a>. </p>\n": "<h1><a href=\"https://nn.labml.ai/gan/wasserstein/index.html\">\u30ef\u30c3\u30b5\u30fc\u30b9\u30bf\u30a4\u30f3 GAN-WGAN</a></h1>\n<p><a href=\"https://papers.labml.ai/paper/1701.07875\">\u3053\u308c\u306f\u30ef\u30c3\u30b5\u30fc\u30b9\u30bf\u30a4\u30f3</a> GAN \u306e\u5b9f\u88c5\u3067\u3059\u3002</p>\n", "<h1><a href=\"https://nn.labml.ai/gan/wasserstein/index.html\">Wasserstein GAN - WGAN</a></h1>\n<p>This is an implementation of <a href=\"https://arxiv.org/abs/1701.07875\">Wasserstein GAN</a>. </p>\n": "<h1><a href=\"https://nn.labml.ai/gan/wasserstein/index.html\">\u30ef\u30c3\u30b5\u30fc\u30b9\u30bf\u30a4\u30f3 GAN-WGAN</a></h1>\n<p><a href=\"https://arxiv.org/abs/1701.07875\">\u3053\u308c\u306f\u30ef\u30c3\u30b5\u30fc\u30b9\u30bf\u30a4\u30f3</a> GAN \u306e\u5b9f\u88c5\u3067\u3059\u3002</p>\n",
"Wasserstein GAN - WGAN": "\u30ef\u30c3\u30b5\u30fc\u30b9\u30bf\u30a4\u30f3 GAN-WGAN" "Wasserstein GAN - WGAN": "\u30ef\u30c3\u30b5\u30fc\u30b9\u30bf\u30a4\u30f3 GAN-WGAN"
} }

View File

@ -1,4 +1,4 @@
{ {
"<h1><a href=\"https://nn.labml.ai/gan/wasserstein/index.html\">Wasserstein GAN - WGAN</a></h1>\n<p>This is an implementation of <a href=\"https://papers.labml.ai/paper/1701.07875\">Wasserstein GAN</a>. </p>\n": "<h1><a href=\"https://nn.labml.ai/gan/wasserstein/index.html\">\u0dc0\u0ddc\u0dc3\u0dbb\u0dca\u0dc3\u0dca\u0da7\u0dba\u0dd2\u0db1\u0dca GAN - WGAN</a></h1>\n<p>\u0db8\u0dd9\u0dba <a href=\"https://papers.labml.ai/paper/1701.07875\">\u0dc0\u0ddc\u0dc3\u0dbb\u0dca\u0dc3\u0dca\u0da7\u0dba\u0dd2\u0db1\u0dca GAN</a>\u0d9a\u0dca\u0dbb\u0dd2\u0dba\u0dcf\u0dad\u0dca\u0db8\u0d9a \u0d9a\u0dd2\u0dbb\u0dd3\u0db8\u0dba\u0dd2. </p>\n", "<h1><a href=\"https://nn.labml.ai/gan/wasserstein/index.html\">Wasserstein GAN - WGAN</a></h1>\n<p>This is an implementation of <a href=\"https://arxiv.org/abs/1701.07875\">Wasserstein GAN</a>. </p>\n": "<h1><a href=\"https://nn.labml.ai/gan/wasserstein/index.html\">\u0dc0\u0ddc\u0dc3\u0dbb\u0dca\u0dc3\u0dca\u0da7\u0dba\u0dd2\u0db1\u0dca GAN - WGAN</a></h1>\n<p>\u0db8\u0dd9\u0dba <a href=\"https://arxiv.org/abs/1701.07875\">\u0dc0\u0ddc\u0dc3\u0dbb\u0dca\u0dc3\u0dca\u0da7\u0dba\u0dd2\u0db1\u0dca GAN</a>\u0d9a\u0dca\u0dbb\u0dd2\u0dba\u0dcf\u0dad\u0dca\u0db8\u0d9a \u0d9a\u0dd2\u0dbb\u0dd3\u0db8\u0dba\u0dd2. </p>\n",
"Wasserstein GAN - WGAN": "\u0dc0\u0ddc\u0dc3\u0dbb\u0dca\u0dc3\u0dca\u0da7\u0dba\u0dd2\u0db1\u0dca GAN - WGAN" "Wasserstein GAN - WGAN": "\u0dc0\u0ddc\u0dc3\u0dbb\u0dca\u0dc3\u0dca\u0da7\u0dba\u0dd2\u0db1\u0dca GAN - WGAN"
} }

View File

@ -1,4 +1,4 @@
{ {
"<h1><a href=\"https://nn.labml.ai/gan/wasserstein/index.html\">Wasserstein GAN - WGAN</a></h1>\n<p>This is an implementation of <a href=\"https://papers.labml.ai/paper/1701.07875\">Wasserstein GAN</a>. </p>\n": "<h1><a href=\"https://nn.labml.ai/gan/wasserstein/index.html\">Wasserstein GAN-WGAN</a></h1>\n<p>\u8fd9\u662f <a href=\"https://papers.labml.ai/paper/1701.07875\">Wasserstein GAN</a> \u7684\u5b9e\u73b0\u3002</p>\n", "<h1><a href=\"https://nn.labml.ai/gan/wasserstein/index.html\">Wasserstein GAN - WGAN</a></h1>\n<p>This is an implementation of <a href=\"https://arxiv.org/abs/1701.07875\">Wasserstein GAN</a>. </p>\n": "<h1><a href=\"https://nn.labml.ai/gan/wasserstein/index.html\">Wasserstein GAN-WGAN</a></h1>\n<p>\u8fd9\u662f <a href=\"https://arxiv.org/abs/1701.07875\">Wasserstein GAN</a> \u7684\u5b9e\u73b0\u3002</p>\n",
"Wasserstein GAN - WGAN": "Wasserstein GAN-WGAN" "Wasserstein GAN - WGAN": "Wasserstein GAN-WGAN"
} }

View File

@ -1,5 +1,5 @@
{ {
"<h1>Graph Attention Networks (GAT)</h1>\n<p>This is a <a href=\"https://pytorch.org\">PyTorch</a> implementation of the paper <a href=\"https://papers.labml.ai/paper/1710.10903\">Graph Attention Networks</a>.</p>\n<p>GATs work on graph data. A graph consists of nodes and edges connecting nodes. For example, in Cora dataset the nodes are research papers and the edges are citations that connect the papers.</p>\n<p>GAT uses masked self-attention, kind of similar to <a href=\"../../transformers/mha.html\">transformers</a>. GAT consists of graph attention layers stacked on top of each other. Each graph attention layer gets node embeddings as inputs and outputs transformed embeddings. The node embeddings pay attention to the embeddings of other nodes it&#x27;s connected to. The details of graph attention layers are included alongside the implementation.</p>\n<p>Here is <a href=\"experiment.html\">the training code</a> for training a two-layer GAT on Cora dataset.</p>\n": "<h1>\u30b0\u30e9\u30d5\u30fb\u30a2\u30c6\u30f3\u30b7\u30e7\u30f3\u30fb\u30cd\u30c3\u30c8\u30ef\u30fc\u30af (GAT)</h1>\n<p>\u3053\u308c\u306f\u8ad6\u6587\u306e\u300c<a href=\"https://papers.labml.ai/paper/1710.10903\">\u30b0\u30e9\u30d5\u30fb\u30a2\u30c6\u30f3\u30b7\u30e7\u30f3\u30fb\u30cd\u30c3\u30c8\u30ef\u30fc\u30af</a>\u300d\u306e <a href=\"https://pytorch.org\">PyTorch</a> \u5b9f\u88c5\u3067\u3059\u3002</p>\n<p>GAT \u306f\u30b0\u30e9\u30d5\u30c7\u30fc\u30bf\u3092\u51e6\u7406\u3057\u307e\u3059\u3002\u30b0\u30e9\u30d5\u306f\u3001\u30ce\u30fc\u30c9\u3068\u30ce\u30fc\u30c9\u3092\u63a5\u7d9a\u3059\u308b\u30a8\u30c3\u30b8\u3067\u69cb\u6210\u3055\u308c\u307e\u3059\u3002\u305f\u3068\u3048\u3070\u3001Cora\u30c7\u30fc\u30bf\u30bb\u30c3\u30c8\u3067\u306f\u3001\u30ce\u30fc\u30c9\u306f\u7814\u7a76\u8ad6\u6587\u3067\u3001\u7aef\u306f\u8ad6\u6587\u3092\u3064\u306a\u3050\u5f15\u7528\u3067\u3059</p>\u3002\n<p><a href=\"../../transformers/mha.html\">GAT\u306f\u3001\u30c8\u30e9\u30f3\u30b9\u30d5\u30a9\u30fc\u30de\u30fc\u306b\u4f3c\u305f\u3001\u30de\u30b9\u30af\u3055\u308c\u305f\u30bb\u30eb\u30d5\u30a2\u30c6\u30f3\u30b7\u30e7\u30f3\u3092\u4f7f\u3044\u307e\u3059\u3002</a>GAT\u306f\u3001\u30b0\u30e9\u30d5\u30a2\u30c6\u30f3\u30b7\u30e7\u30f3\u30ec\u30a4\u30e4\u30fc\u304c\u4e92\u3044\u306b\u91cd\u306a\u308a\u5408\u3063\u3066\u69cb\u6210\u3055\u308c\u3066\u3044\u307e\u3059\u3002\u5404\u30b0\u30e9\u30d5\u30a2\u30c6\u30f3\u30b7\u30e7\u30f3\u30ec\u30a4\u30e4\u30fc\u306f\u3001\u5165\u529b\u3068\u3057\u3066\u30ce\u30fc\u30c9\u57cb\u3081\u8fbc\u307f\u3092\u53d6\u5f97\u3057\u3001\u5909\u63db\u3055\u308c\u305f\u57cb\u3081\u8fbc\u307f\u3092\u51fa\u529b\u3057\u307e\u3059\u3002\u30ce\u30fc\u30c9\u57cb\u3081\u8fbc\u307f\u306f\u3001\u63a5\u7d9a\u3055\u308c\u3066\u3044\u308b\u4ed6\u306e\u30ce\u30fc\u30c9\u306e\u57cb\u3081\u8fbc\u307f\u306b\u6ce8\u76ee\u3057\u307e\u3059\u3002\u30b0\u30e9\u30d5\u30a2\u30c6\u30f3\u30b7\u30e7\u30f3\u30ec\u30a4\u30e4\u30fc\u306e\u8a73\u7d30\u306f\u3001\u5b9f\u88c5\u3068\u3068\u3082\u306b\u542b\u307e\u308c\u3066\u3044\u307e\u3059\u3002</p>\n<p><a href=\"experiment.html\">Cora \u30c7\u30fc\u30bf\u30bb\u30c3\u30c8\u3067 2 \u5c64 GAT \u3092\u30c8\u30ec\u30fc\u30cb\u30f3\u30b0\u3059\u308b\u305f\u3081\u306e\u30c8\u30ec\u30fc\u30cb\u30f3\u30b0\u30b3\u30fc\u30c9\u3092\u6b21\u306b\u793a\u3057\u307e\u3059</a>\u3002</p>\n", "<h1>Graph Attention Networks (GAT)</h1>\n<p>This is a <a href=\"https://pytorch.org\">PyTorch</a> implementation of the paper <a href=\"https://arxiv.org/abs/1710.10903\">Graph Attention Networks</a>.</p>\n<p>GATs work on 
graph data. A graph consists of nodes and edges connecting nodes. For example, in Cora dataset the nodes are research papers and the edges are citations that connect the papers.</p>\n<p>GAT uses masked self-attention, kind of similar to <a href=\"../../transformers/mha.html\">transformers</a>. GAT consists of graph attention layers stacked on top of each other. Each graph attention layer gets node embeddings as inputs and outputs transformed embeddings. The node embeddings pay attention to the embeddings of other nodes it&#x27;s connected to. The details of graph attention layers are included alongside the implementation.</p>\n<p>Here is <a href=\"experiment.html\">the training code</a> for training a two-layer GAT on Cora dataset.</p>\n": "<h1>\u30b0\u30e9\u30d5\u30fb\u30a2\u30c6\u30f3\u30b7\u30e7\u30f3\u30fb\u30cd\u30c3\u30c8\u30ef\u30fc\u30af (GAT)</h1>\n<p>\u3053\u308c\u306f\u8ad6\u6587\u306e\u300c<a href=\"https://arxiv.org/abs/1710.10903\">\u30b0\u30e9\u30d5\u30fb\u30a2\u30c6\u30f3\u30b7\u30e7\u30f3\u30fb\u30cd\u30c3\u30c8\u30ef\u30fc\u30af</a>\u300d\u306e <a href=\"https://pytorch.org\">PyTorch</a> \u5b9f\u88c5\u3067\u3059\u3002</p>\n<p>GAT \u306f\u30b0\u30e9\u30d5\u30c7\u30fc\u30bf\u3092\u51e6\u7406\u3057\u307e\u3059\u3002\u30b0\u30e9\u30d5\u306f\u3001\u30ce\u30fc\u30c9\u3068\u30ce\u30fc\u30c9\u3092\u63a5\u7d9a\u3059\u308b\u30a8\u30c3\u30b8\u3067\u69cb\u6210\u3055\u308c\u307e\u3059\u3002\u305f\u3068\u3048\u3070\u3001Cora\u30c7\u30fc\u30bf\u30bb\u30c3\u30c8\u3067\u306f\u3001\u30ce\u30fc\u30c9\u306f\u7814\u7a76\u8ad6\u6587\u3067\u3001\u7aef\u306f\u8ad6\u6587\u3092\u3064\u306a\u3050\u5f15\u7528\u3067\u3059</p>\u3002\n<p><a href=\"../../transformers/mha.html\">GAT\u306f\u3001\u30c8\u30e9\u30f3\u30b9\u30d5\u30a9\u30fc\u30de\u30fc\u306b\u4f3c\u305f\u3001\u30de\u30b9\u30af\u3055\u308c\u305f\u30bb\u30eb\u30d5\u30a2\u30c6\u30f3\u30b7\u30e7\u30f3\u3092\u4f7f\u3044\u307e\u3059\u3002</a>GAT\u306f\u3001\u30b0\u30e9\u30d5\u30a2\u30c6\u30f3\u30b7\u30e7\u30f3\u30ec\u30a4\u30e4\u30fc\u304c\u4e92\u3044\u306b\u91cd\u306a\u308a\u5408\u3063\u3066\u69cb\u6210\u3055\u308c\u3066\u3044\u307e\u3059\u3002\u5404\u30b0\u30e9\u30d5\u30a2\u30c6\u30f3\u30b7\u30e7\u30f3\u30ec\u30a4\u30e4\u30fc\u306f\u3001\u5165\u529b\u3068\u3057\u3066\u30ce\u30fc\u30c9\u57cb\u3081\u8fbc\u307f\u3092\u53d6\u5f97\u3057\u3001\u5909\u63db\u3055\u308c\u305f\u57cb\u3081\u8fbc\u307f\u3092\u51fa\u529b\u3057\u307e\u3059\u3002\u30ce\u30fc\u30c9\u57cb\u3081\u8fbc\u307f\u306f\u3001\u63a5\u7d9a\u3055\u308c\u3066\u3044\u308b\u4ed6\u306e\u30ce\u30fc\u30c9\u306e\u57cb\u3081\u8fbc\u307f\u306b\u6ce8\u76ee\u3057\u307e\u3059\u3002\u30b0\u30e9\u30d5\u30a2\u30c6\u30f3\u30b7\u30e7\u30f3\u30ec\u30a4\u30e4\u30fc\u306e\u8a73\u7d30\u306f\u3001\u5b9f\u88c5\u3068\u3068\u3082\u306b\u542b\u307e\u308c\u3066\u3044\u307e\u3059\u3002</p>\n<p><a href=\"experiment.html\">Cora \u30c7\u30fc\u30bf\u30bb\u30c3\u30c8\u3067 2 \u5c64 GAT \u3092\u30c8\u30ec\u30fc\u30cb\u30f3\u30b0\u3059\u308b\u305f\u3081\u306e\u30c8\u30ec\u30fc\u30cb\u30f3\u30b0\u30b3\u30fc\u30c9\u3092\u6b21\u306b\u793a\u3057\u307e\u3059</a>\u3002</p>\n",
"<h2>Graph attention layer</h2>\n<p>This is a single graph attention layer. A GAT is made up of multiple such layers.</p>\n<p>It takes <span translate=no>_^_0_^_</span>, where <span translate=no>_^_1_^_</span> as input and outputs <span translate=no>_^_2_^_</span>, where <span translate=no>_^_3_^_</span>.</p>\n": "<h2>\u30b0\u30e9\u30d5\u30fb\u30a2\u30c6\u30f3\u30b7\u30e7\u30f3\u30fb\u30ec\u30a4\u30e4\u30fc</h2>\n<p>\u3053\u308c\u306f\u5358\u4e00\u306e\u30b0\u30e9\u30d5\u30a2\u30c6\u30f3\u30b7\u30e7\u30f3\u30ec\u30a4\u30e4\u30fc\u3067\u3059\u3002GAT \u306f\u3053\u306e\u3088\u3046\u306a\u8907\u6570\u306e\u30ec\u30a4\u30e4\u30fc\u3067\u69cb\u6210\u3055\u308c\u3066\u3044\u307e\u3059</p>\u3002\n<p><span translate=no>_^_1_^_</span>\u5165\u529b\u3068\u3057\u3066<span translate=no>_^_0_^_</span>\u3001where \u3092\u3001\u51fa\u529b\u3068\u3057\u3066<span translate=no>_^_2_^_</span>\u3001where <span translate=no>_^_3_^_</span> \u3092\u53d6\u308a\u307e\u3059\u3002</p>\n", "<h2>Graph attention layer</h2>\n<p>This is a single graph attention layer. A GAT is made up of multiple such layers.</p>\n<p>It takes <span translate=no>_^_0_^_</span>, where <span translate=no>_^_1_^_</span> as input and outputs <span translate=no>_^_2_^_</span>, where <span translate=no>_^_3_^_</span>.</p>\n": "<h2>\u30b0\u30e9\u30d5\u30fb\u30a2\u30c6\u30f3\u30b7\u30e7\u30f3\u30fb\u30ec\u30a4\u30e4\u30fc</h2>\n<p>\u3053\u308c\u306f\u5358\u4e00\u306e\u30b0\u30e9\u30d5\u30a2\u30c6\u30f3\u30b7\u30e7\u30f3\u30ec\u30a4\u30e4\u30fc\u3067\u3059\u3002GAT \u306f\u3053\u306e\u3088\u3046\u306a\u8907\u6570\u306e\u30ec\u30a4\u30e4\u30fc\u3067\u69cb\u6210\u3055\u308c\u3066\u3044\u307e\u3059</p>\u3002\n<p><span translate=no>_^_1_^_</span>\u5165\u529b\u3068\u3057\u3066<span translate=no>_^_0_^_</span>\u3001where \u3092\u3001\u51fa\u529b\u3068\u3057\u3066<span translate=no>_^_2_^_</span>\u3001where <span translate=no>_^_3_^_</span> \u3092\u53d6\u308a\u307e\u3059\u3002</p>\n",
"<h4>Calculate attention score</h4>\n<p>We calculate these for each head <span translate=no>_^_0_^_</span>. <em>We have omitted <span translate=no>_^_1_^_</span> for simplicity</em>.</p>\n<p><span translate=no>_^_2_^_</span></p>\n<p><span translate=no>_^_3_^_</span> is the attention score (importance) from node <span translate=no>_^_4_^_</span> to node <span translate=no>_^_5_^_</span>. We calculate this for each head.</p>\n<p><span translate=no>_^_6_^_</span> is the attention mechanism, that calculates the attention score. The paper concatenates <span translate=no>_^_7_^_</span>, <span translate=no>_^_8_^_</span> and does a linear transformation with a weight vector <span translate=no>_^_9_^_</span> followed by a <span translate=no>_^_10_^_</span>.</p>\n<p><span translate=no>_^_11_^_</span> </p>\n": "<h4>\u30a2\u30c6\u30f3\u30b7\u30e7\u30f3\u30b9\u30b3\u30a2\u306e\u8a08\u7b97</h4>\n<p><span translate=no>_^_0_^_</span>\u3053\u308c\u3089\u306f\u982d\u3054\u3068\u306b\u8a08\u7b97\u3057\u307e\u3059\u3002<em><span translate=no>_^_1_^_</span>\u308f\u304b\u308a\u3084\u3059\u304f\u3059\u308b\u305f\u3081\u306b\u7701\u7565\u3057\u307e\u3057\u305f\u3002</em></p>\n<p><span translate=no>_^_2_^_</span></p>\n<p><span translate=no>_^_3_^_</span><span translate=no>_^_4_^_</span>\u30ce\u30fc\u30c9\u3054\u3068\u306e\u30a2\u30c6\u30f3\u30b7\u30e7\u30f3\u30b9\u30b3\u30a2\uff08\u91cd\u8981\u5ea6\uff09<span translate=no>_^_5_^_</span>\u3067\u3059\u3002\u3053\u308c\u3092\u982d\u3054\u3068\u306b\u8a08\u7b97\u3057\u307e\u3059\u3002</p>\n<p><span translate=no>_^_6_^_</span>\u30a2\u30c6\u30f3\u30b7\u30e7\u30f3\u30b9\u30b3\u30a2\u3092\u8a08\u7b97\u3059\u308b\u30a2\u30c6\u30f3\u30b7\u30e7\u30f3\u30e1\u30ab\u30cb\u30ba\u30e0\u3067\u3059\u3002\u3053\u306e\u8ad6\u6587\u3067\u306f<span translate=no>_^_7_^_</span>\u3001<span translate=no>_^_8_^_</span><span translate=no>_^_9_^_</span>\u91cd\u307f\u30d9\u30af\u30c8\u30eb\u306e\u5f8c\u306ba\u3092\u9023\u7d50\u3057\u3001\u7dda\u5f62\u5909\u63db\u3092\u884c\u3044\u307e\u3059</p>\u3002<span translate=no>_^_10_^_</span>\n<p><span translate=no>_^_11_^_</span></p>\n", "<h4>Calculate attention score</h4>\n<p>We calculate these for each head <span translate=no>_^_0_^_</span>. <em>We have omitted <span translate=no>_^_1_^_</span> for simplicity</em>.</p>\n<p><span translate=no>_^_2_^_</span></p>\n<p><span translate=no>_^_3_^_</span> is the attention score (importance) from node <span translate=no>_^_4_^_</span> to node <span translate=no>_^_5_^_</span>. We calculate this for each head.</p>\n<p><span translate=no>_^_6_^_</span> is the attention mechanism, that calculates the attention score. 
The paper concatenates <span translate=no>_^_7_^_</span>, <span translate=no>_^_8_^_</span> and does a linear transformation with a weight vector <span translate=no>_^_9_^_</span> followed by a <span translate=no>_^_10_^_</span>.</p>\n<p><span translate=no>_^_11_^_</span> </p>\n": "<h4>\u30a2\u30c6\u30f3\u30b7\u30e7\u30f3\u30b9\u30b3\u30a2\u306e\u8a08\u7b97</h4>\n<p><span translate=no>_^_0_^_</span>\u3053\u308c\u3089\u306f\u982d\u3054\u3068\u306b\u8a08\u7b97\u3057\u307e\u3059\u3002<em><span translate=no>_^_1_^_</span>\u308f\u304b\u308a\u3084\u3059\u304f\u3059\u308b\u305f\u3081\u306b\u7701\u7565\u3057\u307e\u3057\u305f\u3002</em></p>\n<p><span translate=no>_^_2_^_</span></p>\n<p><span translate=no>_^_3_^_</span><span translate=no>_^_4_^_</span>\u30ce\u30fc\u30c9\u3054\u3068\u306e\u30a2\u30c6\u30f3\u30b7\u30e7\u30f3\u30b9\u30b3\u30a2\uff08\u91cd\u8981\u5ea6\uff09<span translate=no>_^_5_^_</span>\u3067\u3059\u3002\u3053\u308c\u3092\u982d\u3054\u3068\u306b\u8a08\u7b97\u3057\u307e\u3059\u3002</p>\n<p><span translate=no>_^_6_^_</span>\u30a2\u30c6\u30f3\u30b7\u30e7\u30f3\u30b9\u30b3\u30a2\u3092\u8a08\u7b97\u3059\u308b\u30a2\u30c6\u30f3\u30b7\u30e7\u30f3\u30e1\u30ab\u30cb\u30ba\u30e0\u3067\u3059\u3002\u3053\u306e\u8ad6\u6587\u3067\u306f<span translate=no>_^_7_^_</span>\u3001<span translate=no>_^_8_^_</span><span translate=no>_^_9_^_</span>\u91cd\u307f\u30d9\u30af\u30c8\u30eb\u306e\u5f8c\u306ba\u3092\u9023\u7d50\u3057\u3001\u7dda\u5f62\u5909\u63db\u3092\u884c\u3044\u307e\u3059</p>\u3002<span translate=no>_^_10_^_</span>\n<p><span translate=no>_^_11_^_</span></p>\n",
"<p><span translate=no>_^_0_^_</span> </p>\n": "<p><span translate=no>_^_0_^_</span></p>\n", "<p><span translate=no>_^_0_^_</span> </p>\n": "<p><span translate=no>_^_0_^_</span></p>\n",

File diff suppressed because one or more lines are too long

View File

@ -1,5 +1,5 @@
{ {
"<h1>Graph Attention Networks (GAT)</h1>\n<p>This is a <a href=\"https://pytorch.org\">PyTorch</a> implementation of the paper <a href=\"https://papers.labml.ai/paper/1710.10903\">Graph Attention Networks</a>.</p>\n<p>GATs work on graph data. A graph consists of nodes and edges connecting nodes. For example, in Cora dataset the nodes are research papers and the edges are citations that connect the papers.</p>\n<p>GAT uses masked self-attention, kind of similar to <a href=\"../../transformers/mha.html\">transformers</a>. GAT consists of graph attention layers stacked on top of each other. Each graph attention layer gets node embeddings as inputs and outputs transformed embeddings. The node embeddings pay attention to the embeddings of other nodes it&#x27;s connected to. The details of graph attention layers are included alongside the implementation.</p>\n<p>Here is <a href=\"experiment.html\">the training code</a> for training a two-layer GAT on Cora dataset.</p>\n": "<h1>\u56fe\u8868\u6ce8\u610f\u529b\u7f51\u7edc (GAT)</h1>\n<p>\u8fd9\u662f <a href=\"https://pytorch.org\">PyTorch</a> \u5bf9\u300a<a href=\"https://papers.labml.ai/paper/1710.10903\">\u56fe\u5f62\u6ce8\u610f\u529b\u7f51\u7edc</a>\u300b\u8bba\u6587\u7684\u5b9e\u73b0\u3002</p>\n<p>GAT \u5904\u7406\u56fe\u8868\u6570\u636e\u3002\u56fe\u7531\u8282\u70b9\u548c\u8fde\u63a5\u8282\u70b9\u7684\u8fb9\u7ec4\u6210\u3002\u4f8b\u5982\uff0c\u5728 Cora \u6570\u636e\u96c6\u4e2d\uff0c\u8282\u70b9\u662f\u7814\u7a76\u8bba\u6587\uff0c\u8fb9\u7f18\u662f\u8fde\u63a5\u8bba\u6587\u7684\u5f15\u6587\u3002</p>\n<p>GAT \u4f7f\u7528\u8499\u9762\u81ea\u6ce8\u610f\u529b\uff0c\u6709\u70b9\u7c7b\u4f3c\u4e8e<a href=\"../../transformers/mha.html\">\u53d8\u5f62\u91d1\u521a</a>\u3002GAT \u7531\u76f8\u4e92\u5806\u53e0\u7684\u56fe\u8868\u6ce8\u610f\u529b\u5c42\u7ec4\u6210\u3002\u6bcf\u4e2a\u56fe\u6ce8\u610f\u529b\u5c42\u90fd\u5c06\u8282\u70b9\u5d4c\u5165\u4f5c\u4e3a\u8f6c\u6362\u540e\u7684\u5d4c\u5165\u7684\u8f93\u5165\u548c\u8f93\u51fa\u83b7\u5f97\u8282\u70b9\u3002\u8282\u70b9\u5d4c\u5165\u4f1a\u6ce8\u610f\u5b83\u6240\u8fde\u63a5\u7684\u5176\u4ed6\u8282\u70b9\u7684\u5d4c\u5165\u3002\u56fe\u5f62\u6ce8\u610f\u529b\u5c42\u7684\u8be6\u7ec6\u4fe1\u606f\u4e0e\u5b9e\u73b0\u4e00\u8d77\u5305\u62ec\u5728\u5185\u3002</p>\n<p>\u4ee5\u4e0b\u662f<a href=\"experiment.html\">\u5728 Cora \u6570\u636e\u96c6\u4e0a\u8bad\u7ec3\u4e24\u5c42 GAT \u7684\u8bad\u7ec3\u4ee3\u7801</a>\u3002</p>\n", "<h1>Graph Attention Networks (GAT)</h1>\n<p>This is a <a href=\"https://pytorch.org\">PyTorch</a> implementation of the paper <a href=\"https://arxiv.org/abs/1710.10903\">Graph Attention Networks</a>.</p>\n<p>GATs work on graph data. A graph consists of nodes and edges connecting nodes. For example, in Cora dataset the nodes are research papers and the edges are citations that connect the papers.</p>\n<p>GAT uses masked self-attention, kind of similar to <a href=\"../../transformers/mha.html\">transformers</a>. GAT consists of graph attention layers stacked on top of each other. Each graph attention layer gets node embeddings as inputs and outputs transformed embeddings. The node embeddings pay attention to the embeddings of other nodes it&#x27;s connected to. 
The details of graph attention layers are included alongside the implementation.</p>\n<p>Here is <a href=\"experiment.html\">the training code</a> for training a two-layer GAT on Cora dataset.</p>\n": "<h1>\u56fe\u8868\u6ce8\u610f\u529b\u7f51\u7edc (GAT)</h1>\n<p>\u8fd9\u662f <a href=\"https://pytorch.org\">PyTorch</a> \u5bf9\u300a<a href=\"https://arxiv.org/abs/1710.10903\">\u56fe\u5f62\u6ce8\u610f\u529b\u7f51\u7edc</a>\u300b\u8bba\u6587\u7684\u5b9e\u73b0\u3002</p>\n<p>GAT \u5904\u7406\u56fe\u8868\u6570\u636e\u3002\u56fe\u7531\u8282\u70b9\u548c\u8fde\u63a5\u8282\u70b9\u7684\u8fb9\u7ec4\u6210\u3002\u4f8b\u5982\uff0c\u5728 Cora \u6570\u636e\u96c6\u4e2d\uff0c\u8282\u70b9\u662f\u7814\u7a76\u8bba\u6587\uff0c\u8fb9\u7f18\u662f\u8fde\u63a5\u8bba\u6587\u7684\u5f15\u6587\u3002</p>\n<p>GAT \u4f7f\u7528\u8499\u9762\u81ea\u6ce8\u610f\u529b\uff0c\u6709\u70b9\u7c7b\u4f3c\u4e8e<a href=\"../../transformers/mha.html\">\u53d8\u5f62\u91d1\u521a</a>\u3002GAT \u7531\u76f8\u4e92\u5806\u53e0\u7684\u56fe\u8868\u6ce8\u610f\u529b\u5c42\u7ec4\u6210\u3002\u6bcf\u4e2a\u56fe\u6ce8\u610f\u529b\u5c42\u90fd\u5c06\u8282\u70b9\u5d4c\u5165\u4f5c\u4e3a\u8f6c\u6362\u540e\u7684\u5d4c\u5165\u7684\u8f93\u5165\u548c\u8f93\u51fa\u83b7\u5f97\u8282\u70b9\u3002\u8282\u70b9\u5d4c\u5165\u4f1a\u6ce8\u610f\u5b83\u6240\u8fde\u63a5\u7684\u5176\u4ed6\u8282\u70b9\u7684\u5d4c\u5165\u3002\u56fe\u5f62\u6ce8\u610f\u529b\u5c42\u7684\u8be6\u7ec6\u4fe1\u606f\u4e0e\u5b9e\u73b0\u4e00\u8d77\u5305\u62ec\u5728\u5185\u3002</p>\n<p>\u4ee5\u4e0b\u662f<a href=\"experiment.html\">\u5728 Cora \u6570\u636e\u96c6\u4e0a\u8bad\u7ec3\u4e24\u5c42 GAT \u7684\u8bad\u7ec3\u4ee3\u7801</a>\u3002</p>\n",
"<h2>Graph attention layer</h2>\n<p>This is a single graph attention layer. A GAT is made up of multiple such layers.</p>\n<p>It takes <span translate=no>_^_0_^_</span>, where <span translate=no>_^_1_^_</span> as input and outputs <span translate=no>_^_2_^_</span>, where <span translate=no>_^_3_^_</span>.</p>\n": "<h2>\u56fe\u5f62\u5173\u6ce8\u5c42</h2>\n<p>\u8fd9\u662f\u4e00\u4e2a\u5355\u4e00\u7684\u56fe\u5f62\u5173\u6ce8\u5c42\u3002\u4e00\u4e2a GAT \u7531\u591a\u4e2a\u8fd9\u6837\u7684\u5c42\u7ec4\u6210\u3002</p>\n<p>\u5b83\u9700\u8981<span translate=no>_^_0_^_</span>\uff0c\u5176\u4e2d<span translate=no>_^_1_^_</span>\u4f5c\u4e3a\u8f93\u5165\u548c\u8f93\u51fa<span translate=no>_^_2_^_</span>\uff0c\u5728\u54ea\u91cc<span translate=no>_^_3_^_</span>\u3002</p>\n", "<h2>Graph attention layer</h2>\n<p>This is a single graph attention layer. A GAT is made up of multiple such layers.</p>\n<p>It takes <span translate=no>_^_0_^_</span>, where <span translate=no>_^_1_^_</span> as input and outputs <span translate=no>_^_2_^_</span>, where <span translate=no>_^_3_^_</span>.</p>\n": "<h2>\u56fe\u5f62\u5173\u6ce8\u5c42</h2>\n<p>\u8fd9\u662f\u4e00\u4e2a\u5355\u4e00\u7684\u56fe\u5f62\u5173\u6ce8\u5c42\u3002\u4e00\u4e2a GAT \u7531\u591a\u4e2a\u8fd9\u6837\u7684\u5c42\u7ec4\u6210\u3002</p>\n<p>\u5b83\u9700\u8981<span translate=no>_^_0_^_</span>\uff0c\u5176\u4e2d<span translate=no>_^_1_^_</span>\u4f5c\u4e3a\u8f93\u5165\u548c\u8f93\u51fa<span translate=no>_^_2_^_</span>\uff0c\u5728\u54ea\u91cc<span translate=no>_^_3_^_</span>\u3002</p>\n",
"<h4>Calculate attention score</h4>\n<p>We calculate these for each head <span translate=no>_^_0_^_</span>. <em>We have omitted <span translate=no>_^_1_^_</span> for simplicity</em>.</p>\n<p><span translate=no>_^_2_^_</span></p>\n<p><span translate=no>_^_3_^_</span> is the attention score (importance) from node <span translate=no>_^_4_^_</span> to node <span translate=no>_^_5_^_</span>. We calculate this for each head.</p>\n<p><span translate=no>_^_6_^_</span> is the attention mechanism, that calculates the attention score. The paper concatenates <span translate=no>_^_7_^_</span>, <span translate=no>_^_8_^_</span> and does a linear transformation with a weight vector <span translate=no>_^_9_^_</span> followed by a <span translate=no>_^_10_^_</span>.</p>\n<p><span translate=no>_^_11_^_</span> </p>\n": "<h4>\u8ba1\u7b97\u6ce8\u610f\u529b\u5206\u6570</h4>\n<p>\u6211\u4eec\u4e3a\u6bcf\u4e2a\u5934\u90e8\u8ba1\u7b97\u8fd9\u4e9b<span translate=no>_^_0_^_</span>\u3002<em><span translate=no>_^_1_^_</span>\u4e3a\u7b80\u5355\u8d77\u89c1\uff0c\u6211\u4eec\u7701\u7565\u4e86</em>\u3002</p>\n<p><span translate=no>_^_2_^_</span></p>\n<p><span translate=no>_^_3_^_</span>\u662f\u4ece\u4e00\u4e2a\u8282\u70b9\u5230\u53e6\u4e00\u4e2a\u8282\u70b9\u7684<span translate=no>_^_4_^_</span>\u6ce8\u610f\u529b\u5206\u6570\uff08\u91cd\u8981\u6027\uff09<span translate=no>_^_5_^_</span>\u3002\u6211\u4eec\u4e3a\u6bcf\u4e2a\u5934\u90e8\u8ba1\u7b97\u8fd9\u4e2a\u503c\u3002</p>\n<p><span translate=no>_^_6_^_</span>\u662f\u8ba1\u7b97\u6ce8\u610f\u529b\u5206\u6570\u7684\u6ce8\u610f\u529b\u673a\u5236\u3002\u672c\u6587\u8fde\u63a5\u8d77\u6765<span translate=no>_^_7_^_</span>\uff0c<span translate=no>_^_8_^_</span>\u7136\u540e\u4f7f\u7528\u6743\u91cd\u5411\u91cf<span translate=no>_^_9_^_</span>\u540e\u8ddf a \u8fdb\u884c\u7ebf\u6027\u53d8\u6362<span translate=no>_^_10_^_</span>\u3002</p>\n<p><span translate=no>_^_11_^_</span></p>\n", "<h4>Calculate attention score</h4>\n<p>We calculate these for each head <span translate=no>_^_0_^_</span>. <em>We have omitted <span translate=no>_^_1_^_</span> for simplicity</em>.</p>\n<p><span translate=no>_^_2_^_</span></p>\n<p><span translate=no>_^_3_^_</span> is the attention score (importance) from node <span translate=no>_^_4_^_</span> to node <span translate=no>_^_5_^_</span>. We calculate this for each head.</p>\n<p><span translate=no>_^_6_^_</span> is the attention mechanism, that calculates the attention score. 
The paper concatenates <span translate=no>_^_7_^_</span>, <span translate=no>_^_8_^_</span> and does a linear transformation with a weight vector <span translate=no>_^_9_^_</span> followed by a <span translate=no>_^_10_^_</span>.</p>\n<p><span translate=no>_^_11_^_</span> </p>\n": "<h4>\u8ba1\u7b97\u6ce8\u610f\u529b\u5206\u6570</h4>\n<p>\u6211\u4eec\u4e3a\u6bcf\u4e2a\u5934\u90e8\u8ba1\u7b97\u8fd9\u4e9b<span translate=no>_^_0_^_</span>\u3002<em><span translate=no>_^_1_^_</span>\u4e3a\u7b80\u5355\u8d77\u89c1\uff0c\u6211\u4eec\u7701\u7565\u4e86</em>\u3002</p>\n<p><span translate=no>_^_2_^_</span></p>\n<p><span translate=no>_^_3_^_</span>\u662f\u4ece\u4e00\u4e2a\u8282\u70b9\u5230\u53e6\u4e00\u4e2a\u8282\u70b9\u7684<span translate=no>_^_4_^_</span>\u6ce8\u610f\u529b\u5206\u6570\uff08\u91cd\u8981\u6027\uff09<span translate=no>_^_5_^_</span>\u3002\u6211\u4eec\u4e3a\u6bcf\u4e2a\u5934\u90e8\u8ba1\u7b97\u8fd9\u4e2a\u503c\u3002</p>\n<p><span translate=no>_^_6_^_</span>\u662f\u8ba1\u7b97\u6ce8\u610f\u529b\u5206\u6570\u7684\u6ce8\u610f\u529b\u673a\u5236\u3002\u672c\u6587\u8fde\u63a5\u8d77\u6765<span translate=no>_^_7_^_</span>\uff0c<span translate=no>_^_8_^_</span>\u7136\u540e\u4f7f\u7528\u6743\u91cd\u5411\u91cf<span translate=no>_^_9_^_</span>\u540e\u8ddf a \u8fdb\u884c\u7ebf\u6027\u53d8\u6362<span translate=no>_^_10_^_</span>\u3002</p>\n<p><span translate=no>_^_11_^_</span></p>\n",
"<p><span translate=no>_^_0_^_</span> </p>\n": "<p><span translate=no>_^_0_^_</span></p>\n", "<p><span translate=no>_^_0_^_</span> </p>\n": "<p><span translate=no>_^_0_^_</span></p>\n",

View File

@ -1,4 +1,4 @@
{ {
"<h1><a href=\"https://nn.labml.ai/graphs/gat/index.html\">Graph Attention Networks (GAT)</a></h1>\n<p>This is a <a href=\"https://pytorch.org\">PyTorch</a> implementation of the paper <a href=\"https://papers.labml.ai/paper/1710.10903\">Graph Attention Networks</a>.</p>\n<p>GATs work on graph data. A graph consists of nodes and edges connecting nodes. For example, in Cora dataset the nodes are research papers and the edges are citations that connect the papers.</p>\n<p>GAT uses masked self-attention, kind of similar to <a href=\"https://nn.labml.ai/transformers/mha.html\">transformers</a>. GAT consists of graph attention layers stacked on top of each other. Each graph attention layer gets node embeddings as inputs and outputs transformed embeddings. The node embeddings pay attention to the embeddings of other nodes it&#x27;s connected to. The details of graph attention layers are included alongside the implementation.</p>\n<p>Here is <a href=\"https://nn.labml.ai/graphs/gat/experiment.html\">the training code</a> for training a two-layer GAT on Cora dataset. </p>\n": "<h1><a href=\"https://nn.labml.ai/graphs/gat/index.html\">\u30b0\u30e9\u30d5\u30fb\u30a2\u30c6\u30f3\u30b7\u30e7\u30f3\u30fb\u30cd\u30c3\u30c8\u30ef\u30fc\u30af (GAT)</a></h1>\n<p>\u3053\u308c\u306f\u8ad6\u6587\u306e\u300c<a href=\"https://papers.labml.ai/paper/1710.10903\">\u30b0\u30e9\u30d5\u30fb\u30a2\u30c6\u30f3\u30b7\u30e7\u30f3\u30fb\u30cd\u30c3\u30c8\u30ef\u30fc\u30af</a>\u300d\u306e <a href=\"https://pytorch.org\">PyTorch</a> \u5b9f\u88c5\u3067\u3059\u3002</p>\n<p>GAT \u306f\u30b0\u30e9\u30d5\u30c7\u30fc\u30bf\u3092\u51e6\u7406\u3057\u307e\u3059\u3002\u30b0\u30e9\u30d5\u306f\u3001\u30ce\u30fc\u30c9\u3068\u30ce\u30fc\u30c9\u3092\u63a5\u7d9a\u3059\u308b\u30a8\u30c3\u30b8\u3067\u69cb\u6210\u3055\u308c\u307e\u3059\u3002\u305f\u3068\u3048\u3070\u3001Cora\u30c7\u30fc\u30bf\u30bb\u30c3\u30c8\u3067\u306f\u3001\u30ce\u30fc\u30c9\u306f\u7814\u7a76\u8ad6\u6587\u3067\u3001\u7aef\u306f\u8ad6\u6587\u3092\u3064\u306a\u3050\u5f15\u7528\u3067\u3059</p>\u3002\n<p><a href=\"https://nn.labml.ai/transformers/mha.html\">GAT\u306f\u3001\u30c8\u30e9\u30f3\u30b9\u30d5\u30a9\u30fc\u30de\u30fc\u306b\u4f3c\u305f\u3001\u30de\u30b9\u30af\u3055\u308c\u305f\u30bb\u30eb\u30d5\u30a2\u30c6\u30f3\u30b7\u30e7\u30f3\u3092\u4f7f\u3044\u307e\u3059\u3002</a>GAT\u306f\u3001\u30b0\u30e9\u30d5\u30a2\u30c6\u30f3\u30b7\u30e7\u30f3\u30ec\u30a4\u30e4\u30fc\u304c\u4e92\u3044\u306b\u91cd\u306a\u308a\u5408\u3063\u3066\u69cb\u6210\u3055\u308c\u3066\u3044\u307e\u3059\u3002\u5404\u30b0\u30e9\u30d5\u30a2\u30c6\u30f3\u30b7\u30e7\u30f3\u30ec\u30a4\u30e4\u30fc\u306f\u3001\u5165\u529b\u3068\u3057\u3066\u30ce\u30fc\u30c9\u57cb\u3081\u8fbc\u307f\u3092\u53d6\u5f97\u3057\u3001\u5909\u63db\u3055\u308c\u305f\u57cb\u3081\u8fbc\u307f\u3092\u51fa\u529b\u3057\u307e\u3059\u3002\u30ce\u30fc\u30c9\u57cb\u3081\u8fbc\u307f\u306f\u3001\u63a5\u7d9a\u3055\u308c\u3066\u3044\u308b\u4ed6\u306e\u30ce\u30fc\u30c9\u306e\u57cb\u3081\u8fbc\u307f\u306b\u6ce8\u76ee\u3057\u307e\u3059\u3002\u30b0\u30e9\u30d5\u30a2\u30c6\u30f3\u30b7\u30e7\u30f3\u30ec\u30a4\u30e4\u30fc\u306e\u8a73\u7d30\u306f\u3001\u5b9f\u88c5\u3068\u3068\u3082\u306b\u542b\u307e\u308c\u3066\u3044\u307e\u3059\u3002</p>\n<p><a href=\"https://nn.labml.ai/graphs/gat/experiment.html\">Cora \u30c7\u30fc\u30bf\u30bb\u30c3\u30c8\u3067 2 \u5c64 GAT \u3092\u30c8\u30ec\u30fc\u30cb\u30f3\u30b0\u3059\u308b\u305f\u3081\u306e\u30c8\u30ec\u30fc\u30cb\u30f3\u30b0\u30b3\u30fc\u30c9\u3092\u6b21\u306b\u793a\u3057\u307e\u3059</a>\u3002</p>\n", "<h1><a 
href=\"https://nn.labml.ai/graphs/gat/index.html\">Graph Attention Networks (GAT)</a></h1>\n<p>This is a <a href=\"https://pytorch.org\">PyTorch</a> implementation of the paper <a href=\"https://arxiv.org/abs/1710.10903\">Graph Attention Networks</a>.</p>\n<p>GATs work on graph data. A graph consists of nodes and edges connecting nodes. For example, in Cora dataset the nodes are research papers and the edges are citations that connect the papers.</p>\n<p>GAT uses masked self-attention, kind of similar to <a href=\"https://nn.labml.ai/transformers/mha.html\">transformers</a>. GAT consists of graph attention layers stacked on top of each other. Each graph attention layer gets node embeddings as inputs and outputs transformed embeddings. The node embeddings pay attention to the embeddings of other nodes it&#x27;s connected to. The details of graph attention layers are included alongside the implementation.</p>\n<p>Here is <a href=\"https://nn.labml.ai/graphs/gat/experiment.html\">the training code</a> for training a two-layer GAT on Cora dataset. </p>\n": "<h1><a href=\"https://nn.labml.ai/graphs/gat/index.html\">\u30b0\u30e9\u30d5\u30fb\u30a2\u30c6\u30f3\u30b7\u30e7\u30f3\u30fb\u30cd\u30c3\u30c8\u30ef\u30fc\u30af (GAT)</a></h1>\n<p>\u3053\u308c\u306f\u8ad6\u6587\u306e\u300c<a href=\"https://arxiv.org/abs/1710.10903\">\u30b0\u30e9\u30d5\u30fb\u30a2\u30c6\u30f3\u30b7\u30e7\u30f3\u30fb\u30cd\u30c3\u30c8\u30ef\u30fc\u30af</a>\u300d\u306e <a href=\"https://pytorch.org\">PyTorch</a> \u5b9f\u88c5\u3067\u3059\u3002</p>\n<p>GAT \u306f\u30b0\u30e9\u30d5\u30c7\u30fc\u30bf\u3092\u51e6\u7406\u3057\u307e\u3059\u3002\u30b0\u30e9\u30d5\u306f\u3001\u30ce\u30fc\u30c9\u3068\u30ce\u30fc\u30c9\u3092\u63a5\u7d9a\u3059\u308b\u30a8\u30c3\u30b8\u3067\u69cb\u6210\u3055\u308c\u307e\u3059\u3002\u305f\u3068\u3048\u3070\u3001Cora\u30c7\u30fc\u30bf\u30bb\u30c3\u30c8\u3067\u306f\u3001\u30ce\u30fc\u30c9\u306f\u7814\u7a76\u8ad6\u6587\u3067\u3001\u7aef\u306f\u8ad6\u6587\u3092\u3064\u306a\u3050\u5f15\u7528\u3067\u3059</p>\u3002\n<p><a href=\"https://nn.labml.ai/transformers/mha.html\">GAT\u306f\u3001\u30c8\u30e9\u30f3\u30b9\u30d5\u30a9\u30fc\u30de\u30fc\u306b\u4f3c\u305f\u3001\u30de\u30b9\u30af\u3055\u308c\u305f\u30bb\u30eb\u30d5\u30a2\u30c6\u30f3\u30b7\u30e7\u30f3\u3092\u4f7f\u3044\u307e\u3059\u3002</a>GAT\u306f\u3001\u30b0\u30e9\u30d5\u30a2\u30c6\u30f3\u30b7\u30e7\u30f3\u30ec\u30a4\u30e4\u30fc\u304c\u4e92\u3044\u306b\u91cd\u306a\u308a\u5408\u3063\u3066\u69cb\u6210\u3055\u308c\u3066\u3044\u307e\u3059\u3002\u5404\u30b0\u30e9\u30d5\u30a2\u30c6\u30f3\u30b7\u30e7\u30f3\u30ec\u30a4\u30e4\u30fc\u306f\u3001\u5165\u529b\u3068\u3057\u3066\u30ce\u30fc\u30c9\u57cb\u3081\u8fbc\u307f\u3092\u53d6\u5f97\u3057\u3001\u5909\u63db\u3055\u308c\u305f\u57cb\u3081\u8fbc\u307f\u3092\u51fa\u529b\u3057\u307e\u3059\u3002\u30ce\u30fc\u30c9\u57cb\u3081\u8fbc\u307f\u306f\u3001\u63a5\u7d9a\u3055\u308c\u3066\u3044\u308b\u4ed6\u306e\u30ce\u30fc\u30c9\u306e\u57cb\u3081\u8fbc\u307f\u306b\u6ce8\u76ee\u3057\u307e\u3059\u3002\u30b0\u30e9\u30d5\u30a2\u30c6\u30f3\u30b7\u30e7\u30f3\u30ec\u30a4\u30e4\u30fc\u306e\u8a73\u7d30\u306f\u3001\u5b9f\u88c5\u3068\u3068\u3082\u306b\u542b\u307e\u308c\u3066\u3044\u307e\u3059\u3002</p>\n<p><a href=\"https://nn.labml.ai/graphs/gat/experiment.html\">Cora \u30c7\u30fc\u30bf\u30bb\u30c3\u30c8\u3067 2 \u5c64 GAT \u3092\u30c8\u30ec\u30fc\u30cb\u30f3\u30b0\u3059\u308b\u305f\u3081\u306e\u30c8\u30ec\u30fc\u30cb\u30f3\u30b0\u30b3\u30fc\u30c9\u3092\u6b21\u306b\u793a\u3057\u307e\u3059</a>\u3002</p>\n",
"Graph Attention Networks (GAT)": "\u30b0\u30e9\u30d5\u30fb\u30a2\u30c6\u30f3\u30b7\u30e7\u30f3\u30fb\u30cd\u30c3\u30c8\u30ef\u30fc\u30af (GAT)" "Graph Attention Networks (GAT)": "\u30b0\u30e9\u30d5\u30fb\u30a2\u30c6\u30f3\u30b7\u30e7\u30f3\u30fb\u30cd\u30c3\u30c8\u30ef\u30fc\u30af (GAT)"
} }
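The GAT entries above describe the attention score from node i to node j as a LeakyReLU of a weight vector applied to the concatenation of the two linearly transformed node embeddings, normalised with a softmax over each node's neighbours. A single-head sketch using dense tensors, assuming a boolean adjacency matrix that includes self-loops (class and argument names are illustrative, not this repository's API):

import torch
import torch.nn as nn

class GraphAttentionLayer(nn.Module):
    """Single-head GAT layer: e_ij = LeakyReLU(a^T [W h_i || W h_j])."""

    def __init__(self, in_features: int, out_features: int):
        super().__init__()
        self.linear = nn.Linear(in_features, out_features, bias=False)  # W
        self.attn = nn.Linear(2 * out_features, 1, bias=False)          # weight vector a
        self.activation = nn.LeakyReLU(negative_slope=0.2)

    def forward(self, h: torch.Tensor, adj_mat: torch.Tensor) -> torch.Tensor:
        # h: [n_nodes, in_features]; adj_mat: boolean [n_nodes, n_nodes] with self-loops.
        g = self.linear(h)                                   # W h for every node
        n_nodes = g.size(0)
        g_i = g.unsqueeze(1).expand(n_nodes, n_nodes, -1)    # query node i along rows
        g_j = g.unsqueeze(0).expand(n_nodes, n_nodes, -1)    # key node j along columns
        # Attention score (importance) from node i to node j.
        e = self.activation(self.attn(torch.cat([g_i, g_j], dim=-1))).squeeze(-1)
        # Mask non-edges, then normalise over each node's neighbours.
        e = e.masked_fill(~adj_mat, float('-inf'))
        alpha = torch.softmax(e, dim=-1)
        return alpha @ g                                     # weighted sum of W h_j

A full GAT stacks several such layers and uses multiple heads per layer; this sketch keeps one head so the scoring formula stays visible.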

File diff suppressed because one or more lines are too long

View File

@ -1,4 +1,4 @@
{ {
"<h1><a href=\"https://nn.labml.ai/graphs/gat/index.html\">Graph Attention Networks (GAT)</a></h1>\n<p>This is a <a href=\"https://pytorch.org\">PyTorch</a> implementation of the paper <a href=\"https://papers.labml.ai/paper/1710.10903\">Graph Attention Networks</a>.</p>\n<p>GATs work on graph data. A graph consists of nodes and edges connecting nodes. For example, in Cora dataset the nodes are research papers and the edges are citations that connect the papers.</p>\n<p>GAT uses masked self-attention, kind of similar to <a href=\"https://nn.labml.ai/transformers/mha.html\">transformers</a>. GAT consists of graph attention layers stacked on top of each other. Each graph attention layer gets node embeddings as inputs and outputs transformed embeddings. The node embeddings pay attention to the embeddings of other nodes it&#x27;s connected to. The details of graph attention layers are included alongside the implementation.</p>\n<p>Here is <a href=\"https://nn.labml.ai/graphs/gat/experiment.html\">the training code</a> for training a two-layer GAT on Cora dataset. </p>\n": "<h1><a href=\"https://nn.labml.ai/graphs/gat/index.html\">\u56fe\u8868\u6ce8\u610f\u529b\u7f51\u7edc (GAT)</a></h1>\n<p>\u8fd9\u662f <a href=\"https://pytorch.org\">PyTorch</a> \u5bf9\u300a<a href=\"https://papers.labml.ai/paper/1710.10903\">\u56fe\u5f62\u6ce8\u610f\u529b\u7f51\u7edc</a>\u300b\u8bba\u6587\u7684\u5b9e\u73b0\u3002</p>\n<p>GAT \u5904\u7406\u56fe\u8868\u6570\u636e\u3002\u56fe\u7531\u8282\u70b9\u548c\u8fde\u63a5\u8282\u70b9\u7684\u8fb9\u7ec4\u6210\u3002\u4f8b\u5982\uff0c\u5728 Cora \u6570\u636e\u96c6\u4e2d\uff0c\u8282\u70b9\u662f\u7814\u7a76\u8bba\u6587\uff0c\u8fb9\u7f18\u662f\u8fde\u63a5\u8bba\u6587\u7684\u5f15\u6587\u3002</p>\n<p>GAT \u4f7f\u7528\u8499\u9762\u81ea\u6ce8\u610f\u529b\uff0c\u6709\u70b9\u7c7b\u4f3c\u4e8e<a href=\"https://nn.labml.ai/transformers/mha.html\">\u53d8\u5f62\u91d1\u521a</a>\u3002GAT \u7531\u76f8\u4e92\u5806\u53e0\u7684\u56fe\u8868\u6ce8\u610f\u529b\u5c42\u7ec4\u6210\u3002\u6bcf\u4e2a\u56fe\u6ce8\u610f\u529b\u5c42\u90fd\u5c06\u8282\u70b9\u5d4c\u5165\u4f5c\u4e3a\u8f6c\u6362\u540e\u7684\u5d4c\u5165\u7684\u8f93\u5165\u548c\u8f93\u51fa\u83b7\u5f97\u8282\u70b9\u3002\u8282\u70b9\u5d4c\u5165\u4f1a\u6ce8\u610f\u5b83\u6240\u8fde\u63a5\u7684\u5176\u4ed6\u8282\u70b9\u7684\u5d4c\u5165\u3002\u56fe\u5f62\u6ce8\u610f\u529b\u5c42\u7684\u8be6\u7ec6\u4fe1\u606f\u4e0e\u5b9e\u73b0\u4e00\u8d77\u5305\u62ec\u5728\u5185\u3002</p>\n<p>\u4ee5\u4e0b\u662f<a href=\"https://nn.labml.ai/graphs/gat/experiment.html\">\u5728 Cora \u6570\u636e\u96c6\u4e0a\u8bad\u7ec3\u4e24\u5c42 GAT \u7684\u8bad\u7ec3\u4ee3\u7801</a>\u3002</p>\n", "<h1><a href=\"https://nn.labml.ai/graphs/gat/index.html\">Graph Attention Networks (GAT)</a></h1>\n<p>This is a <a href=\"https://pytorch.org\">PyTorch</a> implementation of the paper <a href=\"https://arxiv.org/abs/1710.10903\">Graph Attention Networks</a>.</p>\n<p>GATs work on graph data. A graph consists of nodes and edges connecting nodes. For example, in Cora dataset the nodes are research papers and the edges are citations that connect the papers.</p>\n<p>GAT uses masked self-attention, kind of similar to <a href=\"https://nn.labml.ai/transformers/mha.html\">transformers</a>. GAT consists of graph attention layers stacked on top of each other. Each graph attention layer gets node embeddings as inputs and outputs transformed embeddings. The node embeddings pay attention to the embeddings of other nodes it&#x27;s connected to. 
The details of graph attention layers are included alongside the implementation.</p>\n<p>Here is <a href=\"https://nn.labml.ai/graphs/gat/experiment.html\">the training code</a> for training a two-layer GAT on Cora dataset. </p>\n": "<h1><a href=\"https://nn.labml.ai/graphs/gat/index.html\">\u56fe\u8868\u6ce8\u610f\u529b\u7f51\u7edc (GAT)</a></h1>\n<p>\u8fd9\u662f <a href=\"https://pytorch.org\">PyTorch</a> \u5bf9\u300a<a href=\"https://arxiv.org/abs/1710.10903\">\u56fe\u5f62\u6ce8\u610f\u529b\u7f51\u7edc</a>\u300b\u8bba\u6587\u7684\u5b9e\u73b0\u3002</p>\n<p>GAT \u5904\u7406\u56fe\u8868\u6570\u636e\u3002\u56fe\u7531\u8282\u70b9\u548c\u8fde\u63a5\u8282\u70b9\u7684\u8fb9\u7ec4\u6210\u3002\u4f8b\u5982\uff0c\u5728 Cora \u6570\u636e\u96c6\u4e2d\uff0c\u8282\u70b9\u662f\u7814\u7a76\u8bba\u6587\uff0c\u8fb9\u7f18\u662f\u8fde\u63a5\u8bba\u6587\u7684\u5f15\u6587\u3002</p>\n<p>GAT \u4f7f\u7528\u8499\u9762\u81ea\u6ce8\u610f\u529b\uff0c\u6709\u70b9\u7c7b\u4f3c\u4e8e<a href=\"https://nn.labml.ai/transformers/mha.html\">\u53d8\u5f62\u91d1\u521a</a>\u3002GAT \u7531\u76f8\u4e92\u5806\u53e0\u7684\u56fe\u8868\u6ce8\u610f\u529b\u5c42\u7ec4\u6210\u3002\u6bcf\u4e2a\u56fe\u6ce8\u610f\u529b\u5c42\u90fd\u5c06\u8282\u70b9\u5d4c\u5165\u4f5c\u4e3a\u8f6c\u6362\u540e\u7684\u5d4c\u5165\u7684\u8f93\u5165\u548c\u8f93\u51fa\u83b7\u5f97\u8282\u70b9\u3002\u8282\u70b9\u5d4c\u5165\u4f1a\u6ce8\u610f\u5b83\u6240\u8fde\u63a5\u7684\u5176\u4ed6\u8282\u70b9\u7684\u5d4c\u5165\u3002\u56fe\u5f62\u6ce8\u610f\u529b\u5c42\u7684\u8be6\u7ec6\u4fe1\u606f\u4e0e\u5b9e\u73b0\u4e00\u8d77\u5305\u62ec\u5728\u5185\u3002</p>\n<p>\u4ee5\u4e0b\u662f<a href=\"https://nn.labml.ai/graphs/gat/experiment.html\">\u5728 Cora \u6570\u636e\u96c6\u4e0a\u8bad\u7ec3\u4e24\u5c42 GAT \u7684\u8bad\u7ec3\u4ee3\u7801</a>\u3002</p>\n",
"Graph Attention Networks (GAT)": "\u56fe\u5173\u6ce8\u7f51\u7edc (GAT)" "Graph Attention Networks (GAT)": "\u56fe\u5173\u6ce8\u7f51\u7edc (GAT)"
} }

File diff suppressed because one or more lines are too long

File diff suppressed because one or more lines are too long

View File

@ -1,5 +1,5 @@
{ {
"<h1>Graph Attention Networks v2 (GATv2)</h1>\n<p>This is a <a href=\"https://pytorch.org\">PyTorch</a> implementation of the GATv2 operator from the paper <a href=\"https://papers.labml.ai/paper/2105.14491\">How Attentive are Graph Attention Networks?</a>.</p>\n<p>GATv2s work on graph data similar to <a href=\"../gat/index.html\">GAT</a>. A graph consists of nodes and edges connecting nodes. For example, in Cora dataset the nodes are research papers and the edges are citations that connect the papers.</p>\n<p>The GATv2 operator fixes the static attention problem of the standard <a href=\"../gat/index.html\">GAT</a>. Static attention is when the attention to the key nodes has the same rank (order) for any query node. <a href=\"../gat/index.html\">GAT</a> computes attention from query node <span translate=no>_^_0_^_</span> to key node <span translate=no>_^_1_^_</span> as,</p>\n<span translate=no>_^_2_^_</span><p>Note that for any query node <span translate=no>_^_3_^_</span>, the attention rank (<span translate=no>_^_4_^_</span>) of keys depends only on <span translate=no>_^_5_^_</span>. Therefore the attention rank of keys remains the same (<em>static</em>) for all queries.</p>\n<p>GATv2 allows dynamic attention by changing the attention mechanism,</p>\n<span translate=no>_^_6_^_</span><p>The paper shows that GATs static attention mechanism fails on some graph problems with a synthetic dictionary lookup dataset. It&#x27;s a fully connected bipartite graph where one set of nodes (query nodes) have a key associated with it and the other set of nodes have both a key and a value associated with it. The goal is to predict the values of query nodes. GAT fails on this task because of its limited static attention.</p>\n<p>Here is <a href=\"experiment.html\">the training code</a> for training a two-layer GATv2 on Cora dataset.</p>\n": "<h1>Graph \u6ce8\u610f\u529b\u7f51\u7edc v2 (Gatv2)</h1>\n<p>\u8fd9\u662f <a href=\"https://pytorch.org\">PyTorch</a> \u5bf9 Gatv2 \u8fd0\u7b97\u7b26\u7684\u5b9e\u73b0\uff0c\u6458\u81ea\u300a<a href=\"https://papers.labml.ai/paper/2105.14491\">\u56fe\u6ce8\u610f\u529b\u7f51\u7edc\u6709\u591a\u4e13\u5fc3\uff1f</a>\u300b</p>\u3002\n<p>Gatv2 \u5904\u7406\u7684\u56fe\u5f62\u6570\u636e\u4e0e <a href=\"../gat/index.html\">GAT</a> \u7c7b\u4f3c\u3002\u56fe\u7531\u8282\u70b9\u548c\u8fde\u63a5\u8282\u70b9\u7684\u8fb9\u7ec4\u6210\u3002\u4f8b\u5982\uff0c\u5728 Cora \u6570\u636e\u96c6\u4e2d\uff0c\u8282\u70b9\u662f\u7814\u7a76\u8bba\u6587\uff0c\u8fb9\u7f18\u662f\u8fde\u63a5\u8bba\u6587\u7684\u5f15\u6587\u3002</p>\n<p>Gatv2 \u64cd\u4f5c\u5458\u4fee\u590d\u4e86\u6807\u51c6 <a href=\"../gat/index.html\">G</a> AT \u7684\u9759\u6001\u6ce8\u610f\u529b\u95ee\u9898\u3002\u9759\u6001\u6ce8\u610f\u529b\u662f\u6307\u4efb\u4f55\u67e5\u8be2\u8282\u70b9\u5bf9\u5173\u952e\u8282\u70b9\u7684\u5173\u6ce8\u7b49\u7ea7\uff08\u987a\u5e8f\uff09\u76f8\u540c\u3002<a href=\"../gat/index.html\">GAT</a> \u5c06\u4ece\u67e5\u8be2\u8282\u70b9<span translate=no>_^_0_^_</span>\u5230\u5173\u952e\u8282\u70b9\u7684\u6ce8\u610f\u529b\u8ba1\u7b97<span translate=no>_^_1_^_</span>\u4e3a\uff0c</p>\n<span translate=no>_^_2_^_</span><p>\u8bf7\u6ce8\u610f\uff0c\u5bf9\u4e8e\u4efb\u4f55\u67e5\u8be2\u8282\u70b9<span translate=no>_^_3_^_</span>\uff0c\u952e\u7684\u6ce8\u610f\u529b\u7b49\u7ea7 (<span translate=no>_^_4_^_</span>) \u4ec5\u53d6\u51b3\u4e8e<span 
translate=no>_^_5_^_</span>\u3002\u56e0\u6b64\uff0c\u6240\u6709\u67e5\u8be2\u7684\u952e\u7684\u6ce8\u610f\u529b\u7b49\u7ea7\u4fdd\u6301\u4e0d\u53d8\uff08<em>\u9759\u6001</em>\uff09\u3002</p>\n<p>Gatv2 \u901a\u8fc7\u6539\u53d8\u6ce8\u610f\u529b\u673a\u5236\u6765\u5141\u8bb8\u52a8\u6001\u5173\u6ce8\uff0c</p>\n<span translate=no>_^_6_^_</span><p>\u8be5\u8bba\u6587\u8868\u660e\uff0cGAT\u7684\u9759\u6001\u6ce8\u610f\u529b\u673a\u5236\u5728\u5408\u6210\u5b57\u5178\u67e5\u627e\u6570\u636e\u96c6\u7684\u67d0\u4e9b\u56fe\u5f62\u95ee\u9898\u4e0a\u4f1a\u5931\u8d25\u3002\u8fd9\u662f\u4e00\u4e2a\u5b8c\u5168\u8fde\u63a5\u7684\u4e8c\u5206\u56fe\uff0c\u5176\u4e2d\u4e00\u7ec4\u8282\u70b9\uff08\u67e5\u8be2\u8282\u70b9\uff09\u5177\u6709\u4e0e\u4e4b\u5173\u8054\u7684\u5bc6\u94a5\uff0c\u800c\u53e6\u4e00\u7ec4\u8282\u70b9\u65e2\u6709\u952e\u53c8\u6709\u4e0e\u4e4b\u5173\u8054\u7684\u503c\u3002\u76ee\u6807\u662f\u9884\u6d4b\u67e5\u8be2\u8282\u70b9\u7684\u503c\u3002GAT \u65e0\u6cd5\u5b8c\u6210\u6b64\u4efb\u52a1\uff0c\u56e0\u4e3a\u5176\u9759\u6001\u6ce8\u610f\u529b\u6709\u9650\u3002</p>\n<p>\u4ee5\u4e0b\u662f<a href=\"experiment.html\">\u5728 Cora \u6570\u636e\u96c6\u4e0a\u8bad\u7ec3\u53cc\u5c42 Gatv2 \u7684\u8bad\u7ec3\u4ee3\u7801</a>\u3002</p>\n", "<h1>Graph Attention Networks v2 (GATv2)</h1>\n<p>This is a <a href=\"https://pytorch.org\">PyTorch</a> implementation of the GATv2 operator from the paper <a href=\"https://arxiv.org/abs/2105.14491\">How Attentive are Graph Attention Networks?</a>.</p>\n<p>GATv2s work on graph data similar to <a href=\"../gat/index.html\">GAT</a>. A graph consists of nodes and edges connecting nodes. For example, in Cora dataset the nodes are research papers and the edges are citations that connect the papers.</p>\n<p>The GATv2 operator fixes the static attention problem of the standard <a href=\"../gat/index.html\">GAT</a>. Static attention is when the attention to the key nodes has the same rank (order) for any query node. <a href=\"../gat/index.html\">GAT</a> computes attention from query node <span translate=no>_^_0_^_</span> to key node <span translate=no>_^_1_^_</span> as,</p>\n<span translate=no>_^_2_^_</span><p>Note that for any query node <span translate=no>_^_3_^_</span>, the attention rank (<span translate=no>_^_4_^_</span>) of keys depends only on <span translate=no>_^_5_^_</span>. Therefore the attention rank of keys remains the same (<em>static</em>) for all queries.</p>\n<p>GATv2 allows dynamic attention by changing the attention mechanism,</p>\n<span translate=no>_^_6_^_</span><p>The paper shows that GATs static attention mechanism fails on some graph problems with a synthetic dictionary lookup dataset. It&#x27;s a fully connected bipartite graph where one set of nodes (query nodes) have a key associated with it and the other set of nodes have both a key and a value associated with it. The goal is to predict the values of query nodes. 
GAT fails on this task because of its limited static attention.</p>\n<p>Here is <a href=\"experiment.html\">the training code</a> for training a two-layer GATv2 on Cora dataset.</p>\n": "<h1>Graph \u6ce8\u610f\u529b\u7f51\u7edc v2 (Gatv2)</h1>\n<p>\u8fd9\u662f <a href=\"https://pytorch.org\">PyTorch</a> \u5bf9 Gatv2 \u8fd0\u7b97\u7b26\u7684\u5b9e\u73b0\uff0c\u6458\u81ea\u300a<a href=\"https://arxiv.org/abs/2105.14491\">\u56fe\u6ce8\u610f\u529b\u7f51\u7edc\u6709\u591a\u4e13\u5fc3\uff1f</a>\u300b</p>\u3002\n<p>Gatv2 \u5904\u7406\u7684\u56fe\u5f62\u6570\u636e\u4e0e <a href=\"../gat/index.html\">GAT</a> \u7c7b\u4f3c\u3002\u56fe\u7531\u8282\u70b9\u548c\u8fde\u63a5\u8282\u70b9\u7684\u8fb9\u7ec4\u6210\u3002\u4f8b\u5982\uff0c\u5728 Cora \u6570\u636e\u96c6\u4e2d\uff0c\u8282\u70b9\u662f\u7814\u7a76\u8bba\u6587\uff0c\u8fb9\u7f18\u662f\u8fde\u63a5\u8bba\u6587\u7684\u5f15\u6587\u3002</p>\n<p>Gatv2 \u64cd\u4f5c\u5458\u4fee\u590d\u4e86\u6807\u51c6 <a href=\"../gat/index.html\">G</a> AT \u7684\u9759\u6001\u6ce8\u610f\u529b\u95ee\u9898\u3002\u9759\u6001\u6ce8\u610f\u529b\u662f\u6307\u4efb\u4f55\u67e5\u8be2\u8282\u70b9\u5bf9\u5173\u952e\u8282\u70b9\u7684\u5173\u6ce8\u7b49\u7ea7\uff08\u987a\u5e8f\uff09\u76f8\u540c\u3002<a href=\"../gat/index.html\">GAT</a> \u5c06\u4ece\u67e5\u8be2\u8282\u70b9<span translate=no>_^_0_^_</span>\u5230\u5173\u952e\u8282\u70b9\u7684\u6ce8\u610f\u529b\u8ba1\u7b97<span translate=no>_^_1_^_</span>\u4e3a\uff0c</p>\n<span translate=no>_^_2_^_</span><p>\u8bf7\u6ce8\u610f\uff0c\u5bf9\u4e8e\u4efb\u4f55\u67e5\u8be2\u8282\u70b9<span translate=no>_^_3_^_</span>\uff0c\u952e\u7684\u6ce8\u610f\u529b\u7b49\u7ea7 (<span translate=no>_^_4_^_</span>) \u4ec5\u53d6\u51b3\u4e8e<span translate=no>_^_5_^_</span>\u3002\u56e0\u6b64\uff0c\u6240\u6709\u67e5\u8be2\u7684\u952e\u7684\u6ce8\u610f\u529b\u7b49\u7ea7\u4fdd\u6301\u4e0d\u53d8\uff08<em>\u9759\u6001</em>\uff09\u3002</p>\n<p>Gatv2 \u901a\u8fc7\u6539\u53d8\u6ce8\u610f\u529b\u673a\u5236\u6765\u5141\u8bb8\u52a8\u6001\u5173\u6ce8\uff0c</p>\n<span translate=no>_^_6_^_</span><p>\u8be5\u8bba\u6587\u8868\u660e\uff0cGAT\u7684\u9759\u6001\u6ce8\u610f\u529b\u673a\u5236\u5728\u5408\u6210\u5b57\u5178\u67e5\u627e\u6570\u636e\u96c6\u7684\u67d0\u4e9b\u56fe\u5f62\u95ee\u9898\u4e0a\u4f1a\u5931\u8d25\u3002\u8fd9\u662f\u4e00\u4e2a\u5b8c\u5168\u8fde\u63a5\u7684\u4e8c\u5206\u56fe\uff0c\u5176\u4e2d\u4e00\u7ec4\u8282\u70b9\uff08\u67e5\u8be2\u8282\u70b9\uff09\u5177\u6709\u4e0e\u4e4b\u5173\u8054\u7684\u5bc6\u94a5\uff0c\u800c\u53e6\u4e00\u7ec4\u8282\u70b9\u65e2\u6709\u952e\u53c8\u6709\u4e0e\u4e4b\u5173\u8054\u7684\u503c\u3002\u76ee\u6807\u662f\u9884\u6d4b\u67e5\u8be2\u8282\u70b9\u7684\u503c\u3002GAT \u65e0\u6cd5\u5b8c\u6210\u6b64\u4efb\u52a1\uff0c\u56e0\u4e3a\u5176\u9759\u6001\u6ce8\u610f\u529b\u6709\u9650\u3002</p>\n<p>\u4ee5\u4e0b\u662f<a href=\"experiment.html\">\u5728 Cora \u6570\u636e\u96c6\u4e0a\u8bad\u7ec3\u53cc\u5c42 Gatv2 \u7684\u8bad\u7ec3\u4ee3\u7801</a>\u3002</p>\n",
"<h2>Graph attention v2 layer</h2>\n<p>This is a single graph attention v2 layer. A GATv2 is made up of multiple such layers. It takes <span translate=no>_^_0_^_</span>, where <span translate=no>_^_1_^_</span> as input and outputs <span translate=no>_^_2_^_</span>, where <span translate=no>_^_3_^_</span>.</p>\n": "<h2>Graph \u6ce8\u610f\u529b v2 \u5c42</h2>\n<p>\u8fd9\u662f\u5355\u56fe\u5173\u6ce8 v2 \u5c42\u3002GATv2 \u7531\u591a\u4e2a\u8fd9\u6837\u7684\u5c42\u7ec4\u6210\u3002\u5b83\u9700\u8981<span translate=no>_^_0_^_</span>\uff0c\u5176\u4e2d<span translate=no>_^_1_^_</span>\u4f5c\u4e3a\u8f93\u5165\u548c\u8f93\u51fa<span translate=no>_^_2_^_</span>\uff0c\u5728\u54ea\u91cc<span translate=no>_^_3_^_</span>\u3002</p>\n", "<h2>Graph attention v2 layer</h2>\n<p>This is a single graph attention v2 layer. A GATv2 is made up of multiple such layers. It takes <span translate=no>_^_0_^_</span>, where <span translate=no>_^_1_^_</span> as input and outputs <span translate=no>_^_2_^_</span>, where <span translate=no>_^_3_^_</span>.</p>\n": "<h2>Graph \u6ce8\u610f\u529b v2 \u5c42</h2>\n<p>\u8fd9\u662f\u5355\u56fe\u5173\u6ce8 v2 \u5c42\u3002GATv2 \u7531\u591a\u4e2a\u8fd9\u6837\u7684\u5c42\u7ec4\u6210\u3002\u5b83\u9700\u8981<span translate=no>_^_0_^_</span>\uff0c\u5176\u4e2d<span translate=no>_^_1_^_</span>\u4f5c\u4e3a\u8f93\u5165\u548c\u8f93\u51fa<span translate=no>_^_2_^_</span>\uff0c\u5728\u54ea\u91cc<span translate=no>_^_3_^_</span>\u3002</p>\n",
"<h4>Calculate attention score</h4>\n<p>We calculate these for each head <span translate=no>_^_0_^_</span>. <em>We have omitted <span translate=no>_^_1_^_</span> for simplicity</em>.</p>\n<p><span translate=no>_^_2_^_</span></p>\n<p><span translate=no>_^_3_^_</span> is the attention score (importance) from node <span translate=no>_^_4_^_</span> to node <span translate=no>_^_5_^_</span>. We calculate this for each head.</p>\n<p><span translate=no>_^_6_^_</span> is the attention mechanism, that calculates the attention score. The paper sums <span translate=no>_^_7_^_</span>, <span translate=no>_^_8_^_</span> followed by a <span translate=no>_^_9_^_</span> and does a linear transformation with a weight vector <span translate=no>_^_10_^_</span></p>\n<p><span translate=no>_^_11_^_</span> Note: The paper desrcibes <span translate=no>_^_12_^_</span> as <span translate=no>_^_13_^_</span> which is equivalent to the definition we use here. </p>\n": "<h4>\u8ba1\u7b97\u6ce8\u610f\u529b\u5206\u6570</h4>\n<p>\u6211\u4eec\u4e3a\u6bcf\u4e2a\u5934\u90e8\u8ba1\u7b97\u8fd9\u4e9b<span translate=no>_^_0_^_</span>\u3002<em><span translate=no>_^_1_^_</span>\u4e3a\u7b80\u5355\u8d77\u89c1\uff0c\u6211\u4eec\u7701\u7565\u4e86</em>\u3002</p>\n<p><span translate=no>_^_2_^_</span></p>\n<p><span translate=no>_^_3_^_</span>\u662f\u4ece\u4e00\u4e2a\u8282\u70b9\u5230\u53e6\u4e00\u4e2a\u8282\u70b9\u7684<span translate=no>_^_4_^_</span>\u6ce8\u610f\u529b\u5206\u6570\uff08\u91cd\u8981\u6027\uff09<span translate=no>_^_5_^_</span>\u3002\u6211\u4eec\u4e3a\u6bcf\u4e2a\u5934\u90e8\u8ba1\u7b97\u8fd9\u4e2a\u503c\u3002</p>\n<p><span translate=no>_^_6_^_</span>\u662f\u8ba1\u7b97\u6ce8\u610f\u529b\u5206\u6570\u7684\u6ce8\u610f\u529b\u673a\u5236\u3002\u672c\u6587\u6c42\u548c<span translate=no>_^_7_^_</span>\uff0c<span translate=no>_^_8_^_</span>\u7136\u540e\u662f a\uff0c<span translate=no>_^_9_^_</span>\u7136\u540e\u4f7f\u7528\u6743\u91cd\u5411\u91cf\u8fdb\u884c\u7ebf\u6027\u53d8\u6362<span translate=no>_^_10_^_</span></p>\n<p><span translate=no>_^_11_^_</span>\u6ce8\u610f\uff1a\u672c\u6587\u63cf\u8ff0\u7684\u5185\u5bb9<span translate=no>_^_12_^_</span>\u7b49\u540c<span translate=no>_^_13_^_</span>\u4e8e\u6211\u4eec\u5728\u6b64\u5904\u4f7f\u7528\u7684\u5b9a\u4e49\u3002</p>\n", "<h4>Calculate attention score</h4>\n<p>We calculate these for each head <span translate=no>_^_0_^_</span>. <em>We have omitted <span translate=no>_^_1_^_</span> for simplicity</em>.</p>\n<p><span translate=no>_^_2_^_</span></p>\n<p><span translate=no>_^_3_^_</span> is the attention score (importance) from node <span translate=no>_^_4_^_</span> to node <span translate=no>_^_5_^_</span>. We calculate this for each head.</p>\n<p><span translate=no>_^_6_^_</span> is the attention mechanism, that calculates the attention score. The paper sums <span translate=no>_^_7_^_</span>, <span translate=no>_^_8_^_</span> followed by a <span translate=no>_^_9_^_</span> and does a linear transformation with a weight vector <span translate=no>_^_10_^_</span></p>\n<p><span translate=no>_^_11_^_</span> Note: The paper desrcibes <span translate=no>_^_12_^_</span> as <span translate=no>_^_13_^_</span> which is equivalent to the definition we use here. 
</p>\n": "<h4>\u8ba1\u7b97\u6ce8\u610f\u529b\u5206\u6570</h4>\n<p>\u6211\u4eec\u4e3a\u6bcf\u4e2a\u5934\u90e8\u8ba1\u7b97\u8fd9\u4e9b<span translate=no>_^_0_^_</span>\u3002<em><span translate=no>_^_1_^_</span>\u4e3a\u7b80\u5355\u8d77\u89c1\uff0c\u6211\u4eec\u7701\u7565\u4e86</em>\u3002</p>\n<p><span translate=no>_^_2_^_</span></p>\n<p><span translate=no>_^_3_^_</span>\u662f\u4ece\u4e00\u4e2a\u8282\u70b9\u5230\u53e6\u4e00\u4e2a\u8282\u70b9\u7684<span translate=no>_^_4_^_</span>\u6ce8\u610f\u529b\u5206\u6570\uff08\u91cd\u8981\u6027\uff09<span translate=no>_^_5_^_</span>\u3002\u6211\u4eec\u4e3a\u6bcf\u4e2a\u5934\u90e8\u8ba1\u7b97\u8fd9\u4e2a\u503c\u3002</p>\n<p><span translate=no>_^_6_^_</span>\u662f\u8ba1\u7b97\u6ce8\u610f\u529b\u5206\u6570\u7684\u6ce8\u610f\u529b\u673a\u5236\u3002\u672c\u6587\u6c42\u548c<span translate=no>_^_7_^_</span>\uff0c<span translate=no>_^_8_^_</span>\u7136\u540e\u662f a\uff0c<span translate=no>_^_9_^_</span>\u7136\u540e\u4f7f\u7528\u6743\u91cd\u5411\u91cf\u8fdb\u884c\u7ebf\u6027\u53d8\u6362<span translate=no>_^_10_^_</span></p>\n<p><span translate=no>_^_11_^_</span>\u6ce8\u610f\uff1a\u672c\u6587\u63cf\u8ff0\u7684\u5185\u5bb9<span translate=no>_^_12_^_</span>\u7b49\u540c<span translate=no>_^_13_^_</span>\u4e8e\u6211\u4eec\u5728\u6b64\u5904\u4f7f\u7528\u7684\u5b9a\u4e49\u3002</p>\n",
"<p><span translate=no>_^_0_^_</span> </p>\n": "<p><span translate=no>_^_0_^_</span></p>\n", "<p><span translate=no>_^_0_^_</span> </p>\n": "<p><span translate=no>_^_0_^_</span></p>\n",

View File

@ -1,4 +1,4 @@
{ {
"<h1><a href=\"https://nn.labml.ai/graphs/gatv2/index.html\">Graph Attention Networks v2 (GATv2)</a></h1>\n<p>This is a <a href=\"https://pytorch.org\">PyTorch</a> implementation of the GATv2 operator from the paper <a href=\"https://papers.labml.ai/paper/2105.14491\">How Attentive are Graph Attention Networks?</a>.</p>\n<p>GATv2s work on graph data. A graph consists of nodes and edges connecting nodes. For example, in Cora dataset the nodes are research papers and the edges are citations that connect the papers.</p>\n<p>The GATv2 operator fixes the static attention problem of the standard GAT: since the linear layers in the standard GAT are applied right after each other, the ranking of attended nodes is unconditioned on the query node. In contrast, in GATv2, every node can attend to any other node.</p>\n<p>Here is <a href=\"https://nn.labml.ai/graphs/gatv2/experiment.html\">the training code</a> for training a two-layer GATv2 on Cora dataset. </p>\n": "<h1><a href=\"https://nn.labml.ai/graphs/gatv2/index.html\">\u30b0\u30e9\u30d5\u30fb\u30a2\u30c6\u30f3\u30b7\u30e7\u30f3\u30fb\u30cd\u30c3\u30c8\u30ef\u30fc\u30af\u30b9 v2 (GATv2)</a></h1>\n<p>\u3053\u308c\u306f\u3001\u300c<a href=\"https://papers.labml.ai/paper/2105.14491\">\u30b0\u30e9\u30d5\u30a2\u30c6\u30f3\u30b7\u30e7\u30f3\u30cd\u30c3\u30c8\u30ef\u30fc\u30af\u306f\u3069\u306e\u7a0b\u5ea6\u6ce8\u610f\u6df1\u3044\u306e\u304b</a>\uff1f\u300d<a href=\"https://pytorch.org\">\u3068\u3044\u3046\u8ad6\u6587\u306eGATv2\u6f14\u7b97\u5b50\u3092PyTorch\u3067\u5b9f\u88c5\u3057\u305f\u3082\u306e\u3067\u3059</a>\u3002</p>\u3002\n<p>GATv2\u306f\u30b0\u30e9\u30d5\u30c7\u30fc\u30bf\u3092\u51e6\u7406\u3057\u307e\u3059\u3002\u30b0\u30e9\u30d5\u306f\u3001\u30ce\u30fc\u30c9\u3068\u30ce\u30fc\u30c9\u3092\u63a5\u7d9a\u3059\u308b\u30a8\u30c3\u30b8\u3067\u69cb\u6210\u3055\u308c\u307e\u3059\u3002\u305f\u3068\u3048\u3070\u3001Cora\u30c7\u30fc\u30bf\u30bb\u30c3\u30c8\u3067\u306f\u3001\u30ce\u30fc\u30c9\u306f\u7814\u7a76\u8ad6\u6587\u3067\u3001\u7aef\u306f\u8ad6\u6587\u3092\u3064\u306a\u3050\u5f15\u7528\u3067\u3059</p>\u3002\n<p>GATv2 \u6f14\u7b97\u5b50\u306f\u3001\u6a19\u6e96 GAT \u306e\u9759\u7684\u6ce8\u610f\u306e\u554f\u984c\u3092\u89e3\u6c7a\u3057\u307e\u3059\u3002\u6a19\u6e96 GAT \u306e\u7dda\u5f62\u30ec\u30a4\u30e4\u30fc\u306f\u6b21\u3005\u306b\u9069\u7528\u3055\u308c\u308b\u305f\u3081\u3001\u53c2\u52a0\u30ce\u30fc\u30c9\u306e\u30e9\u30f3\u30af\u4ed8\u3051\u306f\u30af\u30a8\u30ea\u30ce\u30fc\u30c9\u3067\u6761\u4ef6\u4ed8\u3051\u3055\u308c\u307e\u305b\u3093\u3002\u5bfe\u7167\u7684\u306b\u3001GATv2\u3067\u306f\u3001\u3059\u3079\u3066\u306e\u30ce\u30fc\u30c9\u304c\u4ed6\u306e\u30ce\u30fc\u30c9\u306b\u63a5\u7d9a\u3067\u304d\u307e\u3059</p>\u3002\n<p>\u3053\u308c\u306f\u3001<a href=\"https://nn.labml.ai/graphs/gatv2/experiment.html\">Cora\u30c7\u30fc\u30bf\u30bb\u30c3\u30c8\u30672\u5c64GATv2\u3092\u30c8\u30ec\u30fc\u30cb\u30f3\u30b0\u3059\u308b\u305f\u3081\u306e\u30c8\u30ec\u30fc\u30cb\u30f3\u30b0\u30b3\u30fc\u30c9\u3067\u3059</a>\u3002</p>\n", "<h1><a href=\"https://nn.labml.ai/graphs/gatv2/index.html\">Graph Attention Networks v2 (GATv2)</a></h1>\n<p>This is a <a href=\"https://pytorch.org\">PyTorch</a> implementation of the GATv2 operator from the paper <a href=\"https://arxiv.org/abs/2105.14491\">How Attentive are Graph Attention Networks?</a>.</p>\n<p>GATv2s work on graph data. A graph consists of nodes and edges connecting nodes. 
For example, in Cora dataset the nodes are research papers and the edges are citations that connect the papers.</p>\n<p>The GATv2 operator fixes the static attention problem of the standard GAT: since the linear layers in the standard GAT are applied right after each other, the ranking of attended nodes is unconditioned on the query node. In contrast, in GATv2, every node can attend to any other node.</p>\n<p>Here is <a href=\"https://nn.labml.ai/graphs/gatv2/experiment.html\">the training code</a> for training a two-layer GATv2 on Cora dataset. </p>\n": "<h1><a href=\"https://nn.labml.ai/graphs/gatv2/index.html\">\u30b0\u30e9\u30d5\u30fb\u30a2\u30c6\u30f3\u30b7\u30e7\u30f3\u30fb\u30cd\u30c3\u30c8\u30ef\u30fc\u30af\u30b9 v2 (GATv2)</a></h1>\n<p>\u3053\u308c\u306f\u3001\u300c<a href=\"https://arxiv.org/abs/2105.14491\">\u30b0\u30e9\u30d5\u30a2\u30c6\u30f3\u30b7\u30e7\u30f3\u30cd\u30c3\u30c8\u30ef\u30fc\u30af\u306f\u3069\u306e\u7a0b\u5ea6\u6ce8\u610f\u6df1\u3044\u306e\u304b</a>\uff1f\u300d<a href=\"https://pytorch.org\">\u3068\u3044\u3046\u8ad6\u6587\u306eGATv2\u6f14\u7b97\u5b50\u3092PyTorch\u3067\u5b9f\u88c5\u3057\u305f\u3082\u306e\u3067\u3059</a>\u3002</p>\u3002\n<p>GATv2\u306f\u30b0\u30e9\u30d5\u30c7\u30fc\u30bf\u3092\u51e6\u7406\u3057\u307e\u3059\u3002\u30b0\u30e9\u30d5\u306f\u3001\u30ce\u30fc\u30c9\u3068\u30ce\u30fc\u30c9\u3092\u63a5\u7d9a\u3059\u308b\u30a8\u30c3\u30b8\u3067\u69cb\u6210\u3055\u308c\u307e\u3059\u3002\u305f\u3068\u3048\u3070\u3001Cora\u30c7\u30fc\u30bf\u30bb\u30c3\u30c8\u3067\u306f\u3001\u30ce\u30fc\u30c9\u306f\u7814\u7a76\u8ad6\u6587\u3067\u3001\u7aef\u306f\u8ad6\u6587\u3092\u3064\u306a\u3050\u5f15\u7528\u3067\u3059</p>\u3002\n<p>GATv2 \u6f14\u7b97\u5b50\u306f\u3001\u6a19\u6e96 GAT \u306e\u9759\u7684\u6ce8\u610f\u306e\u554f\u984c\u3092\u89e3\u6c7a\u3057\u307e\u3059\u3002\u6a19\u6e96 GAT \u306e\u7dda\u5f62\u30ec\u30a4\u30e4\u30fc\u306f\u6b21\u3005\u306b\u9069\u7528\u3055\u308c\u308b\u305f\u3081\u3001\u53c2\u52a0\u30ce\u30fc\u30c9\u306e\u30e9\u30f3\u30af\u4ed8\u3051\u306f\u30af\u30a8\u30ea\u30ce\u30fc\u30c9\u3067\u6761\u4ef6\u4ed8\u3051\u3055\u308c\u307e\u305b\u3093\u3002\u5bfe\u7167\u7684\u306b\u3001GATv2\u3067\u306f\u3001\u3059\u3079\u3066\u306e\u30ce\u30fc\u30c9\u304c\u4ed6\u306e\u30ce\u30fc\u30c9\u306b\u63a5\u7d9a\u3067\u304d\u307e\u3059</p>\u3002\n<p>\u3053\u308c\u306f\u3001<a href=\"https://nn.labml.ai/graphs/gatv2/experiment.html\">Cora\u30c7\u30fc\u30bf\u30bb\u30c3\u30c8\u30672\u5c64GATv2\u3092\u30c8\u30ec\u30fc\u30cb\u30f3\u30b0\u3059\u308b\u305f\u3081\u306e\u30c8\u30ec\u30fc\u30cb\u30f3\u30b0\u30b3\u30fc\u30c9\u3067\u3059</a>\u3002</p>\n",
"Graph Attention Networks v2 (GATv2)": "\u30b0\u30e9\u30d5\u30fb\u30a2\u30c6\u30f3\u30b7\u30e7\u30f3\u30fb\u30cd\u30c3\u30c8\u30ef\u30fc\u30af\u30b9 v2 (GATv2)" "Graph Attention Networks v2 (GATv2)": "\u30b0\u30e9\u30d5\u30fb\u30a2\u30c6\u30f3\u30b7\u30e7\u30f3\u30fb\u30cd\u30c3\u30c8\u30ef\u30fc\u30af\u30b9 v2 (GATv2)"
} }


@@ -1,4 +1,4 @@
{ {
"<h1><a href=\"https://nn.labml.ai/graphs/gatv2/index.html\">Graph Attention Networks v2 (GATv2)</a></h1>\n<p>This is a <a href=\"https://pytorch.org\">PyTorch</a> implementation of the GATv2 operator from the paper <a href=\"https://papers.labml.ai/paper/2105.14491\">How Attentive are Graph Attention Networks?</a>.</p>\n<p>GATv2s work on graph data. A graph consists of nodes and edges connecting nodes. For example, in Cora dataset the nodes are research papers and the edges are citations that connect the papers.</p>\n<p>The GATv2 operator fixes the static attention problem of the standard GAT: since the linear layers in the standard GAT are applied right after each other, the ranking of attended nodes is unconditioned on the query node. In contrast, in GATv2, every node can attend to any other node.</p>\n<p>Here is <a href=\"https://nn.labml.ai/graphs/gatv2/experiment.html\">the training code</a> for training a two-layer GATv2 on Cora dataset.</p>\n<p><a href=\"https://app.labml.ai/run/34b1e2f6ed6f11ebb860997901a2d1e3\"><span translate=no>_^_0_^_</span></a> </p>\n": "<h1><a href=\"https://nn.labml.ai/graphs/gatv2/index.html\">\u0db4\u0dca\u0dbb\u0dc3\u0dca\u0dad\u0dcf\u0dbb\u0dba \u0d85\u0dc0\u0db0\u0dcf\u0db1\u0dba \u0da2\u0dcf\u0dbd v2 (GATV2)</a></h1>\n<p>\u0db8\u0dd9\u0dbaGATV2 \u0d9a\u0dca\u0dbb\u0dd2\u0dba\u0dcf\u0d9a\u0dbb\u0dd4\u0d9c\u0dda <a href=\"https://pytorch.org\">PyTorch</a> \u0d9a\u0dca\u0dbb\u0dd2\u0dba\u0dcf\u0dad\u0dca\u0db8\u0d9a \u0d9a\u0dd2\u0dbb\u0dd3\u0db8\u0d9a\u0dd2 <a href=\"https://papers.labml.ai/paper/2105.14491\">\u0db4\u0dca\u0dbb\u0dc3\u0dca\u0dae\u0dcf\u0dbb \u0d85\u0dc0\u0db0\u0dcf\u0db1\u0dba \u0dba\u0ddc\u0db8\u0dd4 \u0d9a\u0dbb\u0db1 \u0da2\u0dcf\u0dbd\u0dba\u0db1\u0dca \u0d9a\u0dd9\u0dad\u0dbb\u0db8\u0dca \u0d85\u0dc0\u0db0\u0dcf\u0db1\u0dba\u0dd9\u0db1\u0dca \u0dc3\u0dd2\u0da7\u0dd2\u0db1\u0dc0\u0dcf\u0daf? </a>. </p>\n<p>GATV2s\u0db4\u0dca\u0dbb\u0dc3\u0dca\u0dad\u0dcf\u0dbb \u0daf\u0dad\u0dca\u0dad \u0db8\u0dad \u0d9a\u0dca\u0dbb\u0dd2\u0dba\u0dcf \u0d9a\u0dbb\u0dba\u0dd2. \u0db4\u0dca\u0dbb\u0dc3\u0dca\u0dae\u0dcf\u0dbb\u0dba\u0d9a\u0dca \u0db1\u0ddd\u0da9\u0dca \u0dc3\u0dc4 \u0daf\u0dcf\u0dbb \u0dc3\u0db8\u0dca\u0db6\u0db1\u0dca\u0db0 \u0d9a\u0dbb\u0db1 \u0db1\u0ddd\u0da9\u0dca \u0dc0\u0dbd\u0dd2\u0db1\u0dca \u0dc3\u0db8\u0db1\u0dca\u0dc0\u0dd2\u0dad \u0dc0\u0dda. \u0d8b\u0daf\u0dcf\u0dc4\u0dbb\u0dab\u0dba\u0d9a\u0dca \u0dbd\u0dd9\u0dc3, \u0d9a\u0ddd\u0dbb\u0dcf \u0daf\u0dad\u0dca\u0dad \u0d9a\u0da7\u0dca\u0da7\u0dbd\u0dba\u0dda \u0db1\u0ddd\u0da9\u0dca \u0db4\u0dbb\u0dca\u0dba\u0dda\u0dc2\u0dab \u0db4\u0dad\u0dca\u0dbb\u0dd2\u0d9a\u0dcf \u0dc0\u0db1 \u0d85\u0dad\u0dbb \u0daf\u0dcf\u0dbb \u0dba\u0db1\u0dd4 \u0db4\u0dad\u0dca\u0dbb\u0dd2\u0d9a\u0dcf \u0dc3\u0db8\u0dca\u0db6\u0db1\u0dca\u0db0 \u0d9a\u0dbb\u0db1 \u0d8b\u0db4\u0dd4\u0da7\u0dcf \u0daf\u0dd0\u0d9a\u0dca\u0dc0\u0dd3\u0db8\u0dca \u0dc0\u0dda. 
</p>\n<p>GATV2\u0d9a\u0dca\u0dbb\u0dd2\u0dba\u0dcf\u0d9a\u0dbb\u0dd4 \u0dc3\u0db8\u0dca\u0db8\u0dad GAT \u0dc4\u0dd2 \u0dc3\u0dca\u0dae\u0dd2\u0dad\u0dd2\u0d9a \u0d85\u0dc0\u0db0\u0dcf\u0db1\u0dba \u0dba\u0ddc\u0db8\u0dd4 \u0d9a\u0dd2\u0dbb\u0dd3\u0db8\u0dda \u0d9c\u0dd0\u0da7\u0dc5\u0dd4\u0dc0 \u0db1\u0dd2\u0dc0\u0dd0\u0dbb\u0daf\u0dd2 \u0d9a\u0dbb\u0dba\u0dd2: \u0dc3\u0db8\u0dca\u0db8\u0dad GAT \u0dc4\u0dd2 \u0dbb\u0dda\u0d9b\u0dd3\u0dba \u0dc3\u0dca\u0dae\u0dbb \u0d91\u0d9a\u0dd2\u0db1\u0dd9\u0d9a\u0da7 \u0db4\u0dc3\u0dd4\u0dc0 \u0dba\u0ddc\u0daf\u0db1 \u0db6\u0dd0\u0dc0\u0dd2\u0db1\u0dca, \u0dc3\u0dc4\u0db7\u0dcf\u0d9c\u0dd3 \u0dc0\u0dd6 \u0db1\u0ddd\u0da9\u0dca \u0dc0\u0dbd \u0dc1\u0dca\u0dbb\u0dda\u0dab\u0dd2\u0d9c\u0dad \u0d9a\u0dd2\u0dbb\u0dd3\u0db8 \u0dc0\u0dd2\u0db8\u0dc3\u0dd4\u0db8\u0dca \u0db1\u0ddd\u0da9\u0dba \u0db8\u0dad \u0d9a\u0ddc\u0db1\u0dca\u0daf\u0dda\u0dc3\u0dd2 \u0dc0\u0dd2\u0dbb\u0dc4\u0dd2\u0dad\u0dc0 \u0db4\u0dc0\u0dad\u0dd3. \u0d8a\u0da7 \u0dc0\u0dd9\u0db1\u0dc3\u0dca\u0dc0, GATV2 \u0dc4\u0dd2, \u0dc3\u0dd1\u0db8 \u0db1\u0ddd\u0da9\u0dba\u0d9a\u0da7\u0db8 \u0dc0\u0dd9\u0db1\u0dad\u0dca \u0d95\u0db1\u0dd1\u0db8 \u0db1\u0ddd\u0da9\u0dba\u0d9a\u0da7 \u0dc3\u0dc4\u0db7\u0dcf\u0d9c\u0dd3 \u0dc0\u0dd2\u0dba \u0dc4\u0dd0\u0d9a\u0dd2\u0dba. </p>\n<p><a href=\"https://nn.labml.ai/graphs/gatv2/experiment.html\">\u0d9a\u0ddd\u0dbb\u0dcf \u0daf\u0dad\u0dca\u0dad \u0d9a\u0da7\u0dca\u0da7\u0dbd\u0dba\u0dda \u0dc3\u0dca\u0dae\u0dbb \u0daf\u0dd9\u0d9a\u0d9a GATV2 \u0db4\u0dd4\u0dc4\u0dd4\u0dab\u0dd4 \u0d9a\u0dd2\u0dbb\u0dd3\u0db8 \u0dc3\u0db3\u0dc4\u0dcf \u0db4\u0dd4\u0dc4\u0dd4\u0dab\u0dd4 \u0d9a\u0dda\u0dad\u0dba</a> \u0db8\u0dd9\u0db1\u0dca\u0db1. </p>\n<p><a href=\"https://app.labml.ai/run/34b1e2f6ed6f11ebb860997901a2d1e3\"><span translate=no>_^_0_^_</span></a> </p>\n", "<h1><a href=\"https://nn.labml.ai/graphs/gatv2/index.html\">Graph Attention Networks v2 (GATv2)</a></h1>\n<p>This is a <a href=\"https://pytorch.org\">PyTorch</a> implementation of the GATv2 operator from the paper <a href=\"https://arxiv.org/abs/2105.14491\">How Attentive are Graph Attention Networks?</a>.</p>\n<p>GATv2s work on graph data. A graph consists of nodes and edges connecting nodes. For example, in Cora dataset the nodes are research papers and the edges are citations that connect the papers.</p>\n<p>The GATv2 operator fixes the static attention problem of the standard GAT: since the linear layers in the standard GAT are applied right after each other, the ranking of attended nodes is unconditioned on the query node. 
In contrast, in GATv2, every node can attend to any other node.</p>\n<p>Here is <a href=\"https://nn.labml.ai/graphs/gatv2/experiment.html\">the training code</a> for training a two-layer GATv2 on Cora dataset.</p>\n<p><a href=\"https://app.labml.ai/run/34b1e2f6ed6f11ebb860997901a2d1e3\"><span translate=no>_^_0_^_</span></a> </p>\n": "<h1><a href=\"https://nn.labml.ai/graphs/gatv2/index.html\">\u0db4\u0dca\u0dbb\u0dc3\u0dca\u0dad\u0dcf\u0dbb\u0dba \u0d85\u0dc0\u0db0\u0dcf\u0db1\u0dba \u0da2\u0dcf\u0dbd v2 (GATV2)</a></h1>\n<p>\u0db8\u0dd9\u0dbaGATV2 \u0d9a\u0dca\u0dbb\u0dd2\u0dba\u0dcf\u0d9a\u0dbb\u0dd4\u0d9c\u0dda <a href=\"https://pytorch.org\">PyTorch</a> \u0d9a\u0dca\u0dbb\u0dd2\u0dba\u0dcf\u0dad\u0dca\u0db8\u0d9a \u0d9a\u0dd2\u0dbb\u0dd3\u0db8\u0d9a\u0dd2 <a href=\"https://arxiv.org/abs/2105.14491\">\u0db4\u0dca\u0dbb\u0dc3\u0dca\u0dae\u0dcf\u0dbb \u0d85\u0dc0\u0db0\u0dcf\u0db1\u0dba \u0dba\u0ddc\u0db8\u0dd4 \u0d9a\u0dbb\u0db1 \u0da2\u0dcf\u0dbd\u0dba\u0db1\u0dca \u0d9a\u0dd9\u0dad\u0dbb\u0db8\u0dca \u0d85\u0dc0\u0db0\u0dcf\u0db1\u0dba\u0dd9\u0db1\u0dca \u0dc3\u0dd2\u0da7\u0dd2\u0db1\u0dc0\u0dcf\u0daf? </a>. </p>\n<p>GATV2s\u0db4\u0dca\u0dbb\u0dc3\u0dca\u0dad\u0dcf\u0dbb \u0daf\u0dad\u0dca\u0dad \u0db8\u0dad \u0d9a\u0dca\u0dbb\u0dd2\u0dba\u0dcf \u0d9a\u0dbb\u0dba\u0dd2. \u0db4\u0dca\u0dbb\u0dc3\u0dca\u0dae\u0dcf\u0dbb\u0dba\u0d9a\u0dca \u0db1\u0ddd\u0da9\u0dca \u0dc3\u0dc4 \u0daf\u0dcf\u0dbb \u0dc3\u0db8\u0dca\u0db6\u0db1\u0dca\u0db0 \u0d9a\u0dbb\u0db1 \u0db1\u0ddd\u0da9\u0dca \u0dc0\u0dbd\u0dd2\u0db1\u0dca \u0dc3\u0db8\u0db1\u0dca\u0dc0\u0dd2\u0dad \u0dc0\u0dda. \u0d8b\u0daf\u0dcf\u0dc4\u0dbb\u0dab\u0dba\u0d9a\u0dca \u0dbd\u0dd9\u0dc3, \u0d9a\u0ddd\u0dbb\u0dcf \u0daf\u0dad\u0dca\u0dad \u0d9a\u0da7\u0dca\u0da7\u0dbd\u0dba\u0dda \u0db1\u0ddd\u0da9\u0dca \u0db4\u0dbb\u0dca\u0dba\u0dda\u0dc2\u0dab \u0db4\u0dad\u0dca\u0dbb\u0dd2\u0d9a\u0dcf \u0dc0\u0db1 \u0d85\u0dad\u0dbb \u0daf\u0dcf\u0dbb \u0dba\u0db1\u0dd4 \u0db4\u0dad\u0dca\u0dbb\u0dd2\u0d9a\u0dcf \u0dc3\u0db8\u0dca\u0db6\u0db1\u0dca\u0db0 \u0d9a\u0dbb\u0db1 \u0d8b\u0db4\u0dd4\u0da7\u0dcf \u0daf\u0dd0\u0d9a\u0dca\u0dc0\u0dd3\u0db8\u0dca \u0dc0\u0dda. </p>\n<p>GATV2\u0d9a\u0dca\u0dbb\u0dd2\u0dba\u0dcf\u0d9a\u0dbb\u0dd4 \u0dc3\u0db8\u0dca\u0db8\u0dad GAT \u0dc4\u0dd2 \u0dc3\u0dca\u0dae\u0dd2\u0dad\u0dd2\u0d9a \u0d85\u0dc0\u0db0\u0dcf\u0db1\u0dba \u0dba\u0ddc\u0db8\u0dd4 \u0d9a\u0dd2\u0dbb\u0dd3\u0db8\u0dda \u0d9c\u0dd0\u0da7\u0dc5\u0dd4\u0dc0 \u0db1\u0dd2\u0dc0\u0dd0\u0dbb\u0daf\u0dd2 \u0d9a\u0dbb\u0dba\u0dd2: \u0dc3\u0db8\u0dca\u0db8\u0dad GAT \u0dc4\u0dd2 \u0dbb\u0dda\u0d9b\u0dd3\u0dba \u0dc3\u0dca\u0dae\u0dbb \u0d91\u0d9a\u0dd2\u0db1\u0dd9\u0d9a\u0da7 \u0db4\u0dc3\u0dd4\u0dc0 \u0dba\u0ddc\u0daf\u0db1 \u0db6\u0dd0\u0dc0\u0dd2\u0db1\u0dca, \u0dc3\u0dc4\u0db7\u0dcf\u0d9c\u0dd3 \u0dc0\u0dd6 \u0db1\u0ddd\u0da9\u0dca \u0dc0\u0dbd \u0dc1\u0dca\u0dbb\u0dda\u0dab\u0dd2\u0d9c\u0dad \u0d9a\u0dd2\u0dbb\u0dd3\u0db8 \u0dc0\u0dd2\u0db8\u0dc3\u0dd4\u0db8\u0dca \u0db1\u0ddd\u0da9\u0dba \u0db8\u0dad \u0d9a\u0ddc\u0db1\u0dca\u0daf\u0dda\u0dc3\u0dd2 \u0dc0\u0dd2\u0dbb\u0dc4\u0dd2\u0dad\u0dc0 \u0db4\u0dc0\u0dad\u0dd3. \u0d8a\u0da7 \u0dc0\u0dd9\u0db1\u0dc3\u0dca\u0dc0, GATV2 \u0dc4\u0dd2, \u0dc3\u0dd1\u0db8 \u0db1\u0ddd\u0da9\u0dba\u0d9a\u0da7\u0db8 \u0dc0\u0dd9\u0db1\u0dad\u0dca \u0d95\u0db1\u0dd1\u0db8 \u0db1\u0ddd\u0da9\u0dba\u0d9a\u0da7 \u0dc3\u0dc4\u0db7\u0dcf\u0d9c\u0dd3 \u0dc0\u0dd2\u0dba \u0dc4\u0dd0\u0d9a\u0dd2\u0dba. 
</p>\n<p><a href=\"https://nn.labml.ai/graphs/gatv2/experiment.html\">\u0d9a\u0ddd\u0dbb\u0dcf \u0daf\u0dad\u0dca\u0dad \u0d9a\u0da7\u0dca\u0da7\u0dbd\u0dba\u0dda \u0dc3\u0dca\u0dae\u0dbb \u0daf\u0dd9\u0d9a\u0d9a GATV2 \u0db4\u0dd4\u0dc4\u0dd4\u0dab\u0dd4 \u0d9a\u0dd2\u0dbb\u0dd3\u0db8 \u0dc3\u0db3\u0dc4\u0dcf \u0db4\u0dd4\u0dc4\u0dd4\u0dab\u0dd4 \u0d9a\u0dda\u0dad\u0dba</a> \u0db8\u0dd9\u0db1\u0dca\u0db1. </p>\n<p><a href=\"https://app.labml.ai/run/34b1e2f6ed6f11ebb860997901a2d1e3\"><span translate=no>_^_0_^_</span></a> </p>\n",
"Graph Attention Networks v2 (GATv2)": "\u0db4\u0dca\u0dbb\u0dc3\u0dca\u0dad\u0dcf\u0dbb\u0dba \u0d85\u0dc0\u0db0\u0dcf\u0db1\u0dba \u0da2\u0dcf\u0dbd v2 (GATV2)" "Graph Attention Networks v2 (GATv2)": "\u0db4\u0dca\u0dbb\u0dc3\u0dca\u0dad\u0dcf\u0dbb\u0dba \u0d85\u0dc0\u0db0\u0dcf\u0db1\u0dba \u0da2\u0dcf\u0dbd v2 (GATV2)"
} }


@@ -1,4 +1,4 @@
{ {
"<h1><a href=\"https://nn.labml.ai/graphs/gatv2/index.html\">Graph Attention Networks v2 (GATv2)</a></h1>\n<p>This is a <a href=\"https://pytorch.org\">PyTorch</a> implementation of the GATv2 operator from the paper <a href=\"https://papers.labml.ai/paper/2105.14491\">How Attentive are Graph Attention Networks?</a>.</p>\n<p>GATv2s work on graph data. A graph consists of nodes and edges connecting nodes. For example, in Cora dataset the nodes are research papers and the edges are citations that connect the papers.</p>\n<p>The GATv2 operator fixes the static attention problem of the standard GAT: since the linear layers in the standard GAT are applied right after each other, the ranking of attended nodes is unconditioned on the query node. In contrast, in GATv2, every node can attend to any other node.</p>\n<p>Here is <a href=\"https://nn.labml.ai/graphs/gatv2/experiment.html\">the training code</a> for training a two-layer GATv2 on Cora dataset. </p>\n": "<h1><a href=\"https://nn.labml.ai/graphs/gatv2/index.html\">Graph \u6ce8\u610f\u529b\u7f51\u7edc v2 (Gatv2)</a></h1>\n<p>\u8fd9\u662f <a href=\"https://pytorch.org\">PyTorch</a> \u5bf9 Gatv2 \u8fd0\u7b97\u7b26\u7684\u5b9e\u73b0\uff0c\u6458\u81ea\u300a<a href=\"https://papers.labml.ai/paper/2105.14491\">\u56fe\u6ce8\u610f\u529b\u7f51\u7edc\u6709\u591a\u4e13\u5fc3\uff1f</a>\u300b</p>\u3002\n<p>Gatv2 \u5904\u7406\u56fe\u8868\u6570\u636e\u3002\u56fe\u7531\u8282\u70b9\u548c\u8fde\u63a5\u8282\u70b9\u7684\u8fb9\u7ec4\u6210\u3002\u4f8b\u5982\uff0c\u5728 Cora \u6570\u636e\u96c6\u4e2d\uff0c\u8282\u70b9\u662f\u7814\u7a76\u8bba\u6587\uff0c\u8fb9\u7f18\u662f\u8fde\u63a5\u8bba\u6587\u7684\u5f15\u6587\u3002</p>\nG@@ <p>atv2 \u8fd0\u7b97\u7b26\u4fee\u590d\u4e86\u6807\u51c6 GAT \u7684\u9759\u6001\u6ce8\u610f\u529b\u95ee\u9898\uff1a\u7531\u4e8e\u6807\u51c6 GAT \u4e2d\u7684\u7ebf\u6027\u5c42\u662f\u7d27\u63a5\u5e94\u7528\u7684\uff0c\u56e0\u6b64\u6709\u4eba\u503c\u5b88\u8282\u70b9\u7684\u6392\u540d\u4e0d\u53d7\u67e5\u8be2\u8282\u70b9\u7684\u9650\u5236\u3002\u76f8\u6bd4\u4e4b\u4e0b\uff0c\u5728 Gatv2 \u4e2d\uff0c\u6bcf\u4e2a\u8282\u70b9\u90fd\u53ef\u4ee5\u7ba1\u7406\u4efb\u4f55\u5176\u4ed6\u8282\u70b9\u3002</p>\n<p>\u4ee5\u4e0b\u662f<a href=\"https://nn.labml.ai/graphs/gatv2/experiment.html\">\u5728 Cora \u6570\u636e\u96c6\u4e0a\u8bad\u7ec3\u53cc\u5c42 Gatv2 \u7684\u8bad\u7ec3\u4ee3\u7801</a>\u3002</p>\n", "<h1><a href=\"https://nn.labml.ai/graphs/gatv2/index.html\">Graph Attention Networks v2 (GATv2)</a></h1>\n<p>This is a <a href=\"https://pytorch.org\">PyTorch</a> implementation of the GATv2 operator from the paper <a href=\"https://arxiv.org/abs/2105.14491\">How Attentive are Graph Attention Networks?</a>.</p>\n<p>GATv2s work on graph data. A graph consists of nodes and edges connecting nodes. For example, in Cora dataset the nodes are research papers and the edges are citations that connect the papers.</p>\n<p>The GATv2 operator fixes the static attention problem of the standard GAT: since the linear layers in the standard GAT are applied right after each other, the ranking of attended nodes is unconditioned on the query node. In contrast, in GATv2, every node can attend to any other node.</p>\n<p>Here is <a href=\"https://nn.labml.ai/graphs/gatv2/experiment.html\">the training code</a> for training a two-layer GATv2 on Cora dataset. 
</p>\n": "<h1><a href=\"https://nn.labml.ai/graphs/gatv2/index.html\">Graph \u6ce8\u610f\u529b\u7f51\u7edc v2 (Gatv2)</a></h1>\n<p>\u8fd9\u662f <a href=\"https://pytorch.org\">PyTorch</a> \u5bf9 Gatv2 \u8fd0\u7b97\u7b26\u7684\u5b9e\u73b0\uff0c\u6458\u81ea\u300a<a href=\"https://arxiv.org/abs/2105.14491\">\u56fe\u6ce8\u610f\u529b\u7f51\u7edc\u6709\u591a\u4e13\u5fc3\uff1f</a>\u300b</p>\u3002\n<p>Gatv2 \u5904\u7406\u56fe\u8868\u6570\u636e\u3002\u56fe\u7531\u8282\u70b9\u548c\u8fde\u63a5\u8282\u70b9\u7684\u8fb9\u7ec4\u6210\u3002\u4f8b\u5982\uff0c\u5728 Cora \u6570\u636e\u96c6\u4e2d\uff0c\u8282\u70b9\u662f\u7814\u7a76\u8bba\u6587\uff0c\u8fb9\u7f18\u662f\u8fde\u63a5\u8bba\u6587\u7684\u5f15\u6587\u3002</p>\nG@@ <p>atv2 \u8fd0\u7b97\u7b26\u4fee\u590d\u4e86\u6807\u51c6 GAT \u7684\u9759\u6001\u6ce8\u610f\u529b\u95ee\u9898\uff1a\u7531\u4e8e\u6807\u51c6 GAT \u4e2d\u7684\u7ebf\u6027\u5c42\u662f\u7d27\u63a5\u5e94\u7528\u7684\uff0c\u56e0\u6b64\u6709\u4eba\u503c\u5b88\u8282\u70b9\u7684\u6392\u540d\u4e0d\u53d7\u67e5\u8be2\u8282\u70b9\u7684\u9650\u5236\u3002\u76f8\u6bd4\u4e4b\u4e0b\uff0c\u5728 Gatv2 \u4e2d\uff0c\u6bcf\u4e2a\u8282\u70b9\u90fd\u53ef\u4ee5\u7ba1\u7406\u4efb\u4f55\u5176\u4ed6\u8282\u70b9\u3002</p>\n<p>\u4ee5\u4e0b\u662f<a href=\"https://nn.labml.ai/graphs/gatv2/experiment.html\">\u5728 Cora \u6570\u636e\u96c6\u4e0a\u8bad\u7ec3\u53cc\u5c42 Gatv2 \u7684\u8bad\u7ec3\u4ee3\u7801</a>\u3002</p>\n",
"Graph Attention Networks v2 (GATv2)": "Graph \u6ce8\u610f\u529b\u7f51\u7edc v2 (GATv2)" "Graph Attention Networks v2 (GATv2)": "Graph \u6ce8\u610f\u529b\u7f51\u7edc v2 (GATv2)"
} }

File diff suppressed because one or more lines are too long

File diff suppressed because one or more lines are too long

File diff suppressed because one or more lines are too long


@@ -1,5 +1,5 @@
{ {
"<h1>GPT-NeoX</h1>\n<p>This is a simple implementation of <a href=\"https://papers.labml.ai/paper/2204.06745\">Eleuther GPT-NeoX</a> for inference and fine-tuning.</p>\n<ul><li><a href=\"model.html\">Model definition</a> </li>\n<li><a href=\"tokenizer.html\">Tokenizer</a> </li>\n<li><a href=\"checkpoint.html\">Checkpoint downloading and loading helpers</a> </li>\n<li><a href=\"utils/index.html\">Utilities</a> </li>\n<li><a href=\"utils/llm_int8.html\">LLM.int8() quantization</a></li></ul>\n<h3><a href=\"samples/__init__.py\">Samples</a></h3>\n<ul><li><a href=\"samples/generate.html\">Generating text</a> </li>\n<li><a href=\"samples/finetune.html\">Fine-tuning the biases with pipeline-parallel</a> </li>\n<li><a href=\"samples/llm_int8.html\">Generating text with LLM.int8()</a></li></ul>\n<h3><a href=\"evaluation/__init__.py\">Evaluation</a></h3>\n<ul><li><a href=\"evaluation/half_precision.html\">Evaluating half precision model on a single GPU</a> </li>\n<li><a href=\"evaluation/llm_int8.html\">Evaluating LLM.int8() model</a></li></ul>\n<p><strong>Official <a href=\"https://www.eleuther.ai\">Eleuther</a> GPT-NoeX is source code is available at <a href=\"https://github.com/eleutherai/gpt-neox\">eleutherai/gpt-neox</a>.</strong></p>\n": "<h1>GPT \u30cd\u30aa\u30c3\u30af\u30b9</h1>\n<p>\u3053\u308c\u306f\u3001<a href=\"https://papers.labml.ai/paper/2204.06745\">\u63a8\u8ad6\u3068\u5fae\u8abf\u6574\u306e\u305f\u3081\u306eEleuther GPT-Neox\u306e\u7c21\u5358\u306a\u5b9f\u88c5\u3067\u3059</a>\u3002</p>\n<ul><li><a href=\"model.html\">\u30e2\u30c7\u30eb\u5b9a\u7fa9</a></li>\n<li><a href=\"tokenizer.html\">\u30c8\u30fc\u30af\u30ca\u30a4\u30b6\u30fc</a></li>\n<li><a href=\"checkpoint.html\">\u30c1\u30a7\u30c3\u30af\u30dd\u30a4\u30f3\u30c8\u306e\u30c0\u30a6\u30f3\u30ed\u30fc\u30c9\u3068\u8aad\u307f\u8fbc\u307f\u30d8\u30eb\u30d1\u30fc</a></li>\n<li><a href=\"utils/index.html\">\u30e6\u30fc\u30c6\u30a3\u30ea\u30c6\u30a3</a></li>\n<li><a href=\"utils/llm_int8.html\">llm.int8 () \u91cf\u5b50\u5316</a></li></ul>\n<h3><a href=\"samples/__init__.py\">[\u30b5\u30f3\u30d7\u30eb]</a></h3>\n<ul><li><a href=\"samples/generate.html\">\u30c6\u30ad\u30b9\u30c8\u306e\u751f\u6210</a></li>\n<li><a href=\"samples/finetune.html\">\u30d1\u30a4\u30d7\u30e9\u30a4\u30f3\u30d1\u30e9\u30ec\u30eb\u306b\u3088\u308b\u30d0\u30a4\u30a2\u30b9\u306e\u5fae\u8abf\u6574</a></li>\n<li><a href=\"samples/llm_int8.html\">LLM.int8 () \u3092\u4f7f\u7528\u3057\u3066\u30c6\u30ad\u30b9\u30c8\u3092\u751f\u6210\u3059\u308b</a></li></ul>\n<h3><a href=\"evaluation/__init__.py\">\u8a55\u4fa1</a></h3>\n<ul><li><a href=\"evaluation/half_precision.html\">\u5358\u4e00\u306e GPU \u3067\u306e\u534a\u7cbe\u5ea6\u30e2\u30c7\u30eb\u306e\u8a55\u4fa1</a></li>\n<li><a href=\"evaluation/llm_int8.html\">LLM.int8 () \u30e2\u30c7\u30eb\u306e\u8a55\u4fa1\u4e2d</a></li></ul>\n<p><strong><a href=\"https://www.eleuther.ai\">Eleuther GPT-noex\u306e\u516c\u5f0f\u30bd\u30fc\u30b9\u30b3\u30fc\u30c9\u306feleutherai/gpt-neox\u3067\u5165\u624b\u3067\u304d\u307e\u3059</a><a href=\"https://github.com/eleutherai/gpt-neox\">\u3002</a></strong></p>\n", "<h1>GPT-NeoX</h1>\n<p>This is a simple implementation of <a href=\"https://arxiv.org/abs/2204.06745\">Eleuther GPT-NeoX</a> for inference and fine-tuning.</p>\n<ul><li><a href=\"model.html\">Model definition</a> </li>\n<li><a href=\"tokenizer.html\">Tokenizer</a> </li>\n<li><a href=\"checkpoint.html\">Checkpoint downloading and loading helpers</a> </li>\n<li><a href=\"utils/index.html\">Utilities</a> </li>\n<li><a 
href=\"utils/llm_int8.html\">LLM.int8() quantization</a></li></ul>\n<h3><a href=\"samples/__init__.py\">Samples</a></h3>\n<ul><li><a href=\"samples/generate.html\">Generating text</a> </li>\n<li><a href=\"samples/finetune.html\">Fine-tuning the biases with pipeline-parallel</a> </li>\n<li><a href=\"samples/llm_int8.html\">Generating text with LLM.int8()</a></li></ul>\n<h3><a href=\"evaluation/__init__.py\">Evaluation</a></h3>\n<ul><li><a href=\"evaluation/half_precision.html\">Evaluating half precision model on a single GPU</a> </li>\n<li><a href=\"evaluation/llm_int8.html\">Evaluating LLM.int8() model</a></li></ul>\n<p><strong>Official <a href=\"https://www.eleuther.ai\">Eleuther</a> GPT-NoeX is source code is available at <a href=\"https://github.com/eleutherai/gpt-neox\">eleutherai/gpt-neox</a>.</strong></p>\n": "<h1>GPT \u30cd\u30aa\u30c3\u30af\u30b9</h1>\n<p>\u3053\u308c\u306f\u3001<a href=\"https://arxiv.org/abs/2204.06745\">\u63a8\u8ad6\u3068\u5fae\u8abf\u6574\u306e\u305f\u3081\u306eEleuther GPT-Neox\u306e\u7c21\u5358\u306a\u5b9f\u88c5\u3067\u3059</a>\u3002</p>\n<ul><li><a href=\"model.html\">\u30e2\u30c7\u30eb\u5b9a\u7fa9</a></li>\n<li><a href=\"tokenizer.html\">\u30c8\u30fc\u30af\u30ca\u30a4\u30b6\u30fc</a></li>\n<li><a href=\"checkpoint.html\">\u30c1\u30a7\u30c3\u30af\u30dd\u30a4\u30f3\u30c8\u306e\u30c0\u30a6\u30f3\u30ed\u30fc\u30c9\u3068\u8aad\u307f\u8fbc\u307f\u30d8\u30eb\u30d1\u30fc</a></li>\n<li><a href=\"utils/index.html\">\u30e6\u30fc\u30c6\u30a3\u30ea\u30c6\u30a3</a></li>\n<li><a href=\"utils/llm_int8.html\">llm.int8 () \u91cf\u5b50\u5316</a></li></ul>\n<h3><a href=\"samples/__init__.py\">[\u30b5\u30f3\u30d7\u30eb]</a></h3>\n<ul><li><a href=\"samples/generate.html\">\u30c6\u30ad\u30b9\u30c8\u306e\u751f\u6210</a></li>\n<li><a href=\"samples/finetune.html\">\u30d1\u30a4\u30d7\u30e9\u30a4\u30f3\u30d1\u30e9\u30ec\u30eb\u306b\u3088\u308b\u30d0\u30a4\u30a2\u30b9\u306e\u5fae\u8abf\u6574</a></li>\n<li><a href=\"samples/llm_int8.html\">LLM.int8 () \u3092\u4f7f\u7528\u3057\u3066\u30c6\u30ad\u30b9\u30c8\u3092\u751f\u6210\u3059\u308b</a></li></ul>\n<h3><a href=\"evaluation/__init__.py\">\u8a55\u4fa1</a></h3>\n<ul><li><a href=\"evaluation/half_precision.html\">\u5358\u4e00\u306e GPU \u3067\u306e\u534a\u7cbe\u5ea6\u30e2\u30c7\u30eb\u306e\u8a55\u4fa1</a></li>\n<li><a href=\"evaluation/llm_int8.html\">LLM.int8 () \u30e2\u30c7\u30eb\u306e\u8a55\u4fa1\u4e2d</a></li></ul>\n<p><strong><a href=\"https://www.eleuther.ai\">Eleuther GPT-noex\u306e\u516c\u5f0f\u30bd\u30fc\u30b9\u30b3\u30fc\u30c9\u306feleutherai/gpt-neox\u3067\u5165\u624b\u3067\u304d\u307e\u3059</a><a href=\"https://github.com/eleutherai/gpt-neox\">\u3002</a></strong></p>\n",
"GPT-NeoX": "GPT \u30cd\u30aa\u30c3\u30af\u30b9", "GPT-NeoX": "GPT \u30cd\u30aa\u30c3\u30af\u30b9",
"Simple GPT-NeoX implementation": "\u30b7\u30f3\u30d7\u30eb\u306a GPT \u30cd\u30aa\u30c3\u30af\u30b9\u306e\u5b9f\u88c5" "Simple GPT-NeoX implementation": "\u30b7\u30f3\u30d7\u30eb\u306a GPT \u30cd\u30aa\u30c3\u30af\u30b9\u306e\u5b9f\u88c5"
} }


@@ -1,5 +1,5 @@
{ {
"<h1>GPT-NeoX</h1>\n<p>This is a simple implementation of <a href=\"https://papers.labml.ai/paper/2204.06745\">Eleuther GPT-NeoX</a> for inference and fine-tuning.</p>\n<ul><li><a href=\"model.html\">Model definition</a> </li>\n<li><a href=\"tokenizer.html\">Tokenizer</a> </li>\n<li><a href=\"checkpoint.html\">Checkpoint downloading and loading helpers</a> </li>\n<li><a href=\"utils/index.html\">Utilities</a> </li>\n<li><a href=\"utils/llm_int8.html\">LLM.int8() quantization</a></li></ul>\n<h3><a href=\"samples/__init__.py\">Samples</a></h3>\n<ul><li><a href=\"samples/generate.html\">Generating text</a> </li>\n<li><a href=\"samples/finetune.html\">Fine-tuning the biases with pipeline-parallel</a> </li>\n<li><a href=\"samples/llm_int8.html\">Generating text with LLM.int8()</a></li></ul>\n<h3><a href=\"evaluation/__init__.py\">Evaluation</a></h3>\n<ul><li><a href=\"evaluation/half_precision.html\">Evaluating half precision model on a single GPU</a> </li>\n<li><a href=\"evaluation/llm_int8.html\">Evaluating LLM.int8() model</a></li></ul>\n<p><strong>Official <a href=\"https://www.eleuther.ai\">Eleuther</a> GPT-NoeX is source code is available at <a href=\"https://github.com/eleutherai/gpt-neox\">eleutherai/gpt-neox</a>.</strong></p>\n": "<h1>\u0da2\u0dd3\u0db4\u0dd3\u0da7\u0dd3-\u0db1\u0dd2\u0dba\u0ddd\u0d9a\u0dca\u0dc3\u0dca</h1>\n<p>\u0db8\u0dd9\u0dba\u0d85\u0db1\u0dd4\u0db8\u0dcf\u0db1\u0dba \u0dc3\u0dc4 \u0db8\u0db1\u0dcf\u0dc0 \u0dc3\u0dd4\u0dc3\u0dbb \u0d9a\u0dd2\u0dbb\u0dd3\u0db8 \u0dc3\u0db3\u0dc4\u0dcf <a href=\"https://papers.labml.ai/paper/2204.06745\">\u0d91\u0dbd\u0dd2\u0dba\u0dd4\u0dad\u0dbb\u0dca \u0da2\u0dd3\u0db4\u0dd3\u0da7\u0dd3-\u0db1\u0dd2\u0dba\u0ddd\u0d9a\u0dca\u0dc3\u0dca</a> \u0dc3\u0dbb\u0dbd \u0dbd\u0dd9\u0dc3 \u0d9a\u0dca\u0dbb\u0dd2\u0dba\u0dcf\u0dad\u0dca\u0db8\u0d9a \u0d9a\u0dd2\u0dbb\u0dd3\u0db8\u0d9a\u0dd2. 
</p>\n<ul><li><a href=\"model.html\">\u0d86\u0daf\u0dbb\u0dca\u0dc1 \u0d85\u0dbb\u0dca\u0dae \u0daf\u0dd0\u0d9a\u0dca\u0dc0\u0dd3\u0db8</a> </li>\n<li><a href=\"tokenizer.html\">\u0da7\u0ddd\u0d9a\u0db1\u0dba\u0dd2\u0dc3\u0dbb\u0dca</a> </li>\n<li><a href=\"checkpoint.html\">\u0db4\u0dd2\u0dbb\u0dd2\u0d9a\u0dca\u0dc3\u0dd4\u0db8\u0dca \u0dbd\u0d9a\u0dca\u0dc2\u0dca\u0dba\u0dba \u0db6\u0dcf\u0d9c\u0dad \u0d9a\u0dd2\u0dbb\u0dd3\u0db8 \u0dc3\u0dc4 \u0db4\u0dd0\u0da7\u0dc0\u0dd3\u0db8 \u0dc3\u0dc4\u0dcf\u0dba\u0d9a\u0dba\u0dd2\u0db1\u0dca</a> </li>\n<li><a href=\"utils/index.html\">\u0d8b\u0db4\u0dba\u0ddd\u0d9c\u0dd2\u0dad\u0dcf</a> </li>\n<li><a href=\"utils/llm_int8.html\">LLM.INT8 () \u0db4\u0dca\u0dbb\u0db8\u0dcf\u0dab\u0d9a\u0dbb\u0dab\u0dba</a></li></ul>\n<h3><a href=\"samples/__init__.py\">\u0dc3\u0dcf\u0db8\u0dca\u0db4\u0dbd</a></h3>\n<ul><li><a href=\"samples/generate.html\">\u0db4\u0dd9\u0dc5 \u0d8b\u0dad\u0dca\u0db4\u0dcf\u0daf\u0db1\u0dba</a> </li>\n<li><a href=\"samples/finetune.html\">\u0db1\u0dbd \u0db8\u0dcf\u0dbb\u0dca\u0d9c\u0dba\u0dda \u0dc3\u0db8\u0dcf\u0db1\u0dca\u0dad\u0dbb\u0dc0 \u0d87\u0dad\u0dd2 \u0d85\u0d9c\u0dad\u0dd3\u0db1\u0dca \u0db8\u0db1\u0dcf\u0dc0 \u0dc3\u0d9a\u0dc3\u0dca \u0d9a\u0dd2\u0dbb\u0dd3\u0db8</a> </li>\n<li><a href=\"samples/llm_int8.html\">LLM.INT8 () \u0dc3\u0db8\u0d9f \u0db4\u0dd9\u0dc5 \u0da2\u0db1\u0db1\u0dba \u0d9a\u0dd2\u0dbb\u0dd3\u0db8</a></li></ul>\n<h3><a href=\"evaluation/__init__.py\">\u0d87\u0d9c\u0dba\u0dd3\u0db8</a></h3>\n<ul><li><a href=\"evaluation/half_precision.html\">\u0dad\u0db1\u0dd2 GPU \u0db8\u0dad \u0d85\u0dbb\u0dca\u0db0 \u0db1\u0dd2\u0dbb\u0dc0\u0daf\u0dca\u0dba\u0dad\u0dcf \u0d86\u0d9a\u0dd8\u0dad\u0dd2\u0dba\u0d9a\u0dca \u0d87\u0d9c\u0dba\u0dd3\u0db8</a> </li>\n<li><a href=\"evaluation/llm_int8.html\">LLM.INT8 () \u0d86\u0d9a\u0dd8\u0dad\u0dd2\u0dba \u0d87\u0d9c\u0dba\u0dd3\u0db8</a></li></ul>\n<p><strong>\u0db1\u0dd2\u0dbd <a href=\"https://www.eleuther.ai\">\u0d91\u0dbd\u0dd2\u0dad\u0dbb\u0dca</a> \u0da2\u0dd3\u0db4\u0dd3\u0da7\u0dd3-\u0db1\u0ddc\u0d9a\u0dca\u0dc3\u0dca \u0dba\u0db1\u0dd4 \u0db4\u0dca\u0dbb\u0db7\u0dc0 \u0d9a\u0dda\u0dad\u0dba <a href=\"https://github.com/eleutherai/gpt-neox\">eleutherai/gpt-neox</a> \u0dc0\u0dbd\u0dd2\u0db1\u0dca \u0dbd\u0db6\u0dcf \u0d9c\u0dad \u0dc4\u0dd0\u0d9a\u0dd2\u0dba. 
</strong></p>\n", "<h1>GPT-NeoX</h1>\n<p>This is a simple implementation of <a href=\"https://arxiv.org/abs/2204.06745\">Eleuther GPT-NeoX</a> for inference and fine-tuning.</p>\n<ul><li><a href=\"model.html\">Model definition</a> </li>\n<li><a href=\"tokenizer.html\">Tokenizer</a> </li>\n<li><a href=\"checkpoint.html\">Checkpoint downloading and loading helpers</a> </li>\n<li><a href=\"utils/index.html\">Utilities</a> </li>\n<li><a href=\"utils/llm_int8.html\">LLM.int8() quantization</a></li></ul>\n<h3><a href=\"samples/__init__.py\">Samples</a></h3>\n<ul><li><a href=\"samples/generate.html\">Generating text</a> </li>\n<li><a href=\"samples/finetune.html\">Fine-tuning the biases with pipeline-parallel</a> </li>\n<li><a href=\"samples/llm_int8.html\">Generating text with LLM.int8()</a></li></ul>\n<h3><a href=\"evaluation/__init__.py\">Evaluation</a></h3>\n<ul><li><a href=\"evaluation/half_precision.html\">Evaluating half precision model on a single GPU</a> </li>\n<li><a href=\"evaluation/llm_int8.html\">Evaluating LLM.int8() model</a></li></ul>\n<p><strong>Official <a href=\"https://www.eleuther.ai\">Eleuther</a> GPT-NoeX is source code is available at <a href=\"https://github.com/eleutherai/gpt-neox\">eleutherai/gpt-neox</a>.</strong></p>\n": "<h1>\u0da2\u0dd3\u0db4\u0dd3\u0da7\u0dd3-\u0db1\u0dd2\u0dba\u0ddd\u0d9a\u0dca\u0dc3\u0dca</h1>\n<p>\u0db8\u0dd9\u0dba\u0d85\u0db1\u0dd4\u0db8\u0dcf\u0db1\u0dba \u0dc3\u0dc4 \u0db8\u0db1\u0dcf\u0dc0 \u0dc3\u0dd4\u0dc3\u0dbb \u0d9a\u0dd2\u0dbb\u0dd3\u0db8 \u0dc3\u0db3\u0dc4\u0dcf <a href=\"https://arxiv.org/abs/2204.06745\">\u0d91\u0dbd\u0dd2\u0dba\u0dd4\u0dad\u0dbb\u0dca \u0da2\u0dd3\u0db4\u0dd3\u0da7\u0dd3-\u0db1\u0dd2\u0dba\u0ddd\u0d9a\u0dca\u0dc3\u0dca</a> \u0dc3\u0dbb\u0dbd \u0dbd\u0dd9\u0dc3 \u0d9a\u0dca\u0dbb\u0dd2\u0dba\u0dcf\u0dad\u0dca\u0db8\u0d9a \u0d9a\u0dd2\u0dbb\u0dd3\u0db8\u0d9a\u0dd2. 
</p>\n<ul><li><a href=\"model.html\">\u0d86\u0daf\u0dbb\u0dca\u0dc1 \u0d85\u0dbb\u0dca\u0dae \u0daf\u0dd0\u0d9a\u0dca\u0dc0\u0dd3\u0db8</a> </li>\n<li><a href=\"tokenizer.html\">\u0da7\u0ddd\u0d9a\u0db1\u0dba\u0dd2\u0dc3\u0dbb\u0dca</a> </li>\n<li><a href=\"checkpoint.html\">\u0db4\u0dd2\u0dbb\u0dd2\u0d9a\u0dca\u0dc3\u0dd4\u0db8\u0dca \u0dbd\u0d9a\u0dca\u0dc2\u0dca\u0dba\u0dba \u0db6\u0dcf\u0d9c\u0dad \u0d9a\u0dd2\u0dbb\u0dd3\u0db8 \u0dc3\u0dc4 \u0db4\u0dd0\u0da7\u0dc0\u0dd3\u0db8 \u0dc3\u0dc4\u0dcf\u0dba\u0d9a\u0dba\u0dd2\u0db1\u0dca</a> </li>\n<li><a href=\"utils/index.html\">\u0d8b\u0db4\u0dba\u0ddd\u0d9c\u0dd2\u0dad\u0dcf</a> </li>\n<li><a href=\"utils/llm_int8.html\">LLM.INT8 () \u0db4\u0dca\u0dbb\u0db8\u0dcf\u0dab\u0d9a\u0dbb\u0dab\u0dba</a></li></ul>\n<h3><a href=\"samples/__init__.py\">\u0dc3\u0dcf\u0db8\u0dca\u0db4\u0dbd</a></h3>\n<ul><li><a href=\"samples/generate.html\">\u0db4\u0dd9\u0dc5 \u0d8b\u0dad\u0dca\u0db4\u0dcf\u0daf\u0db1\u0dba</a> </li>\n<li><a href=\"samples/finetune.html\">\u0db1\u0dbd \u0db8\u0dcf\u0dbb\u0dca\u0d9c\u0dba\u0dda \u0dc3\u0db8\u0dcf\u0db1\u0dca\u0dad\u0dbb\u0dc0 \u0d87\u0dad\u0dd2 \u0d85\u0d9c\u0dad\u0dd3\u0db1\u0dca \u0db8\u0db1\u0dcf\u0dc0 \u0dc3\u0d9a\u0dc3\u0dca \u0d9a\u0dd2\u0dbb\u0dd3\u0db8</a> </li>\n<li><a href=\"samples/llm_int8.html\">LLM.INT8 () \u0dc3\u0db8\u0d9f \u0db4\u0dd9\u0dc5 \u0da2\u0db1\u0db1\u0dba \u0d9a\u0dd2\u0dbb\u0dd3\u0db8</a></li></ul>\n<h3><a href=\"evaluation/__init__.py\">\u0d87\u0d9c\u0dba\u0dd3\u0db8</a></h3>\n<ul><li><a href=\"evaluation/half_precision.html\">\u0dad\u0db1\u0dd2 GPU \u0db8\u0dad \u0d85\u0dbb\u0dca\u0db0 \u0db1\u0dd2\u0dbb\u0dc0\u0daf\u0dca\u0dba\u0dad\u0dcf \u0d86\u0d9a\u0dd8\u0dad\u0dd2\u0dba\u0d9a\u0dca \u0d87\u0d9c\u0dba\u0dd3\u0db8</a> </li>\n<li><a href=\"evaluation/llm_int8.html\">LLM.INT8 () \u0d86\u0d9a\u0dd8\u0dad\u0dd2\u0dba \u0d87\u0d9c\u0dba\u0dd3\u0db8</a></li></ul>\n<p><strong>\u0db1\u0dd2\u0dbd <a href=\"https://www.eleuther.ai\">\u0d91\u0dbd\u0dd2\u0dad\u0dbb\u0dca</a> \u0da2\u0dd3\u0db4\u0dd3\u0da7\u0dd3-\u0db1\u0ddc\u0d9a\u0dca\u0dc3\u0dca \u0dba\u0db1\u0dd4 \u0db4\u0dca\u0dbb\u0db7\u0dc0 \u0d9a\u0dda\u0dad\u0dba <a href=\"https://github.com/eleutherai/gpt-neox\">eleutherai/gpt-neox</a> \u0dc0\u0dbd\u0dd2\u0db1\u0dca \u0dbd\u0db6\u0dcf \u0d9c\u0dad \u0dc4\u0dd0\u0d9a\u0dd2\u0dba. </strong></p>\n",
"GPT-NeoX": "\u0da2\u0dd3\u0db4\u0dd3\u0da7\u0dd3-\u0db1\u0dd2\u0dba\u0ddd\u0d9a\u0dca\u0dc3\u0dca", "GPT-NeoX": "\u0da2\u0dd3\u0db4\u0dd3\u0da7\u0dd3-\u0db1\u0dd2\u0dba\u0ddd\u0d9a\u0dca\u0dc3\u0dca",
"Simple GPT-NeoX implementation": "\u0dc3\u0dbb\u0dbd \u0da2\u0dd3\u0db4\u0dd3\u0da7\u0dd3-\u0db1\u0dd2\u0dba\u0ddd\u0d9a\u0dca\u0dc3\u0dca \u0d9a\u0dca\u0dbb\u0dd2\u0dba\u0dcf\u0dad\u0dca\u0db8\u0d9a \u0d9a\u0dd2\u0dbb\u0dd3\u0db8" "Simple GPT-NeoX implementation": "\u0dc3\u0dbb\u0dbd \u0da2\u0dd3\u0db4\u0dd3\u0da7\u0dd3-\u0db1\u0dd2\u0dba\u0ddd\u0d9a\u0dca\u0dc3\u0dca \u0d9a\u0dca\u0dbb\u0dd2\u0dba\u0dcf\u0dad\u0dca\u0db8\u0d9a \u0d9a\u0dd2\u0dbb\u0dd3\u0db8"
} }


@@ -1,5 +1,5 @@
{ {
"<h1>GPT-NeoX</h1>\n<p>This is a simple implementation of <a href=\"https://papers.labml.ai/paper/2204.06745\">Eleuther GPT-NeoX</a> for inference and fine-tuning.</p>\n<ul><li><a href=\"model.html\">Model definition</a> </li>\n<li><a href=\"tokenizer.html\">Tokenizer</a> </li>\n<li><a href=\"checkpoint.html\">Checkpoint downloading and loading helpers</a> </li>\n<li><a href=\"utils/index.html\">Utilities</a> </li>\n<li><a href=\"utils/llm_int8.html\">LLM.int8() quantization</a></li></ul>\n<h3><a href=\"samples/__init__.py\">Samples</a></h3>\n<ul><li><a href=\"samples/generate.html\">Generating text</a> </li>\n<li><a href=\"samples/finetune.html\">Fine-tuning the biases with pipeline-parallel</a> </li>\n<li><a href=\"samples/llm_int8.html\">Generating text with LLM.int8()</a></li></ul>\n<h3><a href=\"evaluation/__init__.py\">Evaluation</a></h3>\n<ul><li><a href=\"evaluation/half_precision.html\">Evaluating half precision model on a single GPU</a> </li>\n<li><a href=\"evaluation/llm_int8.html\">Evaluating LLM.int8() model</a></li></ul>\n<p><strong>Official <a href=\"https://www.eleuther.ai\">Eleuther</a> GPT-NoeX is source code is available at <a href=\"https://github.com/eleutherai/gpt-neox\">eleutherai/gpt-neox</a>.</strong></p>\n": "<h1>GPT-neox</h1>\n<p>\u8fd9\u662f Ele <a href=\"https://papers.labml.ai/paper/2204.06745\">uther GPT-NEOX</a> \u7684\u7b80\u5355\u5b9e\u73b0\uff0c\u7528\u4e8e\u63a8\u7406\u548c\u5fae\u8c03\u3002</p>\n<ul><li><a href=\"model.html\">\u578b\u53f7\u5b9a\u4e49</a></li>\n<li><a href=\"tokenizer.html\">\u5206\u8bcd\u5668</a></li>\n<li><a href=\"checkpoint.html\">\u68c0\u67e5\u70b9\u4e0b\u8f7d\u548c\u52a0\u8f7d\u52a9\u624b</a></li>\n<li><a href=\"utils/index.html\">\u516c\u5171\u4e8b\u4e1a</a></li>\n<li><a href=\"utils/llm_int8.html\">llm.int8 () \u91cf\u5316</a></li></ul>\n<h3><a href=\"samples/__init__.py\">\u6837\u54c1</a></h3>\n<ul><li><a href=\"samples/generate.html\">\u751f\u6210\u6587\u672c</a></li>\n<li><a href=\"samples/finetune.html\">\u4f7f\u7528\u7ba1\u9053\u5e73\u884c\u5fae\u8c03\u504f\u5dee</a></li>\n<li><a href=\"samples/llm_int8.html\">\u4f7f\u7528 llm.int8 () \u751f\u6210\u6587\u672c</a></li></ul>\n<h3><a href=\"evaluation/__init__.py\">\u8bc4\u4f30</a></h3>\n<li><a href=\"evaluation/half_precision.html\">\u5728\u5355\u4e2a GPU \u4e0a\u8bc4\u4f30\u534a\u7cbe\u5ea6\u6a21\u578b</a></li> <ul>\n<li><a href=\"evaluation/llm_int8.html\">\u6b63\u5728\u8bc4\u4f30 llm.int8 () \u6a21\u578b</a></li></ul>\n<p><strong>\u5b98\u65b9\u7684 <a href=\"https://www.eleuther.ai\">Eleuther</a> GPT-NOEX \u662f\u6e90\u4ee3\u7801\u53ef\u5728 <a href=\"https://github.com/eleutherai/gpt-neox\">eleutherai/gpt-neox</a> \u83b7\u5f97\u3002</strong></p>\n", "<h1>GPT-NeoX</h1>\n<p>This is a simple implementation of <a href=\"https://arxiv.org/abs/2204.06745\">Eleuther GPT-NeoX</a> for inference and fine-tuning.</p>\n<ul><li><a href=\"model.html\">Model definition</a> </li>\n<li><a href=\"tokenizer.html\">Tokenizer</a> </li>\n<li><a href=\"checkpoint.html\">Checkpoint downloading and loading helpers</a> </li>\n<li><a href=\"utils/index.html\">Utilities</a> </li>\n<li><a href=\"utils/llm_int8.html\">LLM.int8() quantization</a></li></ul>\n<h3><a href=\"samples/__init__.py\">Samples</a></h3>\n<ul><li><a href=\"samples/generate.html\">Generating text</a> </li>\n<li><a href=\"samples/finetune.html\">Fine-tuning the biases with pipeline-parallel</a> </li>\n<li><a href=\"samples/llm_int8.html\">Generating text with LLM.int8()</a></li></ul>\n<h3><a 
href=\"evaluation/__init__.py\">Evaluation</a></h3>\n<ul><li><a href=\"evaluation/half_precision.html\">Evaluating half precision model on a single GPU</a> </li>\n<li><a href=\"evaluation/llm_int8.html\">Evaluating LLM.int8() model</a></li></ul>\n<p><strong>Official <a href=\"https://www.eleuther.ai\">Eleuther</a> GPT-NoeX is source code is available at <a href=\"https://github.com/eleutherai/gpt-neox\">eleutherai/gpt-neox</a>.</strong></p>\n": "<h1>GPT-neox</h1>\n<p>\u8fd9\u662f Ele <a href=\"https://arxiv.org/abs/2204.06745\">uther GPT-NEOX</a> \u7684\u7b80\u5355\u5b9e\u73b0\uff0c\u7528\u4e8e\u63a8\u7406\u548c\u5fae\u8c03\u3002</p>\n<ul><li><a href=\"model.html\">\u578b\u53f7\u5b9a\u4e49</a></li>\n<li><a href=\"tokenizer.html\">\u5206\u8bcd\u5668</a></li>\n<li><a href=\"checkpoint.html\">\u68c0\u67e5\u70b9\u4e0b\u8f7d\u548c\u52a0\u8f7d\u52a9\u624b</a></li>\n<li><a href=\"utils/index.html\">\u516c\u5171\u4e8b\u4e1a</a></li>\n<li><a href=\"utils/llm_int8.html\">llm.int8 () \u91cf\u5316</a></li></ul>\n<h3><a href=\"samples/__init__.py\">\u6837\u54c1</a></h3>\n<ul><li><a href=\"samples/generate.html\">\u751f\u6210\u6587\u672c</a></li>\n<li><a href=\"samples/finetune.html\">\u4f7f\u7528\u7ba1\u9053\u5e73\u884c\u5fae\u8c03\u504f\u5dee</a></li>\n<li><a href=\"samples/llm_int8.html\">\u4f7f\u7528 llm.int8 () \u751f\u6210\u6587\u672c</a></li></ul>\n<h3><a href=\"evaluation/__init__.py\">\u8bc4\u4f30</a></h3>\n<li><a href=\"evaluation/half_precision.html\">\u5728\u5355\u4e2a GPU \u4e0a\u8bc4\u4f30\u534a\u7cbe\u5ea6\u6a21\u578b</a></li> <ul>\n<li><a href=\"evaluation/llm_int8.html\">\u6b63\u5728\u8bc4\u4f30 llm.int8 () \u6a21\u578b</a></li></ul>\n<p><strong>\u5b98\u65b9\u7684 <a href=\"https://www.eleuther.ai\">Eleuther</a> GPT-NOEX \u662f\u6e90\u4ee3\u7801\u53ef\u5728 <a href=\"https://github.com/eleutherai/gpt-neox\">eleutherai/gpt-neox</a> \u83b7\u5f97\u3002</strong></p>\n",
"GPT-NeoX": "GPT-neox", "GPT-NeoX": "GPT-neox",
"Simple GPT-NeoX implementation": "\u7b80\u5355\u7684 GPT-NEOX \u5b9e\u73b0" "Simple GPT-NeoX implementation": "\u7b80\u5355\u7684 GPT-NEOX \u5b9e\u73b0"
} }


@@ -4,7 +4,7 @@
"<h2>Embedding layer</h2>\n<p>This is a standard embeddings layer with code to load the checkpoint.</p>\n": "<h2>\u57cb\u3081\u8fbc\u307f\u30ec\u30a4\u30e4\u30fc</h2>\n<p>\u3053\u308c\u306f\u3001\u30c1\u30a7\u30c3\u30af\u30dd\u30a4\u30f3\u30c8\u3092\u30ed\u30fc\u30c9\u3059\u308b\u30b3\u30fc\u30c9\u3092\u542b\u3080\u6a19\u6e96\u306e\u57cb\u3081\u8fbc\u307f\u30ec\u30a4\u30e4\u30fc\u3067\u3059\u3002</p>\n", "<h2>Embedding layer</h2>\n<p>This is a standard embeddings layer with code to load the checkpoint.</p>\n": "<h2>\u57cb\u3081\u8fbc\u307f\u30ec\u30a4\u30e4\u30fc</h2>\n<p>\u3053\u308c\u306f\u3001\u30c1\u30a7\u30c3\u30af\u30dd\u30a4\u30f3\u30c8\u3092\u30ed\u30fc\u30c9\u3059\u308b\u30b3\u30fc\u30c9\u3092\u542b\u3080\u6a19\u6e96\u306e\u57cb\u3081\u8fbc\u307f\u30ec\u30a4\u30e4\u30fc\u3067\u3059\u3002</p>\n",
"<h2>Feedforward Network</h2>\n": "<h2>\u30d5\u30a3\u30fc\u30c9\u30d5\u30a9\u30ef\u30fc\u30c9\u30cd\u30c3\u30c8\u30ef\u30fc\u30af</h2>\n", "<h2>Feedforward Network</h2>\n": "<h2>\u30d5\u30a3\u30fc\u30c9\u30d5\u30a9\u30ef\u30fc\u30c9\u30cd\u30c3\u30c8\u30ef\u30fc\u30af</h2>\n",
"<h2>Final normalization layer</h2>\n": "<h2>\u6700\u7d42\u6b63\u898f\u5316\u30ec\u30a4\u30e4\u30fc</h2>\n", "<h2>Final normalization layer</h2>\n": "<h2>\u6700\u7d42\u6b63\u898f\u5316\u30ec\u30a4\u30e4\u30fc</h2>\n",
"<h2>Rotary Positional Embeddings</h2>\n<p>GPT-NeoX uses <a href=\"https://papers.labml.ai/paper/2104.09864\">rotary positional embeddings (RoPE)</a>.</p>\n<p>WE have annotated implementation of RoPE <a href=\"https://nn.labml.ai/transformers/rope/index.html\">here</a> with more notes the theory.</p>\n": "<h2>\u30ed\u30fc\u30bf\u30ea\u30fc\u30dd\u30b8\u30b7\u30e7\u30ca\u30eb\u30a8\u30f3\u30d9\u30c7\u30a3\u30f3\u30b0</h2>\n<p><a href=\"https://papers.labml.ai/paper/2104.09864\">GPT-Neox\u306f\u56de\u8ee2\u5f0f\u30dd\u30b8\u30b7\u30e7\u30ca\u30eb\u30a8\u30f3\u30d9\u30c7\u30a3\u30f3\u30b0</a>\uff08RoPE\uff09\u3092\u4f7f\u7528\u3057\u3066\u3044\u307e\u3059\u3002</p>\n<p><a href=\"https://nn.labml.ai/transformers/rope/index.html\">\u3053\u3053\u3067\u306f</a>\u3001RoPE \u306e\u5b9f\u88c5\u306b\u6ce8\u91c8\u3092\u4ed8\u3051\u3066\u3001\u7406\u8ad6\u306b\u95a2\u3059\u308b\u6ce8\u91c8\u3092\u4ed8\u3051\u307e\u3057\u305f\u3002</p>\n", "<h2>Rotary Positional Embeddings</h2>\n<p>GPT-NeoX uses <a href=\"https://arxiv.org/abs/2104.09864\">rotary positional embeddings (RoPE)</a>.</p>\n<p>WE have annotated implementation of RoPE <a href=\"https://nn.labml.ai/transformers/rope/index.html\">here</a> with more notes the theory.</p>\n": "<h2>\u30ed\u30fc\u30bf\u30ea\u30fc\u30dd\u30b8\u30b7\u30e7\u30ca\u30eb\u30a8\u30f3\u30d9\u30c7\u30a3\u30f3\u30b0</h2>\n<p><a href=\"https://arxiv.org/abs/2104.09864\">GPT-Neox\u306f\u56de\u8ee2\u5f0f\u30dd\u30b8\u30b7\u30e7\u30ca\u30eb\u30a8\u30f3\u30d9\u30c7\u30a3\u30f3\u30b0</a>\uff08RoPE\uff09\u3092\u4f7f\u7528\u3057\u3066\u3044\u307e\u3059\u3002</p>\n<p><a href=\"https://nn.labml.ai/transformers/rope/index.html\">\u3053\u3053\u3067\u306f</a>\u3001RoPE \u306e\u5b9f\u88c5\u306b\u6ce8\u91c8\u3092\u4ed8\u3051\u3066\u3001\u7406\u8ad6\u306b\u95a2\u3059\u308b\u6ce8\u91c8\u3092\u4ed8\u3051\u307e\u3057\u305f\u3002</p>\n",
"<h2>Transformer Layer</h2>\n": "<h2>\u5909\u5727\u5668\u5c64</h2>\n", "<h2>Transformer Layer</h2>\n": "<h2>\u5909\u5727\u5668\u5c64</h2>\n",
"<h3>Generator to create layers</h3>\n<p>The layers are generated in the same order as checkpoints.</p>\n<p>It gives <span translate=no>_^_0_^_</span> when a layer is not available; we use the layer indices as NeoX and there are two transformation layers we don&#x27;t need in our implementation.</p>\n<ul><li><span translate=no>_^_1_^_</span> is the number of tokens in the vocabulary </li>\n<li><span translate=no>_^_2_^_</span> is the number of features in the embeddings </li>\n<li><span translate=no>_^_3_^_</span> is the number of transformer layers </li>\n<li><span translate=no>_^_4_^_</span> is the number of attention heads </li>\n<li><span translate=no>_^_5_^_</span> are the set of layers to be used. All layers will be used if None. This is used to test smaller versions of the model with fewer layers </li>\n<li><span translate=no>_^_6_^_</span> specifies whether to clone the transformer layers (a bit faster) </li>\n<li><span translate=no>_^_7_^_</span> is the data type of the model </li>\n<li><span translate=no>_^_8_^_</span> is the device of the model </li>\n<li><span translate=no>_^_9_^_</span> specifies whether to use int8 quantization </li>\n<li><span translate=no>_^_10_^_</span> is the threshold <span translate=no>_^_11_^_</span> used to separate outlier features </li>\n<li><span translate=no>_^_12_^_</span> specifies whether to use <a href=\"https://github.com/HazyResearch/flash-attention\">FlashAttention</a></li></ul>\n": "<h3>\u30ec\u30a4\u30e4\u30fc\u3092\u4f5c\u6210\u3059\u308b\u305f\u3081\u306e\u30b8\u30a7\u30cd\u30ec\u30fc\u30bf\u30fc</h3>\n<p>\u30ec\u30a4\u30e4\u30fc\u306f\u30c1\u30a7\u30c3\u30af\u30dd\u30a4\u30f3\u30c8\u3068\u540c\u3058\u9806\u5e8f\u3067\u751f\u6210\u3055\u308c\u307e\u3059\u3002</p>\n<p><span translate=no>_^_0_^_</span>\u30ec\u30a4\u30e4\u30fc\u304c\u4f7f\u7528\u3067\u304d\u306a\u3044\u5834\u5408\u306b\u8fd4\u3055\u308c\u307e\u3059\u3002\u30ec\u30a4\u30e4\u30fc\u30a4\u30f3\u30c7\u30c3\u30af\u30b9\u3092NeoX\u3068\u3057\u3066\u4f7f\u7528\u3057\u3001\u5b9f\u88c5\u306b\u306f\u5fc5\u8981\u306e\u306a\u3044\u5909\u63db\u30ec\u30a4\u30e4\u30fc\u304c2\u3064\u3042\u308a\u307e\u3059\u3002</p>\n<ul><li><span translate=no>_^_1_^_</span>\u30dc\u30ad\u30e3\u30d6\u30e9\u30ea\u5185\u306e\u30c8\u30fc\u30af\u30f3\u306e\u6570\u3067\u3059</li>\n<li><span translate=no>_^_2_^_</span>\u306f\u57cb\u3081\u8fbc\u307f\u5185\u306e\u30d5\u30a3\u30fc\u30c1\u30e3\u306e\u6570\u3067\u3059</li>\n<li><span translate=no>_^_3_^_</span>\u5909\u5727\u5668\u5c64\u306e\u6570\u3067\u3059</li>\n<li><span translate=no>_^_4_^_</span>\u30a2\u30c6\u30f3\u30b7\u30e7\u30f3\u30fb\u30d8\u30c3\u30c9\u306e\u6570\u3067\u3059</li>\n<li><span translate=no>_^_5_^_</span>\u4f7f\u7528\u3059\u308b\u30ec\u30a4\u30e4\u30fc\u306e\u30bb\u30c3\u30c8\u3067\u3059\u3002None \u306e\u5834\u5408\u306f\u3059\u3079\u3066\u306e\u30ec\u30a4\u30e4\u30fc\u304c\u4f7f\u7528\u3055\u308c\u307e\u3059\u3002\u3053\u308c\u306f\u3001\u30ec\u30a4\u30e4\u30fc\u6570\u306e\u5c11\u306a\u3044\u30e2\u30c7\u30eb\u306e\u5c0f\u3055\u3044\u30d0\u30fc\u30b8\u30e7\u30f3\u3092\u30c6\u30b9\u30c8\u3059\u308b\u5834\u5408\u306b\u4f7f\u7528\u3057\u307e\u3059</li>\u3002\n<li><span translate=no>_^_6_^_</span>\u30c8\u30e9\u30f3\u30b9\u30d5\u30a9\u30fc\u30de\u30fc\u30ec\u30a4\u30e4\u30fc\u306e\u30af\u30ed\u30fc\u30f3\u3092\u4f5c\u6210\u3059\u308b\u304b\u3069\u3046\u304b\u3092\u6307\u5b9a\u3057\u307e\u3059 (\u5c11\u3057\u901f\u304f\u306a\u308a\u307e\u3059)</li>\n<li><span 
translate=no>_^_7_^_</span>\u30e2\u30c7\u30eb\u306e\u30c7\u30fc\u30bf\u578b\u3067\u3059</li>\n<li><span translate=no>_^_8_^_</span>\u30e2\u30c7\u30eb\u306e\u30c7\u30d0\u30a4\u30b9\u3067\u3059</li>\n<li><span translate=no>_^_9_^_</span>int8 \u91cf\u5b50\u5316\u3092\u4f7f\u7528\u3059\u308b\u304b\u3069\u3046\u304b\u3092\u6307\u5b9a\u3057\u307e\u3059</li>\n<li><span translate=no>_^_10_^_</span><span translate=no>_^_11_^_</span>\u5916\u308c\u5024\u306e\u7279\u5fb4\u3092\u5206\u96e2\u3059\u308b\u305f\u3081\u306e\u95be\u5024\u3067\u3059</li>\n<li><span translate=no>_^_12_^_</span><a href=\"https://github.com/HazyResearch/flash-attention\">\u30d5\u30e9\u30c3\u30b7\u30e5\u30a2\u30c6\u30f3\u30b7\u30e7\u30f3\u3092\u4f7f\u7528\u3059\u308b\u304b\u3069\u3046\u304b\u3092\u6307\u5b9a\u3057\u307e\u3059</a></li></ul>\n", "<h3>Generator to create layers</h3>\n<p>The layers are generated in the same order as checkpoints.</p>\n<p>It gives <span translate=no>_^_0_^_</span> when a layer is not available; we use the layer indices as NeoX and there are two transformation layers we don&#x27;t need in our implementation.</p>\n<ul><li><span translate=no>_^_1_^_</span> is the number of tokens in the vocabulary </li>\n<li><span translate=no>_^_2_^_</span> is the number of features in the embeddings </li>\n<li><span translate=no>_^_3_^_</span> is the number of transformer layers </li>\n<li><span translate=no>_^_4_^_</span> is the number of attention heads </li>\n<li><span translate=no>_^_5_^_</span> are the set of layers to be used. All layers will be used if None. This is used to test smaller versions of the model with fewer layers </li>\n<li><span translate=no>_^_6_^_</span> specifies whether to clone the transformer layers (a bit faster) </li>\n<li><span translate=no>_^_7_^_</span> is the data type of the model </li>\n<li><span translate=no>_^_8_^_</span> is the device of the model </li>\n<li><span translate=no>_^_9_^_</span> specifies whether to use int8 quantization </li>\n<li><span translate=no>_^_10_^_</span> is the threshold <span translate=no>_^_11_^_</span> used to separate outlier features </li>\n<li><span translate=no>_^_12_^_</span> specifies whether to use <a href=\"https://github.com/HazyResearch/flash-attention\">FlashAttention</a></li></ul>\n": "<h3>\u30ec\u30a4\u30e4\u30fc\u3092\u4f5c\u6210\u3059\u308b\u305f\u3081\u306e\u30b8\u30a7\u30cd\u30ec\u30fc\u30bf\u30fc</h3>\n<p>\u30ec\u30a4\u30e4\u30fc\u306f\u30c1\u30a7\u30c3\u30af\u30dd\u30a4\u30f3\u30c8\u3068\u540c\u3058\u9806\u5e8f\u3067\u751f\u6210\u3055\u308c\u307e\u3059\u3002</p>\n<p><span translate=no>_^_0_^_</span>\u30ec\u30a4\u30e4\u30fc\u304c\u4f7f\u7528\u3067\u304d\u306a\u3044\u5834\u5408\u306b\u8fd4\u3055\u308c\u307e\u3059\u3002\u30ec\u30a4\u30e4\u30fc\u30a4\u30f3\u30c7\u30c3\u30af\u30b9\u3092NeoX\u3068\u3057\u3066\u4f7f\u7528\u3057\u3001\u5b9f\u88c5\u306b\u306f\u5fc5\u8981\u306e\u306a\u3044\u5909\u63db\u30ec\u30a4\u30e4\u30fc\u304c2\u3064\u3042\u308a\u307e\u3059\u3002</p>\n<ul><li><span translate=no>_^_1_^_</span>\u30dc\u30ad\u30e3\u30d6\u30e9\u30ea\u5185\u306e\u30c8\u30fc\u30af\u30f3\u306e\u6570\u3067\u3059</li>\n<li><span translate=no>_^_2_^_</span>\u306f\u57cb\u3081\u8fbc\u307f\u5185\u306e\u30d5\u30a3\u30fc\u30c1\u30e3\u306e\u6570\u3067\u3059</li>\n<li><span translate=no>_^_3_^_</span>\u5909\u5727\u5668\u5c64\u306e\u6570\u3067\u3059</li>\n<li><span translate=no>_^_4_^_</span>\u30a2\u30c6\u30f3\u30b7\u30e7\u30f3\u30fb\u30d8\u30c3\u30c9\u306e\u6570\u3067\u3059</li>\n<li><span 
translate=no>_^_5_^_</span>\u4f7f\u7528\u3059\u308b\u30ec\u30a4\u30e4\u30fc\u306e\u30bb\u30c3\u30c8\u3067\u3059\u3002None \u306e\u5834\u5408\u306f\u3059\u3079\u3066\u306e\u30ec\u30a4\u30e4\u30fc\u304c\u4f7f\u7528\u3055\u308c\u307e\u3059\u3002\u3053\u308c\u306f\u3001\u30ec\u30a4\u30e4\u30fc\u6570\u306e\u5c11\u306a\u3044\u30e2\u30c7\u30eb\u306e\u5c0f\u3055\u3044\u30d0\u30fc\u30b8\u30e7\u30f3\u3092\u30c6\u30b9\u30c8\u3059\u308b\u5834\u5408\u306b\u4f7f\u7528\u3057\u307e\u3059</li>\u3002\n<li><span translate=no>_^_6_^_</span>\u30c8\u30e9\u30f3\u30b9\u30d5\u30a9\u30fc\u30de\u30fc\u30ec\u30a4\u30e4\u30fc\u306e\u30af\u30ed\u30fc\u30f3\u3092\u4f5c\u6210\u3059\u308b\u304b\u3069\u3046\u304b\u3092\u6307\u5b9a\u3057\u307e\u3059 (\u5c11\u3057\u901f\u304f\u306a\u308a\u307e\u3059)</li>\n<li><span translate=no>_^_7_^_</span>\u30e2\u30c7\u30eb\u306e\u30c7\u30fc\u30bf\u578b\u3067\u3059</li>\n<li><span translate=no>_^_8_^_</span>\u30e2\u30c7\u30eb\u306e\u30c7\u30d0\u30a4\u30b9\u3067\u3059</li>\n<li><span translate=no>_^_9_^_</span>int8 \u91cf\u5b50\u5316\u3092\u4f7f\u7528\u3059\u308b\u304b\u3069\u3046\u304b\u3092\u6307\u5b9a\u3057\u307e\u3059</li>\n<li><span translate=no>_^_10_^_</span><span translate=no>_^_11_^_</span>\u5916\u308c\u5024\u306e\u7279\u5fb4\u3092\u5206\u96e2\u3059\u308b\u305f\u3081\u306e\u95be\u5024\u3067\u3059</li>\n<li><span translate=no>_^_12_^_</span><a href=\"https://github.com/HazyResearch/flash-attention\">\u30d5\u30e9\u30c3\u30b7\u30e5\u30a2\u30c6\u30f3\u30b7\u30e7\u30f3\u3092\u4f7f\u7528\u3059\u308b\u304b\u3069\u3046\u304b\u3092\u6307\u5b9a\u3057\u307e\u3059</a></li></ul>\n",
"<h3>Generator to get layers</h3>\n": "<h3>\u30ec\u30a4\u30e4\u30fc\u3092\u53d6\u5f97\u3059\u308b\u305f\u3081\u306e\u30b8\u30a7\u30cd\u30ec\u30fc\u30bf\u30fc</h3>\n", "<h3>Generator to get layers</h3>\n": "<h3>\u30ec\u30a4\u30e4\u30fc\u3092\u53d6\u5f97\u3059\u308b\u305f\u3081\u306e\u30b8\u30a7\u30cd\u30ec\u30fc\u30bf\u30fc</h3>\n",

Some files were not shown because too many files have changed in this diff.
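Every hunk above makes the same mechanical substitution: a link of the form https://papers.labml.ai/paper/<arxiv-id> is replaced by https://arxiv.org/abs/<arxiv-id>, with the arXiv identifier itself left unchanged. As a rough illustration of how such a bulk rewrite could be scripted, here is a minimal sketch; the directory, glob, suffix list, and function name are illustrative assumptions, not the tool actually used for this commit.

# Hypothetical sketch of a bulk paper-URL rewrite over a local checkout.
# Assumes links use plain arXiv identifiers (e.g. 2105.14491), as in the hunks above.
import re
from pathlib import Path

# papers.labml.ai paths used arXiv identifiers, so the rewrite is a direct 1:1 mapping.
PAPER_LINK = re.compile(r"https://papers\.labml\.ai/paper/(\d{4}\.\d{4,5})")
ARXIV_LINK = r"https://arxiv.org/abs/\1"

def rewrite_paper_links(root: str, suffixes: tuple = (".json", ".html")) -> int:
    """Rewrite matching links in place and return the number of files changed."""
    changed = 0
    for path in Path(root).rglob("*"):
        if not path.is_file() or path.suffix not in suffixes:
            continue
        text = path.read_text(encoding="utf-8")
        new_text = PAPER_LINK.sub(ARXIV_LINK, text)
        if new_text != text:
            path.write_text(new_text, encoding="utf-8")
            changed += 1
    return changed

if __name__ == "__main__":
    # Running from the repository root reports how many files were rewritten.
    print(rewrite_paper_links("."))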