Chinese translation

Varuna Jayasiri
2024-08-16 16:35:25 +05:30
parent edf875aa70
commit f3465ac926
17 changed files with 1651 additions and 58 deletions


@@ -0,0 +1,25 @@
{
"<h1>Low-Rank Adaptation (LoRA)</h1>\n<p>This is an implementation of <a href=\"https://arxiv.org/abs/2106.09685\">Low-Rank Adaptation (LoRA)</a> in <a href=\"https://pytorch.org\">PyTorch</a>.</p>\n<p>Low-Rank Adaptation (LoRA) freezes pre-trained model weights and injects trainable rank decomposition matrices into each layer of the transformer. This makes it possible to efficiently fine-tune large langauge models by reducing trainable parameters by a large factor.</p>\n<p>Here&#x27;s <a href=\"experiment.html\">the training code</a> for training a GPT2 model with LoRA on Tiny Shakespeare dataset.</p>\n": "<h1>Low-Rank Adaptation (LoRA)</h1>\n<p>This is an implementation of <a href=\"https://arxiv.org/abs/2106.09685\">Low-Rank Adaptation (LoRA)</a> in <a href=\"https://pytorch.org\">PyTorch</a>.</p>\n<p>Low-Rank Adaptation (LoRA) freezes pre-trained model weights and injects trainable rank decomposition matrices into each layer of the transformer. This makes it possible to efficiently fine-tune large langauge models by reducing trainable parameters by a large factor.</p>\n<p>Here&#x27;s <a href=\"experiment.html\">the training code</a> for training a GPT2 model with LoRA on Tiny Shakespeare dataset.</p>\n",
"<h2>LoRA Embedding Layer</h2>\n<p>Similar to LoRA linear layer this adds a low-rank decomposition to the pre-trained embedding weights matrix (<span translate=no>_^_0_^_</span>).</p>\n<p><span translate=no>_^_1_^_</span></p>\n": "<h2>LoRA Embedding Layer</h2>\n<p>Similar to LoRA linear layer this adds a low-rank decomposition to the pre-trained embedding weights matrix (<span translate=no>_^_0_^_</span>).</p>\n<p><span translate=no>_^_1_^_</span></p>\n",
"<h2>LoRA Linear Layer</h2>\n<p>LoRA linear layer adds a low-rank decomposition to the pre-trained weight matrix (<span translate=no>_^_0_^_</span>) of the linear layer.</p>\n<p><span translate=no>_^_1_^_</span></p>\n<p>, where <span translate=no>_^_2_^_</span>, <span translate=no>_^_3_^_</span>, and the rank <span translate=no>_^_4_^_</span>.</p>\n<p>All parameters are frozen except <span translate=no>_^_5_^_</span> and <span translate=no>_^_6_^_</span>.</p>\n<p><span translate=no>_^_7_^_</span> is initialized to be zero at the beginning of the training.</p>\n<p>They multiple <span translate=no>_^_8_^_</span> by <span translate=no>_^_9_^_</span> where <span translate=no>_^_10_^_</span> is a hyper-parameter. Once <span translate=no>_^_11_^_</span> is tuned it can be kept the same when varying <span translate=no>_^_12_^_</span>.</p>\n": "<h2>LoRA Linear Layer</h2>\n<p>LoRA linear layer adds a low-rank decomposition to the pre-trained weight matrix (<span translate=no>_^_0_^_</span>) of the linear layer.</p>\n<p><span translate=no>_^_1_^_</span></p>\n<p>, where <span translate=no>_^_2_^_</span>, <span translate=no>_^_3_^_</span>, and the rank <span translate=no>_^_4_^_</span>.</p>\n<p>All parameters are frozen except <span translate=no>_^_5_^_</span> and <span translate=no>_^_6_^_</span>.</p>\n<p><span translate=no>_^_7_^_</span> is initialized to be zero at the beginning of the training.</p>\n<p>They multiple <span translate=no>_^_8_^_</span> by <span translate=no>_^_9_^_</span> where <span translate=no>_^_10_^_</span> is a hyper-parameter. Once <span translate=no>_^_11_^_</span> is tuned it can be kept the same when varying <span translate=no>_^_12_^_</span>.</p>\n",
"<p> </p>\n": "<p> </p>\n",
"<p>Add <span translate=no>_^_0_^_</span> </p>\n": "<p>Add <span translate=no>_^_0_^_</span> </p>\n",
"<p>Bias parameter <span translate=no>_^_0_^_</span> (also frozen) </p>\n": "<p>Bias parameter <span translate=no>_^_0_^_</span> (also frozen) </p>\n",
"<p>Compute <span translate=no>_^_0_^_</span> </p>\n": "<p>Compute <span translate=no>_^_0_^_</span> </p>\n",
"<p>Compute the embeddings <span translate=no>_^_0_^_</span> </p>\n": "<p>Compute the embeddings <span translate=no>_^_0_^_</span> </p>\n",
"<p>Freeze it </p>\n": "<p>Freeze it </p>\n",
"<p>Initialize <span translate=no>_^_0_^_</span> similar to a weight matrix in a normal linear layer </p>\n": "<p>Initialize <span translate=no>_^_0_^_</span> similar to a weight matrix in a normal linear layer </p>\n",
"<p>Initialize <span translate=no>_^_0_^_</span> to <span translate=no>_^_1_^_</span> so that <span translate=no>_^_2_^_</span> is <span translate=no>_^_3_^_</span> at initialization </p>\n": "<p>Initialize <span translate=no>_^_0_^_</span> to <span translate=no>_^_1_^_</span> so that <span translate=no>_^_2_^_</span> is <span translate=no>_^_3_^_</span> at initialization </p>\n",
"<p>Initialize <span translate=no>_^_0_^_</span> with a normal distribution </p>\n": "<p>Initialize <span translate=no>_^_0_^_</span> with a normal distribution </p>\n",
"<p>Matrix <span translate=no>_^_0_^_</span> </p>\n": "<p>Matrix <span translate=no>_^_0_^_</span> </p>\n",
"<p>Matrix <span translate=no>_^_0_^_</span>, we keep <span translate=no>_^_1_^_</span> and <span translate=no>_^_2_^_</span> transposed </p>\n": "<p>Matrix <span translate=no>_^_0_^_</span>, we keep <span translate=no>_^_1_^_</span> and <span translate=no>_^_2_^_</span> transposed </p>\n",
"<p>No bias parameter </p>\n": "<p>No bias parameter </p>\n",
"<p>Set <span translate=no>_^_0_^_</span> is not provided. i.e. make the scaling factor <span translate=no>_^_1_^_</span>. </p>\n": "<p>Set <span translate=no>_^_0_^_</span> is not provided. i.e. make the scaling factor <span translate=no>_^_1_^_</span>. </p>\n",
"<p>The pre-trained embedding weights <span translate=no>_^_0_^_</span> (frozen) </p>\n": "<p>The pre-trained embedding weights <span translate=no>_^_0_^_</span> (frozen) </p>\n",
"<p>The pre-trained weight <span translate=no>_^_0_^_</span> </p>\n": "<p>The pre-trained weight <span translate=no>_^_0_^_</span> </p>\n",
"<p>scaling factor <span translate=no>_^_0_^_</span> </p>\n": "<p>scaling factor <span translate=no>_^_0_^_</span> </p>\n",
"<ul><li><span translate=no>_^_0_^_</span> is the number of embeddings </li>\n<li><span translate=no>_^_1_^_</span> is the number embedding dimensions </li>\n<li><span translate=no>_^_2_^_</span> is the rank of the decomposition <span translate=no>_^_3_^_</span> </li>\n<li><span translate=no>_^_4_^_</span> is the scaling factor <span translate=no>_^_5_^_</span></li></ul>\n": "<ul><li><span translate=no>_^_0_^_</span> is the number of embeddings </li>\n<li><span translate=no>_^_1_^_</span> is the number embedding dimensions </li>\n<li><span translate=no>_^_2_^_</span> is the rank of the decomposition <span translate=no>_^_3_^_</span> </li>\n<li><span translate=no>_^_4_^_</span> is the scaling factor <span translate=no>_^_5_^_</span></li></ul>\n",
"<ul><li><span translate=no>_^_0_^_</span> is the number of input features of the linear layer </li>\n<li><span translate=no>_^_1_^_</span> is the number of output features of the linear layer </li>\n<li><span translate=no>_^_2_^_</span> is a flag indicating if there is a bias parameter </li>\n<li><span translate=no>_^_3_^_</span> is the rank of the decomposition <span translate=no>_^_4_^_</span> </li>\n<li><span translate=no>_^_5_^_</span> is the scaling factor <span translate=no>_^_6_^_</span></li></ul>\n": "<ul><li><span translate=no>_^_0_^_</span> is the number of input features of the linear layer </li>\n<li><span translate=no>_^_1_^_</span> is the number of output features of the linear layer </li>\n<li><span translate=no>_^_2_^_</span> is a flag indicating if there is a bias parameter </li>\n<li><span translate=no>_^_3_^_</span> is the rank of the decomposition <span translate=no>_^_4_^_</span> </li>\n<li><span translate=no>_^_5_^_</span> is the scaling factor <span translate=no>_^_6_^_</span></li></ul>\n",
"Annotated implementation of RoRA from paper LoRA: Low-Rank Adaptation of Large Language Models": "Annotated implementation of RoRA from paper LoRA: Low-Rank Adaptation of Large Language Models",
"Low-Rank Adaptation (LoRA)": "Low-Rank Adaptation (LoRA)"
}
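
The strings above describe the core of the LoRA linear layer: the pre-trained weight and bias are frozen, only the low-rank matrices A and B are trained, B starts at zero so the update is zero at initialization, and the update is scaled by alpha / r. A minimal PyTorch sketch of that idea follows; it is an illustration rather than the exact code in labml_nn/lora/__init__.py, and names such as LoRALinear and its constructor arguments are assumptions.

```python
import math
import torch
import torch.nn as nn
import torch.nn.functional as F


class LoRALinear(nn.Module):
    """Minimal LoRA linear layer sketch (names and defaults are illustrative)."""

    def __init__(self, in_features: int, out_features: int, r: int,
                 alpha: float = None, bias: bool = True):
        super().__init__()
        # Set alpha = r if it is not provided, i.e. make the scaling factor alpha / r = 1
        if alpha is None:
            alpha = r
        self.scaling = alpha / r

        # Pre-trained weight W0 and bias b, both frozen; in practice these are
        # loaded from the pre-trained checkpoint rather than randomly initialized
        self.weight = nn.Parameter(torch.randn(out_features, in_features) * 0.02,
                                   requires_grad=False)
        self.bias = nn.Parameter(torch.zeros(out_features), requires_grad=False) if bias else None

        # Trainable low-rank matrices: A is (r, in_features), B is (out_features, r)
        self.lora_a = nn.Parameter(torch.empty(r, in_features))
        self.lora_b = nn.Parameter(torch.zeros(out_features, r))  # zero init, so B A = 0 at start
        # Initialize A like a weight matrix in a normal linear layer
        nn.init.kaiming_uniform_(self.lora_a, a=math.sqrt(5))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Frozen path: x W0^T + b
        out = F.linear(x, self.weight, self.bias)
        # Low-rank update, scaled by alpha / r: x A^T B^T
        return out + self.scaling * (x @ self.lora_a.T @ self.lora_b.T)
```

Only lora_a and lora_b receive gradients, so the trainable parameter count drops from in_features × out_features to r × (in_features + out_features); the LoRA embedding layer described in the same file applies the same decomposition to the embedding weight matrix.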


@@ -0,0 +1,24 @@
{
"<h1>Finetune GPT-2 with <a href=\"index.html\">LoRA</a></h1>\n<p>Here&#x27;s a Colab notebook for training a feedback transformer on Tiny Shakespeare dataset.</p>\n<p><a href=\"https://colab.research.google.com/github/labmlai/annotated_deep_learning_paper_implementations/blob/master/labml_nn/lora/experiment.ipynb\"><span translate=no>_^_0_^_</span></a></p>\n": "<h1>Finetune GPT-2 with <a href=\"index.html\">LoRA</a></h1>\n<p>Here&#x27;s a Colab notebook for training a feedback transformer on Tiny Shakespeare dataset.</p>\n<p><a href=\"https://colab.research.google.com/github/labmlai/annotated_deep_learning_paper_implementations/blob/master/labml_nn/lora/experiment.ipynb\"><span translate=no>_^_0_^_</span></a></p>\n",
"<h2>Trainer configurations and the training loop</h2>\n<p>The default configs can and will be over-ridden when we start the experiment</p>\n": "<h2>Trainer configurations and the training loop</h2>\n<p>The default configs can and will be over-ridden when we start the experiment</p>\n",
"<h3>Initialize the model, optimizer and dataloader</h3>\n": "<h3>Initialize the model, optimizer and dataloader</h3>\n",
"<h3>Load pre-trained <a href=\"https://huggingface.co/openai-community/gpt2\">GPT-2 from huggingface</a></h3>\n": "<h3>Load pre-trained <a href=\"https://huggingface.co/openai-community/gpt2\">GPT-2 from huggingface</a></h3>\n",
"<h3>Tiny Shakespeare dataset</h3>\n<p>It will download from the url if not present</p>\n": "<h3>Tiny Shakespeare dataset</h3>\n<p>It will download from the url if not present</p>\n",
"<h3>Training loop</h3>\n": "<h3>Training loop</h3>\n",
"<p>Dataset </p>\n": "<p>Dataset </p>\n",
"<p>GPT-2 configs </p>\n": "<p>GPT-2 configs </p>\n",
"<p>GPT-2 hugging face uses 1D Convolution layers. We need to transpose those weights since we use linear layers </p>\n": "<p>GPT-2 hugging face uses 1D Convolution layers. We need to transpose those weights since we use linear layers </p>\n",
"<p>Initialize the data loader </p>\n": "<p>Initialize the data loader </p>\n",
"<p>Initialize the model </p>\n": "<p>Initialize the model </p>\n",
"<p>Initialize the optimizer </p>\n": "<p>Initialize the optimizer </p>\n",
"<p>LoRA rank </p>\n": "<p>LoRA rank </p>\n",
"<p>Load out model </p>\n": "<p>Load out model </p>\n",
"<p>Load pre-trained model weights </p>\n": "<p>Load pre-trained model weights </p>\n",
"<p>Load the huggingface model and get the parameters </p>\n": "<p>Load the huggingface model and get the parameters </p>\n",
"<p>Mapping (<span translate=no>_^_0_^_</span>) of decoder layers </p>\n": "<p>Mapping (<span translate=no>_^_0_^_</span>) of decoder layers </p>\n",
"<p>Move the parameters based on mapping </p>\n": "<p>Move the parameters based on mapping </p>\n",
"<p>Training configs </p>\n": "<p>Training configs </p>\n",
"<p>Transformer embedding and prediction layer parameter mapping (<span translate=no>_^_0_^_</span>) </p>\n": "<p>Transformer embedding and prediction layer parameter mapping (<span translate=no>_^_0_^_</span>) </p>\n",
"Finetune GPT-2 with LoRA": "Finetune GPT-2 with LoRA",
"This is training code with notes for fine-tuning pre-trained GPT-2 model with LoRA.": "This is training code with notes for fine-tuning pre-trained GPT-2 model with LoRA."
}
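
One detail in the experiment strings above is that Hugging Face's GPT-2 stores its attention and MLP projections as Conv1D modules, whose weight tensors are the transpose of what nn.Linear expects, so they have to be transposed when copied into a linear-layer model. The sketch below illustrates that step under stated assumptions: the target `model` and its exact parameter-name remapping are hypothetical, and only the transpose itself is shown.

```python
import torch
from transformers import GPT2LMHeadModel

# Load the pre-trained Hugging Face model and get its parameters
hf_model = GPT2LMHeadModel.from_pretrained("gpt2")
hf_state = hf_model.state_dict()

# Hugging Face GPT-2 Conv1D weights are stored as (in_features, out_features);
# nn.Linear expects (out_features, in_features), so these tensors are transposed.
conv1d_suffixes = ("attn.c_attn.weight", "attn.c_proj.weight",
                   "mlp.c_fc.weight", "mlp.c_proj.weight")

new_state = {}
for name, param in hf_state.items():
    if name.endswith(conv1d_suffixes):
        new_state[name] = param.t()  # transpose Conv1D weights for linear layers
    else:
        new_state[name] = param

# `model` stands for our own GPT-2 implementation built from linear layers;
# in practice the parameter names are also remapped to its module names before loading.
# model.load_state_dict(new_state)
```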


@@ -0,0 +1,16 @@
{
"<p> Splits hidden_size dim into attn_head_size and num_heads</p>\n": "<p> Splits hidden_size dim into attn_head_size and num_heads</p>\n",
"<p>Add position embeddings </p>\n": "<p>Add position embeddings </p>\n",
"<p>Final normalization </p>\n": "<p>Final normalization </p>\n",
"<p>Get logits from projection layer </p>\n": "<p>Get logits from projection layer </p>\n",
"<p>Get position embeddings </p>\n": "<p>Get position embeddings </p>\n",
"<p>Get position ids </p>\n": "<p>Get position ids </p>\n",
"<p>Get token embeddings </p>\n": "<p>Get token embeddings </p>\n",
"<p>Run through transformer blocks </p>\n": "<p>Run through transformer blocks </p>\n",
"<p>lin1 </p>\n": "<p>lin1 </p>\n",
"<p>lin2 </p>\n": "<p>lin2 </p>\n",
"<p>out </p>\n": "<p>out </p>\n",
"<p>qkv </p>\n": "<p>qkv </p>\n",
"<ul><li><span translate=no>_^_0_^_</span> has shape <span translate=no>_^_1_^_</span></li></ul>\n": "<ul><li><span translate=no>_^_0_^_</span> has shape <span translate=no>_^_1_^_</span></li></ul>\n",
"gpt2.py": "gpt2.py"
}
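
The gpt2.py strings above outline the model's forward pass (token embeddings plus position embeddings, the transformer blocks, a final normalization, and the projection to logits) and the attention helper that splits the hidden_size dimension into num_heads and attn_head_size. A small, self-contained sketch of that head-splitting step, with illustrative names and GPT-2 small sizes, is shown below.

```python
import torch


def split_heads(x: torch.Tensor, num_heads: int) -> torch.Tensor:
    """Splits the hidden_size dim into (num_heads, attn_head_size).

    x: (batch, seq_len, hidden_size) -> (batch, num_heads, seq_len, attn_head_size)
    """
    batch, seq_len, hidden_size = x.shape
    attn_head_size = hidden_size // num_heads
    x = x.view(batch, seq_len, num_heads, attn_head_size)
    # Move the head dimension in front of the sequence dimension
    return x.permute(0, 2, 1, 3)


# Example: 12 heads over a 768-dimensional hidden state (GPT-2 small)
q = torch.randn(2, 10, 768)
print(split_heads(q, num_heads=12).shape)  # torch.Size([2, 12, 10, 64])
```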