mirror of
https://github.com/labmlai/annotated_deep_learning_paper_implementations.git
synced 2025-11-03 22:09:18 +08:00
fix conditional probability typo
@@ -5,24 +5,26 @@ This is an implementation of the paper
 [Generalization through Memorization: Nearest Neighbor Language Models](https://arxiv.org/abs/1911.00172).
 It uses k-nearest neighbors to improve perplexity of autoregressive transformer models.
 
-An autoregressive language model estimates $p(w_t, \color{yellowgreen}{c_t})$,
+An autoregressive language model estimates $p(w_t | \color{yellowgreen}{c_t})$,
 where $w_t$ is the token at step $t$
 and $c_t$ is the context, $\color{yellowgreen}{c_t} = (w_1, w_2, ..., w_{t-1})$.
 
-This paper, improves $p(w_t, c_t)$ using a k-nearest neighbor search
+This paper, improves $p(w_t | \color{yellowgreen}{c_t})$ using a k-nearest neighbor search
 on key-value pairs $\big(f(c_i), w_i\big)$, with search key $f(\color{yellowgreen}{c_t})$.
-Here $f(\color{yellowgreen}{c_t})$ is an embedding of the context $c_t$.
-The paper (and this implementation) uses the *input* to the feed-forward layer of the
-final layer of the transformer as $f(\color{yellowgreen}{c_t})$.
+Here $f(\color{yellowgreen}{c_t})$ is an embedding of the context $\color{yellowgreen}{c_t}$.
+The paper (and this implementation) uses the **input to the feed-forward layer of the
+final layer of the transformer** as $f(\color{yellowgreen}{c_t})$.
 
 We use [FAISS](https://github.com/facebookresearch/faiss) to index $f(c_i)$.
 
+### Implementation
+
 So to run $k$NN-LM we need to:
 
 * [Train a transformer model](train_model.html)
 * [Build an index](build_index.html) of $\big(f(c_i), w_i\big)$
 * [Evaluate kNN-ML](eval_knn.html) using $k$NN seach on $\big(f(c_i), w_i\big)$
-with $f(c_t)$
+with $f(\color{yellowgreen}{c_t})$
 
 This experiment uses a small dataset so that we can run this without using up a few hundred giga-bytes
 of disk space for the index.
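
The key-value lookup described in the docstring above can be sketched in a few lines of plain Python. This is a minimal illustration, not the repository's code: the function names `knn_probs` and `interpolate`, the toy keys and values, and the brute-force distance scan (standing in for a FAISS index) are all assumptions made here. The softmax over negative distances and the interpolation weight `lam` follow the kNN-LM paper rather than anything shown in this diff.

```python
import math


def knn_probs(query, keys, values, k, n_tokens):
    """Turn the k nearest stored contexts into a distribution over next tokens.

    query  : embedding f(c_t) of the current context
    keys   : stored embeddings f(c_i)
    values : token id w_i that followed each stored context
    """
    # Squared L2 distance from the query to every stored key (brute force)
    dists = [sum((q - x) ** 2 for q, x in zip(query, key)) for key in keys]
    # Indices of the k closest keys
    nearest = sorted(range(len(keys)), key=dists.__getitem__)[:k]
    # Softmax over negative distances, shifted by the minimum for stability
    d_min = dists[nearest[0]]
    weights = [math.exp(-(dists[i] - d_min)) for i in nearest]
    total = sum(weights)
    # Neighbours that stored the same token pool their weight
    probs = [0.0] * n_tokens
    for i, w in zip(nearest, weights):
        probs[values[i]] += w / total
    return probs


def interpolate(p_lm, p_knn, lam):
    """Mix the model distribution with the kNN distribution (weight lam on kNN)."""
    return [lam * pk + (1.0 - lam) * pl for pl, pk in zip(p_lm, p_knn)]


# Toy example: three stored contexts in a 2-d embedding space, vocabulary of 3 tokens
keys = [[0.0, 0.0], [1.0, 0.0], [5.0, 5.0]]
values = [2, 2, 0]
p_knn = knn_probs([0.1, 0.0], keys, values, k=2, n_tokens=3)
p = interpolate([0.5, 0.3, 0.2], p_knn, lam=0.25)
print(p_knn, p)
```

Both of the two nearest keys here store token 2, so the kNN distribution puts all its mass on that token and the interpolation shifts the model's prediction toward it. With a real datastore the scan over `keys` would be replaced by an approximate nearest-neighbor index.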