Files
2024-06-21 19:35:22 +05:30
..
2024-06-21 19:35:22 +05:30
2024-06-21 19:35:22 +05:30
2024-06-21 19:35:22 +05:30

This file contains ambiguous Unicode characters

This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

<!DOCTYPE html>
<html lang="zh">
<head>
    <meta http-equiv="content-type" content="text/html;charset=utf-8"/>
    <meta name="viewport" content="width=device-width, initial-scale=1.0"/>
    <meta name="description" content=""/>

    <meta name="twitter:card" content="summary"/>
    <meta name="twitter:image:src" content="https://avatars1.githubusercontent.com/u/64068543?s=400&amp;v=4"/>
    <meta name="twitter:title" content=" 视觉变压器 (ViT)"/>
    <meta name="twitter:description" content=""/>
    <meta name="twitter:site" content="@labmlai"/>
    <meta name="twitter:creator" content="@labmlai"/>

    <meta property="og:url" content="https://nn.labml.ai/transformers/vit/readme.html"/>
    <meta property="og:title" content=" 视觉变压器 (ViT)"/>
    <meta property="og:image" content="https://avatars1.githubusercontent.com/u/64068543?s=400&amp;v=4"/>
    <meta property="og:site_name" content=" 视觉变压器 (ViT)"/>
    <meta property="og:type" content="object"/>
    <meta property="og:title" content=" 视觉变压器 (ViT)"/>
    <meta property="og:description" content=""/>

    <title> 视觉变压器 (ViT)</title>
    <link rel="shortcut icon" href="/icon.png"/>
    <link rel="stylesheet" href="../../pylit.css?v=1">
    <link rel="canonical" href="https://nn.labml.ai/transformers/vit/readme.html"/>
    <link rel="stylesheet" href="https://cdn.jsdelivr.net/npm/katex@0.13.18/dist/katex.min.css" integrity="sha384-zTROYFVGOfTw7JV7KUu8udsvW2fx4lWOsCEDqhBreBwlHI4ioVRtmIvEThzJHGET" crossorigin="anonymous">

    <!-- Global site tag (gtag.js) - Google Analytics -->
    <script async src="https://www.googletagmanager.com/gtag/js?id=G-4V3HC8HBLH"></script>
    <script>
        window.dataLayer = window.dataLayer || [];

        function gtag() {
            dataLayer.push(arguments);
        }

        gtag('js', new Date());

        gtag('config', 'G-4V3HC8HBLH');
    </script>
</head>
<body>
<div id='container'>
    <div id="background"></div>
    <div class='section'>
        <div class='docs'>
            <p>
                <a class="parent" href="/">home</a>
                <a class="parent" href="../index.html">transformers</a>
                <a class="parent" href="index.html">vit</a>
            </p>
            <p>
                <a href="https://github.com/labmlai/annotated_deep_learning_paper_implementations" target="_blank">
                    <img alt="Github"
                         src="https://img.shields.io/github/stars/labmlai/annotated_deep_learning_paper_implementations?style=social"
                         style="max-width:100%;"/></a>
                <a href="https://twitter.com/labmlai" rel="nofollow" target="_blank">
                    <img alt="Twitter"
                         src="https://img.shields.io/twitter/follow/labmlai?style=social"
                         style="max-width:100%;"/></a>
            </p>
            <p>
                <a href="https://github.com/labmlai/annotated_deep_learning_paper_implementations/tree/master/labml_nn/transformers/vit/readme.md" target="_blank">
                    View code on Github</a>
            </p>
        </div>
    </div>
    <div class='section' id='section-0'>
        <div class='docs'>
            <div class='section-link'>
                <a href='#section-0'>#</a>
            </div>
            <h1><a href="https://nn.labml.ai/transformer/vit/index.html">视觉变压器 (ViT)</a></h1>
<p>这是论文《图像<a href="https://arxiv.org/abs/2010.11929">值得 16x16 Words大规模图像识别的变形金刚》的 PyTorc</a> <a href="https://pytorch.org">h</a> 实现。</p>
<p>视觉变换器将纯变换器应用于没有任何卷积层的图像。他们将图像拆分为补丁,然后在补丁嵌入上应用变换器。<a href="https://nn.labml.ai/transformer/vit/index.html#PathEmbeddings">补丁嵌入</a>是通过对面片的扁平化像素值应用简单的线性变换来生成的。然后将标准变压器编码器与补丁嵌入以及分类令牌一起馈送<code  class="highlight"><span></span><span class="p">[</span><span class="n">CLS</span><span class="p">]</span></code>
。<code  class="highlight"><span></span><span class="p">[</span><span class="n">CLS</span><span class="p">]</span></code>
令牌上的编码用于使用 MLP 对图像进行分类。</p>
向@@ <p>变压器提供补丁时,学习的位置嵌入会添加到补丁嵌入中,因为补丁嵌入没有关于补丁来自何处的任何信息。位置嵌入是每个面片位置的一组向量,这些向量通过梯度下降以及其他参数进行训练。</p>
<p>VIT 在大型数据集上进行预训练时表现良好。本文建议使用 MLP 分类头对它们进行预训练然后在微调时使用单个线性层。该论文在3亿张图像数据集上预先训练了ViT击败了SOTA。它们还在推理过程中使用更高分辨率的图像同时保持补丁大小不变。新面片位置的位置嵌入是通过插值学习位置嵌入来计算的。</p>
<p><a href="https://nn.labml.ai/transformer/vit/experiment.html">这是一个在 CIFAR-10 上训练 ViT 的实验</a>。这样做不太好因为它是在一个小数据集上训练的。这是一个简单的实验任何人都可以运行和玩VIT。</p>

        </div>
        <div class='code'>
            
        </div>
    </div>
    <div class='footer'>
        <a href="https://labml.ai">labml.ai</a>
    </div>
</div>
<script src=../../interactive.js?v=1"></script>
<script>
    function handleImages() {
        var images = document.querySelectorAll('p>img')

        for (var i = 0; i < images.length; ++i) {
            handleImage(images[i])
        }
    }

    function handleImage(img) {
        img.parentElement.style.textAlign = 'center'

        var modal = document.createElement('div')
        modal.id = 'modal'

        var modalContent = document.createElement('div')
        modal.appendChild(modalContent)

        var modalImage = document.createElement('img')
        modalContent.appendChild(modalImage)

        var span = document.createElement('span')
        span.classList.add('close')
        span.textContent = 'x'
        modal.appendChild(span)

        img.onclick = function () {
            console.log('clicked')
            document.body.appendChild(modal)
            modalImage.src = img.src
        }

        span.onclick = function () {
            document.body.removeChild(modal)
        }
    }

    handleImages()
</script>
</body>
</html>