DeiT
DeiT (Data-efficient Image Transformers) is a Vision Transformer (ViT) variant that reaches high image-classification accuracy by training with a distillation token, substantially reducing the training time and data the original ViT required.
DeiT, introduced by Hugo Touvron et al., addresses the data hunger of the original Vision Transformer (ViT): it delivers competitive, convolution-free performance without massive external pre-training datasets. The core innovation is a **distillation token**, an extra learned token used in a teacher-student strategy. The student transformer is trained to match the teacher's predictions through this token, which efficiently transfers inductive bias, often from a ConvNet teacher, to the ViT architecture. This method allows training on ImageNet-1k only, in less than three days on a single machine. The distilled model reaches up to **85.2% top-1 accuracy**, making high-performance vision transformers accessible and resource-efficient.
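The teacher-student objective can be illustrated with a minimal sketch. It assumes the hard-label variant of distillation described in the DeiT paper: the class-token head is trained against the true label, the distillation-token head against the teacher's top prediction, and the two softmax outputs are summed at inference. The function names (`hard_distillation_loss`, `predict`) and the use of plain Python lists for logits are illustrative choices, not part of any library.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, target):
    """Cross-entropy of one sample against an integer class index."""
    return -math.log(softmax(logits)[target])


def argmax(values):
    """Index of the largest value."""
    return max(range(len(values)), key=values.__getitem__)


def hard_distillation_loss(cls_logits, dist_logits, teacher_logits, label):
    """Hypothetical helper: average of two cross-entropy terms.

    cls_logits     -- student output read from the class token
    dist_logits    -- student output read from the distillation token
    teacher_logits -- output of the (frozen) teacher, e.g. a ConvNet
    label          -- ground-truth class index
    """
    teacher_label = argmax(teacher_logits)  # teacher's hard decision
    return (0.5 * cross_entropy(cls_logits, label)
            + 0.5 * cross_entropy(dist_logits, teacher_label))


def predict(cls_logits, dist_logits):
    """Late fusion at inference: sum the two heads' softmax outputs."""
    fused = [a + b for a, b in zip(softmax(cls_logits), softmax(dist_logits))]
    return argmax(fused)
```

In a real model the two logit vectors would come from linear heads applied to the final embeddings of the class token and the distillation token; here they are passed in directly so the loss itself stays visible.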