TiTok: A Transformer-based 1D Tokenization Approach for Image Generation

Author: Arjun Srivastava
Published: Thu 18 Jul 2024
Episode Link: https://arjunsriva.com/podcast/podcasts/2406.07550/

TiTok introduces a 1D tokenization method for image generation that represents an image with far fewer tokens than existing 2D grid-based methods while matching or surpassing their performance. The approach uses a Vision Transformer architecture and two-stage training with proxy codes, and it achieves substantial speedups in both training and inference. The research opens up new possibilities for efficient, high-quality image generation, with implications for applications in computer vision and beyond.
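
To make the token savings concrete, here is a rough back-of-the-envelope sketch. It assumes a 256x256 image, 2D grid tokenizers with downsampling factors of 8 and 16, and a fixed 1D sequence of 32 tokens (the smallest setting mentioned in the paper's abstract). The function and variable names are illustrative only and are not taken from the TiTok codebase.

```python
def grid_token_count(height: int, width: int, downsample: int) -> int:
    """Tokens from a 2D grid tokenizer: one token per cell of the latent grid."""
    return (height // downsample) * (width // downsample)


IMAGE_SIZE = 256        # assumed input resolution (256x256)
LATENT_TOKENS_1D = 32   # assumed fixed length of the 1D token sequence

for factor in (8, 16):  # assumed downsampling factors for the 2D baselines
    grid_tokens = grid_token_count(IMAGE_SIZE, IMAGE_SIZE, factor)
    ratio = grid_tokens // LATENT_TOKENS_1D
    print(f"downsample x{factor}: {grid_tokens} grid tokens "
          f"vs {LATENT_TOKENS_1D} 1D tokens ({ratio}x fewer)")
```

Because a 2D grid ties the token count to the image resolution, the sequence a generator has to model grows with the latent grid. A 1D tokenizer with a fixed sequence length breaks that link, which is where the training and inference speedups come from.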

Read full paper: https://arxiv.org/abs/2406.07550

Tags: Generative Models, Computer Vision, Transformers
