TiTok: A Transformer-based 1D Tokenization Approach for Image Generation

Author
Arjun Srivastava
Published
Thu 18 Jul 2024
Episode Link
https://arjunsriva.com/podcast/podcasts/2406.07550/

TiTok introduces a novel 1D tokenization method for image generation, representing images with significantly fewer tokens than existing 2D grid-based methods while matching or surpassing their performance. The approach uses a Vision Transformer architecture and a two-stage training scheme with proxy codes, and it delivers substantial speedups in both training and inference. The research opens up new possibilities for efficient, high-quality image generation, with implications for a range of computer vision applications and beyond.
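The core idea from the summary — append a small set of learnable 1D latent tokens to the image's patch sequence, keep only those latent outputs after the transformer, and quantize them against a codebook — can be sketched in a few lines of numpy. This is a toy illustration under assumed sizes (32×32 image, 8 latent tokens, 64 codebook entries) with a single attention layer standing in for the full ViT stack; it is not the paper's actual architecture or trained weights.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy sizes (assumptions for illustration, not the paper's settings)
IMG, PATCH, DIM, K, CODEBOOK = 32, 8, 16, 8, 64

def patchify(img):
    """Split an (IMG, IMG, 3) image into flattened patch vectors."""
    n = IMG // PATCH
    patches = img.reshape(n, PATCH, n, PATCH, 3).transpose(0, 2, 1, 3, 4)
    return patches.reshape(n * n, -1)  # (num_patches, PATCH*PATCH*3)

# Stand-ins for learned parameters
W_embed = rng.normal(size=(PATCH * PATCH * 3, DIM)) * 0.02
latent_tokens = rng.normal(size=(K, DIM))    # learnable 1D latent tokens
codebook = rng.normal(size=(CODEBOOK, DIM))  # vector-quantization codebook

def attention(x):
    """Single softmax self-attention mixing layer (stand-in for a ViT)."""
    scores = x @ x.T / np.sqrt(DIM)
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)
    return w @ x

def tokenize(img):
    patch_emb = patchify(img) @ W_embed               # embed image patches
    seq = np.concatenate([patch_emb, latent_tokens])  # patches + latent tokens
    out = attention(seq)
    z = out[-K:]  # keep ONLY the K latent outputs -> a 1D token sequence
    # Quantize each latent to its nearest codebook entry (discrete token id)
    dists = ((z[:, None, :] - codebook[None]) ** 2).sum(axis=-1)
    return dists.argmin(axis=1)  # (K,) integer ids

ids = tokenize(rng.normal(size=(IMG, IMG, 3)))
print(ids.shape)  # a 32x32 image compressed to just K=8 discrete tokens
```

The key contrast with 2D grid tokenizers is visible in the last line: the number of output tokens is decoupled from the patch grid, so the image is summarized by however many latent tokens are appended, not by one token per patch.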

Read full paper: https://arxiv.org/abs/2406.07550

Tags: Generative Models, Computer Vision, Transformers

Share to: