SVQVAE (Scalable Vector Quantized Variational Autoencoder)
GitHub: https://github.com/Open-Model-Initiative/SVQVAE
A scalable Vector Quantized Variational Autoencoder (VQVAE) for high-resolution image generation and reconstruction. The model supports tiled processing to handle large images efficiently.
Model Description
SVQVAE is a scalable variant of the Vector Quantized Variational Autoencoder that can process high-resolution images through tiled encoding and decoding. The model uses a discrete codebook to compress images into a latent representation and can reconstruct them at multiple scales.
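The quantization step described above can be sketched as follows: each encoder output vector is replaced by its nearest entry in the discrete codebook. This is a minimal, hypothetical illustration of the general VQVAE mechanism, not the actual SVQVAE code; the function and variable names are invented for clarity.

```python
import numpy as np

def quantize(latents, codebook):
    """Map each latent vector to its nearest codebook entry (illustrative sketch).

    latents:  (N, D) array of continuous encoder outputs
    codebook: (K, D) array of discrete code vectors
    """
    # Squared Euclidean distance from every latent to every codebook entry.
    dists = ((latents[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    indices = dists.argmin(axis=1)       # nearest code index per latent
    return codebook[indices], indices    # quantized vectors + discrete codes

# Toy example: a 2-entry codebook in 2-D latent space.
codebook = np.array([[0.0, 0.0], [1.0, 1.0]])
latents = np.array([[0.1, -0.1], [0.9, 1.2]])
quantized, codes = quantize(latents, codebook)
```

The discrete `codes` are what make the representation compressible: an image is stored as a grid of small integer indices rather than floating-point feature maps.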
Key Features
- Scalable Processing: Handles high-resolution images through tiled processing
- Multi-scale Output: Can generate reconstructions at different scales
- Vector Quantization: Uses a discrete codebook for efficient compression
- Attention Mechanisms: Includes self-attention blocks for better feature learning
- Flexible Architecture: Configurable encoder/decoder with customizable channel multipliers
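The tiled-processing idea from the feature list can be illustrated with a short sketch: split a large image into fixed-size tiles, run each tile through the model independently, and stitch the outputs back together. This is a simplified, hypothetical example (the `process_tiled` helper and the identity stand-in for the model are assumptions, not the repository's actual API), and it ignores details a real implementation may add, such as tile overlap to avoid seam artifacts.

```python
import numpy as np

def process_tiled(image, tile=64, fn=lambda t: t):
    """Apply `fn` (a stand-in for encode+decode) tile by tile.

    image: (H, W) array; edge tiles are simply smaller via slicing.
    """
    h, w = image.shape[:2]
    out = np.empty_like(image)
    for y in range(0, h, tile):
        for x in range(0, w, tile):
            # Each tile is processed independently, bounding peak memory
            # by the tile size rather than the full image size.
            out[y:y + tile, x:x + tile] = fn(image[y:y + tile, x:x + tile])
    return out

# With the identity stand-in, the stitched output equals the input.
img = np.arange(128 * 128, dtype=np.float32).reshape(128, 128)
rec = process_tiled(img, tile=64)
```

Processing tiles independently keeps peak memory proportional to the tile size, which is what makes very high-resolution inputs tractable.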
Citation
If you use this code in your research, please cite Austin J. Bryant and the Open Model Initiative.
Acknowledgments
This implementation is based on the VQVAE architecture and includes improvements for scalable processing of high-resolution images.
Repository Links
- GitHub Repository: Open-Model-Initiative/SVQVAE
- Model Weights: Available in this Hugging Face repository
- Documentation: See the GitHub repository for detailed documentation and examples
This model is licensed under the OpenMDW License Agreement (see the LICENSE file).