SVQVAE (Scalable Vector Quantized Variational Autoencoder)
GitHub: https://github.com/Open-Model-Initiative/SVQVAE
A scalable Vector Quantized Variational Autoencoder (VQVAE) for high-resolution image generation and reconstruction. The model supports tiled processing to handle large images efficiently.
Model Description
SVQVAE is a scalable variant of the Vector Quantized Variational Autoencoder that can process high-resolution images through tiled encoding and decoding. The model uses a discrete codebook to compress images into a latent representation and can reconstruct them at multiple scales.
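The quantization step described above can be sketched as follows: each encoder output vector is replaced by its nearest entry in the discrete codebook. This is a minimal, hypothetical illustration of the general VQVAE mechanism, not the actual SVQVAE code; the function and variable names are invented for clarity.

```python
import numpy as np

def quantize(latents, codebook):
    """Map each latent vector to its nearest codebook entry (illustrative sketch).

    latents:  (N, D) array of continuous encoder outputs
    codebook: (K, D) array of discrete code vectors
    """
    # Squared Euclidean distance from every latent to every codebook entry.
    dists = ((latents[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    indices = dists.argmin(axis=1)       # nearest code index per latent
    return codebook[indices], indices    # quantized vectors + discrete codes

# Toy example: a 2-entry codebook in 2-D latent space.
codebook = np.array([[0.0, 0.0], [1.0, 1.0]])
latents = np.array([[0.1, -0.1], [0.9, 1.2]])
quantized, codes = quantize(latents, codebook)
```

The discrete `codes` are what make the representation compressible: an image is stored as a grid of small integer indices rather than floating-point feature maps.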
Key Features
- Scalable Processing: Handles high-resolution images through tiled processing
- Multi-scale Output: Can generate reconstructions at different scales
- Vector Quantization: Uses a discrete codebook for efficient compression
- Attention Mechanisms: Includes self-attention blocks for better feature learning
- Flexible Architecture: Configurable encoder/decoder with customizable channel multipliers
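The tiled-processing idea from the feature list can be illustrated with a short sketch: split a large image into fixed-size tiles, run each tile through the model independently, and stitch the outputs back together. This is a simplified, hypothetical example (the `process_tiled` helper and the identity stand-in for the model are assumptions, not the repository's actual API), and it ignores details a real implementation may add, such as tile overlap to avoid seam artifacts.

```python
import numpy as np

def process_tiled(image, tile=64, fn=lambda t: t):
    """Apply `fn` (a stand-in for encode+decode) tile by tile.

    image: (H, W) array; edge tiles are simply smaller via slicing.
    """
    h, w = image.shape[:2]
    out = np.empty_like(image)
    for y in range(0, h, tile):
        for x in range(0, w, tile):
            # Each tile is processed independently, bounding peak memory
            # by the tile size rather than the full image size.
            out[y:y + tile, x:x + tile] = fn(image[y:y + tile, x:x + tile])
    return out

# With the identity stand-in, the stitched output equals the input.
img = np.arange(128 * 128, dtype=np.float32).reshape(128, 128)
rec = process_tiled(img, tile=64)
```

Processing tiles independently keeps peak memory proportional to the tile size, which is what makes very high-resolution inputs tractable.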
Citation
If you use this code in your research, please cite Austin J. Bryant and the Open Model Initiative.
Acknowledgments
This implementation is based on the VQVAE architecture and includes improvements for scalable processing of high-resolution images.
Repository Links
- GitHub Repository: Open-Model-Initiative/SVQVAE
- Model Weights: Available in this Hugging Face repository
- Documentation: See the GitHub repository for detailed documentation and examples
This model is licensed under the OpenMDW License Agreement (see the LICENSE file).