# trainfarren/john-welbourne-csm-1b - 4-bit Quantized
This is a 4-bit quantized version of [trainfarren/john-welbourne-csm-1b](https://huggingface.co/trainfarren/john-welbourne-csm-1b), built for faster, lower-memory inference.
## Model Description
The weights were quantized to 4-bit with bitsandbytes (configured through `BitsAndBytesConfig` in transformers) to cut memory usage and speed up inference while largely preserving output quality.
## Quantization Details

- **Quantization Type:** 4-bit (bitsandbytes)
- **Original Model:** [trainfarren/john-welbourne-csm-1b](https://huggingface.co/trainfarren/john-welbourne-csm-1b)
- **Expected Speed Improvement:** 2-4x faster inference
- **Memory Reduction:** ~50-75% less VRAM usage

A sketch of a comparable quantization pass follows.
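For reference, producing a checkpoint like this one typically looks like the sketch below. The specific `BitsAndBytesConfig` options shown (quant type, compute dtype) are assumptions for illustration, not the settings actually used for this repository:

```python
from transformers import CsmForConditionalGeneration, BitsAndBytesConfig
import torch

# Hypothetical reproduction of the quantization step; the exact options
# used for this checkpoint are not documented in this card.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",             # assumption: NF4 quantization
    bnb_4bit_compute_dtype=torch.float16,  # assumption: fp16 compute dtype
)

# Load the original full-precision model in 4-bit, then save the
# quantized weights together with their quantization config.
model = CsmForConditionalGeneration.from_pretrained(
    "trainfarren/john-welbourne-csm-1b",
    quantization_config=bnb_config,
    device_map="auto",
)
model.save_pretrained("john-welbourne-csm-1b-4bit")
```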
## Usage
```python
from transformers import CsmForConditionalGeneration, AutoProcessor
import torch

# The quantization config is stored with the checkpoint, so no
# BitsAndBytesConfig is needed here; the model loads in 4-bit automatically.
model = CsmForConditionalGeneration.from_pretrained(
    "john-welbourne-csm-1b-4bit",
    torch_dtype=torch.float16,
    device_map="auto",
)
processor = AutoProcessor.from_pretrained("john-welbourne-csm-1b-4bit")

# Use with your existing CSM inference code.
```
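If you do not already have CSM inference code, a minimal generation call following the transformers CSM API looks like this (the prompt text and output filename are placeholders):

```python
# Minimal generation sketch using the model and processor loaded above.
text = "[0]Hello, this is a quick test."  # "[0]" selects speaker id 0
inputs = processor(text, add_special_tokens=True).to(model.device)

# Generate and save the audio waveform.
audio = model.generate(**inputs, output_audio=True)
processor.save_audio(audio, "output.wav")
```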
## Performance

Expected improvements over the original model:

- **Inference Speed:** 2-4x faster
- **Memory Usage:** 50-75% reduction
- **Quality:** minimal degradation expected

A quick benchmark sketch follows this list.
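Actual numbers depend on your hardware and prompt length. A simple way to check on your own GPU, assuming the `model` and `processor` from the Usage section and a CUDA device:

```python
import time
import torch

# Hypothetical micro-benchmark: measure generation latency and peak VRAM.
torch.cuda.reset_peak_memory_stats()
inputs = processor("[0]Benchmark sentence for timing.", add_special_tokens=True).to(model.device)

start = time.perf_counter()
with torch.no_grad():
    _ = model.generate(**inputs, output_audio=True)
elapsed = time.perf_counter() - start

print(f"generation time: {elapsed:.2f} s")
print(f"peak VRAM: {torch.cuda.max_memory_allocated() / 1e9:.2f} GB")
```

Run the same script against the original full-precision model to compare.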
## Original Model
This model is based on [trainfarren/john-welbourne-csm-1b](https://huggingface.co/trainfarren/john-welbourne-csm-1b).