trainfarren/john-welbourne-csm-1b - 4-bit Quantized

This is a 4-bit quantized version of trainfarren/john-welbourne-csm-1b for faster, lower-memory inference.

Model Description

This model was quantized with the bitsandbytes library (via transformers' BitsAndBytesConfig) to reduce memory usage and speed up inference with minimal quality loss; a representative setup is sketched after the details below.

Quantization Details

  • Quantization Type: 4-bit
  • Original Model: trainfarren/john-welbourne-csm-1b
  • Expected Speed Improvement: 2-4x faster inference
  • Memory Reduction: ~50-75% less VRAM usage
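
The exact settings used to produce this checkpoint were not published with the card; a representative 4-bit BitsAndBytesConfig setup matching the details above looks like the following (the quant type and compute dtype are assumptions):

import torch
from transformers import CsmForConditionalGeneration, BitsAndBytesConfig

# Assumed settings: NF4 quantization with float16 compute; the actual
# checkpoint may have used different options.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

model = CsmForConditionalGeneration.from_pretrained(
    "trainfarren/john-welbourne-csm-1b",
    quantization_config=bnb_config,
    device_map="auto",
)
model.save_pretrained("john-welbourne-csm-1b-4bit")  # saves the quantized weights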

Usage

from transformers import CsmForConditionalGeneration, AutoProcessor
import torch

# The checkpoint was saved with its quantization config, so no
# BitsAndBytesConfig is needed at load time.
model = CsmForConditionalGeneration.from_pretrained(
    "trainfarren/john-welbourne-csm-1b-4bit",
    torch_dtype=torch.float16,
    device_map="auto",
)
processor = AutoProcessor.from_pretrained("trainfarren/john-welbourne-csm-1b-4bit")

# Use with your existing CSM inference code
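
Building on the model and processor loaded above, here is a minimal end-to-end sketch following the transformers CSM generation API; the speaker-tagged prompt and output filename are placeholders:

# CSM takes speaker-tagged text; "[0]" selects speaker 0.
text = "[0]Hello from the quantized model."
inputs = processor(text, add_special_tokens=True, return_tensors="pt").to(model.device)

# output_audio=True makes generate() return the decoded waveform.
audio = model.generate(**inputs, output_audio=True)
processor.save_audio(audio, "output.wav")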

Performance

Expected improvements over the original model:

  • Inference Speed: 2-4x faster
  • Memory Usage: 50-75% reduction
  • Quality: Minimal degradation
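
These numbers are hardware-dependent; to check peak VRAM on your own GPU, a quick sketch reusing the model and inputs from the Usage section (assumes a CUDA device):

import torch

# Reset the counter, run one generation, then read the peak allocation.
torch.cuda.reset_peak_memory_stats()
_ = model.generate(**inputs, output_audio=True)
print(f"Peak VRAM: {torch.cuda.max_memory_allocated() / 1e9:.2f} GB")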

Original Model

This model is based on trainfarren/john-welbourne-csm-1b, a fine-tune in the sesame/csm-1b lineage (via unsloth/csm-1b).
