Loading checkpoint shards: killed at 33%
Hi,
I am running an application that uses this model, and loading gets killed at the 33% mark. My setup is as follows:
Ubuntu Server 22.04 w/ drivers
1x RTX A6000 (48GB) [Premium]
6 vCPU, 96 GB RAM, 300 GB Storage
htop shows that I have almost 90 GB of RAM free; furthermore, there is 48 GB of GPU memory.
Here is a sample of the code that I use to access the model:
import gradio as gr
import torch
import transformers
import librosa
import numpy as np
import tempfile
import os
from kokoro import KPipeline
import soundfile as sf
from typing import Dict, Optional, Tuple
import huggingface_hub
from huggingface_hub import login

class VoiceAssistant:
    # Available voices with their configurations
    VOICES = {
        "Bella (US Female)": {"code": "af_bella", "lang_code": "a"},
        "Nicole (US Female)": {"code": "af_nicole", "lang_code": "a"},
        "Michael (US Male)": {"code": "am_michael", "lang_code": "a"},
        "Emma (UK Female)": {"code": "bf_emma", "lang_code": "b"},
        "George (UK Male)": {"code": "bm_george", "lang_code": "b"}
    }

    def __init__(self):
        """Initialize both Ultravox and Kokoro TTS models"""
        access_token_read = "token i got from Huggingface for gated repo"
        login(token=access_token_read)

        print("Loading Ultravox model... This may take a few minutes...")
        self.pipe = transformers.pipeline(
            model='fixie-ai/ultravox-v0_5-llama-3_3-70b',  # Updated to v0_5
            # model='fixie-ai/ultravox-v0_4',  # Original v0_4
            trust_remote_code=True
        )
        print("Model loaded successfully!")
Could you please let me know what I am missing?
Thanks,
Arshad.
The 70B model has much higher memory requirements. A dense 70B-parameter model needs roughly 140 GB for the weights alone in fp16/bf16 (about 280 GB in fp32), which exceeds both your 96 GB of system RAM and your 48 GB of GPU memory, so the OS kills the process partway through loading the checkpoint shards. Quantization could be a potential solution, but it is not currently supported.
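As a rough sanity check, here is a back-of-envelope estimate of the weight memory at different precisions (weights only; it ignores activations, the KV cache, and loading overhead):

# Rough weight-memory estimate for a dense 70B-parameter model
params = 70e9
for dtype, nbytes in {"fp32": 4, "fp16/bf16": 2, "int8": 1, "int4": 0.5}.items():
    print(f"{dtype}: ~{params * nbytes / 1e9:.0f} GB of weights")
# fp16/bf16: ~140 GB -- already more than either the 96 GB of RAM
# or the 48 GB of VRAM on this machine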
What are the minimum requirements for that model? Please let me know so we can configure accordingly.
Thanks
For the 70B model, we use vLLM and multiple H100s for serving. We haven't attempted loading the model on a single GPU yet.
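Roughly, the setup looks like this (a minimal sketch using vLLM's Python API, assuming an 8-GPU node; the GPU count and arguments are illustrative, not our exact serving configuration):

from vllm import LLM

# Hypothetical setup: shard the 70B model across 8 GPUs with tensor parallelism.
llm = LLM(
    model="fixie-ai/ultravox-v0_5-llama-3_3-70b",
    tensor_parallel_size=8,  # assumed GPU count; set this to the GPUs on your node
    trust_remote_code=True,  # the Ultravox repo ships custom pipeline code
)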
Thanks. I will keep that in mind when configuring.