HQQ+PEFT

import torch
from transformers import AutoTokenizer
from hqq.utils.patching import prepare_for_inference

device = "cuda"

# Load the merged HQQ+PEFT model and its tokenizer
model_name = "meta-llama/Llama-3.2-3B-Instruct"
model = torch.load("merged_model.pth", map_location=device, weights_only=False)
model.eval()
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Keep the uncompiled forward for prefill; compile the decode path with static shapes
model.prefill_forward = model.forward
model.forward = torch.compile(model.forward, mode='max-autotune', dynamic=False, fullgraph=True)

# Patch the quantized layers to use the GemLite low-bit matmul backend
backend = 'gemlite'
prepare_for_inference(model, backend=backend)
torch.cuda.empty_cache()
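
Once the model is patched, it can be used for generation in the usual transformers way. Below is a minimal sketch, assuming a CUDA device and a chat-style prompt; the prompt and sampling settings are illustrative and not taken from the original setup.

# Minimal generation sketch (illustrative prompt and settings)
messages = [{"role": "user", "content": "Summarize HQQ quantization in one sentence."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(device)

with torch.inference_mode():
    output = model.generate(
        inputs,
        max_new_tokens=128,
        do_sample=False,
        pad_token_id=tokenizer.eos_token_id,
    )

# Decode only the newly generated tokens
print(tokenizer.decode(output[0, inputs.shape[-1]:], skip_special_tokens=True))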