HQQ+PEFT
```python
import torch
from transformers import AutoTokenizer
from hqq.utils.patching import prepare_for_inference

device = "cuda"
model_name = "meta-llama/Llama-3.2-3B-Instruct"

# Load the merged (HQQ-quantized, PEFT-adapter-fused) model checkpoint
model = torch.load("merged_model.pth", map_location=device, weights_only=False)
model.eval()
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Keep an eager forward for the variable-length prefill step;
# compile the per-token decode forward with static shapes
model.prefill_forward = model.forward
model.forward = torch.compile(model.forward, mode="max-autotune", dynamic=False, fullgraph=True)

# Patch the quantized layers to use GemLite inference kernels
backend = "gemlite"
prepare_for_inference(model, backend=backend)
torch.cuda.empty_cache()
```
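Because the decode forward is compiled with `dynamic=False`, a change in input shape triggers a recompile, so inputs should be padded to a fixed length before being fed to the compiled path. A minimal sketch of such a padding helper (`pad_to_static` and the left-padding choice are illustrative assumptions, not part of this model's code):

```python
def pad_to_static(ids, seq_len, pad_id):
    """Left-pad a list of token ids to a fixed length so the compiled
    forward always sees the same input shape (dynamic=False)."""
    if len(ids) > seq_len:
        raise ValueError("prompt longer than the static sequence length")
    return [pad_id] * (seq_len - len(ids)) + list(ids)

# Example: pad a 3-token prompt to a static length of 8
padded = pad_to_static([101, 202, 303], seq_len=8, pad_id=0)
```

In practice the pad id would come from `tokenizer.pad_token_id`, and the padded positions would be masked out via the attention mask.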
Model tree for lyonlu13/edge-ai-final: base model meta-llama/Llama-3.2-3B-Instruct