"Not all quantized models perform well." Serving frameworks: Ollama on an NVIDIA GPU, llama.cpp on CPU with AVX & AMX.
v1k
xbruce22
AI & ML interests
None yet
Recent Activity
liked a model about 18 hours ago
sarvamai/sarvam-105b
liked a model about 18 hours ago
sarvamai/sarvam-30b
liked a model 5 days ago
janhq/Jan-code-4b