Llama3-Taiwan-70B-Instruct-128K-AWQ-4bits stores its weights in 4-bit form, quantized with AutoAWQ. Compared with FP16 weights, this cuts the GPU memory needed to hold the model to roughly a quarter.
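
As a rough illustration of the memory savings, the sketch below estimates weight-only memory for FP16 versus 4-bit group-wise quantization. The parameter count (70B, taken from the model name), the group size of 128, and the per-group FP16 scale plus 4-bit zero point are assumptions for illustration, not values confirmed by this model card. The helper `weight_memory_gb` is hypothetical and not part of AutoAWQ.

```python
def weight_memory_gb(n_params, bits_per_weight, group_size=None,
                     scale_bits=16, zero_bits=4):
    """Approximate memory for the weights alone, in GB (1 GB = 1e9 bytes).

    If group_size is given, add the per-group overhead of one scale and
    one zero point, as used by group-wise quantization schemes.
    """
    total_bits = n_params * bits_per_weight
    if group_size is not None:
        n_groups = n_params / group_size
        total_bits += n_groups * (scale_bits + zero_bits)
    return total_bits / 8 / 1e9


# Assumption: roughly 70e9 parameters, as suggested by "70B" in the model name.
N_PARAMS = 70e9

fp16_gb = weight_memory_gb(N_PARAMS, bits_per_weight=16)
# Assumption: group size 128 with an FP16 scale and a 4-bit zero point per group.
awq_gb = weight_memory_gb(N_PARAMS, bits_per_weight=4, group_size=128)

print(f"FP16 weights:  {fp16_gb:.1f} GB")
print(f"4-bit weights: {awq_gb:.1f} GB")
print(f"ratio:         {awq_gb / fp16_gb:.2f}")
```

This is a lower bound on real usage: layers that are typically left unquantized (such as embeddings), the KV cache, and activations add to the total, and the KV cache in particular can be large at long context lengths.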

References

For more information and detailed documentation, please refer to the links provided.
