ALIA-40b in GGUF format and quantized to Q3_K
ALIA-40B is a 40B parameter base language model developed by the Barcelona Supercomputing Center (BSC).
Original model and details here: https://huggingface.co/BSC-LT/ALIA-40b
This model is released under a permissive Apache 2.0 license. Along with the open weights, all training scripts and configuration files are made publicly available in this GitHub repository.
This repository contains the model converted to GGUF format and quantized to the Q3_K level using llama.cpp.
Model Details
Description
Transformer-based decoder-only language model that has been pre-trained from scratch on 9.37 trillion tokens of highly curated data. The pre-training corpus contains text in 35 European languages and code.
Hyperparameters
The full list of hyperparameters can be found here.
Architecture
| Parameter | Value |
|---|---|
| Total Parameters | 40,433,885,184 |
| Embedding Parameters | 2,097,152,000 |
| Layers | 48 |
| Hidden size | 8,192 |
| Attention heads | 64 |
| Context length | 32,768 |
| Vocabulary size | 256,000 |
| Precision | bfloat16 |
| Embedding type | RoPE |
| Activation Function | SwiGLU |
| Layer normalization | RMS Norm |
| Flash attention | ✅ |
| Grouped Query Attention | ✅ |
| Num. query groups | 8 |
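The figures in the table can be cross-checked with some quick arithmetic: the embedding parameter count is just vocabulary size times hidden size, and the grouped-query attention setup (64 query heads sharing 8 KV groups) fixes the per-token KV-cache footprint. A small sketch, assuming the usual head_dim = hidden size / attention heads convention and a bf16 KV cache:

```python
# Architecture figures taken from the table above
vocab_size = 256_000
hidden_size = 8_192
n_layers = 48
n_heads = 64
n_kv_groups = 8

# Embedding parameters = vocabulary size * hidden size
emb_params = vocab_size * hidden_size
print(emb_params)  # 2097152000, matching the table

# Per-token KV cache under GQA: 2 (K and V) * layers * KV heads * head_dim * 2 bytes (bf16)
head_dim = hidden_size // n_heads  # 128
kv_bytes_per_token = 2 * n_layers * n_kv_groups * head_dim * 2
print(kv_bytes_per_token / 1024)   # 192.0 KiB per token
```

At the full 32,768-token context this works out to about 6 GiB of KV cache on top of the weights.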
Conversion Process
These are the steps that were followed to convert the weights to GGUF format and quantize them.
1. Download from HuggingFace
Requirement: huggingface_hub
huggingface-cli download --cache-dir . BSC-LT/ALIA-40b
This command downloads the model into the directory ./models--BSC-LT--ALIA-40b/. The safetensors files end up inside ./models--BSC-LT--ALIA-40b/snapshots/aa8a4ac7f9e18f3c2ea8ec0cc84e7783cd751ac7/.
2. Convert Safetensors to GGUF without quantization using llama.cpp
Requirement: llama.cpp repository and python requirements installed.
cd $LLAMA_PATH
python convert_hf_to_gguf.py $ALIA_PATH/models--BSC-LT--ALIA-40b/snapshots/aa8a4ac7f9e18f3c2ea8ec0cc84e7783cd751ac7/ --outfile $ALIA_PATH/ALIA-40B.gguf
LLAMA_PATH is the root of the llama.cpp directory, and ALIA_PATH is the directory where we downloaded the Safetensors weights and where we want to store the ALIA-40B GGUF file. This creates the file $ALIA_PATH/ALIA-40B.gguf.
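Before quantizing, the converted file's size can be sanity-checked from the parameter count: at bf16 each parameter occupies 2 bytes, so the weights alone should come to roughly 81 GB, plus a small amount of GGUF metadata. A rough estimate, not an exact file size:

```python
# Total parameter count from the architecture table
total_params = 40_433_885_184
bytes_per_param = 2  # bf16 stores each weight in 2 bytes

size_gb = total_params * bytes_per_param / 1e9
print(f"{size_gb:.1f} GB")  # ~80.9 GB before quantization
```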
3. Quantize the model
Requirement: llama.cpp built and installed.
cd $ALIA_PATH
llama-quantize ALIA-40B.gguf ALIA-40B.Q3_K.gguf Q3_K
This generates the file ALIA-40B.Q3_K.gguf within the same directory.
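The quantized file's size can be estimated the same way. Assuming roughly 3.44 bits per weight for Q3_K (the Q3_K block layout packs 256 weights into 110 bytes; this is only an approximation, since llama.cpp's K-quants mix quantization types across tensors, so the real file comes out somewhat larger):

```python
# Total parameter count from the architecture table
total_params = 40_433_885_184

# Assumed effective rate: 110 bytes per 256-weight Q3_K block = 3.4375 bits/weight
bits_per_weight = 110 * 8 / 256

size_gb = total_params * bits_per_weight / 8 / 1e9
print(f"{size_gb:.1f} GB")  # ~17.4 GB, down from ~81 GB at bf16
```

A roughly 4.7x reduction, which is what makes a 40B model loadable on a single consumer GPU or a desktop with 32 GB of RAM.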