# utkmst/chimera-beta-test2-lora-merged-Q4_K_M-GGUF
## Model Description

This is a quantized GGUF version of my fine-tuned model `utkmst/chimera-beta-test2-lora-merged`, which was created by LoRA fine-tuning meta-llama/Llama-3.1-8B-Instruct and merging the resulting adapter back into the base model. The GGUF conversion was performed with llama.cpp using Q4_K_M quantization for efficient inference.
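For reference, the merge and conversion steps follow the standard PEFT plus llama.cpp workflow. The sketch below is illustrative rather than the exact script used; the adapter path and output directory are hypothetical.

```python
# Illustrative sketch of the adapter-merge step (not the exact script used).
# The adapter path and output directory are hypothetical placeholders.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.1-8B-Instruct")
merged = PeftModel.from_pretrained(base, "path/to/lora-adapter").merge_and_unload()

merged.save_pretrained("chimera-beta-test2-lora-merged")
AutoTokenizer.from_pretrained("meta-llama/Llama-3.1-8B-Instruct").save_pretrained(
    "chimera-beta-test2-lora-merged"
)

# The GGUF conversion was then done with llama.cpp, roughly:
#   python convert_hf_to_gguf.py chimera-beta-test2-lora-merged --outtype f16
#   ./llama-quantize <f16 gguf> chimera-beta-test2-lora-merged-q4_k_m.gguf Q4_K_M
```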
## Architecture

- Base Model: meta-llama/Llama-3.1-8B-Instruct
- Size: 8.03B parameters
- Type: Decoder-only transformer
- Quantization: Q4_K_M GGUF format (4-bit k-quant, medium variant)
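If you want to verify the quantization type and other metadata of the downloaded file, the `gguf` Python package (published from llama.cpp's `gguf-py`) can read the header. A minimal sketch, assuming the file has already been downloaded locally:

```python
# Minimal sketch: list GGUF metadata keys and tensor count (pip install gguf).
# The local filename is an assumption; adjust to wherever you saved the model.
from gguf import GGUFReader

reader = GGUFReader("chimera-beta-test2-lora-merged-q4_k_m.gguf")
print(f"{len(reader.tensors)} tensors")
for key in reader.fields:  # metadata keys, e.g. general.architecture
    print(key)
```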
## Training Details

- Training Method: LoRA fine-tuning followed by adapter merging
- LoRA Configuration:
  - Rank: 8
  - Alpha: 16
  - Trainable modules: attention layers and feed-forward networks
- Training Hyperparameters (see the sketch after this list):
  - Learning rate: 2e-4
  - Batch size: 2
  - Training epochs: 1
  - Optimizer: AdamW with a constant learning-rate scheduler
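The sketch below shows roughly how this configuration maps onto the peft and transformers APIs. The target module names and any options not listed above are assumptions rather than the exact recipe used.

```python
# Illustrative LoRA/training setup matching the values listed above.
# Target modules and unlisted options are assumptions, not the exact recipe.
from peft import LoraConfig
from transformers import TrainingArguments

lora_config = LoraConfig(
    r=8,                      # rank
    lora_alpha=16,            # alpha
    target_modules=[          # attention + feed-forward projections (assumed names)
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
    task_type="CAUSAL_LM",
)

training_args = TrainingArguments(
    output_dir="chimera-beta-test2-lora",
    learning_rate=2e-4,
    per_device_train_batch_size=2,
    num_train_epochs=1,
    optim="adamw_torch",
    lr_scheduler_type="constant",
)
```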
## Dataset

The model was trained on a curated mixture of high-quality instruction datasets:
- OpenAssistant/oasst1: Human-generated conversations with AI assistants
- databricks/databricks-dolly-15k: Instruction-following examples
- Open-Orca/OpenOrca: Augmented training data based on GPT-4 generations
- mlabonne/open-perfectblend: A carefully balanced blend of open-source instruction data
- tatsu-lab/alpaca: Self-instructed data based on demonstrations
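As a rough illustration of how such a mixture can be assembled with the datasets library; the split names, sample counts, and field normalization below are assumptions, not the exact preprocessing used (the remaining datasets would be handled analogously):

```python
# Rough illustration only: load and concatenate instruction datasets.
# Splits, sample counts, and column handling are assumptions.
from datasets import load_dataset, concatenate_datasets

dolly = load_dataset("databricks/databricks-dolly-15k", split="train")
alpaca = load_dataset("tatsu-lab/alpaca", split="train")
orca = load_dataset("Open-Orca/OpenOrca", split="train[:10000]")  # subsample; it is large

def to_prompt_response(example):
    # Normalize the differing schemas to a single prompt/response pair (simplified).
    prompt = example.get("instruction") or example.get("question") or ""
    response = example.get("response") or example.get("output") or ""
    return {"prompt": prompt, "response": response}

mixture = concatenate_datasets([
    d.map(to_prompt_response, remove_columns=d.column_names)
    for d in (dolly, alpaca, orca)
])
print(mixture)
```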
## Intended Use

This model is designed for:
- General purpose assistant capabilities
- Question answering and knowledge retrieval
- Creative content generation
- Instructional guidance
Thanks to Q4_K_M quantization, it is well suited to deployment in resource-constrained environments while maintaining good response quality.
## Limitations
- Reduced numerical precision due to quantization may impact performance on certain mathematical or precise reasoning tasks
- Base model limitations including potential hallucinations and factual inaccuracies
- Limited context window compared to larger models
- Knowledge cutoff from the base Llama-3.1 model
- May exhibit biases present in training data
## Use with llama.cpp

Install llama.cpp through brew (works on Mac and Linux):

```bash
brew install llama.cpp
```

Invoke the llama.cpp server or the CLI.

### CLI:

```bash
llama-cli --hf-repo utkmst/chimera-beta-test2-lora-merged-Q4_K_M-GGUF --hf-file chimera-beta-test2-lora-merged-q4_k_m.gguf -p "The meaning to life and the universe is"
```

### Server:

```bash
llama-server --hf-repo utkmst/chimera-beta-test2-lora-merged-Q4_K_M-GGUF --hf-file chimera-beta-test2-lora-merged-q4_k_m.gguf -c 2048
```
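Once the server is running, it exposes an OpenAI-compatible endpoint (default port 8080) that any HTTP client can query. A minimal sketch using Python's requests; adjust the host and port if you started the server with different `--host`/`--port` values:

```python
# Minimal sketch: query the running llama-server via its OpenAI-compatible endpoint.
# Assumes the default host and port.
import requests

resp = requests.post(
    "http://localhost:8080/v1/chat/completions",
    json={
        "messages": [
            {"role": "user", "content": "Explain LoRA fine-tuning in two sentences."}
        ],
        "max_tokens": 128,
    },
    timeout=120,
)
print(resp.json()["choices"][0]["message"]["content"])
```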
Note: You can also use this checkpoint directly through the usage steps listed in the llama.cpp repo.

Step 1: Clone llama.cpp from GitHub.

```bash
git clone https://github.com/ggerganov/llama.cpp
```

Step 2: Move into the llama.cpp folder and build it with the LLAMA_CURL=1 flag along with other hardware-specific flags (for example, LLAMA_CUDA=1 for Nvidia GPUs on Linux).

```bash
cd llama.cpp && LLAMA_CURL=1 make
```

Step 3: Run inference through the main binary.

```bash
./llama-cli --hf-repo utkmst/chimera-beta-test2-lora-merged-Q4_K_M-GGUF --hf-file chimera-beta-test2-lora-merged-q4_k_m.gguf -p "The meaning to life and the universe is"
```

or

```bash
./llama-server --hf-repo utkmst/chimera-beta-test2-lora-merged-Q4_K_M-GGUF --hf-file chimera-beta-test2-lora-merged-q4_k_m.gguf -c 2048
```
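Alternatively, if you prefer to stay in Python, the llama-cpp-python bindings can pull the GGUF file straight from this repo. A minimal sketch; the context size and sampling settings are just examples:

```python
# Minimal sketch using llama-cpp-python (pip install llama-cpp-python huggingface-hub).
# Context size and generation parameters are illustrative.
from llama_cpp import Llama

llm = Llama.from_pretrained(
    repo_id="utkmst/chimera-beta-test2-lora-merged-Q4_K_M-GGUF",
    filename="chimera-beta-test2-lora-merged-q4_k_m.gguf",
    n_ctx=2048,
)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "The meaning to life and the universe is"}],
    max_tokens=64,
)
print(out["choices"][0]["message"]["content"])
```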