utkmst/chimera-beta-test2-lora-merged-Q4_K_M-GGUF

Model Description

This is a quantized GGUF version of my fine-tuned model utkmst/chimera-beta-test2-lora-merged, which was created by LoRA fine-tuning the Meta Llama-3.1-8B-Instruct model and merging the resulting adapter with the base model. The GGUF conversion was performed using llama.cpp with Q4_K_M quantization for efficient inference.
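For reference, the conversion can be reproduced with llama.cpp's own tooling. The commands below are a minimal sketch: the local paths and the f16 intermediate file name are illustrative, and the conversion script is named convert-hf-to-gguf.py in older checkouts.

# Convert the merged HF checkpoint to an f16 GGUF, then quantize it to Q4_K_M.
python convert_hf_to_gguf.py ./chimera-beta-test2-lora-merged --outtype f16 --outfile chimera-f16.gguf
./llama-quantize chimera-f16.gguf chimera-beta-test2-lora-merged-q4_k_m.gguf Q4_K_M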

Architecture

  • Base Model: meta-llama/Llama-3.1-8B-Instruct
  • Size: 8.03B parameters
  • Type: Decoder-only transformer
  • Quantization: Q4_K_M GGUF format (4-bit k-quant, medium variant: block-wise quantization with per-block scales)

Training Details

  • Training Method: LoRA fine-tuning followed by adapter merging (a configuration sketch follows this list)
  • LoRA Configuration:
    • Rank: 8
    • Alpha: 16
    • Trainable modules: Attention layers and feed-forward networks
  • Training Hyperparameters:
    • Learning rate: 2e-4
    • Batch size: 2
    • Training epochs: 1
    • Optimizer: AdamW with a constant learning-rate schedule
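A minimal sketch of this setup with the PEFT library is shown below; the exact target module names and the training loop are assumptions, not the original training script.

# Hedged sketch of the LoRA configuration described above (PEFT + transformers).
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.1-8B-Instruct")
lora_config = LoraConfig(
    r=8,                      # rank, per the list above
    lora_alpha=16,            # alpha, per the list above
    target_modules=[          # attention and feed-forward projections (assumed names)
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, lora_config)
model.print_trainable_parameters()

# ... training with AdamW, lr=2e-4, batch size 2, 1 epoch ...

# After training, merge the adapter back into the base weights:
merged = model.merge_and_unload()
merged.save_pretrained("chimera-beta-test2-lora-merged")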

Dataset

The model was trained on a curated mixture of high-quality instruction datasets (a loading sketch follows the list):

  • OpenAssistant/oasst1: Human-generated conversations with AI assistants
  • databricks/databricks-dolly-15k: Instruction-following examples
  • Open-Orca/OpenOrca: Augmented training data based on GPT-4 generations
  • mlabonne/open-perfectblend: A carefully balanced blend of open-source instruction data
  • tatsu-lab/alpaca: Self-Instruct-style data generated from a small set of human-written seed demonstrations
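The sketch below shows one way to assemble such a mixture with the Hugging Face datasets library; the split names and the column normalization are assumptions, since each source uses its own schema (e.g. dolly has instruction/context/response, alpaca has instruction/input/output).

# Hedged sketch: load each source and collapse it into a single "text" column.
from datasets import load_dataset, concatenate_datasets

def to_text(example):
    # Hypothetical normalizer; a real pipeline would apply a proper chat template.
    return {"text": "\n".join(str(v) for v in example.values() if v)}

names = [
    "OpenAssistant/oasst1",
    "databricks/databricks-dolly-15k",
    "Open-Orca/OpenOrca",
    "mlabonne/open-perfectblend",
    "tatsu-lab/alpaca",
]
parts = []
for name in names:
    ds = load_dataset(name, split="train")
    parts.append(ds.map(to_text, remove_columns=ds.column_names))

mixture = concatenate_datasets(parts).shuffle(seed=42)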

Intended Use

This model is designed for:

  • General purpose assistant capabilities
  • Question answering and knowledge retrieval
  • Creative content generation
  • Instructional guidance

Thanks to quantization, it is well suited to deployment in resource-constrained environments while maintaining good response quality.

Limitations

  • Reduced numerical precision due to quantization may impact performance on certain mathematical or precise reasoning tasks
  • Base model limitations including potential hallucinations and factual inaccuracies
  • Limited context window compared to larger models
  • Knowledge cutoff from the base Llama-3.1 model
  • May exhibit biases present in training data

Use with llama.cpp

Install llama.cpp via Homebrew (works on macOS and Linux):

brew install llama.cpp

Invoke the llama.cpp server or the CLI.

CLI:

llama-cli --hf-repo utkmst/chimera-beta-test2-lora-merged-Q4_K_M-GGUF --hf-file chimera-beta-test2-lora-merged-q4_k_m.gguf -p "The meaning to life and the universe is"
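Since the model is instruction-tuned, chat mode may give better results than raw completion; assuming a reasonably recent llama.cpp build, the -cnv (--conversation) flag applies the chat template stored in the GGUF metadata:

llama-cli --hf-repo utkmst/chimera-beta-test2-lora-merged-Q4_K_M-GGUF --hf-file chimera-beta-test2-lora-merged-q4_k_m.gguf -cnv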

Server:

llama-server --hf-repo utkmst/chimera-beta-test2-lora-merged-Q4_K_M-GGUF --hf-file chimera-beta-test2-lora-merged-q4_k_m.gguf -c 2048
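Once running (by default on http://localhost:8080), the server exposes an OpenAI-compatible chat endpoint; a minimal query looks like this:

curl http://localhost:8080/v1/chat/completions -H "Content-Type: application/json" -d '{"messages": [{"role": "user", "content": "Briefly explain GGUF quantization."}]}'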

Note: You can also use this checkpoint directly by following the usage steps listed in the llama.cpp repo.

Step 1: Clone llama.cpp from GitHub.

git clone https://github.com/ggerganov/llama.cpp

Step 2: Move into the llama.cpp folder and build it with the LLAMA_CURL=1 flag along with other hardware-specific flags (e.g. LLAMA_CUDA=1 for NVIDIA GPUs on Linux).

cd llama.cpp && LLAMA_CURL=1 make
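Note that recent llama.cpp releases have deprecated the Makefile build in favor of CMake, with the hardware flags renamed to GGML_*; on such a checkout the equivalent build would be:

cmake -B build -DLLAMA_CURL=ON -DGGML_CUDA=ON
cmake --build build --config Release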

Step 3: Run inference through the main binary.

./llama-cli --hf-repo utkmst/chimera-beta-test2-lora-merged-Q4_K_M-GGUF --hf-file chimera-beta-test2-lora-merged-q4_k_m.gguf -p "The meaning to life and the universe is"

or

./llama-server --hf-repo utkmst/chimera-beta-test2-lora-merged-Q4_K_M-GGUF --hf-file chimera-beta-test2-lora-merged-q4_k_m.gguf -c 2048
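Alternatively, download the GGUF once and point the binaries at the local file; assuming the huggingface_hub CLI is installed:

huggingface-cli download utkmst/chimera-beta-test2-lora-merged-Q4_K_M-GGUF chimera-beta-test2-lora-merged-q4_k_m.gguf --local-dir .
./llama-cli -m ./chimera-beta-test2-lora-merged-q4_k_m.gguf -p "The meaning to life and the universe is"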