Update README.md
---
base_model:
- meta-llama/Llama-3.1-8B-Instruct
language:
- en
tags:
- llama-cpp
- gguf
- llama-3.1
- instruction-tuned
license: llama3.1
datasets:
- OpenAssistant/oasst1
- databricks/databricks-dolly-15k
- Open-Orca/OpenOrca
- mlabonne/open-perfectblend
- tatsu-lab/alpaca
---

# utkmst/chimera-beta-test2-lora-merged-Q4_K_M-GGUF

## Model Description

This is a quantized GGUF version of my fine-tuned model [`utkmst/chimera-beta-test2-lora-merged`](https://huggingface.co/utkmst/chimera-beta-test2-lora-merged), created by LoRA fine-tuning meta-llama/Llama-3.1-8B-Instruct and merging the resulting adapter into the base model. The GGUF conversion was performed with llama.cpp using Q4_K_M quantization for efficient inference.
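For reference, a typical llama.cpp recipe for this kind of conversion looks like the following. The paths, filenames, and script location are illustrative assumptions, not the exact commands used for this model.

```shell
# Illustrative sketch of a llama.cpp GGUF conversion (assumed paths and names).
# 1. Convert the merged HF checkpoint to a full-precision GGUF file:
python convert_hf_to_gguf.py ./chimera-beta-test2-lora-merged \
    --outfile chimera-f16.gguf --outtype f16

# 2. Quantize it to Q4_K_M:
./llama-quantize chimera-f16.gguf chimera-q4_k_m.gguf Q4_K_M
```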

## Architecture

- **Base Model**: meta-llama/Llama-3.1-8B-Instruct
- **Size**: 8B parameters
- **Type**: Decoder-only transformer
- **Quantization**: Q4_K_M (a 4-bit GGUF k-quant format, medium variant)
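To see what 4-bit quantization does to the weights, here is a toy block-wise scheme in plain Python. The real Q4_K_M format is more elaborate (256-element super-blocks with quantized scales and mins), so treat this purely as an illustration of the idea.

```python
# Toy block-wise 4-bit quantization: each block stores one float scale and
# offset plus 4-bit integer codes in [0, 15]. Not the actual Q4_K_M layout.

def quantize_block(values):
    """Map a block of floats to 4-bit codes plus a scale and minimum."""
    lo, hi = min(values), max(values)
    scale = (hi - lo) / 15 or 1.0          # 15 = max 4-bit code
    codes = [round((v - lo) / scale) for v in values]
    return codes, scale, lo

def dequantize_block(codes, scale, lo):
    """Reconstruct approximate floats from codes."""
    return [c * scale + lo for c in codes]

block = [0.1, -0.4, 0.25, 0.9, -0.75, 0.0, 0.5, -0.2]
codes, scale, lo = quantize_block(block)
restored = dequantize_block(codes, scale, lo)
# Reconstruction error is bounded by half a quantization step:
max_err = max(abs(a - b) for a, b in zip(block, restored))
```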

## Training Details

- **Training Method**: LoRA fine-tuning followed by adapter merging
- **LoRA Configuration**:
  - Rank: 8
  - Alpha: 16
  - Trainable modules: attention layers and feed-forward networks
- **Training Hyperparameters**:
  - Learning rate: 2e-4
  - Batch size: 2
  - Training epochs: 1
  - Optimizer: AdamW with a constant learning-rate scheduler
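The merge step above can be sketched numerically: LoRA adds a low-rank update, scaled by alpha/rank, onto the frozen base weight. The matrices below are tiny made-up examples, not the model's actual weights.

```python
# Toy LoRA merge: W' = W + (alpha / r) * (B @ A), shown with plain lists.
# With r=8 and alpha=16 (as in this model) the scaling factor is 2.0;
# a rank-1 update on a 2x2 matrix is used here purely for illustration.

def matmul(B, A):
    rows, inner, cols = len(B), len(A), len(A[0])
    return [[sum(B[i][k] * A[k][j] for k in range(inner)) for j in range(cols)]
            for i in range(rows)]

alpha, r = 16, 8
scaling = alpha / r                       # 2.0

W = [[1.0, 0.0], [0.0, 1.0]]              # frozen base weight (toy)
B = [[0.5], [0.25]]                       # low-rank factors (toy, rank 1)
A = [[0.2, 0.1]]

delta = matmul(B, A)                      # rank-1 update
W_merged = [[W[i][j] + scaling * delta[i][j] for j in range(2)]
            for i in range(2)]
```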

## Dataset

The model was trained on a curated mixture of high-quality instruction datasets:

- OpenAssistant/oasst1: Human-generated conversations with AI assistants
- databricks/databricks-dolly-15k: Instruction-following examples
- Open-Orca/OpenOrca: Augmented training data based on GPT-4 generations
- mlabonne/open-perfectblend: A carefully balanced blend of open-source instruction data
- tatsu-lab/alpaca: Self-instruct data based on demonstrations

## Intended Use

This model is designed for:

- General-purpose assistant capabilities
- Question answering and knowledge retrieval
- Creative content generation
- Instructional guidance

Thanks to quantization, it is well suited to deployment in resource-constrained environments while maintaining good response quality.

## Limitations

- Reduced numerical precision from quantization may hurt performance on mathematical or other precision-sensitive reasoning tasks
- Inherits base-model limitations, including potential hallucinations and factual inaccuracies
- Limited context window compared to larger models
- Knowledge cutoff inherited from the base Llama-3.1 model
- May exhibit biases present in the training data

## Use with llama.cpp

Install llama.cpp through brew (works on macOS and Linux):
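Once installed, llama-cli can pull and run the model straight from this repo. The exact `.gguf` filename passed to `--hf-file` below is an assumption; check the repository's file list for the actual name.

```shell
brew install llama.cpp

# Run an interactive chat straight from the Hub (the --hf-file name is assumed):
llama-cli --hf-repo utkmst/chimera-beta-test2-lora-merged-Q4_K_M-GGUF \
  --hf-file chimera-beta-test2-lora-merged-q4_k_m.gguf \
  -p "You are a helpful assistant." -cnv
```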