---
license: apache-2.0
language:
- en
tags:
- llama-factory
- lora
- open_thoughts
- chain-of-thought
- cot
- logicflow
- single-gpu-training
pipeline_tag: text-generation
base_model: meta-llama/Llama-3.2-3B-Instruct
datasets:
- open-thoughts/OpenThoughts-114k
---
# LogicFlow-Llama-3B

**Introducing LogicFlow-Llama-3B: Exploring Open Access to Chain-of-Thought Reasoning**
Ever wished your AI could not just *tell* you the answer, but *show* you its thinking? **LogicFlow-Llama-3B** represents an exciting attempt to instill robust Chain-of-Thought (CoT) capabilities into models like `meta-llama/Llama-3.2-3B-Instruct`, which, in its base form, does not possess strong inherent CoT reasoning. This isn't just another fine-tune; it's a meticulously crafted model designed to explore the potential of CoT on accessible hardware.
Leveraging the insightful `open-thoughts/OpenThoughts-114k` dataset and the versatile [LLaMA-Factory](https://github.com/hiyouga/LLaMA-Factory) training library, LogicFlow-Llama-3B has been trained to dissect intricate problems and articulate its reasoning process step-by-step. Remarkably, this entire fine-tuning process was accomplished **on a single GPU**, demonstrating a pathway to more accessible CoT model development. Get ready to explore the frontiers of logical AI and unlock a new era of AI-powered deep thinking, even with limited resources!
## Model Details
- **Base Model:** `meta-llama/Llama-3.2-3B-Instruct` (initially without strong CoT capabilities)
- **Fine-tuning Goal:** To imbue Chain-of-Thought (CoT) reasoning abilities.
- **Fine-tuning Method:** LoRA (Low-Rank Adaptation; a PEFT-style sketch follows this list)
- **Fine-tuning Library:** LLaMA-Factory
- **Dataset:** `open-thoughts/OpenThoughts-114k` (for Chain-of-Thought enhancement)
- **Training Hardware:** Single NVIDIA A6000 GPU
- **LoRA Rank:** 8
- **LoRA Alpha:** 16
- **LoRA Dropout:** 0
- **Learning Rate:** 5e-5 (initial)
- **Optimizer:** AdamW (torch)
- **Batch Size:** 2
- **Gradient Accumulation Steps:** 8
- **Number of Training Epochs:** 3.0
- **Total Training Steps:** 18,750
- **Cutoff Length:** 2048
- **Compute Type:** bf16
- **Rope Scaling:** llama3
- **Booster:** flashattn2
- **Training Stage:** Supervised Fine-Tuning
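For readers working with Hugging Face PEFT directly, the LoRA hyperparameters above correspond roughly to the config below. This is a minimal sketch, not the exact object LLaMA-Factory builds internally; in particular, the `target_modules` value is an assumption (the run left `lora_target` at LLaMA-Factory's default rather than naming specific modules):
```python
# Minimal PEFT sketch of the LoRA setup above (rank 8, alpha 16, dropout 0).
from peft import LoraConfig

lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    lora_dropout=0.0,
    target_modules="all-linear",  # assumption; the run used LLaMA-Factory's default lora_target
    task_type="CAUSAL_LM",
)
```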
## Intended Use
LogicFlow-Llama-3B is designed for tasks that demand step-by-step reasoning and a transparent thought process. It is well suited to:
* Complex Question Answering
* Logical Deduction and Problem Solving
* Generating Explanations and Justifications
* Any application where understanding *how* an AI reaches a conclusion is as important as the conclusion itself.
## How to Use
Unleash the power of LogicFlow-Llama-3B with the Hugging Face `transformers` library:
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name_or_path = "RekklesAI/LogicFlow-Llama-3B"
tokenizer = AutoTokenizer.from_pretrained(model_name_or_path)
model = AutoModelForCausalLM.from_pretrained(
    model_name_or_path, torch_dtype=torch.bfloat16, device_map="auto"
)

# Example prompt for Chain-of-Thought
prompt = (
    "Q: Natalia sold clips to 48 of her friends. She had 30 clips left. "
    "How many clips did she have at first? A: Let's think step by step:"
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# Generate with beam search to showcase the thought process; `temperature`
# is omitted because it only applies when sampling (do_sample=True) is enabled.
outputs = model.generate(**inputs, max_new_tokens=150, num_beams=5, early_stopping=True)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
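Since fine-tuning used the `llama3` chat template (see the training configuration below), you may get cleaner results by formatting the question through the tokenizer's chat template instead of raw text. A minimal sketch, reusing the `model` and `tokenizer` from above:
```python
messages = [{
    "role": "user",
    "content": "Natalia sold clips to 48 of her friends. She had 30 clips left. "
               "How many clips did she have at first? Think step by step.",
}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(input_ids, max_new_tokens=150)
# Decode only the newly generated tokens, skipping the prompt
print(tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True))
```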
## Training Process
The model was fine-tuned for **3.0 epochs** over a total of **18,750 steps** on a single **A6000 GPU**. Training employed a **linear learning rate scheduler**, starting from an initial rate of **5e-5**, with gradual decay toward zero. The process leveraged **LoRA** with `bf16` precision and **FlashAttention2** for efficient memory use and speed.
Here's a glimpse into the training progression:
* **Initial Phase (First ~2000 steps):** The training loss started around 1.05 and rapidly decreased, indicating the model was quickly learning from the `open-thoughts/OpenThoughts-114k` dataset. For example, at step 5, the loss was 1.0536, and by step 100, it had dropped to 0.7666. The learning rate was close to the initial 5e-5 during this phase.
* **Middle Phase (~2000 to ~15000 steps):** The loss continued to decrease, albeit at a slower pace, and began to stabilize, generally fluctuating between approximately 0.60 and 0.75 as the model consolidated its learning. The learning rate decayed linearly throughout this period; for instance, around step 10000 the loss was approximately 0.6230 and the learning rate was around 2.3e-5 (see the quick check after this list).
* **Final Phase (Last ~3750 steps):** In the final stages of training, the loss showed further slight reduction and stabilization, with values often hovering in the ~0.58 to ~0.65 range. The learning rate continued its linear decay, approaching zero towards the end of training. At step 18750 (final step), the loss was recorded as 0.5886, with a learning rate close to 0.
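Those learning-rate figures follow directly from the linear schedule with zero warmup: the rate at any step is simply the initial 5e-5 scaled by the fraction of training remaining. A quick check:
```python
# Linear decay, zero warmup: lr(step) = lr0 * (1 - step / total_steps)
lr0, total_steps = 5e-5, 18750

def lr_at(step: int) -> float:
    return lr0 * (1 - step / total_steps)

print(f"{lr_at(10000):.2e}")  # ~2.33e-05, matching the ~2.3e-5 logged near step 10000
```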
The gradient norm generally stayed within a reasonable range (mostly between 0.15 and 0.40 throughout many of the logged steps), suggesting stable training dynamics.
Below is a visualization of the training loss curve:

### Final Training Metrics
| Metric | Value |
|----------------------------|-----------------------------|
| **Epochs** | 3.0 |
| **Input Tokens Seen** | 613,609,008 |
| **Total FLOPs** | 9,706,625,883 GFLOPs |
| **Final Train Loss** | 0.435 |
| **Total Runtime** | 1 day, 22 hours, 12 minutes |
| **Samples per Second** | 1.803 |
| **Steps per Second** | 0.113 |
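The step count in these metrics is consistent with the configuration below: 100,000 samples per epoch for 3 epochs at an effective batch size of 16 (per-device batch size 2 x gradient accumulation steps 8):
```python
max_samples, num_epochs = 100_000, 3
per_device_batch, grad_accum = 2, 8

effective_batch = per_device_batch * grad_accum  # 16 samples per optimizer step
total_steps = max_samples * num_epochs // effective_batch
print(total_steps)  # 18750, matching the reported total training steps
```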
### Training Configuration (from `llamaboard_config.yaml`):
```yaml
top:
  booster: flashattn2
  finetuning_type: lora
  model_name: Llama-3.2-3B-Instruct  # Base model before LoRA merge
  rope_scaling: llama3
  template: llama3
train:
  additional_target: ''
  batch_size: 2
  compute_type: bf16
  cutoff_len: 2048
  dataset:
    - open_thoughts  # Mapped to open-thoughts/OpenThoughts-114k
  dataset_dir: data
  extra_args: '{"optim": "adamw_torch"}'
  gradient_accumulation_steps: 8
  learning_rate: 5e-5  # Initial learning rate
  logging_steps: 5
  lora_alpha: 16
  lora_dropout: 0
  lora_rank: 8
  lora_target: ''
  lr_scheduler_type: linear
  max_grad_norm: '1.0'
  max_samples: '100000'  # Max samples from the dataset used
  num_train_epochs: '3.0'
  save_steps: 100
  training_stage: Supervised Fine-Tuning
  warmup_steps: 0  # No warmup steps were used
```
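To approximately reproduce this run with LLaMA-Factory's CLI, the settings above translate into a training config along the following lines. This is a hedged reconstruction rather than the author's original file: key names follow LLaMA-Factory's example SFT configs, and `output_dir` is a placeholder:
```yaml
# logicflow_sft.yaml -- hypothetical reconstruction of the run above
model_name_or_path: meta-llama/Llama-3.2-3B-Instruct
stage: sft
do_train: true
finetuning_type: lora
lora_rank: 8
lora_alpha: 16
lora_dropout: 0
dataset: open_thoughts
template: llama3
cutoff_len: 2048
max_samples: 100000
per_device_train_batch_size: 2
gradient_accumulation_steps: 8
learning_rate: 5.0e-5
num_train_epochs: 3.0
lr_scheduler_type: linear
warmup_steps: 0
bf16: true
flash_attn: fa2
output_dir: saves/logicflow-llama-3b  # placeholder
```
Such a config would be launched with `llamafactory-cli train logicflow_sft.yaml`.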
## Disclaimer
LogicFlow-Llama-3B is a research artifact. While powerful, it may have limitations or biases. Please use it responsibly and critically evaluate its outputs. |