---
license: apache-2.0
language:
- en
tags:
- llama-factory
- lora
- open_thoughts
- chain-of-thought
- cot
- logicflow
- single-gpu-training
pipeline_tag: text-generation
base_model: meta-llama/Llama-3.2-3B-Instruct
datasets:
- open-thoughts/OpenThoughts-114k
---
# LogicFlow-Llama-3B

**Introducing LogicFlow-Llama-3B: Exploring Open Access to Chain-of-Thought Reasoning**
Ever wished your AI could not just *tell* you the answer, but *show* you its thinking? **LogicFlow-Llama-3B** represents an exciting attempt to instill robust Chain-of-Thought (CoT) capabilities into models like `meta-llama/Llama-3.2-3B-Instruct`, which, in its base form, does not possess strong inherent CoT reasoning. This isn't just another fine-tune; it's a meticulously crafted model designed to explore the potential of CoT on accessible hardware.
Leveraging the insightful `open-thoughts/OpenThoughts-114k` dataset and the versatile [LLaMA-Factory](https://github.com/hiyouga/LLaMA-Factory) training library, LogicFlow-Llama-3B has been trained to dissect intricate problems and articulate its reasoning process step-by-step. Remarkably, this entire fine-tuning process was accomplished **on a single GPU**, demonstrating a pathway to more accessible CoT model development. Get ready to explore the frontiers of logical AI and unlock a new era of AI-powered deep thinking, even with limited resources!
## Model Details
- **Base Model:** `meta-llama/Llama-3.2-3B-Instruct` (initially without strong CoT capabilities)
- **Fine-tuning Goal:** To imbue Chain-of-Thought (CoT) reasoning abilities.
- **Fine-tuning Method:** LoRA (Low-Rank Adaptation; a PEFT-style sketch follows this list)
- **Fine-tuning Library:** LLaMA-Factory
- **Dataset:** `open-thoughts/OpenThoughts-114k` (for Chain-of-Thought enhancement)
- **Training Hardware:** Single NVIDIA A6000 GPU
- **LoRA Rank:** 8
- **LoRA Alpha:** 16
- **LoRA Dropout:** 0
- **Learning Rate:** 5e-5 (initial)
- **Optimizer:** AdamW (torch)
- **Batch Size:** 2
- **Gradient Accumulation Steps:** 8
- **Number of Training Epochs:** 3.0
- **Total Training Steps:** 18,750
- **Cutoff Length:** 2048
- **Compute Type:** bf16
- **Rope Scaling:** llama3
- **Booster:** flashattn2
- **Training Stage:** Supervised Fine-Tuning
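For readers working with Hugging Face PEFT directly, the LoRA hyperparameters above correspond roughly to the config below. This is a minimal sketch, not the exact object LLaMA-Factory builds internally; in particular, the `target_modules` value is an assumption (the run left `lora_target` at LLaMA-Factory's default rather than naming specific modules):
```python
# Minimal PEFT sketch of the LoRA setup above (rank 8, alpha 16, dropout 0).
from peft import LoraConfig

lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    lora_dropout=0.0,
    target_modules="all-linear",  # assumption; the run used LLaMA-Factory's default lora_target
    task_type="CAUSAL_LM",
)
```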
## Intended Use
LogicFlow-Llama-3B is designed for tasks that demand step-by-step reasoning and a transparent thought process. It is well suited to:
* Complex Question Answering
* Logical Deduction and Problem Solving
* Generating Explanations and Justifications
* Any application where understanding *how* an AI reaches a conclusion is as important as the conclusion itself.
## How to Use
Unleash the power of LogicFlow-Llama-3B with the Hugging Face `transformers` library:
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name_or_path = "RekklesAI/LogicFlow-Llama-3B"
tokenizer = AutoTokenizer.from_pretrained(model_name_or_path)
model = AutoModelForCausalLM.from_pretrained(
    model_name_or_path, torch_dtype=torch.bfloat16, device_map="auto"
)

# Example prompt for Chain-of-Thought
prompt = (
    "Q: Natalia sold clips to 48 of her friends. She had 30 clips left. "
    "How many clips did she have at first? A: Let's think step by step:"
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# Generate with beam search to showcase the thought process; `temperature`
# is omitted because it only applies when sampling (do_sample=True) is enabled.
outputs = model.generate(**inputs, max_new_tokens=150, num_beams=5, early_stopping=True)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
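Since fine-tuning used the `llama3` chat template (see the training configuration below), you may get cleaner results by formatting the question through the tokenizer's chat template instead of raw text. A minimal sketch, reusing the `model` and `tokenizer` from above:
```python
messages = [{
    "role": "user",
    "content": "Natalia sold clips to 48 of her friends. She had 30 clips left. "
               "How many clips did she have at first? Think step by step.",
}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(input_ids, max_new_tokens=150)
# Decode only the newly generated tokens, skipping the prompt
print(tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True))
```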
## Training Process
The model was fine-tuned for **3.0 epochs** over a total of **18,750 steps** on a single **A6000 GPU**. Training employed a **linear learning rate scheduler**, starting from an initial rate of **5e-5**, with gradual decay toward zero. The process leveraged **LoRA** with `bf16` precision and **FlashAttention2** for efficient memory use and speed.
Here's a glimpse into the training progression:
* **Initial Phase (First ~2000 steps):** The training loss started around 1.05 and rapidly decreased, indicating the model was quickly learning from the `open-thoughts/OpenThoughts-114k` dataset. For example, at step 5, the loss was 1.0536, and by step 100, it had dropped to 0.7666. The learning rate was close to the initial 5e-5 during this phase.
* **Middle Phase (~2000 to ~15000 steps):** The loss continued to decrease, albeit at a slower pace, and began to stabilize, generally fluctuating between approximately 0.60 and 0.75 as the model consolidated its learning. The learning rate decayed linearly throughout this period; for instance, around step 10000 the loss was approximately 0.6230 and the learning rate was around 2.3e-5 (see the quick check after this list).
* **Final Phase (Last ~3750 steps):** In the final stages of training, the loss showed further slight reduction and stabilization, with values often hovering in the ~0.58 to ~0.65 range. The learning rate continued its linear decay, approaching zero towards the end of training. At step 18750 (final step), the loss was recorded as 0.5886, with a learning rate close to 0.
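Those learning-rate figures follow directly from the linear schedule with zero warmup: the rate at any step is simply the initial 5e-5 scaled by the fraction of training remaining. A quick check:
```python
# Linear decay, zero warmup: lr(step) = lr0 * (1 - step / total_steps)
lr0, total_steps = 5e-5, 18750

def lr_at(step: int) -> float:
    return lr0 * (1 - step / total_steps)

print(f"{lr_at(10000):.2e}")  # ~2.33e-05, matching the ~2.3e-5 logged near step 10000
```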
The gradient norm generally stayed within a reasonable range (mostly between 0.15 and 0.40 throughout many of the logged steps), suggesting stable training dynamics.
Below is a visualization of the training loss curve:

### Final Training Metrics
| Metric | Value |
|----------------------------|-----------------------------|
| **Epochs** | 3.0 |
| **Input Tokens Seen** | 613,609,008 |
| **Total FLOPs** | 9,706,625,883 GFLOPs |
| **Final Train Loss** | 0.435 |
| **Total Runtime** | 1 day, 22 hours, 12 minutes |
| **Samples per Second** | 1.803 |
| **Steps per Second** | 0.113 |
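The step count in these metrics is consistent with the configuration below: 100,000 samples per epoch for 3 epochs at an effective batch size of 16 (per-device batch size 2 x gradient accumulation steps 8):
```python
max_samples, num_epochs = 100_000, 3
per_device_batch, grad_accum = 2, 8

effective_batch = per_device_batch * grad_accum  # 16 samples per optimizer step
total_steps = max_samples * num_epochs // effective_batch
print(total_steps)  # 18750, matching the reported total training steps
```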
### Training Configuration (from `llamaboard_config.yaml`):
```yaml
top:
  booster: flashattn2
  finetuning_type: lora
  model_name: Llama-3.2-3B-Instruct  # Base model before LoRA merge
  rope_scaling: llama3
  template: llama3
train:
  additional_target: ''
  batch_size: 2
  compute_type: bf16
  cutoff_len: 2048
  dataset:
    - open_thoughts  # Mapped to open-thoughts/OpenThoughts-114k
  dataset_dir: data
  extra_args: '{"optim": "adamw_torch"}'
  gradient_accumulation_steps: 8
  learning_rate: 5e-5  # Initial learning rate
  logging_steps: 5
  lora_alpha: 16
  lora_dropout: 0
  lora_rank: 8
  lora_target: ''
  lr_scheduler_type: linear
  max_grad_norm: '1.0'
  max_samples: '100000'  # Max samples from the dataset used
  num_train_epochs: '3.0'
  save_steps: 100
  training_stage: Supervised Fine-Tuning
  warmup_steps: 0  # No warmup steps were used
```
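To approximately reproduce this run with LLaMA-Factory's CLI, the settings above translate into a training config along the following lines. This is a hedged reconstruction rather than the author's original file: key names follow LLaMA-Factory's example SFT configs, and `output_dir` is a placeholder:
```yaml
# logicflow_sft.yaml -- hypothetical reconstruction of the run above
model_name_or_path: meta-llama/Llama-3.2-3B-Instruct
stage: sft
do_train: true
finetuning_type: lora
lora_rank: 8
lora_alpha: 16
lora_dropout: 0
dataset: open_thoughts
template: llama3
cutoff_len: 2048
max_samples: 100000
per_device_train_batch_size: 2
gradient_accumulation_steps: 8
learning_rate: 5.0e-5
num_train_epochs: 3.0
lr_scheduler_type: linear
warmup_steps: 0
bf16: true
flash_attn: fa2
output_dir: saves/logicflow-llama-3b  # placeholder
```
Such a config would be launched with `llamafactory-cli train logicflow_sft.yaml`.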
## Disclaimer
LogicFlow-Llama-3B is a research artifact. While powerful, it may have limitations or biases. Please use it responsibly and critically evaluate its outputs. |