---
license: apache-2.0
language:
- en
tags:
- llama-factory
- lora
- open_thoughts
- chain-of-thought
- cot
- logicflow
- single-gpu-training
pipeline_tag: text-generation
base_model: meta-llama/Llama-3.2-3B-Instruct
datasets:
- open-thoughts/OpenThoughts-114k
---

# LogicFlow-Llama-3B

![image/png](https://cdn-uploads.huggingface.co/production/uploads/664589a52d210101d1eac6ad/l_vPNI8K1AbiHHXUTo6aa.png)

🚀 **Introducing LogicFlow-Llama-3B: Exploring Open Access to Chain-of-Thought Reasoning**

Ever wished your AI could not just *tell* you the answer, but *show* you its thinking? **LogicFlow-Llama-3B** is an attempt to instill robust Chain-of-Thought (CoT) capabilities into `meta-llama/Llama-3.2-3B-Instruct`, which in its base form does not exhibit strong step-by-step reasoning.

This isn't just another fine-tune: it is a carefully configured LoRA adaptation designed to explore the potential of CoT on accessible hardware. Leveraging the `open-thoughts/OpenThoughts-114k` dataset and the [LLaMA-Factory](https://github.com/hiyouga/LLaMA-Factory) training library, LogicFlow-Llama-3B has been trained to break down intricate problems and articulate its reasoning process step by step. Remarkably, the entire fine-tuning run was completed **on a single GPU**, demonstrating a pathway to more accessible CoT model development. Get ready to explore logical, transparent AI reasoning, even with limited resources!

## Model Details

- **Base Model:** `meta-llama/Llama-3.2-3B-Instruct` (initially without strong CoT capabilities)
- **Fine-tuning Goal:** Imbue Chain-of-Thought (CoT) reasoning abilities
- **Fine-tuning Method:** LoRA (Low-Rank Adaptation)
- **Fine-tuning Library:** LLaMA-Factory
- **Dataset:** `open-thoughts/OpenThoughts-114k` (for Chain-of-Thought enhancement)
- **Training Hardware:** Single NVIDIA A6000 GPU
- **LoRA Rank:** 8
- **LoRA Alpha:** 16
- **LoRA Dropout:** 0
- **Learning Rate:** 5e-5 (initial)
- **Optimizer:** AdamW (torch)
- **Batch Size:** 2
- **Gradient Accumulation Steps:** 8
- **Number of Training Epochs:** 3.0
- **Total Training Steps:** 18,750
- **Cutoff Length:** 2048
- **Compute Type:** bf16
- **RoPE Scaling:** llama3
- **Booster:** FlashAttention-2
- **Training Stage:** Supervised Fine-Tuning

## Intended Use

LogicFlow-Llama-3B is aimed at tasks that demand step-by-step reasoning and transparent thought processes. It is well suited for:

* Complex question answering
* Logical deduction and problem solving
* Generating explanations and justifications
* Any application where understanding *how* the model reaches a conclusion is as important as the conclusion itself

## How to Use

Unleash the power of LogicFlow-Llama-3B with the Hugging Face `transformers` library:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name_or_path = "RekklesAI/LogicFlow-Llama-3B"  # Replace with your Hugging Face username and model name

tokenizer = AutoTokenizer.from_pretrained(model_name_or_path)
model = AutoModelForCausalLM.from_pretrained(model_name_or_path)

# Example prompt for Chain-of-Thought
prompt = (
    "Q: Natalia sold clips to 48 of her friends. She had 30 clips left. "
    "How many clips did she have at first?\n"
    "A: Let's think step by step:"
)
inputs = tokenizer(prompt, return_tensors="pt")

# Generate text showcasing the thought process
outputs = model.generate(
    **inputs,
    max_new_tokens=150,
    num_beams=5,
    early_stopping=True,
    do_sample=True,   # enable sampling so the temperature setting takes effect
    temperature=0.7,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
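Since the base model is an instruct model and training used the `llama3` chat template (see the configuration below), you may get better-formatted CoT output by wrapping your question in a chat turn rather than a raw `Q:`/`A:` prompt. The snippet below is a minimal sketch using `tokenizer.apply_chat_template`; the example question and generation settings are illustrative and not part of the original recipe.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name_or_path = "RekklesAI/LogicFlow-Llama-3B"
tokenizer = AutoTokenizer.from_pretrained(model_name_or_path)
model = AutoModelForCausalLM.from_pretrained(model_name_or_path)

# Format the question as a chat turn so the llama3 template's special tokens are applied
messages = [
    {"role": "user", "content": "A train travels 60 km in 45 minutes. "
                                "What is its average speed in km/h? Think step by step."},
]
input_ids = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,  # append the assistant header so the model starts its answer
    return_tensors="pt",
)

outputs = model.generate(input_ids, max_new_tokens=256)

# Decode only the newly generated tokens, i.e. the model's step-by-step answer
print(tokenizer.decode(outputs[0, input_ids.shape[-1]:], skip_special_tokens=True))
```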
## Training Process

The model was fine-tuned for **3.0 epochs** over a total of **18,750 steps** on a single **A6000 GPU**. Training employed a **linear learning rate scheduler**, starting from an initial rate of **5e-5** and decaying gradually toward zero. The process leveraged **LoRA** with `bf16` precision and **FlashAttention-2** for efficient memory use and speed.

Here's a glimpse into the training progression:

* **Initial Phase (first ~2,000 steps):** The training loss started around 1.05 and rapidly decreased, indicating the model was quickly learning from the `open-thoughts/OpenThoughts-114k` dataset. For example, at step 5 the loss was 1.0536, and by step 100 it had dropped to 0.7666. The learning rate stayed close to the initial 5e-5 during this phase.
* **Middle Phase (~2,000 to ~15,000 steps):** The loss continued to decrease, albeit at a slower pace, and started to stabilize, generally fluctuating between approximately 0.60 and 0.75 as the model consolidated its learning. The learning rate decayed linearly throughout this period; around step 10,000 the loss was approximately 0.6230 and the learning rate was around 2.3e-5.
* **Final Phase (last ~3,750 steps):** The loss showed further slight reduction and stabilization, with values often hovering in the ~0.58 to ~0.65 range, while the learning rate continued its linear decay toward zero. At the final step (18,750) the logged loss was 0.5886.

The gradient norm generally stayed within a reasonable range (mostly between 0.15 and 0.40 across the logged steps), suggesting stable training dynamics.

Below is a visualization of the training loss curve:

![Training Loss](training_loss.png)

### 📊 Final Training Metrics

| Metric                 | Value                       |
|------------------------|-----------------------------|
| **Epochs**             | 3.0                         |
| **Input Tokens Seen**  | 613,609,008                 |
| **Total FLOPs**        | 9,706,625,883 GFLOPs        |
| **Final Train Loss**   | 0.435                       |
| **Total Runtime**      | 1 day, 22 hours, 12 minutes |
| **Samples per Second** | 1.803                       |
| **Steps per Second**   | 0.113                       |

### Training Configuration (from `llamaboard_config.yaml`)

```yaml
top:
  booster: flashattn2
  finetuning_type: lora
  model_name: Llama-3.2-3B-Instruct  # Base model before LoRA merge
  rope_scaling: llama3
  template: llama3
train:
  additional_target: ''
  batch_size: 2
  compute_type: bf16
  cutoff_len: 2048
  dataset:
  - open_thoughts  # Mapped to open-thoughts/OpenThoughts-114k
  dataset_dir: data
  extra_args: '{"optim": "adamw_torch"}'
  gradient_accumulation_steps: 8
  learning_rate: 5e-5  # Initial learning rate
  logging_steps: 5
  lora_alpha: 16
  lora_dropout: 0
  lora_rank: 8
  lora_target: ''
  lr_scheduler_type: linear
  max_grad_norm: '1.0'
  max_samples: '100000'  # Max samples from the dataset used
  num_train_epochs: '3.0'
  save_steps: 100
  training_stage: Supervised Fine-Tuning
  warmup_steps: 0  # No warmup steps were used
```

## Disclaimer

LogicFlow-Llama-3B is a research artifact. While powerful, it may have limitations or biases. Please use it responsibly and critically evaluate its outputs.
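As a closing sanity check on the configuration above, the reported total of 18,750 optimizer steps follows directly from the batch settings. The sketch below assumes the `max_samples` cap of 100,000 examples was fully used each epoch (an assumption based on the config, not stated explicitly elsewhere):

```python
# Derive the total optimizer steps from the training configuration above.
batch_size = 2               # per-device batch size
grad_accum_steps = 8         # gradient accumulation steps
num_epochs = 3               # num_train_epochs
samples_per_epoch = 100_000  # max_samples cap (assumed fully used)

effective_batch_size = batch_size * grad_accum_steps          # 16 sequences per optimizer step
steps_per_epoch = samples_per_epoch // effective_batch_size   # 6,250
total_steps = steps_per_epoch * num_epochs                    # 18,750, matching the reported value

print(effective_batch_size, steps_per_epoch, total_steps)     # 16 6250 18750
```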