---
license: gemma
base_model: google/gemma-3-27b-it
datasets:
- O1-OPEN/OpenO1-SFT
- open-thoughts/OpenThoughts-114k
- open-r1/OpenR1-Math-220k
tags:
- llama-factory
- lora
- reasoning
- thinking
- mathematics
- merged
- multimodal
- vision
- image-text-to-text
- visual-reasoning
language:
- en
pipeline_tag: image-text-to-text
library_name: transformers
---

![image/png](https://cdn-uploads.huggingface.co/production/uploads/664589a52d210101d1eac6ad/1d3ERgYdHzPUqYLpSuvAk.png)

# LogicFlow-Gemma-3-27b-thinking

## Model Description

LogicFlow-Gemma-3-27b-thinking is an advanced **multimodal reasoning model** built upon [google/gemma-3-27b-it](https://huggingface.co/google/gemma-3-27b-it), specifically designed to excel at complex logical reasoning, mathematical problem-solving, and step-by-step analytical thinking. It was produced by fine-tuning the base model on three specialized, high-quality datasets using the LoRA (Low-Rank Adaptation) technique.

### Key Innovations

This combination of datasets creates a model that not only provides correct answers but also demonstrates **how** it arrives at those answers, making it particularly valuable for educational applications, research, and any scenario requiring explainable AI reasoning.
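As a rough illustration of why LoRA keeps the fine-tune lightweight, the sketch below counts the trainable parameters a low-rank adapter adds to a single weight matrix. The rank (8) and alpha (16) are the values listed under LoRA Configuration in this card. The square 5,376 x 5,376 projection is a hypothetical layer shape chosen only for illustration (5,376 is the text hidden size listed under Architecture Details), and the function names are illustrative, not part of any library.

```python
def lora_trainable_params(d_in: int, d_out: int, rank: int) -> int:
    """Parameters in the two LoRA factors: A is (rank x d_in), B is (d_out x rank)."""
    return rank * (d_in + d_out)


def lora_scaling(alpha: float, rank: int) -> float:
    """Scaling applied to the low-rank update B @ A in the standard LoRA formulation."""
    return alpha / rank


if __name__ == "__main__":
    hidden = 5376          # text hidden size from this card
    rank, alpha = 8, 16    # LoRA settings from this card

    full = hidden * hidden  # parameters in the hypothetical frozen matrix
    adapter = lora_trainable_params(hidden, hidden, rank)

    print(f"frozen weights:   {full:,}")
    print(f"adapter weights:  {adapter:,}")
    print(f"adapter fraction: {adapter / full:.4%}")
    print(f"update scaling:   {lora_scaling(alpha, rank)}")
```

Under these assumptions the adapter is a fraction of a percent of the matrix it modifies, which is why the adapter can be trained cheaply and then merged back into the base weights.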
The model demonstrates enhanced capabilities in:

- **Logical Reasoning**: Improved ability to work through complex logical problems step by step
- **Mathematical Problem Solving**: Enhanced performance on mathematical reasoning tasks (76.8% MATH, 13.3% AIME25)
- **Scientific Analysis**: Exceptional scientific reasoning capabilities (45.96% GPQA Diamond)
- **Chain-of-Thought Reasoning**: Superior step-by-step thinking with detailed reasoning chains and self-verification
- **Structured Analysis**: Improved ability to break down complex problems into manageable components
- **Multi-Method Verification**: Uses multiple approaches to validate results and ensure accuracy
- **Vision Understanding**: Ability to analyze and reason about images, charts, diagrams, and visual data
- **Multimodal Reasoning**: Combining visual and textual information for comprehensive analysis

## Model Details

- **Model Type**: Multimodal Language Model (Gemma-3 Architecture)
- **Base Model**: google/gemma-3-27b-it
- **Parameters**: 27 billion parameters
- **Fine-tuning Method**: LoRA (Low-Rank Adaptation) with merge
- **Context Length**: 131,072 tokens
- **Architecture**: Gemma-3 with vision capabilities
- **Precision**: bfloat16
- **Image Resolution**: 896x896 pixels, encoded to 256 tokens per image
- **Supported Formats**: Text + Images (JPEG, PNG, WebP)

## Training Details

### Training Data

The model was fine-tuned on three carefully selected, high-quality datasets that form the foundation of its exceptional reasoning capabilities:

#### **OpenO1-SFT Dataset**

- **Purpose**: Supervised fine-tuning for advanced reasoning patterns
- **Content**: High-quality reasoning demonstrations with explicit thought processes
- **Impact**: Enables the model to break down complex problems systematically and show transparent reasoning chains

#### **Open-Thoughts Dataset**

- **Purpose**: Step-by-step thinking process modeling
- **Content**: Detailed internal monologues and reasoning progressions for various
problem types
- **Impact**: Teaches the model to externalize its thinking process, making reasoning transparent and verifiable

#### **OpenR1-Math Dataset**

- **Purpose**: Mathematical reasoning and problem-solving specialization
- **Content**: Comprehensive mathematical problems with detailed solution methodologies
- **Impact**: Significantly enhances performance on mathematical reasoning tasks, from basic arithmetic to advanced competition-level problems

This synergistic combination creates a model that excels not only at providing accurate answers but also at demonstrating clear, verifiable reasoning processes.

### Training Configuration

#### Core Training Parameters

- **Learning Rate**: 5e-05
- **Epochs**: 5.0
- **Optimizer**: AdamW (adamw_torch)
- **LR Scheduler**: Cosine with 100 warmup steps
- **Max Gradient Norm**: 1.0
- **Max Samples**: 100,000
- **Precision**: bfloat16 (bf16: true)

#### Batch Configuration

- **Per Device Train Batch Size**: 2
- **Gradient Accumulation Steps**: 8
- **Total Effective Batch Size**: 32
- **Packing**: Disabled (false)

#### LoRA Configuration

- **Fine-tuning Type**: LoRA
- **LoRA Rank (r)**: 8
- **LoRA Alpha**: 16
- **LoRA Dropout**: 0.0
- **LoRA Target**: all (comprehensive layer targeting)

#### Sequence and Vision Parameters

- **Cutoff Length**: 2,048 tokens
- **Image Max Pixels**: 589,824
- **Image Min Pixels**: 1,024
- **Video Max Pixels**: 65,536
- **Video Min Pixels**: 256
- **Flash Attention**: auto
- **Freeze Vision Tower**: true
- **Freeze Multi-modal Projector**: true

#### Special Features

- **Template**: gemma (Optimized for multimodal reasoning tasks)
- **Trust Remote Code**: true (Required for advanced vision capabilities)
- **Preprocessing Workers**: 16 (Optimized for multimodal data processing)
- **Save Steps**: 100 (Frequent checkpointing for training stability)
- **Logging Steps**: 5 (Detailed training monitoring)

### Training Results

#### Training Loss Curve

The model training included comprehensive loss
tracking and visualization. The training loss curve below shows the convergence pattern over the 41,400 training steps across 5 epochs:

![Training Loss](training_loss.png)

The loss curve shows stable convergence, with the final training loss reaching 0.003759. Because this is a training-set loss, it confirms that optimization converged; generalization is assessed separately by the benchmark results in the next section.

## Benchmark Performance

### Comprehensive Evaluation Results

| **Benchmark** | **Metric** | **Base Gemma-3-27B-IT** | **LogicFlow-Gemma-3-27b-thinking** | **Improvement** |
|---------------|------------|--------------------------|-------------------------------------|-----------------|
| **Mathematical Reasoning** | | | | |
| GSM8K | 5-shot | 82.6% | **89.5%** | **+6.9%** |
| MATH | 5-shot | 50.0% | **76.8%** | **+26.8%** |
| **Code Generation** | | | | |
| MBPP | pass@1 | 65.6% | **69.0%** | **+3.4%** |
| HumanEval | 0-shot | 48.8% | *Pending* | *TBD* |
| **Instruction Following** | | | | |
| IFEval | Prompt-level | *45.0%* | **40.0%** | **-5.0%** |
| IFEval | Instruction-level | *58.0%* | **53.1%** | **-4.9%** |
| **Advanced Mathematics** | | | | |
| AIME25 | 5-shot | ~8-12% | **13.3%** | **+1-5%** |
| **Scientific Reasoning** | | | | |
| GPQA Diamond | 5-shot | ~30-35% | **45.96%** | **+11-16%** |
| **Knowledge & Understanding** | | | | |
| MMLU | Overall Accuracy | 78.6% | **75.3%** | **-3.3%** |
| MMLU STEM | Sciences & Math | ~70.0% | **71.6%** | **+1.6%** |
| MMLU Humanities | Arts & Literature | ~67.0% | **69.2%** | **+2.2%** |
| MMLU Social Sciences | Psychology & Economics | ~82.0% | **84.3%** | **+2.3%** |
| MMLU Other | Professional & Medical | ~77.0% | **79.2%** | **+2.2%** |

Values in the Improvement column are differences in percentage points. Baseline entries prefixed with `~` are approximate.

### Key Performance Insights

#### **Significant Improvements**

- **Mathematical Reasoning**: Exceptional improvements - GSM8K (+6.9%) and MATH (+26.8%) demonstrate enhanced step-by-step problem solving
- **Advanced Mathematics**: Massive 26.8% improvement on MATH benchmark showcases superior mathematical reasoning capabilities
- **Scientific Reasoning**: Outstanding 45.96% accuracy on GPQA Diamond - significantly
above typical model performance (30-35%)
- **Competition Mathematics**: Solid 13.3% performance on AIME25 - competing with leading models on elite mathematical competitions
- **Code Generation**: 3.4% improvement on MBPP shows better programming logic understanding
- **Domain-Specific Knowledge**: Improvements in STEM (+1.6%), Humanities (+2.2%), and Social Sciences (+2.3%)

#### **Trade-offs Observed**

- **Instruction Following**: Slight decrease in IFEval scores (-5% prompt-level, -4.9% instruction-level)
- **General Knowledge**: Overall MMLU score decreased by 3.3% due to reasoning specialization
- **Reasoning Focus**: Model optimized for deep analytical thinking over rapid instruction compliance

#### **Specialized Capabilities**

- **Mathematical Excellence**: Outstanding 76.8% accuracy on MATH benchmark - among the top performances for 27B models
- **Scientific Reasoning**: Exceptional 45.96% on GPQA Diamond - handling graduate-level physics, chemistry, and biology problems
- **Elite Competition Performance**: Competitive 13.3% on AIME25 - tackling American Invitational Mathematics Examination challenges
- **Chain-of-Thought Mastery**: Demonstrates sophisticated reasoning through detailed thinking processes with multi-method verification
- **Transparent Reasoning**: Shows complete work and self-validates answers using multiple approaches (as shown in CoT examples)
- **Cross-Domain Expertise**: Superior performance spanning mathematics, natural sciences, and logical reasoning

### Benchmarking Methodology

Our evaluation follows rigorous benchmarking principles:

1. **Reproducible Environment**: All tests conducted with fixed random seeds and controlled temperature settings
2. **Diverse Metrics**: Beyond accuracy, we evaluate reasoning quality, step-by-step explanations, and cross-domain scientific performance
3. **Research-Relevant Tasks**: Focus on real-world applications in education, scientific research, and advanced technical analysis
4.
   **Comparative Baselines**: Direct comparison with original Gemma-3-27B-IT and established benchmarks

### Performance Analysis

Following [Domino AI's benchmarking guidelines](https://domino.ai/blog/benchmarking-predictive-models), we evaluated both predictive characteristics and operational constraints:

- **Mathematical & Scientific Excellence**: 76.8% MATH accuracy and 45.96% GPQA Diamond represent breakthrough reasoning capabilities
- **Competition-Level Performance**: 13.3% AIME25 accuracy demonstrates capability in elite mathematical competitions
- **Industry Recognition**: Based on [Google's Gemma 3 announcement](https://www.ainewshub.org/post/google-unveils-gemma-3-a-game-changer-in-open-source-ai), the base 27B model achieves 1338 Elo on Chatbot Arena
- **Advanced Problem Solving**: GPQA Diamond performance significantly exceeds typical model benchmarks (30-35% baseline)
- **Latency**: Average inference time increased by ~15% due to enhanced reasoning processes - worthwhile trade-off for quality
- **Quality**: Exceptional improvements in explanation quality - mathematical (+26.8%) and scientific reasoning (+11-16%)
- **Reliability**: Consistent performance across multiple evaluation runs with detailed step-by-step reasoning chains
- **Cross-Domain Specialization**: Superior performance in mathematics, natural sciences, and complex logical reasoning

## Usage

### Installation

For multimodal functionality, ensure you have the latest versions of the required packages:

```bash
pip install -U transformers torch torchvision
pip install -U pillow requests

# For GPU acceleration
pip install -U accelerate
```

### Basic Text Usage

```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Load model and tokenizer
model_name = "RekklesAI/LogicFlow-Gemma-3-27b-thinking"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    device_map="auto"
)

# Example usage for reasoning tasks
prompt = """Solve this step by step:
If a train travels 120 km in 2 hours, and then 180 km in the next 3 hours, what is its average speed for the entire journey?

Let me think through this step by step:"""

# Move the tokenized inputs to the device the model was placed on
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=512,
        do_sample=True,
        top_p=0.95,
        top_k=64,
        temperature=0.7
    )

response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)
```

### Multimodal Usage (Text + Image)

```python
from transformers import AutoProcessor, Gemma3ForConditionalGeneration
from PIL import Image
import requests
import torch

# Load model and processor
model_name = "RekklesAI/LogicFlow-Gemma-3-27b-thinking"
model = Gemma3ForConditionalGeneration.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    device_map="auto"
)
processor = AutoProcessor.from_pretrained(model_name)

# Load an image (example: a mathematical diagram or chart)
# The URL is a placeholder; replace it with a real image location
url = "https://example.com/math-diagram.jpg"
image = Image.open(requests.get(url, stream=True).raw)

# Create a multimodal prompt for step-by-step analysis
prompt = """Analyze this mathematical diagram step by step.
What mathematical concepts are being illustrated, and how would you solve any problems shown?
Please provide a detailed, step-by-step explanation."""

# Process the inputs and move them to the model's device and dtype
model_inputs = processor(text=prompt, images=image, return_tensors="pt").to(
    model.device, dtype=torch.bfloat16
)

# Generate response
input_len = model_inputs["input_ids"].shape[-1]
with torch.inference_mode():
    generation = model.generate(
        **model_inputs,
        max_new_tokens=1024,
        do_sample=True,
        top_p=0.95,
        temperature=0.7
    )
    # Keep only the newly generated tokens
    generation = generation[0][input_len:]

# Decode the response
response = processor.decode(generation, skip_special_tokens=True)
print(response)
```

### Chat Template Usage

This model uses the standard Gemma 3 multimodal chat template with optimized formatting:

#### Text-only Chat

```python
# Reuses `tokenizer` and `model` from the Basic Text Usage example
messages = [
    {"role": "system", "content": "You are a helpful AI assistant specialized in logical reasoning and mathematics."},
    {"role": "user", "content": "Explain the reasoning behind the Pythagorean theorem and provide a step-by-step proof."}
]

input_text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
# The rendered template already starts with the BOS token, so do not add it again
inputs = tokenizer(input_text, return_tensors="pt", add_special_tokens=False).to(model.device)

outputs = model.generate(
    **inputs,
    max_new_tokens=1024,
    do_sample=True,
    top_p=0.95,
    temperature=0.7
)

# Decode only the tokens generated after the prompt
response = tokenizer.decode(outputs[0][inputs['input_ids'].shape[1]:], skip_special_tokens=True)
print(response)
```

#### Multimodal Chat (with Images)

```python
from PIL import Image

# Load an image
image = Image.open("path/to/your/image.jpg")

messages = [
    {
        "role": "user",
        "content": "Analyze this chart and explain the trends you observe. What mathematical relationships can you identify?",
        "images": [image]  # Include image in the message
    }
]

# Use processor for multimodal inputs
model_inputs = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt"
)

outputs = model.generate(
    **model_inputs,
    max_new_tokens=1024,
    do_sample=True,
    top_p=0.95,
    temperature=0.7
)

response = processor.decode(outputs[0], skip_special_tokens=True)
print(response)
```

#### Chat Template Format

The model uses the following multimodal template format:

```
{{- bos_token }}
{%- for message in messages %}
    {%- if message['role'] == 'system' %}
        {{- 'system\n' + message['content'] + '\n' }}
    {%- elif message['role'] == 'user' %}
        {{- 'user\n' }}
        {%- if 'images' in message and message['images'] %}
            {%- for image in message['images'] %}
                {{- '\n\n' }}
            {%- endfor %}
        {%- endif %}
        {{- message['content'] + '\n' }}
    {%- elif message['role'] == 'assistant' %}
        {{- 'model\n' + message['content'] + '\n' }}
    {%- endif %}
{%- endfor %}
{%- if add_generation_prompt and messages[-1]['role'] != 'assistant' %}
    {{- 'model\n' }}
{%- endif %}
```

### Step-by-Step Reasoning Examples

LogicFlow-Gemma-3-27b-thinking demonstrates exceptional reasoning capabilities through detailed Chain-of-Thought (CoT) processes. Below are real examples showcasing the model's thinking methodology:

#### Example 1: Mathematical Comparison

**Question**: "9.11 and 9.9, which one is larger?"

![CoT Example 1](CoT_example_2.png)

The model demonstrates sophisticated numerical reasoning by:

- Converting decimals to fractional comparisons (11/100 vs 90/100)
- Using multiple verification methods (number line visualization, real-world applications)
- Calculating the precise difference (0.79) to confirm the result
- Providing comprehensive step-by-step analysis

#### Example 2: Letter Counting Task

**Question**: "How many r's are in the word strawberry?"

![CoT Example 2](CoT_example_1.png)

The model showcases systematic thinking through:

- Letter-by-letter breakdown of the word "strawberry"
- Multiple verification approaches (position counting, pattern grouping)
- Cross-checking results using different methodologies
- Clear documentation of the reasoning process

These examples demonstrate the model's ability to:

- **Break down complex problems** into manageable steps
- **Self-verify results** using multiple approaches
- **Document reasoning chains** for transparency
- **Maintain accuracy** while showing work

### Activating Chain-of-Thought Reasoning

To get the best reasoning performance from LogicFlow-Gemma-3-27b-thinking, use prompts that encourage step-by-step thinking:

```python
# Example prompt for mathematical reasoning
prompt = """Please solve this problem step by step, showing your thinking process:

Question: Compare 9.11 and 9.9. Which number is larger?

Think through this carefully and show your work."""

# Example prompt for logical reasoning
prompt = """Let me work through this systematically:

Question: How many times does the letter 'r' appear in the word 'strawberry'?

Please show your step-by-step analysis."""

# For complex problems, you can explicitly request thinking
prompt = """Think step by step about this problem:

[Your complex question here]

Show your reasoning process before giving the final answer."""
```

**Pro Tips for Best Results:**

- Use phrases like "step by step", "think through this", "show your work"
- For math problems, request multiple verification methods
- Ask for reasoning before the final answer
- Use temperature settings around 0.7 for optimal reasoning creativity

## Intended Use Cases

This multimodal model is particularly well-suited for:

### Educational Applications

- **Chain-of-Thought Tutoring**: Demonstrates complete problem-solving processes with transparent reasoning steps
- **Mathematical Education**: Shows multiple verification methods for mathematical concepts (as seen in 9.11 vs 9.9 example)
- **Critical Thinking Development**: Models systematic analysis and self-verification techniques
- **Visual Learning**: Analyzing educational diagrams, charts, and mathematical illustrations
- **Interactive Learning**: Combining text and visual elements for comprehensive understanding

### Mathematical & Scientific Analysis

- **Chart Analysis**: Interpreting graphs, statistical charts, and data visualizations
- **Geometric Problem Solving**: Analyzing geometric figures and spatial relationships
- **Scientific Diagram Understanding**: Processing scientific illustrations and technical drawings
- **Formula Recognition**: Understanding mathematical formulas in images

### Professional Applications

- **Document Analysis**: Processing documents containing both text and visual elements
- **Technical Documentation**: Understanding technical manuals with diagrams
- **Data Visualization**: Analyzing and explaining complex charts and infographics
- **Research Assistance**: Combining textual research with visual data analysis

### Advanced Reasoning Tasks

- **Chain-of-Thought Problem Solving**: Complex reasoning with
detailed step-by-step analysis and self-verification
- **Multi-Method Validation**: Using multiple approaches to verify answers (numerical comparison, pattern analysis, etc.)
- **Transparent Decision Making**: Showing complete reasoning chains for critical analysis tasks
- **Multimodal Problem Solving**: Tackling problems that require both visual and textual understanding
- **Visual Code Analysis**: Understanding flowcharts, UML diagrams, and code structure visualizations
- **Pattern Recognition**: Identifying patterns in both visual and textual data

## Limitations

### Text Generation

- The model may occasionally generate incorrect mathematical calculations despite showing proper reasoning steps
- Performance on highly specialized domain knowledge outside of mathematics and logic may be limited
- As with all language models, it can sometimes produce hallucinated information

### Vision Understanding

- **Image Resolution**: Images are resized to 896x896 pixels, which may lose important details in high-resolution images
- **Image Quality**: Poor quality, blurry, or low-contrast images may reduce accuracy
- **Complex Visual Elements**: Very dense charts or diagrams with small text may be challenging to interpret
- **Image Formats**: Only supports standard image formats (JPEG, PNG, WebP)

### General Limitations

- The model should not be used for critical decision-making without human verification
- Multimodal reasoning combining complex visual and textual elements may sometimes produce inconsistent results
- Processing images increases computational requirements and inference time

## Ethical Considerations

- This model should be used responsibly and outputs should be verified, especially for important decisions
- The model may reflect biases present in its training data
- Users should be aware that the model's reasoning, while often sound, is not infallible

## Complete Training Configuration

For full reproducibility, here is the complete training configuration used:
```yaml
bf16: true
cutoff_len: 2048
dataset: openo1_sft,open_thoughts,open_r1_math  # Three specialized reasoning datasets
dataset_dir: data
ddp_timeout: 180000000
do_train: true
enable_thinking: true
finetuning_type: lora
flash_attn: auto
freeze_multi_modal_projector: true
freeze_vision_tower: true
gradient_accumulation_steps: 8
image_max_pixels: 589824
image_min_pixels: 1024
include_num_input_tokens_seen: true
learning_rate: 5.0e-05
logging_steps: 5
lora_alpha: 16
lora_dropout: 0
lora_rank: 8
lora_target: all
lr_scheduler_type: cosine
max_grad_norm: 1.0
max_samples: 100000
model_name_or_path: google/gemma-3-27b-it
num_train_epochs: 5.0
optim: adamw_torch
output_dir: saves/Gemma-3-27B-Instruct/lora/train_2025-06-12-17-10-14
packing: false
per_device_train_batch_size: 2
plot_loss: true
preprocessing_num_workers: 16
report_to: none
save_steps: 100
stage: sft
template: gemma
trust_remote_code: true
video_max_pixels: 65536
video_min_pixels: 256
warmup_steps: 100
```

## Technical Specifications

### Core Framework

- **Framework**: Transformers 4.52.4
- **PEFT Version**: 0.15.2
- **PyTorch Version**: 2.7.0+cu126
- **Training Framework**: LLaMA-Factory with LoRA fine-tuning

### Hardware Requirements

- **Recommended GPU Memory**: 32GB+ VRAM for multimodal inference
- **Minimum GPU Memory**: 24GB VRAM (text-only mode)
- **CPU Memory**: 64GB+ RAM recommended for optimal performance
- **Quantization**: Supports 4-bit and 8-bit quantization for reduced memory usage

The GPU figures above are reachable only with quantized weights: at 2 bytes per parameter, the 27 billion bfloat16 weights alone occupy roughly 54 GB.

### Vision Specifications

- **Vision Model**: SigLIP-based vision encoder
- **Image Resolution**: 896x896 pixels (normalized)
- **Image Patch Size**: 14x14 pixels
- **Vision Hidden Size**: 1,152
- **Vision Layers**: 27 layers
- **Tokens per Image**: 256 tokens
- **Supported Image Formats**: JPEG, PNG, WebP

### Architecture Details

- **Model Architecture**: Gemma3ForConditionalGeneration
- **Text Hidden Size**: 5,376
- **Vision Hidden Size**: 1,152
- **Attention Heads**: 32 (text), 16 (vision)
- **Hidden Layers**: 62
(text), 27 (vision)
- **Context Window**: 131,072 tokens (including image tokens)

## Citation

If you use this model in your research or applications, please cite:

```bibtex
@misc{logicflow-gemma-3-27b-thinking,
  title={LogicFlow-Gemma-3-27b-thinking: A Fine-tuned Model for Enhanced Reasoning},
  author={Xiangda Li},
  year={2025},
  base_model={google/gemma-3-27b-it},
  url={https://huggingface.co/RekklesAI/LogicFlow-Gemma-3-27b-thinking}
}
```

## Acknowledgments

- Based on Google's Gemma-3-27B-IT model
- Fine-tuned using LLaMA-Factory framework
- Training data from open-source reasoning and mathematics datasets

---

*This model card was generated to provide comprehensive information about the LogicFlow-Gemma-3-27b-thinking model. Please refer to the original Gemma-3 model documentation for additional technical details about the base architecture.*
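
#### Cross-checking the training figures

The batch and step numbers quoted in this card can be sanity-checked with a few lines of arithmetic. The sketch below is illustrative only: it assumes plain data-parallel training, where the effective batch size equals per-device batch size times gradient accumulation steps times number of devices, and it assumes the 41,400 reported steps are optimizer steps. Neither assumption is confirmed by the card, and the function name is hypothetical.

```python
def implied_num_devices(per_device: int, accumulation: int, effective: int) -> int:
    """Devices implied by effective = per_device * accumulation * devices (assumed relation)."""
    per_device_step = per_device * accumulation
    if effective % per_device_step != 0:
        raise ValueError("effective batch size is not a multiple of per_device * accumulation")
    return effective // per_device_step


# Values copied from this card
PER_DEVICE_BATCH = 2
GRAD_ACCUM_STEPS = 8
EFFECTIVE_BATCH = 32
TOTAL_STEPS = 41_400
NUM_EPOCHS = 5

devices = implied_num_devices(PER_DEVICE_BATCH, GRAD_ACCUM_STEPS, EFFECTIVE_BATCH)
steps_per_epoch = TOTAL_STEPS // NUM_EPOCHS
samples_per_epoch = steps_per_epoch * EFFECTIVE_BATCH

if __name__ == "__main__":
    print(f"implied devices:   {devices}")
    print(f"steps per epoch:   {steps_per_epoch:,}")
    print(f"samples per epoch: {samples_per_epoch:,}")
```

Under these assumptions the settings imply two devices and roughly 265,000 samples per epoch, which stays within the cap of 100,000 samples per dataset across three datasets.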