---
license: gemma
base_model: google/gemma-3-27b-it
datasets:
- O1-OPEN/OpenO1-SFT
- open-thoughts/OpenThoughts-114k
- open-r1/OpenR1-Math-220k
tags:
- llama-factory
- lora
- reasoning
- thinking
- mathematics
- merged
- multimodal
- vision
- image-text-to-text
- visual-reasoning
language:
- en
pipeline_tag: image-text-to-text
library_name: transformers
---
|
|
|
 |
|
|
|
# LogicFlow-Gemma-3-27b-thinking |
|
|
|
## Model Description |
|
|
|
LogicFlow-Gemma-3-27b-thinking is a **multimodal reasoning model** built on [google/gemma-3-27b-it](https://huggingface.co/google/gemma-3-27b-it) and tuned for complex logical reasoning, mathematical problem-solving, and step-by-step analytical thinking. It was fine-tuned with LoRA (Low-Rank Adaptation) on three reasoning-focused datasets, and the adapter was then merged into the base weights.
|
|
|
|
|
### Key Innovations |
|
|
|
The combination of the three training datasets yields a model that not only gives answers but also shows **how** it arrives at them, which makes it useful for educational applications, research, and any scenario that requires explainable reasoning.
|
|
|
The model demonstrates enhanced capabilities in: |
|
- **Logical Reasoning**: Improved ability to work through complex logical problems step by step |
|
- **Mathematical Problem Solving**: Enhanced performance on mathematical reasoning tasks (76.8% MATH, 13.3% AIME25) |
|
- **Scientific Analysis**: Exceptional scientific reasoning capabilities (45.96% GPQA Diamond) |
|
- **Chain-of-Thought Reasoning**: Superior step-by-step thinking with detailed reasoning chains and self-verification |
|
- **Structured Analysis**: Improved at breaking down complex problems into manageable components |
|
- **Multi-Method Verification**: Uses multiple approaches to validate results and ensure accuracy |
|
- **Vision Understanding**: Ability to analyze and reason about images, charts, diagrams, and visual data |
|
- **Multimodal Reasoning**: Combining visual and textual information for comprehensive analysis |
|
|
|
## Model Details |
|
|
|
- **Model Type**: Multimodal Language Model (Gemma-3 Architecture) |
|
- **Base Model**: google/gemma-3-27b-it |
|
- **Parameters**: 27 billion parameters |
|
- **Fine-tuning Method**: LoRA (Low-Rank Adaptation) with merge |
|
- **Context Length**: 131,072 tokens |
|
- **Architecture**: Gemma-3 with vision capabilities |
|
- **Precision**: bfloat16 |
|
- **Image Resolution**: 896x896 pixels, encoded to 256 tokens per image |
|
- **Supported Formats**: Text + Images (JPEG, PNG, WebP) |
|
|
|
## Training Details |
|
|
|
### Training Data |
|
The model was fine-tuned on three carefully selected, high-quality datasets that form the foundation of its exceptional reasoning capabilities: |
|
|
|
#### **OpenO1-SFT Dataset** |
|
- **Purpose**: Supervised fine-tuning for advanced reasoning patterns |
|
- **Content**: High-quality reasoning demonstrations with explicit thought processes |
|
- **Impact**: Enables the model to break down complex problems systematically and show transparent reasoning chains |
|
|
|
#### **Open-Thoughts Dataset** |
|
- **Purpose**: Step-by-step thinking process modeling |
|
- **Content**: Detailed internal monologues and reasoning progressions for various problem types |
|
- **Impact**: Teaches the model to externalize its thinking process, making reasoning transparent and verifiable |
|
|
|
#### **OpenR1-Math Dataset** |
|
- **Purpose**: Mathematical reasoning and problem-solving specialization |
|
- **Content**: Comprehensive mathematical problems with detailed solution methodologies |
|
- **Impact**: Significantly enhances performance on mathematical reasoning tasks, from basic arithmetic to advanced competition-level problems |
|
|
|
This synergistic combination creates a model that excels not only at providing accurate answers but also at demonstrating clear, verifiable reasoning processes. |
|
|
|
### Training Configuration |
|
|
|
#### Core Training Parameters |
|
- **Learning Rate**: 5e-05 |
|
- **Epochs**: 5.0 |
|
- **Optimizer**: AdamW (adamw_torch) |
|
- **LR Scheduler**: Cosine with 100 warmup steps |
|
- **Max Gradient Norm**: 1.0 |
|
- **Max Samples**: 100,000 |
|
- **Precision**: bfloat16 (bf16: true) |
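
The learning-rate settings above can be made concrete with a small sketch. It assumes the usual shape of a cosine schedule with warmup (linear ramp to the peak rate, then a half-cosine decay to zero) and assumes the schedule spans the 41,400 steps reported in the training results; the function name `lr_at_step` is illustrative and not part of any library.

```python
import math

def lr_at_step(step, base_lr=5e-5, warmup_steps=100, total_steps=41_400):
    """Linear warmup to base_lr, then half-cosine decay to zero."""
    if step < warmup_steps:
        # Warmup phase: the rate grows linearly from 0 to base_lr
        return base_lr * step / max(1, warmup_steps)
    # Decay phase: progress runs from 0.0 (end of warmup) to 1.0 (last step)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))

for step in (0, 50, 100, 20_750, 41_400):
    print(f"step {step:>6}: lr = {lr_at_step(step):.2e}")
```

Under these assumptions the rate peaks at 5e-05 after step 100, is about half the peak at the midpoint of the decay, and reaches zero at the final step.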
|
|
|
#### Batch Configuration |
|
- **Per Device Train Batch Size**: 2 |
|
- **Gradient Accumulation Steps**: 8 |
|
- **Total Effective Batch Size**: 32 |
|
- **Packing**: Disabled (false) |
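
The batch figures above fit together as follows. The sketch below is only arithmetic on the numbers in this card; the device count and the samples-per-epoch figure are inferred from them and are not stated anywhere in the card.

```python
per_device_batch = 2
grad_accum_steps = 8
reported_effective_batch = 32

# Samples contributing to one optimizer step on a single device
per_device_effective = per_device_batch * grad_accum_steps

# Inferred, not stated in the card: number of devices needed to reach 32
implied_devices = reported_effective_batch // per_device_effective

# Cross-check against the reported 41,400 steps over 5 epochs
total_steps = 41_400
epochs = 5
steps_per_epoch = total_steps // epochs
implied_samples_per_epoch = steps_per_epoch * reported_effective_batch

print(f"per-device effective batch: {per_device_effective}")
print(f"implied device count:       {implied_devices}")
print(f"steps per epoch:            {steps_per_epoch}")
print(f"approx. samples per epoch:  {implied_samples_per_epoch}")
```

The implied sample count is approximate, since the last batch of an epoch may be partial.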
|
|
|
#### LoRA Configuration |
|
- **Fine-tuning Type**: LoRA |
|
- **LoRA Rank (r)**: 8 |
|
- **LoRA Alpha**: 16 |
|
- **LoRA Dropout**: 0.0 |
|
- **LoRA Target**: all (comprehensive layer targeting) |
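
To give a feel for what rank 8 and alpha 16 mean, here is a minimal arithmetic sketch. A LoRA adapter for a weight matrix of shape `d_out x d_in` adds two low-rank factors with `r * (d_in + d_out)` parameters, and the update is scaled by `alpha / r`. The 5,376 x 5,376 shape is purely illustrative (it borrows the text hidden size); the real projection shapes in the model differ from layer to layer.

```python
def lora_param_count(d_in, d_out, rank):
    """Parameters in the two low-rank factors A (rank x d_in) and B (d_out x rank)."""
    return rank * (d_in + d_out)

rank, alpha = 8, 16
scaling = alpha / rank  # factor applied to the low-rank update B @ A

# Illustrative shape only: a square projection at the text hidden size
d_in = d_out = 5376
full_params = d_in * d_out
adapter_params = lora_param_count(d_in, d_out, rank)

print(f"scaling factor:        {scaling}")
print(f"full matrix params:    {full_params:,}")
print(f"LoRA adapter params:   {adapter_params:,}")
print(f"adapter / full matrix: {adapter_params / full_params:.4%}")
```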
|
|
|
#### Sequence and Vision Parameters |
|
- **Cutoff Length**: 2,048 tokens |
|
- **Image Max Pixels**: 589,824 |
|
- **Image Min Pixels**: 1,024 |
|
- **Video Max Pixels**: 65,536 |
|
- **Video Min Pixels**: 256 |
|
- **Flash Attention**: auto |
|
- **Freeze Vision Tower**: true |
|
- **Freeze Multi-modal Projector**: true |
|
|
|
#### Special Features |
|
- **Template**: gemma (Optimized for multimodal reasoning tasks) |
|
- **Trust Remote Code**: true (Required for advanced vision capabilities) |
|
- **Preprocessing Workers**: 16 (Optimized for multimodal data processing) |
|
- **Save Steps**: 100 (Frequent checkpointing for training stability) |
|
- **Logging Steps**: 5 (Detailed training monitoring) |
|
|
|
### Training Results |
|
|
|
#### Training Loss Curve

Training loss was logged every 5 steps. The curve below shows convergence over the 41,400 training steps across 5 epochs:
|
|
|
 |
|
|
|
The loss curve shows stable convergence, with the final training loss reaching 0.003759. Training loss alone cannot rule out overfitting, so the benchmark results below are the better guide to how well the model generalizes.
|
|
|
## Benchmark Performance |
|
|
|
### Comprehensive Evaluation Results |
|
|
|
| **Benchmark** | **Metric** | **Base Gemma-3-27B-IT** | **LogicFlow-Gemma-3-27b-thinking** | **Improvement** |
|---------------|------------|--------------------------|-------------------------------------|-----------------|
| **Mathematical Reasoning** | | | | |
| GSM8K | 5-shot | 82.6% | **89.5%** | **+6.9%** |
| MATH | 5-shot | 50.0% | **76.8%** | **+26.8%** |
| **Code Generation** | | | | |
| MBPP | pass@1 | 65.6% | **69.0%** | **+3.4%** |
| HumanEval | 0-shot | 48.8% | *Pending* | *TBD* |
| **Instruction Following** | | | | |
| IFEval | Prompt-level | *45.0%* | **40.0%** | **-5.0%** |
| IFEval | Instruction-level | *58.0%* | **53.1%** | **-4.9%** |
| **Advanced Mathematics** | | | | |
| AIME25 | 5-shot | ~8-12% | **13.3%** | **+1-5%** |
| **Scientific Reasoning** | | | | |
| GPQA Diamond | 5-shot | ~30-35% | **45.96%** | **+11-16%** |
| **Knowledge & Understanding** | | | | |
| MMLU | Overall Accuracy | 78.6% | **75.3%** | **-3.3%** |
| MMLU STEM | Sciences & Math | ~70.0% | **71.6%** | **+1.6%** |
| MMLU Humanities | Arts & Literature | ~67.0% | **69.2%** | **+2.2%** |
| MMLU Social Sciences | Psychology & Economics | ~82.0% | **84.3%** | **+2.3%** |
| MMLU Other | Professional & Medical | ~77.0% | **79.2%** | **+2.2%** |
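
The "Improvement" column reports absolute differences in percentage points, not relative gains. The sketch below recomputes both for the rows whose baselines are given as exact figures; rows with approximate (`~`) baselines are left out. The helper name `deltas` is illustrative.

```python
# (base score, fine-tuned score) in percent, copied from the table above
results = {
    "GSM8K": (82.6, 89.5),
    "MATH": (50.0, 76.8),
    "MBPP": (65.6, 69.0),
    "MMLU": (78.6, 75.3),
}

def deltas(base, tuned):
    """Return (percentage-point change, relative change in percent)."""
    points = round(tuned - base, 1)
    relative = round(100 * (tuned - base) / base, 1)
    return points, relative

for name, (base, tuned) in results.items():
    points, relative = deltas(base, tuned)
    print(f"{name:<6} {points:+.1f} points ({relative:+.1f}% relative)")
```

For example, the MATH gain of 26.8 points corresponds to a relative improvement of 53.6% over the base score.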
|
|
|
### Key Performance Insights |
|
|
|
#### **Significant Improvements** |
|
- **Mathematical Reasoning**: GSM8K (+6.9 points) and MATH (+26.8 points) show markedly better step-by-step problem solving

- **Scientific Reasoning**: 45.96% accuracy on GPQA Diamond, above the estimated 30-35% baseline

- **Competition Mathematics**: 13.3% on AIME25, a modest gain over the estimated 8-12% baseline
|
- **Code Generation**: 3.4% improvement on MBPP shows better programming logic understanding |
|
- **Domain-Specific Knowledge**: Improvements in STEM (+1.6%), Humanities (+2.2%), and Social Sciences (+2.3%) |
|
|
|
#### **Trade-offs Observed** |
|
- **Instruction Following**: Slight decrease in IFEval scores (-5% prompt-level, -4.9% instruction-level) |
|
- **General Knowledge**: Overall MMLU score decreased by 3.3 points, plausibly a side effect of reasoning specialization
|
- **Reasoning Focus**: Model optimized for deep analytical thinking over rapid instruction compliance |
|
|
|
#### **Specialized Capabilities** |
|
- **Mathematical Excellence**: Outstanding 76.8% accuracy on MATH benchmark - among the top performances for 27B models |
|
- **Scientific Reasoning**: Exceptional 45.96% on GPQA Diamond - handling graduate-level physics, chemistry, and biology problems |
|
- **Elite Competition Performance**: Competitive 13.3% on AIME25 - tackling American Invitational Mathematics Exam challenges |
|
- **Chain-of-Thought Mastery**: Demonstrates sophisticated reasoning through detailed thinking processes with multi-method verification |
|
- **Transparent Reasoning**: Shows complete work and self-validates answers using multiple approaches (as shown in CoT examples) |
|
- **Cross-Domain Expertise**: Superior performance spanning mathematics, natural sciences, and logical reasoning |
|
|
|
### Benchmarking Methodology |
|
|
|
Our evaluation follows rigorous benchmarking principles: |
|
|
|
1. **Reproducible Environment**: All tests conducted with fixed random seeds and controlled temperature settings |
|
2. **Diverse Metrics**: Beyond accuracy, we evaluate reasoning quality, step-by-step explanations, and cross-domain scientific performance |
|
3. **Research-Relevant Tasks**: Focus on real-world applications in education, scientific research, and advanced technical analysis |
|
4. **Comparative Baselines**: Direct comparison with original Gemma-3-27B-IT and established benchmarks |
|
|
|
### Performance Analysis |
|
|
|
Following [Domino AI's benchmarking guidelines](https://domino.ai/blog/benchmarking-predictive-models), we evaluated both predictive characteristics and operational constraints:
|
|
|
- **Mathematical & Scientific Excellence**: 76.8% MATH accuracy and 45.96% GPQA Diamond represent breakthrough reasoning capabilities |
|
- **Competition-Level Performance**: 13.3% AIME25 accuracy demonstrates capability in elite mathematical competitions |
|
- **Base Model Standing**: According to [Google's Gemma 3 announcement](https://www.ainewshub.org/post/google-unveils-gemma-3-a-game-changer-in-open-source-ai), the base Gemma 3 27B model achieves 1338 Elo on Chatbot Arena (this figure applies to the base model, not to this fine-tune)
|
- **Advanced Problem Solving**: GPQA Diamond performance significantly exceeds typical model benchmarks (30-35% baseline) |
|
- **Latency**: Average inference time increased by ~15% due to enhanced reasoning processes - worthwhile trade-off for quality |
|
- **Quality**: Accuracy gains in mathematical reasoning (+26.8% on MATH) and scientific reasoning (+11-16% on GPQA Diamond, measured against an estimated baseline)
|
- **Reliability**: Consistent performance across multiple evaluation runs with detailed step-by-step reasoning chains |
|
- **Cross-Domain Specialization**: Superior performance in mathematics, natural sciences, and complex logical reasoning |
|
|
|
|
|
## Usage |
|
|
|
### Installation |
|
|
|
For multimodal functionality, ensure you have the latest versions of the required packages: |
|
|
|
```bash
pip install -U transformers torch torchvision
pip install -U pillow requests
# For GPU acceleration
pip install -U accelerate
```
|
|
|
### Basic Text Usage |
|
|
|
```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Load model and tokenizer
model_name = "RekklesAI/LogicFlow-Gemma-3-27b-thinking"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    device_map="auto"
)

# Example usage for reasoning tasks
prompt = """Solve this step by step:
If a train travels 120 km in 2 hours, and then 180 km in the next 3 hours, what is its average speed for the entire journey?

Let me think through this step by step:"""

# Tokenize and move the tensors to the device that holds the model inputs
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=512,
        do_sample=True,
        top_p=0.95,
        top_k=64,
        temperature=0.7
    )

# The decoded text contains the prompt followed by the model's continuation
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)
```
|
|
|
### Multimodal Usage (Text + Image) |
|
|
|
```python
from transformers import AutoProcessor, Gemma3ForConditionalGeneration
from PIL import Image
import requests
import torch

# Load model and processor
model_name = "RekklesAI/LogicFlow-Gemma-3-27b-thinking"
model = Gemma3ForConditionalGeneration.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    device_map="auto"
)
processor = AutoProcessor.from_pretrained(model_name)

# Load an image (example: a mathematical diagram or chart)
# Placeholder URL: replace it with the address of a real image
url = "https://example.com/math-diagram.jpg"
image = Image.open(requests.get(url, stream=True).raw).convert("RGB")

# Create a multimodal prompt for step-by-step analysis
# <start_of_image> marks where the image tokens are inserted
prompt = """<start_of_image>Analyze this mathematical diagram step by step.
What mathematical concepts are being illustrated, and how would you solve any problems shown?

Please provide a detailed, step-by-step explanation."""

# Process the inputs and move them to the model's device and dtype
model_inputs = processor(text=prompt, images=image, return_tensors="pt").to(
    model.device, dtype=torch.bfloat16
)

# Generate response
input_len = model_inputs["input_ids"].shape[-1]
with torch.inference_mode():
    generation = model.generate(
        **model_inputs,
        max_new_tokens=1024,
        do_sample=True,
        top_p=0.95,
        temperature=0.7
    )
    # Keep only the newly generated tokens
    generation = generation[0][input_len:]

# Decode the response
response = processor.decode(generation, skip_special_tokens=True)
print(response)
```
|
|
|
### Chat Template Usage |
|
|
|
This model uses the standard Gemma 3 multimodal chat template with optimized formatting: |
|
|
|
#### Text-only Chat |
|
```python
# Reuses `tokenizer` and `model` from the basic text example above
messages = [
    {"role": "system", "content": "You are a helpful AI assistant specialized in logical reasoning and mathematics."},
    {"role": "user", "content": "Explain the reasoning behind the Pythagorean theorem and provide a step-by-step proof."}
]

input_text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
# The template already emits the BOS token, so do not add special tokens again
inputs = tokenizer(input_text, return_tensors="pt", add_special_tokens=False).to(model.device)

outputs = model.generate(
    **inputs,
    max_new_tokens=1024,
    do_sample=True,
    top_p=0.95,
    temperature=0.7
)

# Decode only the tokens generated after the prompt
response = tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
print(response)
```
|
|
|
#### Multimodal Chat (with Images) |
|
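```python
import torch
from PIL import Image

# Reuses `model` and `processor` from the multimodal example above

# Load an image
image = Image.open("path/to/your/image.jpg").convert("RGB")

messages = [
    {
        "role": "user",
        "content": "Analyze this chart and explain the trends you observe. What mathematical relationships can you identify?",
        "images": [image]  # Include image in the message
    }
]

# Render the prompt text first. This relies on the template documented in
# "Chat Template Format", which emits one image placeholder per entry in "images".
prompt = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=False
)

# Tokenize the text and preprocess the image together. The template already
# emits the BOS token, so the tokenizer should not add a second one.
model_inputs = processor(
    text=prompt,
    images=[image],
    add_special_tokens=False,
    return_tensors="pt"
).to(model.device, dtype=torch.bfloat16)

outputs = model.generate(
    **model_inputs,
    max_new_tokens=1024,
    do_sample=True,
    top_p=0.95,
    temperature=0.7
)

# Decode only the tokens generated after the prompt
input_len = model_inputs["input_ids"].shape[-1]
response = processor.decode(outputs[0][input_len:], skip_special_tokens=True)
print(response)
```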
|
|
|
#### Chat Template Format |
|
The model uses the following multimodal template format: |
|
```
{{- bos_token }}
{%- for message in messages %}
    {%- if message['role'] == 'system' %}
        {{- '<start_of_turn>system\n' + message['content'] + '<end_of_turn>\n' }}
    {%- elif message['role'] == 'user' %}
        {{- '<start_of_turn>user\n' }}
        {%- if 'images' in message and message['images'] %}
            {%- for image in message['images'] %}
                {{- '<start_of_image>\n<end_of_image>\n' }}
            {%- endfor %}
        {%- endif %}
        {{- message['content'] + '<end_of_turn>\n' }}
    {%- elif message['role'] == 'assistant' %}
        {{- '<start_of_turn>model\n' + message['content'] + '<end_of_turn>\n' }}
    {%- endif %}
{%- endfor %}
{%- if add_generation_prompt and messages[-1]['role'] != 'assistant' %}
    {{- '<start_of_turn>model\n' }}
{%- endif %}
```
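
To see what this template produces, the sketch below mirrors its logic in plain Python. It is an illustration of the template text shown above, not the code the tokenizer actually runs; `render_chat` is a hypothetical helper, and `<bos>` is assumed to be the value of `bos_token`.

```python
BOS = "<bos>"  # assumed value of bos_token for Gemma tokenizers

def render_chat(messages, add_generation_prompt=True):
    """Mirror the documented template: one turn per message, images before user text."""
    parts = [BOS]
    for message in messages:
        role, content = message["role"], message["content"]
        if role == "system":
            parts.append(f"<start_of_turn>system\n{content}<end_of_turn>\n")
        elif role == "user":
            parts.append("<start_of_turn>user\n")
            # One placeholder pair per attached image
            for _ in message.get("images") or []:
                parts.append("<start_of_image>\n<end_of_image>\n")
            parts.append(f"{content}<end_of_turn>\n")
        elif role == "assistant":
            # Assistant turns are written under the "model" role name
            parts.append(f"<start_of_turn>model\n{content}<end_of_turn>\n")
    if add_generation_prompt and messages[-1]["role"] != "assistant":
        parts.append("<start_of_turn>model\n")
    return "".join(parts)

print(render_chat([{"role": "user", "content": "Which is larger, 9.11 or 9.9?"}]))
```

The rendered string ends with an open `<start_of_turn>model` turn, which is where generation begins.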
|
|
|
### Step-by-Step Reasoning Examples |
|
|
|
LogicFlow-Gemma-3-27b-thinking demonstrates exceptional reasoning capabilities through detailed Chain-of-Thought (CoT) processes. Below are real examples showcasing the model's thinking methodology: |
|
|
|
#### Example 1: Mathematical Comparison |
|
**Question**: "9.11 and 9.9, which one is larger?" |
|
|
|
 |
|
|
|
The model demonstrates sophisticated numerical reasoning by: |
|
- Converting decimals to fractional comparisons (11/100 vs 90/100) |
|
- Using multiple verification methods (number line visualization, real-world applications) |
|
- Calculating the precise difference (0.79) to confirm the result |
|
- Providing comprehensive step-by-step analysis |
|
|
|
#### Example 2: Letter Counting Task |
|
**Question**: "How many r's are in the word strawberry?" |
|
|
|
 |
|
|
|
The model showcases systematic thinking through: |
|
- Letter-by-letter breakdown of the word "strawberry" |
|
- Multiple verification approaches (position counting, pattern grouping) |
|
- Cross-checking results using different methodologies |
|
- Clear documentation of the reasoning process |
|
|
|
These examples demonstrate the model's ability to: |
|
- **Break down complex problems** into manageable steps |
|
- **Self-verify results** using multiple approaches |
|
- **Document reasoning chains** for transparency |
|
- **Maintain accuracy** while showing work |
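
Both example answers are easy to check independently. The short sketch below does so with the standard library only; it verifies the expected results and does not involve the model.

```python
from decimal import Decimal

# Example 1: compare 9.11 and 9.9 exactly, without floating-point rounding
a, b = Decimal("9.11"), Decimal("9.9")
larger = max(a, b)
difference = b - a
print(f"larger: {larger}, difference: {difference}")

# Example 2: count the letter 'r' in "strawberry" and record its positions (1-based)
word = "strawberry"
r_positions = [i for i, ch in enumerate(word, start=1) if ch == "r"]
print(f"'r' appears {len(r_positions)} times at positions {r_positions}")
```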
|
|
|
### Activating Chain-of-Thought Reasoning |
|
|
|
To get the best reasoning performance from LogicFlow-Gemma-3-27b-thinking, use prompts that encourage step-by-step thinking: |
|
|
|
```python
# Example prompt for mathematical reasoning
math_prompt = """Please solve this problem step by step, showing your thinking process:

Question: Compare 9.11 and 9.9. Which number is larger?

Think through this carefully and show your work."""

# Example prompt for logical reasoning
logic_prompt = """Let me work through this systematically:

Question: How many times does the letter 'r' appear in the word 'strawberry'?

Please show your step-by-step analysis."""

# For complex problems, you can explicitly request thinking
general_prompt = """Think step by step about this problem:

[Your complex question here]

Show your reasoning process before giving the final answer."""
```
|
|
|
**Pro Tips for Best Results:** |
|
- Use phrases like "step by step", "think through this", "show your work" |
|
- For math problems, request multiple verification methods |
|
- Ask for reasoning before the final answer |
|
- Use temperature settings around 0.7 for optimal reasoning creativity |
|
|
|
## Intended Use Cases |
|
|
|
This multimodal model is particularly well-suited for: |
|
|
|
### Educational Applications |
|
- **Chain-of-Thought Tutoring**: Demonstrates complete problem-solving processes with transparent reasoning steps |
|
- **Mathematical Education**: Shows multiple verification methods for mathematical concepts (as seen in 9.11 vs 9.9 example) |
|
- **Critical Thinking Development**: Models systematic analysis and self-verification techniques |
|
- **Visual Learning**: Analyzing educational diagrams, charts, and mathematical illustrations |
|
- **Interactive Learning**: Combining text and visual elements for comprehensive understanding |
|
|
|
### Mathematical & Scientific Analysis |
|
- **Chart Analysis**: Interpreting graphs, statistical charts, and data visualizations |
|
- **Geometric Problem Solving**: Analyzing geometric figures and spatial relationships |
|
- **Scientific Diagram Understanding**: Processing scientific illustrations and technical drawings |
|
- **Formula Recognition**: Understanding mathematical formulas in images |
|
|
|
### Professional Applications |
|
- **Document Analysis**: Processing documents containing both text and visual elements |
|
- **Technical Documentation**: Understanding technical manuals with diagrams |
|
- **Data Visualization**: Analyzing and explaining complex charts and infographics |
|
- **Research Assistance**: Combining textual research with visual data analysis |
|
|
|
### Advanced Reasoning Tasks |
|
- **Chain-of-Thought Problem Solving**: Complex reasoning with detailed step-by-step analysis and self-verification |
|
- **Multi-Method Validation**: Using multiple approaches to verify answers (numerical comparison, pattern analysis, etc.) |
|
- **Transparent Decision Making**: Showing complete reasoning chains for critical analysis tasks |
|
- **Multimodal Problem Solving**: Tackling problems that require both visual and textual understanding |
|
- **Visual Code Analysis**: Understanding flowcharts, UML diagrams, and code structure visualizations |
|
- **Pattern Recognition**: Identifying patterns in both visual and textual data |
|
|
|
## Limitations |
|
|
|
### Text Generation |
|
- The model may occasionally generate incorrect mathematical calculations despite showing proper reasoning steps |
|
- Performance on highly specialized domain knowledge outside of mathematics and logic may be limited |
|
- As with all language models, it can sometimes produce hallucinated information |
|
|
|
### Vision Understanding |
|
- **Image Resolution**: Images are resized to 896x896 pixels, which may lose important details in high-resolution images |
|
- **Image Quality**: Poor quality, blurry, or low-contrast images may reduce accuracy |
|
- **Complex Visual Elements**: Very dense charts or diagrams with small text may be challenging to interpret |
|
- **Image Formats**: Only supports standard image formats (JPEG, PNG, WebP) |
|
|
|
### General Limitations |
|
- The model should not be used for critical decision-making without human verification |
|
- Multimodal reasoning combining complex visual and textual elements may sometimes produce inconsistent results |
|
- Processing images increases computational requirements and inference time |
|
|
|
## Ethical Considerations |
|
|
|
- This model should be used responsibly and outputs should be verified, especially for important decisions |
|
- The model may reflect biases present in its training data |
|
- Users should be aware that the model's reasoning, while often sound, is not infallible |
|
|
|
## Complete Training Configuration |
|
|
|
For full reproducibility, here is the complete training configuration used: |
|
|
|
```yaml
bf16: true
cutoff_len: 2048
dataset: openo1_sft,open_thoughts,open_r1_math  # Three specialized reasoning datasets
dataset_dir: data
ddp_timeout: 180000000
do_train: true
enable_thinking: true
finetuning_type: lora
flash_attn: auto
freeze_multi_modal_projector: true
freeze_vision_tower: true
gradient_accumulation_steps: 8
image_max_pixels: 589824
image_min_pixels: 1024
include_num_input_tokens_seen: true
learning_rate: 5.0e-05
logging_steps: 5
lora_alpha: 16
lora_dropout: 0
lora_rank: 8
lora_target: all
lr_scheduler_type: cosine
max_grad_norm: 1.0
max_samples: 100000
model_name_or_path: google/gemma-3-27b-it
num_train_epochs: 5.0
optim: adamw_torch
output_dir: saves/Gemma-3-27B-Instruct/lora/train_2025-06-12-17-10-14
packing: false
per_device_train_batch_size: 2
plot_loss: true
preprocessing_num_workers: 16
report_to: none
save_steps: 100
stage: sft
template: gemma
trust_remote_code: true
video_max_pixels: 65536
video_min_pixels: 256
warmup_steps: 100
```
|
|
|
## Technical Specifications |
|
|
|
### Core Framework |
|
- **Framework**: Transformers 4.52.4 |
|
- **PEFT Version**: 0.15.2 |
|
- **PyTorch Version**: 2.7.0+cu126 |
|
- **Training Framework**: LLaMA-Factory with LoRA fine-tuning |
|
|
|
### Hardware Requirements |
|
- **Recommended GPU Memory**: 32GB+ VRAM for multimodal inference with quantized weights (the bfloat16 weights alone occupy roughly 54GB)

- **Minimum GPU Memory**: 24GB VRAM (text-only mode, with 4-bit quantization)
|
- **CPU Memory**: 64GB+ RAM recommended for optimal performance |
|
- **Quantization**: Supports 4-bit and 8-bit quantization for reduced memory usage |
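
A rough way to relate these memory figures to the parameter count is to multiply the 27 billion parameters by the bytes each one occupies at a given precision. The sketch below covers weights only; activations, the KV cache, and quantization overhead come on top, so real usage is higher. The function name is illustrative.

```python
def weight_memory_gb(num_params, bits_per_param):
    """Approximate memory for the weights alone, in gigabytes (10^9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9

num_params = 27e9  # parameter count stated in this card

for label, bits in [("bfloat16", 16), ("8-bit", 8), ("4-bit", 4)]:
    print(f"{label:>8}: ~{weight_memory_gb(num_params, bits):.1f} GB for weights")
```

By this estimate, only the 4-bit variant fits comfortably on a single 24GB card, and unquantized bfloat16 inference needs the weights split across several GPUs or offloaded.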
|
|
|
### Vision Specifications |
|
- **Vision Model**: SIGLIP-based vision encoder |
|
- **Image Resolution**: 896x896 pixels (normalized) |
|
- **Image Patch Size**: 14x14 pixels |
|
- **Vision Hidden Size**: 1,152 |
|
- **Vision Layers**: 27 layers |
|
- **Tokens per Image**: 256 tokens |
|
- **Supported Image Formats**: JPEG, PNG, WebP |
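
The vision figures above are linked by simple arithmetic, sketched below. The 4x4 pooling window is inferred from the numbers (4,096 patches reduced to 256 tokens) and is an assumption here, not something this card states; the image-count bound ignores any text tokens in the prompt.

```python
import math

image_size = 896        # pixels per side after resizing
patch_size = 14         # pixels per side of one patch
tokens_per_image = 256  # tokens the language model sees per image
context_window = 131_072

patches_per_side = image_size // patch_size    # patches along one edge
num_patches = patches_per_side ** 2            # patches produced by the encoder
pooled_side = math.isqrt(tokens_per_image)     # token grid edge after pooling
pool_window = patches_per_side // pooled_side  # inferred pooling window per side

# Upper bound on images per prompt if the context held nothing but image tokens
max_images = context_window // tokens_per_image

print(f"patch grid: {patches_per_side}x{patches_per_side} = {num_patches} patches")
print(f"token grid: {pooled_side}x{pooled_side} = {tokens_per_image} tokens")
print(f"inferred pooling window: {pool_window}x{pool_window}")
print(f"image tokens alone would fill the context at {max_images} images")
```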
|
|
|
### Architecture Details |
|
- **Model Architecture**: Gemma3ForConditionalGeneration |
|
- **Text Hidden Size**: 5,376 |
|
- **Vision Hidden Size**: 1,152 |
|
- **Attention Heads**: 32 (text), 16 (vision) |
|
- **Hidden Layers**: 62 (text), 27 (vision) |
|
- **Context Window**: 131,072 tokens (including image tokens) |
|
|
|
## Citation |
|
|
|
If you use this model in your research or applications, please cite: |
|
|
|
```bibtex
@misc{logicflow-gemma-3-27b-thinking,
  title={LogicFlow-Gemma-3-27b-thinking: A Fine-tuned Model for Enhanced Reasoning},
  author={Xiangda Li},
  year={2025},
  note={Fine-tuned from google/gemma-3-27b-it},
  url={https://huggingface.co/RekklesAI/LogicFlow-Gemma-3-27b-thinking}
}
```
|
|
|
## Acknowledgments |
|
|
|
- Based on Google's Gemma-3-27B-IT model |
|
- Fine-tuned using LLaMA-Factory framework |
|
- Training data from open-source reasoning and mathematics datasets |
|
|
|
--- |
|
|
|
*This model card was generated to provide comprehensive information about the LogicFlow-Gemma-3-27b-thinking model. Please refer to the original Gemma-3 model documentation for additional technical details about the base architecture.* |