Upload README.md with huggingface_hub
README.md
CHANGED
@@ -27,38 +27,20 @@ pipeline_tag: image-text-to-text
|
|
27 |
|
28 |
LogicFlow-Gemma-3-27b-thinking is an advanced **multimodal reasoning model** built upon [google/gemma-3-27b-it](https://huggingface.co/google/gemma-3-27b-it), specifically designed to excel at complex logical reasoning, mathematical problem-solving, and step-by-step analytical thinking. This model represents a significant advancement in AI reasoning capabilities, achieved through careful fine-tuning on three specialized, high-quality datasets using the LoRA (Low-Rank Adaptation) technique.
|
29 |
|
30 |
-
### Training Dataset Foundation
|
31 |
-
|
32 |
-
Our model has been meticulously trained on three cutting-edge datasets, each contributing unique reasoning capabilities:
|
33 |
-
|
34 |
-
#### **OpenO1-SFT Dataset**
|
35 |
-
- **Purpose**: Supervised fine-tuning for advanced reasoning patterns
|
36 |
-
- **Content**: High-quality reasoning demonstrations with explicit thought processes
|
37 |
-
- **Impact**: Enables the model to break down complex problems systematically and show transparent reasoning chains
|
38 |
-
|
39 |
-
#### **Open-Thoughts Dataset**
|
40 |
-
- **Purpose**: Step-by-step thinking process modeling
|
41 |
-
- **Content**: Detailed internal monologues and reasoning progressions for various problem types
|
42 |
-
- **Impact**: Teaches the model to externalize its thinking process, making reasoning transparent and verifiable
|
43 |
-
|
44 |
-
#### **OpenR1-Math Dataset**
|
45 |
-
- **Purpose**: Mathematical reasoning and problem-solving specialization
|
46 |
-
- **Content**: Comprehensive mathematical problems with detailed solution methodologies
|
47 |
-
- **Impact**: Significantly enhances performance on mathematical reasoning tasks, from basic arithmetic to advanced competition-level problems
|
48 |
|
49 |
### Key Innovations
|
50 |
|
51 |
This unique combination of datasets creates a model that not only provides correct answers but also demonstrates **how** it arrives at those answers, making it particularly valuable for educational applications, research, and any scenario requiring explainable AI reasoning.
|
52 |
|
53 |
The model demonstrates enhanced capabilities in:
|
54 |
-
-
|
55 |
-
-
|
56 |
-
-
|
57 |
-
-
|
58 |
-
-
|
59 |
-
-
|
60 |
-
-
|
61 |
-
-
|
62 |
|
63 |
## Model Details
|
64 |
|
@@ -77,11 +59,20 @@ The model demonstrates enhanced capabilities in:
|
|
77 |
### Training Data
|
78 |
The model was fine-tuned on three carefully selected, high-quality datasets that form the foundation of its exceptional reasoning capabilities:
|
79 |
|
80 |
-
|
81 |
|
82 |
-
|
83 |
|
84 |
-
|
85 |
|
86 |
This synergistic combination creates a model that excels not only at providing accurate answers but also at demonstrating clear, verifiable reasoning processes.
|
87 |
|
@@ -156,20 +147,20 @@ The loss curve demonstrates stable convergence with the final training loss reac
|
|
156 |
|
157 |
| **Benchmark** | **Metric** | **Base Gemma-3-27B-IT** | **LogicFlow-Gemma-3-27b-thinking** | **Improvement** |
|
158 |
|---------------|------------|--------------------------|-------------------------------------|-----------------|
|
159 |
-
|
|
160 |
| GSM8K | Exact Match | 82.6% | **89.5%** | **+6.9%** |
|
161 |
| MATH | Accuracy | 50.0% | **76.8%** | **+26.8%** |
|
162 |
-
|
|
163 |
| MBPP | pass@1 | 65.6% | **69.0%** | **+3.4%** |
|
164 |
| HumanEval | 0-shot | 48.8% | *Pending* | *TBD* |
|
165 |
-
|
|
166 |
| IFEval | Prompt-level | *45.0%* | **40.0%** | **-5.0%** |
|
167 |
| IFEval | Instruction-level | *58.0%* | **53.1%** | **-4.9%** |
|
168 |
-
|
|
169 |
| AIME25 | Problem Solving | ~8-12% | **13.3%** | **+1-5%** |
|
170 |
-
|
|
171 |
| GPQA Diamond | Science QA | ~30-35% | **45.96%** | **+11-16%** |
|
172 |
-
|
|
173 |
| MMLU | Overall Accuracy | 78.6% | **75.3%** | **-3.3%** |
|
174 |
| MMLU STEM | Sciences & Math | ~70.0% | **71.6%** | **+1.6%** |
|
175 |
| MMLU Humanities | Arts & Literature | ~67.0% | **69.2%** | **+2.2%** |
|
@@ -178,7 +169,7 @@ The loss curve demonstrates stable convergence with the final training loss reac
|
|
178 |
|
179 |
### Key Performance Insights
|
180 |
|
181 |
-
####
|
182 |
- **Mathematical Reasoning**: Exceptional improvements - GSM8K (+6.9%) and MATH (+26.8%) demonstrate enhanced step-by-step problem solving
|
183 |
- **Advanced Mathematics**: Massive 26.8% improvement on MATH benchmark showcases superior mathematical reasoning capabilities
|
184 |
- **Scientific Reasoning**: 45.96% accuracy on GPQA Diamond, above the ~30-35% range listed for the base model in the benchmark table
|
@@ -186,12 +177,12 @@ The loss curve demonstrates stable convergence with the final training loss reac
|
|
186 |
- **Code Generation**: 3.4% improvement on MBPP shows better programming logic understanding
|
187 |
- **Domain-Specific Knowledge**: Improvements in STEM (+1.6%), Humanities (+2.2%), and Social Sciences (+2.3%)
|
188 |
|
189 |
-
####
|
190 |
- **Instruction Following**: Slight decrease in IFEval scores (-5% prompt-level, -4.9% instruction-level)
|
191 |
- **General Knowledge**: Overall MMLU score decreased by 3.3 points (78.6% to 75.3%), likely a side effect of reasoning specialization
|
192 |
- **Reasoning Focus**: Model optimized for deep analytical thinking over rapid instruction compliance
|
193 |
|
194 |
-
####
|
195 |
- **Mathematical Excellence**: Outstanding 76.8% accuracy on MATH benchmark - among the top performances for 27B models
|
196 |
- **Scientific Reasoning**: Exceptional 45.96% on GPQA Diamond - handling graduate-level physics, chemistry, and biology problems
|
197 |
- **Elite Competition Performance**: 13.3% on AIME25, which draws on American Invitational Mathematics Examination problems
|
@@ -430,10 +421,10 @@ The model showcases systematic thinking through:
|
|
430 |
- Clear documentation of the reasoning process
|
431 |
|
432 |
These examples demonstrate the model's ability to:
|
433 |
-
-
|
434 |
-
-
|
435 |
-
-
|
436 |
-
-
|
437 |
|
438 |
### Activating Chain-of-Thought Reasoning
|
439 |
|
@@ -472,26 +463,26 @@ Show your reasoning process before giving the final answer."""
|
|
472 |
|
473 |
This multimodal model is particularly well-suited for:
|
474 |
|
475 |
-
###
|
476 |
- **Chain-of-Thought Tutoring**: Demonstrates complete problem-solving processes with transparent reasoning steps
|
477 |
- **Mathematical Education**: Shows multiple verification methods for mathematical concepts (as seen in the 9.11 vs 9.9 example)
|
478 |
- **Critical Thinking Development**: Models systematic analysis and self-verification techniques
|
479 |
- **Visual Learning**: Analyzing educational diagrams, charts, and mathematical illustrations
|
480 |
- **Interactive Learning**: Combining text and visual elements for comprehensive understanding
|
481 |
|
482 |
-
###
|
483 |
- **Chart Analysis**: Interpreting graphs, statistical charts, and data visualizations
|
484 |
- **Geometric Problem Solving**: Analyzing geometric figures and spatial relationships
|
485 |
- **Scientific Diagram Understanding**: Processing scientific illustrations and technical drawings
|
486 |
- **Formula Recognition**: Understanding mathematical formulas in images
|
487 |
|
488 |
-
###
|
489 |
- **Document Analysis**: Processing documents containing both text and visual elements
|
490 |
- **Technical Documentation**: Understanding technical manuals with diagrams
|
491 |
- **Data Visualization**: Analyzing and explaining complex charts and infographics
|
492 |
- **Research Assistance**: Combining textual research with visual data analysis
|
493 |
|
494 |
-
###
|
495 |
- **Chain-of-Thought Problem Solving**: Complex reasoning with detailed step-by-step analysis and self-verification
|
496 |
- **Multi-Method Validation**: Using multiple approaches to verify answers (numerical comparison, pattern analysis, etc.)
|
497 |
- **Transparent Decision Making**: Showing complete reasoning chains for critical analysis tasks
|
|
|
27 |
|
28 |
LogicFlow-Gemma-3-27b-thinking is an advanced **multimodal reasoning model** built upon [google/gemma-3-27b-it](https://huggingface.co/google/gemma-3-27b-it), specifically designed to excel at complex logical reasoning, mathematical problem-solving, and step-by-step analytical thinking. This model represents a significant advancement in AI reasoning capabilities, achieved through careful fine-tuning on three specialized, high-quality datasets using the LoRA (Low-Rank Adaptation) technique.
|
29 |
|
30 |
|
31 |
### Key Innovations
|
32 |
|
33 |
This unique combination of datasets creates a model that not only provides correct answers but also demonstrates **how** it arrives at those answers, making it particularly valuable for educational applications, research, and any scenario requiring explainable AI reasoning.
|
34 |
|
35 |
The model demonstrates enhanced capabilities in:
|
36 |
+
- **Logical Reasoning**: Improved ability to work through complex logical problems step by step
|
37 |
+
- **Mathematical Problem Solving**: Enhanced performance on mathematical reasoning tasks (76.8% MATH, 13.3% AIME25)
|
38 |
+
- **Scientific Analysis**: Exceptional scientific reasoning capabilities (45.96% GPQA Diamond)
|
39 |
+
- **Chain-of-Thought Reasoning**: Superior step-by-step thinking with detailed reasoning chains and self-verification
|
40 |
+
- **Structured Analysis**: Improved ability to break down complex problems into manageable components
|
41 |
+
- **Multi-Method Verification**: Uses multiple approaches to validate results and ensure accuracy
|
42 |
+
- **Vision Understanding**: Ability to analyze and reason about images, charts, diagrams, and visual data
|
43 |
+
- **Multimodal Reasoning**: Combining visual and textual information for comprehensive analysis
|
44 |
|
45 |
## Model Details
|
46 |
|
|
|
59 |
### Training Data
|
60 |
The model was fine-tuned on three carefully selected, high-quality datasets that form the foundation of its exceptional reasoning capabilities:
|
61 |
|
62 |
+
#### **OpenO1-SFT Dataset**
|
63 |
+
- **Purpose**: Supervised fine-tuning for advanced reasoning patterns
|
64 |
+
- **Content**: High-quality reasoning demonstrations with explicit thought processes
|
65 |
+
- **Impact**: Enables the model to break down complex problems systematically and show transparent reasoning chains
|
66 |
|
67 |
+
#### **Open-Thoughts Dataset**
|
68 |
+
- **Purpose**: Step-by-step thinking process modeling
|
69 |
+
- **Content**: Detailed internal monologues and reasoning progressions for various problem types
|
70 |
+
- **Impact**: Teaches the model to externalize its thinking process, making reasoning transparent and verifiable
|
71 |
|
72 |
+
#### **OpenR1-Math Dataset**
|
73 |
+
- **Purpose**: Mathematical reasoning and problem-solving specialization
|
74 |
+
- **Content**: Comprehensive mathematical problems with detailed solution methodologies
|
75 |
+
- **Impact**: Significantly enhances performance on mathematical reasoning tasks, from basic arithmetic to advanced competition-level problems
|
76 |
|
77 |
This synergistic combination creates a model that excels not only at providing accurate answers but also at demonstrating clear, verifiable reasoning processes.
|
78 |
|
|
|
147 |
|
148 |
| **Benchmark** | **Metric** | **Base Gemma-3-27B-IT** | **LogicFlow-Gemma-3-27b-thinking** | **Improvement** |
|
149 |
|---------------|------------|--------------------------|-------------------------------------|-----------------|
|
150 |
+
| **Mathematical Reasoning** |
|
151 |
| GSM8K | Exact Match | 82.6% | **89.5%** | **+6.9%** |
|
152 |
| MATH | Accuracy | 50.0% | **76.8%** | **+26.8%** |
|
153 |
+
| **Code Generation** |
|
154 |
| MBPP | pass@1 | 65.6% | **69.0%** | **+3.4%** |
|
155 |
| HumanEval | 0-shot | 48.8% | *Pending* | *TBD* |
|
156 |
+
| **Instruction Following** |
|
157 |
| IFEval | Prompt-level | *45.0%* | **40.0%** | **-5.0%** |
|
158 |
| IFEval | Instruction-level | *58.0%* | **53.1%** | **-4.9%** |
|
159 |
+
| **Advanced Mathematics** |
|
160 |
| AIME25 | Problem Solving | ~8-12% | **13.3%** | **+1-5%** |
|
161 |
+
| **Scientific Reasoning** |
|
162 |
| GPQA Diamond | Science QA | ~30-35% | **45.96%** | **+11-16%** |
|
163 |
+
| **Knowledge & Understanding** |
|
164 |
| MMLU | Overall Accuracy | 78.6% | **75.3%** | **-3.3%** |
|
165 |
| MMLU STEM | Sciences & Math | ~70.0% | **71.6%** | **+1.6%** |
|
166 |
| MMLU Humanities | Arts & Literature | ~67.0% | **69.2%** | **+2.2%** |
|
|
|
169 |
|
170 |
### Key Performance Insights
|
171 |
|
172 |
+
#### **Significant Improvements**
|
173 |
- **Mathematical Reasoning**: Exceptional improvements - GSM8K (+6.9%) and MATH (+26.8%) demonstrate enhanced step-by-step problem solving
|
174 |
- **Advanced Mathematics**: Massive 26.8% improvement on MATH benchmark showcases superior mathematical reasoning capabilities
|
175 |
- **Scientific Reasoning**: 45.96% accuracy on GPQA Diamond, above the ~30-35% range listed for the base model in the benchmark table
|
|
|
177 |
- **Code Generation**: 3.4% improvement on MBPP shows better programming logic understanding
|
178 |
- **Domain-Specific Knowledge**: Improvements in STEM (+1.6%), Humanities (+2.2%), and Social Sciences (+2.3%)
|
179 |
|
180 |
+
#### **Trade-offs Observed**
|
181 |
- **Instruction Following**: Slight decrease in IFEval scores (-5% prompt-level, -4.9% instruction-level)
|
182 |
- **General Knowledge**: Overall MMLU score decreased by 3.3 points (78.6% to 75.3%), likely a side effect of reasoning specialization
|
183 |
- **Reasoning Focus**: Model optimized for deep analytical thinking over rapid instruction compliance
|
184 |
|
185 |
+
#### **Specialized Capabilities**
|
186 |
- **Mathematical Excellence**: Outstanding 76.8% accuracy on MATH benchmark - among the top performances for 27B models
|
187 |
- **Scientific Reasoning**: Exceptional 45.96% on GPQA Diamond - handling graduate-level physics, chemistry, and biology problems
|
188 |
- **Elite Competition Performance**: 13.3% on AIME25, which draws on American Invitational Mathematics Examination problems
|
|
|
421 |
- Clear documentation of the reasoning process
|
422 |
|
423 |
These examples demonstrate the model's ability to:
|
424 |
+
- **Break down complex problems** into manageable steps
|
425 |
+
- **Self-verify results** using multiple approaches
|
426 |
+
- **Document reasoning chains** for transparency
|
427 |
+
- **Maintain accuracy** while showing work
|
428 |
|
429 |
### Activating Chain-of-Thought Reasoning
|
430 |
|
|
|
463 |
|
464 |
This multimodal model is particularly well-suited for:
|
465 |
|
466 |
+
### Educational Applications
|
467 |
- **Chain-of-Thought Tutoring**: Demonstrates complete problem-solving processes with transparent reasoning steps
|
468 |
- **Mathematical Education**: Shows multiple verification methods for mathematical concepts (as seen in the 9.11 vs 9.9 example)
|
469 |
- **Critical Thinking Development**: Models systematic analysis and self-verification techniques
|
470 |
- **Visual Learning**: Analyzing educational diagrams, charts, and mathematical illustrations
|
471 |
- **Interactive Learning**: Combining text and visual elements for comprehensive understanding
|
472 |
|
473 |
+
### Mathematical & Scientific Analysis
|
474 |
- **Chart Analysis**: Interpreting graphs, statistical charts, and data visualizations
|
475 |
- **Geometric Problem Solving**: Analyzing geometric figures and spatial relationships
|
476 |
- **Scientific Diagram Understanding**: Processing scientific illustrations and technical drawings
|
477 |
- **Formula Recognition**: Understanding mathematical formulas in images
|
478 |
|
479 |
+
### Professional Applications
|
480 |
- **Document Analysis**: Processing documents containing both text and visual elements
|
481 |
- **Technical Documentation**: Understanding technical manuals with diagrams
|
482 |
- **Data Visualization**: Analyzing and explaining complex charts and infographics
|
483 |
- **Research Assistance**: Combining textual research with visual data analysis
|
484 |
|
485 |
+
### Advanced Reasoning Tasks
|
486 |
- **Chain-of-Thought Problem Solving**: Complex reasoning with detailed step-by-step analysis and self-verification
|
487 |
- **Multi-Method Validation**: Using multiple approaches to verify answers (numerical comparison, pattern analysis, etc.)
|
488 |
- **Transparent Decision Making**: Showing complete reasoning chains for critical analysis tasks
|
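
The **Improvement** column in the benchmark table reports the difference between two percentages, that is, percentage points rather than a relative gain. The minimal sketch below recomputes both quantities for the four rows that list exact base scores. The `SCORES` mapping and the `deltas` helper are illustrative names introduced here, not part of the model card or any library.

```python
# Scores copied from the benchmark table, in percent: (base model, fine-tuned model).
# Only rows with exact, non-estimated base values are included (an assumption of this sketch).
SCORES = {
    "GSM8K": (82.6, 89.5),
    "MATH": (50.0, 76.8),
    "MBPP": (65.6, 69.0),
    "MMLU": (78.6, 75.3),
}


def deltas(base, tuned):
    """Return (absolute change in percentage points, relative change in percent)."""
    absolute = round(tuned - base, 1)
    relative = round((tuned - base) / base * 100, 1)
    return absolute, relative


for name, (base, tuned) in SCORES.items():
    abs_pp, rel_pct = deltas(base, tuned)
    print(f"{name:<6} {abs_pp:+.1f} pp  ({rel_pct:+.1f}% relative)")
```

Read this way, the MATH row is a gain of 26.8 percentage points, which corresponds to a relative improvement of roughly 53.6% over the base score of 50.0%.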