
Odunayo Ogundepo

ToluClassics

https://github.com/ToluClassics



Recent Activity

updated a dataset 1 day ago

taresco/details_meta-llama__Llama-3.1-70B-Instruct

published a dataset 1 day ago

taresco/details_meta-llama__Llama-3.1-70B-Instruct

published a model 4 days ago

ToluClassics/translated_open_math_instruct_v2-GRPO



ToluClassics's activity

updated a dataset 1 day ago

taresco/details_meta-llama__Llama-3.1-70B-Instruct

Viewer • Updated 1 day ago • 502 • 24

published a dataset 1 day ago

taresco/details_meta-llama__Llama-3.1-70B-Instruct

Viewer • Updated 1 day ago • 502 • 24

published a model 4 days ago

ToluClassics/translated_open_math_instruct_v2-GRPO

Updated 4 days ago

updated a dataset 4 days ago

taresco/details_taresco__lugha_llama_8b_wura_math_no_instruction_mask

Viewer • Updated 4 days ago • 3.77k • 42

reacted to openfree's post with 🚀 4 days ago


🚀 Introducing Phi-4-reasoning-plus: Powerful 14B Reasoning Model by Microsoft!

VIDraft/phi-4-reasoning-plus

🌟 Key Highlights
Compact Size (14B parameters): Efficient for use in environments with limited computing resources, yet powerful in performance.

Extended Context (32k tokens): Capable of handling lengthy and complex input sequences.

Enhanced Reasoning: Excels at multi-step reasoning, particularly in mathematics, science, and coding challenges.

Chain-of-Thought Methodology: Provides a detailed reasoning process, followed by concise, accurate summaries.
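To make the "reasoning process, followed by a summary" output shape concrete, here is a minimal sketch of how a caller might separate the two parts. It assumes the reasoning trace is wrapped in `<think>...</think>` tags and that the summary follows the closing tag; the post does not state the delimiter, so check the model card before relying on it. The function name `split_reasoning` and the sample string are hypothetical.

```python
import re

# Assumption: the reasoning trace is wrapped in <think>...</think> and the
# summary follows the closing tag. Adjust the pattern to the model's real format.
THINK_BLOCK = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_reasoning(output: str) -> tuple[str, str]:
    """Return (reasoning, summary); reasoning is empty if no think block is found."""
    match = THINK_BLOCK.search(output)
    if match is None:
        # No delimiters found: treat the whole output as the summary.
        return "", output.strip()
    reasoning = match.group(1).strip()
    summary = output[match.end():].strip()
    return reasoning, summary


# Hypothetical model output, used only to illustrate the split.
sample = "<think>2 + 2 = 4, then 4 * 3 = 12.</think>\nThe answer is 12."
reasoning, summary = split_reasoning(sample)
print(summary)
```

Keeping the two parts separate lets an application show only the short summary to users while logging the full reasoning trace for inspection.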

🏅 Benchmark Achievements
Despite its smaller size, Phi-4-reasoning-plus has delivered impressive results, often surpassing significantly larger models:

Mathematical Reasoning (AIME 2025): Achieved 78% accuracy, well ahead of larger models such as DeepSeek-R1 Distilled (51.5%) and Claude-3.7 Sonnet (58.7%).

Olympiad-level Math (OmniMath): Strong performance with an accuracy of 81.9%, surpassing DeepSeek-R1 Distilled's 63.4%.

Graduate-Level Science Questions (GPQA-Diamond): Delivered a competitive 68.9%, close to larger models, demonstrating its capability in advanced scientific reasoning.

Coding Challenges (LiveCodeBench): Scored 53.1%, reflecting strong performance among smaller models, though slightly behind specialized coding-focused models.
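The size of the reported leads is easy to check by hand. The sketch below computes the percentage-point margins using only the scores quoted in the post; the dictionary layout and the `margins` helper are illustrative choices, not part of any benchmark tooling.

```python
# Accuracy scores (percent) copied from the post above.
scores = {
    "AIME 2025": {
        "Phi-4-reasoning-plus": 78.0,
        "DeepSeek-R1 Distilled": 51.5,
        "Claude-3.7 Sonnet": 58.7,
    },
    "OmniMath": {
        "Phi-4-reasoning-plus": 81.9,
        "DeepSeek-R1 Distilled": 63.4,
    },
}


def margins(benchmark: str, model: str = "Phi-4-reasoning-plus") -> dict:
    """Percentage-point lead of `model` over every other model on one benchmark."""
    results = scores[benchmark]
    return {
        other: round(results[model] - value, 1)
        for other, value in results.items()
        if other != model
    }


for name in scores:
    print(name, margins(name))
```

On these figures the lead is 26.5 and 19.3 points on AIME 2025 and 18.5 points on OmniMath. These are differences between reported numbers only; the post does not say whether the evaluation settings were identical across models.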

🛡️ Safety and Robustness
Comprehensive safety evaluation completed through Microsoft's independent AI Red Team assessments.

High standards of alignment and responsible AI compliance validated through extensive adversarial testing.

🎯 Recommended Applications
Phi-4-reasoning-plus is especially suitable for:
Systems with limited computational resources.
Latency-sensitive applications requiring quick yet accurate responses.

📜 License
Freely available under the MIT License for broad accessibility and flexible integration into your projects.