In a Training Loop 🔄

fahrizalfarid

akahana

AI & ML interests

NLP

Recent Activity

reacted to SeaWolf-AI's post with 🔥 about 12 hours ago

🏟️ Smol AI WorldCup: A 4B Model Just Beat 8B. Here's the Data

We evaluated 18 small language models from 12 makers on 125 questions across 7 languages. The results challenge the assumption that bigger is always better.

Community Article: https://huggingface.co/blog/FINAL-Bench/smol-worldcup
Live Leaderboard: https://huggingface.co/spaces/ginigen-ai/smol-worldcup
Dataset: https://huggingface.co/datasets/ginigen-ai/smol-worldcup

What we found:
→ Gemma-3n-E4B (4B, 2GB RAM) outscores Qwen3-8B (8B, 5.5GB). Doubling parameters gained only 0.4 points. RAM cost: 2.75x more.
→ GPT-OSS-20B fits in 1.5GB yet matches Champions-league dense models requiring 8.5GB. MoE architecture is the edge AI game-changer.
→ Thinking models hurt structured output. DeepSeek-R1-7B scores 8.7 points below same-size Qwen3-8B and runs 2.7x slower.
→ A 1.3B model fabricates confident fake content 80% of the time when prompted with nonexistent entities. Qwen3 family hits 100% trap detection across all sizes.
→ Qwen3-1.7B (1.2GB) outscores Mistral-7B, Llama-3.1-8B, and DeepSeek-R1-14B. Latest architecture at 1.7B beats older architecture at 14B.

What makes this benchmark different? Most benchmarks ask "how smart?" We measure five axes simultaneously: Size, Honesty, Intelligence, Fast, Thrift (SHIFT). Our ranking metric WCS = sqrt(SHIFT x PIR_norm) rewards models that are both high-quality AND efficient. Smart but massive? Low rank. Tiny but poor? Also low.

Top 5 by WCS:
1. GPT-OSS-20B: WCS 82.6, 1.5GB, Raspberry Pi tier
2. Gemma-3n-E4B: WCS 81.8, 2.0GB, Smartphone tier
3. Llama-4-Scout: WCS 79.3, 240 tok/s, Fastest model
4. Qwen3-4B: WCS 76.6, 2.8GB, Smartphone tier
5. Qwen3-1.7B: WCS 76.1, 1.2GB, IoT tier

Built in collaboration with the FINAL Bench research team. Interoperable with ALL Bench Leaderboard for full small-to-large model comparison. Dataset is open under Apache 2.0 (125 questions, 7 languages). We welcome new model submissions.
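The ranking formula quoted in the post, WCS = sqrt(SHIFT x PIR_norm), is a geometric mean of two scores. The sketch below illustrates why such a metric punishes a model that is strong on one score and weak on the other. The post does not define PIR_norm or give per-model SHIFT values, so the function name `wcs`, the assumption that both inputs share a 0-100 scale, and the sample numbers are all hypothetical. The RAM ratio at the end uses only the figures stated in the post.

```python
import math


def wcs(shift: float, pir_norm: float) -> float:
    """Geometric mean of two scores, per the formula quoted in the post.

    Assumption: both inputs are non-negative and on the same scale
    (0-100 here); the post does not spell this out.
    """
    if shift < 0 or pir_norm < 0:
        raise ValueError("scores must be non-negative")
    return math.sqrt(shift * pir_norm)


# Hypothetical scores, not taken from the leaderboard.
balanced = wcs(64.0, 64.0)    # equal scores: result equals either input
lopsided = wcs(90.0, 40.0)    # strong on one score, weak on the other
arithmetic = (90.0 + 40.0) / 2

print(balanced, lopsided, arithmetic)

# RAM ratio from the figures stated in the post (5.5GB vs 2GB).
ram_ratio = 5.5 / 2.0
print(ram_ratio)
```

With these made-up inputs the lopsided pair scores 60.0 under the geometric mean versus 65.0 under a plain average, which matches the post's description that "smart but massive" and "tiny but poor" models both rank low.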

updated a dataset 18 days ago

akahana/wikipedia-id-conv

published a dataset 18 days ago

akahana/wikipedia-id-conv

Organizations

None yet

akahana's datasets 56

akahana/wikipedia-id-conv

Viewer • Updated 18 days ago • 666k • 22

akahana/LLaVA-Instruct-150K

Preview • Updated Jan 13 • 17

akahana/wikipedia-full

Viewer • Updated Dec 24, 2025 • 61.6M • 724

akahana/Medical-Reasoning-SFT-GPT-OSS-120B

Viewer • Updated Dec 23, 2025 • 200k • 33

akahana/alpaca-gpt4-indonesian

Viewer • Updated Dec 23, 2025 • 50k • 14 • 1

akahana/tesis

Preview • Updated Dec 19, 2025 • 16

akahana/doodle-blip-captions

Viewer • Updated Dec 18, 2025 • 1k • 13

akahana/pokemon-blip-captions

Viewer • Updated Dec 18, 2025 • 833 • 14

akahana/geo

Updated Dec 16, 2025 • 17

akahana/flickr30k

Updated Dec 16, 2025 • 9

akahana/english-indonesia-wikimatrix-token

Viewer • Updated Dec 11, 2025 • 1.02M • 30

akahana/english-indonesia-wikimatrix

Viewer • Updated Dec 9, 2025 • 1.02M • 12

akahana/english-indonesia

Viewer • Updated Dec 9, 2025 • 1M • 10

akahana/ubuntu

Updated Nov 27, 2025 • 7

akahana/anti-spoofing-nuaaaa

Viewer • Updated Jun 4, 2025 • 8.6k • 10

akahana/anti-spoofing-casiafasd

Viewer • Updated Jun 4, 2025 • 4.06k • 9

akahana/hifi-gan

Updated Jun 1, 2025 • 7

akahana/Driver-Drowsiness-Dataset

Viewer • Updated May 14, 2025 • 41.8k • 24 • 2

akahana/mpii-face-gaze

Updated May 12, 2025 • 24

akahana/common-voice-11-eng-sample

Updated May 9, 2025 • 12

akahana/children-codes-stories

Updated Mar 19, 2025 • 31

akahana/vlm

Updated Mar 18, 2025 • 15

akahana/medical

Updated Mar 15, 2025 • 900

akahana/llm-opus-ParaCrawl-english-id-v2

Updated Mar 13, 2025 • 22

akahana/llamacpp

Updated Mar 11, 2025 • 6

akahana/camel-ai-sains

Updated Mar 10, 2025 • 8

akahana/big-machine-translations

Updated Mar 9, 2025 • 65

akahana/rocov2-full

Updated Mar 8, 2025 • 9

akahana/dolphin-r1

Viewer • Updated Feb 3, 2025 • 814k • 22

akahana/OpenThoughts-114k

Viewer • Updated Feb 3, 2025 • 114k • 55