5 25 126

Van os

Vanos007

AI & ML interests

None yet

Recent Activity

liked a model 17 days ago

deepseek-ai/DeepSeek-R1-0528

upvoted a collection 3 months ago

Llama 4

reacted to lianghsun's post with 👍 3 months ago

With the arrival of Twinkle April — Twinkle AI’s annual open-source celebration held every April — our community is excited to unveil its very first project: 📊 Twinkle Eval (https://github.com/ai-twinkle/Eval), a next-generation evaluation tool led by our contributor @tedslin . Unlike traditional evaluation tools like iKala’s ievals (https://github.com/ikala-ai/ievals), which can only evaluate language models (LMs) one sample at a time, Twinkle Eval is designed with Large Reasoning Models (LRMs) in mind. As reasoning time increases with more complex models, traditional tools become increasingly inefficient 😲 — for example, evaluating LRMs on the https://huggingface.co/datasets/ikala/tmmluplus benchmark could take * half a day without finishing. One question we were especially curious about: Does shuffling multiple-choice answer order impact model accuracy? 🤔 → See: "Change Answer Order Can Decrease MMLU Accuracy" – arXiv:2406.19470v1 To address these challenges, Twinkle Eval brings three key innovations to the table: 1️⃣ Parallelized evaluation of samples 2️⃣ Multi-round testing for stability 3️⃣ Randomized answer order to test robustness After running experiments, we observed that Twinkle Eval can speed up evaluation by up to 15× 🚀🚀. Interestingly, most models scored slightly lower under the 2️⃣3️⃣ test settings compared to their claimed performance — suggesting further benchmarking is needed. This framework also comes with additional tunable parameters and detailed logging of LM behavior per question — perfect for those who want to dive deeper. 😆 If you find Twinkle Eval useful, please ⭐ the project and help spread the word 🤗

View all activity

Organizations

liked a model 17 days ago

deepseek-ai/DeepSeek-R1-0528

Text Generation • 685B • Updated 30 days ago • 173k • • 2.13k

upvoted a collection 3 months ago

Llama 4

Collection

Llama 4 release • 13 items • Updated Apr 29 • 553

reacted to lianghsun's post with 👍 3 months ago

Post

2701

With the arrival of Twinkle April — Twinkle AI’s annual open-source celebration held every April — our community is excited to unveil its very first project:

📊 Twinkle Eval (https://github.com/ai-twinkle/Eval), a next-generation evaluation tool led by our contributor @tedslin .

Unlike traditional evaluation tools like iKala’s ievals (https://github.com/ikala-ai/ievals), which can only evaluate language models (LMs) one sample at a time, Twinkle Eval is designed with Large Reasoning Models (LRMs) in mind. As reasoning time increases with more complex models, traditional tools become increasingly inefficient 😲 — for example, evaluating LRMs on the ikala/tmmluplus benchmark could take *
half a day without finishing.

One question we were especially curious about:
Does shuffling multiple-choice answer order impact model accuracy? 🤔
→ See: "Change Answer Order Can Decrease MMLU Accuracy" – arXiv:2406.19470v1

To address these challenges, Twinkle Eval brings three key innovations to the table:

1️⃣ Parallelized evaluation of samples
2️⃣ Multi-round testing for stability
3️⃣ Randomized answer order to test robustness

After running experiments, we observed that Twinkle Eval can speed up evaluation by up to 15× 🚀🚀. Interestingly, most models scored slightly lower under the 2️⃣3️⃣ test settings compared to their claimed performance — suggesting further benchmarking is needed.

This framework also comes with additional tunable parameters and detailed logging of LM behavior per question — perfect for those who want to dive deeper. 😆

If you find Twinkle Eval useful, please ⭐ the project and help spread the word 🤗

5 replies

liked a dataset 3 months ago

jed351/Chinese-Common-Crawl-Filtered

Viewer • Updated 26 days ago • 21.3M • 418 • 15

liked a Space 3 months ago

Translate 100 Languages

📈

Translate text between multiple languages

liked 4 datasets 4 months ago

liked a Space 4 months ago

Phi 4 Mini

🌍

Demos for Phi-4-mini-instruct model

liked 4 models 4 months ago

microsoft/phi-4

Text Generation • 15B • Updated Feb 24 • 777k • • 2.09k

microsoft/phi-4-onnx

Updated Feb 27 • 56 • 19

openai-community/gpt2

Text Generation • 0.1B • Updated Feb 19, 2024 • 15.2M • 2.81k

deepseek-ai/DeepSeek-R1

Text Generation • 685B • Updated Mar 27 • 602k • • 12.4k

reacted to prithivMLmods's post with 😎 5 months ago

Post

4304

QwQ Edge Gets a Small Update..! 💬
try now: https://huggingface.co/spaces/prithivMLmods/QwQ-Edge

🚀Now, you can use the following commands for different tasks:

🖼️ @image 'prompt...' → Generates an image
🔉@tts1 'prompt...' → Generates speech in a female voice
🔉 @tts2 'prompt...' → Generates speech in a male voice
🅰️@text 'prompt...' → Enables textual conversation (If not specified, text-to-text generation is the default mode)

💬Multimodality Support : prithivMLmods/Qwen2-VL-OCR-2B-Instruct
💬For text generation, the FastThink-0.5B model ensures quick and efficient responses, prithivMLmods/FastThink-0.5B-Tiny
💬Image Generation: sdxl lightning model, SG161222/RealVisXL_V4.0_Lightning

Github: https://github.com/PRITHIVSAKTHIUR/QwQ-Edge

graph TD
    A[User Interface] --> B[Chat Logic]
    B --> C{Command Type}
    C -->|Text| D[FastThink-0.5B]
    C -->|Image| E[Qwen2-VL-OCR-2B]
    C -->|@image| F[Stable Diffusion XL]
    C -->|@tts| G[Edge TTS]
    D --> H[Response]
    E --> H
    F --> H
    G --> H