Jiachen Du

Baphomet666

AI & ML interests

Using large language models (LLMs) to enhance Tarot reading

Recent Activity

New activity in FunAudioLLM/Fun-ASR-Nano-2512 24 days ago

Where is main.py?

#1 opened 24 days ago by Baphomet666

upvoted a paper 3 months ago

Latent Refinement Decoding: Enhancing Diffusion-Based Language Models by Refining Belief States

Paper • 2510.11052 • Published Oct 13, 2025 • 51

reacted to alandao's post with 🔥 7 months ago

Post

1369

Don’t give up 🔥

Do you know what I was planning to do this time last week?

I was preparing to write a report declaring that Jan-nano was a failed project because the benchmark results didn’t meet expectations.

But I thought — it can’t be. When loading the model into the app, the performance clearly felt better. So why were the benchmark results worse?

That’s when I reviewed the entire benchmark codebase and realized something fundamental: agentic or workflow-based approaches introduce a huge gap and variation when benchmarking. Jan-nano was trained with an agentic setup — it simply can’t be benchmarked using a rigid workflow-based method.

I made the necessary changes, and the model ended up performing even better than before the issues arose. Turns out the previous benchmarking method conflicted with the way the model was trained.

What if I had given up? That would’ve meant 1.5 months of training and a huge amount of company resources wasted.

But now, this is officially the most successful and biggest release for the whole team — all thanks to Jan-nano.

https://huggingface.co/Menlo/Jan-nano