InternLM

company
Activity Feed

AI & ML interests

None defined yet.

Recent Activity

Update README.md

#5 opened about 1 month ago by
yuhangzang
clefourrier 
posted an update about 1 month ago
view post
Post
765
Always surprised that so few people actually read the FineTasks blog, on
✨how to select training evals with the highest signal✨

If you're serious about training models without wasting compute on shitty runs, you absolutely should read it!!

An high signal eval actually tells you precisely, during training, how wel & what your model is learning, allowing you to discard the bad runs/bad samplings/...!

The blog covers in depth prompt choice, metrics, dataset, across languages/capabilities, and my fave section is "which properties should evals have"👌
(to know on your use case how to select the best evals for you)

Blog: HuggingFaceFW/blogpost-fine-tasks
  • 2 replies
·
vansin 
posted an update 4 months ago
view post
Post
3516
🔥MedAgentBench Amazing Work🚀

Just explored #MedAgentBench from @Yale researchers and it's mind-blowing! They've created a cutting-edge benchmark that finally exposes the true capabilities of LLMs in complex medical reasoning.

⚡ Key discoveries:

DeepSeek R1 & OpenAI O3 dominate clinical reasoning tasks
Agent-based frameworks deliver exceptional performance-cost balance
Open-source alternatives are closing the gap at fraction of the cost

This work shatters previous benchmarks that failed to challenge today's advanced models.
The future of medical AI is here: https://github.com/gersteinlab/medagents-benchmark
#MedicalAI #MachineLearning #AIinHealthcare 🔥