mehran PRO

diginoron

AI & ML interests

None yet

Recent Activity

reacted to seawolf2357's post with 👍 1 day ago
🚀 Just Found an Interesting New Leaderboard for Medical AI Evaluation!

I recently stumbled upon a medical domain-specific FACTS Grounding leaderboard on Hugging Face, and the approach to evaluating AI accuracy in medical contexts is quite impressive, so I thought I'd share.

📊 What is FACTS Grounding?
It's originally a benchmark developed by Google DeepMind that measures how well LLMs generate answers based solely on provided documents. What's cool about this medical-focused version is that it's designed to test even small open-source models.

🏥 Medical Domain Version Features
- 236 medical examples: Extracted from the original 860 examples
- Tests small models like Qwen 3 1.7B: Great for resource-constrained environments
- Uses Gemini 1.5 Flash for evaluation: Simplified to a single judge model

📈 The Evaluation Method is Pretty Neat
- Grounding Score: Are all claims in the response supported by the provided document?
- Quality Score: Does it properly answer the user's question?
- Combined Score: Did it pass both checks?

Since medical information requires extreme accuracy, this thorough verification approach makes a lot of sense.

🔗 Check It Out Yourself
The actual leaderboard: https://huggingface.co/spaces/MaziyarPanahi/FACTS-Leaderboard

💭 My thoughts: As medical AI continues to evolve, evaluation tools like this are becoming increasingly important. The fact that it can test smaller models is particularly helpful for the open-source community!
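
For readers who want to see how the three scores in the post relate, here is a minimal sketch. It assumes the judge model returns two pass/fail verdicts per example and that each score is the fraction of examples that pass. The `Verdict` class and `summarize` function are hypothetical names for illustration, and the leaderboard's actual aggregation may differ.

```python
from dataclasses import dataclass


@dataclass
class Verdict:
    """Hypothetical per-example judge output (not the leaderboard's real schema)."""
    grounded: bool          # every claim is supported by the provided document
    answers_question: bool  # the response properly answers the user's question


def summarize(verdicts: list[Verdict]) -> dict[str, float]:
    """Return the fraction of examples passing each check (assumed aggregation)."""
    n = len(verdicts)
    if n == 0:
        raise ValueError("no verdicts to summarize")
    return {
        "grounding": sum(v.grounded for v in verdicts) / n,
        "quality": sum(v.answers_question for v in verdicts) / n,
        # An example counts toward the combined score only if it passes both checks.
        "combined": sum(v.grounded and v.answers_question for v in verdicts) / n,
    }


example = [
    Verdict(grounded=True, answers_question=True),
    Verdict(grounded=True, answers_question=False),
    Verdict(grounded=False, answers_question=True),
    Verdict(grounded=True, answers_question=True),
]
scores = summarize(example)
```

Under this assumption the combined score can never exceed either individual score, since an example has to pass both checks to count.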
updated a Space 10 days ago
diginoron/Tradingdata

Organizations

Hugging Face Discord Community, diginoron