seawolf (seawolf2357), PRO

AI & ML interests: None yet


Organizations

KAISAR, VIDraft, korea forestry, PowergenAI

Posts (5)

šŸš€ Just Found an Interesting New Leaderboard for Medical AI Evaluation!

I recently stumbled upon a medical domain-specific FACTS Grounding leaderboard on Hugging Face, and the approach to evaluating AI accuracy in medical contexts is quite impressive, so I thought I'd share.

šŸ“Š What is FACTS Grounding?
It's a benchmark originally developed by Google DeepMind that measures how well LLMs generate answers based solely on provided documents. What's cool about this medical-focused version is that it's designed to test even small open-source models.
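
To make that concrete, here's a minimal sketch (my own illustration, not the benchmark's actual prompt template) of the kind of input the tested model receives: a source document plus a question, with an instruction to answer using only that document.

```python
# Illustrative grounded-answering prompt; the wording and example content
# are assumptions for illustration, not FACTS Grounding's real template.
def build_grounded_prompt(document: str, question: str) -> str:
    return (
        "Answer the question using only the information in the document below. "
        "If the document does not contain the answer, say so.\n\n"
        f"Document:\n{document}\n\n"
        f"Question: {question}"
    )

prompt = build_grounded_prompt(
    document="Metformin is a first-line medication for type 2 diabetes.",
    question="Which condition is metformin used for as a first-line treatment?",
)
print(prompt)
```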

šŸ„ Medical Domain Version Features

236 medical examples: Extracted from the original 860 examples
Tests small models like Qwen 3 1.7B: Great for resource-constrained environments
Uses Gemini 1.5 Flash for evaluation: Simplified to a single judge model
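
For reference, running a small open model such as Qwen 3 1.7B on a prompt like that could look roughly like this with the transformers library; the model ID and generation settings are assumptions on my part, and the leaderboard's own harness may differ.

```python
# Rough sketch: generate a grounded answer with a small open model.
# Requires `transformers` and `torch`; the model ID is assumed from the post.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen3-1.7B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

grounded_prompt = (
    "Answer the question using only the information in the document below.\n\n"
    "Document:\nMetformin is a first-line medication for type 2 diabetes.\n\n"
    "Question: Which condition is metformin used for as a first-line treatment?"
)

input_ids = tokenizer.apply_chat_template(
    [{"role": "user", "content": grounded_prompt}],
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=256)
answer = tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True)
print(answer)
```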

šŸ“ˆ The Evaluation Method is Pretty Neat

Grounding Score: Are all claims in the response supported by the provided document?
Quality Score: Does it properly answer the user's question?
Combined Score: Did it pass both checks?

Since medical information requires extreme accuracy, this thorough verification approach makes a lot of sense.
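Here's a rough sketch of what that single-judge setup could look like: the judge checks grounding and answer quality separately, and the combined score requires both to pass. The prompts, response parsing, and google-generativeai usage are my assumptions, not the leaderboard's actual code.

```python
# Sketch of a single-judge evaluation with Gemini 1.5 Flash.
# Requires `google-generativeai` and an API key; the judge prompts and the
# YES/NO parsing are illustrative assumptions, not the real implementation.
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")
judge = genai.GenerativeModel("gemini-1.5-flash")

def judge_yes_no(instruction: str, document: str, question: str, answer: str) -> bool:
    prompt = (
        f"{instruction}\nAnswer strictly YES or NO.\n\n"
        f"Document:\n{document}\n\nQuestion: {question}\n\nResponse: {answer}"
    )
    verdict = judge.generate_content(prompt).text.strip().upper()
    return verdict.startswith("YES")

def score_example(document: str, question: str, answer: str) -> dict:
    grounding = judge_yes_no(
        "Is every claim in the response supported by the document?",
        document, question, answer,
    )
    quality = judge_yes_no(
        "Does the response properly answer the user's question?",
        document, question, answer,
    )
    return {"grounding": grounding, "quality": quality, "combined": grounding and quality}
```

Averaging the combined flag over all 236 examples would then give a leaderboard-style pass rate.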
šŸ”— Check It Out Yourself

The actual leaderboard: https://huggingface.co/spaces/MaziyarPanahi/FACTS-Leaderboard

šŸ’­ My thoughts: As medical AI continues to evolve, evaluation tools like this are becoming increasingly important. The fact that it can test smaller models is particularly helpful for the open-source community!
Samsung Hacking Incident: Samsung Electronics' Official Hugging Face Account Compromised
Samsung Electronics' official Hugging Face account has been hacked. Approximately 17 hours ago, two new language models (LLMs) were registered under Samsung Electronics' official Hugging Face account. These models are:

https://huggingface.co/Samsung/MuTokenZero2-32B
https://huggingface.co/Samsung/MythoMax-L2-13B

The model descriptions contain absurd and false claims, such as being trained on "1 million W200 GPUs," hardware that doesn't even exist.
Community members on Hugging Face who have noticed the issue are repeatedly posting that Samsung Electronics' account has been compromised.
There is concern about secondary and tertiary damage if users, unaware of the hack, download these LLMs trusting Samsung's reputation because they were released under the official account.
Samsung Electronics appears to be unaware of the situation, as it has not yet taken any visible measures, such as changing the account password.
Source: https://discord.gg/openfreeai