Always surprised that so few people actually read the FineTasks blog on ✨how to select training evals with the highest signal✨
If you're serious about training models without wasting compute on shitty runs, you absolutely should read it!!
A high-signal eval actually tells you precisely, during training, how well your model is learning & what it is learning, allowing you to discard the bad runs/bad samplings/...!
The blog covers prompt choice, metrics, and datasets in depth, across languages/capabilities, and my fave section is "which properties should evals have"👌 (to know how to select the best evals for your use case)
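To make "signal" a bit more concrete, here is a minimal sketch of two checks you could run on an eval over training checkpoints. It assumes that monotonicity (does the score go up as training progresses?) and low noise across seeds are among the properties you care about; the function names, the toy numbers, and the choice of Spearman correlation are illustrative assumptions, not something taken from the blog.

```python
from statistics import mean, stdev


def ranks(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either input is constant."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    denom = (var_x * var_y) ** 0.5
    return cov / denom if denom else 0.0


def monotonicity(steps, scores):
    """Spearman rank correlation between training step and eval score."""
    return pearson(ranks(steps), ranks(scores))


def seed_noise(scores_per_seed):
    """Standard deviation of the same checkpoint's score across seeds."""
    return stdev(scores_per_seed)


# Toy numbers, made up for illustration only.
steps = [1000, 2000, 3000, 4000, 5000]
useful_eval = [0.26, 0.31, 0.35, 0.41, 0.44]  # improves steadily
flat_eval = [0.26, 0.24, 0.27, 0.25, 0.26]    # hovers around chance

print("useful eval monotonicity:", monotonicity(steps, useful_eval))
print("flat eval monotonicity:", monotonicity(steps, flat_eval))
print("noise across 3 seeds:", seed_noise([0.44, 0.43, 0.45]))
```

An eval whose score rises with training and barely moves across seeds is one you can use to rank runs early; one that stays flat or jumps around tells you little, whatever the final number is.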
Gemma3 family is out! Reading the tech report, and this section was really interesting to me from a methods/scientific fairness pov.
Instead of doing over-hyped comparisons, they clearly state that **results are reported in a setup which is advantageous to their models**. (Which everybody does, but people usually don't say so)
For a tech report, it makes a lot of sense to report model performance when used optimally! On leaderboards, on the other hand, comparisons will be apples to apples, but in a potentially suboptimal way for a given model family (just like some users interact suboptimally with models)
It also contains a cool section (6) on training data memorization rate! Important to see if your model will output the training data it has seen verbatim: always an issue for privacy/copyright/... but also very much for evaluation!
Because if your model knows its evals by heart, you're not testing for generalization.
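As a rough idea of what a memorization check can look like, here is a minimal sketch: feed the model the start of a training document and see whether it reproduces the rest. The prefix/suffix lengths, the similarity threshold, and the `generate` callable are all assumptions for illustration; this is not the protocol from the tech report.

```python
from difflib import SequenceMatcher


def memorization_rate(samples, generate, prefix_len=50, suffix_len=50,
                      approx_threshold=0.9):
    """Estimate how often a model reproduces training text.

    samples: list of token lists taken from the training data.
    generate: hypothetical callable (prefix_tokens, n) -> list of n tokens.
    A sample counts as "exact" if the continuation equals the true suffix,
    and as "approximate" if it is merely very similar to it.
    """
    exact = approx = total = 0
    for tokens in samples:
        if len(tokens) < prefix_len + suffix_len:
            continue  # too short to split into prefix and suffix
        prefix = tokens[:prefix_len]
        target = tokens[prefix_len:prefix_len + suffix_len]
        continuation = list(generate(prefix, suffix_len))[:suffix_len]
        total += 1
        if continuation == target:
            exact += 1
        else:
            similarity = SequenceMatcher(
                None, continuation, target, autojunk=False
            ).ratio()
            if similarity >= approx_threshold:
                approx += 1
    if total == 0:
        return {"exact": 0.0, "approximate": 0.0, "evaluated": 0}
    return {
        "exact": exact / total,
        "approximate": approx / total,
        "evaluated": total,
    }


# Toy stand-in for a model: it has "memorized" exactly one document.
memorized = {tuple("abc"): list("def")}


def fake_generate(prefix, n):
    return memorized.get(tuple(prefix), ["?"] * n)


docs = [list("abcdef"), list("uvwxyz"), list("hi")]
result = memorization_rate(docs, fake_generate, prefix_len=3, suffix_len=3)
print(result)
```

The same idea applies to eval sets: if benchmark prompts get completed verbatim, the score reflects recall rather than generalization.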
Last Week in Medical AI: Top Research Papers/Models 🔥 🏅 (December 7 – December 14, 2024)
Medical LLM & Other Models
- PediaBench: Chinese Pediatric LLM
  - Comprehensive pediatric dataset
  - Advanced benchmarking platform
  - Chinese healthcare innovation
- BiMediX: Bilingual Medical LLM
  - Multilingual medical expertise
  - Diverse medical knowledge integration
  - Cross-cultural healthcare insights
- MMedPO: Vision-Language Medical LLM
  - Clinical multimodal optimization
  - Advanced medical image understanding
  - Precision healthcare modeling
Frameworks and Methodologies
- TOP-Training: Medical Q&A Framework
- Hybrid RAG: Secure Medical Data Management
- Zero-Shot ATC Clinical Coding
- Chest X-Ray Diagnosis Architecture
- Medical Imaging AI Democratization
Benchmarks & Evaluations
- KorMedMCQA: Korean Healthcare Licensing Benchmark
- Large Language Model Medical Tasks
- Clinical T5 Model Performance Study
- Radiology Report Quality Assessment
- Genomic Analysis Benchmarking
Medical LLM Applications
- BRAD: Digital Biology Language Model
- TCM-FTP: Herbal Prescription Prediction
- LLaSA: Activity Analysis via Sensors
- Emergency Department Visit Predictions
- Neurodegenerative Disease AI Diagnosis
- Kidney Disease Explainable AI Model
Ethical AI & Privacy
- Privacy-Preserving LLM Mechanisms
- AI-Driven Digital Organism Modeling
- Biomedical Research Automation
- Multimodality in Medical Practice
Last Week in Medical AI: Top Research Papers/Models 🔥 🏅 (December 2 – December 7, 2024)
Medical LLM & Models
- Block MedCare: Blockchain AI & IoT
- LLMs4Life: Biomedical Ontology Learning
- LLaMA II for Multimodal Diagnosis
- Compact LLM for EHR Privacy
Frameworks & Methods
- RARE: Retrieval-Augmented Reasoning
- STORM: Strategies for Rare Events
- TransFair: Fair Disease Classification
- PePR: Performance Per Resource
- Medical LLM Best Practices
LLM Applications
- Medchain: LLMs in Clinical Practice
- Query Nursing Note Summarization
- CLINICSUM: Patient Conversation Summaries
- Text Embeddings for Classifiers
LLM Benchmarks
- Polish Medical Exams Transfer
- Single-Cell Omics Annotation
- LLMs in Precision Medicine
- Low-Resource Healthcare Challenges
Other Models
- LLM Chatbot Hallucinations
- Multi-stage Chest X-ray Diagnosis
- EchoONE: Echocardiography AI
- Radiology Report Grounding
Ethics & Fairness
- Privacy in Medical Imaging
- Demographic Fairness in AI
Last Week in Medical AI: Top Research Papers/Models 🔥 (November 2 – November 9, 2024)
🏅 Medical AI Paper of the Week: Exploring Large Language Models for Specialist-level Oncology Care
Medical LLM & Other Models:
- GSCo: Generalist-Specialist AI Collaboration
- PediatricsGPT: Chinese Pediatric Assistant
- MEG: Knowledge-Enhanced Medical QA
- AutoProteinEngine: Multimodal Protein LLM
Frameworks and Methodologies:
- BrainSegFounder: 3D Neuroimage Analysis
- PASSION: Sub-Saharan Dermatology Dataset
- SAM for Lung X-ray Segmentation
- Label Critic: Data-First Approach
- Medprompt Runtime Strategies
Medical LLM Applications:
- CataractBot: Patient Support System
- CheX-GPT: X-ray Report Enhancement
- CardioAI: Cancer Cardiotoxicity Monitor
- HealthQ: Healthcare Conversation Chain
- PRObot: Diabetic Retinopathy Assistant
Medical LLMs & Benchmarks:
- MediQ: Clinical Reasoning Benchmark
- Touchstone: Segmentation Evaluation
- Medical LLM Adaptation Progress
- Fine-Tuning Medical QA Strategies
AI in Healthcare Ethics:
- Healthcare Robotics with LLMs
- XAI in Clinical Practice
- Precision Rehabilitation Framework
- Multimodal AI Challenges
Now you can watch and listen to the latest Medical AI papers daily on our YouTube and Spotify channels as well!