
Shubham

shubhamg2208

AI & ML interests

NLP, Graphs

Recent Activity

liked a model about 1 month ago
CohereForAI/c4ai-command-r7b-12-2024
liked a model 7 months ago
jasperai/flash-sd3

Organizations

Temus · Social Post Explorers · Hugging Face Discord Community

shubhamg2208's activity

reacted to santiviquez's post with ❤️ 11 months ago
Eigenvalues to the rescue? 🛟🤔

I found out about this paper thanks to @gsarti's post from last week; I got curious, so I want to post my take on it. 🤗

The paper proposes a new metric called EigenScore to detect LLM hallucinations. 📄

Their idea is that given an input question, they generate K different answers, take their internal embedding states, calculate a covariance matrix with them, and use it to calculate an EigenScore.

We can think of the EigenScore as the mean of the eigenvalues of the covariance matrix of the embeddings of the K generated answers.

❓ But why eigenvalues?

Well, if the K generations have similar semantics, the sentence embeddings will be highly correlated, and most eigenvalues will be close to 0.

On the other hand, if the LLM hallucinates, the K generations will have diverse semantics, and the eigenvalues will be significantly different from 0.
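
To make that intuition concrete, here's a small NumPy sketch. This is my own toy reconstruction, not the paper's code: the sentence embeddings, the 768-dim size, the K=5 samples, and the alpha regularizer are all illustrative, and I summarize the eigenvalues with their mean log value (a scaled log-determinant), which is one way to read the score the post describes.

```python
import numpy as np

def eigenscore(embeddings: np.ndarray, alpha: float = 1e-3) -> float:
    """Toy EigenScore: mean log-eigenvalue of the covariance of K answer embeddings.

    embeddings: (K, d) array, one embedding per generated answer.
    alpha: small regularizer so every eigenvalue stays strictly positive.
    """
    K, d = embeddings.shape
    # Center the K embeddings so the covariance captures how much they spread out.
    centered = embeddings - embeddings.mean(axis=0, keepdims=True)
    # K x K covariance between the generations.
    cov = centered @ centered.T / d
    # Eigenvalues of a symmetric PSD matrix; add alpha*I so log() is defined.
    eigvals = np.linalg.eigvalsh(cov + alpha * np.eye(K))
    # Near-zero eigenvalues (consistent answers) -> very negative score;
    # spread-out eigenvalues (diverse answers) -> higher score.
    return float(np.mean(np.log(eigvals)))

# Illustrative check with random vectors standing in for real embeddings.
rng = np.random.default_rng(0)
consistent = np.repeat(rng.normal(size=(1, 768)), 5, axis=0) + 0.01 * rng.normal(size=(5, 768))
diverse = rng.normal(size=(5, 768))
print(eigenscore(consistent))  # low: the 5 "answers" nearly agree
print(eigenscore(diverse))     # higher: the 5 "answers" diverge
```

In the paper the embeddings come from the LLM's own internal states rather than an external sentence encoder, which is part of what keeps the method cheap.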

The idea is pretty neat and shows better results than baselines like sequence probabilities, length-normalized entropy, and other uncertainty-quantification methods.

💭 What I'm personally missing from the paper is a comparison with other methods like LLM-Eval and SelfCheckGPT. They do mention that EigenScore is much cheaper to run than SelfCheckGPT, but that's the extent of the comparison.

Paper: INSIDE: LLMs' Internal States Retain the Power of Hallucination Detection (2402.03744)
reacted to DmitryRyumin's post with ❤️ 11 months ago
🌟✨ Exciting Announcement: NVIDIA AI Foundation Models ✨🌟

🚀 Interact effortlessly with the latest SOTA AI model APIs, all optimized on the powerful NVIDIA accelerated computing stack, right from your browser! 💻⚡

🔗 Web Page: https://catalog.ngc.nvidia.com/ai-foundation-models

🌟🎯 Favorites:

🔹 Code Generation:
1️⃣ Code Llama 70B 📝🔥: https://catalog.ngc.nvidia.com/orgs/nvidia/teams/ai-foundation/models/codellama-70b
Model 🤖: codellama/CodeLlama-70b-hf

🔹 Text and Code Generation:
1️⃣ Gemma 7B 💬💻: https://catalog.ngc.nvidia.com/orgs/nvidia/teams/ai-foundation/models/gemma-7b
Model 🤖: google/gemma-7b
2️⃣ Yi-34B 📚💡: https://catalog.ngc.nvidia.com/orgs/nvidia/teams/ai-foundation/models/yi-34b
Model 🤖: 01-ai/Yi-34B

🔹 Text Generation:
1️⃣ Mamba-Chat 💬🐍: https://catalog.ngc.nvidia.com/orgs/nvidia/teams/ai-foundation/models/mamba-chat
Model 🤖: havenhq/mamba-chat
2️⃣ Llama 2 70B 📝🦙: https://catalog.ngc.nvidia.com/orgs/nvidia/teams/ai-foundation/models/llama2-70b
Model 🤖: meta-llama/Llama-2-70b

🔹 Text-To-Text Translation:
1️⃣ SeamlessM4T V2 🌍🔄: https://catalog.ngc.nvidia.com/orgs/nvidia/teams/ai-foundation/models/seamless-m4t2-t2tt
Model 🤖: facebook/seamless-m4t-v2-large

🔹 Image Generation:
1️⃣ Stable Diffusion XL 🎨🔍: https://catalog.ngc.nvidia.com/orgs/nvidia/teams/ai-foundation/models/sdxl

🔹 Image Conversation:
1️⃣ NeVA-22B 🗨️📸: https://catalog.ngc.nvidia.com/orgs/nvidia/teams/ai-foundation/models/neva-22b

🔹 Image Classification and Object Detection:
1️⃣ CLIP 🖼️🔍: https://catalog.ngc.nvidia.com/orgs/nvidia/teams/ai-foundation/models/clip

🔹 Voice Conversion:
1️⃣ Maxine Voice Font 🗣️🎶: https://catalog.ngc.nvidia.com/orgs/nvidia/teams/ai-foundation/models/voice-font

🔹 Multimodal LLM (MLLM):
1️⃣ Kosmos-2 🌐👁️: https://catalog.ngc.nvidia.com/orgs/nvidia/teams/ai-foundation/models/kosmos-2