Shahrukh Khan

shahrukhx01

https://github.com/shahrukhx01

AI & ML interests

NLP

Recent Activity

upvoted an article 2 days ago

You could have designed state of the art positional encoding

liked a model 5 days ago

kyutai/helium-1-2b

updated a model 13 days ago

shahrukhx01/gradient-whisperer

shahrukhx01's activity

upvoted an article 2 days ago

Article

You could have designed state of the art positional encoding

Nov 25, 2024

• 241

upvoted a collection 15 days ago

Deepseek Papers

Deepseek papers collection • 20 items • Updated 5 days ago • 194

upvoted a collection 19 days ago

Gemma 3 QAT

Quantization Aware Trained (QAT) Gemma 3 checkpoints. The models preserve quality similar to half precision while using 3x less memory • 15 items • Updated 19 days ago • 185

upvoted 2 collections 27 days ago

Orpheus Multilingual Research Release

Beta Release of multilingual models. • 12 items • Updated 27 days ago • 77

Kimi-VL-A3B

Moonshot's efficient MoE VLMs, exceptional on agent, long-context, and thinking • 6 items • Updated 25 days ago • 66

upvoted 2 collections about 1 month ago

Llama 4

Llama 4 release • 13 items • Updated 8 days ago • 479

Nomic Embed Multimodal

Multimodal models allowing you to search over interleaved text, PDFs, charts, and images! • 15 items • Updated 30 days ago • 20

upvoted 6 collections about 2 months ago

Orpheus TTS

TTS Towards Human-Sounding Speech • 2 items • Updated Mar 18 • 64

Zonos-v0.1

3 items • Updated Feb 12 • 28

Ultravox v0.5

Ultravox is a multimodal Speech LLM built around different pretrained LLMs (frozen) and the whisper-large-v3-turbo (fine-tuned) backbone. • 3 items • Updated Feb 10 • 14

reranking series v2

V2 crispy rerank series • 2 items • Updated Mar 13 • 21

BD3-LMs

https://m-arriola.com/bd3lms/ • 4 items • Updated 25 days ago • 20

Gemma 3 Release

24 items • Updated 19 days ago • 356

upvoted 2 collections 2 months ago

Hallucination detection

ModernBERT models (base and large) trained to detect hallucinations in LLM responses. The models are trained as token classifiers. • 4 items • Updated Mar 5 • 16

GemmaX2

GemmaX2 language models, including pretrained and instruction-tuned models in two sizes: 2B and 9B. • 7 items • Updated Feb 7 • 21

upvoted a collection 3 months ago

Qwen2.5-1M

The long-context version of Qwen2.5, supporting 1M-token context lengths • 3 items • Updated 8 days ago • 118

upvoted 2 collections 4 months ago

DeepSeek-R1

8 items • Updated Jan 21 • 625

KaLM-embedding

11 items • Updated Mar 11 • 24

upvoted 2 collections 5 months ago

Phi-3

Phi-3 family of small language and multi-modal models. Language models are available in short- and long-context lengths. • 26 items • Updated 6 days ago • 566

Common Models

The first generation of models pretrained on Common Corpus. • 5 items • Updated Dec 5, 2024 • 35