Sarath Shekkizhar
sarath-shekkizhar
AI & ML interests
None yet
Recent Activity
posted an update about 1 hour ago
Some interesting architectural choices made in Llama 4 models -- were these key to the 10M context? Possibly 🤔
Takeaways:
🧩 Interleaved Attention without position encoding
- Llama 4 removes explicit positional encoding in some attention layers, which helps performance on longer contexts.
- The principle here may be similar to residual connections: those layers can attend to early tokens directly, without positional decay (a rough sketch follows below).
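A rough sketch of what interleaved attention without position encoding could look like in a simple single-head setting: RoPE is applied in most layers, while every fourth layer skips positional encoding entirely. The interval, dimensions, and helper names here are illustrative assumptions, not Llama 4's actual configuration.

```python
import numpy as np

def rotary_embed(x, positions, base=10000.0):
    """Apply rotary position embedding (RoPE) to x of shape (seq, dim)."""
    half = x.shape[-1] // 2
    freqs = 1.0 / (base ** (np.arange(half) / half))      # (half,)
    angles = positions[:, None] * freqs[None, :]          # (seq, half)
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[..., :half], x[..., half:]
    return np.concatenate([x1 * cos - x2 * sin, x1 * sin + x2 * cos], axis=-1)

def attention_layer(q, k, v, positions, use_rope):
    """Single-head attention; RoPE is applied only when use_rope is True (NoPE otherwise)."""
    if use_rope:
        q, k = rotary_embed(q, positions), rotary_embed(k, positions)
    scores = q @ k.T / np.sqrt(q.shape[-1])
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

# Hypothetical interleaving: every 4th layer drops positional encoding entirely,
# so attention in those layers is position-agnostic and can reach early tokens
# without any positional decay.
nope_interval = 4
rng = np.random.default_rng(0)
x = rng.standard_normal((8, 16))            # (seq_len, head_dim)
positions = np.arange(8)
for layer_idx in range(8):
    use_rope = (layer_idx + 1) % nope_interval != 0
    x = attention_layer(x, x, x, positions, use_rope)
```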
⚖️ Scaled softmax to sharpen attention at inference time
- The maximum attention value (the largest softmax output) shrinks as the context size grows, flattening attention over long sequences.
- Llama 4 incorporates a context-size-dependent temperature in the softmax to modify its slope, letting the model focus on relevant tokens even at long contexts.
- This is applied only at inference time -- my guess is it was a choice made after observing behavior on eval datasets. A hedged sketch follows below.
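A minimal sketch of the inference-time idea as I read it: scale the attention logits by a factor that grows with context length before taking the softmax, so the top attention weight does not wash out at very long contexts. The `length_scaled_softmax` name and the `base_len` and `beta` values are illustrative assumptions, not Llama 4's published parameterization.

```python
import numpy as np

def length_scaled_softmax(logits, context_len, base_len=8192, beta=0.2):
    """Softmax with a context-length-dependent scale on the logits (inference-time only).
    base_len and beta are illustrative hyperparameters, not Llama 4's actual values."""
    # scale > 1 for contexts longer than base_len, sharpening the distribution so the
    # top attention weight does not wash out as more tokens compete in the softmax.
    scale = 1.0 + beta * np.log(max(context_len / base_len, 1.0))
    z = logits * scale
    z = z - z.max(axis=-1, keepdims=True)
    w = np.exp(z)
    return w / w.sum(axis=-1, keepdims=True)

# Toy demo: one "relevant" token with logit 4.0 among otherwise-uniform tokens.
# The plain softmax weight on that token collapses as context grows; the scaled
# version keeps it noticeably higher.
for n in (8_192, 131_072, 1_048_576):
    logits = np.zeros(n)
    logits[0] = 4.0
    plain = np.exp(logits - logits.max())
    plain /= plain.sum()
    scaled = length_scaled_softmax(logits, context_len=n)
    print(f"ctx={n:>9}  plain max={plain[0]:.5f}  scaled max={scaled[0]:.5f}")
```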
What did you think of these choices?
updated a Space 9 months ago: tenyx/TenyxChat-7B-v1
updated a Space 9 months ago: tenyx/TenyxChat-8x7B-v1
Organizations
sarath-shekkizhar's activity
Failed evaluation for 70B model · 1 · #804 opened 10 months ago by sarath-shekkizhar
Dataset loading failing with HF load_dataset · 1 · #3 opened 10 months ago by sarath-shekkizhar
great evals · 1 · #2 opened 11 months ago by gblazex
Script to reproduce MT-Bench · 2 · #1 opened 11 months ago by MaziyarPanahi
Resubmitting failed 70B model (tenyx/Llama3-TenyxChat-70B) · 1 · #728 opened 11 months ago by sarath-shekkizhar
Evaluation for 70B model FAILED (tenyx/Llama3-TenyxChat-70B) · 5 · #719 opened 12 months ago by sarath-shekkizhar
Update README.md · 1 · #1 opened about 1 year ago by mostafagv