Shizhe Diao
shizhediao2
AI & ML interests
LLM pre-training and reasoning
Recent Activity
liked a dataset 7 days ago: OptimalScale/ClimbMix
liked a model 13 days ago: nvidia/Llama-3.1-Nemotron-Nano-VL-8B-V1
upvoted a paper 14 days ago: Chain-of-Experts: Unlocking the Communication Power of Mixture-of-Experts Models