Future Plans for Multi-Token Prediction Support?
#4 opened 17 days ago by NaiveYan
The result is problematic.
#3 opened 24 days ago by zhnagchenchne
running with flashmla on A100s
#1 opened 26 days ago by ehartford
