- Multi-Layer Transformers Gradient Can be Approximated in Almost Linear Time
  Paper • 2408.13233 • Published • 25
- Heterogeneous Multi-task Learning with Expert Diversity
  Paper • 2106.10595 • Published • 1
- Residual Mixture of Experts
  Paper • 2204.09636 • Published • 1
- Language-Routing Mixture of Experts for Multilingual and Code-Switching Speech Recognition
  Paper • 2307.05956 • Published • 1
Hazem Essam
hazemessam
AI & ML interests
Protein Language Modeling, Natural Language Processing, Generative Adversarial Networks.
Recent Activity
liked a dataset, 8 days ago: hakurei/open-instruct-v1
updated a dataset, 8 days ago: hazemessam/SuperCOT-dataset-splitted
published a dataset, 8 days ago: hazemessam/SuperCOT-dataset-splitted