How 🤗 Accelerate runs very large models thanks to PyTorch Article • Sep 27, 2022 • 11
Marco-o1: Towards Open Reasoning Models for Open-Ended Solutions Paper • 2411.14405 • Published Nov 21, 2024 • 61
Fire-Flyer AI-HPC: A Cost-Effective Software-Hardware Co-Design for Deep Learning Paper • 2408.14158 • Published Aug 26, 2024 • 3
LLaMA-Omni: Seamless Speech Interaction with Large Language Models Paper • 2409.06666 • Published Sep 10, 2024 • 57
To Code, or Not To Code? Exploring Impact of Code in Pre-training Paper • 2408.10914 • Published Aug 20, 2024 • 42
Transformer Explainer: Interactive Learning of Text-Generative Models Paper • 2408.04619 • Published Aug 8, 2024 • 163
Scaling Laws with Vocabulary: Larger Models Deserve Larger Vocabularies Paper • 2407.13623 • Published Jul 18, 2024 • 56
Scaling Diffusion Transformers to 16 Billion Parameters Paper • 2407.11633 • Published Jul 16, 2024 • 26