LIMIT: Less Is More for Instruction Tuning Across Evaluation Paradigms Paper • 2311.13133 • Published Nov 22, 2023
MosaicBERT: A Bidirectional Encoder Optimized for Fast Pretraining Paper • 2312.17482 • Published Dec 29, 2023 • 1
Fast Benchmarking of Accuracy vs. Training Time with Cyclic Learning Rates Paper • 2206.00832 • Published Jun 2, 2022
Vid3D: Synthesis of Dynamic 3D Scenes using 2D Video Diffusion Paper • 2406.11196 • Published Jun 17, 2024 • 8
Striped Attention: Faster Ring Attention for Causal Transformers Paper • 2311.09431 • Published Nov 15, 2023 • 4
Improving Language Models with Advantage-based Offline Policy Gradients Paper • 2305.14718 • Published May 24, 2023 • 2