swiss-ai/Apertus-70B-Instruct-2509 Text Generation • 71B • Updated about 23 hours ago • 10.9k • 108
swiss-ai/Apertus-8B-Instruct-2509 Text Generation • 8B • Updated about 23 hours ago • 25.9k • 230
FineWeb2: One Pipeline to Scale Them All -- Adapting Pre-Training Data Processing to Every Language Paper • 2506.20920 • Published Jun 26 • 70
Welcome FalconMamba: The first strong attention-free 7B model Article • By JingweiZuo and 5 others • Aug 12, 2024 • 113