🖥️ Do you have 1TB+ VRAM?
🎉 Well, good news for you!
👨‍🔬 Good folks at @nvidia have released Nemotron 4 340B, the new open-source LLM king, rivalling GPT-4! 🚀
📊 340B-parameter models in 3 flavours: base, reward, and instruct
🎯 It's a dense model, not MoE
📏 4k context window
📚 9T tokens of training data, 2-phase training (8T pre-training + 1T continued pre-training)
🌐 Trained on 50+ natural languages and 40+ programming languages (70% of the training data is English, 15% multilingual, 15% code)
📅 June 2023 training data cut-off
💻 Deployment needs 8x H200, 16x H100, or 16x A100 80GB for BF16 inference (roughly 8x H100 in int4; rough VRAM maths in the sketch after this list)
🏆 Of course, it beats Llama 3 70B on MMLU (81.1), Arena Hard (54.2), and GSM8K (92.4)
🤔 But it's beaten on HumanEval and MT-Bench by Qwen 2, a 72B-parameter model
🔧 Aligned with SFT, DPO, and RPO, plus RLHF via the NeMo-Aligner framework
📈 98% of the alignment data was synthetically generated
📜 Released under the NVIDIA Open Model License, with commercial use allowed
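For intuition on those deployment numbers, here's a back-of-envelope, weights-only VRAM estimate in Python (a rough sketch: real serving also needs KV cache, activations, and framework overhead, so treat these as lower bounds):

# Weights-only memory estimate for a 340B-parameter dense model.
PARAMS = 340e9

def weights_gb(bytes_per_param):
    return PARAMS * bytes_per_param / 1e9

print(f"BF16 weights: {weights_gb(2):.0f} GB")    # 680 GB
print(f"INT4 weights: {weights_gb(0.5):.0f} GB")  # 170 GB

# 16x 80GB (A100/H100) = 1280 GB and 8x 141GB (H200) = 1128 GB both
# clear the ~680 GB of BF16 weights; 8x 80GB = 640 GB fits the ~170 GB
# of int4 weights with headroom for cache and activations.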
¯\_(ツ)_/¯
😊 Glad to see more open models, but this is one confusing fellow!
🤨 A 340B-parameter model that only narrowly beats 70B models, yet loses to a 72B one? Sounds like a model built for synthetic data generation (minimal sketch below)! But then why only a 4k context?
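If synthetic data generation really is the niche, usage could look roughly like this minimal sketch, assuming an OpenAI-compatible endpoint serving the instruct model (the base_url and model id here are assumptions, not confirmed values):

from openai import OpenAI

# Assumed endpoint and model id -- swap in wherever you actually
# serve the 340B instruct model.
client = OpenAI(base_url="https://integrate.api.nvidia.com/v1",
                api_key="YOUR_API_KEY")

topics = ["binary search", "SQL joins", "rate limiting"]
dataset = []
for topic in topics:
    resp = client.chat.completions.create(
        model="nvidia/nemotron-4-340b-instruct",  # assumed model id
        messages=[{"role": "user",
                   "content": f"Write a challenging question about {topic}, "
                              "then answer it step by step."}],
        temperature=0.7,
        max_tokens=512,
    )
    dataset.append({"topic": topic, "text": resp.choices[0].message.content})

# The companion 340B reward model could then score and filter these
# samples, mirroring the mostly-synthetic alignment pipeline above.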
🔗 Models: nvidia/nemotron-4-340b-666b7ebaf1b3867caf2f1911
📄 Paper: https://research.nvidia.com/publication/2024-06_nemotron-4-340b