𝗔𝗜𝟮𝟭 𝗶𝘁𝗲𝗿𝗮𝘁𝗲𝘀 𝘄𝗶𝘁𝗵 𝗻𝗲𝘄 𝗝𝗮𝗺𝗯𝗮 𝟭.𝟱 𝗿𝗲𝗹𝗲𝗮𝘀𝗲: 𝗡𝗲𝘄 𝘀𝘁𝗮𝗻𝗱𝗮𝗿𝗱 𝗳𝗼𝗿 𝗹𝗼𝗻𝗴-𝗰𝗼𝗻𝘁𝗲𝘅𝘁 𝘂𝘀𝗲-𝗰𝗮𝘀𝗲𝘀!🚀
@ai21labs used a different architecture to beat the status-quo Transformer models: the Jamba architecture combines classic Transformer layers with the new Mamba layers, whose complexity is a linear (instead of quadratic) function of the context length.
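To make that concrete, here is a toy sketch (my own illustration, not AI21's code) of why a Jamba-style stack is so much cheaper than pure attention: only a few of the layers pay the quadratic cost, the rest scan the sequence once.

```python
# Toy illustration (not AI21's code): a Jamba-style stack keeps only a few
# attention layers, so most of the compute scales linearly in context length n.

def layer_cost(kind: str, n: int) -> int:
    # attention compares every token pair -> O(n^2);
    # a Mamba (state-space) layer scans the sequence once -> O(n)
    return n * n if kind == "attention" else n

def hybrid_stack_cost(n_tokens: int, n_layers: int = 32, attn_every: int = 8) -> int:
    # e.g. 1 attention layer out of every 8, the rest Mamba
    kinds = ["attention" if i % attn_every == 0 else "mamba" for i in range(n_layers)]
    return sum(layer_cost(k, n_tokens) for k in kinds)

def full_attention_cost(n_tokens: int, n_layers: int = 32) -> int:
    return n_layers * n_tokens * n_tokens

n = 64_000
print(hybrid_stack_cost(n) / full_attention_cost(n))  # ~1/8: only 4 of 32 layers pay O(n^2)
```

The layer counts and the 1-in-8 ratio above are assumptions for illustration, not Jamba's actual configuration; the point is simply that the quadratic term applies to a small fraction of the stack.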
What does this imply?
⚡️ Jamba models are much more efficient for long contexts: faster (up to 2.5x faster at long context), lighter on memory, and also better at recalling everything in the prompt.
That means it's a new go-to model for RAG or agentic applications!
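If you want to try it, here is a minimal sketch using 🤗 transformers (the checkpoint name follows the Hub collection linked below; device and generation settings are just one reasonable choice, and you will need a recent transformers version with Jamba support):

```python
# Minimal sketch: run Jamba 1.5 Mini with transformers.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "ai21labs/AI21-Jamba-1.5-Mini"  # assumed checkpoint name from the Hub collection
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

messages = [{"role": "user", "content": "Summarize the following report: ..."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=256)
print(tokenizer.decode(output[0][input_ids.shape[1]:], skip_special_tokens=True))
```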
And the performance is not too shabby: Jamba 1.5 models are comparable in perf to similar-sized Llama-3.1 models! The largest model even outperforms Llama-3.1 405B on Arena-Hard.
✌️ Comes in 2 sizes: Mini (12B active / 52B total params) and Large (94B active / 399B total params)
📏 Both deliver 256k context length with low memory use: Jamba 1.5 Mini fits a 140k-token context on one single A100.
⚙️ New quantization method: ExpertsInt8 quantizes only the weights of the MoE layers, which account for 85% of all weights (toy sketch after this list)
🤖 Natively supports JSON format generation & function calling (sketch at the end of this post).
📜 Permissive license *if your org makes <$50M revenue*
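Here is the promised toy sketch of the idea behind ExpertsInt8: weight-only int8 quantization with per-output-channel scales, applied to an expert's weight matrix. This is only an illustration of the concept; the real method dequantizes inside fused inference kernels rather than materializing float weights like this.

```python
# Toy weight-only int8 quantization (illustrative, not AI21's actual kernels).
import torch

def quantize_int8(w: torch.Tensor):
    # symmetric per-output-channel scale: store int8 weights + float scales
    scale = w.abs().amax(dim=1, keepdim=True) / 127.0
    q = torch.round(w / scale).clamp(-127, 127).to(torch.int8)
    return q, scale

def int8_linear(x: torch.Tensor, q: torch.Tensor, scale: torch.Tensor):
    # dequantize on the fly, then a normal matmul
    return x @ (q.float() * scale).t()

w = torch.randn(1024, 1024)      # stand-in for one expert's weight matrix
q, scale = quantize_int8(w)
x = torch.randn(4, 1024)
print((int8_linear(x, q, scale) - x @ w.t()).abs().max())  # small quantization error
```

Since the MoE experts hold 85% of the weights, storing just those in int8 cuts most of the memory footprint while leaving the rest of the model untouched.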
Available on the Hub 👉 ai21labs/jamba-15-66c44befa474a917fcf55251
Read their release blog post 👉 https://www.ai21.com/blog/announcing-jamba-model-family
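And the function-calling sketch promised above: recent transformers versions let you pass tool schemas through the chat template's `tools` argument, and the model replies with a JSON tool call you can parse and execute. The weather tool here is a made-up example.

```python
# Hedged sketch of native function calling via the chat template.
from transformers import AutoTokenizer

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",  # hypothetical tool, just for illustration
        "description": "Get the current weather in a given city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

tokenizer = AutoTokenizer.from_pretrained("ai21labs/AI21-Jamba-1.5-Mini")
prompt = tokenizer.apply_chat_template(
    [{"role": "user", "content": "What's the weather in Paris right now?"}],
    tools=tools,
    add_generation_prompt=True,
    tokenize=False,
)
print(prompt)  # the tool schema is embedded in the prompt for the model
```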