Nemo models [pretrain]
Proof-of-concept: SOTA tokenizers can be reused to build compatible precomputed (frozen) embeddings; any organization can repeat the procedure with its own tokenizer.
Bochkov/nemo_bvv_moe
Note: nemo_bvv_moe is a multilingual Mixture-of-Experts (MoE) model constructed by combining nemo_bvv_ru and nemo_bvv_zh. Because both models share the same fully frozen token embeddings, direct model fusion is feasible without re-training the embeddings.
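A minimal sketch of the precondition that makes this fusion possible: the two donor models should carry bit-identical frozen embedding tables. The repo names come from this collection; the loading arguments (dtype, trust_remote_code) are assumptions, and this is not the authors' actual merge script.

```python
# Minimal sketch: verify that the two donor models share an identical frozen
# token embedding table, which is what makes direct MoE fusion feasible
# without re-training embeddings. Loading details are assumptions.
import torch
from transformers import AutoModelForCausalLM

ru = AutoModelForCausalLM.from_pretrained(
    "Bochkov/nemo_bvv_ru", torch_dtype=torch.float32, trust_remote_code=True
)
zh = AutoModelForCausalLM.from_pretrained(
    "Bochkov/nemo_bvv_zh", torch_dtype=torch.float32, trust_remote_code=True
)

emb_ru = ru.get_input_embeddings().weight
emb_zh = zh.get_input_embeddings().weight

# Both models were trained on the same precomputed, frozen table,
# so shape and values should match exactly.
assert emb_ru.shape == emb_zh.shape
assert torch.equal(emb_ru, emb_zh), "embeddings differ; direct fusion is not safe"
print("Shared frozen embedding table:", tuple(emb_ru.shape))
```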
Bochkov/nemo_bvv_ru
Note: This is nemo_bvv_ru, a proof-of-concept Russian-language causal language model trained with completely frozen, precomputed token embeddings derived from the SOTA Mistral/Nemo tokenizer (embeddings built from the visual string composition of tokens, not standard learned semantic embeddings).
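A minimal sketch of the frozen-embedding training setup described above: a precomputed embedding table is loaded into the model and excluded from optimization, so only the transformer layers learn. The file name `precomputed_embeddings.pt` and the optimizer settings are illustrative assumptions; the table itself would be built from the visual string composition of Mistral/Nemo tokens, as described in the linked paper.

```python
# Minimal sketch: load a hypothetical precomputed embedding table, freeze it,
# and optimize only the remaining parameters (transformer blocks, head).
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("Bochkov/nemo_bvv_ru", trust_remote_code=True)

# Hypothetical precomputed table of shape (vocab_size, hidden_size).
precomputed = torch.load("precomputed_embeddings.pt")

with torch.no_grad():
    model.get_input_embeddings().weight.copy_(precomputed)

# Freeze the embedding table so gradient updates never touch it.
model.get_input_embeddings().weight.requires_grad_(False)

# Train only the parameters that still require gradients.
trainable = [p for p in model.parameters() if p.requires_grad]
optimizer = torch.optim.AdamW(trainable, lr=3e-4)
```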
Bochkov/nemo_bvv_zh
Note: This is nemo_bvv_zh, a Chinese-language GPT-style model trained with fully precomputed and frozen token embeddings built from the Mistral/Nemo tokenizer (embeddings based on visual appearance). It is specifically designed to demonstrate the compatibility of SOTA tokenizers with the fixed-embedding paradigm.
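A minimal inference sketch, assuming the repo ships a Mistral/Nemo-compatible tokenizer alongside the model weights; the prompt and generation settings are illustrative only.

```python
# Minimal sketch: generate text with nemo_bvv_zh using the tokenizer
# published in the same repo. All generation parameters are illustrative.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo = "Bochkov/nemo_bvv_zh"
tokenizer = AutoTokenizer.from_pretrained(repo, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(repo, trust_remote_code=True)
model.eval()

inputs = tokenizer("你好，世界", return_tensors="pt")
with torch.no_grad():
    output_ids = model.generate(**inputs, max_new_tokens=50, do_sample=True, top_p=0.9)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```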
Emergent Semantics Beyond Token Embeddings: Transformer LMs with Frozen Visual Unicode Representations
Paper • 2507.04886 • Published
Growing Transformers: Modular Composition and Layer-wise Expansion on a Frozen Substrate
Paper • 2507.07129 • Published