Nemo models [pretrain]
Proof-of-concept: SOTA tokenizers can be reused to build compatible precomputed (frozen) embeddings; any organization can repeat the procedure with its own tokenizer.
Bochkov/nemo_bvv_moe
Note: nemo_bvv_moe is a multilingual Mixture-of-Experts (MoE) model constructed by combining nemo_bvv_ru and nemo_bvv_zh. Because both models share the same fully frozen token embeddings, direct model fusion is feasible without re-training the embeddings.
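A minimal sketch of the precondition that makes this fusion possible: the two donor models should carry bit-identical frozen embedding tables. The repo names come from this collection; the loading arguments (dtype, trust_remote_code) are assumptions, and this is not the authors' actual merge script.

```python
# Minimal sketch: verify that the two donor models share an identical frozen
# token embedding table, which is what makes direct MoE fusion feasible
# without re-training embeddings. Loading details are assumptions.
import torch
from transformers import AutoModelForCausalLM

ru = AutoModelForCausalLM.from_pretrained(
    "Bochkov/nemo_bvv_ru", torch_dtype=torch.float32, trust_remote_code=True
)
zh = AutoModelForCausalLM.from_pretrained(
    "Bochkov/nemo_bvv_zh", torch_dtype=torch.float32, trust_remote_code=True
)

emb_ru = ru.get_input_embeddings().weight
emb_zh = zh.get_input_embeddings().weight

# Both models were trained on the same precomputed, frozen table,
# so shape and values should match exactly.
assert emb_ru.shape == emb_zh.shape
assert torch.equal(emb_ru, emb_zh), "embeddings differ; direct fusion is not safe"
print("Shared frozen embedding table:", tuple(emb_ru.shape))
```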
Bochkov/nemo_bvv_ru
Note: This is nemo_bvv_ru, a proof-of-concept Russian-language causal language model trained with completely frozen, precomputed token embeddings derived from the SOTA Mistral/Nemo tokenizer (embeddings built from the visual string composition of tokens, not standard learned semantic embeddings).
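A minimal sketch of the frozen-embedding training setup described above: a precomputed embedding table is loaded into the model and excluded from optimization, so only the transformer layers learn. The file name `precomputed_embeddings.pt` and the optimizer settings are illustrative assumptions; the table itself would be built from the visual string composition of Mistral/Nemo tokens, as described in the linked paper.

```python
# Minimal sketch: load a hypothetical precomputed embedding table, freeze it,
# and optimize only the remaining parameters (transformer blocks, head).
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("Bochkov/nemo_bvv_ru", trust_remote_code=True)

# Hypothetical precomputed table of shape (vocab_size, hidden_size).
precomputed = torch.load("precomputed_embeddings.pt")

with torch.no_grad():
    model.get_input_embeddings().weight.copy_(precomputed)

# Freeze the embedding table so gradient updates never touch it.
model.get_input_embeddings().weight.requires_grad_(False)

# Train only the parameters that still require gradients.
trainable = [p for p in model.parameters() if p.requires_grad]
optimizer = torch.optim.AdamW(trainable, lr=3e-4)
```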
Bochkov/nemo_bvv_zh
Note: This is nemo_bvv_zh, a Chinese-language GPT-style model trained with fully precomputed and frozen token embeddings built from the Mistral/Nemo tokenizer (embeddings based on visual appearance). It is specifically designed to demonstrate the compatibility of SOTA tokenizers with the fixed-embedding paradigm.
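A minimal inference sketch, assuming the repo ships a Mistral/Nemo-compatible tokenizer alongside the model weights; the prompt and generation settings are illustrative only.

```python
# Minimal sketch: generate text with nemo_bvv_zh using the tokenizer
# published in the same repo. All generation parameters are illustrative.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo = "Bochkov/nemo_bvv_zh"
tokenizer = AutoTokenizer.from_pretrained(repo, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(repo, trust_remote_code=True)
model.eval()

inputs = tokenizer("你好，世界", return_tensors="pt")
with torch.no_grad():
    output_ids = model.generate(**inputs, max_new_tokens=50, do_sample=True, top_p=0.9)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```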
Emergent Semantics Beyond Token Embeddings: Transformer LMs with Frozen Visual Unicode Representations
Paper • 2507.04886 • Published
Growing Transformers: Modular Composition and Layer-wise Expansion on a Frozen Substrate
Paper • 2507.07129 • Published