---
license: apache-2.0
language:
- en
- ko
- ja
- zh
---

# Tri-0.5B-Base

Tri-0.5B-Base is a ~500M-parameter multilingual language model trained as an **early experimental run** before the Tri-7B training. The model covers **English, Korean, Japanese, and Chinese**, with additional exposure to programming languages and mathematical reasoning. Pretrained on ~1.26 trillion tokens, it serves as a lightweight base model for research, fine-tuning, and open-source community use, especially for advancing Korean LLM development.

## Model Summary

* Architecture: decoder-only Transformer (LLaMA-style)
* Parameters: ~472M (untied embeddings and LM head)
* Layers / hidden size / attention heads: 24 / 896 / 14
* Feedforward hidden size: 2,560 (SiLU-gated MLP)
* Context length: 4,096
* RoPE θ: 100,000
* Training precision: bfloat16
* Status: base pretraining only (no instruction tuning, no RLHF)

An illustrative configuration sketch based on these figures is included in the appendix at the end of this card.

## Intended Use

* As a **foundation** for downstream fine-tuning and alignment.
* Research on multilingual pretraining and adaptation.

## Limitations

* As a base model without instruction tuning or RLHF, it may produce unsafe, incoherent, or factually incorrect outputs.

## Usage

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

name = "trillionlabs/Tri-0.5B-Base"
tok = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(
    name,
    torch_dtype=torch.bfloat16,  # matches the bfloat16 training precision
    device_map="auto",
)

prompt = "Write a short paragraph about Hangul."
x = tok(prompt, return_tensors="pt").to(model.device)

# Sampled generation; as a base model it continues the prompt rather than following instructions.
y = model.generate(
    **x,
    max_new_tokens=128,
    do_sample=True,
    temperature=0.8,
    top_p=0.95,
)
print(tok.decode(y[0], skip_special_tokens=True))
```

## License

This model is released under the **Apache 2.0 License**. See [LICENSE](https://www.apache.org/licenses/LICENSE-2.0) for details.

---

## Citation

If you use this model, please cite it as:

```
@misc{trillionlabs_tri05b_base_2025,
  title  = {Tri-0.5B-Base},
  author = {Trillion Labs},
  year   = {2025},
  note   = {https://huggingface.co/trillionlabs/Tri-0.5B-Base}
}
```
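
## Appendix: Configuration Sketch

The hyperparameters in the Model Summary map naturally onto a standard LLaMA-style configuration in `transformers`. The sketch below is illustrative only: `LlamaConfig` is assumed as the architecture class and the vocabulary size is a placeholder, since neither is stated on this card; the authoritative values are in the released `config.json` (readable via `AutoConfig.from_pretrained("trillionlabs/Tri-0.5B-Base")`).

```python
from transformers import LlamaConfig

# A minimal sketch, assuming a standard LLaMA-style layout in `transformers`.
# vocab_size is an assumed placeholder; read the real value from the released config.
config = LlamaConfig(
    vocab_size=128_000,             # assumption: not specified on this card
    hidden_size=896,                # hidden size
    intermediate_size=2_560,        # feedforward hidden size (SiLU-gated MLP)
    num_hidden_layers=24,           # layers
    num_attention_heads=14,         # attention heads (head_dim = 896 / 14 = 64)
    max_position_embeddings=4_096,  # context length
    rope_theta=100_000.0,           # RoPE θ
    hidden_act="silu",              # SiLU activation in the gated MLP
    tie_word_embeddings=False,      # untied embeddings and LM head
    torch_dtype="bfloat16",         # training precision
)
print(config)
```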