OpenSeek-Small-v1 Model Documentation

Overview

OpenSeek-Small-v1 is the initial production model of the OpenSeek project.

  • Utilizes a DeepSeek-V3-like Mixture-of-Experts (MoE) architecture (a routing sketch follows this list).
  • Comprises 1.4 billion total parameters, of which 0.4 billion are activated per token.
  • Trained on 720 billion tokens.
  • Demonstrates superior compute efficiency compared to 1-billion-parameter dense models (see Evaluation).
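
The MoE design routes each token through only a small subset of expert feed-forward networks, which is why just 0.4B of the 1.4B parameters are active per token. Below is a minimal, hypothetical PyTorch sketch of top-k expert routing; it illustrates the general DeepSeek-V3-style routing idea only, and the layer sizes, expert count, and top-k value are illustrative assumptions, not OpenSeek's actual configuration.

```python
# Illustrative sketch of top-k MoE routing. NOT the actual OpenSeek/DeepSeek
# implementation; d_model, d_ff, n_experts, and top_k are hypothetical.
import torch
import torch.nn as nn

class TopKMoE(nn.Module):
    def __init__(self, d_model=1024, d_ff=2816, n_experts=64, top_k=6):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(d_model, n_experts, bias=False)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.SiLU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        )

    def forward(self, x):
        # x: (n_tokens, d_model). Score all experts, keep the top-k per token.
        scores = self.router(x).softmax(dim=-1)
        weights, idx = scores.topk(self.top_k, dim=-1)
        weights = weights / weights.sum(dim=-1, keepdim=True)  # renormalize over chosen experts
        out = torch.zeros_like(x)
        for k in range(self.top_k):
            for e in idx[:, k].unique():
                mask = idx[:, k] == e  # tokens whose k-th choice is expert e
                out[mask] += weights[mask, k].unsqueeze(-1) * self.experts[int(e)](x[mask])
        return out

moe = TopKMoE()
print(moe(torch.randn(4, 1024)).shape)  # torch.Size([4, 1024])
```

Only `top_k` expert MLPs run per token, so activated parameters (and FLOPs per token) stay far below the total parameter count.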

Training Data

  • 720 billion (0.72T) tokens of high-quality pretraining data; the sampling ratio (%) for each domain is listed below (a weighted-sampling sketch follows the table):
| Name | Ratio (%) |
| --- | --- |
| Nemotron-CC-high-actual-actual-high | 1.26 |
| Nemotron-CC-high-actual-actual-low | 0.67 |
| Nemotron-CC-high-actual-actual-mid | 2.05 |
| Nemotron-CC-high-synthetic-distill-high | 1.59 |
| Nemotron-CC-high-synthetic-distill-low | 0.64 |
| Nemotron-CC-high-synthetic-distill-mid | 2.32 |
| Nemotron-CC-high-synthetic-diverse_qa_pairs-high | 4.67 |
| Nemotron-CC-high-synthetic-diverse_qa_pairs-low | 2.16 |
| Nemotron-CC-high-synthetic-diverse_qa_pairs-mid | 7.58 |
| Nemotron-CC-high-synthetic-extract_knowledge-high | 6.43 |
| Nemotron-CC-high-synthetic-extract_knowledge-low | 0.07 |
| Nemotron-CC-high-synthetic-extract_knowledge-mid | 2.22 |
| Nemotron-CC-high-synthetic-knowledge_list-high | 1.88 |
| Nemotron-CC-high-synthetic-knowledge_list-low | 0.74 |
| Nemotron-CC-high-synthetic-knowledge_list-mid | 3.20 |
| Nemotron-CC-high-synthetic-wrap_medium-high | 3.89 |
| Nemotron-CC-high-synthetic-wrap_medium-low | 0.65 |
| Nemotron-CC-high-synthetic-wrap_medium-mid | 6.18 |
| Nemotron-CC-low-synthetic-wrap_medium-high | 0.17 |
| Nemotron-CC-low-synthetic-wrap_medium-low | 0.30 |
| Nemotron-CC-low-synthetic-wrap_medium-mid | 1.08 |
| Nemotron-CC-medium-actual-actual-high | 2.20 |
| Nemotron-CC-medium-actual-actual-low | 4.48 |
| Nemotron-CC-medium-actual-actual-mid | 7.76 |
| arxiv | 0.32 |
| books | 1.98 |
| code | 3.43 |
| cot_synthesis_CC | 9.82 |
| cot_synthesis_OpenSource | 0.46 |
| cot_synthesis_arxiv | 4.15 |
| cot_synthesis_code | 1.32 |
| cot_synthesis_math | 2.19 |
| cot_synthesis_wiki | 0.83 |
| math | 0.83 |
| pes2o | 0.31 |
| stack | 0.19 |
| wiki | 0.29 |
| zh_cc | 9.65 |
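
The ratios sum to ~100%, i.e. they describe a sampling mixture over domains. Below is a minimal sketch of how such ratio-weighted domain sampling could work; it is illustrative only, not the project's actual data pipeline, and the dictionary is truncated for brevity.

```python
# Minimal sketch: pick pretraining domains in proportion to the mixture
# ratios above (illustrative only; NOT the project's actual pipeline).
import random

domain_ratios = {
    "cot_synthesis_CC": 9.82,
    "zh_cc": 9.65,
    "Nemotron-CC-medium-actual-actual-mid": 7.76,
    "Nemotron-CC-high-synthetic-diverse_qa_pairs-mid": 7.58,
    "code": 3.43,
    # ... remaining domains from the table above
}

domains = list(domain_ratios)
weights = list(domain_ratios.values())

# Each draw selects the domain the next training document is taken from,
# so domains appear in the token stream in proportion to their ratios.
print(random.choices(domains, weights=weights, k=5))
```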

Wandb

Our training curves are recorded on Weights & Biases (wandb).

Evaluation

| Category | Metrics (shots) | Llama-3.2-1B | Qwen2.5-1.5B | Qwen2.5-0.5B | OLMo-1B-0724 | OpenSeek-Small-v1 |
| --- | --- | --- | --- | --- | --- | --- |
| English-Commonsense Reasoning | HellaSwag (5-shot) | 0.4830 | 0.5007 | 0.4007 | 0.4909 | 0.3893 |
| | TruthfulQA (0-shot) | 0.3773 | 0.4663 | 0.3986 | 0.4029 | 0.3990 |
| | Winogrande (5-shot) | 0.6212 | 0.6448 | 0.5683 | 0.6290 | 0.5541 |
| | CommonsenseQA (5-shot) | 0.3120 | 0.7445 | 0.5487 | 0.1949 | 0.2048 |
| | PIQA (5-shot) | 0.7514 | 0.7612 | 0.7111 | 0.7459 | 0.7203 |
| | OpenBookQA (5-shot) | 0.2960 | 0.3340 | 0.2720 | 0.3080 | 0.2560 |
| | BoolQ (5-shot) | 0.6590 | 0.7774 | 0.6572 | 0.6508 | 0.6165 |
| English-Problem-Solving | ARC Easy (5-shot) | 0.6940 | 0.8043 | 0.6780 | 0.6111 | 0.6237 |
| | ARC Challenge (5-shot) | 0.3532 | 0.4846 | 0.3370 | 0.3063 | 0.3157 |
| | MMLU (5-shot) | 0.3124 | 0.6165 | 0.4818 | 0.2869 | 0.2654 |
| English-Mathematics | GSM8K (5-shot) | 0.0637 | 0.6194 | 0.3495 | 0.0159 | 0.0182 |
| | Minerva Math (4-shot) | 0.0180 | 0.2876 | 0.1160 | 0.0182 | 0.0010 |
| Chinese | CEval (5-shot) | 0.2779 | 0.6954 | 0.5423 | 0.2340 | 0.2422 |
| | CMMLU (5-shot) | 0.2687 | 0.6882 | 0.5300 | 0.2570 | 0.2468 |
| Average Metrics | Average-English (w/o Math) | 0.4859 | 0.6134 | 0.5053 | 0.4627 | 0.4345 |
| | Average-English | 0.4118 | 0.5868 | 0.4599 | 0.3884 | 0.3637 |
| | Average-Chinese | 0.2733 | 0.6918 | 0.5362 | 0.2455 | 0.2445 |
| | Average | 0.3920 | 0.6018 | 0.4708 | 0.3680 | 0.3466 |
| | Average (w/o Math) | 0.4505 | 0.6265 | 0.5105 | 0.4265 | 0.4028 |
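
The reported averages are unweighted means of the per-task scores above. As a quick sanity check, here is a minimal sketch reproducing the OpenSeek-Small-v1 column (values copied from the table):

```python
# Reproduce the reported averages for the OpenSeek-Small-v1 column
# (unweighted means over the per-task scores in the table above).
english = [0.3893, 0.3990, 0.5541, 0.2048, 0.7203, 0.2560, 0.6165,  # commonsense reasoning
           0.6237, 0.3157, 0.2654]                                  # problem solving
math = [0.0182, 0.0010]
chinese = [0.2422, 0.2468]

def mean(xs):
    return sum(xs) / len(xs)

print(f"Average-English (w/o Math): {mean(english):.4f}")                   # 0.4345
print(f"Average-English:            {mean(english + math):.4f}")            # 0.3637
print(f"Average-Chinese:            {mean(chinese):.4f}")                   # 0.2445
print(f"Average:                    {mean(english + math + chinese):.4f}")  # 0.3466
print(f"Average (w/o Math):         {mean(english + chinese):.4f}")         # 0.4028
```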

OpenSeek-Small-v1 demonstrates superior compute efficiency compared to 1-billion-parameter dense models: it approaches their average scores while activating only 0.4 billion parameters per token.

[Figure: log compute (logC) vs. average metric]

Usage Instructions

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the model and its tokenizer from the Hugging Face Hub.
model = AutoModelForCausalLM.from_pretrained("BAAI/OpenSeek-Small-v1")
tokenizer = AutoTokenizer.from_pretrained("BAAI/OpenSeek-Small-v1")

# Tokenize a prompt and generate up to 50 tokens (prompt included).
inputs = tokenizer("The future of AI is", return_tensors="pt")
outputs = model.generate(**inputs, max_length=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
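
Alternatively, the same generation can be run through the transformers `pipeline` API; this sketch assumes the checkpoint loads with the default `AutoModelForCausalLM` settings used above.

```python
# Alternative invocation via the high-level text-generation pipeline
# (assumes the checkpoint loads with default settings, as above).
from transformers import pipeline

generator = pipeline("text-generation", model="BAAI/OpenSeek-Small-v1")
print(generator("The future of AI is", max_new_tokens=40)[0]["generated_text"])
```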