subashsn Nurman committed on
Commit 547dcf8 · verified · 1 Parent(s): d63a0f6

Update README.md (#3)


- Update README.md (1dec455ecc08d88167a06865afdb964a876f6345)


Co-authored-by: Amril Nurman <[email protected]>

Files changed (1)
  1. README.md +30 -2
README.md CHANGED
@@ -10,9 +10,37 @@ language:
  pipeline_tag: text-generation
  ---
 
- # Qwen3-1.7B (from-scratch, 41B-token pretrain)
+ # QVAC Genesis I Pretrained Model
 
- A 1.7B-parameter decoder-only transformer (Qwen3 family) pre-trained **from scratch** on ~**40B tokens** of multi-domain text with **BF16 mixed precision** and a **4,096-token** context. Checkpoints are provided in standard Hugging Face format for easy inference and fine-tuning.
+ ## Key Highlights
+ - **Pretrained on the Largest Synthetic Educational Dataset**
+   This model has been **pretrained on Tether's QVAC Genesis I**, the largest synthetic dataset released for educational LLM pre-training.
+
+   The model was trained **from scratch** on approximately **41B tokens** of multi-domain educational text, using **BF16 mixed precision** and a **4,096-token context window**, on a **Qwen3-family 1.7B-parameter decoder-only transformer** architecture.
+
+   Checkpoints are provided in standard Hugging Face format for easy inference, continual pre-training, and fine-tuning (a minimal loading sketch follows the diff).
+
+ - **Multi-Domain Educational Coverage**
+   Because the model is trained on QVAC Genesis I, it inherits curriculum-aligned coverage across:
+   - Mathematics
+   - Physics
+   - Biology
+   - Medicine
+
+ - **Superior Benchmark Performance**
+   Leveraging QVAC Genesis I as its training foundation, the model consistently outperforms baselines on:
+   - Reasoning tasks
+   - Knowledge assessments
+   - Subject-specific QA
+
+ - **First Publicly Released Education-Specific Pretrained Model**
+   This is the first open-source pretrained model built directly on a rigorously validated synthetic dataset for education, offering deep and comprehensive STEM coverage.
+
+ ## Intended Uses
+ - Continual pre-training or fine-tuning for educational applications (STEM-focused tutoring, QA systems, curriculum support)
+ - Benchmarking reasoning and subject-specific QA performance
+ - Research into synthetic dataset–driven LLM training
 
  ---
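Since the updated card states that checkpoints ship in standard Hugging Face format, here is a minimal inference sketch. It assumes the usual `transformers` causal-LM API; the repo id `org/qvac-genesis-i-1.7b` is a placeholder, since this diff does not name the actual checkpoint path.

```python
# Minimal inference sketch, assuming standard Hugging Face `transformers` APIs.
# "org/qvac-genesis-i-1.7b" is a hypothetical repo id; substitute the real checkpoint.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "org/qvac-genesis-i-1.7b"  # placeholder, not named in this diff

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(
    repo_id,
    torch_dtype=torch.bfloat16,  # matches the BF16 training precision noted in the card
    device_map="auto",
)

# This is a base (pretrained-only) model, so prompt it for continuation, not chat.
prompt = "Photosynthesis converts light energy into"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64, do_sample=True, temperature=0.7)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Because pretraining used a 4,096-token context window, prompts beyond that length would need truncation; the same checkpoint can also be passed to standard `transformers` training workflows for the continual pre-training and fine-tuning uses the card lists.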