Param 1-2.9B-Instruct
BharatGen introduces an early SFT (supervised fine-tuned) checkpoint of Param 1, a bilingual language model trained from scratch on English and Hindi. With 2.9 billion parameters, this checkpoint builds on the pretraining phase and serves as a foundation for downstream tasks, safety testing, and customization.
Folder Structure
Param1/
├── model.nemo         # .nemo packaged model file
├── nemo_inference.sh  # shell script for running inference
└── README.md          # model documentation file
Pre-Training Details
- Dataset: 7.5 trillion tokens
- Data Quality: Highly curated with standard filtering and multiple processing steps
- Scheduler: Cosine Annealing
- Learning Rate: 3e-4 to 3e-6
- Training Hardware: 512 H100 GPUs
- Training Duration: 9 days
- Precision: bf16-mixed
For Pre-Trained Checkpoint (Param 1): https://aikosh.indiaai.gov.in/home/models/details/bharatgen_param_1_indic_scale_bilingual_foundation_model.html
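For reference, cosine annealing decays the learning rate from its peak to its minimum over the schedule. In the standard form below, the maximum and minimum correspond to the 3e-4 and 3e-6 values above; warmup and other NeMo-specific schedule settings are not specified here and are omitted:

$$\eta(t) = \eta_{\min} + \tfrac{1}{2}\left(\eta_{\max} - \eta_{\min}\right)\left(1 + \cos\frac{\pi t}{T}\right), \qquad \eta_{\max} = 3\times 10^{-4},\ \eta_{\min} = 3\times 10^{-6}$$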
SFT Training Details
- Dataset: 0.8 Million samples
- Epochs: 3
- Scheduler: Cosine Annealing
- Learning Rate: 5e-6 to 5e-8
- Training Hardware: 32 H200 GPUs
- Framework: NVIDIA NeMo
- Precision: bf16-mixed
Filtered, high-quality bilingual data combining public and in-house sources was used to encourage safety-aware and culturally relevant behavior.
Docker Setup
1. Pull the Docker Image
docker pull bharatgen/inference_image:latest
2. Create a Docker Container
Replace /path_to_your_project and path_to_your_workspace accordingly:
docker run --name name_of_your_container --gpus all -it -d -v /path_to_your_project:path_to_your_workspace bharatgen/inference_image:latest
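For example, with a hypothetical container name and mount paths (placeholders for this sketch; adjust them to your environment):

```bash
# Hypothetical values: "param1_sft", /home/user/Param1, and /workspace/Param1
# are placeholders, not paths required by the image.
docker run --name param1_sft --gpus all -it -d \
  -v /home/user/Param1:/workspace/Param1 \
  bharatgen/inference_image:latest

# Open a shell inside the running container:
docker exec -it param1_sft bash
```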
Model Inference
Steps to run inference using the .nemo SFT model file:
- Locate the model.nemo file and copy its path.
- Open nemo_inference.sh and update the line below so it points to that path.
- Do not remove the <user> <assistant> tags; only change the prompt between them.
gpt_model_file="/path_to_sft_model.nemo"
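As an illustration, the edited portion of nemo_inference.sh might look like the following. Only the gpt_model_file variable is documented here; the prompt line and the /workspace/Param1 mount path are assumptions for this sketch, so keep the script's actual variable names:

```bash
# Hypothetical snippet after editing nemo_inference.sh.
# gpt_model_file is the variable named in this README; the prompt line is an
# assumed example showing the <user> ... <assistant> tags kept intact.
gpt_model_file="/workspace/Param1/model.nemo"
prompt="<user> भारत की राजधानी क्या है? <assistant>"
```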
Benchmarks (zero-shot)
Task | Param 1 (PT) | Gemma2-2B (PT) | llama3.2-3B (distill PT) | granite-3.1-2B (PT) | granite-3.1-3B (PT) | qwen-2.5-3B (PT) |
---|---|---|---|---|---|---|
ARC Challenge | 46.7 | 49.7 | 46.0 | 47.2 | 45.2 | 47.4 |
ARC Easy | 74.6 | 80.3 | 71.7 | 76.8 | 75.8 | 73.2 |
HellaSwag | 71.4 | 73.0 | 73.7 | 75.5 | 72.6 | 73.6 |
HellaSwag Hi | 44.1 | 38.6 | 40.0 | 31.0 | 28.5 | 32.9 |
MMLU En | 41.4 | 47.1 | 53.9 | 47.8 | 41.0 | 64.9 |
MMLU Hi | 30.7 | 30.0 | 35.0 | 29.0 | 25.7 | 38.32 |
PIQA | 79.3 | 78.3 | 77.31 | 79.4 | 78.2 | 78.84 |
TriviaQA | 38.5 | 32.9 | 50.83 | 26.2 | 27.5 | 42.27 |
TruthfulQA - Gen (BLEU) | 38.2 | 29.7 | 21.8 | 34.0 | 36.7 | 36.96 |
TruthfulQA - MC1 Acc | 28.0 | 24.0 | 25.3 | 26.1 | 26.4 | 32.07 |
TruthfulQA - MC2 Acc | 43.8 | 36.2 | 39.2 | 39.0 | 39.9 | 48.95 |
SuperGLUE - boolq | 70.6 | 73.7 | 72.7 | 71.0 | 68.5 | 77.27 |
SuperGLUE - rte | 62.5 | 61.7 | 54.5 | 69.3 | 54.9 | 75.09 |
SuperGLUE - WiC | 49.5 | 49.5 | 50.0 | 50.3 | 52.3 | 61.75 |
SuperGLUE - multirc | 56.9 | 55.9 | 57.2 | 57.2 | 57.2 | 39.52 |
Notes:
- Benchmarks reflect zero-shot performance post-SFT.
- PT = Pretrained
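The README does not state which evaluation harness was used. As a rough sketch, comparable zero-shot runs can be set up with EleutherAI's lm-evaluation-harness, assuming the checkpoint has first been exported to a Hugging Face-compatible directory (the export step and the ./param1-2.9b-hf path are assumptions, and task names can vary across harness versions):

```bash
# Hypothetical zero-shot evaluation sketch; the HF-format export at
# ./param1-2.9b-hf is an assumed prerequisite, not provided in this repo.
pip install lm-eval
lm_eval --model hf \
  --model_args pretrained=./param1-2.9b-hf,dtype=bfloat16 \
  --tasks arc_challenge,arc_easy,hellaswag,piqa,triviaqa \
  --num_fewshot 0 \
  --batch_size 8
```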
Model Architecture
- Hidden size: 2048
- Intermediate size: 7168
- Attention heads: 16
- Hidden layers: 32
- Key-value heads: 8
- Max position embeddings: 2048
- Activation: SiLU
- Positional Embeddings: Rotary (RoPE, theta=10000)
- Attention Mechanism: Grouped-query attention
- Precision: bf16-mixed
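Because a .nemo file is a tar archive, the packaged configuration can be inspected directly to confirm the values above (the model_config.yaml member name is typical for NeMo checkpoints but may differ by version, so treat it as an assumption):

```bash
# List the contents of the packaged checkpoint (.nemo files are tar archives).
tar -tvf model.nemo

# Print the embedded config if it is stored as model_config.yaml
# (member name is an assumption; check the listing above first).
tar -xOf model.nemo ./model_config.yaml | head -n 40
```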
Important Guidelines for Early Checkpoint Release of Param-1-2.9B-Instruct
- Early Development Status
- This model is an early checkpoint in the development of the Param-1 Instruct model.
- It has not yet undergone full supervised fine-tuning, safety alignment, or rigorous evaluation.
- The release is intended to showcase progress, gather feedback, and encourage research and experimentation.
- Outputs may at times be incoherent, irrelevant, or of suboptimal quality.
- Data Sources and Potential Artifacts
- To preserve the model's global knowledge, part of the training data was crawled from the internet and may therefore contain inherited artifacts.
- Because AI-generated content is increasingly prevalent online, the model may occasionally mimic such statements and incorrectly identify itself.
- These artifacts are a natural consequence of using publicly available internet data, which is nonetheless important for building the model's global knowledge; we will address such issues in future iterations of the model.
- Lack of Alignment and Guardrails
- Only preliminary alignment and safety mechanisms have been implemented at this stage.
- The model has not yet undergone full-scale instruction tuning, supervised fine-tuning, or reinforcement learning from human feedback (RLHF).
- As a result, it may occasionally:
- Generate biased, offensive, or unsafe content
- Be susceptible to misuse or prompt injection (jailbreaking)
- Respond to harmful or unethical prompts without refusal
- This model must not be deployed in production; read the Intended Use section before any use.
- Intended Use
- This release is provided exclusively for research, experimentation, and contribution to the open-source community.
- Suggested use cases include:
- Assessing early-stage LLM behavior
- Debugging model training pipelines and configurations
- Benchmarking or custom fine-tuning by the community
- Access to this early checkpoint is meant to motivate and encourage the open-source community to build India-specific, innovative use cases on top of it, and to help foster innovation within the community.
- Licensing and Responsibility
- Released under an open license with responsible usage guidelines.
- License: MIT
- Users are expected to:
- Adhere to ethical usage practices and legal regulations
- Avoid malicious or unsafe deployment
- Credit the authors as per the licensing terms
- Acknowledgement of Origin
- A home-grown effort initiated in India with limited resources.
- This work represents a bottom-up initiative to develop LLMs from scratch within India.
- It reflects our humble, resource-constrained journey to contribute meaningfully to the open-source AI ecosystem.
- We hope to foster collaboration and growth within the broader community.
- Transparency & Community Collaboration
- We welcome contributions and open dialogue.
- We encourage the community to share feedback, report issues, and collaborate.
- Future versions will introduce better alignment, improved training scale, and more curated datasets.
- Together, we aim to evolve toward safer and more capable AI systems.
License
This SFT checkpoint is released under the BharatGen non-commercial license. Please refer to the LICENSE file for terms and conditions.