Param 1-2.9B-Instruct
BharatGen introduces an early SFT (supervised fine-tuned) checkpoint of Param 1, a bilingual language model trained from scratch on English and Hindi. With 2.9 billion parameters, this checkpoint builds on the pretraining phase and serves as a foundation for downstream tasks, safety testing, and customization.
Folder Structure
Param1/
├── model.nemo         # .nemo packaged model file
├── nemo_inference.sh  # shell script for running inference
└── README.md          # model documentation file
Pre-Training Details
- Dataset: 7.5 trillion tokens
- Data Quality: Highly curated with standard filtering and multiple processing steps
- Scheduler: Cosine Annealing
- Learning Rate: 3e-4 to 3e-6
- Training Hardware: 512 H100 GPUs
- Training Duration: 9 days
- Precision: bf16-mixed
For Pre-Trained Checkpoint (Param 1): https://aikosh.indiaai.gov.in/home/models/details/bharatgen_param_1_indic_scale_bilingual_foundation_model.html
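For reference, cosine annealing decays the learning rate from its peak to its minimum over the schedule. In the standard form below, the maximum and minimum correspond to the 3e-4 and 3e-6 values above; warmup and other NeMo-specific schedule settings are not specified here and are omitted:

$$\eta(t) = \eta_{\min} + \tfrac{1}{2}\left(\eta_{\max} - \eta_{\min}\right)\left(1 + \cos\frac{\pi t}{T}\right), \qquad \eta_{\max} = 3\times 10^{-4},\ \eta_{\min} = 3\times 10^{-6}$$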
SFT Training Details
- Dataset: 0.8 Million samples
- Epochs: 3
- Scheduler: Cosine Annealing
- Learning Rate: 5e-6 to 5e-8
- Training Hardware: 32 H200 GPUs
- Framework: NVIDIA NeMo
- Precision: bf16-mixed
Filtered, high-quality bilingual data combining public and in-house sources was used to encourage safety-aware and culturally relevant behavior.
Docker Setup
1. Pull the Docker Image
docker pull bharatgen/inference_image:latest
2. Create a Docker Container
Replace /path_to_your_project and path_to_your_workspace accordingly:
docker run --name name_of_your_container --gpus all -it -d -v /path_to_your_project:path_to_your_workspace bharatgen/inference_image:latest
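For example, with a hypothetical container name and mount paths (placeholders for this sketch; adjust them to your environment):

```bash
# Hypothetical values: "param1_sft", /home/user/Param1, and /workspace/Param1
# are placeholders, not paths required by the image.
docker run --name param1_sft --gpus all -it -d \
  -v /home/user/Param1:/workspace/Param1 \
  bharatgen/inference_image:latest

# Open a shell inside the running container:
docker exec -it param1_sft bash
```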
Model Inference
Steps to run inference using the .nemo SFT model file:
- Locate the model.nemo file and copy its path.
- Open nemo_inference.sh and update the line below so it points to that path.
- Do not remove the <user> <assistant> tags; only change the prompt between them.
gpt_model_file="/path_to_sft_model.nemo"
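As an illustration, the edited portion of nemo_inference.sh might look like the following. Only the gpt_model_file variable is documented here; the prompt line and the /workspace/Param1 mount path are assumptions for this sketch, so keep the script's actual variable names:

```bash
# Hypothetical snippet after editing nemo_inference.sh.
# gpt_model_file is the variable named in this README; the prompt line is an
# assumed example showing the <user> ... <assistant> tags kept intact.
gpt_model_file="/workspace/Param1/model.nemo"
prompt="<user> भारत की राजधानी क्या है? <assistant>"
```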
Benchmarks (zero-shot)
Task | Param 1 (PT) | Gemma2-2B (PT) | llama3.2-3B (distill PT) | granite-3.1-2B (PT) | granite-3.1-3B (PT) | qwen-2.5-3B (PT) |
---|---|---|---|---|---|---|
ARC Challenge | 46.7 | 49.7 | 46.0 | 47.2 | 45.2 | 47.4 |
ARC Easy | 74.6 | 80.3 | 71.7 | 76.8 | 75.8 | 73.2 |
HellaSwag | 71.4 | 73.0 | 73.7 | 75.5 | 72.6 | 73.6 |
HellaSwag Hi | 44.1 | 38.6 | 40.0 | 31.0 | 28.5 | 32.9 |
MMLU En | 41.4 | 47.1 | 53.9 | 47.8 | 41.0 | 64.9 |
MMLU Hi | 30.7 | 30.0 | 35.0 | 29.0 | 25.7 | 38.32 |
PIQA | 79.3 | 78.3 | 77.31 | 79.4 | 78.2 | 78.84 |
TriviaQA | 38.5 | 32.9 | 50.83 | 26.2 | 27.5 | 42.27 |
TruthfulQA - Gen (BLEU) | 38.2 | 29.7 | 21.8 | 34.0 | 36.7 | 36.96 |
TruthfulQA - MC1 Acc | 28.0 | 24.0 | 25.3 | 26.1 | 26.4 | 32.07 |
TruthfulQA - MC2 Acc | 43.8 | 36.2 | 39.2 | 39.0 | 39.9 | 48.95 |
SuperGLUE - boolq | 70.6 | 73.7 | 72.7 | 71.0 | 68.5 | 77.27 |
SuperGLUE - rte | 62.5 | 61.7 | 54.5 | 69.3 | 54.9 | 75.09 |
SuperGLUE - WiC | 49.5 | 49.5 | 50.0 | 50.3 | 52.3 | 61.75 |
SuperGLUE - multirc | 56.9 | 55.9 | 57.2 | 57.2 | 57.2 | 39.52 |
Notes:
- Benchmarks reflect zero-shot performance post-SFT.
- PT = Pretrained
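The README does not state which evaluation harness was used. As a rough sketch, comparable zero-shot runs can be set up with EleutherAI's lm-evaluation-harness, assuming the checkpoint has first been exported to a Hugging Face-compatible directory (the export step and the ./param1-2.9b-hf path are assumptions, and task names can vary across harness versions):

```bash
# Hypothetical zero-shot evaluation sketch; the HF-format export at
# ./param1-2.9b-hf is an assumed prerequisite, not provided in this repo.
pip install lm-eval
lm_eval --model hf \
  --model_args pretrained=./param1-2.9b-hf,dtype=bfloat16 \
  --tasks arc_challenge,arc_easy,hellaswag,piqa,triviaqa \
  --num_fewshot 0 \
  --batch_size 8
```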
Model Architecture
- Hidden size: 2048
- Intermediate size: 7168
- Attention heads: 16
- Hidden layers: 32
- Key-value heads: 8
- Max position embeddings: 2048
- Activation: SiLU
- Positional Embeddings: Rotary (RoPE, theta=10000)
- Attention Mechanism: Grouped-query attention
- Precision: bf16-mixed
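Because a .nemo file is a tar archive, the packaged configuration can be inspected directly to confirm the values above (the model_config.yaml member name is typical for NeMo checkpoints but may differ by version, so treat it as an assumption):

```bash
# List the contents of the packaged checkpoint (.nemo files are tar archives).
tar -tvf model.nemo

# Print the embedded config if it is stored as model_config.yaml
# (member name is an assumption; check the listing above first).
tar -xOf model.nemo ./model_config.yaml | head -n 40
```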
Important Guidelines for Early Checkpoint Release of Param-1-2.9B-Instruct
- Early Development Status
- This model is an early checkpoint in the development of the Param-1 Instruct model.
- It has not yet undergone full supervised fine-tuning, safety alignment, or rigorous evaluation.
- The release is intended to showcase progress, gather feedback, and encourage research and experimentation.
- Outputs may at times be incoherent, irrelevant, or of suboptimal quality.
- Data Sources and Potential Artifacts
- To preserve the model's global knowledge, part of the training data was crawled from the internet and may therefore contain inherited artifacts.
- Because AI-generated content is increasingly prevalent online, the model may occasionally mimic such statements and incorrectly identify itself.
- These artifacts are a natural consequence of using publicly available internet data, which is nonetheless important for building the model's global knowledge; we will address such issues in future iterations of the model.
- Lack of Alignment and Guardrails
- Only preliminary alignment and safety mechanisms have been implemented at this stage.
- The model has not yet undergone full-scale instruction tuning, supervised fine-tuning, or reinforcement learning from human feedback (RLHF).
- As a result, it may occasionally:
- Generate biased, offensive, or unsafe content
- Be susceptible to misuse or prompt injection (jailbreaking)
- Respond to harmful or unethical prompts without refusal
- This model must not be deployed in production; read the Intended Use section before any use.
- Intended Use
- This release is provided exclusively for research, experimentation, and contribution to the open-source community.
- Suggested use cases include:
- Assessing early-stage LLM behavior
- Debugging model training pipelines and configurations
- Benchmarking or custom fine-tuning by the community
- Access to this early checkpoint is meant to motivate and encourage the open-source community to build India-specific, innovative use cases on top of it, and to help foster innovation within the community.
- Licensing and Responsibility
- Released under an open license with responsible usage guidelines.
- License: MIT
- Users are expected to:
- Adhere to ethical usage practices and legal regulations
- Avoid malicious or unsafe deployment
- Credit the authors as per the licensing terms
- Acknowledgement of Origin
- A home-grown effort initiated in India with limited resources.
- This work represents a bottom-up initiative to develop LLMs from scratch within India.
- It reflects our humble, resource-constrained journey to contribute meaningfully to the open-source AI ecosystem.
- We hope to foster collaboration and growth within the broader community.
- Transparency & Community Collaboration
- We welcome contributions and open dialogue.
- We encourage the community to share feedback, report issues, and collaborate.
- Future versions will introduce better alignment, improved training scale, and more curated datasets.
- Together, we aim to evolve toward safer and more capable AI systems.
License
This SFT checkpoint is released under the BharatGen non-commercial license. Please refer to the LICENSE file for terms and conditions.