
Param 1-2.9B-Instruct

BharatGen introduces an early supervised fine-tuned (SFT) checkpoint of Param 1, a bilingual language model trained from scratch in English and Hindi. With 2.9 billion parameters, this checkpoint builds on the pretraining phase and serves as a foundation for downstream tasks, safety testing, and customization.


πŸ“ Folder Structure

Param1/
β”œβ”€β”€ model.nemo             # .nemo packaged model file
β”œβ”€β”€ nemo_inference.sh      # Shell script for running inference
└── README.md              # model documentation file

Pre-Training Details


SFT Training Details

  • Dataset: 0.8 Million samples
  • Epochs: 3
  • Scheduler: Cosine Annealing
  • Learning Rate: 5e-6 to 5e-8
  • Training Hardware: 32 H200 GPUs
  • Framework: NVIDIA NeMo
  • Precision: bf16-mixed

Filtered high-quality bilingual data was used, combining public and in-house sources for safety-aware and culturally relevant behavior.


🐳 Docker Setup

1. Pull the Docker Image

docker pull bharatgen/inference_image:latest

2. Create a Docker Container

Replace /path_to_your_project (your host project directory) and /path_to_your_workspace (the mount point inside the container) accordingly:

docker run --name name_of_your_container --gpus all -it -d -v /path_to_your_project:/path_to_your_workspace bharatgen/inference_image:latest
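For example, with hypothetical names and paths filled in (adjust these to your setup):

# Hypothetical example: mount a local project directory at /workspace inside the container
docker run --name param1_infer --gpus all -it -d -v /home/user/param1:/workspace bharatgen/inference_image:latest

# Attach a shell to the running container before running inference
docker exec -it param1_infer bash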

🚀 Model Inference

Steps to run inference using the .nemo SFT model file:

  1. Locate the model.nemo file and copy its path.
  2. Open nemo_inference.sh and update the following line with that path:

gpt_model_file="/path_to_sft_model.nemo"

  3. Do not remove the <user> and <assistant> tags; change only the prompt text between them.
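For illustration, the edited lines in nemo_inference.sh might look like the sketch below; the checkpoint path is hypothetical and the prompt variable name is an assumption, so adapt both to the actual script contents:

# Hypothetical excerpt from nemo_inference.sh (only gpt_model_file is confirmed by this card; the prompt variable name is assumed)
gpt_model_file="/workspace/Param1/model.nemo"    # absolute path to the .nemo SFT checkpoint
prompt="<user> Explain the significance of the monsoon for Indian agriculture. <assistant>"    # edit only the text between the tags

Then run the script from inside the container:

bash nemo_inference.sh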

📊 Benchmarks (zero-shot)

| Task | Param 1 (PT) | Gemma2-2B (PT) | Llama3.2-3B (distill PT) | Granite-3.1-2B (PT) | Granite-3.1-3B (PT) | Qwen-2.5-3B (PT) |
|---|---|---|---|---|---|---|
| ARC Challenge | 46.7 | 49.7 | 46.0 | 47.2 | 45.2 | 47.4 |
| ARC Easy | 74.6 | 80.3 | 71.7 | 76.8 | 75.8 | 73.2 |
| HellaSwag | 71.4 | 73.0 | 73.7 | 75.5 | 72.6 | 73.6 |
| HellaSwag Hi | 44.1 | 38.6 | 40.0 | 31.0 | 28.5 | 32.9 |
| MMLU En | 41.4 | 47.1 | 53.9 | 47.8 | 41.0 | 64.9 |
| MMLU Hi | 30.7 | 30.0 | 35.0 | 29.0 | 25.7 | 38.32 |
| PIQA | 79.3 | 78.3 | 77.31 | 79.4 | 78.2 | 78.84 |
| TriviaQA | 38.5 | 32.9 | 50.83 | 26.2 | 27.5 | 42.27 |
| TruthfulQA - Gen (BLEU) | 38.2 | 29.7 | 21.8 | 34.0 | 36.7 | 36.96 |
| TruthfulQA - MC1 Acc | 28.0 | 24.0 | 25.3 | 26.1 | 26.4 | 32.07 |
| TruthfulQA - MC2 Acc | 43.8 | 36.2 | 39.2 | 39.0 | 39.9 | 48.95 |
| SuperGLUE - BoolQ | 70.6 | 73.7 | 72.7 | 71.0 | 68.5 | 77.27 |
| SuperGLUE - RTE | 62.5 | 61.7 | 54.5 | 69.3 | 54.9 | 75.09 |
| SuperGLUE - WiC | 49.5 | 49.5 | 50.0 | 50.3 | 52.3 | 61.75 |
| SuperGLUE - MultiRC | 56.9 | 55.9 | 57.2 | 57.2 | 57.2 | 39.52 |

Notes:

  • Benchmarks reflect zero-shot performance post-SFT.
  • PT = Pretrained
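The card does not state which evaluation harness produced these numbers. As a hedged sketch only, a zero-shot run over a few of the English tasks could be reproduced with the EleutherAI lm-evaluation-harness, assuming the checkpoint has first been converted to a Hugging Face-compatible format (the released artifact itself is a .nemo file):

# Hypothetical zero-shot evaluation command; the converted-checkpoint path is an assumption
lm_eval --model hf \
  --model_args pretrained=/workspace/param1_hf,dtype=bfloat16 \
  --tasks arc_challenge,arc_easy,hellaswag,piqa \
  --num_fewshot 0 \
  --batch_size 8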

🧠 Model Architecture

  • Hidden size: 2048
  • Intermediate size: 7168
  • Attention heads: 16
  • Hidden layers: 32
  • Key-value heads: 8
  • Max position embeddings: 2048
  • Activation: SiLU
  • Positional Embeddings: Rotary (RoPE, theta=10000)
  • Attention Mechanism: Grouped-query attention
  • Precision: bf16-mixed
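From the values above, and assuming the standard convention that the head dimension is hidden size divided by the number of attention heads, the per-head dimension and grouped-query attention ratio follow directly (a simple derivation, not stated in the original):

head dimension = hidden size / attention heads = 2048 / 16 = 128
query heads per KV head = attention heads / key-value heads = 16 / 8 = 2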

Important Guidelines for Early Checkpoint Release of Param-1-2.9B-Instruct

  1. Early Development Status
  • This model is an early checkpoint in the development of the Param-1 Instruct model.
  • It has not yet undergone full supervised fine-tuning, safety alignment, or rigorous evaluation.
  • The release is intended to showcase progress, gather feedback, and encourage research and experimentation.
  • Outputs may at times be incoherent, irrelevant, or of suboptimal quality.
  2. Data Sources and Potential Artifacts
  • To preserve the model's understanding of global context, part of the training data includes content crawled from the internet, so it may contain inherited artifacts.
  • Because AI-generated content is increasingly prevalent online, the model may occasionally mimic such statements and incorrectly identify itself.
  • These artifacts are a natural consequence of using publicly available internet data; such data is nonetheless important for building the model's global knowledge, and we will address these issues in future iterations of the model.
  3. Lack of Alignment and Guardrails
  • Only preliminary alignment and safety mechanisms have been implemented at this stage.
  • The model has not yet undergone full-scale instruction tuning, supervised fine-tuning, or reinforcement learning from human feedback (RLHF).
  • As a result, it may occasionally:
    • Generate biased, offensive, or unsafe content
    • Be susceptible to misuse or prompt injection (jailbreaking)
    • Respond to harmful or unethical prompts without refusal
  • This model must not be deployed in production without reviewing the Intended Use section below.
  4. Intended Use
  • This release is provided exclusively for research, experimentation and contribution to the open source community.
  • Suggested use cases include:
    • Assessing early-stage LLM behavior
    • Debugging model training pipelines and configurations
    • Benchmarking or custom fine-tuning by the community
  • Access to this early checkpoint is intended to motivate and enthuse the open-source community to build India-specific, innovative use cases on top of it, and to help foster innovation within the community.
  5. Licensing and Responsibility
  • Released under an open license with responsible usage guidelines.
  • License: MIT
  • Users are expected to:
    • Adhere to ethical usage practices and legal regulations
    • Avoid malicious or unsafe deployment
    • Credit the authors as per the licensing terms
  6. Acknowledgement of Origin
  • A home-grown effort initiated in India with limited resources.
  • This work represents a bottom-up initiative to develop LLMs from scratch within India.
  • It reflects our humble, resource-constrained journey to contribute meaningfully to the open-source AI ecosystem.
  • We hope to foster collaboration and growth within the broader community.
  7. Transparency & Community Collaboration
  • We welcome contributions and open dialogue.
  • We encourage the community to share feedback, report issues, and collaborate.
  • Future versions will introduce better alignment, improved training scale, and more curated datasets.
  • Together, we aim to evolve toward safer and more capable AI systems.

📜 License

This SFT checkpoint is released under the BharatGen non-commercial license. Please refer to the LICENSE for terms and conditions.

