---
language:
- en
- hi
tags:
- Multiturn
- QnA
---
# AgriParam
**BharatGen** introduces **AgriParam**, a domain-specialized large language model fine-tuned from **Param-1-2.9B-Instruct** on a high-quality, India-centric agriculture dataset.
AgriParam is designed to understand and generate contextually rich responses for agricultural queries, farmer advisories, policy information, research insights, and rural knowledge dissemination.
---
## Motivation
Agriculture is the backbone of India's economy, yet existing language models lack deep domain knowledge tailored to Indian contexts, languages, and cultural nuances.
AgriParam bridges this gap by combining **Param-1**'s bilingual capabilities with a meticulously curated agricultural knowledge base.
---
## Model Architecture
AgriParam inherits the architecture of **Param-1-2.9B-Instruct**:
* **Hidden size**: 2048
* **Intermediate size**: 7168
* **Attention heads**: 16
* **Hidden layers**: 32
* **Key-value heads**: 8
* **Max position embeddings**: 2048
* **Activation**: SiLU
* **Positional Embeddings**: Rotary (RoPE, theta=10000)
* **Attention Mechanism**: Grouped-query attention
* **Precision**: bf16-mixed
* **Base model**: [Param-1-2.9B-Instruct](https://huggingface.co/bharatgenai/Param-1-2.9B-Instruct)
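
The hyperparameters listed above imply a few derived quantities that are useful when sizing inference. The sketch below assumes the conventional relation `head_dim = hidden_size / num_attention_heads` (not stated in this card) and a bf16 KV cache at 2 bytes per value. The `ArchConfig` class and its field names are illustrative and are not taken from the Param-1 code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ArchConfig:
    """Hyperparameters copied from the list above; names are illustrative."""
    hidden_size: int = 2048
    intermediate_size: int = 7168
    num_attention_heads: int = 16
    num_hidden_layers: int = 32
    num_key_value_heads: int = 8
    max_position_embeddings: int = 2048

    @property
    def head_dim(self) -> int:
        # Assumption: attention heads split the hidden size evenly.
        return self.hidden_size // self.num_attention_heads

    @property
    def queries_per_kv_head(self) -> int:
        # Grouped-query attention: several query heads share one K/V head.
        return self.num_attention_heads // self.num_key_value_heads

    def kv_cache_bytes_per_token(self, bytes_per_value: int = 2) -> int:
        # One K vector and one V vector per KV head, per layer.
        return (2 * self.num_hidden_layers * self.num_key_value_heads
                * self.head_dim * bytes_per_value)


cfg = ArchConfig()
print(cfg.head_dim)                    # 128
print(cfg.queries_per_kv_head)         # 2
print(cfg.kv_cache_bytes_per_token())  # 131072 bytes (128 KiB) per token
full_context_mib = cfg.kv_cache_bytes_per_token() * cfg.max_position_embeddings / 2**20
print(full_context_mib)                # 256.0 (MiB per sequence at 2048 tokens)
```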
---
## Data Preparation
AgriParam's training corpus was carefully crafted to ensure deep agricultural knowledge, cultural relevance, and bilingual (English-Hindi) accessibility.
**Steps involved:**
1. **Source Gathering**
   * 17k open-source, India-focused agricultural news & information passages.
2. **Question Generation**
   * Generated 5 curated Q&A pairs per passage using an open-source LLM.
3. **Domain Taxonomy & Personas**
   * Built an exhaustive, India-specific agricultural taxonomy.
   * Defined farmer, policy-maker, scientist, and agri-business personas.
4. **Dataset Construction**
   * 2M Q&A pairs grounded in taxonomy and personas.
   * Complete dataset translated into Hindi.
   * 6M multi-turn conversation samples created.
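
One way to picture how the taxonomy and personas could ground Q&A generation is as a grid of generation requests, one per passage, persona, and taxonomy topic. The sketch below is illustrative only: the `GenerationRequest` type, the `build_requests` helper, and the two topic names are hypothetical, and this card does not describe the actual BharatGen pipeline in enough detail to reproduce it.

```python
from dataclasses import dataclass
from itertools import product
from typing import Iterator, Sequence

# Personas named in the steps above.
PERSONAS = ("farmer", "policy-maker", "scientist", "agri-business")


@dataclass(frozen=True)
class GenerationRequest:
    """Hypothetical record describing one Q&A generation job."""
    passage_id: int
    persona: str
    topic: str


def build_requests(passage_ids: Sequence[int],
                   topics: Sequence[str]) -> Iterator[GenerationRequest]:
    """Yield one request per (passage, persona, topic) combination."""
    for passage_id, persona, topic in product(passage_ids, PERSONAS, topics):
        yield GenerationRequest(passage_id, persona, topic)


# Placeholder topics; the real taxonomy is not published in this card.
example_topics = ["Agronomy", "Soil Science"]
requests = list(build_requests(range(3), example_topics))
print(len(requests))  # 3 passages x 4 personas x 2 topics = 24
print(requests[0])
```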
---
## Training Setup
* **Base model**: Param-1-2.9B-Instruct
* **Training framework**: Hugging Face + `torchrun` multi-node setup
* **Prompt template**: Custom-designed for agricultural inference
* **Scheduler**: Linear with warmup
* **Epochs**: 3
* **Total training samples**: 12M
* **Test samples**: 1.2M
* **Base learning rate**: 5e-6
* **Minimum learning rate**: 0
* **Additional tokens**: ``, ``, ``, ``
* **Vocab size**: 256k + 4
* **Global batch size**: 1024
* **Micro batch size**: 4
* **Gradient accumulation steps**: 32
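
The batch-size and schedule settings above can be related with a little arithmetic. The sketch below assumes the usual identity global batch = micro batch x accumulation steps x data-parallel ranks, treats the 12M sample count as exact, and models the scheduler as linear warmup followed by linear decay to the minimum learning rate. The warmup length (`warmup_steps=1000`) is a placeholder, since this card does not state it, and `linear_lr` is an illustrative helper rather than the training code.

```python
import math

GLOBAL_BATCH = 1024
MICRO_BATCH = 4
GRAD_ACCUM = 32
TRAIN_SAMPLES = 12_000_000  # "12M" in the list above, treated as exact here
EPOCHS = 3
BASE_LR = 5e-6
MIN_LR = 0.0

# Assumption: global batch = micro batch * accumulation steps * data-parallel ranks.
data_parallel_ranks = GLOBAL_BATCH // (MICRO_BATCH * GRAD_ACCUM)
steps_per_epoch = math.ceil(TRAIN_SAMPLES / GLOBAL_BATCH)
total_steps = steps_per_epoch * EPOCHS


def linear_lr(step: int, total: int, warmup_steps: int = 1000) -> float:
    """Linear warmup to BASE_LR, then linear decay to MIN_LR.

    The warmup length is hypothetical; the card does not specify it.
    """
    if step < warmup_steps:
        return BASE_LR * step / warmup_steps
    remaining = max(total - step, 0) / max(total - warmup_steps, 1)
    return MIN_LR + (BASE_LR - MIN_LR) * remaining


print(data_parallel_ranks)  # 8
print(steps_per_epoch)      # 11719
print(total_steps)          # 35157
print(linear_lr(500, total_steps))          # halfway through warmup
print(linear_lr(total_steps, total_steps))  # 0.0 at the end of training
```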
---
## Inference Example
```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch
model_name = "bharatgenai/AgriParam"
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=False)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    trust_remote_code=True,
    # Use bfloat16 on GPU and fall back to float32 on CPU
    torch_dtype=torch.bfloat16 if torch.cuda.is_available() else torch.float32,
    device_map="auto"
)
# Example agricultural query
user_input = "What are the best practices for organic wheat farming in Uttar Pradesh?"
# Three prompt types
# 1. Generic QA: ...
# 2. Context-based QA: ... ...
# 3. Multi-turn conversation (supports up to 5 conversations): ... ... ...
# Choose the prompt type that fits your use case (refer to the examples above)
prompt = f" {user_input} "
# prompt = f" {user_context} {user_input} "
# prompt = f" {user_input1} {user_input2} {user_input3} ..."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    output = model.generate(
        **inputs,
        max_new_tokens=300,
        do_sample=True,
        top_k=50,
        top_p=0.95,
        temperature=0.6,
        eos_token_id=tokenizer.eos_token_id,
        use_cache=False
    )
print(tokenizer.decode(output[0], skip_special_tokens=True))
```
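
For decoder-only models, `model.generate` returns the prompt tokens followed by the newly generated ones, so decoding `output[0]` prints the question again before the answer. A small helper, sketched here under the hypothetical name `new_token_ids`, slices off the prompt so that only the model's reply is decoded. The token ids in the toy example are made up.

```python
from typing import Sequence


def new_token_ids(output_ids: Sequence[int], prompt_length: int) -> Sequence[int]:
    """Return only the tokens that follow the first `prompt_length` tokens."""
    if prompt_length > len(output_ids):
        raise ValueError("prompt_length exceeds the number of output tokens")
    return output_ids[prompt_length:]


# Toy example: 3 prompt tokens followed by 2 generated tokens.
print(new_token_ids([11, 12, 13, 21, 22], prompt_length=3))  # [21, 22]

# With the objects from the inference snippet, usage would look like:
#   prompt_length = inputs["input_ids"].shape[-1]
#   reply_ids = new_token_ids(output[0], prompt_length)
#   print(tokenizer.decode(reply_ids, skip_special_tokens=True))
```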
## Evaluation
* Crop-specific Q&A
* Policy & scheme awareness
* Rural advisory & extension services
* Bilingual (English/Hindi) capability
### **[BhashaBench-Krishi (BBK)](https://huggingface.co/datasets/bharatgenai/BhashaBench-Krishi)**
| Model | BBK | BBK_English | BBK_Hindi |
|-------------------------------------|-------------------|---------------------------|-------------------------|
| Llama-3.2-1B | 28.91 | 29.71 | 25.21 |
| Llama-3.2-1B-Instruct | 28.65 | 29.16 | 26.33 |
| Llama-3.2-3B | 31.96 | 32.68 | 28.69 |
| granite-3.1-3b-a800m-base | 32.17 | 33.36 | 26.70 |
| sarvam-2b-v0.5 | 27.68 | 28.14 | 25.57 |
| sarvam-1 | 30.24 | 30.82 | 27.57 |
| **AgriParam** | **32.18** | **33.10** | **27.97** |
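
To read the overall table at a glance, it can help to rank AgriParam against the baselines in each column. The snippet below copies the scores from the table above and computes that rank; the `rank_of` helper is illustrative and not part of any benchmark tooling.

```python
# Scores copied from the table above: (BBK, BBK_English, BBK_Hindi).
SCORES = {
    "Llama-3.2-1B": (28.91, 29.71, 25.21),
    "Llama-3.2-1B-Instruct": (28.65, 29.16, 26.33),
    "Llama-3.2-3B": (31.96, 32.68, 28.69),
    "granite-3.1-3b-a800m-base": (32.17, 33.36, 26.70),
    "sarvam-2b-v0.5": (27.68, 28.14, 25.57),
    "sarvam-1": (30.24, 30.82, 27.57),
    "AgriParam": (32.18, 33.10, 27.97),
}
COLUMNS = ("BBK", "BBK_English", "BBK_Hindi")


def rank_of(model: str, column: int) -> int:
    """1-based rank of `model` in the given column (1 = highest score)."""
    target = SCORES[model][column]
    return 1 + sum(1 for scores in SCORES.values() if scores[column] > target)


for i, name in enumerate(COLUMNS):
    best = max(SCORES, key=lambda m: SCORES[m][i])
    print(f"{name}: AgriParam rank {rank_of('AgriParam', i)}, best = {best}")
# BBK: AgriParam rank 1, best = AgriParam
# BBK_English: AgriParam rank 2, best = granite-3.1-3b-a800m-base
# BBK_Hindi: AgriParam rank 2, best = Llama-3.2-3B
```
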
---
### **Subject Domain Performance**
| Subject Domain | Llama-3.2-1B | Llama-3.2-1B-Instruct | Llama-3.2-3B | granite-3.1-3b-a800m-base | sarvam-2b-v0.5 | sarvam-1 | AgriParam |
|----------------------------------------------------|--------------|-----------------------|--------------|---------------------------|----------------|----------|-----------|
| Agri-Environmental & Allied Disciplines | 31.82 | 32.95 | 25.00 | 36.93 | 29.55 | 30.11 | 27.27 |
| Agricultural Biotechnology | 31.11 | 28.63 | 34.35 | 43.13 | 30.34 | 36.64 | 36.64 |
| Agricultural Chemistry & Biochemistry | 27.05 | 22.78 | 31.32 | 35.94 | 27.05 | 34.52 | 34.16 |
| Agricultural Economics & Policy | 29.98 | 25.52 | 35.09 | 34.77 | 27.75 | 30.78 | 32.54 |
| Agricultural Engineering & Technology | 27.46 | 26.23 | 32.79 | 30.33 | 27.46 | 29.51 | 27.87 |
| Agricultural Extension Education | 30.88 | 29.46 | 32.30 | 29.84 | 28.17 | 29.97 | **34.50** |
| Agricultural Microbiology | 34.23 | 36.04 | 31.53 | 34.23 | 17.12 | 26.13 | 34.23 |
| Agriculture Communication | 33.07 | 28.35 | 29.53 | 34.25 | 25.59 | 33.07 | 32.68 |
| Agriculture Information Technology | 30.53 | 31.58 | 44.21 | 36.84 | 27.89 | 32.11 | 27.89 |
| Agronomy | 27.92 | 28.77 | 31.84 | 31.51 | 28.67 | 29.60 | **32.49** |
| Animal Sciences | 25.68 | 34.46 | 36.49 | 37.84 | 35.14 | 29.05 | **40.54** |
| Crop Sciences | 31.15 | 26.41 | 29.87 | 35.15 | 26.59 | 29.33 | 32.42 |
| Dairy & Poultry Science | 35.96 | 31.46 | 30.34 | 44.94 | 33.71 | 32.58 | 29.21 |
| Entomology | 29.02 | 27.59 | 35.49 | 29.31 | 27.59 | 27.87 | 31.75 |
| Fisheries and Aquaculture | 29.41 | 41.18 | 38.24 | 26.47 | 20.59 | 14.71 | 23.53 |
| General Knowledge & Reasoning | 28.44 | 27.53 | 33.13 | 32.38 | 26.17 | 30.56 | 31.92 |
| Genetics and Plant Breeding | 30.59 | 30.08 | 28.02 | 29.05 | 26.99 | 31.62 | 29.82 |
| Horticulture | 27.05 | 28.60 | 31.21 | 32.17 | 27.00 | 29.76 | 31.40 |
| Natural Resource Management | 28.50 | 26.42 | 29.02 | 32.64 | 26.42 | 26.94 | 27.46 |
| Nematology | 22.83 | 28.26 | 28.26 | 27.17 | 21.20 | 24.46 | 23.91 |
| Plant Pathology | 28.97 | 30.48 | 27.96 | 29.97 | 25.44 | 33.50 | 25.44 |
| Plant Sciences & Physiology | 28.68 | 31.78 | 37.98 | 26.36 | 20.93 | 30.23 | 31.01 |
| Seed Science and Technology | 29.70 | 28.71 | 27.72 | 29.21 | 29.70 | 34.65 | 27.23 |
| Soil Science | 31.25 | 29.92 | 31.69 | 29.99 | 27.49 | 30.21 | **34.93** |
| Veterinary Sciences | 27.08 | 14.58 | 37.50 | 39.58 | 20.83 | 41.67 | **43.75** |
---
### **Question Level Difficulty**
| Difficulty | Llama-3.2-1B | Llama-3.2-1B-Instruct | Llama-3.2-3B | granite-3.1-3b-a800m-base | sarvam-2b-v0.5 | sarvam-1 | AgriParam |
|------------|--------------|-----------------------|--------------|---------------------------|----------------|----------|-----------|
| Easy | 29.43 | 30.22 | 36.44 | 36.08 | 28.26 | 32.20 | **36.94** |
| Medium | 28.68 | 27.69 | 29.17 | 29.88 | 27.03 | 28.99 | 29.09 |
| Hard | 27.72 | 26.37 | 25.61 | 26.02 | 28.01 | 27.54 | 25.91 |
---
## License
This SFT checkpoint is released under the **BharatGen non-commercial license**.
Please refer to the [LICENSE](./LICENSE) for terms and conditions.