Model Card for dhee-chat-mistral-hi
A fine-tuned Hindi conversational model based on mistralai/Mistral-7B-v0.3, optimized for Hindi language understanding and generation.
Model Details
- Base Model: Mistral 7B v0.3
- Fine-tuning Method: LoRA (Low-Rank Adaptation)
- Dataset: ai4bharat/indic-align
- Language: Hindi
- Model ID: dheeyantra/dhee-chat-mistral-hi
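If you want to confirm the model ID and see the tags set on the repository programmatically, a minimal sketch using the standard huggingface_hub client looks like this (the exact fields printed depend on what the repository exposes):

```python
# Inspect this model's Hub metadata; assumes huggingface_hub is installed.
from huggingface_hub import model_info

info = model_info("dheeyantra/dhee-chat-mistral-hi")
print(info.id)    # repository ID
print(info.tags)  # e.g. language and library tags set on the repo
```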
Intended Uses & Limitations
This model is intended for Hindi conversational applications such as chatbots and virtual assistants. Because it is fine-tuned on the ai4bharat/indic-align dataset, its knowledge and conversational style are primarily shaped by that data.
Limitations:
- The model's responses are based on the patterns and information present in the training data. It may generate incorrect or biased information.
- Performance may vary depending on the complexity and nuance of the input.
- The model is primarily focused on Hindi and may not perform well in other languages or code-mixed scenarios unless explicitly trained for them.
How to Get Started with Hugging Face Transformers
You can use the following Python code to load and run inference with the dheeyantra/dhee-chat-mistral-hi model:
```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_path = "dheeyantra/dhee-chat-mistral-hi"

# Load tokenizer and model
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(model_path)

# Move the model to GPU if one is available
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)

# Prepare chat messages; the "User" and "Dhee" roles follow the model's chat template
messages = [
    {"role": "User", "content": "कितने वेद हैं?"},  # "How many Vedas are there?"
    {"role": "Dhee", "content": "चार वेद हैंः ऋग्वेद, यजुर्वेद, सामवेद और अथर्ववेद।"},  # "There are four Vedas: Rigveda, Yajurveda, Samaveda, and Atharvaveda."
    {"role": "User", "content": "ऋग्वेद के बारे में और बतायें?"},  # "Tell me more about the Rigveda?"
]

# Render the conversation into a single prompt string
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

# Tokenize the prompt
inputs = tokenizer(prompt, return_tensors="pt").to(device)

# Generate a response
with torch.no_grad():
    output_ids = model.generate(
        **inputs,
        max_new_tokens=64,
        do_sample=True,
        temperature=0.7,
        top_p=0.95,
        pad_token_id=tokenizer.eos_token_id,
    )

# Decode only the newly generated tokens
generated_text = tokenizer.decode(
    output_ids[0][inputs["input_ids"].shape[-1]:],
    skip_special_tokens=True,
)
print("Generated text:")
print(generated_text)
```
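Alternatively, the high-level pipeline API can run the same conversation in a few lines. This is a sketch assuming a recent transformers release whose text-generation pipeline accepts chat-style message lists; the generation parameters mirror the example above:

```python
from transformers import pipeline

# Build a text-generation pipeline; device_map="auto" places the model
# on GPU when one is available (requires accelerate to be installed).
pipe = pipeline("text-generation", model="dheeyantra/dhee-chat-mistral-hi", device_map="auto")

messages = [{"role": "User", "content": "कितने वेद हैं?"}]
result = pipe(messages, max_new_tokens=64, do_sample=True, temperature=0.7, top_p=0.95)

# With chat-style input, generated_text holds the conversation including the model's reply.
print(result[0]["generated_text"])
```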
Disclaimer
This model is provided as-is. Users should be aware of its potential limitations and biases before deploying it in any application. Responsible AI practices should be followed.
Training Configuration
The model was fine-tuned using the following LoRA and training parameters:
LoRA Parameters:
- r: 16
- target_modules: ["q_proj", "k_proj", "v_proj", "o_proj", "gate_proj", "up_proj", "down_proj"]
- lora_alpha: 16
- lora_dropout: 0
- bias: "none"
- use_gradient_checkpointing: "unsloth"
- use_rslora: False
- loftq_config: None
Training Arguments:
Training Arguments:
- gradient_accumulation_steps: 4
- warmup_ratio: 0.03
- fp16: True
- optim: "adamw_8bit"
- max_seq_length: 32768
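For reference, these parameters map naturally onto an Unsloth-based LoRA setup (the `use_gradient_checkpointing="unsloth"` value is Unsloth-specific). The following is a minimal sketch, not the exact training script: the dataset loading, field mapping, per-device batch size, and epoch count are assumptions, and the trainer interface shown is the trl SFTTrainer variant that accepts `tokenizer` and `max_seq_length` directly.

```python
from unsloth import FastLanguageModel
from transformers import TrainingArguments
from trl import SFTTrainer
from datasets import load_dataset

max_seq_length = 32768

# Load the base model with Unsloth's optimized loader.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="mistralai/Mistral-7B-v0.3",
    max_seq_length=max_seq_length,
)

# Attach LoRA adapters with the parameters listed in this card.
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    lora_alpha=16,
    lora_dropout=0,
    bias="none",
    use_gradient_checkpointing="unsloth",
    use_rslora=False,
    loftq_config=None,
)

# Assumption: the dataset may require selecting a specific config/subset
# and converting conversations into a single text field before training.
dataset = load_dataset("ai4bharat/indic-align", split="train")

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    dataset_text_field="text",  # assumption; depends on preprocessing
    max_seq_length=max_seq_length,
    args=TrainingArguments(
        gradient_accumulation_steps=4,
        warmup_ratio=0.03,
        fp16=True,
        optim="adamw_8bit",
        per_device_train_batch_size=2,  # assumption; not stated in the card
        num_train_epochs=1,             # assumption; not stated in the card
        output_dir="outputs",
    ),
)
trainer.train()
```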
Acknowledgements
We extend our sincere gratitude to the following organizations for their invaluable contributions to this project:
- NxtGen: For generously providing the necessary infrastructure that powered the model training.
- AI4Bharat: For developing and making available the indic-align dataset, which was crucial for fine-tuning this model.
Citation
If you use this model in your research or applications, please cite:
```bibtex
@misc{dheenxtgen2025,
  title={dhee-chat-mistral-hi: A Compact Language Model for Hindi},
  author={Dheeyantra Research Labs},
  year={2025}
}
```