---
license: mit
language:
- en
- hi
- bn
- ta
- te
- ur
- gu
- kn
- ml
- pa
- or
- as
- mr
tags:
- qwen2
- indian-languages
- conversational-ai
- localized-ai
- indic-nlp
- multilingual
- hindi
- bengali
- tamil
- telugu
- urdu
- gujarati
- kannada
- malayalam
- punjabi
- odia
- assamese
- marathi
base_model: Qwen/Qwen2.5-0.5B
pipeline_tag: text-generation
library_name: transformers
datasets:
- ai4bharat/indic-corpus
- indicnlp/hindi-corpus
- custom-indian-datasets
metrics:
- perplexity
- bleu
- rouge
model-index:
- name: anki-qwen-2.5
  results:
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      type: indian-benchmark
      name: Indian Language Evaluation
    metrics:
    - type: perplexity
      value: 12.5
      name: Perplexity
---
# 🇮🇳 Anki Qwen 2.5 - Indian Market-Centric LLM

<div align="center">
  <img src="https://img.shields.io/badge/Language-Indic%20Languages-orange" alt="Languages">
  <img src="https://img.shields.io/badge/Base%20Model-Qwen%202.5-blue" alt="Base Model">
  <img src="https://img.shields.io/badge/Size-494M-green" alt="Model Size">
  <img src="https://img.shields.io/badge/License-MIT-yellow" alt="License">
</div>

## 🚀 Model Overview

**Anki Qwen 2.5** is a large language model built for the Indian market and ecosystem. Based on the Qwen 2.5 architecture, it has been fine-tuned to understand Indian languages, cultural contexts, and use cases common across India.

The model aims to bridge the gap between global AI capabilities and local Indian needs, with a focus on:

- **Indic Language Understanding**: Hindi, Bengali, Tamil, Telugu, Urdu, Gujarati, Kannada, Malayalam, Punjabi, Odia, Assamese, and Marathi
- **Cultural Context Awareness**: Indian customs, festivals, traditions, and social dynamics
- **Market-Specific Applications**: Indian business scenarios, educational contexts, and everyday interactions
## ✨ Key Features

### 🌐 Indic Language Excellence

- **Multi-script Support**: Handles Devanagari, Bengali, Tamil, Telugu, Perso-Arabic (Urdu), Gujarati, and other Indian scripts
- **Code-mixing Capability**: Processes Hinglish and other code-mixed text that blends English with Indian languages
- **Regional Dialects**: Understanding of regional variations and colloquialisms
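To check which script, or mix of scripts, a prompt uses (for example when logging per-language results or flagging code-mixed input), a small helper based on Unicode block ranges is often enough. The following is a minimal sketch that is independent of the model. `script_of` and `script_profile` are hypothetical helpers, and only the primary Unicode block of each script is covered. Romanized Hinglish is written entirely in Latin script, so this approach only detects mixing *across* scripts.

```python
from collections import Counter
from typing import Dict, Optional

# Primary Unicode blocks for the scripts of the languages listed in this card.
# Extension blocks (e.g. Devanagari Extended) are deliberately ignored in this sketch.
SCRIPT_RANGES = {
    "Devanagari": (0x0900, 0x097F),  # Hindi, Marathi
    "Bengali": (0x0980, 0x09FF),     # Bengali, Assamese
    "Gurmukhi": (0x0A00, 0x0A7F),    # Punjabi
    "Gujarati": (0x0A80, 0x0AFF),    # Gujarati
    "Oriya": (0x0B00, 0x0B7F),       # Odia
    "Tamil": (0x0B80, 0x0BFF),       # Tamil
    "Telugu": (0x0C00, 0x0C7F),      # Telugu
    "Kannada": (0x0C80, 0x0CFF),     # Kannada
    "Malayalam": (0x0D00, 0x0D7F),   # Malayalam
    "Arabic": (0x0600, 0x06FF),      # Urdu (Perso-Arabic script)
}


def script_of(ch: str) -> Optional[str]:
    """Script name for one character; None if it is outside these blocks and not an ASCII letter."""
    cp = ord(ch)
    for name, (lo, hi) in SCRIPT_RANGES.items():
        if lo <= cp <= hi:
            return name
    if ch.isascii() and ch.isalpha():
        return "Latin"
    return None  # spaces, ASCII digits and punctuation, other scripts


def script_profile(text: str) -> Dict[str, int]:
    """Count characters per script; more than one key suggests cross-script code-mixing."""
    counts = Counter(s for s in map(script_of, text) if s is not None)
    return dict(counts)


print(script_profile("नमस्ते"))       # {'Devanagari': 6}
print(script_profile("Hello दोस्त"))  # {'Latin': 5, 'Devanagari': 5}
```

Counting code points rather than visible characters keeps the helper simple. Vowel signs and viramas count toward their script, which is what you want for a per-script ratio.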
### 💬 Advanced Conversational Ability

- **Contextual Conversations**: Maintains context across multi-turn dialogues in multiple languages, within the model's context window
- **Cultural Sensitivity**: Responds appropriately to Indian cultural references and contexts
- **Formal & Informal Registers**: Adapts tone to the needs of the conversation
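Every model has a finite context window (8K tokens according to this card), so long dialogues eventually need trimming before they are passed to `apply_chat_template`. The sketch below keeps an optional leading system message plus the most recent turns that fit a token budget. `trim_history` is a hypothetical helper, and `count_tokens` is a stand-in. In practice you might pass something like `lambda s: len(tokenizer.encode(s))` and leave headroom for chat-template tokens and the reply.

```python
from typing import Callable, Dict, List

Message = Dict[str, str]


def trim_history(
    messages: List[Message],
    max_tokens: int,
    count_tokens: Callable[[str], int],
) -> List[Message]:
    """Keep an optional leading system message plus the newest turns that fit max_tokens."""
    system = messages[:1] if messages and messages[0]["role"] == "system" else []
    rest = messages[len(system):]
    budget = max_tokens - sum(count_tokens(m["content"]) for m in system)

    kept: List[Message] = []
    # Walk backwards from the newest turn and stop at the first one that no longer fits,
    # so the retained history stays contiguous.
    for message in reversed(rest):
        cost = count_tokens(message["content"])
        if cost > budget:
            break
        kept.append(message)
        budget -= cost
    return system + kept[::-1]


# Example with a crude whitespace "tokenizer" standing in for the real one
history = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "one two three four"},
    {"role": "assistant", "content": "five six"},
    {"role": "user", "content": "seven eight nine"},
]
trimmed = trim_history(history, max_tokens=10, count_tokens=lambda s: len(s.split()))
print([m["content"] for m in trimmed])
# ['You are a helpful assistant.', 'five six', 'seven eight nine']
```

Dropping whole turns, rather than truncating one mid-sentence, keeps each remaining message intact for the model.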
### 🎯 Market Specificity

- **Indian Business Context**: Understanding of Indian market dynamics, regulations, and practices
- **Educational Alignment**: Aligned with Indian educational curricula and learning patterns
- **Rural-Urban Bridge**: Capable of addressing both urban and rural use cases effectively
## 🔧 Technical Details

### Architecture

- **Base Model**: Qwen 2.5 (0.5B parameters)
- **Fine-tuning**: Specialized training on Indian datasets
- **Model Size**: 494M parameters
- **Precision**: F32 tensor type
- **Context Length**: Up to 8K tokens

### Training Data

- **Indic Corpus**: Comprehensive collection from AI4Bharat
- **Hindi Literature**: Classical and contemporary Hindi texts
- **Multilingual Datasets**: Balanced representation across 12+ Indian languages
- **Domain-Specific Data**: Business, education, healthcare, and government domains
- **Cultural Content**: Festivals, traditions, mythology, and historical references

### Licensing

- **Weights**: Open weights under MIT License
- **Commercial Use**: Permitted with attribution
- **Research Use**: Fully open for academic and research purposes
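## 🎯 Use Cases

### 🎬 Hindi/Indian Language Content Creation

```python
# Generate Hindi poetry or stories
# Assumes `model` and `tokenizer` are loaded as shown in the Quick Start section
prompt = "हिंदी में एक सुंदर कविता लिखें होली के बारे में"  # "Write a beautiful poem in Hindi about Holi"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)  # generate() needs token IDs, not raw text
outputs = model.generate(**inputs, max_new_tokens=200, do_sample=True, temperature=0.7)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```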
### 📊 Market Analysis & Business Intelligence

- Indian market trend analysis
- Customer sentiment analysis in local languages
- Regional business strategy recommendations
- Compliance and regulatory guidance
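For customer sentiment analysis, one option is to prompt the model for a single label and parse the answer defensively, because a small generative model may not reply in exactly the requested format. The helpers below are a hypothetical sketch. The prompt wording and the three-label set are assumptions, not a format the model is documented to have been tuned on. The resulting messages would go through `apply_chat_template` and `generate` as in the How to Use examples.

```python
from typing import Dict, List, Optional

LABELS = ("positive", "negative", "neutral")


def build_sentiment_messages(review: str) -> List[Dict[str, str]]:
    """Chat messages asking for a one-word sentiment label for a review in any language."""
    instruction = (
        "Classify the sentiment of the following customer review. "
        "Answer with exactly one word: positive, negative, or neutral.\n\n"
        f"Review: {review}"
    )
    return [{"role": "user", "content": instruction}]


def parse_sentiment(model_output: str) -> Optional[str]:
    """Return the label that appears first in the output, or None if no label is found."""
    text = model_output.lower()
    positions = {label: text.find(label) for label in LABELS if label in text}
    if not positions:
        return None
    return min(positions, key=positions.get)


# Hindi review: "The delivery came very late and the packet was broken."
messages = build_sentiment_messages("डिलीवरी बहुत देर से आई और पैकेट टूटा हुआ था।")
print(parse_sentiment("Negative. The customer complains about late delivery."))  # negative
```

Returning `None` instead of guessing lets an application route unparseable answers to a retry or to human review.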
### 🌾 Rural Technology Enablement

- Agricultural advisory in local languages
- Government scheme explanations
- Digital literacy support
- Local language interfaces for apps
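For local-language app interfaces, the app usually knows the user's preferred language as a code. The sketch below maps the ISO 639-1 codes from this card's `language` metadata to language names and builds chat messages that ask for a reply in that language. `build_messages` and the system-prompt wording are hypothetical. How reliably the model follows such an instruction should be checked for each language.

```python
from typing import Dict, List

# ISO 639-1 codes from this card's `language` metadata, mapped to English names
LANGUAGE_NAMES = {
    "en": "English", "hi": "Hindi", "bn": "Bengali", "ta": "Tamil",
    "te": "Telugu", "ur": "Urdu", "gu": "Gujarati", "kn": "Kannada",
    "ml": "Malayalam", "pa": "Punjabi", "or": "Odia", "as": "Assamese",
    "mr": "Marathi",
}


def build_messages(question: str, lang_code: str) -> List[Dict[str, str]]:
    """Chat messages that ask the model to answer in the user's language."""
    if lang_code not in LANGUAGE_NAMES:
        raise ValueError(f"unsupported language code: {lang_code!r}")
    language = LANGUAGE_NAMES[lang_code]
    system = f"You are a helpful assistant. Always reply in {language}, using simple words."
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": question},
    ]


# A farmer asking, in Marathi, which crop to grow in the rainy season
msgs = build_messages("पावसाळ्यात कोणते पीक घ्यावे?", "mr")
print(msgs[0]["content"])
```

Failing loudly on unknown codes avoids silently prompting in the wrong language.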
### 🎓 Educational Support

- Multilingual tutoring assistance
- Curriculum-aligned content generation
- Language learning support
- Cultural education resources

### 💼 Enterprise Applications

- Customer support in regional languages
- Document translation and summarization
- Indian law and regulation interpretation
- HR and recruitment assistance
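Documents longer than the context window (8K tokens according to this card) have to be translated or summarized in pieces. A common pattern is to split the text into overlapping chunks, process each chunk, and then combine the partial results. This is a minimal sketch. `chunk_tokens` is a hypothetical helper that works on any token list (for example `tokenizer.encode(text)`), and the sizes shown are illustrative, not tuned values.

```python
from typing import List, Sequence


def chunk_tokens(tokens: Sequence[int], chunk_size: int, overlap: int) -> List[List[int]]:
    """Split a token sequence into windows of chunk_size; consecutive windows share `overlap` tokens."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(list(tokens[start:start + chunk_size]))
        if start + chunk_size >= len(tokens):
            break  # this window already reaches the end of the document
    return chunks


print(chunk_tokens(list(range(10)), chunk_size=4, overlap=1))
# [[0, 1, 2, 3], [3, 4, 5, 6], [6, 7, 8, 9]]
```

In practice, you would decode each chunk with `tokenizer.decode`, wrap it in a summarization or translation prompt, and keep `chunk_size` well below the context limit to leave room for the prompt and the generated output. The overlap reduces the chance that a sentence split across a boundary loses its meaning.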
## 🛠️ How to Use

### Quick Start

```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Load the model and tokenizer (device_map="auto" requires the `accelerate` package)
model_name = "anktechsol/anki-qwen-2.5"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.float32,
    device_map="auto"
)

# Generate text in Hindi
prompt = "भारत में AI का भविष्य"  # "The future of AI in India"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)  # includes the attention mask
with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=100,  # limits generated tokens, excluding the prompt
        temperature=0.7,
        do_sample=True,
        pad_token_id=tokenizer.eos_token_id
    )
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)
```
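### Advanced Usage

```python
# Chat-style generation in Hindi; append more {"role": ..., "content": ...} turns for multi-turn use
# Assumes `model` and `tokenizer` are loaded as in the Quick Start
conversation = [
    {"role": "user", "content": "मुझे अपने बिजनेस के लिए एक मार्केटिंग स्ट्रैटेजी चाहिए।"},  # "I need a marketing strategy for my business."
]

# Apply chat template
formatted_prompt = tokenizer.apply_chat_template(
    conversation,
    tokenize=False,
    add_generation_prompt=True
)

# Generate response (do_sample=True is required for temperature to take effect)
inputs = tokenizer(formatted_prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=512, do_sample=True, temperature=0.8)
# Decode only the newly generated tokens, skipping the echoed prompt
new_tokens = outputs[0][inputs["input_ids"].shape[-1]:]
response = tokenizer.decode(new_tokens, skip_special_tokens=True)
print(response)
```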
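It can help to see roughly what `apply_chat_template` produces. Qwen 2.5 tokenizers use the ChatML format, which wraps each turn in `<|im_start|>` and `<|im_end|>` markers. This card does not say whether the fine-tuned tokenizer changes that template, so treat the following as an approximation. The hypothetical `render_chatml` helper renders ChatML by hand. The real template may add more, for example Qwen's default system message when none is supplied, so print `formatted_prompt` to confirm.

```python
from typing import Dict, List


def render_chatml(messages: List[Dict[str, str]], add_generation_prompt: bool = True) -> str:
    """Simplified ChatML rendering; the real Qwen template may also insert a default system message."""
    parts = [f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n" for m in messages]
    if add_generation_prompt:
        parts.append("<|im_start|>assistant\n")  # cues the model to write the assistant turn
    return "".join(parts)


prompt = render_chatml([{"role": "user", "content": "नमस्ते"}])
print(prompt)
```

The trailing `<|im_start|>assistant` line is what `add_generation_prompt=True` adds. Without it, the model may continue the user's turn instead of answering.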
### Integration with Popular Frameworks

```python
# Using with LangChain for Indian applications
# Requires the `langchain-huggingface` package (pip install langchain-huggingface)
from langchain_huggingface import HuggingFacePipeline
from transformers import pipeline

# Create pipeline
pipe = pipeline(
    "text-generation",
    model="anktechsol/anki-qwen-2.5",
    tokenizer="anktechsol/anki-qwen-2.5",
    max_new_tokens=512
)

# Wrap with LangChain
llm = HuggingFacePipeline(pipeline=pipe)

# Use in your Indian language applications (.invoke replaces the deprecated direct call)
response = llm.invoke("Explain GST rules in Hindi")
print(response)
```
## 🤝 Community & Contributions

### 📢 Call to Action

We invite the Indian AI community to:

- **🔬 Experiment**: Try the model with your specific use cases and share results
- **📝 Feedback**: Report performance insights, especially for regional languages
- **🌍 Language Expansion**: Help us improve coverage for underrepresented Indian languages
- **🤝 Collaborate**: Contribute training data, evaluation benchmarks, or model improvements
- **📚 Research**: Use this model as a foundation for Indian language research

### 💬 Community Channels

- **Discussions**: Use the Community tab above for questions and suggestions
- **Issues**: Report bugs or request features in our repository
- **Research**: Cite this model in your academic work and share findings

### 🎯 Specific Areas Seeking Community Input

- **Regional Dialects**: Help improve understanding of local variations
- **Domain Expertise**: Contribute specialized knowledge (legal, medical, technical)
- **Evaluation Metrics**: Develop Indian language-specific benchmarks
- **Cultural Nuances**: Enhance cultural context understanding
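For contributors building Indian-language benchmarks: perplexity, the metric reported in this card's metadata, is the exponential of the average negative log-likelihood per token, so lower is better. Below is a minimal sketch computed from per-token log-probabilities (natural log). How those log-probabilities are obtained, and over which evaluation text, is up to the benchmark. Perplexity values are only directly comparable between models that share a tokenizer.

```python
import math
from typing import Sequence


def perplexity(token_logprobs: Sequence[float]) -> float:
    """exp(-mean(log p)) over the scored tokens; expects natural-log probabilities."""
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    mean_nll = -sum(token_logprobs) / len(token_logprobs)
    return math.exp(mean_nll)


# Every token predicted with probability 0.5 gives a perplexity of 2
print(perplexity([math.log(0.5)] * 4))  # ≈ 2.0
```

With `transformers`, passing `labels=input_ids` to a causal LM returns the mean token-level cross-entropy as `loss`, so `exp(loss)` gives the same quantity for a single sequence.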
## 🙏 Acknowledgments

### 📊 Datasets & Resources

- **AI4Bharat**: For the comprehensive Indic language corpus
- **IndicNLP**: For Hindi language resources and benchmarks
- **CDAC**: For language technology tools and resources
- **IIT Madras**: For Tamil language processing contributions
- **ISI Kolkata**: For Bengali language datasets

### 🤝 Contributors & Community

- **Anktechsol Team**: Core development and fine-tuning
- **Indian AI Research Community**: Feedback and validation
- **Open Source Contributors**: Bug fixes and improvements
- **Beta Testers**: Early adopters who provided crucial feedback

### 🏢 Institutional Support

- **Qwen Team**: For the excellent base model architecture
- **Hugging Face**: For model hosting and distribution platform
- **Indian Language Technology Consortium**: For linguistic resources

### 📖 Citation

If you use this model in your research or applications, please cite:

```bibtex
@misc{anki-qwen-2.5,
  title={Anki Qwen 2.5: An Indian Market-Centric Large Language Model},
  author={Anktechsol},
  year={2025},
  publisher={Hugging Face},
  howpublished={\url{https://huggingface.co/anktechsol/anki-qwen-2.5}},
}
```
---

<div align="center">
  <b>🚀 Ready to explore AI in Indian languages? Start using Anki Qwen 2.5 today!</b>
  <br>
  <i>Made with ❤️ for the Indian AI community</i>
</div>

## 📋 Model Information

| Attribute | Value |
|-----------|-------|
| Model Size | 494M parameters |
| Base Model | Qwen 2.5 |
| Languages | 12+ Indian languages + English |
| License | MIT |
| Context Length | 8K tokens |
| Precision | F32 |
| Training Data | Indian-centric multilingual corpus |
| Use Cases | Conversational AI, Content Generation, Market Analysis |

---

*For technical support, feature requests, or collaborations, please reach out through the Community discussions or contact anktechsol directly.*