Anki Qwen 2.5 - Indian Market-Centric LLM
Model Overview
Anki Qwen 2.5 is a large language model built for the Indian market and ecosystem. It is fine-tuned from Qwen 2.5 to understand local languages, cultural contexts, and use cases common across India.
The model aims to bridge global AI capabilities and local Indian needs, with a focus on:
- Indic Language Understanding: Deep comprehension of Hindi, Bengali, Tamil, Telugu, Urdu, Gujarati, Kannada, Malayalam, Punjabi, Odia, Assamese, and Marathi
- Cultural Context Awareness: Understanding of Indian customs, festivals, traditions, and social dynamics
- Market-Specific Applications: Tailored for Indian business scenarios, educational contexts, and daily life interactions
Key Features
Indic Language Excellence
- Multi-script Support: Handles Devanagari, Bengali, Tamil, Telugu, Gujarati, Perso-Arabic (used for Urdu), and other Indian scripts
- Code-mixing Capability: Seamlessly processes Hinglish and other Indian English variants
- Regional Dialects: Understanding of regional variations and colloquialisms
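When testing multi-script and code-mixed inputs, it can help to check which scripts a prompt actually contains. The following is a minimal sketch using only the Python standard library; it is not part of the model. The helper names `script_counts` and `is_code_mixed` are hypothetical, and the approach only sees script changes, so fully romanized Hinglish (Hindi written in Latin letters) will look like a single script.

```python
import unicodedata
from collections import Counter

# Prefixes of Unicode character names for the scripts named in this card.
# "ARABIC" covers the Perso-Arabic script used for Urdu; Odia characters
# are named "ORIYA ..." in Unicode, and Punjabi is written in Gurmukhi.
SCRIPT_PREFIXES = (
    "DEVANAGARI", "BENGALI", "TAMIL", "TELUGU", "GUJARATI", "KANNADA",
    "MALAYALAM", "GURMUKHI", "ORIYA", "ARABIC", "LATIN",
)


def script_counts(text: str) -> Counter:
    """Count letters and combining marks per script, using Unicode names."""
    counts = Counter()
    for ch in text:
        if unicodedata.category(ch)[0] not in ("L", "M"):
            continue  # skip digits, whitespace, and punctuation
        name = unicodedata.name(ch, "")
        script = next((p for p in SCRIPT_PREFIXES if name.startswith(p)), "OTHER")
        counts[script] += 1
    return counts


def is_code_mixed(text: str) -> bool:
    """Return True when the text contains letters from more than one script."""
    return len(script_counts(text)) > 1
```

A check like this could be used to bucket evaluation prompts by script before comparing model outputs across languages.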
Advanced Conversational Ability
- Contextual Conversations: Maintains context across long dialogues in multiple languages
- Cultural Sensitivity: Responds appropriately to Indian cultural references and contexts
- Formal & Informal Registers: Adapts tone based on conversation requirements
Market Specificity
- Indian Business Context: Understanding of Indian market dynamics, regulations, and practices
- Educational Alignment: Aligned with Indian educational curricula and learning patterns
- Rural-Urban Bridge: Capable of addressing both urban and rural use cases effectively
Technical Details
Architecture
- Base Model: Qwen 2.5 (0.5B parameters)
- Fine-tuning: Specialized training on Indian datasets
- Model Size: 494M parameters
- Precision: F32 tensor type
- Context Length: Up to 8K tokens
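The parameter count and tensor type above give a rough idea of how much memory the weights need. The sketch below is simple arithmetic, not a measurement: it covers the weights only (no activations or KV cache), and the non-F32 entries are included for comparison only, since this card states F32.

```python
# Rough weight-memory estimate from the figures in this card (494M parameters).
PARAMS = 494_000_000

# Bytes per parameter for common tensor types; only float32 is stated by this card.
BYTES_PER_PARAM = {"float32": 4, "float16": 2, "bfloat16": 2, "int8": 1}


def weight_memory_gib(num_params: int, dtype: str) -> float:
    """Return the approximate size of the weights alone, in GiB."""
    return num_params * BYTES_PER_PARAM[dtype] / 1024**3


if __name__ == "__main__":
    for dtype in BYTES_PER_PARAM:
        print(f"{dtype:>8}: {weight_memory_gib(PARAMS, dtype):.2f} GiB")
```

At F32 this works out to a little under 2 GiB for the weights, so the model should fit on most consumer GPUs and can also run on CPU.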
Training Data
- Indic Corpus: Comprehensive collection from AI4Bharat
- Hindi Literature: Classical and contemporary Hindi texts
- Multilingual Datasets: Balanced representation across 12+ Indian languages
- Domain-Specific Data: Business, education, healthcare, and government domains
- Cultural Content: Festivals, traditions, mythology, and historical references
Licensing
- Weights: Open weights under MIT License
- Commercial Use: Permitted with attribution
- Research Use: Fully open for academic and research purposes
Use Cases
Hindi/Indian Language Content Creation
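# Generate Hindi poetry or stories (reuses `model` and `tokenizer` from Quick Start)
# Prompt: "Write a beautiful poem in Hindi about Holi"
prompt = "हिंदी में एक सुंदर कविता लिखें होली के बारे में"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_length=200)
response = tokenizer.decode(outputs[0], skip_special_tokens=True)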
Market Analysis & Business Intelligence
- Indian market trend analysis
- Customer sentiment analysis in local languages
- Regional business strategy recommendations
- Compliance and regulatory guidance
Rural Technology Enablement
- Agricultural advisory in local languages
- Government scheme explanations
- Digital literacy support
- Local language interfaces for apps
Educational Support
- Multilingual tutoring assistance
- Curriculum-aligned content generation
- Language learning support
- Cultural education resources
Enterprise Applications
- Customer support in regional languages
- Document translation and summarization
- Indian law and regulation interpretation
- HR and recruitment assistance
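Several of the use cases above (sentiment analysis, summarization, translation) can be expressed as chat messages and passed to `tokenizer.apply_chat_template`. The following is a sketch under stated assumptions: the task names, the instruction wording, and the `build_messages` helper are all hypothetical, and it assumes the model's chat template accepts a `system` role, as the Qwen 2.5 template does.

```python
from typing import Dict, List

# Hypothetical task instructions; the wording is illustrative and does not
# come from the model card or its training data.
TASK_INSTRUCTIONS = {
    "sentiment": "Classify the sentiment of the customer message as positive, negative, or neutral.",
    "summarize": "Summarize the document in three sentences.",
    "translate": "Translate the text into {language}.",
}


def build_messages(task: str, text: str, language: str = "Hindi") -> List[Dict[str, str]]:
    """Return a system + user message pair suitable for a chat template."""
    if task not in TASK_INSTRUCTIONS:
        raise KeyError(f"unknown task: {task}")
    instruction = TASK_INSTRUCTIONS[task].format(language=language)
    return [
        {"role": "system", "content": f"{instruction} Respond in {language}."},
        {"role": "user", "content": text},
    ]
```

The returned list has the same shape as the `conversation` variable in the Advanced Usage section, so it can be used in its place.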
How to Use
Quick Start
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Load the model and tokenizer
model_name = "anktechsol/anki-qwen-2.5"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.float32,
    device_map="auto"
)

# Generate text in Hindi (prompt: "The future of AI in India")
prompt = "भारत में AI का भविष्य"
# Tokenize with an attention mask and move the tensors to the model's device
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_length=100,
        temperature=0.7,
        do_sample=True,
        pad_token_id=tokenizer.eos_token_id
    )
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)
Advanced Usage
# Multi-language conversation
# User message: "I need a marketing strategy for my business."
conversation = [
    {"role": "user", "content": "मुझे अपने बिजनेस के लिए एक मार्केटिंग स्ट्रैटेजी चाहिए।"},
]

# Apply chat template
formatted_prompt = tokenizer.apply_chat_template(
    conversation,
    tokenize=False,
    add_generation_prompt=True
)

# Generate response (temperature only applies when sampling is enabled)
inputs = tokenizer(formatted_prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_length=512, temperature=0.8, do_sample=True)

# Decode only the newly generated tokens, not the echoed prompt
prompt_length = inputs["input_ids"].shape[1]
response = tokenizer.decode(outputs[0][prompt_length:], skip_special_tokens=True)
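For longer dialogues, the conversation list has to stay within the context length listed in this card (8K tokens). One possible approach, sketched below, is to drop the oldest turns until the history fits a token budget. The `trim_history` helper is hypothetical, and the token counter is passed in as a callable so the sketch runs without loading the model; the chat template itself adds a few tokens per message, so leave some headroom in the budget.

```python
from typing import Callable, Dict, List

Message = Dict[str, str]


def trim_history(
    messages: List[Message],
    count_tokens: Callable[[str], int],
    max_tokens: int,
) -> List[Message]:
    """Drop the oldest non-system turns until the history fits in max_tokens."""
    system = [m for m in messages if m["role"] == "system"]
    turns = [m for m in messages if m["role"] != "system"]

    def total(msgs: List[Message]) -> int:
        return sum(count_tokens(m["content"]) for m in msgs)

    # Always keep the most recent turn, even if it alone exceeds the budget.
    while len(turns) > 1 and total(system + turns) > max_tokens:
        turns.pop(0)
    return system + turns
```

With the tokenizer from Quick Start, the counter could be `lambda s: len(tokenizer.encode(s))`, and the budget could be the context length minus the number of tokens reserved for the reply.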
Integration with Popular Frameworks
# Using with LangChain for Indian applications
# Requires the langchain-huggingface package; older LangChain releases exposed
# this class as langchain.llms.huggingface_pipeline.HuggingFacePipeline
from langchain_huggingface import HuggingFacePipeline
from transformers import pipeline

# Create pipeline
pipe = pipeline(
    "text-generation",
    model="anktechsol/anki-qwen-2.5",
    tokenizer="anktechsol/anki-qwen-2.5",
    max_length=512
)

# Wrap with LangChain
llm = HuggingFacePipeline(pipeline=pipe)

# Use in your Indian language applications
response = llm.invoke("Explain GST rules in Hindi")
Community & Contributions
Call to Action
We invite the Indian AI community to:
- Experiment: Try the model with your specific use cases and share results
- Feedback: Report performance insights, especially for regional languages
- Language Expansion: Help us improve coverage for underrepresented Indian languages
- Collaborate: Contribute training data, evaluation benchmarks, or model improvements
- Research: Use this model as a foundation for Indian language research
Community Channels
- Discussions: Use the Community tab above for questions and suggestions
- Issues: Report bugs or request features in our repository
- Research: Cite this model in your academic work and share findings
Specific Areas Seeking Community Input
- Regional Dialects: Help improve understanding of local variations
- Domain Expertise: Contribute specialized knowledge (legal, medical, technical)
- Evaluation Metrics: Develop Indian language-specific benchmarks
- Cultural Nuances: Enhance cultural context understanding
Acknowledgments
Datasets & Resources
- AI4Bharat: For the comprehensive Indic language corpus
- IndicNLP: For Hindi language resources and benchmarks
- CDAC: For language technology tools and resources
- IIT Madras: For Tamil language processing contributions
- ISI Kolkata: For Bengali language datasets
Contributors & Community
- Anktechsol Team: Core development and fine-tuning
- Indian AI Research Community: Feedback and validation
- Open Source Contributors: Bug fixes and improvements
- Beta Testers: Early adopters who provided crucial feedback
Institutional Support
- Qwen Team: For the excellent base model architecture
- Hugging Face: For model hosting and distribution platform
- Indian Language Technology Consortium: For linguistic resources
Citation
If you use this model in your research or applications, please cite:
@misc{anki-qwen-2.5,
  title={Anki Qwen 2.5: An Indian Market-Centric Large Language Model},
  author={Anktechsol},
  year={2025},
  publisher={Hugging Face},
  howpublished={\url{https://huggingface.co/anktechsol/anki-qwen-2.5}},
}
Ready to explore AI in Indian languages? Start using Anki Qwen 2.5 today!
Made with ❤️ for the Indian AI community
Model Information
| Attribute | Value |
|---|---|
| Model Size | 494M parameters |
| Base Model | Qwen 2.5 |
| Languages | 12+ Indian languages + English |
| License | MIT |
| Context Length | 8K tokens |
| Precision | F32 |
| Training Data | Indian-centric multilingual corpus |
| Use Cases | Conversational AI, Content Generation, Market Analysis |
For technical support, feature requests, or collaborations, please reach out through the Community discussions or contact anktechsol directly.
Evaluation results
- Perplexity on Indian Language Evaluation (self-reported): 12.500
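This card does not describe how the perplexity figure was computed or on which dataset. For reference, the standard definition is the exponential of the mean negative log-likelihood per token, which can be sketched as follows. The `perplexity` helper is hypothetical and takes per-token natural-log probabilities as plain floats, so it runs without the model.

```python
import math
from typing import Sequence


def perplexity(token_logprobs: Sequence[float]) -> float:
    """Perplexity = exp(-mean natural-log probability per token)."""
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    mean_nll = -sum(token_logprobs) / len(token_logprobs)
    return math.exp(mean_nll)
```

With `transformers`, calling a causal language model with `labels=input_ids` returns the mean negative log-likelihood as `loss`, so `math.exp(loss.item())` gives the same quantity for a single sequence. Results are only comparable when the evaluation text and tokenizer are the same.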