---
library_name: transformers
license: apache-2.0
datasets:
- ai4bharat/naamapadam
language:
- bn
base_model:
- openai-community/gpt2
---

# Model Card for AddaGPT 2.0

AddaGPT 2.0 is a Bengali language model based on GPT-2, fine-tuned using LoRA adapters for academic and low-resource applications. While GPT-2 was originally trained only on English data, this model has been adapted to Bengali using the AI4Bharat NaamaPadam dataset, a corpus focused on Named Entity Recognition (NER).

This project is intended as a proof of concept to explore how small, pretrained models like GPT-2 can be extended to Indic languages using low-rank adaptation (LoRA) techniques, even under limited compute settings (e.g., free Kaggle GPUs). It lays the foundation for future work in adapting language models for low-bandwidth, regional, and offline-first use cases that support local communities.

## Model Details

| **Attribute**                      | **Description**                                                                                                          |
| ---------------------------------- | ------------------------------------------------------------------------------------------------------------------------ |
| **Base Model**                     | GPT-2 (117M parameters)                                                                                                  |
| **Fine-tuned Using**               | [LoRA (Low-Rank Adaptation)](https://arxiv.org/abs/2106.09685)                                                           |
| **Language**                       | Bengali (`bn`)                                                                                                           |
| **Training Dataset**               | [`ai4bharat/naamapadam`](https://huggingface.co/datasets/ai4bharat/naamapadam) – Bengali NER corpus (train split only) |
| **Sentences Seen During Training** | ~9.6 million Bengali sentences                                                                                           |
| **Training Platform**              | Kaggle (free T4 GPUs)                                                                                                    |
| **Frameworks**                     | 🤗 Transformers + PEFT (Parameter-Efficient Fine-Tuning) + Safetensors                                                   |
| **Trainable Parameters**           | 294,912                                                                                                                  |
| **Total Parameters**               | 124,734,720                                                                                                              |
| **Percentage Fine-Tuned**          | 0.2364%                                                                                                                  |
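The exact training hyperparameters are not published with this card, but the counts above are consistent with rank-8 LoRA adapters on GPT-2's fused attention projection (`c_attn`) across all 12 layers: 12 × 8 × (768 + 2304) = 294,912 trainable parameters, i.e., 0.2364% of the 124,734,720 total. The snippet below is a minimal sketch of one such setup; `r` and `target_modules` are chosen to reproduce the reported counts, while `lora_alpha` and `lora_dropout` are illustrative assumptions.

```python
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

# Start from the English GPT-2 base model
base_model = AutoModelForCausalLM.from_pretrained("openai-community/gpt2")

# Rank-8 adapters on the fused attention projection (c_attn).
# Per layer: 8 * (768 + 2304) = 24,576 LoRA weights; over 12 layers -> 294,912.
lora_config = LoraConfig(
    r=8,                        # rank inferred from the reported parameter count
    lora_alpha=16,              # assumed scaling factor
    target_modules=["c_attn"],
    lora_dropout=0.05,          # assumed
    bias="none",
    task_type="CAUSAL_LM",
)

model = get_peft_model(base_model, lora_config)
model.print_trainable_parameters()
# reports 294,912 trainable parameters out of 124,734,720 total (0.2364%)
```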
### Model Description

- **Developed by:** Swastik Guha Roy
- **Funded by:** Self-funded

### Uses

AddaGPT 2.0 is an academic proof-of-concept project designed to explore how low-resource, low-compute setups (like Kaggle T4 GPUs) can be used to adapt pretrained language models such as GPT-2 for Indic languages, specifically Bengali.

### Intended Use Cases

- Academic research on low-rank adaptation (LoRA) for regional languages
- Language modeling experimentation in Bengali
- Demonstration of fine-tuning techniques in resource-constrained environments
- Baseline comparison for future Bengali language model development
- Educational purposes for students and ML enthusiasts working on low-resource NLP

### Intended Users

- ML/NLP researchers exploring parameter-efficient tuning
- Students building regional language models
- Developers prototyping Bengali language tools (with limitations)
- Community contributors interested in advancing open-source Bengali AI

## Limitations

This model is not capable of generating grammatically or syntactically correct Bengali sentences. Instead, it outputs individual Bengali words or word-like tokens that are often meaningful on their own, a direct result of training on a NER-style dataset rather than on full natural language text.

- This version does not produce grammatically coherent Bengali sentences.
- Because it was trained on a NER dataset, it mostly outputs individual Bengali words.
- It is not yet suitable for downstream tasks such as summarization, translation, or question answering.

## How to Get Started with the Model

Load the necessary libraries:

```python
from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline
```

Load the model and tokenizer:

```python
model = AutoModelForCausalLM.from_pretrained("SwastikGuhaRoy/AddaGPT2.0")
tokenizer = AutoTokenizer.from_pretrained("SwastikGuhaRoy/AddaGPT2.0_tokenizer")
```

Initialize the generation pipeline:

```python
text_generator = pipeline("text-generation", model=model, tokenizer=tokenizer)
```

Run inference:

```python
prompt = "রবীন্দ্রনাথ ঠাকুর একজন"  # "Rabindranath Tagore is a ..."
output = text_generator(
    prompt,
    max_new_tokens=30,
    temperature=0.7,
    top_p=0.95,
    do_sample=True
)
print(output[0]["generated_text"])
```

## Evaluation

### Results

The model was evaluated on the validation split of the `ai4bharat/naamapadam` dataset to measure how well it models Bengali text.

**Metric: Perplexity (lower is better)**

| Model                   | Validation Perplexity |
| ----------------------- | --------------------- |
| **AddaGPT 2.0**         | **25.61**             |
| Vanilla GPT-2 (English) | 144.53                |

- AddaGPT 2.0 shows a significantly lower perplexity, indicating a better fit to Bengali text.
- Vanilla GPT-2 struggles with Bengali due to the lack of Bengali data during pretraining.

### Summary

Despite the lower perplexity, the model still generates mostly isolated Bengali words rather than grammatically complete sentences, due to the nature of the training dataset (a NER corpus). A minimal sketch for approximately reproducing the perplexity numbers is included at the end of this card.

## Citation

If you use this model, please cite:

```bibtex
@misc{addagpt2.0,
  author       = {Swastik Guha Roy},
  title        = {AddaGPT 2.0: Bengali Finetuned GPT-2 with LoRA},
  year         = 2025,
  howpublished = {\url{https://huggingface.co/SwastikGuhaRoy/AddaGPT2.0}},
}
```
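## Reproducing the Perplexity Evaluation

The snippet below is a minimal sketch of how the validation perplexity reported above can be approximated. It assumes the `bn` configuration of `ai4bharat/naamapadam` (with examples storing pre-tokenized words in a `tokens` field), the published checkpoints from the usage example, and a simple token-weighted average of the causal language modeling loss; the original evaluation setup may differ.

```python
import math

import torch
from datasets import load_dataset
from transformers import AutoTokenizer, AutoModelForCausalLM

# Assumed: the Bengali ("bn") configuration and its validation split
dataset = load_dataset("ai4bharat/naamapadam", "bn", split="validation")
tokenizer = AutoTokenizer.from_pretrained("SwastikGuhaRoy/AddaGPT2.0_tokenizer")
model = AutoModelForCausalLM.from_pretrained("SwastikGuhaRoy/AddaGPT2.0")
model.eval()

total_loss, total_tokens = 0.0, 0
for example in dataset.select(range(min(1000, len(dataset)))):  # subsample for a quick estimate
    # NER examples are word lists; join them back into rough running text
    text = " ".join(example["tokens"])
    inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)
    with torch.no_grad():
        outputs = model(**inputs, labels=inputs["input_ids"])
    n_tokens = inputs["input_ids"].size(1)
    total_loss += outputs.loss.item() * n_tokens
    total_tokens += n_tokens

print(f"Approximate validation perplexity: {math.exp(total_loss / total_tokens):.2f}")
```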