Model Card for AddaGPT 2.0

AddaGPT 2.0 is a Bengali language model based on GPT-2, fine-tuned using LoRA adapters for academic and low-resource applications. While GPT-2 was originally trained only on English data, this model has been adapted to Bengali using the AI4Bharat NaamaPadam dataset, a corpus focused on Named Entity Recognition (NER). This project is intended as a proof of concept exploring how small pretrained models like GPT-2 can be extended to Indic languages with low-rank adaptation (LoRA), even under limited compute settings (e.g., free Kaggle GPUs). It lays the foundation for future work on adapting language models for low-bandwidth, regional, and offline-first use cases that support local communities.

Model Details

| Attribute | Description |
|---|---|
| Base Model | GPT-2 (117M parameters) |
| Fine-tuned Using | LoRA (Low-Rank Adaptation) |
| Language | Bengali (bn) |
| Training Dataset | ai4bharat/naamapadam – Bengali NER corpus (train split only) |
| Sentences Seen During Training | ~9.6 million Bengali sentences |
| Training Platform | Kaggle (free T4 GPUs) |
| Frameworks | 🤗 Transformers + PEFT (Parameter-Efficient Fine-Tuning) + Safetensors |
| Trainable Parameters | 294,912 |
| Total Parameters | 124,734,720 |
| Percentage Fine-Tuned | 0.2364% |
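
The exact LoRA configuration is not published on this card. The sketch below is a hypothetical reconstruction using 🤗 PEFT: a rank-8 adapter on GPT-2's fused attention projection (c_attn) is an assumption, chosen because it reproduces the reported 294,912 trainable parameters; the alpha and dropout values are placeholders.

    from peft import LoraConfig, TaskType, get_peft_model
    from transformers import AutoModelForCausalLM

    base = AutoModelForCausalLM.from_pretrained("gpt2")

    # Assumed settings: rank-8 adapters on the attention projection only.
    # GPT-2 has 12 layers and c_attn maps 768 -> 2304, so each adapter adds
    # 8 * (768 + 2304) = 24,576 parameters; 12 layers give 294,912 in total,
    # matching the figure in the table above. Alpha and dropout do not affect
    # the parameter count and are placeholders here.
    lora_cfg = LoraConfig(
        r=8,
        lora_alpha=16,
        lora_dropout=0.05,
        target_modules=["c_attn"],
        task_type=TaskType.CAUSAL_LM,
    )

    model = get_peft_model(base, lora_cfg)
    model.print_trainable_parameters()
    # trainable params: 294,912 || all params: 124,734,720 || trainable%: 0.2364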

Model Description

  • Developed by: Swastik Guha Roy
  • Funded by: Self-funded

Uses

AddaGPT 2.0 is an academic proof-of-concept project designed to explore how low-resource, low-compute setups (like Kaggle T4 GPUs) can be used to adapt large language models like GPT-2 for Indic languages, specifically Bengali.

Intended Use Cases:

  • Academic research on low-rank adaptation (LoRA) for regional languages
  • Language modeling experimentation in Bengali
  • Demonstration of fine-tuning techniques in resource-constrained environments
  • Baseline comparison for future Bengali language model development
  • Educational purposes for students and ML enthusiasts working on low-resource NLP

Intended Users:

  • ML/NLP researchers exploring parameter-efficient tuning
  • Students building regional language models
  • Developers prototyping Bengali language tools (with limitations)
  • Community contributors interested in advancing open-source Bengali AI

Limitations

This model is not capable of generating grammatically or syntactically correct Bengali sentences. Instead, it outputs individual Bengali words or word-like tokens that are often meaningful on their own — a direct result of training on a NER-style dataset rather than full natural language text.

  • This version does not produce grammatically coherent Bengali sentences.
  • It is trained on a NER dataset, so it mostly outputs individual Bengali words.
  • It is not yet suitable for downstream tasks such as summarization, translation, or question answering.

How to Get Started with the Model

Load the necessary libraries

    from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline

Load the model and tokenizer

    model = AutoModelForCausalLM.from_pretrained("SwastikGuhaRoy/AddaGPT2.0")
    tokenizer = AutoTokenizer.from_pretrained("SwastikGuhaRoy/AddaGPT2.0_tokenizer")

Initialize generation pipeline

    text_generator = pipeline("text-generation", model=model, tokenizer=tokenizer)

Run inference

    prompt = "রবীন্দ্রনাথ ঠাকুর একজন"  # Bengali for "Rabindranath Tagore is a ..."
    output = text_generator(
        prompt,
        max_new_tokens=30,
        temperature=0.7,
        top_p=0.95,
        do_sample=True
    )

    print(output[0]["generated_text"])

Evaluation

Results

The model was evaluated on the validation split of the ai4bharat/naamapadam dataset to measure how well it models Bengali text.

Metric: Perplexity (Lower is Better)

| Model | Validation Perplexity |
|---|---|
| AddaGPT 2.0 | 25.61 |
| Vanilla GPT-2 (English) | 144.53 |

AddaGPT 2.0 shows a significantly lower perplexity, indicating a better fit to Bengali text.

GPT-2 struggles with Bengali due to the lack of Bengali data during pretraining.
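
The evaluation script is not included on this card; the following is a minimal sketch of how the validation perplexity could be reproduced. The bn config name and the tokens column are assumptions based on the dataset's standard NER layout, and the subsample size is arbitrary.

    import math
    import torch
    from datasets import load_dataset
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model = AutoModelForCausalLM.from_pretrained("SwastikGuhaRoy/AddaGPT2.0")
    tokenizer = AutoTokenizer.from_pretrained("SwastikGuhaRoy/AddaGPT2.0_tokenizer")
    model.eval()

    # naamapadam stores each sentence as a list of word tokens (NER format),
    # so the words are joined back into plain text before scoring.
    val = load_dataset("ai4bharat/naamapadam", "bn", split="validation")

    total_loss, total_tokens = 0.0, 0
    with torch.no_grad():
        for example in val.select(range(1000)):  # subsample for speed
            text = " ".join(example["tokens"])
            enc = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)
            if enc["input_ids"].size(1) < 2:
                continue
            out = model(**enc, labels=enc["input_ids"])
            n = enc["input_ids"].size(1)
            total_loss += out.loss.item() * n
            total_tokens += n

    print("perplexity:", math.exp(total_loss / total_tokens))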

Summary

Despite the lower perplexity, the model still generates mostly isolated Bengali words rather than grammatically complete sentences, a direct consequence of training on a NER corpus instead of continuous natural-language text.

Citation

If you use this model, please cite:

    @misc{addagpt2.0,
      author = {Swastik Guha Roy},
      title = {AddaGPT 2.0: Bengali Finetuned GPT-2 with LoRA},
      year = {2025},
      howpublished = {\url{https://huggingface.co/SwastikGuhaRoy/AddaGPT2.0}},
    }