# Stack Overflow Tag Recommender
## Model Description
This model is a fine-tuned version of distilbert-base-uncased for multi-label text classification on Stack Overflow posts. It predicts relevant tags for programming-related posts based on their title and content.
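The card does not document the exact input format used during training; a common convention (an assumption here, not confirmed by this card) is to concatenate the title and body into a single string before tokenization:

```python
# Assumption: title and body are joined with a newline; the actual
# separator used during training is not documented in this card.
title = "How do I connect to a MySQL database using Python?"
body = "I have a local MySQL server and want to query it from a script."
text = f"{title}\n{body}"
```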
## Model Details
- Model Type: Multi-label Text Classification
- Base Model: distilbert-base-uncased
- Language: English
- Number of Labels: 20
- Framework: PyTorch + Transformers
- License: Apache 2.0
## Performance
| Metric | Value |
|---|---|
| Micro F1 | 0.583 |
| Macro F1 | 0.590 |
| Subset Accuracy | 0.165 |
| Hamming Loss | 0.092 |
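These are standard multi-label metrics. A minimal sketch of how they can be reproduced with scikit-learn, assuming `y_true` and `y_pred` are binary indicator matrices of shape `(n_samples, 20)` (the toy arrays below use 3 tags for brevity):

```python
import numpy as np
from sklearn.metrics import f1_score, accuracy_score, hamming_loss

# One row per post, one column per tag; 1 marks a tag that applies.
y_true = np.array([[1, 0, 1], [0, 1, 0]])
y_pred = np.array([[1, 0, 0], [0, 1, 0]])

print("Micro F1:", f1_score(y_true, y_pred, average="micro"))
print("Macro F1:", f1_score(y_true, y_pred, average="macro"))
print("Subset Accuracy:", accuracy_score(y_true, y_pred))  # exact-match ratio
print("Hamming Loss:", hamming_loss(y_true, y_pred))
```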
## Available Tags
The model can predict the following 20 tags:

`c#`, `java`, `javascript`, `jquery`, `ios`, `.net`, `php`, `html`, `c++`, `iphone`, `android`, `objective-c`, `asp.net`, `python`, `sql`, `mysql`, `css`, `ajax`, `c`, `database`
## Usage

### Quick Start
```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

# Load model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("bonjourusman/tag-recommender")
model = AutoModelForSequenceClassification.from_pretrained("bonjourusman/tag-recommender")
model.eval()

# Example prediction (max_length matches the 330-token training limit)
text = "How do I connect to a MySQL database using Python?"
inputs = tokenizer(text, return_tensors="pt", max_length=330, truncation=True)

# Multi-label classification: apply a sigmoid to each logit independently
with torch.no_grad():
    outputs = model(**inputs)
predictions = torch.sigmoid(outputs.logits)

# Keep tags whose probability exceeds the threshold
threshold = 0.5
predicted_indices = (predictions > threshold).nonzero(as_tuple=True)[1]
print(f"Predicted tag indices: {predicted_indices.tolist()}")
```
## Training Details

### Training Configuration
- Epochs: 50
- Batch Size: 8
- Learning Rate: 5e-06
- Max Sequence Length: 330
- Optimizer: AdamW
- Loss Function: BCEWithLogitsLoss
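The training script itself is not included in this card. Below is a minimal sketch of a single training step consistent with the configuration above; dataset loading, batching, and the epoch loop are omitted, and `problem_type="multi_label_classification"` is an assumption about how the classification head was configured:

```python
import torch
from torch.optim import AdamW
from transformers import AutoModelForSequenceClassification

model = AutoModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased",
    num_labels=20,
    problem_type="multi_label_classification",
)
optimizer = AdamW(model.parameters(), lr=5e-6)
loss_fn = torch.nn.BCEWithLogitsLoss()

def training_step(batch):
    # batch["labels"]: float tensor of shape (batch_size, 20),
    # 1.0 for each tag present on the post, 0.0 otherwise.
    outputs = model(input_ids=batch["input_ids"],
                    attention_mask=batch["attention_mask"])
    loss = loss_fn(outputs.logits, batch["labels"])
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    return loss.item()
```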
### Training Infrastructure
- Hardware: GPU (Google Colab)
- Framework: PyTorch + HuggingFace Transformers
## Limitations
- Domain Specificity: Trained only on Stack Overflow data; performance on other text domains is untested
- Language: English only
- Tag Coverage: Limited to the 20 most frequent tags in the training data
- Context Length: Inputs longer than 330 tokens are truncated (see the check below)
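If a post may exceed the 330-token limit, you can check before predicting; a small sketch using the tokenizer loaded in the Quick Start:

```python
# Tokens beyond position 330 are discarded by truncation above.
n_tokens = len(tokenizer(text)["input_ids"])
if n_tokens > 330:
    print(f"Post is {n_tokens} tokens; only the first 330 will be used.")
```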
## Citation
```bibtex
@misc{tag-recommender,
  title={Stack Overflow Tag Recommendation using DistilBERT},
  year={2025},
  howpublished={HuggingFace Model Repository},
  url={https://huggingface.co/bonjourusman/tag-recommender}
}
```
Generated on: 2025-06-02 04:43:45