Stack Overflow Tag Recommender

Model Description

This model is a fine-tuned version of distilbert-base-uncased for multi-label text classification on Stack Overflow posts. Given the title and body of a programming-related post, it predicts which of 20 tags apply.

Model Details

  • Model Type: Multi-label Text Classification
  • Base Model: distilbert-base-uncased
  • Language: English
  • Number of Labels: 20
  • Framework: PyTorch + Transformers
  • License: Apache 2.0

Performance

Metric            Value
Micro F1          0.583
Macro F1          0.590
Subset Accuracy   0.165
Hamming Loss      0.092
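
For readers unfamiliar with these four metrics, the following is a minimal pure-Python sketch of how each is commonly defined for multi-label predictions. It is not the evaluation script used for this model. The toy `y_true` and `y_pred` rows are invented for illustration, and the convention that a label with no true or predicted positives scores an F1 of 0 is an assumption.

```python
def multilabel_metrics(y_true, y_pred):
    """Compute common multi-label metrics from rows of 0/1 indicators."""
    n_samples = len(y_true)
    n_labels = len(y_true[0])

    def f1(tp, fp, fn):
        # ASSUMPTION: F1 is defined as 0 when there are no positives at all
        denom = 2 * tp + fp + fn
        return 2 * tp / denom if denom else 0.0

    # Per-label true positives, false positives, and false negatives
    counts = []
    for j in range(n_labels):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t[j] == 1 and p[j] == 1)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t[j] == 0 and p[j] == 1)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t[j] == 1 and p[j] == 0)
        counts.append((tp, fp, fn))

    # Micro F1 pools the counts over all labels; macro F1 averages per-label F1
    micro = f1(sum(c[0] for c in counts),
               sum(c[1] for c in counts),
               sum(c[2] for c in counts))
    macro = sum(f1(*c) for c in counts) / n_labels

    # Subset accuracy: fraction of samples whose full tag set is exactly right
    subset_acc = sum(1 for t, p in zip(y_true, y_pred) if t == p) / n_samples

    # Hamming loss: fraction of individual label decisions that are wrong
    wrong = sum(a != b for t, p in zip(y_true, y_pred) for a, b in zip(t, p))
    hamming = wrong / (n_samples * n_labels)

    return {
        "micro_f1": micro,
        "macro_f1": macro,
        "subset_accuracy": subset_acc,
        "hamming_loss": hamming,
    }


# Toy data (illustrative only): 3 samples, 3 labels
y_true = [[1, 0, 1], [0, 1, 0], [1, 1, 0]]
y_pred = [[1, 0, 0], [0, 1, 0], [1, 0, 0]]
metrics = multilabel_metrics(y_true, y_pred)
```

Subset accuracy only credits a sample when every one of its labels is correct, which is why it is typically much lower than the F1 scores in a multi-label setting.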

Available Tags

The model can predict the following 20 tags:

c#, java, javascript, jquery, ios, .net, php, html, c++, iphone, android, objective-c, asp.net, python, sql, mysql, css, ajax, c, database
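
The model outputs label indices rather than tag names, so a lookup table is needed to read its predictions. The sketch below assumes that output index i corresponds to the i-th tag in the order listed above. That ordering is an assumption and is not confirmed by this card; check `model.config.id2label` on the loaded model before relying on it. The helper name `indices_to_tags` is hypothetical.

```python
# ASSUMPTION: the model's label order matches the order the tags are listed in
# this card. Verify against model.config.id2label before relying on this.
TAGS = [
    "c#", "java", "javascript", "jquery", "ios", ".net", "php", "html",
    "c++", "iphone", "android", "objective-c", "asp.net", "python", "sql",
    "mysql", "css", "ajax", "c", "database",
]


def indices_to_tags(indices):
    """Translate predicted label indices into tag names (hypothetical helper)."""
    return [TAGS[i] for i in indices]


example = indices_to_tags([13, 15, 19])
```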

Usage

Quick Start

from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

# Load model and tokenizer from the Hugging Face Hub
model_id = "bonjourusman/tag-recommender"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)

# Example prediction (max_length matches the 330-token training sequence length)
text = "How do I connect to a MySQL database using Python?"
inputs = tokenizer(text, return_tensors="pt", max_length=330, truncation=True)

# Get predictions
with torch.no_grad():
    outputs = model(**inputs)
    predictions = torch.sigmoid(outputs.logits)
    
# Get predictions above threshold
threshold = 0.5
predicted_indices = (predictions > threshold).nonzero(as_tuple=True)[1]
print(f"Predicted tag indices: {predicted_indices.tolist()}")
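
With a fixed threshold of 0.5, some posts may receive no tags at all. One possible way to handle this, sketched below, is to fall back to the highest-scoring labels when nothing clears the threshold. The helper `select_tags` and its `fallback_k` parameter are hypothetical and not part of this model or of Transformers.

```python
def select_tags(probabilities, threshold=0.5, fallback_k=1):
    """Return indices above the threshold, or the top-k indices if none qualify.

    `probabilities` is a flat list of per-label sigmoid scores for one post.
    """
    above = [i for i, p in enumerate(probabilities) if p > threshold]
    if above:
        return above
    # Nothing cleared the threshold: rank labels by score and keep the best k
    ranked = sorted(range(len(probabilities)),
                    key=lambda i: probabilities[i],
                    reverse=True)
    return ranked[:fallback_k]


# Illustrative scores only, not real model output
confident = select_tags([0.1, 0.7, 0.6])
uncertain = select_tags([0.1, 0.4, 0.3])
```

With the Quick Start variables, this would be called as `select_tags(predictions[0].tolist())`, since `predictions` has one row per input text.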

Training Details

Training Configuration

  • Epochs: 50
  • Batch Size: 8
  • Learning Rate: 5e-06
  • Max Sequence Length: 330
  • Optimizer: AdamW
  • Loss Function: BCEWithLogitsLoss
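
To illustrate what the loss function listed above computes, here is a minimal pure-Python sketch of binary cross-entropy on raw logits, averaged over all label decisions. It uses the standard numerically stable form and mirrors the default mean reduction of `torch.nn.BCEWithLogitsLoss`. It is an illustration of the formula, not the training code for this model, and the function name `bce_with_logits` is hypothetical.

```python
import math


def bce_with_logits(logits, targets):
    """Mean binary cross-entropy over raw logits and 0/1 targets.

    Uses the stable form max(x, 0) - x*y + log(1 + exp(-|x|)), which equals
    -[y*log(sigmoid(x)) + (1-y)*log(1 - sigmoid(x))] without overflow.
    """
    total = 0.0
    for x, y in zip(logits, targets):
        total += max(x, 0.0) - x * y + math.log1p(math.exp(-abs(x)))
    return total / len(logits)


# Illustrative values: three label logits for one post and their 0/1 targets
loss = bce_with_logits([2.0, -1.0, 0.0], [1.0, 0.0, 1.0])
```

Each of the 20 labels is treated as an independent yes/no decision, which is what allows a post to receive several tags at once.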

Training Infrastructure

  • Hardware: GPU (Google Colab)
  • Framework: PyTorch + HuggingFace Transformers

Limitations

  • Domain Specificity: Trained only on Stack Overflow data, so results on other kinds of text may be less reliable
  • Language: English only
  • Tag Coverage: Limited to the 20 most frequent tags in the training data
  • Context Length: Maximum input length of 330 tokens; longer inputs must be truncated
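
One possible workaround for the context-length limit is to split a long post into overlapping windows, score each window separately, and keep the highest score per tag. The sketch below shows only the windowing and aggregation steps on plain lists. It is not something this model was trained or evaluated with. The helper names, the 64-token overlap, and the window size of 328 (leaving room for the two special tokens a BERT-style tokenizer adds) are all assumptions.

```python
def split_into_windows(token_ids, max_length=328, stride=64):
    """Split token ids into windows of at most max_length tokens.

    Consecutive windows overlap by `stride` tokens. ASSUMPTION: 328 leaves
    room for [CLS] and [SEP] within a 330-token limit.
    """
    if max_length <= stride:
        raise ValueError("max_length must be larger than stride")
    step = max_length - stride
    windows = []
    start = 0
    while True:
        windows.append(token_ids[start:start + max_length])
        if start + max_length >= len(token_ids):
            break
        start += step
    return windows


def aggregate_max(window_probs):
    """Combine per-window label scores by taking the per-label maximum."""
    return [max(column) for column in zip(*window_probs)]


# Illustrative input: a fake 700-token post represented by integer ids
windows = split_into_windows(list(range(700)))
combined = aggregate_max([[0.1, 0.9], [0.4, 0.2]])
```

Taking the maximum favors recall, since a tag is kept if any window supports it; averaging the window scores would be a more conservative alternative.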

Citation

@misc{tag-recommender,
  title={Stack Overflow Tag Recommendation using DistilBERT},
  year={2025},
  howpublished={HuggingFace Model Repository},
  url={https://huggingface.co/bonjourusman/tag-recommender}
}

Generated on: 2025-06-02 04:43:45

Model size: 67.3M params (Safetensors, tensor type F32)