# Stack Overflow Tag Recommender
## Model Description
This model is a fine-tuned version of distilbert-base-uncased for multi-label text classification on Stack Overflow posts. It predicts relevant tags for programming-related posts based on their title and content.
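The card does not document the exact input format used during training; a common convention (an assumption here, not confirmed by this card) is to concatenate the title and body into a single string before tokenization:

```python
# Assumption: title and body are joined with a newline; the actual
# separator used during training is not documented in this card.
title = "How do I connect to a MySQL database using Python?"
body = "I have a local MySQL server and want to query it from a script."
text = f"{title}\n{body}"
```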
## Model Details
- Model Type: Multi-label Text Classification
- Base Model: distilbert-base-uncased
- Language: English
- Number of Labels: 20
- Framework: PyTorch + Transformers
- License: Apache 2.0
## Performance
| Metric | Value |
|---|---|
| Micro F1 | 0.583 |
| Macro F1 | 0.590 |
| Subset Accuracy | 0.165 |
| Hamming Loss | 0.092 |
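These are standard multi-label metrics. A minimal sketch of how they can be reproduced with scikit-learn, assuming `y_true` and `y_pred` are binary indicator matrices of shape `(n_samples, 20)` (the toy arrays below use 3 tags for brevity):

```python
import numpy as np
from sklearn.metrics import f1_score, accuracy_score, hamming_loss

# One row per post, one column per tag; 1 marks a tag that applies.
y_true = np.array([[1, 0, 1], [0, 1, 0]])
y_pred = np.array([[1, 0, 0], [0, 1, 0]])

print("Micro F1:", f1_score(y_true, y_pred, average="micro"))
print("Macro F1:", f1_score(y_true, y_pred, average="macro"))
print("Subset Accuracy:", accuracy_score(y_true, y_pred))  # exact-match ratio
print("Hamming Loss:", hamming_loss(y_true, y_pred))
```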
## Available Tags
The model can predict the following 20 tags:

`c#`, `java`, `javascript`, `jquery`, `ios`, `.net`, `php`, `html`, `c++`, `iphone`, `android`, `objective-c`, `asp.net`, `python`, `sql`, `mysql`, `css`, `ajax`, `c`, `database`
## Usage

### Quick Start
```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

# Load model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("bonjourusman/tag-recommender")
model = AutoModelForSequenceClassification.from_pretrained("bonjourusman/tag-recommender")
model.eval()

# Example prediction (max_length matches the 330-token training limit)
text = "How do I connect to a MySQL database using Python?"
inputs = tokenizer(text, return_tensors="pt", max_length=330, truncation=True)

# Multi-label classification: apply a sigmoid to each logit independently
with torch.no_grad():
    outputs = model(**inputs)
predictions = torch.sigmoid(outputs.logits)

# Keep tags whose probability exceeds the threshold
threshold = 0.5
predicted_indices = (predictions > threshold).nonzero(as_tuple=True)[1]
print(f"Predicted tag indices: {predicted_indices.tolist()}")
```
## Training Details

### Training Configuration
- Epochs: 50
- Batch Size: 8
- Learning Rate: 5e-06
- Max Sequence Length: 330
- Optimizer: AdamW
- Loss Function: BCEWithLogitsLoss
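The training script itself is not included in this card. Below is a minimal sketch of a single training step consistent with the configuration above; dataset loading, batching, and the epoch loop are omitted, and `problem_type="multi_label_classification"` is an assumption about how the classification head was configured:

```python
import torch
from torch.optim import AdamW
from transformers import AutoModelForSequenceClassification

model = AutoModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased",
    num_labels=20,
    problem_type="multi_label_classification",
)
optimizer = AdamW(model.parameters(), lr=5e-6)
loss_fn = torch.nn.BCEWithLogitsLoss()

def training_step(batch):
    # batch["labels"]: float tensor of shape (batch_size, 20),
    # 1.0 for each tag present on the post, 0.0 otherwise.
    outputs = model(input_ids=batch["input_ids"],
                    attention_mask=batch["attention_mask"])
    loss = loss_fn(outputs.logits, batch["labels"])
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    return loss.item()
```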
### Training Infrastructure
- Hardware: GPU (Google Colab)
- Framework: PyTorch + HuggingFace Transformers
## Limitations
- Domain Specificity: Trained only on Stack Overflow data; performance on other text domains is untested
- Language: English only
- Tag Coverage: Limited to the 20 most frequent tags in the training data
- Context Length: Inputs longer than 330 tokens are truncated (see the check below)
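If a post may exceed the 330-token limit, you can check before predicting; a small sketch using the tokenizer loaded in the Quick Start:

```python
# Tokens beyond position 330 are discarded by truncation above.
n_tokens = len(tokenizer(text)["input_ids"])
if n_tokens > 330:
    print(f"Post is {n_tokens} tokens; only the first 330 will be used.")
```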
## Citation
```bibtex
@misc{tag-recommender,
  title={Stack Overflow Tag Recommendation using DistilBERT},
  year={2025},
  howpublished={HuggingFace Model Repository},
  url={https://huggingface.co/bonjourusman/tag-recommender}
}
```
Generated on: 2025-06-02 04:43:45