DistilBERT Fine-Tuned on Yahoo Answers Topics

This is a fine-tuned DistilBERT model for topic classification on the Yahoo Answers Topics dataset. It classifies a question into one of 10 predefined categories, such as "Science & Mathematics", "Health", or "Business & Finance".

🧠 Model Details

  • Base model: distilbert-base-uncased
  • Task: Multi-class Text Classification (10 classes)
  • Dataset: Yahoo Answers Topics
  • Training samples: 50,000 (subset)
  • Evaluation samples: 5,000 (subset)
  • Metrics: Accuracy
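
A minimal sketch of how such 50k/5k subsets could be drawn with the 🤗 datasets library (the exact sampling and random seed used for this model are assumptions):

from datasets import load_dataset

# Full dataset: 1.4M training and 60k test questions across 10 topics
dataset = load_dataset("yahoo_answers_topics")

# Illustrative 50k / 5k subsets; the actual selection used here is an assumption
train_ds = dataset["train"].shuffle(seed=42).select(range(50_000))
eval_ds = dataset["test"].shuffle(seed=42).select(range(5_000))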

🧪 How to Use

import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Load the fine-tuned tokenizer and classifier from the Hub
tokenizer = AutoTokenizer.from_pretrained("Koushim/distilbert-yahoo-answers")
model = AutoModelForSequenceClassification.from_pretrained("Koushim/distilbert-yahoo-answers")
model.eval()

text = "How do I improve my math skills for competitive exams?"
inputs = tokenizer(text, return_tensors="pt", truncation=True, padding=True)

# Inference only, so no gradients are needed
with torch.no_grad():
    outputs = model(**inputs)

# Index of the highest-scoring class (0-9, see the label list below)
predicted_class = outputs.logits.argmax(dim=1).item()
print("Predicted class:", predicted_class)

📊 Classes (Labels)

  1. Society & Culture
  2. Science & Mathematics
  3. Health
  4. Education & Reference
  5. Computers & Internet
  6. Sports
  7. Business & Finance
  8. Entertainment & Music
  9. Family & Relationships
  10. Politics & Government
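
The underlying yahoo_answers_topics dataset encodes these topics as integer labels 0–9, in the order listed above. A minimal sketch of an explicit mapping (whether this exact mapping is stored in the model's config.json is an assumption):

# Label IDs follow the yahoo_answers_topics dataset order (0-9)
id2label = {
    0: "Society & Culture",
    1: "Science & Mathematics",
    2: "Health",
    3: "Education & Reference",
    4: "Computers & Internet",
    5: "Sports",
    6: "Business & Finance",
    7: "Entertainment & Music",
    8: "Family & Relationships",
    9: "Politics & Government",
}
label2id = {name: idx for idx, name in id2label.items()}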

📦 Training Details

  • Optimizer: AdamW
  • Learning rate: 2e-5
  • Batch size: 16 (train), 32 (eval)
  • Epochs: 3
  • Weight decay: 0.01
  • Framework: PyTorch + 🤗 Transformers
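
A minimal TrainingArguments/Trainer sketch matching the hyperparameters above (the exact arguments, tokenization, and column handling used for this model are assumptions; tokenized_train and tokenized_eval stand in for tokenized versions of the subsets shown earlier):

from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased",
    num_labels=10,
    id2label=id2label,   # mapping from the class list above
    label2id=label2id,
)

args = TrainingArguments(
    output_dir="distilbert-yahoo-answers",
    learning_rate=2e-5,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=32,
    num_train_epochs=3,
    weight_decay=0.01,   # AdamW with weight decay is the Trainer default optimizer
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=tokenized_train,   # placeholder: tokenized 50k training subset
    eval_dataset=tokenized_eval,     # placeholder: tokenized 5k evaluation subset
    tokenizer=tokenizer,
)
trainer.train()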

πŸ“ Repository Structure

  • config.json – Model config
  • pytorch_model.bin – Trained model weights
  • tokenizer.json, vocab.txt – Tokenizer files

✍️ Author

  • Hugging Face Hub: Koushim
  • Model trained using transformers.Trainer API

📄 License

Apache 2.0

