# 🧠 DistilBERT Mental Health Classifier

This model is a fine-tuned version of `distilbert-base-uncased` for mental health condition classification. It is trained on a custom dataset containing user statements labeled with categories such as depression, anxiety, PTSD, and more.
## 🧠 Use Case

This model is designed for:

- Early detection of mental health symptoms in user conversations
- Clinical research on NLP-based diagnostic support
- AI assistants that provide empathetic triage or support
## 🧪 Performance

The model shows substantial improvements after fine-tuning:

| Sample Size | Accuracy (Before) | F1 Score (Before) | Accuracy (After) | F1 Score (After) |
|---|---|---|---|---|
| 200 Samples | 0.075 | 0.0142 | 0.830 | 0.8267 |
| 500 Samples | 0.070 | 0.0141 | 0.856 | 0.8544 |

✅ These results indicate that fine-tuning on a high-quality mental health dataset enables DistilBERT to make informed predictions from free-form user input.
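As a minimal illustration of how the metrics in the table can be computed, here is a sketch using scikit-learn; `y_true` and `y_pred` are illustrative placeholders for gold labels and model predictions on a held-out split, not the actual evaluation data.

```python
# Minimal sketch: computing accuracy and weighted F1 with scikit-learn.
# y_true / y_pred are illustrative placeholders, not the actual eval data.
from sklearn.metrics import accuracy_score, f1_score

y_true = ["Depression", "Anxiety", "Healthy", "Depression"]
y_pred = ["Depression", "Anxiety", "Depression", "Depression"]

accuracy = accuracy_score(y_true, y_pred)
weighted_f1 = f1_score(y_true, y_pred, average="weighted")  # weighted F1, as reported above
print(f"Accuracy: {accuracy:.3f}, Weighted F1: {weighted_f1:.3f}")
```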
## 📚 Dataset

The model was fine-tuned on `Filtered_Combined_Data.csv`, a curated dataset of 42,000+ statements labeled across multiple mental health categories. Each sample includes:

- `statement` — a natural language user message
- `label` — a mental health condition such as "Depression", "Anxiety", or "Healthy"
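For reference, loading the dataset with pandas might look like the sketch below; the column names come from the card, while the local file path is an assumption.

```python
# Sketch: loading the labeled dataset with pandas.
# Column names ("statement", "label") come from the card; the path is assumed.
import pandas as pd

df = pd.read_csv("Filtered_Combined_Data.csv")
print(df[["statement", "label"]].head())
print(df["label"].value_counts())  # class distribution across categories
```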
## 🏗️ Prompt Format (used during fine-tuning)

```text
Instruction:
Classify the mental health condition in the following statement.

Input: {text}

Response: {label}
```

This instruction format aligns the classifier with instruction-tuned language models.
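For illustration, a small helper that renders one sample into this template might look as follows; `format_prompt` is a hypothetical name, not part of the released code.

```python
# Hypothetical helper that renders one sample into the instruction template above.
def format_prompt(text: str, label: str = "") -> str:
    return (
        "Instruction:\n"
        "Classify the mental health condition in the following statement.\n\n"
        f"Input: {text}\n\n"
        f"Response: {label}"
    )

print(format_prompt("I can't sleep and my thoughts keep racing.", "Anxiety"))
```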
## 🧠 Labels Covered

The model classifies input statements into the following mental health categories:
- Anxiety
- Depression
- PTSD
- OCD
- Bipolar Disorder
- ADHD
- Healthy
- Others (as labeled in the dataset)
## ⚙️ Training Configuration

- Base Model: `distilbert-base-uncased`
- Epochs: 3
- Total Steps: ~36,500
- Batch Size: 16
- Max Length: 512
- Quantization: None
- Learning Rate: 2e-5
- Optimizer: AdamW
- Evaluation: Accuracy, Weighted F1
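As a rough sketch, these settings map onto a Hugging Face `TrainingArguments` configuration like the one below; `num_labels` is a placeholder and the actual training script may differ.

```python
# Sketch only: the hyperparameters above mapped onto TrainingArguments.
from transformers import AutoModelForSequenceClassification, TrainingArguments

num_labels = 8  # placeholder: set to the number of label categories in the dataset
model = AutoModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased", num_labels=num_labels
)

args = TrainingArguments(
    output_dir="distilbert-mentalhealth",
    num_train_epochs=3,              # Epochs: 3
    per_device_train_batch_size=16,  # Batch Size: 16
    learning_rate=2e-5,              # Learning Rate: 2e-5
)
# AdamW is the Trainer default optimizer; the 512-token max length is applied
# at tokenization time, e.g. tokenizer(text, truncation=True, max_length=512).
# A Trainer would then be built with train/eval datasets and a compute_metrics
# function returning accuracy and weighted F1.
```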
## 📂 Model Files

- `pytorch_model.bin` — fine-tuned model weights
- `tokenizer_config.json`, `vocab.txt`, etc. — tokenizer files
- `config.json` — architecture and label mapping
- `README.md` — this file
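The label mapping lives in `config.json` and can be inspected directly:

```python
# Inspect the id-to-label mapping stored in config.json.
from transformers import AutoConfig

config = AutoConfig.from_pretrained("dsuram/distilbert-mentalhealth-classifier")
print(config.id2label)  # e.g. {0: "Anxiety", 1: "Depression", ...}
```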
## 📄 License
This model is licensed under the MIT License — free for personal, academic, and commercial use with attribution.
## 🙋 Author
Developed by Dileep Reddy Suram
📍 For multimodal clinical AI assistant research and PhD preparation
🔗 Hugging Face Profile
## 🚀 Citation
If you use this model, please cite:
## 📦 How to Use (Quick Start)

```python
from transformers import pipeline

# Load the fine-tuned classifier from the Hugging Face Hub
classifier = pipeline("text-classification", model="dsuram/distilbert-mentalhealth-classifier")

# Classify a free-form user statement
print(classifier("I feel anxious all the time and can't concentrate."))
```
---
## 🧪 Inference (Advanced)

You can also use the tokenizer and model directly:

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

# Load model and tokenizer
model = AutoModelForSequenceClassification.from_pretrained("dsuram/distilbert-mentalhealth-classifier")
tokenizer = AutoTokenizer.from_pretrained("dsuram/distilbert-mentalhealth-classifier")

# Input text
text = "I feel lost, hopeless, and don't see a way out."

# Tokenize and predict (no gradients needed at inference time)
inputs = tokenizer(text, return_tensors="pt", truncation=True, padding=True)
with torch.no_grad():
    outputs = model(**inputs)
logits = outputs.logits
predicted_class_id = torch.argmax(logits, dim=1).item()

# Map the predicted id to its human-readable label
label_map = model.config.id2label
print(f"Predicted label: {label_map[predicted_class_id]}")
```
---