---
library_name: sklearn
tags:
- text-classification
- dependency-detection
- random-forest
- nlp
- query-dependency
- conversational-ai
pipeline_tag: text-classification
metrics:
- accuracy
- f1
- precision
- recall
---

# Query Dependence Classifier

A Random Forest model that predicts whether a second query depends on the context of a first query in a conversational AI system.

## Model Description

- **Model Type:** Random Forest Classifier (scikit-learn)
- **Task:** Binary text classification for query dependency detection
- **Features:** 45 engineered linguistic features
- **Classes:** Independent vs. Dependent queries

## Intended Use

This model is designed for conversational AI systems that need to decide whether a follow-up question requires context from a previous query.

**Examples:**

- Query 1: "What is machine learning?" Query 2: "Can you give me examples?" → **Dependent**
- Query 1: "What is AI?" Query 2: "What's the weather today?" → **Independent**

## Model Performance

- **Training Features:** 45 engineered features
- **Model Architecture:** Random Forest with 500 estimators
- **Validation:** Out-of-bag (OOB) scoring enabled, which estimates generalization from the training rows each tree did not see (this is not k-fold cross-validation)
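
Out-of-bag scoring evaluates each tree on the training rows that its bootstrap sample left out. The sketch below uses only the standard library to show how large that held-out share typically is. The function name and sample sizes are illustrative assumptions, not taken from this model's training code.

```python
import random


def out_of_bag_fraction(n_samples: int, rng: random.Random) -> float:
    """Draw one bootstrap sample and return the share of rows it never picked."""
    # A bootstrap sample draws n_samples row indices with replacement.
    drawn = {rng.randrange(n_samples) for _ in range(n_samples)}
    # Rows that were never drawn are "out of bag" for this tree.
    return (n_samples - len(drawn)) / n_samples


rng = random.Random(42)
fractions = [out_of_bag_fraction(1000, rng) for _ in range(50)]
mean_oob = sum(fractions) / len(fractions)
print(f"Mean out-of-bag share over 50 bootstrap samples: {mean_oob:.3f}")
```

Each bootstrap sample leaves out roughly a third of the rows (the limit is 1/e, about 0.368), so with 500 trees nearly every row is scored by many trees that never saw it. In scikit-learn this estimate is requested with `oob_score=True` and read from the fitted model's `oob_score_` attribute.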

## Feature Engineering

The model uses 45 engineered features, including:

### Lexical Features

- Word overlap and Jaccard similarity
- N-gram overlap (bigrams, trigrams)
- Similarity over stemmed tokens

### Linguistic Features

- Pronoun and reference patterns
- Question type classification
- Discourse markers and connectives
- Dependency phrase detection

### Structural Features

- Length ratios and differences
- Punctuation patterns
- Complexity measures (syllable density)
- Capitalization patterns
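
To make a few of these feature families concrete, here is a minimal sketch of four pairwise features. It is not the model's actual extraction code: the function names, the tokenizer, the pronoun list, and the feature names are all assumptions chosen for illustration.

```python
import re


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def jaccard(a: set, b: set) -> float:
    """Size of the intersection divided by the size of the union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def ngrams(tokens: list[str], n: int) -> set:
    """Set of contiguous n-token tuples."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


# Illustrative pronoun list; the model's actual list is not documented here.
PRONOUNS = {"it", "this", "that", "they", "them", "these", "those", "he", "she"}


def sketch_features(query1: str, query2: str) -> dict[str, float]:
    """Compute a handful of lexical, linguistic, and structural features."""
    t1, t2 = tokenize(query1), tokenize(query2)
    return {
        "word_jaccard": jaccard(set(t1), set(t2)),
        "bigram_jaccard": jaccard(ngrams(t1, 2), ngrams(t2, 2)),
        "pronoun_count_q2": float(sum(tok in PRONOUNS for tok in t2)),
        "length_ratio": len(t2) / len(t1) if t1 else 0.0,
    }


features = sketch_features("What is machine learning?", "Can you explain it further?")
print(features)
```

In this pair the two queries share no words, so the overlap features are zero and the pronoun "it" is the only signal of dependence. That is why reference patterns are tracked alongside lexical overlap.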

## Usage

```python
# Install dependencies
# pip install scikit-learn pandas nltk huggingface-hub joblib

from huggingface_hub import hf_hub_download
import joblib
import json

# NOTE: DependencyClassifier is not provided by any of the packages above.
# It must be defined in, or imported from, this project's own source code
# before the rest of this snippet will run.

REPO_ID = "admin-4minds/QUERY-DEPENDENCE-MODEL"

# Download model files
model_path = hf_hub_download(repo_id=REPO_ID, filename="model.joblib")
encoder_path = hf_hub_download(repo_id=REPO_ID, filename="label_encoder.joblib")
config_path = hf_hub_download(repo_id=REPO_ID, filename="config.json")

# Load model components
model = joblib.load(model_path)
label_encoder = joblib.load(encoder_path)

with open(config_path, 'r') as f:
    config = json.load(f)

# Initialize classifier and attach the loaded components
classifier = DependencyClassifier()
classifier.model = model
classifier.label_encoder = label_encoder
classifier.feature_names = config['feature_names']

# Make predictions
result = classifier.predict(
    "What is artificial intelligence?",
    "Can you give me some examples?"
)

print(f"Prediction: {result['prediction']}")
print(f"Confidence: {result['confidence']:.3f}")
print(f"Probabilities: {result['probabilities']}")
```
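
One way a conversational system might act on the prediction is sketched below. It assumes the result dictionary has the `prediction` and `confidence` keys printed above and that the predicted label is the lowercase string `dependent`. The `build_prompt` helper and the 0.6 threshold are hypothetical and not part of this model's code.

```python
def build_prompt(query1: str, query2: str, result: dict, min_confidence: float = 0.6) -> str:
    """Prepend the first query as context only when the follow-up is predicted dependent."""
    is_dependent = result["prediction"] == "dependent"
    if is_dependent and result["confidence"] >= min_confidence:
        return f"Previous question: {query1}\nFollow-up: {query2}"
    # Independent or low-confidence predictions are passed through unchanged.
    return query2


# Hand-written stand-in for the output of classifier.predict(...), not real model output.
example_result = {"prediction": "dependent", "confidence": 0.91}
print(build_prompt("What is artificial intelligence?", "Can you give me some examples?", example_result))
```

Treating low-confidence predictions as independent is only one possible policy. A system where missing context is costlier than extra context may prefer the opposite default.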

## Alternative Loading Method

```python
# Load directly using the class method
classifier = DependencyClassifier.load_from_huggingface_hub("admin-4minds/QUERY-DEPENDENCE-MODEL")

# Use for inference
result = classifier.predict("Query 1", "Query 2")
```

## Training Data Format

Training data must contain the following columns:

- `query1`: First query/question
- `query2`: Second query/question
- `label`: Either `independent` or `dependent`
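
A quick check of this schema before training can catch missing columns or misspelled labels. The sketch below assumes the data is stored as CSV, which the card does not specify. The `validate_rows` helper is hypothetical and uses only the standard library.

```python
import csv
import io

REQUIRED_COLUMNS = {"query1", "query2", "label"}
ALLOWED_LABELS = {"independent", "dependent"}


def validate_rows(csv_text: str) -> list[dict]:
    """Parse CSV text and raise ValueError on missing columns or unknown labels."""
    reader = csv.DictReader(io.StringIO(csv_text))
    missing = REQUIRED_COLUMNS - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"Missing columns: {sorted(missing)}")
    rows = list(reader)
    for line_no, row in enumerate(rows, start=2):  # line 1 is the header
        if row["label"] not in ALLOWED_LABELS:
            raise ValueError(f"Line {line_no}: unexpected label {row['label']!r}")
    return rows


sample = (
    "query1,query2,label\n"
    "What is machine learning?,Can you give me examples?,dependent\n"
    "What is AI?,What's the weather today?,independent\n"
)
rows = validate_rows(sample)
print(f"{len(rows)} valid rows")
```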

## Model Architecture

```python
RandomForestClassifier(
    n_estimators=500,
    max_depth=15,
    min_samples_split=7,
    min_samples_leaf=3,
    max_features='sqrt',
    class_weight='balanced',
    random_state=42
)
```
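
The `class_weight='balanced'` setting weights each class by `n_samples / (n_classes * class_count)`, so the rarer class counts for more during training. The sketch below reproduces that formula with the standard library. The 75/25 label split is a hypothetical distribution chosen for illustration, not this model's training data.

```python
from collections import Counter


def balanced_class_weights(labels: list[str]) -> dict[str, float]:
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


# Hypothetical label distribution, used only to illustrate the formula.
labels = ["independent"] * 75 + ["dependent"] * 25
weights = balanced_class_weights(labels)
print(weights)
```

With this split the minority `dependent` class receives three times the weight of the majority class, which offsets its lower frequency.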

## Limitations

- Designed for English-language queries
- Performance may vary on very short queries (< 3 words)
- Requires the NLTK `stopwords` corpus for optimal performance
- Best suited for conversational question-answering scenarios
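
Two of these limitations can be guarded against in calling code. The sketch below loads the NLTK stopwords when they are available and flags very short queries. Both helper names and the small fallback word set are assumptions, and how the model itself behaves without the corpus is not documented here.

```python
def load_english_stopwords() -> set[str]:
    """Return NLTK's English stopwords, or a small built-in set if unavailable."""
    try:
        from nltk.corpus import stopwords  # needs: pip install nltk
        return set(stopwords.words("english"))
    except (ImportError, LookupError):
        # ImportError: nltk is not installed.
        # LookupError: nltk is installed but the corpus has not been downloaded
        # (fetch it once with: python -m nltk.downloader stopwords).
        return {"a", "an", "the", "is", "are", "of", "to", "in", "and", "or", "what"}


def is_very_short(query: str, min_words: int = 3) -> bool:
    """Flag queries below the word count where predictions may be less reliable."""
    return len(query.split()) < min_words


stop_words = load_english_stopwords()
print(len(stop_words), is_very_short("Why?"))
```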

## Technical Details

- **Framework:** scikit-learn
- **Storage Format:** joblib (pickle-based, so loading a file can execute arbitrary code; only load model files from sources you trust)
- **Configuration:** JSON metadata
- **Reproducibility:** Fixed random seed (42)

## Citation

```bibtex
@misc{query_dependence_classifier_2025,
  title={Query Dependence Classifier},
  author={Admin-4minds},
  year={2025},
  publisher={Hugging Face},
  url={https://huggingface.co/admin-4minds/QUERY-DEPENDENCE-MODEL}
}
```

## License

This model is released under the MIT License.

## Contact

For questions or issues, please contact the admin-4minds team.