---
library_name: sklearn
tags:
- text-classification
- dependency-detection
- random-forest
- nlp
- query-dependency
- conversational-ai
pipeline_tag: text-classification
metrics:
- accuracy
- f1
- precision
- recall
---
# Query Dependence Classifier
A Random Forest model that determines whether a second query depends on the context of a first query in conversational AI systems.
## Model Description
- **Model Type:** Random Forest Classifier (scikit-learn)
- **Task:** Binary text classification for query dependency detection
- **Features:** 45 engineered linguistic features
- **Classes:** Independent vs Dependent queries
## Intended Use
This model lets a conversational AI system decide whether a follow-up question needs context from the previous query before it is answered.
**Examples:**
- Query 1: "What is machine learning?" Query 2: "Can you give me examples?" → **Dependent**
- Query 1: "What is AI?" Query 2: "What's the weather today?" → **Independent**
## Model Performance
- **Training Features:** 45 engineered features
- **Model Architecture:** Random Forest with 500 estimators
- **Validation:** Out-of-bag (OOB) scoring enabled (an internal estimate from bootstrap samples, not k-fold cross-validation)
## Feature Engineering
The model uses 45 engineered features, grouped as follows:
### Lexical Features
- Word overlap and Jaccard similarity
- N-gram overlap (bigrams, trigrams)
- Semantic similarity with stemming
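To make the lexical features concrete, here is a minimal sketch of Jaccard similarity and n-gram overlap between two queries. The function names and the regex tokenization are assumptions made for this example; the model's own feature extractor may tokenize, stem, or normalize differently.

```python
import re

def tokenize(text):
    """Lowercase and split into simple word tokens (assumed tokenization)."""
    return re.findall(r"[a-z0-9']+", text.lower())

def jaccard_similarity(q1, q2):
    """Size of the token intersection divided by the size of the token union."""
    a, b = set(tokenize(q1)), set(tokenize(q2))
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

def ngram_overlap(q1, q2, n=2):
    """Fraction of the second query's distinct n-grams that also occur in the first."""
    def ngrams(tokens):
        return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}
    g1, g2 = ngrams(tokenize(q1)), ngrams(tokenize(q2))
    if not g2:
        return 0.0
    return len(g1 & g2) / len(g2)

q1 = "What is machine learning?"
q2 = "Is machine learning hard?"
print(jaccard_similarity(q1, q2))   # 3 shared tokens out of 5 distinct -> 0.6
print(ngram_overlap(q1, q2, n=2))   # 2 of 3 bigrams shared
```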
### Linguistic Features
- Pronoun and reference patterns
- Question type classification
- Discourse markers and connectives
- Dependency phrases detection
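The linguistic signals above can be approximated with simple word-list checks on the follow-up query. The sketch below is illustrative only: the word lists, the feature names, and the `linguistic_features` helper are hypothetical and are not taken from the model's actual feature set.

```python
import re

# Illustrative word lists (assumptions, not the model's real vocabulary).
PRONOUNS = {"it", "they", "them", "this", "that", "these", "those", "he", "she"}
DISCOURSE_MARKERS = ("and", "but", "also", "so", "then", "however")
DEPENDENCY_PHRASES = ("give me examples", "tell me more", "what about", "how about")

def linguistic_features(query2):
    """Return a few reference-style signals computed on the follow-up query."""
    text = query2.lower()
    tokens = re.findall(r"[a-z']+", text)
    return {
        # Pronouns often refer back to something in the previous query.
        "pronoun_count": sum(tok in PRONOUNS for tok in tokens),
        # A leading connective suggests the query continues a thread.
        "starts_with_marker": int(bool(tokens) and tokens[0] in DISCOURSE_MARKERS),
        # Fixed phrases that rarely make sense without prior context.
        "has_dependency_phrase": int(any(p in text for p in DEPENDENCY_PHRASES)),
    }

features = linguistic_features("And what about them?")
print(features)
```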
### Structural Features
- Length ratios and differences
- Punctuation patterns
- Complexity measures (syllable density)
- Capitalization patterns
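As a rough illustration of the structural features, the sketch below computes length, punctuation, syllable-density, and capitalization values for a query pair. The feature names and the vowel-group syllable heuristic are assumptions for this example, not the model's exact definitions.

```python
import re

def count_syllables(word):
    """Rough syllable estimate: number of vowel groups, with a minimum of 1."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def structural_features(q1, q2):
    """Compare the shape of the two queries without looking at their meaning."""
    w1, w2 = q1.split(), q2.split()
    words2 = re.findall(r"[A-Za-z]+", q2)
    return {
        "length_ratio": len(w2) / max(len(w1), 1),
        "length_diff": len(w2) - len(w1),
        "q2_question_marks": q2.count("?"),
        # Average syllables per word in the follow-up query.
        "q2_syllable_density": (
            sum(count_syllables(w) for w in words2) / len(words2) if words2 else 0.0
        ),
        "q2_capitalized_words": sum(w[0].isupper() for w in words2),
    }

feats = structural_features("What is machine learning?", "Can you give me examples?")
print(feats)
```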
## Usage
```python
# Install dependencies
# pip install scikit-learn pandas nltk huggingface-hub joblib
import json

import joblib
from huggingface_hub import hf_hub_download

# NOTE: DependencyClassifier (the wrapper that turns a query pair into the
# 45 features) is not defined in this snippet. Define or import it from your
# own code before running the lines below.

REPO_ID = "admin-4minds/QUERY-DEPENDENCE-MODEL"

# Download model files
model_path = hf_hub_download(repo_id=REPO_ID, filename="model.joblib")
encoder_path = hf_hub_download(repo_id=REPO_ID, filename="label_encoder.joblib")
config_path = hf_hub_download(repo_id=REPO_ID, filename="config.json")

# Load model components
model = joblib.load(model_path)
label_encoder = joblib.load(encoder_path)
with open(config_path, "r") as f:
    config = json.load(f)

# Initialize classifier
classifier = DependencyClassifier()
classifier.model = model
classifier.label_encoder = label_encoder
classifier.feature_names = config["feature_names"]

# Make predictions
result = classifier.predict(
    "What is artificial intelligence?",
    "Can you give me some examples?",
)
print(f"Prediction: {result['prediction']}")
print(f"Confidence: {result['confidence']:.3f}")
print(f"Probabilities: {result['probabilities']}")
```
## Alternative Loading Method
```python
# Load directly using class method
classifier = DependencyClassifier.load_from_huggingface_hub("admin-4minds/QUERY-DEPENDENCE-MODEL")
# Use for inference
result = classifier.predict("Query 1", "Query 2")
```
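One way a downstream system might act on the prediction is to attach the earlier query only when the follow-up is classified as dependent. The sketch below assumes the result dictionary has the `prediction` and `confidence` keys shown in the Usage section and that the predicted label is the string `dependent`; the `build_retrieval_query` helper and the 0.6 threshold are hypothetical choices for illustration.

```python
def build_retrieval_query(query1, query2, result, threshold=0.6):
    """Prepend the earlier query only when the follow-up is confidently dependent."""
    is_dependent = result["prediction"] == "dependent"
    if is_dependent and result["confidence"] >= threshold:
        return f"{query1} {query2}"
    return query2

# Hand-written stand-in for the dictionary returned by classifier.predict(...)
example_result = {"prediction": "dependent", "confidence": 0.87}
combined = build_retrieval_query(
    "What is machine learning?",
    "Can you give me examples?",
    example_result,
)
print(combined)
```

Falling back to the follow-up query alone when confidence is low keeps unrelated context out of the request; the right threshold depends on the application.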
## Training Data Format
The model expects training data with columns:
- `query1`: First query/question
- `query2`: Second query/question
- `label`: 'independent' or 'dependent'
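A small sanity check of this layout can be sketched with the standard library. The two rows below reuse the examples from this card; the in-memory CSV and the validation logic are assumptions for illustration, and a real training file would be far larger.

```python
import csv
import io

# In-memory CSV standing in for a training file on disk.
raw = io.StringIO(
    "query1,query2,label\n"
    "What is machine learning?,Can you give me examples?,dependent\n"
    "What is AI?,What's the weather today?,independent\n"
)

REQUIRED_COLUMNS = {"query1", "query2", "label"}
ALLOWED_LABELS = {"independent", "dependent"}

reader = csv.DictReader(raw)
missing = REQUIRED_COLUMNS - set(reader.fieldnames or [])
if missing:
    raise ValueError(f"missing columns: {sorted(missing)}")

rows = list(reader)
bad = [r for r in rows if r["label"] not in ALLOWED_LABELS]
if bad:
    raise ValueError(f"unexpected labels in {len(bad)} row(s)")

print(len(rows), "valid rows")
```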
## Model Architecture
```python
RandomForestClassifier(
    n_estimators=500,
    max_depth=15,
    min_samples_split=7,
    min_samples_leaf=3,
    max_features='sqrt',
    class_weight='balanced',
    random_state=42
)
```
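The out-of-bag scoring mentioned under Model Performance can be reproduced in scikit-learn by passing `oob_score=True`. The sketch below fits the same hyperparameters on synthetic data; the data, the reduced tree count, and the `oob_score=True` flag itself are assumptions, since the configuration above does not show how OOB scoring was enabled.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in for a 45-feature matrix; this is not the model's real data.
X, y = make_classification(
    n_samples=300, n_features=45, n_informative=10, random_state=42
)

clf = RandomForestClassifier(
    n_estimators=100,        # reduced from 500 to keep the sketch fast
    max_depth=15,
    min_samples_split=7,
    min_samples_leaf=3,
    max_features="sqrt",
    class_weight="balanced",
    oob_score=True,          # assumption: not shown in the configuration above
    random_state=42,
)
clf.fit(X, y)

# Each tree is scored on the samples left out of its bootstrap draw.
print(f"OOB accuracy estimate: {clf.oob_score_:.3f}")
```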
## Limitations
- Designed for English language queries
- Performance may vary on very short queries (< 3 words)
- Requires the NLTK stopwords corpus for optimal performance (download it once with `nltk.download('stopwords')`)
- Best suited for conversational question-answering scenarios
## Technical Details
- **Framework:** scikit-learn
- **Storage Format:** joblib (pickle-based, so loading a file can execute arbitrary code; only load files from a source you trust)
- **Configuration:** JSON metadata
- **Reproducibility:** Fixed random seed (42)
## Citation
```bibtex
@misc{query_dependence_classifier_2025,
title={Query Dependence Classifier},
author={Admin-4minds},
year={2025},
publisher={Hugging Face},
url={https://huggingface.co/admin-4minds/QUERY-DEPENDENCE-MODEL}
}
```
## License
This model is released under the MIT License.
## Contact
For questions or issues, please contact the admin-4minds team.