	---
	library_name: sklearn
	tags:
	- text-classification
	- dependency-detection
	- random-forest
	- nlp
	- query-dependency
	- conversational-ai
	pipeline_tag: text-classification
	metrics:
	- accuracy
	- f1
	- precision
	- recall
	---

	# Query Dependence Classifier

	A Random Forest model that determines whether a second query depends on the context of a first query in conversational AI systems.

	## Model Description

	- Model Type: Random Forest Classifier (scikit-learn)
	- Task: Binary text classification for query dependency detection
	- Features: 45 engineered linguistic features
	- Classes: Independent vs Dependent queries

	## Intended Use

	This model is designed for conversational AI systems to determine if a follow-up question requires context from a previous query.

	Examples:
	- Query 1: "What is machine learning?" Query 2: "Can you give me examples?" → Dependent
	- Query 1: "What is AI?" Query 2: "What's the weather today?" → Independent
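As a rough illustration of where such a classifier could sit in a pipeline, the sketch below routes a follow-up query based on the predicted label. `build_prompt` and `fake_predict` are hypothetical names introduced here. `fake_predict` is a toy stand-in that only mimics the `prediction` key of the result dictionary printed in the Usage section; it is not the trained model, and the lowercase label strings are an assumption.

```python
from typing import Callable, Dict

# Hypothetical stand-in for classifier.predict; it mimics only the
# 'prediction' key of the result dictionary and uses a toy prefix rule.
def fake_predict(query1: str, query2: str) -> Dict[str, str]:
    vague_openers = ("can you", "what about", "and ", "why is that")
    is_dependent = query2.lower().startswith(vague_openers)
    return {"prediction": "dependent" if is_dependent else "independent"}

def build_prompt(query1: str, query2: str,
                 predict: Callable[[str, str], Dict[str, str]]) -> str:
    """Prepend the previous query only when the follow-up depends on it."""
    result = predict(query1, query2)
    if result["prediction"] == "dependent":
        return f"Previous question: {query1}\nFollow-up: {query2}"
    return query2

dependent_prompt = build_prompt(
    "What is machine learning?", "Can you give me examples?", fake_predict
)
independent_prompt = build_prompt(
    "What is AI?", "What's the weather today?", fake_predict
)
print(dependent_prompt)
print(independent_prompt)
```

In a real system the `predict` argument would be the loaded classifier's `predict` method rather than the toy rule.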

	## Model Performance

	- Training Features: 45 engineered features
	- Model Architecture: Random Forest with 500 estimators
	- Validation: out-of-bag (OOB) scoring enabled (an internal bootstrap-based estimate, not k-fold cross-validation)
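Out-of-bag scoring evaluates each tree on the training rows that its bootstrap sample happened to leave out. The sketch below draws one bootstrap sample and measures how many rows end up out of bag. The dataset size is hypothetical, and this only illustrates the sampling idea, not scikit-learn's internal implementation.

```python
import random

random.seed(42)
n_samples = 1000  # hypothetical dataset size, for illustration only

# A bootstrap sample draws n_samples row indices with replacement.
bootstrap = [random.randrange(n_samples) for _ in range(n_samples)]

# Rows never drawn are "out of bag" for this tree and can be used to score it.
oob_rows = set(range(n_samples)) - set(bootstrap)
oob_fraction = len(oob_rows) / n_samples

# Probability that a given row is never drawn: (1 - 1/n)^n, about 0.37 for large n.
expected_fraction = (1 - 1 / n_samples) ** n_samples

print(f"observed OOB fraction: {oob_fraction:.3f}")
print(f"expected OOB fraction: {expected_fraction:.3f}")
```

Because roughly a third of the rows are unseen by any given tree, the forest can be scored without a separate hold-out split.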

	## Feature Engineering

	The model uses 45 engineered features, including:

	### Lexical Features
	- Word overlap and Jaccard similarity
	- N-gram overlap (bigrams, trigrams)
	- Semantic similarity with stemming
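To make the lexical features concrete, here is a minimal sketch of word-level and bigram-level Jaccard similarity. The tokenizer and the function names are assumptions made for illustration; the model's actual feature extraction code is not shown in this card.

```python
import re
from typing import List, Set, Tuple

def tokenize(text: str) -> List[str]:
    """Lowercase and split on runs of letters, digits, and apostrophes."""
    return re.findall(r"[a-z0-9']+", text.lower())

def jaccard(a: Set, b: Set) -> float:
    """Intersection size over union size; 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def ngrams(tokens: List[str], n: int) -> Set[Tuple[str, ...]]:
    """All contiguous n-token windows, as a set of tuples."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

q1 = tokenize("What is machine learning?")
q2 = tokenize("How is machine learning used?")

word_jaccard = jaccard(set(q1), set(q2))                 # 3 shared / 6 total words
bigram_jaccard = jaccard(ngrams(q1, 2), ngrams(q2, 2))   # 2 shared / 5 total bigrams

print(word_jaccard, bigram_jaccard)
```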

	### Linguistic Features
	- Pronoun and reference patterns
	- Question type classification
	- Discourse markers and connectives
	- Dependency phrases detection
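The linguistic features can be pictured as simple lookups against small word lists. The sketch below is an assumption-laden example: the word lists, feature names, and `linguistic_features` function are all hypothetical and are not the lexicons the model was trained with.

```python
import re
from typing import Dict

# Illustrative word lists only; the model's real lexicons are not published here.
PRONOUNS = {"it", "they", "them", "this", "that", "these", "those", "he", "she"}
DISCOURSE_MARKERS = {"also", "then", "so", "but", "however", "instead"}
QUESTION_WORDS = {"what", "why", "how", "when", "where", "who", "which"}

def linguistic_features(query: str) -> Dict[str, object]:
    tokens = re.findall(r"[a-z']+", query.lower())
    first = tokens[0] if tokens else ""
    return {
        # Pronouns often point back at something in the previous query.
        "pronoun_count": sum(t in PRONOUNS for t in tokens),
        # A leading connective suggests the query continues an earlier thought.
        "starts_with_marker": first in DISCOURSE_MARKERS,
        # First question word found anywhere in the query, else "other".
        "question_type": next((t for t in tokens if t in QUESTION_WORDS), "other"),
    }

features = linguistic_features("But why does it matter?")
print(features)
```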

	### Structural Features
	- Length ratios and differences
	- Punctuation patterns
	- Complexity measures (syllable density)
	- Capitalization patterns
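A sketch of how structural features like these might be computed from a query pair follows. The feature names and the vowel-group syllable heuristic are assumptions for illustration, not the model's exact definitions.

```python
import re
from typing import Dict

def count_syllables(word: str) -> int:
    """Rough heuristic: count groups of consecutive vowels, minimum one."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def structural_features(query1: str, query2: str) -> Dict[str, float]:
    words1 = query1.split()
    words2 = query2.split()
    syllables2 = sum(count_syllables(w) for w in words2)
    return {
        "length_ratio": len(words2) / max(1, len(words1)),
        "length_diff": len(words2) - len(words1),
        "question_marks_q2": query2.count("?"),
        "syllable_density_q2": syllables2 / max(1, len(words2)),
        "capitalized_words_q2": sum(w[0].isupper() for w in words2),
    }

features = structural_features(
    "What is machine learning?", "Can you give me examples?"
)
print(features)
```

The vowel-group heuristic overcounts words with a silent final "e" (it treats "give" as two syllables), which is acceptable for a density feature but not for exact syllable counts.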

	## Usage

	```python
	# Install dependencies
	# pip install scikit-learn pandas nltk huggingface-hub joblib

	from huggingface_hub import hf_hub_download
	import joblib
	import json

	# Download model files
	model_path = hf_hub_download(repo_id="admin-4minds/QUERY-DEPENDENCE-MODEL", filename="model.joblib")
	encoder_path = hf_hub_download(repo_id="admin-4minds/QUERY-DEPENDENCE-MODEL", filename="label_encoder.joblib")
	config_path = hf_hub_download(repo_id="admin-4minds/QUERY-DEPENDENCE-MODEL", filename="config.json")

	# Load model components
	model = joblib.load(model_path)
	label_encoder = joblib.load(encoder_path)

	with open(config_path, 'r') as f:
	    config = json.load(f)

	# Initialize classifier
	# NOTE: DependencyClassifier is not provided by any of the packages installed above;
	# the feature-extraction class must be defined or imported before this point.
	classifier = DependencyClassifier()
	classifier.model = model
	classifier.label_encoder = label_encoder
	classifier.feature_names = config['feature_names']

	# Make predictions
	result = classifier.predict(
	    "What is artificial intelligence?",
	    "Can you give me some examples?"
	)

	print(f"Prediction: {result['prediction']}")
	print(f"Confidence: {result['confidence']:.3f}")
	print(f"Probabilities: {result['probabilities']}")
	```

	## Alternative Loading Method

	```python
	# Load directly using class method
	classifier = DependencyClassifier.load_from_huggingface_hub("admin-4minds/QUERY-DEPENDENCE-MODEL")

	# Use for inference
	result = classifier.predict("Query 1", "Query 2")
	```

	## Training Data Format

	The model expects training data with columns:
	- `query1`: First query/question
	- `query2`: Second query/question
	- `label`: 'independent' or 'dependent'
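Before training, it can help to check that a file matches this layout. The sketch below validates the three columns and the two label values using only the standard library. `load_training_rows` is a hypothetical helper, and the CSV storage format is an assumption; the card only specifies the column names.

```python
import csv
import io
from typing import Dict, List

REQUIRED_COLUMNS = ("query1", "query2", "label")
ALLOWED_LABELS = {"independent", "dependent"}

def load_training_rows(csv_text: str) -> List[Dict[str, str]]:
    """Parse CSV text and reject missing columns or unexpected labels."""
    reader = csv.DictReader(io.StringIO(csv_text))
    missing = [c for c in REQUIRED_COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    rows = []
    for line_no, row in enumerate(reader, start=2):  # line 1 is the header
        if row["label"] not in ALLOWED_LABELS:
            raise ValueError(f"line {line_no}: unexpected label {row['label']!r}")
        rows.append(row)
    return rows

sample = (
    "query1,query2,label\n"
    "What is machine learning?,Can you give me examples?,dependent\n"
    "What is AI?,What's the weather today?,independent\n"
)
rows = load_training_rows(sample)
print(len(rows), rows[0]["label"])
```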

	## Model Architecture

	```python
	RandomForestClassifier(
	    n_estimators=500,
	    max_depth=15,
	    min_samples_split=7,
	    min_samples_leaf=3,
	    max_features='sqrt',
	    class_weight='balanced',
	    random_state=42
	)
	```
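Two of these settings are easier to read with numbers attached. The sketch below works out what `max_features='sqrt'` means for 45 features and how `class_weight='balanced'` weights each class, following the formula in the scikit-learn documentation, `n_samples / (n_classes * count_per_class)`. The label counts are hypothetical, since the card does not report the training set's class distribution.

```python
import math
from collections import Counter

# max_features='sqrt': each split considers int(sqrt(n_features)) candidates.
n_features = 45
features_per_split = max(1, int(math.sqrt(n_features)))  # sqrt(45) is about 6.7

# class_weight='balanced': weight = n_samples / (n_classes * class_count).
# Hypothetical label counts, for illustration only.
labels = ["independent"] * 300 + ["dependent"] * 100
counts = Counter(labels)
n_samples, n_classes = len(labels), len(counts)
class_weights = {
    cls: n_samples / (n_classes * count) for cls, count in counts.items()
}

print(features_per_split)
print(class_weights)
```

With these made-up counts the minority class receives three times the weight of the majority class, which is how the setting compensates for imbalance.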

	## Limitations

	- Designed for English language queries
	- Performance may vary on very short queries (< 3 words)
	- Requires NLTK stopwords corpus for optimal performance
	- Best suited for conversational question-answering scenarios

	## Technical Details

	- Framework: scikit-learn
	- Storage Format: joblib (pickle-based, so only load model files from sources you trust)
	- Configuration: JSON metadata
	- Reproducibility: Fixed random seed (42)

	## Citation

	```bibtex
	@misc{query_dependence_classifier_2025,
	title={Query Dependence Classifier},
	author={Admin-4minds},
	year={2025},
	publisher={Hugging Face},
	url={https://huggingface.co/admin-4minds/QUERY-DEPENDENCE-MODEL}
	}
	```

	## License

	This model is released under the MIT License.

	## Contact

	For questions or issues, please contact the admin-4minds team.