---
language: vi
tags:
- ner
- named-entity-recognition
- slot-filling
- smart-home
- vietnamese
- phobert
- token-classification
license: mit
datasets:
- custom-vn-slu-augmented
metrics:
- accuracy
- f1
- precision
- recall
model-index:
- name: PhoBERT NER for Vietnamese Smart Home Slot Filling
  results:
  - task:
      type: token-classification
      name: Named Entity Recognition
    dataset:
      name: VN-SLU Augmented Dataset
      type: custom
    metrics:
    - type: accuracy
      value: 96.64
      name: Accuracy
    - type: f1
      value: 86.55
      name: F1 Score (Weighted)
    - type: f1
      value: 67.04
      name: F1 Score (Macro)
widget:
- text: "bật đèn phòng khách"
- text: "tắt quạt phòng ngủ lúc 10 giờ tối"
- text: "điều chỉnh nhiệt độ điều hòa 25 độ"
- text: "mở cửa garage sau 5 phút"
---
# PhoBERT Fine-tuned for Vietnamese Smart Home NER/Slot Filling
This model is a fine-tuned version of [vinai/phobert-base](https://huggingface.co/vinai/phobert-base) for Named Entity Recognition (NER) in Vietnamese smart home commands. It extracts slot values such as devices, locations, times, and numeric values from user commands.
## Model Description
- **Base Model**: vinai/phobert-base
- **Task**: Token Classification / Slot Filling for Smart Home Commands
- **Language**: Vietnamese
- **Number of Labels**: 13 (six entity types in BIO format, plus `O`)
## Intended Uses & Limitations
### Intended Uses
- Extracting entities from Vietnamese smart home voice commands
- Slot filling for voice assistant systems
- Integration with intent classification for complete NLU pipeline
- Research in Vietnamese NLP for IoT applications
### Limitations
- Optimized specifically for the smart home domain
- May not generalize well to other domains
- Trained on Vietnamese only
- Performs best when paired with the corresponding intent classifier
## Entity Types (Slot Labels)
The model predicts 13 labels in BIO format: a `B-`/`I-` pair for each of six entity types, plus `O`:
1. `B-device` / `I-device` - Device names (e.g., "đèn", "quạt", "điều hòa")
2. `B-living_space` / `I-living_space` - Room/location names (e.g., "phòng khách", "phòng ngủ")
3. `B-time_at` / `I-time_at` - Specific times (e.g., "10 giờ tối", "7 giờ sáng")
4. `B-duration` / `I-duration` - Time durations (e.g., "5 phút", "2 giờ")
5. `B-target_number` / `I-target_number` - Target values (e.g., "25 độ", "50%")
6. `B-changing_value` / `I-changing_value` - Change amounts (e.g., "tăng 10%")
7. `O` - Outside (the token is not part of any entity)
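The count of 13 follows directly from the list above: each of the six entity types contributes a `B-` and an `I-` tag, and `O` covers everything else. The sketch below only illustrates that arithmetic. The helper name `build_bio_labels` and the label ordering are assumptions; the authoritative index-to-label mapping is the one shipped with the model.

```python
# The six entity types listed in this card.
ENTITY_TYPES = [
    "device",
    "living_space",
    "time_at",
    "duration",
    "target_number",
    "changing_value",
]


def build_bio_labels(entity_types):
    """Return 'O' plus a B-/I- pair for every entity type.

    The ordering here is illustrative and may differ from the model's
    real id2label mapping.
    """
    labels = ["O"]
    for entity_type in entity_types:
        labels.append(f"B-{entity_type}")
        labels.append(f"I-{entity_type}")
    return labels


labels = build_bio_labels(ENTITY_TYPES)
print(len(labels))
```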
## How to Use
### Using Transformers Library
```python
from transformers import AutoTokenizer, AutoModelForTokenClassification
import torch
import json

# Load model and tokenizer
model_name = "ntgiaky/phobert-ner-smart-home"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForTokenClassification.from_pretrained(model_name)

# Load label mappings
with open('label_mappings.json', 'r') as f:
    label_mappings = json.load(f)
id2label = {int(k): v for k, v in label_mappings['id2label'].items()}


def extract_entities(text):
    # Tokenize
    inputs = tokenizer(text, return_tensors="pt", padding=True, truncation=True)
    tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])

    # Predict
    with torch.no_grad():
        outputs = model(**inputs)
    predictions = torch.argmax(outputs.logits, dim=2)

    # Extract entities
    entities = []
    current_entity = None
    current_tokens = []
    for token, pred_id in zip(tokens, predictions[0]):
        label = id2label[pred_id.item()]
        if label.startswith('B-'):
            # Save previous entity if exists
            if current_entity:
                entities.append({
                    'type': current_entity,
                    'text': tokenizer.convert_tokens_to_string(current_tokens)
                })
            # Start new entity
            current_entity = label[2:]
            current_tokens = [token]
        elif label.startswith('I-') and current_entity == label[2:]:
            # Continue current entity
            current_tokens.append(token)
        else:
            # End current entity
            if current_entity:
                entities.append({
                    'type': current_entity,
                    'text': tokenizer.convert_tokens_to_string(current_tokens)
                })
            current_entity = None
            current_tokens = []

    # Don't forget last entity
    if current_entity:
        entities.append({
            'type': current_entity,
            'text': tokenizer.convert_tokens_to_string(current_tokens)
        })
    return entities


# Example usage
text = "bật đèn phòng khách lúc 7 giờ tối"
entities = extract_entities(text)
print(f"Input: {text}")
print(f"Entities: {entities}")
```
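The BIO decoding rules used inside `extract_entities` can be exercised without downloading the model. The sketch below applies the same rules to plain token and label lists. `decode_bio` is a hypothetical helper, tokens are joined with spaces rather than through the tokenizer, and the labels are hand-written for illustration, not model predictions.

```python
def decode_bio(tokens, labels):
    """Group tokens into entities using the same BIO rules as extract_entities."""
    entities = []
    current_type = None
    current_tokens = []

    for token, label in zip(tokens, labels):
        if label.startswith("B-"):
            # A new entity begins; close the one in progress, if any.
            if current_type is not None:
                entities.append({"type": current_type, "text": " ".join(current_tokens)})
            current_type = label[2:]
            current_tokens = [token]
        elif label.startswith("I-") and current_type == label[2:]:
            # Continuation of the entity in progress.
            current_tokens.append(token)
        else:
            # 'O', or an 'I-' tag that does not match: close and reset.
            if current_type is not None:
                entities.append({"type": current_type, "text": " ".join(current_tokens)})
            current_type = None
            current_tokens = []

    # Close an entity that runs to the end of the sequence.
    if current_type is not None:
        entities.append({"type": current_type, "text": " ".join(current_tokens)})
    return entities


# Hand-written labels for illustration only.
tokens = ["bật", "đèn", "phòng", "khách", "lúc", "7", "giờ", "tối"]
labels = ["O", "B-device", "B-living_space", "I-living_space",
          "O", "B-time_at", "I-time_at", "I-time_at"]
decoded = decode_bio(tokens, labels)
print(decoded)
```

Note that an `I-` tag with no matching `B-` before it is discarded under these rules, which matches the behavior of the loop above.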
### Using Pipeline
```python
from transformers import pipeline

# Load NER pipeline
ner = pipeline(
    "token-classification",
    model="ntgiaky/phobert-ner-smart-home",
    aggregation_strategy="simple"
)

# Extract entities
result = ner("tắt quạt phòng ngủ sau 10 phút")
print(result)
```
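With `aggregation_strategy="simple"`, the pipeline returns a list of dictionaries carrying `entity_group`, `score`, and `word` keys. Downstream code usually wants a slot dictionary instead. The helper below sketches one way to build it; `to_slots` and its `min_score` threshold are hypothetical and not part of this model or of the Transformers API, and the sample input is hand-written in the pipeline's output shape rather than real model output.

```python
def to_slots(results, min_score=0.5):
    """Group pipeline results by entity type, dropping low-confidence spans."""
    slots = {}
    for item in results:
        # Scores may arrive as numpy floats; float() normalizes them.
        if float(item["score"]) < min_score:
            continue
        slots.setdefault(item["entity_group"], []).append(item["word"])
    return slots


# Hand-written sample, not real model output.
sample = [
    {"entity_group": "device", "score": 0.98, "word": "quạt", "start": None, "end": None},
    {"entity_group": "living_space", "score": 0.95, "word": "phòng ngủ", "start": None, "end": None},
    {"entity_group": "duration", "score": 0.31, "word": "10 phút", "start": None, "end": None},
]
slots = to_slots(sample)
print(slots)
```

Keeping a list per slot type lets a command that names two devices or two rooms survive the conversion.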
## Integration with Intent Classification
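For a complete NLU pipeline, run this model alongside an intent classifier. The snippet below covers only the slot-filling half: it loads the model and passes the PhoBERT tokenizer to the pipeline explicitly.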
```python
from transformers import AutoTokenizer, AutoModelForTokenClassification, pipeline

# Load with PhoBERT tokenizer explicitly
tokenizer = AutoTokenizer.from_pretrained("vinai/phobert-base")
model = AutoModelForTokenClassification.from_pretrained(
    "ntgiaky/phobert-ner-smart-home",
    ignore_mismatched_sizes=True  # Add this if needed
)

# Create pipeline with explicit tokenizer
ner = pipeline(
    "token-classification",
    model=model,
    tokenizer=tokenizer,
    aggregation_strategy="simple"
)

# Test
result = ner("bật đèn phòng khách")
print(result)
```
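This card does not describe the intent classifier's interface, so the sketch below treats both models as plain callables and shows only how their outputs might be merged into one structure. Everything here is an assumption for illustration: `NLUFrame`, `parse_command`, the stand-in functions, and the `turn_on_device` intent label are hypothetical. The entity dictionaries use the `type`/`text` shape returned by `extract_entities` in the Transformers example.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class NLUFrame:
    """Combined result of intent classification and slot filling."""
    text: str
    intent: str
    slots: Dict[str, List[str]] = field(default_factory=dict)


def parse_command(
    text: str,
    classify_intent: Callable[[str], str],
    extract_entities: Callable[[str], List[dict]],
) -> NLUFrame:
    """Run both models on the same text and merge their outputs."""
    intent = classify_intent(text)
    slots: Dict[str, List[str]] = {}
    for entity in extract_entities(text):
        slots.setdefault(entity["type"], []).append(entity["text"])
    return NLUFrame(text=text, intent=intent, slots=slots)


# Stand-ins for the real models (hypothetical outputs).
def fake_intent(text: str) -> str:
    return "turn_on_device"


def fake_entities(text: str) -> List[dict]:
    return [
        {"type": "device", "text": "đèn"},
        {"type": "living_space", "text": "phòng khách"},
    ]


frame = parse_command("bật đèn phòng khách", fake_intent, fake_entities)
print(frame)
```

Passing the two models in as callables keeps the merging logic testable without loading either one.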
## Example Outputs
```python
# Input: "bật đèn phòng khách"
# [{'entity_group': 'living_space', 'score': np.float32(0.97212785), 'word': 'đèn', 'start': None, 'end': None},
# {'entity_group': 'duration', 'score': np.float32(0.9332844), 'word': 'phòng khách', 'start': None, 'end': None}]
```
## Citation
If you use this model, please cite:
```bibtex
@misc{phobert-ner-smart-home-2025,
  author = {Trần Quang Huy and Nguyễn Trần Gia Kỳ},
  title = {PhoBERT Fine-tuned for Vietnamese Smart Home NER},
  year = {2025},
  publisher = {Hugging Face},
  journal = {Hugging Face Model Hub},
  howpublished = {\url{https://huggingface.co/ntgiaky/ner-smart-home}}
}
```
## Authors
- **Trần Quang Huy**
- **Nguyễn Trần Gia Kỳ**
## License
This model is released under the MIT License.