---
language: vi
tags:
- ner
- named-entity-recognition
- slot-filling
- smart-home
- vietnamese
- phobert
- token-classification
license: mit
datasets:
- custom-vn-slu-augmented
metrics:
- accuracy
- f1
- precision
- recall
model-index:
- name: PhoBERT NER for Vietnamese Smart Home Slot Filling
  results:
  - task:
      type: token-classification
      name: Named Entity Recognition
    dataset:
      name: VN-SLU Augmented Dataset
      type: custom
    metrics:
    - type: accuracy
      value: 96.64
      name: Accuracy
    - type: f1
      value: 86.55
      name: F1 Score (Weighted)
    - type: f1
      value: 67.04
      name: F1 Score (Macro)
widget:
- text: "bật đèn phòng khách"
- text: "tắt quạt phòng ngủ lúc 10 giờ tối"
- text: "điều chỉnh nhiệt độ điều hòa 25 độ"
- text: "mở cửa garage sau 5 phút"
---

# PhoBERT Fine-tuned for Vietnamese Smart Home NER/Slot Filling

This model is a fine-tuned version of [vinai/phobert-base](https://huggingface.co/vinai/phobert-base) for Named Entity Recognition (NER) in Vietnamese smart home commands. It extracts slot values such as devices, locations, times, and numeric values from user commands.

## Model Description

- **Base Model**: vinai/phobert-base
- **Task**: Token Classification / Slot Filling for Smart Home Commands
- **Language**: Vietnamese
- **Number of Labels**: 13 (6 entity types in BIO format, plus `O`)

## Intended Uses & Limitations

### Intended Uses
- Extracting entities from Vietnamese smart home voice commands
- Slot filling for voice assistant systems
- Integration with intent classification for a complete NLU pipeline
- Research in Vietnamese NLP for IoT applications

### Limitations
- Optimized specifically for the smart home domain
- May not generalize well to other domains
- Trained on Vietnamese only
- Performs best when used with the corresponding intent classifier

## Entity Types (Slot Labels)

The model predicts 13 labels: `B-`/`I-` tags for each of 6 entity types, plus `O`:

1. `B-device` / `I-device` - Device names (e.g., "đèn", "quạt", "điều hòa")
2. `B-living_space` / `I-living_space` - Room/location names (e.g., "phòng khách", "phòng ngủ")
3. `B-time_at` / `I-time_at` - Specific times (e.g., "10 giờ tối", "7 giờ sáng")
4. `B-duration` / `I-duration` - Time durations (e.g., "5 phút", "2 giờ")
5. `B-target_number` / `I-target_number` - Target values (e.g., "25 độ", "50%")
6. `B-changing_value` / `I-changing_value` - Change amounts (e.g., "tăng 10%")
7. `O` - Outside/No entity
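
The labels above follow the BIO scheme: each entity type contributes a `B-` (begin) and an `I-` (inside) tag, and `O` marks tokens outside any entity. A minimal sketch of how the six listed types expand into 13 labels; the ordering and ids here are illustrative assumptions and may differ from the `id2label` mapping shipped with the model:

```python
# Entity types as listed in this model card.
ENTITY_TYPES = [
    "device",
    "living_space",
    "time_at",
    "duration",
    "target_number",
    "changing_value",
]

def build_bio_labels(entity_types):
    """Expand entity types into BIO tags: O, then B-x / I-x per type."""
    labels = ["O"]
    for entity_type in entity_types:
        labels.append(f"B-{entity_type}")
        labels.append(f"I-{entity_type}")
    return labels

labels = build_bio_labels(ENTITY_TYPES)
# Illustrative id mapping only; the model's real mapping lives in its config.
label2id = {label: idx for idx, label in enumerate(labels)}
print(len(labels))  # 13
```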

## How to Use

### Using Transformers Library

```python
from transformers import AutoTokenizer, AutoModelForTokenClassification
import torch
import json

# Load model and tokenizer
model_name = "ntgiaky/phobert-ner-smart-home"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForTokenClassification.from_pretrained(model_name)

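# Load label mappings (assumes label_mappings.json is in the working directory)
with open('label_mappings.json', 'r', encoding='utf-8') as f:
    label_mappings = json.load(f)
    # JSON keys are strings; convert them back to integer label ids
    id2label = {int(k): v for k, v in label_mappings['id2label'].items()}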

def extract_entities(text):
    # Tokenize
    inputs = tokenizer(text, return_tensors="pt", padding=True, truncation=True)
    tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])
    
    # Predict
    with torch.no_grad():
        outputs = model(**inputs)
        predictions = torch.argmax(outputs.logits, dim=2)
    
    # Extract entities
    entities = []
    current_entity = None
    current_tokens = []
    
    for token, pred_id in zip(tokens, predictions[0]):
        label = id2label[pred_id.item()]
        
        if label.startswith('B-'):
            # Save previous entity if exists
            if current_entity:
                entities.append({
                    'type': current_entity,
                    'text': tokenizer.convert_tokens_to_string(current_tokens)
                })
            # Start new entity
            current_entity = label[2:]
            current_tokens = [token]
        elif label.startswith('I-') and current_entity == label[2:]:
            # Continue current entity
            current_tokens.append(token)
        else:
            # End current entity
            if current_entity:
                entities.append({
                    'type': current_entity,
                    'text': tokenizer.convert_tokens_to_string(current_tokens)
                })
            current_entity = None
            current_tokens = []
    
    # Don't forget last entity
    if current_entity:
        entities.append({
            'type': current_entity,
            'text': tokenizer.convert_tokens_to_string(current_tokens)
        })
    
    return entities

# Example usage
text = "bật đèn phòng khách lúc 7 giờ tối"
entities = extract_entities(text)
print(f"Input: {text}")
print(f"Entities: {entities}")
```

### Using Pipeline

```python
from transformers import pipeline

# Load NER pipeline
ner = pipeline(
    "token-classification",
    model="ntgiaky/phobert-ner-smart-home",
    aggregation_strategy="simple"
)

# Extract entities
result = ner("tắt quạt phòng ngủ sau 10 phút")
print(result)
```
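
With `aggregation_strategy="simple"`, the pipeline returns a list of dicts with `entity_group`, `score`, and `word` keys. A minimal sketch of turning that list into a slot dictionary; the helper name `to_slots` and the `min_score` threshold of 0.5 are illustrative assumptions, not part of the model or of Transformers:

```python
from collections import defaultdict

def to_slots(ner_output, min_score=0.5):
    """Group aggregated pipeline entities by type, dropping low-confidence ones."""
    slots = defaultdict(list)
    for entity in ner_output:
        if float(entity["score"]) >= min_score:
            slots[entity["entity_group"]].append(entity["word"])
    return dict(slots)

# Hand-written sample in the pipeline's output format (not real model output).
sample = [
    {"entity_group": "device", "score": 0.98, "word": "quạt"},
    {"entity_group": "living_space", "score": 0.95, "word": "phòng ngủ"},
    {"entity_group": "duration", "score": 0.31, "word": "10 phút"},
]
print(to_slots(sample))
# {'device': ['quạt'], 'living_space': ['phòng ngủ']}
```

Keeping a list per slot type preserves repeated entities, such as two devices in one command.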

## Integration with Intent Classification

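In a complete NLU pipeline, this model runs alongside an intent classifier. The snippet below sets up the slot-filling half with the PhoBERT tokenizer loaded explicitly; the intent classifier is loaded separately: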

```python
from transformers import AutoTokenizer, AutoModelForTokenClassification, pipeline

# Load with PhoBERT tokenizer explicitly
tokenizer = AutoTokenizer.from_pretrained("vinai/phobert-base")
model = AutoModelForTokenClassification.from_pretrained(
    "ntgiaky/phobert-ner-smart-home",
    ignore_mismatched_sizes=True  # Only if loading fails on a size mismatch; mismatched weights are re-initialized
)

# Create pipeline with explicit tokenizer
ner = pipeline(
    "token-classification",
    model=model,
    tokenizer=tokenizer,
    aggregation_strategy="simple"
)

# Test
result = ner("bật đèn phòng khách")
print(result)
```
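
The snippet above covers only the slot-filling half. A minimal sketch of merging its output with an intent prediction into a single NLU frame; the intent label `turn_on`, the function name `build_frame`, and the frame layout are hypothetical, since the intent classifier is a separate model not described in this card:

```python
def build_frame(text, intent, entities):
    """Merge an intent label and aggregated NER entities into one NLU frame."""
    slots = {}
    for entity in entities:
        # Keep the first value seen for each slot type.
        slots.setdefault(entity["entity_group"], entity["word"])
    return {"text": text, "intent": intent, "slots": slots}

# Hand-written inputs for illustration (not real model output).
entities = [
    {"entity_group": "device", "score": 0.97, "word": "đèn"},
    {"entity_group": "living_space", "score": 0.93, "word": "phòng khách"},
]
frame = build_frame("bật đèn phòng khách", "turn_on", entities)
print(frame["intent"], frame["slots"])
```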

## Example Outputs

```python
# Input: "bật đèn phòng khách"
# [{'entity_group': 'living_space', 'score': np.float32(0.97212785), 'word': 'đèn', 'start': None, 'end': None},
# {'entity_group': 'duration', 'score': np.float32(0.9332844), 'word': 'phòng khách', 'start': None, 'end': None}]
```

## Citation

If you use this model, please cite:

```bibtex
@misc{phobert-ner-smart-home-2025,
  author = {Trần Quang Huy and Nguyễn Trần Gia Kỳ},
  title = {PhoBERT Fine-tuned for Vietnamese Smart Home NER},
  year = {2025},
  publisher = {Hugging Face},
  journal = {Hugging Face Model Hub},
  howpublished = {\url{https://huggingface.co/ntgiaky/ner-smart-home}}
}
```

## Authors

- **Trần Quang Huy** 
- **Nguyễn Trần Gia Kỳ** 

## License

This model is released under the MIT License.