---
language: vi
tags:
- ner
- named-entity-recognition
- slot-filling
- smart-home
- vietnamese
- phobert
- token-classification
license: mit
datasets:
- custom-vn-slu-augmented
metrics:
- accuracy
- f1
- precision
- recall
model-index:
- name: PhoBERT NER for Vietnamese Smart Home Slot Filling
  results:
  - task:
      type: token-classification
      name: Named Entity Recognition
    dataset:
      name: VN-SLU Augmented Dataset
      type: custom
    metrics:
    - type: accuracy
      value: 96.64
      name: Accuracy
    - type: f1
      value: 86.55
      name: F1 Score (Weighted)
    - type: f1
      value: 67.04
      name: F1 Score (Macro)
widget:
- text: "bật đèn phòng khách"
- text: "tắt quạt phòng ngủ lúc 10 giờ tối"
- text: "điều chỉnh nhiệt độ điều hòa 25 độ"
- text: "mở cửa garage sau 5 phút"
---

# PhoBERT Fine-tuned for Vietnamese Smart Home NER/Slot Filling

This model is a fine-tuned version of [vinai/phobert-base](https://huggingface.co/vinai/phobert-base) for Named Entity Recognition (NER) in Vietnamese smart home commands. It extracts slot values such as devices, locations, times, and numeric values from user commands.

## Model Description

- **Base Model**: vinai/phobert-base
- **Task**: Token Classification / Slot Filling for Smart Home Commands
- **Language**: Vietnamese
- **Number of Labels**: 13 (6 entity types in BIO format, plus `O`)

## Intended Uses & Limitations

### Intended Uses
- Extracting entities from Vietnamese smart home voice commands
- Slot filling for voice assistant systems
- Integration with intent classification for a complete NLU pipeline
- Research in Vietnamese NLP for IoT applications

### Limitations
- Optimized specifically for the smart home domain
- May not generalize well to other domains
- Trained on Vietnamese data only
- Performs best when used with the corresponding intent classifier

## Entity Types (Slot Labels)

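The model predicts 13 labels: a `B-` (beginning) and an `I-` (inside) tag for each of six entity types, plus `O` for tokens outside any entity: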

1. `B-device` / `I-device` - Device names (e.g., "đèn", "quạt", "điều hòa")
2. `B-living_space` / `I-living_space` - Room/location names (e.g., "phòng khách", "phòng ngủ")
3. `B-time_at` / `I-time_at` - Specific times (e.g., "10 giờ tối", "7 giờ sáng")
4. `B-duration` / `I-duration` - Time durations (e.g., "5 phút", "2 giờ")
5. `B-target_number` / `I-target_number` - Target values (e.g., "25 độ", "50%")
6. `B-changing_value` / `I-changing_value` - Change amounts (e.g., "tăng 10%")
7. `O` - Outside/No entity
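
In the BIO scheme, `B-` marks the first token of an entity, `I-` marks a continuation of the same entity, and `O` marks everything else. The minimal sketch below builds the label set from the six entity types listed above and groups a tag sequence into spans using the same rule as `extract_entities` further down (an `I-` tag only extends a span of the same type). The helper `bio_to_spans` is hypothetical, the label order is illustrative (the model's real id-to-label order comes from its own mapping), and the example is hand-labeled and split on whitespace, not real model output.

```python
# Entity types from the list above; each one gets a B- and an I- tag.
ENTITY_TYPES = [
    "device",
    "living_space",
    "time_at",
    "duration",
    "target_number",
    "changing_value",
]

# Full label set: "O" plus B-/I- for every entity type (1 + 6 * 2 = 13 labels).
# Illustrative order only; the model's id -> label order comes from its mapping.
LABELS = ["O"] + [f"{prefix}-{etype}" for etype in ENTITY_TYPES for prefix in ("B", "I")]


def bio_to_spans(words, tags):
    """Group BIO tags into (entity_type, text) spans.

    An I- tag only extends a span of the same type; any other tag closes it.
    """
    spans = []
    current_type, current_words = None, []
    for word, tag in zip(words, tags):
        if tag.startswith("I-") and tag[2:] == current_type:
            current_words.append(word)  # continue the open span
            continue
        if current_type is not None:  # close the open span
            spans.append((current_type, " ".join(current_words)))
        if tag.startswith("B-"):  # open a new span
            current_type, current_words = tag[2:], [word]
        else:  # "O" or a stray I- tag: no open span
            current_type, current_words = None, []
    if current_type is not None:
        spans.append((current_type, " ".join(current_words)))
    return spans


# Hand-labeled illustration (not model output), split on whitespace
words = ["bật", "đèn", "phòng", "khách"]
tags = ["O", "B-device", "B-living_space", "I-living_space"]
print(len(LABELS))                # 13
print(bio_to_spans(words, tags))  # [('device', 'đèn'), ('living_space', 'phòng khách')]
```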

## How to Use

### Using Transformers Library

```python
from transformers import AutoTokenizer, AutoModelForTokenClassification
import torch
import json

# Load model and tokenizer
model_name = "ntgiaky/phobert-ner-smart-home"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForTokenClassification.from_pretrained(model_name)

# Load label mappings
with open('label_mappings.json', 'r') as f:
    label_mappings = json.load(f)
    id2label = {int(k): v for k, v in label_mappings['id2label'].items()}

def extract_entities(text):
    # Tokenize
    inputs = tokenizer(text, return_tensors="pt", padding=True, truncation=True)
    tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])
    
    # Predict
    with torch.no_grad():
        outputs = model(**inputs)
        predictions = torch.argmax(outputs.logits, dim=2)
    
    # Extract entities
    entities = []
    current_entity = None
    current_tokens = []
    
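    # Special tokens (<s>, </s>) are never part of an entity
    special_tokens = set(tokenizer.all_special_tokens)
    for token, pred_id in zip(tokens, predictions[0]):
        label = 'O' if token in special_tokens else id2label[pred_id.item()]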
        
        if label.startswith('B-'):
            # Save previous entity if exists
            if current_entity:
                entities.append({
                    'type': current_entity,
                    'text': tokenizer.convert_tokens_to_string(current_tokens)
                })
            # Start new entity
            current_entity = label[2:]
            current_tokens = [token]
        elif label.startswith('I-') and current_entity == label[2:]:
            # Continue current entity
            current_tokens.append(token)
        else:
            # End current entity
            if current_entity:
                entities.append({
                    'type': current_entity,
                    'text': tokenizer.convert_tokens_to_string(current_tokens)
                })
            current_entity = None
            current_tokens = []
    
    # Don't forget last entity
    if current_entity:
        entities.append({
            'type': current_entity,
            'text': tokenizer.convert_tokens_to_string(current_tokens)
        })
    
    return entities

# Example usage
text = "bật đèn phòng khách lúc 7 giờ tối"
entities = extract_entities(text)
print(f"Input: {text}")
print(f"Entities: {entities}")
```
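
PhoBERT's BPE tokenizer can split a word into several pieces, marking every non-final piece with a trailing `@@` (the marker that `tokenizer.convert_tokens_to_string` removes in the code above). If you need one label per word rather than per subword token, a minimal sketch is to merge the pieces and keep the label of each word's first piece. The first-piece convention and the helper name `merge_subwords` are assumptions, not part of this model's release, and the tokens below are hand-written for illustration.

```python
def merge_subwords(tokens, labels, special_tokens=("<s>", "</s>", "<pad>")):
    """Merge '@@'-marked BPE pieces into words.

    Each word takes the label of its first piece (an assumed convention).
    Special tokens are dropped.
    """
    words, word_labels = [], []
    buffer, buffer_label = "", None
    for token, label in zip(tokens, labels):
        if token in special_tokens:
            continue
        if buffer_label is None:
            buffer_label = label  # first piece decides the word's label
        if token.endswith("@@"):
            buffer += token[:-2]  # non-final piece: strip marker, keep going
        else:
            words.append(buffer + token)  # final piece completes the word
            word_labels.append(buffer_label)
            buffer, buffer_label = "", None
    if buffer:
        # Input ended mid-word (e.g. after truncation): keep the partial word
        words.append(buffer)
        word_labels.append(buffer_label)
    return words, word_labels


# Hand-written tokens and labels for illustration only
tokens = ["<s>", "mở", "cửa", "gar@@", "age", "</s>"]
labels = ["O", "O", "B-device", "I-device", "I-device", "O"]
print(merge_subwords(tokens, labels))
# (['mở', 'cửa', 'garage'], ['O', 'B-device', 'I-device'])
```

Inside `extract_entities`, the same helper could be fed `tokens` and `[id2label[p.item()] for p in predictions[0]]` before grouping the labels into entities.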

### Using Pipeline

```python
from transformers import pipeline

# Load NER pipeline
ner = pipeline(
    "token-classification",
    model="ntgiaky/phobert-ner-smart-home",
    aggregation_strategy="simple"
)

# Extract entities
result = ner("tắt quạt phòng ngủ sau 10 phút")
print(result)
```
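
With PhoBERT, `start` and `end` in the pipeline output may come back as `None` (as in the example outputs in this card), most likely because PhoBERT provides only a slow tokenizer that does not return offset mappings. If you need character positions, one workaround, sketched here under the assumption that each entity's `word` appears verbatim in the input text, is to search for the words left to right. The helper `add_offsets` and the sample entity dicts are illustrative, not real model output.

```python
def add_offsets(text, entities):
    """Return copies of pipeline entities with character offsets filled in.

    Words are searched left to right, so a repeated word maps to its next
    occurrence. Entities whose word is not found keep their original offsets.
    """
    cursor = 0
    result = []
    for ent in entities:
        ent = dict(ent)  # copy so the pipeline output is left untouched
        word = ent["word"].strip()
        idx = text.find(word, cursor)
        if word and idx != -1:
            ent["start"], ent["end"] = idx, idx + len(word)
            cursor = ent["end"]
        result.append(ent)
    return result


text = "tắt quạt phòng ngủ sau 10 phút"
# Hand-written entities shaped like the pipeline output (not real predictions)
entities = [
    {"entity_group": "device", "score": 0.98, "word": "quạt", "start": None, "end": None},
    {"entity_group": "living_space", "score": 0.95, "word": "phòng ngủ", "start": None, "end": None},
    {"entity_group": "duration", "score": 0.93, "word": "10 phút", "start": None, "end": None},
]
for ent in add_offsets(text, entities):
    print(ent["entity_group"], ent["start"], ent["end"])
```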

## Integration with Intent Classification

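In a complete NLU pipeline, this model runs alongside an intent classifier. The NER side can be set up by loading the PhoBERT tokenizer and the model explicitly and passing both to `pipeline`: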

```python
from transformers import AutoTokenizer, AutoModelForTokenClassification, pipeline

# Load with PhoBERT tokenizer explicitly
tokenizer = AutoTokenizer.from_pretrained("vinai/phobert-base")
model = AutoModelForTokenClassification.from_pretrained(
    "ntgiaky/phobert-ner-smart-home",
    ignore_mismatched_sizes=True  # only needed if loading fails on a shape mismatch; mismatched weights are then randomly re-initialized
)

# Create pipeline with explicit tokenizer
ner = pipeline(
    "token-classification",
    model=model,
    tokenizer=tokenizer,
    aggregation_strategy="simple"
)

# Test
result = ner("bật đèn phòng khách")
print(result)
```
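
This card does not name a specific intent classifier, so the following is only a sketch of how the two outputs might be combined. It assumes the intent model is wrapped in a Transformers `text-classification` pipeline, which returns a list of `{'label', 'score'}` dicts, and merges that with the entity list from the `ner` pipeline above into a single command frame. `build_command`, the `min_score` cut-off, and the intent label `turn_on_device` are hypothetical.

```python
def build_command(text, intent_result, ner_result, min_score=0.5):
    """Merge intent and NER outputs into one command frame.

    intent_result: list of {"label", "score"} dicts (text-classification pipeline shape).
    ner_result: list of entity dicts from the token-classification pipeline.
    min_score: hypothetical confidence cut-off for keeping an entity.
    """
    top_intent = max(intent_result, key=lambda r: r["score"])
    slots = {}
    for ent in ner_result:
        if ent["score"] < min_score:
            continue  # drop low-confidence entities
        # Group entity words by type; a type may occur more than once
        slots.setdefault(ent["entity_group"], []).append(ent["word"].strip())
    return {"text": text, "intent": top_intent["label"], "slots": slots}


# Hand-written outputs for illustration; "turn_on_device" is a hypothetical intent label
text = "bật đèn phòng khách"
intent_result = [{"label": "turn_on_device", "score": 0.97}]
ner_result = [
    {"entity_group": "device", "score": 0.96, "word": "đèn", "start": None, "end": None},
    {"entity_group": "living_space", "score": 0.94, "word": "phòng khách", "start": None, "end": None},
]
print(build_command(text, intent_result, ner_result))
```

Keeping every mention per entity type (rather than only the first) lets downstream logic handle commands that mention, say, two rooms.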

## Example Outputs

```python
# Input: "bật đèn phòng khách"
# [{'entity_group': 'living_space', 'score': np.float32(0.97212785), 'word': 'đèn', 'start': None, 'end': None},
# {'entity_group': 'duration', 'score': np.float32(0.9332844), 'word': 'phòng khách', 'start': None, 'end': None}]
```

## Citation

If you use this model, please cite:

```bibtex
@misc{phobert-ner-smart-home-2025,
  author = {Trần Quang Huy and Nguyễn Trần Gia Kỳ},
  title = {PhoBERT Fine-tuned for Vietnamese Smart Home NER},
  year = {2025},
  publisher = {Hugging Face},
  journal = {Hugging Face Model Hub},
  howpublished = {\url{https://huggingface.co/ntgiaky/phobert-ner-smart-home}}
}
```

## Authors

- **Trần Quang Huy** 
- **Nguyễn Trần Gia Kỳ** 

## License

This model is released under the MIT License.