---
language: vi
tags:
- ner
- named-entity-recognition
- slot-filling
- smart-home
- vietnamese
- phobert
- token-classification
license: mit
datasets:
- custom-vn-slu-augmented
metrics:
- accuracy
- f1
- precision
- recall
model-index:
- name: PhoBERT NER for Vietnamese Smart Home Slot Filling
  results:
  - task:
      type: token-classification
      name: Named Entity Recognition
    dataset:
      name: VN-SLU Augmented Dataset
      type: custom
    metrics:
    - type: accuracy
      value: 96.64
      name: Accuracy
    - type: f1
      value: 86.55
      name: F1 Score (Weighted)
    - type: f1
      value: 67.04
      name: F1 Score (Macro)
widget:
- text: "bật đèn phòng khách"
- text: "tắt quạt phòng ngủ lúc 10 giờ tối"
- text: "điều chỉnh nhiệt độ điều hòa 25 độ"
- text: "mở cửa garage sau 5 phút"
---

# PhoBERT Fine-tuned for Vietnamese Smart Home NER/Slot Filling

This model is a fine-tuned version of [vinai/phobert-base](https://huggingface.co/vinai/phobert-base) for Named Entity Recognition (NER) in Vietnamese smart home commands. It extracts slot values such as devices, locations, times, and numeric values from user commands.

## Model Description

- **Base Model**: vinai/phobert-base
- **Task**: Token Classification / Slot Filling for Smart Home Commands
- **Language**: Vietnamese
- **Number of Labels**: 13 (six entity types in BIO format, plus `O`)

## Intended Uses & Limitations

### Intended Uses

- Extracting entities from Vietnamese smart home voice commands
- Slot filling for voice assistant systems
- Integration with intent classification for a complete NLU pipeline
- Research in Vietnamese NLP for IoT applications

### Limitations

- Optimized specifically for the smart home domain
- May not generalize well to other domains
- Trained on Vietnamese only
- Performs best when used with the corresponding intent classifier

## Entity Types (Slot Labels)

The model predicts 13 labels: `B-`/`I-` tags for six entity types, plus `O`:

1. `B-device` / `I-device` - Device names (e.g., "đèn", "quạt", "điều hòa")
2. `B-living_space` / `I-living_space` - Room/location names (e.g., "phòng khách", "phòng ngủ")
3. `B-time_at` / `I-time_at` - Specific times (e.g., "10 giờ tối", "7 giờ sáng")
4. `B-duration` / `I-duration` - Time durations (e.g., "5 phút", "2 giờ")
5. `B-target_number` / `I-target_number` - Target values (e.g., "25 độ", "50%")
6. `B-changing_value` / `I-changing_value` - Change amounts (e.g., "tăng 10%")
7. `O` - Outside/No entity

## How to Use

### Using Transformers Library

```python
from transformers import AutoTokenizer, AutoModelForTokenClassification
import torch
import json

# Load model and tokenizer
model_name = "ntgiaky/phobert-ner-smart-home"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForTokenClassification.from_pretrained(model_name)

# Load label mappings (expects label_mappings.json in the working directory)
with open('label_mappings.json', 'r', encoding='utf-8') as f:
    label_mappings = json.load(f)
id2label = {int(k): v for k, v in label_mappings['id2label'].items()}

def extract_entities(text):
    # Tokenize
    inputs = tokenizer(text, return_tensors="pt", padding=True, truncation=True)
    tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])

    # Predict one label id per token
    with torch.no_grad():
        outputs = model(**inputs)
    predictions = torch.argmax(outputs.logits, dim=2)

    # Extract entities by walking the BIO labels
    entities = []
    current_entity = None
    current_tokens = []

    for token, pred_id in zip(tokens, predictions[0]):
        label = id2label[pred_id.item()]

        if label.startswith('B-'):
            # Save previous entity if it exists
            if current_entity:
                entities.append({
                    'type': current_entity,
                    'text': tokenizer.convert_tokens_to_string(current_tokens)
                })
            # Start new entity
            current_entity = label[2:]
            current_tokens = [token]
        elif label.startswith('I-') and current_entity == label[2:]:
            # Continue current entity
            current_tokens.append(token)
        else:
            # End current entity ('O' or an 'I-' tag of a different type)
            if current_entity:
                entities.append({
                    'type': current_entity,
                    'text': tokenizer.convert_tokens_to_string(current_tokens)
                })
            current_entity = None
            current_tokens = []

    # Don't forget the last entity
    if current_entity:
        entities.append({
            'type': current_entity,
            'text': tokenizer.convert_tokens_to_string(current_tokens)
        })

    return entities

# Example usage
text = "bật đèn phòng khách lúc 7 giờ tối"
entities = extract_entities(text)
print(f"Input: {text}")
print(f"Entities: {entities}")
```

### Using Pipeline

```python
from transformers import pipeline

# Load NER pipeline
ner = pipeline(
    "token-classification",
    model="ntgiaky/phobert-ner-smart-home",
    aggregation_strategy="simple"
)

# Extract entities
result = ner("tắt quạt phòng ngủ sau 10 phút")
print(result)
```

## Integration with Intent Classification

For a complete NLU pipeline, run this model alongside an intent classifier. The snippet below covers the slot-filling side, loading the PhoBERT tokenizer explicitly:

```python
from transformers import AutoTokenizer, AutoModelForTokenClassification, pipeline

# Load with PhoBERT tokenizer explicitly
tokenizer = AutoTokenizer.from_pretrained("vinai/phobert-base")
model = AutoModelForTokenClassification.from_pretrained(
    "ntgiaky/phobert-ner-smart-home",
    # Add this only if loading fails with a size mismatch;
    # mismatched weights are then newly initialized
    ignore_mismatched_sizes=True
)

# Create pipeline with explicit tokenizer
ner = pipeline(
    "token-classification",
    model=model,
    tokenizer=tokenizer,
    aggregation_strategy="simple"
)

# Test
result = ner("bật đèn phòng khách")
print(result)
```

## Example Outputs

```python
# Input: "bật đèn phòng khách"
# [{'entity_group': 'living_space', 'score': np.float32(0.97212785), 'word': 'đèn', 'start': None, 'end': None},
#  {'entity_group': 'duration', 'score': np.float32(0.9332844), 'word': 'phòng khách', 'start': None, 'end': None}]
```

## Citation

If you use this model, please cite:

```bibtex
@misc{phobert-ner-smart-home-2025,
  author = {Trần Quang Huy and Nguyễn Trần Gia Kỳ},
  title = {PhoBERT Fine-tuned for Vietnamese Smart Home NER},
  year = {2025},
  publisher = {Hugging Face},
  journal = {Hugging Face Model Hub},
  howpublished = {\url{https://huggingface.co/ntgiaky/ner-smart-home}}
}
```

## Authors

- **Trần Quang Huy**
- **Nguyễn Trần Gia Kỳ**

## License

This model is released under the MIT License.
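
## Appendix: Combining Slots with an Intent Label

The integration section shows only the slot-filling side of an NLU pipeline. As a supplement, here is a minimal sketch of how the aggregated output of the `token-classification` pipeline could be merged with an intent label into a single command frame. The function name `build_frame`, the intent label `turn_on_device`, and the `min_score` threshold are hypothetical and not part of this model. The intent would come from a separate classifier that is not shown here, and the sample entities are hand-written rather than real model output.

```python
from collections import defaultdict
from typing import Any, Dict, List


def build_frame(intent: str, entities: List[Dict[str, Any]],
                min_score: float = 0.5) -> Dict[str, Any]:
    """Merge an intent label and pipeline entities into one frame.

    `entities` follows the shape returned by the token-classification
    pipeline with aggregation_strategy="simple": each item carries
    'entity_group', 'score', and 'word' keys.
    """
    slots: Dict[str, List[str]] = defaultdict(list)
    for ent in entities:
        # Skip low-confidence spans (the threshold is an illustrative assumption).
        if float(ent["score"]) < min_score:
            continue
        # Group slot values by entity type, preserving their order.
        slots[ent["entity_group"]].append(ent["word"].strip())
    return {"intent": intent, "slots": dict(slots)}


# Hand-written entities in the pipeline's output shape (not real model output).
sample_entities = [
    {"entity_group": "device", "score": 0.98, "word": "đèn"},
    {"entity_group": "living_space", "score": 0.95, "word": "phòng khách"},
    {"entity_group": "time_at", "score": 0.30, "word": "7 giờ"},
]

# "turn_on_device" is a hypothetical intent label from a separate classifier.
frame = build_frame("turn_on_device", sample_entities)
print(frame)
```

Grouping values into lists per entity type keeps commands that mention several devices or rooms intact. In practice the `entities` argument would be the `result` list returned by the `ner` pipeline.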