	---
	language: vi
	tags:
	- ner
	- named-entity-recognition
	- slot-filling
	- smart-home
	- vietnamese
	- phobert
	- token-classification
	license: mit
	datasets:
	- custom-vn-slu-augmented
	metrics:
	- accuracy
	- f1
	- precision
	- recall
	model-index:
	- name: PhoBERT NER for Vietnamese Smart Home Slot Filling
	  results:
	  - task:
	      type: token-classification
	      name: Named Entity Recognition
	    dataset:
	      name: VN-SLU Augmented Dataset
	      type: custom
	    metrics:
	    - type: accuracy
	      value: 96.64
	      name: Accuracy
	    - type: f1
	      value: 86.55
	      name: F1 Score (Weighted)
	    - type: f1
	      value: 67.04
	      name: F1 Score (Macro)
	widget:
	- text: "bật đèn phòng khách"
	- text: "tắt quạt phòng ngủ lúc 10 giờ tối"
	- text: "điều chỉnh nhiệt độ điều hòa 25 độ"
	- text: "mở cửa garage sau 5 phút"
	---

	# PhoBERT Fine-tuned for Vietnamese Smart Home NER/Slot Filling

	This model is a fine-tuned version of [vinai/phobert-base](https://huggingface.co/vinai/phobert-base) for Named Entity Recognition (NER) in Vietnamese smart home commands. It extracts slot values such as devices, locations, times, and numeric values from user commands.

	## Model Description

	- Base Model: vinai/phobert-base
	- Task: Token Classification / Slot Filling for Smart Home Commands
	- Language: Vietnamese
	- Number of Labels: 13 (6 entity types, each with a `B-` and an `I-` tag, plus `O`)

	## Intended Uses & Limitations

	### Intended Uses
	- Extracting entities from Vietnamese smart home voice commands
	- Slot filling for voice assistant systems
	- Integration with intent classification for complete NLU pipeline
	- Research in Vietnamese NLP for IoT applications

	### Limitations
	- Optimized specifically for smart home domain
	- May not generalize well to other domains
	- Trained on Vietnamese language only
	- Best performance when used with corresponding intent classifier

	## Entity Types (Slot Labels)

	The model predicts 13 labels in the BIO scheme: a `B-`/`I-` pair for each of 6 entity types, plus `O`:

	1. `B-device` / `I-device` - Device names (e.g., "đèn" (light), "quạt" (fan), "điều hòa" (air conditioner))
	2. `B-living_space` / `I-living_space` - Room/location names (e.g., "phòng khách" (living room), "phòng ngủ" (bedroom))
	3. `B-time_at` / `I-time_at` - Specific times (e.g., "10 giờ tối" (10 at night), "7 giờ sáng" (7 in the morning))
	4. `B-duration` / `I-duration` - Time durations (e.g., "5 phút" (5 minutes), "2 giờ" (2 hours))
	5. `B-target_number` / `I-target_number` - Target values (e.g., "25 độ" (25 degrees), "50%")
	6. `B-changing_value` / `I-changing_value` - Change amounts (e.g., "tăng 10%" (increase by 10%))
	7. `O` - Outside/No entity
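
The list above names six slot types. Under the BIO scheme each one contributes a `B-` and an `I-` tag, which together with `O` gives 13 labels. A minimal sketch of that count follows; `build_label_set` is a hypothetical helper, and the ordering it produces is illustrative only, since the model's actual id-to-label mapping is defined by its own label files.

```python
# The six slot types listed in this section.
ENTITY_TYPES = [
    "device",
    "living_space",
    "time_at",
    "duration",
    "target_number",
    "changing_value",
]

def build_label_set(entity_types):
    """Return the BIO label list: 'O' plus a B-/I- pair per entity type."""
    labels = ["O"]
    for entity_type in entity_types:
        labels.append(f"B-{entity_type}")
        labels.append(f"I-{entity_type}")
    return labels

# Assumption: this ordering is for illustration and may differ from the
# id order the model was trained with.
labels = build_label_set(ENTITY_TYPES)
print(len(labels))  # 13
```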

	## How to Use

	### Using Transformers Library

	```python
	from transformers import AutoTokenizer, AutoModelForTokenClassification
	import torch
	import json

	# Load model and tokenizer
	model_name = "ntgiaky/phobert-ner-smart-home"
	tokenizer = AutoTokenizer.from_pretrained(model_name)
	model = AutoModelForTokenClassification.from_pretrained(model_name)

	# Load label mappings (expects label_mappings.json in the working directory)
	with open('label_mappings.json', 'r', encoding='utf-8') as f:
	    label_mappings = json.load(f)
	id2label = {int(k): v for k, v in label_mappings['id2label'].items()}

	def extract_entities(text):
	    # Tokenize
	    inputs = tokenizer(text, return_tensors="pt", padding=True, truncation=True)
	    tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])

	    # Predict
	    with torch.no_grad():
	        outputs = model(**inputs)
	    predictions = torch.argmax(outputs.logits, dim=2)

	    # Extract entities
	    entities = []
	    current_entity = None
	    current_tokens = []

	    for token, pred_id in zip(tokens, predictions[0]):
	        # Skip special tokens such as <s> and </s> so they never end up in an entity
	        if token in tokenizer.all_special_tokens:
	            continue
	        label = id2label[pred_id.item()]

	        if label.startswith('B-'):
	            # Save previous entity if exists
	            if current_entity:
	                entities.append({
	                    'type': current_entity,
	                    'text': tokenizer.convert_tokens_to_string(current_tokens)
	                })
	            # Start new entity
	            current_entity = label[2:]
	            current_tokens = [token]
	        elif label.startswith('I-') and current_entity == label[2:]:
	            # Continue current entity
	            current_tokens.append(token)
	        else:
	            # End current entity ('O', or an 'I-' tag that does not match it)
	            if current_entity:
	                entities.append({
	                    'type': current_entity,
	                    'text': tokenizer.convert_tokens_to_string(current_tokens)
	                })
	            current_entity = None
	            current_tokens = []

	    # Don't forget last entity
	    if current_entity:
	        entities.append({
	            'type': current_entity,
	            'text': tokenizer.convert_tokens_to_string(current_tokens)
	        })

	    return entities

	# Example usage
	text = "bật đèn phòng khách lúc 7 giờ tối"
	entities = extract_entities(text)
	print(f"Input: {text}")
	print(f"Entities: {entities}")
	```
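
`extract_entities` relies on `tokenizer.convert_tokens_to_string` to turn subword pieces back into readable text. As a rough illustration of what that step does, the sketch below assumes the `@@` continuation marker used by PhoBERT's BPE vocabulary. `merge_bpe_tokens` is a hypothetical helper, not part of the library, and the token split in the example is made up for illustration rather than copied from real tokenizer output.

```python
def merge_bpe_tokens(tokens, marker="@@"):
    """Join BPE pieces into words.

    A piece that ends with `marker` continues into the next piece;
    a piece without it closes the current word.
    """
    words = []
    current = ""
    for token in tokens:
        if token.endswith(marker):
            # Strip the marker and keep building the same word
            current += token[: -len(marker)]
        else:
            words.append(current + token)
            current = ""
    if current:
        # A trailing piece whose continuation never arrived
        words.append(current)
    return " ".join(words)

# Hypothetical split of "garage" into three pieces
merged = merge_bpe_tokens(["ga@@", "ra@@", "ge", "sau", "5", "phút"])
print(merged)  # garage sau 5 phút
```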

	### Using Pipeline

	```python
	from transformers import pipeline

	# Load NER pipeline
	ner = pipeline(
	    "token-classification",
	    model="ntgiaky/phobert-ner-smart-home",
	    aggregation_strategy="simple"
	)

	# Extract entities
	result = ner("tắt quạt phòng ngủ sau 10 phút")
	print(result)
	```

	## Integration with Intent Classification

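	For a complete NLU pipeline, run this model alongside an intent classifier. The NER side can be loaded with the PhoBERT tokenizer passed explicitly: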

	```python
	from transformers import AutoTokenizer, AutoModelForTokenClassification, pipeline

	# Load with PhoBERT tokenizer explicitly
	tokenizer = AutoTokenizer.from_pretrained("vinai/phobert-base")
	model = AutoModelForTokenClassification.from_pretrained(
	    "ntgiaky/phobert-ner-smart-home",
	    # Add this only if loading fails with a size mismatch;
	    # mismatched weights are then newly initialized, not loaded
	    ignore_mismatched_sizes=True
	)

	# Create pipeline with explicit tokenizer
	ner = pipeline(
	    "token-classification",
	    model=model,
	    tokenizer=tokenizer,
	    aggregation_strategy="simple"
	)

	# Test
	result = ner("bật đèn phòng khách")
	print(result)
	```
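
The NER output still has to be joined with an intent label to form one NLU result. The sketch below shows one possible way to do that, under stated assumptions: `build_nlu_frame` is a hypothetical helper, the intent label `"turn_off_device"` is invented for the example, and the entity list is hand-written in the shape the pipeline returns with `aggregation_strategy="simple"`.

```python
from collections import defaultdict

def build_nlu_frame(intent, entities):
    """Combine an intent label with NER pipeline output into one frame.

    `intent` is whatever label a separate intent classifier returns.
    `entities` is a list of dicts with 'entity_group' and 'word' keys.
    """
    slots = defaultdict(list)
    for entity in entities:
        # A slot type can occur more than once, so collect values in a list
        slots[entity["entity_group"]].append(entity["word"])
    return {"intent": intent, "slots": dict(slots)}

# Hand-written entities standing in for real pipeline output
entities = [
    {"entity_group": "device", "word": "quạt"},
    {"entity_group": "living_space", "word": "phòng ngủ"},
    {"entity_group": "duration", "word": "10 phút"},
]
frame = build_nlu_frame("turn_off_device", entities)
print(frame)
```

Keeping slot values in lists avoids silently dropping a second device or room mentioned in the same command.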

	## Example Outputs

	```python
	# Input: "bật đèn phòng khách"
	# [{'entity_group': 'living_space', 'score': np.float32(0.97212785), 'word': 'đèn', 'start': None, 'end': None},
	# {'entity_group': 'duration', 'score': np.float32(0.9332844), 'word': 'phòng khách', 'start': None, 'end': None}]
	```
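
The scores in this output are `np.float32` values, which the standard `json` module cannot serialize directly. A minimal sketch of converting results to plain Python types before sending them to another service is given below. `to_serializable` is a hypothetical helper, the `min_score` threshold of 0.5 is an arbitrary choice for the example, and the sample entries and scores are made up.

```python
import json

def to_serializable(results, min_score=0.0):
    """Convert pipeline results to plain Python types, dropping low-score entries."""
    cleaned = []
    for item in results:
        score = float(item["score"])  # np.float32 -> built-in float
        if score < min_score:
            continue
        cleaned.append({
            "entity_group": item["entity_group"],
            "score": round(score, 4),
            "word": item["word"],
        })
    return cleaned

# Hand-written sample in the same shape as pipeline output (scores are made up)
sample = [
    {"entity_group": "device", "score": 0.91, "word": "quạt", "start": None, "end": None},
    {"entity_group": "duration", "score": 0.42, "word": "10 phút", "start": None, "end": None},
]
payload = json.dumps(to_serializable(sample, min_score=0.5), ensure_ascii=False)
print(payload)
```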

	## Citation

	If you use this model, please cite:

	```bibtex
	@misc{phobert-ner-smart-home-2025,
	  author = {Trần Quang Huy and Nguyễn Trần Gia Kỳ},
	  title = {PhoBERT Fine-tuned for Vietnamese Smart Home NER},
	  year = {2025},
	  publisher = {Hugging Face},
	  journal = {Hugging Face Model Hub},
	  howpublished = {\url{https://huggingface.co/ntgiaky/ner-smart-home}}
	}
	```

	## Authors

	- Trần Quang Huy
	- Nguyễn Trần Gia Kỳ

	## License

	This model is released under the MIT License.