---
license: mit
language: en
base_model: microsoft/phi-2
tags:
- text-generation
- voice-assistant
- automotive
- fine-tuned
- peft
- lora
datasets:
- synthetic
widget:
- text: "Navigate to the nearest EV charging station."
- text: "Set the temperature to 22 degrees."
---
# 🚗 Fine-tuned MBUX Voice Assistant (phi-2)
This repository contains a fine-tuned version of Microsoft's **`microsoft/phi-2`** model, specifically adapted to function as an in-car voice assistant similar to MBUX. The model is trained to understand and respond to common automotive commands.
This model was created as part of an end-to-end MLOps project, from data creation and fine-tuning to deployment in an interactive application.
## ✨ Live Demo
You can interact with this model in a live, voice-to-voice application on Hugging Face Spaces:
**➡️ [Live MBUX Gradio Demo](https://huggingface.co/spaces/MrunangG/mbux-gradio-demo)**
---
## 📝 Model Details
* **Base Model:** `microsoft/phi-2`
* **Fine-tuning Method:** Parameter-Efficient Fine-Tuning (PEFT) using LoRA.
* **Training Data:** A synthetic, instruction-based dataset of in-car commands covering navigation, climate control, media, and vehicle settings.
* **Frameworks:** PyTorch, Transformers, PEFT, TRL.
### Intended Use
This model is a proof of concept built for demonstration purposes. It is intended to serve as the "brain" of a voice-assistant application in an automotive context, handling commands such as the following (a prompt-format sketch follows the list):
* "Navigate to the office."
* "Set the fan speed to maximum."
* "Play my 'Morning Commute' playlist."
---
## 🚀 How to Use
While the model's core function is text generation, its primary intended use is within a full voice-to-voice pipeline.
### Interactive Voice Demo
For the complete, interactive experience including Speech-to-Text and Text-to-Speech, please visit the live application hosted on Hugging Face Spaces:
**➡️ [Live MBUX Gradio Demo](https://huggingface.co/spaces/MrunangG/mbux-gradio-demo)**
### Programmatic Use (Text-Only)
The following Python code shows how to use the fine-tuned model for its core text-generation task programmatically.
```python
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer
# Define the model repository IDs
base_model_id = "microsoft/phi-2"
peft_model_id = "MrunangG/phi-2-mbux-assistant"
# Set device
device = "cuda" if torch.cuda.is_available() else "cpu"
# Load the base model
base_model = AutoModelForCausalLM.from_pretrained(
base_model_id,
trust_remote_code=True,
torch_dtype=torch.float16,
device_map={"": device}
)
# Load the tokenizer
tokenizer = AutoTokenizer.from_pretrained(base_model_id, trust_remote_code=True)
if tokenizer.pad_token is None:
tokenizer.pad_token = tokenizer.eos_token
# Load the LoRA adapter weights on top of the base model
model = PeftModel.from_pretrained(base_model, peft_model_id)
model.eval()
# --- Inference ---
prompt = "Set the temperature to 21 degrees."
formatted_prompt = f"[INST] {prompt} [/INST]"
inputs = tokenizer(formatted_prompt, return_tensors="pt").to(device)
with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=50, pad_token_id=tokenizer.pad_token_id)
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
cleaned_response = response.split('[/INST]')[-1].strip()
print(cleaned_response)
# Expected output: Okay, setting the cabin temperature to 21 degrees.
```
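For deployment, you can optionally fuse the LoRA weights into the base model with PEFT's `merge_and_unload()`, which removes the adapter indirection and lets you save and reload the result like an ordinary `transformers` checkpoint. The output directory below is a placeholder:
```python
# Fuse the LoRA weights into the base model for adapter-free inference.
merged_model = model.merge_and_unload()

# Save the merged weights and tokenizer ("phi-2-mbux-merged" is a placeholder path).
merged_model.save_pretrained("phi-2-mbux-merged")
tokenizer.save_pretrained("phi-2-mbux-merged")
```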
---
## 🛠️ Training Procedure
The model was fine-tuned using the `SFTTrainer` from the TRL library. Key training parameters included a learning rate of `2e-4`, the `paged_adamw_8bit` optimizer, and 4-bit quantization to ensure efficient training on consumer hardware.
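The exact training script is not part of this card, but a minimal sketch of the setup described above could look like the following. Only the learning rate, optimizer, and 4-bit quantization are stated here; the LoRA rank and alpha, batch size, epoch count, and the `train.jsonl` dataset path are illustrative assumptions:
```python
import torch
from datasets import load_dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from trl import SFTConfig, SFTTrainer

# 4-bit quantization (QLoRA-style) so the 2.7B base model trains on consumer GPUs.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

model = AutoModelForCausalLM.from_pretrained(
    "microsoft/phi-2",
    quantization_config=bnb_config,
    trust_remote_code=True,
)
tokenizer = AutoTokenizer.from_pretrained("microsoft/phi-2", trust_remote_code=True)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

# LoRA hyperparameters: r/alpha/dropout here are assumptions, not from the card.
peft_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)

# Placeholder path; each record is expected to hold the full "[INST] ... [/INST] ..."
# string in a "text" field, which SFTTrainer reads by default.
dataset = load_dataset("json", data_files="train.jsonl", split="train")

training_args = SFTConfig(
    output_dir="phi-2-mbux-assistant",
    learning_rate=2e-4,             # stated in this card
    optim="paged_adamw_8bit",       # stated in this card
    per_device_train_batch_size=4,  # assumption
    num_train_epochs=3,             # assumption
)

trainer = SFTTrainer(
    model=model,
    args=training_args,
    train_dataset=dataset,
    processing_class=tokenizer,
    peft_config=peft_config,
)
trainer.train()
```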
### Framework versions
- PEFT: 0.17.1
- TRL: 0.22.1
- Transformers: 4.56.0
- PyTorch: 2.8.0
- Datasets: 4.0.0
- Tokenizers: 0.22.0