---
license: mit
language: en
base_model: microsoft/phi-2
tags:
- text-generation
- voice-assistant
- automotive
- fine-tuned
- peft
- lora
datasets:
- synthetic
widget:
- text: "Navigate to the nearest EV charging station."
- text: "Set the temperature to 22 degrees."
---

# 🚗 Fine-tuned MBUX Voice Assistant (phi-2)

This repository contains a fine-tuned version of Microsoft's **`microsoft/phi-2`** model, specifically adapted to function as an in-car voice assistant similar to MBUX. The model is trained to understand and respond to common automotive commands.

This model was created as part of an end-to-end MLOps project, from data creation and fine-tuning to deployment in an interactive application.

## ✨ Live Demo

You can interact with this model in a live, voice-to-voice application on Hugging Face Spaces:

**➡️ [Live MBUX Gradio Demo](https://huggingface.co/spaces/MrunangG/mbux-gradio-demo)**



---

## 📝 Model Details

* **Base Model:** `microsoft/phi-2`
* **Fine-tuning Method:** Parameter-Efficient Fine-Tuning (PEFT) using LoRA.
* **Training Data:** A synthetic, instruction-based dataset of in-car commands covering navigation, climate control, media, and vehicle settings (an illustrative sample is sketched after this list).
* **Frameworks:** PyTorch, Transformers, PEFT, TRL.
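
The dataset itself is not published with this card, but a single training sample presumably pairs a command with an assistant response in the same `[INST]` template used at inference time. A hypothetical example (the `"text"` field name and exact template are assumptions, not taken from the dataset):

```python
# Hypothetical shape of one training sample; the field name and the
# [INST] template are assumptions based on the inference code below.
sample = {
    "text": "[INST] Set the temperature to 21 degrees. [/INST] "
            "Okay, setting the cabin temperature to 21 degrees."
}
```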

### Intended Use

This model is a proof-of-concept designed for demonstration purposes. It's intended to be used as the "brain" for a voice assistant application in an automotive context. It excels at understanding commands like:
* "Navigate to the office."
* "Set the fan speed to maximum."
* "Play my 'Morning Commute' playlist."

---

## 🚀 How to Use

While the model's core function is text generation, its primary intended use is within a full voice-to-voice pipeline.

### Interactive Voice Demo
For the complete, interactive experience including Speech-to-Text and Text-to-Speech, please visit the live application hosted on Hugging Face Spaces:

**➡️ [Live MBUX Gradio Demo](https://huggingface.co/spaces/MrunangG/mbux-gradio-demo)**

### Programmatic Use (Text-Only)

The following Python code shows how to use the fine-tuned model for its core text-generation task programmatically.

```python
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

# Define the model repository IDs
base_model_id = "microsoft/phi-2"
peft_model_id = "MrunangG/phi-2-mbux-assistant"

# Set device
device = "cuda" if torch.cuda.is_available() else "cpu"

# Load the base model
base_model = AutoModelForCausalLM.from_pretrained(
    base_model_id,
    trust_remote_code=True,
    torch_dtype=torch.float16,
    device_map={"": device}
)

# Load the tokenizer
tokenizer = AutoTokenizer.from_pretrained(base_model_id, trust_remote_code=True)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

# Attach the LoRA adapter to the base model (weights stay separate unless merged)
model = PeftModel.from_pretrained(base_model, peft_model_id)

# --- Inference ---
prompt = "Set the temperature to 21 degrees."
formatted_prompt = f"[INST] {prompt} [/INST]"

inputs = tokenizer(formatted_prompt, return_tensors="pt").to(device)

with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=50)

response = tokenizer.decode(outputs[0], skip_special_tokens=True)
cleaned_response = response.split('[/INST]')[-1].strip()

print(cleaned_response)
# Expected output: Okay, setting the cabin temperature to 21 degrees.
```
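
If you prefer a standalone checkpoint without the PEFT wrapper, you can fold the adapter weights into the base model using PEFT's standard `merge_and_unload()` call. The output path below is just an example:

```python
# Merge the LoRA weights into the base model and save a standalone copy
merged_model = model.merge_and_unload()
merged_model.save_pretrained("./phi-2-mbux-merged")
tokenizer.save_pretrained("./phi-2-mbux-merged")
```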

---

## 🛠️ Training Procedure

The model was fine-tuned using the `SFTTrainer` from the TRL library. Key training parameters included a learning rate of `2e-4`, the `paged_adamw_8bit` optimizer, and 4-bit quantization to enable efficient training on consumer hardware.
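
This card does not ship the training script, but a minimal sketch of such a setup might look like the following. Only the learning rate, optimizer, and 4-bit quantization are stated above; the dataset file, LoRA hyperparameters, batch size, and epoch count are illustrative assumptions:

```python
import torch
from datasets import load_dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from trl import SFTConfig, SFTTrainer

base_model_id = "microsoft/phi-2"

# 4-bit quantization keeps the base model small enough for consumer GPUs
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

model = AutoModelForCausalLM.from_pretrained(
    base_model_id,
    quantization_config=bnb_config,
    trust_remote_code=True,
    device_map="auto",
)

# LoRA hyperparameters are assumptions; the card does not state them
peft_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "dense"],
    task_type="CAUSAL_LM",
)

# Learning rate and optimizer come from the card; the rest is illustrative
training_args = SFTConfig(
    output_dir="./phi-2-mbux-assistant",
    learning_rate=2e-4,
    optim="paged_adamw_8bit",
    per_device_train_batch_size=4,
    num_train_epochs=3,
    dataset_text_field="text",
)

# "mbux_commands.jsonl" is a placeholder; the dataset is not published
dataset = load_dataset("json", data_files="mbux_commands.jsonl")["train"]

trainer = SFTTrainer(
    model=model,
    args=training_args,
    train_dataset=dataset,
    peft_config=peft_config,
)
trainer.train()
```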

### Framework versions
- PEFT: 0.17.1
- TRL: 0.22.1
- Transformers: 4.56.0
- PyTorch: 2.8.0
- Datasets: 4.0.0
- Tokenizers: 0.22.0