Update README.md
Browse files
README.md
CHANGED
@@ -74,7 +74,7 @@ $ pip install git+https://github.com/huggingface/transformers
 74
 75  Then, copy the snippet from the section that is relevant for your use case.
 76
 77 -#### Running the model
 78
 79  ```python
 80  from transformers import AutoProcessor, AutoModel
@@ -117,6 +117,41 @@ print(response)
 117  ```
 118
 119
 120  ## Evaluation
 121
 122  Model evaluation metrics and results.
|
|
 74
 75  Then, copy the snippet from the section that is relevant for your use case.
 76
 77 +#### Running the model with chat_template
 78
 79  ```python
 80  from transformers import AutoProcessor, AutoModel
 117  ```
 118
 119
 120 +#### Running the model with local data
 121 +
 122 +```python
 123 +from io import BytesIO
 124 +from urllib.request import urlopen
 125 +import soundfile
 126 +from PIL import Image
 127 +
 128 +
 129 +# get Audio data from URL
 130 +url = "https://huggingface.co/microsoft/Phi-4-multimodal-instruct/resolve/main/examples/what_is_shown_in_this_image.wav"
 131 +audio, sr = soundfile.read(BytesIO(urlopen(url).read()))
 132 +audio_token = '<start_of_audio>'
 133 +
 134 +
 135 +messages = [
 136 +    {'role': 'user', 'content': audio_token + 'Translate this audio into Korean.'},
 137 +]
 138 +
 139 +prompt = processor.tokenizer.apply_chat_template(
 140 +    messages, tokenize=False, add_generation_prompt=True
 141 +)
 142 +
 143 +
 144 +inputs = processor(text=prompt, audio=[audio], add_special_tokens=False, return_tensors="pt")
 145 +
 146 +with torch.inference_mode():
 147 +    generate_ids = model.generate(**inputs, max_new_tokens=128, do_sample=False)
 148 +generate_ids = generate_ids[:, inputs['input_ids'].shape[1] :]
 149 +response = processor.batch_decode(
 150 +    generate_ids, skip_special_tokens=True, clean_up_tokenization_spaces=False
 151 +)[0]
 152 +print(response)
 153 +```
 154 +
 155  ## Evaluation
 156
 157  Model evaluation metrics and results.