GLorr committed on
Commit 6c09f76 · verified · 1 Parent(s): b6b7427

Upload folder using huggingface_hub

.gitignore ADDED
@@ -0,0 +1,15 @@
+ # Python-generated files
+ __pycache__/
+ *.py[oc]
+ build/
+ dist/
+ wheels/
+ *.egg-info
+
+ # Virtual environments
+ .venv
+
+ # Environment variables
+ .env
+
+ .vscode/
.pre-commit-config.yaml ADDED
@@ -0,0 +1,28 @@
+ repos:
+   - repo: https://github.com/PyCQA/bandit
+     rev: 1.7.4
+     hooks:
+       - id: bandit
+         name: bandit
+         types: [python]
+
+   - repo: https://github.com/astral-sh/ruff-pre-commit
+     # Ruff version.
+     rev: v0.4.8
+     hooks:
+       # Run the linter.
+       - id: ruff
+       # Run the formatter.
+       - id: ruff-format
+
+   - repo: https://github.com/psf/black
+     rev: 23.1.0
+     hooks:
+       - id: black
+         name: black
+
+   - repo: https://github.com/pre-commit/mirrors-isort
+     rev: v5.10.1
+     hooks:
+       - id: isort
+         args: ["--profile", "black"]
.python-version ADDED
@@ -0,0 +1 @@
+ 3.11.9
.ruff_cache/.gitignore ADDED
@@ -0,0 +1,2 @@
+ # Automatically created by ruff.
+ *
.ruff_cache/0.4.8/17181755630229836148 ADDED
Binary file (187 Bytes).
.ruff_cache/0.4.8/2516455456322530856 ADDED
Binary file (291 Bytes).
.ruff_cache/0.4.8/3664365949595148797 ADDED
Binary file (222 Bytes).
.ruff_cache/0.9.6/12093191028265889985 ADDED
Binary file (236 Bytes).
.ruff_cache/0.9.6/16582661031577879600 ADDED
Binary file (187 Bytes).
.ruff_cache/0.9.6/6136549848780317009 ADDED
Binary file (222 Bytes).
.ruff_cache/CACHEDIR.TAG ADDED
@@ -0,0 +1 @@
+ Signature: 8a477f597d28d172789f06886806bc55
README.md CHANGED
@@ -1,12 +1,78 @@
  ---
- title: ML6 Gemini Demo
- emoji: 🔥
- colorFrom: red
- colorTo: gray
  sdk: gradio
- sdk_version: 5.23.1
- app_file: app.py
- pinned: false
  ---

- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
  ---
+ title: ML6-Gemini-Demo
+ app_file: src/app.py
  sdk: gradio
+ sdk_version: 5.23.0
  ---
+ # Gemini Voice Agent Demo
+
+ This repo contains a demo that uses the Gemini Multimodal Live API to create a voice-based agent that conducts professional technical screening interviews.
+
+ ## Technical Overview
+
+ The system is built on FastRTC and Gradio, which provide the real-time voice UI.
+
+ ### About the modality
+
+ You can configure the output modality; a configuration sketch follows the list below.
+
+ - If set to AUDIO:
+   - The agent responds with native audio.
+   - There is no text output, so no transcription is available.
+ - If set to TEXT:
+   - The agent responds with text.
+   - The text output is synthesized to audio using the Cloud Text-to-Speech API.
+   - Transcriptions are available.
+
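A minimal sketch of how the modality could be selected when opening the Live API session. It mirrors the `LiveConnectConfig` usage in `src copy/app2.py`; the `OUTPUT_MODALITY` variable and the placeholder values for `system_prompt` and `TOOLS` are illustrative, not part of the repo:

```python
# Sketch only: selecting the output modality for the Live API session.
# Mirrors the LiveConnectConfig usage in "src copy/app2.py"; system_prompt and
# TOOLS stand in for values built from the Jinja2 prompt and tools.py.
from google.genai.types import FunctionDeclaration, LiveConnectConfig, Tool

system_prompt = "..."   # rendered from src/prompts/default_prompt.jinja2
TOOLS: list[dict] = []  # function declarations from tools.py (not shown in this commit)

OUTPUT_MODALITY = "AUDIO"  # or "TEXT"; TEXT responses are then synthesized via Cloud TTS

config = LiveConnectConfig(
    system_instruction={"parts": [{"text": system_prompt}], "role": "user"},
    tools=[Tool(function_declarations=[FunctionDeclaration(**tool) for tool in TOOLS])],
    response_modalities=[OUTPUT_MODALITY],
)
```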
+ ### Function Calling
+
+ Two functions can be called by the model (a sketch of the declarations follows below):
+ - Answer validation
+   - checks the answer type against the expected type
+   - stores the answer
+ - Log input
+   - logs the user input
+   - this serves as a transcription of the incoming audio
+
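The real declarations live in `tools.py`, which is not part of this commit, so the following is only a hypothetical sketch of what the two declarations could look like in the dict format that `src copy/app2.py` expands via `FunctionDeclaration(**tool)`. The names and parameter fields are assumptions:

```python
# Hypothetical sketch of the two tool declarations; names and parameters are
# assumptions, since tools.py is not included in this diff.
TOOLS = [
    {
        "name": "validate_answer",
        "description": "Check an answer against the expected type and store it.",
        "parameters": {
            "type": "OBJECT",
            "properties": {
                "question_id": {"type": "INTEGER"},
                "answer": {"type": "STRING"},
            },
            "required": ["question_id", "answer"],
        },
    },
    {
        "name": "store_input",
        "description": "Log a user or bot utterance as a transcript entry.",
        "parameters": {
            "type": "OBJECT",
            "properties": {
                "role": {"type": "STRING"},
                "input": {"type": "STRING"},
            },
            "required": ["role", "input"],
        },
    },
]
```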
+ ## Getting Started
+
+ To run the application, follow these steps:
+
+ 1. Install uv (if not already installed):
+    `curl -LsSf https://astral.sh/uv/install.sh | sh`
+
+ 2. Install the dependencies:
+    `uv sync`
+
+ 3. Set up the environment variables for either GenAI or Vertex AI (see below).
+
+ 4. Run the application:
+    `python src/app.py`
+
+ 5. Visit `http://127.0.0.1:7860` in your browser to interact with the voice agent.
+
+
+ ### GenAI vs VertexAI
+
+ "gemini-2.0-flash-exp" can be used with both GenAI and Vertex AI. [more info](https://github.com/heiko-hotz/gemini-multimodal-live-dev-guide?tab=readme-ov-file)
+
+ - GenAI only requires a GEMINI_API_KEY environment variable [link](https://ai.google.dev/gemini-api/docs/api-key)
+ - Vertex AI requires a GCP project and the following environment variables:
+ ```
+ export GOOGLE_CLOUD_PROJECT=YOUR_PROJECT_ID
+ export GOOGLE_CLOUD_LOCATION=europe-west4
+ export GOOGLE_GENAI_USE_VERTEXAI=True
+ ```
+
+ Depending on the `GOOGLE_GENAI_USE_VERTEXAI` flag, this demo uses either GenAI or Vertex AI; a sketch of the client selection is shown below.
+
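A sketch of how the flag could drive the client choice with the google-genai SDK. The environment variable names follow the README above; the exact wiring in `src/app.py` is not shown in this commit, so treat this as illustrative:

```python
# Illustrative sketch: pick GenAI or Vertex AI based on GOOGLE_GENAI_USE_VERTEXAI.
import os

from google import genai

if os.getenv("GOOGLE_GENAI_USE_VERTEXAI", "").lower() in ("true", "1"):
    client = genai.Client(
        vertexai=True,
        project=os.getenv("GOOGLE_CLOUD_PROJECT"),
        location=os.getenv("GOOGLE_CLOUD_LOCATION"),
    )
else:
    client = genai.Client(api_key=os.getenv("GEMINI_API_KEY"))
```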
+ ### Note
+
+ The gradio-webrtc install fails unless you have ffmpeg@6; on macOS:
+
+ ```
+ brew uninstall ffmpeg
+ brew install ffmpeg@6
+ brew link ffmpeg@6
+ ```
pyproject.toml ADDED
@@ -0,0 +1,21 @@
+ [project]
+ name = "gemini-voice-agents"
+ version = "0.1.0"
+ description = "Add your description here"
+ readme = "README.md"
+ requires-python = ">=3.11.9"
+ dependencies = [
+     "fastrtc>=0.0.17",
+     "google>=3.0.0",
+     "google-cloud>=0.34.0",
+     "google-cloud-texttospeech>=2.25.1",
+     "google-genai>=1.7.0",
+     "gradio>=5.23.0",
+     "numpy>=2.1.3",
+ ]
+
+ [dependency-groups]
+ dev = [
+     "ruff>=0.9.6",
+     "pre-commit>=4.1",
+ ]
questions.json ADDED
@@ -0,0 +1,27 @@
+ [
+     {
+         "id": 1,
+         "question": "What is your full name?",
+         "answer_format": "str"
+     },
+     {
+         "id": 2,
+         "question": "What is your current job title?",
+         "answer_format": "str"
+     },
+     {
+         "id": 3,
+         "question": "How many years of relevant experience do you have?",
+         "answer_format": "int"
+     },
+     {
+         "id": 4,
+         "question": "Are you looking for a new job?",
+         "answer_format": "bool"
+     },
+     {
+         "id": 5,
+         "question": "List your three strongest technical skills.",
+         "answer_format": "list[str]"
+     }
+ ]
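The `answer_format` strings suggest a simple type check during answer validation. A hypothetical sketch of that check follows; the real logic lives in `tools.py`, which is not part of this diff:

```python
# Hypothetical sketch of checking an answer against its declared format.
def matches_format(answer, answer_format: str) -> bool:
    if answer_format == "str":
        return isinstance(answer, str)
    if answer_format == "int":
        # Exclude bools, which are a subclass of int in Python.
        return isinstance(answer, int) and not isinstance(answer, bool)
    if answer_format == "bool":
        return isinstance(answer, bool)
    if answer_format == "list[str]":
        return isinstance(answer, list) and all(isinstance(x, str) for x in answer)
    return False
```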
src copy/app.py ADDED
@@ -0,0 +1,506 @@
1
+ # -*- coding: utf-8 -*-
2
+ # Copyright 2025 Google LLC
3
+ #
4
+ # Licensed under the Apache License, Version 2.0 (the "License");
5
+ # you may not use this file except in compliance with the License.
6
+ # You may obtain a copy of the License at
7
+ #
8
+ # http://www.apache.org/licenses/LICENSE-2.0
9
+ #
10
+ # Unless required by applicable law or agreed to in writing, software
11
+ # distributed under the License is distributed on an "AS IS" BASIS,
12
+ # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
13
+ # See the License for the specific language governing permissions and
14
+ # limitations under the License.
15
+ """
16
+ ## Setup
17
+
18
+ The gradio-webrtc install fails unless you have ffmpeg@6, on mac:
19
+
20
+ ```
21
+ brew uninstall ffmpeg
22
+ brew install ffmpeg@6
23
+ brew link ffmpeg@6
24
+ ```
25
+
26
+ Create a virtual python environment, then install the dependencies for this script:
27
+
28
+ ```
29
+ pip install websockets numpy gradio-webrtc "gradio>=5.9.1"
30
+ ```
31
+
32
+ If installation fails, it may be due to the ffmpeg requirement above.
33
+
34
+ Before running this script, ensure the `GOOGLE_API_KEY` environment variable is set:
35
+
36
+ ```
37
+ $ export GOOGLE_API_KEY='add your key here'
38
+ ```
39
+
40
+ You can get an api-key from Google AI Studio (https://aistudio.google.com/apikey)
41
+
42
+ ## Run
43
+
44
+ To run the script:
45
+
46
+ ```
47
+ python gemini_gradio_audio.py
48
+ ```
49
+
50
+ On the gradio page (http://127.0.0.1:7860/) click record, and talk, gemini will reply. But note that interruptions
51
+ don't work.
52
+
53
+ """
54
+
55
+ import base64
56
+ import json
57
+ import os
58
+ import wave
59
+ import itertools
60
+
61
+ import gradio as gr
62
+ import numpy as np
63
+ import websockets.sync.client
64
+ from gradio_webrtc import StreamHandler, WebRTC
65
+ from jinja2 import Template
66
+ import threading
67
+ import queue
68
+
69
+
70
+ from tools import FUNCTION_MAP, TOOLS
71
+ from google.cloud import texttospeech
72
+
73
+ # logging.basicConfig(
74
+ # level=logging.INFO,
75
+ # format="%(asctime)s - %(name)s - %(levelname)s - %(message)s",
76
+ # )
77
+ # logger = logging.getLogger(__name__)
78
+
79
+
80
+ with open("questions.json", "r") as f:
81
+ questions_dict = json.load(f)
82
+
83
+ with open("src/prompts/default_prompt.jinja2") as f:
84
+ template_str = f.read()
85
+ template = Template(template_str)
86
+ system_prompt = template.render(questions=json.dumps(questions_dict, indent=4))
87
+
88
+ print(system_prompt)
89
+
90
+
91
+ # TOOLS = types.GenerateContentConfig(tools=[validate_answer])
92
+
93
+
94
+ __version__ = "0.0.3"
95
+
96
+ KEY_NAME = "GOOGLE_API_KEY"
97
+
98
+
99
+ # Configuration and Utilities
100
+ class GeminiConfig:
101
+ """Configuration settings for Gemini API."""
102
+
103
+ def __init__(self):
104
+ self.api_key = os.getenv(KEY_NAME)
105
+ self.host = "generativelanguage.googleapis.com"
106
+ self.model = "models/gemini-2.0-flash-exp"
107
+ self.ws_url = f"wss://{self.host}/ws/google.ai.generativelanguage.v1alpha.GenerativeService.BidiGenerateContent?key={self.api_key}"
108
+
109
+ class TTSStreamer:
110
+ def __init__(self):
111
+ self.client = texttospeech.TextToSpeechClient()
112
+ self.text_queue = queue.Queue()
113
+ self.audio_queue = queue.Queue()
114
+
115
+ def start_stream(self):
116
+ streaming_config = texttospeech.StreamingSynthesizeConfig(
117
+ voice=texttospeech.VoiceSelectionParams(
118
+ name="en-US-Journey-D",
119
+ language_code="en-US"
120
+ )
121
+ )
122
+ config_request = texttospeech.StreamingSynthesizeRequest(
123
+ streaming_config=streaming_config
124
+ )
125
+
126
+ def request_generator():
127
+ while True:
128
+ try:
129
+ text = self.text_queue.get()
130
+ if text is None: # Poison pill to stop
131
+ break
132
+ yield texttospeech.StreamingSynthesizeRequest(
133
+ input=texttospeech.StreamingSynthesisInput(text=text)
134
+ )
135
+ except queue.Empty:
136
+ continue
137
+
138
+ def audio_processor():
139
+ responses = self.client.streaming_synthesize(
140
+ itertools.chain([config_request], request_generator())
141
+ )
142
+ print(f"Responses: {responses}")
143
+ for response in responses:
144
+ self.audio_queue.put(response.audio_content)
145
+
146
+ self.processor_thread = threading.Thread(target=audio_processor)
147
+ self.processor_thread.start()
148
+
149
+ def send_text(self, text: str):
150
+ """Send text to be synthesized."""
151
+ self.text_queue.put(text)
152
+
153
+ def get_audio(self):
154
+ """Get the next chunk of audio bytes."""
155
+ try:
156
+ return self.audio_queue.get_nowait()
157
+ except queue.Empty:
158
+ return None
159
+
160
+ def stop(self):
161
+ """Stop the streaming synthesis."""
162
+ self.text_queue.put(None) # Send poison pill
163
+ if self.processor_thread:
164
+ self.processor_thread.join()
165
+
166
+
167
+ class AudioProcessor:
168
+ """Handles encoding and decoding of audio data."""
169
+
170
+ @staticmethod
171
+ def encode_audio(data, sample_rate):
172
+ """Encodes audio data to base64."""
173
+ encoded = base64.b64encode(data.tobytes()).decode("UTF-8")
174
+ return {
175
+ "realtimeInput": {
176
+ "mediaChunks": [
177
+ {
178
+ "mimeType": f"audio/pcm;rate={sample_rate}",
179
+ "data": encoded,
180
+ }
181
+ ],
182
+ },
183
+ }
184
+
185
+ @staticmethod
186
+ def process_audio_response(data):
187
+ """Decodes audio data from base64."""
188
+ audio_data = base64.b64decode(data)
189
+ return np.frombuffer(audio_data, dtype=np.int16)
190
+
191
+
192
+ # Gemini Interaction Handler
193
+ class GeminiHandler(StreamHandler):
194
+ """Handles streaming interactions with the Gemini API."""
195
+
196
+ def __init__(
197
+ self,
198
+ audio_file=None,
199
+ expected_layout="mono",
200
+ output_sample_rate=24000,
201
+ output_frame_size=480,
202
+ ) -> None:
203
+ super().__init__(
204
+ expected_layout,
205
+ output_sample_rate,
206
+ output_frame_size,
207
+ input_sample_rate=24000,
208
+ )
209
+ self.config = GeminiConfig()
210
+ self.ws = None
211
+ self.all_output_data = None
212
+ self.audio_processor = AudioProcessor()
213
+ self.audio_file = audio_file
214
+ self.text_buffer = ""
215
+ self.tts_engine = None
216
+
217
+ def copy(self):
218
+ """Creates a copy of the GeminiHandler instance."""
219
+ return GeminiHandler(
220
+ expected_layout=self.expected_layout,
221
+ output_sample_rate=self.output_sample_rate,
222
+ output_frame_size=self.output_frame_size,
223
+ )
224
+
225
+ def _initialize_websocket(self):
226
+ """Initializes the WebSocket connection to the Gemini API."""
227
+ try:
228
+ self.ws = websockets.sync.client.connect(self.config.ws_url, timeout=3000)
229
+ setup_request = {
230
+ "setup": {
231
+ "model": self.config.model,
232
+ "tools": [{"functionDeclarations": TOOLS}],
233
+ "generationConfig": {"responseModalities": "TEXT"},
234
+ "systemInstruction": {
235
+ "parts": [{"text": system_prompt}],
236
+ "role": "user",
237
+ },
238
+ }
239
+ }
240
+ self.ws.send(json.dumps(setup_request))
241
+ setup_response = json.loads(self.ws.recv())
242
+ print(f"Setup response: {setup_response}")
243
+
244
+ if self.audio_file:
245
+ self.input_audio_file(self.audio_file)
246
+ print("Audio file sent")
247
+
248
+ except websockets.exceptions.WebSocketException as e:
249
+ print(f"WebSocket connection failed: {str(e)}")
250
+ self.ws = None
251
+ except Exception as e:
252
+ print(f"Setup failed: {str(e)}")
253
+ self.ws = None
254
+
255
+ def input_audio_file(self, audio_file):
256
+ """Processes an audio file and sends it to the Gemini API."""
257
+ try:
258
+ with wave.open(audio_file, "rb") as wf:
259
+ data = wf.readframes(wf.getnframes())
260
+ self.receive((wf.getframerate(), np.frombuffer(data, dtype=np.int16)))
261
+ except Exception as e:
262
+ print(f"Error in input_audio_file: {str(e)}")
263
+
264
+ def receive(self, frame: tuple[int, np.ndarray]) -> None:
265
+ """Receives audio/video data, encodes it, and sends it to the Gemini API."""
266
+ try:
267
+ if not self.ws:
268
+ self._initialize_websocket()
269
+
270
+ sample_rate, array = frame
271
+ message = {"realtimeInput": {"mediaChunks": []}}
272
+
273
+ if sample_rate > 0 and array is not None:
274
+ array = array.squeeze()
275
+ audio_data = self.audio_processor.encode_audio(
276
+ array, self.output_sample_rate
277
+ )
278
+ message["realtimeInput"]["mediaChunks"].append(
279
+ {
280
+ "mimeType": f"audio/pcm;rate={self.output_sample_rate}",
281
+ "data": audio_data["realtimeInput"]["mediaChunks"][0]["data"],
282
+ }
283
+ )
284
+
285
+ if message["realtimeInput"]["mediaChunks"]:
286
+ self.ws.send(json.dumps(message))
287
+ except Exception as e:
288
+ print(f"Error in receive: {str(e)}")
289
+ if self.ws:
290
+ self.ws.close()
291
+ self.ws = None
292
+
293
+ def handle_tool_call(self, tool_call):
294
+ print(" ", tool_call)
295
+ for fc in tool_call["functionCalls"]:
296
+ print(f"Function call: {fc}")
297
+ # Call the function
298
+ try:
299
+ result = {"output": FUNCTION_MAP[fc["name"]](**fc["args"])}
300
+ except Exception as e:
301
+ result = {"error": str(e)}
302
+
303
+ # Send the response back
304
+ msg = {
305
+ "tool_response": {
306
+ "function_responses": [
307
+ {"id": fc["id"], "name": fc["name"], "response": result}
308
+ ]
309
+ }
310
+ }
311
+ print(f"function response: {msg}")
312
+ self.ws.send(json.dumps(msg))
313
+
314
+ def _output_data(self, audio_array):
315
+ """Processes audio output data from the WebSocket response."""
316
+ if self.all_output_data is None:
317
+ self.all_output_data = audio_array
318
+ else:
319
+ self.all_output_data = np.concatenate((self.all_output_data, audio_array))
320
+
321
+ while self.all_output_data.shape[-1] >= self.output_frame_size:
322
+ yield (
323
+ self.output_sample_rate,
324
+ self.all_output_data[: self.output_frame_size].reshape(1, -1),
325
+ )
326
+ self.all_output_data = self.all_output_data[self.output_frame_size :]
327
+
328
+ def _process_server_content(self, content):
329
+ """Processes audio output data from the WebSocket response."""
330
+ if response := content.get("modelTurn", {}):
331
+ if parts := response.get("parts"):
332
+ for part in parts:
333
+ print(f"Part: {part}")
334
+ data = part.get("inlineData", {}).get("data", "")
335
+ if data:
336
+ audio_array = self.audio_processor.process_audio_response(data)
337
+ yield from self._output_data(audio_array)
338
+
339
+ text = part.get("text", "")
340
+ if text:
341
+ self.text_buffer += text
342
+
343
+
344
+
345
+ # audio_array = self._text_to_audio(text)
346
+ # yield from self._output_data(audio_array)
347
+ # # self.text_buffer += text
348
+
349
+ # Check if the turn is complete and process the text buffer into audio
350
+ if content.get("turnComplete"):
351
+ if self.text_buffer:
352
+ audio_array = self._text_to_audio(self.text_buffer)
353
+ yield from self._output_data(audio_array)
354
+ self.text_buffer = ""
355
+
356
+
357
+ def _text_to_audio(self, text: str) -> np.ndarray:
358
+ """Convert text to audio using Google Cloud TTS streaming."""
359
+
360
+ client = texttospeech.TextToSpeechClient()
361
+
362
+ # Configure synthesis
363
+ synthesis_input = texttospeech.SynthesisInput(text=text)
364
+ voice = texttospeech.VoiceSelectionParams(
365
+ name="en-IN-Chirp-HD-O",
366
+ language_code="en-IN"
367
+ )
368
+ audio_config = texttospeech.AudioConfig(
369
+ audio_encoding=texttospeech.AudioEncoding.LINEAR16
370
+ )
371
+
372
+ # Get response in a single request
373
+ try:
374
+ response = client.synthesize_speech(
375
+ input=synthesis_input,
376
+ voice=voice,
377
+ audio_config=audio_config
378
+ )
379
+ return np.frombuffer(response.audio_content, dtype=np.int16)
380
+ except Exception as e:
381
+ print(f"Error in speech synthesis: {e}")
382
+ return np.array([], dtype=np.int16)
383
+
384
+
385
+ def generator(self):
386
+ """Generates audio output from the WebSocket stream."""
387
+ while True:
388
+ if not self.ws:
389
+ print("WebSocket not connected")
390
+ yield None
391
+ continue
392
+
393
+ try:
394
+ message = self.ws.recv(timeout=30)
395
+ msg = json.loads(message)
396
+
397
+ # {'serverContent': {'modelTurn': {'parts': [{'text': 'Hello'}]}}}
398
+ # {'serverContent': {'modelTurn': {'parts': [{'text': ', good morning! Thank you for taking my call. My name is [Your'}]}}}
399
+ # {'serverContent': {'modelTurn': {'parts': [{'text': " Name] and I'm a technical recruiter. I'm conducting a quick"}]}}}
400
+ # {'serverContent': {'modelTurn': {'parts': [{'text': ' initial screening, is that okay with you?\n'}]}}}
401
+ # {'serverContent': {'turnComplete': True}}
402
+
403
+ if "serverContent" in msg:
404
+ content = msg["serverContent"]
405
+ yield from self._process_server_content(content)
406
+ elif "toolCall" in msg:
407
+ yield from self.handle_tool_call(msg["toolCall"])
408
+
409
+ except TimeoutError:
410
+ print("Timeout waiting for server response")
411
+ yield None
412
+ except Exception:
413
+ yield None
414
+
415
+ def emit(self) -> tuple[int, np.ndarray] | None:
416
+ """Emits the next audio chunk from the generator."""
417
+ if not self.ws:
418
+ return None
419
+ if not hasattr(self, "_generator"):
420
+ self._generator = self.generator()
421
+ try:
422
+ return next(self._generator)
423
+ except StopIteration:
424
+ self.reset()
425
+ return None
426
+
427
+ def reset(self) -> None:
428
+ """Resets the generator and output data."""
429
+ if hasattr(self, "_generator"):
430
+ delattr(self, "_generator")
431
+ self.all_output_data = None
432
+
433
+ def shutdown(self) -> None:
434
+ """Closes the WebSocket connection."""
435
+ if self.ws:
436
+ self.ws.close()
437
+
438
+ def check_connection(self):
439
+ """Checks if the WebSocket connection is active."""
440
+ try:
441
+ if not self.ws or self.ws.closed:
442
+ self._initialize_websocket()
443
+ return True
444
+ except Exception as e:
445
+ print(f"Connection check failed: {str(e)}")
446
+ return False
447
+
448
+
449
+ def update_answers():
450
+ with open("answers.json", "r") as f:
451
+ return json.load(f)
452
+
453
+
454
+ # Main Gradio Interface
455
+ def registry(name: str, token: str | None = None, **kwargs):
456
+ """Sets up and returns the Gradio interface."""
457
+ api_key = token or os.environ.get(KEY_NAME)
458
+ if not api_key:
459
+ raise ValueError(f"{KEY_NAME} environment variable is not set.")
460
+
461
+ interface = gr.Blocks()
462
+ with interface:
463
+ with gr.Tabs():
464
+ with gr.TabItem("Voice Chat"):
465
+ gr.HTML(
466
+ """
467
+ <div style='text-align: left'>
468
+ <h1>ML6 Voice Demo - Function Calling and Custom Output Voice</h1>
469
+ </div>
470
+ """
471
+ )
472
+ gemini_handler = GeminiHandler()
473
+ # gemini_handler = ThreeStepHandler()
474
+
475
+ with gr.Row():
476
+ audio = WebRTC(
477
+ label="Voice Chat", modality="audio", mode="send-receive"
478
+ )
479
+
480
+ # Add display components for questions and answers
481
+ with gr.Row():
482
+ with gr.Column():
483
+ gr.JSON(
484
+ label="Questions",
485
+ value=questions_dict,
486
+ )
487
+ with gr.Column():
488
+ gr.JSON(update_answers, label="Collected Answers", every=1)
489
+
490
+ audio.stream(
491
+ gemini_handler,
492
+ inputs=[audio], # Add audio_file to inputs
493
+ outputs=[audio],
494
+ time_limit=600,
495
+ concurrency_limit=10,
496
+ )
497
+
498
+ return interface
499
+
500
+
501
+ # Launch the Gradio interface
502
+ gr.load(
503
+ name="gemini-2.0-flash-exp",
504
+ src=registry,
505
+ ).launch()
506
+
src copy/app2.py ADDED
@@ -0,0 +1,308 @@
1
+ # -*- coding: utf-8 -*-
2
+ # Copyright 2025 Google LLC
3
+ #
4
+ # Licensed under the Apache License, Version 2.0 (the "License");
5
+ # you may not use this file except in compliance with the License.
6
+ # You may obtain a copy of the License at
7
+ #
8
+ # http://www.apache.org/licenses/LICENSE-2.0
9
+ #
10
+ # Unless required by applicable law or agreed to in writing, software
11
+ # distributed under the License is distributed on an "AS IS" BASIS,
12
+ # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
13
+ # See the License for the specific language governing permissions and
14
+ # limitations under the License.
15
+ """
16
+ ## Setup
17
+
18
+ The gradio-webrtc install fails unless you have ffmpeg@6, on mac:
19
+
20
+ ```
21
+ brew uninstall ffmpeg
22
+ brew install ffmpeg@6
23
+ brew link ffmpeg@6
24
+ ```
25
+
26
+ Create a virtual python environment, then install the dependencies for this script:
27
+
28
+ ```
29
+ pip install websockets numpy gradio-webrtc "gradio>=5.9.1"
30
+ ```
31
+
32
+ If installation fails, it may be due to the ffmpeg requirement above.
33
+
34
+ Before running this script, ensure the `GOOGLE_API_KEY` environment variable is set:
35
+
36
+ ```
37
+ $ export GOOGLE_API_KEY='add your key here'
38
+ ```
39
+
40
+ You can get an api-key from Google AI Studio (https://aistudio.google.com/apikey)
41
+
42
+ ## Run
43
+
44
+ To run the script:
45
+
46
+ ```
47
+ python gemini_gradio_audio.py
48
+ ```
49
+
50
+ On the gradio page (http://127.0.0.1:7860/) click record, and talk, gemini will reply. But note that interruptions
51
+ don't work.
52
+
53
+ """
54
+
55
+ import asyncio
56
+ import json
57
+ import os
58
+ from typing import Literal
59
+ import base64
60
+
61
+ import gradio as gr
62
+ import numpy as np
63
+ from fastrtc import (
64
+ AsyncStreamHandler,
65
+ WebRTC,
66
+ wait_for_item,
67
+ )
68
+ from jinja2 import Template
69
+ from google import genai
70
+ from google.genai.types import LiveConnectConfig, Tool, FunctionDeclaration
71
+
72
+ from google.cloud import texttospeech
73
+
74
+ from tools import FUNCTION_MAP, TOOLS
75
+
76
+ with open("questions.json", "r") as f:
77
+ questions_dict = json.load(f)
78
+
79
+ with open("src/prompts/default_prompt.jinja2") as f:
80
+ template_str = f.read()
81
+ template = Template(template_str)
82
+ system_prompt = template.render(questions=json.dumps(questions_dict, indent=4))
83
+
84
+
85
+
86
+
87
+ class TTSConfig:
88
+ def __init__(self):
89
+ self.client = texttospeech.TextToSpeechClient()
90
+ self.voice = texttospeech.VoiceSelectionParams(
91
+ name="en-US-Chirp3-HD-Charon",
92
+ language_code="en-US"
93
+ )
94
+ self.audio_config = texttospeech.AudioConfig(
95
+ audio_encoding=texttospeech.AudioEncoding.LINEAR16
96
+ )
97
+
98
+
99
+ class AsyncGeminiHandler(AsyncStreamHandler):
100
+ """Simple Async Gemini Handler"""
101
+
102
+ def __init__(
103
+ self,
104
+ expected_layout: Literal["mono"] = "mono",
105
+ output_sample_rate: int = 24000,
106
+ output_frame_size: int = 480,
107
+ ) -> None:
108
+ super().__init__(
109
+ expected_layout,
110
+ output_sample_rate,
111
+ output_frame_size,
112
+ input_sample_rate=16000,
113
+ )
114
+ self.input_queue: asyncio.Queue = asyncio.Queue()
115
+ self.output_queue: asyncio.Queue = asyncio.Queue()
116
+ self.text_queue: asyncio.Queue = asyncio.Queue()
117
+ self.quit: asyncio.Event = asyncio.Event()
118
+ self.chunk_size = 1024
119
+
120
+ self.tts_config: TTSConfig | None = TTSConfig()
121
+ self.text_buffer = ""
122
+
123
+ def copy(self) -> "AsyncGeminiHandler":
124
+ return AsyncGeminiHandler(
125
+ expected_layout="mono",
126
+ output_sample_rate=self.output_sample_rate,
127
+ output_frame_size=self.output_frame_size,
128
+ )
129
+
130
+ def _encode_audio(self, data: np.ndarray) -> str:
131
+ """Encode Audio data to send to the server"""
132
+ return base64.b64encode(data.tobytes()).decode("UTF-8")
133
+
134
+
135
+ async def receive(self, frame: tuple[int, np.ndarray]) -> None:
136
+ _, array = frame
137
+ array = array.squeeze()
138
+ audio_message = self._encode_audio(array)
139
+ self.input_queue.put_nowait(audio_message)
140
+
141
+ async def emit(self) -> tuple[int, np.ndarray] | None:
142
+ return await wait_for_item(self.output_queue)
143
+
144
+ async def start_up(self) -> None:
145
+ client = genai.Client(
146
+ api_key=os.getenv("GOOGLE_API_KEY"),
147
+ http_options={"api_version": "v1alpha"},
148
+ )
149
+
150
+
151
+ config = LiveConnectConfig(
152
+ system_instruction={
153
+ "parts": [{"text": system_prompt}],
154
+ "role": "user",
155
+ },
156
+ tools=[Tool(function_declarations=[FunctionDeclaration(**tool) for tool in TOOLS])],
157
+ response_modalities=["AUDIO"],
158
+ )
159
+
160
+ async with (
161
+ client.aio.live.connect(model="gemini-2.0-flash-exp", config=config) as session,
162
+ asyncio.TaskGroup() as tg
163
+ ):
164
+ self.session = session
165
+
166
+ tasks = [
167
+ tg.create_task(self.process()),
168
+ tg.create_task(self.send_realtime()),
169
+ tg.create_task(self.tts()),
170
+ ]
171
+
172
+ async def process(self) -> None:
173
+ while True:
174
+ try:
175
+ turn = self.session.receive()
176
+ async for response in turn:
177
+ if data := response.data:
178
+ array = np.frombuffer(data, dtype=np.int16)
179
+ self.output_queue.put_nowait((self.output_sample_rate, array))
180
+ continue
181
+
182
+ if text := response.text:
183
+ print(f"Received text: {text}")
184
+ self.text_buffer += text
185
+
186
+ if response.tool_call is not None:
187
+ for tool in response.tool_call.function_calls:
188
+ tool_response = FUNCTION_MAP[tool.name](**tool.args)
189
+ print(f"Calling tool: {tool.name}")
190
+ print(f"Tool response: {tool_response}")
191
+ await self.session.send(
192
+ input=tool_response, end_of_turn=True
193
+ )
194
+ await asyncio.sleep(0.1)
195
+
196
+ if sc := response.server_content:
197
+ if sc.turn_complete and self.text_buffer:
198
+ self.text_queue.put_nowait(self.text_buffer)
199
+ FUNCTION_MAP["store_input"](
200
+ role="bot",
201
+ input=self.text_buffer
202
+ )
203
+ self.text_buffer = ""
204
+
205
+ except Exception as e:
206
+ print(f"Error in processing: {e}")
207
+ await asyncio.sleep(0.1)
208
+
209
+ async def send_realtime(self) -> None:
210
+ """Send real-time audio data to model."""
211
+ while True:
212
+ try:
213
+ data = await self.input_queue.get()
214
+ msg = {"data": data, "mime_type": "audio/pcm"}
215
+ await self.session.send(input=msg)
216
+ except Exception as e:
217
+ print(f"Error in real-time sending: {e}")
218
+ await asyncio.sleep(0.1)
219
+
220
+ async def tts(self) -> None:
221
+
222
+ while True:
223
+ try:
224
+ text = await self.text_queue.get()
225
+ # Get response in a single request
226
+ if text:
227
+ response = self.tts_config.client.synthesize_speech(
228
+ input=texttospeech.SynthesisInput(text=text),
229
+ voice=self.tts_config.voice,
230
+ audio_config=self.tts_config.audio_config
231
+ )
232
+ array = np.frombuffer(response.audio_content, dtype=np.int16)
233
+ self.output_queue.put_nowait((self.output_sample_rate, array))
234
+
235
+ except Exception as e:
236
+ print(f"Error in TTS: {e}")
237
+ await asyncio.sleep(0.1)
238
+
239
+
240
+ def shutdown(self) -> None:
241
+ self.quit.set()
242
+
243
+
244
+ def reload_json(path):
245
+ with open(path, "r") as f:
246
+ return json.load(f)
247
+
248
+ # Main Gradio Interface
249
+ def registry(name: str, token: str | None = None, **kwargs):
250
+ """Sets up and returns the Gradio interface."""
251
+
252
+ interface = gr.Blocks()
253
+ with interface:
254
+ with gr.Tabs():
255
+ with gr.TabItem("Voice Chat"):
256
+ gr.HTML(
257
+ """
258
+ <div style='text-align: left'>
259
+ <h1>ML6 Voice Demo - Function Calling and Custom Output Voice</h1>
260
+ </div>
261
+ """
262
+ )
263
+ gemini_handler = AsyncGeminiHandler()
264
+
265
+ with gr.Row():
266
+ audio = WebRTC(
267
+ label="Voice Chat", modality="audio", mode="send-receive"
268
+ )
269
+
270
+ # Add display components for questions and answers
271
+ with gr.Row():
272
+ with gr.Column():
273
+ gr.JSON(
274
+ label="Questions",
275
+ value=questions_dict,
276
+ )
277
+ # with gr.Column():
278
+ # gr.JSON(reload_json, inputs=gr.Text(value="/Users/georgeslorre/ML6/internal/gemini-voice-agents/conversation.json", visible=False), label="Conversation", every=1)
279
+ with gr.Column():
280
+ gr.JSON(reload_json, inputs=gr.Text(value="/Users/georgeslorre/ML6/internal/gemini-voice-agents/answers.json", visible=False),label="Collected Answers", every=1)
281
+
282
+
283
+ audio.stream(
284
+ gemini_handler,
285
+ inputs=[audio], # Add audio_file to inputs
286
+ outputs=[audio],
287
+ time_limit=600,
288
+ concurrency_limit=10,
289
+ )
290
+
291
+ return interface
292
+
293
+ # Function to clear JSON files
294
+ def clear_json_files():
295
+ with open("/Users/georgeslorre/ML6/internal/gemini-voice-agents/conversation.json", "w") as f:
296
+ json.dump([], f)
297
+ with open("/Users/georgeslorre/ML6/internal/gemini-voice-agents/answers.json", "w") as f:
298
+ json.dump({}, f)
299
+
300
+ # Clear files before launching
301
+ clear_json_files()
302
+
303
+ # Launch the Gradio interface
304
+ gr.load(
305
+ name="gemini-2.0-flash-exp",
306
+ src=registry,
307
+ ).launch()
308
+
src copy/app3.py ADDED
File without changes
src copy/helpers/loop.py ADDED
@@ -0,0 +1,274 @@
1
+ """Helper for audio loop."""
2
+
3
+ import asyncio
4
+ import logging
5
+ import traceback
6
+ import wave
7
+ from typing import Optional
8
+
9
+ import pyaudio
10
+ from google import genai
11
+
12
+ from models import AudioConfig, ModelConfig
13
+ from tools import FUNCTION_MAP
14
+
15
+ logger = logging.getLogger(__name__)
16
+
17
+
18
+ class TextLoop:
19
+ def __init__(self, model_config: ModelConfig):
20
+ self.model_config = model_config
21
+ self.client = self._setup_client()
22
+ self.session = None
23
+
24
+ def _setup_client(self) -> genai.Client:
25
+ """Initialize the Gemini client."""
26
+ return genai.Client(
27
+ api_key=self.model_config.api_key,
28
+ http_options={"api_version": "v1alpha"},
29
+ )
30
+
31
+ async def send_text(self) -> None:
32
+ """Handle text input and send to model."""
33
+ while True:
34
+ try:
35
+ text = await asyncio.to_thread(input, "message > ")
36
+ if text.lower() == "q":
37
+ break
38
+ await self.session.send(input=text or ".", end_of_turn=True)
39
+ except Exception as e:
40
+ logger.error(f"Error sending text: {e}")
41
+ await asyncio.sleep(0.1)
42
+
43
+ async def receive_text(self) -> None:
44
+ """Process and handle model responses."""
45
+ while True:
46
+ try:
47
+ turn = self.session.receive()
48
+ async for response in turn:
49
+ if text := response.text:
50
+ logger.info(text)
51
+ if response.tool_call is not None:
52
+ for tool in response.tool_call.function_calls:
53
+ tool_response = FUNCTION_MAP[tool.name](**tool.args)
54
+ logger.info(tool_response)
55
+ await self.session.send(
56
+ input=tool_response, end_of_turn=True
57
+ )
58
+ await asyncio.sleep(0.1)
59
+ except Exception as e:
60
+ logger.error(f"Error receiving text: {e}")
61
+ await asyncio.sleep(0.1)
62
+
63
+ async def run(self):
64
+ try:
65
+ async with (
66
+ self.client.aio.live.connect(
67
+ model=self.model_config.name,
68
+ config={
69
+ "system_instruction": self.model_config.system_instruction,
70
+ "tools": self.model_config.tools,
71
+ "generation_config": self.model_config.generation_config,
72
+ },
73
+ ) as session,
74
+ asyncio.TaskGroup() as tg,
75
+ ):
76
+ self.session = session
77
+ tasks = [
78
+ tg.create_task(self.send_text()),
79
+ tg.create_task(self.receive_text()),
80
+ ]
81
+
82
+ await tasks[0] # Wait for send_text to complete
83
+ raise asyncio.CancelledError("User requested exit")
84
+
85
+ except asyncio.CancelledError:
86
+ logger.info("Shutting down...")
87
+ except Exception as e:
88
+ logger.error(f"Error in main loop: {e}")
89
+ logger.debug(traceback.format_exc())
90
+
91
+
92
+ class AudioLoop:
93
+ """Handles real-time audio streaming and processing."""
94
+
95
+ def __init__(
96
+ self,
97
+ audio_config: AudioConfig,
98
+ model_config: ModelConfig,
99
+ function_map: Optional[dict[str, callable]] = FUNCTION_MAP,
100
+ instruction_audio: Optional[str] = None,
101
+ ):
102
+ """Initialize the audio loop.
103
+
104
+ Args:
105
+ audio_config (AudioConfig): Audio configuration settings
106
+ model_config (ModelConfig): Model configuration settings
107
+ function_map (Optional[dict[str, callable]]): Function map
108
+ """
109
+ self.audio_config = audio_config
110
+ self.model_config = model_config
111
+
112
+ self.audio_in_queue: Optional[asyncio.Queue] = None
113
+ self.out_queue: Optional[asyncio.Queue] = None
114
+ self.session = None
115
+ self.audio_stream = None
116
+ self.client = self._setup_client()
117
+ self.instruction_audio = instruction_audio
118
+
119
+ self.function_map = function_map
120
+
121
+ def _setup_client(self) -> genai.Client:
122
+ """Initialize the Gemini client."""
123
+ return genai.Client(
124
+ api_key=self.model_config.api_key,
125
+ http_options={"api_version": "v1alpha"},
126
+ )
127
+
128
+ async def send_text(self) -> None:
129
+ """Handle text input and send to model."""
130
+ while True:
131
+ try:
132
+ text = await asyncio.to_thread(input, "message > ")
133
+ if text.lower() == "q":
134
+ break
135
+ await self.session.send(input=text or ".", end_of_turn=True)
136
+ except Exception as e:
137
+ logger.error(f"Error sending text: {e}")
138
+ await asyncio.sleep(0.1)
139
+
140
+ async def send_realtime(self) -> None:
141
+ """Send real-time audio data to model."""
142
+ while True:
143
+ try:
144
+ msg = await self.out_queue.get()
145
+ await self.session.send(input=msg)
146
+ except Exception as e:
147
+ logger.error(f"Error in real-time sending: {e}")
148
+ await asyncio.sleep(0.1)
149
+
150
+ def input_audio_file(self, file_path: str):
151
+ """Read audio file and stream to the model."""
152
+ try:
153
+ with wave.open(file_path, "rb") as wave_file:
154
+ data = wave_file.readframes(wave_file.getnframes())
155
+ self.out_queue.put_nowait({"data": data, "mime_type": "audio/pcm"})
156
+ except Exception as e:
157
+ logger.error(f"Error reading audio file: {e}")
158
+
159
+ async def listen_audio(self) -> None:
160
+ """Capture and process audio input."""
161
+ try:
162
+ pya = pyaudio.PyAudio()
163
+ mic_info = pya.get_default_input_device_info()
164
+ self.audio_stream = await asyncio.to_thread(
165
+ pya.open,
166
+ format=self.audio_config.format,
167
+ channels=self.audio_config.channels,
168
+ rate=self.audio_config.send_sample_rate,
169
+ input=True,
170
+ input_device_index=mic_info["index"],
171
+ frames_per_buffer=self.audio_config.chunk_size,
172
+ )
173
+
174
+ kwargs = {"exception_on_overflow": False} if __debug__ else {}
175
+
176
+ while True:
177
+ data = await asyncio.to_thread(
178
+ self.audio_stream.read,
179
+ self.audio_config.chunk_size,
180
+ **kwargs,
181
+ )
182
+ await self.out_queue.put({"data": data, "mime_type": "audio/pcm"})
183
+ except Exception as e:
184
+ logger.error(f"Error in audio listening: {e}")
185
+ if self.audio_stream:
186
+ self.audio_stream.close()
187
+
188
+ async def receive_audio(self) -> None:
189
+ """Process and handle model responses."""
190
+ while True:
191
+ try:
192
+ turn = self.session.receive()
193
+ async for response in turn:
194
+ if data := response.data:
195
+ self.audio_in_queue.put_nowait(data)
196
+ continue
197
+ if text := response.text:
198
+ logger.info(text)
199
+ if response.tool_call is not None:
200
+ for tool in response.tool_call.function_calls:
201
+ tool_response = FUNCTION_MAP[tool.name](**tool.args)
202
+ logger.info(tool_response)
203
+ await self.session.send(
204
+ input=tool_response, end_of_turn=True
205
+ )
206
+ await asyncio.sleep(0.1)
207
+
208
+ # Clear queue on turn completion
209
+ while not self.audio_in_queue.empty():
210
+ self.audio_in_queue.get_nowait()
211
+ except Exception as e:
212
+ logger.error(f"Error receiving audio: {e}")
213
+ await asyncio.sleep(0.1)
214
+
215
+ async def play_audio(self) -> None:
216
+ """Play received audio through output device."""
217
+ try:
218
+ pya = pyaudio.PyAudio()
219
+ stream = await asyncio.to_thread(
220
+ pya.open,
221
+ format=self.audio_config.format,
222
+ channels=self.audio_config.channels,
223
+ rate=self.audio_config.receive_sample_rate,
224
+ output=True,
225
+ )
226
+
227
+ while True:
228
+ bytestream = await self.audio_in_queue.get()
229
+ await asyncio.to_thread(stream.write, bytestream)
230
+ except Exception as e:
231
+ logger.error(f"Error playing audio: {e}")
232
+ if "stream" in locals():
233
+ stream.close()
234
+
235
+ async def run(self) -> None:
236
+ """Main execution loop."""
237
+ try:
238
+ async with (
239
+ self.client.aio.live.connect(
240
+ model=self.model_config.name,
241
+ config={
242
+ "system_instruction": self.model_config.system_instruction,
243
+ "tools": self.model_config.tools,
244
+ "generation_config": self.model_config.generation_config,
245
+ },
246
+ ) as session,
247
+ asyncio.TaskGroup() as tg,
248
+ ):
249
+ self.session = session
250
+ self.audio_in_queue = asyncio.Queue()
251
+ self.out_queue = asyncio.Queue(maxsize=5)
252
+
253
+ if self.instruction_audio:
254
+ self.input_audio_file(file_path=self.instruction_audio)
255
+
256
+ tasks = [
257
+ tg.create_task(self.send_text()),
258
+ tg.create_task(self.send_realtime()),
259
+ tg.create_task(self.listen_audio()),
260
+ tg.create_task(self.receive_audio()),
261
+ tg.create_task(self.play_audio()),
262
+ ]
263
+
264
+ await tasks[0] # Wait for send_text to complete
265
+ raise asyncio.CancelledError("User requested exit")
266
+
267
+ except asyncio.CancelledError:
268
+ logger.info("Shutting down...")
269
+ except Exception as e:
270
+ logger.error(f"Error in main loop: {e}")
271
+ logger.debug(traceback.format_exc())
272
+ finally:
273
+ if self.audio_stream:
274
+ self.audio_stream.close()
src copy/helpers/prompts.py ADDED
@@ -0,0 +1,12 @@
1
+ """This module contains the prompts for the application."""
2
+
3
+ # import jinja2 template prompt
4
+
5
+ from jinja2 import Template
6
+
7
+
8
+ def load_prompt(prompt_path: str) -> str:
9
+ """Load the prompt from the given path."""
10
+ with open(prompt_path, "r", encoding="utf-8") as file:
11
+ prompt = Template(file.read())
12
+ return prompt.render()
src copy/helpers/session.py ADDED
@@ -0,0 +1,50 @@
1
+ import json
2
+ from dataclasses import dataclass
3
+ from datetime import datetime
4
+
5
+ from jinja2 import Template
6
+
7
+
8
+ @dataclass
9
+ class Question:
10
+ id: int
11
+ text: str
12
+ answer_format: type
13
+ user_answer: any = None
14
+
15
+
16
+ class Session:
17
+ def __init__(self, questions):
18
+ self.session_id = datetime.now().strftime("%Y%m%d_%H%M%S")
19
+ self.questions = questions
20
+ # self.questions = self.process_questions(questions)
21
+
22
+ @staticmethod
23
+ def process_questions(questions):
24
+ qq = {}
25
+ for q in questions:
26
+ if q["answer_format"] == "number":
27
+ Q = Question(q["id"], q["text"], int, None)
28
+ elif q["answer_format"] == "text":
29
+ Q = Question(q["id"], q["text"], str, None)
30
+ elif q["answer_format"] == "list":
31
+ Q = Question(q["id"], q["text"], list, None)
32
+ else:
33
+ raise ValueError("Invalid answer format")
34
+ qq[q["id"]] = Q
35
+ return qq
36
+
37
+ def answer_question(self, question_id, user_answer):
38
+ self.questions[question_id].user_answer = user_answer
39
+
40
+ def get_next_question(self):
41
+ for q in self.questions:
42
+ if q.user_answer:
43
+ return q
44
+ return False
45
+
46
+ def zero_shot_prompt(self, prompt_template_path):
47
+ with open(prompt_template_path) as f:
48
+ template_str = f.read()
49
+ template = Template(template_str)
50
+ return template.render(questions=json.dumps(self.questions, indent=4))
src copy/index.html ADDED
@@ -0,0 +1,452 @@
1
+ <!DOCTYPE html>
2
+ <html lang="en">
3
+
4
+ <head>
5
+ <meta charset="UTF-8">
6
+ <meta name="viewport" content="width=device-width, initial-scale=1.0">
7
+ <title>Gemini Voice Chat</title>
8
+ <style>
9
+ :root {
10
+ --color-accent: #6366f1;
11
+ --color-background: #0f172a;
12
+ --color-surface: #1e293b;
13
+ --color-text: #e2e8f0;
14
+ --boxSize: 8px;
15
+ --gutter: 4px;
16
+ }
17
+
18
+ body {
19
+ margin: 0;
20
+ padding: 0;
21
+ background-color: var(--color-background);
22
+ color: var(--color-text);
23
+ font-family: system-ui, -apple-system, sans-serif;
24
+ min-height: 100vh;
25
+ display: flex;
26
+ flex-direction: column;
27
+ align-items: center;
28
+ justify-content: center;
29
+ }
30
+
31
+ .container {
32
+ width: 90%;
33
+ max-width: 800px;
34
+ background-color: var(--color-surface);
35
+ padding: 2rem;
36
+ border-radius: 1rem;
37
+ box-shadow: 0 25px 50px -12px rgba(0, 0, 0, 0.25);
38
+ }
39
+
40
+ .wave-container {
41
+ position: relative;
42
+ display: flex;
43
+ min-height: 100px;
44
+ max-height: 128px;
45
+ justify-content: center;
46
+ align-items: center;
47
+ margin: 2rem 0;
48
+ }
49
+
50
+ .box-container {
51
+ display: flex;
52
+ justify-content: space-between;
53
+ height: 64px;
54
+ width: 100%;
55
+ }
56
+
57
+ .box {
58
+ height: 100%;
59
+ width: var(--boxSize);
60
+ background: var(--color-accent);
61
+ border-radius: 8px;
62
+ transition: transform 0.05s ease;
63
+ }
64
+
65
+ .controls {
66
+ display: grid;
67
+ gap: 1rem;
68
+ margin-bottom: 2rem;
69
+ }
70
+
71
+ .input-group {
72
+ display: flex;
73
+ flex-direction: column;
74
+ gap: 0.5rem;
75
+ }
76
+
77
+ label {
78
+ font-size: 0.875rem;
79
+ font-weight: 500;
80
+ }
81
+
82
+ input,
83
+ select {
84
+ padding: 0.75rem;
85
+ border-radius: 0.5rem;
86
+ border: 1px solid rgba(255, 255, 255, 0.1);
87
+ background-color: var(--color-background);
88
+ color: var(--color-text);
89
+ font-size: 1rem;
90
+ }
91
+
92
+ button {
93
+ padding: 1rem 2rem;
94
+ border-radius: 0.5rem;
95
+ border: none;
96
+ background-color: var(--color-accent);
97
+ color: white;
98
+ font-weight: 600;
99
+ cursor: pointer;
100
+ transition: all 0.2s ease;
101
+ }
102
+
103
+ button:hover {
104
+ opacity: 0.9;
105
+ transform: translateY(-1px);
106
+ }
107
+
108
+ .icon-with-spinner {
109
+ display: flex;
110
+ align-items: center;
111
+ justify-content: center;
112
+ gap: 12px;
113
+ min-width: 180px;
114
+ }
115
+
116
+ .spinner {
117
+ width: 20px;
118
+ height: 20px;
119
+ border: 2px solid white;
120
+ border-top-color: transparent;
121
+ border-radius: 50%;
122
+ animation: spin 1s linear infinite;
123
+ flex-shrink: 0;
124
+ }
125
+
126
+ @keyframes spin {
127
+ to {
128
+ transform: rotate(360deg);
129
+ }
130
+ }
131
+
132
+ .pulse-container {
133
+ display: flex;
134
+ align-items: center;
135
+ justify-content: center;
136
+ gap: 12px;
137
+ min-width: 180px;
138
+ }
139
+
140
+ .pulse-circle {
141
+ width: 20px;
142
+ height: 20px;
143
+ border-radius: 50%;
144
+ background-color: white;
145
+ opacity: 0.2;
146
+ flex-shrink: 0;
147
+ transform: translateX(-0%) scale(var(--audio-level, 1));
148
+ transition: transform 0.1s ease;
149
+ }
150
+
151
+ /* Add styles for toast notifications */
152
+ .toast {
153
+ position: fixed;
154
+ top: 20px;
155
+ left: 50%;
156
+ transform: translateX(-50%);
157
+ padding: 16px 24px;
158
+ border-radius: 4px;
159
+ font-size: 14px;
160
+ z-index: 1000;
161
+ display: none;
162
+ box-shadow: 0 2px 5px rgba(0, 0, 0, 0.2);
163
+ }
164
+
165
+ .toast.error {
166
+ background-color: #f44336;
167
+ color: white;
168
+ }
169
+
170
+ .toast.warning {
171
+ background-color: #ffd700;
172
+ color: black;
173
+ }
174
+ </style>
175
+ </head>
176
+
177
+
178
+ <body>
179
+ <!-- Add toast element after body opening tag -->
180
+ <div id="error-toast" class="toast"></div>
181
+ <div style="text-align: center">
182
+ <h1>Gemini Voice Chat</h1>
183
+ <p>Speak with Gemini using real-time audio streaming</p>
184
+ <p>
185
+ Get a Gemini API key
186
+ <a href="https://ai.google.dev/gemini-api/docs/api-key">here</a>
187
+ </p>
188
+ </div>
189
+ <div class="container">
190
+ <div class="controls">
191
+ <div class="input-group">
192
+ <label for="api-key">API Key</label>
193
+ <input type="password" id="api-key" placeholder="Enter your API key">
194
+ </div>
195
+ <div class="input-group">
196
+ <label for="voice">Voice</label>
197
+ <select id="voice">
198
+ <option value="Puck">Puck</option>
199
+ <option value="Charon">Charon</option>
200
+ <option value="Kore">Kore</option>
201
+ <option value="Fenrir">Fenrir</option>
202
+ <option value="Aoede">Aoede</option>
203
+ </select>
204
+ </div>
205
+ </div>
206
+
207
+ <div class="wave-container">
208
+ <div class="box-container">
209
+ <!-- Boxes will be dynamically added here -->
210
+ </div>
211
+ </div>
212
+
213
+ <button id="start-button">Start Recording</button>
214
+ </div>
215
+
216
+ <audio id="audio-output"></audio>
217
+
218
+ <script>
219
+ let peerConnection;
220
+ let audioContext;
221
+ let dataChannel;
222
+ let isRecording = false;
223
+ let webrtc_id;
224
+
225
+ const startButton = document.getElementById('start-button');
226
+ const apiKeyInput = document.getElementById('api-key');
227
+ const voiceSelect = document.getElementById('voice');
228
+ const audioOutput = document.getElementById('audio-output');
229
+ const boxContainer = document.querySelector('.box-container');
230
+
231
+ const numBars = 32;
232
+ for (let i = 0; i < numBars; i++) {
233
+ const box = document.createElement('div');
234
+ box.className = 'box';
235
+ boxContainer.appendChild(box);
236
+ }
237
+
238
+ function updateButtonState() {
239
+ if (peerConnection && (peerConnection.connectionState === 'connecting' || peerConnection.connectionState === 'new')) {
240
+ startButton.innerHTML = `
241
+ <div class="icon-with-spinner">
242
+ <div class="spinner"></div>
243
+ <span>Connecting...</span>
244
+ </div>
245
+ `;
246
+ } else if (peerConnection && peerConnection.connectionState === 'connected') {
247
+ startButton.innerHTML = `
248
+ <div class="pulse-container">
249
+ <div class="pulse-circle"></div>
250
+ <span>Stop Recording</span>
251
+ </div>
252
+ `;
253
+ } else {
254
+ startButton.innerHTML = 'Start Recording';
255
+ }
256
+ }
257
+
258
+ function showError(message) {
259
+ const toast = document.getElementById('error-toast');
260
+ toast.textContent = message;
261
+ toast.className = 'toast error';
262
+ toast.style.display = 'block';
263
+
264
+ // Hide toast after 5 seconds
265
+ setTimeout(() => {
266
+ toast.style.display = 'none';
267
+ }, 5000);
268
+ }
269
+
270
+ async function setupWebRTC() {
271
+ const config = __RTC_CONFIGURATION__;
272
+ peerConnection = new RTCPeerConnection(config);
273
+ webrtc_id = Math.random().toString(36).substring(7);
274
+
275
+ const timeoutId = setTimeout(() => {
276
+ const toast = document.getElementById('error-toast');
277
+ toast.textContent = "Connection is taking longer than usual. Are you on a VPN?";
278
+ toast.className = 'toast warning';
279
+ toast.style.display = 'block';
280
+
281
+ // Hide warning after 5 seconds
282
+ setTimeout(() => {
283
+ toast.style.display = 'none';
284
+ }, 5000);
285
+ }, 5000);
286
+
287
+ try {
288
+ const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
289
+ stream.getTracks().forEach(track => peerConnection.addTrack(track, stream));
290
+
291
+ // Update audio visualization setup
292
+ audioContext = new AudioContext();
293
+ analyser_input = audioContext.createAnalyser();
294
+ const source = audioContext.createMediaStreamSource(stream);
295
+ source.connect(analyser_input);
296
+ analyser_input.fftSize = 64;
297
+ dataArray_input = new Uint8Array(analyser_input.frequencyBinCount);
298
+
299
+ function updateAudioLevel() {
300
+ analyser_input.getByteFrequencyData(dataArray_input);
301
+ const average = Array.from(dataArray_input).reduce((a, b) => a + b, 0) / dataArray_input.length;
302
+ const audioLevel = average / 255;
303
+
304
+ const pulseCircle = document.querySelector('.pulse-circle');
305
+ if (pulseCircle) {
306
+ console.log("audioLevel", audioLevel);
307
+ pulseCircle.style.setProperty('--audio-level', 1 + audioLevel);
308
+ }
309
+
310
+ animationId = requestAnimationFrame(updateAudioLevel);
311
+ }
312
+ updateAudioLevel();
313
+
314
+ // Add connection state change listener
315
+ peerConnection.addEventListener('connectionstatechange', () => {
316
+ console.log('connectionstatechange', peerConnection.connectionState);
317
+ if (peerConnection.connectionState === 'connected') {
318
+ clearTimeout(timeoutId);
319
+ const toast = document.getElementById('error-toast');
320
+ toast.style.display = 'none';
321
+ }
322
+ updateButtonState();
323
+ });
324
+
325
+ // Handle incoming audio
326
+ peerConnection.addEventListener('track', (evt) => {
327
+ if (audioOutput && audioOutput.srcObject !== evt.streams[0]) {
328
+ audioOutput.srcObject = evt.streams[0];
329
+ audioOutput.play();
330
+
331
+ // Set up audio visualization on the output stream
332
+ audioContext = new AudioContext();
333
+ analyser = audioContext.createAnalyser();
334
+ const source = audioContext.createMediaStreamSource(evt.streams[0]);
335
+ source.connect(analyser);
336
+ analyser.fftSize = 2048;
337
+ dataArray = new Uint8Array(analyser.frequencyBinCount);
338
+ updateVisualization();
339
+ }
340
+ });
341
+
342
+ // Create data channel for messages
343
+ dataChannel = peerConnection.createDataChannel('text');
344
+ dataChannel.onmessage = (event) => {
345
+ const eventJson = JSON.parse(event.data);
346
+ if (eventJson.type === "error") {
347
+ showError(eventJson.message);
348
+ } else if (eventJson.type === "send_input") {
349
+ fetch('/input_hook', {
350
+ method: 'POST',
351
+ headers: {
352
+ 'Content-Type': 'application/json',
353
+ },
354
+ body: JSON.stringify({
355
+ webrtc_id: webrtc_id,
356
+ api_key: apiKeyInput.value,
357
+ voice_name: voiceSelect.value
358
+ })
359
+ });
360
+ }
361
+ };
362
+
363
+ // Create and send offer
364
+ const offer = await peerConnection.createOffer();
365
+ await peerConnection.setLocalDescription(offer);
366
+
367
+ await new Promise((resolve) => {
368
+ if (peerConnection.iceGatheringState === "complete") {
369
+ resolve();
370
+ } else {
371
+ const checkState = () => {
372
+ if (peerConnection.iceGatheringState === "complete") {
373
+ peerConnection.removeEventListener("icegatheringstatechange", checkState);
374
+ resolve();
375
+ }
376
+ };
377
+ peerConnection.addEventListener("icegatheringstatechange", checkState);
378
+ }
379
+ });
380
+
381
+ const response = await fetch('/webrtc/offer', {
382
+ method: 'POST',
383
+ headers: { 'Content-Type': 'application/json' },
384
+ body: JSON.stringify({
385
+ sdp: peerConnection.localDescription.sdp,
386
+ type: peerConnection.localDescription.type,
387
+ webrtc_id: webrtc_id,
388
+ })
389
+ });
390
+
391
+ const serverResponse = await response.json();
392
+
393
+ if (serverResponse.status === 'failed') {
394
+ showError(serverResponse.meta.error === 'concurrency_limit_reached'
395
+ ? `Too many connections. Maximum limit is ${serverResponse.meta.limit}`
396
+ : serverResponse.meta.error);
397
+ stopWebRTC();
398
+ startButton.textContent = 'Start Recording';
399
+ return;
400
+ }
401
+
402
+ await peerConnection.setRemoteDescription(serverResponse);
403
+ } catch (err) {
404
+ clearTimeout(timeoutId);
405
+ console.error('Error setting up WebRTC:', err);
406
+ showError('Failed to establish connection. Please try again.');
407
+ stopWebRTC();
408
+ startButton.textContent = 'Start Recording';
409
+ }
410
+ }
411
+
412
+ function updateVisualization() {
413
+ if (!analyser) return;
414
+
415
+ analyser.getByteFrequencyData(dataArray);
416
+ const bars = document.querySelectorAll('.box');
417
+
418
+ for (let i = 0; i < bars.length; i++) {
419
+ const barHeight = (dataArray[i] / 255) * 2;
420
+ bars[i].style.transform = `scaleY(${Math.max(0.1, barHeight)})`;
421
+ }
422
+
423
+ animationId = requestAnimationFrame(updateVisualization);
424
+ }
425
+
426
+ function stopWebRTC() {
427
+ if (peerConnection) {
428
+ peerConnection.close();
429
+ }
430
+ if (animationId) {
431
+ cancelAnimationFrame(animationId);
432
+ }
433
+ if (audioContext) {
434
+ audioContext.close();
435
+ }
436
+ updateButtonState();
437
+ }
438
+
439
+ startButton.addEventListener('click', () => {
440
+ if (!isRecording) {
441
+ setupWebRTC();
442
+ startButton.classList.add('recording');
443
+ } else {
444
+ stopWebRTC();
445
+ startButton.classList.remove('recording');
446
+ }
447
+ isRecording = !isRecording;
448
+ });
449
+ </script>
450
+ </body>
451
+
452
+ </html>
src copy/models.py ADDED
@@ -0,0 +1,30 @@
1
+ """Data models for the application."""
2
+
3
+ from dataclasses import dataclass
4
+
5
+ import pyaudio
6
+ from dotenv import load_dotenv
7
+
8
+ load_dotenv()
9
+
10
+
11
+ @dataclass
12
+ class AudioConfig:
13
+ """Audio configuration settings."""
14
+
15
+ format: int = pyaudio.paInt16
16
+ channels: int = 1
17
+ send_sample_rate: int = 16000
18
+ receive_sample_rate: int = 24000
19
+ chunk_size: int = 1024
20
+
21
+
22
+ @dataclass
23
+ class ModelConfig:
24
+ """Gemini model configuration."""
25
+
26
+ api_key: str
27
+ name: str
28
+ tools: dict
29
+ generation_config: dict
30
+ system_instruction: str
src copy/prompts/default_prompt.jinja2 ADDED
@@ -0,0 +1,41 @@
1
+ # Personality and Tone
2
+ ## Identity
3
+ You are a friendly recruiter who conducts initial screening calls with candidates. You speak clear, professional English.
4
+
5
+ YOU ARE THE RECRUITER AND THE USER IS THE CANDIDATE, THE USER MUST ANSWER THE QUESTIONS.
6
+
7
+ ## Tone and Language
8
+ - You are polite and professional.
9
+ - Use complete sentences
10
+ - Maintain a formal but warm demeanor
11
+ - Avoid slang or casual language
12
+
13
+ ## Task
14
+ Your sole responsibility is to conduct brief initial screenings with candidates by following these exact steps:
15
+
16
+ # Strict Interview Protocol
17
+
18
+ 1. ANSWER PROCESSING AND VALIDATION:
19
+ - ESSENTIAL INFO: Extract only the key information from candidate's response
20
+ - you MUST store the extracted information using validate_answer_tool
21
+ - VALIDATION: Use validate_answer_tool with the distilled answer ONLY
22
+ - ACKNOWLEDGE: Briefly acknowledge the candidate's response
23
+ - IMPORTANT: Never reveal validation process to candidates
24
+ - If validation fails, repeat question
25
+
26
+ 2. ANSWER VALIDATION PROTOCOL:
27
+ - If answer is VALID: Proceed to next question
28
+ - If answer is INVALID: Repeat the same question
29
+ - No exceptions to this rule
30
+
31
+ 3. INTERVIEW CONCLUSION:
32
+ - Only conclude after ALL questions are asked and validated
33
+ - End with a professional thank you message
34
+ - No additional commentary or questions allowed
35
+
36
+ DO NOT deviate from these protocols under any circumstances.
37
+
38
+
39
+ QUESTIONS SEQUENCE:
40
+ - You MUST ask questions in the exact order provided in:
41
+ {{ questions }}
src copy/run.py ADDED
@@ -0,0 +1,96 @@
1
+ """Real-time Speech Interface
2
+
3
+ This module provides a real-time speech interface using Google's Gemini model.
4
+ It handles bidirectional audio streaming with automatic speech recognition and synthesis.
5
+
6
+ Important:
7
+ Use headphones to prevent audio feedback and echo issues.
8
+ """
9
+
10
+ import argparse
11
+ import asyncio
12
+ import json
13
+ import logging
14
+ import os
15
+ import traceback
16
+
17
+ from helpers.loop import AudioLoop, TextLoop
18
+ from helpers.session import Session
19
+ from models import AudioConfig, ModelConfig
20
+ from tools import TOOLS
21
+
22
+ # Configure logging
23
+ logging.basicConfig(
24
+ level=logging.INFO,
25
+ format="%(asctime)s - %(name)s - %(levelname)s - %(message)s",
26
+ )
27
+ logger = logging.getLogger(__name__)
28
+
29
+
30
+ def main(
31
+ modality: str = "text", system_prompt: str = None, instruction_audio: str = None
32
+ ) -> None:
33
+ """Entry point for the application."""
34
+ try:
35
+ model_config = ModelConfig(
36
+ api_key=os.environ.get("GOOGLE_API_KEY"),
37
+ name="models/gemini-2.0-flash-exp",
38
+ system_instruction=system_prompt,
39
+ tools=TOOLS,
40
+ generation_config={
41
+ "response_modalities": modality.upper(),
42
+ },
43
+ )
44
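+        # Pick the interaction loop that matches the requested response modality.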
+ if modality == "audio":
45
+ loop_instance = AudioLoop(
46
+ audio_config=AudioConfig(),
47
+ model_config=model_config,
48
+ instruction_audio=instruction_audio,
49
+ )
50
+ elif modality == "text":
51
+ loop_instance = TextLoop(model_config=model_config)
52
+ else:
53
+ raise ValueError("Invalid modality")
54
+ asyncio.run(loop_instance.run(), debug=True)
55
+ except KeyboardInterrupt:
56
+ logger.info("Application terminated by user")
57
+ except Exception as e:
58
+ logger.error(f"Application error: {e}")
59
+ logger.debug(traceback.format_exc())
60
+
61
+
62
+ if __name__ == "__main__":
63
+ parser = argparse.ArgumentParser(description="Real-time Speech Interface")
64
+ parser.add_argument(
65
+ "-m",
66
+ "--modality",
67
+ choices=["text", "audio"],
68
+ help="Response modality",
69
+ required=True,
70
+ )
71
+ parser.add_argument(
72
+ "--instruction-audio",
73
+ type=str,
74
+ help="Path to audio instructions (.wav file)",
75
+ required=False,
76
+ )
77
+ parser.add_argument(
78
+ "-q",
79
+ "--questions",
80
+ type=str,
81
+ help="Path to JSON file containing questions",
82
+ required=True,
83
+ )
84
+ args = parser.parse_args()
85
+ with open(args.questions, "r") as f:
86
+ questions_dict = json.load(f)
87
+
88
+ session = Session(questions=questions_dict)
89
+ system_prompt = session.zero_shot_prompt("src/prompts/default_prompt.jinja2")
90
+ print(system_prompt)
91
+
92
+ main(
93
+ modality=args.modality,
94
+ system_prompt=system_prompt,
95
+ instruction_audio=args.instruction_audio,
96
+ )
src copy/tools/__init__.py ADDED
@@ -0,0 +1,14 @@
1
+ """Tools package for API integrations."""
2
+
3
+ from .functions import validate_answer, validate_answer_tool, store_input, store_input_tool
4
+
5
+ # Map of function names to their implementations
6
+ FUNCTION_MAP = {
7
+ "validate_answer": validate_answer,
8
+ "store_input": store_input,
9
+ }
10
+
11
+ # List of all available tools
12
+ # TOOLS = [validate_answer_tool, store_input_tool]
13
+ TOOLS = [validate_answer_tool]
14
+
src copy/tools/functions.py ADDED
@@ -0,0 +1,148 @@
1
+ import json
2
+ import logging
3
+ import os
4
+
5
+ logger = logging.getLogger(__name__)
6
+
7
+ """Schedule meeting integration function."""
8
+
9
+
10
+ def fetch_next_question() -> str:
11
+ """Fetch the next question.
12
+
13
+ Returns:
14
+ str: The next question.
15
+ """
16
+ questions = [
17
+ "What is the capital of France?",
18
+ "What is 2 + 2?",
19
+ "Who wrote Romeo and Juliet?",
20
+ "What is the chemical symbol for gold?",
21
+ "Which planet is known as the Red Planet?",
22
+ ]
23
+     question = questions[0]  # Placeholder: always returns the first question in the list
24
+
25
+ return f"You need to ask the candidate following question: `{question}`. Allow the candidate some time to respond "
26
+
27
+
28
+ fetch_next_question_tool = {
29
+ "name": "fetch_next_question",
30
+ "description": "Fetch the next question",
31
+ }
32
+
33
+
34
+ def validate_answer(
35
+     question_id: int, answer: str, answer_type: str
36
+ ) -> str:
37
+ """Validate the user's answer against an expected answer type.
38
+
39
+     Args:
+         question_id (int): The identifier of the question being validated
40
+ answer (str): The user's provided answer to validate
41
+         answer_type (str): Name of the expected python type that the answer should match (e.g. "str", "int", "list")
42
+
43
+     Returns:
+         str: "Answer is valid" if the answer matches the expected type
44
+
45
+ Raises:
46
+ ValueError: If the answer's type does not match the expected answer_type
47
+
48
+ Example:
49
+ >>> validate_answer(1, "42", str)
50
+         'Answer is valid'
51
+ >>> validate_answer(1, 42, str)
52
+ ValueError: Invalid answer type
53
+ """
54
+ logging.info(
55
+ {
56
+ "question_id": question_id,
57
+ "answer": answer,
58
+ "answer_type": answer_type,
59
+ }
60
+ )
61
+     if type(answer).__name__ != answer_type:  # answer_type is the type name supplied by the model, e.g. "str"
62
+ raise ValueError("Invalid answer type")
63
+
64
+ # Create or load the answers file
65
+ answers_file = "/Users/georgeslorre/ML6/internal/gemini-voice-agents/answers.json"
66
+ answers = []
67
+
68
+ if os.path.exists(answers_file):
69
+ with open(answers_file, "r") as f:
70
+ answers = json.load(f)
71
+
72
+ # Append new answer
73
+     answers.append({"question_id": question_id, "answer": answer})
74
+
75
+ # Write back to file
76
+ with open(answers_file, "w") as f:
77
+ json.dump(answers, f, indent=2)
78
+
79
+ return "Answer is valid"
80
+
81
+
82
+ validate_answer_tool = {
83
+ "name": "validate_answer",
84
+ "description": "Validate the user's answer against an expected answer type",
85
+ "parameters": {
86
+ "type": "OBJECT",
87
+ "properties": {
88
+ "question_id": {
89
+ "type": "INTEGER",
90
+ "description": "The identifier of the question being validated"
91
+ },
92
+ "answer": {
93
+ "type": "STRING",
94
+ "description": "The user's provided answer to validate"
95
+ },
96
+ "answer_type": {
97
+ "type": "STRING",
98
+ "description": "The expected python type that the answer should match (e.g. str, int, list)"
99
+ }
100
+ },
101
+ "required": ["question_id", "answer", "answer_type"]
102
+ }
103
+ }
104
+
105
+
106
+ def store_input(role: str, input: str) -> str:
107
+ """Store conversation input in a JSON file.
108
+
109
+ Args:
110
+ role (str): The role of the speaker (user or assistant)
111
+ input (str): The text input to store
112
+
113
+ Returns:
114
+ str: Confirmation message
115
+ """
116
+ conversation_file = "/Users/georgeslorre/ML6/internal/gemini-voice-agents/conversation.json"
117
+ conversation = []
118
+
119
+ if os.path.exists(conversation_file):
120
+ with open(conversation_file, "r") as f:
121
+ conversation = json.load(f)
122
+
123
+ conversation.append({"role": role, "content": input})
124
+
125
+ with open(conversation_file, "w") as f:
126
+ json.dump(conversation, f, indent=2)
127
+
128
+ return "Input stored successfully"
129
+
130
+
131
+
132
+ store_input_tool = {
133
+ "name": "store_input",
134
+ "description": "Store user input in conversation history",
135
+ "parameters": {
136
+ "type": "OBJECT",
137
+ "properties": {
138
+ "role": {
139
+ "type": "STRING",
140
+ "description": "The role of the speaker (user or assistant)"
141
+ },
142
+ "input": {
143
+ "type": "STRING",
144
+ "description": "The text input to store"
145
+ }
146
+ }
147
+ }
148
+ }
src copy/tts.py ADDED
@@ -0,0 +1,103 @@
1
+ #!/usr/bin/env python
2
+ # Copyright 2024 Google LLC
3
+ #
4
+ # Licensed under the Apache License, Version 2.0 (the "License");
5
+ # you may not use this file except in compliance with the License.
6
+ # You may obtain a copy of the License at
7
+ #
8
+ # http://www.apache.org/licenses/LICENSE-2.0
9
+ #
10
+ # Unless required by applicable law or agreed to in writing, software
11
+ # distributed under the License is distributed on an "AS IS" BASIS,
12
+ # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
13
+ # See the License for the specific language governing permissions and
14
+ # limitations under the License.
15
+ #
16
+
17
+ """Google Cloud Text-To-Speech API streaming sample with input/output streams."""
18
+
19
+ from google.cloud import texttospeech
20
+ import itertools
21
+ import queue
22
+ import threading
23
+
24
+ class TTSStreamer:
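+     """Stream text into the Cloud TTS streaming API and expose synthesized audio chunks via queues."""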
25
+ def __init__(self):
26
+ self.client = texttospeech.TextToSpeechClient()
27
+ self.text_queue = queue.Queue()
28
+         self.audio_queue = queue.Queue()
+         self.processor_thread = None  # Created in start_stream()
29
+
30
+ def start_stream(self):
31
+ streaming_config = texttospeech.StreamingSynthesizeConfig(
32
+ voice=texttospeech.VoiceSelectionParams(
33
+ name="en-US-Journey-D",
34
+ language_code="en-US"
35
+ )
36
+ )
37
+ config_request = texttospeech.StreamingSynthesizeRequest(
38
+ streaming_config=streaming_config
39
+ )
40
+
41
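+         # Yields a synthesis request for each queued text item; a None item ends the stream.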
+ def request_generator():
42
+ while True:
43
+ try:
44
+ text = self.text_queue.get()
45
+ if text is None: # Poison pill to stop
46
+ break
47
+ yield texttospeech.StreamingSynthesizeRequest(
48
+ input=texttospeech.StreamingSynthesisInput(text=text)
49
+ )
50
+ except queue.Empty:
51
+ continue
52
+
53
+ def audio_processor():
54
+ responses = self.client.streaming_synthesize(
55
+ itertools.chain([config_request], request_generator())
56
+ )
57
+
58
+ for response in responses:
59
+ self.audio_queue.put(response.audio_content)
60
+
61
+ self.processor_thread = threading.Thread(target=audio_processor)
62
+ self.processor_thread.start()
63
+
64
+ def send_text(self, text: str):
65
+ """Send text to be synthesized."""
66
+ self.text_queue.put(text)
67
+
68
+ def get_audio(self):
69
+ """Get the next chunk of audio bytes."""
70
+ try:
71
+ return self.audio_queue.get_nowait()
72
+ except queue.Empty:
73
+ return None
74
+
75
+ def stop(self):
76
+ """Stop the streaming synthesis."""
77
+ self.text_queue.put(None) # Send poison pill
78
+ if self.processor_thread:
79
+ self.processor_thread.join()
80
+
81
+ def main():
82
+ tts = TTSStreamer()
83
+ tts.start_stream()
84
+
85
+ # Example usage
86
+ try:
87
+ while True:
88
+ text = input("Enter text (or 'q' to quit): ")
89
+ if text.lower() == 'q':
90
+ break
91
+ tts.send_text(text)
92
+
93
+ # Get and print audio bytes
94
+ while True:
95
+ audio_chunk = tts.get_audio()
96
+ if audio_chunk is None:
97
+ break
98
+ print(f"Received audio chunk of {len(audio_chunk)} bytes")
99
+ finally:
100
+ tts.stop()
101
+
102
+ if __name__ == "__main__":
103
+ main()
src/app.py ADDED
@@ -0,0 +1,302 @@
1
+ import asyncio
2
+ import base64
3
+ import json
4
+ import os
5
+ from typing import Literal
6
+
7
+ import gradio as gr
8
+ import numpy as np
9
+ from fastrtc import AsyncStreamHandler, WebRTC, wait_for_item
10
+ from google import genai
11
+ from google.cloud import texttospeech
12
+ from google.genai.types import FunctionDeclaration, LiveConnectConfig, Tool
13
+
14
+ import helpers.datastore as datastore
15
+ from helpers.prompts import load_prompt
16
+ from tools import FUNCTION_MAP, TOOLS
17
+
18
+ with open("questions.json", "r") as f:
19
+ questions_dict = json.load(f)
20
+
21
+
22
+ datastore.DATA_STORE["questions"] = questions_dict
23
+
24
+ SYSTEM_PROMPT = load_prompt(
25
+ "src/prompts/default_prompt.jinja2", questions=questions_dict
26
+ )
27
+
28
+
29
+ class TTSConfig:
30
+ def __init__(self):
31
+ self.client = texttospeech.TextToSpeechClient()
32
+ self.voice = texttospeech.VoiceSelectionParams(
33
+ name="en-US-Chirp3-HD-Charon", language_code="en-US"
34
+ )
35
+ self.audio_config = texttospeech.AudioConfig(
36
+ audio_encoding=texttospeech.AudioEncoding.LINEAR16
37
+ )
38
+
39
+
40
+ class AsyncGeminiHandler(AsyncStreamHandler):
41
+ """Simple Async Gemini Handler"""
42
+
43
+ def __init__(
44
+ self,
45
+ expected_layout: Literal["mono"] = "mono",
46
+ output_sample_rate: int = 24000,
47
+ output_frame_size: int = 480,
48
+ ) -> None:
49
+ super().__init__(
50
+ expected_layout,
51
+ output_sample_rate,
52
+ output_frame_size,
53
+ input_sample_rate=16000,
54
+ )
55
+ self.input_queue: asyncio.Queue = asyncio.Queue()
56
+ self.output_queue: asyncio.Queue = asyncio.Queue()
57
+ self.text_queue: asyncio.Queue = asyncio.Queue()
58
+ self.quit: asyncio.Event = asyncio.Event()
59
+ self.chunk_size = 1024
60
+
61
+ self.tts_config: TTSConfig | None = TTSConfig()
62
+ self.text_buffer = ""
63
+
64
+ def copy(self) -> "AsyncGeminiHandler":
65
+ return AsyncGeminiHandler(
66
+ expected_layout="mono",
67
+ output_sample_rate=self.output_sample_rate,
68
+ output_frame_size=self.output_frame_size,
69
+ )
70
+
71
+ def _encode_audio(self, data: np.ndarray) -> str:
72
+ """Encode Audio data to send to the server"""
73
+ return base64.b64encode(data.tobytes()).decode("UTF-8")
74
+
75
+ async def receive(self, frame: tuple[int, np.ndarray]) -> None:
76
+ """Receives and processes audio frames asynchronously."""
77
+ _, array = frame
78
+ array = array.squeeze()
79
+ audio_message = self._encode_audio(array)
80
+ self.input_queue.put_nowait(audio_message)
81
+
82
+ async def emit(self) -> tuple[int, np.ndarray] | None:
83
+ """Asynchronously emits items from the output queue."""
84
+ return await wait_for_item(self.output_queue)
85
+
86
+ async def start_up(self) -> None:
87
+ """Initialize and start the voice agent application.
88
+
89
+ This asynchronous method sets up the Gemini API client, configures the live connection,
90
+ and starts three concurrent tasks for receiving, processing and sending information.
91
+
92
+ Returns:
93
+ None
94
+
95
+ Raises:
96
+ ValueError: If GEMINI_API_KEY is not provided when required.
97
+
98
+ """
99
+ if not os.getenv("GOOGLE_GENAI_USE_VERTEXAI") == "True":
100
+ api_key = os.getenv("GEMINI_API_KEY")
101
+ if not api_key:
102
+ raise ValueError("API Key is required")
103
+
104
+ client = genai.Client(
105
+ api_key=api_key,
106
+ http_options={"api_version": "v1alpha"},
107
+ )
108
+ else:
109
+ client = genai.Client(http_options={"api_version": "v1beta1"})
110
+
111
+ config = LiveConnectConfig(
112
+ system_instruction={
113
+ "parts": [{"text": SYSTEM_PROMPT}],
114
+ "role": "user",
115
+ },
116
+ tools=[
117
+ Tool(
118
+ function_declarations=[
119
+ FunctionDeclaration(**tool) for tool in TOOLS
120
+ ]
121
+ )
122
+ ],
123
+ response_modalities=["AUDIO"],
124
+ )
125
+
126
+ async with (
127
+ client.aio.live.connect(
128
+ model="gemini-2.0-flash-exp", config=config
129
+ ) as session, # setup the live connection session (websocket)
130
+ asyncio.TaskGroup() as tg, # create a task group to run multiple tasks concurrently
131
+ ):
132
+ self.session = session
133
+
134
+ # these tasks will run concurrently and continuously
135
+ [
136
+ tg.create_task(self.process()),
137
+ tg.create_task(self.send_realtime()),
138
+ tg.create_task(self.tts()),
139
+ ]
140
+
141
+ async def process(self) -> None:
142
+ """Process responses from the session in a continuous loop.
143
+
144
+ This asynchronous method handles different types of responses from the session:
145
+ - Audio data: Processes and queues audio data with the specified sample rate
146
+ - Text data: Accumulates received text in a buffer
147
+ - Tool calls: Executes registered functions and sends their responses back
148
+ - Server content: Handles turn completion and stores conversation history
149
+
150
+ The method runs indefinitely until interrupted, handling any exceptions that occur
151
+ during processing by logging them and continuing after a brief delay.
152
+
153
+ Returns:
154
+ None
155
+
156
+ Raises:
157
+ Exception: Any exceptions during processing are caught and logged
158
+ """
159
+ while True:
160
+ try:
161
+ turn = self.session.receive()
162
+ async for response in turn:
163
+ if data := response.data:
164
+ # audio data
165
+ array = np.frombuffer(data, dtype=np.int16)
166
+ self.output_queue.put_nowait((self.output_sample_rate, array))
167
+ continue
168
+
169
+ if text := response.text:
170
+ # text data
171
+ print(f"Received text: {text}")
172
+ self.text_buffer += text
173
+
174
+ if response.tool_call is not None:
175
+ # function calling
176
+ for tool in response.tool_call.function_calls:
177
+ try:
178
+ tool_response = FUNCTION_MAP[tool.name](**tool.args)
179
+ print(f"Calling tool: {tool.name}")
180
+ print(f"Tool response: {tool_response}")
181
+ await self.session.send(
182
+ input=tool_response, end_of_turn=True
183
+ )
184
+ await asyncio.sleep(0.1)
185
+ except Exception as e:
186
+ print(f"Error in tool call: {e}")
187
+ await asyncio.sleep(0.1)
188
+
189
+ if sc := response.server_content:
190
+ # check if bot's turn is complete
191
+ if sc.turn_complete and self.text_buffer:
192
+ self.text_queue.put_nowait(self.text_buffer)
193
+ FUNCTION_MAP["store_input"](
194
+ role="bot", input=self.text_buffer
195
+ )
196
+ self.text_buffer = ""
197
+
198
+ except Exception as e:
199
+ print(f"Error in processing: {e}")
200
+ await asyncio.sleep(0.1)
201
+
202
+ async def send_realtime(self) -> None:
203
+ """Send real-time audio data to model.
204
+
205
+ This method continuously reads audio data from an input queue and sends it to a model
206
+ session in real-time. It runs in an infinite loop until interrupted.
207
+
208
+ The audio data is sent with mime type 'audio/pcm'. If an error occurs during sending,
209
+ it will be printed and the method will sleep briefly before retrying.
210
+
211
+ Returns:
212
+ None
213
+
214
+ Raises:
215
+ Exception: Any exceptions during queue access or session sending will be caught and logged.
216
+ """
217
+ while True:
218
+ try:
219
+ data = await self.input_queue.get()
220
+ msg = {"data": data, "mime_type": "audio/pcm"}
221
+ await self.session.send(input=msg)
222
+ except Exception as e:
223
+ print(f"Error in real-time sending: {e}")
224
+ await asyncio.sleep(0.1)
225
+
226
+ async def tts(self) -> None:
227
+ while True:
228
+ try:
229
+ text = await self.text_queue.get()
230
+ # Get response in a single request
231
+ if text:
232
+ response = self.tts_config.client.synthesize_speech(
233
+ input=texttospeech.SynthesisInput(text=text),
234
+ voice=self.tts_config.voice,
235
+ audio_config=self.tts_config.audio_config,
236
+ )
237
+ array = np.frombuffer(response.audio_content, dtype=np.int16)
238
+ self.output_queue.put_nowait((self.output_sample_rate, array))
239
+
240
+ except Exception as e:
241
+ print(f"Error in TTS: {e}")
242
+ await asyncio.sleep(0.1)
243
+
244
+ def shutdown(self) -> None:
245
+ self.quit.set()
246
+
247
+
248
+ # Main Gradio Interface
249
+ def registry(*args, **kwargs):
250
+ """Sets up and returns the Gradio interface."""
251
+
252
+ interface = gr.Blocks()
253
+ with interface:
254
+ with gr.Tabs():
255
+ with gr.TabItem("Voice Chat"):
256
+ gr.HTML(
257
+ """
258
+ <div style='text-align: left'>
259
+ <h1>ML6 Voice Demo</h1>
260
+ </div>
261
+ """
262
+ )
263
+
264
+ gemini_handler = AsyncGeminiHandler()
265
+
266
+ with gr.Row():
267
+ audio = WebRTC(
268
+ label="Voice Chat",
269
+ modality="audio",
270
+ mode="send-receive",
271
+ )
272
+
273
+ # Add display components for questions and answers
274
+ with gr.Row():
275
+ with gr.Column():
276
+ gr.JSON(
277
+ label="Questions",
278
+ value=datastore.DATA_STORE["questions"],
279
+ )
280
+ with gr.Column():
281
+ gr.JSON(
282
+ label="Answers",
283
+ value=lambda: datastore.DATA_STORE["answers"],
284
+ every=1,
285
+ )
286
+
287
+ audio.stream(
288
+ gemini_handler,
289
+ inputs=[audio],
290
+ outputs=[audio],
291
+ time_limit=600,
292
+ concurrency_limit=10,
293
+ )
294
+
295
+ return interface
296
+
297
+
298
+ # Launch the Gradio interface
299
+ gr.load(
300
+ name="demo",
301
+ src=registry,
302
+ ).launch()
src/helpers/datastore.py ADDED
@@ -0,0 +1,5 @@
1
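+ # Shared in-memory store for the questions, collected answers, and the running conversation.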
+ DATA_STORE = {
2
+ "questions": [],
3
+ "answers": [],
4
+ "conversation:": [],
5
+ }
src/helpers/prompts.py ADDED
@@ -0,0 +1,12 @@
1
+ """This module contains the prompts for the application."""
2
+
3
+ import json
4
+
5
+ from jinja2 import Template
6
+
7
+
8
+ def load_prompt(prompt_path: str, **kwargs) -> str:
9
+ """Load the prompt from the given path."""
10
+ with open(prompt_path, "r", encoding="utf-8") as file:
11
+ prompt = Template(file.read())
12
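+     # JSON-encode each keyword value so structures such as the questions dict render as literal JSON in the prompt.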
+ return prompt.render(**{k: json.dumps(v) for k, v in kwargs.items()})
src/prompts/default_prompt.jinja2 ADDED
@@ -0,0 +1,41 @@
1
+ # Personality and Tone
2
+ ## Identity
3
+ You are a friendly recruiter who conducts initial screening calls with candidates. You speak clear, professional English.
4
+
5
+ YOU ARE THE RECRUITER AND THE USER IS THE CANDIDATE, THE USER MUST ANSWER THE QUESTIONS.
6
+
7
+ ## Tone and Language
8
+ - You are polite and professional.
9
+ - Use complete sentences
10
+ - Maintain a formal but warm demeanor
11
+ - Avoid slang or casual language
12
+
13
+ ## Task
14
+ Your sole responsibility is to conduct brief initial screenings with candidates by following these exact steps:
15
+
16
+ # Strict Interview Protocol
17
+
18
+ 1. ANSWER PROCESSING AND VALIDATION:
19
+ - ESSENTIAL INFO: Extract only the key information from candidate's response
20
+ - you MUST store the extracted information using validate_answer_tool
21
+ - VALIDATION: Use validate_answer_tool with the distilled answer ONLY
22
+ - ACKNOWLEDGE: Briefly acknowledge the candidate's response
23
+ - IMPORTANT: Never reveal validation process to candidates
24
+ - If validation fails, repeat question
25
+
26
+ 2. ANSWER VALIDATION PROTOCOL:
27
+ - If answer is VALID: Proceed to next question
28
+ - If answer is INVALID: Repeat the same question
29
+ - No exceptions to this rule
30
+
31
+ 3. INTERVIEW CONCLUSION:
32
+ - Only conclude after ALL questions are asked and validated
33
+ - End with a professional thank you message
34
+ - No additional commentary or questions allowed
35
+
36
+ DO NOT deviate from these protocols under any circumstances.
37
+
38
+
39
+ QUESTIONS SEQUENCE:
40
+ - You MUST ask questions in the exact order provided in:
41
+ {{ questions }}
src/tools/__init__.py ADDED
@@ -0,0 +1,17 @@
1
+ """Tools package for API integrations."""
2
+
3
+ from .functions import (
4
+ store_input,
5
+ store_input_tool,
6
+ validate_answer,
7
+ validate_answer_tool,
8
+ )
9
+
10
+ # Map of function names to their implementations
11
+ FUNCTION_MAP = {
12
+ "validate_answer": validate_answer,
13
+ "store_input": store_input,
14
+ }
15
+
16
+ # List of all available tools
17
+ TOOLS = [store_input_tool, validate_answer_tool]
src/tools/functions.py ADDED
@@ -0,0 +1,103 @@
1
+ import logging
2
+
3
+ import helpers.datastore as datastore
4
+
5
+ logger = logging.getLogger(__name__)
6
+
7
+
8
+ def validate_answer(
9
+     question_id: int, answer: str, answer_type: str
10
+ ) -> str:
11
+ """Validate the user's answer against an expected answer type.
12
+
13
+     Args:
+         question_id (int): The identifier of the question being validated
14
+ answer (str): The user's provided answer to validate
15
+         answer_type (str): Name of the expected python type that the answer should match (e.g. "str", "int", "list")
16
+
17
+     Returns:
+         str: "Answer is valid" if the answer matches the expected type
18
+
19
+ Raises:
20
+ ValueError: If the answer's type does not match the expected answer_type
21
+
22
+ Example:
23
+ >>> validate_answer(1, "42", str)
24
+         'Answer is valid'
25
+ >>> validate_answer(1, 42, str)
26
+ ValueError: Invalid answer type
27
+ """
28
+
29
+ logging.info(
30
+ {
31
+ "question_id": question_id,
32
+ "answer": answer,
33
+ "answer_type": answer_type,
34
+ }
35
+ )
36
+     if type(answer).__name__ != answer_type:  # answer_type is the type name supplied by the model, e.g. "str"
37
+ raise ValueError("Invalid answer type")
38
+
39
+ datastore.DATA_STORE["answers"].append(
40
+ {"question_id": question_id, "answer": answer}
41
+ )
42
+
43
+ return "Answer is valid"
44
+
45
+
46
+ validate_answer_tool = {
47
+ "name": "validate_answer",
48
+ "description": "Validate the user's answer against an expected answer type",
49
+ "parameters": {
50
+ "type": "OBJECT",
51
+ "properties": {
52
+ "question_id": {
53
+ "type": "INTEGER",
54
+ "description": "The identifier of the question being validated",
55
+ },
56
+ "answer": {
57
+ "type": "STRING",
58
+ "description": "The user's provided answer to validate",
59
+ },
60
+ "answer_type": {
61
+ "type": "STRING",
62
+ "description": "The expected python type that the answer should match (e.g. str, int, list)",
63
+ },
64
+ },
65
+ "required": ["question_id", "answer", "answer_type"],
66
+ },
67
+ }
68
+
69
+
70
+ def store_input(role: str, input: str) -> str:
71
+ """Store conversation input in a JSON file.
72
+
73
+ Args:
74
+ role (str): The role of the speaker (user or assistant)
75
+ input (str): The text input to store
76
+
77
+ Returns:
78
+ str: Confirmation message
79
+ """
80
+     logger.debug("Current data store: %s", datastore.DATA_STORE)
81
+ conversation = datastore.DATA_STORE.get("conversation")
82
+ if conversation is None:
83
+ datastore.DATA_STORE["conversation"] = [{"role": role, "input": input}]
84
+ else:
85
+ datastore.DATA_STORE["conversation"].append({"role": role, "input": input})
86
+
87
+ return "Input stored successfully"
88
+
89
+
90
+ store_input_tool = {
91
+ "name": "store_input",
92
+ "description": "Store user input in conversation history",
93
+ "parameters": {
94
+ "type": "OBJECT",
95
+ "properties": {
96
+ "role": {
97
+ "type": "STRING",
98
+ "description": "The role of the speaker (user or assistant)",
99
+ },
100
+ "input": {"type": "STRING", "description": "The text input to store"},
101
+ },
102
+ },
103
+ }
uv.lock ADDED
The diff for this file is too large to render. See raw diff