---
title: JARVIS Gaia Agent
emoji: 🦾
colorFrom: indigo
colorTo: green
sdk: gradio
pinned: false
license: mit
short_description: Enhanced JARVIS AI agent for GAIA benchmark
models:
- meta-llama/Llama-3.2-1B-Instruct
- sentence-transformers/all-MiniLM-L6-v2
datasets:
- gaia-benchmark/GAIA
---

# Evolved JARVIS Gaia Agent

An advanced Python-based AI agent built with `langchain`, `langgraph`, SERPAPI, and OCR capabilities for web searches, file parsing, image analysis, and data retrieval. It is deployed as a Hugging Face Space (`onisj/jarvis_gaia_agent`) to evaluate performance on the GAIA benchmark, targeting a score of at least 30% (6/20 correct).

## Features

- **Web Search**: Integrates SERPAPI and DuckDuckGo for robust, multi-hop searches.
- **File Parsing**: Processes CSV, TXT, Excel, and PDF files for GAIA tasks.
- **Image Parsing**: Uses OCR (`easyocr`) to extract text from images.
- **Data Retrieval**: Includes a guest info retriever for structured queries.
- **External APIs**: Supports weather data (OpenWeatherMap) and Hugging Face Hub stats.
- **State Management**: Employs `langgraph` for multi-step reasoning workflows.
- **Exact-Match Answers**: Optimized for GAIA Level 1 questions with precise formatting (e.g., USD to two decimals, comma-separated lists).
- **Gradio Interface**: Provides a user-friendly UI for running evaluations and submitting answers.

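
The exact-match formatting mentioned above (USD to two decimals, comma-separated lists) can be illustrated with a minimal sketch. The helper names `format_usd` and `format_list` are hypothetical and are not taken from `app.py`; the half-up rounding mode and the `", "` separator are assumptions, not confirmed GAIA requirements.

```python
from decimal import Decimal, ROUND_HALF_UP


def format_usd(value) -> str:
    """Return a plain amount with exactly two decimals (no currency sign).

    Hypothetical helper; rounding half-up is an assumption.
    """
    # Going through str() avoids binary float artifacts such as 2.675 -> 2.67.
    amount = Decimal(str(value)).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)
    return f"{amount:.2f}"


def format_list(items, sep: str = ", ") -> str:
    """Join items into one line, trimming whitespace and dropping empty entries.

    Hypothetical helper; the default separator is an assumption.
    """
    cleaned = [str(item).strip() for item in items]
    return sep.join(item for item in cleaned if item)


if __name__ == "__main__":
    print(format_usd(1000))
    print(format_list([" Burger", "Cola ", ""]))
```

Keeping formatting in small pure functions like these makes it easy to unit-test answer strings separately from the agent's reasoning steps.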
## Directory Structure

```
jarvis_gaia_agent/
├── app.py                    # Main Gradio application with agent logic
├── state.py                  # Defines JARVISState for LangGraph state management
├── search.py                 # Web search tools (SERPAPI, multi-hop search)
├── tools/                    # Directory for all tools
│   ├── __init__.py           # Exports all tools
│   ├── file_parser.py        # Parses CSV, TXT, Excel, and PDF files
│   ├── image_parser.py       # OCR-based image parsing
│   ├── calculator.py         # Mathematical calculations
│   ├── document_retriever.py # PDF document retrieval
│   ├── duckduckgo_search.py  # DuckDuckGo search integration
│   ├── weather_info.py       # Weather data via OpenWeatherMap
│   ├── hub_stats.py          # Hugging Face Hub statistics
│   └── guest_info.py         # Guest information retrieval
├── requirements.txt          # Python dependencies
├── README.md                 # Project documentation
├── .gitignore                # Excludes .env, temp/, etc.
└── temp/                     # Temporary directory for GAIA files (created at runtime)
```

## Models and Datasets

- **Models**:
  - `meta-llama/Llama-3.2-1B-Instruct`: Primary LLM for reasoning and tool selection (Hugging Face Inference API or local).
  - `sentence-transformers/all-MiniLM-L6-v2`: Embedding model for text similarity tasks.
  - Note: Together AI models (`meta-llama/Llama-3.3-70B-Instruct-Turbo-Free`, `deepseek-ai/DeepSeek-R1-Distill-Llama-70B-free`) are used via API but not hosted on Hugging Face, so they’re not listed in metadata.
- **Datasets**:
  - `gaia-benchmark/GAIA`: Benchmark dataset for evaluating agent performance.

## Prerequisites

- **Python**: 3.9 or higher.
- **Tesseract OCR**: Required for image parsing.
  - macOS: `brew install tesseract`
  - Ubuntu: `sudo apt-get install tesseract-ocr`
  - Windows: Install via [Tesseract Installer](https://github.com/UB-Mannheim/tesseract/wiki).
- **API Keys**: Set in `.env` (local) or Hugging Face Space Secrets (deployment):
  - `HUGGINGFACEHUB_API_TOKEN`: Hugging Face token for model access.
  - `TOGETHER_API_KEY`: Together AI API key for LLM inference.
  - `SERPAPI_API_KEY`: SERPAPI key for web searches.
  - `OPENWEATHERMAP_API_KEY`: OpenWeatherMap key for weather queries.
  - `SPACE_ID`: `onisj/jarvis_gaia_agent`.
- Install dependencies:
  ```bash
  pip install -r requirements.txt
  ```

## Setup and Local Testing

1. **Clone the Repository**:
   ```bash
   git clone https://huggingface.co/spaces/onisj/jarvis_gaia_agent
   cd jarvis_gaia_agent
   ```
2. **Create Virtual Environment**:
   ```bash
   python -m venv venv
   source venv/bin/activate  # Windows: venv\Scripts\activate
   ```
3. **Install Dependencies**:
   ```bash
   pip install -r requirements.txt
   ```
4. **Configure Environment Variables**:
   Create a `.env` file:
   ```text
   SPACE_ID=onisj/jarvis_gaia_agent
   HUGGINGFACEHUB_API_TOKEN=your_hf_token
   TOGETHER_API_KEY=your_together_api_key
   SERPAPI_API_KEY=your_serpapi_key
   OPENWEATHERMAP_API_KEY=your_openweather_key
   ```
5. **Test with Mock File** (optional):
   ```bash
   mkdir -p temp
   # printf expands \n portably; plain echo does not in every shell
   printf "Item,Type,Sales\nBurger,Food,1000\nCola,Drink,500\n" > temp/7bd855d8-463d-4ed5-93ca-5fe35145f733.xlsx
   ```
6. **Run Locally**:
   ```bash
   python app.py
   ```
   - Open `http://127.0.0.1:7860` (port may vary).
   - Log in with Hugging Face credentials.
   - Click “Run Evaluation & Submit All Answers” to test GAIA tasks.

## Deployment to Hugging Face Space

1. **Push Code**:
   ```bash
   git add .
   git commit -m "Update JARVIS Gaia Agent with README metadata"
   git push origin main
   ```
2. **Set Space Secrets**:
   - Go to `https://huggingface.co/spaces/onisj/jarvis_gaia_agent` > Settings > Repository Secrets.
   - Add:
     - `SPACE_ID`: `onisj/jarvis_gaia_agent`
     - `HUGGINGFACEHUB_API_TOKEN`
     - `TOGETHER_API_KEY`
     - `SERPAPI_API_KEY`
     - `OPENWEATHERMAP_API_KEY`
3. **Build and Run**:
   - Hugging Face auto-builds the Space after pushing.
   - Access the Gradio interface at `https://onisj-jarvis-gaia-agent.hf.space`.
   - Log in and click “Run Evaluation & Submit All Answers” to submit GAIA answers.
4. **Verify Submission**:
   - Check `status_output` for:
     ```
     Submission Successful!
     User: your_username
     Overall Score: XX% (Y/20 correct)
     Message: ...
     ```
   - Aim for at least 30% (6/20 correct).

## Troubleshooting

- **Model Access (404)**: Verify API keys; test `initialize_llm` locally.
- **SERPAPI Timeout**: Ensure `SERPAPI_API_KEY` is valid; check `search.py` logs.
- **GAIA File Access**: Confirm `temp/` directory permissions; test `download_file`.
- **Low GAIA Score**: Analyze `results_table` for errors; enhance `multi_hop_search_tool` or answer formatting.
- **Logs**: Check Space > Settings > Logs for build/run errors.

## License

MIT License. See [LICENSE](LICENSE) for details.

## Acknowledgements

- Built with `langchain`, `langgraph`, and Hugging Face tools.
- Evaluated on the GAIA benchmark (`gaia-benchmark/GAIA`).
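## Appendix: Pre-flight Key Check

Several troubleshooting items in this README trace back to missing or empty API keys. As a minimal sketch, the check below reports which of the environment variables named in this README are unset. The function `missing_keys` and the constant `REQUIRED_KEYS` are hypothetical and not part of `app.py`. The script reads only the process environment, so it assumes the values from `.env` have already been exported or loaded.

```python
import os

# Variable names as listed in this README's API key sections.
REQUIRED_KEYS = (
    "SPACE_ID",
    "HUGGINGFACEHUB_API_TOKEN",
    "TOGETHER_API_KEY",
    "SERPAPI_API_KEY",
    "OPENWEATHERMAP_API_KEY",
)


def missing_keys(env=None):
    """Return the required variable names that are unset or blank.

    Hypothetical helper; `env` defaults to the current process environment.
    """
    env = os.environ if env is None else env
    return [name for name in REQUIRED_KEYS if not env.get(name, "").strip()]


if __name__ == "__main__":
    missing = missing_keys()
    if missing:
        print("Missing environment variables: " + ", ".join(missing))
    else:
        print("All required environment variables are set.")
```

Running this before `python app.py` or before pushing to the Space can help separate configuration problems from genuine model or search failures.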