---
title: JARVIS Gaia Agent
emoji: 🦾
colorFrom: indigo
colorTo: green
sdk: gradio
pinned: false
license: mit
short_description: Enhanced JARVIS AI agent for GAIA benchmark
models:
  - meta-llama/Llama-3.2-1B-Instruct
  - sentence-transformers/all-MiniLM-L6-v2
datasets:
  - gaia-benchmark/GAIA
---

Evolved JARVIS Gaia Agent

An advanced Python-based AI agent built with langchain and langgraph, using SERPAPI and OCR to perform web searches, file parsing, image analysis, and data retrieval. It is deployed as a Hugging Face Space (onisj/jarvis_gaia_agent) to evaluate performance on the GAIA benchmark, targeting a score of at least 30% (6/20 correct).

Features

  • Web Search: Integrates SERPAPI and DuckDuckGo for robust, multi-hop searches.
  • File Parsing: Processes CSV, TXT, Excel, and PDF files for GAIA tasks.
  • Image Parsing: Uses OCR (easyocr) to extract text from images.
  • Data Retrieval: Includes a guest info retriever for structured queries.
  • External APIs: Supports weather data (OpenWeatherMap) and Hugging Face Hub stats.
  • State Management: Employs langgraph for multi-step reasoning workflows.
  • Exact-Match Answers: Optimized for GAIA Level 1 questions with precise formatting (e.g., USD amounts with two decimal places, comma-separated lists).
  • Gradio Interface: Provides a user-friendly UI for running evaluations and submitting answers.
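The exact-match formatting rule can be sketched in a few lines of Python. `format_usd` and `format_list` are hypothetical helpers, not functions taken from app.py, and the conventions they apply (no currency symbol or thousands separator, items joined with ", ") are assumptions about what the GAIA scorer expects.

```python
from decimal import Decimal, ROUND_HALF_UP


def format_usd(amount: float) -> str:
    """Format a dollar amount with exactly two decimal places (e.g. 89706.0 -> '89706.00')."""
    # Going through str() keeps 2.675 as '2.675'; round(2.675, 2) would give 2.67.
    value = Decimal(str(amount)).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)
    return f"{value}"


def format_list(items: list[str]) -> str:
    """Join items into a comma-separated list, trimming stray whitespace."""
    return ", ".join(item.strip() for item in items)


print(format_usd(1500))                 # 1500.00
print(format_usd(2.675))                # 2.68
print(format_list(["b", " a ", "c"]))   # b, a, c
```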

Directory Structure

jarvis_gaia_agent/
├── app.py                    # Main Gradio application with agent logic
├── state.py                  # Defines JARVISState for LangGraph state management
├── search.py                 # Web search tools (SERPAPI, multi-hop search)
├── tools/                    # Directory for all tools
│   ├── __init__.py           # Exports all tools
│   ├── file_parser.py        # Parses CSV, TXT, Excel, and PDF files
│   ├── image_parser.py       # OCR-based image parsing
│   ├── calculator.py         # Mathematical calculations
│   ├── document_retriever.py # PDF document retrieval
│   ├── duckduckgo_search.py  # DuckDuckGo search integration
│   ├── weather_info.py       # Weather data via OpenWeatherMap
│   ├── hub_stats.py          # Hugging Face Hub statistics
│   └── guest_info.py         # Guest information retrieval
├── requirements.txt          # Python dependencies
├── README.md                 # Project documentation
├── .gitignore                # Excludes .env, temp/, etc.
└── temp/                     # Temporary directory for GAIA files (created at runtime)

Models and Datasets

  • Models:
    • meta-llama/Llama-3.2-1B-Instruct: Primary LLM for reasoning and tool selection (Hugging Face Inference API or local).
    • sentence-transformers/all-MiniLM-L6-v2: Embedding model for text similarity tasks.
    • Note: The Together AI models (meta-llama/Llama-3.3-70B-Instruct-Turbo-Free, deepseek-ai/DeepSeek-R1-Distill-Llama-70B-free) are called through the Together AI API; these model IDs are not Hugging Face model repositories, so they are not listed in the metadata.
  • Datasets:
    • gaia-benchmark/GAIA: Benchmark dataset for evaluating agent performance.
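Text similarity with an embedding model such as sentence-transformers/all-MiniLM-L6-v2 is commonly scored with cosine similarity between the embedding vectors. The sketch below covers only that scoring step, in pure Python; the vectors are made-up toy values rather than real model output, and this README does not say which similarity measure the agent actually uses.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two embedding vectors (1.0 = same direction)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)


# Toy 3-dimensional vectors for illustration only;
# all-MiniLM-L6-v2 itself produces 384-dimensional embeddings.
query = [1.0, 0.0, 1.0]
doc_close = [0.9, 0.1, 1.1]
doc_far = [0.0, 1.0, 0.0]
print(round(cosine_similarity(query, doc_close), 3))  # 0.993
print(cosine_similarity(query, doc_far))              # 0.0
```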

Prerequisites

  • Python: 3.9 or higher.
  • Tesseract OCR: Required for image parsing.
    • macOS: brew install tesseract
    • Ubuntu: sudo apt-get install tesseract-ocr
    • Windows: Install via the Tesseract installer.
  • API keys and settings: Set in .env (local) or Hugging Face Space Secrets (deployment):
    • HUGGINGFACEHUB_API_TOKEN: Hugging Face token for model access.
    • TOGETHER_API_KEY: Together AI API key for LLM inference.
    • SERPAPI_API_KEY: SERPAPI key for web searches.
    • OPENWEATHERMAP_API_KEY: OpenWeatherMap key for weather queries.
    • SPACE_ID: Space identifier (onisj/jarvis_gaia_agent).
  • Install dependencies:
    pip install -r requirements.txt
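Before launching the app, a short standard-library check can confirm that the variables listed above are visible to Python. This is a sketch: `missing_env_vars` is a hypothetical helper, and it reads `os.environ` only, so it does not parse .env itself. Export the variables first, or run it inside the Space, where secrets are exposed as environment variables.

```python
import os

# Variable names taken from the Prerequisites list in this README.
REQUIRED_VARS = [
    "HUGGINGFACEHUB_API_TOKEN",
    "TOGETHER_API_KEY",
    "SERPAPI_API_KEY",
    "OPENWEATHERMAP_API_KEY",
    "SPACE_ID",
]


def missing_env_vars(env=None) -> list[str]:
    """Return required variables that are unset or empty in env (defaults to os.environ)."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_env_vars()
    if missing:
        print("Missing environment variables:", ", ".join(missing))
    else:
        print("All required environment variables are set.")
```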
    

Setup and Local Testing

  1. Clone the Repository:

    git clone https://huggingface.co/spaces/onisj/jarvis_gaia_agent
    cd jarvis_gaia_agent
    
  2. Create Virtual Environment:

    python -m venv venv
    source venv/bin/activate  # Windows: venv\Scripts\activate
    
  3. Install Dependencies:

    pip install -r requirements.txt
    
  4. Configure Environment Variables: Create a .env file in the project root (.gitignore already excludes it from commits):

    SPACE_ID=onisj/jarvis_gaia_agent
    HUGGINGFACEHUB_API_TOKEN=your_hf_token
    TOGETHER_API_KEY=your_together_api_key
    SERPAPI_API_KEY=your_serpapi_key
    OPENWEATHERMAP_API_KEY=your_openweather_key
    
  5. Test with Mock File (optional):

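    mkdir -p temp
    # Write a real Excel workbook (assumes pandas and openpyxl are installed);
    # echo would only produce plain CSV text with an .xlsx extension.
    python -c "import pandas as pd; pd.DataFrame({'Item': ['Burger', 'Cola'], 'Type': ['Food', 'Drink'], 'Sales': [1000, 500]}).to_excel('temp/7bd855d8-463d-4ed5-93ca-5fe35145f733.xlsx', index=False)"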
    
  6. Run Locally:

    python app.py
    
    • Open http://127.0.0.1:7860 (port may vary).
    • Log in with Hugging Face credentials.
    • Click “Run Evaluation & Submit All Answers” to test GAIA tasks.
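An .xlsx workbook is a ZIP archive, so a plain-text file that only carries the .xlsx extension (for example, CSV text written with echo) will not load in an Excel reader. The hypothetical helper below uses `zipfile.is_zipfile` as a cheap sanity check on the mock file from step 5; passing it does not prove the workbook is well formed.

```python
import zipfile
from pathlib import Path


def looks_like_xlsx(path: str) -> bool:
    """Return True if path is an existing file with a ZIP container, as every .xlsx is."""
    p = Path(path)
    return p.is_file() and zipfile.is_zipfile(str(p))


if __name__ == "__main__":
    mock = "temp/7bd855d8-463d-4ed5-93ca-5fe35145f733.xlsx"
    print(mock, "looks like an .xlsx container:", looks_like_xlsx(mock))
```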

Deployment to Hugging Face Space

  1. Push Code:

    git add .
    git commit -m "Update JARVIS Gaia Agent with README metadata"
    git push origin main
    
  2. Set Space Secrets:

    • Go to https://huggingface.co/spaces/onisj/jarvis_gaia_agent > Settings > Repository Secrets.
    • Add:
      • SPACE_ID: onisj/jarvis_gaia_agent
      • HUGGINGFACEHUB_API_TOKEN
      • TOGETHER_API_KEY
      • SERPAPI_API_KEY
      • OPENWEATHERMAP_API_KEY
  3. Build and Run:

    • Hugging Face auto-builds the Space after pushing.
    • Access the Gradio interface at https://onisj-jarvis-gaia-agent.hf.space.
    • Log in and click “Run Evaluation & Submit All Answers” to submit GAIA answers.
  4. Verify Submission:

    • Check status_output for:
      Submission Successful!
      User: your_username
      Overall Score: XX% (Y/20 correct)
      Message: ...
      
    • Aim for at least 30% (6/20 correct).
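Because 6/20 is exactly 30%, the target check should use "at least" rather than "greater than". A minimal sketch, assuming the status text uses the "Overall Score: XX% (Y/20 correct)" format shown above; `meets_target` is a hypothetical helper, not part of app.py. It compares integer counts so the boundary case is exact.

```python
import re

# Matches a status line such as "Overall Score: 35% (7/20 correct)".
SCORE_RE = re.compile(r"Overall Score:\s*[\d.]+%\s*\((\d+)/(\d+) correct\)")


def meets_target(status_output: str, target_pct: int = 30) -> bool:
    """Return True if the reported score is at least target_pct percent."""
    match = SCORE_RE.search(status_output)
    if match is None:
        raise ValueError("no 'Overall Score' line found in status output")
    correct, total = int(match.group(1)), int(match.group(2))
    # Integer comparison of correct/total >= target_pct/100, avoiding float rounding.
    return correct * 100 >= target_pct * total


status = "Submission Successful!\nUser: your_username\nOverall Score: 30% (6/20 correct)\nMessage: ..."
print(meets_target(status))  # True
```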

Troubleshooting

  • Model Access (404): Verify API keys and model IDs; test initialize_llm locally.
  • SERPAPI Timeout: Ensure SERPAPI_API_KEY is valid; check search.py logs.
  • GAIA File Access: Confirm temp/ directory permissions; test download_file.
  • Low GAIA Score: Analyze results_table for errors; enhance multi_hop_search_tool or answer formatting.
  • Logs: Check Space > Settings > Logs for build/run errors.
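For the GAIA file-access item, the following stdlib sketch checks that the process can create temp/ and write a file inside it. It assumes temp/ is resolved relative to the current working directory, which may differ from how app.py builds the path; `temp_dir_writable` is a hypothetical helper.

```python
import os
import tempfile


def temp_dir_writable(path: str = "temp") -> bool:
    """Create path if needed and confirm a file can be written and removed there."""
    try:
        os.makedirs(path, exist_ok=True)
        # NamedTemporaryFile is deleted automatically when the with-block exits.
        with tempfile.NamedTemporaryFile(dir=path):
            pass
    except OSError:
        # Covers permission errors and a regular file already occupying the path.
        return False
    return True
```

Calling `temp_dir_writable()` from the project root returns False if temp/ cannot be created or written to.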

License

MIT License. See LICENSE for details.

Acknowledgements

  • Built with langchain, langgraph, and Hugging Face tools.
  • Evaluated on the GAIA benchmark (gaia-benchmark/GAIA).