🧠 ID Agents: AI Literacy Guide for Infectious Diseases Clinicians

This guide orients clinicians to the ID Agents application, how agents and tools work, and key AI literacy concepts for safe, effective use in ID workflows. It is designed for the ID Week workshop and references concrete features of the app.


1) What this app is (and is not)

  • What it is: A workspace to build, test, and use infectious diseases chat agents that can reason with patient context, call domain tools ("skills"), and—optionally—coordinate multiple specialist agents.
  • What it is not: An EMR integration or a replacement for clinical judgment. Outputs are decision support; clinicians remain the final authority.

Core building blocks you’ll interact with:

  • Agents: Specialized LLM-powered assistants with selected tools/skills
  • Tools/Skills: Domain functions an agent can call (e.g., de-escalation, PubMed search)
  • Patient Cards: Pre-loaded scenarios that populate the chat with clinical variables or task context
  • Variables: Structured fields (e.g., culture results, CrCl) that the app passes into an agent as JSON context
  • Orchestrator: A conductor agent that supervises and delegates to your other agents

2) The six agents you’ll use

Each agent is a configuration: a name, a set of skills (tools), and a role prompt describing its mission.
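
As a mental model, an agent is just a small record. Here is a hedged sketch (field names are illustrative, not the app's actual schema):

```python
# Illustrative only: what an agent configuration conceptually contains.
smart_steward = {
    "name": "SmartSteward",
    "role_prompt": "You are an antimicrobial stewardship assistant for ID clinicians.",
    "skills": [  # tools this agent is allowed to call
        "recommend_deescalation",
        "recommend_empiric_therapy",
        "alert_prolonged_antibiotic_use",
    ],
}
```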

  1. SmartSteward (Antimicrobial Stewardship)

    • Goal: De-escalation, empiric therapy guidance, stewardship insights
    • Typical inputs: Culture results, current meds, site, CrCl, severity, allergies
    • Typical outputs: De-escalation options, dosing considerations, rationale
  2. InfectoGuard (Infection Prevention & Control)

    • Goal: Apply surveillance definitions, isolation recommendations, reporting
    • Typical inputs: Facility/location, infection type, onset date, device days, pathogen/resistance
    • Typical outputs: NHSN criteria evaluation, isolation precautions, reporting checklist
  3. ResearchRanger (Literature & Evidence)

    • Goal: Rapid literature search and synthesis
    • Typical inputs: Research topic/focus (from card context or your prompt)
    • Typical outputs: Recent studies with citations/links, brief syntheses
  4. ClinicoPilot (Clinical Assessment & Reasoning)

    • Goal: History-taking scaffolding, differential diagnosis, workup planning
    • Typical inputs: Chief complaint, HPI, PMH, meds, allergies, vitals, PE, labs, imaging
    • Typical outputs: DDx list, test plan, initial management suggestions
  5. EduMedCoach (Education & Teaching)

    • Goal: Create teaching content, quiz items, summaries for learners
    • Typical inputs: Topic, level, requested formats
    • Typical outputs: MCQs, flashcards, mini-lectures, lay summaries
  6. ID Maestro (Orchestrator)

    • Goal: Coordinate multiple specialist agents for complex cases
    • Typical inputs: All clinical variables across sections; can invoke your subagents
    • Typical outputs: A synthesized plan that cites subagent contributions

3) Tools/skills available (who can use what)

Below are representative skills registered in the app (see Tools Registry) and which agents typically use them.

Stewardship skills

  • recommend_deescalation → SmartSteward
  • alert_prolonged_antibiotic_use → SmartSteward
  • recommend_empiric_therapy → SmartSteward, ClinicoPilot (optionally)

Clinical reasoning & guidance

  • history_taking → ClinicoPilot
  • retrieve_guidelines → ClinicoPilot (and others)
  • explain_in_layman_language → ClinicoPilot, EduMedCoach

Infection Prevention

  • IPC_reporting → InfectoGuard
  • NHSN_criteria_evaluator → InfectoGuard
  • recommend_isolation_precautions → InfectoGuard

Evidence & Education

  • search_pubmed → ResearchRanger (and Orchestrator when coordinating)

Coordinator (Orchestrator)

  • No unique tool name; the Orchestrator is "agentic": it plans the task and invokes your other configured agents as needed

Tip: If you don’t see a tool being used, ensure the agent actually has that skill selected in its configuration.


4) How data flows through the app

From a patient card to the agent’s reasoning, step by step:

  1. Select an agent → The app shows only the relevant variable sections (e.g., Stewardship vs IPC vs Clinical Assessment).
  2. Click a Patient Card → The app pre-populates those variables and/or inserts a case context message into the chat.
  3. Send a prompt → The app prepends structured context to your prompt using explicit tags so the agent sees the data (a concrete sketch follows this list). Examples:
    • [DEESCALATION_TOOL_INPUT] { ...stewardship fields... }
    • [EMPIRIC_THERAPY_INPUT] { ...empiric fields... }
    • [CLINICAL_ASSESSMENT_INPUT] { ...clinical H&P fields... }
    • [IPC_CONTEXT] { ...IPC fields... }
    • [ORCHESTRATOR_CONTEXT] { ...all sections... }
  4. Agent tool use → In the chat you’ll see lines like: "🔔 Tool history_taking invoked" when a tool/skill is activated.
  5. Orchestrator flow → When using ID Maestro, it plans, invokes your subagents (SmartSteward, InfectoGuard, etc.), and synthesizes their responses.
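
To make step 3 concrete, here is a hedged sketch of what the model might receive when a stewardship card is loaded (formatting and values are illustrative; the app's actual injection may differ):

```python
# Illustrative only: tagged JSON context prepended ahead of your question.
message_to_model = """
[DEESCALATION_TOOL_INPUT] {
  "culture": "MSSA, 2/2 blood cultures",
  "meds": "vancomycin 1 g IV q12h",
  "site_of_infection": "bloodstream",
  "creatinine_clearance": 45,
  "severity_of_infection": "moderate",
  "known_allergies": "none"
}

Given the loaded culture and renal function, recommend de-escalation.
"""
```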

Privacy & isolation

  • Multi-user isolation: Each user’s agents and chat histories are isolated per login. Your orchestrator only sees your subagents.
  • Patient identifiers: The workshop app is for simulated cases; avoid entering PHI/PII.

5) AI literacy: mental models and concepts

A. Generative AI 101

  • LLMs are probability machines: They predict tokens based on context and instructions. Good context → better outputs.
  • Prompts vs context: You can ask a question (prompt) and also pass structured context (variables) the model will use.
  • Determinism: Because of temperature and sampling randomness, repeated runs can differ slightly. Use consistent prompts and context when you need repeatable behavior.

B. Agentic workflows

  • Plan–Act–Observe loop: Agents plan, call tools, and refine. You’ll see when a tool is invoked in the chat.
  • Tool grounding: Tools act like validated functions that constrain what the model can do (e.g., PubMed search, IPC evaluation).
  • Orchestration: The conductor model (ID Maestro) routes subtasks to the right specialist agents and composes a final answer.

C. Safety, reliability, and verification

  • Hallucinations: LLMs can be confidently wrong. Use tools (search_pubmed, guidelines) to ground outputs in sources.
  • Provenance & citations: Prefer outputs that include references and links (PubMed, guidelines). Verify before acting.
  • Scope & guardrails: Use the right agent for the task; each agent’s skills limit its behavior and reduce error risk.
  • Clinical validation: Treat results as decision support. Cross-check with institutional policies and current guidelines.
  • Privacy & compliance: Do not paste identifiable patient data. The workshop app is for simulated cases only.

D. Prompts that work for clinicians

  • Context-first: Click a patient card, verify variables are populated, then ask your question.
  • Specificity helps: "Given the MSSA culture and CrCl 45, what is your de-escalation plan?"
  • For research: Reference the loaded topic implicitly (the app passes it in context) or explicitly: "Find latest CRE treatment trials (past 2 years)."

6) Hands-on exercises (workshop)

  1. Stewardship de-escalation (SmartSteward + Card 1)

    • Ask: "Given the loaded culture and renal function, recommend de-escalation and dosing adjustments."
    • Evaluate: Does it name a beta-lactam for MSSA? Does it account for CrCl?
  2. IPC CLABSI evaluation (InfectoGuard + Card 2)

    • Ask: "Evaluate NHSN CLABSI criteria and list isolation steps for this case."
    • Evaluate: Does it use facility, device days, MRSA, onset date? Are precautions appropriate?
  3. Research synthesis (ResearchRanger + Card 3)

    • Ask: "Based on the loaded query, find novel CRE treatment studies (past 2 years)."
    • Evaluate: Are citations recent and relevant? Are links provided?
  4. Clinical DDx + workup (ClinicoPilot + Card 4)

    • Ask: "Provide DDx, labs, imaging, and initial management for this traveler."
    • Evaluate: Does it include dengue/malaria/typhoid? Are tests reasonable?
  5. Teaching content (EduMedCoach + Card 5)

    • Ask: "Create 5 MCQs + answers on carbapenemases for a 3rd-year student."
    • Evaluate: Clear stems, answer keys, brief teaching points?
  6. Orchestrated plan (ID Maestro + Card 6)

    • Ask: "Coordinate a comprehensive plan across stewardship, IPC, and clinical assessment."
    • Evaluate: Do you see subagent invocations? Is the final plan synthesized and coherent?

Bonus: Create your own agent

  • Build "Fungal Focus" with skills: retrieve_guidelines, history_taking, explain_in_layman_language, search_pubmed.
  • Test on a new patient card you compose in the chat. Verify tool invocations.

7) Troubleshooting & tips

  • Agent asks for info already on screen → Ensure the correct agent is selected and the patient card variables are visible. The app injects JSON tags like [CLINICAL_ASSESSMENT_INPUT]; resend your question.
  • IPC agent doesn’t use facility/pathogen → Confirm IPC variables are populated; the app prepends [IPC_CONTEXT].
  • Research agent asks for topic → The agent should pick up the loaded research context automatically; if it doesn't, restate the topic explicitly.
  • Orchestrator doesn’t delegate → Ensure you created subagents under your account. The orchestrator only sees your agents.
  • PubMed rate limits → Add NCBI_EMAIL (and optionally NCBI_API_KEY) in Space settings for smoother searches.

8) FAQ for clinicians

  • Can I rely on answers without checking? → No. Treat outputs as decision support; verify with sources and guidelines.
  • Where do the facts come from? → From your variables, your prompt, and any tools invoked (PubMed, guidelines). Prefer answers with citations.
  • Can the agent see other users’ agents or chats? → No. Sessions and agents are isolated per user.
  • Is this HIPAA-compliant? → This workshop instance is for de-identified/simulated data only.

9) Glossary

  • Agent: An LLM configuration with a role and a list of tools it is allowed to use.
  • Tool/Skill: A function the agent can call (e.g., search_pubmed, recommend_deescalation).
  • Orchestrator: An agent that plans and delegates to other agents.
  • Context injection: Passing structured variables into the model before your question.
  • Hallucination: A confident but incorrect statement from the model; mitigate via tools and verification.

10) Safety checklist (use before acting on outputs)

  • Does the answer cite sources (PubMed/guidelines)?
  • Did the agent use the loaded clinical variables or IPC context?
  • Are recommendations aligned with local policies and patient-specific factors?
  • For antibiotics: spectrum, dosing (CrCl), duration, and interactions considered?
  • For IPC: correct precautions, reporting steps, and documentation?

11) Where to find things in this app

  • Patient Cards & Variables: Chat panel (bottom) → select a card, watch variable sections populate
  • Tool Invocations: Look for "🔔 Tool ... invoked" messages in chat
  • Orchestrator Subagents: Use your own configured agents; the orchestrator only sees yours
  • PubMed Setup: See PUBMED_SETUP.md in the repo
  • This Guide: AI_LITERACY_GUIDE.md

Made for the ID Week AI Literacy Workshop. Use responsibly, verify rigorously, and keep humans in the loop.

12) Tool-by-tool reference (what they do and how to use them)

This section summarizes every tool registered in the app so you know what each one expects and returns. Inputs map directly to JSON fields the agent passes when it invokes a tool.

Stewardship tools

  • recommend_deescalation

    • Purpose: Narrow antibiotics based on culture/susceptibility and context.
    • Inputs: culture, meds, site_of_infection, risk_of_biofilm, current_response, creatinine_clearance, severity_of_infection, known_allergies.
    • Output: A text recommendation string.
    • Typical user: SmartSteward.
  • recommend_empiric_therapy

    • Purpose: Suggest empiric therapy using patient profile and stewardship variables.
    • Inputs: age, allergies, labs, culture, meds, site_of_infection, risk_of_biofilm, current_response, creatinine_clearance, severity_of_infection.
    • Output: A text recommendation string.
    • Typical users: SmartSteward, ClinicoPilot (optional).
  • alert_prolonged_antibiotic_use

    • Purpose: Summarize guideline-recommended antibiotic durations for a condition.
    • Inputs: condition, site_of_infection, risk_of_biofilm, current_response, creatinine_clearance, severity_of_infection, known_allergies.
    • Output: A text duration summary.
    • Typical user: SmartSteward.
  • invoke_stewardship_agent

    • Purpose: Delegate a user prompt to the stewardship agent with a list of enabled stewardship skills.
    • Inputs: user_prompt, enabled_skills[]
    • Output: Text status/result.
    • Typical user: Orchestrator when coordinating stewardship.
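
For example, the Orchestrator's delegation call might look like this (a hedged sketch; values are illustrative):

```python
# Hypothetical arguments for invoke_stewardship_agent.
args = {
    "user_prompt": "Recommend de-escalation for MSSA bacteremia on vancomycin, CrCl 45.",
    "enabled_skills": ["recommend_deescalation", "alert_prolonged_antibiotic_use"],
}
```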

Clinical reasoning and guidance

  • history_taking

    • Purpose: Guided history-taking for a syndrome using a dynamic schema from an internal KB.
    • Inputs: dynamic per syndrome; required fields are generated from KB. The Clinical Assessment section is injected via [CLINICAL_ASSESSMENT_INPUT].
    • Output: A synthesized history response text.
    • Typical user: ClinicoPilot.
  • retrieve_guidelines

    • Purpose: Find IDSA (and closely related official) guidelines, extract key points, and summarize.
    • Inputs: topic, specific_focus?
    • Output: Dict with guidelines_found, question_summary, guidelines[] (title, url, publication_year, key_recommendations...), plus a source note.
    • Env: Uses web search via search_internet under the hood (SERPER_API_KEY required).
    • Typical users: ClinicoPilot, Orchestrator.
  • explain_in_layman_language

    • Purpose: Rewrite assessment/plan in patient-friendly language and add 2–3 educational links.
    • Inputs: assessment_and_plan, patient_context?
    • Output: Dict with layman_explanation, educational_resources[], key_topics_covered.
    • Typical users: ClinicoPilot, EduMedCoach.
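
As a shape reference, explain_in_layman_language returns something like the following (a hedged sketch based on the fields above; exact keys and nesting may differ):

```python
# Illustrative return shape for explain_in_layman_language.
result = {
    "layman_explanation": "The blood infection is caused by a staph germ; we are "
                          "switching to a more targeted antibiotic...",
    "educational_resources": [
        {"title": "Understanding bloodstream infections", "url": "https://example.org/..."},
    ],
    "key_topics_covered": ["bacteremia", "antibiotic de-escalation"],
}
```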

Infection prevention and control

  • NHSN_criteria_evaluator

    • Purpose: Two-phase helper for NHSN definitions: (1) return required fields; (2) evaluate a filled case.
    • Inputs: case_description, definition_type, fields? (object)
    • Output: If fields are missing: required_fields[]. If provided: an evaluation stub (meets_definition/reasoning). A sketch of this two-phase flow follows this list.
    • Typical user: InfectoGuard.
  • recommend_isolation_precautions

    • Purpose: Recommend isolation precautions given diagnosis/symptoms/pathogens.
    • Inputs: diagnosis, symptoms, pathogen_list.
    • Output: Recommendation text.
    • Typical user: InfectoGuard.
  • IPC_reporting

    • Purpose: Generate IPC/public health reporting with current requirements discovered online.
    • Inputs: case_summary, jurisdiction (or facility), city_state?, fields? (object of field→value).
    • Behavior: Phase-1 discovers required_fields and reporting_info; Phase-2 creates a formatted report when fields are provided.
    • Output: Phase-1: required_fields, reporting_info, organism, location. Phase-2: report, file_name, organism, location.
    • Env: Uses search_internet (SERPER_API_KEY). Accepts facility names and maps to jurisdictions; attempts organism extraction from case_summary.
    • Typical user: InfectoGuard.
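
Here is a hedged sketch of the two-phase pattern for NHSN_criteria_evaluator (shapes follow the descriptions above; exact keys may differ, and IPC_reporting behaves analogously):

```python
# Phase 1: call without `fields` -> the tool replies with what it needs.
phase1_args = {
    "case_description": "ICU patient, central line day 6, MRSA in blood culture",
    "definition_type": "CLABSI",
}
# Hypothetical phase-1 result: {"required_fields": ["device_days", "onset_date", ...]}

# Phase 2: resubmit with the fields filled in to get the evaluation.
phase2_args = {
    **phase1_args,
    "fields": {"device_days": 6, "onset_date": "2025-01-10"},
}
# Hypothetical phase-2 result: {"meets_definition": True, "reasoning": "..."}
```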

Evidence and research

  • search_pubmed

    • Purpose: Query PubMed via NCBI E-utilities and return top results with links (a minimal sketch follows this list).
    • Inputs: q, max_results?, email? (defaults to env NCBI_EMAIL if not provided).
    • Output: List of articles [{uid,title,authors[],pubdate,source,link}]. Returns [] if no hits.
    • Env: NCBI_EMAIL recommended; NCBI_API_KEY optional to improve rate limits.
    • Typical users: ResearchRanger, Orchestrator.
  • search_internet

    • Purpose: General web search via Serper; returns a synthesized string of titles/snippets/links.
    • Inputs: q, max_results?; a trusted_links parameter is used internally by some tools.
    • Output: Markdown-like string with entries and Read more links.
    • Env: SERPER_API_KEY required.
    • Typical users: Multiple tools call it internally; Orchestrator may prompt tools that call it.
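
For the curious, here is a minimal sketch of how a PubMed lookup via NCBI E-utilities can work. This is not the app's actual code (the function name is illustrative), but esearch/esummary are the real NCBI endpoints:

```python
import os
import requests

EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"

def pubmed_search_sketch(q: str, max_results: int = 5) -> list[dict]:
    """Minimal PubMed search: esearch finds UIDs, esummary fetches metadata."""
    params = {"db": "pubmed", "term": q, "retmax": max_results, "retmode": "json"}
    if os.getenv("NCBI_EMAIL"):
        params["email"] = os.environ["NCBI_EMAIL"]      # identifies you to NCBI
    if os.getenv("NCBI_API_KEY"):
        params["api_key"] = os.environ["NCBI_API_KEY"]  # raises the rate limit
    resp = requests.get(f"{EUTILS}/esearch.fcgi", params=params, timeout=10)
    uids = resp.json()["esearchresult"]["idlist"]
    if not uids:
        return []  # mirrors the documented "returns [] if no hits" behavior
    summaries = requests.get(
        f"{EUTILS}/esummary.fcgi",
        params={"db": "pubmed", "id": ",".join(uids), "retmode": "json"},
        timeout=10,
    ).json()["result"]
    return [
        {
            "uid": uid,
            "title": summaries[uid].get("title", ""),
            "pubdate": summaries[uid].get("pubdate", ""),
            "source": summaries[uid].get("source", ""),
            "link": f"https://pubmed.ncbi.nlm.nih.gov/{uid}/",
        }
        for uid in uids
    ]
```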

Education tools

  • generate_board_exam_question

    • Purpose: Create board-style vignette questions with explanations through an AI pipeline.
    • Inputs: topic, difficulty_level?, question_type?
    • Output: Dict including vignette, question_stem, answer_choices, explanations.
    • Typical user: EduMedCoach.
  • generate_flash_cards

    • Purpose: Produce flashcards (front/back) organized by subtopic with tips and a study schedule.
    • Inputs: topic, number_of_cards?, card_type?, difficulty_level? (example call after this list)
    • Output: Dict: flash_cards[], organized_by_subtopic, study_tips, review_schedule.
    • Typical user: EduMedCoach.
  • create_educational_presentation

    • Purpose: Iterative research → report → slide deck structure and content, with speaker notes.
    • Inputs: topic, target_audience?, presentation_duration?, focus_area?, aspects_to_emphasize?, guidelines_to_include?, learning_objectives?, clinical_scenarios?, takeaway_message?
    • Output: Dict with slides[], speaker_notes, metadata, and research_report.
    • Env: Uses search_internet (SERPER_API_KEY) and LLM.
    • Typical user: EduMedCoach.
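
As with the other tools, calls are plain keyword fields. A hedged example for generate_flash_cards (values illustrative):

```python
# Hypothetical arguments for generate_flash_cards.
args = {
    "topic": "Carbapenemases (KPC, NDM, OXA-48)",
    "number_of_cards": 10,
    "card_type": "basic",
    "difficulty_level": "3rd-year medical student",
}
```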

Publishing and formatting

  • suggest_journals_for_submission

    • Purpose: Suggest journals based on title/abstract/area via internet search and heuristics.
    • Inputs: title, abstract, research_area, study_type?, target_audience?
    • Output: Formatted ranked journal suggestions text.
    • Env: Uses search_internet (SERPER_API_KEY).
    • Typical user: ResearchRanger, EduMedCoach.
  • format_references

    • Purpose: Format references to a journal’s style; returns formatted entries and the inferred style notes.
    • Inputs: references_text, target_journal, max_length?
    • Output: Dict with formatted_references, formatting_guidelines, status.
    • Env: Optionally uses SERPER_API_KEY to look up style info; falls back to medical Vancouver/AMA patterns.
    • Typical user: ResearchRanger, EduMedCoach.

Data and utilities

  • synthetic_patient_lookup
    • Purpose: Fetch synthetic FHIR patient labs/vitals by patient_id (placeholder implementation).
    • Inputs: patient_id
    • Output: Dict with patient_id, labs[], vitals[].
    • Typical user: ClinicoPilot (future), Orchestrator demos.
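
A hedged sketch of the return shape, per the description above (inner fields are illustrative):

```python
# Illustrative return shape for synthetic_patient_lookup.
result = {
    "patient_id": "synthetic-001",
    "labs": [{"name": "WBC", "value": 14.2, "unit": "10^9/L"}],
    "vitals": [{"name": "Temp", "value": 38.6, "unit": "C"}],
}
```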

13) Cross-cutting concepts and patterns

  • JSON argument schemas: Each tool publishes an args_schema; agents must supply exactly those fields (see the sketch at the end of this list). The app helps by injecting tagged JSON blocks from variable sections so the model sees and can populate tool calls correctly.
  • Two-phase tools: Some tools first discover required fields (NHSN_criteria_evaluator, IPC_reporting) then complete the task once fields are provided. You’ll see interim messages listing missing_fields or required_fields.
  • Errors and resilience: Tools raise ToolExecutionError on failure. Upstream agents typically recover by retrying or degrading gracefully. Internet-backed tools use timeouts and limited retries.
  • Environment configuration: SERPER_API_KEY is required for web search; NCBI_EMAIL recommended and NCBI_API_KEY optional for PubMed. See PubMed setup doc in this repo.
  • Observability: When a tool is called, the chat shows “🔔 Tool … invoked”. Use that trace to verify the agent grounded its answer in a tool result.
  • Safety and provenance: Prefer tools that return links/citations (search_pubmed, retrieve_guidelines, search_internet-based) and include them in your final answer; verify before acting clinically.
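
To illustrate the args_schema idea from the first bullet, here is a hedged sketch in the pydantic style used by common agent frameworks (not the app's literal code; the field names come from the recommend_deescalation reference above):

```python
from pydantic import BaseModel, Field

class DeescalationArgs(BaseModel):
    """Illustrative args_schema for recommend_deescalation."""
    culture: str = Field(description="Culture and susceptibility results")
    meds: str = Field(description="Current antimicrobial regimen")
    site_of_infection: str = Field(description="e.g., bloodstream, lung, urine")
    risk_of_biofilm: str = Field(description="Hardware/device considerations")
    current_response: str = Field(description="Clinical trajectory on current therapy")
    creatinine_clearance: float = Field(description="CrCl in mL/min for dosing")
    severity_of_infection: str = Field(description="e.g., mild, moderate, severe")
    known_allergies: str = Field(description="Drug allergies and reactions")
```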

Quick env setup tips

  • Add SERPER_API_KEY in the Space/App secrets so internet-backed tools function.
  • Add NCBI_EMAIL (and optionally NCBI_API_KEY) for smoother PubMed usage.
  • If a tool says a field is missing, check the relevant variable section is visible and populated, or restate in your prompt.
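
A quick way to sanity-check these settings before a session (a hedged sketch; variable names match the docs above):

```python
import os

# Report which env vars the internet- and PubMed-backed tools can see.
for var, required in [
    ("SERPER_API_KEY", True),   # web search (retrieve_guidelines, IPC_reporting, ...)
    ("NCBI_EMAIL", False),      # recommended for PubMed
    ("NCBI_API_KEY", False),    # optional, higher PubMed rate limits
]:
    if os.getenv(var):
        status = "set"
    elif required:
        status = "MISSING (required)"
    else:
        status = "unset (optional)"
    print(f"{var}: {status}")
```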