---
title: Extract PDF content tool from file upload
emoji: 🗂️
colorFrom: green
colorTo: pink
sdk: gradio
sdk_version: 5.11.0
app_file: app.py
pinned: true
tags:
- tool
- pdf
---

Use this tool to extract data from PDF files.

Small example:

```py
from smolagents import Tool

# Load the Space as a callable tool
pdf_extraction_tool = Tool.from_space(
    "matterattetatte/pdf-extractor-tool",
    name="pdf-extractor",
    description="Extract data"
)

pdf_extraction_tool("Extract all headlines from all pdfs in folder pdfs")
```
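
The request above refers to a folder rather than to specific files. If you prefer to name each PDF explicitly, a minimal sketch along the following lines could build the prompt first. The helpers `list_pdfs` and `build_prompt`, and the prompt wording, are hypothetical and not part of this Space or of smolagents.

```python
from pathlib import Path


def list_pdfs(folder: str) -> list[str]:
    """Return the sorted paths of all .pdf files directly inside `folder`."""
    return sorted(
        str(p)
        for p in Path(folder).iterdir()
        if p.is_file() and p.suffix.lower() == ".pdf"
    )


def build_prompt(folder: str) -> str:
    """Build a request that lists every PDF by path (hypothetical wording)."""
    paths = list_pdfs(folder)
    if not paths:
        raise FileNotFoundError(f"No PDF files found in {folder!r}")
    return "Extract all headlines from these pdfs: " + ", ".join(paths)
```

The resulting string could then be passed to the tool in place of the hard-coded request, for example `pdf_extraction_tool(build_prompt("pdfs"))`.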

Full-fledged example (managed and managing agents):

```py
from smolagents import CodeAgent, HfApiModel, ManagedAgent, ToolCallingAgent, GradioUI, Tool
from huggingface_hub import login
import os

# Authenticate with your own Hugging Face token
login('hf_*******')

# Agent that calls the PDF extraction Space
pdf_agent = ToolCallingAgent(
    tools=[Tool.from_space("matterattetatte/pdf-extractor-tool", name="pdf-extractor", description="Extract data")],
    model=HfApiModel(),
    max_steps=4,
)

# Wrap the agent so that a manager agent can delegate to it
managed_pdf_agent = ManagedAgent(
    agent=pdf_agent,
    name="extraction",
    description="Returns the content of pdf files in a string. Give it your path as an argument. Also, this agent should link to the files it is using.",
)

# Manager agent that plans the task and calls the managed PDF agent
manager_agent = CodeAgent(
    tools=[],
    model=HfApiModel(model_id="Qwen/Qwen2.5-Coder-32B-Instruct"),
    managed_agents=[managed_pdf_agent],
    additional_authorized_imports=['os', 're'],
)

manager_agent.run("Read file pdfs/my_file.pdf and summarize its content for me. I want to understand how to do things")

# Optional: serve the manager agent in a Gradio chat UI
GradioUI(manager_agent).launch()
```
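
The example above passes the token to `login` as a string literal. As a hedged alternative, the token could be read from an environment variable so that it does not end up in source control. The helper `get_hf_token` below is hypothetical; `HF_TOKEN` is the variable name that `huggingface_hub` itself recognizes.

```python
import os


def get_hf_token(var_name: str = "HF_TOKEN") -> str:
    """Read the Hugging Face token from the environment, failing loudly if unset."""
    token = os.environ.get(var_name, "").strip()
    if not token:
        raise RuntimeError(f"Set the {var_name} environment variable to your Hugging Face token")
    return token


# Usage (requires huggingface_hub to be installed):
# from huggingface_hub import login
# login(token=get_hf_token())
```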