Improving data extraction from complex forms

#39
by nikogamulin - opened

I’m working on extracting structured data from Bills of Lading and similar documents using SmolDocling. Overall OCR performance is solid, but I’ve run into a recurring issue: some table fields, especially those with long or multi-line text, aren’t being read, even though the image quality is high and the text is clearly legible.

This seems to affect fields like item descriptions or freight terms that:
• Span multiple lines within a cell
• Contain detailed specs, units, or numerical values
• Are nested inside dense table structures

Has anyone encountered this and found a reliable way to improve extraction?
I’m open to:
• Fine-tuning ideas
• Preprocessing tricks (e.g., image cleanup, table detection)
• Other Hugging Face models better suited to structured form extraction
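
To make the preprocessing idea concrete, here is a minimal sketch of one option: splitting the page into overlapping tiles so that small, multi-line cell text takes up more of each model input. Whether this actually helps SmolDocling is untested; the helper name `tile_boxes` and the default tile size and overlap are hypothetical choices for illustration, not values taken from the model.

```python
from typing import List, Tuple

# (left, top, right, bottom): the box order that Pillow's Image.crop accepts.
Box = Tuple[int, int, int, int]


def _starts(length: int, tile: int, overlap: int) -> List[int]:
    """Start offsets along one axis so the tiles cover [0, length)."""
    if tile >= length:
        return [0]  # a single tile already covers the whole axis
    step = tile - overlap
    starts = list(range(0, length - tile, step))
    starts.append(length - tile)  # final tile sits flush with the far edge
    return starts


def tile_boxes(width: int, height: int, tile: int = 1024, overlap: int = 128) -> List[Box]:
    """Return overlapping crop boxes covering a width x height page.

    tile and overlap are illustrative defaults (assumptions), in pixels.
    Neighbouring tiles share at least `overlap` pixels so a table row cut
    by one tile boundary appears whole in the adjacent tile.
    """
    if tile <= 0 or not 0 <= overlap < tile:
        raise ValueError("need tile > 0 and 0 <= overlap < tile")
    boxes: List[Box] = []
    for top in _starts(height, tile, overlap):
        for left in _starts(width, tile, overlap):
            boxes.append((left, top, min(left + tile, width), min(top + tile, height)))
    return boxes


if __name__ == "__main__":
    for box in tile_boxes(2000, 1500):
        print(box)
```

Each box could then be passed to Pillow's `Image.crop` and the crops run through the model one at a time; merging the per-tile outputs back into a single table is a separate problem this sketch does not cover.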

Thanks in advance. Any pointers or shared experiences would be very helpful!

[Attached image: IMG_3901.jpeg]
