Spaces:

Ahmadzei
/

RAG

Runtime error

App Files Files Community

RAG / chunked /content_aware_chunking /_model_summary /chunk_30.txt

Ahmadzei

update 1

57bdca5 over 1 year ago

raw

history blame

473 Bytes

The encoder is a ViT-style model for image understanding and processes the image as fixed-size patches. The decoder accepts the encoder's hidden states and autoregressively generates text. Donut is a more general visual document understanding model that doesn't rely on OCR-based approaches. It uses a Swin Transformer as the encoder and multilingual BART as the decoder. Donut is pretrained to read text by predicting the next word based on the image and text annotations.