merve posted an update 26 days ago
Dataset Viewer for PDFs just landed on Hugging Face 📖🤗 you can now preview PDFs in a dataset more easily than before!

on top of this, there's a PdfFolder format to load PDF datasets more quickly 💨
> to use it, your dataset should follow a directory format like folder/train/doc1.pdf, folder/train/doc2.pdf
> if you want to include bounding boxes, labels etc., you can keep them in a metadata.csv file in the same folder 🤝
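
A minimal sketch of the layout described above, for anyone who wants to try it. It assumes that metadata.csv links each row to a PDF through a `file_name` column (the convention the other folder-based loaders use) and that the loader is exposed as `load_dataset("pdffolder", ...)` in a recent `datasets` release; check the docs linked below before relying on either. The helper names `write_metadata`, `missing_files` and `load_pdf_dataset` are made up for illustration.

```python
import csv
from pathlib import Path


def write_metadata(split_dir, rows):
    """Write metadata.csv next to the PDFs of one split, e.g. folder/train."""
    split_dir = Path(split_dir)
    # Assumption: `file_name` is the column that ties a row to its PDF.
    extra = sorted({key for row in rows for key in row if key != "file_name"})
    path = split_dir / "metadata.csv"
    with path.open("w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=["file_name"] + extra)
        writer.writeheader()
        writer.writerows(rows)
    return path


def missing_files(split_dir):
    """Return the file_name entries in metadata.csv that have no PDF on disk."""
    split_dir = Path(split_dir)
    with (split_dir / "metadata.csv").open(newline="", encoding="utf-8") as f:
        names = [row["file_name"] for row in csv.DictReader(f)]
    return [name for name in names if not (split_dir / name).is_file()]


def load_pdf_dataset(root):
    """Load folder/<split>/*.pdf with the PdfFolder builder.

    Assumption: an installed `datasets` version with PDF support
    (plus whatever PDF backend it requires).
    """
    from datasets import load_dataset  # imported lazily so the helpers work without it

    return load_dataset("pdffolder", data_dir=str(root))
```

Running `missing_files("folder/train")` before uploading is a cheap way to catch metadata rows that point at a PDF that is not in the folder.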

read the document dataset docs: https://huggingface.co/docs/datasets/main/en/document_dataset
check out all the document datasets here: https://huggingface.co/datasets?modality=modality:document&sort=trending 📖

This is a great step forward that makes it much easier to preview and load PDF datasets on Hugging Face! At TeraVera, we've also built tools that work with the PdfFolder format, using secure APIs and RAG-based retrieval to let you upload documents, extract structured data, and ask questions about them with strong data privacy guarantees. Perfect for building smarter, safer document workflows!

Please check out: https://www.teravera.com/

Also, to request API access and documentation, please sign up here: www.teravera.com/api-access-form/
