newspaper_classifier_segformer
This model is a fine-tuned version of nvidia/mit-b0 on a document OCR dataset. It classifies text document images into two categories: those requiring special segmentation processing (segment) and those that don't (no_segment). This classification is a critical preprocessing step in our OCR pipeline: it lets each document be sent down the processing path suited to its layout.
Model Details
- Base Architecture: SegFormer (nvidia/mit-b0), a transformer-based architecture that balances efficiency and performance for vision tasks
- Training Dataset: taresco/document_ocr, a specialized collection of text document images with segmentation annotations
- Input Format: RGB images resized to 512×512 pixels
- Output Classes:
  - segment: Images containing two or more distinct, unrelated text segments that require special OCR processing
  - no_segment: Images containing single, cohesive content that can follow standard OCR processing
Intended Uses & Applications
- OCR Pipeline Integration: Primary use is as a preprocessing classifier in OCR workflows for document digitization
- Document Routing: Automatically route documents to specialized segmentation processing when needed
- Batch Processing: Efficiently handle large collections of document archives by applying appropriate processing techniques
- Digital Library Processing: Support for historical text document digitization projects
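A minimal routing sketch for the document routing and batch processing uses above. It assumes the classifier returns what the transformers image-classification pipeline returns for a single image, namely a list of `{"label": ..., "score": ...}` dicts. The route names `segmentation_ocr` and `standard_ocr` and both helper functions are hypothetical, since the card does not name the downstream processing paths.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List

# Hypothetical names for the two downstream OCR paths.
SEGMENT_ROUTE = "segmentation_ocr"
STANDARD_ROUTE = "standard_ocr"


def top_label(predictions: List[dict]) -> str:
    """Return the highest-scoring label from one image-classification result."""
    return max(predictions, key=lambda p: p["score"])["label"]


def route_documents(
    paths: Iterable[str], classify: Callable[[str], List[dict]]
) -> Dict[str, List[str]]:
    """Group image paths by OCR route according to the predicted class."""
    routes: Dict[str, List[str]] = defaultdict(list)
    for path in paths:
        label = top_label(classify(path))
        route = SEGMENT_ROUTE if label == "segment" else STANDARD_ROUTE
        routes[route].append(path)
    return dict(routes)


if __name__ == "__main__":
    # Stand-in classifier with canned outputs, so the sketch runs without the model.
    canned = {
        "page_a.jpg": [{"label": "segment", "score": 0.9}, {"label": "no_segment", "score": 0.1}],
        "page_b.jpg": [{"label": "no_segment", "score": 0.8}, {"label": "segment", "score": 0.2}],
    }
    print(route_documents(canned, canned.__getitem__))
```

With the pipeline from the How to Use section, `route_documents(paths, pipe)` should work unchanged, because calling `pipe` on one image path returns such a list. Taking the top-scoring label keeps the sketch simple; a score threshold on `segment` is a natural variant if missed multi-segment pages are costlier than false alarms.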
Training and evaluation data
The model was fine-tuned on the taresco/newspaper_ocr dataset. The dataset contains newspaper images labeled as either segment or no_segment.
Dataset Splits:
- Training Set: 19,111 examples, with 15% of this split held out as a validation set during training
- Test Set: 4,787 examples
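Holding out 15% of the 19,111 training examples leaves roughly 16.2k examples for training and 2.9k for validation. A minimal sketch of that arithmetic, assuming the held-out count is rounded up (the convention scikit-learn's `train_test_split` uses for a fractional `test_size`); the card does not say how the 15% was rounded, so the exact counts may differ by one. The `holdout_sizes` helper is hypothetical.

```python
import math


def holdout_sizes(n_examples: int, holdout_fraction: float) -> tuple[int, int]:
    """Return (train, validation) sizes for a single random hold-out split.

    Assumes the validation count is rounded up, as scikit-learn does.
    """
    n_val = math.ceil(holdout_fraction * n_examples)
    return n_examples - n_val, n_val


if __name__ == "__main__":
    train_n, val_n = holdout_sizes(19_111, 0.15)
    print(f"train={train_n}, validation={val_n}, test=4787")
```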
Training procedure
Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 5e-5
- train_batch_size: 16
- eval_batch_size: 16
- seed: 42
- optimizer: AdamW with betas=(0.9, 0.999) and epsilon=1e-08
- num_epochs: 3
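If training used the Hugging Face `Trainer` (the card does not say so), the hyperparameters above would map onto `TrainingArguments` roughly as sketched below. The `build_training_args` helper and its `output_dir` default are hypothetical, and the batch sizes are passed as per-device values, which equal the listed batch sizes only when training on a single device. Settings the card does not list, such as the learning-rate schedule or warmup, are left at library defaults here, which may not match what was actually used.

```python
# Card hyperparameters, keyed by the TrainingArguments parameter names
# they would most plausibly correspond to (an assumption).
CARD_HYPERPARAMETERS = {
    "learning_rate": 5e-5,
    "per_device_train_batch_size": 16,  # equals train_batch_size on one device
    "per_device_eval_batch_size": 16,   # equals eval_batch_size on one device
    "seed": 42,
    "num_train_epochs": 3,
    # AdamW settings from the card
    "adam_beta1": 0.9,
    "adam_beta2": 0.999,
    "adam_epsilon": 1e-8,
}


def build_training_args(output_dir: str = "newspaper-classifier-output"):
    """Build TrainingArguments from the card values (hypothetical helper)."""
    # Imported lazily so the mapping above can be inspected without transformers installed.
    from transformers import TrainingArguments

    return TrainingArguments(output_dir=output_dir, **CARD_HYPERPARAMETERS)
```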
Training results
The model achieved the following results on the evaluation set:
- Loss: 0.0198
- Accuracy: 99.62%
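Classification report on the test set (4,787 examples):

| | precision | recall | f1-score | support |
|---|---|---|---|---|
| no_segment | 1.00 | 0.99 | 1.00 | 4471 |
| segment | 0.91 | 0.98 | 0.95 | 316 |
| accuracy | | | 0.99 | 4787 |
| macro avg | 0.95 | 0.99 | 0.97 | 4787 |
| weighted avg | 0.99 | 0.99 | 0.99 | 4787 |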
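Because the test set is imbalanced (316 segment versus 4,471 no_segment images), overall accuracy is dominated by the majority class, and the per-class and macro-averaged rows say more about how well segment pages are caught. The sketch below spells out the standard definitions behind such a report (the same ones scikit-learn's `classification_report` prints) in plain Python. The function names and the toy labels are illustrative only; they are not this model's predictions.

```python
from collections import Counter


def per_class_metrics(y_true, y_pred, labels):
    """Return {label: (precision, recall, f1, support)} for each label."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted p, but it was wrong
            fn[t] += 1  # true label t was missed
    support = Counter(y_true)
    rows = {}
    for lab in labels:
        prec = tp[lab] / (tp[lab] + fp[lab]) if tp[lab] + fp[lab] else 0.0
        rec = tp[lab] / (tp[lab] + fn[lab]) if tp[lab] + fn[lab] else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        rows[lab] = (prec, rec, f1, support[lab])
    return rows


def averages(rows):
    """Macro average (unweighted mean) and support-weighted average of P, R, F1."""
    n = sum(r[3] for r in rows.values())
    macro = tuple(sum(r[i] for r in rows.values()) / len(rows) for i in range(3))
    weighted = tuple(sum(r[i] * r[3] for r in rows.values()) / n for i in range(3))
    return macro, weighted


if __name__ == "__main__":
    # Toy, imbalanced example: 8 no_segment pages, 2 segment pages.
    y_true = ["no_segment"] * 8 + ["segment"] * 2
    y_pred = ["no_segment"] * 7 + ["segment"] * 3
    rows = per_class_metrics(y_true, y_pred, ["no_segment", "segment"])
    for label, (p, r, f, s) in rows.items():
        print(f"{label:>12} {p:.2f} {r:.2f} {f:.2f} {s}")
    print(averages(rows))
```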
How to Use
You can use this model with the Hugging Face transformers library:
```python
from transformers import pipeline

# Load the image-classification pipeline with this model
pipe = pipeline("image-classification", model="taresco/newspaper_classifier_segformer")

# Classify an image; the result is a list of {"label": ..., "score": ...} dicts
image_path = "path_to_your_image.jpg"
result = pipe(image_path)
print(result)
```
Framework versions
- Transformers 4.51.3
- Pytorch 2.6.0+cu124
- Datasets 3.5.0
- Tokenizers 0.21.0