newspaper_classifier_segformer

This model is a fine-tuned version of nvidia/mit-b0 on a document OCR dataset. It classifies text document images into two categories: those requiring special segmentation processing (segment) and those that don't (no_segment). This classification is a critical preprocessing step in our OCR pipeline, enabling optimized document processing paths.

Model Details

  • Base Architecture: SegFormer (nvidia/mit-b0) - a transformer-based architecture that balances efficiency and performance for vision tasks
  • Training Dataset: taresco/document_ocr - specialized collection of text document images with segmentation annotations
  • Input Format: RGB images resized to 512×512 pixels
  • Output Classes:
    • segment: Images containing two or more distinct, unrelated text segments that require special OCR processing
    • no_segment: Images containing single, cohesive content that can follow the standard OCR path

Intended Uses & Applications

  • OCR Pipeline Integration: Primary use is as a preprocessing classifier in OCR workflows for document digitization
  • Document Routing: Automatically route documents to specialized segmentation processing when needed (a routing sketch follows this list)
  • Batch Processing: Efficiently handle large collections of document archives by applying appropriate processing techniques
  • Digital Library Processing: Support for historical text document digitization projects
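As a concrete illustration of the document-routing use case, the sketch below gates each page on the classifier's top prediction. The two OCR functions are hypothetical placeholders for whatever engines sit downstream in your pipeline; only the model ID comes from this card.

from transformers import pipeline

classifier = pipeline("image-classification", model="taresco/newspaper_classifier_segformer")

def run_segmented_ocr(image_path):
    # Placeholder: call a segmentation-aware OCR engine here.
    ...

def run_standard_ocr(image_path):
    # Placeholder: call a standard single-pass OCR engine here.
    ...

def route_document(image_path):
    # The pipeline returns predictions sorted by score; the first entry is the top label.
    top = classifier(image_path)[0]
    if top["label"] == "segment":
        return run_segmented_ocr(image_path)
    return run_standard_ocr(image_path)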

Training and evaluation data

The model was fine-tuned on the taresco/newspaper_ocr dataset. The dataset contains newspaper images labeled as either segment or no_segment.

Dataset Splits:

  • Training Set: 19,111 examples, with 15% held out for validation during training
  • Test Set: 4,787 examples
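The split can be reproduced roughly as follows with the datasets library; whether the validation carve-out used train_test_split with seed 42 is an assumption, not documented here.

from datasets import load_dataset

dataset = load_dataset("taresco/newspaper_ocr")

# Hold out 15% of the training split for validation during fine-tuning
# (seed 42 is assumed to match the training seed below).
split = dataset["train"].train_test_split(test_size=0.15, seed=42)
train_ds, val_ds = split["train"], split["test"]
test_ds = dataset["test"]

print(len(train_ds), len(val_ds), len(test_ds))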

Training procedure

Training hyperparameters

The following hyperparameters were used during training (an illustrative Trainer configuration follows the list):

  • learning_rate: 5e-5
  • train_batch_size: 16
  • eval_batch_size: 16
  • seed: 42
  • optimizer: AdamW with betas=(0.9, 0.999) and epsilon=1e-08
  • num_epochs: 3
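The values above map onto a standard TrainingArguments/Trainer setup. The sketch below is an illustrative reconstruction, not the original training script: the label ordering, output directory, and dataset preprocessing (mapping images to pixel_values with the image processor) are assumptions.

from transformers import SegformerForImageClassification, Trainer, TrainingArguments

model = SegformerForImageClassification.from_pretrained(
    "nvidia/mit-b0",
    num_labels=2,
    id2label={0: "no_segment", 1: "segment"},   # assumed label order
    label2id={"no_segment": 0, "segment": 1},
    ignore_mismatched_sizes=True,               # replace the ImageNet classifier head
)

training_args = TrainingArguments(
    output_dir="newspaper_classifier_segformer",  # assumed output directory
    learning_rate=5e-5,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    num_train_epochs=3,
    seed=42,
    # The Trainer's default optimizer is AdamW with betas=(0.9, 0.999) and epsilon=1e-8.
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_ds,  # preprocessed splits from the data section above
    eval_dataset=val_ds,
)
trainer.train()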

Training results

The model achieved the following results on the evaluation set:

  • Loss: 0.0198
  • Accuracy: 99.62%
Per-class results on the test set:

              precision    recall  f1-score   support

  no_segment       1.00      0.99      1.00      4471
     segment       0.91      0.98      0.95       316

    accuracy                           0.99      4787
   macro avg       0.95      0.99      0.97      4787
weighted avg       0.99      0.99      0.99      4787
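The per-class report above can be recomputed from predictions on the test split, for example with scikit-learn. This is a sketch only; the column names "image" and "label" in the dataset are assumptions.

from sklearn.metrics import classification_report
from transformers import pipeline

pipe = pipeline("image-classification", model="taresco/newspaper_classifier_segformer")

y_true, y_pred = [], []
for example in test_ds:  # test split loaded as in the data section above
    y_pred.append(pipe(example["image"])[0]["label"])
    y_true.append(pipe.model.config.id2label[example["label"]])

print(classification_report(y_true, y_pred, digits=2))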

How to Use

You can use this model with the Hugging Face transformers library:

from transformers import pipeline

# Load the pipeline
pipe = pipeline("image-classification", model="taresco/newspaper_classifier_segformer")

# Classify an image
image_path = "path_to_your_image.jpg"
result = pipe(image_path)
print(result)
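If you prefer working below the pipeline abstraction, the same checkpoint can be used with the image processor and model classes directly. This is a minimal sketch; it assumes the checkpoint ships its own preprocessing config, which the pipeline call above already relies on.

import torch
from PIL import Image
from transformers import AutoImageProcessor, SegformerForImageClassification

processor = AutoImageProcessor.from_pretrained("taresco/newspaper_classifier_segformer")
model = SegformerForImageClassification.from_pretrained("taresco/newspaper_classifier_segformer")

image = Image.open("path_to_your_image.jpg").convert("RGB")
inputs = processor(images=image, return_tensors="pt")  # resizes and normalizes the image

with torch.no_grad():
    logits = model(**inputs).logits

predicted = logits.argmax(-1).item()
print(model.config.id2label[predicted])  # "segment" or "no_segment"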

Framework versions

  • Transformers 4.51.3
  • Pytorch 2.6.0+cu124
  • Datasets 3.5.0
  • Tokenizers 0.21.0