newspaper_classifier_segformer

This model is a fine-tuned version of nvidia/mit-b0 on a document OCR dataset. It classifies text document images into two categories: those requiring special segmentation processing (segment) and those that don't (no_segment). This classification is a critical preprocessing step in our OCR pipeline, enabling optimized document processing paths.

Model Details

  • Base Architecture: SegFormer (nvidia/mit-b0) - a transformer-based architecture that balances efficiency and performance for vision tasks
  • Training Dataset: taresco/document_ocr - specialized collection of text document images with segmentation annotations
  • Input Format: RGB images resized to 512×512 pixels
  • Output Classes:
    • segment: Images containing two or more distinct, unrelated text segments that require special OCR processing
    • no_segment: Images containing single, cohesive content that can follow the standard OCR path

Intended Uses & Applications

  • OCR Pipeline Integration: Primary use is as a preprocessing classifier in OCR workflows for document digitization
  • Document Routing: Automatically route documents to specialized segmentation processing when needed (a routing sketch follows this list)
  • Batch Processing: Efficiently handle large collections of document archives by applying appropriate processing techniques
  • Digital Library Processing: Support for historical text document digitization projects
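As a concrete illustration of the document-routing use case, the sketch below gates each page on the classifier's top prediction. The two OCR functions are hypothetical placeholders for whatever engines sit downstream in your pipeline; only the model ID comes from this card.

from transformers import pipeline

classifier = pipeline("image-classification", model="taresco/newspaper_classifier_segformer")

def run_segmented_ocr(image_path):
    # Placeholder: call a segmentation-aware OCR engine here.
    ...

def run_standard_ocr(image_path):
    # Placeholder: call a standard single-pass OCR engine here.
    ...

def route_document(image_path):
    # The pipeline returns predictions sorted by score; the first entry is the top label.
    top = classifier(image_path)[0]
    if top["label"] == "segment":
        return run_segmented_ocr(image_path)
    return run_standard_ocr(image_path)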

Training and evaluation data

The model was fine-tuned on the taresco/newspaper_ocr dataset. The dataset contains newspaper images labeled as either segment or no_segment.

Dataset Splits:

  • Training Set: 19,111 examples, with 15% held out for validation during training
  • Test Set: 4,787 examples
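The split can be reproduced roughly as follows with the datasets library; whether the validation carve-out used train_test_split with seed 42 is an assumption, not documented here.

from datasets import load_dataset

dataset = load_dataset("taresco/newspaper_ocr")

# Hold out 15% of the training split for validation during fine-tuning
# (seed 42 is assumed to match the training seed below).
split = dataset["train"].train_test_split(test_size=0.15, seed=42)
train_ds, val_ds = split["train"], split["test"]
test_ds = dataset["test"]

print(len(train_ds), len(val_ds), len(test_ds))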

Training procedure

Training hyperparameters

The following hyperparameters were used during training (an illustrative Trainer configuration follows the list):

  • learning_rate: 5e-5
  • train_batch_size: 16
  • eval_batch_size: 16
  • seed: 42
  • optimizer: AdamW with betas=(0.9, 0.999) and epsilon=1e-08
  • num_epochs: 3
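The values above map onto a standard TrainingArguments/Trainer setup. The sketch below is an illustrative reconstruction, not the original training script: the label ordering, output directory, and dataset preprocessing (mapping images to pixel_values with the image processor) are assumptions.

from transformers import SegformerForImageClassification, Trainer, TrainingArguments

model = SegformerForImageClassification.from_pretrained(
    "nvidia/mit-b0",
    num_labels=2,
    id2label={0: "no_segment", 1: "segment"},   # assumed label order
    label2id={"no_segment": 0, "segment": 1},
    ignore_mismatched_sizes=True,               # replace the ImageNet classifier head
)

training_args = TrainingArguments(
    output_dir="newspaper_classifier_segformer",  # assumed output directory
    learning_rate=5e-5,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    num_train_epochs=3,
    seed=42,
    # The Trainer's default optimizer is AdamW with betas=(0.9, 0.999) and epsilon=1e-8.
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_ds,  # preprocessed splits from the data section above
    eval_dataset=val_ds,
)
trainer.train()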

Training results

The model achieved the following results on the evaluation set:

  • Loss: 0.0198
  • Accuracy: 99.62%
Per-class results on the test set:

              precision    recall  f1-score   support

  no_segment       1.00      0.99      1.00      4471
     segment       0.91      0.98      0.95       316

    accuracy                           0.99      4787
   macro avg       0.95      0.99      0.97      4787
weighted avg       0.99      0.99      0.99      4787
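The per-class report above can be recomputed from predictions on the test split, for example with scikit-learn. This is a sketch only; the column names "image" and "label" in the dataset are assumptions.

from sklearn.metrics import classification_report
from transformers import pipeline

pipe = pipeline("image-classification", model="taresco/newspaper_classifier_segformer")

y_true, y_pred = [], []
for example in test_ds:  # test split loaded as in the data section above
    y_pred.append(pipe(example["image"])[0]["label"])
    y_true.append(pipe.model.config.id2label[example["label"]])

print(classification_report(y_true, y_pred, digits=2))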

How to Use

You can use this model with the Hugging Face transformers library:

from transformers import pipeline

# Load the pipeline
pipe = pipeline("image-classification", model="taresco/newspaper_classifier_segformer")

# Classify an image
image_path = "path_to_your_image.jpg"
result = pipe(image_path)
print(result)
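If you prefer working below the pipeline abstraction, the same checkpoint can be used with the image processor and model classes directly. This is a minimal sketch; it assumes the checkpoint ships its own preprocessing config, which the pipeline call above already relies on.

import torch
from PIL import Image
from transformers import AutoImageProcessor, SegformerForImageClassification

processor = AutoImageProcessor.from_pretrained("taresco/newspaper_classifier_segformer")
model = SegformerForImageClassification.from_pretrained("taresco/newspaper_classifier_segformer")

image = Image.open("path_to_your_image.jpg").convert("RGB")
inputs = processor(images=image, return_tensors="pt")  # resizes and normalizes the image

with torch.no_grad():
    logits = model(**inputs).logits

predicted = logits.argmax(-1).item()
print(model.config.id2label[predicted])  # "segment" or "no_segment"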

Framework versions

  • Transformers 4.51.3
  • Pytorch 2.6.0+cu124
  • Datasets 3.5.0
  • Tokenizers 0.21.0