Beyond Words Visual Content Detector
This family of YOLO object detection models detects and classifies seven types of visual content in historical newspaper pages from the Library of Congress's Chronicling America collection.
Trained on the crowdsourced Beyond Words dataset these models identify:
- Photographs
- Illustrations
- Maps
- Comics/Cartoons
- Editorial Cartoons
- Headlines
- Advertisements
Example Outputs
Below are two examples of the model's output. The first is an example from the training data, and the second is an example of the model's performance on an out-of-domain (OoD) image.

Objects detected for an example from the training data

Objects detected for an in the "wild" newspaper example
π Quick Start
Installation
pip install ultralytics huggingface_hub
Inference
from huggingface_hub import hf_hub_download
from ultralytics import YOLO
# Download and load nano model
model = YOLO(hf_hub_download(
repo_id="biglam/historic-newspaper-illustrations-yolov11",
filename="yolo11n.pt"
))
# Or download and load small model
model = YOLO(hf_hub_download(
repo_id="biglam/historic-newspaper-illustrations-yolov11",
filename="yolo11s.pt"
))
# Run inference on an image
results = model("path/to/newspaper_page.jpg")
# Process results
for result in results:
boxes = result.boxes # Boxes object for bbox outputs
for box in boxes:
# Get box coordinates
x1, y1, x2, y2 = box.xyxy[0]
# Get confidence score
conf = box.conf[0]
# Get class
cls = box.cls[0]
# Get class name
class_name = model.names[int(cls)]
print(f"Detected {class_name} with confidence {conf:.2f}")
π° Use Case
The model is intended to extract visual content from historical newspapers.
π Dataset
The models were trained on the Beyond Words dataset, which was created via crowdsourcing and augmented with expert-labelled annotations for headlines and advertisements. The dataset consists of:
- 3,559 annotated newspaper pages (train+val)
- 48,409 labelled visual content regions
- 7 content categories
Annotations are provided in COCO format and were aligned with METS/ALTO OCR for further downstream use.
It's important to note that the training data does not represent a representative sample of historical newspaper content. The training data is drawn from the Library of Congress's Chronicling America collection, which is focused on American newspapers from the late 19th and early 20th centuries. This also means that the models are likely to do much better on images from this period and digitised similarly.
π§ Model Details
All models are fine-tuned variants of the YOLO11 object detection architecture. The training set consisted of high-resolution historical newspaper scans from WWI-era publications, with performance validated on a held-out set of 712 pages.
π Performance Metrics
Class | Precision | Recall | [email protected] | [email protected]:0.95 |
---|---|---|---|---|
Photograph | 0.995 | 0.976 | 0.987 | 0.958 |
Illustration | 0.990 | 0.969 | 0.983 | 0.929 |
Map | 0.978 | 0.971 | 0.979 | 0.945 |
Comics/Cartoon | 0.994 | 0.967 | 0.975 | 0.952 |
Editorial Cartoon | 1.000 | 0.984 | 0.995 | 0.985 |
Headline | 0.987 | 0.996 | 0.995 | 0.895 |
Advertisement | 0.994 | 0.998 | 0.995 | 0.956 |
Overall (macro) | 0.991 | 0.980 | 0.987 | 0.946 |
π§ͺ Model Variants
yolo11n.pt
(nano)
Trained for edge applications or fast inference.
Results
Class | Precision | Recall | [email protected] | [email protected]:0.95 |
---|---|---|---|---|
Photograph | 0.995 | 0.976 | 0.987 | 0.958 |
Illustration | 0.990 | 0.969 | 0.983 | 0.929 |
Map | 0.978 | 0.971 | 0.979 | 0.945 |
Comics/Cartoon | 0.994 | 0.967 | 0.975 | 0.952 |
Editorial Cartoon | 1.000 | 0.984 | 0.995 | 0.985 |
Headline | 0.987 | 0.996 | 0.995 | 0.895 |
Advertisement | 0.994 | 0.998 | 0.995 | 0.956 |
Overall (macro) | 0.991 | 0.980 | 0.987 | 0.946 |
yolo11s.pt
(small)
Balanced for speed and accuracy.
Results
Class | Images | Instances | Box(P) | Box(R) | mAP50 | mAP50-95 |
---|---|---|---|---|---|---|
all | 711 | 9923 | 0.98 | 0.958 | 0.982 | 0.927 |
Photograph | 488 | 875 | 0.986 | 0.947 | 0.982 | 0.941 |
Illustration | 154 | 206 | 0.962 | 0.951 | 0.972 | 0.906 |
Map | 32 | 34 | 0.966 | 0.941 | 0.973 | 0.937 |
Comics/Cartoon | 109 | 211 | 0.985 | 0.942 | 0.968 | 0.923 |
Editorial Cartoon | 53 | 54 | 1.000 | 0.967 | 0.995 | 0.973 |
Headline | 617 | 5685 | 0.979 | 0.966 | 0.992 | 0.868 |
Advertisement | 510 | 2858 | 0.985 | 0.989 | 0.994 | 0.939 |
π Training
The models were trained using PyTorch and Ultralytics YOLOv11 on annotated newspaper pages. OCR-aligned captions were included in bounding boxes when present. Data augmentation techniques include resizing and random flipping.
π§Ύ Citation
If you use this model or dataset, please cite the following:
Model
@software{yolo11_ultralytics,
author = {Glenn Jocher and Jing Qiu},
title = {Ultralytics YOLO11},
version = {11.0.0},
year = {2024},
url = {https://github.com/ultralytics/ultralytics},
license = {AGPL-3.0}
}
Dataset
@inproceedings{10.1145/3340531.3412767,
author = {Lee, Benjamin Charles Germain and Mears, Jaime and Jakeway, Eileen and Ferriter, Meghan and Adams, Chris and Yarasavage, Nathan and Thomas, Deborah and Zwaard, Kate and Weld, Daniel S.},
title = {The Newspaper Navigator Dataset: Extracting Headlines and Visual Content from 16 Million Historic Newspaper Pages in Chronicling America},
year = {2020},
publisher = {Association for Computing Machinery},
address = {New York, NY, USA},
doi = {10.1145/3340531.3412767},
url = {https://doi.org/10.1145/3340531.3412767}
}
License
This model is released under the AGPL-3.0 license. The dataset is licensed CC0 (Public Domain).
Model tree for biglam/historic-newspaper-illustrations-yolov11
Base model
Ultralytics/YOLO11