---
license: gpl-3.0
base_model:
- naver-clova-ix/donut-base
pipeline_tag: visual-document-retrieval
---

# HeR-T: Herbarium specimen label Recognition Transformer  

## 📃 Paper
Application of computer vision to the automated extraction of metadata from natural history specimen labels: A case study on herbarium specimens (Under Review)

## Authors
Zacchigna, Jacopo; Liu, Weiwei; Pellegrino, Felice Andrea; Peron, Adriano; Roma-Marzio, Francesco; Peruzzi, Lorenzo; Martellos, Stefano

## 🚀 Overview
HeR-T (Herbarium specimen label Recognition Transformer) is a fine-tuned vision-language model designed for automated metadata extraction from natural history specimen labels, in particular herbarium specimen labels. It is built on Donut-base and was fine-tuned on 55,089 herbarium specimen images from the Herbarium of the University of Pisa (international acronym PI).
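
Donut-based models usually emit their structured output as a single token sequence in which each field is wrapped in paired tags (`<s_field>value</s_field>`), which is then converted to JSON. HeR-T presumably follows the same convention, although this card does not document its label schema. The plain-Python sketch below illustrates that conversion under stated assumptions: the field names (`family`, `locality`) and the task token `<s_herbarium>` are hypothetical, and list-valued fields are not handled. In practice, `DonutProcessor.token2json` from Hugging Face `transformers` performs this conversion.

```python
import re
from typing import Any, Dict, Union

# Matches one <s_key>...</s_key> pair. The backreference (?P=key) requires the
# closing tag to carry the same key as the opening tag; tags without a matching
# close (e.g. a task-start token) are simply skipped.
FIELD = re.compile(r"<s_(?P<key>[^>]+)>(?P<body>.*?)</s_(?P=key)>", re.DOTALL)


def tagged_to_json(seq: str) -> Union[Dict[str, Any], str]:
    """Turn Donut-style '<s_key>value</s_key>' markup into nested dicts.

    Simplified sketch: list fields (Donut's '<sep/>' separator) are not
    handled, repeated keys overwrite one another, and text containing no
    tags is returned as a stripped leaf value.
    """
    fields: Dict[str, Any] = {}
    for match in FIELD.finditer(seq):
        fields[match.group("key")] = tagged_to_json(match.group("body"))
    return fields if fields else seq.strip()


if __name__ == "__main__":
    # Hypothetical model output; field names and "<s_herbarium>" are illustrative.
    raw = (
        "<s_herbarium><s_family>Rosaceae</s_family>"
        "<s_locality>Monte Pisano</s_locality></s>"
    )
    print(tagged_to_json(raw))  # {'family': 'Rosaceae', 'locality': 'Monte Pisano'}
```

Nested fields are handled by recursion on each field body, so a sequence such as `<s_collector><s_name>...</s_name></s_collector>` becomes a nested dictionary.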

## 🔥 Features
- **Fine-tuned on** specimen images from the Herbarium of the University of Pisa for automated metadata extraction from natural history specimen labels
- **Supports** image inputs whose labels contain printed, handwritten, or mixed printed and handwritten text
- **Evaluation**: Tree Edit Distance (TED) accuracy score, computed as $\max\left(0,\ 1 - \frac{\mathrm{TED}(pr,\, gt)}{\mathrm{TED}(\phi,\, gt)}\right)$, where $gt$, $pr$, and $\phi$ denote the ground-truth, predicted, and empty trees, respectively
- **Pre-trained weights** are loaded from Donut-base (naver-clova-ix/donut-base)
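
The TED accuracy formula can be sketched in Python. This is a minimal, self-contained illustration under stated assumptions, not the evaluation code used for HeR-T. It converts a JSON-like record into an ordered labelled tree (keys sorted, with a `<root>` node on top, so the empty tree φ is a lone root), computes an exact ordered tree edit distance with unit costs for insertion, deletion and relabelling using the standard memoised forest recurrence (simpler but slower than an optimised algorithm such as Zhang-Shasha), and applies the clipped normalisation. The unit costs, key sorting, handling of an empty ground truth, the field names, and the helper names `to_tree`, `ted` and `ted_accuracy` are all assumptions; the paper's evaluation may use different operation costs.

```python
from functools import lru_cache
from typing import Any, Tuple

# A node is (label, children); children is a tuple, so trees are hashable
# and the distance computation can be memoised.
Tree = Tuple[str, Tuple[Any, ...]]


def to_tree(obj: Any, label: str = "<root>") -> Tree:
    """Convert a JSON-like value into an ordered, labelled tree.

    Dict keys become internal nodes (sorted, so field order is ignored),
    list elements become "<item>" nodes, and scalars become "<leaf>" nodes.
    """
    if isinstance(obj, dict):
        items = sorted(obj.items(), key=lambda kv: str(kv[0]))
        children = tuple(to_tree(v, str(k)) for k, v in items)
    elif isinstance(obj, list):
        children = tuple(to_tree(v, "<item>") for v in obj)
    else:
        children = (("<leaf>" + str(obj), ()),)
    return (label, children)


@lru_cache(maxsize=None)
def forest_distance(f: Tuple[Tree, ...], g: Tuple[Tree, ...]) -> int:
    """Unit-cost ordered forest edit distance (insert = delete = relabel = 1)."""
    if not f and not g:
        return 0
    if not g:  # only deletions remain: delete the rightmost root of f
        v = f[-1]
        return forest_distance(f[:-1] + v[1], g) + 1
    if not f:  # only insertions remain: insert the rightmost root of g
        w = g[-1]
        return forest_distance(f, g[:-1] + w[1]) + 1
    v, w = f[-1], g[-1]
    return min(
        forest_distance(f[:-1] + v[1], g) + 1,  # delete root v
        forest_distance(f, g[:-1] + w[1]) + 1,  # insert root w
        forest_distance(v[1], w[1])  # map v to w: match their children...
        + forest_distance(f[:-1], g[:-1])  # ...and the remaining forests
        + int(v[0] != w[0]),  # relabel cost if labels differ
    )


def ted(a: Tree, b: Tree) -> int:
    return forest_distance((a,), (b,))


def ted_accuracy(pred: Any, gt: Any) -> float:
    """max(0, 1 - TED(pr, gt) / TED(phi, gt)), with phi the empty tree."""
    gt_tree = to_tree(gt)
    denom = ted(to_tree({}), gt_tree)  # phi: a lone "<root>" node
    num = ted(to_tree(pred), gt_tree)
    if denom == 0:  # assumption: empty ground truth scores 1 only on exact match
        return 1.0 if num == 0 else 0.0
    return max(0.0, 1.0 - num / denom)


if __name__ == "__main__":
    gt = {"family": "Rosaceae", "locality": "Pisa"}  # hypothetical record
    print(ted_accuracy({"family": "Rosaceae"}, gt))  # 0.5
```

In the usage example, the ground-truth tree has four nodes below the root, so TED(φ, gt) = 4; the prediction is missing one field and its value (two insertions), which gives 1 − 2/4 = 0.5. The `max(0, ·)` clip matters when a prediction contains many spurious fields, because TED(pr, gt) can then exceed TED(φ, gt).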