Update README.md
README.md (changed)
@@ -5,6 +5,8 @@ tags:
  - classification
base_model:
  - google/vit-base-patch16-224
pipeline_tag: image-classification
license: mit
---
@@ -21,63 +23,50 @@ HF hub support for the model

## Versions

There are currently 2 versions of the model available for download; both have the same set of categories but different data annotations. The latest `v2.0` has more diverse pages in each category and fewer annotation mistakes.

| Version | Pages | N-page files | PDFs | Description |
|--------:|:-----:|:------------:|:--------:|:--------------------------------------------------------------|
| `v1.0` | 10073 | **~104** | **3896** | annotations with mistakes, more heterogeneous data |
| `v2.0` | 11940 | **~509** | **5002** | more diverse pages in each category, fewer annotation mistakes |

## Model description

Fine-tuned model repository: vit-historical-page [^1]

Base model repository: google/vit-base-patch16-224 [^2]

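Since the fine-tuned checkpoint is a standard ViT image classifier, it can presumably be queried through the Hugging Face `transformers` image-classification pipeline. Below is a minimal sketch (the hub id `ufal/vit-historical-page` is an assumption, not confirmed by this card), with the softmax/top-k step that produces the Top-1 and Top-3 predictions written out explicitly:

```python
import math

# Hypothetical usage via transformers (hub id is an assumption):
#
# from transformers import pipeline
# classifier = pipeline("image-classification", model="ufal/vit-historical-page")
# predictions = classifier("scan.jpg", top_k=3)

def softmax_top_k(logits, labels, k=3):
    """Convert raw logits into the k most probable (label, probability) pairs."""
    m = max(logits)                              # shift for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    ranked = sorted(zip(labels, (e / total for e in exps)),
                    key=lambda pair: pair[1], reverse=True)
    return ranked[:k]
```

The pipeline performs this softmax/top-k step internally; the helper only makes explicit how the Top-1 and Top-3 scores reported later in this card are obtained.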
### Data

Training set of the model: **8950** images for v1.0

Training set of the model: **…**

### Categories

**v1.0 Categories**:

| Label | Ratio | Description |
|----------:|:------:|:------------------------------------------------------------------------------|
| `DRAW` | 11.89% | **drawings, maps, paintings with text** |
| `DRAW_L` | 8.17% | **drawings, etc. with a table legend or inside tabular layout / forms** |
| `LINE_HW` | 5.99% | **handwritten text lines inside tabular layout / forms** |
| `LINE_P` | 6.06% | **printed text lines inside tabular layout / forms** |
| `LINE_T` | 13.39% | **machine-typed text lines inside tabular layout / forms** |
| `PHOTO` | 10.21% | **photos with text** |
| `PHOTO_L` | 7.86% | **photos inside tabular layout / forms or with a tabular annotation** |
| `TEXT` | 8.58% | **mixed types of printed and handwritten texts** |
| `TEXT_HW` | 7.36% | **only handwritten text** |
| `TEXT_P` | 6.95% | **only printed text** |
| `TEXT_T` | 13.53% | **only machine-typed text** |

**v2.0 Categories**:

| Label | Ratio | Description |
|----------:|:-----:|:------------------------------------------------------------------------------|
| `DRAW` | 9.12% | **drawings, maps, paintings with text** |
| `DRAW_L` | 9.14% | **drawings, etc. with a table legend or inside tabular layout / forms** |
| `LINE_HW` | 8.84% | **handwritten text lines inside tabular layout / forms** |
| `LINE_P` | 9.15% | **printed text lines inside tabular layout / forms** |
| `LINE_T` | 9.20% | **machine-typed text lines inside tabular layout / forms** |
| `PHOTO` | 9.05% | **photos with text** |
| `PHOTO_L` | 9.10% | **photos inside tabular layout / forms or with a tabular annotation** |
| `TEXT` | 9.14% | **mixed types of printed and handwritten texts** |
| `TEXT_HW` | 9.14% | **only handwritten text** |
| `TEXT_P` | 9.07% | **only printed text** |
| `TEXT_T` | 9.05% | **only machine-typed text** |

Evaluation set (same proportions): **995** images for v1.0

Evaluation set (same proportions): **1194** images for v2.0

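Unlike `v1.0`, the `v2.0` split is almost perfectly balanced across the eleven labels. A quick sanity check over the ratios transcribed from the tables above (plain Python, no project code assumed):

```python
# Class ratios transcribed from the category tables above, in percent.
v1_ratios = {"DRAW": 11.89, "DRAW_L": 8.17, "LINE_HW": 5.99, "LINE_P": 6.06,
             "LINE_T": 13.39, "PHOTO": 10.21, "PHOTO_L": 7.86, "TEXT": 8.58,
             "TEXT_HW": 7.36, "TEXT_P": 6.95, "TEXT_T": 13.53}
v2_ratios = {"DRAW": 9.12, "DRAW_L": 9.14, "LINE_HW": 8.84, "LINE_P": 9.15,
             "LINE_T": 9.20, "PHOTO": 9.05, "PHOTO_L": 9.10, "TEXT": 9.14,
             "TEXT_HW": 9.14, "TEXT_P": 9.07, "TEXT_T": 9.05}

# Each distribution should cover essentially the whole set (v1.0 rounds to 99.99).
assert abs(sum(v1_ratios.values()) - 100.0) < 0.05
assert abs(sum(v2_ratios.values()) - 100.0) < 0.05

# Spread between the most and least frequent class, as a crude balance measure.
v1_spread = max(v1_ratios.values()) - min(v1_ratios.values())
v2_spread = max(v2_ratios.values()) - min(v2_ratios.values())
```

The spread shrinks from roughly 7.5 percentage points in `v1.0` to under half a point in `v2.0`, which matches the "more diverse pages in each category" description.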
#### Data preprocessing
@@ -105,31 +94,43 @@ During training the following transforms were applied randomly with a 50% chance
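The hunk context above notes that each training transform was applied randomly with a 50% chance. The transform list itself is elided from this excerpt, so the following is only a generic sketch of that pattern, with placeholder lambdas standing in for the real augmentations:

```python
import random

def apply_random_transforms(image, transforms, p=0.5, rng=None):
    """Apply each (name, fn) transform independently with probability p."""
    rng = rng or random.Random()
    for _name, fn in transforms:
        if rng.random() < p:
            image = fn(image)
    return image

# Placeholder transforms (the actual augmentations are not listed in this excerpt).
demo_transforms = [
    ("brighten", lambda x: x + 1),
    ("scale", lambda x: x * 2),
]
```

Passing a seeded `random.Random` as `rng` makes the augmentation pipeline reproducible, which is the usual practice for training-time transforms.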

### Results



**v2.0** Evaluation set's accuracy (**Top-3**): **95.58%**



**v2.1** Evaluation set's accuracy (**Top-3**): **99.84%**



**v2.2** Evaluation set's accuracy (**Top-3**): **100.00%**



**v2.0** Evaluation set's accuracy (**Top-1**): **84.96%**



**v2.1** Evaluation set's accuracy (**Top-1**): **96.36%**



**v2.2** Evaluation set's accuracy (**Top-1**): **99.61%**


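The Top-1 and Top-3 figures above can be computed from raw logits with a small helper. This is a generic sketch of the metric, not the project's actual evaluation code:

```python
def top_k_accuracy(logit_rows, true_labels, k=1):
    """Fraction of samples whose true label is among the k highest logits."""
    hits = 0
    for row, label in zip(logit_rows, true_labels):
        # Indices of the k largest logits in this row.
        top_k = sorted(range(len(row)), key=lambda i: row[i], reverse=True)[:k]
        hits += label in top_k
    return hits / len(true_labels)
```

By construction Top-3 accuracy is always at least Top-1 accuracy, which is why the Top-3 numbers above dominate the Top-1 numbers for every version.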
#### Result tables

- **v2.0** Manually **checked** evaluation dataset results (TOP-3): [model_TOP-3_EVAL.csv](https://github.com/ufal/atrium-page-classification/blob/main/result/tables/20250526-1142_model_v20_TOP-3_EVAL.csv)

- **v2.0** Manually **checked** evaluation dataset results (TOP-1): [model_TOP-1_EVAL.csv](https://github.com/ufal/atrium-page-classification/blob/main/result/tables/20250526-1148_model_v20_TOP-1_EVAL.csv)

- **v2.1** Manually **checked** evaluation dataset results (TOP-3): [model_TOP-3_EVAL.csv](https://github.com/ufal/atrium-page-classification/blob/main/result/tables/20250526-1153_model_v21_TOP-3_EVAL.csv)

- **v2.1** Manually **checked** evaluation dataset results (TOP-1): [model_TOP-1_EVAL.csv](https://github.com/ufal/atrium-page-classification/blob/main/result/tables/20250526-1151_model_v21_TOP-1_EVAL.csv)

- **v2.2** Manually **checked** evaluation dataset results (TOP-3): [model_TOP-3_EVAL.csv](https://github.com/ufal/atrium-page-classification/blob/main/result/tables/20250526-1156_model_v22_TOP-3_EVAL.csv)

- **v2.2** Manually **checked** evaluation dataset results (TOP-1): [model_TOP-1_EVAL.csv](https://github.com/ufal/atrium-page-classification/blob/main/result/tables/20250526-1158_model_v22_TOP-1_EVAL.csv)

#### Table columns

- **Developed by:** UFAL [^5]
- **Funded by:** ATRIUM [^4]
- **Shared by:** ATRIUM [^4] & UFAL [^5]
- **Model type:** fine-tuned ViT with 224x224 [^2] or 384x384 [^13] [^14] input resolution

**© 2022 UFAL & ATRIUM**

[^3]: https://github.com/ufal/atrium-page-classification
[^4]: https://atrium-research.eu/
[^5]: https://ufal.mff.cuni.cz/home-page
[^6]: https://huggingface.co/google/vit-base-patch16-384
[^7]: https://huggingface.co/google/vit-large-patch16-384