k4tel committed on
Commit
6860ad0
·
verified ·
1 Parent(s): f6c6a04

Update README.md

Files changed (1)
  1. README.md +59 -56
README.md CHANGED
@@ -5,6 +5,8 @@ tags:
5
  - classification
6
  base_model:
7
  - google/vit-base-patch16-224
 
 
8
  pipeline_tag: image-classification
9
  license: mit
10
  ---
@@ -21,63 +23,50 @@ HF 😊 hub support for the model
21
  ## Versions 🏁
22
 
23
  There are currently 2 versions of the model available for download, both of them have the same set of categories,
24
- but different data annotations. The latest `v2.0` is considered to be default.
25
 
26
- | Version | Pages | N-page files | PDFs | Description |
27
- |--------:|:-----:|:------------:|:--------:|:--------------------------------------------------------------|
28
- | `v1.0` | 10073 | **~104** | **3896** | annotations with mistakes, more heterogeneous data |
29
- | `v2.0` | 11940 | **~509** | **5002** | more diverse pages in each category, fewer annotation mistakes |
30
 
31
  ## Model description 📇
32
 
33
  🔲 Fine-tuned model repository: vit-historical-page [^1] 🔗
34
 
35
- 🔳 Base model repository: Google's vit-base-patch16-224 [^2] 🔗
36
 
37
  ### Data 📜
38
 
39
- Training set of the model: **8950** images for v1.0
 
 
40
 
41
- Training set of the model: **10745** images for v2.0
42
 
43
  ### Categories 🏷️
44
 
45
- **v1.0 version Categories 🪧**:
46
-
47
- | Label️ | Ratio | Description |
48
- |----------:|:------:|:------------------------------------------------------------------------------|
49
- | `DRAW` | 11.89% | **📈 - drawings, maps, paintings with text** |
50
- | `DRAW_L` | 8.17% | **📈📏 - drawings, etc with a table legend or inside tabular layout / forms** |
51
- | `LINE_HW` | 5.99% | **✏️📏 - handwritten text lines inside tabular layout / forms** |
52
- | `LINE_P` | 6.06% | **📏 - printed text lines inside tabular layout / forms** |
53
- | `LINE_T` | 13.39% | **📏 - machine typed text lines inside tabular layout / forms** |
54
- | `PHOTO` | 10.21% | **🌄 - photos with text** |
55
- | `PHOTO_L` | 7.86% | **🌄📏 - photos inside tabular layout / forms or with a tabular annotation** |
56
- | `TEXT` | 8.58% | **📰 - mixed types of printed and handwritten texts** |
57
- | `TEXT_HW` | 7.36% | **✏️📄 - only handwritten text** |
58
- | `TEXT_P` | 6.95% | **📄 - only printed text** |
59
- | `TEXT_T` | 13.53% | **📄 - only machine typed text** |
60
-
61
- **v2.0 version Categories 🪧**:
62
-
63
- | Label️ | Ratio | Description |
64
- |----------:|:-----:|:------------------------------------------------------------------------------|
65
- | `DRAW` | 9.12% | **📈 - drawings, maps, paintings with text** |
66
- | `DRAW_L` | 9.14% | **📈📏 - drawings, etc with a table legend or inside tabular layout / forms** |
67
- | `LINE_HW` | 8.84% | **✏️📏 - handwritten text lines inside tabular layout / forms** |
68
- | `LINE_P` | 9.15% | **📏 - printed text lines inside tabular layout / forms** |
69
- | `LINE_T` | 9.2% | **📏 - machine typed text lines inside tabular layout / forms** |
70
- | `PHOTO` | 9.05% | **🌄 - photos with text** |
71
- | `PHOTO_L` | 9.1% | **🌄📏 - photos inside tabular layout / forms or with a tabular annotation** |
72
- | `TEXT` | 9.14% | **📰 - mixed types of printed and handwritten texts** |
73
- | `TEXT_HW` | 9.14% | **✏️📄 - only handwritten text** |
74
- | `TEXT_P` | 9.07% | **📄 - only printed text** |
75
- | `TEXT_T` | 9.05% | **📄 - only machine typed text** |
76
-
77
- Evaluation set (same proportions): **995** images for v1.0
78
-
79
- Evaluation set (same proportions): **1194** images for v2.0
80
 
81
 
82
  #### Data preprocessing
83
 
@@ -105,31 +94,43 @@ During training the following transforms were applied randomly with a 50% chance
105
 
106
  ### Results 📊
107
 
108
- **v1.0** Evaluation set's accuracy (**Top-3**): **99.6%**
 
 
 
 
109
 
110
- ![TOP-3 confusion matrix - trained ViT](https://github.com/ufal/atrium-page-classification/blob/main/result/plots/20250416-1430_conf_mat_TOP-3.png?raw=true)
111
 
112
- **v2.0** Evaluation set's accuracy (**Top-3**): **99.75%**
113
 
114
- ![TOP-3 confusion matrix - trained ViT](https://github.com/ufal/atrium-page-classification/blob/main/result/plots/20250417-1049_conf_mat_TOP-3.png?raw=true)
115
 
116
- **v1.0** Evaluation set's accuracy (**Top-1**): **97.3%**
117
 
118
- ![TOP-1 confusion matrix - trained ViT](https://github.com/ufal/atrium-page-classification/blob/main/result/plots/20250416-1436_conf_mat_TOP-1.png?raw=true)
119
 
120
- **v2.0** Evaluation set's accuracy (**Top-1**): **96.82%**
121
 
122
- ![TOP-1 confusion matrix - trained ViT](https://github.com/ufal/atrium-page-classification/blob/main/result/plots/20250417-1055_conf_mat_TOP-1.png?raw=true)
 
 
 
 
123
 
124
  #### Result tables
125
 
126
- - **v1.0** Manually ✍ **checked** evaluation dataset results (TOP-3): [model_TOP-3_EVAL.csv](https://github.com/ufal/atrium-page-classification/blob/main/result/tables/20250416-1426_model_1119_3_TOP-3_EVAL.csv) 🔗
 
 
 
 
127
 
128
- - **v1.0** Manually ✍ **checked** evaluation dataset results (TOP-1): [model_TOP-1_EVAL.csv](https://github.com/ufal/atrium-page-classification/blob/main/result/tables/20250416-1431_model_1119_3_TOP-1_EVAL.csv) 🔗
129
 
130
- - **v2.0** Manually ✍ **checked** evaluation dataset results (TOP-3): [model_TOP-3_EVAL.csv](https://github.com/ufal/atrium-page-classification/blob/main/result/tables/20250417-1044_model_672_3_TOP-3_EVAL.csv) 🔗
131
 
132
- - **v2.0** Manually ✍ **checked** evaluation dataset results (TOP-1): [model_TOP-1_EVAL.csv](https://github.com/ufal/atrium-page-classification/blob/main/result/tables/20250417-1050_model_672_3_TOP-1_EVAL.csv) 🔗
133
 
134
  #### Table columns
135
 
@@ -150,7 +151,7 @@ Official repository: UFAL [^3]
150
  - **Developed by** UFAL [^5] 👥
151
  - **Funded by** ATRIUM [^4] 💰
152
  - **Shared by** ATRIUM [^4] & UFAL [^5]
153
- - **Model type:** fine-tuned ViT [^2] with a 224x224 resolution size
154
 
155
  **©️ 2022 UFAL & ATRIUM**
156
 
@@ -159,3 +160,5 @@ Official repository: UFAL [^3]
159
  [^3]: https://github.com/ufal/atrium-page-classification
160
  [^4]: https://atrium-research.eu/
161
  [^5]: https://ufal.mff.cuni.cz/home-page
 
 
 
5
  - classification
6
  base_model:
7
  - google/vit-base-patch16-224
8
+ - google/vit-base-patch16-384
9
+ - google/vit-large-patch16-384
10
  pipeline_tag: image-classification
11
  license: mit
12
  ---
 
23
  ## Versions 🏁
24
 
25
  There are currently 2 versions of the model available for download, both of them have the same set of categories,
26
+ but different data annotations. The latest approved `v2.1` is considered to be default and can be found in the `main` branch
27
+ of HF 😊 hub [^1] 🔗
28
+
29
+ | Version | Base | Pages | PDFs | Description |
30
+ |--------:|-------------------------|:-----:|:--------:|:--------------------------------------------------------------------------|
31
+ | `v2.0` | `vit-base-patch16-224` | 10073 | **3896** | annotations with mistakes, more heterogeneous data |
32
+ | `v2.1` | `vit-base-patch16-224` | 11940 | **5002** | `main`: more diverse pages in each category, fewer annotation mistakes |
33
+ | `v2.2` | `vit-base-patch16-224` | 15855 | **5730** | same data as `v2.1` + some restored pages from `v2.0` |
34
+ | `v3.2` | `vit-base-patch16-384` | 15855 | **5730** | same data as `v2.2`, but a bit larger model base with higher resolution |
35
+ | `v5.2` | `vit-large-patch16-384` | 15855 | **5730** | same data as `v2.2`, but the largest model base with higher resolution |
36
 
 
 
 
 
37
 
38
  ## Model description 📇
39
 
40
  🔲 Fine-tuned model repository: vit-historical-page [^1] 🔗
41
 
42
+ 🔳 Base model repository: Google's **vit-base-patch16-224**, **vit-base-patch16-384**, **vit-large-patch16-384** [^2] [^6] [^7] 🔗
43
 
44
  ### Data 📜
45
 
46
+ Training set of the model: **8950** images for `v2.0`
47
+
48
+ Training set of the model: **10745** images for `v2.1`
49
 
50
+ Training set of the model: **15855** images for `v2.2`, `v3.2` and `v5.2`
51
 
52
  ### Categories 🏷️
53
 
54
 
55
+ | Label️ | Description |
56
+ |----------:|:------------------------------------------------------------------------------|
57
+ | `DRAW` | **📈 - drawings, maps, paintings with text** |
58
+ | `DRAW_L` | **📈📏 - drawings, etc with a table legend or inside tabular layout / forms** |
59
+ | `LINE_HW` | **✏️📏 - handwritten text lines inside tabular layout / forms** |
60
+ | `LINE_P` | **📏 - printed text lines inside tabular layout / forms** |
61
+ | `LINE_T` | **📏 - machine typed text lines inside tabular layout / forms** |
62
+ | `PHOTO` | **🌄 - photos with text** |
63
+ | `PHOTO_L` | **🌄📏 - photos inside tabular layout / forms or with a tabular annotation** |
64
+ | `TEXT` | **📰 - mixed types of printed and handwritten texts** |
65
+ | `TEXT_HW` | **✏️📄 - only handwritten text** |
66
+ | `TEXT_P` | **📄 - only printed text** |
67
+ | `TEXT_T` | **📄 - only machine typed text** |
68
+
69
+ Evaluation set: **1290** images (taken from `v2.2` annotations)
70
 
71
  #### Data preprocessing
72
 
 
94
 
95
  ### Results 📊
96
 
97
+ **v2.0** Evaluation set's accuracy (**Top-3**): **95.58%**
98
+
99
+ ![TOP-3 confusion matrix - trained ViT](https://github.com/ufal/atrium-page-classification/blob/main/result/plots/20250526-1147_model_v20_conf_mat_TOP-3.png?raw=true)
100
+
101
+ **v2.1** Evaluation set's accuracy (**Top-3**): **99.84%**
102
 
103
+ ![TOP-3 confusion matrix - trained ViT](https://github.com/ufal/atrium-page-classification/blob/main/result/plots/20250526-1157_model_v21_conf_mat_TOP-3.png?raw=true)
104
 
105
+ **v2.2** Evaluation set's accuracy (**Top-3**): **100.00%**
106
 
107
+ ![TOP-3 confusion matrix - trained ViT](https://github.com/ufal/atrium-page-classification/blob/main/result/plots/20250526-1201_model_v22_conf_mat_TOP-3.png?raw=true)
108
 
109
+ **v2.0** Evaluation set's accuracy (**Top-1**): **84.96%**
110
 
111
+ ![TOP-1 confusion matrix - trained ViT](https://github.com/ufal/atrium-page-classification/blob/main/result/plots/20250526-1152_model_v20_conf_mat_TOP-1.png?raw=true)
112
 
113
+ **v2.1** Evaluation set's accuracy (**Top-1**): **96.36%**
114
 
115
+ ![TOP-1 confusion matrix - trained ViT](https://github.com/ufal/atrium-page-classification/blob/main/result/plots/20250526-1156_model_v21_conf_mat_TOP-1.png?raw=true)
116
+
117
+ **v2.2** Evaluation set's accuracy (**Top-1**): **99.61%**
118
+
119
+ ![TOP-1 confusion matrix - trained ViT](https://github.com/ufal/atrium-page-classification/blob/main/result/plots/20250526-1202_model_v22_conf_mat_TOP-1.png?raw=true)
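The Top-1 and Top-3 accuracies reported above count a prediction as correct when the true category is among the one or three highest-scoring classes. A minimal sketch of that metric follows; the function name `top_k_accuracy` and the toy scores are illustrative assumptions, not the repository's actual evaluation code.

```python
from typing import Sequence


def top_k_accuracy(
    scores: Sequence[Sequence[float]], labels: Sequence[int], k: int
) -> float:
    """Fraction of samples whose true label is among the k highest-scoring classes."""
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    if not scores:
        raise ValueError("at least one sample is required")
    hits = 0
    for row, label in zip(scores, labels):
        # Class indices ordered by descending score; ties keep the lower index first.
        ranked = sorted(range(len(row)), key=lambda i: row[i], reverse=True)
        hits += label in ranked[:k]
    return hits / len(scores)


# Toy example (hypothetical scores for 3 pages over 3 classes).
toy_scores = [
    [0.70, 0.20, 0.10],
    [0.10, 0.30, 0.60],
    [0.40, 0.35, 0.25],
]
toy_labels = [0, 1, 1]

top1 = top_k_accuracy(toy_scores, toy_labels, k=1)
top3 = top_k_accuracy(toy_scores, toy_labels, k=3)
print(f"Top-1: {top1:.2%}, Top-3: {top3:.2%}")
```

Top-k accuracy can only stay the same or grow as k increases, which is why each version's Top-3 figure is at least as high as its Top-1 figure.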
120
 
121
  #### Result tables
122
 
123
+ - **v2.0** Manually ✍ **checked** evaluation dataset results (TOP-3): [model_TOP-3_EVAL.csv](https://github.com/ufal/atrium-page-classification/blob/main/result/tables/20250526-1142_model_v20_TOP-3_EVAL.csv) 🔗
124
+
125
+ - **v2.0** Manually ✍ **checked** evaluation dataset results (TOP-1): [model_TOP-1_EVAL.csv](https://github.com/ufal/atrium-page-classification/blob/main/result/tables/20250526-1148_model_v20_TOP-1_EVAL.csv) 🔗
126
+
127
+ - **v2.1** Manually ✍ **checked** evaluation dataset results (TOP-3): [model_TOP-3_EVAL.csv](https://github.com/ufal/atrium-page-classification/blob/main/result/tables/20250526-1153_model_v21_TOP-3_EVAL.csv) 🔗
128
 
129
+ - **v2.1** Manually ✍ **checked** evaluation dataset results (TOP-1): [model_TOP-1_EVAL.csv](https://github.com/ufal/atrium-page-classification/blob/main/result/tables/20250526-1151_model_v21_TOP-1_EVAL.csv) 🔗
130
 
131
+ - **v2.2** Manually ✍ **checked** evaluation dataset results (TOP-3): [model_TOP-3_EVAL.csv](https://github.com/ufal/atrium-page-classification/blob/main/result/tables/20250526-1156_model_v22_TOP-3_EVAL.csv) 🔗
132
 
133
+ - **v2.2** Manually ✍ **checked** evaluation dataset results (TOP-1): [model_TOP-1_EVAL.csv](https://github.com/ufal/atrium-page-classification/blob/main/result/tables/20250526-1158_model_v22_TOP-1_EVAL.csv) 🔗
134
 
135
  #### Table columns
136
 
 
151
  - **Developed by** UFAL [^5] 👥
152
  - **Funded by** ATRIUM [^4] 💰
153
  - **Shared by** ATRIUM [^4] & UFAL [^5]
154
+ - **Model type:** fine-tuned ViT with a 224x224 [^2] 🔗 or 384x384 [^6] [^7] 🔗 resolution size
155
 
156
  **©️ 2022 UFAL & ATRIUM**
157
 
 
160
  [^3]: https://github.com/ufal/atrium-page-classification
161
  [^4]: https://atrium-research.eu/
162
  [^5]: https://ufal.mff.cuni.cz/home-page
163
+ [^6]: https://huggingface.co/google/vit-base-patch16-384
164
+ [^7]: https://huggingface.co/google/vit-large-patch16-384