k4tel/vit-historical-page

@@ -5,6 +5,8 @@ tags:
 - classification
 base_model:
 - google/vit-base-patch16-224
 pipeline_tag: image-classification
 license: mit
 ---
@@ -21,63 +23,50 @@ HF 😊 hub support for the model
 ## Versions 🏁
 There are currently 2 versions of the model available for download. Both have the same set of categories,
-but different data annotations. The latest `v2.0` is considered the default.
-| Version | Pages | N-page files |   PDFs   | Description                                                   |
-|--------:|:-----:|:------------:|:--------:|:--------------------------------------------------------------|
-|  `v1.0` | 10073 |   **~104**   | **3896** | annotations with mistakes, more heterogeneous data            |
-|  `v2.0` | 11940 |   **~509**   | **5002** | more diverse pages in each category, fewer annotation mistakes |
 ## Model description 📇
 🔲 Fine-tuned model repository:  vit-historical-page [^1] 🔗
-🔳 Base model repository: google's vit-base-patch16-224 [^2] 🔗
 ### Data 📜
-Training set of the model: **8950** images for v1.0
-Training set of the model: **10745** images for v2.0
 ### Categories 🏷️
-**v1.0 version Categories 🪧**:
-|    Label️ | Ratio  | Description                                                                   |
-|----------:|:------:|:------------------------------------------------------------------------------|
-|    `DRAW` | 11.89% | **📈 - drawings, maps, paintings with text**                                  |
-|  `DRAW_L` | 8.17%  | **📈📏 - drawings, etc with a table legend or inside tabular layout / forms** |
-| `LINE_HW` | 5.99%  | **✏️📏 - handwritten text lines inside tabular layout / forms**               |
-|  `LINE_P` | 6.06%  | **📏 - printed text lines inside tabular layout / forms**                     |
-|  `LINE_T` | 13.39% | **📏 - machine typed text lines inside tabular layout / forms**               |
-|   `PHOTO` | 10.21% | **🌄 - photos with text**                                                     |
-| `PHOTO_L` | 7.86%  | **🌄📏 - photos inside tabular layout / forms or with a tabular annotation**  |
-|    `TEXT` | 8.58%  | **📰 - mixed types of printed and handwritten texts**                         |
-| `TEXT_HW` | 7.36%  | **✏️📄 - only handwritten text**                                              |
-|  `TEXT_P` | 6.95%  | **📄 - only printed text**                                                    |
-|  `TEXT_T` | 13.53% | **📄 - only machine typed text**                                              |
-**v2.0 version Categories 🪧**:
-|    Label️ | Ratio | Description                                                                   |
-|----------:|:-----:|:------------------------------------------------------------------------------|
-|    `DRAW` | 9.12% | **📈 - drawings, maps, paintings with text**                                  |
-|  `DRAW_L` | 9.14% | **📈📏 - drawings, etc with a table legend or inside tabular layout / forms** |
-| `LINE_HW` | 8.84% | **✏️📏 - handwritten text lines inside tabular layout / forms**               |
-|  `LINE_P` | 9.15% | **📏 - printed text lines inside tabular layout / forms**                     |
-|  `LINE_T` | 9.2%  | **📏 - machine typed text lines inside tabular layout / forms**               |
-|   `PHOTO` | 9.05% | **🌄 - photos with text**                                                     |
-| `PHOTO_L` | 9.1%  | **🌄📏 - photos inside tabular layout / forms or with a tabular annotation**  |
-|    `TEXT` | 9.14% | **📰 - mixed types of printed and handwritten texts**                         |
-| `TEXT_HW` | 9.14% | **✏️📄 - only handwritten text**                                              |
-|  `TEXT_P` | 9.07% | **📄 - only printed text**                                                    |
-|  `TEXT_T` | 9.05% | **📄 - only machine typed text**                                              |
-Evaluation set (same proportions):	**995** images for v1.0
-Evaluation set (same proportions):	**1194** images for v2.0
 #### Data preprocessing
@@ -105,31 +94,43 @@ During training the following transforms were applied randomly with a 50% chance
 ### Results 📊
-**v1.0** Evaluation set's accuracy (**Top-3**):  **99.6%**
-![TOP-3 confusion matrix - trained ViT](https://github.com/ufal/atrium-page-classification/blob/main/result/plots/20250416-1430_conf_mat_TOP-3.png?raw=true)
-**v2.0** Evaluation set's accuracy (**Top-3**):  **99.75%**
-![TOP-3 confusion matrix - trained ViT](https://github.com/ufal/atrium-page-classification/blob/main/result/plots/20250417-1049_conf_mat_TOP-3.png?raw=true)
-**v1.0** Evaluation set's accuracy (**Top-1**):  **97.3%**
-![TOP-1 confusion matrix - trained ViT](https://github.com/ufal/atrium-page-classification/blob/main/result/plots/20250416-1436_conf_mat_TOP-1.png?raw=true)
-**v2.0** Evaluation set's accuracy (**Top-1**):  **96.82%**
-![TOP-1 confusion matrix - trained ViT](https://github.com/ufal/atrium-page-classification/blob/main/result/plots/20250417-1055_conf_mat_TOP-1.png?raw=true)
 #### Result tables
-- **v1.0** Manually ✍ **checked** evaluation dataset results (TOP-3): [model_TOP-3_EVAL.csv](https://github.com/ufal/atrium-page-classification/blob/main/result/tables/20250416-1426_model_1119_3_TOP-3_EVAL.csv) 🔗
-- **v1.0** Manually ✍ **checked** evaluation dataset results (TOP-1): [model_TOP-1_EVAL.csv](https://github.com/ufal/atrium-page-classification/blob/main/result/tables/20250416-1431_model_1119_3_TOP-1_EVAL.csv) 🔗
-- **v2.0** Manually ✍ **checked** evaluation dataset results (TOP-3): [model_TOP-3_EVAL.csv](https://github.com/ufal/atrium-page-classification/blob/main/result/tables/20250417-1044_model_672_3_TOP-3_EVAL.csv) 🔗
-- **v2.0** Manually ✍ **checked** evaluation dataset results (TOP-1): [model_TOP-1_EVAL.csv](https://github.com/ufal/atrium-page-classification/blob/main/result/tables/20250417-1050_model_672_3_TOP-1_EVAL.csv) 🔗
 #### Table columns
@@ -150,7 +151,7 @@ Official repository: UFAL [^3]
 - **Developed by** UFAL [^5] 👥
 - **Funded by** ATRIUM [^4]  💰
 - **Shared by** ATRIUM [^4] & UFAL [^5]
-- **Model type:** fine-tuned ViT [^2] with a 224x224 resolution size
 **©️ 2022 UFAL & ATRIUM**
@@ -159,3 +160,5 @@ Official repository: UFAL [^3]
 [^3]: https://github.com/ufal/atrium-page-classification
 [^4]: https://atrium-research.eu/
 [^5]: https://ufal.mff.cuni.cz/home-page

 - classification
 base_model:
 - google/vit-base-patch16-224
+- google/vit-base-patch16-384
+- google/vit-large-patch16-384
 pipeline_tag: image-classification
 license: mit
 ---
 ## Versions 🏁
 There are currently 5 versions of the model available for download. All of them share the same set of categories,
+but differ in data annotations or base model. The latest approved `v2.1` is considered the default and can be found in the `main` branch
+of the HF 😊 hub [^1] 🔗
+| Version | Base                    | Pages |   PDFs   | Description                                                             |
+|--------:|-------------------------|:-----:|:--------:|:------------------------------------------------------------------------|
+|  `v2.0` | `vit-base-patch16-224`  | 10073 | **3896** | annotations with mistakes, more heterogeneous data                      |
+|  `v2.1` | `vit-base-patch16-224`  | 11940 | **5002** | `main`: more diverse pages in each category, fewer annotation mistakes  |
+|  `v2.2` | `vit-base-patch16-224`  | 15855 | **5730** | same data as `v2.1` + some restored pages from `v2.0`                   |
+|  `v3.2` | `vit-base-patch16-384`  | 15855 | **5730** | same data as `v2.2`, but a bit larger model base with higher resolution |
+|  `v5.2` | `vit-large-patch16-384` | 15855 | **5730** | same data as `v2.2`, but the largest model base with higher resolution  |
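
The version table above pairs each release with a base checkpoint. As a rough sketch, that mapping can be kept as a small lookup so downstream code knows which input resolution a given version expects. The names `BASE_MODEL_BY_VERSION`, `DEFAULT_VERSION`, and `input_resolution` are hypothetical helpers introduced here for illustration, and reading the resolution from the trailing number of the checkpoint name is an assumption based on the naming of the listed base models.

```python
# Hypothetical lookup built from the version table; not part of the repository.
BASE_MODEL_BY_VERSION = {
    "v2.0": "google/vit-base-patch16-224",
    "v2.1": "google/vit-base-patch16-224",
    "v2.2": "google/vit-base-patch16-224",
    "v3.2": "google/vit-base-patch16-384",
    "v5.2": "google/vit-large-patch16-384",
}

# The model card names v2.1 as the default version (hosted on the `main` branch).
DEFAULT_VERSION = "v2.1"


def input_resolution(version: str) -> int:
    """Return the square input size implied by the base checkpoint name.

    Assumes the checkpoint name ends with the resolution, e.g. "...-224".
    """
    base_model = BASE_MODEL_BY_VERSION[version]
    return int(base_model.rsplit("-", 1)[1])


if __name__ == "__main__":
    for version, base_model in BASE_MODEL_BY_VERSION.items():
        size = input_resolution(version)
        print(f"{version}: {base_model} ({size}x{size})")
```

A lookup like this keeps image resizing consistent with the chosen version, since the 384-based versions expect larger inputs than the 224-based ones.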
 ## Model description 📇
 🔲 Fine-tuned model repository:  vit-historical-page [^1] 🔗
+🔳 Base model repository: Google's **vit-base-patch16-224**, **vit-base-patch16-384**, **vit-large-patch16-384** [^2] [^6] [^7] 🔗
 ### Data 📜
+Training set of the model: **8950** images for `v2.0`
+Training set of the model: **10745** images for `v2.1`
+Training set of the model: **15855** images for `v2.2`, `v3.2` and `v5.2`
 ### Categories 🏷️
+|    Label️ | Description                                                                   |
+|----------:|:------------------------------------------------------------------------------|
+|    `DRAW` | **📈 - drawings, maps, paintings with text**                                  |
+|  `DRAW_L` | **📈📏 - drawings, etc with a table legend or inside tabular layout / forms** |
+| `LINE_HW` | **✏️📏 - handwritten text lines inside tabular layout / forms**               |
+|  `LINE_P` | **📏 - printed text lines inside tabular layout / forms**                     |
+|  `LINE_T` | **📏 - machine typed text lines inside tabular layout / forms**               |
+|   `PHOTO` | **🌄 - photos with text**                                                     |
+| `PHOTO_L` | **🌄📏 - photos inside tabular layout / forms or with a tabular annotation**  |
+|    `TEXT` | **📰 - mixed types of printed and handwritten texts**                         |
+| `TEXT_HW` | **✏️📄 - only handwritten text**                                              |
+|  `TEXT_P` | **📄 - only printed text**                                                    |
+|  `TEXT_T` | **📄 - only machine typed text**                                              |
+Evaluation set:  **1290** images (taken from `v2.2` annotations)
 #### Data preprocessing
 ### Results 📊
+**v2.0** Evaluation set's accuracy (**Top-3**):  **95.58%**
+![TOP-3 confusion matrix - trained ViT](https://github.com/ufal/atrium-page-classification/blob/main/result/plots/20250526-1147_model_v20_conf_mat_TOP-3.png?raw=true)
+**v2.1** Evaluation set's accuracy (**Top-3**):  **99.84%**
+![TOP-3 confusion matrix - trained ViT](https://github.com/ufal/atrium-page-classification/blob/main/result/plots/20250526-1157_model_v21_conf_mat_TOP-3.png?raw=true)
+**v2.2** Evaluation set's accuracy (**Top-3**):  **100.00%**
+![TOP-3 confusion matrix - trained ViT](https://github.com/ufal/atrium-page-classification/blob/main/result/plots/20250526-1201_model_v22_conf_mat_TOP-3.png?raw=true)
+**v2.0** Evaluation set's accuracy (**Top-1**):  **84.96%**
+![TOP-1 confusion matrix - trained ViT](https://github.com/ufal/atrium-page-classification/blob/main/result/plots/20250526-1152_model_v20_conf_mat_TOP-1.png?raw=true)
+**v2.1** Evaluation set's accuracy (**Top-1**):  **96.36%**
+![TOP-1 confusion matrix - trained ViT](https://github.com/ufal/atrium-page-classification/blob/main/result/plots/20250526-1156_model_v21_conf_mat_TOP-1.png?raw=true)
+**v2.2** Evaluation set's accuracy (**Top-1**):  **99.61%**
+![TOP-1 confusion matrix - trained ViT](https://github.com/ufal/atrium-page-classification/blob/main/result/plots/20250526-1202_model_v22_conf_mat_TOP-1.png?raw=true)
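
The Top-1 and Top-3 accuracies reported above count a page as correctly classified when its true category is among the 1 or 3 highest-scored predictions. A minimal sketch of that metric in plain Python follows. The function name `top_k_accuracy` and the toy scores are illustrative assumptions, not the project's evaluation code.

```python
from typing import Sequence


def top_k_accuracy(
    scores: Sequence[Sequence[float]], labels: Sequence[int], k: int
) -> float:
    """Fraction of samples whose true label is among the k highest-scored classes."""
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    if not labels:
        raise ValueError("at least one sample is required")
    hits = 0
    for row, label in zip(scores, labels):
        # Class indices sorted by descending score, truncated to the best k.
        top_k = sorted(range(len(row)), key=lambda i: row[i], reverse=True)[:k]
        hits += label in top_k
    return hits / len(labels)


# Toy example with 3 classes and 4 samples (made-up numbers).
toy_scores = [
    [0.7, 0.2, 0.1],
    [0.1, 0.3, 0.6],
    [0.4, 0.5, 0.1],
    [0.2, 0.3, 0.5],
]
toy_labels = [0, 1, 0, 0]

if __name__ == "__main__":
    for k in (1, 2, 3):
        print(f"Top-{k} accuracy: {top_k_accuracy(toy_scores, toy_labels, k):.2%}")
```

Top-k accuracy can only stay the same or grow as `k` increases, which is why the Top-3 figures are never lower than the Top-1 figures for the same version.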
 #### Result tables
+- **v2.0** Manually ✍ **checked** evaluation dataset results (TOP-3): [model_TOP-3_EVAL.csv](https://github.com/ufal/atrium-page-classification/blob/main/result/tables/20250526-1142_model_v20_TOP-3_EVAL.csv) 🔗
+- **v2.0** Manually ✍ **checked** evaluation dataset results (TOP-1): [model_TOP-1_EVAL.csv](https://github.com/ufal/atrium-page-classification/blob/main/result/tables/20250526-1148_model_v20_TOP-1_EVAL.csv) 🔗
+- **v2.1** Manually ✍ **checked** evaluation dataset results (TOP-3): [model_TOP-3_EVAL.csv](https://github.com/ufal/atrium-page-classification/blob/main/result/tables/20250526-1153_model_v21_TOP-3_EVAL.csv) 🔗
+- **v2.1** Manually ✍ **checked** evaluation dataset results (TOP-1): [model_TOP-1_EVAL.csv](https://github.com/ufal/atrium-page-classification/blob/main/result/tables/20250526-1151_model_v21_TOP-1_EVAL.csv) 🔗
+- **v2.2** Manually ✍ **checked** evaluation dataset results (TOP-3): [model_TOP-3_EVAL.csv](https://github.com/ufal/atrium-page-classification/blob/main/result/tables/20250526-1156_model_v22_TOP-3_EVAL.csv) 🔗
+- **v2.2** Manually ✍ **checked** evaluation dataset results (TOP-1): [model_TOP-1_EVAL.csv](https://github.com/ufal/atrium-page-classification/blob/main/result/tables/20250526-1158_model_v22_TOP-1_EVAL.csv) 🔗
 #### Table columns
 - **Developed by** UFAL [^5] 👥
 - **Funded by** ATRIUM [^4]  💰
 - **Shared by** ATRIUM [^4] & UFAL [^5]
+- **Model type:** fine-tuned ViT with a 224x224 [^2] 🔗 or 384x384 [^6] [^7] 🔗 input resolution
 **©️ 2022 UFAL & ATRIUM**
 [^3]: https://github.com/ufal/atrium-page-classification
 [^4]: https://atrium-research.eu/
 [^5]: https://ufal.mff.cuni.cz/home-page
+[^6]: https://huggingface.co/google/vit-base-patch16-384
+[^7]: https://huggingface.co/google/vit-large-patch16-384