k4tel committed
Commit 03a753b · verified · 1 Parent(s): 2f7cc7f

fancy readme

Files changed (1)
  1. README.md +39 -42
README.md CHANGED
@@ -7,7 +7,7 @@ tags:
 
 # Image processing using ViT - for historical documents
 
-**Goal:** This project solves a task of page images classification
+### Goal: This project solves a task of page images classification
 
 **Scope:** Processing of images, training and evaluation of ViT model,
 input file/directory processing, class (category) results of top
@@ -18,23 +18,12 @@ HF 😊 hub support for the model
 
 Fine-tuned model files can be found here: [huggingface.co/k4tel/vit-historical-page](https://huggingface.co/k4tel/vit-historical-page) 🔗
 
-- **Developed by:** Kate L [github/k4tel](https://github.com/K4TEL/ltp-ocr.git)
+- **Developed by:** Kate L
 - **Funded by ATRIUM:**
 - **Shared by ATRIUM & UFAL:**
 - **Model type:** finetuned ViT
 - **Base model repository:** [google/vit](https://huggingface.co/google/vit-base-patch16-224) 🔗
 
-### Model Sources [optional]
-
-<!-- Provide the basic links for the model. -->
-
-- **Paper:** not yet
-- **Demo:** [github](https://github.com/K4TEL/ltp-ocr.git)
-
-### Direct Use
-
-Page images classification in to 11 predefined categories.
-
 #### Training Hyperparameters
 
 * eval_strategy "epoch"
@@ -52,30 +41,6 @@ Page images classification in to 11 predefined categories.
 
 Training set of the model: **8950** images
 
-#### Categories
-
-- **DRAW 📈**: 1182 (11.89%) - drawings, maps, paintings with text
-
-- **DRAW_L 📈📏**: 813 (8.17%) - drawings, maps, paintings with a table legend or inside tabular layout / forms
-
-- **LINE_HW ✏️📏**: 596 (5.99%) - handwritten text lines inside tabular layout / forms
-
-- **LINE_P 📏**: 603 (6.06%) - printed text lines inside tabular layout / forms
-
-- **LINE_T 📏**: 1332 (13.39%) - machine typed text lines inside tabular layout / forms
-
-- **PHOTO 🌄**: 1015 (10.21%) - photos with text
-
-- **PHOTO_L 🌄📏**: 782 (7.86%) - photos inside tabular layout / forms
-
-- **TEXT 📰**: 853 (8.58%) - mixed types, printed, and handwritten texts
-
-- **TEXT_HW ✏️📄**: 732 (7.36%) - only handwritten text
-
-- **TEXT_P 📄**: 691 (6.95%) - only printed text
-
-- **TEXT_T 📄**: 1346 (13.53%) - only machine typed text
-
 #### Data preprocessing
 
 During training the following transforms were applied randomly with a 50% chance:
@@ -87,15 +52,47 @@ During training the following transforms were applied randomly with a 50% chance
 * transforms.Lambda(lambda img: ImageEnhance.Sharpness(img).enhance(random.uniform(0.5, 1.5)))
 * transforms.Lambda(lambda img: img.filter(ImageFilter.GaussianBlur(radius=random.uniform(0, 2))))
 
-Evaluation set (10% of the above stats): **995** images
+#### Categories
+
+| Label       | Ratio  | Description                                                                  |
+|-------------|--------|------------------------------------------------------------------------------|
+| **DRAW**    | 11.89% | **📈 - drawings, maps, paintings with text**                                 |
+| **DRAW_L**  | 8.17%  | **📈📏 - drawings ... with a table legend or inside tabular layout / forms** |
+| **LINE_HW** | 5.99%  | **✏️📏 - handwritten text lines inside tabular layout / forms**              |
+| **LINE_P**  | 6.06%  | **📏 - printed text lines inside tabular layout / forms**                    |
+| **LINE_T**  | 13.39% | **📏 - machine typed text lines inside tabular layout / forms**              |
+| **PHOTO**   | 10.21% | **🌄 - photos with text**                                                    |
+| **PHOTO_L** | 7.86%  | **🌄📏 - photos inside tabular layout / forms or with a tabular annotation** |
+| **TEXT**    | 8.58%  | **📰 - mixed types of printed and handwritten texts**                        |
+| **TEXT_HW** | 7.36%  | **✏️📄 - only handwritten text**                                             |
+| **TEXT_P**  | 6.95%  | **📄 - only printed text**                                                   |
+| **TEXT_T**  | 13.53% | **📄 - only machine typed text**                                             |
+
+Evaluation set (same proportions): **995** images
 
 ### Results 📊
 
-Evaluation set's accuracy (Top-3): **99.6%**
+Evaluation set's accuracy (**Top-3**): **99.6%**
+
+![TOP-3 confusion matrix - trained ViT](https://github.com/K4TEL/ltp-ocr/blob/transformer/result/plots/20250209-1526_conf_mat.png?raw=true)
+
+Evaluation set's accuracy (**Top-1**): **97.3%**
+
+![TOP-1 confusion matrix - trained ViT](https://github.com/K4TEL/ltp-ocr/blob/transformer/result/plots/20250218-1523_conf_mat.png?raw=true)
 
-⚠️ Regarding the model output, **Top-3** is enough to cover most of the images,
-setting **Top-5** will help with a small number of difficult to classify samples.
-Finally, using **Top-11** option will give you a **raw version** of class scores returned by the model
+#### Result tables
+
+- Manually ✍ **checked** evaluation dataset results (TOP-3): [model_TOP-3_EVAL.csv](https://github.com/K4TEL/ltp-ocr/blob/transformer/result/tables/20250209-1534_model_1119_3_TOP-3_EVAL.csv) 🔗
+
+- Manually ✍ **checked** evaluation dataset results (TOP-1): [model_TOP-1_EVAL.csv](https://github.com/K4TEL/ltp-ocr/blob/transformer/result/tables/20250218-1519_model_1119_3_TOP-1_EVAL.csv) 🔗
+
+#### Table columns
 
+- **FILE** - name of the file
+- **PAGE** - number of the page
+- **CLASS-N** - label of the category, guess TOP-N
+- **SCORE-N** - score of the category, guess TOP-N
+- **TRUE** - actual label of the category
 
 #### Contacts
 
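The two augmentation lines visible in the "Data preprocessing" hunk (sharpness jitter and Gaussian blur, each applied with a 50% chance) can be sketched as a small helper. This is a minimal sketch in plain PIL rather than the torchvision `transforms.Lambda` wrappers the README uses, so it runs standalone; the function name `augment_page` and the `p`/`rng` parameters are assumptions for illustration.

```python
import random

from PIL import Image, ImageEnhance, ImageFilter

def augment_page(img, p=0.5, rng=random):
    """Apply the README's two visible augmentations, each with probability p."""
    if rng.random() < p:
        # Sharpness factor drawn from [0.5, 1.5]; 1.0 leaves the image unchanged
        img = ImageEnhance.Sharpness(img).enhance(rng.uniform(0.5, 1.5))
    if rng.random() < p:
        # Gaussian blur with a random radius in [0, 2]
        img = img.filter(ImageFilter.GaussianBlur(radius=rng.uniform(0, 2)))
    return img

# Example: run both transforms on a synthetic blank "page" (p=1.0 forces them)
page = Image.new("RGB", (224, 224), "white")
out = augment_page(page, p=1.0)
```

Note that the diff only shows the last two transforms of the augmentation list; the hidden context lines above them contain the rest.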
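The CLASS-N / SCORE-N columns described under "Table columns" can be illustrated with a small top-N helper over the 11 categories from the README's table. The category names come from the README; the `top_n` helper and the score vector below are made up for illustration, not taken from the project's code.

```python
# The 11 page categories listed in the README's Categories table
CATEGORIES = ["DRAW", "DRAW_L", "LINE_HW", "LINE_P", "LINE_T",
              "PHOTO", "PHOTO_L", "TEXT", "TEXT_HW", "TEXT_P", "TEXT_T"]

def top_n(scores, n=3):
    """Return the n highest-scoring (label, score) pairs,
    i.e. what the CLASS-N / SCORE-N table columns record."""
    ranked = sorted(zip(CATEGORIES, scores), key=lambda pair: pair[1], reverse=True)
    return ranked[:n]

# Illustrative (made-up) class scores for one page image
scores = [0.01, 0.02, 0.05, 0.03, 0.60, 0.01, 0.02, 0.10, 0.05, 0.04, 0.07]
best = top_n(scores, n=3)
```

With `n=11` this simply returns every class score ranked, which is the "raw" output option the earlier README text described.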