fancy readme
README.md
@@ -7,7 +7,7 @@ tags:

  # Image processing using ViT - for historical documents

-

  **Scope:** Processing of images, training and evaluation of ViT model,
  input file/directory processing, class (category) results of top
@@ -18,23 +18,12 @@ HF hub support for the model

  Fine-tuned model files can be found here: [huggingface.co/k4tel/vit-historical-page](https://huggingface.co/k4tel/vit-historical-page)

- - **Developed by:** Kate L
  - **Funded by ATRIUM:**
  - **Shared by ATRIUM & UFAL:**
  - **Model type:** fine-tuned ViT
  - **Base model repository:** [google/vit](https://huggingface.co/google/vit-base-patch16-224)

- ### Model Sources [optional]
-
- <!-- Provide the basic links for the model. -->
-
- - **Paper:** not yet
- - **Demo:** [github](https://github.com/K4TEL/ltp-ocr.git)
-
- ### Direct Use
-
- Page images classification into 11 predefined categories.
-
  #### Training Hyperparameters

  * eval_strategy "epoch"
@@ -52,30 +41,6 @@ Page images classification into 11 predefined categories.

  Training set of the model: **8950** images

- #### Categories
-
- - **DRAW**: 1182 (11.89%) - drawings, maps, paintings with text
-
- - **DRAW_L**: 813 (8.17%) - drawings, maps, paintings with a table legend or inside tabular layout / forms
-
- - **LINE_HW**: 596 (5.99%) - handwritten text lines inside tabular layout / forms
-
- - **LINE_P**: 603 (6.06%) - printed text lines inside tabular layout / forms
-
- - **LINE_T**: 1332 (13.39%) - machine typed text lines inside tabular layout / forms
-
- - **PHOTO**: 1015 (10.21%) - photos with text
-
- - **PHOTO_L**: 782 (7.86%) - photos inside tabular layout / forms
-
- - **TEXT**: 853 (8.58%) - mixed types, printed, and handwritten texts
-
- - **TEXT_HW**: 732 (7.36%) - only handwritten text
-
- - **TEXT_P**: 691 (6.95%) - only printed text
-
- - **TEXT_T**: 1346 (13.53%) - only machine typed text
-
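The counts and percentages in the removed Categories list can be cross-checked with a few lines of Python. The sketch below only re-uses numbers quoted in this diff. The counts sum to 9945, which equals the 8950 training images plus the 995 evaluation images, so the percentages appear to be shares of the whole dataset rather than of the training set alone. That reading is an inference from the arithmetic, not something the README states.

```python
# Per-category image counts copied from the removed Categories list.
counts = {
    "DRAW": 1182, "DRAW_L": 813, "LINE_HW": 596, "LINE_P": 603,
    "LINE_T": 1332, "PHOTO": 1015, "PHOTO_L": 782, "TEXT": 853,
    "TEXT_HW": 732, "TEXT_P": 691, "TEXT_T": 1346,
}

total = sum(counts.values())

# Share of each category in percent, rounded to two decimals as in the list.
shares = {label: round(100 * n / total, 2) for label, n in counts.items()}

print(f"total images: {total}")
for label, pct in shares.items():
    print(f"{label}: {pct:.2f}%")
```

Running it reproduces the listed percentages, for example 11.89% for DRAW and 13.53% for TEXT_T.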
  #### Data preprocessing

  During training the following transforms were applied randomly with a 50% chance:
@@ -87,15 +52,47 @@ During training the following transforms were applied randomly with a 50% chance
  * transforms.Lambda(lambda img: ImageEnhance.Sharpness(img).enhance(random.uniform(0.5, 1.5)))
  * transforms.Lambda(lambda img: img.filter(ImageFilter.GaussianBlur(radius=random.uniform(0, 2))))

-

  ### Results

- Evaluation set's accuracy (Top-3): **99.6%**

-
-
-

  #### Contacts


  # Image processing using ViT - for historical documents

+ ### Goal: This project solves the task of page image classification

  **Scope:** Processing of images, training and evaluation of ViT model,
  input file/directory processing, class (category) results of top

  Fine-tuned model files can be found here: [huggingface.co/k4tel/vit-historical-page](https://huggingface.co/k4tel/vit-historical-page)

+ - **Developed by:** Kate L
  - **Funded by ATRIUM:**
  - **Shared by ATRIUM & UFAL:**
  - **Model type:** fine-tuned ViT
  - **Base model repository:** [google/vit](https://huggingface.co/google/vit-base-patch16-224)

  #### Training Hyperparameters

  * eval_strategy "epoch"

  Training set of the model: **8950** images

  #### Data preprocessing

  During training the following transforms were applied randomly with a 50% chance:
  * transforms.Lambda(lambda img: ImageEnhance.Sharpness(img).enhance(random.uniform(0.5, 1.5)))
  * transforms.Lambda(lambda img: img.filter(ImageFilter.GaussianBlur(radius=random.uniform(0, 2))))

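The "applied randomly with a 50% chance" behaviour can be sketched without any image library. The README does not say whether each transform is drawn independently or the whole list is gated by a single draw (torchvision's `transforms.RandomApply` does the latter), so the version below assumes independent draws. The helper `apply_randomly` and the placeholder transforms are hypothetical names for illustration, not taken from the project.

```python
import random


def apply_randomly(image, transforms, p=0.5, rng=None):
    """Apply each (name, function) pair with independent probability p.

    Assumption: one draw per transform. Returns the transformed object
    and the names of the transforms that were actually applied.
    """
    rng = random if rng is None else rng
    applied = []
    for name, fn in transforms:
        if rng.random() < p:
            image = fn(image)
            applied.append(name)
    return image, applied


# Placeholder transforms working on a plain number instead of a PIL image,
# so the selection logic can be run without extra dependencies.
placeholder_transforms = [
    ("sharpness", lambda x: x + 1),
    ("gaussian_blur", lambda x: x * 2),
]

result, applied = apply_randomly(
    10, placeholder_transforms, p=0.5, rng=random.Random(0)
)
print(result, applied)
```

With real images the placeholder functions would be replaced by callables such as the `ImageEnhance.Sharpness` and `ImageFilter.GaussianBlur` lambdas listed above.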
+ #### Categories
+
+ | Label | Ratio | Description |
+ | --- | --- | --- |
+ | **DRAW** | 11.89% | **drawings, maps, paintings with text** |
+ | **DRAW_L** | 8.17% | **drawings ... with a table legend or inside tabular layout / forms** |
+ | **LINE_HW** | 5.99% | **handwritten text lines inside tabular layout / forms** |
+ | **LINE_P** | 6.06% | **printed text lines inside tabular layout / forms** |
+ | **LINE_T** | 13.39% | **machine typed text lines inside tabular layout / forms** |
+ | **PHOTO** | 10.21% | **photos with text** |
+ | **PHOTO_L** | 7.86% | **photos inside tabular layout / forms or with a tabular annotation** |
+ | **TEXT** | 8.58% | **mixed types of printed and handwritten texts** |
+ | **TEXT_HW** | 7.36% | **only handwritten text** |
+ | **TEXT_P** | 6.95% | **only printed text** |
+ | **TEXT_T** | 13.53% | **only machine typed text** |
+
+ Evaluation set (same proportions): **995** images

  ### Results

+ Evaluation set's accuracy (**Top-3**): **99.6%**
+
+
+
+ Evaluation set's accuracy (**Top-1**): **97.3%**
+
+
+
+ #### Result tables
+
+ - Manually **checked** evaluation dataset results (TOP-3): [model_TOP-3_EVAL.csv](https://github.com/K4TEL/ltp-ocr/blob/transformer/result/tables/20250209-1534_model_1119_3_TOP-3_EVAL.csv)
+
+ - Manually **checked** evaluation dataset results (TOP-1): [model_TOP-1_EVAL.csv](https://github.com/K4TEL/ltp-ocr/blob/transformer/result/tables/20250218-1519_model_1119_3_TOP-1_EVAL.csv)
+
+ #### Table columns

+ - **FILE** - name of the file
+ - **PAGE** - number of the page
+ - **CLASS-N** - label of the TOP-N predicted category
+ - **SCORE-N** - score of the TOP-N predicted category
+ - **TRUE** - actual label of the category

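The column layout above is enough to recompute Top-N accuracy from a result table. The following is a minimal sketch, assuming the columns are literally named `CLASS-1`, `CLASS-2`, `CLASS-3` (the README only gives the pattern `CLASS-N`) and that the file is comma-separated. The sample rows are made up to show the layout and are not taken from the linked CSV files; `top_n_accuracy` is a hypothetical helper name.

```python
import csv
import io


def top_n_accuracy(rows, n):
    """Fraction of rows whose TRUE label is among CLASS-1..CLASS-n."""
    rows = list(rows)
    if not rows:
        raise ValueError("no rows to evaluate")
    hits = sum(
        1
        for row in rows
        if row["TRUE"] in [row[f"CLASS-{k}"] for k in range(1, n + 1)]
    )
    return hits / len(rows)


# Made-up sample in the assumed layout (not real evaluation results).
sample_csv = """FILE,PAGE,CLASS-1,CLASS-2,CLASS-3,SCORE-1,SCORE-2,SCORE-3,TRUE
doc_a,1,TEXT_T,TEXT_P,TEXT,0.91,0.06,0.03,TEXT_T
doc_a,2,LINE_P,LINE_T,TEXT_P,0.55,0.40,0.05,LINE_T
doc_b,1,PHOTO,DRAW,PHOTO_L,0.70,0.20,0.10,PHOTO_L
doc_b,2,DRAW,DRAW_L,PHOTO,0.80,0.15,0.05,TEXT
"""

rows = list(csv.DictReader(io.StringIO(sample_csv)))
print(top_n_accuracy(rows, 1))  # 0.25: only the first row is right at rank 1
print(top_n_accuracy(rows, 3))  # 0.75: three of four TRUE labels are in the top 3
```

To run it on a downloaded table, replace `io.StringIO(sample_csv)` with an opened file handle and adjust the column names if the real header differs.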
  #### Contacts