k4tel/vit-historical-page

@@ -11,7 +11,8 @@ tags:
 **Scope:** Processing of images, training and evaluation of ViT model,
 input file/directory processing, class (category) results of top
-N predictions output, predictions summarizing into a tabular format
 ## Model description:
@@ -21,29 +22,29 @@ Training set of the model: **8950** images
 #### Categories:
-- **DRAW**:	1182	(11.89%)  - drawings, maps, paintings
-- **DRAW_L**:	813	(8.17%)   - drawings, maps, paintings inside tabular layout
-- **LINE_HW**:	596	(5.99%)   - handwritten text lines inside tabular layout
-- **LINE_P**:	603	(6.06%)   - printed text lines inside tabular layout
-- **LINE_T**:	1332	(13.39%)  - typed text lines inside tabular layout
 - **PHOTO**:	1015	(10.21%)  - photos with text
-- **PHOTO_L**:	782	(7.86%)   - photos inside tabular layout
-- **TEXT**:	853	(8.58%)   - mixed types, printed, and handwritten texts
-- **TEXT_HW**:	732	(7.36%)   - handwritten text
-- **TEXT_P**:	691	(6.95%)   - printed text
-- **TEXT_T**:	1346	(13.53%)  - typed text
-Evaluation set (10% of the above stats):	**995** images - percentage correct (Top-3):  **99.6%**
 ### Result tables:
@@ -68,7 +69,7 @@ Page image classification into 11 predefined categories.
 #### Preprocessing
-train_transforms = transforms.Compose([
     transforms.RandomApply([
@@ -86,37 +87,33 @@ train_transforms = transforms.Compose([
     ], p=0.5),
-    ...
-  ])
 eval_transforms - basic.
 #### Training Hyperparameters
-training_args = TrainingArguments(
-            eval_strategy="epoch",
-            save_strategy="epoch",
-            learning_rate=5e-5,
-            per_device_train_batch_size=8,
-            per_device_eval_batch_size=8,
-            num_train_epochs=3,
-            warmup_ratio=0.1,
-            logging_steps=10,
-            load_best_model_at_end=True,
-            metric_for_best_model="accuracy",
-        )
 ## Evaluation

 **Scope:** Processing of images, training and evaluation of ViT model,
 input file/directory processing, class (category) results of top
+N predictions output, summarizing predictions into a tabular format,
+HF hub support for the model
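
The top-N output named in the scope can be illustrated with a minimal sketch. It assumes the model yields one raw score (logit) per category; the function name `top_n_predictions` and the example scores are hypothetical, and only the 11 category labels come from this model card.

```python
import math

# Category labels as listed in this model card.
CATEGORIES = ["DRAW", "DRAW_L", "LINE_HW", "LINE_P", "LINE_T", "PHOTO",
              "PHOTO_L", "TEXT", "TEXT_HW", "TEXT_P", "TEXT_T"]


def top_n_predictions(logits, labels=CATEGORIES, n=3):
    """Return the n most probable (label, probability) pairs."""
    if len(logits) != len(labels):
        raise ValueError("expected one logit per label")
    # Numerically stable softmax: subtract the maximum before exponentiating.
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Sort by probability, highest first, and keep the first n entries.
    ranked = sorted(zip(labels, probs), key=lambda pair: pair[1], reverse=True)
    return ranked[:n]


# Hypothetical scores for a single page image (not real model output).
example_logits = [0.1, 0.3, 2.5, 0.2, 4.0, -1.0, -0.5, 1.0, 0.7, 0.0, 3.1]
for label, prob in top_n_predictions(example_logits, n=3):
    print(f"{label}\t{prob:.3f}")
```

Writing one such row per input file is one way the predictions could then be collected into a table.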
 ## Model description:
 #### Categories:
+- **DRAW**:	1182	(11.89%)  - drawings, maps, paintings with text
+- **DRAW_L**:	813	(8.17%)   - drawings, maps, paintings with a table legend or inside tabular layout / forms
+- **LINE_HW**:	596	(5.99%)   - handwritten text lines inside tabular layout / forms
+- **LINE_P**:	603	(6.06%)   - printed text lines inside tabular layout / forms
+- **LINE_T**:	1332	(13.39%)  - machine typed text lines inside tabular layout / forms
 - **PHOTO**:	1015	(10.21%)  - photos with text
+- **PHOTO_L**:	782	(7.86%)   - photos inside tabular layout / forms
+- **TEXT**:	853	(8.58%)   - mixed text types (printed and handwritten)
+- **TEXT_HW**:	732	(7.36%)   - only handwritten text
+- **TEXT_P**:	691	(6.95%)   - only printed text
+- **TEXT_T**:	1346	(13.53%)  - only machine typed text
+Evaluation set (10% of the 9945 images counted above):	**995** images
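
The per-category counts, the training set size (8950) and the evaluation set size (995) can be cross-checked with a few lines of arithmetic. This sketch only recomputes figures stated in this card; the variable names are illustrative, and reading the category counts as covering training and evaluation images together is an inference from the totals.

```python
# Per-category image counts as listed above.
counts = {
    "DRAW": 1182, "DRAW_L": 813, "LINE_HW": 596, "LINE_P": 603,
    "LINE_T": 1332, "PHOTO": 1015, "PHOTO_L": 782, "TEXT": 853,
    "TEXT_HW": 732, "TEXT_P": 691, "TEXT_T": 1346,
}
train_size = 8950  # training set size stated in the card
eval_size = 995    # evaluation set size stated in the card

total = sum(counts.values())
print(f"total images: {total}")
print(f"train + eval: {train_size + eval_size}")
print(f"evaluation share: {100 * eval_size / total:.2f}%")

# Share of each category in the full set, for comparison with the list above.
for label, count in counts.items():
    print(f"{label}\t{count}\t{100 * count / total:.2f}%")
```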
 ### Result tables:
 #### Preprocessing
+train_transforms:
     transforms.RandomApply([
     ], p=0.5),
 eval_transforms - basic.
 #### Training Hyperparameters
+    eval_strategy="epoch",
+    save_strategy="epoch",
+    learning_rate=5e-5,
+    per_device_train_batch_size=8,
+    per_device_eval_batch_size=8,
+    num_train_epochs=3,
+    warmup_ratio=0.1,
+    logging_steps=10,
+    load_best_model_at_end=True,
+    metric_for_best_model="accuracy",
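
To make the schedule implied by these values concrete, here is a small arithmetic sketch. It assumes the 8950-image training set, a single device, no gradient accumulation, a kept final partial batch, and warmup steps rounded up; none of these are stated in the card, so the resulting numbers are indicative only.

```python
import math

# Values taken from the hyperparameters above.
per_device_train_batch_size = 8
num_train_epochs = 3
warmup_ratio = 0.1
logging_steps = 10

# Assumptions, not stated in the card.
train_images = 8950              # training set size from the model description
num_devices = 1                  # assumed single GPU/CPU
gradient_accumulation_steps = 1  # assumed default

effective_batch = (per_device_train_batch_size * num_devices
                   * gradient_accumulation_steps)
# The last, smaller batch of each epoch still counts as one optimizer step.
steps_per_epoch = math.ceil(train_images / effective_batch)
total_steps = steps_per_epoch * num_train_epochs
# Steps during which the learning rate ramps up towards 5e-5.
warmup_steps = math.ceil(total_steps * warmup_ratio)
log_events = total_steps // logging_steps

print(f"steps per epoch: {steps_per_epoch}")
print(f"total optimizer steps: {total_steps}")
print(f"warmup steps: {warmup_steps}")
print(f"logging events: {log_events}")
```

With both `eval_strategy` and `save_strategy` set to `"epoch"`, evaluation and checkpointing would each happen once per epoch, and `load_best_model_at_end=True` would restore the checkpoint with the highest `accuracy`.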
 ## Evaluation