ogulcanakca committed on
Commit f34d82b · verified · 1 Parent(s): e823f15

Update README.md

  1. README.md +78 -27
README.md CHANGED
@@ -6,34 +6,42 @@ datasets:
  - naver-clova-ix/cord-v2
  base_model:
  - microsoft/layoutlmv2-base-uncased
-
  model-index:
- - name: Smart Receipt Reader - LayoutLMv2 on CORD-v2
-   results:
-   - task:
-       type: token-classification
-       name: Receipt Entity Extraction
-     dataset:
-       name: CORD-v2 (Test Set)
-       type: naver-clova-ix/cord-v2
-     metrics:
-     - name: Overall F1 (Weighted Avg)
-       type: f1
-       value: 0.9575
-     - name: Overall Precision (Weighted Avg)
-       type: precision
-       value: 0.9582
-     - name: Overall Recall (Weighted Avg)
-       type: recall
-       value: 0.9567
-     - name: Overall Accuracy
-       type: accuracy
-       value: 0.9690
-     - name: Macro Avg F1-Score
-       type: f1_macro
-       value: 0.80
  ---
- # Project Name: Smart Receipt Reader: Automatic Information Extraction with LayoutLMv2 (CORD-v2)

  ## Overview and Project Contribution

@@ -203,4 +211,47 @@ for token_str, pred_id in zip(input_tokens, predicted_ids_list):

  print("\nExtracted Information (Simple Grouping):")
  for label, texts in extracted_info.items():
-     print(f"{label}: {' '.join(texts)}")

  - naver-clova-ix/cord-v2
  base_model:
  - microsoft/layoutlmv2-base-uncased
  model-index:
+ - name: Smart Receipt Reader - LayoutLMv2 on CORD-v2
+   results:
+   - task:
+       type: token-classification
+       name: Receipt Entity Extraction
+     dataset:
+       name: CORD-v2 (Test Set)
+       type: naver-clova-ix/cord-v2
+     metrics:
+     - name: Overall F1 (Weighted Avg)
+       type: f1
+       value: 0.9575
+     - name: Overall Precision (Weighted Avg)
+       type: precision
+       value: 0.9582
+     - name: Overall Recall (Weighted Avg)
+       type: recall
+       value: 0.9567
+     - name: Overall Accuracy
+       type: accuracy
+       value: 0.969
+     - name: Macro Avg F1-Score
+       type: f1_macro
+       value: 0.8
+ pipeline_tag: token-classification
+ tags:
+ - transformers
+ - pytorch
+ - document-ai
+ - information-extraction
+ - token-classification
+ - cord-v2
+ - ocr-post-processing
  ---
+ # Smart Receipt Reader: Automatic Information Extraction with LayoutLMv2 (CORD-v2)

  ## Overview and Project Contribution
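
The metadata reports a weighted-average F1 of 0.9575 alongside a macro-average F1 of 0.8. A gap of this kind typically appears when a few rare labels score poorly: macro averaging counts every label equally, while weighted averaging scales each label by its support. The Python sketch below illustrates the two averages. The label names, per-label F1 scores, and supports are hypothetical values invented for illustration; they are not taken from this model's evaluation report.

```python
def macro_f1(per_label):
    """Unweighted mean of per-label F1 scores (every label counts equally)."""
    return sum(f1 for f1, _ in per_label.values()) / len(per_label)


def weighted_f1(per_label):
    """Mean of per-label F1 scores weighted by each label's support."""
    total_support = sum(support for _, support in per_label.values())
    return sum(f1 * support for f1, support in per_label.values()) / total_support


# Hypothetical (F1, support) pairs: three frequent labels and one rare,
# poorly predicted label. These are NOT the model's real per-label results.
per_label = {
    "frequent_label_a": (0.97, 500),
    "frequent_label_b": (0.98, 450),
    "frequent_label_c": (0.96, 90),
    "rare_label": (0.25, 10),
}

print(f"macro F1:    {macro_f1(per_label):.4f}")
print(f"weighted F1: {weighted_f1(per_label):.4f}")
```

In this toy setup the single rare label pulls the macro average far below the weighted average, which is one plausible reading of the 0.9575 versus 0.8 gap in the metadata.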
 
 

  print("\nExtracted Information (Simple Grouping):")
  for label, texts in extracted_info.items():
+     print(f"{label}: {' '.join(texts)}")
+ ```
+ ## Training Hyperparameters
+
+ * Learning Rate: 5e-5
+ * Number of Training Epochs: 10
+ * Per Device Train Batch Size: 2
+ * Per Device Eval Batch Size: 2
+ * Gradient Accumulation Steps: 1
+ * Warmup Ratio: 0.1
+ * Weight Decay: 0.01
+ * Optimizer: AdamW
+ * adam_beta1: 0.9
+ * adam_beta2: 0.999
+ * adam_epsilon: 1e-8
+ * LR Scheduler Type: linear
+ * Mixed Precision: FP32 (fp16=True & bf16=True)
+ * Seed for Reproducibility: 42
+ * Max Sequence Length: 512
+
+ ## Environment Information
+
+ * model.safetensors (or pytorch_model.bin): ~802 MB
+ * Dataset: CORD-v2 (naver-clova-ix/cord-v2) - 13,500 training examples
+ * GPU: NVIDIA P100 (on Kaggle)
+ * Total Training Time (for 10 epochs): Approximately 34 minutes 15 seconds
+ * Inference Speed (Indicative):
+   * Using Trainer.predict() on the test set (NVIDIA P100): Approximately 8.17 samples per second
+
+ ```
+ @misc{ogulcanakca_layoutlmv2_cordv2_receipts_2025,
+   author = {Oğulcan Akca},
+   title = {Fine-tuned LayoutLMv2 for Receipt Information Extraction on CORD-v2},
+   year = {2025},
+   publisher = {Hugging Face},
+   journal = {Hugging Face Model Hub},
+   howpublished = {https://huggingface.co/ogulcanakca/layoutlmv2-base-uncased-finetuned-cordv2-receipts}
+ }
+ ```
+
+ ## Model Card Contact
+
+ - ogulcanakca (Hugging Face)
+ - Mail: [email protected]
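
As a rough illustration of how the values in the added "Training Hyperparameters" section fit together, the sketch below collects them in a dictionary whose keys follow the field names of `transformers.TrainingArguments`, then evaluates a linear warmup and decay schedule in plain Python. Several parts are assumptions rather than the author's training script: the `optim` value, the `total_steps` figure, and the `linear_lr` helper, which mirrors the commonly used linear-with-warmup shape. The mixed-precision flags are left out because the list states both "FP32" and `fp16=True & bf16=True`. Max sequence length is also left out, since it is normally a processor or tokenizer setting rather than a training argument.

```python
import math

# Keys follow transformers.TrainingArguments field names;
# values are copied from the README's hyperparameter list.
training_kwargs = {
    "learning_rate": 5e-5,
    "num_train_epochs": 10,
    "per_device_train_batch_size": 2,
    "per_device_eval_batch_size": 2,
    "gradient_accumulation_steps": 1,
    "warmup_ratio": 0.1,
    "weight_decay": 0.01,
    "optim": "adamw_torch",  # assumption: the list only says "AdamW"
    "adam_beta1": 0.9,
    "adam_beta2": 0.999,
    "adam_epsilon": 1e-8,
    "lr_scheduler_type": "linear",
    "seed": 42,
}


def linear_lr(step, total_steps, base_lr, warmup_ratio):
    """Learning rate at `step`: linear ramp from 0 to base_lr, then linear decay to 0."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = total_steps - step
    return base_lr * max(0.0, remaining / max(1, total_steps - warmup_steps))


# Hypothetical step count, chosen only to make the schedule easy to read.
total_steps = 1000
for step in (0, 50, 100, 550, 1000):
    lr = linear_lr(
        step,
        total_steps,
        training_kwargs["learning_rate"],
        training_kwargs["warmup_ratio"],
    )
    print(f"step {step:4d}: lr = {lr:.2e}")
```

With `transformers` installed, the dictionary could be passed as `TrainingArguments(output_dir="out", **training_kwargs)`, where `output_dir="out"` is a placeholder path.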