ogulcanakca/layoutlmv2-base-uncased-finetuned-cordv2-receipts

@@ -6,34 +6,42 @@ datasets:
 - naver-clova-ix/cord-v2
 base_model:
 - microsoft/layoutlmv2-base-uncased
 model-index:
-  - name: Smart Receipt Reader - LayoutLMv2 on CORD-v2
-    results:
-      - task:
-          type: token-classification
-          name: Receipt Entity Extraction
-        dataset:
-          name: CORD-v2 (Test Set)
-          type: naver-clova-ix/cord-v2
-        metrics:
-          - name: Overall F1 (Weighted Avg)
-            type: f1
-            value: 0.9575
-          - name: Overall Precision (Weighted Avg)
-            type: precision
-            value: 0.9582
-          - name: Overall Recall (Weighted Avg)
-            type: recall
-            value: 0.9567
-          - name: Overall Accuracy
-            type: accuracy
-            value: 0.9690
-          - name: Macro Avg F1-Score
-            type: f1_macro
-            value: 0.80
 ---
-# Project Name: Smart Receipt Reader: Automatic Information Extraction with LayoutLMv2 (CORD-v2)
 ## Overview and Project Contribution
@@ -203,4 +211,47 @@ for token_str, pred_id in zip(input_tokens, predicted_ids_list):
 print("\nExtracted Information (Simple Grouping):")
 for label, texts in extracted_info.items():
-    print(f"{label}: {' '.join(texts)}")

 - naver-clova-ix/cord-v2
 base_model:
 - microsoft/layoutlmv2-base-uncased
 model-index:
+- name: Smart Receipt Reader - LayoutLMv2 on CORD-v2
+  results:
+  - task:
+      type: token-classification
+      name: Receipt Entity Extraction
+    dataset:
+      name: CORD-v2 (Test Set)
+      type: naver-clova-ix/cord-v2
+    metrics:
+    - name: Overall F1 (Weighted Avg)
+      type: f1
+      value: 0.9575
+    - name: Overall Precision (Weighted Avg)
+      type: precision
+      value: 0.9582
+    - name: Overall Recall (Weighted Avg)
+      type: recall
+      value: 0.9567
+    - name: Overall Accuracy
+      type: accuracy
+      value: 0.969
+    - name: Macro Avg F1-Score
+      type: f1_macro
+      value: 0.8
+pipeline_tag: token-classification
+tags:
+- transformers
+- pytorch
+- document-ai
+- information-extraction
+- token-classification
+- cord-v2
+- ocr-post-processing
 ---
+# Smart Receipt Reader: Automatic Information Extraction with LayoutLMv2 (CORD-v2)
 ## Overview and Project Contribution
 print("\nExtracted Information (Simple Grouping):")
 for label, texts in extracted_info.items():
+    print(f"{label}: {' '.join(texts)}")
+```
+## Training Hyperparameters
+* Learning Rate: 5e-5
+* Number of Training Epochs: 10
+* Per Device Train Batch Size: 2
+* Per Device Eval Batch Size: 2
+* Gradient Accumulation Steps: 1
+* Warmup Ratio: 0.1
+* Weight Decay: 0.01
+* Optimizer: AdamW
+* adam_beta1: 0.9
+* adam_beta2: 0.999
+* adam_epsilon: 1e-8
+* LR Scheduler Type: linear
+* Mixed Precision: Disabled, full FP32 (fp16=False & bf16=False)
+* Seed for Reproducibility: 42
+* Max Sequence Length: 512
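
The learning rate, warmup ratio, and linear scheduler listed above together determine the learning rate at each optimizer step. The following is a minimal sketch of that schedule in plain Python, assuming the usual linear-warmup, linear-decay behaviour. The function names and the total step count are illustrative assumptions and are not taken from the training script.

```python
import math

# Values taken from the hyperparameter list above.
PEAK_LR = 5e-5
WARMUP_RATIO = 0.1


def warmup_steps_for(total_steps: int, warmup_ratio: float = WARMUP_RATIO) -> int:
    """Number of warmup steps implied by a warmup ratio (hypothetical helper)."""
    return math.ceil(total_steps * warmup_ratio)


def linear_lr(step: int, total_steps: int, warmup_steps: int,
              peak_lr: float = PEAK_LR) -> float:
    """Learning rate at `step`: linear ramp to peak_lr, then linear decay to 0."""
    if step < warmup_steps:
        # Warmup phase: scale up from 0 to peak_lr.
        return peak_lr * step / max(1, warmup_steps)
    # Decay phase: scale down from peak_lr to 0 at total_steps.
    remaining = max(0, total_steps - step)
    return peak_lr * remaining / max(1, total_steps - warmup_steps)


if __name__ == "__main__":
    total = 1000  # hypothetical step count, chosen only for illustration
    warmup = warmup_steps_for(total)
    for step in (0, warmup // 2, warmup, (total + warmup) // 2, total):
        print(f"step {step:4d}: lr = {linear_lr(step, total, warmup):.2e}")
```

The actual number of optimizer steps depends on the training set size, the per-device batch size, and gradient accumulation, so substitute the real step count when reproducing the run.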
+## Environment Information
+* model.safetensors (or pytorch_model.bin): ~802 MB
+* Dataset: CORD-v2 (naver-clova-ix/cord-v2) - 13,500 training examples
+* GPU: NVIDIA P100 (on Kaggle)
+* Total Training Time (for 10 epochs): Approximately 34 minutes 15 seconds
+* Inference Speed (Indicative):
+  * Using Trainer.predict() on the test set (NVIDIA P100): Approximately 8.17 samples per second
+```
+@misc{ogulcanakca_layoutlmv2_cordv2_receipts_2025,
+  author = {Oğulcan Akca},
+  title = {Fine-tuned LayoutLMv2 for Receipt Information Extraction on CORD-v2},
+  year = {2025},
+  publisher = {Hugging Face},
+  journal = {Hugging Face Model Hub},
+  howpublished = {https://huggingface.co/ogulcanakca/layoutlmv2-base-uncased-finetuned-cordv2-receipts}
+}
+```
+## Model Card Contact
+- ogulcanakca (Hugging Face)
+- Mail: [email protected]