CardVault+ SmolVLM-500M Fine-tuned Model

Model Description

CardVault+ is a fine-tuned version of HuggingFaceTB/SmolVLM-500M-Instruct, specialized for credit card information extraction. Given a credit card image, it returns the card number, cardholder name, expiry date, CVV, and card type as a JSON object.

Key Features

  • Vision-Language Processing: Processes credit card images and extracts text
  • Structured Output: Returns data in consistent JSON format
  • Domain-Specific Training: Fine-tuned on 9,612 synthetic credit card images
  • Compact Deployment: ~500M parameters, shipped as merged weights that load with the standard transformers Idefics3 classes

Training Details

  • Base Model: HuggingFaceTB/SmolVLM-500M-Instruct
  • Training Method: LoRA (Low-Rank Adaptation) fine-tuning
  • Training Data: 9,612 synthetic credit card images
  • Training Epochs: 4
  • Final Validation Loss: 0.000056
  • Training Date: July 22, 2025
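
LoRA trains a small pair of low-rank matrices for each adapted weight instead of updating the full weight. Merging then folds the update back into the base weight, which is how LoRA training can still produce a single set of merged weights. The plain-Python sketch below illustrates that arithmetic, W' = W + (alpha / r) · B A, on toy matrices. The rank, alpha, and matrix values are hypothetical illustrations, not the settings used to train CardVault+, which this card does not state.

```python
# Minimal sketch of LoRA weight merging: W_merged = W + (alpha / r) * (B @ A).
# All shapes and values below are hypothetical toy numbers.

def matmul(x, y):
    """Multiply two matrices given as lists of rows."""
    inner = len(y)
    assert all(len(row) == inner for row in x), "shape mismatch"
    cols = len(y[0])
    return [[sum(row[k] * y[k][j] for k in range(inner)) for j in range(cols)] for row in x]


def lora_merge(w, a, b, alpha, r):
    """Fold a rank-r update (B: d_out x r, A: r x d_in) into base weight W (d_out x d_in)."""
    scale = alpha / r
    delta = matmul(b, a)  # d_out x d_in, rank at most r
    return [
        [w_ij + scale * d_ij for w_ij, d_ij in zip(w_row, d_row)]
        for w_row, d_row in zip(w, delta)
    ]


# Toy example: a 3x3 identity base weight adapted with rank r = 1.
W = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
A = [[0.5, 0.0, 0.5]]      # r x d_in = 1 x 3
B = [[1.0], [0.0], [2.0]]  # d_out x r = 3 x 1
merged = lora_merge(W, A, B, alpha=2.0, r=1)
print(merged)  # [[2.0, 0.0, 1.0], [0.0, 1.0, 0.0], [2.0, 0.0, 3.0]]
```

Because the merged matrix has the same shape as the original, merging adds no parameters at inference time; only the values of the adapted weights change.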

Usage

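from transformers import AutoProcessor, Idefics3ForConditionalGeneration
from PIL import Image
import torch

# Use float16 weights on a GPU when available; fall back to float32 on CPU,
# where float16 generation is slow or unsupported for some operations
device = "cuda" if torch.cuda.is_available() else "cpu"
dtype = torch.float16 if device == "cuda" else torch.float32

# Load model and processor
model = Idefics3ForConditionalGeneration.from_pretrained(
    "sugiv/cardvaultplus-500m",
    torch_dtype=dtype,
    trust_remote_code=True
).to(device)
model.eval()
processor = AutoProcessor.from_pretrained(
    "sugiv/cardvaultplus-500m",
    trust_remote_code=True
)

# Process credit card image (convert to RGB in case the file is RGBA or grayscale)
image = Image.open("credit_card.jpg").convert("RGB")
prompt = '''<|im_start|>user
Extract the card information from this image in the following JSON format:
{
    "card_number": "XXXX XXXX XXXX XXXX",
    "cardholder_name": "FULL NAME", 
    "expiry_date": "MM/YY",
    "cvv": "XXX",
    "card_type": "VISA/MASTERCARD/AMEX/DISCOVER"
}

<image>
<|im_end|>
<|im_start|>assistant
'''

# Tokenize the prompt, preprocess the image, and move the tensors to the model's device
inputs = processor(images=image, text=prompt, return_tensors="pt").to(device)
with torch.no_grad():
    generated_ids = model.generate(**inputs, max_new_tokens=200)

# generate() returns prompt + answer; keep only the newly generated tokens
new_tokens = generated_ids[:, inputs["input_ids"].shape[1]:]
output = processor.batch_decode(new_tokens, skip_special_tokens=True)[0]
print(output)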
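
The decoded answer is plain text, so it still has to be parsed into a dictionary. If the decoded text also contains the prompt (for example when decoding `generated_ids[0]` in full, prompt tokens included), note that the prompt's own JSON template is valid JSON too. The hypothetical helper below is not part of transformers or this model. It scans the text with the standard library's `json.JSONDecoder.raw_decode` and returns the last JSON object it finds, so the model's answer wins over a template that appears earlier. It assumes the model emits a single JSON object.

```python
import json


def extract_card_json(text):
    """Return the last JSON object found in `text`, or None if none parses."""
    decoder = json.JSONDecoder()
    found = None
    start = text.find("{")
    while start != -1:
        try:
            obj, end = decoder.raw_decode(text, start)
        except json.JSONDecodeError:
            # Not a valid object at this brace; try the next one.
            start = text.find("{", start + 1)
            continue
        if isinstance(obj, dict):
            found = obj
        # Skip past the parsed object so nested braces are not re-parsed.
        start = text.find("{", end)
    return found
```

Usage: `card = extract_card_json(output)`. A `None` result means the model did not produce parseable JSON for that image.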

Expected Output Format

{
    "card_number": "4532 1234 5678 9012",
    "cardholder_name": "JOHN DOE",
    "expiry_date": "12/27", 
    "cvv": "123",
    "card_type": "VISA"
}
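
Because every field is free text generated by the model, it can help to check the parsed result against the expected layout before using it. The sketch below is a hypothetical helper, `validate_card_fields`, built on regular expressions. Its length rules (15 digits and a 4-digit CVV for AMEX, 16 digits and a 3-digit CVV for the other brands) reflect common card conventions and are assumptions, not rules stated by this model card.

```python
import re

# Hypothetical format rules; adjust them to the card layouts your data actually uses.
CARD_TYPES = {"VISA", "MASTERCARD", "AMEX", "DISCOVER"}
REQUIRED = ("card_number", "cardholder_name", "expiry_date", "cvv", "card_type")
EXPIRY_RE = re.compile(r"(0[1-9]|1[0-2])/[0-9]{2}")  # MM/YY with a valid month


def _is_ascii_digits(s):
    return s.isascii() and s.isdigit()


def validate_card_fields(card):
    """Return a list of problems in an extracted card dict (an empty list means it looks well-formed)."""
    problems = [
        f"missing or empty field: {key}"
        for key in REQUIRED
        if not isinstance(card.get(key), str) or not card[key].strip()
    ]
    if problems:
        return problems

    card_type = card["card_type"].upper()
    if card_type not in CARD_TYPES:
        problems.append(f"unexpected card_type: {card['card_type']}")

    digits = card["card_number"].replace(" ", "")
    expected_len = 15 if card_type == "AMEX" else 16
    if not (_is_ascii_digits(digits) and len(digits) == expected_len):
        problems.append(f"card_number should have {expected_len} digits")

    if not EXPIRY_RE.fullmatch(card["expiry_date"]):
        problems.append("expiry_date should be MM/YY")

    cvv_len = 4 if card_type == "AMEX" else 3
    if not (_is_ascii_digits(card["cvv"]) and len(card["cvv"]) == cvv_len):
        problems.append(f"cvv should have {cvv_len} digits")
    return problems
```

Collecting problems in a list, rather than raising on the first one, makes it easier to log every formatting issue for a given image at once.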

Technical Specifications

  • Model Size: ~1GB (merged weights)
  • Architecture: Idefics3ForConditionalGeneration
  • Input Resolution: Variable (optimized for card images)
  • Output Format: Structured JSON
  • Inference Speed: ~2-3 seconds per image (RTX A6000)
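
The speed figure above was measured on an RTX A6000 and will differ on other hardware. One way to measure it on your own setup is to time repeated calls and report the median, with a few untimed warm-up calls first, since the first call on a GPU typically includes one-off setup costs. `time_per_call` is a hypothetical helper. In practice `fn` would wrap the `model.generate` call from the Usage section.

```python
import statistics
import time


def time_per_call(fn, runs=5, warmup=1):
    """Return the median wall-clock seconds per call of fn(), after untimed warm-up calls.

    If fn launches asynchronous GPU work, it should synchronize before returning
    so that the measured time covers the full computation.
    """
    for _ in range(warmup):
        fn()  # untimed: absorbs one-off costs such as lazy initialization
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        timings.append(time.perf_counter() - start)
    return statistics.median(timings)
```

The median is less sensitive than the mean to an occasional slow run caused by other work on the machine.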

Limitations and Ethical Considerations

  • Intended Use: Educational and development purposes only
  • Data Privacy: Do not use with real credit card data
  • Security: Always implement proper data handling and security measures
  • Compliance: Ensure compliance with applicable financial data regulations and industry standards, such as PCI DSS for cardholder data
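
One concrete data-handling measure is to keep full card numbers and CVVs out of logs. The sketch below is a hypothetical helper, `redact_for_logging`, that masks all but the last four digits of the card number and drops the CVV entirely before a result is logged. Showing only the last four digits is a common convention, not a requirement stated in this card.

```python
def redact_for_logging(card):
    """Return a copy of an extracted card dict that is safer to write to logs."""
    safe = dict(card)  # shallow copy; the original dict is left untouched
    digits = "".join(ch for ch in str(card.get("card_number", "")) if ch.isdigit())
    if len(digits) > 4:
        # Replace every digit except the last four with '*'.
        safe["card_number"] = "*" * (len(digits) - 4) + digits[-4:]
    elif digits:
        # Too short to reveal anything safely; mask it completely.
        safe["card_number"] = "*" * len(digits)
    safe.pop("cvv", None)  # never log the CVV, not even masked
    return safe
```

For example, `redact_for_logging({"card_number": "4532 1234 5678 9012", "cvv": "123"})` returns `{"card_number": "************9012"}`.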

License

Apache 2.0

Model Files

  • Format: Safetensors
  • Parameters: 507M
  • Tensor Type: F16