im2latex_model

This model is a VisionEncoderDecoderModel trained to generate LaTeX formulas from images. It is part of a project that reproduces the following paper: https://arxiv.org/html/2408.04015v1. NOTE: In the paper, the model is fine-tuned on handwritten data after the initial training. This is the model before fine-tuning.

Model Details

  • Encoder: Swin Transformer
  • Decoder: GPT-2
  • Framework: PyTorch

Training Data

The data comes from the cleaned_formulas configuration of OleehyO/latex-formulas and was split 80:10:10 into train, validation and test sets. The splits were made as follows:

from datasets import load_dataset

dataset = load_dataset("OleehyO/latex-formulas", "cleaned_formulas")
# Hold out 20% of the data, then halve the held-out part into validation and test sets
train_val_split = dataset["train"].train_test_split(test_size=0.2, seed=42)
train_ds = train_val_split["train"]
val_test_split = train_val_split["test"].train_test_split(test_size=0.5, seed=42)
val_ds = val_test_split["train"]
test_ds = val_test_split["test"]
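
The 80:10:10 ratio follows from holding out 20% of the data and then halving that held-out part. Below is a minimal sketch of the arithmetic. The helper name split_sizes is hypothetical, and rounding the held-out portion up to a whole number of examples is an assumption about how fractional sizes are resolved, not something stated in this card:

```python
import math


def split_sizes(n_examples, holdout_fraction=0.2, test_fraction_of_holdout=0.5):
    """Return (train, val, test) sizes for a two-stage split.

    Hypothetical helper for illustration only. Rounding the held-out
    part up is an assumption, not a documented guarantee.
    """
    # Stage 1: hold out a fraction of all examples.
    n_holdout = math.ceil(holdout_fraction * n_examples)
    n_train = n_examples - n_holdout
    # Stage 2: split the held-out part into validation and test.
    n_test = math.ceil(test_fraction_of_holdout * n_holdout)
    n_val = n_holdout - n_test
    return n_train, n_val, n_test


train, val, test = split_sizes(1000)
print(train, val, test)  # 800 100 100
```

For dataset sizes that do not divide evenly, the three parts can differ from an exact 80:10:10 by an example or two, but they always add up to the full dataset.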

Evaluation Metrics

The model was evaluated on a test set with the following results:

  • Test Loss: TBA
  • Test BLEU Score: ~0.7
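
For readers unfamiliar with BLEU, the sketch below shows how such a score can be computed for a single predicted formula. It is an illustration only: this card does not say which BLEU implementation, tokenization or smoothing produced the reported value, so the whitespace tokenization, the uniform 1-gram to 4-gram weights and the function names here are all assumptions.

```python
import math
from collections import Counter


def ngram_counts(tokens, n):
    """Count all contiguous n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu(reference, candidate, max_n=4):
    """Unsmoothed sentence-level BLEU with uniform n-gram weights.

    Illustrative only; not necessarily the metric used for this model.
    """
    if not candidate:
        return 0.0
    log_precisions = []
    for n in range(1, max_n + 1):
        cand = ngram_counts(candidate, n)
        ref = ngram_counts(reference, n)
        total = sum(cand.values())
        # Clip each candidate n-gram count by its count in the reference.
        clipped = sum(min(count, ref[gram]) for gram, count in cand.items())
        if total == 0 or clipped == 0:
            return 0.0  # no smoothing: any empty precision gives a zero score
        log_precisions.append(math.log(clipped / total))
    # Brevity penalty: penalize candidates shorter than the reference.
    if len(candidate) >= len(reference):
        brevity_penalty = 1.0
    else:
        brevity_penalty = math.exp(1 - len(reference) / len(candidate))
    return brevity_penalty * math.exp(sum(log_precisions) / max_n)


reference = r"\frac { a } { b }".split()
prediction = r"\frac { a } { c }".split()
print(round(sentence_bleu(reference, prediction), 3))
```

A score of 1.0 means the prediction matches the reference token for token. A single wrong symbol in a short formula already lowers the score noticeably, because it breaks every n-gram that contains it.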

Usage

You can use the model directly with the transformers library:

from transformers import VisionEncoderDecoderModel, AutoTokenizer, AutoFeatureExtractor
import torch
from PIL import Image

# Load model, tokenizer, and feature extractor
model = VisionEncoderDecoderModel.from_pretrained("your-username/your-model-name")
tokenizer = AutoTokenizer.from_pretrained("your-username/your-model-name")
feature_extractor = AutoFeatureExtractor.from_pretrained("your-username/your-model-name")

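# Prepare an image (converted to RGB so grayscale or RGBA files have the three channels the encoder expects)
image = Image.open("path/to/your/image.png").convert("RGB")
pixel_values = feature_extractor(images=image, return_tensors="pt").pixel_values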

# Generate LaTeX formula
generated_ids = model.generate(pixel_values)
generated_texts = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)

print("Generated LaTeX formula:", generated_texts[0])

Training Script

The training script for this model can be found in the following repository: GitHub


license: agpl-3.0

Model size: 240M params (Safetensors; tensor types I64, F32)