--- license: apache-2.0 language: - de - en base_model: - allenai/Molmo-7B-D-0924 pipeline_tag: image-text-to-text library_name: transformers --- # ELAM-7B ELAM (Evaluative Large Action Model) is a Molmo 7B-D-based LAM (Large Action Model) that is also able to evaluate user expectations on screenshots of user interfaces. It was specifically fine-tuned on 17,708 instructions and evaluations for 6,230 automotive UI images. The images contained German and English text. All training prompts were English. German content was either translated or quoted directly. The evaluation dataset [AutomotiveUI-Bench-4K](https://huggingface.co/datasets/sparks-solutions/AutomotiveUI-Bench-4K) is available on Hugging Face. # Results ## AutomotiveUI-Bench-4K | Model | Test Action Grounding | Expected Result Grounding | Expected Result Evaluation | |---|---|---|---| | InternVL2.5-8B | 26.6 | 5.7 | 64.8 | | TinyClick | 61.0 | 54.6 | - | | UGround-V1-7B (Qwen2-VL) | 69.4 | 55.0 | - | | Molmo-7B-D-0924 | 71.3 | 71.4 | 66.9 | | LAM-270M (TinyClick) | 73.9 | 59.9 | - | | ELAM-7B (Molmo) | **87.6** | **77.5** | **78.2** | # Quick-Start ``` conda create -n elam python=3.10 -y conda activate elam pip install datasets==3.5.0 einops==0.8.1 torchvision==0.20.1 accelerate==1.6.0 pip install transformers==4.48.2 ``` ```python import re import torch from PIL import Image from transformers import AutoModelForCausalLM, AutoProcessor, GenerationConfig # Load processor model_name = "sparks-solutions/ELAM-7B" processor = AutoProcessor.from_pretrained(model_name, trust_remote_code=True, torch_dtype="bfloat16", device_map="auto") # Load model model = AutoModelForCausalLM.from_pretrained( model_name, trust_remote_code=True, torch_dtype="bfloat16", device_map="auto" ) def preprocess_elam_prompt(user_request: str, label_class: str): """Apply ELAM prompt template depending on class.""" if label_class == "Expected Result": return f"Evaluate this statement about the image:\n'{user_request}'\nThink step by step, conclude whether the evaluation is 'PASSED' or 'FAILED' and point to the UI element that corresponds to this evaluation." elif label_class == "Test Action": return f"Identify and point to the UI element that corresponds to this test action:\n{user_request}" def postprocess_response_elam(response: str): """Parse Molmo-style point coordinates from string and return tuple of floats in [0-1].""" pattern = r'