Improve model card: Update pipeline tag, add descriptive tags, and enrich content

by nielsr HF Staff - opened Nov 19, 2025

This PR significantly enhances the model card by:

- Corrected pipeline_tag: Changed from image-text-to-text to image-segmentation to accurately reflect the model's primary function, which is language-guided dense grounding and segmentation in images and video. This improves discoverability for users.
- Added descriptive tags: Included dense-grounding and referring-expression-segmentation for more precise categorization based on the model's core tasks.
- Enriched content: The model card has been substantially expanded with detailed information from the GitHub repository. This includes:
  - Explicit links to the paper (Sa2VA-i: Improving Sa2VA Results with Consistent Training and Inference) and the GitHub repository.
  - A visual teaser image.
  - Comprehensive sections on the model's overview, performance highlights, competition results, model zoo, quick start guide, and key technical improvements.
  - A consolidated "Citation" section for both Sa2VA-i and the original Sa2VA.
- Removed redundant sections: The "File information" section and the less comprehensive "Acknowledgement" section have been removed to streamline the model card and follow model card best practices.

These changes provide a more complete, accurate, and user-friendly model card.

kumuji changed pull request status to merged Nov 20, 2025
