---
library_name: transformers
license: mit
tags:
- vision
- image-segmentation
- pytorch
---
# EoMT

**EoMT (Encoder-only Mask Transformer)** is a Vision Transformer (ViT) architecture designed for high-quality and efficient image segmentation. It was introduced in the CVPR 2025 highlight paper:

**[Your ViT is Secretly an Image Segmentation Model](https://www.tue-mps.org/eomt)**

by Tommie Kerssies, Niccolò Cavagnero, Alexander Hermans, Narges Norouzi, Giuseppe Averta, Bastian Leibe, Gijs Dubbelman, and Daan de Geus.

> **Key Insight**: Given sufficient scale and pretraining, a plain ViT with only a few additional parameters can perform segmentation, without the need for task-specific decoders or pixel-fusion modules. The same backbone supports semantic, instance, and panoptic segmentation; only the post-processing differs 🤗

The original implementation can be found in this [repository](https://github.com/tue-mps/eomt).

---

### How to use

Here is how to use this model for instance segmentation:

```python
import matplotlib.pyplot as plt
import requests
import torch
from PIL import Image

from transformers import EomtForUniversalSegmentation, AutoImageProcessor

model_id = "tue-mps/coco_instance_eomt_large_640"
processor = AutoImageProcessor.from_pretrained(model_id)
model = EomtForUniversalSegmentation.from_pretrained(model_id)

image = Image.open(requests.get("http://images.cocodataset.org/val2017/000000039769.jpg", stream=True).raw)

inputs = processor(
    images=image,
    return_tensors="pt",
)

with torch.inference_mode():
    outputs = model(**inputs)

# Prepare the original image size in the format (height, width)
target_sizes = [(image.height, image.width)]

# Post-process the model outputs to get the final segmentation prediction
preds = processor.post_process_instance_segmentation(
    outputs,
    target_sizes=target_sizes,
)

# Visualize the segmentation mask
plt.imshow(preds[0]["segmentation"])
plt.axis("off")
plt.title("Instance Segmentation")
plt.show()
```
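
Each entry of `preds` is a dict whose `segmentation` field is an integer map of segment ids and whose `segments_info` field describes each segment. As a minimal, self-contained sketch (toy values only, no model required; the exact metadata keys here are illustrative assumptions), here is one way such an id map could be colored per segment instead of relying on matplotlib's default colormap:

```python
import numpy as np

# Toy stand-ins for post-processed outputs (illustrative values, not real
# model output): an integer segment-id map plus per-segment metadata.
segmentation = np.array(
    [[0, 0, 1],
     [1, 2, 2]]
)
segments_info = [
    {"id": 0, "label_id": 3, "score": 0.98},
    {"id": 1, "label_id": 3, "score": 0.95},
    {"id": 2, "label_id": 7, "score": 0.90},
]

# Deterministically assign one RGB color per segment id.
rng = np.random.default_rng(0)
palette = {info["id"]: rng.integers(0, 256, size=3, dtype=np.uint8)
           for info in segments_info}

# Paint each pixel with the color of its segment.
overlay = np.zeros((*segmentation.shape, 3), dtype=np.uint8)
for seg_id, color in palette.items():
    overlay[segmentation == seg_id] = color

print(overlay.shape)  # (2, 3, 3)
```

The same overlay array can then be passed to `plt.imshow` in place of the raw id map.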

## Citation

If you find our work useful, please consider citing us as:

```bibtex
@inproceedings{kerssies2025eomt,
  author    = {Kerssies, Tommie and Cavagnero, Niccolò and Hermans, Alexander and Norouzi, Narges and Averta, Giuseppe and Leibe, Bastian and Dubbelman, Gijs and de Geus, Daan},
  title     = {Your ViT is Secretly an Image Segmentation Model},
  booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  year      = {2025},
}
```