---
title: CLIP Service
emoji: π
colorFrom: blue
colorTo: purple
sdk: docker
pinned: false
---
# CLIP Service
A FastAPI service that provides CLIP (Contrastive Language-Image Pre-training) embeddings for images and text using the `openai/clip-vit-large-patch14` model.
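This README does not include the app code, but a minimal sketch of how such a service can compute these embeddings with Hugging Face Transformers might look like the following (the `embed_text`/`embed_image` helper names are illustrative, not the Space's actual source):

```python
# Illustrative sketch (not the Space's actual source) of computing
# CLIP embeddings with Hugging Face Transformers.
from io import BytesIO

import requests
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

MODEL_ID = "openai/clip-vit-large-patch14"
model = CLIPModel.from_pretrained(MODEL_ID).eval()
processor = CLIPProcessor.from_pretrained(MODEL_ID)

def embed_text(text: str) -> list[float]:
    inputs = processor(text=[text], return_tensors="pt", padding=True)
    with torch.no_grad():
        features = model.get_text_features(**inputs)  # shape (1, 768)
    return features[0].tolist()

def embed_image(image_url: str) -> list[float]:
    raw = requests.get(image_url, timeout=10).content
    image = Image.open(BytesIO(raw)).convert("RGB")
    inputs = processor(images=image, return_tensors="pt")
    with torch.no_grad():
        features = model.get_image_features(**inputs)  # shape (1, 768)
    return features[0].tolist()
```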
## Features

- **Image Encoding**: Generate 768-dimensional embeddings from image URLs
- **Text Encoding**: Generate embeddings from text descriptions
- **High Performance**: Optimized for batch processing
- **REST API**: Simple HTTP endpoints for easy integration
## API Endpoints

### POST /encode/image

Generate an embedding for an image fetched from a URL.
**Request:**

```json
{
  "image_url": "https://example.com/image.jpg"
}
```

**Response:**

```json
{
  "embedding": [0.1, -0.2, 0.3, ...], // 768 dimensions
  "dimensions": 768
}
```
### POST /encode/text

Generate an embedding for a text string.
**Request:**

```json
{
  "text": "a beautiful sunset over mountains"
}
```

**Response:**

```json
{
  "embedding": [0.1, -0.2, 0.3, ...], // 768 dimensions
  "dimensions": 768
}
```
### GET /health

Check service health and status.
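A small Python client for these three endpoints could look like the sketch below. The `encode_image`/`encode_text`/`health` helper names are mine, and the base URL is the same placeholder used in the curl examples that follow:

```python
# Minimal Python client for the endpoints above. BASE_URL is a
# placeholder; replace it with your own Space URL.
import requests

BASE_URL = "https://your-username-clip-service.hf.space"

def encode_image(image_url: str) -> list[float]:
    resp = requests.post(f"{BASE_URL}/encode/image",
                         json={"image_url": image_url}, timeout=30)
    resp.raise_for_status()
    return resp.json()["embedding"]

def encode_text(text: str) -> list[float]:
    resp = requests.post(f"{BASE_URL}/encode/text",
                         json={"text": text}, timeout=30)
    resp.raise_for_status()
    return resp.json()["embedding"]

def health() -> dict:
    resp = requests.get(f"{BASE_URL}/health", timeout=10)
    resp.raise_for_status()
    return resp.json()
```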
## Usage Examples
```bash
# Encode an image
curl -X POST "https://your-username-clip-service.hf.space/encode/image" \
  -H "Content-Type: application/json" \
  -d '{"image_url": "https://example.com/image.jpg"}'

# Encode text
curl -X POST "https://your-username-clip-service.hf.space/encode/text" \
  -H "Content-Type: application/json" \
  -d '{"text": "a beautiful landscape"}'
```
## Integration

This service is designed to work with Pinterest-like applications for:

- Visual similarity search
- Content-based recommendations
- Cross-modal search (text to image, image to text); see the sketch below
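As an illustration of cross-modal search, the sketch below ranks a small image corpus against a text query by cosine similarity. It reuses the hypothetical `encode_image`/`encode_text` client helpers sketched earlier, and the corpus URLs are placeholders:

```python
# Rank images against a text query by cosine similarity of their
# CLIP embeddings. encode_image / encode_text are the client helpers
# sketched above; the URLs are placeholders.
import numpy as np

image_urls = [
    "https://example.com/sunset.jpg",
    "https://example.com/city.jpg",
]

def cosine(a: list[float], b: list[float]) -> float:
    a, b = np.asarray(a), np.asarray(b)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

corpus = {url: encode_image(url) for url in image_urls}
query = encode_text("a beautiful sunset over mountains")
ranked = sorted(corpus, key=lambda url: cosine(query, corpus[url]),
                reverse=True)
print(ranked[0])  # URL of the best-matching image
```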
## Model Information

- **Model**: `openai/clip-vit-large-patch14`
- **Embedding Dimensions**: 768
- **Supported Image Formats**: JPG, PNG, GIF, WebP
- **Max Image Size**: under 10 MB recommended
## Performance

- **CPU**: ~2-5 seconds per image
- **GPU**: ~0.5-1 second per image (when available)
- **Batch Processing**: supported by sending multiple requests, as sketched below
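The README does not describe a dedicated batch endpoint, so "batch processing" is read here as issuing several single-item requests concurrently. A sketch using the hypothetical `encode_text` client helper from above:

```python
# Issue several single-item requests concurrently.
# encode_text is the client helper sketched earlier.
from concurrent.futures import ThreadPoolExecutor

texts = ["a beach at dawn", "a snowy mountain", "a city at night"]
with ThreadPoolExecutor(max_workers=4) as pool:
    embeddings = list(pool.map(encode_text, texts))
print(len(embeddings), len(embeddings[0]))  # 3 768
```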
Built with ❤️ using Transformers and FastAPI