AoiNoGeso/japanese-clip-stair-v4

Zero-Shot Image Classification

vision-language

image-text-matching

Japanese CLIP Model with Full Tuning

A Japanese image-text CLIP model (trained on STAIR Captions v1.2)

Model Overview

This is a Japanese CLIP model trained on the STAIR Captions v1.2 dataset.

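Features

Full tuning: both the image encoder and the text encoder are trained.
High-quality Japanese understanding: the text encoder is fine-tuned from BERT-base-japanese-v3.
Temperature-scaled contrastive loss: the model is trained with the InfoNCE loss.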
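
As an illustration of the temperature-scaled contrastive (InfoNCE) loss named above, here is a minimal pure-Python sketch. It assumes the symmetric image-to-text and text-to-image formulation commonly used for CLIP-style models; the card does not state whether the loss is symmetric, nor the temperature, so the default `temperature=0.07` and the function names `l2_normalize`, `log_softmax`, and `info_nce_loss` are assumptions rather than the author's training code.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def log_softmax(row):
    """Numerically stable log-softmax over a list of logits."""
    m = max(row)
    log_z = m + math.log(sum(math.exp(x - m) for x in row))
    return [x - log_z for x in row]


def info_nce_loss(image_embs, text_embs, temperature=0.07):
    """Symmetric InfoNCE over a batch of paired embeddings.

    image_embs[i] and text_embs[i] form a matching pair; every other
    combination in the batch is treated as a negative.
    The temperature default is an assumption, not a value from the card.
    """
    imgs = [l2_normalize(v) for v in image_embs]
    txts = [l2_normalize(v) for v in text_embs]
    n = len(imgs)

    # logits[i][j] = cosine similarity of image i and text j, divided by temperature
    logits = [
        [sum(a * b for a, b in zip(imgs[i], txts[j])) / temperature for j in range(n)]
        for i in range(n)
    ]

    # Image-to-text direction: row i should assign the most mass to column i.
    i2t = -sum(log_softmax(logits[i])[i] for i in range(n)) / n

    # Text-to-image direction: column j should assign the most mass to row j.
    columns = [[logits[i][j] for i in range(n)] for j in range(n)]
    t2i = -sum(log_softmax(columns[j])[j] for j in range(n)) / n

    return (i2t + t2i) / 2


# Toy batch: correctly paired embeddings give a much lower loss than mismatched ones.
aligned = info_nce_loss([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
shuffled = info_nce_loss([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
print(aligned < shuffled)
```

A lower temperature sharpens the softmax, so hard negatives are penalized more strongly; in practice the temperature is often a learned parameter, which this sketch does not model.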

Model Details

Text encoder: tohoku-nlp/bert-base-japanese-v3 (fine-tuned)
Image encoder: ResNet50 (pretrained on ImageNet-1K, fine-tuned)
Shared embedding dimension: 768
Image size: 224x224
Maximum text length: 128
Learning rate: 1e-05
Loss function: temperature-scaled contrastive loss (InfoNCE)
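
For readers who want to reproduce a similar setup, the settings listed above could be collected in a small configuration object. This is only a sketch: the class name `ClipTrainingConfig` and its field names are hypothetical, and only the values are taken from this card. Settings the card does not report (temperature, batch size, optimizer, epochs) are deliberately left out.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class ClipTrainingConfig:
    # Values are copied from the model card; the field names are hypothetical.
    text_encoder: str = "tohoku-nlp/bert-base-japanese-v3"  # fine-tuned
    image_encoder: str = "ResNet50"  # ImageNet-1K pretrained, fine-tuned
    embed_dim: int = 768             # shared image/text embedding dimension
    image_size: int = 224            # input images are 224x224
    max_text_length: int = 128       # maximum text length
    learning_rate: float = 1e-05
    loss: str = "InfoNCE"            # temperature-scaled contrastive loss


config = ClipTrainingConfig()
for name, value in asdict(config).items():
    print(f"{name}: {value}")
```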

Training Data

Dataset: STAIR Captions v1.2
Language: Japanese
Domain: general image captions

License

Apache License 2.0
