---
tags:
  - sentence-transformers
  - sentence-similarity
  - feature-extraction
  - dense
  - generated_from_trainer
  - dataset_size:70
  - loss:MultipleNegativesRankingLoss
base_model: sentence-transformers/clip-ViT-L-14
widget:
  - source_sentence: How to Manage Data Science Projects
    sentences:
      - Fine-Tuning Text Embeddings For Domain-specific Search (w/ Python)
      - I Was Wrong About AI Consulting (what I learned)
      - What Nature Can Teach Us About Business...
  - source_sentence: 4 Ways to Measure Fat Tails with Python (+ Example Code)
    sentences:
      - How I’d Learn AI in 2025 (if I could start over)
      - A Practical Introduction to Large Language Models (LLMs)
      - Fine-tuning Large Language Models (LLMs) | w/ Example Code
  - source_sentence: Dimensionality Reduction & Segmentation with Decision Trees | Python Code
    sentences:
      - 5 AI Projects For People in a Hurry (w/ Python)
      - How to Improve LLMs with RAG (Overview + Python Code)
      - How to Build an LLM from Scratch | An Overview
  - source_sentence: What Is Data Science & How To Start? | A Beginner's Guide
    sentences:
      - 3 AI Use Cases (that are not a chatbot)
      - The OpenAI (Python) API | Introduction & Example Code
      - Time Series, Signals, & the Fourier Transform | Introduction
  - source_sentence: 5 Questions Every Data Scientist Should Hardcode into Their Brain
    sentences:
      - How to Improve LLMs with Tools (ft. OpenAI Agents SDK)
      - ML Foundations for AI Engineers (in 34 Minutes)
      - 'Causality: An Introduction | How (naive) statistics can fail us'
datasets:
  - prashgec/my-learning-ds
pipeline_tag: sentence-similarity
library_name: sentence-transformers
metrics:
  - cosine_accuracy
model-index:
  - name: SentenceTransformer based on sentence-transformers/clip-ViT-L-14
    results:
      - task:
          type: triplet
          name: Triplet
        dataset:
          name: yt title thumbnail train
          type: yt-title-thumbnail-train
        metrics:
          - type: cosine_accuracy
            value: 1
            name: Cosine Accuracy
      - task:
          type: triplet
          name: Triplet
        dataset:
          name: yt title thumbnail valid
          type: yt-title-thumbnail-valid
        metrics:
          - type: cosine_accuracy
            value: 0.8666666746139526
            name: Cosine Accuracy
---

SentenceTransformer based on sentence-transformers/clip-ViT-L-14

This is a sentence-transformers model finetuned from sentence-transformers/clip-ViT-L-14 on the my-learning-ds dataset. It maps sentences & paragraphs (and, through its CLIP backbone, images) to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: sentence-transformers/clip-ViT-L-14
  • Similarity Function: Cosine Similarity
  • Training Dataset: prashgec/my-learning-ds

Model Sources

  • Documentation: https://www.sbert.net
  • Repository: https://github.com/UKPLab/sentence-transformers
  • Hugging Face: https://huggingface.co/models?library=sentence-transformers

Full Model Architecture

SentenceTransformer(
  (0): CLIPModel()
)
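
Although it renders as a single module, CLIPModel serves both modalities: text inputs pass through CLIP's text tower and image inputs through its vision tower, with both projected into a shared embedding space. A quick way to inspect this, assuming the model loads as in the Usage section below:

from sentence_transformers import SentenceTransformer

model = SentenceTransformer("prashgec/clip-title-thumbnail-embeddings")
print(model[0])  # the CLIPModel module wrapping the text and vision towers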

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("prashgec/clip-title-thumbnail-embeddings")
# Run inference
sentences = [
    '5 Questions Every Data Scientist Should Hardcode into Their Brain',
    'How to Improve LLMs with Tools (ft. OpenAI Agents SDK)',
    'ML Foundations for AI Engineers (in 34 Minutes)',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 768]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities)
# tensor([[1.0000, 0.6706, 0.7328],
#         [0.6706, 1.0000, 0.8154],
#         [0.7328, 0.8154, 1.0000]])
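
Because the base model is CLIP, the same checkpoint embeds images as well as text, which is what the title-thumbnail pairing relies on. Below is a minimal sketch of scoring a thumbnail against a candidate title; thumbnail.jpg is a hypothetical local file.

from PIL import Image
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("prashgec/clip-title-thumbnail-embeddings")

# CLIP-based Sentence Transformers models accept PIL images directly
img_emb = model.encode([Image.open("thumbnail.jpg")])  # hypothetical file
txt_emb = model.encode(["5 AI Projects For People in a Hurry (w/ Python)"])

# Cosine similarity between the thumbnail and the title
print(model.similarity(img_emb, txt_emb))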

Evaluation

Metrics

Triplet

  • Datasets: yt-title-thumbnail-train and yt-title-thumbnail-valid
  • Evaluated with TripletEvaluator
Metric           yt-title-thumbnail-train  yt-title-thumbnail-valid
cosine_accuracy  1.0                       0.8667
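
These accuracies come from TripletEvaluator, which counts a triplet as correct when the anchor embedding lies closer to its positive than to its negative. A minimal sketch of recomputing the validation figure, assuming the dataset exposes anchor, positive, and negative columns and a split named valid:

from datasets import load_dataset
from sentence_transformers import SentenceTransformer
from sentence_transformers.evaluation import TripletEvaluator

model = SentenceTransformer("prashgec/clip-title-thumbnail-embeddings")
ds = load_dataset("prashgec/my-learning-ds", split="valid")  # split name is an assumption

evaluator = TripletEvaluator(
    anchors=ds["anchor"],      # thumbnail images (PIL), encoded by the CLIP vision tower
    positives=ds["positive"],  # each video's own title
    negatives=ds["negative"],  # titles from other videos
    name="yt-title-thumbnail-valid",
)
print(evaluator(model))  # e.g. {'yt-title-thumbnail-valid_cosine_accuracy': 0.8667}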

Training Details

Training Dataset

my-learning-ds

  • Dataset: my-learning-ds at 70c7274
  • Size: 70 training samples
  • Columns: anchor, positive, and negative
  • Approximate statistics based on the first 70 samples:
      anchor:   PIL.JpegImagePlugin.JpegImageFile
      positive: string (min: 8 tokens, mean: 15.13 tokens, max: 27 tokens)
      negative: string (min: 8 tokens, mean: 15.34 tokens, max: 27 tokens)
  • Samples (anchor thumbnails omitted; paired positive and negative titles shown):
      • Causal Effects An introduction
      • 3 Ways to Make a Custom AI Assistant RAG, Tools, & Fine-tuning
      • Prompt Engineering: How to Trick AI into Solving Your Problems Dimensionality Reduction & Segmentation with Decision Trees
  • Loss: MultipleNegativesRankingLoss with these parameters:
    {
        "scale": 20.0,
        "similarity_fct": "cos_sim"
    }
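
MultipleNegativesRankingLoss also treats every other positive in the batch as an additional negative for each anchor, so larger batches provide more contrastive signal. A sketch of constructing the loss with the parameters above:

from sentence_transformers import SentenceTransformer
from sentence_transformers.losses import MultipleNegativesRankingLoss
from sentence_transformers.util import cos_sim

model = SentenceTransformer("sentence-transformers/clip-ViT-L-14")
# scale=20.0 and cosine similarity match the configuration shown above
loss = MultipleNegativesRankingLoss(model, scale=20.0, similarity_fct=cos_sim)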
    

Evaluation Dataset

my-learning-ds

  • Dataset: my-learning-ds at 70c7274
  • Size: 15 evaluation samples
  • Columns: anchor, positive, and negative
  • Approximate statistics based on the first 15 samples:
      anchor:   PIL.JpegImagePlugin.JpegImageFile
      positive: string (min: 8 tokens, mean: 14.07 tokens, max: 22 tokens)
      negative: string (min: 10 tokens, mean: 15.0 tokens, max: 21 tokens)
  • Samples (anchor thumbnails omitted; paired positive and negative titles shown):
      • The Wavelet Transform Introduction & Example Code
      • Smoothing Crypto Time Series with Wavelets Real-world Data Project
      • 3 Reasons Businesses Should NOT Use AI Fine-tuning Large Language Models (LLMs)
  • Loss: MultipleNegativesRankingLoss with these parameters:
    {
        "scale": 20.0,
        "similarity_fct": "cos_sim"
    }
    

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: epoch
  • per_device_train_batch_size: 16
  • per_device_eval_batch_size: 16
  • learning_rate: 0.0001
  • num_train_epochs: 2
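
These non-default values are enough to sketch a reproduction of the fine-tune with the Sentence Transformers trainer. The split names and output_dir below are assumptions; the remaining settings mirror the card:

from datasets import load_dataset
from sentence_transformers import (
    SentenceTransformer,
    SentenceTransformerTrainer,
    SentenceTransformerTrainingArguments,
)
from sentence_transformers.losses import MultipleNegativesRankingLoss

model = SentenceTransformer("sentence-transformers/clip-ViT-L-14")
train_ds = load_dataset("prashgec/my-learning-ds", split="train")  # split name assumed
eval_ds = load_dataset("prashgec/my-learning-ds", split="valid")   # split name assumed

args = SentenceTransformerTrainingArguments(
    output_dir="clip-title-thumbnail-embeddings",  # hypothetical
    num_train_epochs=2,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    learning_rate=1e-4,
    eval_strategy="epoch",
)

trainer = SentenceTransformerTrainer(
    model=model,
    args=args,
    train_dataset=train_ds,
    eval_dataset=eval_ds,  # required when eval_strategy is not "no"
    loss=MultipleNegativesRankingLoss(model),
)
trainer.train()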

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: epoch
  • prediction_loss_only: True
  • per_device_train_batch_size: 16
  • per_device_eval_batch_size: 16
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 0.0001
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 2
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.0
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: False
  • fp16: False
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: None
  • hub_always_push: False
  • hub_revision: None
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • include_for_metrics: []
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • use_liger_kernel: False
  • liger_kernel_config: None
  • eval_use_gather_object: False
  • average_tokens_across_devices: False
  • prompts: None
  • batch_sampler: batch_sampler
  • multi_dataset_batch_sampler: proportional
  • router_mapping: {}
  • learning_rate_mapping: {}

Training Logs

Epoch  Step  Training Loss  Validation Loss  yt-title-thumbnail-train_cosine_accuracy  yt-title-thumbnail-valid_cosine_accuracy
-1     -1    -              -                0.9571                                    0.8000
0.2    1     2.0436         -                -                                         -
0.4    2     2.1845         -                -                                         -
0.6    3     1.9404         -                -                                         -
0.8    4     2.0339         -                -                                         -
1.0    5     0.9129         2.2639           -                                         -
1.2    6     1.3342         -                -                                         -
1.4    7     1.6938         -                -                                         -
1.6    8     1.6759         -                -                                         -
1.8    9     1.4230         -                -                                         -
2.0    10    0.7338         2.2676           -                                         -
-1     -1    -              -                1.0                                       0.8667

Framework Versions

  • Python: 3.9.23
  • Sentence Transformers: 5.0.0
  • Transformers: 4.53.2
  • PyTorch: 2.7.1
  • Accelerate: 1.9.0
  • Datasets: 4.0.0
  • Tokenizers: 0.21.2

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

MultipleNegativesRankingLoss

@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply},
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}