SentenceTransformer based on WhereIsAI/UAE-Large-V1

This is a sentence-transformers model finetuned from WhereIsAI/UAE-Large-V1. It maps sentences & paragraphs to a 1024-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: WhereIsAI/UAE-Large-V1
  • Maximum Sequence Length: 512 tokens
  • Output Dimensionality: 1024 dimensions
  • Similarity Function: Cosine Similarity
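
For reference, cosine similarity scores two embeddings $u, v \in \mathbb{R}^{1024}$ by the angle between them, ignoring magnitude, so scores always lie in $[-1, 1]$:

$$\operatorname{cos\_sim}(u, v) = \frac{u \cdot v}{\|u\|_{2}\,\|v\|_{2}}$$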

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: BertModel 
  (1): Pooling({'word_embedding_dimension': 1024, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
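
The pooling block above uses the hidden state of the [CLS] token as the sentence embedding. A minimal sketch of what that means, using plain transformers rather than this card's official usage (it assumes the Hub repo exposes standard BertModel weights):

import torch
from transformers import AutoModel, AutoTokenizer

model_id = "cyberbabooshka/uae_large_ft1"
tokenizer = AutoTokenizer.from_pretrained(model_id)
bert = AutoModel.from_pretrained(model_id)

batch = tokenizer(
    ["An example sentence."],
    padding=True, truncation=True, max_length=512, return_tensors="pt",
)
with torch.no_grad():
    hidden = bert(**batch).last_hidden_state  # (batch, seq_len, 1024)

# CLS pooling: keep only the first ([CLS]) token's hidden state
embedding = hidden[:, 0]
print(embedding.shape)  # torch.Size([1, 1024])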

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("cyberbabooshka/uae_large_ft1")
# Run inference
sentences = [
    'What is the relationship between the smallest perturbation of a matrix and its rank, as established in theorems regarding matrix perturbations?',
    '"Suppose $A \\in C^{m \\times n}$ has full column rank (= n). Then $\\min _{\\Delta \\in \\mathbb{C}^{m \\times n}}\\left\\{\\|\\Delta\\|_{2} \\mid A+\\Delta \\text { has rank }<n\\right\\}=\\sigma_{n}(A)$."',
    '"If a beam of light enters and then exits the elevator, the observer on Earth and the one accelerating in empty space must observe the same thing, since they cannot distinguish between being on Earth or accelerating in space. The observer in space, who is accelerating, will observe that the beam of light bends as it crosses the elevator... that means that if the path of a beam of light is curved near Earth, it must be because space itself is curved in the presence of a gravitational field!"',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# (3, 1024)

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# torch.Size([3, 3])
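
The same two calls support a small semantic-search loop: encode a query, score it against a corpus, and take the best hit. The query below is a hypothetical illustration that reuses the sentences from the snippet above.

# Hypothetical query; ranks the three corpus sentences against it
query_embedding = model.encode(["Why does light bend near a massive body?"])
scores = model.similarity(query_embedding, embeddings)  # shape (1, 3)
best = int(scores.argmax())
print(sentences[best])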

Evaluation

Metrics

Information Retrieval

Metric                Value
cosine_accuracy@1     0.6143
cosine_accuracy@3     0.7357
cosine_accuracy@5     0.7833
cosine_accuracy@10    0.8381
cosine_precision@1    0.6143
cosine_precision@3    0.2452
cosine_precision@5    0.1567
cosine_precision@10   0.0838
cosine_recall@1       0.6143
cosine_recall@3       0.7357
cosine_recall@5       0.7833
cosine_recall@10      0.8381
cosine_ndcg@10        0.7235
cosine_mrr@10         0.6871
cosine_map@100        0.6925
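
Numbers like those above are typically produced with sentence_transformers' InformationRetrievalEvaluator; a hedged sketch with tiny hypothetical stand-ins for the real queries, corpus, and relevance judgments:

from sentence_transformers.evaluation import InformationRetrievalEvaluator

queries = {"q1": "How is a proper coloring of a graph defined?"}
corpus = {
    "d1": "A coloring is called proper if for each edge joining two distinct "
          "vertices, the two vertices it joins have different colors."
}
relevant_docs = {"q1": {"d1"}}  # which corpus ids answer which query

ir_evaluator = InformationRetrievalEvaluator(queries, corpus, relevant_docs)
results = ir_evaluator(model)  # dict with cosine_accuracy@k, cosine_ndcg@10, ...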

Training Details

Training Dataset

Unnamed Dataset

  • Size: 1,760 training samples
  • Columns: anchor and positive
  • Approximate statistics based on the first 1000 samples:
              anchor              positive
    type      string              string
    min       9 tokens            11 tokens
    mean      24.87 tokens        68.37 tokens
    max       70 tokens           500 tokens
  • Samples:
    anchor: How is a proper coloring of a graph defined in the context of vertices and edges?
    positive: "A coloring is called proper if for each edge joining two distinct vertices, the two vertices it joins have different colors."

    anchor: What is the relationship between the first excited state of the box model and the p orbitals in a hydrogen atom?
    positive: "The p orbitals are similar to the first excited state of the box, i.e. $(n_{x},n_{y},n_{z})=(2,1,1)$ is similar to a $p_{x}$ orbital, $(n_{x},n_{y},n_{z})=(1,2,1)$ is similar to a $p_{y}$ orbital and $(n_{x},n_{y},n_{z})=(1,1,2)$ is similar to a $p_{z}$ orbital."

    anchor: How can the behavior of the derivative $f'(x)$ indicate the presence of a local maximum or minimum at a critical point $x=a$?
    positive: "If there is a local maximum when $x=a$, the function must be lower near $x=a$ than it is right at $x=a$. If the derivative exists near $x=a$, this means $f'(x)>0$ when $x$ is near $a$ and $x<a$, because the function must 'slope up' just to the left of $a$. Similarly, $f'(x)<0$ when $x$ is near $a$ and $x>a$, because $f$ slopes down from the local maximum as we move to the right. Using the same reasoning, if there is a local minimum at $x=a$, the derivative of $f$ must be negative just to the left of $a$ and positive just to the right."
  • Loss: MultipleNegativesRankingLoss with these parameters:
    {
        "scale": 20.0,
        "similarity_fct": "cos_sim"
    }
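
MultipleNegativesRankingLoss treats each anchor's own positive as the correct match and every other in-batch positive as a negative. A hedged sketch of a comparable setup with the trainer API, using a two-row stand-in for the unnamed 1,760-sample dataset (not the exact training script):

from datasets import Dataset
from sentence_transformers import SentenceTransformer, SentenceTransformerTrainer
from sentence_transformers.losses import MultipleNegativesRankingLoss

base = SentenceTransformer("WhereIsAI/UAE-Large-V1")
train_dataset = Dataset.from_dict({
    "anchor": [
        "How is a proper coloring of a graph defined?",
        "What are the two options for reducing accelerations?",
    ],
    "positive": [
        "A coloring is called proper if ...",
        "We can reduce the amount that velocity changes, or ...",
    ],
})
# Same loss parameters as above: scale=20.0 with cosine similarity (the default)
loss = MultipleNegativesRankingLoss(base, scale=20.0)
trainer = SentenceTransformerTrainer(model=base, train_dataset=train_dataset, loss=loss)
trainer.train()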
    

Evaluation Dataset

Unnamed Dataset

  • Size: 420 evaluation samples
  • Columns: anchor and positive
  • Approximate statistics based on the first 420 samples:
              anchor              positive
    type      string              string
    min       12 tokens           7 tokens
    mean      24.97 tokens        68.52 tokens
    max       66 tokens           452 tokens
  • Samples:
    anchor: What are the two central classes mentioned in the FileSystem framework and what do they represent?
    positive: "The class FileReference is the most important entry point to the framework." and "FileSystem is a powerful and elegant library to manipulate files."

    anchor: What is the significance of Turing's work in the context of PDE-based models for self-organization of complex systems?
    positive: "Turing’s monumental work on the chemical basis of morphogenesis played an important role in igniting researchers’ attention to the PDE-based continuous field models as a mathematical framework to study self-organization of complex systems."

    anchor: What are the two options for reducing accelerations as discussed in the passage?
    positive: "From the above definitions we see that there are really two options for reducing accelerations. We can reduce the amount that velocity changes, or we can increase the time over which the velocity changes (or both)."
  • Loss: MultipleNegativesRankingLoss with these parameters:
    {
        "scale": 20.0,
        "similarity_fct": "cos_sim"
    }
    

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: epoch
  • per_device_train_batch_size: 16
  • per_device_eval_batch_size: 16
  • learning_rate: 2e-05
  • weight_decay: 0.05
  • num_train_epochs: 10
  • warmup_ratio: 0.1
  • fp16: True
  • eval_on_start: True
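
These non-default values map one-to-one onto SentenceTransformerTrainingArguments; a hedged sketch (output_dir is a hypothetical placeholder):

from sentence_transformers import SentenceTransformerTrainingArguments

args = SentenceTransformerTrainingArguments(
    output_dir="uae_large_ft1",  # hypothetical output path
    eval_strategy="epoch",
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    learning_rate=2e-5,
    weight_decay=0.05,
    num_train_epochs=10,
    warmup_ratio=0.1,
    fp16=True,
    eval_on_start=True,
)
# Passed as SentenceTransformerTrainer(..., args=args)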

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: epoch
  • prediction_loss_only: True
  • per_device_train_batch_size: 16
  • per_device_eval_batch_size: 16
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 2e-05
  • weight_decay: 0.05
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 10
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.1
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: False
  • fp16: True
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: None
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • include_for_metrics: []
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: True
  • use_liger_kernel: False
  • eval_use_gather_object: False
  • average_tokens_across_devices: False
  • prompts: None
  • batch_sampler: batch_sampler
  • multi_dataset_batch_sampler: proportional

Training Logs

Epoch Step Training Loss Validation Loss eval_cosine_ndcg@10
0 0 - 0.0971 0.6824
0.0091 1 0.1198 - -
0.0182 2 0.0787 - -
0.0273 3 0.0614 - -
0.0364 4 0.138 - -
0.0455 5 0.1204 - -
0.0545 6 0.1885 - -
0.0636 7 0.0475 - -
0.0727 8 0.1358 - -
0.0818 9 0.1666 - -
0.0909 10 0.0737 - -
0.1 11 0.0997 - -
0.1091 12 0.0795 - -
0.1182 13 0.1071 - -
0.1273 14 0.1224 - -
0.1364 15 0.0499 - -
0.1455 16 0.0806 - -
0.1545 17 0.0353 - -
0.1636 18 0.0542 - -
0.1727 19 0.0412 - -
0.1818 20 0.1375 - -
0.1909 21 0.1124 - -
0.2 22 0.0992 - -
0.2091 23 0.0285 - -
0.2182 24 0.0337 - -
0.2273 25 0.0737 - -
0.2364 26 0.2011 - -
0.2455 27 0.0241 - -
0.2545 28 0.1319 - -
0.2636 29 0.0104 - -
0.2727 30 0.0162 - -
0.2818 31 0.3061 - -
0.2909 32 0.0422 - -
0.3 33 0.1893 - -
0.3091 34 0.0207 - -
0.3182 35 0.0744 - -
0.3273 36 0.0246 - -
0.3364 37 0.0079 - -
0.3455 38 0.0256 - -
0.3545 39 0.0224 - -
0.3636 40 0.0151 - -
0.3727 41 0.0738 - -
0.3818 42 0.0239 - -
0.3909 43 0.0169 - -
0.4 44 0.0152 - -
0.4091 45 0.0244 - -
0.4182 46 0.1708 - -
0.4273 47 0.0146 - -
0.4364 48 0.1367 - -
0.4455 49 0.049 - -
0.4545 50 0.0211 - -
0.4636 51 0.0135 - -
0.4727 52 0.0668 - -
0.4818 53 0.087 - -
0.4909 54 0.0046 - -
0.5 55 0.0032 - -
0.5091 56 0.0133 - -
0.5182 57 0.0109 - -
0.5273 58 0.0396 - -
0.5364 59 0.0291 - -
0.5455 60 0.0299 - -
0.5545 61 0.0134 - -
0.5636 62 0.0135 - -
0.5727 63 0.0049 - -
0.5818 64 0.0199 - -
0.5909 65 0.1533 - -
0.6 66 0.3639 - -
0.6091 67 0.0652 - -
0.6182 68 0.0315 - -
0.6273 69 0.0403 - -
0.6364 70 0.011 - -
0.6455 71 0.0265 - -
0.6545 72 0.1146 - -
0.6636 73 0.0932 - -
0.6727 74 0.0234 - -
0.6818 75 0.0581 - -
0.6909 76 0.0132 - -
0.7 77 0.1183 - -
0.7091 78 0.0913 - -
0.7182 79 0.0262 - -
0.7273 80 0.0262 - -
0.7364 81 0.0159 - -
0.7455 82 0.0407 - -
0.7545 83 0.0294 - -
0.7636 84 0.0567 - -
0.7727 85 0.0959 - -
0.7818 86 0.033 - -
0.7909 87 0.0234 - -
0.8 88 0.0088 - -
0.8091 89 0.0249 - -
0.8182 90 0.0276 - -
0.8273 91 0.0936 - -
0.8364 92 0.0067 - -
0.8455 93 0.0064 - -
0.8545 94 0.0654 - -
0.8636 95 0.0048 - -
0.8727 96 0.0087 - -
0.8818 97 0.0115 - -
0.8909 98 0.0092 - -
0.9 99 0.0514 - -
0.9091 100 0.1856 - -
0.9182 101 0.0364 - -
0.9273 102 0.0455 - -
0.9364 103 0.0057 - -
0.9455 104 0.0038 - -
0.9545 105 0.0209 - -
0.9636 106 0.0247 - -
0.9727 107 0.0735 - -
0.9818 108 0.004 - -
0.9909 109 0.0174 - -
1.0 110 0.018 0.0282 0.7093
1.0091 111 0.0187 - -
1.0182 112 0.0116 - -
1.0273 113 0.0043 - -
1.0364 114 0.0059 - -
1.0455 115 0.0067 - -
1.0545 116 0.0093 - -
1.0636 117 0.0821 - -
1.0727 118 0.0097 - -
1.0818 119 0.0141 - -
1.0909 120 0.0202 - -
1.1 121 0.0034 - -
1.1091 122 0.0025 - -
1.1182 123 0.006 - -
1.1273 124 0.004 - -
1.1364 125 0.003 - -
1.1455 126 0.0399 - -
1.1545 127 0.0026 - -
1.1636 128 0.0043 - -
1.1727 129 0.1317 - -
1.1818 130 0.0024 - -
1.1909 131 0.0027 - -
1.2 132 0.076 - -
1.2091 133 0.0302 - -
1.2182 134 0.0026 - -
1.2273 135 0.1611 - -
1.2364 136 0.0413 - -
1.2455 137 0.0118 - -
1.2545 138 0.0042 - -
1.2636 139 0.0401 - -
1.2727 140 0.0036 - -
1.2818 141 0.0034 - -
1.2909 142 0.0026 - -
1.3 143 0.0044 - -
1.3091 144 0.0024 - -
1.3182 145 0.0036 - -
1.3273 146 0.0242 - -
1.3364 147 0.0015 - -
1.3455 148 0.1008 - -
1.3545 149 0.0057 - -
1.3636 150 0.0062 - -
1.3727 151 0.0048 - -
1.3818 152 0.0026 - -
1.3909 153 0.0045 - -
1.4 154 0.0139 - -
1.4091 155 0.0017 - -
1.4182 156 0.0012 - -
1.4273 157 0.0009 - -
1.4364 158 0.006 - -
1.4455 159 0.0618 - -
1.4545 160 0.0889 - -
1.4636 161 0.0034 - -
1.4727 162 0.0184 - -
1.4818 163 0.0035 - -
1.4909 164 0.002 - -
1.5 165 0.0115 - -
1.5091 166 0.0008 - -
1.5182 167 0.0113 - -
1.5273 168 0.01 - -
1.5364 169 0.0177 - -
1.5455 170 0.0059 - -
1.5545 171 0.0123 - -
1.5636 172 0.0103 - -
1.5727 173 0.008 - -
1.5818 174 0.002 - -
1.5909 175 0.0039 - -
1.6 176 0.0174 - -
1.6091 177 0.0191 - -
1.6182 178 0.002 - -
1.6273 179 0.0009 - -
1.6364 180 0.0021 - -
1.6455 181 0.0011 - -
1.6545 182 0.0027 - -
1.6636 183 0.0005 - -
1.6727 184 0.0026 - -
1.6818 185 0.0047 - -
1.6909 186 0.0033 - -
1.7 187 0.0402 - -
1.7091 188 0.0128 - -
1.7182 189 0.01 - -
1.7273 190 0.0057 - -
1.7364 191 0.0133 - -
1.7455 192 0.0099 - -
1.7545 193 0.1022 - -
1.7636 194 0.0223 - -
1.7727 195 0.0037 - -
1.7818 196 0.0073 - -
1.7909 197 0.0212 - -
1.8 198 0.0231 - -
1.8091 199 0.0016 - -
1.8182 200 0.0017 - -
1.8273 201 0.0035 - -
1.8364 202 0.0165 - -
1.8455 203 0.0131 - -
1.8545 204 0.0032 - -
1.8636 205 0.0075 - -
1.8727 206 0.0438 - -
1.8818 207 0.0022 - -
1.8909 208 0.0501 - -
1.9 209 0.0121 - -
1.9091 210 0.0036 - -
1.9182 211 0.0041 - -
1.9273 212 0.0048 - -
1.9364 213 0.0159 - -
1.9455 214 0.0036 - -
1.9545 215 0.0035 - -
1.9636 216 0.004 - -
1.9727 217 0.0039 - -
1.9818 218 0.0177 - -
1.9909 219 0.0042 - -
2.0 220 0.0044 0.0230 0.7225
2.0091 221 0.0339 - -
2.0182 222 0.0032 - -
2.0273 223 0.0133 - -
2.0364 224 0.0031 - -
2.0455 225 0.0025 - -
2.0545 226 0.0039 - -
2.0636 227 0.0011 - -
2.0727 228 0.0021 - -
2.0818 229 0.0591 - -
2.0909 230 0.0011 - -
2.1 231 0.0008 - -
2.1091 232 0.0014 - -
2.1182 233 0.0057 - -
2.1273 234 0.0044 - -
2.1364 235 0.001 - -
2.1455 236 0.0009 - -
2.1545 237 0.0028 - -
2.1636 238 0.0076 - -
2.1727 239 0.0018 - -
2.1818 240 0.0022 - -
2.1909 241 0.0029 - -
2.2 242 0.0004 - -
2.2091 243 0.0025 - -
2.2182 244 0.0013 - -
2.2273 245 0.0487 - -
2.2364 246 0.0016 - -
2.2455 247 0.0023 - -
2.2545 248 0.0038 - -
2.2636 249 0.003 - -
2.2727 250 0.0017 - -
2.2818 251 0.0056 - -
2.2909 252 0.0036 - -
2.3 253 0.0016 - -
2.3091 254 0.0021 - -
2.3182 255 0.0019 - -
2.3273 256 0.001 - -
2.3364 257 0.0017 - -
2.3455 258 0.0027 - -
2.3545 259 0.0039 - -
2.3636 260 0.0011 - -
2.3727 261 0.0248 - -
2.3818 262 0.0219 - -
2.3909 263 0.0015 - -
2.4 264 0.0009 - -
2.4091 265 0.0013 - -
2.4182 266 0.0049 - -
2.4273 267 0.0073 - -
2.4364 268 0.007 - -
2.4455 269 0.0024 - -
2.4545 270 0.0008 - -
2.4636 271 0.001 - -
2.4727 272 0.0016 - -
2.4818 273 0.0007 - -
2.4909 274 0.0091 - -
2.5 275 0.0127 - -
2.5091 276 0.0013 - -
2.5182 277 0.001 - -
2.5273 278 0.0006 - -
2.5364 279 0.005 - -
2.5455 280 0.0154 - -
2.5545 281 0.0015 - -
2.5636 282 0.0229 - -
2.5727 283 0.0026 - -
2.5818 284 0.0008 - -
2.5909 285 0.0024 - -
2.6 286 0.0012 - -
2.6091 287 0.0748 - -
2.6182 288 0.0086 - -
2.6273 289 0.0013 - -
2.6364 290 0.0089 - -
2.6455 291 0.0011 - -
2.6545 292 0.0096 - -
2.6636 293 0.1416 - -
2.6727 294 0.0005 - -
2.6818 295 0.0021 - -
2.6909 296 0.0014 - -
2.7 297 0.0097 - -
2.7091 298 0.0014 - -
2.7182 299 0.0009 - -
2.7273 300 0.0016 - -
2.7364 301 0.0166 - -
2.7455 302 0.0028 - -
2.7545 303 0.0014 - -
2.7636 304 0.0018 - -
2.7727 305 0.0059 - -
2.7818 306 0.0012 - -
2.7909 307 0.0008 - -
2.8 308 0.0007 - -
2.8091 309 0.0038 - -
2.8182 310 0.0012 - -
2.8273 311 0.0091 - -
2.8364 312 0.0111 - -
2.8455 313 0.0016 - -
2.8545 314 0.0089 - -
2.8636 315 0.0071 - -
2.8727 316 0.0012 - -
2.8818 317 0.0251 - -
2.8909 318 0.0017 - -
2.9 319 0.0006 - -
2.9091 320 0.0014 - -
2.9182 321 0.0011 - -
2.9273 322 0.0084 - -
2.9364 323 0.0055 - -
2.9455 324 0.0011 - -
2.9545 325 0.0017 - -
2.9636 326 0.0008 - -
2.9727 327 0.0082 - -
2.9818 328 0.0006 - -
2.9909 329 0.0008 - -
3.0 330 0.0022 0.0275 0.6950
3.0091 331 0.0007 - -
3.0182 332 0.0012 - -
3.0273 333 0.0007 - -
3.0364 334 0.0038 - -
3.0455 335 0.0006 - -
3.0545 336 0.0012 - -
3.0636 337 0.0873 - -
3.0727 338 0.0022 - -
3.0818 339 0.0004 - -
3.0909 340 0.001 - -
3.1 341 0.0002 - -
3.1091 342 0.0069 - -
3.1182 343 0.0009 - -
3.1273 344 0.0101 - -
3.1364 345 0.0022 - -
3.1455 346 0.009 - -
3.1545 347 0.0018 - -
3.1636 348 0.0018 - -
3.1727 349 0.0045 - -
3.1818 350 0.029 - -
3.1909 351 0.0036 - -
3.2 352 0.0015 - -
3.2091 353 0.0021 - -
3.2182 354 0.0103 - -
3.2273 355 0.0005 - -
3.2364 356 0.0133 - -
3.2455 357 0.0015 - -
3.2545 358 0.001 - -
3.2636 359 0.0024 - -
3.2727 360 0.0052 - -
3.2818 361 0.0032 - -
3.2909 362 0.0024 - -
3.3 363 0.0008 - -
3.3091 364 0.0035 - -
3.3182 365 0.0012 - -
3.3273 366 0.0049 - -
3.3364 367 0.0452 - -
3.3455 368 0.0017 - -
3.3545 369 0.0112 - -
3.3636 370 0.0011 - -
3.3727 371 0.0016 - -
3.3818 372 0.0015 - -
3.3909 373 0.004 - -
3.4 374 0.0074 - -
3.4091 375 0.0005 - -
3.4182 376 0.0007 - -
3.4273 377 0.0014 - -
3.4364 378 0.0097 - -
3.4455 379 0.0026 - -
3.4545 380 0.0022 - -
3.4636 381 0.001 - -
3.4727 382 0.0004 - -
3.4818 383 0.004 - -
3.4909 384 0.0017 - -
3.5 385 0.0014 - -
3.5091 386 0.001 - -
3.5182 387 0.0047 - -
3.5273 388 0.0061 - -
3.5364 389 0.0017 - -
3.5455 390 0.0024 - -
3.5545 391 0.0021 - -
3.5636 392 0.0007 - -
3.5727 393 0.0009 - -
3.5818 394 0.0006 - -
3.5909 395 0.0038 - -
3.6 396 0.0006 - -
3.6091 397 0.0011 - -
3.6182 398 0.001 - -
3.6273 399 0.0014 - -
3.6364 400 0.0007 - -
3.6455 401 0.0052 - -
3.6545 402 0.0008 - -
3.6636 403 0.0009 - -
3.6727 404 0.0017 - -
3.6818 405 0.0028 - -
3.6909 406 0.0044 - -
3.7 407 0.0009 - -
3.7091 408 0.0134 - -
3.7182 409 0.001 - -
3.7273 410 0.0044 - -
3.7364 411 0.0138 - -
3.7455 412 0.0032 - -
3.7545 413 0.0004 - -
3.7636 414 0.0065 - -
3.7727 415 0.0007 - -
3.7818 416 0.0008 - -
3.7909 417 0.0007 - -
3.8 418 0.0018 - -
3.8091 419 0.001 - -
3.8182 420 0.0305 - -
3.8273 421 0.001 - -
3.8364 422 0.0011 - -
3.8455 423 0.0004 - -
3.8545 424 0.003 - -
3.8636 425 0.002 - -
3.8727 426 0.0018 - -
3.8818 427 0.0968 - -
3.8909 428 0.002 - -
3.9 429 0.002 - -
3.9091 430 0.0156 - -
3.9182 431 0.0059 - -
3.9273 432 0.001 - -
3.9364 433 0.0153 - -
3.9455 434 0.0013 - -
3.9545 435 0.0003 - -
3.9636 436 0.001 - -
3.9727 437 0.0005 - -
3.9818 438 0.0012 - -
3.9909 439 0.0109 - -
4.0 440 0.1597 0.0211 0.7235
4.0091 441 0.0027 - -
4.0182 442 0.0007 - -
4.0273 443 0.0089 - -
4.0364 444 0.0007 - -
4.0455 445 0.005 - -
4.0545 446 0.0019 - -
4.0636 447 0.0007 - -
4.0727 448 0.0008 - -
4.0818 449 0.002 - -
4.0909 450 0.043 - -
4.1 451 0.0273 - -
4.1091 452 0.0009 - -
4.1182 453 0.0011 - -
4.1273 454 0.0007 - -
4.1364 455 0.0062 - -
4.1455 456 0.0004 - -
4.1545 457 0.0008 - -
4.1636 458 0.0128 - -
4.1727 459 0.0012 - -
4.1818 460 0.0013 - -
4.1909 461 0.0009 - -
4.2 462 0.0011 - -
4.2091 463 0.0336 - -
4.2182 464 0.0018 - -
4.2273 465 0.0009 - -
4.2364 466 0.0049 - -
4.2455 467 0.0012 - -
4.2545 468 0.001 - -
4.2636 469 0.0024 - -
4.2727 470 0.0063 - -
4.2818 471 0.0008 - -
4.2909 472 0.0793 - -
4.3 473 0.0016 - -
4.3091 474 0.0016 - -
4.3182 475 0.0043 - -
4.3273 476 0.036 - -
4.3364 477 0.002 - -
4.3455 478 0.0019 - -
4.3545 479 0.0012 - -
4.3636 480 0.0059 - -
4.3727 481 0.0017 - -
4.3818 482 0.0004 - -
4.3909 483 0.0014 - -
4.4 484 0.0143 - -
4.4091 485 0.0014 - -
4.4182 486 0.0009 - -
4.4273 487 0.0027 - -
4.4364 488 0.0017 - -
4.4455 489 0.0007 - -
4.4545 490 0.0008 - -
4.4636 491 0.0008 - -
4.4727 492 0.0014 - -
4.4818 493 0.0011 - -
4.4909 494 0.0013 - -
4.5 495 0.0016 - -
4.5091 496 0.001 - -
4.5182 497 0.0008 - -
4.5273 498 0.001 - -
4.5364 499 0.0019 - -
4.5455 500 0.0008 - -

Framework Versions

  • Python: 3.12.9
  • Sentence Transformers: 4.1.0
  • Transformers: 4.52.3
  • PyTorch: 2.6.0+cu124
  • Accelerate: 1.7.0
  • Datasets: 3.6.0
  • Tokenizers: 0.21.1

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

MultipleNegativesRankingLoss

@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply},
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}