How can I resolve a 'No embeddings were generated' error during fine-tuning with SentenceTransformer?


Hi all! I'm trying to fine-tune a sentence-transformers model with TripletLoss, but training keeps failing with a "No embeddings were generated" error. How can I fix it? Here is my setup:

from sentence_transformers import SentenceTransformer, SentenceTransformerTrainingArguments
from sentence_transformers.losses import TripletLoss
from sentence_transformers.training_args import BatchSamplers
from transformers import AutoTokenizer

model = SentenceTransformer(
    MODEL_CARD,
    cache_folder=MODEL_PATH,
    trust_remote_code=True,
    model_kwargs={'default_task': 'text-matching'},
).to(DEVICE)

# Swap in the tokenizer from the base checkpoint
tokenizer = AutoTokenizer.from_pretrained(MODEL_CARD, cache_dir=MODEL_PATH)
model.tokenizer = tokenizer

loss = TripletLoss(model=model)

args = SentenceTransformerTrainingArguments(
    output_dir="models",
    num_train_epochs=10,
    learning_rate=2e-5,
    warmup_ratio=0.1,
    fp16=True,
    bf16=False,
    batch_sampler=BatchSamplers.NO_DUPLICATES,
    eval_strategy="steps",
    eval_steps=10,
    save_strategy="steps",
    save_steps=20,
    logging_steps=10,
)
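
For context, train_dataset and eval_dataset are Hugging Face Dataset objects with anchor / positive / negative columns, which is the shape TripletLoss expects. A minimal sketch of that layout (the rows below are placeholders, not my real data):

from datasets import Dataset

# Placeholder rows that only illustrate the column layout expected by TripletLoss;
# the real datasets hold my own triplet data.
train_dataset = Dataset.from_dict({
    "anchor":   ["example query 1", "example query 2"],
    "positive": ["relevant passage 1", "relevant passage 2"],
    "negative": ["irrelevant passage 1", "irrelevant passage 2"],
})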

After initializing the model with the code above, I ran the following evaluator, and it worked fine:

from sentence_transformers.evaluation import TripletEvaluator

dev_evaluator = TripletEvaluator(
    anchors=eval_dataset["anchor"],      # list[str]
    positives=eval_dataset["positive"],  # list[str]
    negatives=eval_dataset["negative"],  # list[str]
    name="daeun-triplet-dev",
    # task='text-matching'
)

dev_evaluator(model)
> {'daeun-triplet-dev_cosine_accuracy': 0.9617486596107483}
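
For what it's worth, the evaluator above goes through model.encode internally, so plain encoding with this model seems to work. A direct check along the same path (the input sentence is just a placeholder) looks like this:

# Standalone encoding exercises the same code path the evaluator uses.
emb = model.encode(["a short test sentence"])   # placeholder input
print(emb.shape)                                # (1, embedding_dim)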

However, when I started the training process, I encountered a "No embeddings were generated" error:

from sentence_transformers import SentenceTransformerTrainer

trainer = SentenceTransformerTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    loss=loss,
    evaluator=dev_evaluator,
)

trainer.train()

Error:

---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
Cell In[13], line 1
----> 1 trainer.train()

File ~/2025_dev/dev_repo/loan_recommend/daeun_loan/lib/python3.11/site-packages/transformers/trainer.py:2240, in Trainer.train(self, resume_from_checkpoint, trial, ignore_keys_for_eval, **kwargs)
   2238         hf_hub_utils.enable_progress_bars()
   2239 else:
-> 2240     return inner_training_loop(
   2241         args=args,
   2242         resume_from_checkpoint=resume_from_checkpoint,
   2243         trial=trial,
   2244         ignore_keys_for_eval=ignore_keys_for_eval,
   2245     )

File ~/2025_dev/dev_repo/loan_recommend/daeun_loan/lib/python3.11/site-packages/transformers/trainer.py:2555, in Trainer._inner_training_loop(self, batch_size, args, resume_from_checkpoint, trial, ignore_keys_for_eval)
   2548 context = (
   2549     functools.partial(self.accelerator.no_sync, model=model)
   2550     if i != len(batch_samples) - 1
   2551     and self.accelerator.distributed_type != DistributedType.DEEPSPEED
   2552     else contextlib.nullcontext
   2553 )
   2554 with context():
-> 2555     tr_loss_step = self.training_step(model, inputs, num_items_in_batch)
   2557 if (
...
--> 175     raise RuntimeError("No embeddings were generated")
    177 all_embeddings.sort(key=lambda x: x[0])  # sort by original index
    178 combined_embeddings = torch.stack([emb for _, emb in all_embeddings])

RuntimeError: No embeddings were generated
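
For reference, my understanding is that a training step does not call model.encode; the loss tokenizes each column (anchor, positive, negative) and runs the model's forward pass to get "sentence_embedding". A simplified sketch of that path (not the actual library code; the input text is a placeholder):

from sentence_transformers.util import batch_to_device

# Rough approximation of what the trainer/loss does for one column of a batch.
features = model.tokenize(["example anchor text"])   # placeholder batch of one column
features = batch_to_device(features, DEVICE)
embeddings = model(features)["sentence_embedding"]
print(embeddings.shape)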

I'd like to get this fine-tuning run working. How should I proceed from here?
