How can I resolve a 'No embeddings were generated' error during fine-tuning with SentenceTransformer?
#54, opened by delojmkt
Hi all! I'm trying to fine-tune a sentence-transformer model using TripletLoss, but training keeps failing with the error shown below. How can I fix this?
```python
from sentence_transformers import (
    SentenceTransformer,
    SentenceTransformerTrainingArguments,
)
from sentence_transformers.losses import TripletLoss
from sentence_transformers.training_args import BatchSamplers
from transformers import AutoTokenizer

model = SentenceTransformer(
    MODEL_CARD,
    cache_folder=MODEL_PATH,
    trust_remote_code=True,
    model_kwargs={'default_task': 'text-matching'},
).to(DEVICE)

tokenizer = AutoTokenizer.from_pretrained(MODEL_CARD, cache_dir=MODEL_PATH)
model.tokenizer = tokenizer

loss = TripletLoss(model=model)

args = SentenceTransformerTrainingArguments(
    output_dir="models",
    num_train_epochs=10,
    learning_rate=2e-5,
    warmup_ratio=0.1,
    fp16=True,
    bf16=False,
    batch_sampler=BatchSamplers.NO_DUPLICATES,
    eval_strategy="steps",
    eval_steps=10,
    save_strategy="steps",
    save_steps=20,
    logging_steps=10,
)
```
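For reference, `train_dataset` and `eval_dataset` are plain triplet datasets with string columns named `anchor`, `positive`, and `negative`. A minimal sketch of that layout (the texts here are placeholders, not my actual data):

```python
from datasets import Dataset

# Placeholder rows illustrating the column layout TripletLoss expects;
# the real datasets contain my own domain texts.
train_dataset = Dataset.from_dict({
    "anchor":   ["example query 1", "example query 2"],
    "positive": ["passage that matches query 1", "passage that matches query 2"],
    "negative": ["unrelated passage 1", "unrelated passage 2"],
})
eval_dataset = train_dataset  # same schema; in practice a held-out split
```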
With the code above, I initialized the model and ran the following evaluator, which worked fine:
```python
from sentence_transformers.evaluation import TripletEvaluator

dev_evaluator = TripletEvaluator(
    anchors=eval_dataset["anchor"],      # list[str]
    positives=eval_dataset["positive"],  # list[str]
    negatives=eval_dataset["negative"],  # list[str]
    name="daeun-triplet-dev",
    # task='text-matching'
)
dev_evaluator(model)
# {'daeun-triplet-dev_cosine_accuracy': 0.9617486596107483}
```
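As a side note, the evaluator calls `model.encode()` under the hood, so that inference path can also be checked in isolation with something like the sketch below (`encode()` is the standard SentenceTransformer API; this is just a sanity check, not part of the failing run):

```python
# Encode a few eval texts directly, i.e. the same path TripletEvaluator uses internally.
emb = model.encode(eval_dataset["anchor"][:4], convert_to_numpy=True)
print(emb.shape)  # expected: (4, embedding_dim)
```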
However, when I started the training process, I encountered a "No embeddings were generated" error:
```python
from sentence_transformers import SentenceTransformerTrainer

trainer = SentenceTransformerTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    loss=loss,
    evaluator=dev_evaluator,
)
trainer.train()
```
Error:

```
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
Cell In[13], line 1
----> 1 trainer.train()

File ~/2025_dev/dev_repo/loan_recommend/daeun_loan/lib/python3.11/site-packages/transformers/trainer.py:2240, in Trainer.train(self, resume_from_checkpoint, trial, ignore_keys_for_eval, **kwargs)
   2238     hf_hub_utils.enable_progress_bars()
   2239 else:
-> 2240     return inner_training_loop(
   2241         args=args,
   2242         resume_from_checkpoint=resume_from_checkpoint,
   2243         trial=trial,
   2244         ignore_keys_for_eval=ignore_keys_for_eval,
   2245     )

File ~/2025_dev/dev_repo/loan_recommend/daeun_loan/lib/python3.11/site-packages/transformers/trainer.py:2555, in Trainer._inner_training_loop(self, batch_size, args, resume_from_checkpoint, trial, ignore_keys_for_eval)
   2548 context = (
   2549     functools.partial(self.accelerator.no_sync, model=model)
   2550     if i != len(batch_samples) - 1
   2551     and self.accelerator.distributed_type != DistributedType.DEEPSPEED
   2552     else contextlib.nullcontext
   2553 )
   2554 with context():
-> 2555     tr_loss_step = self.training_step(model, inputs, num_items_in_batch)
   2557 if (
...
--> 175     raise RuntimeError("No embeddings were generated")
    177 all_embeddings.sort(key=lambda x: x[0])  # sort by original index
    178 combined_embeddings = torch.stack([emb for _, emb in all_embeddings])

RuntimeError: No embeddings were generated
```
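In case it helps narrow things down: as far as I understand, during training the data collator tokenizes each column with `model.tokenize()` and TripletLoss then calls the model's `forward()` (not `encode()`) on those features. A rough sketch of that path, using `DEVICE` and `train_dataset` from above (`'sentence_embedding'` is the standard output key sentence-transformers expects):

```python
import torch

# Reproduce the training-time call path outside the Trainer:
# tokenize a small batch and run it through the model's forward pass.
texts = train_dataset["anchor"][:2]
features = model.tokenize(texts)  # dict of input tensors (input_ids, attention_mask, ...)
features = {k: v.to(DEVICE) for k, v in features.items() if isinstance(v, torch.Tensor)}

with torch.no_grad():
    out = model(features)  # same call TripletLoss makes per column
print(out["sentence_embedding"].shape)  # expected: (2, embedding_dim)
```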
I'm trying to fine-tune this model; how should I proceed from here?