'BeamSearchParams' object has no attribute 'logprobs'
I hit an error during inference. If I use BeamSearchParams, it reports:
Error processing translation batch: 'BeamSearchParams' object has no attribute 'logprobs'
If I switch to SamplingParams, it works.
My vLLM version is the latest, 0.9.2.
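For reference, here is a minimal sketch of the SamplingParams path that works for me; the comment on the failing variant is my guess at the cause, not a confirmed diagnosis:

from vllm import LLM, SamplingParams

llm = LLM(
    model="../models/Seed-X-PPO-7B",
    max_model_len=4096,
    trust_remote_code=True,
)
messages = [
    "Translate the following English sentence into Chinese:\nMay the force be with you <zh>",
]

# Works: generate() expects SamplingParams.
params = SamplingParams(temperature=0.0, max_tokens=2048)
results = llm.generate(messages, params)
print(results)

# Fails: generate() appears to read params.logprobs, which
# BeamSearchParams does not define, hence the AttributeError.
# from vllm.sampling_params import BeamSearchParams
# params = BeamSearchParams(beam_width=4, max_tokens=2048)
# results = llm.generate(messages, params)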
Hi, could you share your inference code and environment info so we can reproduce this error?
Hi, here is my code and environment info.
test_seed-x.py:

import argparse

from vllm import LLM
from vllm.sampling_params import BeamSearchParams


def parse_args():
    parser = argparse.ArgumentParser(description="Test Seed-X")
    parser.add_argument(
        "--model",
        type=str,
        required=True,
        help="Model path or ID",
    )
    parser.add_argument(
        "--tensor_parallel_size",
        type=int,
        default=1,
    )
    parser.add_argument(
        "--max_num_seqs",
        type=int,
        default=4,
    )
    parser.add_argument(
        "--dtype",
        type=str,
        default="auto",
    )
    parser.add_argument(
        "--max_model_len",
        type=int,
        default=131072,
        help="Maximum model context length",
    )
    # Parsed but not used below; kept to match the launch script.
    parser.add_argument(
        "--max_input_length",
        type=int,
        default=2048,
        help="Maximum input text length (in tokens) to process",
    )
    # Parsed but not used below; kept to match the launch script.
    parser.add_argument(
        "--temperature",
        type=float,
        default=0.6,
        help="Sampling temperature for text generation",
    )
    parser.add_argument(
        "--max_tokens",
        type=int,
        default=2048,
        help="Maximum tokens to generate for each response",
    )
    return parser.parse_args()


def main():
    args = parse_args()
    print(args)

    decoding_params = BeamSearchParams(
        beam_width=4,
        max_tokens=args.max_tokens,
    )
    llm = LLM(
        model=args.model,
        tensor_parallel_size=args.tensor_parallel_size,
        max_num_seqs=args.max_num_seqs,
        dtype=args.dtype,
        trust_remote_code=True,
        max_model_len=args.max_model_len,
    )
    messages = [
        "Translate the following English sentence into Chinese:\nMay the force be with you <zh>",
    ]
    # Fails here: 'BeamSearchParams' object has no attribute 'logprobs'
    results = llm.generate(messages, decoding_params)
    print(results)


if __name__ == "__main__":
    main()
test_seed-x.sh:
python ./test_seed-x.py \
--model "../models/Seed-X-PPO-7B" \
--tensor_parallel_size 1 \
--max_num_seqs 1 \
--dtype 'auto' \
--max_model_len 4096 \
--max_input_length 2048 \
--max_tokens 2048 \
--temperature 0.0
My environment info:
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.54.03 Driver Version: 535.54.03 CUDA Version: 12.2 |
|-----------------------------------------+----------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+======================+======================|
| 0 NVIDIA A100-SXM-64GB On | 00000000:1D:00.0 Off | 0 |
| N/A 43C P0 62W / 473W | 6MiB / 65536MiB | 0% Default |
| | | Disabled |
+-----------------------------------------+----------------------+----------------------+
+---------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=======================================================================================|
| No running processes found |
+---------------------------------------------------------------------------------------+
=== Python Info ===
Python Version: 3.11.6 (main, Feb 6 2024, 18:28:10) [GCC 8.5.0 20210514 (Red Hat 8.5.0-16)]
Python Executable: /leonardo_scratch/fast/AIFAC_L01_028/zihao/env/vllm_env/bin/python
Platform: Linux-4.18.0-477.27.1.el8_8.x86_64-x86_64-with-glibc2.28
System: Linux
Machine: x86_64
Processor: x86_64
=== Installed Packages ===
aiohappyeyeballs==2.6.1
aiohttp==3.12.13
aiosignal==1.3.2
airportsdata==20250622
annotated-types==0.7.0
anyio==4.9.0
astor==0.8.1
attrs==25.3.0
blake3==1.0.5
cachetools==5.5.2
certifi==2025.6.15
charset-normalizer==3.4.2
click==8.2.1
cloudpickle==3.1.1
colorama==0.4.6
compressed-tensors==0.10.2
cupy-cuda12x==13.4.1
datasets==3.6.0
depyf==0.18.0
dill==0.3.8
diskcache==5.6.3
distro==1.9.0
dnspython==2.7.0
einops==0.8.1
email-validator==2.2.0
fastapi==0.115.14
fastapi-cli==0.0.7
fastrlock==0.8.3
filelock==3.18.0
frozenlist==1.7.0
fsspec==2025.3.0
gguf==0.17.1
google-auth==2.40.3
google-auth-oauthlib==1.2.2
googleapis-common-protos==1.70.0
grpcio==1.73.1
gspread==6.2.1
h11==0.16.0
hf-xet==1.1.5
httpcore==1.0.9
httptools==0.6.4
httpx==0.28.1
huggingface-hub==0.33.1
idna==3.10
importlib-metadata==8.7.0
interegular==0.3.3
jinja2==3.1.6
jiter==0.10.0
jsonschema==4.24.0
jsonschema-specifications==2025.4.1
lark==1.2.2
llguidance==0.7.30
llvmlite==0.44.0
lm-format-enforcer==0.10.11
lxml==6.0.0
markdown-it-py==3.0.0
MarkupSafe==3.0.2
mdurl==0.1.2
mistral-common==1.8.1
mpmath==1.3.0
msgpack==1.1.1
msgspec==0.19.0
multidict==6.6.3
multiprocess==0.70.16
nest-asyncio==1.6.0
networkx==3.5
ninja==1.11.1.4
numba==0.61.2
numpy==2.2.6
nvidia-cublas-cu12==12.6.4.1
nvidia-cuda-cupti-cu12==12.6.80
nvidia-cuda-nvrtc-cu12==12.6.77
nvidia-cuda-runtime-cu12==12.6.77
nvidia-cudnn-cu12==9.5.1.17
nvidia-cufft-cu12==11.3.0.4
nvidia-cufile-cu12==1.11.1.6
nvidia-curand-cu12==10.3.7.77
nvidia-cusolver-cu12==11.7.1.2
nvidia-cusparse-cu12==12.5.4.2
nvidia-cusparselt-cu12==0.6.3
nvidia-nccl-cu12==2.26.2
nvidia-nvjitlink-cu12==12.6.85
nvidia-nvtx-cu12==12.6.77
oauthlib==3.3.1
openai==1.90.0
opencv-python-headless==4.11.0.86
opentelemetry-api==1.34.1
opentelemetry-exporter-otlp==1.34.1
opentelemetry-exporter-otlp-proto-common==1.34.1
opentelemetry-exporter-otlp-proto-grpc==1.34.1
opentelemetry-exporter-otlp-proto-http==1.34.1
opentelemetry-proto==1.34.1
opentelemetry-sdk==1.34.1
opentelemetry-semantic-conventions==0.55b1
opentelemetry-semantic-conventions-ai==0.4.9
outlines==0.1.11
outlines-core==0.1.26
packaging==25.0
pandas==2.3.0
partial-json-parser==0.2.1.1.post6
pillow==11.3.0
pip==23.2.1
portalocker==3.2.0
prometheus-client==0.22.1
prometheus-fastapi-instrumentator==7.1.0
propcache==0.3.2
protobuf==5.29.5
psutil==7.0.0
py-cpuinfo==9.0.0
pyarrow==20.0.0
pyasn1==0.6.1
pyasn1-modules==0.4.2
pybase64==1.4.1
pycountry==24.6.1
pydantic==2.11.7
pydantic-core==2.33.2
pydantic-extra-types==2.10.5
pygments==2.19.2
python-dateutil==2.9.0.post0
python-dotenv==1.1.1
python-json-logger==3.3.0
python-multipart==0.0.20
pytz==2025.2
PyYAML==6.0.2
pyzmq==27.0.0
ray==2.47.1
referencing==0.36.2
regex==2024.11.6
requests==2.32.4
requests-oauthlib==2.0.0
rich==14.0.0
rich-toolkit==0.14.8
rpds-py==0.26.0
rsa==4.9.1
sacrebleu==2.5.1
safetensors==0.5.3
scipy==1.16.0
sentencepiece==0.2.0
setuptools==65.5.0
shellingham==1.5.4
six==1.17.0
sniffio==1.3.1
starlette==0.46.2
sympy==1.14.0
tabulate==0.9.0
tiktoken==0.9.0
tokenizers==0.21.2
torch==2.7.0
torchaudio==2.7.0
torchvision==0.22.0
tqdm==4.67.1
transformers==4.53.2
triton==3.3.0
typer==0.16.0
typing-extensions==4.14.0
typing-inspection==0.4.1
tzdata==2025.2
urllib3==2.5.0
uvicorn==0.35.0
uvloop==0.21.0
vllm==0.9.2
watchfiles==1.1.0
websockets==15.0.1
xformers==0.0.30
xgrammar==0.1.19
xxhash==3.5.0
yarl==1.20.1
zipp==3.23.0
=== CUDA Info ===
PyTorch CUDA Available: True
CUDA Version: 12.6
cuDNN Version: 90501
Number of GPUs: 1
GPU 0: NVIDIA A100-SXM-64GB
=== CPU Info ===
Physical cores: 32
Total cores: 32
Total RAM: 502.91 GB
Hello, any update on this issue? Is it a problem with the vLLM version?
Using an older vLLM or switching to SGLang should fix this problem.
P.S. The latest vLLM (v0.10.0) has drastically changed the API due to the V0-to-V1 migration, which in turn broke the tokenizer of this model.
@Zihao-Li, @yolozyk
Guys, sorry for the wait. To clarify, there's a slight difference in the required input format between greedy decoding and beam search.
For beam search, call llm.beam_search() instead of llm.generate(), and wrap each prompt in the list in a dictionary with a "prompt" key, as shown below. This should resolve the issue.
prompts = [{"prompt": prompt} for prompt in messages]
results = llm.beam_search(prompts, decoding_params)
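Putting the fix together, a self-contained sketch of the corrected script (the result fields sequences, text, and cum_logprob follow vLLM 0.9.x's BeamSearchOutput and may differ in other releases):

from vllm import LLM
from vllm.sampling_params import BeamSearchParams

llm = LLM(
    model="../models/Seed-X-PPO-7B",
    max_model_len=4096,
    trust_remote_code=True,
)
decoding_params = BeamSearchParams(beam_width=4, max_tokens=2048)

messages = [
    "Translate the following English sentence into Chinese:\nMay the force be with you <zh>",
]
# beam_search() takes prompt dicts, not bare strings.
prompts = [{"prompt": prompt} for prompt in messages]
results = llm.beam_search(prompts, decoding_params)

for output in results:
    # Each output carries beam_width candidate sequences.
    for seq in output.sequences:
        print(seq.cum_logprob, seq.text)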