
'BeamSearchParams' object has no attribute 'logprobs'

#8
by Zihao-Li - opened

I ran into an error during inference. If I use BeamSearchParams, it reports the following error:

Error processing translation batch: 'BeamSearchParams' object has no attribute 'logprobs'

If I switch to SamplingParams, it works.
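
For reference, the working sampling path is roughly the following (a minimal sketch; only the decoding params differ from the full script below, with temperature and max_tokens matching my launch script):

from vllm.sampling_params import SamplingParams

# Greedy/sampling decoding goes through llm.generate without issue
decoding_params = SamplingParams(
    temperature=0.0,
    max_tokens=2048,
)
results = llm.generate(messages, decoding_params)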

My vLLM version is the latest, 0.9.2.

Hi, could you share your inference code and environment info so we can reproduce this error?

Hi, here is my code and environment info.

test_seed-x.py:

import os
import sys
import argparse
from vllm import LLM
from vllm.sampling_params import BeamSearchParams

def parse_args():
    parser = argparse.ArgumentParser(description="Test Seed-X")
    parser.add_argument(
        "--model",
        type=str,
        required=True,
        help="Model path or ID",
    )
    parser.add_argument(
        "--tensor_parallel_size",
        type=int,
        default=1,
    )
    parser.add_argument(
        "--max_num_seqs",
        type=int,
        default=4,
    )
    parser.add_argument(
        "--dtype",
        type=str,
        default="auto",
    )
    parser.add_argument(
        "--max_model_len",
        type=int,
        default=131072,
        help="Maximum model context length",
    )
    parser.add_argument(
        "--max_input_length",
        type=int,
        default=2048,
        help="Maximum input text length (in tokens) to process",
    )
    parser.add_argument(
        "--temperature",
        type=float,
        default=0.6,
        help="Sampling temperature for text generation",
    )
    parser.add_argument(
        "--max_tokens",
        type=int,
        default=2048,
        help="Maximum tokens to generate for each response",
    )
    return parser.parse_args()


def main():
    args = parse_args()
    print(args)

    decoding_params = BeamSearchParams(
        beam_width=4,
        max_tokens=args.max_tokens,
    )

    llm = LLM(
        model=args.model,
        tensor_parallel_size=args.tensor_parallel_size,
        max_num_seqs=args.max_num_seqs,
        dtype=args.dtype,
        trust_remote_code=True,
        max_model_len=args.max_model_len,
    )

    messages = [
        "Translate the following English sentence into Chinese:\nMay the force be with you <zh>",
    ]

    # This call raises the AttributeError: llm.generate expects SamplingParams,
    # not BeamSearchParams
    results = llm.generate(messages, decoding_params)
    print(results)


if __name__ == "__main__":
    main()

test_seed-x.sh:

python ./test_seed-x.py \
    --model "../models/Seed-X-PPO-7B" \
    --tensor_parallel_size 1 \
    --max_num_seqs 1 \
    --dtype 'auto' \
    --max_model_len 4096 \
    --max_input_length 2048 \
    --max_tokens 2048 \
    --temperature 0.0

My environment info:

+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.54.03              Driver Version: 535.54.03    CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA A100-SXM-64GB           On  | 00000000:1D:00.0 Off |                    0 |
| N/A   43C    P0              62W / 473W |      6MiB / 65536MiB |      0%      Default |
|                                         |                      |             Disabled |
+-----------------------------------------+----------------------+----------------------+
                                                                                         
+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|  No running processes found                                                           |
+---------------------------------------------------------------------------------------+
=== Python Info ===
Python Version: 3.11.6 (main, Feb  6 2024, 18:28:10) [GCC 8.5.0 20210514 (Red Hat 8.5.0-16)]
Python Executable: /leonardo_scratch/fast/AIFAC_L01_028/zihao/env/vllm_env/bin/python
Platform: Linux-4.18.0-477.27.1.el8_8.x86_64-x86_64-with-glibc2.28
System: Linux
Machine: x86_64
Processor: x86_64

=== Installed Packages ===
aiohappyeyeballs==2.6.1
aiohttp==3.12.13
aiosignal==1.3.2
airportsdata==20250622
annotated-types==0.7.0
anyio==4.9.0
astor==0.8.1
attrs==25.3.0
blake3==1.0.5
cachetools==5.5.2
certifi==2025.6.15
charset-normalizer==3.4.2
click==8.2.1
cloudpickle==3.1.1
colorama==0.4.6
compressed-tensors==0.10.2
cupy-cuda12x==13.4.1
datasets==3.6.0
depyf==0.18.0
dill==0.3.8
diskcache==5.6.3
distro==1.9.0
dnspython==2.7.0
einops==0.8.1
email-validator==2.2.0
fastapi==0.115.14
fastapi-cli==0.0.7
fastrlock==0.8.3
filelock==3.18.0
frozenlist==1.7.0
fsspec==2025.3.0
gguf==0.17.1
google-auth==2.40.3
google-auth-oauthlib==1.2.2
googleapis-common-protos==1.70.0
grpcio==1.73.1
gspread==6.2.1
h11==0.16.0
hf-xet==1.1.5
httpcore==1.0.9
httptools==0.6.4
httpx==0.28.1
huggingface-hub==0.33.1
idna==3.10
importlib-metadata==8.7.0
interegular==0.3.3
jinja2==3.1.6
jiter==0.10.0
jsonschema==4.24.0
jsonschema-specifications==2025.4.1
lark==1.2.2
llguidance==0.7.30
llvmlite==0.44.0
lm-format-enforcer==0.10.11
lxml==6.0.0
markdown-it-py==3.0.0
MarkupSafe==3.0.2
mdurl==0.1.2
mistral-common==1.8.1
mpmath==1.3.0
msgpack==1.1.1
msgspec==0.19.0
multidict==6.6.3
multiprocess==0.70.16
nest-asyncio==1.6.0
networkx==3.5
ninja==1.11.1.4
numba==0.61.2
numpy==2.2.6
nvidia-cublas-cu12==12.6.4.1
nvidia-cuda-cupti-cu12==12.6.80
nvidia-cuda-nvrtc-cu12==12.6.77
nvidia-cuda-runtime-cu12==12.6.77
nvidia-cudnn-cu12==9.5.1.17
nvidia-cufft-cu12==11.3.0.4
nvidia-cufile-cu12==1.11.1.6
nvidia-curand-cu12==10.3.7.77
nvidia-cusolver-cu12==11.7.1.2
nvidia-cusparse-cu12==12.5.4.2
nvidia-cusparselt-cu12==0.6.3
nvidia-nccl-cu12==2.26.2
nvidia-nvjitlink-cu12==12.6.85
nvidia-nvtx-cu12==12.6.77
oauthlib==3.3.1
openai==1.90.0
opencv-python-headless==4.11.0.86
opentelemetry-api==1.34.1
opentelemetry-exporter-otlp==1.34.1
opentelemetry-exporter-otlp-proto-common==1.34.1
opentelemetry-exporter-otlp-proto-grpc==1.34.1
opentelemetry-exporter-otlp-proto-http==1.34.1
opentelemetry-proto==1.34.1
opentelemetry-sdk==1.34.1
opentelemetry-semantic-conventions==0.55b1
opentelemetry-semantic-conventions-ai==0.4.9
outlines==0.1.11
outlines-core==0.1.26
packaging==25.0
pandas==2.3.0
partial-json-parser==0.2.1.1.post6
pillow==11.3.0
pip==23.2.1
portalocker==3.2.0
prometheus-client==0.22.1
prometheus-fastapi-instrumentator==7.1.0
propcache==0.3.2
protobuf==5.29.5
psutil==7.0.0
py-cpuinfo==9.0.0
pyarrow==20.0.0
pyasn1==0.6.1
pyasn1-modules==0.4.2
pybase64==1.4.1
pycountry==24.6.1
pydantic==2.11.7
pydantic-core==2.33.2
pydantic-extra-types==2.10.5
pygments==2.19.2
python-dateutil==2.9.0.post0
python-dotenv==1.1.1
python-json-logger==3.3.0
python-multipart==0.0.20
pytz==2025.2
PyYAML==6.0.2
pyzmq==27.0.0
ray==2.47.1
referencing==0.36.2
regex==2024.11.6
requests==2.32.4
requests-oauthlib==2.0.0
rich==14.0.0
rich-toolkit==0.14.8
rpds-py==0.26.0
rsa==4.9.1
sacrebleu==2.5.1
safetensors==0.5.3
scipy==1.16.0
sentencepiece==0.2.0
setuptools==65.5.0
shellingham==1.5.4
six==1.17.0
sniffio==1.3.1
starlette==0.46.2
sympy==1.14.0
tabulate==0.9.0
tiktoken==0.9.0
tokenizers==0.21.2
torch==2.7.0
torchaudio==2.7.0
torchvision==0.22.0
tqdm==4.67.1
transformers==4.53.2
triton==3.3.0
typer==0.16.0
typing-extensions==4.14.0
typing-inspection==0.4.1
tzdata==2025.2
urllib3==2.5.0
uvicorn==0.35.0
uvloop==0.21.0
vllm==0.9.2
watchfiles==1.1.0
websockets==15.0.1
xformers==0.0.30
xgrammar==0.1.19
xxhash==3.5.0
yarl==1.20.1
zipp==3.23.0

=== CUDA Info ===
PyTorch CUDA Available: True
CUDA Version: 12.6
cuDNN Version: 90501
Number of GPUs: 1
GPU 0: NVIDIA A100-SXM-64GB

=== CPU Info ===
Physical cores: 32
Total cores: 32
Total RAM: 502.91 GB

Hello, any update on this issue? Is it a problem with the vLLM version?

ByteDance Seed org

@yolozyk Hi~ we recommend vllm==0.8.0 and transformers==4.51.3.

Using an older vLLM version or switching to SGLang should fix this problem.

P.S. The latest vLLM (v0.10.0) has drastically changed the API due to the V0-to-V1 migration, which in turn broke this model's tokenizer.

@Zihao-Li, @yolozyk
Guys, sorry for the wait. To clarify, there's a slight difference in the required input format between greedy decoding and beam search.
For beam search, call llm.beam_search instead of llm.generate, and wrap each prompt in the list in a dictionary with a "prompt" key, as shown below. This should resolve the issue.

prompts = messages
# Beam search expects each prompt wrapped in a dict with a "prompt" key
messages = [{"prompt": prompt} for prompt in prompts]
# Call llm.beam_search (not llm.generate) with the BeamSearchParams
results = llm.beam_search(messages, decoding_params)
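
For anyone following along, reading the top beam back out should look roughly like this (a sketch; the field names come from vLLM's beam search output and may differ across versions, so check your installed version):

# Each beam search output holds its candidate beams, best first
for output in results:
    best = output.sequences[0]
    print(best.text)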
