Can't run on HF Inference Endpoints
#2
by
barisyildirim
- opened
When I try to deploy this on HF Inference Endpoints, I get the following error:
[Server message]Endpoint failed to start
Exit code: 139. Reason: {"timestamp":"2025-04-11T02:29:07.363925Z","level":"INFO","message":"Args { model_id: \"/rep****ory\", revision: None, tokenization_workers: None, dtype: None, pooling: None, max_concurrent_requests: 512, max_batch_tokens: 16384, max_batch_requests: None, max_client_batch_size: 32, auto_truncate: false, default_prompt_name: None, default_prompt: None, hf_api_token: None, hostname: \"r-barisyildirim-nomic-embed-code-sgs-equryfr1-0e6b6-ijg5k\", port: 80, uds_path: \"/tmp/text-embeddings-inference-server\", huggingface_hub_cache: Some(\"/repository/cache\"), payload_limit: 15000000, api_key: None, json_output: true, otlp_endpoint: None, otlp_service_name: \"text-embeddings-inference.server\", cors_allow_origin: None }","target":"text_embeddings_router","filename":"router/src/main.rs","line_number":175}
thread 'main' panicked at /usr/src/router/src/lib.rs:134:62:
tokenizer.json not found. text-embeddings-inference only supports fast tokenizers: Error("data did not match any variant of untagged enum ModelWrapper", line: 757444, column: 1)
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
I have no idea why this happens; trying different setups and adjusting `payload_limit` didn't help.
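For what it's worth, text-embeddings-inference parses `tokenizer.json` with the Rust `tokenizers` crate at startup, and the "data did not match any variant of untagged enum ModelWrapper" error means that parse failed. A minimal sketch (assuming the `tokenizers` Python package is installed) that reproduces the same save/parse round trip locally with a toy vocabulary, so you can test your repo's `tokenizer.json` the same way:

```python
from tokenizers import Tokenizer
from tokenizers.models import WordLevel
from tokenizers.pre_tokenizers import Whitespace

# Build a tiny fast tokenizer, save it, and reload it the way
# text-embeddings-inference does at startup. If Tokenizer.from_file()
# raises "data did not match any variant of untagged enum ModelWrapper"
# on your real repo's tokenizer.json, the file was serialized in a format
# the loading tokenizers version can't parse; re-saving it with a current
# tokenizers/transformers version is the usual fix.
tok = Tokenizer(WordLevel({"hello": 0, "world": 1, "[UNK]": 2}, unk_token="[UNK]"))
tok.pre_tokenizer = Whitespace()
tok.save("tokenizer.json")

reloaded = Tokenizer.from_file("tokenizer.json")  # the step TEI performs
print(reloaded.encode("hello world").ids)  # [0, 1]
```

Pointing `Tokenizer.from_file` at the `tokenizer.json` from your model repo instead of the toy file above should tell you whether the file itself is the problem.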
I am encountering the same issue! I'm not sure whether the problem is the Qwen2Tokenizer not being initialized as (or automatically mapped to) the fast version. I also notice there is no `tokenizer_class` key or `auto_map` entry in `tokenizer_config.json` to point `AutoTokenizer` at the right class. Is there a workaround for this?
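One possible workaround, sketched here with a toy vocabulary so it runs offline: wrap (or load) a fast tokenizer in `transformers`, confirm `is_fast`, and re-export the repo files so a fresh `tokenizer.json` is written. With the real model you would instead call `AutoTokenizer.from_pretrained(<your repo>, use_fast=True)` and upload the re-exported files; this is an untested suggestion, not a confirmed fix.

```python
from tokenizers import Tokenizer
from tokenizers.models import WordLevel
from tokenizers.pre_tokenizers import Whitespace
from transformers import PreTrainedTokenizerFast

# Toy backend tokenizer standing in for the real one; with the actual model,
# load it via AutoTokenizer.from_pretrained(..., use_fast=True) instead.
backend = Tokenizer(WordLevel({"[UNK]": 0, "hi": 1}, unk_token="[UNK]"))
backend.pre_tokenizer = Whitespace()

fast = PreTrainedTokenizerFast(tokenizer_object=backend, unk_token="[UNK]")
assert fast.is_fast  # text-embeddings-inference only supports fast tokenizers

# Writes tokenizer.json plus tokenizer_config.json into ./repo-out;
# these are the files to push back to the endpoint's model repository.
fast.save_pretrained("repo-out")
```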