
Libraries

How to use prithivMLmods/Raptor-X3 with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="prithivMLmods/Raptor-X3")
messages = [
    {"role": "user", "content": "Who are you?"},
]
print(pipe(messages))

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("prithivMLmods/Raptor-X3")
model = AutoModelForCausalLM.from_pretrained("prithivMLmods/Raptor-X3")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

Local Apps

vLLM

How to use prithivMLmods/Raptor-X3 with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "prithivMLmods/Raptor-X3"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "prithivMLmods/Raptor-X3",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
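
The same request can be sent from Python. The following is a minimal sketch that uses only the standard library; it assumes a server started as shown above is already listening on localhost:8000. The helper names `build_chat_payload` and `chat` are illustrative and not part of vLLM.

```python
import json
import urllib.request

MODEL_ID = "prithivMLmods/Raptor-X3"


def build_chat_payload(prompt, model=MODEL_ID):
    """Build the JSON body for an OpenAI-compatible chat completion request."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }


def chat(prompt, base_url="http://localhost:8000"):
    """POST the prompt to the server and return the assistant's reply text.

    Assumes a server is already running at base_url (hypothetical default).
    """
    body = json.dumps(build_chat_payload(prompt)).encode("utf-8")
    request = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        result = json.load(response)
    # OpenAI-compatible servers return the reply under choices[0].message.content.
    return result["choices"][0]["message"]["content"]
```

With the server running, `print(chat("What is the capital of France?"))` prints the reply. Passing `base_url="http://localhost:30000"` targets a server on port 30000, as used in the SGLang commands in this document.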

Use Docker

docker model run hf.co/prithivMLmods/Raptor-X3

SGLang

How to use prithivMLmods/Raptor-X3 with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "prithivMLmods/Raptor-X3" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "prithivMLmods/Raptor-X3",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "prithivMLmods/Raptor-X3" \
        --host 0.0.0.0 \
        --port 30000
# Call the server with the same curl request shown for the pip-based SGLang setup above.

Docker Model Runner
How to use prithivMLmods/Raptor-X3 with Docker Model Runner:
```
docker model run hf.co/prithivMLmods/Raptor-X3
```

Raptor X3

Raptor X3 is based on the Qwen 2.5 14B architecture and is designed to enhance the reasoning capabilities of 14B-parameter models. It is optimized for advanced coding reasoning and UI coding, with an emphasis on contextual understanding, logical deduction, and multi-step problem-solving. Raptor X3 has been fine-tuned using a long chain-of-thought reasoning model and specialized datasets to improve comprehension, structured responses, and conversational intelligence.

Key improvements include:

Enhanced Coding Reasoning: Provides in-depth explanations and optimizations for complex coding problems, making it useful for developers and engineers.
Advanced UI Coding Support: Excels in generating and refining front-end code for web and mobile applications.
General-Purpose Coding: Capable of generating, debugging, and optimizing code across multiple programming languages, supporting software development and automation.
Long-Context Support: Supports up to 128K tokens for input context and can generate up to 8K tokens in a single output, making it ideal for detailed responses.
Multilingual Proficiency: Supports over 29 languages, including English, Chinese, French, Spanish, Portuguese, German, Italian, Russian, Japanese, Korean, Vietnamese, Thai, Arabic, and more.
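
To make the long-context figures concrete, the sketch below checks whether a request fits within them. It assumes that "128K" means 131,072 tokens, that "8K" means 8,192 tokens, and that the prompt and the generated tokens share one context window; these are assumptions, not values confirmed by this card. The function name `fits_context` is hypothetical.

```python
CONTEXT_WINDOW = 128 * 1024   # assumed meaning of "128K tokens"
MAX_OUTPUT_TOKENS = 8 * 1024  # assumed meaning of "8K tokens"


def fits_context(prompt_tokens, max_new_tokens):
    """Return True if the request stays within the assumed limits."""
    if prompt_tokens < 0 or max_new_tokens < 0:
        raise ValueError("token counts must be non-negative")
    within_output_cap = max_new_tokens <= MAX_OUTPUT_TOKENS
    within_window = prompt_tokens + max_new_tokens <= CONTEXT_WINDOW
    return within_output_cap and within_window


print(fits_context(100_000, 8_192))  # True: 108,192 <= 131,072
print(fits_context(130_000, 8_192))  # False: 138,192 > 131,072
print(fits_context(1_000, 16_000))   # False: exceeds the assumed output cap
```

In practice the prompt token count would come from the tokenizer, for example the length of the `input_ids` produced by `apply_chat_template`.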

Prompt Style:

Make a dark-themed minimalist dashboard for an oil rig.

[HTML, CSS, and more if required].
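
One way to read the prompt style above is a one-line task description followed by a bracketed list of the desired outputs. The sketch below formats prompts this way and wraps the result in a chat message list; both the helper name `build_ui_prompt` and this reading of the format are assumptions.

```python
def build_ui_prompt(task, outputs):
    """Format a task plus a bracketed list of requested outputs."""
    return f"{task}\n\n[{', '.join(outputs)}]."


prompt = build_ui_prompt(
    "Make a dark-themed minimalist dashboard for an oil rig.",
    ["HTML", "CSS", "and more if required"],
)
# The result can be passed as the user turn of a chat template.
messages = [{"role": "user", "content": prompt}]
print(prompt)
```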

Quickstart with Transformers

The following snippet uses apply_chat_template to show how to load the tokenizer and model and generate a response:

from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "prithivMLmods/Raptor-X3"

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

prompt = "How do I optimize React performance?"
messages = [
    {"role": "system", "content": "You are a helpful assistant capable of answering a wide range of questions."},
    {"role": "user", "content": prompt}
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

# Generate up to 512 new tokens.
generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=512
)
# Drop the prompt tokens so that only the newly generated tokens are decoded.
generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]

response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(response)
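
Because the model targets UI coding, the decoded `response` will often contain code. If the reply wraps code in Markdown fences (an assumption; this card does not specify the output format), a small helper can pull the blocks out so they can be saved to files. The name `extract_code_blocks` is hypothetical, and the sample text stands in for a real model reply.

```python
import re

# Built programmatically so this example can itself sit inside a fenced block.
FENCE = "`" * 3
CODE_BLOCK = re.compile(FENCE + r"([\w+-]*)[ \t]*\n(.*?)" + FENCE, re.DOTALL)


def extract_code_blocks(text):
    """Return (language, code) pairs for each fenced block in text."""
    return [
        (lang or "text", code.rstrip("\n"))
        for lang, code in CODE_BLOCK.findall(text)
    ]


# Stand-in for a model reply; not actual model output.
sample = (
    "Here is the page:\n"
    f"{FENCE}html\n<h1>Rig status</h1>\n{FENCE}\n"
    "And the styles:\n"
    f"{FENCE}css\nbody {{ background: #111; }}\n{FENCE}\n"
)
for lang, code in extract_code_blocks(sample):
    print(lang, "->", code)
```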

Intended Use

Coding Reasoning:
Designed for providing explanations, optimizations, and best practices for coding problems.
UI Coding and Development:
Excels in front-end development, including React, Vue, and other UI frameworks.
Programming and Software Development:
Capable of generating, analyzing, and optimizing code in multiple programming languages.
Educational Assistance:
Helps developers by providing coding tutorials, debugging assistance, and structured learning material.
Multilingual Applications:
Supports global communication, translations, and multilingual content generation.
Long-Form Content Generation:
Can generate extended responses, including documentation, technical reports, and coding guides.

Limitations

Hardware Requirements:
Requires high-memory GPUs or TPUs due to its large parameter size and long-context support.
Potential Bias in Responses:
While designed to be neutral, outputs may still reflect biases present in training data.
Complexity in Some Advanced Topics:
While the model is proficient in general coding, its output in highly specialized fields may require verification.
Limited Real-World Awareness:
Has no knowledge of events after its training cutoff and no access to real-time information.
Error Propagation in Extended Outputs:
Minor errors in early responses may affect overall coherence in long-form outputs.
Prompt Sensitivity:
The effectiveness of responses may depend on how well the input prompt is structured.
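
The hardware requirement can be made concrete with back-of-the-envelope arithmetic. The sketch below estimates the memory needed for the weights alone, assuming a nominal 14 billion parameters (taken from the "14B" in the model description; the exact count may differ). It ignores activations and the KV cache, which add more on top, especially at long context lengths.

```python
def weight_memory_gib(n_params, bytes_per_param):
    """Approximate memory for model weights only, in GiB."""
    return n_params * bytes_per_param / 2**30


N_PARAMS = 14e9  # nominal "14B"; an assumption, not an exact parameter count

for label, nbytes in [("float32", 4), ("bfloat16/float16", 2), ("int8", 1), ("4-bit", 0.5)]:
    print(f"{label:>16}: ~{weight_memory_gib(N_PARAMS, nbytes):.1f} GiB")
```

At 16-bit precision the weights alone come to roughly 26 GiB under these assumptions, so a single 24 GB GPU would need quantization or offloading to hold the model.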