linzhao-amd's picture
Update README.md
8f9a19f verified
metadata
license: mit
base_model:
  - deepseek-ai/DeepSeek-R1-0528

Model Overview

  • Model Architecture: DeepSeek-R1-0528
    • Input: Text
    • Output: Text
  • Supported Hardware Microarchitecture: AMD MI350/MI355
  • ROCm: 7.0
  • Operating System(s): Linux
  • Inference Engine: SGLang/vLLM
  • Model Optimizer: AMD-Quark
    • Weight quantization: OCP MXFP4, Static
    • Activation quantization: OCP MXFP4, Dynamic
  • Calibration Dataset: Pile

This model was built with deepseek-ai DeepSeek-R1-0528 model by applying AMD-Quark for MXFP4 quantization.

Model Quantization

The model was quantized from deepseek-ai/DeepSeek-R1-0528 using AMD-Quark. Both weights and activations were quantized to MXFP4 format, and the AutoSmoothQuant algorithm was applied to enhance accuracy.

Preprocessing requirement:

Before executing the quantization script below, the original FP8 model must first be dequantized to BFloat16. You can either perform the dequantization manually using this conversion script, or use the pre-converted BFloat16 model available at unsloth/DeepSeek-R1-0528-BF16.

Quantization scripts:

cd Quark/examples/torch/language_modeling/llm_ptq/
exclude_layers="*self_attn* *mlp.gate.* *lm_head"
python3 quantize_quark.py --model_dir $MODEL_DIR \
                          --quant_scheme w_mxfp4_a_mxfp4 \
                          --num_calib_data 128 \
                          --exclude_layers $exclude_layers \
                          --multi_gpu \
                          --quant_algo autosmoothquant \
                          --model_export hf_format \
                          --output_dir amd/DeepSeek-R1-0528-MXFP4-ASQ

Deployment

This model can be deployed efficiently using the SGLang and vLLM backends.

Evaluation

The model was evaluated on AIME24, GPQA Diamond, MATH-500, and GSM8K benchmarks. The tasks of AIME24, GPQA Diamond, and MATH-500 were conducted using lighteval with 10 rounds for different generation seeds. GSM8K was conducted using lm-eval-harness.

Accuracy

Benchmark DeepSeek-R1-0528 DeepSeek-R1-0528-MXFP4-ASQ(this model) Recovery
AIME24 88.00 87.67 99.62%
GPQA Diamond 79.90 79.65 99.69%
MATH-500 97.06 96.90 99.84%
GSM8K 95.30 95.18 99.87%

Reproduction

The results of AIME24, MATH-500 and GPQA Diamond were obtained using forked lighteval and vLLM docker (emulation qdq) rocm/vllm-private:pytorch-vllm-gfx950-mxfp4-mxfp6-v3.

# Set docker env
export VLLM_QUARK_F4F6_OFFLINE_DEQUANT_TMPENVVAR=1

# Set model args
MODEL_ARGS="model_name=amd/DeepSeek-R1-0528-MXFP4-ASQ,dtype=bfloat16,tensor_parallel_size=8,max_model_length=71536,max_num_batched_tokens=32768,gpu_memory_utilization=0.85,generation_parameters={max_new_tokens:65536,temperature:0.6,top_p:0.95,seed:$SEED}"
OUTPUT_DIR="results/DeepSeek-R1-0528-MXFP4-ASQ-Seed"
LOG="logs/deepseek_0528_maxfp4.log"

# Evaluating 10 rounds 
for i in $(seq 1 10); do
    # seed in [0, 2**30 - 1]
    SEED=$(shuf -i 0-1073741823 -n 1)

    lighteval vllm $MODEL_ARGS "custom|aime24_single|0|0,custom|math_500_single|0|0,custom|gpqa:diamond_single|0|0" \
        --use-chat-template \
        --output-dir "$OUTPUT_DIR/seed_$SEED" \
        2>&1 | tee -a "$LOG"

The result of GSM8K was obtained using lm-eval-harness and SGLang docker.

# Launching server
SGLANG_USE_AITER=1 python -m sglang.launch_server \
    --model-path $MODEL_DIR \
    --tp 8 \
    --port 8000 \
    --attention-backend aiter

# Evaluating
MODEL_ARGS="model=amd/DeepSeek-R1-0528-MXFP4-ASQ,base_url=http://localhost:8000/v1/completions,num_concurrent=999999,timeout=999999,tokenized_requests=False,max_length=38768,temperature=0.6,top_p=0.95,add_bos_token=True,seed=$SEED"
lm_eval \
    --model local-completions \
    --model_args $MODEL_ARGS \
    --tasks gsm8k \
    --num_fewshot 8 \
    --batch_size auto

License

Modifications Copyright(c) 2025 Advanced Micro Devices, Inc. All rights reserved.