DeepSeek-V3.1-int4-AutoRound / README.md

n1ck-guo

Update README.md

2f78351 verified about 1 month ago


base_model:
  - deepseek-ai/DeepSeek-V3.1
pipeline_tag: text-generation

## Model Details

This is an int4 model of deepseek-ai/DeepSeek-V3.1 with group_size 128 and symmetric quantization, generated by intel/auto-round. Please follow the license of the original model.
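
To make "int4, group_size 128, symmetric" concrete, here is a minimal sketch of symmetric group-wise quantization in plain Python. It uses simple round-to-nearest as a stand-in and is not AutoRound's method: AutoRound tunes the rounding with signed gradient descent (see the citation at the end). The helper names `quantize_group_sym`, `quantize_row`, and `dequantize_row` are hypothetical and exist only for this illustration.

```python
import math


def quantize_group_sym(weights, bits=4):
    """Symmetric round-to-nearest quantization of one group of weights.

    Symmetric means there is no zero point: a weight w is stored as an
    integer code q and recovered as q * scale.
    """
    qmax = 2 ** (bits - 1) - 1      # 7 for int4
    qmin = -(2 ** (bits - 1))       # -8 for int4
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    codes = [min(qmax, max(qmin, round(w / scale))) for w in weights]
    return codes, scale


def quantize_row(row, group_size=128, bits=4):
    """Quantize a weight row in chunks of group_size, one scale per chunk."""
    return [
        quantize_group_sym(row[i:i + group_size], bits)
        for i in range(0, len(row), group_size)
    ]


def dequantize_row(groups):
    """Rebuild approximate float weights from (codes, scale) pairs."""
    return [code * scale for codes, scale in groups for code in codes]


# Toy row: the second half has 10x larger magnitudes than the first half.
row = [math.sin(i) * (1.0 if i < 128 else 10.0) for i in range(256)]
groups = quantize_row(row)
restored = dequantize_row(groups)
max_err = max(abs(a - b) for a, b in zip(row, restored))
print(len(groups), [round(s, 4) for _, s in groups], round(max_err, 4))
```

Because each group of 128 weights has its own scale, the small-magnitude first half of the toy row is not forced onto the coarse grid that the large-magnitude second half needs.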

## How To Use

### INT4 Inference

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

quantized_model_dir = "Intel/DeepSeek-V3.1-int4-AutoRound"

model = AutoModelForCausalLM.from_pretrained(
    quantized_model_dir,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained(quantized_model_dir, trust_remote_code=True)
prompts = [
    "9.11和9.8哪个数字大",  # "Which number is larger, 9.11 or 9.8?"
    "strawberry中有几个r?",  # "How many r's are in strawberry?"
    "There is a girl who likes adventure,",
    "Please give a brief introduction of DeepSeek company.",
]

# Wrap each prompt in the chat template before tokenizing the whole batch.
texts = []
for prompt in prompts:
    messages = [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": prompt},
    ]
    text = tokenizer.apply_chat_template(
        messages,
        tokenize=False,
        add_generation_prompt=True,
    )
    texts.append(text)
inputs = tokenizer(texts, return_tensors="pt", padding=True, truncation=True)

outputs = model.generate(
    input_ids=inputs["input_ids"].to(model.device),
    attention_mask=inputs["attention_mask"].to(model.device),
    max_length=200,  # counts prompt + generated tokens; change this to align with the official usage
    num_return_sequences=1,
    do_sample=False,  # change this to align with the official usage
)
# Drop the prompt tokens so that only the newly generated tokens are decoded.
generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(inputs["input_ids"], outputs)
]
decoded_outputs = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)

for i, prompt in enumerate(prompts):
    print(f"Prompt: {prompt}")
    print(f"Generated: {decoded_outputs[i]}")
    print("-" * 50)
"""
Prompt: 9.11和9.8哪个数字大
Generated: 9.11 和 9.8 相比，**9.11 更大**。
- 9.11 可以理解为 9.11
- 9.8 可以理解为 9.80
比较小数点后第二位：1（来自9.11）大于 0（来自9.80），因此 9.11 > 9.8。
--------------------------------------------------
Prompt: strawberry中有几个r?
Generated: 在英文单词 "strawberry" 中，字母 "r" 出现了 **3 次**。
- 位置：第 3 个字母（s**t r**awberry）、第 6 个字母（stra**w b**erry 中的 "r" 实际是第 6 个字符，但注意 "w" 后是 "b"，这里需要仔细数）
实际上：
- 分解：s-t-r-a-w-b-e-r-r-y
- 字母 "r" 出现在第 3、第 8 和第 9 位（索引从 1 开始）。

所以，**"strawberry" 包含 3 个 "r"**。
--------------------------------------------------
Prompt: There is a girl who likes adventure,
Generated: Of course! Here are a few ways to imagine what that could look like, from a simple story to a character profile.

### A Short Story Snippet

The map was old, the edges frayed and the ink faded in places. Ella traced the route with her finger for the hundredth time, her heart beating a rhythm of pure excitement. It wasn't just a path to a hidden waterfall; it was a path to *discovery*.

She packed her bag not with fancy clothes, but with a well-worn compass, a rope, a water bottle, and her trusted journal. The forest welcomed her with the smell of damp earth and pine. Every rustle in the undergrowth was a mystery, every unfamiliar bird call a secret she was determined to learn.

As she reached the cliff face she needed to climb, a thrill, not fear, shot through her. She
--------------------------------------------------
Prompt: Please give a brief introduction of DeepSeek company.
Generated: Of course. Here is a brief introduction to DeepSeek.

**DeepSeek** is a leading Chinese AI research company focused on developing powerful artificial intelligence models, with a primary emphasis on large language models (LLMs) and multimodal systems.

Here are the key points about the company:

*   **Core Focus:** They are best known for their **DeepSeek-V2** and the more recent **DeepSeek-V3** models, which are highly capable LLMs that compete with other top-tier models like GPT-4. They specialize in both closed and open-source AI.
*   **Open-Source Contribution:** DeepSeak has made significant contributions to the open-source community. They have released powerful models like **DeepSeek-Coder** (focused on code generation and programming tasks) and the weights for earlier versions of their LLMs, allowing developers and researchers worldwide
--------------------------------------------------
"""
```
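
The inference script decodes only the continuation by slicing each output row at the length of its input row. The toy sketch below shows that step with plain lists. The token IDs are made up, the padding ID `PAD` and the helper `strip_prompt` are hypothetical, and left padding is assumed so that every prompt row in the batch has the same length.

```python
PAD = 0  # hypothetical padding token ID

# Two prompts padded to a common length (left padding assumed).
prompt_ids = [
    [PAD, PAD, 11, 12],
    [21, 22, 23, 24],
]

# For decoder-only models, generate() returns each prompt row followed by
# its continuation, so the prompt has to be cut off before decoding.
output_ids = [
    [PAD, PAD, 11, 12, 101, 102],
    [21, 22, 23, 24, 201, 202],
]


def strip_prompt(prompt_rows, output_rows):
    """Keep only the tokens that come after each (padded) prompt row."""
    return [out[len(inp):] for inp, out in zip(prompt_rows, output_rows)]


new_tokens = strip_prompt(prompt_ids, output_ids)
print(new_tokens)
```

Since `max_length` counts prompt and generated tokens together, a longer prompt leaves fewer tokens for the answer, which is one reason the sample outputs above stop mid-sentence.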

### Generate the model
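
The script in this section places each routed expert's linear layers on one of four GPUs according to the expert index and keeps every other linear layer on `cuda:0`. A minimal standalone sketch of that placement rule follows. The helper `device_for` is hypothetical, and the module names in the examples are illustrative assumptions about the layer naming.

```python
def device_for(name):
    """Pick a GPU for a linear layer from its name inside model.model.layers.

    Routed experts are bucketed by expert index (the second-to-last name
    component); shared experts, attention, and dense MLP layers stay on cuda:0.
    """
    if "experts" in name and "shared_experts" not in name:
        expert_id = int(name.split(".")[-2])
        if expert_id < 63:
            return "cuda:1"
        if expert_id < 128:
            return "cuda:2"
        if expert_id < 192:
            return "cuda:3"
        return "cuda:4"
    return "cuda:0"


# Illustrative names only; the real names come from block.named_modules().
examples = [
    "3.mlp.experts.5.gate_proj",
    "3.mlp.experts.63.up_proj",
    "10.mlp.experts.200.down_proj",
    "3.mlp.shared_experts.up_proj",
    "3.self_attn.o_proj",
]
for name in examples:
    print(name, "->", device_for(name))
```

Five visible GPUs (`cuda:0` to `cuda:4`) are needed for this mapping. With a different GPU count, the index thresholds would have to be adjusted accordingly.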

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
import transformers

model_name = "deepseek-ai/DeepSeek-V3.1"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, trust_remote_code=False, torch_dtype="auto")

block = model.model.layers
device_map = {}

for n, m in block.named_modules():
    if isinstance(m, (torch.nn.Linear, transformers.modeling_utils.Conv1D)):
        # Spread routed experts over cuda:1-4 by expert index; keep everything
        # else (attention, shared experts, dense MLPs) on cuda:0.
        if "experts" in n and "shared_experts" not in n:
            expert_id = int(n.split('.')[-2])
            if expert_id < 63:
                device = "cuda:1"
            elif expert_id < 128:
                device = "cuda:2"
            elif expert_id < 192:
                device = "cuda:3"
            else:
                device = "cuda:4"
        else:
            device = "cuda:0"
        # Drop the leading layer index so the key is relative to a single block.
        # (n[2:] only handled single-digit layer indices correctly.)
        n = n.split('.', 1)[1]

        device_map.update({n: device})


from auto_round import AutoRound

autoround = AutoRound(model=model, tokenizer=tokenizer, device_map=device_map, nsamples=512,
                      batch_size=4, low_gpu_mem_usage=True, seqlen=2048,
                      )
autoround.quantize_and_save(format="auto_round", output_dir="tmp_autoround")
```

## Ethical Considerations and Limitations

The model can produce factually incorrect output, and should not be relied on to produce factually accurate information. Because of the limitations of the pretrained model and the finetuning datasets, it is possible that this model could generate lewd, biased or otherwise offensive outputs.

Therefore, before deploying any applications of the model, developers should perform safety testing.

## Caveats and Recommendations

Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model.

Here are a couple of useful links to learn more about Intel's AI software:

Intel Neural Compressor link

## Disclaimer

The license on this model does not constitute legal advice. We are not responsible for the actions of third parties who use this model. Please consult an attorney before using this model for commercial purposes.

## Cite

@article{cheng2023optimize,
  title={Optimize weight rounding via signed gradient descent for the quantization of llms},
  author={Cheng, Wenhua and Zhang, Weiwei and Shen, Haihao and Cai, Yiyang and He, Xin and Lv, Kaokao and Liu, Yi},
  journal={arXiv preprint arXiv:2309.05516},
  year={2023}
}

arxiv github