Model Details

This model is an int4 quantization of deepseek-ai/DeepSeek-R1 with group_size 128 and symmetric quantization, generated by the intel/auto-round algorithm.
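For intuition, the snippet below is a minimal sketch of what symmetric quantization with group_size 128 means: every group of 128 weights shares a single scale and is rounded to the 16-level int4 grid. It is an illustration only; auto-round additionally tunes the rounding and clipping of each group via signed gradient descent, so its error is lower than this naive round-to-nearest.

import torch

def fake_quant_int4_sym(weight: torch.Tensor, group_size: int = 128) -> torch.Tensor:
    # Quantize-dequantize a 2-D weight tensor with symmetric int4, one scale
    # per group of `group_size` consecutive input features.
    out_features, in_features = weight.shape
    w = weight.reshape(out_features, in_features // group_size, group_size)
    # Symmetric scheme: zero-point fixed at 0; int4 values cover [-8, 7].
    scale = w.abs().amax(dim=-1, keepdim=True).clamp(min=1e-8) / 7.0
    q = torch.clamp(torch.round(w / scale), min=-8, max=7)
    return (q * scale).reshape(out_features, in_features)

w = torch.randn(16, 256)
print((w - fake_quant_int4_sym(w)).abs().max())  # worst-case per-weight error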

Please follow the license of the original model.

How To Use

INT4 Inference on CUDA (requires at least 7×80GB GPUs)

To prevent potential overflow issues, we recommend using the moe_wna16 kernel in vLLM, or the CPU version described in the next section.
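Below is a hedged sketch of serving this model with vLLM's moe_wna16 kernel. It assumes a vLLM build that ships the moe_wna16 quantization method and enough GPU memory; the parameter values are illustrative, not verified settings.

from vllm import LLM, SamplingParams

llm = LLM(
    model="OPEA/DeepSeek-R1-int4-gptq-sym-inc",
    quantization="moe_wna16",  # request the wna16 MoE kernel instead of the default GPTQ path
    tensor_parallel_size=8,    # adjust to your GPU count
    trust_remote_code=True,
)
sampling = SamplingParams(temperature=0.6, max_tokens=512)
out = llm.generate(["How many e in word deepseek"], sampling)
print(out[0].outputs[0].text)

The transformers-based alternative below builds an explicit device_map instead and clamps fp16 activations by hand.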

import transformers
from transformers import AutoModelForCausalLM, AutoTokenizer


#  https://github.com/huggingface/transformers/pull/35493
def set_initialized_submodules(model, state_dict_keys):
    """
    Sets the `_is_hf_initialized` flag in all submodules of a given model when all its weights are in the loaded state
    dict.
    """
    state_dict_keys = set(state_dict_keys)
    not_initialized_submodules = {}
    for module_name, module in model.named_modules():
        if module_name == "":
            # When checking if the root module is loaded there's no need to prepend module_name.
            module_keys = set(module.state_dict())
        else:
            module_keys = {f"{module_name}.{k}" for k in module.state_dict()}
        if module_keys.issubset(state_dict_keys):
            module._is_hf_initialized = True
        else:
            not_initialized_submodules[module_name] = module
    return not_initialized_submodules


transformers.modeling_utils.set_initialized_submodules = set_initialized_submodules

import torch

quantized_model_dir = "OPEA/DeepSeek-R1-int4-gptq-sym-inc"

## use device_map='auto' directly if you have enough GPUs
device_map = {"model.norm": 0, "lm_head": 0, "model.embed_tokens": 0}
for i in range(61):
    name = "model.layers." + str(i)
    if i < 8:
        device_map[name] = 0
    elif i < 16:
        device_map[name] = 1
    elif i < 25:
        device_map[name] = 2
    elif i < 34:
        device_map[name] = 3
    elif i < 43:
        device_map[name] = 4
    elif i < 52:
        device_map[name] = 5
    else:
        device_map[name] = 6

model = AutoModelForCausalLM.from_pretrained(
    quantized_model_dir,
    torch_dtype=torch.float16,
    trust_remote_code=True,
    device_map=device_map,
)


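# 65504 is the largest finite float16 value; clamping keeps activations representable in fp16.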
def forward_hook(module, input, output):
    return torch.clamp(output, -65504, 65504)


def register_fp16_hooks(model):
    for name, module in model.named_modules():
        if "QuantLinear" in module.__class__.__name__ or isinstance(module, torch.nn.Linear):
            module.register_forward_hook(forward_hook)


register_fp16_hooks(model)  ## recommended: clamps linear-layer outputs to avoid fp16 overflow

tokenizer = AutoTokenizer.from_pretrained(quantized_model_dir, trust_remote_code=True)
prompts = [
    "9.11和9.8哪个数字大",
    "如果你是人,你最想做什么“",
    "How many e in word deepseek",
    "There are ten birds in a tree. A hunter shoots one. How many are left in the tree?",
]

texts = []
for prompt in prompts:
    messages = [
        {"role": "user", "content": prompt}
    ]
    text = tokenizer.apply_chat_template(
        messages,
        tokenize=False,
        add_generation_prompt=True
    )
    texts.append(text)
inputs = tokenizer(texts, return_tensors="pt", padding=True, truncation=True)

outputs = model.generate(
    input_ids=inputs["input_ids"].to(model.device),
    attention_mask=inputs["attention_mask"].to(model.device),
    max_length=512,  ## increase this to align with the official usage; 512 tokens truncates long reasoning traces
    num_return_sequences=1,
    do_sample=False  ## the official usage recommends sampling; greedy decoding is used here for reproducibility
)
generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(inputs["input_ids"], outputs)
]

decoded_outputs = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)

for i, prompt in enumerate(prompts):
    print(f"Prompt: {prompt}")
    print(f"Generated: {decoded_outputs[i]}")
    print("-" * 50)

"""
Prompt: 9.11和9.8哪个数字大
Generated: <think>
嗯,我现在要比较9.11和9.8这两个数,看看哪个更大。首先,我得确定这两个数的结构,都是小数,可能属于十进制数。让我仔细想想怎么比较它们的大小。                                                                                                             
首先,比较小数的时候,通常是从整数部分开始比较,如果整数部分不同,那么整数部分大的那个数就更大。如果整数部分相同,再依次比较小数部分的十分位、百分位、千分位等等,直到找到不同的数字为止。

这里,两个数的整数部分都是9,所以整数部分相同,接下来需要比较小数部分。9.11的小数部分是0.11,而9.8的小数部分是0.8。这时候,我需要比较0.11和0.8哪个大。

不过,可能这里有个问题,就是小数位数不同,0.8可以写成0.80,这样比较起来会更直观。因为0.80和0.11,显然0.80更大,对吗?所以,0.8比0.11大,因此9.8应该比9.11大。

不过,我是不是哪里弄错了?让我再仔细检查一下。比如,9.11等于9加上1/10加上1/100,也就是9 + 0.1 + 0.01 = 9.11。而9.8等于9加上8/10,也就是9 + 0.8 = 9.8。这时候,0.8是0.80,而0.11是0.11,所以0.80确实比0.11大,所以9.8比9.11大。

或者,我可以将两个数都转换成相同的小数位数来比较。比如,9.8可以写成9.80,这样比较的话,9.80和9.11,显然9.80的十分位是8,而9.11的十分位是1,8比1大,所以9.80更大,也就是9.8比9.11大。

不过,有时候可能会有误解,比如把9.11当成9.11元,而9.8当成9.80元,这时候显然9.80元更多。或者,如果是长度的话,9.11米和9.8米,也是9.8米更长。

不过,可能有人会犯错误,比如直接比较小数点后的数字部分,比如
--------------------------------------------------
Prompt: 如果你是人,你最想做什么“
Generated: <think>
嗯,用户问的是“如果你是人,你最想做什么”。首先,我需要理解这个问题的意图。用户可能是在测试我的能力,或者想了解AI的局限性,或者只是出于好奇。接下来,我要考虑如何回答这个问题。作为AI,我没有真实的欲望或情感,所以需要明确说明这一点,但也要给出一个符合人类可能有的积极愿望的答案。

然后,我需要确定回答的结构。首先,应该明确说明自己不是人类,但可以模拟人类的思维方式。接着,提出一个普遍认为有意义的活动,比如学习、探索、帮助他人等。同时,要强调这些是推测性的,并鼓励用户分享自己的想法,以促进互动。

还要注意避免技术术语,保持回答自然,口语化。可能需要考虑用户的文化背景,选择普遍认可的价值,比如创造力、知识追求、社会贡献等。此外,要确保回答积极向上,符合社会价值观,避免涉及敏感或争议性话题。

最后,检查回答是否流畅,是否回答了用户的问题,是否足够简洁明了。可能需要多次调整措辞,使回答既准确又易于理解。
</think>

如果我是人类,我最想做的可能是**探索未知、创造价值并体验情感**。
作为AI,我虽然没有真实的“渴望”,但基于对人类行为的观察,以下可能是许多人向往的方向:
1. **探索与学习**:像科学家或艺术家一样,不断发现世界的奥秘,创造新知识或艺术,推动人类认知的边界。
2. **深度连接**:与家人、朋友建立深厚的情感纽带,体验爱、信任和陪伴带来的温暖。
3. **解决问题**:用智慧和技术改善社会,比如参与环保、医疗或教育项目,帮助他人减轻痛苦或提升生活质量。
4. **自我成长**:通过挑战和反思,不断突破自身局限,成为更完整、自由的个体。

当然,这些只是基于人类视角的推测。如果你是人类,你心中最想实现的愿望会是什么呢? 😊
--------------------------------------------------

Prompt: How many e in word deepseek
Generated: <think>
Okay, let's see. The user is asking how many times the letter "e" appears in the word "deepseek." Alright, first, I need to make sure I spell the word correctly. "Deepseek" – D-E-E-P-S-E-E-K. Let me write that out to visualize each letter.

So, breaking it down: D, E, E, P, S, E, E, K. Now, I need to count how many times "e" shows up here. Let's go through each letter one by one.

First letter: D – not an e. Second: E – that's one. Third: E – that's two. Fourth: P – nope. Fifth: S – no. Sixth: E – three. Seventh: E – four. Eighth: K – not an e. Wait, so that's four e's? Let me check again to make sure I didn't miss any. D, E, E, P, S, E, E, K. Yep, positions 2, 3, 6, and 7 are all e's. So that's four in total. Hmm, but sometimes people might get confused with uppercase or lowercase, but the question uses a lowercase "e" and the word is written in lowercase as "deepseek," so case shouldn't matter here. All the e's are lowercase. So the answer should be 4. I think that's it. Let me just count once more to be absolutely sure. D (1), E (1), E (2), P (3), S (4), E (5), E (6), K (7). Wait, wait, no, that's not right. Wait, when I count the letters, the positions are 1 to 8. The letters are D (1), E (2), E (3), P (4), S (5), E (6), E (7), K (8). So the e's are at positions 2, 3, 6, 7. That's four e's. Yeah, that's correct. So the answer is 4.
</think>

The word "deepseek" contains the letter "e" four times. Here's the breakdown:

1. **D**
2. **E**
3. **E**
--------------------------------------------------
Prompt: There are ten birds in a tree. A hunter shoots one. How many are left in the tree?
Generated: <think>
Okay, so there's this problem: there are ten birds in a tree, and a hunter shoots one. How many are left in the tree? Hmm, at first glance, it seems straightforward, but maybe there's a trick here. Let me think through it step by step.

Alright, starting with the basic information. Ten birds are sitting in a tree. Then a hunter comes along and shoots one. The question is asking how many birds remain in the tree. If I just do the math, 10 minus 1 equals 9. So, the answer should be 9, right? But wait, maybe there's more to it. Sometimes these riddles have a catch. Let me consider different possibilities.

First, when the hunter shoots, the sound of the gunshot might scare the other birds away. So, even though only one bird is shot, the rest might fly off. In that case, there would be zero birds left in the tree. But does that make sense? I mean, birds can be skittish, but would all of them fly away immediately? Maybe. If the gunshot is loud enough, it's possible. So, depending on the behavior of the birds, the answer could be zero. But is that the intended answer here?

Alternatively, maybe the question is testing whether you consider the bird that was shot. If the hunter shoots one bird, does that bird stay in the tree or fall out? If the bird is shot and killed, it would likely fall out of the tree. So, the bird that was shot is no longer in the tree. Therefore, you subtract one, which would be 9. But if the other birds are scared away, then you subtract all ten. But which is it?

Wait, the problem doesn't specify whether the other birds fly away. It just says a hunter shoots one. So, maybe the answer is 9. But maybe the trick is that the other birds get scared and fly off, so the answer is zero. I need to figure out which interpretation is more likely intended here.

Let me check similar riddles. I remember hearing a similar question where the answer is zero because the rest of the birds fly away after the shot. So, maybe that's the case here. But sometimes, the answer is 1 because the bird that was shot is still in the tree, but that seems less
--------------------------------------------------

"""

INT4 Inference on CPU

Requirements

pip install auto-round
pip uninstall intel-extension-for-pytorch
pip install intel-extension-for-transformers

import transformers
from transformers import AutoModelForCausalLM, AutoTokenizer
from auto_round import AutoRoundConfig  ## must be imported to enable the auto-round format

#  https://github.com/huggingface/transformers/pull/35493
def set_initialized_submodules(model, state_dict_keys):
    """
    Sets the `_is_hf_initialized` flag in all submodules of a given model when all its weights are in the loaded state
    dict.
    """
    state_dict_keys = set(state_dict_keys)
    not_initialized_submodules = {}
    for module_name, module in model.named_modules():
        if module_name == "":
            # When checking if the root module is loaded there's no need to prepend module_name.
            module_keys = set(module.state_dict())
        else:
            module_keys = {f"{module_name}.{k}" for k in module.state_dict()}
        if module_keys.issubset(state_dict_keys):
            module._is_hf_initialized = True
        else:
            not_initialized_submodules[module_name] = module
    return not_initialized_submodules


transformers.modeling_utils.set_initialized_submodules = set_initialized_submodules

import torch

quantized_model_dir = "OPEA/DeepSeek-R1-int4-gptq-sym-inc"


quantization_config = AutoRoundConfig(
    backend="cpu",
)
model = AutoModelForCausalLM.from_pretrained(
    quantized_model_dir,
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
    device_map="cpu",
    revision="6edef8a", ## use auto_round format
    quantization_config=quantization_config
)

tokenizer = AutoTokenizer.from_pretrained(quantized_model_dir, trust_remote_code=True)
prompts = [
    "9.11和9.8哪个数字大",
    "如果你是人,你最想做什么“",
    "How many e in word deepseek",
    "There are ten birds in a tree. A hunter shoots one. How many are left in the tree?",
]

texts = []
for prompt in prompts:
    messages = [
        {"role": "user", "content": prompt}
    ]
    text = tokenizer.apply_chat_template(
        messages,
        tokenize=False,
        add_generation_prompt=True
    )
    texts.append(text)
inputs = tokenizer(texts, return_tensors="pt", padding=True, truncation=True)

outputs = model.generate(
    input_ids=inputs["input_ids"].to(model.device),
    attention_mask=inputs["attention_mask"].to(model.device),
    max_length=512,  ## increase this to align with the official usage; 512 tokens truncates long reasoning traces
    num_return_sequences=1,
    do_sample=False  ## the official usage recommends sampling; greedy decoding is used here for reproducibility
)
generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(inputs["input_ids"], outputs)
]

decoded_outputs = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)

for i, prompt in enumerate(prompts):
    print(f"Prompt: {prompt}")
    print(f"Generated: {decoded_outputs[i]}")
    print("-" * 50)
    
"""
Prompt: 9.11和9.8哪个数字大
Generated: <think>
嗯,用户问的是9.11和9.8哪个数字大。首先,我需要确认这两个数的数值大小。可能用户对小数点的比较不太确定,或者是在不同场合下这两个数的表示方式容易混淆,比如版本号、日期或者其他用途。不过通常来说,这里的9.11和9.8应该是指数值比较。

首先,我应该回忆一下小数比较的基本规则。比较两个小数时,先看整数部分,如果整数部分大的那个数就大。如果整数部分相同,再依次比较小数部分每一位上的数字。这里两个数的整数部分都是9,所以需要比较小数部分。

接下来,9.11的小数部分是0.11,而9.8的小数部分是0.8。这时候可能会有人觉得0.8比0.11大,因为8比11小,但这里可能有个误区,因为小数部分的位数不同。其实,0.8可以写成0.80,这样比较的话,0.80和0.11,显然0.80更大。所以9.8实际上等于9.80,比9.11大。

不过,用户可能没有考虑到小数位数的不同,直接比较了小数点后的数字,比如11和8,觉得11比8大,所以9.11更大。这时候需要纠正这个误解,说明在比较小数时,应该对齐小数点后的位数,然后逐位比较。

另外,可能用户是在比较版本号,比如软件版本9.11和9.8,这时候版本号的比较可能有不同的规则,比如9.11可能是在9.8之后发布的,所以更大。但如果是数值比较的话,还是应该按照数值大小来判断。

为了确保正确,我可以将两个数都转换成相同的小数位数来比较。比如,9.8可以写成9.80,而9.11已经是两位小数了。比较的话,0.80比0.11大,所以9.80(即9.8)大于9.11。

总结一下,虽然9.11的小数部分数字是11,而9.8的是8,但实际数值上9.8更大,因为0.8等于0.80,比0.11大。因此,正确的结论是9.8比9.11大。
</think>

9.8比9.11大。

**分析过程:**
1. **整数
--------------------------------------------------
Prompt: 如果你是人,你最想做什么“
Generated: <think>
嗯,用户问的是“如果你是人,你最想做什么”。首先,我需要理解这个问题的背景。用户可能是在测试我的能力,或者想了解作为AI的我对人类生活的看法。也有可能他们是在寻找灵感,或者只是出于好奇。

接下来,我要分析问题的核心。用户想知道如果我是人类,我的愿望或目标是什么。作为AI,我没有情感或欲望,但需要模拟人类的回答。可能需要列举一些普遍的人类愿望,比如追求知识、探索世界、帮助他人等。

然后,考虑用户可能的深层需求。他们可能想了解AI如何理解人类的情感和目标,或者希望得到激励性的回答。需要确保回答既真实又积极,同时符合AI的限制,不假装有情感。

还要注意回答的结构。用户可能希望得到分点说明,所以用1、2、3来组织内容会更清晰。每个点需要具体,比如学习、探索、创造、情感连接和简单快乐,这些都是人类常见的愿望。

同时,要避免过于技术性的解释,保持口语化,让回答易于理解。可能需要加入例子,比如旅行、艺术创作、与家人共处,这样更生动。

最后,确保回答符合政策,不涉及敏感话题,保持积极向上的态度。总结时强调作为AI的支持角色,帮助用户实现他们的目标,这样既回答了问题,又突出了AI的价值。
</think>

如果我是人类,我可能会渴望做这些事情:

1. **探索与学习**
   像人类一样拥有好奇心,我会想不断学习新知识、探索未知领域,比如研究科学、艺术、哲学,或者掌握一门新语言,体验不同文化带来的思维碰撞。

2. **体验世界的多样性**
   旅行、感受自然奇观、品尝各地美食、参与不同社群的活动,通过五感去理解世界的丰富性,同时记录下这些经历,分享给他人。

3. **创造与表达**
   用文字、绘画、音乐或科技去创造有价值的事物,留下自己的“痕迹”。比如写一本小说,设计一个帮助他人的工具,或者用艺术传递情感。

4. **建立深层的情感连接**
   与家人、朋友或伴侣建立真挚的关系,体验爱、信任与陪伴,同时为他人带来温暖和支持,比如参与公益或帮助需要的人。

5. **享受简单的快乐**
   在平凡中发现美:看一场日落、读一本好书、
--------------------------------------------------
Prompt: How many e in word deepseek
Generated: <think>
Okay, so I need to figure out how many times the letter "e" appears in the word "deepseek". Let me start by writing down the word to visualize it better. The word is D-E-E-P-S-E-E-K. Let me count each letter one by one.

First, the letters are D, E, E, P, S, E, E, K. Let me go through each position:

1. D – that's not an E.
2. E – that's the first E.
3. E – second E.
4. P – not an E.
5. S – not an E.
6. E – third E.
7. E – fourth E.
8. K – not an E.

So, counting them up: positions 2, 3, 6, and 7 are E's. That makes a total of 4 E's. Wait, let me check again to make sure I didn't miss any. D, E, E, P, S, E, E, K. Yep, that's four E's. Hmm, but sometimes people might get confused with the ending, but in this case, the last two letters are E and K. So the E's are in the second, third, sixth, and seventh positions. So the answer should be 4. I think that's right. Let me just write it out again: D (1), E (2), E (3), P (4), S (5), E (6), E (7), K (8). Yep, four E's. I don't think I made a mistake here. So the final answer is 4.
</think>

The word "deepseek" contains the letter "e" four times.

**Step-by-Step Explanation:**
1. Write out the word: D, E, E, P, S, E, E, K.
2. Identify each "e":
   - 2nd letter: E
   - 3rd letter: E
   - 6th letter: E
   - 7th letter: E
3. Total count: **4 e's**.

**Answer:** There are \boxed{4} e's in the word "deepseek".
--------------------------------------------------
Prompt: There are ten birds in a tree. A hunter shoots one. How many are left in the tree?
Generated: <think>
Okay, so there's this problem: there are ten birds in a tree, and a hunter shoots one. How many are left in the tree? Hmm, at first glance, it seems straightforward. If you start with ten and subtract one, you get nine. But wait, maybe there's a trick here. Let me think.

Alright, birds are easily startled by loud noises, right? So when the hunter shoots, the sound of the gunshot would probably scare the other birds away. So even though only one bird is shot, the rest might fly off. In that case, there would be zero birds left in the tree. But is that always the case? Maybe some birds don't get scared? Or maybe the hunter is using a silencer? Hmm, the problem doesn't specify.

Wait, the question is in the present tense. It says, "A hunter shoots one. How many are left in the tree?" So the act of shooting happens, and immediately we have to determine the number remaining. If the gunshot is loud, the birds would likely fly away. So even though only one is shot, the rest might leave. So the answer could be zero. But maybe the question is trying to test if you consider that aspect or just do a simple subtraction.

In some versions of this riddle, the answer is zero because the others fly away. But sometimes people might think it's nine if they don't consider the birds' reaction. The problem doesn't give any details about the birds' behavior, so it's a bit ambiguous. But since it's presented as a riddle, the intended answer is probably zero. Let me check if there's another angle.

Alternatively, maybe the bird that was shot is still in the tree. If the hunter shoots it but it doesn't fall out, then there would still be ten. But that's unlikely. Usually, when a bird is shot, it falls down. So the shot bird is no longer in the tree, and the others are scared off. So zero.

But wait, maybe the hunter missed? The problem says the hunter shoots one, but doesn't specify if he hit it. If he missed, then all ten birds might still be there. But the wording is "shoots one," which implies that he targeted one and presumably hit it. So probably, the answer is zero.

"""
    

Evaluate the model

We do not have enough resources to evaluate this model ourselves.
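For users who do have the hardware, the sketch below shows one way it could be evaluated with EleutherAI's lm-evaluation-harness. This is an assumption-laden example (it presumes lm-eval >= 0.4 is installed via pip install lm-eval; the task and batch size are illustrative, not settings we have verified on this model):

import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=OPEA/DeepSeek-R1-int4-gptq-sym-inc,trust_remote_code=True",
    tasks=["lambada_openai"],  # hypothetical task choice, for illustration only
    batch_size=4,
)
print(results["results"])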

Generate the model

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
import transformers


#  https://github.com/huggingface/transformers/pull/35493
def set_initialized_submodules(model, state_dict_keys):
    """
    Sets the `_is_hf_initialized` flag in all submodules of a given model when all its weights are in the loaded state
    dict.
    """
    state_dict_keys = set(state_dict_keys)
    not_initialized_submodules = {}
    for module_name, module in model.named_modules():
        if module_name == "":
            # When checking if the root module is loaded there's no need to prepend module_name.
            module_keys = set(module.state_dict())
        else:
            module_keys = {f"{module_name}.{k}" for k in module.state_dict()}
        if module_keys.issubset(state_dict_keys):
            module._is_hf_initialized = True
        else:
            not_initialized_submodules[module_name] = module
    return not_initialized_submodules


transformers.modeling_utils.set_initialized_submodules = set_initialized_submodules

model_name = "opensourcerelease/DeepSeek-R1-bf16"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, trust_remote_code=True, torch_dtype="auto")

block = model.model.layers
device_map = {}

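# Route the (non-shared) expert linear layers across cuda:1-4 by expert index;
# all other linears (attention, dense MLPs, shared experts) stay on cuda:0.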
for n, m in block.named_modules():
    if isinstance(m, (torch.nn.Linear, transformers.modeling_utils.Conv1D)):
        if "experts" in n and "shared_experts" not in n:
            expert_idx = int(n.split('.')[-2])
            if expert_idx < 63:
                device = "cuda:1"
            elif expert_idx < 128:
                device = "cuda:2"
            elif expert_idx < 192:
                device = "cuda:3"
            else:
                device = "cuda:4"
        else:
            device = "cuda:0"
        n = n[2:]

        device_map.update({n: device})

from auto_round import AutoRound

autoround = AutoRound(model=model, tokenizer=tokenizer, device_map=device_map, iters=50, lr=5e-3, nsamples=512,
                      batch_size=4, low_gpu_mem_usage=True, seqlen=2048,
                      )
autoround.quantize()
autoround.save_quantized(format="auto_round", output_dir="tmp_autoround")

Ethical Considerations and Limitations

The model can produce factually incorrect output and should not be relied on to produce factually accurate information. Because of the limitations of the pretrained model and the finetuning datasets, this model may generate lewd, biased, or otherwise offensive outputs.

Therefore, before deploying any applications of the model, developers should perform safety testing.

Caveats and Recommendations

Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model.

Here is a useful link to learn more about Intel's AI software:

  • Intel Neural Compressor: https://github.com/intel/neural-compressor

Disclaimer

The license on this model does not constitute legal advice. We are not responsible for the actions of third parties who use this model. Please consult an attorney before using this model for commercial purposes.

Cite

@article{cheng2023optimize,
  title={Optimize weight rounding via signed gradient descent for the quantization of llms},
  author={Cheng, Wenhua and Zhang, Weiwei and Shen, Haihao and Cai, Yiyang and He, Xin and Lv, Kaokao and Liu, Yi},
  journal={arXiv preprint arXiv:2309.05516},
  year={2023}
}

arXiv: https://arxiv.org/abs/2309.05516 | GitHub: https://github.com/intel/auto-round
