keisawada committed on
Commit 620364f · verified · 1 Parent(s): 41c6fb7

Update README.md

Files changed (1)
  1. README.md +24 -5
README.md CHANGED
@@ -3,7 +3,6 @@ thumbnail: https://github.com/rinnakk/japanese-pretrained-models/blob/master/rin
 license: apache-2.0
 language:
 - ja
-- en
 tags:
 - qwen2
 - conversational
@@ -22,9 +21,13 @@ library_name: transformers
 
 This model is a 4-bit quantized model for [rinna/qwen2.5-bakeneko-32b-instruct](https://huggingface.co/rinna/qwen2.5-bakeneko-32b-instruct) using [AutoAWQ](https://github.com/casper-hansen/AutoAWQ). The quantized version is 4x smaller than the original model and thus requires less memory and provides faster inference.
 
-| Size | Continual Pre-Training | Instruction Tuning | DeepSeek-R1 Distillation
-| :- | :- | :- | :-
-| 32B | Qwen2.5 Bakeneko 32B [[HF]](https://huggingface.co/rinna/qwen2.5-bakeneko-32b) | Qwen2.5 Bakeneko 32B Instruct [[HF]](https://huggingface.co/rinna/qwen2.5-bakeneko-32b-instruct)[[AWQ]](https://huggingface.co/rinna/qwen2.5-bakeneko-32b-instruct-awq)[[GGUF]](https://huggingface.co/rinna/qwen2.5-bakeneko-32b-instruct-gguf)[[GPTQ int8]](https://huggingface.co/rinna/qwen2.5-bakeneko-32b-instruct-gptq-int8)[[GPTQ int4]](https://huggingface.co/rinna/qwen2.5-bakeneko-32b-instruct-gptq-int4)| DeepSeek R1 Distill Qwen2.5 Bakeneko 32B [[HF]](https://huggingface.co/rinna/deepseek-r1-distill-qwen2.5-bakeneko-32b)[[AWQ]](https://huggingface.co/rinna/deepseek-r1-distill-qwen2.5-bakeneko-32b-awq)[[GGUF]](https://huggingface.co/rinna/deepseek-r1-distill-qwen2.5-bakeneko-32b-gguf)[[GPTQ int8]](https://huggingface.co/rinna/deepseek-r1-distill-qwen2.5-bakeneko-32b-gptq-int8)[[GPTQ int4]](https://huggingface.co/rinna/deepseek-r1-distill-qwen2.5-bakeneko-32b-gptq-int4)
+| Model Type | Model Name
+| :- | :-
+| Japanese Continual Pre-Training Model | Qwen2.5 Bakeneko 32B [[HF]](https://huggingface.co/rinna/qwen2.5-bakeneko-32b)
+| Instruction-Tuning Model | Qwen2.5 Bakeneko 32B Instruct [[HF]](https://huggingface.co/rinna/qwen2.5-bakeneko-32b-instruct)[[AWQ]](https://huggingface.co/rinna/qwen2.5-bakeneko-32b-instruct-awq)[[GGUF]](https://huggingface.co/rinna/qwen2.5-bakeneko-32b-instruct-gguf)[[GPTQ int8]](https://huggingface.co/rinna/qwen2.5-bakeneko-32b-instruct-gptq-int8)[[GPTQ int4]](https://huggingface.co/rinna/qwen2.5-bakeneko-32b-instruct-gptq-int4)
+| DeepSeek R1 Distill Qwen2.5 Merged Reasoning Model | DeepSeek R1 Distill Qwen2.5 Bakeneko 32B [[HF]](https://huggingface.co/rinna/deepseek-r1-distill-qwen2.5-bakeneko-32b)[[AWQ]](https://huggingface.co/rinna/deepseek-r1-distill-qwen2.5-bakeneko-32b-awq)[[GGUF]](https://huggingface.co/rinna/deepseek-r1-distill-qwen2.5-bakeneko-32b-gguf)[[GPTQ int8]](https://huggingface.co/rinna/deepseek-r1-distill-qwen2.5-bakeneko-32b-gptq-int8)[[GPTQ int4]](https://huggingface.co/rinna/deepseek-r1-distill-qwen2.5-bakeneko-32b-gptq-int4)
+| QwQ Merged Reasoning Model | QwQ Bakeneko 32B [[HF]](https://huggingface.co/rinna/qwq-bakeneko-32b)[[AWQ]](https://huggingface.co/rinna/qwq-bakeneko-32b-awq)[[GGUF]](https://huggingface.co/rinna/qwq-bakeneko-32b-gguf)[[GPTQ int8]](https://huggingface.co/rinna/qwq-bakeneko-32b-gptq-int8)[[GPTQ int4]](https://huggingface.co/rinna/qwq-bakeneko-32b-gptq-int4)
+| QwQ Bakeneko Merged Instruction-Tuning Model | Qwen2.5 Bakeneko 32B Instruct V2 [[HF]](https://huggingface.co/rinna/qwen2.5-bakeneko-32b-instruct-v2)[[AWQ]](https://huggingface.co/rinna/qwen2.5-bakeneko-32b-instruct-v2-awq)[[GGUF]](https://huggingface.co/rinna/qwen2.5-bakeneko-32b-instruct-v2-gguf)[[GPTQ int8]](https://huggingface.co/rinna/qwen2.5-bakeneko-32b-instruct-v2-gptq-int8)[[GPTQ int4]](https://huggingface.co/rinna/qwen2.5-bakeneko-32b-instruct-v2-gptq-int4)
 
 See [rinna/qwen2.5-bakeneko-32b-instruct](https://huggingface.co/rinna/qwen2.5-bakeneko-32b-instruct) for details about model architecture and data.
 
@@ -33,11 +36,27 @@ See [rinna/qwen2.5-bakeneko-32b-instruct](https://huggingface.co/rinna/qwen2.5-b
 - [Xinqi Chen](https://huggingface.co/Keely0419)
 - [Kei Sawada](https://huggingface.co/keisawada)
 
+* **Release date**
+
+    February 13, 2025
+
 ---
 
 # Benchmarking
 
-Please refer to [rinna's LM benchmark page](https://rinnakk.github.io/research/benchmarks/lm/index.html).
+| Model | Japanese LM Evaluation Harness | Japanese MT-Bench (first turn) | Japanese MT-Bench (multi turn)
+| :- | :-: | :-: | :-:
+| [Qwen/Qwen2.5-32B](https://huggingface.co/Qwen/Qwen2.5-32B) | 79.46 | - | -
+| [rinna/qwen2.5-bakeneko-32b](https://huggingface.co/rinna/qwen2.5-bakeneko-32b) | 79.18 | - | -
+| [Qwen/Qwen2.5-32B-Instruct](https://huggingface.co/Qwen/Qwen2.5-32B-Instruct) | 78.29 | 8.13 | 7.54
+| [rinna/qwen2.5-bakeneko-32b-instruct](https://huggingface.co/rinna/qwen2.5-bakeneko-32b-instruct) | 79.62 | 8.17 | 7.66
+| [rinna/qwen2.5-bakeneko-32b-instruct-v2](https://huggingface.co/rinna/qwen2.5-bakeneko-32b-instruct-v2) | 77.92 | 8.86 | 8.53
+| [deepseek-ai/DeepSeek-R1-Distill-Qwen-32B](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-32B) | 73.51 | 7.39 | 6.88
+| [rinna/deepseek-r1-distill-qwen2.5-bakeneko-32b](https://huggingface.co/rinna/deepseek-r1-distill-qwen2.5-bakeneko-32b) | 77.43 | 8.58 | 8.19
+| [Qwen/QwQ-32B](https://huggingface.co/Qwen/QwQ-32B) | 76.12 | 8.58 | 8.25
+| [rinna/qwq-bakeneko-32b](https://huggingface.co/rinna/qwq-bakeneko-32b) | 78.31 | 8.81 | 8.52
+
+For detailed benchmarking results, please refer to [rinna's LM benchmark page (Sheet 20250213)](https://rinnakk.github.io/research/benchmarks/lm/index.html).
 
 ---
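For readers who want to try the checkpoint described in this diff, here is a minimal loading sketch. It assumes the standard Hugging Face Transformers AWQ path (`transformers` with the `autoawq` kernels installed) and a CUDA GPU; the sample prompt and generation settings are illustrative and are not taken from the model card.

```python
# Minimal sketch: load rinna/qwen2.5-bakeneko-32b-instruct-awq with Transformers.
# Assumes `pip install transformers autoawq` and a CUDA GPU with enough VRAM.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "rinna/qwen2.5-bakeneko-32b-instruct-awq"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # AWQ kernels compute in fp16
    device_map="auto",          # spread layers across available GPUs
)

# Build the prompt with the model's built-in chat template.
# Sample Japanese question ("Who was Kitaro Nishida?"); this model targets Japanese.
messages = [{"role": "user", "content": "西田幾多郎とはどんな人物ですか?"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=512)
# Decode only the newly generated tokens, skipping the echoed prompt.
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```

As a rough sanity check on the "4x smaller" claim above: 32B parameters at 16 bits is about 64 GB of weights, so a 4-bit AWQ checkpoint lands on the order of 16-20 GB once quantization scales and zero points are included.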