Update README.md
README.md
CHANGED
@@ -23,48 +23,79 @@ and is comparable with Mistral-7B-v0.1 on MMLU and MT-Bench in English.
- **Model type:** Causal decoder-only transformer language model
- **Language:** English and Traditional Chinese (zh-tw)

-## Base Model Performance
-
-| Models                                | TMMLU+ (ACC) | DRCD (EM) | Table (ACC) | MMLU (ACC) |
-|---------------------------------------|--------------|-----------|-------------|------------|
-| MediaTek-Research/Breeze-7B-Base-v0.1 |              |           |             |            |
-| mistralai/Mistral-7B-v0.1             |              |           |             |            |
-| yentinglin/Taiwan-LLM-7B-v2.1-base    |              |           |             |            |
-| yentinglin/Taiwan-LLM-13B-v2.0-base   |              |           |             |            |
-| 01-ai/Yi-6B                           |              |           |             |            |
-| 01-ai/Yi-34B                          |              |           |             |            |
-| Qwen/Qwen-7B                          |              |           |             |            |
-| Qwen/Qwen-14B                         |              |           |             |            |
-
-## Inference Performance
-
-| Models                                | Speed (char/sec) | Compression Ratio | Max Character Size |
-|---------------------------------------|------------------|-------------------|--------------------|
-| MediaTek-Research/Breeze-7B-Base-v0.1 |                  |                   |                    |
-| mistralai/Mistral-7B-v0.1             |                  |                   |                    |
-| yentinglin/Taiwan-LLM-7B-v2.1-base    |                  |                   |                    |
-| yentinglin/Taiwan-LLM-13B-v2.0-base   |                  |                   |                    |
-| 01-ai/Yi-6B                           |                  |                   |                    |
-| 01-ai/Yi-34B                          |                  |                   |                    |
-| Qwen/Qwen-7B                          |                  |                   |                    |
-| Qwen/Qwen-14B                         |                  |                   |                    |
-
-## Chat Model Performance
-
-| Models                                    | TMMLU+ (ACC) | DRCD (EM) | MT-Bench-tw (Score) | MMLU (ACC) | MT-Bench (Score) |
-|-------------------------------------------|--------------|-----------|---------------------|------------|------------------|
-|                                           | 5 shot       | 3 shot    | 0 shot              | 5 shot     | 0 shot           |
-| MediaTek-Research/Breeze-7B-Instruct-v0.1 |              |           |                     |            |                  |
-| mistralai/Mistral-7B-Instruct-v0.1        |              |           |                     |            |                  |
-| yentinglin/Taiwan-LLM-7B-v2.1-chat        |              |           |                     |            |                  |
-| yentinglin/Taiwan-LLM-13B-v2.0-chat       |              |           |                     |            |                  |
-| 01-ai/Yi-6B-Chat                          |              |           |                     |            |                  |
-| 01-ai/Yi-34B-Chat                         |              |           |                     |            |                  |
-| Qwen/Qwen-7B-Chat                         |              |           |                     |            |                  |
-| Qwen/Qwen-14B-Chat                        |              |           |                     |            |                  |
-| gpt-3.5-turbo-0613                        |              | 76.30     |                     |            |                  |
+## Base Model Performance
+
+| Models                                                                                  | Size | TMMLU+ (ACC)  | DRCD (EM)     | Table (ACC)   | MMLU (ACC)    |
+|------------------------------------------------------------------------------------------|------|---------------|---------------|---------------|---------------|
+|                                                                                          |      | TC, Knowledge | TC, Reasoning | TC, Reasoning | EN, Knowledge |
+|                                                                                          |      | 5 shot        | 3 shot        | 5 shot        | 5 shot        |
+| [Yi-34B](https://huggingface.co/01-ai/Yi-34B)                                            | 34B  | 63.10         | 84.57         | 49.31         | 77.42         |
+| [Qwen-14B](https://huggingface.co/Qwen/Qwen-14B)                                         | 14B  | 51.30         | 16.95 *       | 50.69         | 68.83         |
+| [Yi-6B](https://huggingface.co/01-ai/Yi-6B)                                              | 6B   | 49.63         | 76.61         | 34.72         | 65.35         |
+| [Qwen-7B](https://huggingface.co/Qwen/Qwen-7B)                                           | 7B   | 42.84         | 0.0 *         | 39.58         | 61.00         |
+| [**Breeze-7B-Base-v0.1**](https://huggingface.co/MediaTek-Research/Breeze-7B-Base-v0.1) | 7B   | 40.35         | 81.13         | 28.47         | 61.63         |
+| [Mistral-7B-v0.1](https://huggingface.co/mistralai/Mistral-7B-v0.1)                      | 7B   | 36.93         | 79.27         | 27.78         | 64.89         |
+
+\* Few-shot prompting cannot effectively guide these models to generate the proper answer.
+
+| Category ACC of TMMLU+ (5 shot) | STEM  | Social Science | Humanities | Other |
+|---------------------------------|-------|----------------|------------|-------|
+| Yi-34B                          | 56.03 | 73.06          | 61.12      | 62.19 |
+| Qwen-14B                        | 46.51 | 58.20          | 51.12      | 49.38 |
+| Yi-6B                           | 41.14 | 57.77          | 50.22      | 49.39 |
+| Qwen-7B                         | 28.25 | 47.80          | 43.14      | 42.17 |
+| **Breeze-7B-Base-v0.1**         | 35.74 | 46.08          | 40.29      | 39.27 |
+| Mistral-7B-v0.1                 | 33.01 | 42.23          | 35.86      | 37.63 |
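
The "5 shot" / "3 shot" / "0 shot" labels in these tables refer to how many solved examples are prepended to each test question. As a minimal sketch of how such a few-shot multiple-choice prompt can be assembled (the field names and formatting are illustrative assumptions, not the evaluation harness actually used for the scores above):

```python
# Hypothetical few-shot prompt builder for TMMLU+/MMLU-style multiple choice.
# The exact template behind the numbers above is not shown in this README.

def format_item(item: dict, with_answer: bool) -> str:
    """Render one multiple-choice question, optionally with its gold answer."""
    choices = "\n".join(f"{label}. {text}" for label, text in item["choices"].items())
    answer = item["answer"] if with_answer else ""
    return f"Question: {item['question']}\n{choices}\nAnswer: {answer}"

def build_prompt(examples: list[dict], target: dict, shots: int = 5) -> str:
    """Prepend `shots` solved examples, then the unsolved target question."""
    demos = "\n\n".join(format_item(x, with_answer=True) for x in examples[:shots])
    return demos + "\n\n" + format_item(target, with_answer=False)
```

Accuracy (ACC) is then the fraction of questions whose generated option letter matches the answer key.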
+## Chat Model Performance
+
+| Models | Size | TMMLU+ (ACC) | TMMLU+ (ACC) | DRCD (EM) | Table (ACC) | MT-Bench-tw (Score) | MMLU (ACC) | MMLU (ACC) | MT-Bench (Score) |
+|--------|------|--------------|--------------|-----------|-------------|---------------------|------------|------------|------------------|
+|        |      | TC, Knowledge | TC, Knowledge | TC, Reasoning | TC, Reasoning | TC, Chat | EN, Knowledge | EN, Knowledge | EN, Chat |
+|        |      | 0 shot | 5 shot | 3 shot | 0 shot | 0 shot | 0 shot | 5 shot | 0 shot |
+| [Yi-34B-Chat](https://huggingface.co/01-ai/Yi-34B-Chat) | 34B | 54.87 | | | 36.81 | 6.9 | 71.04 | | 7.6 |
+| [Qwen-14B-Chat](https://huggingface.co/Qwen/Qwen-14B-Chat) | 14B | 48.41 | | | 41.67 | 6.4 | 64.91 | | 7.2 |
+| [Yi-6B-Chat](https://huggingface.co/01-ai/Yi-6B-Chat) | 6B | 44.79 | | | 25.69 | 5.0 | 59.45 | | 6.0 |
+| [gpt-3.5-turbo](https://openai.com) | | 41.76 | | | | 7.1 | 70.00 | | 7.9 |
+| [**Breeze-7B-Instruct-v0.1**](https://huggingface.co/MediaTek-Research/Breeze-7B-Instruct-v0.1) | 7B | 41.61 | | | 45.83 | 5.7 | 63.26 | | 7.1 |
+| [**Breeze-7B-Instruct-64k-v0.1**](https://huggingface.co/MediaTek-Research/Breeze-7B-Instruct-64k-v0.1) | 7B | 40.99 | | | 36.11 | 5.5 | 63.68 | | 7.1 |
+| [Qwen-7B-Chat](https://huggingface.co/Qwen/Qwen-7B-Chat) | 7B | 40.02 | | | 33.33 | 5.4 | 55.94 | | 6.2 |
+| [Taiwan-LLM-13B-v2.0-chat](https://huggingface.co/yentinglin/Taiwan-LLM-13B-v2.0-chat) | 13B | 29.47 | | | 23.61 | 5.0 | 50.50 | | -* |
+| [Taiwan-LLM-7B-v2.1-chat](https://huggingface.co/yentinglin/Taiwan-LLM-7B-v2.1-chat) | 7B | 28.08 | | | 31.25 | 4.2 | 42.72 | | -* |
+
+\* Taiwan-LLM models respond to multi-turn questions (English) in Traditional Chinese.
+
+| Category ACC of TMMLU+ (0 shot) | STEM  | Social Science | Humanities | Other |
+|---------------------------------|-------|----------------|------------|-------|
+| Yi-34B-Chat                     | 47.65 | 64.25          | 52.73      | 54.91 |
+| Qwen-14B-Chat                   | 43.83 | 55.00          | 48.55      | 46.22 |
+| Yi-6B-Chat                      | 37.80 | 51.74          | 45.36      | 44.25 |
+| gpt-3.5-turbo                   | 41.56 | 46.72          | 36.73      | 42.03 |
+| **Breeze-7B-Instruct-v0.1**     | 37.41 | 46.81          | 42.06      | 40.16 |
+| **Breeze-7B-Instruct-64k-v0.1** | 37.88 | 46.35          | 40.31      | 39.40 |
+| Qwen-7B-Chat                    | 35.44 | 46.22          | 38.35      | 40.06 |
+| Taiwan-LLM-13B-v2.0-chat        | 27.74 | 33.69          | 27.03      | 29.43 |
+| Taiwan-LLM-7B-v2.1-chat         | 25.58 | 31.76          | 27.36      | 27.61 |
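
The DRCD (EM) columns use exact-match scoring: a predicted answer span counts only if it equals a gold answer after normalization. A minimal sketch, where the normalization rule is an assumption rather than the official DRCD scorer:

```python
import re

# Assumed normalization: lowercase, then drop whitespace plus common
# ASCII and fullwidth punctuation before comparing strings.
_PUNCT = r"[\s\.,;:!\?\u3002\uff0c\uff1b\uff1a\uff01\uff1f]"

def normalize(text: str) -> str:
    return re.sub(_PUNCT, "", text.lower())

def exact_match(prediction: str, references: list[str]) -> bool:
    """EM gives credit only for an exact (normalized) string match."""
    return any(normalize(prediction) == normalize(ref) for ref in references)

# Corpus-level EM is the mean of exact_match over all test questions.
```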
+## Inference Performance
+
+In this test, we use the first 700 characters of a [web article](https://health.udn.com/health/story/5976/7699252?from=udn_ch1005_main_index) as the input and ask the model to write the same article again.
+All models were run with `vllm` on 2 A6000 GPUs (tensor parallelism = 2).
+
+| Models                          | Inference Time (sec) | Estimated Max Input Length (TC Char) |
+|---------------------------------|----------------------|--------------------------------------|
+| Yi-6B                           | 10.62                | 5.2k                                 |
+| **Breeze-7B-Instruct-v0.1**     | 10.74                | 11.1k                                |
+| **Breeze-7B-Instruct-64k-v0.1** | 10.74                | 88.8k                                |
+| Qwen-7B                         | 10.86                | 9.8k                                 |
+| Qwen-14B                        | 18.89                | 9.8k                                 |
+| Mistral-7B-v0.1                 | 20.48                | 5.1k                                 |
+| Taiwan-LLM-7B-v2.1-base         | 26.26                | 2.2k                                 |
+| Taiwan-LLM-13B-v2.0-base        | 36.80                | 2.2k                                 |
+| Yi-34B                          | 43.71                | 4.5k                                 |
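
A rough sketch of how one such timing run could be reproduced; the model name and `tensor_parallel_size=2` follow the setup above, while the input file, instruction wording, and sampling settings are assumptions:

```python
import time
from vllm import LLM, SamplingParams

# Shard the model across two GPUs, matching the 2 x A6000 (TP=2) setup.
llm = LLM(model="MediaTek-Research/Breeze-7B-Instruct-v0.1",
          tensor_parallel_size=2)

# Placeholder file: the first 700 Traditional Chinese characters of the article.
article = open("article_700_chars.txt", encoding="utf-8").read()
prompt = article + "\n請再寫一次這篇文章。\n"  # "Write the same article again."

params = SamplingParams(temperature=0.0, max_tokens=1024)  # assumed settings

start = time.perf_counter()
outputs = llm.generate([prompt], params)
print(f"inference time: {time.perf_counter() - start:.2f} sec")
print(outputs[0].outputs[0].text[:200])
```

The "Estimated Max Input Length (TC Char)" column reflects tokenizer efficiency on Traditional Chinese: a model whose tokenizer packs more characters into each token fits a longer document into the same context window.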
## Use in Transformers
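
A minimal sketch of loading the instruct model with the standard `transformers` API; the prompt and generation settings below are illustrative assumptions, not necessarily the repository's exact snippet:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "MediaTek-Research/Breeze-7B-Instruct-v0.1"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # assumed dtype; fp16 also works on A6000-class GPUs
    device_map="auto",           # requires `accelerate`
)

# Example Traditional Chinese prompt: "What fun sights are there in Taipei?"
inputs = tokenizer("台北有什麼好玩的景點?", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```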