RichardErkhov/tokyotech-llm_-_Llama-3-Swallow-70B-v0.1-gguf


RichardErkhov committed on Sep 6: "uploaded readme" (commit 437e63a, parent dae6eb3), README.md +233 -0


	@@ -0,0 +1,233 @@

+Quantization made by Richard Erkhov.
+[Github](https://github.com/RichardErkhov)
+[Discord](https://discord.gg/pvy7H8DZMG)
+[Request more models](https://github.com/RichardErkhov/quant_request)
+Llama-3-Swallow-70B-v0.1 - GGUF
+- Model creator: https://huggingface.co/tokyotech-llm/
+- Original model: https://huggingface.co/tokyotech-llm/Llama-3-Swallow-70B-v0.1/
+| Name | Quant method | Size |
+| ---- | ---- | ---- |
+| [Llama-3-Swallow-70B-v0.1.Q2_K.gguf](https://huggingface.co/RichardErkhov/tokyotech-llm_-_Llama-3-Swallow-70B-v0.1-gguf/blob/main/Llama-3-Swallow-70B-v0.1.Q2_K.gguf) | Q2_K | 24.56GB |
+| [Llama-3-Swallow-70B-v0.1.IQ3_XS.gguf](https://huggingface.co/RichardErkhov/tokyotech-llm_-_Llama-3-Swallow-70B-v0.1-gguf/blob/main/Llama-3-Swallow-70B-v0.1.IQ3_XS.gguf) | IQ3_XS | 27.29GB |
+| [Llama-3-Swallow-70B-v0.1.IQ3_S.gguf](https://huggingface.co/RichardErkhov/tokyotech-llm_-_Llama-3-Swallow-70B-v0.1-gguf/blob/main/Llama-3-Swallow-70B-v0.1.IQ3_S.gguf) | IQ3_S | 28.79GB |
+| [Llama-3-Swallow-70B-v0.1.Q3_K_S.gguf](https://huggingface.co/RichardErkhov/tokyotech-llm_-_Llama-3-Swallow-70B-v0.1-gguf/blob/main/Llama-3-Swallow-70B-v0.1.Q3_K_S.gguf) | Q3_K_S | 28.79GB |
+| [Llama-3-Swallow-70B-v0.1.IQ3_M.gguf](https://huggingface.co/RichardErkhov/tokyotech-llm_-_Llama-3-Swallow-70B-v0.1-gguf/blob/main/Llama-3-Swallow-70B-v0.1.IQ3_M.gguf) | IQ3_M | 29.74GB |
+| [Llama-3-Swallow-70B-v0.1.Q3_K.gguf](https://huggingface.co/RichardErkhov/tokyotech-llm_-_Llama-3-Swallow-70B-v0.1-gguf/blob/main/Llama-3-Swallow-70B-v0.1.Q3_K.gguf) | Q3_K | 31.91GB |
+| [Llama-3-Swallow-70B-v0.1.Q3_K_M.gguf](https://huggingface.co/RichardErkhov/tokyotech-llm_-_Llama-3-Swallow-70B-v0.1-gguf/blob/main/Llama-3-Swallow-70B-v0.1.Q3_K_M.gguf) | Q3_K_M | 31.91GB |
+| [Llama-3-Swallow-70B-v0.1.Q3_K_L.gguf](https://huggingface.co/RichardErkhov/tokyotech-llm_-_Llama-3-Swallow-70B-v0.1-gguf/blob/main/Llama-3-Swallow-70B-v0.1.Q3_K_L.gguf) | Q3_K_L | 34.59GB |
+| [Llama-3-Swallow-70B-v0.1.IQ4_XS.gguf](https://huggingface.co/RichardErkhov/tokyotech-llm_-_Llama-3-Swallow-70B-v0.1-gguf/blob/main/Llama-3-Swallow-70B-v0.1.IQ4_XS.gguf) | IQ4_XS | 35.64GB |
+| [Llama-3-Swallow-70B-v0.1.Q4_0.gguf](https://huggingface.co/RichardErkhov/tokyotech-llm_-_Llama-3-Swallow-70B-v0.1-gguf/blob/main/Llama-3-Swallow-70B-v0.1.Q4_0.gguf) | Q4_0 | 37.22GB |
+| [Llama-3-Swallow-70B-v0.1.IQ4_NL.gguf](https://huggingface.co/RichardErkhov/tokyotech-llm_-_Llama-3-Swallow-70B-v0.1-gguf/tree/main/) | IQ4_NL | 37.58GB |
+| [Llama-3-Swallow-70B-v0.1.Q4_K_S.gguf](https://huggingface.co/RichardErkhov/tokyotech-llm_-_Llama-3-Swallow-70B-v0.1-gguf/tree/main/) | Q4_K_S | 37.58GB |
+| [Llama-3-Swallow-70B-v0.1.Q4_K.gguf](https://huggingface.co/RichardErkhov/tokyotech-llm_-_Llama-3-Swallow-70B-v0.1-gguf/tree/main/) | Q4_K | 39.6GB |
+| [Llama-3-Swallow-70B-v0.1.Q4_K_M.gguf](https://huggingface.co/RichardErkhov/tokyotech-llm_-_Llama-3-Swallow-70B-v0.1-gguf/tree/main/) | Q4_K_M | 39.6GB |
+| [Llama-3-Swallow-70B-v0.1.Q4_1.gguf](https://huggingface.co/RichardErkhov/tokyotech-llm_-_Llama-3-Swallow-70B-v0.1-gguf/tree/main/) | Q4_1 | 41.27GB |
+| [Llama-3-Swallow-70B-v0.1.Q5_0.gguf](https://huggingface.co/RichardErkhov/tokyotech-llm_-_Llama-3-Swallow-70B-v0.1-gguf/tree/main/) | Q5_0 | 45.32GB |
+| [Llama-3-Swallow-70B-v0.1.Q5_K_S.gguf](https://huggingface.co/RichardErkhov/tokyotech-llm_-_Llama-3-Swallow-70B-v0.1-gguf/tree/main/) | Q5_K_S | 45.32GB |
+| [Llama-3-Swallow-70B-v0.1.Q5_K.gguf](https://huggingface.co/RichardErkhov/tokyotech-llm_-_Llama-3-Swallow-70B-v0.1-gguf/tree/main/) | Q5_K | 46.52GB |
+| [Llama-3-Swallow-70B-v0.1.Q5_K_M.gguf](https://huggingface.co/RichardErkhov/tokyotech-llm_-_Llama-3-Swallow-70B-v0.1-gguf/tree/main/) | Q5_K_M | 46.52GB |
+| [Llama-3-Swallow-70B-v0.1.Q5_1.gguf](https://huggingface.co/RichardErkhov/tokyotech-llm_-_Llama-3-Swallow-70B-v0.1-gguf/tree/main/) | Q5_1 | 49.36GB |
+| [Llama-3-Swallow-70B-v0.1.Q6_K.gguf](https://huggingface.co/RichardErkhov/tokyotech-llm_-_Llama-3-Swallow-70B-v0.1-gguf/tree/main/) | Q6_K | 53.91GB |
+| [Llama-3-Swallow-70B-v0.1.Q8_0.gguf](https://huggingface.co/RichardErkhov/tokyotech-llm_-_Llama-3-Swallow-70B-v0.1-gguf/tree/main/) | Q8_0 | 69.83GB |
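The table lists only file sizes. As a rough rule, a quant needs at least its file size in RAM or VRAM, plus room for the KV cache and runtime buffers. The sketch below picks the largest file that fits a given memory budget. The 15% `headroom` factor is an assumption, not a measured value. The optional `load` helper assumes the `llama-cpp-python` package (`Llama.from_pretrained`, which also needs `huggingface_hub`). It also assumes the chosen file exists in this repository under the name shown in the table. Rows that link only to the repository tree may not.

```python
from typing import List, Optional, Tuple

# (quant method, file size in GB) copied from the table above, in table order
# (non-decreasing size). Q3_K, Q4_K and Q5_K are omitted because the table
# lists them with the same sizes as Q3_K_M, Q4_K_M and Q5_K_M.
QUANTS: List[Tuple[str, float]] = [
    ("Q2_K", 24.56), ("IQ3_XS", 27.29), ("IQ3_S", 28.79), ("Q3_K_S", 28.79),
    ("IQ3_M", 29.74), ("Q3_K_M", 31.91), ("Q3_K_L", 34.59), ("IQ4_XS", 35.64),
    ("Q4_0", 37.22), ("IQ4_NL", 37.58), ("Q4_K_S", 37.58), ("Q4_K_M", 39.6),
    ("Q4_1", 41.27), ("Q5_0", 45.32), ("Q5_K_S", 45.32), ("Q5_K_M", 46.52),
    ("Q5_1", 49.36), ("Q6_K", 53.91), ("Q8_0", 69.83),
]

REPO_ID = "RichardErkhov/tokyotech-llm_-_Llama-3-Swallow-70B-v0.1-gguf"


def gguf_filename(quant: str) -> str:
    """Build a file name following the naming pattern used in the table."""
    return f"Llama-3-Swallow-70B-v0.1.{quant}.gguf"


def pick_quant(budget_gb: float, headroom: float = 1.15) -> Optional[str]:
    """Return the largest quant with size * headroom <= budget_gb, else None.

    `headroom` is an assumed allowance for the KV cache and runtime buffers;
    the real overhead depends on context length and backend.
    """
    best = None
    for quant, size_gb in QUANTS:  # sizes never decrease, so the last fit is the largest
        if size_gb * headroom <= budget_gb:
            best = quant
    return best


def load(quant: str, n_ctx: int = 4096):
    """Download (tens of GB) and load one quant with llama-cpp-python."""
    from llama_cpp import Llama  # lazy import so the helpers above work without it

    return Llama.from_pretrained(
        repo_id=REPO_ID,
        filename=gguf_filename(quant),
        n_ctx=n_ctx,
        n_gpu_layers=-1,  # offload all layers if the build supports GPU offload
    )
```

For example, `pick_quant(48)` returns `"Q4_1"` with the default headroom. With `headroom=1.0` it returns `"Q5_K_M"`, because only the raw file size is compared.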
+Original model description:
+---
+language:
+  - en
+  - ja
+library_name: transformers
+pipeline_tag: text-generation
+license: llama3
+model_type: llama
+---
+# Llama3 Swallow
+Our Swallow models were built by continual pre-training of the [Llama 3 family](https://huggingface.co/collections/meta-llama/meta-llama-3-66214712577ca38149ebb2b6), primarily on added Japanese-language data. The Instruct versions additionally use supervised fine-tuning (SFT) and Chat Vector. Links to the other models are listed in the Swallow Model Index.
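The card does not spell out how Chat Vector was applied. In the general Chat Vector approach (Huang et al., 2023), a base model's weights are subtracted from those of its chat-tuned counterpart. The resulting difference is then added to a continually pre-trained model with the same architecture. For Llama3 Swallow, a plausible reading is "Llama 3 Instruct minus Llama 3, added to Llama3 Swallow". That pairing, the scale factor, and the ordering relative to SFT are all assumptions here. The toy sketch below uses plain Python lists and made-up weights in place of real checkpoints.

```python
from typing import Dict, List

# Toy stand-in for a state dict: parameter name -> flattened weights.
Params = Dict[str, List[float]]


def _check_compatible(a: Params, b: Params) -> None:
    """Chat-vector arithmetic requires identical parameter names and shapes."""
    if a.keys() != b.keys():
        raise ValueError("parameter names differ")
    for name in a:
        if len(a[name]) != len(b[name]):
            raise ValueError(f"shape mismatch for {name}")


def subtract(a: Params, b: Params) -> Params:
    """Element-wise a - b, e.g. chat-tuned weights minus base weights."""
    _check_compatible(a, b)
    return {name: [x - y for x, y in zip(a[name], b[name])] for name in a}


def add_chat_vector(cpt: Params, chat_vector: Params, scale: float = 1.0) -> Params:
    """Add a (scaled) chat vector to continually pre-trained weights."""
    _check_compatible(cpt, chat_vector)
    return {
        name: [w + scale * v for w, v in zip(cpt[name], chat_vector[name])]
        for name in cpt
    }


# Hypothetical toy weights, not real checkpoints.
llama3_base = {"w": [1.0, 2.0]}
llama3_instruct = {"w": [1.5, 1.0]}
swallow_cpt = {"w": [1.2, 2.4]}

tau = subtract(llama3_instruct, llama3_base)  # the chat vector: {"w": [0.5, -1.0]}
merged = add_chat_vector(swallow_cpt, tau)    # about {"w": [1.7, 1.4]}
```

Real checkpoints hold large tensors rather than lists, but the arithmetic is the same per parameter. It only makes sense because Llama3 Swallow keeps the Llama 3 architecture and tokenizer, so every tensor shape matches.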
+# Model Release Updates
+We are excited to share the release history of our latest models:
+- **July 1, 2024**: Released the [Llama-3-Swallow-8B-v0.1](https://huggingface.co/tokyotech-llm/Llama-3-Swallow-8B-v0.1), [Llama-3-Swallow-8B-Instruct-v0.1](https://huggingface.co/tokyotech-llm/Llama-3-Swallow-8B-Instruct-v0.1), [Llama-3-Swallow-70B-v0.1](https://huggingface.co/tokyotech-llm/Llama-3-Swallow-70B-v0.1), and [Llama-3-Swallow-70B-Instruct-v0.1](https://huggingface.co/tokyotech-llm/Llama-3-Swallow-70B-Instruct-v0.1).
+## Swallow Model Index
+|Model|Llama3 Swallow|Llama3 Swallow Instruct|
+|---|---|---|
+|8B| [Link](https://huggingface.co/tokyotech-llm/Llama-3-Swallow-8B-v0.1) | [Link](https://huggingface.co/tokyotech-llm/Llama-3-Swallow-8B-Instruct-v0.1) |
+|70B| [Link](https://huggingface.co/tokyotech-llm/Llama-3-Swallow-70B-v0.1) | [Link](https://huggingface.co/tokyotech-llm/Llama-3-Swallow-70B-Instruct-v0.1) |
+![logo](./logo.png)
+This repository provides large language models developed by [Swallow-LLM](https://swallow-llm.github.io/).
+Read our [blog post](https://zenn.dev/tokyotech_lm/articles/f65989d76baf2c).
+## Model Details
+* **Model type**: Please refer to [Llama 3 MODEL_CARD](https://github.com/meta-llama/llama3/blob/main/MODEL_CARD.md) for details on the model architecture.
+* **Language(s)**: Japanese, English
+* **Library**: [Megatron-LM](https://github.com/NVIDIA/Megatron-LM)
+* **Tokenizer**: Please refer to [Llama 3 blog](https://ai.meta.com/blog/meta-llama-3/) for details on the tokenizer.
+* **Contact**: swallow[at]nlp.c.titech.ac.jp
+## Model Performance
+### Japanese tasks
+|Model|Size|JCom.|JEMHopQA|NIILC|JSQuAD|XL-Sum|MGSM|WMT20-en-ja|WMT20-ja-en|JMMLU|JHumanEval|Ja Avg|
+|---|---|---|---|---|---|---|---|---|---|---|---|---|
+|   |   |4-shot|4-shot|4-shot|4-shot|1-shot|4-shot|4-shot|4-shot|5-shot|0-shot|   |
+|   |   |EM acc|Char-F1|Char-F1|Char-F1|ROUGE-2|EM acc|BLEU|BLEU|EM acc|pass@1|   |
+|Llama-2-70b|70B|0.8651|0.5157|0.5464|0.9130|0.2372|0.3640|0.2657|0.2402|0.5496|0.2841|0.4781|
+|Swallow-70b-hf|70B|0.9178|0.6178|**0.6910**|0.9208|0.2279|0.4720|0.3046|0.2301|0.5750|0.2262|0.5183|
+|Qwen2-72B|72B|0.9607|0.6399|0.5617|**0.9261**|0.2362|**0.7560**|0.2747|0.2419|**0.7831**|**0.5567**|**0.5937**|
+|Meta-Llama-3-70B|70B|0.9473|0.6042|0.5965|0.9207|0.2254|0.6720|0.2855|0.2526|0.6975|0.4799|0.5682|
+|Llama-3-Swallow-70B-v0.1|70B|**0.9714**|**0.6695**|0.6881|0.9218|**0.2404**|0.7080|**0.3072**|**0.2548**|0.7049|0.4683|0.5934|
+### English tasks
+|Model|Size|OpenBookQA|TriviaQA|HellaSwag|SQuAD2.0|XWINO|MMLU|GSM8K|BBH|HumanEval|En Avg|
+|---|---|---|---|---|---|---|---|---|---|---|---|
+|   |   |4-shot|4-shot|4-shot|4-shot|4-shot|5-shot|4-shot|3-shot|0-shot|   |
+|   |   |Acc|EM acc|Acc|EM acc|Acc|Acc|EM acc|CoT EM Acc|pass@1|   |
+|Llama-2-70b|70B|0.4260|0.7988|0.6681|0.3379|**0.9256**|0.6876|0.5466|0.6643|0.3152|0.5967|
+|Swallow-70b-hf|70B|0.4160|0.7610|0.6433|0.3345|0.9191|0.6571|0.5080|0.6537|0.2409|0.5704|
+|Qwen2-72B|72B|0.4160|0.7890|0.6766|0.4052|0.9161|**0.8428**|**0.8908**|0.6388|**0.6049**|0.6867|
+|Meta-Llama-3-70B|70B|**0.4360**|**0.8263**|**0.6909**|**0.4071**|0.9213|0.7870|0.8014|**0.8266**|0.5177|**0.6905**|
+|Llama-3-Swallow-70B-v0.1|70B|0.4240|0.8231|0.6828|0.4059|0.9234|0.7745|0.8143|0.7352|0.4909|0.6749|
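The Ja Avg and En Avg columns match the unweighted arithmetic mean of each row's per-task scores. Ja Avg is taken over 10 tasks and En Avg over 9. Recomputing a few rows from the tables above reproduces the reported values to four decimals. Only a subset of rows is included below to keep the sketch short.

```python
from statistics import fmean
from typing import Dict, List

# Per-task scores copied from the tables above, in the same column order.
JA: Dict[str, List[float]] = {
    "Llama-2-70b": [0.8651, 0.5157, 0.5464, 0.9130, 0.2372, 0.3640, 0.2657, 0.2402, 0.5496, 0.2841],
    "Qwen2-72B": [0.9607, 0.6399, 0.5617, 0.9261, 0.2362, 0.7560, 0.2747, 0.2419, 0.7831, 0.5567],
    "Llama-3-Swallow-70B-v0.1": [0.9714, 0.6695, 0.6881, 0.9218, 0.2404, 0.7080, 0.3072, 0.2548, 0.7049, 0.4683],
}
EN: Dict[str, List[float]] = {
    "Meta-Llama-3-70B": [0.4360, 0.8263, 0.6909, 0.4071, 0.9213, 0.7870, 0.8014, 0.8266, 0.5177],
    "Llama-3-Swallow-70B-v0.1": [0.4240, 0.8231, 0.6828, 0.4059, 0.9234, 0.7745, 0.8143, 0.7352, 0.4909],
}


def table_average(scores: List[float]) -> float:
    """Unweighted mean of per-task scores, rounded like the tables."""
    return round(fmean(scores), 4)


for name, scores in JA.items():
    print("Ja", name, table_average(scores))
for name, scores in EN.items():
    print("En", name, table_average(scores))
```

Each task counts equally, so the averages mix metrics on different scales, such as BLEU and ROUGE-2 next to accuracy. A gap on a low-scoring metric moves the average as much as the same gap on a high-scoring one.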
+## Evaluation Benchmarks
+### Japanese evaluation benchmarks
+We used llm-jp-eval (v1.3.0), the JP Language Model Evaluation Harness (commit #9b42d41), and the Code Generation LM Evaluation Harness (commit #0261c52). The benchmarks are as follows:
+- Multiple-choice question answering (JCommonsenseQA [Kurihara et al., 2022])
+- Open-ended question answering (JEMHopQA [Ishii et al., 2024])
+- Open-ended question answering (NIILC [Sekine, 2003])
+- Machine reading comprehension (JSQuAD [Kurihara et al., 2022])
+- Automatic summarization (XL-Sum [Hasan et al., 2021])
+- Machine translation (WMT2020 ja-en [Barrault et al., 2020])
+- Machine translation (WMT2020 en-ja [Barrault et al., 2020])
+- Mathematical reasoning (MGSM [Shi et al., 2023])
+- Academic exams (JMMLU [Yin et al., 2024])
+- Code generation (JHumanEval [Sato et al., 2024])
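Three of the Japanese tasks (JEMHopQA, NIILC, JSQuAD) are scored with character-level F1, listed as Char-F1 in the table. A common definition compares the multisets of characters in the prediction and the reference. The sketch below implements that generic definition. It is an assumption that this matches llm-jp-eval v1.3.0, which may normalize text or compute a fuzzy string-matching score instead. Use it for intuition, not to reproduce the table.

```python
from collections import Counter


def char_f1(prediction: str, reference: str) -> float:
    """Bag-of-characters F1 between a prediction and a single reference.

    Assumption: only surrounding whitespace is stripped; evaluation harnesses
    may apply further normalization or a different scoring function.
    """
    pred = Counter(prediction.strip())
    ref = Counter(reference.strip())
    overlap = sum((pred & ref).values())  # shared characters, counted with multiplicity
    if overlap == 0:
        return 0.0
    precision = overlap / sum(pred.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)
```

For example, `char_f1("東京都", "東京")` gives 0.8 (precision 2/3, recall 1). Character-level matching suits Japanese, which does not separate words with spaces, so word-level F1 would depend on a tokenizer.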
+### English evaluation benchmarks
+We used the Language Model Evaluation Harness (v0.4.2) and the Code Generation LM Evaluation Harness (commit #0261c52). The benchmarks are as follows:
+- Multiple-choice question answering (OpenBookQA [Mihaylov et al., 2018])
+- Open-ended question answering (TriviaQA [Joshi et al., 2017])
+- Machine reading comprehension (SQuAD2 [Rajpurkar et al., 2018])
+- Commonsense reasoning (XWINO [Tikhonov and Ryabinin, 2021])
+- Natural language inference (HellaSwag [Zellers et al., 2019])
+- Mathematical reasoning (GSM8K [Cobbe et al., 2021])
+- Reasoning (BBH, BIG-Bench Hard [Suzgun et al., 2023])
+- Academic exams (MMLU [Hendrycks et al., 2021])
+- Code generation (HumanEval [Chen et al., 2021])
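The code-generation columns (JHumanEval and HumanEval) report pass@1. Chen et al. (2021) define an unbiased estimator of pass@k for a problem with n generated samples, c of which pass the unit tests: pass@k = 1 - C(n-c, k) / C(n, k). The benchmark score is the mean over problems. For k = 1 this reduces to c/n. The card does not state how many samples per problem or which sampling temperature the harness used, so the inputs below are illustrative.

```python
from math import comb
from statistics import fmean
from typing import Iterable, Tuple


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k for one problem with n samples, c of them correct."""
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("need 0 <= c <= n and 1 <= k <= n")
    if n - c < k:
        return 1.0  # every size-k subset contains at least one passing sample
    return 1.0 - comb(n - c, k) / comb(n, k)


def benchmark_pass_at_k(results: Iterable[Tuple[int, int]], k: int) -> float:
    """Mean pass@k over problems; `results` holds (n, c) per problem."""
    return fmean(pass_at_k(n, c, k) for n, c in results)
```

With two hypothetical problems, each sampled 10 times with 3 and 7 passing samples, `benchmark_pass_at_k([(10, 3), (10, 7)], k=1)` is about 0.5, the mean of 3/10 and 7/10.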
+## Training Datasets
+### Continual Pre-Training
+The following datasets were used for continual pre-training.
+- [Algebraic Stack](https://huggingface.co/datasets/EleutherAI/proof-pile-2)
+- [Cosmopedia](https://huggingface.co/datasets/HuggingFaceTB/cosmopedia)
+- [English Wikipedia](https://dumps.wikimedia.org/other/cirrussearch)
+- [Japanese Wikipedia](https://dumps.wikimedia.org/other/cirrussearch)
+- [Laboro ParaCorpus](https://github.com/laboroai/Laboro-ParaCorpus)
+- [OpenWebMath](https://huggingface.co/datasets/EleutherAI/proof-pile-2)
+- [RefinedWeb](https://huggingface.co/datasets/tiiuae/falcon-refinedweb)
+- [Swallow Corpus](https://arxiv.org/abs/2404.17733)
+## Risks and Limitations
+The models released here are still in the early stages of our research and development and have not been tuned to ensure outputs align with human intent and safety considerations.
+## Acknowledgements
+We thank Meta Research for releasing Llama 3 under an open license for others to build on.
+Our project is supported by the [Large Generative AI Development Support Program](https://abci.ai/en/link/lfm_support_program.html) of the National Institute of Advanced Industrial Science and Technology.
+## License
+[META LLAMA 3 COMMUNITY LICENSE](https://llama.meta.com/llama3/license/)
+## Authors
+Here are the team members:
+- From [Tokyo Institute of Technology Okazaki Laboratory](https://www.nlp.c.titech.ac.jp/index.en.html), the following members:
+  - [Naoaki Okazaki](https://www.chokkan.org/index.ja.html)
+  - [Sakae Mizuki](https://s-mizuki-nlp.github.io/)
+  - [Youmi Ma](https://www.nlp.c.titech.ac.jp/member/youmi.en.html)
+  - [Koki Maeda](https://sites.google.com/view/silviase)
+  - [Kakeru Hattori](https://aya-se.vercel.app/)
+  - [Masanari Ohi](https://sites.google.com/view/masanariohi)
+  - [Taihei Shiotani](https://github.com/inatoihs)
+  - [Koshiro Saito](https://sites.google.com/view/koshiro-saito)
+- From [Tokyo Institute of Technology YOKOTA Laboratory](https://www.rio.gsic.titech.ac.jp/en/index.html), the following members:
+  - [Rio Yokota](https://twitter.com/rioyokota)
+  - [Kazuki Fujii](https://twitter.com/okoge_kaz)
+  - [Taishi Nakamura](https://twitter.com/Setuna7777_2)
+  - [Takumi Okamoto](https://www.linkedin.com/in/takumi-okamoto)
+  - [Ishida Shigeki](https://www.wantedly.com/id/reborn27)
+- From [Artificial Intelligence Research Center, AIST, Japan](https://www.airc.aist.go.jp/en/teams/), the following members:
+  - [Hiroya Takamura](https://sites.google.com/view/hjtakamura)
+## How to cite
+If you find our work helpful, please feel free to cite us.
+```tex
+@inproceedings{Fujii:COLM2024,
+   title={Continual Pre-Training for Cross-Lingual LLM Adaptation: Enhancing Japanese Language Capabilities},
+   author={Kazuki Fujii and Taishi Nakamura and Mengsay Loem and Hiroki Iida and Masanari Ohi and Kakeru Hattori and Hirai Shota and Sakae Mizuki and Rio Yokota and Naoaki Okazaki},
+   booktitle="Proceedings of the First Conference on Language Modeling",
+   series={COLM},
+   pages="(to appear)",
+   year="2024",
+   month=oct,
+   address={University of Pennsylvania, USA},
+}
+@inproceedings{Okazaki:COLM2024,
+   title={Building a Large Japanese Web Corpus for Large Language Models},
+   author={Naoaki Okazaki and Kakeru Hattori and Hirai Shota and Hiroki Iida and Masanari Ohi and Kazuki Fujii and Taishi Nakamura and Mengsay Loem and Rio Yokota and Sakae Mizuki},
+   booktitle="Proceedings of the First Conference on Language Modeling",
+   series={COLM},
+   pages="(to appear)",
+   year="2024",
+   month=oct,
+   address={University of Pennsylvania, USA},
+}
+```
+### Citations
+```tex
+@article{llama3modelcard,
+    title={Llama 3 Model Card},
+    author={AI@Meta},
+    year={2024},
+    url = {https://github.com/meta-llama/llama3/blob/main/MODEL_CARD.md}
+}
+```