---
license: cc-by-sa-4.0
datasets:
- izumi-lab/llm-japanese-dataset
language:
- ja
tags:
- llama
- causal-lm
---

This repo contains a low-rank adapter (LoRA) for LLaMA-13b fit on the [llm-japanese-dataset](https://github.com/masanorihirano/llm-japanese-dataset) dataset.

You can try it at https://huggingface.co/spaces/izumi-lab/llama-13b-japanese-lora-v0-1ep

This version of the weights was trained with the following hyperparameters:

- Epochs: 1
- Batch size: 130
- Cutoff length: 256
- Learning rate: 3e-4
- LoRA _r_: 4
- LoRA target modules: q_proj, v_proj

```python
import torch
from transformers import LlamaForCausalLM, LlamaTokenizer
from peft import PeftModel

base_model = "decapoda-research/llama-13b-hf"
# Note: the base model's own license (decapoda-research/llama-13b-hf) applies.

# Load the base model and tokenizer in half precision.
model = LlamaForCausalLM.from_pretrained(base_model, torch_dtype=torch.float16)
tokenizer = LlamaTokenizer.from_pretrained(base_model)

# Attach the LoRA adapter weights on top of the base model.
model = PeftModel.from_pretrained(
    model,
    "izumi-lab/llama-13b-japanese-lora-v0",
    torch_dtype=torch.float16,
)
```

For the latest information, see [llm.msuzuki.me](https://llm.msuzuki.me).

## Details

- Japanese Paper: [https://jxiv.jst.go.jp/index.php/jxiv/preprint/view/383](https://jxiv.jst.go.jp/index.php/jxiv/preprint/view/383)
- English Paper: [https://arxiv.org/abs/2305.12720](https://arxiv.org/abs/2305.12720)
- GitHub: [https://github.com/masanorihirano/llm-japanese-dataset](https://github.com/masanorihirano/llm-japanese-dataset)
- Website: [llm.msuzuki.me](https://llm.msuzuki.me)

Citation:

```
@preprint{Hirano2023-llmj,
    title={{llm-japanese-dataset v0: Construction of Japanese Chat Dataset for Large Language Models and its Methodology}},
    author={Masanori HIRANO and Masahiro SUZUKI and Hiroki SAKAJI},
    doi={10.48550/arXiv.2305.12720},
    archivePrefix={arXiv},
    arxivId={2305.12720},
    year={2023}
}
```

If you have any inquiries, such as joint research, data provision, or other types of support, please email izumi-llm@socsim.org.
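
As a rough check on the hyperparameters listed above, the sketch below estimates how many trainable parameters a rank-4 adapter on `q_proj` and `v_proj` adds. It is only an illustration: the hidden size (5120) and layer count (40) are the commonly published LLaMA-13B figures and are assumptions here, not values stated in this card, and `lora_param_count` is a hypothetical helper rather than part of `peft`.

```python
# Rough count of LoRA trainable parameters for the configuration listed above.
# Assumptions (not stated in this card): LLaMA-13B has hidden size 5120 and
# 40 decoder layers, and q_proj / v_proj are square hidden_size x hidden_size
# linear layers.


def lora_param_count(hidden_size, num_layers, rank, num_target_modules):
    """Hypothetical helper: parameters added by LoRA across all layers."""
    # Each adapted linear layer gains two matrices:
    # A with shape (rank, hidden_size) and B with shape (hidden_size, rank).
    per_module = rank * hidden_size + hidden_size * rank
    return per_module * num_target_modules * num_layers


HIDDEN_SIZE = 5120                     # assumed LLaMA-13B hidden size
NUM_LAYERS = 40                        # assumed LLaMA-13B decoder layers
RANK = 4                               # "LoRA r" from the list above
TARGET_MODULES = ["q_proj", "v_proj"]  # from the list above

trainable = lora_param_count(HIDDEN_SIZE, NUM_LAYERS, RANK, len(TARGET_MODULES))
print(f"LoRA trainable parameters: {trainable:,}")
```

Under these assumptions the adapter holds about 3.3 million parameters, a small fraction of the roughly 13 billion in the base model, which is why only the adapter weights need to be distributed in this repo.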