Update README.md
README.md CHANGED
@@ -66,15 +66,18 @@ model-index:
 
 </div>
 
-<p align="center" width="100%">
-<img src="https://raw.githubusercontent.com/CStanKonrad/long_llama/main/assets/results.png" alt="LongLLaMA" style="width: 70%; min-width: 300px; display: block; margin: auto;">
-</p>
+
 
 ## TLDR
 This repository contains the research preview of **LongLLaMA, a large language model capable of handling long contexts of 256k tokens or even more**.
 
-LongLLaMA is built upon the foundation of [
-
+LongLLaMA-Code is built upon the foundation of [Code Llama](https://huggingface.co/codellama/CodeLlama-7b-hf).
+
+LongLLaMA-Code has **improved reasoning capabilities** compared to CodeLlama, in particular we improve **GSM8K math reasoning from 13% to 17.4%**.
+
+<p align="center" width="100%">
+<img src="https://raw.githubusercontent.com/CStanKonrad/long_llama/main/assets/results.png" alt="LongLLaMA" style="width: 70%; min-width: 300px; display: block; margin: auto;">
+</p>
 
 ## Overview
 

@@ -84,7 +87,7 @@ LongLLaMA Code is built upon the foundation of [Code Llama](https://huggingface.
 
 **LongLLaMA** is an [OpenLLaMA](https://github.com/openlm-research/open_llama) model finetuned with the FoT method,
 with three layers used for context extension. **Crucially, LongLLaMA is able to extrapolate much beyond the context length seen in training: 8k. E.g., in the passkey retrieval task, it can handle inputs of length 256k**.
-**LongLLaMA
+**LongLLaMA-Code** is a [Code Llama](https://huggingface.co/codellama/CodeLlama-7b-hf) model finetuned with the FoT method.
 
 
 <div align="center">

@@ -159,9 +162,9 @@ LongLLaMA has several other parameters:
 import torch
 from transformers import LlamaTokenizer, AutoModelForCausalLM
 
-tokenizer = LlamaTokenizer.from_pretrained("syzymon/
+tokenizer = LlamaTokenizer.from_pretrained("syzymon/long_llama_code_7b")
 model = AutoModelForCausalLM.from_pretrained(
-    "syzymon/
+    "syzymon/long_llama_code_7b", torch_dtype=torch.float32,
     mem_layers=[],
     mem_dtype='bfloat16',
     trust_remote_code=True,

@@ -177,8 +180,8 @@ model = AutoModelForCausalLM.from_pretrained(
 from transformers import LlamaTokenizer, LlamaForCausalLM
 import torch
 
-tokenizer = LlamaTokenizer.from_pretrained("syzymon/
-model = LlamaForCausalLM.from_pretrained("syzymon/
+tokenizer = LlamaTokenizer.from_pretrained("syzymon/long_llama_code_7b")
+model = LlamaForCausalLM.from_pretrained("syzymon/long_llama_code_7b", torch_dtype=torch.float32)
 ```
 
 
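As context for the memory-enabled snippet updated in the third hunk, the sketch below shows one way that checkpoint might be exercised end to end. It reuses the `from_pretrained` call from the diff; the prompt text and generation settings are illustrative assumptions, not taken from the model card.

```python
import torch
from transformers import LlamaTokenizer, AutoModelForCausalLM

# Load the tokenizer and the FoT model as in the updated README snippet.
tokenizer = LlamaTokenizer.from_pretrained("syzymon/long_llama_code_7b")
model = AutoModelForCausalLM.from_pretrained(
    "syzymon/long_llama_code_7b",
    torch_dtype=torch.float32,
    mem_layers=[],
    mem_dtype='bfloat16',
    trust_remote_code=True,  # the FoT modeling code ships with the checkpoint
)

# Illustrative prompt and greedy decoding (assumed settings, not from the diff).
prompt = "def quicksort(arr):"
inputs = tokenizer(prompt, return_tensors="pt")
with torch.no_grad():
    output = model.generate(input_ids=inputs.input_ids, max_new_tokens=64)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```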
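The fourth hunk loads the same checkpoint through the stock `LlamaForCausalLM` class. A minimal sketch of that drop-in path is below, assuming the standard Hugging Face generation API; no `trust_remote_code` flag is involved here, and (an assumption based on the drop-in framing) the FoT memory mechanism is not used, so generation is limited to the base model's context window.

```python
import torch
from transformers import LlamaTokenizer, LlamaForCausalLM

# Drop-in loading with the stock LLaMA implementation, as in the fourth hunk.
tokenizer = LlamaTokenizer.from_pretrained("syzymon/long_llama_code_7b")
model = LlamaForCausalLM.from_pretrained(
    "syzymon/long_llama_code_7b", torch_dtype=torch.float32
)

# Short, in-window generation only (assumption: no long-context memory in this mode).
inputs = tokenizer("# binary search in Python\n", return_tensors="pt")
output = model.generate(inputs.input_ids, max_new_tokens=32)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```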