gradientai/Llama-3-8B-Instruct-262k

leo-pekelis-gradient committed on Apr 25, 2024

Commit 5cfd414 · verified · 1 Parent(s): 9411de7

Update README.md

Files changed (1)

README.md +2 -3


@@ -7,10 +7,9 @@ tags:
 - llama-3
 ---
-![image/png](https://cdn-uploads.huggingface.co/production/uploads/6585dc9be92bc5f258156bd6/F2WLF8_jOx_gttxbPtLK1.png)
-This model extends LLama-3 8B's context length from 8k to > 130K, developed by Gradient, sponsored by compute from Crusoe Energy. It demonstrates that SOTA LLMs can learn to operate on long context with minimal training (< 200M tokens) by appropriately adjusting RoPE theta.
+![image/png](https://cdn-uploads.huggingface.co/production/uploads/6585dc9be92bc5f258156bd6/hiHWva3CbsrnPvZTp5-lu.png)
+This model extends LLama-3 8B's context length from 8k to > 160K, developed by Gradient, sponsored by compute from Crusoe Energy. It demonstrates that SOTA LLMs can learn to operate on long context with minimal training (< 200M tokens) by appropriately adjusting RoPE theta.
 **Approach:**