Update README.md
README.md
CHANGED
@@ -7,10 +7,9 @@ tags:
 - llama-3
 ---
 
+
 
-
-
-This model extends LLama-3 8B's context length from 8k to > 130K, developed by Gradient, sponsored by compute from Crusoe Energy. It demonstrates that SOTA LLMs can learn to operate on long context with minimal training (< 200M tokens) by appropriately adjusting RoPE theta.
+This model extends LLama-3 8B's context length from 8k to > 160K, developed by Gradient, sponsored by compute from Crusoe Energy. It demonstrates that SOTA LLMs can learn to operate on long context with minimal training (< 200M tokens) by appropriately adjusting RoPE theta.
 
 **Approach:**
 
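As a rough illustration of why "adjusting RoPE theta" relates to context length, the sketch below computes the standard RoPE rotation frequencies and shows that a larger base stretches the slowest rotation over more token positions. It is not Gradient's training code: the head dimension of 128, the base of 500000, and the 8x multiplier are assumptions chosen for illustration, and the helper names are hypothetical.

```python
import math


def rope_inv_freq(head_dim: int, theta: float) -> list[float]:
    """Standard RoPE frequencies: theta ** (-2i / head_dim) for each dimension pair i."""
    return [theta ** (-2 * i / head_dim) for i in range(head_dim // 2)]


def longest_wavelength(head_dim: int, theta: float) -> float:
    """Number of token positions for the slowest-rotating pair to complete one turn."""
    slowest = rope_inv_freq(head_dim, theta)[-1]
    return 2 * math.pi / slowest


# Illustrative values only (assumptions, not taken from the README above).
HEAD_DIM = 128
BASE_THETA = 500_000.0
LARGER_THETA = 8 * BASE_THETA  # hypothetical scaled base

base_span = longest_wavelength(HEAD_DIM, BASE_THETA)
larger_span = longest_wavelength(HEAD_DIM, LARGER_THETA)

print(f"longest wavelength at theta={BASE_THETA:.0f}: {base_span:,.0f} positions")
print(f"longest wavelength at theta={LARGER_THETA:.0f}: {larger_span:,.0f} positions")
```

Raising theta lowers every frequency except the first, so positions far apart remain distinguishable in the slow dimensions; the model still needs some fine-tuning to adapt to the changed frequencies, which is the training the README line refers to.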