Improve model card: Add library name
This PR adds the `library_name: transformers` field to the model card metadata. The card's code examples demonstrate the model's compatibility with the Hugging Face `transformers` library, and declaring the library explicitly improves discoverability for users filtering the Hub for `transformers`-compatible models.
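To make the compatibility claim easy to sanity-check, here is a minimal sketch of loading this checkpoint with `transformers`. It is not part of the diff: `trust_remote_code=True` (the repository ships custom modeling code), the `bfloat16` and `device_map` settings are assumptions, while the temperature and the prompt mirror the card's `inference` parameters and the widget added below.

```python
# Minimal sketch: load the card's checkpoint with transformers and run the
# widget prompt. trust_remote_code=True is an assumption (the repo ships
# custom modeling code); temperature mirrors the card's inference parameters.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "microsoft/Phi-3-small-128k-instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)

# Same message as the widget added in this PR's metadata.
messages = [
    {"role": "user",
     "content": "Can you provide ways to eat combinations of bananas and dragonfruits?"}
]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=256, do_sample=True, temperature=0.7)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```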
README.md
CHANGED

```diff
@@ -1,21 +1,32 @@
 ---
-license: mit
-license_link: https://huggingface.co/microsoft/Phi-3-small-128k-instruct/resolve/main/LICENSE
-
 language:
 - multilingual
+license: mit
+license_link: https://huggingface.co/microsoft/Phi-3-small-128k-instruct/resolve/main/LICENSE
 pipeline_tag: text-generation
 tags:
 - nlp
 - code
+library_name: transformers
 inference:
   parameters:
     temperature: 0.7
 widget:
-  - messages:
-    - role: user
-      content: Can you provide ways to eat combinations of bananas and dragonfruits?
+- messages:
+  - role: user
+    content: Can you provide ways to eat combinations of bananas and dragonfruits?
 ---
+
+# LongRoPE2: Near-Lossless LLM Context Window Scaling
+
+The model was presented in the paper [LongRoPE2: Near-Lossless LLM Context Window Scaling](https://hf.co/papers/2502.20082).
+
+# Paper abstract
+
+The abstract of the paper is the following:
+
+LongRoPE2 is a novel approach that extends the effective context window of pre-trained large language models (LLMs) to the target length, while preserving the performance on the original shorter context window. This is achieved by three contributions: (1) a hypothesis that insufficient training in higher RoPE dimensions contributes to the persistent out-of-distribution (OOD) issues observed in existing methods; (2) an effective RoPE rescaling algorithm that adopts evolutionary search guided by "needle-driven" perplexity to address the insufficient training problem; (3) a mixed context window training approach that fine-tunes model weights to adopt rescaled RoPE for long-context sequences while preserving the short-context performance with the original RoPE. Extensive experiments on LLaMA3-8B and Phi3-mini-3.8B across various benchmarks validate the hypothesis and demonstrate the effectiveness of LongRoPE2. Remarkably, LongRoPE2 extends LLaMA3-8B to achieve a 128K effective context length while retaining over 98.5% of short-context performance, using only 10B tokens -- 80x fewer than Meta's approach, which fails to reach the target effective context length. Code will be available at https://github.com/microsoft/LongRoPE.
+
 🎉 **Phi-3.5**: [[mini-instruct]](https://huggingface.co/microsoft/Phi-3.5-mini-instruct); [[MoE-instruct]](https://huggingface.co/microsoft/Phi-3.5-MoE-instruct) ; [[vision-instruct]](https://huggingface.co/microsoft/Phi-3.5-vision-instruct)
 
 ## Model Summary
@@ -277,4 +288,4 @@ The model is licensed under the [MIT license](https://huggingface.co/microsoft/P
 
 ## Trademarks
 
-This project may contain trademarks or logos for projects, products, or services. Authorized use of Microsoft trademarks or logos is subject to and must follow [Microsoft’s Trademark & Brand Guidelines](https://www.microsoft.com/en-us/legal/intellectualproperty/trademarks). Use of Microsoft trademarks or logos in modified versions of this project must not cause confusion or imply Microsoft sponsorship. Any use of third-party trademarks or logos are subject to those third-party’s policies.
+This project may contain trademarks or logos for projects, products, or services. Authorized use of Microsoft trademarks or logos is subject to and must follow [Microsoft’s Trademark & Brand Guidelines](https://www.microsoft.com/en-us/legal/intellectualproperty/trademarks). Use of Microsoft trademarks or logos in modified versions of this project must not cause confusion or imply Microsoft sponsorship. Any use of third-party trademarks or logos are subject to those third-party’s policies.
```
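Since the abstract added in the diff leans on the idea of RoPE rescaling, a brief illustration may help reviewers: RoPE encodes positions by rotating query/key dimension pairs at geometrically spaced frequencies, and LongRoPE2 divides each dimension's frequency by a searched per-dimension factor so that positions beyond the original window stay in distribution. The sketch below is only an illustration of that mechanism under stated assumptions; the factor values, the 4K-to-128K extension ratio, and the NumPy stand-in are all hypothetical, and the actual evolutionary search and mixed context window training are described in the paper and at https://github.com/microsoft/LongRoPE.

```python
# Illustrative sketch (not the paper's implementation): RoPE rescaling applies
# a per-dimension factor to the rotary inverse frequencies, so the higher
# (slower, insufficiently trained) dimensions are stretched more to cover
# positions beyond the original window. LongRoPE2 searches for these factors
# with evolutionary search scored by "needle-driven" perplexity; the factor
# schedule below is just a hypothetical candidate.
import numpy as np

def rope_inv_freq(head_dim: int, base: float = 10000.0) -> np.ndarray:
    """Standard RoPE inverse frequencies, one per rotary dimension pair."""
    return 1.0 / base ** (np.arange(0, head_dim, 2) / head_dim)

def rescale(inv_freq: np.ndarray, factors: np.ndarray) -> np.ndarray:
    """Divide each dimension's frequency by its factor; factors >= 1 stretch
    that dimension's effective wavelength, pulling OOD positions in-range."""
    return inv_freq / factors

head_dim = 128
inv_freq = rope_inv_freq(head_dim)

# Hypothetical candidate: leave the fast, well-trained low dimensions alone
# and stretch the high dimensions toward the 32x ratio needed to extend a
# 4K window to 128K.
factors = np.interp(np.arange(inv_freq.size), [0, inv_freq.size - 1], [1.0, 32.0])
new_inv_freq = rescale(inv_freq, factors)

# Full rotations of the slowest dimension at the farthest target position;
# the real algorithm scores many such candidates by long-context perplexity.
pos = 131072
print("slowest-dim rotations before/after:",
      pos * inv_freq[-1] / (2 * np.pi), pos * new_inv_freq[-1] / (2 * np.pi))
```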