---
language:
- en
pipeline_tag: text-generation
tags:
- OpenVINO
- meta
- llama
- llama-3
- PyTorch
license: llama3
extra_gated_prompt: |
  Meta Llama 3 Version Release Date: April 18, 2024
library_name: transformers
---

# Llama-3-8B-Instruct-ov-fp16-int4-sym

## Built with Meta Llama 3

## Model Description

This is a version of the original [Meta-Llama-3-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct) model converted to the [OpenVINO™](https://github.com/openvinotoolkit/openvino) IR (Intermediate Representation) format for optimized inference on Intel® hardware. It was created using the examples in the [OpenVINO™ Notebooks](https://github.com/openvinotoolkit/openvino_notebooks/tree/latest/notebooks) repository.
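
As a minimal sketch, the same kind of conversion can be reproduced with Optimum Intel's `export=True` flag (illustrative only; this repository already ships the converted IR files, and the output directory name below is hypothetical):

```python
from optimum.intel.openvino import OVModelForCausalLM

# Illustrative: convert the original (gated) PyTorch checkpoint to OpenVINO IR.
# Requires prior access approval for meta-llama/Meta-Llama-3-8B-Instruct.
model = OVModelForCausalLM.from_pretrained(
    "meta-llama/Meta-Llama-3-8B-Instruct", export=True
)
model.save_pretrained("llama-3-8b-instruct-ov")  # hypothetical output directory
```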

## Intended Use

This model is designed for advanced natural language understanding and generation tasks, and is aimed at academic researchers and commercial developers who want to integrate efficient AI capabilities into their applications. It must not be used to create or promote harmful or illegal content, as outlined in the [Meta Llama 3 Acceptable Use Policy](https://llama.meta.com/llama3/use-policy/).

## Licensing and Redistribution

This model is released under the Meta Llama 3 Community License. Redistribution requires inclusion of this license and a citation of the original model. Modifications and derivative works must prominently display "Built with Meta Llama 3" and adhere to the redistribution policies detailed in the original model's [license terms](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct/blob/main/LICENSE).

## Weight Compression Parameters

For more information on these parameters, refer to the [OpenVINO™ 2024.1.0 documentation](https://docs.openvino.ai/2024/openvino-workflow/model-optimization-guide/weight-compression.html). The weights were compressed with the following settings (a sketch reproducing them follows the list):

* mode: **INT4_SYM**
* group_size: **128**
* ratio: **0.8**
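
A minimal sketch of how this configuration could be reproduced with Optimum Intel's weight-compression API, assuming a recent optimum-intel release that exposes `OVWeightQuantizationConfig` (the output directory name is hypothetical):

```python
from optimum.intel import OVWeightQuantizationConfig
from optimum.intel.openvino import OVModelForCausalLM

# bits=4 with sym=True corresponds to the INT4_SYM mode above; ratio=0.8 keeps
# the remaining ~20% of weight layers in higher precision for accuracy.
q_config = OVWeightQuantizationConfig(bits=4, sym=True, group_size=128, ratio=0.8)
model = OVModelForCausalLM.from_pretrained(
    "meta-llama/Meta-Llama-3-8B-Instruct", export=True, quantization_config=q_config
)
model.save_pretrained("Llama-3-8B-Instruct-ov-fp16-int4-sym")  # hypothetical path
```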

## Running Model Inference

Install the packages required for using the [Optimum Intel](https://huggingface.co/docs/optimum/intel/index) integration with the OpenVINO™ backend:

```sh
pip install --upgrade --upgrade-strategy eager "optimum[openvino]"
```

Run model inference:

```python
from optimum.intel.openvino import OVModelForCausalLM
from transformers import AutoTokenizer, pipeline

model_id = "nsbendre25/Llama-3-8B-Instruct-ov_fp16-int4_sym"

# Initialize the tokenizer and the OpenVINO model
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = OVModelForCausalLM.from_pretrained(model_id)

# Build a text-generation pipeline on top of the OpenVINO model; torch-specific
# kwargs such as torch_dtype or device_map do not apply to this backend.
pipe = pipeline("text-generation", model=model, tokenizer=tokenizer)
pipe("Hey how are you doing today?")
```
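
Since this is an Instruct model, prompts generally work better when wrapped in the Llama 3 chat template. A short, illustrative continuation of the snippet above (not part of the original card):

```python
# Continues from the snippet above: apply the chat template, then generate.
messages = [{"role": "user", "content": "Hey how are you doing today?"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
)
outputs = model.generate(input_ids, max_new_tokens=128)
print(tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True))
```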