---
base_model: meta-llama/Llama-3.3-70B-Instruct
---
# MISHANM/meta-llama-Llama-3.3-70B-Instruct-int4
This model is an INT4-quantized version of meta-llama/Llama-3.3-70B-Instruct, offering maximum compression for specialized hardware environments. Supported languages: English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai.
## Model Details
1. Tasks: Causal Language Modeling, Text Generation
2. Base Model: meta-llama/Llama-3.3-70B-Instruct
3. Quantization Format: INT4
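The card does not state which toolchain produced the INT4 weights. As an illustration only, the sketch below shows one common way to load the base 70B checkpoint in 4-bit precision with `bitsandbytes` via `BitsAndBytesConfig`; the recipe actually used for this repository may differ.

```python
# Illustrative only: one possible way to obtain a 4-bit version of the base model.
# The exact quantization method behind this repository's weights is not documented here.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                      # quantize linear layers to 4-bit
    bnb_4bit_compute_dtype=torch.bfloat16,  # compute in bf16 for stability
)
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.3-70B-Instruct",
    quantization_config=bnb_config,
    device_map="auto",
)
```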
## Device Used
1. GPUs: AMD Instinct™ MI210 Accelerators
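As a quick sanity check (not part of the original card), the ROCm build of PyTorch exposes AMD accelerators through the usual `torch.cuda` API, so visibility of the MI210 devices can be verified like this:

```python
import torch

# ROCm builds of PyTorch reuse the torch.cuda namespace for AMD GPUs
print(torch.cuda.is_available())       # True if an accelerator is visible
print(torch.cuda.device_count())       # number of visible devices
print(torch.cuda.get_device_name(0))   # device name string, e.g. the MI210
```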
## Inference with HuggingFace
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the quantized model and tokenizer
model_path = "MISHANM/meta-llama-Llama-3.3-70B-Instruct-int4"
model = AutoModelForCausalLM.from_pretrained(model_path, device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(model_path)

# Function to generate text
def generate_text(prompt, max_length=1000, temperature=0.9):
    # Format the prompt according to the chat template
    messages = [
        {
            "role": "system",
            "content": "Give response to the user query.",  # change as per your requirement
        },
        {"role": "user", "content": prompt},
    ]
    # Build the prompt string with the chat-template special tokens
    formatted_prompt = f"<|system|>{messages[0]['content']}<|user|>{messages[1]['content']}<|assistant|>"
    # Tokenize, move inputs to the model's device, and generate output
    inputs = tokenizer(formatted_prompt, return_tensors="pt").to(model.device)
    output = model.generate(
        **inputs, max_new_tokens=max_length, temperature=temperature, do_sample=True
    )
    return tokenizer.decode(output[0], skip_special_tokens=True)

# Example usage
prompt = """Give a poem on LLM."""
text = generate_text(prompt)
print(text)
```
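Assuming the repository ships a chat template in its tokenizer config (not verified here), prompt construction can also be delegated to `tokenizer.apply_chat_template` instead of formatting the string by hand:

```python
# Alternative: let the tokenizer apply the chat template (assumes a template
# is defined in this repository's tokenizer_config.json).
messages = [
    {"role": "system", "content": "Give response to the user query."},
    {"role": "user", "content": "Give a poem on LLM."},
]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
output = model.generate(input_ids, max_new_tokens=512, do_sample=True, temperature=0.9)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```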
## Citation Information
```
@misc{MISHANM/meta-llama-Llama-3.3-70B-Instruct-int4,
author = {Mishan Maurya},
title = {Introducing INT4 quantized version of meta-llama/Llama-3.3-70B-Instruct},
year = {2024},
publisher = {Hugging Face},
journal = {Hugging Face repository},
}
```