---
base_model: meta-llama/Llama-3.3-70B-Instruct

---

# MISHANM/meta-llama-Llama-3.3-70B-Instruct-int4

This model is an INT4 quantized version of meta-llama/Llama-3.3-70B-Instruct, offering maximum compression for specialized hardware environments. Supported languages: English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai.



## Model Details
1. Tasks: Causal Language Modeling, Text Generation
2. Base Model: meta-llama/Llama-3.3-70B-Instruct
3. Quantization Format: INT4
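
The card does not state which toolchain produced the INT4 weights. As an illustration only, a 4-bit load of the base model through the Transformers/bitsandbytes integration might look like the sketch below; the quantization settings shown here (NF4, bfloat16 compute) are assumptions, not necessarily those used for this repository.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

# Hypothetical 4-bit configuration; NOT necessarily the settings used to
# produce MISHANM/meta-llama-Llama-3.3-70B-Instruct-int4.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

base_id = "meta-llama/Llama-3.3-70B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(base_id)
model = AutoModelForCausalLM.from_pretrained(
    base_id,
    quantization_config=bnb_config,
    device_map="auto",  # spread the 4-bit weights across available accelerators
)
```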



## Device Used

1. GPUs: AMD Instinct™ MI210 Accelerators
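
Before loading a 70B checkpoint it is worth confirming that the accelerators are visible to PyTorch. On ROCm builds, AMD GPUs such as the MI210 are exposed through the `torch.cuda` API, so a quick check looks like this:

```python
import torch

# ROCm builds of PyTorch expose AMD accelerators through the torch.cuda API.
print("Accelerators visible:", torch.cuda.device_count())
for i in range(torch.cuda.device_count()):
    print(f"  device {i}: {torch.cuda.get_device_name(i)}")

# torch.version.hip is set on ROCm builds and is None on CUDA builds.
print("HIP/ROCm version:", getattr(torch.version, "hip", None))
```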
  
   


## Inference with Hugging Face
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the quantized model and tokenizer
model_path = "MISHANM/meta-llama-Llama-3.3-70B-Instruct-int4"

model = AutoModelForCausalLM.from_pretrained(model_path, device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(model_path)

# Function to generate text
def generate_text(prompt, max_new_tokens=1000, temperature=0.9):
    # Build the conversation; adjust the system message as needed.
    messages = [
        {"role": "system", "content": "Give a response to the user query."},
        {"role": "user", "content": prompt},
    ]

    # Apply the model's chat template and append the assistant generation prompt
    input_ids = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)

    # Generate and decode the output
    output = model.generate(
        input_ids, max_new_tokens=max_new_tokens, temperature=temperature, do_sample=True
    )
    return tokenizer.decode(output[0], skip_special_tokens=True)

# Example usage
prompt = "Write a poem about large language models."
text = generate_text(prompt)
print(text)
```
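
As an alternative to the explicit `generate` call above, the same repository can be driven through the high-level `pipeline` API; a minimal sketch (the generation settings are illustrative, not tuned):

```python
from transformers import pipeline

# Text-generation pipeline over the same repository; device_map="auto" lets
# accelerate place the weights across the available GPUs.
generator = pipeline(
    "text-generation",
    model="MISHANM/meta-llama-Llama-3.3-70B-Instruct-int4",
    device_map="auto",
)

prompt = "Write a poem about large language models."
result = generator(prompt, max_new_tokens=200, do_sample=True, temperature=0.9)
print(result[0]["generated_text"])
```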

## Citation Information
```
@misc{MISHANM/meta-llama-Llama-3.3-70B-Instruct-int4,
  author = {Mishan Maurya},
  title = {Introducing INT4 quantized version of meta-llama/Llama-3.3-70B-Instruct},
  year = {2024},
  publisher = {Hugging Face},
  journal = {Hugging Face repository}
}
```