---
base_model: meta-llama/Llama-3.3-70B-Instruct
---

# MISHANM/meta-llama-Llama-3.3-70B-Instruct-int4

This model is an INT4-quantized version of meta-llama/Llama-3.3-70B-Instruct. It stores weights in 4 bits to maximize compression for specialized hardware environments. Supported languages: English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai.

## Model Details

1. Tasks: Causal Language Modeling, Text Generation
2. Base Model: meta-llama/Llama-3.3-70B-Instruct
3. Quantization Format: INT4

## Device Used

1. GPUs: AMD Instinct™ MI210 Accelerators

## Inference with HuggingFace

```python3
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the quantized model and tokenizer
model_path = "MISHANM/meta-llama-Llama-3.3-70B-Instruct-int4"

model = AutoModelForCausalLM.from_pretrained(model_path, device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(model_path)

# Function to generate text
def generate_text(prompt, max_length=1000, temperature=0.9):
    # max_length is passed to generate() as max_new_tokens,
    # so it limits the number of generated tokens, not the total length.
    messages = [
        {
            "role": "system",
            "content": "Give response to the user query.",  # change as per your requirement.
        },
        {"role": "user", "content": prompt},
    ]

    # Apply the tokenizer's own chat template so the prompt matches
    # the format the model was trained on
    formatted_prompt = tokenizer.apply_chat_template(
        messages, tokenize=False, add_generation_prompt=True
    )

    # Tokenize (the template already adds the BOS token) and move to the model's device
    inputs = tokenizer(
        formatted_prompt, return_tensors="pt", add_special_tokens=False
    ).to(model.device)

    # Generate with sampling
    output = model.generate(
        **inputs,
        max_new_tokens=max_length,
        temperature=temperature,
        do_sample=True,
    )

    return tokenizer.decode(output[0], skip_special_tokens=True)

# Example usage
prompt = """Give a poem on LLM."""
text = generate_text(prompt)
print(text)
```

## Citation Information

```
@misc{MISHANM/meta-llama-Llama-3.3-70B-Instruct-int4,
  author = {Mishan Maurya},
  title = {Introducing INT4 quantized version of meta-llama/Llama-3.3-70B-Instruct},
  year = {2024},
  publisher = {Hugging Face},
  journal = {Hugging Face repository},
}
```
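
## Approximate Weight Memory

To give a rough sense of what INT4 compression means in practice, the sketch below compares weight storage at 16 bits and 4 bits per parameter. It is a back-of-the-envelope estimate, not a measurement of this repository: `weight_memory_gb` is a hypothetical helper, the parameter count is assumed to be the nominal 70 billion, and the estimate ignores quantization scales and zero-points, the KV cache, and activations, all of which add to real memory use.

```python
def weight_memory_gb(num_params: float, bits_per_weight: int) -> float:
    """Approximate weight storage in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_weight / 8 / 1e9


# Assumption: nominal "70B" parameter count, not an exact figure.
NUM_PARAMS = 70e9

fp16_gb = weight_memory_gb(NUM_PARAMS, 16)  # 16-bit baseline
int4_gb = weight_memory_gb(NUM_PARAMS, 4)   # 4-bit quantized weights

print(f"16-bit weights: ~{fp16_gb:.0f} GB")
print(f"INT4 weights:   ~{int4_gb:.0f} GB")
print(f"Reduction:      ~{fp16_gb / int4_gb:.0f}x")
```

Under these assumptions the weights shrink by about a factor of four. Actual usage depends on the quantization scheme and runtime overhead.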