---
base_model: unsloth/Meta-Llama-3.1-8B-Instruct-bnb-4bit
tags:
- text-generation-inference
- transformers
- unsloth
- llama
- trl
license: apache-2.0
language:
- en
---

# Uploaded model

- **Developed by:** CRLannister
- **License:** apache-2.0
- **Finetuned from model:** unsloth/Meta-Llama-3.1-8B-Instruct-bnb-4bit

This Llama model was trained 2x faster with [Unsloth](https://github.com/unslothai/unsloth) and Hugging Face's TRL library.

This model builds upon the base Meta-Llama-3.1-8B-Instruct-bnb-4bit and is fine-tuned for text-generation tasks using parameter-efficient techniques such as LoRA (Low-Rank Adaptation) through Hugging Face's TRL library. Fine-tuning was accelerated with the Unsloth library, enabling faster training and lower memory use.

# Key Features

- **Efficient Fine-Tuning:** LoRA adapters were used, significantly reducing computational cost and memory usage compared to full-model fine-tuning.
- **High Performance:** Optimized for text generation and conversational AI tasks.
- **Fast Training:** Training achieved a 2x speed-up through Unsloth's optimizations and features such as gradient checkpointing.

# How to Use

## Load the Model

To load the fine-tuned model for inference, follow these steps:

```
from unsloth import FastLanguageModel

# Base model and LoRA adapter locations
max_seq_length = 1024
base_model = "unsloth/Meta-Llama-3.1-8B-Instruct-bnb-4bit"  # Your base model
lora_path = "CRLannister/finetuned_Llama_3_1_8B_Amharic_lora"  # Path to your saved LoRA weights

# Alpaca-style prompt template (assumed here; it must match the template used during fine-tuning)
alpaca_prompt = """Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
{}

### Input:
{}

### Response:
{}"""

# Load the 4-bit base model
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name=base_model,
    max_seq_length=max_seq_length,
    load_in_4bit=True,
    dtype=None,
)

# Attach LoRA adapters with the same configuration used for training
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    lora_alpha=16,
    lora_dropout=0,
    target_modules=["q_proj", "k_proj", "v_proj", "up_proj",
                    "down_proj", "o_proj", "gate_proj"],
    use_rslora=True,
)

# Load the trained adapter weights
model.load_adapter(lora_path, "default")

# Prepare model for inference
FastLanguageModel.for_inference(model)

def generate_output(instruction, input_, max_length=1024):
    # Format the prompt
    formatted_prompt = alpaca_prompt.format(instruction, input_, '')

    # Tokenize
    inputs = tokenizer(
        [formatted_prompt],
        return_tensors="pt",
        truncation=True,
        max_length=max_length,
        padding=True,
    ).to("cuda")

    # Generate (greedy decoding; sampling is disabled for deterministic outputs)
    outputs = model.generate(
        **inputs,
        max_new_tokens=64,
        use_cache=True,
        do_sample=False,
        num_beams=1,
        pad_token_id=tokenizer.pad_token_id,
        eos_token_id=tokenizer.eos_token_id,
    )

    # Decode, then strip the input prompt to keep only the generated part
    result = tokenizer.decode(outputs[0], skip_special_tokens=True)
    generated_text = result[len(formatted_prompt):].strip()
    return generated_text

# Example query (hypothetical; replace with your own instruction and input)
query = {"instruction": "Answer the question.", "input": "What is the capital of Ethiopia?"}
generate_output(query['instruction'], query['input'])
```

# Model Details

## Training

- **Fine-Tuning Method:** LoRA (Low-Rank Adaptation)
- **Optimizer:** AdamW 8-bit
- **Batch Size:** 32
- **Gradient Accumulation Steps:** 4
- **Learning Rate:** 2e-4
- **Sequence Length:** 2048 tokens

# Frameworks Used

- Unsloth for training optimizations
- Transformers
- TRL
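The training script itself is not included on this card. The snippet below is a minimal sketch of a setup consistent with the hyperparameters listed above, assuming Unsloth's `FastLanguageModel` together with TRL's `SFTTrainer` (as used in Unsloth's example notebooks); the dataset name, text column, epoch count, and output directory are placeholders, not values from this model's actual training run.

```
from unsloth import FastLanguageModel
from datasets import load_dataset
from trl import SFTTrainer
from transformers import TrainingArguments

max_seq_length = 2048  # sequence length listed under Model Details

# Load the 4-bit base model
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/Meta-Llama-3.1-8B-Instruct-bnb-4bit",
    max_seq_length=max_seq_length,
    load_in_4bit=True,
    dtype=None,
)

# Add LoRA adapters (same configuration as in the inference snippet above)
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    lora_alpha=16,
    lora_dropout=0,
    target_modules=["q_proj", "k_proj", "v_proj", "up_proj",
                    "down_proj", "o_proj", "gate_proj"],
    use_rslora=True,
    use_gradient_checkpointing="unsloth",  # memory-efficient gradient checkpointing
)

# Placeholder dataset; expects a "text" column of Alpaca-style formatted prompts
dataset = load_dataset("your_dataset_here", split="train")

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    dataset_text_field="text",
    max_seq_length=max_seq_length,
    args=TrainingArguments(
        per_device_train_batch_size=32,   # batch size listed under Model Details
        gradient_accumulation_steps=4,
        learning_rate=2e-4,
        optim="adamw_8bit",               # AdamW 8-bit optimizer
        num_train_epochs=1,               # placeholder; not stated on this card
        logging_steps=10,
        output_dir="outputs",
    ),
)

trainer.train()
```

After training, `model.save_pretrained(...)` (or `push_to_hub`) saves only the LoRA adapter weights, which is what the inference snippet above loads with `load_adapter`.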
# Hardware Requirements

This model was trained on GPUs with 4-bit quantization (bnb-4bit) to optimize memory usage. It is suitable for inference on GPUs with at least 16 GB of VRAM.

# Results

The model was fine-tuned on conversational and text-generation tasks, demonstrating high fluency and coherence. This makes it ideal for applications such as:

- Chatbots
- Summarization
- Question Answering
- Text Completion

# Contributing

Contributions to this model are welcome! Feel free to open issues or submit pull requests on the Hugging Face repository.

# Acknowledgments

Special thanks to the Unsloth team for making fine-tuning faster and more accessible. The base model was developed by Meta and enhanced by the Unsloth community.