CRLannister
/

finetuned_Llama_3_1_8B_Amharic_lora

@@ -20,3 +20,117 @@ language:
 This llama model was trained 2x faster with [Unsloth](https://github.com/unslothai/unsloth) and Huggingface's TRL library.
 [<img src="https://raw.githubusercontent.com/unslothai/unsloth/main/images/unsloth%20made%20with%20love.png" width="200"/>](https://github.com/unslothai/unsloth)

+This model builds on the base model `unsloth/Meta-Llama-3.1-8B-Instruct-bnb-4bit` and was fine-tuned for text-generation tasks with LoRA (Low-Rank Adaptation), a parameter-efficient technique, using Hugging Face's TRL library.
+Fine-tuning was accelerated with the Unsloth library.
+# Key Features
+- **Efficient Fine-Tuning:** Only LoRA adapters were trained, which significantly reduces compute and memory usage compared to full-model fine-tuning.
+- **High Performance:** Optimized for text generation and conversational AI tasks.
+- **Fast Training:** Training ran about 2x faster with Unsloth's optimizations, including gradient checkpointing.
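To make the efficiency claim concrete, the sketch below counts the trainable parameters that a rank-16 LoRA adds across the seven projection modules listed in the loading code. The layer shapes (hidden size 4096, MLP size 14336, 32 layers, 1024-wide key/value projections) and the approximate 8.03B base parameter count are assumptions taken from the public Llama 3.1 8B configuration, not from this card. `lora_param_count` is a helper name introduced here for illustration.

```python
# Assumed Llama 3.1 8B shapes (from the public model config, not this card).
HIDDEN = 4096
KV_DIM = 1024      # 8 key/value heads x 128 head dim (grouped-query attention)
MLP = 14336
NUM_LAYERS = 32
RANK = 16          # r=16, as in the loading code

# (in_features, out_features) for each LoRA target module in one layer.
TARGET_SHAPES = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, MLP),
    "up_proj": (HIDDEN, MLP),
    "down_proj": (MLP, HIDDEN),
}


def lora_param_count(shapes, rank, num_layers):
    """LoRA adds A (rank x in) and B (out x rank) per module: rank * (in + out)."""
    per_layer = sum(rank * (n_in + n_out) for n_in, n_out in shapes.values())
    return per_layer * num_layers


trainable = lora_param_count(TARGET_SHAPES, RANK, NUM_LAYERS)
BASE_PARAMS = 8_030_000_000  # approximate total for the 8B base model (assumption)
print(f"LoRA trainable parameters: {trainable:,}")
print(f"Share of base model: {trainable / BASE_PARAMS:.2%}")
```

Under these assumptions the adapters amount to well under 1% of the base model's parameters, which is why only a small fraction of the weights needs gradients and optimizer state.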
+# How to Use
+## Load the Model
+To load the fine-tuned model for inference, follow these steps:
+```python
+from unsloth import FastLanguageModel
+
+max_seq_length = 1024
+base_model = "unsloth/Meta-Llama-3.1-8B-Instruct-bnb-4bit"  # 4-bit base model
+lora_path = "CRLannister/finetuned_Llama_3_1_8B_Amharic_lora"  # Fine-tuned LoRA weights
+
+# Assumption: this card does not define `alpaca_prompt`. The template below is the
+# common Alpaca-style prompt; replace it with the exact template used in training.
+alpaca_prompt = """Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.
+
+### Instruction:
+{}
+
+### Input:
+{}
+
+### Response:
+{}"""
+
+# Load the 4-bit base model and its tokenizer
+model, tokenizer = FastLanguageModel.from_pretrained(
+    model_name=base_model,
+    max_seq_length=max_seq_length,
+    load_in_4bit=True,
+    dtype=None,  # None lets Unsloth auto-detect the dtype
+)
+
+# Attach LoRA adapters with the same configuration used for fine-tuning
+model = FastLanguageModel.get_peft_model(
+    model,
+    r=16,
+    lora_alpha=16,
+    lora_dropout=0,
+    target_modules=["q_proj", "k_proj", "v_proj", "up_proj", "down_proj", "o_proj", "gate_proj"],
+    use_rslora=True,
+)
+
+# Load the trained weights into the "default" adapter
+model.load_adapter(lora_path, "default")
+
+# Prepare the model for inference
+FastLanguageModel.for_inference(model)
+
+
+def generate_output(instruction, input_, max_length=1024):
+    # Format the prompt, leaving the response slot empty
+    formatted_prompt = alpaca_prompt.format(instruction, input_, "")
+    # Tokenize
+    inputs = tokenizer(
+        [formatted_prompt],
+        return_tensors="pt",
+        truncation=True,
+        max_length=max_length,
+        padding=True,
+    ).to("cuda")
+    # Generate with greedy decoding. `temperature` is omitted because it has
+    # no effect when do_sample=False.
+    outputs = model.generate(
+        **inputs,
+        max_new_tokens=64,
+        use_cache=True,
+        do_sample=False,  # Deterministic generation
+        num_beams=1,      # Simple greedy decoding
+        pad_token_id=tokenizer.pad_token_id,
+        eos_token_id=tokenizer.eos_token_id,
+    )
+    # Keep only the newly generated tokens. Slicing by token count is more
+    # robust than slicing the decoded string by the prompt's character length.
+    prompt_length = inputs["input_ids"].shape[1]
+    generated_text = tokenizer.decode(outputs[0][prompt_length:], skip_special_tokens=True)
+    return generated_text.strip()
+
+
+# Placeholder query; substitute your own instruction and input.
+query = {"instruction": "<your instruction>", "input": "<your input text>"}
+print(generate_output(query["instruction"], query["input"]))
+```
+# Model Details
+## Training
+- Fine-Tuning Method: LoRA (Low-Rank Adaptation)
+- Optimizer: AdamW 8-bit
+- Batch Size: 32
+- Gradient Accumulation Steps: 4
+- Learning Rate: 2e-4
+- Sequence Length: 2048 tokens
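The card does not say whether the batch size of 32 is per device or already includes gradient accumulation. As a small illustration, assuming 32 is the per-device batch size on a single GPU, the number of samples per optimizer step works out as follows. `effective_batch_size` is a helper name introduced here, not part of any library.

```python
def effective_batch_size(per_device_batch, grad_accum_steps, num_devices=1):
    """Number of samples that contribute to each optimizer step."""
    return per_device_batch * grad_accum_steps * num_devices


# Values from the training settings above; single-GPU training is an assumption.
print(effective_batch_size(32, 4))  # 128
```

If 32 is instead the effective batch size, the per-device batch would be 32 / 4 = 8 under the same single-GPU assumption.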
+## Frameworks Used
+- Unsloth for training optimizations
+- Transformers
+- TRL
+# Hardware Requirements
+The model was trained with 4-bit quantization (bitsandbytes, bnb-4bit) to reduce memory usage. For inference, a GPU with at least 16 GB of VRAM is recommended.
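As a rough sanity check on the 16 GB figure, the sketch below estimates the memory taken by the weights alone at different precisions. The parameter count (about 8.03B) is an assumption from the public Llama 3.1 8B release, and the estimate ignores quantization constants, any layers kept in higher precision, activations, and the KV cache, so real usage will be higher. `weight_memory_gib` is a helper name introduced here for illustration.

```python
def weight_memory_gib(num_params, bits_per_param):
    """Memory for the weights alone, in GiB."""
    return num_params * bits_per_param / 8 / 1024**3


NUM_PARAMS = 8.03e9  # assumed approximate parameter count

for label, bits in [("4-bit (bnb-4bit)", 4), ("16-bit", 16)]:
    print(f"{label}: ~{weight_memory_gib(NUM_PARAMS, bits):.1f} GiB")
```

Under these assumptions the 4-bit weights take under 4 GiB, which leaves headroom on a 16 GB card for activations and the KV cache.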
+# Results
+The model was fine-tuned on conversational and text-generation tasks and produces fluent, coherent output. It is suited to applications such as:
+- Chatbots
+- Summarization
+- Question Answering
+- Text Completion
+# Contributing
+Contributions to this model are welcome! Feel free to open issues or submit pull requests on the Hugging Face repository.
+# Acknowledgments
+Special thanks to the Unsloth team for making fine-tuning faster and more accessible.
+The base model was developed by Meta and enhanced by the Unsloth community.