This llama model was trained 2x faster with [Unsloth](https://github.com/unslothai/unsloth) and Hugging Face's TRL library.

[<img src="https://raw.githubusercontent.com/unslothai/unsloth/main/images/unsloth%20made%20with%20love.png" width="200"/>](https://github.com/unslothai/unsloth)

This model builds upon the base Meta-Llama-3.1-8B-Instruct-bnb-4bit and is fine-tuned for text-generation tasks using parameter-efficient techniques such as LoRA (Low-Rank Adaptation) through Hugging Face's TRL library.

Fine-tuning was accelerated with the Unsloth library, enabling faster training with lower memory use.
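
For intuition, the sketch below illustrates the core LoRA idea in isolation (an illustrative toy example, not this repository's training code): the frozen base weight `W` is augmented with a trainable low-rank product `B @ A`, scaled by `lora_alpha / r`. The `r` and `lora_alpha` values match the adapter settings shown later in this card.

```python
import torch

# Toy dimensions; r and lora_alpha mirror the adapter config used for this model.
d_out, d_in, r, lora_alpha = 4096, 4096, 16, 16

W = torch.randn(d_out, d_in)      # frozen base weight (not updated during training)
A = torch.randn(r, d_in) * 0.01   # trainable, small random init
B = torch.zeros(d_out, r)         # trainable, zero init so B @ A starts at 0

def lora_forward(x: torch.Tensor) -> torch.Tensor:
    # Effective weight is W + (lora_alpha / r) * B @ A; only A and B are trained,
    # i.e. r * (d_in + d_out) parameters instead of d_in * d_out.
    return x @ (W + (lora_alpha / r) * (B @ A)).T

print(lora_forward(torch.randn(2, d_in)).shape)  # torch.Size([2, 4096])
```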

# Key Features

- **Efficient Fine-Tuning:** LoRA adapters were used, significantly reducing computational costs and memory usage compared to full-model fine-tuning.
- **High Performance:** Optimized for text generation and conversational AI tasks.
- **Fast Training:** Training achieved a 2x speed-up with Unsloth's optimizations and advanced features like gradient checkpointing.

# How to Use
## Load the Model
To load the fine-tuned model for inference, follow these steps:
```python
from unsloth import FastLanguageModel

# Load the base model
max_seq_length = 1024
base_model = "unsloth/Meta-Llama-3.1-8B-Instruct-bnb-4bit"  # Your base model
lora_path = "CRLannister/finetuned_Llama_3_1_8B_Amharic_lora"  # Path to your saved LoRA weights

# Load model with LoRA weights
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name=base_model,
    max_seq_length=max_seq_length,
    load_in_4bit=True,
    dtype=None,
)

# Recreate the LoRA adapter structure (must match the training configuration)
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    lora_alpha=16,
    lora_dropout=0,
    target_modules=["q_proj", "k_proj", "v_proj", "up_proj", "down_proj", "o_proj", "gate_proj"],
    use_rslora=True,
)

# Load the trained weights
model.load_adapter(lora_path, "default")

# Prepare model for inference
FastLanguageModel.for_inference(model)

# Alpaca-style prompt template with slots for instruction, input, and response.
# It must match the template that was used during fine-tuning.
alpaca_prompt = """Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
{}

### Input:
{}

### Response:
{}"""


def generate_output(instruction, input_, max_length=1024):
    # Format the prompt
    formatted_prompt = alpaca_prompt.format(instruction, input_, '')

    # Tokenize
    inputs = tokenizer(
        [formatted_prompt],
        return_tensors="pt",
        truncation=True,
        max_length=max_length,
        padding=True,
    ).to("cuda")

    # Generate
    outputs = model.generate(
        **inputs,
        max_new_tokens=64,
        use_cache=True,
        do_sample=False,  # Greedy decoding for deterministic outputs
        num_beams=1,
        pad_token_id=tokenizer.pad_token_id,
        eos_token_id=tokenizer.eos_token_id,
    )

    # Decode, then strip the prompt so only the newly generated text remains
    result = tokenizer.decode(outputs[0], skip_special_tokens=True)
    generated_text = result[len(formatted_prompt):].strip()

    return generated_text


# Example call (hypothetical query; replace with your own instruction and input)
query = {"instruction": "Translate the following sentence to Amharic.", "input": "Hello, how are you?"}
print(generate_output(query['instruction'], query['input']))
```
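
The snippet above uses greedy decoding (`do_sample=False`), so outputs are reproducible. For more varied responses you can switch to sampling; reusing `model` and `inputs` from above, the values here are illustrative defaults rather than settings tuned for this model:

```python
# Sampling-based generation (illustrative parameters, not tuned for this model)
outputs = model.generate(
    **inputs,
    max_new_tokens=64,
    do_sample=True,    # enable sampling instead of greedy decoding
    temperature=0.7,   # higher values give more diverse text
    top_p=0.9,         # nucleus sampling cutoff
)
```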

# Model Details
## Training
The fine-tuning run used the following settings (see the configuration sketch after this list):

- Fine-Tuning Method: LoRA (Low-Rank Adaptation)
- Optimizer: AdamW 8-bit
- Batch Size: 32
- Gradient Accumulation Steps: 4
- Learning Rate: 2e-4
- Sequence Length: 2048 tokens
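
As a reference, here is a minimal sketch of how these hyperparameters map onto an Unsloth + TRL training setup. It is not the exact training script: the dataset file, the `text` field, the epoch count, and the 8 x 4 split of the effective batch size of 32 are assumptions for illustration.

```python
from unsloth import FastLanguageModel
from trl import SFTTrainer
from transformers import TrainingArguments
from datasets import load_dataset

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/Meta-Llama-3.1-8B-Instruct-bnb-4bit",
    max_seq_length=2048,
    load_in_4bit=True,
)
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    lora_alpha=16,
    lora_dropout=0,
    target_modules=["q_proj", "k_proj", "v_proj", "up_proj", "down_proj", "o_proj", "gate_proj"],
    use_rslora=True,
    use_gradient_checkpointing="unsloth",  # the gradient checkpointing mentioned under Key Features
)

dataset = load_dataset("json", data_files="train.jsonl", split="train")  # hypothetical dataset file

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    dataset_text_field="text",  # assumes prompts are pre-formatted into a "text" column
    max_seq_length=2048,
    args=TrainingArguments(
        per_device_train_batch_size=8,   # 8 per device x 4 accumulation steps = effective batch of 32
        gradient_accumulation_steps=4,
        learning_rate=2e-4,
        optim="adamw_8bit",
        num_train_epochs=1,              # illustrative
        output_dir="outputs",
    ),
)
trainer.train()
```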

# Frameworks Used
- Unsloth for training optimizations
- Transformers
- TRL

# Hardware Requirements
This model was trained on GPUs with 4-bit quantization (bnb-4bit) to optimize memory usage. It is suitable for inference on GPUs with at least 16 GB of VRAM.
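
A quick way to confirm your GPU meets this requirement (a generic PyTorch check, not specific to this model):

```python
import torch

# Report the total memory of the first CUDA device, in GB
assert torch.cuda.is_available(), "A CUDA-capable GPU is required for inference"
print(f"{torch.cuda.get_device_properties(0).total_memory / 1e9:.1f} GB VRAM available")
```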

# Results
The model was fine-tuned on conversational and text-generation tasks, demonstrating high fluency and coherence. This makes it ideal for applications like:

- Chatbots
- Summarization
- Question Answering
- Text Completion

# Contributing
Contributions to this model are welcome! Feel free to open issues or submit pull requests on the Hugging Face repository.

# Acknowledgments
Special thanks to the Unsloth team for making fine-tuning faster and more accessible. The base model was developed by Meta and enhanced by the Unsloth community.