---
base_model: Qwen/Qwen2.5-7B-Instruct
library_name: peft
pipeline_tag: text-generation
language: en
license: apache-2.0
tags:
- lora
- sft
- transformers
- trl
- unsloth
- fine-tuned
datasets:
- theprint/Tom-4.2k-alpaca
---
# Tom-Qwen-7B-Instruct
A fine-tuned 7B parameter model specialized for step-by-step instruction and conversation.
## Model Details
This model is a fine-tuned version of Qwen/Qwen2.5-7B-Instruct using the Unsloth framework with LoRA (Low-Rank Adaptation) for efficient training.
- **Developed by:** theprint
- **Model type:** Causal Language Model (Fine-tuned with LoRA)
- **Language:** en
- **License:** apache-2.0
- **Base model:** Qwen/Qwen2.5-7B-Instruct
- **Fine-tuning method:** LoRA with rank 128
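Because the fine-tune was trained as a LoRA adapter (note the `peft` library tag in the metadata), it can also be attached to the base model explicitly. A minimal sketch, assuming this repository hosts the adapter weights alongside the quantized files:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Load the base model, then attach the LoRA adapter from this repository
base = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2.5-7B-Instruct",
    torch_dtype="auto",
    device_map="auto",
)
model = PeftModel.from_pretrained(base, "theprint/Tom-Qwen-7B-Instruct")
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-7B-Instruct")
```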
## GGUF Quantized Versions
Quantized GGUF versions of this model, for use with llama.cpp, are available in the [`gguf/` directory](https://huggingface.co/theprint/Tom-Qwen-7B-Instruct/tree/main/gguf):
- `Tom-Qwen-7B-Instruct-f16.gguf` (14531.9 MB) - 16-bit float (original precision, largest file)
- `Tom-Qwen-7B-Instruct-q3_k_m.gguf` (3632.0 MB) - 3-bit quantization (medium quality)
- `Tom-Qwen-7B-Instruct-q4_k_m.gguf` (4466.1 MB) - 4-bit quantization (medium, recommended for most use cases)
- `Tom-Qwen-7B-Instruct-q5_k_m.gguf` (5192.6 MB) - 5-bit quantization (medium, good quality)
- `Tom-Qwen-7B-Instruct-q6_k.gguf` (5964.5 MB) - 6-bit quantization (high quality)
- `Tom-Qwen-7B-Instruct-q8_0.gguf` (7723.4 MB) - 8-bit quantization (very high quality)
## Intended Use
Conversation, brainstorming, and general instruction following
## Training Details
### Training Data
A synthetic dataset created specifically for this model, focused on practical tips and well-being.
- **Dataset:** theprint/Tom-4.2k-alpaca
- **Format:** alpaca
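For reference, each Alpaca-format record pairs an instruction (plus an optional input) with a target output. The sketch below shows one illustrative way such a record is rendered into a training prompt; the exact template used for this run is not documented here, so treat the wording as an assumption:

```python
# Illustrative Alpaca-style record and prompt rendering (template is an assumption)
record = {
    "instruction": "Give three tips for staying focused while working from home.",
    "input": "",
    "output": "1. Set a fixed schedule. ...",
}

prompt = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    f"### Instruction:\n{record['instruction']}\n\n"
    + (f"### Input:\n{record['input']}\n\n" if record["input"] else "")
    + "### Response:\n"
)
print(prompt + record["output"])
```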
### Training Procedure
- **Training epochs:** 3
- **LoRA rank:** 128
- **Learning rate:** 0.0002
- **Batch size:** 4
- **Framework:** Unsloth + transformers + PEFT
- **Hardware:** NVIDIA RTX 5090
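The sketch below shows roughly how these settings map onto an Unsloth + TRL `SFTTrainer` run. Values not listed above (`lora_alpha`, the prompt template, `max_seq_length`, argument names that differ between TRL versions) are assumptions, not a record of the actual training script:

```python
from unsloth import FastLanguageModel
from trl import SFTTrainer, SFTConfig
from datasets import load_dataset

# Load the base model in 4-bit and attach a rank-128 LoRA adapter
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="Qwen/Qwen2.5-7B-Instruct",
    max_seq_length=4096,      # assumed to match the inference example below
    load_in_4bit=True,
)
model = FastLanguageModel.get_peft_model(model, r=128, lora_alpha=128)  # rank from the card; alpha assumed

dataset = load_dataset("theprint/Tom-4.2k-alpaca", split="train")

def to_text(example):
    # Collapse an Alpaca record into a single training string (illustrative template)
    return {"text": f"### Instruction:\n{example['instruction']}\n\n### Response:\n{example['output']}"}

trainer = SFTTrainer(
    model=model,
    processing_class=tokenizer,      # older TRL versions take tokenizer= instead
    train_dataset=dataset.map(to_text),
    args=SFTConfig(
        num_train_epochs=3,
        learning_rate=2e-4,
        per_device_train_batch_size=4,
        output_dir="outputs",
    ),
)
trainer.train()
```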
## Usage
```python
from unsloth import FastLanguageModel
import torch
# Load model and tokenizer
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="theprint/Tom-Qwen-7B-Instruct",
    max_seq_length=4096,
    dtype=None,
    load_in_4bit=True,
)
# Enable inference mode
FastLanguageModel.for_inference(model)
# Example usage
inputs = tokenizer(["Your prompt here"], return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256, temperature=0.7, do_sample=True)
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)
```
### Alternative Usage (Standard Transformers)
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch
model = AutoModelForCausalLM.from_pretrained(
    "theprint/Tom-Qwen-7B-Instruct",
    torch_dtype=torch.float16,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("theprint/Tom-Qwen-7B-Instruct")
# Example usage
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Your question here"},
]
inputs = tokenizer.apply_chat_template(messages, return_tensors="pt", add_generation_prompt=True).to(model.device)
outputs = model.generate(inputs, max_new_tokens=256, temperature=0.7, do_sample=True)
response = tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True)
print(response)
```
### Using with llama.cpp
```bash
# Download a quantized version (q4_k_m recommended for most use cases)
wget https://huggingface.co/theprint/Tom-Qwen-7B-Instruct/resolve/main/gguf/Tom-Qwen-7B-Instruct-q4_k_m.gguf
# Run with llama.cpp
./llama.cpp/main -m Tom-Qwen-7B-Instruct-q4_k_m.gguf -p "Your prompt here" -n 256
```
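If you prefer calling the model from Python rather than the CLI, the same GGUF file can be loaded with the `llama-cpp-python` bindings. A minimal sketch (`pip install llama-cpp-python`; the file path and generation settings mirror the example above):

```python
from llama_cpp import Llama

# Load the quantized model; n_gpu_layers=-1 offloads all layers to the GPU (use 0 for CPU-only)
llm = Llama(
    model_path="Tom-Qwen-7B-Instruct-q4_k_m.gguf",
    n_ctx=4096,
    n_gpu_layers=-1,
)

out = llm.create_chat_completion(
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Your question here"},
    ],
    max_tokens=256,
    temperature=0.7,
)
print(out["choices"][0]["message"]["content"])
```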
## Limitations
May hallucinate or provide incorrect information. Not suitable for critical decision making.
## Citation
If you use this model, please cite:
```bibtex
@misc{tom_qwen_7b_instruct,
  title={Tom-Qwen-7B-Instruct: Fine-tuned Qwen/Qwen2.5-7B-Instruct},
  author={theprint},
  year={2025},
  publisher={Hugging Face},
  url={https://huggingface.co/theprint/Tom-Qwen-7B-Instruct}
}
```
## Acknowledgments
- Base model: [Qwen/Qwen2.5-7B-Instruct](https://huggingface.co/Qwen/Qwen2.5-7B-Instruct)
- Training dataset: [theprint/Tom-4.2k-alpaca](https://huggingface.co/datasets/theprint/Tom-4.2k-alpaca)
- Fine-tuning framework: [Unsloth](https://github.com/unslothai/unsloth)
- Quantization: [llama.cpp](https://github.com/ggerganov/llama.cpp)