---
base_model: Qwen/Qwen2.5-7B-Instruct
library_name: peft
pipeline_tag: text-generation
language: en
license: apache-2.0
tags:
- lora
- sft
- transformers
- trl
- unsloth
- fine-tuned
datasets:
- theprint/Tom-4.2k-alpaca
---
# Tom-Qwen-7B-Instruct

A fine-tuned 7B-parameter model specialized for step-by-step instruction and conversation.
## Model Details

This model is a fine-tuned version of Qwen/Qwen2.5-7B-Instruct, trained with the Unsloth framework using LoRA (Low-Rank Adaptation) for efficient fine-tuning.

- **Developed by:** theprint
- **Model type:** Causal Language Model (Fine-tuned with LoRA)
- **Language:** en
- **License:** apache-2.0
- **Base model:** Qwen/Qwen2.5-7B-Instruct
- **Fine-tuning method:** LoRA with rank 128 (see the adapter-loading sketch below)
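
Because this repository is published as a PEFT artifact (`library_name: peft`), the rank-128 LoRA adapter can presumably be attached to the base model directly with the `peft` library. The following is a minimal sketch under that assumption; the Usage section below shows the same model loaded via Unsloth and plain transformers.

```python
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the base model, then attach the LoRA adapter published in this repository.
base = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2.5-7B-Instruct",
    torch_dtype=torch.float16,
    device_map="auto",
)
model = PeftModel.from_pretrained(base, "theprint/Tom-Qwen-7B-Instruct")
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-7B-Instruct")

# Optionally fold the adapter into the base weights for simpler deployment.
model = model.merge_and_unload()
```
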
## GGUF Quantized Versions

Quantized GGUF versions of this model are available in the [gguf/ folder](https://huggingface.co/theprint/Tom-Qwen-7B-Instruct/tree/main/gguf) for use with llama.cpp (a download-and-run sketch follows the list):

- `Tom-Qwen-7B-Instruct-f16.gguf` (14531.9 MB) - 16-bit float (original precision, largest file)
- `Tom-Qwen-7B-Instruct-q3_k_m.gguf` (3632.0 MB) - 3-bit quantization (medium quality)
- `Tom-Qwen-7B-Instruct-q4_k_m.gguf` (4466.1 MB) - 4-bit quantization (medium, recommended for most use cases)
- `Tom-Qwen-7B-Instruct-q5_k_m.gguf` (5192.6 MB) - 5-bit quantization (medium, good quality)
- `Tom-Qwen-7B-Instruct-q6_k.gguf` (5964.5 MB) - 6-bit quantization (high quality)
- `Tom-Qwen-7B-Instruct-q8_0.gguf` (7723.4 MB) - 8-bit quantization (very high quality)
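
As a sketch of fetching and running one of the files above without cloning the repository (assuming the `huggingface_hub` and `llama-cpp-python` packages are installed; the prompt is illustrative):

```python
from huggingface_hub import hf_hub_download
from llama_cpp import Llama

# Download the recommended 4-bit quantization from the gguf/ subfolder.
model_path = hf_hub_download(
    repo_id="theprint/Tom-Qwen-7B-Instruct",
    filename="gguf/Tom-Qwen-7B-Instruct-q4_k_m.gguf",
)

# Load the GGUF file and generate a short chat completion.
llm = Llama(model_path=model_path, n_ctx=4096)
out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Give me three tips for winding down before bed."}],
    max_tokens=256,
    temperature=0.7,
)
print(out["choices"][0]["message"]["content"])
```
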
## Intended Use

Conversation, brainstorming, and general instruction following.
## Training Details

### Training Data

A synthetic dataset created specifically for this model, focused on practical tips and well-being. (A sketch of an Alpaca-format record follows the list below.)

- **Dataset:** theprint/Tom-4.2k-alpaca
- **Format:** alpaca
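
For illustration, records in the Alpaca format carry `instruction`, `input`, and `output` fields. The example below is hypothetical (not copied from the dataset) and shows one common way such a record is rendered into a single training prompt:

```python
# Hypothetical Alpaca-style record (illustrative only, not taken from Tom-4.2k-alpaca).
example = {
    "instruction": "Give three practical tips for winding down before bed.",
    "input": "",
    "output": "1. Dim the lights an hour before bedtime. 2. Put screens away. 3. Try a short breathing exercise.",
}

# A typical Alpaca prompt template applied to the record above.
prompt = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    f"### Instruction:\n{example['instruction']}\n\n"
    f"### Response:\n{example['output']}"
)
print(prompt)
```
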
### Training Procedure

- **Training epochs:** 3
- **LoRA rank:** 128
- **Learning rate:** 0.0002
- **Batch size:** 4
- **Framework:** Unsloth + transformers + PEFT (a configuration sketch follows the list)
- **Hardware:** NVIDIA RTX 5090
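
A rough sketch of how these settings could map onto an Unsloth + TRL training run, in the style of Unsloth's example notebooks. This is not the exact script used for this model: `lora_alpha`, the target modules, the `text` column name, and the TRL argument layout (which varies across TRL versions) are all assumptions.

```python
from unsloth import FastLanguageModel  # import Unsloth first so its patches apply
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer

# Load the base model and attach rank-128 LoRA adapters.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="Qwen/Qwen2.5-7B-Instruct",
    max_seq_length=4096,
    load_in_4bit=True,
)
model = FastLanguageModel.get_peft_model(
    model,
    r=128,  # LoRA rank listed above
    lora_alpha=128,  # assumption; not stated on this card
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],  # assumption
)

# Alpaca-formatted records, assumed to be rendered into a single "text" column before training.
dataset = load_dataset("theprint/Tom-4.2k-alpaca", split="train")

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    args=SFTConfig(
        dataset_text_field="text",
        max_seq_length=4096,
        per_device_train_batch_size=4,
        num_train_epochs=3,
        learning_rate=2e-4,
        output_dir="outputs",
    ),
)
trainer.train()
```
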
## Usage

```python
from unsloth import FastLanguageModel
import torch

# Load model and tokenizer
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="theprint/Tom-Qwen-7B-Instruct",
    max_seq_length=4096,
    dtype=None,
    load_in_4bit=True,
)

# Enable inference mode
FastLanguageModel.for_inference(model)

# Example usage (move inputs to the GPU where the 4-bit model lives)
inputs = tokenizer(["Your prompt here"], return_tensors="pt").to("cuda")
outputs = model.generate(**inputs, max_new_tokens=256, temperature=0.7, do_sample=True)
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)
```
### Alternative Usage (Standard Transformers)

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model = AutoModelForCausalLM.from_pretrained(
    "theprint/Tom-Qwen-7B-Instruct",
    torch_dtype=torch.float16,
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained("theprint/Tom-Qwen-7B-Instruct")

# Example usage
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Your question here"}
]

# Move the tokenized prompt to the model's device before generating
inputs = tokenizer.apply_chat_template(messages, return_tensors="pt", add_generation_prompt=True).to(model.device)
outputs = model.generate(inputs, max_new_tokens=256, temperature=0.7, do_sample=True)
response = tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True)
print(response)
```
### Using with llama.cpp

```bash
# Download a quantized version (q4_k_m recommended for most use cases)
wget https://huggingface.co/theprint/Tom-Qwen-7B-Instruct/resolve/main/gguf/Tom-Qwen-7B-Instruct-q4_k_m.gguf

# Run with llama.cpp
./llama.cpp/main -m Tom-Qwen-7B-Instruct-q4_k_m.gguf -p "Your prompt here" -n 256
```
## Limitations

May hallucinate or provide incorrect information. Not suitable for critical decision making.
## Citation

If you use this model, please cite:

```bibtex
@misc{tom_qwen_7b_instruct,
  title={Tom-Qwen-7B-Instruct: Fine-tuned Qwen/Qwen2.5-7B-Instruct},
  author={theprint},
  year={2025},
  publisher={Hugging Face},
  url={https://huggingface.co/theprint/Tom-Qwen-7B-Instruct}
}
```
## Acknowledgments

- Base model: [Qwen/Qwen2.5-7B-Instruct](https://huggingface.co/Qwen/Qwen2.5-7B-Instruct)
- Training dataset: [theprint/Tom-4.2k-alpaca](https://huggingface.co/datasets/theprint/Tom-4.2k-alpaca)
- Fine-tuning framework: [Unsloth](https://github.com/unslothai/unsloth)
- Quantization: [llama.cpp](https://github.com/ggerganov/llama.cpp)