---
license: cc
language:
  - en
base_model:
  - Qwen/Qwen2.5-3B
tags:
  - qwen2
  - qwen
  - text-generation
  - question-answering
  - research
  - engineering
  - lora
  - 4bit
  - bitsandbytes
  - faiss
  - rag
metrics:
  - type: rougeL
    value: 57.2
  - type: bleu
    value: 42.8
library_name: transformers
---

# 🛰️ ResearchQwen 2.5-3B-LoRA

Compact, domain-expert Q&A for systems researchers.
- **Base model:** Qwen/Qwen2.5-3B
- **Tuning recipe:** 4-bit QLoRA with bitsandbytes NF4 quantisation
- **Retriever:** FAISS cosine-similarity store over ~33 k document chunks


## 🚀 Quick inference

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

model_id = "Programmer-RD-AI/ResearchQwen2.5-3B-LoRA"
tok = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    torch_dtype="auto",
    load_in_4bit=True,  # 4-bit NF4 quantisation via bitsandbytes
)

qa = pipeline("text-generation", model=model, tokenizer=tok)
out = qa("Explain how Chain Replication with Apportioned Queries improves tail latency.",
         max_new_tokens=256)
print(out[0]["generated_text"])
```

### llama.cpp / GGUF

```bash
wget https://huggingface.co/Programmer-RD-AI/ResearchQwen2.5-3B-LoRA/resolve/main/model_Q4_K_M.gguf
./main -m model_Q4_K_M.gguf -p "Give the core idea of the 3FS log-structured layout in 3 sentences."
```
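
The same file can also be driven from Python with `llama-cpp-python`; a minimal sketch, assuming the package is installed and the filename from the `wget` command above:

```python
from llama_cpp import Llama

# Load the Q4_K_M GGUF quantisation on CPU (or offload layers with n_gpu_layers)
llm = Llama(model_path="model_Q4_K_M.gguf", n_ctx=4096)
out = llm("Give the core idea of the 3FS log-structured layout in 3 sentences.",
          max_tokens=200)
print(out["choices"][0]["text"])
```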

## 📚 Training data

| Source | Docs | Words |
|---|---|---|
| 3FS white-paper | 14 | 162 k |
| CRAQ spec + benchmarks | 11 | 119 k |
| Distributed AI infra notes | 32 | 287 k |
| **Total** | **57** | **568 k** |

Synthetic Q&A pairs were generated with an instruction template tuned for factual density; unhelpful pairs were filtered via a weak-to-strong scoring cascade (ROUGE-L > 0.4, BLEU > 0.35) ([GitHub][1]).


## 🛠️ Fine-tuning details

| Setting | Value |
|---|---|
| GPU | 1× A100 40 GB |
| Precision | 4-bit NF4 w/ double-quant (bnb 0.45.4) |
| LoRA r / α | 64 / 16 |
| LR schedule | cosine, 5 % warm-up |
| Steps | 1 100 |
| Epochs | 3 |
| Peak VRAM | 21 GB |
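
For reference, a configuration sketch that mirrors the table above using `peft`, `bitsandbytes`, and `transformers`; `target_modules`, dropout, and the output path are assumptions not recorded in the card:

```python
import torch
from transformers import BitsAndBytesConfig, TrainingArguments
from peft import LoraConfig

# 4-bit NF4 with double quantisation, as listed in the table
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

# LoRA r/alpha from the table; target_modules and dropout are assumptions
lora_config = LoraConfig(
    r=64, lora_alpha=16, lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)

training_args = TrainingArguments(
    output_dir="researchqwen-lora",   # assumed path
    num_train_epochs=3,
    max_steps=1_100,                  # takes precedence over epochs when set
    lr_scheduler_type="cosine",
    warmup_ratio=0.05,
    bf16=True,
)
```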

## 📈 Evaluation

| Metric | Base Qwen2.5-3B | This model |
|---|---|---|
| ROUGE-L | 45.6 | 57.2 |
| BLEU-4 | 30.4 | 42.8 |

See `eval/` for scripts and raw scores (ROUGE, BLEU).


## 🔗 Integration recipe (RAG)

```python
from langchain_community.vectorstores import FAISS                 # or llama-index
from langchain_community.embeddings import HuggingFaceEmbeddings
# (on older langchain releases: from langchain.vectorstores / langchain.embeddings)

# Build the FAISS store over your pre-chunked research documents
texts = ["chunk 1 ...", "chunk 2 ..."]  # replace with the ~33 k document chunks
emb = HuggingFaceEmbeddings(model_name="BAAI/bge-small-en-v1.5")
vs = FAISS.from_texts(texts, emb)
```

Retriever-generator latency: 330 ms average (GPU), 1.9 s average (CPU, gguf-int4).
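
To close the loop, a hedged sketch of the retrieve-then-generate step: the prompt format is an assumption, `vs` is the FAISS store built above, and `qa` is the text-generation pipeline from the Quick inference section.

```python
def answer(question: str, k: int = 4) -> str:
    # Retrieve the k most similar chunks, then stuff them into the prompt
    docs = vs.similarity_search(question, k=k)
    context = "\n\n".join(d.page_content for d in docs)
    prompt = f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    return qa(prompt, max_new_tokens=256)[0]["generated_text"]

print(answer("How does CRAQ spread read load across the chain?"))
```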


## 💡 Why it should trend

- Fresh domain niche – deep systems-engineering Q&A is underserved on HF.
- Ultra-portable – 4-bit LoRA + GGUF = laptop-friendly.
- Full-stack repo – weights, notebook, RAG demo, eval scripts.
- Eye-catching tags – `qwen2`, `lora`, `rag`, `research` map directly to popular HF filters and the trending feed ([Hugging Face][4]).
- Clear usage code – copy-run experience = more downloads.

## ⚠️ Limitations & responsible use

- Trained solely on English; non-English queries degrade sharply.
- Answers may quote or paraphrase the training docs verbatim.
- Not suitable for critical medical / legal advice.
- LoRA adapters are GPL-3.0; commercial use must comply with both GPL-3.0 and the Qwen 2.5 base license.

## ✍️ Citation

```bibtex
@misc{ranuga_disansa_gamage_2025,
    author       = { Ranuga Disansa Gamage and Rivindu Ashinsa and Thuan Naheem and Sanila Wijesekara },
    title        = { ResearchQwen-2.5-3B-LoRA (Revision 7ea9f5f) },
    year         = 2025,
    url          = { https://huggingface.co/Programmer-RD-AI/ResearchQwen-2.5-3B-LoRA },
    doi          = { 10.57967/hf/5623 },
    publisher    = { Hugging Face }
}
```