
phi2-memory-deeptalks

A LoRA adapter for the Phi-2 language model, fine-tuned on short conversational snippets to provide short-term memory in dialogue. This adapter enables your assistant to recall and leverage the last few user/assistant turns without full fine-tuning of the 2.7B-parameter base model.

🔗 Live Demo on Hugging Face Spaces

⏳ Responses take a while to generate because the demo runs on the free CPU tier


🚀 Overview

phi2-memory-deeptalks injects lightweight, low-rank corrections into the attention and MLP layers of microsoft/phi-2.

  • Size: ~6 M trainable parameters (≈ 0.2 % of the base model)
  • Base: Phi-2 (2.7 B parameters)
  • Adapter: Low-Rank Adaptation (LoRA) via the PEFT library
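
Conceptually, each wrapped projection keeps its frozen Phi-2 weight and adds a scaled low-rank correction on top. The sketch below illustrates the idea only; it is not PEFT's actual implementation.

import torch

# Conceptual sketch of a LoRA-wrapped linear layer (not PEFT's LoraLayer).
class LoRALinearSketch(torch.nn.Module):
    def __init__(self, base: torch.nn.Linear, r: int = 4, lora_alpha: int = 32, dropout: float = 0.05):
        super().__init__()
        self.base = base.requires_grad_(False)          # frozen Phi-2 projection, e.g. q_proj
        self.lora_A = torch.nn.Linear(base.in_features, r, bias=False)
        self.lora_B = torch.nn.Linear(r, base.out_features, bias=False)
        torch.nn.init.zeros_(self.lora_B.weight)        # correction starts as a no-op
        self.dropout = torch.nn.Dropout(dropout)
        self.scaling = lora_alpha / r

    def forward(self, x):
        # y = W x + (alpha / r) * B A x  -- only A and B are trained
        return self.base(x) + self.scaling * self.lora_B(self.lora_A(self.dropout(x)))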

📦 Model Details

Architecture & Adapter Configuration

  • Base model: microsoft/phi-2 (causal-LM)
  • LoRA rank (r): 4
  • Modules wrapped:
    • Attention projections: q_proj, k_proj, v_proj, dense
    • MLP layers: fc1, fc2
  • LoRA hyperparameters:
    • lora_alpha: 32
    • lora_dropout: 0.05
    • Trainable params: ~5.9 M
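
These settings correspond roughly to the PEFT LoraConfig below; the bias and task_type values are assumptions, since they are not stated in this card.

from peft import LoraConfig

lora_config = LoraConfig(
    r=4,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "dense", "fc1", "fc2"],
    bias="none",             # assumption
    task_type="CAUSAL_LM",   # assumption
)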

Training Data & Preprocessing

  • Dataset: HyperThink-Mini 50 K (7 % used)
  • Prompt format:
    ### Human:
    <user message>
    
    ### Assistant:
    <assistant response>
    
  • Tokenization: Truncated/padded to 256 tokens, labels = input_ids
  • Optimizer: AdamW (PyTorch), FP16 on GPU
  • Batching: per_device_train_batch_size=1 + gradient_accumulation_steps=8
  • Epochs: 3
  • Checkpointing: Save every 500 steps; final adapter weights in adapter_model.safetensors
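
A minimal training sketch matching these hyperparameters follows. The tokenizer setup, dataset column names, and logging interval are assumptions, not the exact training script.

from transformers import AutoModelForCausalLM, AutoTokenizer, Trainer, TrainingArguments
from peft import get_peft_model

tokenizer = AutoTokenizer.from_pretrained("microsoft/phi-2")
tokenizer.pad_token = tokenizer.eos_token            # assumption: reuse EOS for padding
model = AutoModelForCausalLM.from_pretrained("microsoft/phi-2")
model = get_peft_model(model, lora_config)           # lora_config from the sketch above

def tokenize_fn(example):
    # Hypothetical column names ("prompt", "response"); adjust to the dataset schema.
    text = f"### Human:\n{example['prompt']}\n\n### Assistant:\n{example['response']}\n"
    enc = tokenizer(text, truncation=True, padding="max_length", max_length=256)
    enc["labels"] = enc["input_ids"].copy()          # causal-LM: labels mirror input_ids
    return enc

train_dataset = ...                                  # load the HyperThink-Mini split here (hub id not given in this card)
tokenized_train = train_dataset.map(tokenize_fn)

args = TrainingArguments(
    output_dir="phi2-memory-deeptalks",
    per_device_train_batch_size=1,
    gradient_accumulation_steps=8,                   # effective batch size of 8
    num_train_epochs=3,
    fp16=True,                                       # FP16 on GPU
    save_steps=500,
    logging_steps=50,                                # assumption
)

Trainer(model=model, args=args, train_dataset=tokenized_train).train()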

🎯 Evaluation

  • Training loss (step 500): ~1.08
  • Validation loss: ~1.10
  • Qualitative:
    • Improved recall of the last 2–4 turns in dialogue
    • Maintains base Phi-2 fluency on general language

🔧 Usage

Load the adapter into your Phi-2 model with just a few lines:

from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# 1) Load base
tokenizer = AutoTokenizer.from_pretrained("microsoft/phi-2", padding_side="left")
model = AutoModelForCausalLM.from_pretrained("microsoft/phi-2")

# 2) Apply LoRA adapter
model = PeftModel.from_pretrained(model, "sourize/phi2-memory-deeptalks")

# 3) (Optional) Resize embeddings -- only needed if you added tokens to the tokenizer
model.base_model.resize_token_embeddings(len(tokenizer))

# 4) Generate
prompt = "### Human:\nHello, how are you?\n\n### Assistant:"
inputs = tokenizer(prompt, return_tensors="pt")
output = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output[0], skip_special_tokens=True))
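
For deployment you can optionally fold the adapter into the base weights with PEFT's merge_and_unload, which removes the LoRA indirection at inference time; the output path below is illustrative.

# Optional: merge the LoRA weights into the base model and save the result
merged = model.merge_and_unload()
merged.save_pretrained("phi2-memory-deeptalks-merged")
tokenizer.save_pretrained("phi2-memory-deeptalks-merged")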

βš™οΈ Inference & Deployment

  • Preferred: GPU (NVIDIA CUDA) for sub-second latency
  • CPU-only: ~7–10 min per response (large model!)
  • Hugging Face Inference API:
    curl -X POST \
      -H "Authorization: Bearer $HF_TOKEN" \
      -H "Content-Type: application/json" \
      https://api-inference.huggingface.co/pipeline/text-generation/sourize/phi2-memory-deeptalks \
      -d '{
        "inputs": "Hello, how are you?",
        "parameters": {
          "max_new_tokens": 64,
          "do_sample": true,
          "temperature": 0.7,
          "top_p": 0.9,
          "return_full_text": false
        }
      }'
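
The same request can also be made from Python with the huggingface_hub client; this is a sketch that assumes the Inference API serves this adapter and that HF_TOKEN is set in the environment.

import os
from huggingface_hub import InferenceClient

client = InferenceClient(model="sourize/phi2-memory-deeptalks", token=os.environ["HF_TOKEN"])
reply = client.text_generation(
    "Hello, how are you?",
    max_new_tokens=64,
    do_sample=True,
    temperature=0.7,
    top_p=0.9,
    return_full_text=False,
)
print(reply)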
    

💡 Use Cases & Limitations

  • Ideal for:
    • Short back-and-forth chats (2–4 turns)
    • Chatbots that need to “remember” very recent context
  • Not suited for:
    • Long-term memory or document-level retrieval
    • High-volume production on CPU (too slow)
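
Because the adapter only looks back a few turns, the prompt should carry that recent history in the training format. The helper below is a hypothetical illustration; the names and the history structure are not from this card.

def build_prompt(history, user_message, max_turns=4):
    # `history` is a list of (user, assistant) pairs; keep only the most recent turns.
    parts = [
        f"### Human:\n{u}\n\n### Assistant:\n{a}\n"
        for u, a in history[-max_turns:]
    ]
    parts.append(f"### Human:\n{user_message}\n\n### Assistant:")
    return "\n".join(parts)

prompt = build_prompt(
    history=[("My name is Ana.", "Nice to meet you, Ana!")],
    user_message="What's my name?",
)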

🔖 Citation

@misc{sourize_phi2_memory_deeptalks,
  title        = {phi2-memory-lora: LoRA adapter for Phi-2 with short-term conversational memory},
  author       = {Sourish},
  year         = {2025},
  howpublished = {\url{https://huggingface.co/sourize/phi2-memory-deeptalks}},
  license      = {MIT}
}

Questions or feedback? Please open an issue on the repository.
