Model Details
LoRA finetuned checkpoint of a meta-llama/Llama-3.2-3B-Instruct base model. This model can be loaded on an M3 Macbook Air with 16GB unified memory.
Model Description
This model assists users with searching for research papers. It assists in creating a query that is compatible with a search API. The model is finetuned to output structured markdown corresponding to the user query. This makes it possible to parse the output and construct a query for a search API.
Model Sources
- Repository: https://github.com/shaikh58/llm-paper-retriever
- Developed by: Mustafa Shaikh
- Language(s) (NLP): English
- License: MIT
- Finetuned from model: meta-llama/Llama-3.2-3B-Instruct
Uses
This model is intended to be used with the MCP server released in the repository linked above. It is complete with search functionality and is integrated with Cursor.
How to Get Started with the Model
If you wish to use the model directly, rather than through Cursor, you can use the code below to load it.
from transformers import AutoModelForCausalLM
from peft import PeftModel
base_model = AutoModelForCausalLM.from_pretrained(
"meta-llama/Llama-3.2-3B-Instruct",
trust_remote_code=True,
device_map="auto"
)
model = PeftModel.from_pretrained(
base_model,
"Shaikh58/llama-3.2-3b-instruct-lora-arxiv-query"
)
Training Details
Training Data
Input query | Label |
---|---|
"Find recent papers on transformer architectures in NLP published since 2023 with at least 100 citations" | "## QUERY PARAMETERS\n\n- **Topic**: NLP\n\n## CONSTRAINTS\n\n- **Citations**: (>=, 100)\n- **Keyword**: transformers\n- **Year**: (>=, 2023)\n\n## OPTIONS\n\n- **Limit**: 10\n- **Sort By**: relevance\n- **Sort Order**: descending" |
During training, the input query is also augmented with a system prompt (not shown) to guide the model to output structured markdown.
Training Procedure
LoRA finetuned on 50,000 synthetically generated training data points.
Training Hyperparameters
- Training regime:
- fp16 mixed precision
- LoRA: r = 16, alpha = 32, dropout = 0.05
Evaluation
Testing Data, Factors & Metrics
Testing Data
Same format as training data.
Metrics
The model was evaluated with the rouge metric. This is because the expected output is known in advance.
Results
Several versions of the model were evaluated, each with a different number of trianing samples used during fine tuning. The plots show that finetuning with as low as 1000 samples leads to a major improvement in model performance. Empirically, we see that the model trained on 50,000 samples performs better in production, even though the rouge score is similar to models trained on less data. This is because the rouge score does not penalize minor differences to the expected output. However, minor differences can lead to very different parsing of the output and query result.
Model tree for Shaikh58/llama-3.2-3b-instruct-lora-arxiv-query
Base model
meta-llama/Llama-3.2-3B-Instruct