Model Card for LoRI-D_code_llama3_rank_64

This model is part of LoRI: Reducing Cross-Task Interference in Multi-Task Low-Rank Adaptation.

LoRI (LoRA with Reduced Interference) is a simple yet effective approach that freezes the projection matrices $A$ as random projections and sparsifies the matrices $B$ using task-specific masks. This design substantially reduces the number of trainable parameters while maintaining strong task performance. Moreover, LoRI minimizes cross-task interference in adapter merging by leveraging the orthogonality between adapter subspaces, and supports continual learning by using sparsity to mitigate catastrophic forgetting.

Model Details

Model Description

LoRI-D_code_llama3_rank_64 is an adapter for the meta-llama/Meta-Llama-3-8B base model, fine-tuned with the LoRI (LoRA with Reduced Interference) framework for code generation. LoRI is a parameter-efficient fine-tuning (PEFT) method designed to reduce the parameter overhead and cross-task interference that standard LoRA incurs in multi-task settings. It does so by freezing the projection matrices A as random projections and sparsifying the matrices B with task-specific masks, which significantly reduces the number of trainable parameters while maintaining strong performance. This adapter uses rank 64.
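
The idea of a frozen random A and a masked B can be illustrated with a toy sketch. It is not the LoRI implementation: the matrix shapes follow the common LoRA convention (A is rank x d_in, B is d_out x rank), B is assumed to start at zero as in standard LoRA, and the checkerboard mask and the helper names `matmul` and `lori_delta` are hypothetical.

```python
import random

def matmul(X, Y):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]

def lori_delta(A, B, mask):
    """Weight update (B * mask) @ A; only unmasked entries of B would be trained."""
    masked_B = [[b * m for b, m in zip(b_row, m_row)] for b_row, m_row in zip(B, mask)]
    return matmul(masked_B, A)

rng = random.Random(0)
d_in, d_out, rank = 8, 6, 2  # toy sizes; this adapter uses rank 64

# A is drawn once at random and then frozen (never updated during training).
A = [[rng.gauss(0.0, 1.0) for _ in range(d_in)] for _ in range(rank)]
# B is assumed to start at zero, so the adapter is initially a no-op.
B = [[0.0] * rank for _ in range(d_out)]
# Hypothetical binary mask that keeps half of B's entries, for illustration only.
mask = [[1 if (i + j) % 2 == 0 else 0 for j in range(rank)] for i in range(d_out)]

delta = lori_delta(A, B, mask)
trainable = sum(sum(row) for row in mask)
print(len(delta), len(delta[0]), trainable)
```

In this sketch only the unmasked entries of B count as trainable; A contributes no trainable parameters at all.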

Developed by: Juzheng Zhang, Jiacheng You, Ashwinee Panda, Tom Goldstein
Model type: Low-Rank Adaptation (LoRA) adapter for Causal Language Models
Language(s) (NLP): English
License: Apache-2.0
Finetuned from model: meta-llama/Meta-Llama-3-8B

Model Sources

Repository: https://github.com/juzhengz/LoRI/
Paper: https://huggingface.co/papers/2504.07448
Hugging Face Collection: LoRI Adapters

Uses

Direct Use

This model is intended to be loaded as a PEFT adapter on top of the meta-llama/Meta-Llama-3-8B base model to enhance its performance on code generation tasks. It provides an efficient way to fine-tune large language models with significantly fewer trainable parameters.

Downstream Use

LoRI adapters facilitate effective adapter merging and continual learning across various tasks, including natural language understanding, mathematical reasoning, code generation, and safety alignment. This makes them suitable for multi-task learning environments and adaptive model deployments.

Out-of-Scope Use

This model is not intended for generating harmful, biased, or unethical content. Users should exercise caution and implement appropriate safeguards when deploying it in real-world applications, especially in sensitive domains.

Bias, Risks, and Limitations

As an adaptation method built upon pre-trained Large Language Models, LoRI models inherit biases and risks present in their base models (e.g., Meta-Llama-3-8B) and the datasets they were fine-tuned on. Users should be aware of potential issues related to fairness, toxicity, and factual accuracy. Specific limitations include:

Performance might vary depending on the chosen base model and the sparsity level.
While LoRI significantly reduces cross-task interference, perfect isolation of knowledge across tasks is not guaranteed during adapter merging.
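
The second limitation can be made concrete with a small numerical sketch. It assumes that independent Gaussian random vectors stand in for rows of two tasks' frozen A matrices, and the dimension 4096 is chosen only for illustration. In high dimensions such vectors tend to be nearly, but not exactly, orthogonal, which is consistent with interference being reduced rather than eliminated.

```python
import math
import random

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

rng = random.Random(0)
dim = 4096  # illustrative dimension, not taken from the adapter config

# One random direction per task, standing in for a row of each task's frozen A.
a_task1 = [rng.gauss(0.0, 1.0) for _ in range(dim)]
a_task2 = [rng.gauss(0.0, 1.0) for _ in range(dim)]

overlap = cosine(a_task1, a_task2)
print(f"cosine similarity between random directions: {overlap:.4f}")
```

The printed value should be small in magnitude but non-zero, so some residual overlap between tasks remains.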

Recommendations

Users (both direct and downstream) should refer to the original meta-llama/Meta-Llama-3-8B model card for inherent biases and risks. It is recommended to perform task-specific evaluations and careful validation when deploying models fine-tuned with LoRI in sensitive applications.

How to Get Started with the Model

Pretrained LoRI adapters are available via the Hugging Face collection and can be loaded as follows:

from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
import torch

# Load the base model
base_model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Meta-Llama-3-8B",
    torch_dtype=torch.bfloat16,
    device_map="auto",  # or specify your device, e.g., "cuda"
)

# Load the LoRI adapter
adapter = PeftModel.from_pretrained(base_model, "tomg-group-umd/LoRI-D_code_llama3_rank_64")

# Load the tokenizer
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B")

# Example for text generation (code generation)
prompt = (
    "def factorial(n):\n"
    "    if n == 0:\n"
    "        return 1\n"
    "    else:\n"
    "        "
)
inputs = tokenizer(prompt, return_tensors="pt").to(base_model.device)

# Generate text
with torch.no_grad():
    outputs = adapter.generate(**inputs, max_new_tokens=50, temperature=0.7, do_sample=True)

generated_text = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(generated_text)

Training Details

Training Data

LoRI adapters were trained and evaluated on task-specific datasets. For code generation, as in this model, CodeAlpaca was the primary dataset. Datasets for the other tasks:

Mathematical reasoning: GSM8K
Safety alignment: Saferpaca
Natural language understanding: datasets not detailed in this card; see the paper.

Training Procedure

LoRI is implemented using Fully Sharded Data Parallel (FSDP) and supports multi-GPU training environments. The training process involves two main stages:

LoRI-D (Decomposition): Initial training where projection matrices A are frozen as random projections, and matrices B are learned. This stage also extracts sparse masks.
LoRI-S (Sparsity): Continued training with the learned sparse masks (e.g., 90% sparsity) applied to matrices B, further reducing parameters and promoting orthogonality.
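
The mask extraction step between the two stages can be sketched as follows. This card does not state how masks are chosen, so the sketch assumes magnitude-based selection (keep the largest-magnitude entries of the trained B); the function name `magnitude_mask` and the random stand-in values are hypothetical.

```python
import random

def magnitude_mask(values, sparsity):
    """Return a 0/1 mask keeping the largest-magnitude (1 - sparsity) fraction."""
    keep = round(len(values) * (1.0 - sparsity))
    # Indices sorted by descending absolute value; the first `keep` survive.
    ranked = sorted(range(len(values)), key=lambda i: abs(values[i]), reverse=True)
    kept = set(ranked[:keep])
    return [1 if i in kept else 0 for i in range(len(values))]

rng = random.Random(0)
# Flattened stand-in for the entries of a B matrix after the first stage.
b_entries = [rng.gauss(0.0, 1.0) for _ in range(1000)]

mask = magnitude_mask(b_entries, sparsity=0.9)
# In the second stage only the unmasked entries would continue to be trained.
sparse_b = [b * m for b, m in zip(b_entries, mask)]
print(sum(mask), "of", len(mask), "entries kept")  # prints "100 of 1000 entries kept"
```

With 90% sparsity, one entry in ten survives; the rest are fixed at zero.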

Training Hyperparameters

Adapter ranks: Models were trained with adapter ranks of 32 and 64 (this model uses rank 64).
Sparsity: 90% (for LoRI-S stage).
Base models used: LLaMA-3-8B and Mistral-7B.

Evaluation

Extensive experiments demonstrated that LoRI outperforms full fine-tuning and existing PEFT methods while using up to 95% fewer trainable parameters than standard LoRA. For code generation, performance was evaluated on the HumanEval benchmark. In multi-task experiments, LoRI enabled effective adapter merging and continual learning with reduced cross-task interference. Detailed evaluation results and comparisons can be found in the accompanying paper.
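
The "up to 95% fewer trainable parameters" figure can be sanity-checked with a short calculation. The sketch assumes a single hypothetical square 4096 x 4096 projection, rank 64, and 90% sparsity; real layers are not all square, so the per-layer ratio will differ, and the function name `adapter_params` is illustrative.

```python
def adapter_params(d_in, d_out, rank, sparsity=0.0):
    """Trainable-parameter counts for one adapted layer under three schemes."""
    lora = rank * (d_in + d_out)               # LoRA trains both A and B
    lori_d = rank * d_out                      # A frozen, dense B trained
    lori_s = round(lori_d * (1.0 - sparsity))  # only unmasked entries of B trained
    return lora, lori_d, lori_s

lora, lori_d, lori_s = adapter_params(d_in=4096, d_out=4096, rank=64, sparsity=0.9)
print(lora, lori_d, lori_s)  # prints "524288 262144 26214"
print(f"LoRI-S vs LoRA: {1 - lori_s / lora:.1%} fewer trainable parameters")
```

Under these assumptions, freezing A halves the count and the 90% mask removes a further nine tenths of what remains, which gives roughly the 95% reduction quoted above.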

Acknowledgements

This project builds on the codebase of dpo-rlaif and incorporates code from lottery-ticket-adaptation. Code generation performance on HumanEval is evaluated using the bigcode-evaluation-harness.

Citation

If you use LoRI in your work, please cite:

@article{zhang2025lori,
  title={LoRI: Reducing Cross-Task Interference in Multi-Task Low-Rank Adaptation},
  author={Zhang, Juzheng and You, Jiacheng and Panda, Ashwinee and Goldstein, Tom},
  journal={arXiv preprint arXiv:2504.07448},
  year={2025}
}

Framework versions

PEFT 0.12.0
