
Model Card for LoRI-S_code_llama3_rank_64

This model is part of LoRI: Reducing Cross-Task Interference in Multi-Task Low-Rank Adaptation.

LoRI (LoRA with Reduced Interference) is a simple yet effective parameter-efficient fine-tuning (PEFT) method for Large Language Models (LLMs). It addresses the parameter overhead of standard low-rank adapters and the cross-task interference that arises in multi-task scenarios by freezing the projection matrices A as random projections and sparsifying the matrices B with task-specific masks. This design substantially reduces the number of trainable parameters while maintaining strong task performance, minimizes cross-task interference during adapter merging, and supports continual learning by mitigating catastrophic forgetting.

Figure: LoRI architecture.
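
Conceptually, each adapted layer keeps its frozen pretrained weight, uses a frozen random projection A, and trains only the entries of B selected by a task-specific sparse mask. Below is a minimal, illustrative PyTorch sketch of this update; the shapes, the random placeholder mask, and the omission of the usual alpha/r scaling are simplifications, not the reference implementation.

import torch

# Illustrative LoRI-style update for a single linear layer.
d_in, d_out, r = 4096, 4096, 64

W0 = torch.randn(d_in, d_out)                    # frozen pretrained weight
A = torch.randn(d_in, r) / r ** 0.5              # frozen random projection (never trained)
B = torch.zeros(r, d_out, requires_grad=True)    # trainable low-rank factor
mask = (torch.rand(r, d_out) > 0.9).float()      # placeholder sparse mask (~10% of entries kept);
                                                 # in LoRI-S the mask is extracted from a trained B

def lori_forward(x):
    # Base output plus the masked low-rank update; only masked entries of B receive nonzero gradients.
    return x @ W0 + x @ A @ (B * mask)

y = lori_forward(torch.randn(2, d_in))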

Model Details

Model Description

LoRI-S_code_llama3_rank_64 is a specific LoRI-S (Sparse) adapter trained for code generation tasks. It is built upon the meta-llama/Meta-Llama-3-8B base model with an adapter rank of 64. The LoRI approach has been demonstrated to outperform full fine-tuning and existing PEFT methods, using up to 95% fewer trainable parameters than standard LoRA. This model is part of a broader set of LoRI adapters that cover natural language understanding, mathematical reasoning, code generation, and safety alignment tasks.

  • Developed by: Juzheng Zhang, Jiacheng You, Ashwinee Panda, Tom Goldstein
  • Model type: Low-Rank Adaptation (LoRA) variant (LoRI-S), Parameter-Efficient Fine-Tuning (PEFT) adapter for Causal Language Models.
  • Language(s) (NLP): English
  • License: Apache-2.0
  • Finetuned from model: meta-llama/Meta-Llama-3-8B

Model Sources

  • Paper: LoRI: Reducing Cross-Task Interference in Multi-Task Low-Rank Adaptation, arXiv:2504.07448 (https://arxiv.org/abs/2504.07448)

Uses

Direct Use

This model is intended to be used as a PEFT adapter to efficiently fine-tune or enhance the meta-llama/Meta-Llama-3-8B base model specifically for code generation tasks. It should be loaded using the Hugging Face PEFT library on top of the base LLM.

Downstream Use

LoRI adapters are particularly designed for multi-task scenarios and continual learning, where they enable effective adapter merging and reduce cross-task interference. This model can be combined with other LoRI adapters for different tasks to build more robust multi-task systems.
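
For example, several LoRI adapters can be attached to one base model and merged with the PEFT API. The sketch below is illustrative only: the second adapter ID is hypothetical, and the equal merge weights and "linear" combination type are assumptions, not settings from the paper.

import torch
from transformers import AutoModelForCausalLM
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Meta-Llama-3-8B", torch_dtype=torch.bfloat16, device_map="auto"
)

# Attach the code adapter, then a second LoRI adapter (the math adapter ID below is hypothetical).
model = PeftModel.from_pretrained(base, "tomg-group-umd/LoRI-S_code_llama3_rank_64", adapter_name="code")
model.load_adapter("tomg-group-umd/LoRI-S_math_llama3_rank_64", adapter_name="math")  # hypothetical ID

# Combine both adapters into a single merged adapter and make it active.
model.add_weighted_adapter(adapters=["code", "math"], weights=[0.5, 0.5],
                           adapter_name="code_math", combination_type="linear")
model.set_adapter("code_math")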

Out-of-Scope Use

This model is not intended for standalone use; it strictly requires the meta-llama/Meta-Llama-3-8B as its base model. Like all large language models, it may generate biased, harmful, or factually incorrect content, and should not be used in critical applications without thorough evaluation and additional safeguards.

Bias, Risks, and Limitations

While LoRI aims to reduce interference and parameter overhead, the model may still inherit biases present in its pre-training or fine-tuning data (e.g., CodeAlpaca, Meta-Llama-3-8B's pre-training data). Potential risks and limitations include:

  • Generalization: Performance may degrade on code generation tasks significantly different from its training distribution.
  • Factual Accuracy: Generated code or comments may not always be logically sound or factually correct.
  • Safety: The model may generate insecure or malicious code, or outputs that perpetuate stereotypes or harmful content if not properly constrained.

Recommendations

Users (both direct and downstream) should be aware of these potential issues and implement appropriate validation and filtering mechanisms for the model's outputs. It is recommended to apply responsible AI practices and conduct task-specific evaluations.

How to Get Started with the Model

Use the code below to get started with the model:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# 1. Load the base model
base_model_name = "meta-llama/Meta-Llama-3-8B"
base_model = AutoModelForCausalLM.from_pretrained(
    base_model_name,
    torch_dtype=torch.bfloat16, # Llama 3 models often use bfloat16
    device_map="auto",          # Load model onto available devices (GPU if available)
    low_cpu_mem_usage=True      # Optimize CPU memory usage
)

# 2. Load the LoRI adapter
# Replace "tomg-group-umd/LoRI-S_code_llama3_rank_64" with the correct model ID if different
adapter_model_id = "tomg-group-umd/LoRI-S_code_llama3_rank_64"
adapter_model = PeftModel.from_pretrained(base_model, adapter_model_id)

# 3. Load the tokenizer
tokenizer = AutoTokenizer.from_pretrained(base_model_name)
# Set pad_token if not already set, crucial for batching/generation
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token # Or another appropriate token

# 4. Set the model to evaluation mode
adapter_model.eval()

# 5. Prepare your input prompt for code generation
prompt = '''
def bubble_sort(arr):
    n = len(arr)
    for i in range(n - 1):
        for j in range(0, n - i - 1):
            if arr[j] > arr[j + 1]:
                arr[j], arr[j + 1] = arr[j + 1], arr[j]
    return arr

# Write a docstring for the function above, describing its purpose and parameters.
'''

# Encode the prompt and move to the model's device
input_ids = tokenizer.encode(prompt, return_tensors="pt").to(adapter_model.device)

# 6. Generate output
with torch.no_grad():
    output_ids = adapter_model.generate(
        input_ids,
        max_new_tokens=100,
        do_sample=True,          # Sample outputs
        temperature=0.01,        # Low temperature for less randomness, more deterministic code
        top_p=0.95,              # Nucleus sampling
        num_return_sequences=1,
        eos_token_id=tokenizer.eos_token_id,
        pad_token_id=tokenizer.pad_token_id,
    )

# Decode and print the generated text
generated_text = tokenizer.decode(output_ids[0], skip_special_tokens=True)
print(generated_text)

# Optional: Merge adapter weights into the base model for easier deployment
# merged_model = adapter_model.merge_and_unload()
# merged_model.save_pretrained("path/to/merged-lori-model")

Training Details

Training Data

This LoRI-S_code_llama3_rank_64 adapter was specifically fine-tuned on the CodeAlpaca dataset for code generation tasks. The LoRI paper also describes experiments on:

  • Natural Language Understanding (NLU): GLUE benchmark
  • Mathematical Reasoning: GSM8K dataset
  • Safety Alignment: Saferpaca dataset

Training Procedure

LoRI training involves two training stages with a mask-extraction step in between, implemented using Fully Sharded Data Parallel (FSDP) for efficient multi-GPU training (a mask-extraction sketch follows the list):

  1. LoRI-D (Dense) Training: An initial phase where the projection matrices A are frozen as random projections, and the B matrices are trained densely.
  2. Mask Extraction: After LoRI-D training, sparse masks are extracted from the learned B matrices. For LoRI-S models, a high sparsity level (e.g., 90%) is typically applied.
  3. LoRI-S (Sparse) Training: The model continues training using these extracted sparse masks. This particular model, LoRI-S_code_llama3_rank_64, is the result of this sparsified training phase.
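
As a rough illustration of the mask-extraction step, the sketch below keeps the top 10% of entries of a trained B matrix by absolute magnitude; per-matrix magnitude thresholding is an assumption here, so see the paper and released code for the exact selection criterion.

import torch

def extract_sparse_mask(B: torch.Tensor, sparsity: float = 0.9) -> torch.Tensor:
    # Keep the largest (1 - sparsity) fraction of entries of B by magnitude.
    k = max(1, int(round((1.0 - sparsity) * B.numel())))         # number of entries to keep
    threshold = B.abs().flatten().kthvalue(B.numel() - k + 1).values
    return (B.abs() >= threshold).to(B.dtype)

B_dense = torch.randn(64, 4096)           # a B matrix learned during LoRI-D training
mask = extract_sparse_mask(B_dense, 0.9)  # ~90% of entries are zeroed out for LoRI-S training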

Training Hyperparameters

  • Base Model: meta-llama/Meta-Llama-3-8B
  • Adapter Rank (r): 64
  • LoRA Alpha (lora_alpha): 128
  • LoRA Dropout (lora_dropout): 0.05
  • Sparsity (for LoRI-S phase): 90%
  • Training Regime: Mixed precision (bf16 for Llama 3 models)
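
These values map onto a standard PEFT LoraConfig, sketched below; the target_modules list is an assumption (check the adapter's adapter_config.json), and the LoRI-specific behavior (frozen random A, sparse masks on B) is handled by the authors' training code rather than by LoraConfig itself.

from peft import LoraConfig

lora_config = LoraConfig(
    r=64,                       # adapter rank
    lora_alpha=128,             # scaling factor
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # assumed projection layers
    task_type="CAUSAL_LM",
)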

Evaluation

LoRI models have been extensively evaluated across natural language understanding, mathematical reasoning, code generation, and safety alignment tasks. Experiments demonstrate that LoRI outperforms full fine-tuning and existing PEFT methods, while using significantly fewer trainable parameters (up to 95% less than LoRA). In multi-task settings, LoRI enables effective adapter merging and continual learning with reduced cross-task interference.

Results

For detailed quantitative results, specific metrics (e.g., HumanEval for code generation, GSM8K for mathematical reasoning), and comprehensive comparisons against baselines, please refer to the official paper.

Technical Specifications

Model Architecture and Objective

LoRI introduces a modification to the standard LoRA architecture where the projection matrices A are fixed as random projections, and the matrices B are sparsified using task-specific masks. This design is aimed at reducing cross-task interference in multi-task learning and mitigating catastrophic forgetting in continual learning scenarios.

Compute Infrastructure

Software

  • PEFT 0.12.0
  • Transformers (compatible with versions supporting Llama 3 and PEFT)

Citation

If you use LoRI in your work, please cite:

@article{zhang2025lori,
  title={LoRI: Reducing Cross-Task Interference in Multi-Task Low-Rank Adaptation},
  author={Zhang, Juzheng and You, Jiacheng and Panda, Ashwinee and Goldstein, Tom},
  journal={arXiv preprint arXiv:2504.07448},
  year={2025}
}

Model Card Authors

Niels Rogge (Hugging Face Community Science Team)

Model Card Contact

[email protected]
