
FlowerTune LLM on General NLP/Medical/Finance

This repository performs federated instruction tuning with pretrained language models on the FlowerTune LLM benchmark, which covers a general NLP dataset (vicgalle/alpaca-gpt4), a Medical dataset, and a Finance dataset. We use Flower Datasets to download, partition, and preprocess the data. Flower's Simulation Engine is used to simulate the LLM fine-tuning process in a federated way, which allows users to run the training on a single GPU.

Experimental Setup

The dataset is divided into 20 partitions in an IID fashion, and each partition is assigned to one ClientApp. In each of the 10 federated rounds, we randomly sample a fraction (0.1) of the total nodes to participate. All settings are defined in pyproject.toml.
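
As a minimal sketch, this is how the IID partitioning could be expressed with Flower Datasets; the dataset name and partition count come from this setup, while the snippet itself is illustrative rather than the project's exact code:

from flwr_datasets import FederatedDataset
from flwr_datasets.partitioner import IidPartitioner

# Split the general NLP dataset into 20 IID partitions, one per ClientApp.
partitioner = IidPartitioner(num_partitions=20)
fds = FederatedDataset(
    dataset="vicgalle/alpaca-gpt4",
    partitioners={"train": partitioner},
)

# Each client loads only its own partition, e.g. client 0:
partition = fds.load_partition(partition_id=0, split="train")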

Methodology

This baseline performs federated LLM fine-tuning with LoRA using the 🤗PEFT library. The clients' model updates are aggregated with the FedAvg strategy, which provides a baseline performance for the benchmark.
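
A minimal sketch of how the server side could be wired with Flower's FedAvg strategy, assuming the current ServerApp API; the structure is illustrative and the project's actual app code may differ:

from flwr.server import ServerApp, ServerAppComponents, ServerConfig
from flwr.server.strategy import FedAvg

def server_fn(context):
    # FedAvg aggregates the clients' LoRA updates; 10% of nodes are sampled per round.
    strategy = FedAvg(fraction_fit=0.1)
    config = ServerConfig(num_rounds=10)
    return ServerAppComponents(strategy=strategy, config=config)

app = ServerApp(server_fn=server_fn)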

Example: Qwen2.5-7B-Instruct

For example, with the Qwen/Qwen2.5-7B-Instruct model we adopted the following fine-tuning methodology (a configuration sketch follows the list):

  • Precision: bf16 for model weights.
  • Quantization: 4-bit quantization for reduced memory usage.
  • LoRA Configuration:
    • Rank (r): 32
    • Alpha: 64
  • Training Configuration:
    • Batch size: 8
    • Maximum number of steps: 10
    • Total number of rounds: 10
    • Fraction fit per round: 0.1
  • Learning Rate Scheduler:
    • Maximum LR: 5e-5
    • Minimum LR: 1e-6
    • Constant learning rate scheduler over steps
  • Strategy: FedAvg
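
Below is a minimal configuration sketch with 🤗 Transformers and PEFT. The rank, alpha, precision, and quantization settings mirror the list above; the target modules and other details are assumptions rather than the project's exact choices.

import torch
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# 4-bit quantization with bf16 compute, as described above.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2.5-7B-Instruct",
    quantization_config=bnb_config,
    torch_dtype=torch.bfloat16,
)

# LoRA adapter with rank 32 and alpha 64; the target modules are illustrative.
lora_config = LoraConfig(
    r=32,
    lora_alpha=64,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()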

Environment and Execution

Environment Setup

Project dependencies are defined in pyproject.toml. Install them in an activated Python environment with:

python -m pip install --upgrade pip wheel setuptools packaging

pip install -e .

Running the Training and Evaluation

We use a wrapper script run_all_experiments.sh to handle both training and evaluation processes:

# Example of running experiments
./run_all_experiments.sh --model Qwen/Qwen2.5-7B-Instruct --task general_nlp

The wrapper script sets up the proper environment, including:

  • Activating the conda environment
  • Setting up proxy configurations if needed
  • Executing the main experiment runner script with the provided parameters

The actual experiment workflow is implemented in run_experiments.py, which is called by the wrapper script.
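
As a rough, hypothetical sketch of what such a runner might do, assuming it launches flwr run with per-experiment overrides (the config keys model.name and dataset.name are invented for illustration and are not the project's actual keys):

import subprocess

def run_experiment(model: str, task: str) -> None:
    # Hypothetical: build run-config overrides for this model/task pair and
    # launch a Flower run. The real run_experiments.py may differ substantially.
    overrides = f"model.name='{model}' dataset.name='{task}'"
    subprocess.run(["flwr", "run", ".", "--run-config", overrides], check=True)

if __name__ == "__main__":
    run_experiment("Qwen/Qwen2.5-7B-Instruct", "general_nlp")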

Model Saving

By default, the global PEFT model checkpoints are saved on the server side every 5 rounds after aggregation; the saving interval can be configured via train.save-every-round under the [tool.flwr.app.config] entry in pyproject.toml.
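
A minimal sketch of this checkpointing logic; the function name and output path are illustrative, and save_pretrained is the standard PEFT call for persisting adapters, though the project's actual saving code may differ:

from peft import PeftModel

def save_global_model(peft_model: PeftModel, server_round: int, save_every_round: int = 5) -> None:
    # Persist the aggregated global adapter every `save_every_round` rounds.
    if server_round % save_every_round == 0:
        peft_model.save_pretrained(f"peft_checkpoint_round_{server_round}")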

Evaluation Results

Please find the checkpoint link by clicking each model's name.

General NLP

The evaluation was conducted on the MMLU (Massive Multitask Language Understanding) benchmark, which tests knowledge across various domains:

| Model | STEM | Social Sciences | Humanities | Average | Comm. Costs |
|---|---|---|---|---|---|
| Qwen/Qwen2.5-7B-Instruct | 52.52% | 79.27% | 60.32% | 64.04% | 1.50 GB |
| Qwen/Qwen2.5-1.5B-Instruct | 47.13% | 62.30% | 50.54% | 53.32% | 0.65 GB |
| mistralai/Mistral-7B-Instruct-v0.3 | 29.94% | 54.27% | 44.93% | 43.05% | 2.03 GB |
| meta-llama/Llama-3.1-8B-Instruct | 22.87% | 39.55% | 32.05% | 31.49% | 2.03 GB |
| TinyLlama/TinyLlama-1.1B-Chat-v1.0 | 14.18% | 21.61% | 21.91% | 19.23% | 0.67 GB |
| meta-llama/Llama-3.2-1B-Instruct | 12.88% | 17.61% | 6.16% | 12.22% | 0.51 GB |
| deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B | 0.54% | 0.00% | 0.04% | 0.19% | 0.65 GB |

Medical

| Model | PubMedQA | MedMCQA | MedQA | CareQA | Average | Comm. Costs |
|---|---|---|---|---|---|---|
| meta-llama/Llama-3.1-8B-Instruct | 59.94% | 69.40% | 59.94% | 57.74% | 61.75% | 2.03 GB |
| Qwen/Qwen2.5-7B-Instruct | 65.15% | 56.80% | 65.15% | 55.46% | 60.64% | 1.50 GB |
| mistralai/Mistral-7B-Instruct-v0.3 | 54.40% | 55.20% | 54.40% | 49.80% | 53.45% | 2.03 GB |

Finance

| Model | FPB | FIQA | TFNS | Average | Comm. Costs |
|---|---|---|---|---|---|
| mistralai/Mistral-7B-Instruct-v0.3 | 58.63% | 67.11% | 58.63% | 61.45% | 2.03 GB |

Hardware Details

For the experiments, I used a GPU-enabled virtual machine with the following specifications:

| Component | Specification |
|---|---|
| GPU | 1 × GPU with 16+ GB VRAM |
| vCPUs | 6 |
| Memory | 16+ GB RAM |