Layer Freezing and Transformer-Based Data Curation for Enhanced Transfer Learning in YOLO Architectures

Abstract

The You Only Look Once (YOLO) architecture has revolutionized real-time object detection by performing detection, localization, and classification in a single forward pass. However, balancing detection accuracy with computational efficiency remains a critical challenge, particularly for deployment in resource-constrained environments such as edge devices and UAV-based monitoring systems. This research presents a comprehensive analysis of layer freezing strategies for transfer learning in modern YOLO architectures, systematically investigating how selective parameter freezing affects both performance and computational requirements. We evaluate multiple freezing configurations across YOLOv8 and YOLOv10 variants (nano, small, medium, large) on four challenging datasets representing critical infrastructure monitoring applications: InsPLAD-det, Electric Substation, Common-VALID, and Bird's Nest. Our methodology incorporates gradient behavior analysis through L2 norm monitoring and visual explanations via Gradient-weighted Class Activation Mapping (GradCAM) to provide deeper insights into training dynamics under different freezing strategies. Results demonstrate that strategic layer freezing—particularly freezing the first 4 blocks or the complete backbone—achieves substantial computational savings while maintaining competitive detection accuracy. The optimal configurations reduce GPU memory consumption by up to 28% compared to full fine-tuning, while in several cases achieving superior mAP@50 scores (e.g., our YOLOv10-small with 4-block freezing achieved 0.84 vs 0.81 for fine-tuning on the InsPLAD-det dataset). Gradient analysis reveals distinct convergence patterns across freezing strategies, with backbone-frozen models exhibiting stable learning dynamics while preserving essential feature extraction capabilities. These findings provide actionable guidelines for deploying efficient YOLO models in resource-limited scenarios, demonstrating that selective layer freezing represents a viable alternative to full fine-tuning for transfer learning in object detection tasks.
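
In practice, the freezing strategies above amount to excluding the selected blocks' parameters from gradient updates. The following is a minimal sketch, not the repository's actual training code: it assumes the Ultralytics API (whose train() accepts a freeze argument) and its "model.&lt;block_idx&gt;...." parameter naming; the checkpoint and dataset file names are illustrative.

    import torch
    from ultralytics import YOLO

    # Load a pretrained checkpoint (file name illustrative).
    model = YOLO("yolov10s.pt")

    # Option A: let Ultralytics freeze the first 4 blocks during training
    # (the dataset YAML name is illustrative).
    # model.train(data="insplad.yaml", epochs=100, freeze=4)

    # Option B: freeze the same blocks manually with plain PyTorch.
    # Parameter names look like "model.0.conv.weight", so the block
    # index is the second dotted component.
    for name, param in model.model.named_parameters():
        if int(name.split(".")[1]) < 4:
            param.requires_grad = False

    def grad_l2_norm(module: torch.nn.Module) -> float:
        """Global L2 norm over all parameter gradients; call after backward()."""
        total = 0.0
        for p in module.parameters():
            if p.grad is not None:
                total += float(p.grad.detach().pow(2).sum())
        return total ** 0.5

The grad_l2_norm helper mirrors the kind of L2 norm monitoring described in the abstract: tracking this value per epoch makes the convergence differences between freezing strategies visible.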

Table of Contents

  Installation
  Usage
  Examples
  License

Installation

Using pip

  1. Clone the repository:

    git clone https://huggingface.co/AndrzejDD/enhanced-transfer-learning
    cd enhanced-transfer-learning
    
  2. Create a virtual environment (optional but recommended):

    python -m venv enhanced-tl
    source enhanced-tl/bin/activate  # On Windows use `enhanced-tl\Scripts\activate`
    
  3. Install the required packages:

    pip install -r requirements.txt
    

Using conda

  1. Clone the repository:

    git clone https://huggingface.co/AndrzejDD/enhanced-transfer-learning
    cd enhanced-transfer-learning
    
  2. Create a conda environment from the provided environment file:

    conda env create -f environment.yml
    
  3. Activate the conda environment:

    conda activate enhanced-tl
    

After completing these steps, the required dependencies will be installed, and you can start training your models.
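
As a quick sanity check, you can confirm that the Ultralytics package (assumed here to be listed in requirements.txt, since it provides the YOLOv8 and YOLOv10 implementations) imports correctly:

    python -c "import ultralytics; print(ultralytics.__version__)"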

Usage

To display the help message and see all available options, run the following command:

python3 main.py --help

Example Output

When you run the help command, you will see an output like this:

usage: main.py [-h] [--dataset DATASET_NAME] [--epochs EPOCHS] [--batch BATCH] [--imgsz IMGSZ] 
               [--patience PATIENCE] [--cache CACHE] [--pretrained] [--cos_lr] [--profile] [--plots] [--resume]
               [--model MODEL_NAME] [--run RUN_NAME]

options:
  -h, --help            show this help message and exit
  --dataset DATASET_NAME
                        Dataset name to be used
  --epochs EPOCHS       Number of epochs for training
  --batch BATCH         Batch size
  --imgsz IMGSZ         Image size for training
  --patience PATIENCE   Early stopping patience
  --cache CACHE         Caching mechanism to use
  --pretrained          Use pretrained weights
  --cos_lr              Use cosine learning rate schedule
  --profile             Enable training profiling
  --plots               Generate training plots
  --resume              Resume training from a checkpoint
  --model MODEL_NAME    Name of the YOLO model to use
  --run RUN_NAME        Name of the run configuration

To run the project, use the following command:

python3 main.py --dataset "Dataset Name" --epochs 1000 --batch 16 --imgsz 640 --patience 30 --model "yolov10s" --run "Finetuning"

Examples

Example 1: Fine-Tuning the YOLOv10 Model

To fine-tune the YOLOv10 small model (yolov10s) from pretrained weights, with training plots enabled, run:

python3 main.py --dataset "Dataset Name" --epochs 1000 --batch 16 --pretrained --plots --model "yolov10s" --run "Finetuning"
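
Example 2: Training with a Frozen Backbone

The freezing strategy is selected through the --run configuration name. The exact names are defined by the repository's run configurations; the one below is illustrative and assumes a backbone-freezing configuration exists:

python3 main.py --dataset "Dataset Name" --epochs 1000 --batch 16 --pretrained --plots --model "yolov10s" --run "FreezeBackbone"  # run name illustrative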

License

The license for each dataset should be checked at its source, and the licenses for the YOLOv10 and YOLOv8 models should be reviewed as well. The original datasets used in this research are: InsPLAD-det (https://github.com/andreluizbvs/InsPLAD/tree/main), Electric Substation (https://figshare.com/articles/dataset/A_YOLO_Annotated_15-class_Ground_Truth_Dataset_for_Substation_Equipment/24060960), VALID (https://sites.google.com/view/valid-dataset), and Bird's Nest (https://zenodo.org/records/4015912#.X1O_0osRVPY).
