Model Card: NER Active Learning Models

Overview

This repository contains a series of NER models trained using an active learning framework. Our approach combines a low-quality (cheap) dataset with high-quality expert annotations to iteratively improve entity recognition performance. The core idea is to begin with a model trained solely on the cheap dataset (model_llm_pure) and then incrementally fine-tune it on the expert examples that an uncertainty estimation module ranks as most uncertain.

Our baseline model, model_llm_pure, achieves limited performance, while the model model_init_12, fine-tuned on the cheap dataset plus an additional 12% of expert examples, demonstrates a significant improvement. The active learning loop further refines the model by iteratively adding the most informative examples and saving each intermediate model in a dedicated branch.

1. Entity-Level Evaluation Module

This module provides an improved metric to evaluate model performance at the entity level, which is crucial for NER. A correct prediction requires that the entire entity (with proper boundaries and correct labels) is recognized correctly.

Key Steps:

  1. Prediction Collection:
    The evaluation function processes each batch from the evaluation DataLoader and, for each sentence, collects predicted and true labels in a list-of-lists format.

  2. Metric Calculation:
    Using the seqeval library, we compute:

    • Seqeval Accuracy: Overall accuracy at the entity level.
    • F1-Score: The harmonic mean of precision and recall computed over complete entities.
    • Classification Report: Detailed precision, recall, and F1-score for each entity type.
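
For illustration, here is a minimal sketch of how these metrics can be computed with the seqeval library on the list-of-lists format described above (the label sequences are placeholder values, not real data):

from seqeval.metrics import accuracy_score, classification_report, f1_score

# Labels collected per sentence in list-of-lists format (placeholder values).
true_labels = [["B-PER", "I-PER", "O", "B-LOC"], ["O", "B-ORG", "I-ORG"]]
pred_labels = [["B-PER", "I-PER", "O", "O"], ["O", "B-ORG", "I-ORG"]]

print("Seqeval accuracy:", accuracy_score(true_labels, pred_labels))
print("Entity-level F1:", f1_score(true_labels, pred_labels))
print(classification_report(true_labels, pred_labels))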

2. Uncertainty Estimation Module

This module estimates the uncertainty of each sentence by computing the average entropy of its tokens. A high average entropy indicates that the model is less confident in its predictions for that sentence.

Process:

  1. Pass each sentence (example) through the model in evaluation mode (with gradients disabled).

  2. Retrieve logits and apply softmax to obtain a probability distribution over labels for each token.

  3. Compute the entropy for each valid token (i.e., where ner_tag_mask == 1):

    $$ H(\text{token}) = - \sum_{y} P(y \mid \text{token}) \log P(y \mid \text{token}) $$

  4. The average entropy across valid tokens serves as the sentence’s uncertainty measure.
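
A minimal sketch of this computation, assuming a Hugging Face token-classification model and a ner_tag_mask tensor marking valid tokens (argument names are illustrative, not the exact training code):

import torch
import torch.nn.functional as F

def sentence_uncertainty(model, input_ids, attention_mask, ner_tag_mask):
    # Step 1: evaluation mode with gradients disabled.
    model.eval()
    with torch.no_grad():
        logits = model(input_ids=input_ids, attention_mask=attention_mask).logits
    # Step 2: softmax over the label dimension -> (batch, seq_len, num_labels).
    probs = F.softmax(logits, dim=-1)
    # Step 3: per-token entropy H = -sum_y P(y|token) log P(y|token).
    entropy = -(probs * probs.clamp_min(1e-12).log()).sum(dim=-1)
    # Step 4: average over valid tokens only (ner_tag_mask == 1).
    mask = ner_tag_mask.float()
    return (entropy * mask).sum(dim=-1) / mask.sum(dim=-1).clamp_min(1.0)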

3. Preliminary Threshold Experiment with K-Fold Cross-Validation

Before initiating the active learning loop, we run a preliminary experiment using k-fold (5-fold) cross-validation on the expert dataset. This experiment determines the minimal volume of expert examples that yields a significant improvement over the baseline model.

Procedure:

  1. For each percentage value (e.g., 1%, 2%, 3%, 5%, 7%, 10%) of the cheap dataset size, the corresponding number of expert examples is determined.
  2. The expert dataset is split into 5 folds.
  3. For each fold, a subset of expert examples is selected, combined with the cheap dataset, and the model is fine-tuned for a few epochs.
  4. Evaluation metrics (F1, seqeval accuracy, validation loss) are computed and averaged over all folds.
  5. A graph is then constructed plotting F1-score versus the number of added expert examples to identify the point where improvements saturate.
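
A condensed sketch of this procedure is shown below; expert_dataset, cheap_dataset, and the fine_tune_and_eval helper are hypothetical stand-ins for the actual training code:

import numpy as np
from sklearn.model_selection import KFold

percentages = [0.01, 0.02, 0.03, 0.05, 0.07, 0.10]
kfold = KFold(n_splits=5, shuffle=True, random_state=42)

mean_f1 = {}
for pct in percentages:
    n_expert = int(pct * len(cheap_dataset))  # budget relative to cheap dataset size
    fold_scores = []
    for train_idx, _ in kfold.split(expert_dataset):
        expert_subset = [expert_dataset[i] for i in train_idx[:n_expert]]
        # Hypothetical helper: fine-tunes on cheap + expert data for a few
        # epochs and returns the entity-level F1 on a validation split.
        fold_scores.append(fine_tune_and_eval(cheap_dataset + expert_subset))
    mean_f1[n_expert] = float(np.mean(fold_scores))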

(Figure: averaged F1-score versus the number of added expert examples from the k-fold threshold experiment.)

Model Comparison

Below is a comparison of the initial evaluation metrics for two baseline models:

Model            Validation Loss   Seqeval Accuracy   F1-Score
model_llm_pure   0.53443           0.85185            0.47493
model_init_12    0.33402           0.93084            0.65344

The model_init_12 checkpoint is obtained by fine-tuning the base model on the cheap dataset combined with an additional 12% of expert examples, and it demonstrates significantly improved performance.

4. Active Learning Loop

The core active learning loop starts from a pre-trained model (typically model_init_12) and iteratively:

  • Computes uncertainty for the remaining expert examples.
  • Selects the top uncertain examples (batch size controlled by batch_to_add).
  • Fine-tunes the model on the combined dataset (cheap data + newly added expert examples).
  • Saves the intermediate model in a separate branch on Hugging Face.
  • Stops when the improvement in F1-score is below a set threshold after a minimum number of iterations.

Note: Each intermediate model is saved in its own branch (e.g., active_iter_1_added_20, active_iter_2_added_40, etc.), which allows for easy comparison and retrieval later.
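
A compressed sketch of this loop under the assumptions above; compute_uncertainties, fine_tune, and evaluate_f1 are hypothetical helpers, while save_model_to_branch is the utility shown in the usage section below:

import numpy as np

selected_expert = []
prev_f1 = 0.0
for it in range(1, max_iterations + 1):
    # Rank remaining expert examples by average token entropy (hypothetical helper).
    scores = compute_uncertainties(model, remaining_expert)
    top = set(np.argsort(scores)[-batch_to_add:].tolist())  # most uncertain examples
    selected_expert += [ex for i, ex in enumerate(remaining_expert) if i in top]
    remaining_expert = [ex for i, ex in enumerate(remaining_expert) if i not in top]
    model = fine_tune(model, cheap_dataset + selected_expert)  # hypothetical helper
    f1 = evaluate_f1(model, validation_set)                    # hypothetical helper
    save_model_to_branch(model, REPO_NAME, f"active_iter_{it}_added_{len(selected_expert)}")
    if it >= min_iterations and f1 - prev_f1 < f1_threshold:
        break  # improvement below threshold after the minimum number of iterations
    prev_f1 = f1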


How to Use This Repository

Saving Intermediate Models

Each time the model is fine-tuned in the active learning loop, it is saved to a dedicated branch on Hugging Face. For example, to save the current model in a branch, use:

branch_name = "active_iter_1_added_20"  # Example branch name
save_model_to_branch(model, REPO_NAME, branch_name)
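
The implementation of save_model_to_branch is not spelled out in this card; a plausible sketch using huggingface_hub and the transformers push_to_hub API could look like this:

from huggingface_hub import HfApi

def save_model_to_branch(model, repo_name, branch_name):
    # Create the target branch if it does not exist yet, then push to it.
    HfApi().create_branch(repo_id=repo_name, branch=branch_name, exist_ok=True)
    model.push_to_hub(repo_name, revision=branch_name)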

Loading a Model

To load an intermediate model from a specific branch:

loaded_model = load_model_from_branch(REPO_NAME, "active_iter_1_added_20")
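
Similarly, load_model_from_branch can be a thin wrapper around from_pretrained, whose revision argument selects the branch (a sketch, assuming a token-classification model):

from transformers import AutoModelForTokenClassification

def load_model_from_branch(repo_name, branch_name):
    # The revision argument accepts a branch name, tag, or commit hash.
    return AutoModelForTokenClassification.from_pretrained(repo_name, revision=branch_name)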

Recommended Workflow

  1. Preliminary Experiment:
    Run the preliminary threshold experiment (with k-fold cross-validation) to determine the optimal percentage of expert data to start with. For instance, if the analysis indicates that adding 7% of expert examples provides a stable improvement, use that as your baseline for active learning.

  2. Initialize with Expert Data:
    Fine-tune the base model (model_llm_pure) on the cheap dataset plus the selected percentage of expert examples. With 12% this produces model_init_12; other percentages produce analogous variants. Save this model in a dedicated branch (e.g., model_percentage_12).

  3. Active Learning Loop:
    Start the active learning loop with the pre-trained model_init_12 (by setting use_initial_training=False) and iteratively add batches of expert examples selected by the uncertainty estimation module.

  4. Graph Analysis:
    After the active learning loop completes, plot the graph of F1-score vs. the total number of added expert examples. This graph illustrates the improvement (or saturation) of the model as more high-quality data is incorporated.
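
A minimal plotting sketch, assuming the loop recorded (added examples, F1) pairs; the values below are illustrative placeholders, not measured results:

import matplotlib.pyplot as plt

history = [(20, 0.66), (40, 0.68), (60, 0.69)]  # illustrative placeholders
added, f1 = zip(*history)
plt.plot(added, f1, marker="o")
plt.xlabel("Number of added expert examples")
plt.ylabel("Entity-level F1-score")
plt.title("Active learning progress")
plt.show()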


Conclusion

This repository documents a complete active learning workflow for NER. Our approach includes:

  • An entity-level evaluation module to accurately assess performance.
  • An uncertainty estimation module using average token entropy.
  • A preliminary threshold experiment using k-fold cross-validation to robustly determine the minimal volume of expert data needed.
  • An iterative active learning loop that fine-tunes the model and saves intermediate checkpoints in separate branches on Hugging Face.

By following this workflow, one can observe the improvement in model performance (primarily measured by entity-level F1-score) as additional expert data is added. The saved intermediate models allow for comprehensive analysis and comparison.
