Model Details

This model represents a distil-tuned version of Qwen/Qwen2.5-0.5B-Instruct, trained on the MultiClinSum training data and the rationales composed for it. Results obtained with this model were used to form the submission for the BioASQ 2025 Workshop / CLEF 2025.


We first adopt Qwen/Qwen2.5-72B-Instruct for inferring rationales for the training data (read further for greater detail).

The baseline version: https://huggingface.co/nicolay-r/qwen25-05b-multiclinsum-standard

Model Description

Model Sources [optional]

Open In Colab

Usage

We use bulk-chain for inference with the Qwen2 provider, which is based on the transformers pipelines API.

Provider script (saved locally as huggingface_qwen.py): https://github.com/nicolay-r/nlp-thirdgate/blob/9e46629792e9a53871710884f7b9e2fe42666aa7/llm/transformers_qwen2.py

from bulk_chain.api import iter_content
from bulk_chain.core.utils import dynamic_init

content_it = iter_content(
  schema={"schema": [
      {"prompt": "Summarize: {input}", "out": "summary"}]
  },
  llm=dynamic_init(
    class_filepath="huggingface_qwen.py",
    class_name="Qwen2")(
      api_token="YOUR_HF_API_KEY_GOES_HERE",
      model_name="nicolay-r/qwen25-05b-multiclinsum-distil",
      temp=0.1,
      use_bf16=True,
      max_new_tokens=512,  # maximum length of the generated summary
      device="cuda"        # set to "cpu" if no GPU is available
  ),
  infer_mode="batch",
  batch_size=4,
  return_mode="record",
  # INPUT TEXTS:
  input_dicts_it=[
     {"input": "A patient 62 years old with ..."}
  ],
)

for record in content_it:
  # each returned record is a dictionary that includes the generated summary
  print(record["summary"])

Training Details

Training Data

Training Procedure

The training procedure involves:

  1. Preparation of rationales for summary distillation.
  2. Launching the fine-tuning process.

Preparation: We adopt Qwen/Qwen2.5-72B-Instruct for inferring rationales via the following script:
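For illustration only (this is not the referenced script; the prompt wording and generation settings below are assumptions), the rationale-inference step could look roughly as follows:

# Hypothetical sketch of rationale inference with Qwen/Qwen2.5-72B-Instruct.
# Prompt wording and generation settings are assumptions, not the actual script.
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="Qwen/Qwen2.5-72B-Instruct",
    torch_dtype="bfloat16",
    device_map="auto",
)

case_report = "A patient 62 years old with ..."
reference_summary = "..."

messages = [{
    "role": "user",
    "content": "Given the clinical case report and its reference summary, "
               "explain step by step which findings justify the summary.\n\n"
               f"CASE: {case_report}\n\nSUMMARY: {reference_summary}"
}]

# The last message of the returned conversation is the generated rationale,
# which is later paired with the summary to build the distillation data.
rationale = generator(messages, max_new_tokens=256)[0]["generated_text"][-1]["content"]
print(rationale)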

Fine-tuning: Please follow this script for fine-tuning on the MultiClinSum dataset with a Google Colab A100 (40GB VRAM) + 80GB RAM:
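As a rough orientation only (this is not the referenced Colab script; the data records, prompt format, and hyperparameter values below are placeholders), the fine-tuning step could be sketched as follows:

# Rough sketch of supervised fine-tuning with the transformers Trainer.
# NOT the referenced Colab script: records, prompt format, and hyperparameters are placeholders.
import torch
from datasets import Dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

model_name = "Qwen/Qwen2.5-0.5B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.pad_token or tokenizer.eos_token  # ensure padding is defined
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.bfloat16)

# Hypothetical training records: case report plus rationale-augmented summary.
records = [{"input": "A patient 62 years old with ...",
            "target": "Rationale: ...\nSummary: ..."}]

def to_features(example):
    text = f"Summarize: {example['input']}\n{example['target']}{tokenizer.eos_token}"
    return tokenizer(text, truncation=True, max_length=2048)

train_ds = Dataset.from_list(records).map(to_features, remove_columns=["input", "target"])

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="qwen25-05b-multiclinsum-distil",
        per_device_train_batch_size=1,
        num_train_epochs=3,   # 3 epochs, as reported below
        learning_rate=1e-5,   # placeholder value
        bf16=True,
    ),
    train_dataset=train_ds,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()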

Preprocessing [optional]

Refer to the following script for the fine-tuning pre-processing:
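Purely as a hypothetical illustration (field names and the target template below are assumptions, not the referenced script), the pre-processing amounts to pairing each case report with its 72B-inferred rationale and the reference summary:

# Hypothetical illustration of the pre-processing step: pairing each case report
# with the rationale inferred by the 72B model and the reference summary.
# Field names and the target template are assumptions.
import json

def build_training_record(case_text, rationale, summary):
    """Compose one instruction-tuning record for distillation."""
    return {"input": case_text,
            "target": f"Rationale: {rationale}\nSummary: {summary}"}

rows = [{"case": "A patient 62 years old with ...",
         "rationale": "...", "summary": "..."}]

with open("multiclinsum_distil_train.jsonl", "w", encoding="utf-8") as out:
    for row in rows:
        record = build_training_record(row["case"], row["rationale"], row["summary"])
        out.write(json.dumps(record, ensure_ascii=False) + "\n")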

Training Hyperparameters

We refer to the original parameters here:

Speeds, Sizes, Times [optional]

The fine-tuning procedure for 3 epochs takes around 1 hour on the Google Colab A100.

Evaluation

Testing Data

We use an evaluation split of 20 documents, drawn from a small portion of the available training data, covering all languages: en, fr, pt, es.

Metrics

In this evaluation we use only the ROUGE score.
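A minimal sketch of ROUGE scoring, assuming the rouge-score package (the exact ROUGE variants and aggregation used for the figure below may differ):

# Minimal sketch of ROUGE scoring with the rouge-score package;
# reference/prediction texts below are placeholders.
from rouge_score import rouge_scorer

scorer = rouge_scorer.RougeScorer(["rouge1", "rouge2", "rougeL"], use_stemmer=True)

reference = "Reference summary of the clinical case ..."
prediction = "Model-generated summary of the clinical case ..."

scores = scorer.score(reference, prediction)
for name, score in scores.items():
    print(f"{name}: F1={score.fmeasure:.3f}")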

Results

We launch 3 individual fine-tuning runs for both the distil and standard versions to showcase the variation in results across multiple runs.

Figure: the obtained results for this model correspond to the distil version 🟢


Summary

Hardware

We experiment with model fine-tuning and inference using the Google Colab notebook service and related resources:

  • Fine-tuning: A100 (40GB)
  • Inference: T4 (16GB)

Follow the Google Colab notebook in the repository:

Software

This is the official repository for this card:

Citation [optional]

BibTeX:

TO BE ADDED

Model Card Authors

Nicolay Rusnachenko
