Model Details
This model represents a distil-tuned version of Qwen/Qwen2.5-0.5B-Instruct on the MultiClinSum training data and the rationales composed for it.
Results obtained by this model were used to form the submission for the BioASQ-2025 Workshop / CLEF 2025.
We adopt Qwen/Qwen2.5-72B-Instruct first for inferring rationales for the training data (read further for details).
The baseline version: https://huggingface.co/nicolay-r/qwen25-05b-multiclinsum-standard
Model Description
- Model type: Decoder-based Model
- Language(s) (NLP): Supported by Qwen2.5 + fine-tuned on summaries written in en, fr, pt, es
- Finetuned from model: https://huggingface.co/Qwen/Qwen2.5-0.5B-Instruct
Model Sources
- Repository: https://github.com/nicolay-r/distil-tuning-llm
- Paper: TBA
- Demo: https://colab.research.google.com/drive/1TXGaz39o73nBucEQw12gbad7Tw11j2Ol?usp=sharing
Usage
We use bulk-chain for inference with the Qwen2 provider, based on the transformers pipelines API.
Provider huggingface_qwen.py: https://github.com/nicolay-r/nlp-thirdgate/blob/9e46629792e9a53871710884f7b9e2fe42666aa7/llm/transformers_qwen2.py
from bulk_chain.api import iter_content
from bulk_chain.core.utils import dynamic_init

content_it = iter_content(
    # Single-step schema: prompt the model to summarize the input text.
    schema={"schema": [
        {"prompt": "Summarize: {input}", "out": "summary"}]
    },
    llm=dynamic_init(
        class_filepath="huggingface_qwen.py",
        class_name="Qwen2")(
            api_token="YOUR_HF_API_KEY_GOES_HERE",
            model_name="nicolay-r/qwen25-05b-multiclinsum-distil",
            temp=0.1,
            use_bf16=True,
            max_new_tokens=512,  # adjust to the desired summary length
            device="cuda"        # or "cpu"
    ),
    infer_mode="batch",
    batch_size=4,
    return_mode="record",
    # INPUT TEXTS:
    input_dicts_it=[
        {"input": "A patient 62 years old with ..."}
    ],
)

for record in content_it:
    # Each record is a dictionary that includes the generated summary.
    print(record["summary"])
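Alternatively, the checkpoint can be queried directly through the transformers pipelines API. The snippet below is a minimal sketch rather than the official workflow; the prompt text and generation parameters are illustrative.

from transformers import pipeline

# A minimal sketch: load the fine-tuned checkpoint with the plain
# transformers text-generation pipeline (chat-style input).
summarizer = pipeline(
    "text-generation",
    model="nicolay-r/qwen25-05b-multiclinsum-distil",
    torch_dtype="bfloat16",
    device_map="auto",  # requires accelerate; drop for CPU-only inference
)

messages = [
    {"role": "user", "content": "Summarize: A patient 62 years old with ..."}
]

output = summarizer(messages, max_new_tokens=512)
# The last chat turn holds the generated summary.
print(output[0]["generated_text"][-1]["content"])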
Training Details
Training Data
- MultiClinSum
- We use a dedicated script for downloading the datasets; a fallback sketch for fetching the data directly from Zenodo is shown after this list.
- Web: https://temu.bsc.es/multiclinsum
- Data: https://zenodo.org/records/15463353
- BioASQ: http://bioasq.org/
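In case the downloading script is unavailable, the sketch below illustrates fetching the archives directly from the Zenodo record via its public records API; the field names ("files", "key") and the download URL pattern follow the standard Zenodo API and are assumptions here, not part of the official tooling.

import os
import requests

# A minimal sketch: list all files attached to the MultiClinSum Zenodo
# record (15463353) and download them into a local folder.
RECORD_API = "https://zenodo.org/api/records/15463353"

record = requests.get(RECORD_API, timeout=60).json()
os.makedirs("multiclinsum", exist_ok=True)

for entry in record.get("files", []):
    name = entry["key"]
    url = f"https://zenodo.org/records/15463353/files/{name}?download=1"
    print(f"Downloading {name} ...")
    with requests.get(url, stream=True, timeout=60) as response:
        response.raise_for_status()
        with open(os.path.join("multiclinsum", name), "wb") as handle:
            for chunk in response.iter_content(chunk_size=1 << 20):
                handle.write(chunk)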
Training Procedure
The training procedure involves:
- Preparation of the rationales for summary distillation.
- Launch of the fine-tuning process.
Preparation: We adopt Qwen/Qwen2.5-72B-Instruct for inferring rationales via the following script:
- https://github.com/nicolay-r/distil-tuning-llm/blob/master/predict/annotate_train_rationale.py
- The script above relies on the OpenRouter provider as a remote API: https://openrouter.ai/qwen/qwen-2.5-72b-instruct (a minimal request sketch is shown after this list).
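For illustration only, the snippet below sketches such a rationale-annotation request against the OpenRouter chat-completions endpoint; the prompt wording, parameters, and environment variable name are assumptions rather than the exact setup used by the script above.

import os
import requests

# A minimal sketch (not the official annotation script): ask
# Qwen2.5-72B-Instruct, served via OpenRouter's OpenAI-compatible API,
# for a rationale over a clinical case report.
case_report = "A patient 62 years old with ..."

response = requests.post(
    "https://openrouter.ai/api/v1/chat/completions",
    headers={"Authorization": f"Bearer {os.environ['OPENROUTER_API_KEY']}"},
    json={
        "model": "qwen/qwen-2.5-72b-instruct",
        "messages": [{
            "role": "user",
            # Illustrative prompt; the actual rationale prompt differs.
            "content": "Explain step by step which findings of the following "
                       "case report should appear in its summary:\n" + case_report,
        }],
        "temperature": 0.1,
    },
    timeout=120,
)
rationale = response.json()["choices"][0]["message"]["content"]
print(rationale)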
Fine-tuning: Please follow the fine-tuning script listed under Training Hyperparameters below for using the MultiClinSum dataset on a Google Colab A100 (40GB VRAM) + 80GB RAM.
Preprocessing
Refer to the following script for the fine-tuning pre-processing:
Training Hyperparameters
We refer to the original parameters here:
- https://github.com/QwenLM/Qwen2.5-VL/tree/main/qwen-vl-finetune
and use the following launch script:
- https://github.com/nicolay-r/distil-tuning-llm/blob/master/distil_ft_qwen25_05b_A100-40GB_80GB_dis.sh
Speeds, Sizes, Times
The fine-tuning procedure for 3 epochs takes around 1 hour on a Google Colab A100.
Evaluation
Testing Data
We use an evaluation split of 20 documents, drawn from a small portion of the available training data, covering all languages: en, fr, pt, es.
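A minimal sketch of carving out such a held-out split is shown below; the 5-documents-per-language balance and the document identifiers are assumptions for illustration only.

import random

# A minimal sketch: hold out 20 training documents (here, 5 per language)
# for evaluation. Identifiers are purely illustrative placeholders.
random.seed(42)

languages = ["en", "fr", "pt", "es"]
train_docs = {lang: [f"{lang}_case_{i:03d}" for i in range(100)] for lang in languages}

eval_split = {lang: random.sample(docs, 5) for lang, docs in train_docs.items()}
train_split = {lang: [doc for doc in docs if doc not in set(eval_split[lang])]
               for lang, docs in train_docs.items()}

print(sum(len(docs) for docs in eval_split.values()))  # 20 evaluation documents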
Metrics
In this evaluation we use only the rouge score.
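A minimal sketch of the rouge computation with the Hugging Face evaluate package (an assumption here; the official evaluation may rely on a different ROUGE implementation) is shown below.

import evaluate  # also requires the rouge_score package

# A minimal sketch: compute ROUGE between generated and reference summaries.
rouge = evaluate.load("rouge")

predictions = ["The 62-year-old patient was admitted with ..."]
references = ["A 62-year-old patient presented with ..."]

scores = rouge.compute(predictions=predictions, references=references)
print(scores)  # {'rouge1': ..., 'rouge2': ..., 'rougeL': ..., 'rougeLsum': ...}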
Results
We launch 3 individual fine-tuning processes for the distil and standard versions to showcase the variation of results across multiple runs.
Figure: the obtained results for this model correspond to the distil version.
Hardware
We experiment with model fine-tuning and inference using the Google Colab Notebook service and related resources:
- Fine-tuning: A100 (40GB)
- Inference: T4 (16GB)
Follow the Google Colab Notebook in the repository: https://github.com/nicolay-r/distil-tuning-llm
Software
The official repository for this card: https://github.com/nicolay-r/distil-tuning-llm
Citation
BibTeX:
TO BE ADDED
Model Card Authors
Nicolay Rusnachenko