Rolshoven committed · verified
Commit ba53c35 · 1 parent: 318d241

Update README.md

Files changed (1): README.md (+115 -6)

README.md CHANGED
```diff
@@ -1,7 +1,9 @@
 ---
 base_model: unsloth/Qwen2.5-7B-Instruct
 language:
-- en
 license: apache-2.0
 tags:
 - text-generation-inference
@@ -9,14 +11,121 @@ tags:
 - unsloth
 - qwen2
 - trl
 ---
 
-# Uploaded model
 
-- **Developed by:** rcds
-- **License:** apache-2.0
-- **Finetuned from model :** unsloth/Qwen2.5-7B-Instruct
 
-This qwen2 model was trained 2x faster with [Unsloth](https://github.com/unslothai/unsloth) and Huggingface's TRL library.
 
 [<img src="https://raw.githubusercontent.com/unslothai/unsloth/main/images/unsloth%20made%20with%20love.png" width="200"/>](https://github.com/unslothai/unsloth)
```

The updated README.md:
---
base_model: unsloth/Qwen2.5-7B-Instruct
language:
- de
- fr
- it
license: apache-2.0
tags:
- text-generation-inference
- unsloth
- qwen2
- trl
datasets:
- ipst/slds
metrics:
- bertscore
- bleu
- rouge
---
# Model Card for Qwen2.5-7B-Instruct-SLDS

## Model Summary

This model is **Qwen2.5-7B-Instruct fine-tuned on the Swiss Landmark Decisions Summarization (SLDS) dataset**.
SLDS is a multilingual dataset of **20,000 Swiss Federal Supreme Court decisions** (1954–2024), each paired with **headnotes in German, French, and Italian**, resulting in ~60,000 decision–headnote pairs.

The model is optimized for **legal abstractive summarization** and produces **concise, legally structured headnotes**.
It can be used for both **monolingual** and **cross-lingual** summarization tasks.

This model was trained 2x faster with [Unsloth](https://github.com/unslothai/unsloth) and Hugging Face's TRL library.

[<img src="https://raw.githubusercontent.com/unslothai/unsloth/main/images/unsloth%20made%20with%20love.png" width="200"/>](https://github.com/unslothai/unsloth)
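
As a quick-start illustration (not part of the original card), the sketch below loads the checkpoint with `transformers` and requests a headnote through the standard Qwen2.5 chat template. The instruction wording is a placeholder assumption; the exact prompt format used during fine-tuning is not documented here.

```python
# Hypothetical usage sketch; the prompt wording is an assumption, not the documented training format.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "ipst/Qwen2.5-7B-Instruct-SLDS"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

decision_text = "..."  # full text of a Swiss Federal Supreme Court decision

# "Summarize the following decision as a headnote (Regeste):" -- illustrative German instruction
prompt = f"Fasse den folgenden Entscheid in einer Regeste zusammen:\n\n{decision_text}"
messages = [{"role": "user", "content": prompt}]

input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
output_ids = model.generate(input_ids, max_new_tokens=512, do_sample=False)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```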

---

## Intended Use

- **Primary Task**: Judicial summarization (decision → headnote generation).
- **Languages**: German (`de`), French (`fr`), Italian (`it`).
- **Scenarios**:
  - Monolingual summarization: e.g., German decision → German headnote.
  - Cross-lingual summarization: e.g., German decision → French headnote (see the sketch below).
  - Legal research support: assisting in retrieval and navigation of court decisions.

**Not intended for**:
- Replacing human legal expertise.
- Serving as an authoritative legal source.
- Automated legal advice or decision-making.
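
For the cross-lingual scenario, a plausible (unverified) approach is to keep the decision in its source language and name the target headnote language in the instruction. The helper below is a sketch under that assumption; pass its return value as the user message in the quick-start example above.

```python
# Hypothetical helper for cross-lingual headnote generation; the instruction wording is an
# assumption, not the documented fine-tuning prompt format.
def build_headnote_prompt(decision_text: str, target_language: str = "fr") -> str:
    """Request a headnote in a target language while leaving the decision in its source language."""
    instructions = {
        "de": "Fasse den folgenden Entscheid in einer Regeste zusammen:",        # German headnote
        "fr": "Résume l'arrêt suivant sous forme de regeste en français :",      # French headnote
        "it": "Riassumi la seguente sentenza in un regesto in italiano:",        # Italian headnote
    }
    return f"{instructions[target_language]}\n\n{decision_text}"
```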

---

## Training Data

- **Dataset**: [Swiss Landmark Decisions Summarization (SLDS)](https://huggingface.co/datasets/ipst/slds) (see the loading sketch after this list).
- **Size**: ~20K decisions, ~60K decision–headnote pairs.
- **Splits**: Train (1954–2021), Validation (2022), Test (2023–2024).
- **Source**: [Swiss Federal Supreme Court](https://www.bger.ch).
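
A minimal sketch for pulling the dataset with the `datasets` library; the repo id comes from the card's metadata, but the split and column names are not documented here, so the snippet simply inspects whatever the dataset exposes.

```python
# Hypothetical loading sketch; split and column names are read from the dataset at runtime.
from datasets import load_dataset

slds = load_dataset("ipst/slds")
print(slds)  # shows the available splits and their sizes

first_split = next(iter(slds))
example = slds[first_split][0]
print({k: str(v)[:120] for k, v in example.items()})  # peek at one decision–headnote pair
```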

---

## Training Procedure

- **Base Models**:
  - Qwen2.5 family (0.5B–14B)
  - Llama 3.2 (3B)
  - Phi-3.5-mini
- **Fine-tuning Objective**: Conditional generation (decision → headnote).
- **Evaluation Metrics**:
  - Lexical: ROUGE-1/2/L, BLEU, BERTScore (see the sketch after this list).
  - Domain-specific: LLM-as-a-Judge framework (DeepSeek V3) assessing five rubrics: accuracy, completeness, clarity, legal citations, and considerations.
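
For the lexical metrics, the sketch below shows how they can be computed with Hugging Face's `evaluate` library. It illustrates the metric calls only and is not the authors' evaluation pipeline; the BERTScore configuration in particular is an assumption.

```python
# Illustrative lexical-metric computation with the `evaluate` library; not the authors' pipeline.
import evaluate

predictions = ["Generated headnote ..."]  # model outputs
references = ["Reference headnote ..."]   # gold headnotes from SLDS

rouge = evaluate.load("rouge")
sacrebleu = evaluate.load("sacrebleu")
bertscore = evaluate.load("bertscore")

print(rouge.compute(predictions=predictions, references=references))
print(sacrebleu.compute(predictions=predictions, references=[[r] for r in references]))
print(bertscore.compute(predictions=predictions, references=references, lang="de"))  # lang="de" is an assumption
```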

---

## Model Performance

On the SLDS test set (2023–2024):

| Model | Setting | BERTScore ↑ | BLEU ↑ | ROUGE-1 ↑ | ROUGE-2 ↑ | ROUGE-L ↑ | JUDGE ↑ |
|:---|:---|:---|:---|:---|:---|:---|:---|
| [Phi-3.5-mini](https://huggingface.co/ipst/Phi-3.5-mini-instruct-SLDS) | fine-tuned | 11.24 ± 3.82 | 34.84 ± 0.41 | 31.20 ± 2.08 | 14.11 ± 1.27 | 20.96 ± 1.35 | 15.25 ± 2.32 |
| [Llama 3.2 3B](https://huggingface.co/ipst/Llama-3.2-3B-Instruct-SLDS) | fine-tuned | 15.20 ± 4.40 | 21.89 ± 0.42 | 31.89 ± 2.34 | 14.87 ± 1.61 | 22.49 ± 1.60 | 18.47 ± 2.99 |
| [Qwen2.5 0.5B](https://huggingface.co/ipst/Qwen2.5-0.5B-Instruct-SLDS) | fine-tuned | -1.37 ± 3.85 | 32.20 ± 0.35 | 23.87 ± 1.68 | 9.46 ± 0.94 | 17.37 ± 1.09 | 5.80 ± 1.26 |
| [Qwen2.5 1.5B](https://huggingface.co/ipst/Qwen2.5-1.5B-Instruct-SLDS) | fine-tuned | 19.81 ± 2.72 | 36.79 ± 0.34 | 33.03 ± 1.73 | 14.14 ± 1.08 | 22.67 ± 1.13 | 15.92 ± 2.27 |
| [Qwen2.5 3B](https://huggingface.co/ipst/Qwen2.5-3B-Instruct-SLDS) | fine-tuned | 23.23 ± 2.80 | 38.42 ± 0.34 | 35.18 ± 1.79 | 15.66 ± 1.23 | 24.10 ± 1.17 | 20.31 ± 2.66 |
| [Qwen2.5 7B](https://huggingface.co/ipst/Qwen2.5-7B-Instruct-SLDS) | fine-tuned | 29.59 ± 1.97 | 41.40 ± 0.34 | 39.24 ± 1.59 | 18.26 ± 1.25 | 26.44 ± 1.15 | 28.37 ± 3.07 |
| [Qwen2.5 14B](https://huggingface.co/ipst/Qwen2.5-14B-Instruct-SLDS) | fine-tuned | **32.48 ± 1.98** | **41.80 ± 0.37** | 40.04 ± 1.74 | **19.99 ± 1.41** | **28.00 ± 1.28** | 31.38 ± 3.19 |
| GPT-4o | one-shot | 30.44 ± 1.74 | 31.89 ± 0.25 | **42.12 ± 1.79** | 18.92 ± 1.22 | 25.92 ± 1.05 | 39.70 ± 2.66 |
| Claude 3.5 Sonnet | one-shot | 5.53 ± 2.00 | 21.88 ± 0.25 | 41.86 ± 1.64 | 19.23 ± 1.19 | 27.67 ± 1.20 | 41.25 ± 2.90 |
| DeepSeek-R1 | one-shot | 20.28 ± 1.45 | 22.37 ± 0.18 | 38.30 ± 1.82 | 15.97 ± 0.85 | 21.03 ± 0.84 | **42.28 ± 2.21** |
| o3-mini | one-shot | 14.18 ± 1.31 | 20.55 ± 0.17 | 34.77 ± 1.43 | 11.92 ± 0.69 | 18.21 ± 0.67 | 34.82 ± 2.41 |

- **Lexical metrics**: The fine-tuned models lead on overlap-based scores.
- **LLM-judge scores**: Larger proprietary and reasoning models lead on legal precision.

---

## Limitations

- **Language imbalance**: German decisions dominate, while Italian remains underrepresented.
- **Biases**: Headnotes reflect judicial style and conventions, not neutral summaries.
- **Evaluation mismatch**: ROUGE and BLEU may not fully capture legal accuracy.
- **Overfitting risk**: Models may overfit to formulaic headnote structures.
- **Cross-lingual difficulty**: Some models struggle when the headnote language differs from the decision language.

---

## Ethical Considerations

- **Sensitive information**: All data is anonymized by the Swiss Federal Supreme Court before publication.
- **Legal risk**: Generated headnotes must not be used as official legal advice.
- **Fair use**: Ensure attribution when reusing outputs.

---

## How to Cite

If you use this model, please cite the dataset paper:

```bibtex
@article{rolshoven2025slds,
  title={Unlocking Legal Knowledge: A Multilingual Dataset for Judicial Summarization in Switzerland},
  author={Luca Rolshoven and Vishvaksenan Rasiah and Srinanda Brügger Bose and Sarah Hostettler and Lara Burkhalter and Matthias Stürmer and Joel Niklaus},
  year={2025},
  eprint={2410.13456},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2410.13456},
}
```