Flamehaven committed on
Commit
7bed085
·
1 Parent(s): c3b5716

Update project files

Files changed (2)
  1. CRoM-EfficientLLM_Full_Report.md +2318 -0
  2. release_notes.md +12 -0
CRoM-EfficientLLM_Full_Report.md ADDED
@@ -0,0 +1,2318 @@
+ # CRoM-EfficientLLM Full Project Report
+
+ ## 1. Overall Project Structure (Directory Tree)
+
+ ```
+ CRoM-EfficientLLM/
+ ├── .github/
+ │   └── workflows/
+ │       ├── ci.yml
+ │       └── release.yml
+ ├── benchmarks/
+ │   ├── efficiency_eval.py
+ │   ├── longbench_eval.py
+ │   └── sample_results.json
+ ├── dashboard/
+ │   ├── grafana_dashboard.json
+ │   └── prometheus_config.yml
+ ├── docs/
+ │   ├── architecture.md
+ │   └── versioning.md
+ ├── examples/
+ │   └── corpus/
+ │       ├── sample_docs.jsonl
+ │       └── sample_queries.jsonl
+ ├── scripts/
+ │   ├── gen_release_notes.py
+ │   └── release.sh
+ ├── src/
+ │   └── crom_efficientllm/
+ │       ├── budget_packer/
+ │       │   ├── __init__.py
+ │       │   └── packer.py
+ │       ├── drift_estimator/
+ │       │   ├── __init__.py
+ │       │   └── estimator.py
+ │       ├── plugins/
+ │       │   ├── evidently_drift.py
+ │       │   ├── flashrank_reranker.py
+ │       │   └── llmlingua_compressor.py
+ │       ├── rerank_engine/
+ │       │   ├── __init__.py
+ │       │   └── rerank.py
+ │       ├── __init__.py
+ │       ├── budget_packer.py
+ │       ├── capsule_logger.py
+ │       ├── cli.py
+ │       ├── cross_encoder.py
+ │       ├── demo.py
+ │       └── server.py
+ ├── tests/
+ │   ├── test_drift.py
+ │   ├── test_packer.py
+ │   └── test_rerank.py
+ ├── .gitignore
+ ├── CHANGELOG.md
+ ├── crom 1.0.1수정 업데이트 상세보고서.md
+ ├── LICENSE
+ ├── pyproject.toml
+ ├── README.md
+ ├── release_notes.md
+ └── requirements.txt
+ ```
+
+ ## 2. Detailed Contents by File
+
+ ---
+ ### **File:** `D:\\Sanctum\\CRoM-EfficientLLM\\.github\\workflows\\ci.yml`
+ ```yaml
+ name: ci
+ on:
+   push:
+     branches: [ main ]
+   pull_request:
+
+ jobs:
+   test:
+     runs-on: ubuntu-latest
+     strategy:
+       matrix:
+         python-version: ["3.9", "3.10", "3.11", "3.12"]
+     steps:
+       - uses: actions/checkout@v4
+       - uses: actions/setup-python@v5
+         with:
+           python-version: ${{ matrix.python-version }}
+       - run: pip install -e .[dev]
+       - run: pre-commit run --all-files || true
+       - run: ruff --version && black --version
+       - run: pytest -q
+ ```
+ ---
+ ### **File:** `D:\\Sanctum\\CRoM-EfficientLLM\\.github\\workflows\\release.yml`
+ ```yaml
+ name: release
+ on:
+   push:
+     tags:
+       - 'v*'
+ jobs:
+   release:
+     runs-on: ubuntu-latest
+     steps:
+       - uses: actions/checkout@v4
+         with:
+           fetch-depth: 0
+       - uses: actions/setup-python@v5
+         with:
+           python-version: '3.11'
+       - run: pip install -e .[dev]
+       - run: pytest -q
+       - name: Build distribution
+         run: |
+           python -m pip install build
+           python -m build
+       - name: Generate release notes from CHANGELOG
+         run: |
+           python scripts/gen_release_notes.py "$GITHUB_REF_NAME"
+       - name: Publish GitHub Release
+         uses: softprops/action-gh-release@v2
+         with:
+           name: ${{ github.ref_name }}
+           body_path: release_notes.md
+           files: |
+             dist/*.whl
+             dist/*.tar.gz
+ ```
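The release workflow passes the pushed tag name to `scripts/gen_release_notes.py` and then publishes `release_notes.md` as the release body. That script is not shown at this point in the report, so the following is only a minimal sketch of how one version's section could be pulled out of a changelog, assuming headings of the form `## [x.y.z] - date` as used in `CHANGELOG.md`. The function name `extract_section` is hypothetical and is not taken from the repository.

```python
def extract_section(changelog: str, tag: str) -> str:
    """Return the body of the changelog section for a tag such as 'v1.0.1'.

    Illustrative only: assumes each release starts with a '## [version]' heading.
    """
    # Tags are assumed to be the version with a leading 'v' (e.g. 'v1.0.1').
    version = tag[1:] if tag.startswith("v") else tag
    wanted = f"## [{version}]"
    body, inside = [], False
    for line in changelog.splitlines():
        if line.startswith("## ["):
            if inside:
                break  # the next release heading ends the wanted section
            inside = line.startswith(wanted)
            continue
        if inside:
            body.append(line)
    return "\n".join(body).strip()


if __name__ == "__main__":
    sample = (
        "# Changelog\n\n"
        "## [1.0.1] - 2025-09-06\n### Added\n- A\n\n"
        "## [0.2.1] - 2025-09-02\n### Added\n- B\n"
    )
    print(extract_section(sample, "v1.0.1"))
```

An unknown tag yields an empty string here; a real release script would more likely fail loudly so that an empty release body is never published.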
+ ---
+ ### **File:** `D:\\Sanctum\\CRoM-EfficientLLM\\.gitignore`
+ ```
+ # Python
+ __pycache__/
+ *.py[cod]
+ *.egg-info/
+ .env
+ .venv/
+ virtualenv/
+ .idea/
+ .vscode/
+ .ipynb_checkpoints/
+ .dist/
+ .build/
+ .coverage
+ .pytest_cache/
+
+ # OS
+ .DS_Store
+ Thumbs.db
+ ```
+ ---
+ ### **File:** `D:\\Sanctum\\CRoM-EfficientLLM\\CHANGELOG.md`
+ ```markdown
+ # Changelog
+
+ ## [1.0.1] - 2025-09-06
+ ### Added
+ - Implemented core modules from scratch based on design documents.
+ - Implemented FastAPI server with `/process` endpoint (`src/crom_efficientllm/server.py`).
+ - Added `enhanced_greedy_pack` with detailed statistics for budget packing (`src/crom_efficientllm/budget_packer.py`).
+ - Implemented `SafeCrossEncoderManager` for robust and observable Cross-Encoder handling (`src/crom_efficientllm/cross_encoder.py`).
+ - Added `ExplainCapsuleLogger` for structured JSONL logging of all processing events (`src/crom_efficientllm/capsule_logger.py`).
+
+ ### Changed
+ - Major version bump to reflect the first functional implementation of core logic.
+
+
+ ## [0.2.1] - 2025-09-02
+ ### Added
+ - CLI `--save-plots` option for `sweep` and `dp-curve`; saves PNG charts to `benchmarks/out/` (or `--out-dir`).
+ - README Quick Examples mention of plotting flag.
+ - This CHANGELOG.
+
+ ### Changed
+ - Dev tooling: recommend `matplotlib` via dev extra for plotting.
+
+ ## [0.2.0] - 2025-09-02
+ ### Added
+ - GitHub Actions CI (3.9–3.12), pre-commit(ruff/black).
+ - `crom-bench` CLI: `e2e`, `sweep`, `scale`, `dp-curve`, `haystack-compare`.
+ - Plugins: FlashRank/LLMLingua/Evidently (optional extras).
+ - Example corpus & queries (JSONL).
+
+ ## [0.1.0] - 2025-09-02
+ - Initial packaging; budget packer, hybrid rerank, drift estimator, demo & metrics.
+ ```
+ ---
+ ### **File:** `D:\\Sanctum\\CRoM-EfficientLLM\\LICENSE`
+ ```
+
+ Apache License
+ Version 2.0, January 2004
+ http://www.apache.org/licenses/
+
+ TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
+
+ 1. Definitions.
+
+ "License" shall mean the terms and conditions for use, reproduction,
+ and distribution as defined by Sections 1 through 9 of this document.
+
+ "Licensor" shall mean the copyright owner or entity authorized by
+ the copyright owner that is granting the License.
+
+ "Legal Entity" shall mean the union of the acting entity and all
+ other entities that control, are controlled by, or are under common
+ control with that entity. For the purposes of this definition,
+ "control" means (i) the power, direct or indirect, to cause the
+ direction or management of such entity, whether by contract or
+ otherwise, or (ii) ownership of fifty percent (50%) or more of the
+ outstanding shares, or (iii) beneficial ownership of such entity.
+
+ "You" (or "Your") shall mean an individual or Legal Entity
+ exercising permissions granted by this License.
+
+ "Source" form shall mean the preferred form for making modifications,
+ including but not limited to software source code, documentation
+ source, and configuration files.
+
+ "Object" form shall mean any form resulting from mechanical
+ transformation or translation of a Source form, including but
+ not limited to compiled object code, generated documentation,
+ and conversions to other media types.
+
+ "Work" shall mean the work of authorship, whether in Source or
+ Object form, made available under the License, as indicated by a
+ copyright notice that is included in or attached to the work
+ (an example is provided in the Appendix below).
+
+ "Derivative Works" shall mean any work, whether in Source or Object
+ form, that is based on (or derived from) the Work and for which the
+ editorial revisions, annotations, elaborations, or other modifications
+ represent, as a whole, an original work of authorship. For the purposes
+ of this License, Derivative Works shall not include works that remain
+ separable from, or merely link (or bind by name) to the interfaces of,
+ the Work and Derivative Works thereof.
+
+ "Contribution" shall mean any work of authorship, including
+ the original version of the Work and any modifications or additions
+ to that Work or Derivative Works thereof, that is intentionally
+ submitted to Licensor for inclusion in the Work by the copyright owner
+ or by an individual or Legal Entity authorized to submit on behalf of
+ the copyright owner. For the purposes of this definition, "submitted"
+ means any form of electronic, verbal, or written communication sent
+ to the Licensor or its representatives, including but not limited to
+ communication on electronic mailing lists, source code control systems,
+ and issue tracking systems that are managed by, or on behalf of, the
+ Licensor for the purpose of discussing and improving the Work, but
+ excluding communication that is conspicuously marked or otherwise
+ designated in writing by the copyright owner as "Not a Contribution."
+
+ "Contributor" shall mean Licensor and any individual or Legal Entity
+ on behalf of whom a Contribution has been received by Licensor and
+ subsequently incorporated within the Work.
+
+ 2. Grant of Copyright License. Subject to the terms and conditions of
+ this License, each Contributor hereby grants to You a perpetual,
+ worldwide, non-exclusive, no-charge, royalty-free, irrevocable
+ copyright license to reproduce, prepare Derivative Works of,
+ publicly display, publicly perform, sublicense, and distribute the
+ Work and such Derivative Works in Source or Object form.
+
+ 3. Grant of Patent License. Subject to the terms and conditions of
+ this License, each Contributor hereby grants to You a perpetual,
+ worldwide, non-exclusive, no-charge, royalty-free, irrevocable
+ (except as stated in this section) patent license to make, have made,
+ use, offer to sell, sell, import, and otherwise transfer the Work,
+ where such license applies only to those patent claims licensable
+ by such Contributor that are necessarily infringed by their
+ Contribution(s) alone or by combination of their Contribution(s)
+ with the Work to which such Contribution(s) was submitted. If You
+ institute patent litigation against any entity (including a
+ cross-claim or counterclaim in a lawsuit) alleging that the Work
+ or a Contribution incorporated within the Work constitutes direct
+ or contributory patent infringement, then any patent licenses
+ granted to You under this License for that Work shall terminate
+ as of the date such litigation is filed.
+
+ 4. Redistribution. You may reproduce and distribute copies of the
+ Work or Derivative Works thereof in any medium, with or without
+ modifications, and in Source or Object form, provided that You
+ meet the following conditions:
+
+ (a) You must give any other recipients of the Work or
+ Derivative Works a copy of this License; and
+
+ (b) You must cause any modified files to carry prominent notices
+ stating that You changed the files; and
+
+ (c) You must retain, in the Source form of any Derivative Works
+ that You distribute, all copyright, patent, trademark, and
+ attribution notices from the Source form of the Work,
+ excluding those notices that do not pertain to any part of
+ the Derivative Works; and
+
+ (d) If the Work includes a "NOTICE" text file as part of its
+ distribution, then any Derivative Works that You distribute must
+ include a readable copy of the attribution notices contained
+ within such NOTICE file, excluding those notices that do not
+ pertain to any part of the Derivative Works, in at least one
+ of the following places: within a NOTICE text file distributed
+ as part of the Derivative Works; within the Source form or
+ documentation, if provided along with the Derivative Works; or,
+ within a display generated by the Derivative Works, if and
+ wherever such third-party notices normally appear. The contents
+ of the NOTICE file are for informational purposes only and
+ do not modify the License. You may add Your own attribution
+ notices within Derivative Works that You distribute, alongside
+ or as an addendum to the NOTICE text from the Work, provided
+ that such additional attribution notices cannot be construed
+ as modifying the License.
+
+ You may add Your own copyright statement to Your modifications and
+ may provide additional or different license terms and conditions
+ for use, reproduction, or distribution of Your modifications, or
+ for any such Derivative Works as a whole, provided Your use,
+ reproduction, and distribution of the Work otherwise complies with
+ the conditions stated in this License.
+
+ 5. Submission of Contributions. Unless You explicitly state otherwise,
+ any Contribution intentionally submitted for inclusion in the Work
+ by You to the Licensor shall be under the terms and conditions of
+ this License, without any additional terms or conditions.
+ Notwithstanding the above, nothing herein shall supersede or modify
+ the terms of any separate license agreement you may have executed
+ with the Licensor regarding such Contributions.
+
+ 6. Trademarks. This License does not grant permission to use the trade
+ names, trademarks, service marks, or product names of the Licensor,
+ except as required for reasonable and customary use in describing the
+ origin of the Work and reproducing the content of the NOTICE file.
+
+ 7. Disclaimer of Warranty. Unless required by applicable law or
+ agreed to in writing, Licensor provides the Work (and each
+ Contributor provides its Contributions) on an "AS IS" BASIS,
+ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
+ implied, including, without limitation, any warranties or conditions
+ of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
+ PARTICULAR PURPOSE. You are solely responsible for determining the
+ appropriateness of using or redistributing the Work and assume any
+ risks associated with Your exercise of permissions under this License.
+
+ 8. Limitation of Liability. In no event and under no legal theory,
+ whether in tort (including negligence), contract, or otherwise,
+ unless required by applicable law (such as deliberate and grossly
+ negligent acts) or agreed to in writing, shall any Contributor be
+ liable to You for damages, including any direct, indirect, special,
+ incidental, or consequential damages of any character arising as a
+ result of this License or out of the use or inability to use the
+ Work (including but not limited to damages for loss of goodwill,
+ work stoppage, computer failure or malfunction, or any and all
+ other commercial damages or losses), even if such Contributor
+ has been advised of the possibility of such damages.
+
+ 9. Accepting Warranty or Additional Liability. While redistributing
+ the Work or Derivative Works thereof, You may choose to offer,
+ and charge a fee for, acceptance of support, warranty, indemnity,
+ or other liability obligations and/or rights consistent with this
+ License. However, in accepting such obligations, You may act only
+ on Your own behalf and on Your sole responsibility, not on behalf
+ of any other Contributor, and only if You agree to indemnify,
+ defend, and hold each Contributor harmless for any liability
+ incurred by, or claims asserted against, such Contributor by reason
+ of your accepting any such warranty or additional liability.
+
+ END OF TERMS AND CONDITIONS
+
+ APPENDIX: How to apply the Apache License to your work.
+
+ To apply the Apache License to your work, attach the following
+ boilerplate notice, with the fields enclosed by brackets "[]"
+ replaced with your own identifying information. (Don't include
+ the brackets!) The text should be enclosed in the appropriate
+ comment syntax for the file format. We also recommend that a
+ file or class name and description of purpose be included on the
+ same "printed page" as the copyright notice for easier
+ identification within third-party archives.
+
+ Copyright [yyyy] [name of copyright owner]
+
+ Licensed under the Apache License, Version 2.0 (the "License");
+ you may not use this file except in compliance with the License.
+ You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+ Unless required by applicable law or agreed to in writing, software
+ distributed under the License is distributed on an "AS IS" BASIS,
+ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ See the License for the specific language governing permissions and
+ limitations under the License.
+ ```
+ ---
+ ### **File:** `D:\\Sanctum\\CRoM-EfficientLLM\\README.md`
+ ```markdown
+ ---
+ language: en
+ license: apache-2.0
+ library_name: crom-efficientllm
+ tags:
+ - rag
+ - llm
+ - retrieval
+ - rerank
+ - reranker
+ - context-management
+ - prompt-engineering
+ - observability
+ - python
+ ---
+ # CRoM-Context-Rot-Mitigation--EfficientLLM: Context Reranking and Management for Efficient LLMs
+
+ <p align="left">
+   <a href="https://github.com/Flamehaven/CRoM-Context-Rot-Mitigation--EfficientLLM/actions">
+     <img alt="CI" src="https://img.shields.io/github/actions/workflow/status/Flamehaven/CRoM-Context-Rot-Mitigation--EfficientLLM/ci.yml?branch=main" />
+   </a>
+   <a href="#-benchmarks">
+     <img alt="Bench" src="https://img.shields.io/badge/benchmarks-ready-success" />
+   </a>
+   <a href="LICENSE">
+     <img alt="License" src="https://img.shields.io/badge/license-Apache%202.0-blue" />
+   </a>
+   <a href="https://github.com/Flamehaven/CRoM-Context-Rot-Mitigation--EfficientLLM/releases">
+     <img alt="Release" src="https://img.shields.io/github/v/release/Flamehaven/CRoM-Context-Rot-Mitigation--EfficientLLM?display_name=tag" />
+   </a>
+   <a href="CHANGELOG.md">
+     <img alt="Versioning" src="https://img.shields.io/badge/semver-0.2.x-lightgrey" />
+   </a>
+   <a href="https://github.com/Flamehaven/CRoM-Context-Rot-Mitigation--EfficientLLM/releases/latest">
+     <img alt="Wheel" src="https://img.shields.io/badge/wheel-available-success" />
+   </a>
+ </p>
+
+ **CRoM (Context Rot Mitigation)-EfficientLLM** is a Python toolkit designed to optimize the context provided to Large Language Models (LLMs). It provides a suite of tools to intelligently select, re-rank, and manage text chunks to fit within a model's context budget while maximizing relevance and minimizing performance drift.
+
+ This project is ideal for developers building RAG (Retrieval-Augmented Generation) pipelines who need to make the most of limited context windows.
+
+ ## Key Features
+
+ * **Budget Packer:** Greedily packs the highest-scoring text chunks into a defined token budget using a stable sorting algorithm.
+ * **Hybrid Reranker:** Combines sparse (TF-IDF) and dense (Sentence-Transformers) retrieval scores for robust and high-quality reranking of documents.
+ * **Drift Estimator:** Monitors the semantic drift between sequential model responses using L2 or cosine distance with EWMA smoothing.
+ * **Observability:** Exposes Prometheus metrics for monitoring token savings and drift alerts in production.
+ * **Extensible Plugins:** Supports optional plugins for advanced reranking (`FlashRank`), compression (`LLMLingua`), and drift analysis (`Evidently`).
+ * **Comprehensive Benchmarking:** Includes a CLI for end-to-end pipeline evaluation, budget sweeps, and quality-vs-optimal analysis.
+
+ ## Installation
+
+ Install the package directly from source using pip. For development, it's recommended to install in editable mode with the `[dev]` extras.
+
+ ```bash
+ # Clone the repository
+ git clone https://github.com/Flamehaven/CRoM-Context-Rot-Mitigation--EfficientLLM.git
+ cd CRoM-Context-Rot-Mitigation--EfficientLLM
+
+ # Install in editable mode with development and plugin dependencies
+ pip install -e .[dev,plugins]
+ ```
+
+ ## Quickstart
+
+ ### Demo
+
+ Run a simple, self-contained demonstration of the core components:
+
+ ```bash
+ # Run the demo script
+ crom-demo demo
+ ```
+
+ ### CLI Benchmarking Examples
+
+ The package includes a powerful `crom-bench` CLI for evaluation.
+
+ ```bash
+ # Default E2E (Search→Rerank→Pack→Mock LLM)
+ crom-bench e2e --budget 0.3
+
+ # Optional: High-precision configuration with plugins
+ crom-bench e2e --budget 0.3 \
+   --use-flashrank --flashrank-model ms-marco-TinyBERT-L-2-v2 \
+   --use-llmlingua --compress-ratio=0.6 \
+   --use-evidently
+ ```
+
+ ### Plotting
+
+ If `matplotlib` is installed (`pip install -e .[dev]`), you can save benchmark plots directly:
+
+ ```bash
+ # Save budget sweep result plots
+ crom-bench sweep --save-plots
+
+ # Save DP-curve plots
+ crom-bench dp-curve --save-plots
+ ```
+
+ ## Release & Changelog
+
+ This project follows semantic versioning. For detailed changes, see the [**CHANGELOG.md**](CHANGELOG.md).
+
+ Releases are automated via GitHub Actions when a `v*` tag is pushed.
+
+ ## License
+
+ This project is licensed under the Apache 2.0 License. See the [LICENSE](LICENSE) file for details.
+ ```
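The README's Drift Estimator feature (distance between consecutive responses, smoothed with an EWMA) can be illustrated with a small sketch. This is not the project's `drift_estimator` module, which is not shown at this point in the report. The class name `EwmaDriftSketch`, its parameters `alpha` and `threshold`, and the choice to treat the first observation as zero drift are all assumptions made for illustration.

```python
import math
from typing import Optional, Sequence, Tuple


def l2_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """1 - cosine similarity; a zero vector is treated as maximally distant."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        return 1.0
    return 1.0 - dot / (na * nb)


class EwmaDriftSketch:
    """Hypothetical drift tracker: EWMA over distances between consecutive embeddings."""

    def __init__(self, alpha: float = 0.3, threshold: float = 0.5, metric: str = "l2"):
        if not 0.0 < alpha <= 1.0:
            raise ValueError("alpha must be in (0, 1]")
        self.alpha = alpha
        self.threshold = threshold
        self.distance = l2_distance if metric == "l2" else cosine_distance
        self.prev: Optional[Sequence[float]] = None
        self.ewma = 0.0

    def update(self, embedding: Sequence[float]) -> Tuple[float, float, bool]:
        """Return (raw distance, smoothed distance, alert flag) for one response."""
        # The first response has nothing to compare against, so its drift is 0.
        dist = 0.0 if self.prev is None else self.distance(self.prev, embedding)
        # Standard EWMA update: new = alpha * observation + (1 - alpha) * old.
        self.ewma = self.alpha * dist + (1.0 - self.alpha) * self.ewma
        self.prev = list(embedding)
        return dist, self.ewma, self.ewma > self.threshold
```

A larger `alpha` makes the smoothed value react faster to a single outlying response; a smaller one requires sustained drift before the alert flag turns on.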
+ ---
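The README also describes a Hybrid Reranker that combines sparse (TF-IDF) and dense (Sentence-Transformers) scores. How the project fuses the two is not shown at this point, so the sketch below uses one common option, min-max normalization followed by a weighted sum, purely as an assumption. The names `min_max` and `hybrid_rank` and the weight `alpha` are hypothetical, and the scores are passed in directly rather than computed from a real TF-IDF or embedding model.

```python
from typing import List, Sequence


def min_max(scores: Sequence[float]) -> List[float]:
    """Rescale scores to [0, 1]; a constant list maps to all zeros."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]


def hybrid_rank(sparse: Sequence[float], dense: Sequence[float], alpha: float = 0.5) -> List[int]:
    """Return document indices ordered by a weighted sum of normalized scores.

    alpha weights the sparse score and (1 - alpha) the dense score.
    Ties are broken by original position so the ordering is deterministic.
    """
    if len(sparse) != len(dense):
        raise ValueError("sparse and dense must have the same length")
    if not sparse:
        return []
    s, d = min_max(sparse), min_max(dense)
    fused = [alpha * a + (1.0 - alpha) * b for a, b in zip(s, d)]
    return sorted(range(len(fused)), key=lambda i: (-fused[i], i))


if __name__ == "__main__":
    # Three documents scored by a sparse and a dense retriever (made-up numbers).
    print(hybrid_rank([0.1, 0.9, 0.5], [0.8, 0.2, 0.6], alpha=0.5))
```

Normalizing before fusing matters because TF-IDF similarities and embedding similarities usually live on different scales; without it, whichever score has the larger range would dominate the sum.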
+ ### **File:** `D:\\Sanctum\\CRoM-EfficientLLM\\benchmarks\\efficiency_eval.py`
+ ```python
+ """
+ Efficiency Evaluation for CRoM-EfficientLLM
+ - Synthetic workload to measure token savings, selection quality, and runtime.
+ - No third-party deps beyond numpy/matplotlib (pandas optional for CSVs).
+
+ Usage:
+     python benchmarks/efficiency_eval.py --budget 0.3 --n 5000 --seed 123 --plot --save
+ """
+ from __future__ import annotations
+
+ import argparse
+ import math
+ import time
+ from dataclasses import dataclass
+ from typing import List, Sequence, Tuple, Union
+
+ import numpy as np
+
+ try:
+     import pandas as pd  # optional
+ except Exception:  # pragma: no cover
+     pd = None
+
+ try:
+     import matplotlib.pyplot as plt  # optional
+ except Exception:  # pragma: no cover
+     plt = None
+
+ # --- Local packers (self-contained to avoid imports during quick eval) ---
+ @dataclass(frozen=True)
+ class Chunk:
+     text: str
+     score: float
+     tokens: int
+
+ def _estimate_tokens(text: str) -> int:
+     # Rough heuristic: about 4 characters per token, never less than 1.
+     return max(1, len(text) // 4)
+
+ def _coerce_chunk(obj: Union[Chunk, dict], idx: int) -> Chunk:
+     if isinstance(obj, Chunk):
+         return obj
+     if not isinstance(obj, dict):
+         raise TypeError(f"Chunk #{idx} must be Chunk or dict, got {type(obj)}")
+     text = str(obj.get("text", ""))
+     if not text:
+         raise ValueError(f"Chunk #{idx} has empty text")
+     score = float(obj.get("score", 0.0))
+     tokens = int(obj["tokens"]) if "tokens" in obj else _estimate_tokens(text)
+     if tokens <= 0:
+         raise ValueError(f"Chunk #{idx} has non-positive tokens: {tokens}")
+     return Chunk(text=text, score=score, tokens=tokens)
+
+ def budget_pack(text_chunks: Sequence[Union[Chunk, dict]], budget: int = 1000) -> List[Chunk]:
+     if budget <= 0:
+         raise ValueError("budget must be > 0")
+     coerced: List[Chunk] = [_coerce_chunk(c, i) for i, c in enumerate(text_chunks)]
+     indexed = list(enumerate(coerced))
+     # Highest score first; ties go to fewer tokens, then to original order.
+     indexed.sort(key=lambda it: (-it[1].score, it[1].tokens, it[0]))
+     selected: List[Chunk] = []
+     total = 0
+     for _, ch in indexed:
+         if total + ch.tokens <= budget:
+             selected.append(ch)
+             total += ch.tokens
+     return selected
+
+ def pack_fcfs(text_chunks: Sequence[Union[Chunk, dict]], budget: int) -> List[Chunk]:
+     sel, total = [], 0
+     for i, obj in enumerate(text_chunks):
+         ch = _coerce_chunk(obj, i)
+         if total + ch.tokens <= budget:
+             sel.append(ch)
+             total += ch.tokens
+     return sel
+
+ def pack_random(text_chunks: Sequence[Union[Chunk, dict]], budget: int, seed: int = 0) -> List[Chunk]:
+     rng = np.random.default_rng(seed)
+     indices = np.arange(len(text_chunks))
+     rng.shuffle(indices)
+     sel, total = [], 0
+     for i in indices:
+         ch = _coerce_chunk(text_chunks[i], i)
+         if total + ch.tokens <= budget:
+             sel.append(ch)
+             total += ch.tokens
+     return sel
+
+ # --- Data generation and metrics ---
+
+ def make_synthetic_chunks(n=2000, seed=42, corr=0.6):
+     rng = np.random.default_rng(seed)
+     true_rel = rng.normal(0, 1, size=n)
+     noise = rng.normal(0, 1, size=n) * math.sqrt(1 - corr**2)
+     score = corr * true_rel + noise
+     tokens = np.clip(rng.lognormal(mean=4.0, sigma=0.6, size=n).astype(int), 5, 2000)
+     chunks = [Chunk(text=("x"*int(t*4)), score=float(s), tokens=int(t)) for s, t in zip(score, tokens)]
+     return chunks, true_rel
+
+ def eval_once(n=5000, budget_ratio=0.3, seed=123, corr=0.6):
+     chunks, true_rel = make_synthetic_chunks(n=n, seed=seed, corr=corr)
+     total_tokens = sum(c.tokens for c in chunks)
+     budget = int(total_tokens * budget_ratio)
+
+     def run(name, fn):
+         t0 = time.perf_counter()
+         sel = fn(chunks, budget)
+         dt = time.perf_counter() - t0
+         # Map selected Chunk objects back to their positions via object identity.
+         idx_map = {id(c): i for i, c in enumerate(chunks)}
+         picked_idx = [idx_map[id(c)] for c in sel]
+         rel_sum = float(np.sum(true_rel[picked_idx])) if picked_idx else 0.0
+         sel_tokens = sum(c.tokens for c in sel)
+         return {
+             "name": name,
+             "time_ms": dt*1000,
+             "selected_chunks": len(sel),
+             "selected_tokens": sel_tokens,
+             "tokens_budget": budget,
+             "tokens_total_unpacked": total_tokens,
+             "tokens_saved": total_tokens - sel_tokens,
+             "save_ratio": (total_tokens - sel_tokens)/total_tokens,
+             "relevance_sum": rel_sum,
+         }
+
+     rows = [
+         run("budget_pack", budget_pack),
+         run("fcfs", pack_fcfs),
+         run("random", lambda ch, b: pack_random(ch, b, seed=seed)),
+     ]
+     return rows
+
+ def quality_vs_optimal(n=200, budget_ratio=0.3, seed=123, corr=0.6):
+     chunks, true_rel = make_synthetic_chunks(n=n, seed=seed, corr=corr)
+     budget = int(sum(c.tokens for c in chunks) * budget_ratio)
+     values = np.maximum(true_rel, 0.0)
+
+     def optimal(chunks_sub, values, budget):
+         # 0/1 knapsack by dynamic programming over token budgets.
+         items = chunks_sub
+         vals = list(values)
+         B = budget
+         dp = [0.0]*(B+1)
+         keep = [[False]*(B+1) for _ in range(len(items))]
+         for i, it in enumerate(items):
+             wt = it.tokens
+             val = vals[i]
+             for b in range(B, wt-1, -1):
+                 alt = dp[b - wt] + val
+                 if alt > dp[b]:
+                     dp[b] = alt
+                     keep[i][b] = True
+         # Backtrack from the full budget to recover the chosen items.
+         b = B
+         picked_idx = []
+         for i in range(len(items)-1, -1, -1):
+             if keep[i][b]:
+                 picked_idx.append(i)
+                 b -= items[i].tokens
+         picked_idx.reverse()
+         rel_sum = float(np.sum([values[i] for i in picked_idx])) if picked_idx else 0.0
+         total_tokens = sum(items[i].tokens for i in picked_idx)
+         return picked_idx, rel_sum, total_tokens
+
+     opt_idx, opt_rel, opt_tokens = optimal(chunks, values, budget)
+
+     # selections
+     idx_map = {id(c): i for i, c in enumerate(chunks)}
+     def rel_of(selection):
+         pid = [idx_map[id(c)] for c in selection]
+         return float(np.sum(values[pid])) if pid else 0.0
+
+     sel_bp = budget_pack(chunks, budget)
+     sel_fc = pack_fcfs(chunks, budget)
+     sel_rd = pack_random(chunks, budget, seed=seed)
+
+     rows = [
+         {"name":"optimal_true_rel", "relevance_sum": opt_rel, "selected_tokens": opt_tokens, "selected_chunks": len(opt_idx)},
+         {"name":"budget_pack_small", "relevance_sum": rel_of(sel_bp), "selected_tokens": sum(c.tokens for c in sel_bp), "selected_chunks": len(sel_bp)},
+         {"name":"fcfs_small", "relevance_sum": rel_of(sel_fc), "selected_tokens": sum(c.tokens for c in sel_fc), "selected_chunks": len(sel_fc)},
+         {"name":"random_small", "relevance_sum": rel_of(sel_rd), "selected_tokens": sum(c.tokens for c in sel_rd), "selected_chunks": len(sel_rd)},
+     ]
+     return rows
+
689
+ def main():
690
+ ap = argparse.ArgumentParser()
691
+ ap.add_argument("--n", type=int, default=5000)
692
+ ap.add_argument("--budget", type=float, default=0.3)
693
+ ap.add_argument("--seed", type=int, default=123)
694
+ ap.add_argument("--corr", type=float, default=0.6)
695
+ ap.add_argument("--plot", action="store_true")
696
+ ap.add_argument("--save", action="store_true")
697
+ args = ap.parse_args()
698
+
699
+ rows = eval_once(n=args.n, budget_ratio=args.budget, seed=args.seed, corr=args.corr)
700
+ rows_q = quality_vs_optimal(n=min(200, args.n), budget_ratio=args.budget, seed=args.seed, corr=args.corr)
701
+
702
+ print("\n=== Efficiency (n={}, budget={:.0%}) ===".format(args.n, args.budget))
703
+ for r in rows:
704
+ print("{name:12s} time={time_ms:7.2f}ms save_ratio={save_ratio:6.3f} tokens_saved={tokens_saved:8d} rel_sum={relevance_sum:8.3f}".format(**r))
705
+
706
+ print("\n=== Quality vs Optimal (subset) ===")
707
+ for r in rows_q:
708
+ print("{name:18s} rel_sum={relevance_sum:8.3f} tokens={selected_tokens:5d} chunks={selected_chunks:4d}".format(**r))
709
+
710
+ if pd is not None and args.save:
711
+ pd.DataFrame(rows).to_csv("benchmarks/results_efficiency.csv", index=False)
712
+ pd.DataFrame(rows_q).to_csv("benchmarks/results_quality.csv", index=False)
713
+ print("Saved CSVs to benchmarks/.")
714
+
715
+ if plt is not None and args.plot:
716
+ # single-figure plots, no explicit colors
717
+ x = [r["name"] for r in rows]
718
+ y = [r["time_ms"] for r in rows]
719
+ # use the module-level plt; a local import here would shadow it and break the `plt is not None` check above
720
+ plt.figure()
721
+ plt.bar(x, y)
722
+ plt.title("Packer Runtime (ms)")
723
+ plt.xlabel("method")
724
+ plt.ylabel("ms")
725
+ plt.show()
726
+
727
+ if __name__ == "__main__":
728
+ main()
729
+ ```
730
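The `optimal` helper in the benchmark above is a 0/1 knapsack dynamic program with a `keep` table for backtracking. The following is a minimal standalone sketch of the same idea, assuming plain `(tokens, value)` tuples in place of the project's `Chunk` objects. The name `knapsack_optimal` and the toy data are illustrative and do not come from the repository.

```python
from typing import List, Sequence, Tuple


def knapsack_optimal(items: Sequence[Tuple[int, float]], budget: int) -> Tuple[List[int], float]:
    """Exact 0/1 knapsack over (tokens, value) pairs under a token budget."""
    dp = [0.0] * (budget + 1)  # dp[b] = best value achievable with capacity b
    keep = [[False] * (budget + 1) for _ in items]  # keep[i][b] = item i improved dp[b]
    for i, (wt, val) in enumerate(items):
        # Iterate capacities downward so each item is used at most once.
        for b in range(budget, wt - 1, -1):
            alt = dp[b - wt] + val
            if alt > dp[b]:
                dp[b] = alt
                keep[i][b] = True
    # Walk the keep table backward to recover which items were chosen.
    picked: List[int] = []
    b = budget
    for i in range(len(items) - 1, -1, -1):
        if keep[i][b]:
            picked.append(i)
            b -= items[i][0]
    picked.reverse()
    return picked, dp[budget]


# Toy data (hypothetical): item 0 has the best value per token, but the
# two smaller items together are worth more within a budget of 100.
toy_items = [(60, 10.0), (50, 7.0), (50, 7.0)]
best_idx, best_value = knapsack_optimal(toy_items, budget=100)
print(best_idx, best_value)
```

The table costs O(n × budget) time and memory, which is presumably why the script only runs the exact solver on a subset of at most 200 chunks.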
+ ---
731
+ ### **File:** `D:\\Sanctum\\CRoM-EfficientLLM\\benchmarks\\longbench_eval.py`
732
+ ```python
733
+ """
734
+ Benchmark script: LongBench-like evaluation.
735
+ Simulates context packing efficiency.
736
+ """
737
+ from crom_efficientllm.budget_packer.packer import budget_pack
+
+ def evaluate():
+     chunks = [{"text": f"chunk {i}", "score": i % 5, "tokens": 100} for i in range(20)]
+     packed = budget_pack(chunks, budget=500)
+     print("Selected:", len(packed), "chunks")
+
+ if __name__ == "__main__":
+     evaluate()
746
+ ```
747
+ ---
748
+ ### **File:** `D:\\Sanctum\\CRoM-EfficientLLM\\benchmarks\\sample_results.json`
749
+ ```json
750
+ {}
751
+ ```
752
+ ---
753
+ ### **File:** `D:\\Sanctum\\CRoM-EfficientLLM\\crom 1.0.1수정 업데이트 상세보고서.md`
754
+ ```markdown
755
+ # CRoM-EfficientLLM v1.0.1 Detailed Update Report
+
+ **Document purpose:** Provide source material for a marketing AI that drafts social media posts (LinkedIn, Twitter, Medium)
+ **Date:** 2025-09-06
+ **Author:** CLI ↯C01∞ | Σψ∴
760
+
761
+ ---
762
+
763
+ ## 1. Overview
+
+ - **Project name:** CRoM-EfficientLLM (Context Rot Mitigation for Efficient LLMs)
+ - **Previous version:** 0.2.1
+ - **New version:** 1.0.1
+
+ **Key summary:**
+ The v1.0.1 update marks the **First Functional Implementation** of the CRoM-EfficientLLM project. The project previously consisted only of ideas and a skeleton; this release implements all of the core logic and turns it into a **Working Prototype**. Users can now directly test and use the core features for managing and optimizing context in a RAG pipeline.
771
+
772
+ ---
773
+
774
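+ ## 2. Background
+
+ The previous v0.2.1 was a **design-stage scaffold** that defined only the project's direction and structure through files such as `pyproject.toml` and `README.md`. The Python source code containing the core logic was absent, so the ideas could not be validated in practice.
+
+ The goal of this update was to follow that blueprint and **implement all core features from scratch**, bringing the project to life and making it usable in practice.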
779
+
780
+ ---
781
+
782
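+ ## 3. Detailed Changes
+
+ This update adds four core modules, newly implemented under the `src/crom_efficientllm/` directory.
+
+ ### A. `budget_packer.py` - Intelligent Context Packing Engine
+ - **Function:** Assembles the context (chunks) passed to the LLM as efficiently as possible within a given token budget.
+ - **Details:**
+   - Rather than simply truncating text, it prioritizes the most important information by **score/token ratio**.
+   - After packing, it reports detailed statistics such as **compression ratio, tokens saved, and budget efficiency**, providing a basis for quantitatively analyzing the effect of a context management strategy.
+
+ ### B. `cross_encoder.py` - Cross-Encoder Manager with Hardened Stability
+ - **Function:** Reliably manages the Cross-Encoder model at the core of the RAG pipeline and prevents an error from bringing down the whole system.
+ - **Details:**
+   - Automatically detects error conditions, such as a missing `sentence-transformers` library or a failed model load, and **handles them with a graceful fallback**.
+   - Instead of halting, the system includes an explicit status such as "disabled" or "error" in the API response, which greatly improves **stability and predictability**.
+
+ ### C. `capsule_logger.py` - Capsule Logger for Transparency
+ - **Function:** Records every processing step as a **Structured Log**, providing transparency and auditability.
+ - **Details:**
+   - Persists all API requests, processing statistics, and system state in a JSONL format called an **"Explain Capsule"**.
+   - This data is essential for later debugging system behavior, analyzing causes of performance degradation, and tracing the basis for the AI's decisions.
+
+ ### D. `server.py` - API Server Integrating the Core Features
+ - **Function:** Ties together all the modules described above (packing, reranking, logging) and exposes them through an easily accessible **FastAPI-based API server**.
+ - **Details:**
+   - The `/process` endpoint accepts a query and context data and **orchestrates the whole flow, from reranking through packing to logging, as a single transaction**.
+   - The `/healthz` endpoint lets external monitoring systems check the server status easily.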
809
+
810
+ ---
811
+
812
+ ## 4. Versioning & Documentation
+
+ - **Version update:** With the core features implemented, the project version was raised from `0.2.1` to **`1.0.1`** to mark the significant progress.
+ - **Change history:** All of the implementation work above is recorded in detail in `CHANGELOG.md`, so users and contributors can easily follow the project's progress.
816
+
817
+ ---
818
+
819
+ ## 5. Expected Impact & Next Steps
+
+ - **Expected impact:**
+   - CRoM-EfficientLLM is no longer just an idea; it has become a **practical tool that can be applied to a real RAG system to test context management efficiency**.
+   - Developers can now obtain **quantitative data** on how to use an LLM's limited context window most efficiently.
+
+ - **Next steps:**
+   - Implement the `crom-demo` and `crom-bench` CLI features described in `README.md`
+   - Add the ability for users to choose their preferred tokenizer
+   - Enhance the benchmark system so the performance of different context management strategies can be compared
829
+
830
+ ---
831
+
832
+ **End of report.**
833
+ ```
834
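The report above describes `cross_encoder.py` as returning an explicit status instead of crashing when the library or model is unavailable. Its source is not shown in this part of the dump, so the following is only a sketch of that graceful-fallback pattern. The helper name `load_cross_encoder`, the `module_name` parameter, and the `"ok"` status are assumptions; `"disabled"`, `"unavailable"`, and `"error"` echo strings that appear in the report and in `capsule_logger.py`.

```python
import importlib
from typing import Any, Optional, Tuple


def load_cross_encoder(
    model_name: Optional[str],
    module_name: str = "sentence_transformers",
) -> Tuple[Optional[Any], str]:
    """Return (model, status) rather than raising. Hypothetical helper."""
    if not model_name:
        # No model configured: reranking is switched off on purpose.
        return None, "disabled"
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        # Library is not installed in this environment.
        return None, "unavailable"
    try:
        return module.CrossEncoder(model_name), "ok"
    except Exception:
        # Model download or initialisation failed; keep the service alive.
        return None, "error"


model, status = load_cross_encoder(None)
print(status)
```

A caller can then pass chunks through unchanged whenever `model` is `None` and include `status` in the API response, which matches the behavior the report describes.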
+ ---
835
+ ### **File:** `D:\\Sanctum\\CRoM-EfficientLLM\\dashboard\\grafana_dashboard.json`
836
+ ```json
837
+ {}
838
+ ```
839
+ ---
840
+ ### **File:** `D:\\Sanctum\\CRoM-EfficientLLM\\dashboard\\prometheus_config.yml`
841
+ ```
842
+
843
+
844
+ ```
845
+ ---
846
+ ### **File:** `D:\\Sanctum\\CRoM-EfficientLLM\\docs\\architecture.md`
847
+ ```markdown
848
+ # Architecture
849
+
850
+ This document outlines the architecture of the CRoM-EfficientLLM project.
851
+ ```
852
+ ---
853
+ ### **File:** `D:\\Sanctum\\CRoM-EfficientLLM\\docs\\versioning.md`
854
+ ```markdown
855
+ # Versioning & PyPI Guidance
856
+
857
+ This document defines package naming, SemVer rules, and a future path to publish to PyPI.
858
+
859
+ ## 1) Package name
860
+ - Distribution name (PyPI): `crom-efficientllm` (lowercase, hyphen-separated)
861
+ - Import name (module): `crom_efficientllm` (PEP 8 underscore)
862
+
863
+ > **Tip**: Keep both names consistent to avoid confusion in docs.
864
+
865
+ ### Check name availability on PyPI
866
+ - Visit: https://pypi.org/project/crom-efficientllm/ (404 → available)
867
+ - If taken, consider: `crom-efficient-llm`, `crom-llm-efficient`, `crom-ctx-pack`
868
+ - Reserve on TestPyPI first: use `test.pypi.org` to validate metadata & upload
869
+
870
+ ## 2) Semantic Versioning (SemVer)
871
+ We follow **MAJOR.MINOR.PATCH**.
872
+
873
+ - **MAJOR**: Backward-incompatible API changes
874
+ - e.g., rename function signatures (`budget_pack`), move/rename modules, change return schemas
875
+ - **MINOR**: Backward-compatible features
876
+ - new functions/flags (e.g., `pack_summary`, CLI subcommands), performance improvements
877
+ - **PATCH**: Backward-compatible bug fixes
878
+ - logic corrections, docs/CI fixes, dependency pin updates without API changes
879
+
880
+ ### Pre-releases
881
+ Use suffixes: `-a.1`, `-b.1`, `-rc.1` (alpha/beta/release-candidate)
882
+ - Example: `0.3.0-rc.1`
883
+
884
+ ### Deprecation Policy
885
+ - Mark deprecated APIs in `CHANGELOG.md` and docstrings
886
+ - Provide at least **one MINOR release** with warnings before removal
887
+
888
+ ### Public API Surface
889
+ We commit to compatibility for:
890
+ - `crom_efficientllm.budget_packer.packer`: `Chunk`, `budget_pack`, `pack_summary`
891
+ - `crom_efficientllm.rerank_engine.rerank`: `hybrid_rerank`
892
+ - `crom_efficientllm.drift_estimator.estimator`: `DriftEstimator`, `DriftMode`
893
+ - CLI entrypoints: `crom-demo`, `crom-bench` and their documented flags
894
+
895
+ ## 3) Release Flow (GitHub → PyPI later)
896
+ - Tag: `vX.Y.Z` → GitHub Actions builds & creates a Release (artifacts attached)
897
+ - Keep `CHANGELOG.md` updated per release
898
+ - After API stabilizes, enable **PyPI publish** using a separate workflow with `PYPI_API_TOKEN` secret
899
+
900
+ ### (Future) PyPI publishing steps
901
+ 1. Create a PyPI account & project
902
+ 2. Add `PYPI_API_TOKEN` to repo `Settings → Secrets and variables → Actions`
903
+ 3. Add `release-pypi.yml` workflow to upload on tag
904
+ 4. Verify install: `pip install crom-efficientllm` and import `crom_efficientllm`
905
+
906
+ ---
907
+ _Last updated: 2025-09-02_
908
+ ```
909
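The MAJOR.MINOR.PATCH rules in `versioning.md` can be illustrated with a small version-bumping helper. This is a sketch only: `parse_version` and `bump` are hypothetical names, and the regular expression accepts just the simple `X.Y.Z` and `X.Y.Z-suffix` forms used in that document, not the full SemVer grammar.

```python
import re
from typing import Tuple

# Accepts "X.Y.Z" with an optional pre-release suffix such as "-rc.1".
_VERSION_RE = re.compile(r"^(\d+)\.(\d+)\.(\d+)(?:-([0-9A-Za-z.-]+))?$")


def parse_version(version: str) -> Tuple[int, int, int, str]:
    """Split a version string into (major, minor, patch, pre_release)."""
    m = _VERSION_RE.match(version)
    if not m:
        raise ValueError(f"not a MAJOR.MINOR.PATCH version: {version!r}")
    major, minor, patch, pre = m.groups()
    return int(major), int(minor), int(patch), pre or ""


def bump(version: str, part: str) -> str:
    """Return the next release for a 'major', 'minor', or 'patch' change."""
    major, minor, patch, _ = parse_version(version)
    if part == "major":  # backward-incompatible API change
        return f"{major + 1}.0.0"
    if part == "minor":  # backward-compatible feature
        return f"{major}.{minor + 1}.0"
    if part == "patch":  # backward-compatible bug fix
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown part: {part!r}")


print(bump("0.2.1", "minor"))
print(parse_version("0.3.0-rc.1"))
```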
+ ---
910
+ ### **File:** `D:\\Sanctum\\CRoM-EfficientLLM\\examples\\corpus\\sample_docs.jsonl`
911
+ ```json
912
+ {"id": 1, "text": "AI ethics and governance frameworks for responsible AI."}
913
+ {"id": 2, "text": "Techniques for detecting model drift in production systems."}
914
+ {"id": 3, "text": "A recipe for sourdough bread and fermentation tips."}
915
+ {"id": 4, "text": "Hybrid search: combining sparse and dense retrieval methods."}
916
+ {"id": 5, "text": "Token budgets and prompt compression strategies for LLMs."}
917
+ {"id": 6, "text": "Monitoring with Prometheus and building Grafana dashboards."}
918
+ ```
919
+ ---
920
+ ### **File:** `D:\\Sanctum\\CRoM-EfficientLLM\\examples\\corpus\\sample_queries.jsonl`
921
+ ```json
922
+ {"query": "how to detect drift in ai models"}
923
+ {"query": "ways to reduce llm token usage"}
924
+ {"query": "observability stack prometheus grafana"}
925
+ ```
926
+ ---
927
+ ### **File:** `D:\\Sanctum\\CRoM-EfficientLLM\\pyproject.toml`
928
+ ```toml
929
+ [build-system]
930
+ requires = ["setuptools>=68", "wheel"]
931
+ build-backend = "setuptools.build_meta"
932
+
933
+ [project]
934
+ name = "crom-efficientllm"
935
+ version = "1.0.1"
936
+ description = "CRoM (Context Rot Mitigation)-EfficientLLM: Budget packing, hybrid rerank, and drift estimation with observability"
937
+ readme = "README.md"
938
+ requires-python = ">=3.9"
939
+ license = { text = "Apache-2.0" }
940
+ authors = [ { name = "Your Name" } ]
941
+ dependencies = [
942
+ "numpy>=1.24,<3",
943
+ "scikit-learn>=1.3,<2",
944
+ "transformers>=4.41,<5",
945
+ "sentence-transformers>=2.2,<3",
946
+ "flask>=3,<4",
947
+ "prometheus-client>=0.20,<1"
948
+ ]
949
+
950
+ [project.optional-dependencies]
951
+ dev = [
952
+ "pytest>=7",
953
+ "ruff>=0.4",
954
+ "black>=24.4",
955
+ "pre-commit>=3.6",
956
+ "matplotlib>=3.8,<4"
957
+ ]
958
+ plugins = [
959
+ "flashrank>=0.2; python_version>='3.9'",
960
+ "llmlingua>=0.2; python_version>='3.9'",
961
+ "evidently>=0.4; python_version>='3.9'"
962
+ ]
963
+ haystack = [
964
+ "farm-haystack[faiss,inference]>=1.26; python_version>='3.9'"
965
+ ]
966
+
967
+ [project.urls]
968
+ Homepage = "https://github.com/Flamehaven/CRoM-Context-Rot-Mitigation--EfficientLLM"
969
+
970
+ [project.scripts]
971
+ "crom-demo" = "crom_efficientllm.demo:main"
972
+ "crom-bench" = "crom_efficientllm.cli:main"
973
+
974
+ [tool.setuptools]
975
+ package-dir = {"" = "src"}
976
+ packages = { find = { where = ["src"] } }
977
+
978
+ [tool.pytest.ini_options]
979
+ addopts = "-q"
980
+
981
+ [tool.black]
982
+ line-length = 100
983
+
984
+ [tool.ruff]
985
+ target-version = "py39"
986
+
987
+ [tool.ruff.lint]
988
+ select = ["E","F","I","UP","B","C4","SIM","PL","PERF","RUF","ANN"]
989
+ ignore = ["ANN101","ANN102"]
990
+
991
+ [tool.ruff.lint.per-file-ignores]
992
+ "tests/*" = ["S101","ANN","PLR2004"]
993
+ ```
994
+ ---
995
+ ### **File:** `D:\\Sanctum\\CRoM-EfficientLLM\\release_notes.md`
996
+ ```markdown
997
+ # Release v0.2.1
998
+
999
+ ## [0.2.1] - 2025-09-02
1000
+ ### Added
1001
+ - CLI `--save-plots` option for `sweep` and `dp-curve`; saves PNG charts to `benchmarks/out/` (or `--out-dir`).
1002
+ - README Quick Examples mention of plotting flag.
1003
+ - This CHANGELOG.
1004
+
1005
+ ### Changed
1006
+ - Dev tooling: recommend `matplotlib` via dev extra for plotting.
1007
+
1008
+ — generated from [CHANGELOG.md](CHANGELOG.md)
1009
+ ```
1010
+ ---
1011
+ ### **File:** `D:\\Sanctum\\CRoM-EfficientLLM\\requirements.txt`
1012
+ ```
1013
+ numpy>=1.24,<3
1014
+ scikit-learn>=1.3,<2
1015
+ transformers>=4.41,<5
1016
+ sentence-transformers>=2.2,<3
1017
+ flask>=3,<4
1018
+ prometheus-client>=0.20,<1
1019
+ ```
1020
+ ---
1021
+ ### **File:** `D:\\Sanctum\\CRoM-EfficientLLM\\scripts\\gen_release_notes.py`
1022
+ ```python
1023
+ #!/usr/bin/env python3
+ from __future__ import annotations
+ import os
+ import re
+ import sys
+ from pathlib import Path
+
+ ROOT = Path(__file__).resolve().parents[1]
+ CHANGELOG = ROOT / "CHANGELOG.md"
+ OUT = ROOT / "release_notes.md"
+
+ def main(tag: str) -> None:
+     version = tag.lstrip("v").strip()
+     if not CHANGELOG.exists():
+         OUT.write_text(f"# Release {tag}\n\n(CHANGELOG.md not found)\n", encoding="utf-8")
+         return
+     text = CHANGELOG.read_text(encoding="utf-8")
+     # Match the heading for this version, e.g. "## [0.2.1] - 2025-09-02" or "## 0.2.1"
+     pat = re.compile(rf"^##\s*\[?{re.escape(version)}\]?[^\n]*$", re.MULTILINE)
+     m = pat.search(text)
+     if not m:
+         OUT.write_text(
+             f"# Release {tag}\n\nSection for {version} not found in CHANGELOG.\n\n" + text,
+             encoding="utf-8",
+         )
+         return
+     start = m.end()
+     # The section ends at the next "## " heading, or at the end of the file
+     m2 = re.search(r"^##\s+", text[start:], re.MULTILINE)
+     end = start + (m2.start() if m2 else len(text) - start)
+     section = text[m.start():end].strip()
+     body = f"# Release {tag}\n\n{section}\n\n— generated from [CHANGELOG.md](CHANGELOG.md)"
+     OUT.write_text(body, encoding="utf-8")
+
+ if __name__ == "__main__":
+     tag = sys.argv[1] if len(sys.argv) > 1 else os.environ.get("GITHUB_REF_NAME", "")
+     if not tag:
+         print("Usage: gen_release_notes.py vX.Y.Z", file=sys.stderr)
+         sys.exit(2)
+     main(tag)
1062
+ ```
1063
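`gen_release_notes.py` cuts one version's section out of `CHANGELOG.md` by locating its `## ` heading and stopping at the next one. The sketch below shows that heading-based extraction on an in-memory string, assuming headings of the form `## [X.Y.Z] - date` as seen in `release_notes.md`. The function name `extract_section` and the sample changelog text are invented for illustration.

```python
import re


def extract_section(changelog: str, version: str) -> str:
    """Return the '## [version] ...' section, or '' if it is not present."""
    # Heading line for the requested version; the brackets are optional.
    heading = re.compile(rf"^##\s*\[?{re.escape(version)}\]?[^\n]*$", re.MULTILINE)
    m = heading.search(changelog)
    if not m:
        return ""
    rest = changelog[m.end():]
    # "### Added" does not match here: the pattern needs whitespace after "##".
    nxt = re.search(r"^##\s+", rest, re.MULTILINE)
    end = m.end() + (nxt.start() if nxt else len(rest))
    return changelog[m.start():end].strip()


# Invented sample text, not the project's real changelog.
SAMPLE = """# Changelog

## [0.2.1] - 2025-09-02
### Added
- Example entry for the newer release.

## [0.2.0] - 2025-01-01
### Added
- Example entry for the older release.
"""

section = extract_section(SAMPLE, "0.2.1")
print(section)
```

One caveat of this simple pattern: a request for `0.2.1` would also match a heading for `0.2.10`, so a stricter pattern would be needed once versions share prefixes.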
+ ---
1064
+ ### **File:** `D:\\Sanctum\\CRoM-EfficientLLM\\scripts\\release.sh`
1065
+ ```bash
1066
+ #!/usr/bin/env bash
1067
+ set -euo pipefail
1068
+
1069
+ TAG=${1:-}
1070
+ if [[ -z "$TAG" ]]; then
1071
+ echo "Usage: scripts/release.sh vX.Y.Z"; exit 1
1072
+ fi
1073
+
1074
+ # sanity checks
1075
+ if [[ -n $(git status --porcelain) ]]; then
1076
+ echo "❌ Working tree not clean"; exit 1
1077
+ fi
1078
+
1079
+ # ensure deps
1080
+ python -m pip install -e .[dev]
1081
+ pre-commit run --all-files
1082
+ pytest -q
1083
+
1084
+ # generate release notes preview from CHANGELOG
1085
+ python scripts/gen_release_notes.py "$TAG"
1086
+ if [[ -f release_notes.md ]]; then
1087
+ echo "--- release_notes.md (preview top 60 lines) ---"
1088
+ head -n 60 release_notes.md || true
1089
+ echo "--- end preview ---"
1090
+ else
1091
+ echo "⚠️ release_notes.md not generated; will fall back to default notes in GH release"
1092
+ fi
1093
+
1094
+ # tag & push
1095
+
1096
+
1097
+ git tag -a "$TAG" -m "Release $TAG"
1098
+ git push origin "$TAG"
1099
+
1100
+ echo "✅ Pushed tag $TAG. GitHub Actions will create the Release automatically."
1101
+ echo "➡️ Watch: https://github.com/Flamehaven/CRoM-EfficientLLM/actions"
1102
+ ```
1103
+ ---
1104
+ ### **File:** `D:\\Sanctum\\CRoM-EfficientLLM\\src\\crom_efficientllm\\__init__.py`
1105
+ ```python
1106
+ """Public API for CRoM-EfficientLLM."""
1107
+ from .budget_packer.packer import Chunk, budget_pack, pack_summary
1108
+ from .rerank_engine.rerank import hybrid_rerank
1109
+ from .drift_estimator.estimator import DriftEstimator, DriftMode
1110
+
1111
+ __all__ = [
1112
+ "Chunk",
1113
+ "budget_pack",
1114
+ "pack_summary",
1115
+ "hybrid_rerank",
1116
+ "DriftEstimator",
1117
+ "DriftMode",
1118
+ ]
1119
+ ```
1120
+ ---
1121
+ ### **File:** `D:\\Sanctum\\CRoM-EfficientLLM\\src\\crom_efficientllm\\budget_packer.py`
1122
+ ```python
1123
+ from typing import List, Dict
+ import logging
+
+ def enhanced_greedy_pack(chunks: List[Dict], budget: int,
+                          score_key: str = "score") -> tuple[List[Dict], Dict]:
+     """
+     Extends the existing greedy_pack function to also return detailed statistics.
+
+     Returns:
+         tuple: (packed_chunks, stats_dict)
+     """
+     if not chunks:
+         return [], {
+             "selected_count": 0,
+             "packed_count": 0,
+             "selected_tokens": 0,
+             "packed_tokens": 0,
+             "compression_ratio": 0.0,
+             "token_savings": 0,
+             "efficiency": 0.0
+         }
+
+     # Precompute token counts (rough heuristic: about 4 characters per token)
+     for chunk in chunks:
+         if "token_count" not in chunk:
+             chunk["token_count"] = max(1, len(chunk.get("text", "")) // 4)
+
+     # Sort by efficiency (score/token ratio)
+     sorted_chunks = sorted(
+         chunks,
+         key=lambda x: x.get(score_key, 0) / x["token_count"],
+         reverse=True
+     )
+
+     # Greedy packing
+     packed_chunks = []
+     used_tokens = 0
+
+     for chunk in sorted_chunks:
+         if used_tokens + chunk["token_count"] <= budget:
+             packed_chunks.append(chunk)
+             used_tokens += chunk["token_count"]
+
+     # Compute detailed statistics
+     total_selected_tokens = sum(chunk["token_count"] for chunk in chunks)
+
+     stats = {
+         "selected_count": len(chunks),
+         "packed_count": len(packed_chunks),
+         "selected_tokens": total_selected_tokens,
+         "packed_tokens": used_tokens,
+         "compression_ratio": len(packed_chunks) / len(chunks) if chunks else 0.0,
+         "token_savings": total_selected_tokens - used_tokens,
+         "efficiency": used_tokens / budget if budget > 0 else 0.0
+     }
+
+     # 📊 Logging added (statistics visibility the previous code lacked)
+     logging.info(f"Packing completed: {stats['packed_count']}/{stats['selected_count']} chunks, "
+                  f"tokens: {stats['packed_tokens']}/{stats['selected_tokens']} "
+                  f"(efficiency: {stats['efficiency']:.1%})")
+
+     return packed_chunks, stats
1185
+ ```
1186
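`enhanced_greedy_pack` orders chunks by score per token rather than by raw score. The sketch below, written under stated assumptions, compares the two orderings on toy data to show why that choice matters under a tight budget. The helper `greedy_pack`, the chunk ids, and the scores are all hypothetical; only the `score` and `token_count` keys follow the file above.

```python
from typing import Callable, Dict, List


def greedy_pack(chunks: List[Dict], budget: int, key: Callable[[Dict], float]) -> List[Dict]:
    """Take chunks in descending `key` order, skipping any that would overflow."""
    packed: List[Dict] = []
    used = 0
    for ch in sorted(chunks, key=key, reverse=True):
        if used + ch["token_count"] <= budget:
            packed.append(ch)
            used += ch["token_count"]
    return packed


# Toy data: one large high-scoring chunk and three small ones.
toy_chunks = [
    {"id": "A", "score": 9, "token_count": 90},
    {"id": "B", "score": 5, "token_count": 30},
    {"id": "C", "score": 5, "token_count": 30},
    {"id": "D", "score": 4, "token_count": 30},
]

by_score = greedy_pack(toy_chunks, 100, key=lambda c: c["score"])
by_density = greedy_pack(toy_chunks, 100, key=lambda c: c["score"] / c["token_count"])

print([c["id"] for c in by_score], sum(c["score"] for c in by_score))
print([c["id"] for c in by_density], sum(c["score"] for c in by_density))
```

Ranking by raw score spends 90 of the 100 tokens on chunk A alone, while ranking by score per token fits the three smaller chunks for a higher total score. Greedy selection is still a heuristic and is not guaranteed to be optimal.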
+ ---
1187
+ ### **File:** `D:\\Sanctum\\CRoM-EfficientLLM\\src\\crom_efficientllm\\capsule_logger.py`
1188
+ ```python
1189
+ import json
+ from pathlib import Path
+ from datetime import datetime
+ from typing import Union, Dict
+ import logging
+
+ class ExplainCapsuleLogger:
+     """Schema-based explain capsule storage system."""
+
+     def __init__(self, log_directory: str = "artifacts/logs"):
+         self.log_dir = Path(log_directory)
+         self.log_dir.mkdir(parents=True, exist_ok=True)
+
+         # Log file paths
+         self.capsules_file = self.log_dir / "explain_capsules.jsonl"
+         self.metrics_file = self.log_dir / "processing_metrics.jsonl"
+         self.errors_file = self.log_dir / "error_log.jsonl"
+
+         logging.info(f"ExplainCapsule Logger initialized: {self.log_dir}")
+
+     def create_explain_capsule(self, query: str, response_data: Dict,
+                                processing_stats: Dict,
+                                cross_encoder_status: str) -> Dict:
+         """Create a schema-compliant explain capsule."""
+
+         capsule = {
+             # 🔖 Metadata (required)
+             "timestamp": datetime.now().isoformat(),
+             "version": "1.0",
+             "processor": "CRoM-Enhanced",
+
+             # 📝 Query information
+             "query": {
+                 "text": query,
+                 "length": len(query),
+                 "token_estimate": len(query) // 4
+             },
+
+             # 📊 Processing statistics (information extended in patch 1)
+             "processing_stats": {
+                 **processing_stats,
+                 "cross_encoder_status": cross_encoder_status
+             },
+
+             # 🔧 System state
+             "system_state": {
+                 "cross_encoder_available": cross_encoder_status not in ["disabled", "unavailable"]
+             },
+
+             # 📦 Original and resulting chunks
+             "chunks": {
+                 "packed": response_data.get("chunks", [])
+             }
+         }
+         return capsule
+
+     def log_capsule(self, capsule: Dict):
+         """Append an explain capsule to the .jsonl file."""
+         try:
+             with open(self.capsules_file, "a", encoding="utf-8") as f:
+                 f.write(json.dumps(capsule, ensure_ascii=False) + "\n")
+         except Exception as e:
+             logging.error(f"Failed to log explain capsule: {e}")
+
+     def log_error(self, error_details: Dict):
+         """Append error information to the .jsonl file."""
+         try:
+             error_details["timestamp"] = datetime.now().isoformat()
+             with open(self.errors_file, "a", encoding="utf-8") as f:
+                 f.write(json.dumps(error_details, ensure_ascii=False) + "\n")
+         except Exception as e:
+             logging.error(f"Failed to log error: {e}")
1261
+ ```
1262
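The logger above appends one JSON object per line (JSONL), so each record can be read back independently. The following is a minimal sketch of that append-and-read cycle in a temporary directory. The helper names `append_jsonl` and `read_jsonl` and the sample records are assumptions for illustration and are not part of the module.

```python
import json
import tempfile
from pathlib import Path
from typing import Dict, List


def append_jsonl(path: Path, record: Dict) -> None:
    """Append one record as a single JSON line."""
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(record, ensure_ascii=False) + "\n")


def read_jsonl(path: Path) -> List[Dict]:
    """Read every non-empty line back as a dict."""
    with open(path, "r", encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


with tempfile.TemporaryDirectory() as tmp:
    log_path = Path(tmp) / "explain_capsules.jsonl"
    # Hypothetical capsule-like records with only a few fields.
    append_jsonl(log_path, {"query": {"text": "first"}, "processing_stats": {"packed_count": 2}})
    append_jsonl(log_path, {"query": {"text": "second"}, "processing_stats": {"packed_count": 0}})
    records = read_jsonl(log_path)

print(len(records), records[0]["query"]["text"])
```

Because every line is a complete JSON document, a partially written final line can be skipped or repaired without affecting earlier records.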
+ ---
1263
+ ### **File:** `D:\\Sanctum\\CRoM-EfficientLLM\\src\\crom_efficientllm\\cli.py`
1264
+ ```python
1265
+ from __future__ import annotations
1266
+
1267
+ import argparse
1268
+ import json
1269
+ import os
1270
+ import time
1271
+ from dataclasses import dataclass
1272
+ from typing import List, Dict, Sequence
1273
+
1274
+ import numpy as np
1275
+ from sklearn.feature_extraction.text import TfidfVectorizer
1276
+ from sklearn.metrics.pairwise import cosine_similarity
1277
+
1278
+ from crom_efficientllm.budget_packer.packer import budget_pack, Chunk
1279
+ from crom_efficientllm.rerank_engine.rerank import hybrid_rerank
1280
+
1281
+ try:
1282
+ from sentence_transformers import SentenceTransformer
1283
+ except Exception: # pragma: no cover
1284
+ SentenceTransformer = None # type: ignore
1285
+
1286
+ # Optional plugins are imported lazily when flags are set
1287
+
1288
+ @dataclass
1289
+ class Doc:
1290
+ id: str
1291
+ text: str
1292
+
1293
+ def load_jsonl(path: str) -> List[Dict]:
1294
+ with open(path, "r", encoding="utf-8") as f:
1295
+ return [json.loads(line) for line in f]
1296
+
1297
+ def build_corpus(path: str) -> List[Doc]:
1298
+ rows = load_jsonl(path)
1299
+ return [Doc(id=str(r.get("id", i)), text=str(r["text"])) for i, r in enumerate(rows)]
1300
+
1301
+ def sparse_retrieval(query: str, corpus: Sequence[Doc], k: int = 100) -> List[Dict]:
1302
+ texts = [d.text for d in corpus]
1303
+ vect = TfidfVectorizer(ngram_range=(1, 2)).fit(texts)
1304
+ D = vect.transform(texts)
1305
+ Q = vect.transform([query])
1306
+ sims = cosine_similarity(Q, D).ravel()
1307
+ order = np.argsort(-sims)[:k]
1308
+ return [{"id": corpus[i].id, "text": corpus[i].text, "score_sparse": float(sims[i])} for i in order]
1309
+
1310
+ def dense_embed_model(name: str):
1311
+ if SentenceTransformer is None:
1312
+ raise RuntimeError("sentence-transformers not installed. Install with `pip install -e .`.")
1313
+ return SentenceTransformer(name)
1314
+
1315
+ def _apply_flashrank(query: str, docs: List[Dict], model_name: str) -> List[Dict]:
1316
+ try:
1317
+ from crom_efficientllm.plugins.flashrank_reranker import flashrank_rerank
1318
+ except Exception as e: # pragma: no cover
1319
+ raise RuntimeError("FlashRank plugin not available. Install extras: pip install .[plugins]") from e
1320
+ ranked = flashrank_rerank(query, docs, model_name=model_name)
1321
+ # Normalize plugin score to 0..1 and put into score_final
1322
+ scores = np.array([d.get("score_flashrank", 0.0) for d in ranked], dtype=np.float32)
1323
+ if scores.size and float(scores.max() - scores.min()) > 1e-12:
1324
+ s = (scores - scores.min()) / (scores.max() - scores.min())
1325
+ else:
1326
+ s = np.zeros_like(scores)
1327
+ for i, d in enumerate(ranked):
1328
+ d["score_final"] = float(s[i])
1329
+ return ranked
1330
+
1331
+ def _apply_llmlingua(text: str, ratio: float) -> str:
1332
+ try:
1333
+ from crom_efficientllm.plugins.llmlingua_compressor import compress_prompt
1334
+ except Exception as e: # pragma: no cover
1335
+ raise RuntimeError("LLMLingua plugin not available. Install extras: pip install .[plugins]") from e
1336
+ return compress_prompt(text, target_ratio=ratio)
1337
+
1338
+ def _save_evidently_report(all_embs: List[List[float]], out_html: str) -> None:
1339
+ try:
1340
+ from crom_efficientllm.plugins.evidently_drift import drift_report
1341
+ except Exception as e: # pragma: no cover
1342
+ raise RuntimeError("Evidently plugin not available. Install extras: pip install .[plugins]") from e
1343
+ n = len(all_embs)
1344
+ if n < 4:
1345
+ return
1346
+ ref = all_embs[: n // 2]
1347
+ cur = all_embs[n // 2 :]
1348
+ rep = drift_report(ref, cur)
1349
+ rep.save_html(out_html)
1350
+
1351
+ def mock_llm_generate(prompt: str) -> str:
1352
+ time.sleep(0.005) # simulate small latency
1353
+ return "[MOCK] " + prompt[:160]
1354
+
1355
+ def e2e(args: argparse.Namespace) -> None:
1356
+ corpus = build_corpus(args.corpus)
1357
+ queries = [r["query"] for r in load_jsonl(args.queries)]
1358
+ embed = dense_embed_model(args.model)
1359
+ all_embs: List[List[float]] = []
1360
+
1361
+ t0 = time.perf_counter()
1362
+ all_rows = []
1363
+ for q in queries:
1364
+ t_s = time.perf_counter()
1365
+ cands = sparse_retrieval(q, corpus, k=args.k)
1366
+ t_sparse = (time.perf_counter() - t_s) * 1000
1367
+
1368
+ t_r = time.perf_counter()
1369
+ if args.use_flashrank:
1370
+ reranked = _apply_flashrank(q, cands, args.flashrank_model)
1371
+ else:
1372
+ reranked = hybrid_rerank(q, cands, embed, alpha=args.alpha)
1373
+ t_rerank = (time.perf_counter() - t_r) * 1000
1374
+
1375
+ # token heuristic + budget pack
1376
+ chunks = [
1377
+ Chunk(text=d["text"], score=d.get("score_final", d.get("score_sparse", 0.0)), tokens=max(1, len(d["text"]) // 4))
1378
+ for d in reranked
1379
+ ]
1380
+ budget_tokens = int(sum(c.tokens for c in chunks) * args.budget)
1381
+ t_p = time.perf_counter()
1382
+ packed = budget_pack(chunks, budget=budget_tokens)
1383
+ t_pack = (time.perf_counter() - t_p) * 1000
1384
+
1385
+ prompt = "\n\n".join(c.text for c in packed) + f"\n\nQ: {q}\nA:"
1386
+ if args.use_llmlingua:
1387
+ prompt = _apply_llmlingua(prompt, ratio=args.compress_ratio)
1388
+
1389
+ # collect embeddings for drift snapshot (mean-pooled)
1390
+ with np.errstate(all="ignore"):
1391
+ if len(packed) > 0:
1392
+ doc_embs = embed.encode([c.text for c in packed], convert_to_numpy=True)
1393
+ vec = np.mean(doc_embs, axis=0).tolist()
1394
+ all_embs.append(vec)
1395
+
1396
+ t_l = time.perf_counter()
1397
+ _ = mock_llm_generate(prompt)
1398
+ t_llm = (time.perf_counter() - t_l) * 1000
1399
+
1400
+ total = (time.perf_counter() - t_s) * 1000
1401
+ all_rows.append({
1402
+ "query": q,
1403
+ "sparse_ms": t_sparse,
1404
+ "rerank_ms": t_rerank,
1405
+ "pack_ms": t_pack,
1406
+ "llm_ms": t_llm,
1407
+ "total_ms": total,
1408
+ "packed_tokens": sum(c.tokens for c in packed),
1409
+ "orig_tokens": sum(c.tokens for c in chunks),
1410
+ "save_ratio": 1 - (sum(c.tokens for c in packed) / max(1, sum(c.tokens for c in chunks))),
1411
+ "used_flashrank": bool(args.use_flashrank),
1412
+ "used_llmlingua": bool(args.use_llmlingua),
1413
+ })
1414
+
1415
+ elapsed = (time.perf_counter() - t0) * 1000
1416
+ os.makedirs(args.out_dir, exist_ok=True)
1417
+ out_path = os.path.join(args.out_dir, "e2e_results.jsonl")
1418
+ with open(out_path, "w", encoding="utf-8") as f:
1419
+ for r in all_rows:
1420
+ f.write(json.dumps(r, ensure_ascii=False) + "\n")
1421
+ print(f"saved results -> {out_path} ({len(all_rows)} queries) ; elapsed={elapsed:.2f}ms")
1422
+
1423
+ if args.use_evidently and all_embs:
1424
+ html_path = os.path.join(args.out_dir, "evidently_report.html")
1425
+ _save_evidently_report(all_embs, html_path)
1426
+ print(f"evidently report -> {html_path}")
1427
+
1428
+ def budget_sweep(args: argparse.Namespace) -> None:
1429
+ import itertools
1430
+ corpus = build_corpus(args.corpus)
1431
+ queries = [r["query"] for r in load_jsonl(args.queries)][: args.max_q]
1432
+ embed = dense_embed_model(args.model)
1433
+
1434
+ budgets = [b / 100.0 for b in range(args.b_min, args.b_max + 1, args.b_step)]
1435
+ rows = []
1436
+ for q, b in itertools.product(queries, budgets):
1437
+ cands = sparse_retrieval(q, corpus, k=args.k)
1438
+ reranked = hybrid_rerank(q, cands, embed, alpha=args.alpha)
1439
+ chunks = [Chunk(text=d["text"], score=d["score_final"], tokens=max(1, len(d["text"]) // 4)) for d in reranked]
1440
+ budget_tokens = int(sum(c.tokens for c in chunks) * b)
1441
+ packed = budget_pack(chunks, budget=budget_tokens)
1442
+ rows.append({
1443
+ "query": q,
1444
+ "budget": b,
1445
+ "packed_tokens": sum(c.tokens for c in packed),
1446
+ "orig_tokens": sum(c.tokens for c in chunks),
1447
+ "save_ratio": 1 - (sum(c.tokens for c in packed) / max(1, sum(c.tokens for c in chunks))),
1448
+ "avg_score": float(np.mean([c.score for c in packed])) if packed else 0.0,
1449
+ })
1450
+
1451
+ os.makedirs(args.out_dir, exist_ok=True)
1452
+ out_path = os.path.join(args.out_dir, "budget_sweep.jsonl")
1453
+ with open(out_path, "w", encoding="utf-8") as f:
1454
+ for r in rows:
1455
+ f.write(json.dumps(r, ensure_ascii=False) + "\n")
1456
+ print(f"saved results -> {out_path} ; points={len(rows)}")
1457
+
1458
+ if args.save_plots:
1459
+ try:
1460
+ import matplotlib.pyplot as _plt
1462
+ except Exception:
1463
+ print("[warn] matplotlib not installed; install dev extras: pip install -e .[dev]")
1464
+ else:
1465
+ # Aggregate by budget
1466
+ import collections
1467
+ agg = collections.defaultdict(list)
1468
+ for r in rows:
1469
+ agg[r["budget"]].append(r)
1470
+ budgets_sorted = sorted(agg.keys())
1471
+ avg_save = [float(np.mean([x["save_ratio"] for x in agg[b]])) for b in budgets_sorted]
1472
+ avg_score = [float(np.mean([x["avg_score"] for x in agg[b]])) for b in budgets_sorted]
1473
+
1474
+ _plt.figure()
1475
+ _plt.plot([b * 100 for b in budgets_sorted], [s * 100 for s in avg_save], marker="o")
1476
+ _plt.xlabel("Budget (%)")
1477
+ _plt.ylabel("Avg Save Ratio (%)")
1478
+ _plt.title("Budget Sweep: Save Ratio vs Budget")
1479
+ _plt.grid(True)
1480
+ _plt.tight_layout()
1481
+ _plt.savefig(os.path.join(args.out_dir, "budget_sweep.png"))
1482
+
1483
+ _plt.figure()
1484
+ _plt.plot([s * 100 for s in avg_save], avg_score, marker="o")
1485
+ _plt.xlabel("Save Ratio (%)")
1486
+ _plt.ylabel("Avg Score (packed)")
1487
+ _plt.title("Pareto: Quality vs Savings")
1488
+ _plt.grid(True)
1489
+ _plt.tight_layout()
1490
+ _plt.savefig(os.path.join(args.out_dir, "budget_pareto.png"))
1491
+ print("plots ->", os.path.join(args.out_dir, "budget_sweep.png"), ",", os.path.join(args.out_dir, "budget_pareto.png"))
1492
+
1493
+ def scaling(args: argparse.Namespace) -> None:
1494
+ def make_synth(n: int, seed: int = 42):
1495
+ rng = np.random.default_rng(seed)
1496
+ tokens = np.clip(rng.lognormal(4.0, 0.6, n).astype(int), 5, 2000)
1497
+ score = rng.normal(0, 1, n)
1498
+ return [Chunk(text="x" * int(t * 4), score=float(s), tokens=int(t)) for s, t in zip(score, tokens)]
1499
+
1500
+ for n in [1000, 5000, 10000, 20000, 50000, 100000]:
1501
+ if n > args.n_max:
1502
+ break
1503
+ chunks = make_synth(n)
1504
+ budget = int(sum(c.tokens for c in chunks) * args.budget)
1505
+ t0 = time.perf_counter()
1506
+ _ = budget_pack(chunks, budget)
1507
+ ms = (time.perf_counter() - t0) * 1000
1508
+ print(f"n={n:6d} budget={args.budget:.0%} time={ms:8.2f} ms")
1509
+
1510
+ def dp_curve(args: argparse.Namespace) -> None:
1511
+ def make_synth(n: int, seed: int = 123, corr: float = 0.6):
1512
+ rng = np.random.default_rng(seed)
1513
+ true_rel = rng.normal(0, 1, n)
1514
+ noise = rng.normal(0, 1, n) * np.sqrt(1 - corr**2)
1515
+ score = corr * true_rel + noise
1516
+ tokens = np.clip(rng.lognormal(4.0, 0.6, n).astype(int), 5, 2000)
1517
+ chunks = [Chunk(text="x" * int(t * 4), score=float(s), tokens=int(t)) for s, t in zip(score, tokens)]
1518
+ return chunks, true_rel
1519
+
1520
+ def optimal(chunks: Sequence[Chunk], values: np.ndarray, budget: int) -> float:
1521
+ B = budget
1522
+ dp = np.zeros(B + 1, dtype=np.float32)
1523
+ for i, ch in enumerate(chunks):
1524
+ wt = ch.tokens
1525
+ val = max(0.0, float(values[i]))
1526
+ for b in range(B, wt - 1, -1):
1527
+ dp[b] = max(dp[b], dp[b - wt] + val)
1528
+ return float(dp[B])
1529
+
1530
+ chunks, true_rel = make_synth(args.n)
1531
+ total = sum(c.tokens for c in chunks)
1532
+ budgets = [int(total * b / 100.0) for b in range(args.b_min, args.b_max + 1, args.b_step)]
1533
+ out_rows = []
1534
+
1535
+ for B in budgets:
1536
+ sel = budget_pack(chunks, B)
1537
+ idx_map = {id(c): i for i, c in enumerate(chunks)}
1538
+ rel_bp = float(np.sum([max(0.0, true_rel[idx_map[id(c)]]) for c in sel]))
1539
+ rel_opt = optimal(chunks[: args.n_opt], true_rel[: args.n_opt], min(B, sum(c.tokens for c in chunks[: args.n_opt])))
1540
+ pct = rel_bp / max(rel_opt, 1e-9)
1541
+ out_rows.append({"budget": B, "pct": pct, "rel_bp": rel_bp, "rel_opt": rel_opt})
1542
+ print(f"budget={B:8d} rel_bp={rel_bp:8.3f} rel_opt≈{rel_opt:8.3f} pct≈{pct*100:5.1f}% (subset n={args.n_opt})")
1543
+
1544
+ if args.save_plots:
1545
+ try:
1546
+ import matplotlib.pyplot as plt # noqa: F401
1547
+ import matplotlib.pyplot as _plt
1548
+ except Exception:
1549
+ print("[warn] matplotlib not installed; install dev extras: pip install -e .[dev]")
1550
+ else:
1551
+ _plt.figure()
1552
+ xs = [r["budget"] * 100.0 / total for r in out_rows]
1553
+ ys = [r["pct"] * 100 for r in out_rows]
1554
+ _plt.plot(xs, ys, marker="o")
1555
+ _plt.xlabel("Budget (%)")
1556
+ _plt.ylabel("% of optimal (subset)")
1557
+ _plt.title("DP Curve: Greedy vs Optimal")
1558
+ _plt.grid(True)
1559
+ _plt.tight_layout()
1560
+ os.makedirs(args.out_dir, exist_ok=True)
1561
+ _plt.savefig(os.path.join(args.out_dir, "dp_curve.png")),
1562
+ print("plot ->", os.path.join(args.out_dir, "dp_curve.png")),
1563
+
1564
+ def compare_haystack(args: argparse.Namespace) -> None:
1565
+ try:
1566
+ from haystack.nodes import BM25Retriever, SentenceTransformersRetriever
1567
+ from haystack.document_stores import InMemoryDocumentStore
1568
+ except Exception as e: # pragma: no cover
1569
+ raise RuntimeError("Install extras: pip install .[haystack]") from e
1570
+
1571
+ corpus = build_corpus(args.corpus)
1572
+ docs = [{"content": d.text, "meta": {"id": d.id}} for d in corpus]
1573
+ store = InMemoryDocumentStore(use_bm25=True)
1574
+ store.write_documents(docs)
1575
+
1576
+ bm25 = BM25Retriever(document_store=store)
1577
+ dretr = SentenceTransformersRetriever(document_store=store, model_name_or_path=args.model)
1578
+
1579
+ queries = [r["query"] for r in load_jsonl(args.queries)][: args.max_q]
1580
+ for q in queries:
1581
+ t0 = time.perf_counter()
1582
+ bm = bm25.retrieve(q, top_k=args.k)
1583
+ dn = dretr.retrieve(q, top_k=args.k)
1584
+ ms = (time.perf_counter() - t0) * 1000
1585
+ print(f"{q[:40]:40s} bm25={len(bm):3d} dense={len(dn):3d} time={ms:7.2f} ms")
1586
+
1587
+ def main() -> None:
1588
+ ap = argparse.ArgumentParser(prog="crom-bench")
1589
+ sub = ap.add_subparsers(dest="cmd", required=True)
1590
+
1591
+ p = sub.add_parser("e2e", help="end-to-end: retrieval → rerank → pack → mock LLM")
1592
+ p.add_argument("--corpus", default="examples/corpus/sample_docs.jsonl")
1593
+ p.add_argument("--queries", default="examples/corpus/sample_queries.jsonl")
1594
+ p.add_argument("--model", default="sentence-transformers/all-MiniLM-L6-v2")
1595
+ p.add_argument("--k", type=int, default=200)
1596
+ p.add_argument("--alpha", type=float, default=0.5)
1597
+ p.add_argument("--budget", type=float, default=0.3)
1598
+ # plugins
1599
+ p.add_argument("--use-flashrank", action="store_true")
1600
+ p.add_argument("--flashrank-model", default="ms-marco-TinyBERT-L-2-v2")
1601
+ p.add_argument("--use-llmlingua", action="store_true")
1602
+ p.add_argument("--compress-ratio", type=float, default=0.6)
1603
+ p.add_argument("--use-evidently", action="store_true")
1604
+
1605
+ p.add_argument("--out-dir", default="benchmarks/out")
1606
+ p.set_defaults(func=e2e)
1607
+
1608
+ p2 = sub.add_parser("sweep", help="budget sweep + Pareto csv")
1609
+ p2.add_argument("--corpus", default="examples/corpus/sample_docs.jsonl")
1610
+ p2.add_argument("--queries", default="examples/corpus/sample_queries.jsonl")
1611
+ p2.add_argument("--model", default="sentence-transformers/all-MiniLM-L6-v2")
1612
+ p2.add_argument("--k", type=int, default=200)
1613
+ p2.add_argument("--alpha", type=float, default=0.5)
1614
+ p2.add_argument("--b-min", type=int, default=10)
1615
+ p2.add_argument("--b-max", type=int, default=90)
1616
+ p2.add_argument("--b-step", type=int, default=10)
1617
+ p2.add_argument("--max-q", type=int, default=20)
1618
+ p2.add_argument("--out-dir", default="benchmarks/out")
1619
+ p2.add_argument("--save-plots", action="store_true")
1620
+ p2.set_defaults(func=budget_sweep)
1621
+
1622
+ p3 = sub.add_parser("scale", help="scaling runtime with synthetic data")
1623
+ p3.add_argument("--n-max", type=int, default=100000)
1624
+ p3.add_argument("--budget", type=float, default=0.3)
1625
+ p3.set_defaults(func=scaling)
1626
+
1627
+ p4 = sub.add_parser("dp-curve", help="% of optimal vs budget (synthetic)")
1628
+ p4.add_argument("--n", type=int, default=2000)
1629
+ p4.add_argument("--n-opt", type=int, default=200)
1630
+ p4.add_argument("--b-min", type=int, default=10)
1631
+ p4.add_argument("--b-max", type=int, default=90)
1632
+ p4.add_argument("--b-step", type=int, default=10)
1633
+ p4.add_argument("--out-dir", default="benchmarks/out")
1634
+ p4.add_argument("--save-plots", action="store_true")
1635
+ p4.set_defaults(func=dp_curve)
1636
+
1637
+ p5 = sub.add_parser("haystack-compare", help="compare BM25 vs dense retrievers (Haystack)")
1638
+ p5.add_argument("--corpus", default="examples/corpus/sample_docs.jsonl")
1639
+ p5.add_argument("--queries", default="examples/corpus/sample_queries.jsonl")
1640
+ p5.add_argument("--model", default="sentence-transformers/all-MiniLM-L6-v2")
1641
+ p5.add_argument("--k", type=int, default=50)
1642
+ p5.add_argument("--max-q", type=int, default=10)
1643
+ p5.set_defaults(func=compare_haystack)
1644
+
1645
+ args = ap.parse_args()
1646
+ args.func(args)
1647
+
1648
+ if __name__ == "__main__":
1649
+ main()
1650
+ ```
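
The `dp-curve` command compares greedy, score-ordered packing against a 0/1 knapsack optimum. The following is a minimal, dependency-free sketch of that comparison on a toy input. The names `greedy_value` and `knapsack_optimum` and the three sample items are illustrative assumptions, not part of the repository.

```python
from typing import List, Tuple

Item = Tuple[float, int]  # (value, tokens): hypothetical stand-in for a Chunk


def greedy_value(items: List[Item], budget: int) -> float:
    """Take items by descending value (ties: fewer tokens first), skipping any that do not fit."""
    total_tokens = 0
    total_value = 0.0
    for value, tokens in sorted(items, key=lambda it: (-it[0], it[1])):
        if total_tokens + tokens <= budget:
            total_tokens += tokens
            total_value += max(0.0, value)  # negative values contribute nothing
    return total_value


def knapsack_optimum(items: List[Item], budget: int) -> float:
    """0/1 knapsack over integer token budgets; negative values are clamped to 0."""
    dp = [0.0] * (budget + 1)
    for value, tokens in items:
        gain = max(0.0, value)
        # Iterate budgets downwards so each item is used at most once.
        for b in range(budget, tokens - 1, -1):
            dp[b] = max(dp[b], dp[b - tokens] + gain)
    return dp[budget]


items = [(0.9, 60), (0.8, 30), (0.7, 30)]
budget = 60
greedy = greedy_value(items, budget)
best = knapsack_optimum(items, budget)
print(f"greedy={greedy:.2f} optimal={best:.2f} ratio={greedy / best:.0%}")
```

On this input the greedy pass spends the whole budget on the single 0.9 item, while the optimum takes the two 30-token items, so the ratio falls below 100%. Note that the benchmark itself computes the optimum on a subset of `--n-opt` chunks only, so its reported percentage is an approximation.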
+ ---
+ ### **File:** `D:\\Sanctum\\CRoM-EfficientLLM\\src\\crom_efficientllm\\cross_encoder.py`
+ ```python
+ from typing import List, Optional
+ import logging
+
+ class SafeCrossEncoderManager:
+     """Class that explicitly manages Cross-Encoder state."""
+
+     def __init__(self, model_name: Optional[str] = None, device: str = "cpu"):
+         self.model_name = model_name
+         self.device = device
+         self.model = None
+         self.status = "unknown"
+         self.last_error = None
+
+         self._initialize()
+
+     def _initialize(self):
+         """Initialize the Cross-Encoder with detailed status tracking."""
+         if not self.model_name:
+             self.status = "disabled"
+             logging.info("Cross-Encoder: DISABLED (no model specified)")
+             return
+
+         try:
+             # Check that sentence-transformers can be imported
+             from sentence_transformers import CrossEncoder
+
+             # Try to load the model
+             self.model = CrossEncoder(self.model_name, device=self.device)
+             self.status = f"active ({self.model_name})"
+
+             # 🆕 Detailed logging on success
+             logging.info("Cross-Encoder: ACTIVE")
+             logging.info(f" └─ Model: {self.model_name}")
+             logging.info(f" └─ Device: {self.device}")
+
+         except ImportError as e:
+             self.status = "unavailable (sentence-transformers not installed)"
+             self.last_error = str(e)
+
+             # 🆕 Clear guidance when the dependency is missing
+             logging.warning("Cross-Encoder: UNAVAILABLE")
+             logging.warning(" └─ Reason: sentence-transformers not installed")
+             logging.warning(" └─ Install: pip install sentence-transformers")
+
+         except Exception as e:
+             self.status = f"error ({type(e).__name__})"
+             self.last_error = str(e)
+
+             # 🆕 Detailed logging for any other error
+             logging.error("Cross-Encoder: ERROR")
+             logging.error(f" └─ Model: {self.model_name}")
+             logging.error(f" └─ Error: {str(e)}")
+
+     def get_status_for_response(self) -> str:
+         """Status string for API responses."""
+         return self.status
+
+     def rerank(self, query: str, documents: List[str]) -> List[float]:
+         """Safe reranking with status logging."""
+         if self.model is None:
+             # 🆕 Explicitly log the inactive state
+             logging.debug(f"Cross-Encoder rerank skipped: {self.status}")
+             return [0.5] * len(documents)  # neutral scores
+
+         try:
+             pairs = [(query, doc) for doc in documents]
+             scores = self.model.predict(pairs)
+
+             # 🆕 Log successful reranking
+             logging.debug(f"Cross-Encoder reranked {len(documents)} documents")
+
+             return scores.tolist() if hasattr(scores, 'tolist') else list(scores)
+
+         except Exception as e:
+             # 🆕 Detailed logging on runtime error
+             logging.error(f"Cross-Encoder rerank failed: {str(e)}")
+             logging.error(" └─ Fallback: returning neutral scores")
+             return [0.5] * len(documents)
+ ```
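
`SafeCrossEncoderManager.rerank` degrades to neutral scores instead of raising when no model is loaded or `predict` fails. The sketch below isolates that fallback pattern without requiring `sentence-transformers`. The names `PairScorer`, `rerank_or_neutral`, `LengthModel`, and `BrokenModel` are hypothetical stand-ins introduced for illustration only.

```python
import logging
from typing import List, Optional, Protocol, Sequence, Tuple


class PairScorer(Protocol):
    """Anything with a predict(pairs) method, e.g. a cross-encoder (assumed interface)."""

    def predict(self, pairs: Sequence[Tuple[str, str]]) -> Sequence[float]: ...


def rerank_or_neutral(model: Optional[PairScorer], query: str, documents: List[str]) -> List[float]:
    """Return model scores, or a neutral 0.5 per document if scoring is unavailable."""
    if model is None:
        return [0.5] * len(documents)
    try:
        return [float(s) for s in model.predict([(query, d) for d in documents])]
    except Exception as exc:  # mirror the manager: never propagate scoring errors
        logging.error("rerank failed, returning neutral scores: %s", exc)
        return [0.5] * len(documents)


class LengthModel:
    """Toy scorer: longer documents score higher."""

    def predict(self, pairs):
        return [len(doc) / 10 for _, doc in pairs]


class BrokenModel:
    """Toy scorer that always fails, to exercise the fallback path."""

    def predict(self, pairs):
        raise RuntimeError("model crashed")


docs = ["short", "a longer doc"]
print(rerank_or_neutral(None, "q", docs))
print(rerank_or_neutral(LengthModel(), "q", docs))
print(rerank_or_neutral(BrokenModel(), "q", docs))
```

One consequence of this design is that a fallback gives every document the same score, so any downstream ordering is then decided purely by tie-breakers such as token count or input order.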
+ ---
+ ### **File:** `D:\\Sanctum\\CRoM-EfficientLLM\\src\\crom_efficientllm\\demo.py`
+ ```python
+ """
+ Demo & Metrics Server for CRoM-EfficientLLM
+ ------------------------------------------
+ - `crom-demo demo` : run sample pipeline
+ - `crom-demo serve` : start Flask + Prometheus metrics on :8000
+ """
+ from __future__ import annotations
+
+ import argparse
+ from typing import List
+
+ from flask import Flask, Response
+ from prometheus_client import Counter, Gauge, generate_latest, CONTENT_TYPE_LATEST
+
+ from crom_efficientllm.budget_packer.packer import budget_pack, pack_summary, Chunk
+ from crom_efficientllm.rerank_engine.rerank import hybrid_rerank
+ from crom_efficientllm.drift_estimator.estimator import DriftEstimator, DriftMode
+
+ # ---- Prometheus metrics ----
+ TOKENS_SAVED = Gauge("crom_tokens_saved", "Tokens saved by budget packer")
+ DRIFT_ALERTS = Counter("crom_drift_alerts_total", "Total drift alerts emitted")
+
+ class DummyEmbed:
+     def encode(self, text_or_list, convert_to_numpy=False):
+         if isinstance(text_or_list, list):
+             return [self.encode(t) for t in text_or_list]
+         vec = [ord(c) % 7 for c in str(text_or_list)[:16]]
+         while len(vec) < 16:
+             vec.append(0)
+         return vec
+
+ def run_demo() -> None:
+     chunks: List[Chunk] = [
+         Chunk(text="AI ethics is crucial", score=0.9, tokens=50),
+         Chunk(text="Unrelated text", score=0.2, tokens=40),
+         Chunk(text="Drift detection research", score=0.8, tokens=60),
+     ]
+     packed = budget_pack(chunks, budget=80)
+     summary = pack_summary(packed)
+     print("Packed:", [c.text for c in packed], summary)
+
+     docs = [{"text": "AI drift measurement"}, {"text": "Cooking recipes"}]
+     reranked = hybrid_rerank("AI ethics", docs, DummyEmbed(), alpha=0.5)
+     print("Reranked:", [d["text"] for d in reranked])
+
+     de = DriftEstimator(threshold=0.5, mode=DriftMode.L2)
+     print("Drift state:", de.state())
+     print("Drift alert?", de.update([1, 2, 3]))
+     print("Drift alert?", de.update([10, 10, 10]))
+     print("Drift state:", de.state())
+
+     # Update metrics
+     TOKENS_SAVED.set(max(0, sum(c.tokens for c in chunks) - summary["tokens"]))
+     alert1, *_ = de.update([1, 2, 3])
+     alert2, *_ = de.update([10, 10, 10])
+     if alert1:
+         DRIFT_ALERTS.inc()
+     if alert2:
+         DRIFT_ALERTS.inc()
+
+ def create_app() -> Flask:
+     app = Flask(__name__)
+
+     @app.get("/healthz")
+     def healthz():
+         return {"status": "ok"}
+
+     @app.get("/metrics")
+     def metrics():
+         return Response(generate_latest(), mimetype=CONTENT_TYPE_LATEST)
+
+     return app
+
+ def main() -> None:
+     parser = argparse.ArgumentParser(prog="crom-demo")
+     sub = parser.add_subparsers(dest="cmd", required=True)
+     sub.add_parser("demo", help="run sample pipeline")
+
+     pserve = sub.add_parser("serve", help="start metrics server on :8000")
+     pserve.add_argument("--host", default="0.0.0.0")
+     pserve.add_argument("--port", type=int, default=8000)
+
+     args = parser.parse_args()
+
+     if args.cmd == "demo":
+         run_demo()
+         return
+
+     if args.cmd == "serve":
+         app = create_app()
+         app.run(host=args.host, port=args.port)
+         return
+
+ if __name__ == "__main__":
+     main()
+ ```
+ ---
+ ### **File:** `D:\\Sanctum\\CRoM-EfficientLLM\\src\\crom_efficientllm\\server.py`
+ ```python
+ from fastapi import FastAPI, HTTPException
+ import time
+ from typing import List, Dict
+ import logging
+
+ # Internal module imports
+ # NOTE: `enhanced_greedy_pack` is not among the names exported by
+ # budget_packer/__init__.py as listed in this report.
+ from .budget_packer import enhanced_greedy_pack
+ from .cross_encoder import SafeCrossEncoderManager
+ from .capsule_logger import ExplainCapsuleLogger
+
+ # --- FastAPI app and core component initialization ---
+
+ app = FastAPI(
+     title="CRoM-EfficientLLM Server",
+     description="Context Reranking and Management for Efficient LLMs",
+     version="1.0.1"
+ )
+
+ logging.basicConfig(level=logging.INFO)
+
+ # Instantiate components
+ # TODO: load the model name and similar settings from a config file (config.yaml)
+ ce_manager = SafeCrossEncoderManager(model_name="ms-marco-TinyBERT-L-2-v2")
+ capsule_logger = ExplainCapsuleLogger(log_directory="artifacts/logs")
+
+
+ # --- Response schema and helper functions ---
+
+ class ProcessResponseV2:
+     """Helper for the extended /process endpoint response schema."""
+
+     @staticmethod
+     def create_response(query: str, packed_chunks: List[Dict],
+                         processing_stats: Dict, cross_encoder_status: str,
+                         processing_time: float) -> Dict:
+         """Build the improved response."""
+
+         response = {
+             "success": True,
+             "query": query,
+             "chunks": packed_chunks,
+             "stats": processing_stats,  # packing statistics
+             "meta": {
+                 "cross_encoder_status": cross_encoder_status,
+                 "processing_time_ms": processing_time * 1000,
+                 "timestamp": time.time()
+             }
+         }
+         return response
+
+ # --- API endpoint definitions ---
+
+ @app.post("/process", summary="Rerank and pack text chunks")
+ def process_chunks(query: str, chunks: List[Dict], budget: int = 4096):
+     """
+     Rerank the given chunks against the query and pack them within the budget.
+     """
+     start_time = time.time()
+
+     try:
+         # 1. Rerank with the Cross-Encoder (when active)
+         doc_texts = [chunk.get("text", "") for chunk in chunks]
+         scores = ce_manager.rerank(query, doc_texts)
+         for chunk, score in zip(chunks, scores):
+             chunk["score"] = score
+
+         # 2. Pack within the budget
+         packed_chunks, stats = enhanced_greedy_pack(chunks, budget=budget, score_key="score")
+
+         # 3. Build the final response
+         processing_time = time.time() - start_time
+         response_data = ProcessResponseV2.create_response(
+             query=query,
+             packed_chunks=packed_chunks,
+             processing_stats=stats,
+             cross_encoder_status=ce_manager.get_status_for_response(),
+             processing_time=processing_time
+         )
+
+         # 4. Log the explain capsule
+         capsule = capsule_logger.create_explain_capsule(
+             query=query,
+             response_data=response_data,
+             processing_stats=stats,
+             cross_encoder_status=ce_manager.get_status_for_response()
+         )
+         capsule_logger.log_capsule(capsule)
+
+         return response_data
+
+     except Exception as e:
+         logging.error(f"Error during /process: {e}", exc_info=True)
+         # Log the error
+         capsule_logger.log_error({
+             "endpoint": "/process",
+             "error": str(e),
+             "query": query,
+         })
+         raise HTTPException(status_code=500, detail=f"Internal Server Error: {e}")
+
+ @app.get("/healthz", summary="Health check")
+ def health_check():
+     """Check the server's status."""
+     return {"status": "ok", "cross_encoder": ce_manager.get_status_for_response()}
+
+ @app.get("/metrics", summary="Get Prometheus metrics")
+ def get_metrics():
+     """Expose Prometheus metrics."""
+     # TODO: implement real metrics with prometheus-client
+     return {"message": "Metrics endpoint is active. Implement with prometheus-client."}
+ ```
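
As a rough illustration of how a client might call `/process`, the sketch below builds (but does not send) a request using only the standard library. It assumes FastAPI's default parameter handling for the signature shown, in which the scalar `query` and `budget` parameters are read from the query string and the `List[Dict]` parameter is read from the JSON body. The helper name `build_process_request` and the `http://localhost:8000` address are assumptions made for the example.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request


def build_process_request(base_url: str, query: str, chunks: list, budget: int = 4096) -> Request:
    """Build a POST request for /process: scalars in the query string, chunks as the JSON body."""
    params = urlencode({"query": query, "budget": budget})
    body = json.dumps(chunks).encode("utf-8")
    return Request(
        url=f"{base_url}/process?{params}",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_process_request(
    "http://localhost:8000",  # assumed host and port, not taken from the report
    "AI ethics",
    [{"text": "AI ethics is crucial"}, {"text": "Cooking recipes"}],
    budget=80,
)
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
```

Passing the request to `urllib.request.urlopen(req)` would send it to a running server. The response shape is the dictionary assembled by `ProcessResponseV2.create_response`.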
+ ---
+ ### **File:** `D:\\Sanctum\\CRoM-EfficientLLM\\tests\\test_drift.py`
+ ```python
+ from crom_efficientllm.drift_estimator.estimator import DriftEstimator, DriftMode
+
+ def test_drift_triggers():
+     de = DriftEstimator(threshold=0.1, mode=DriftMode.L2)
+     alert, dist, ewma = de.update([0, 0, 0])
+     assert alert is False
+     alert, dist, ewma = de.update([1, 0, 0])
+     assert isinstance(alert, bool)
+ ```
+ ---
+ ### **File:** `D:\\Sanctum\\CRoM-EfficientLLM\\tests\\test_packer.py`
+ ```python
+ from crom_efficientllm.budget_packer.packer import budget_pack, Chunk
+
+ def test_budget_pack_respects_budget():
+     chunks = [Chunk("a", 1.0, 60), Chunk("b", 0.9, 50), Chunk("c", 0.5, 20)]
+     sel = budget_pack(chunks, budget=70)
+     assert sum(c.tokens for c in sel) <= 70
+
+ def test_budget_pack_sorting_stable():
+     chunks = [
+         {"text": "x", "score": 0.9, "tokens": 30},
+         {"text": "y", "score": 0.9, "tokens": 20},
+         {"text": "z", "score": 0.8, "tokens": 10},
+     ]
+     sel = budget_pack(chunks, budget=60)
+     assert [c.text for c in sel] == ["y", "x", "z"]
+ ```
+ ---
+ ### **File:** `D:\\Sanctum\\CRoM-EfficientLLM\\tests\\test_rerank.py`
+ ```python
+ from crom_efficientllm.rerank_engine.rerank import hybrid_rerank
+
+ class Dummy:
+     def encode(self, text_or_list, convert_to_numpy=False):
+         if isinstance(text_or_list, list):
+             return [self.encode(t) for t in text_or_list]
+         vec = [ord(c) % 5 for c in str(text_or_list)[:8]]
+         while len(vec) < 8:
+             vec.append(0)
+         return vec
+
+ def test_hybrid_rerank_returns_scores():
+     docs = [{"text": "alpha"}, {"text": "beta"}]
+     out = hybrid_rerank("alp", docs, Dummy(), alpha=0.5)
+     assert len(out) == 2
+     assert {"score_sparse", "score_dense", "score_final"} <= set(out[0].keys())
+ ```
+ ---
+ ### **File:** `D:\\Sanctum\\CRoM-EfficientLLM\\src\\crom_efficientllm\\budget_packer\\__init__.py`
+ ```python
+ from .packer import Chunk, budget_pack, pack_summary
+ __all__ = ["Chunk", "budget_pack", "pack_summary"]
+ ```
+ ---
+ ### **File:** `D:\\Sanctum\\CRoM-EfficientLLM\\src\\crom_efficientllm\\budget_packer\\packer.py`
+ ```python
+ """
+ Budget Packer
+ -------------
+ Greedy packing of highest-scoring chunks under a token budget.
+ - Stable ordering (score desc, tokens asc, original index asc)
+ - Input validation and optional token estimation
+ """
+ from __future__ import annotations
+
+ from dataclasses import dataclass
+ from typing import Any, Iterable, List, Sequence, Tuple, Union, Optional
+
+ @dataclass(frozen=True)
+ class Chunk:
+     text: str
+     score: float
+     tokens: int
+
+ def _estimate_tokens(text: str) -> int:
+     """Lightweight heuristic when `tokens` absent. Avoids heavy tokenizers.
+     Why: keeps demo dependency-light and deterministic.
+     """
+     # approx: 4 chars ≈ 1 token; floor at 1
+     return max(1, len(text) // 4)
+
+ def _coerce_chunk(obj: Union[Chunk, dict], idx: int) -> Chunk:
+     if isinstance(obj, Chunk):
+         return obj
+     if not isinstance(obj, dict):
+         raise TypeError(f"Chunk #{idx} must be Chunk or dict, got {type(obj)}")
+     text = str(obj.get("text", ""))
+     if not text:
+         raise ValueError(f"Chunk #{idx} has empty text")
+     score = float(obj.get("score", 0.0))
+     tokens = int(obj["tokens"]) if "tokens" in obj else _estimate_tokens(text)
+     if tokens <= 0:
+         raise ValueError(f"Chunk #{idx} has non-positive tokens: {tokens}")
+     return Chunk(text=text, score=score, tokens=tokens)
+
+ def budget_pack(
+     text_chunks: Sequence[Union[Chunk, dict]],
+     budget: int = 1000,
+ ) -> List[Chunk]:
+     """
+     Args:
+         text_chunks: iterable of Chunk or dict with keys {text, score, tokens}
+         budget: max token budget (int > 0)
+     Returns:
+         list of selected chunks (order of selection)
+     """
+     if budget <= 0:
+         raise ValueError("budget must be > 0")
+
+     coerced: List[Chunk] = [_coerce_chunk(c, i) for i, c in enumerate(text_chunks)]
+
+     # stable sort by (-score, tokens, original_index)
+     indexed: List[Tuple[int, Chunk]] = list(enumerate(coerced))
+     indexed.sort(key=lambda it: (-it[1].score, it[1].tokens, it[0]))
+
+     selected: List[Chunk] = []
+     total = 0
+     for _, ch in indexed:
+         if total + ch.tokens <= budget:
+             selected.append(ch)
+             total += ch.tokens
+     return selected
+
+ def pack_summary(selected: Sequence[Chunk]) -> dict:
+     tokens = sum(c.tokens for c in selected)
+     return {
+         "num_chunks": len(selected),
+         "tokens": tokens,
+         "avg_score": (sum(c.score for c in selected) / len(selected)) if selected else 0.0,
+     }
+ ```
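
To make the selection order concrete, here is a small self-contained walk-through of the same greedy rule (score descending, then fewer tokens, then original index). It re-declares a minimal `Chunk` and a `greedy_pack` helper so it runs without the package installed; `greedy_pack` and the sample chunks are illustrative and not part of the repository.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Chunk:
    text: str
    score: float
    tokens: int


def greedy_pack(chunks: List[Chunk], budget: int) -> List[Chunk]:
    """Greedy selection in (score desc, tokens asc, index asc) order under a token budget."""
    order = sorted(enumerate(chunks), key=lambda it: (-it[1].score, it[1].tokens, it[0]))
    selected: List[Chunk] = []
    total = 0
    for _, ch in order:
        # A chunk that does not fit is skipped; later, smaller chunks may still fit.
        if total + ch.tokens <= budget:
            selected.append(ch)
            total += ch.tokens
    return selected


chunks = [
    Chunk("a", 0.9, 50),
    Chunk("b", 0.2, 40),
    Chunk("c", 0.8, 60),
    Chunk("d", 0.8, 25),
    Chunk("e", 0.1, 5),
]
picked = greedy_pack(chunks, budget=80)
print([c.text for c in picked], sum(c.tokens for c in picked))
```

Here `a` is taken first, `d` beats `c` on the token tie-break, `c` and `b` no longer fit, and the scan continues far enough to add the 5-token `e`. The loop does not stop at the first chunk that overflows the budget.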
+ ---
+ ### **File:** `D:\\Sanctum\\CRoM-EfficientLLM\\src\\crom_efficientllm\\drift_estimator\\__init__.py`
+ ```python
+ from .estimator import DriftEstimator, DriftMode
+ __all__ = ["DriftEstimator", "DriftMode"]
+ ```
+ ---
+ ### **File:** `D:\\Sanctum\\CRoM-EfficientLLM\\src\\crom_efficientllm\\drift_estimator\\estimator.py`
+ ```python
+ """
+ Drift Estimator
+ ---------------
+ Monitors embedding shift using L2 or cosine distance.
+ Supports EWMA smoothing and exposes state for dashboards.
+ """
+ from __future__ import annotations
+
+ from dataclasses import dataclass, field
+ from enum import Enum
+ from typing import List, Optional, Tuple
+ import numpy as np
+
+ class DriftMode(str, Enum):
+     L2 = "l2"
+     COSINE = "cosine"
+
+ @dataclass
+ class DriftEstimator:
+     threshold: float = 0.2
+     mode: DriftMode = DriftMode.L2
+     ewma_alpha: float = 0.3  # smoothing for stability
+
+     history: List[np.ndarray] = field(default_factory=list)
+     distances: List[float] = field(default_factory=list)
+     ewma: Optional[float] = None
+
+     def _distance(self, a: np.ndarray, b: np.ndarray) -> float:
+         a = np.asarray(a, dtype=np.float32).ravel()
+         b = np.asarray(b, dtype=np.float32).ravel()
+         if self.mode == DriftMode.L2:
+             return float(np.linalg.norm(a - b))
+         # cosine distance = 1 - cosine similarity
+         denom = (np.linalg.norm(a) * np.linalg.norm(b)) + 1e-12
+         return float(1.0 - float(np.dot(a, b)) / denom)
+
+     def update(self, embedding) -> Tuple[bool, float, float]:
+         """
+         Args:
+             embedding: vector representation of current response
+         Returns:
+             (drift_alert, distance, ewma)
+         """
+         emb = np.asarray(embedding, dtype=np.float32)
+         if emb.ndim != 1:
+             emb = emb.ravel()
+
+         if not self.history:
+             self.history.append(emb)
+             self.ewma = 0.0
+             self.distances.append(0.0)
+             return (False, 0.0, 0.0)
+
+         last = self.history[-1]
+         dist = self._distance(emb, last)
+         self.history.append(emb)
+         self.distances.append(dist)
+
+         # EWMA update
+         if self.ewma is None:
+             self.ewma = dist
+         else:
+             self.ewma = self.ewma_alpha * dist + (1 - self.ewma_alpha) * self.ewma
+
+         return (bool(self.ewma > self.threshold), float(dist), float(self.ewma))
+
+     def state(self) -> dict:
+         return {
+             "count": len(self.history),
+             "last_distance": self.distances[-1] if self.distances else 0.0,
+             "ewma": self.ewma or 0.0,
+             "mode": self.mode.value,
+             "threshold": self.threshold,
+         }
+ ```
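
The alert decision is made on the EWMA-smoothed distance rather than on the raw distance. The sketch below replays that update rule in plain Python (L2 mode only) on a short sequence, so the effect of smoothing can be followed by hand. The function names `l2` and `ewma_alerts` and the sample vectors are illustrative assumptions.

```python
import math
from typing import List, Sequence, Tuple


def l2(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def ewma_alerts(
    embeddings: List[Sequence[float]], threshold: float = 0.2, alpha: float = 0.3
) -> List[Tuple[bool, float, float]]:
    """Replay the estimator's rule: the first sample seeds EWMA at 0, later ones blend in distance."""
    out: List[Tuple[bool, float, float]] = []
    last = None
    ewma = 0.0
    for emb in embeddings:
        if last is None:
            out.append((False, 0.0, 0.0))  # nothing to compare against yet
        else:
            dist = l2(emb, last)
            ewma = alpha * dist + (1 - alpha) * ewma
            out.append((ewma > threshold, dist, ewma))
        last = emb
    return out


steps = ewma_alerts([[0, 0, 0], [1, 0, 0], [1, 0, 0], [1, 0, 0]])
for alert, dist, ewma in steps:
    print(f"alert={alert!s:5} dist={dist:.3f} ewma={ewma:.3f}")
```

With the default `threshold=0.2` and `alpha=0.3`, a single jump of distance 1 raises the EWMA to 0.3, and the alert persists for one more step (EWMA 0.21) even though the embedding has stopped moving, before decaying below the threshold.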
+ ---
+ ### **File:** `D:\\Sanctum\\CRoM-EfficientLLM\\src\\crom_efficientllm\\plugins\\evidently_drift.py`
+ ```python
+ from __future__ import annotations
+ from typing import List
+
+ try:
+     from evidently.metric_preset import DataDriftPreset
+     from evidently.report import Report
+     import pandas as pd
+ except Exception as e:  # pragma: no cover
+     raise RuntimeError("evidently not installed. Install extras: pip install .[plugins]") from e
+
+ def drift_report(ref: List[List[float]], cur: List[List[float]]):
+     ref_df = pd.DataFrame(ref)
+     cur_df = pd.DataFrame(cur)
+     rep = Report(metrics=[DataDriftPreset()])
+     rep.run(reference_data=ref_df, current_data=cur_df)
+     return rep
+ ```
+ ---
+ ### **File:** `D:\\Sanctum\\CRoM-EfficientLLM\\src\\crom_efficientllm\\plugins\\flashrank_reranker.py`
+ ```python
+ from __future__ import annotations
+ from typing import List, Dict
+
+ try:
+     from flashrank import Reranker
+ except Exception as e:  # pragma: no cover
+     raise RuntimeError("flashrank not installed. Install extras: pip install .[plugins]") from e
+
+ def flashrank_rerank(query: str, docs: List[Dict[str, str]], model_name: str = "ms-marco-TinyBERT-L-2-v2") -> List[Dict]:
+     rr = Reranker(model_name)
+     pairs = [(query, d["text"]) for d in docs]
+     scores = rr.rerank(pairs)
+     order = sorted(range(len(docs)), key=lambda i: -scores[i])
+     return [docs[i] | {"score_flashrank": float(scores[i])} for i in order]
+ ```
+ ---
+ ### **File:** `D:\\Sanctum\\CRoM-EfficientLLM\\src\\crom_efficientllm\\plugins\\llmlingua_compressor.py`
+ ```python
+ from __future__ import annotations
+
+ try:
+     from llmlingua import PromptCompressor
+ except Exception as e:  # pragma: no cover
+     raise RuntimeError("llmlingua not installed. Install extras: pip install .[plugins]") from e
+
+ def compress_prompt(text: str, target_ratio: float = 0.5) -> str:
+     pc = PromptCompressor()
+     out = pc.compress(text, target_ratio=target_ratio)
+     return out["compressed_prompt"] if isinstance(out, dict) and "compressed_prompt" in out else str(out)
+ ```
+ ---
+ ### **File:** `D:\\Sanctum\\CRoM-EfficientLLM\\src\\crom_efficientllm\\rerank_engine\\__init__.py`
+ ```python
+ from .rerank import hybrid_rerank
+ __all__ = ["hybrid_rerank"]
+ ```
+ ---
+ ### **File:** `D:\\Sanctum\\CRoM-EfficientLLM\\src\\crom_efficientllm\\rerank_engine\\rerank.py`
+ ```python
+ """
+ Hybrid Rerank Engine
+ --------------------
+ Combines sparse (TF-IDF cosine) and dense (embedding cosine) scores with
+ min-max normalization for robust fusion.
+ """
+ from __future__ import annotations
+
+ from typing import Dict, List, Sequence
+ import numpy as np
+ from sklearn.feature_extraction.text import TfidfVectorizer
+ from sklearn.metrics.pairwise import cosine_similarity
+
+ def _to_numpy(x):
+     arr = np.asarray(x)
+     return arr.astype(np.float32)
+
+ def _batch_encode(embed_model, texts: Sequence[str]) -> np.ndarray:
+     # Try common API of sentence-transformers: encode(list, convert_to_numpy=True)
+     if hasattr(embed_model, "encode"):
+         try:
+             return _to_numpy(embed_model.encode(list(texts), convert_to_numpy=True))
+         except TypeError:
+             # Fallback: per-text encode
+             return _to_numpy([embed_model.encode(t) for t in texts])
+     raise TypeError("embed_model must provide .encode()")
+
+ def _minmax(x: np.ndarray) -> np.ndarray:
+     if x.size == 0:
+         return x
+     mn, mx = float(np.min(x)), float(np.max(x))
+     if mx - mn <= 1e-12:
+         return np.zeros_like(x)
+     return (x - mn) / (mx - mn)
+
+ def hybrid_rerank(
+     query: str,
+     docs: List[Dict[str, str]],
+     embed_model,
+     alpha: float = 0.5,
+ ) -> List[Dict[str, object]]:
+     """
+     Args:
+         query: query string
+         docs: list of {"text": str}
+         embed_model: model with .encode() -> vector(s)
+         alpha: weight between sparse/dense in [0,1]
+     Returns:
+         ranked list of enriched docs with scores {score_sparse, score_dense, score_final}
+     """
+     if not 0.0 <= alpha <= 1.0:
+         raise ValueError("alpha must be in [0, 1]")
+     if not docs:
+         return []
+
+     texts = [d.get("text", "") for d in docs]
+
+     # Sparse: TF-IDF cosine
+     tfidf = TfidfVectorizer(ngram_range=(1, 2), min_df=1).fit(texts)
+     Q = tfidf.transform([query])
+     D = tfidf.transform(texts)
+     sparse_scores = cosine_similarity(Q, D).ravel()
+
+     # Dense: cosine(sim) between L2-normalized embeddings
+     q_emb = _to_numpy(embed_model.encode(query))
+     d_embs = _batch_encode(embed_model, texts)
+     # L2 normalize
+     def _l2norm(a):
+         n = np.linalg.norm(a, axis=-1, keepdims=True) + 1e-12
+         return a / n
+
+     qn = _l2norm(q_emb.reshape(1, -1))
+     dn = _l2norm(d_embs)
+     dense_scores = cosine_similarity(qn, dn).ravel()
+
+     # Min-max to [0,1] before fusion to avoid scale issues
+     s_sparse = _minmax(sparse_scores)
+     s_dense = _minmax(dense_scores)
+
+     final_scores = alpha * s_sparse + (1 - alpha) * s_dense
+     order = np.argsort(-final_scores)
+
+     ranked = []
+     for i in order:
+         item = dict(docs[i])
+         item.update(
+             score_sparse=float(s_sparse[i]),
+             score_dense=float(s_dense[i]),
+             score_final=float(final_scores[i]),
+         )
+         ranked.append(item)
+     return ranked
+ ```
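
The fusion step can be checked by hand on a few numbers. The sketch below applies the same min-max normalization and `alpha` weighting to two made-up score lists, using plain Python lists instead of NumPy. The helper names `minmax` and `fuse` and the sample scores are illustrative assumptions.

```python
from typing import List


def minmax(xs: List[float]) -> List[float]:
    """Rescale to [0, 1]; a constant (or empty) list maps to all zeros."""
    if not xs:
        return []
    lo, hi = min(xs), max(xs)
    if hi - lo <= 1e-12:
        return [0.0] * len(xs)
    return [(x - lo) / (hi - lo) for x in xs]


def fuse(sparse: List[float], dense: List[float], alpha: float = 0.5) -> List[float]:
    """Weighted sum of normalized scores: alpha on sparse, (1 - alpha) on dense."""
    s, d = minmax(sparse), minmax(dense)
    return [alpha * a + (1 - alpha) * b for a, b in zip(s, d)]


sparse = [0.0, 0.25, 0.5]   # e.g. TF-IDF cosine per document (made-up values)
dense = [0.75, 0.25, 0.5]   # e.g. embedding cosine per document (made-up values)
final = fuse(sparse, dense, alpha=0.5)
order = sorted(range(len(final)), key=lambda i: -final[i])
print(final, order)
```

Document 2 ranks first because it is the best on the sparse side and mid-range on the dense side. Two properties of min-max normalization are worth keeping in mind: with only two documents each normalized list is always one 0 and one 1, and when all scores on one side are equal that side contributes nothing to the final score.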
release_notes.md ADDED
@@ -0,0 +1,12 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
+ # Release v0.2.1
+
+ ## [0.2.1] - 2025-09-02
+ ### Added
+ - CLI `--save-plots` option for `sweep` and `dp-curve`; saves PNG charts to `benchmarks/out/` (or `--out-dir`).
+ - README Quick Examples now mention the plotting flag.
+ - This CHANGELOG.
+
+ ### Changed
+ - Dev tooling: recommend `matplotlib` via dev extra for plotting.
+
+ Generated from [CHANGELOG.md](CHANGELOG.md)