Flamehaven committed aca22b8 (0 parents)

Initial commit: Add project structure and all source files
.github/workflows/ci.yml ADDED
@@ -0,0 +1,21 @@
+name: ci
+on:
+  push:
+    branches: [ main ]
+  pull_request:
+
+jobs:
+  test:
+    runs-on: ubuntu-latest
+    strategy:
+      matrix:
+        python-version: ["3.9", "3.10", "3.11", "3.12"]
+    steps:
+      - uses: actions/checkout@v4
+      - uses: actions/setup-python@v5
+        with:
+          python-version: ${{ matrix.python-version }}
+      - run: pip install -e .[dev]
+      - run: pre-commit run --all-files || true
+      - run: ruff --version && black --version
+      - run: pytest -q
.github/workflows/release.yml ADDED
@@ -0,0 +1,32 @@
+name: release
+on:
+  push:
+    tags:
+      - 'v*'
+jobs:
+  release:
+    runs-on: ubuntu-latest
+    steps:
+      - uses: actions/checkout@v4
+        with:
+          fetch-depth: 0
+      - uses: actions/setup-python@v5
+        with:
+          python-version: '3.11'
+      - run: pip install -e .[dev]
+      - run: pytest -q
+      - name: Build distribution
+        run: |
+          python -m pip install build
+          python -m build
+      - name: Generate release notes from CHANGELOG
+        run: |
+          python scripts/gen_release_notes.py "$GITHUB_REF_NAME"
+      - name: Publish GitHub Release
+        uses: softprops/action-gh-release@v2
+        with:
+          name: ${{ github.ref_name }}
+          body_path: release_notes.md
+          files: |
+            dist/*.whl
+            dist/*.tar.gz
.gitignore ADDED
@@ -0,0 +1,18 @@
+# Python
+__pycache__/
+*.py[cod]
+*.egg-info/
+.env
+.venv/
+venv/
+.idea/
+.vscode/
+.ipynb_checkpoints/
+dist/
+build/
+.coverage
+.pytest_cache/
+
+# OS
+.DS_Store
+Thumbs.db
CHANGELOG.md ADDED
@@ -0,0 +1,20 @@
+# Changelog
+
+## [0.2.1] - 2025-09-02
+### Added
+- CLI `--save-plots` option for `sweep` and `dp-curve`; saves PNG charts to `benchmarks/out/` (or `--out-dir`).
+- README Quick Examples mention of the plotting flag.
+- This CHANGELOG.
+
+### Changed
+- Dev tooling: recommend `matplotlib` via the dev extra for plotting.
+
+## [0.2.0] - 2025-09-02
+### Added
+- GitHub Actions CI (Python 3.9–3.12), pre-commit (ruff/black).
+- `crom-bench` CLI: `e2e`, `sweep`, `scale`, `dp-curve`, `haystack-compare`.
+- Plugins: FlashRank/LLMLingua/Evidently (optional extras).
+- Example corpus & queries (JSONL).
+
+## [0.1.0] - 2025-09-02
+- Initial packaging; budget packer, hybrid rerank, drift estimator, demo & metrics.
LICENSE ADDED
@@ -0,0 +1,202 @@
+
+                                 Apache License
+                           Version 2.0, January 2004
+                        http://www.apache.org/licenses/
+
+   TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
+
+   1. Definitions.
+
+      "License" shall mean the terms and conditions for use, reproduction,
+      and distribution as defined by Sections 1 through 9 of this document.
+
+      "Licensor" shall mean the copyright owner or entity authorized by
+      the copyright owner that is granting the License.
+
+      "Legal Entity" shall mean the union of the acting entity and all
+      other entities that control, are controlled by, or are under common
+      control with that entity. For the purposes of this definition,
+      "control" means (i) the power, direct or indirect, to cause the
+      direction or management of such entity, whether by contract or
+      otherwise, or (ii) ownership of fifty percent (50%) or more of the
+      outstanding shares, or (iii) beneficial ownership of such entity.
+
+      "You" (or "Your") shall mean an individual or Legal Entity
+      exercising permissions granted by this License.
+
+      "Source" form shall mean the preferred form for making modifications,
+      including but not limited to software source code, documentation
+      source, and configuration files.
+
+      "Object" form shall mean any form resulting from mechanical
+      transformation or translation of a Source form, including but
+      not limited to compiled object code, generated documentation,
+      and conversions to other media types.
+
+      "Work" shall mean the work of authorship, whether in Source or
+      Object form, made available under the License, as indicated by a
+      copyright notice that is included in or attached to the work
+      (an example is provided in the Appendix below).
+
+      "Derivative Works" shall mean any work, whether in Source or Object
+      form, that is based on (or derived from) the Work and for which the
+      editorial revisions, annotations, elaborations, or other modifications
+      represent, as a whole, an original work of authorship. For the purposes
+      of this License, Derivative Works shall not include works that remain
+      separable from, or merely link (or bind by name) to the interfaces of,
+      the Work and Derivative Works thereof.
+
+      "Contribution" shall mean any work of authorship, including
+      the original version of the Work and any modifications or additions
+      to that Work or Derivative Works thereof, that is intentionally
+      submitted to Licensor for inclusion in the Work by the copyright owner
+      or by an individual or Legal Entity authorized to submit on behalf of
+      the copyright owner. For the purposes of this definition, "submitted"
+      means any form of electronic, verbal, or written communication sent
+      to the Licensor or its representatives, including but not limited to
+      communication on electronic mailing lists, source code control systems,
+      and issue tracking systems that are managed by, or on behalf of, the
+      Licensor for the purpose of discussing and improving the Work, but
+      excluding communication that is conspicuously marked or otherwise
+      designated in writing by the copyright owner as "Not a Contribution."
+
+      "Contributor" shall mean Licensor and any individual or Legal Entity
+      on behalf of whom a Contribution has been received by Licensor and
+      subsequently incorporated within the Work.
+
+   2. Grant of Copyright License. Subject to the terms and conditions of
+      this License, each Contributor hereby grants to You a perpetual,
+      worldwide, non-exclusive, no-charge, royalty-free, irrevocable
+      copyright license to reproduce, prepare Derivative Works of,
+      publicly display, publicly perform, sublicense, and distribute the
+      Work and such Derivative Works in Source or Object form.
+
+   3. Grant of Patent License. Subject to the terms and conditions of
+      this License, each Contributor hereby grants to You a perpetual,
+      worldwide, non-exclusive, no-charge, royalty-free, irrevocable
+      (except as stated in this section) patent license to make, have made,
+      use, offer to sell, sell, import, and otherwise transfer the Work,
+      where such license applies only to those patent claims licensable
+      by such Contributor that are necessarily infringed by their
+      Contribution(s) alone or by combination of their Contribution(s)
+      with the Work to which such Contribution(s) was submitted. If You
+      institute patent litigation against any entity (including a
+      cross-claim or counterclaim in a lawsuit) alleging that the Work
+      or a Contribution incorporated within the Work constitutes direct
+      or contributory patent infringement, then any patent licenses
+      granted to You under this License for that Work shall terminate
+      as of the date such litigation is filed.
+
+   4. Redistribution. You may reproduce and distribute copies of the
+      Work or Derivative Works thereof in any medium, with or without
+      modifications, and in Source or Object form, provided that You
+      meet the following conditions:
+
+      (a) You must give any other recipients of the Work or
+          Derivative Works a copy of this License; and
+
+      (b) You must cause any modified files to carry prominent notices
+          stating that You changed the files; and
+
+      (c) You must retain, in the Source form of any Derivative Works
+          that You distribute, all copyright, patent, trademark, and
+          attribution notices from the Source form of the Work,
+          excluding those notices that do not pertain to any part of
+          the Derivative Works; and
+
+      (d) If the Work includes a "NOTICE" text file as part of its
+          distribution, then any Derivative Works that You distribute must
+          include a readable copy of the attribution notices contained
+          within such NOTICE file, excluding those notices that do not
+          pertain to any part of the Derivative Works, in at least one
+          of the following places: within a NOTICE text file distributed
+          as part of the Derivative Works; within the Source form or
+          documentation, if provided along with the Derivative Works; or,
+          within a display generated by the Derivative Works, if and
+          wherever such third-party notices normally appear. The contents
+          of the NOTICE file are for informational purposes only and
+          do not modify the License. You may add Your own attribution
+          notices within Derivative Works that You distribute, alongside
+          or as an addendum to the NOTICE text from the Work, provided
+          that such additional attribution notices cannot be construed
+          as modifying the License.
+
+      You may add Your own copyright statement to Your modifications and
+      may provide additional or different license terms and conditions
+      for use, reproduction, or distribution of Your modifications, or
+      for any such Derivative Works as a whole, provided Your use,
+      reproduction, and distribution of the Work otherwise complies with
+      the conditions stated in this License.
+
+   5. Submission of Contributions. Unless You explicitly state otherwise,
+      any Contribution intentionally submitted for inclusion in the Work
+      by You to the Licensor shall be under the terms and conditions of
+      this License, without any additional terms or conditions.
+      Notwithstanding the above, nothing herein shall supersede or modify
+      the terms of any separate license agreement you may have executed
+      with the Licensor regarding such Contributions.
+
+   6. Trademarks. This License does not grant permission to use the trade
+      names, trademarks, service marks, or product names of the Licensor,
+      except as required for reasonable and customary use in describing the
+      origin of the Work and reproducing the content of the NOTICE file.
+
+   7. Disclaimer of Warranty. Unless required by applicable law or
+      agreed to in writing, Licensor provides the Work (and each
+      Contributor provides its Contributions) on an "AS IS" BASIS,
+      WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
+      implied, including, without limitation, any warranties or conditions
+      of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
+      PARTICULAR PURPOSE. You are solely responsible for determining the
+      appropriateness of using or redistributing the Work and assume any
+      risks associated with Your exercise of permissions under this License.
+
+   8. Limitation of Liability. In no event and under no legal theory,
+      whether in tort (including negligence), contract, or otherwise,
+      unless required by applicable law (such as deliberate and grossly
+      negligent acts) or agreed to in writing, shall any Contributor be
+      liable to You for damages, including any direct, indirect, special,
+      incidental, or consequential damages of any character arising as a
+      result of this License or out of the use or inability to use the
+      Work (including but not limited to damages for loss of goodwill,
+      work stoppage, computer failure or malfunction, or any and all
+      other commercial damages or losses), even if such Contributor
+      has been advised of the possibility of such damages.
+
+   9. Accepting Warranty or Additional Liability. While redistributing
+      the Work or Derivative Works thereof, You may choose to offer,
+      and charge a fee for, acceptance of support, warranty, indemnity,
+      or other liability obligations and/or rights consistent with this
+      License. However, in accepting such obligations, You may act only
+      on Your own behalf and on Your sole responsibility, not on behalf
+      of any other Contributor, and only if You agree to indemnify,
+      defend, and hold each Contributor harmless for any liability
+      incurred by, or claims asserted against, such Contributor by reason
+      of your accepting any such warranty or additional liability.
+
+   END OF TERMS AND CONDITIONS
+
+   APPENDIX: How to apply the Apache License to your work.
+
+      To apply the Apache License to your work, attach the following
+      boilerplate notice, with the fields enclosed by brackets "[]"
+      replaced with your own identifying information. (Don't include
+      the brackets!) The text should be enclosed in the appropriate
+      comment syntax for the file format. We also recommend that a
+      file or class name and description of purpose be included on the
+      same "printed page" as the copyright notice for easier
+      identification within third-party archives.
+
+   Copyright [yyyy] [name of copyright owner]
+
+   Licensed under the Apache License, Version 2.0 (the "License");
+   you may not use this file except in compliance with the License.
+   You may obtain a copy of the License at
+
+       http://www.apache.org/licenses/LICENSE-2.0
+
+   Unless required by applicable law or agreed to in writing, software
+   distributed under the License is distributed on an "AS IS" BASIS,
+   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+   See the License for the specific language governing permissions and
+   limitations under the License.
README.md ADDED
@@ -0,0 +1,96 @@
+# CRoM-EfficientLLM: Context Reranking and Management for Efficient LLMs
+
+<p align="left">
+  <a href="https://github.com/Flamehaven/CRoM-EfficientLLM/actions">
+    <img alt="CI" src="https://img.shields.io/github/actions/workflow/status/Flamehaven/CRoM-EfficientLLM/ci.yml?branch=main" />
+  </a>
+  <a href="#-benchmarks">
+    <img alt="Bench" src="https://img.shields.io/badge/benchmarks-ready-success" />
+  </a>
+  <a href="LICENSE">
+    <img alt="License" src="https://img.shields.io/badge/license-Apache%202.0-blue" />
+  </a>
+  <a href="https://github.com/Flamehaven/CRoM-EfficientLLM/releases">
+    <img alt="Release" src="https://img.shields.io/github/v/release/Flamehaven/CRoM-EfficientLLM?display_name=tag" />
+  </a>
+  <a href="CHANGELOG.md">
+    <img alt="Versioning" src="https://img.shields.io/badge/semver-0.2.x-lightgrey" />
+  </a>
+  <a href="https://github.com/Flamehaven/CRoM-EfficientLLM/releases/latest">
+    <img alt="Wheel" src="https://img.shields.io/badge/wheel-available-success" />
+  </a>
+</p>
+
+**CRoM (Context Rot Mitigation)-EfficientLLM** is a Python toolkit designed to optimize the context provided to Large Language Models (LLMs). It provides a suite of tools to intelligently select, re-rank, and manage text chunks to fit within a model's context budget while maximizing relevance and minimizing performance drift.
+
+This project is ideal for developers building RAG (Retrieval-Augmented Generation) pipelines who need to make the most of limited context windows.
+
+## Key Features
+
+*   **Budget Packer:** Greedily packs the highest-scoring text chunks into a defined token budget using a stable sorting algorithm.
+*   **Hybrid Reranker:** Combines sparse (TF-IDF) and dense (Sentence-Transformers) retrieval scores for robust, high-quality reranking of documents.
+*   **Drift Estimator:** Monitors the semantic drift between sequential model responses using L2 or cosine distance with EWMA smoothing.
+*   **Observability:** Exposes Prometheus metrics for monitoring token savings and drift alerts in production.
+*   **Extensible Plugins:** Supports optional plugins for advanced reranking (`FlashRank`), compression (`LLMLingua`), and drift analysis (`Evidently`).
+*   **Comprehensive Benchmarking:** Includes a CLI for end-to-end pipeline evaluation, budget sweeps, and quality-vs-optimal analysis.
+
+## Installation
+
+Install the package directly from source using pip. For development, install in editable mode with the `[dev]` extras.
+
+```bash
+# Clone the repository
+git clone https://github.com/Flamehaven/CRoM-EfficientLLM.git
+cd CRoM-EfficientLLM
+
+# Install in editable mode with development and plugin dependencies
+pip install -e .[dev,plugins]
+```
+
+## Quickstart
+
+### Demo
+
+Run a simple, self-contained demonstration of the core components:
+
+```bash
+# Run the demo script
+crom-demo demo
+```
+
+### CLI Benchmarking Examples
+
+The package includes a powerful `crom-bench` CLI for evaluation.
+
+```bash
+# Default E2E (Search→Rerank→Pack→Mock LLM)
+crom-bench e2e --budget 0.3
+
+# Optional: High-precision configuration with plugins
+crom-bench e2e --budget 0.3 \
+  --use-flashrank --flashrank-model ms-marco-TinyBERT-L-2-v2 \
+  --use-llmlingua --compress-ratio=0.6 \
+  --use-evidently
+```
+
+### Plotting
+
+If `matplotlib` is installed (`pip install -e .[dev]`), you can save benchmark plots directly:
+
+```bash
+# Save budget sweep result plots
+crom-bench sweep --save-plots
+
+# Save DP-curve plots
+crom-bench dp-curve --save-plots
+```
+
+## Release & Changelog
+
+This project follows semantic versioning. For detailed changes, see the [**CHANGELOG.md**](CHANGELOG.md).
+
+Releases are automated via GitHub Actions when a `v*` tag is pushed.
+
+## License
+
+This project is licensed under the Apache 2.0 License. See the [LICENSE](LICENSE) file for details.
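
For library use, the same components are importable directly; a minimal sketch using the public exports (chunk values mirror the ones in `demo.py` further down in this commit):

```python
from crom_efficientllm import Chunk, budget_pack, pack_summary

chunks = [
    Chunk(text="AI ethics is crucial", score=0.9, tokens=50),
    Chunk(text="Unrelated text", score=0.2, tokens=40),
    Chunk(text="Drift detection research", score=0.8, tokens=60),
]
packed = budget_pack(chunks, budget=80)  # only the 0.9-scored chunk fits; 40 or 60 more tokens would exceed 80
print([c.text for c in packed])
print(pack_summary(packed))  # {'num_chunks': 1, 'tokens': 50, 'avg_score': 0.9}
```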
benchmarks/efficiency_eval.py ADDED
@@ -0,0 +1,220 @@
+"""
+Efficiency Evaluation for CRoM-EfficientLLM
+- Synthetic workload to measure token savings, selection quality, and runtime.
+- No third-party deps beyond numpy/matplotlib (pandas optional for CSVs).
+
+Usage:
+  python benchmarks/efficiency_eval.py --budget 0.3 --n 5000 --seed 123 --plot --save
+"""
+from __future__ import annotations
+
+import argparse
+import math
+import time
+from dataclasses import dataclass
+from typing import List, Sequence, Union
+
+import numpy as np
+
+try:
+    import pandas as pd  # optional
+except Exception:  # pragma: no cover
+    pd = None
+
+try:
+    import matplotlib.pyplot as plt  # optional
+except Exception:  # pragma: no cover
+    plt = None
+
+# --- Local packers (self-contained to avoid imports during quick eval) ---
+@dataclass(frozen=True)
+class Chunk:
+    text: str
+    score: float
+    tokens: int
+
+def _estimate_tokens(text: str) -> int:
+    return max(1, len(text) // 4)
+
+def _coerce_chunk(obj: Union[Chunk, dict], idx: int) -> Chunk:
+    if isinstance(obj, Chunk):
+        return obj
+    if not isinstance(obj, dict):
+        raise TypeError(f"Chunk #{idx} must be Chunk or dict, got {type(obj)}")
+    text = str(obj.get("text", ""))
+    if not text:
+        raise ValueError(f"Chunk #{idx} has empty text")
+    score = float(obj.get("score", 0.0))
+    tokens = int(obj["tokens"]) if "tokens" in obj else _estimate_tokens(text)
+    if tokens <= 0:
+        raise ValueError(f"Chunk #{idx} has non-positive tokens: {tokens}")
+    return Chunk(text=text, score=score, tokens=tokens)
+
+def budget_pack(text_chunks: Sequence[Union[Chunk, dict]], budget: int = 1000) -> List[Chunk]:
+    if budget <= 0:
+        raise ValueError("budget must be > 0")
+    coerced: List[Chunk] = [_coerce_chunk(c, i) for i, c in enumerate(text_chunks)]
+    indexed = list(enumerate(coerced))
+    indexed.sort(key=lambda it: (-it[1].score, it[1].tokens, it[0]))
+    selected: List[Chunk] = []
+    total = 0
+    for _, ch in indexed:
+        if total + ch.tokens <= budget:
+            selected.append(ch)
+            total += ch.tokens
+    return selected
+
+def pack_fcfs(text_chunks: Sequence[Union[Chunk, dict]], budget: int) -> List[Chunk]:
+    sel, total = [], 0
+    for i, obj in enumerate(text_chunks):
+        ch = _coerce_chunk(obj, i)
+        if total + ch.tokens <= budget:
+            sel.append(ch)
+            total += ch.tokens
+    return sel
+
+def pack_random(text_chunks: Sequence[Union[Chunk, dict]], budget: int, seed: int = 0) -> List[Chunk]:
+    rng = np.random.default_rng(seed)
+    indices = np.arange(len(text_chunks))
+    rng.shuffle(indices)
+    sel, total = [], 0
+    for i in indices:
+        ch = _coerce_chunk(text_chunks[i], i)
+        if total + ch.tokens <= budget:
+            sel.append(ch)
+            total += ch.tokens
+    return sel
+
+# --- Data generation and metrics ---
+
+def make_synthetic_chunks(n=2000, seed=42, corr=0.6):
+    rng = np.random.default_rng(seed)
+    true_rel = rng.normal(0, 1, size=n)
+    noise = rng.normal(0, 1, size=n) * math.sqrt(1 - corr**2)
+    score = corr * true_rel + noise
+    tokens = np.clip(rng.lognormal(mean=4.0, sigma=0.6, size=n).astype(int), 5, 2000)
+    chunks = [Chunk(text=("x" * int(t * 4)), score=float(s), tokens=int(t)) for s, t in zip(score, tokens)]
+    return chunks, true_rel
+
+def eval_once(n=5000, budget_ratio=0.3, seed=123, corr=0.6):
+    chunks, true_rel = make_synthetic_chunks(n=n, seed=seed, corr=corr)
+    total_tokens = sum(c.tokens for c in chunks)
+    budget = int(total_tokens * budget_ratio)
+
+    def run(name, fn):
+        t0 = time.perf_counter()
+        sel = fn(chunks, budget)
+        dt = time.perf_counter() - t0
+        idx_map = {id(c): i for i, c in enumerate(chunks)}
+        picked_idx = [idx_map[id(c)] for c in sel]
+        rel_sum = float(np.sum(true_rel[picked_idx])) if picked_idx else 0.0
+        sel_tokens = sum(c.tokens for c in sel)
+        return {
+            "name": name,
+            "time_ms": dt * 1000,
+            "selected_chunks": len(sel),
+            "selected_tokens": sel_tokens,
+            "tokens_budget": budget,
+            "tokens_total_unpacked": total_tokens,
+            "tokens_saved": total_tokens - sel_tokens,
+            "save_ratio": (total_tokens - sel_tokens) / total_tokens,
+            "relevance_sum": rel_sum,
+        }
+
+    rows = [
+        run("budget_pack", budget_pack),
+        run("fcfs", pack_fcfs),
+        run("random", lambda ch, b: pack_random(ch, b, seed=seed)),
+    ]
+    return rows
+
+def quality_vs_optimal(n=200, budget_ratio=0.3, seed=123, corr=0.6):
+    chunks, true_rel = make_synthetic_chunks(n=n, seed=seed, corr=corr)
+    budget = int(sum(c.tokens for c in chunks) * budget_ratio)
+    values = np.maximum(true_rel, 0.0)
+
+    def optimal(chunks_sub, values, budget):
+        items = chunks_sub
+        vals = list(values)
+        B = budget
+        dp = [0.0] * (B + 1)
+        keep = [[False] * (B + 1) for _ in range(len(items))]
+        for i, it in enumerate(items):
+            wt = it.tokens
+            val = vals[i]
+            for b in range(B, wt - 1, -1):
+                alt = dp[b - wt] + val
+                if alt > dp[b]:
+                    dp[b] = alt
+                    keep[i][b] = True
+        b = B
+        picked_idx = []
+        for i in range(len(items) - 1, -1, -1):
+            if keep[i][b]:
+                picked_idx.append(i)
+                b -= items[i].tokens
+        picked_idx.reverse()
+        rel_sum = float(np.sum([values[i] for i in picked_idx])) if picked_idx else 0.0
+        total_tokens = sum(items[i].tokens for i in picked_idx)
+        return picked_idx, rel_sum, total_tokens
+
+    opt_idx, opt_rel, opt_tokens = optimal(chunks, values, budget)
+
+    # selections
+    idx_map = {id(c): i for i, c in enumerate(chunks)}
+    def rel_of(selection):
+        pid = [idx_map[id(c)] for c in selection]
+        return float(np.sum(values[pid])) if pid else 0.0
+
+    sel_bp = budget_pack(chunks, budget)
+    sel_fc = pack_fcfs(chunks, budget)
+    sel_rd = pack_random(chunks, budget, seed=seed)
+
+    rows = [
+        {"name": "optimal_true_rel", "relevance_sum": opt_rel, "selected_tokens": opt_tokens, "selected_chunks": len(opt_idx)},
+        {"name": "budget_pack_small", "relevance_sum": rel_of(sel_bp), "selected_tokens": sum(c.tokens for c in sel_bp), "selected_chunks": len(sel_bp)},
+        {"name": "fcfs_small", "relevance_sum": rel_of(sel_fc), "selected_tokens": sum(c.tokens for c in sel_fc), "selected_chunks": len(sel_fc)},
+        {"name": "random_small", "relevance_sum": rel_of(sel_rd), "selected_tokens": sum(c.tokens for c in sel_rd), "selected_chunks": len(sel_rd)},
+    ]
+    return rows
+
+def main():
+    ap = argparse.ArgumentParser()
+    ap.add_argument("--n", type=int, default=5000)
+    ap.add_argument("--budget", type=float, default=0.3)
+    ap.add_argument("--seed", type=int, default=123)
+    ap.add_argument("--corr", type=float, default=0.6)
+    ap.add_argument("--plot", action="store_true")
+    ap.add_argument("--save", action="store_true")
+    args = ap.parse_args()
+
+    rows = eval_once(n=args.n, budget_ratio=args.budget, seed=args.seed, corr=args.corr)
+    rows_q = quality_vs_optimal(n=min(200, args.n), budget_ratio=args.budget, seed=args.seed, corr=args.corr)
+
+    print("\n=== Efficiency (n={}, budget={:.0%}) ===".format(args.n, args.budget))
+    for r in rows:
+        print("{name:12s} time={time_ms:7.2f}ms save_ratio={save_ratio:6.3f} tokens_saved={tokens_saved:8d} rel_sum={relevance_sum:8.3f}".format(**r))
+
+    print("\n=== Quality vs Optimal (subset) ===")
+    for r in rows_q:
+        print("{name:18s} rel_sum={relevance_sum:8.3f} tokens={selected_tokens:5d} chunks={selected_chunks:4d}".format(**r))
+
+    if pd is not None and args.save:
+        pd.DataFrame(rows).to_csv("benchmarks/results_efficiency.csv", index=False)
+        pd.DataFrame(rows_q).to_csv("benchmarks/results_quality.csv", index=False)
+        print("Saved CSVs to benchmarks/.")
+
+    if plt is not None and args.plot:
+        # single-figure plots, no explicit colors
+        # (plt is already imported at module scope)
+        x = [r["name"] for r in rows]
+        y = [r["time_ms"] for r in rows]
+        plt.figure()
+        plt.bar(x, y)
+        plt.title("Packer Runtime (ms)")
+        plt.xlabel("method")
+        plt.ylabel("ms")
+        plt.show()
+
+if __name__ == "__main__":
+    main()
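
`budget_pack` is greedy, so it is fast but not provably optimal; that gap is exactly what `quality_vs_optimal` measures against the knapsack DP. A standalone counterexample (reusing the `Chunk` and `budget_pack` defined in this script) shows the kind of case the DP catches:

```python
# Greedy takes the single highest-scoring chunk (value 1.0, 10 tokens) and then
# nothing else fits; the optimal pick is the two 0.7 chunks (value 1.4, 14 tokens).
chunks = [
    Chunk(text="a" * 40, score=1.0, tokens=10),
    Chunk(text="b" * 28, score=0.7, tokens=7),
    Chunk(text="c" * 28, score=0.7, tokens=7),
]
greedy = budget_pack(chunks, budget=14)
print([c.score for c in greedy], sum(c.tokens for c in greedy))  # [1.0] 10
```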
benchmarks/longbench_eval.py ADDED
@@ -0,0 +1,13 @@
+"""
+Benchmark script: LongBench-like evaluation.
+Simulates context packing efficiency.
+"""
+from crom_efficientllm.budget_packer.packer import budget_pack
+
+def evaluate():
+    chunks = [{"text": f"chunk {i}", "score": i % 5, "tokens": 100} for i in range(20)]
+    packed = budget_pack(chunks, budget=500)
+    print("Selected:", len(packed), "chunks")
+
+if __name__ == "__main__":
+    evaluate()
benchmarks/sample_results.json ADDED
@@ -0,0 +1 @@
+{}
dashboard/grafana_dashboard.json ADDED
@@ -0,0 +1 @@
+{}
dashboard/prometheus_config.yml ADDED
File without changes
docs/architecture.md ADDED
@@ -0,0 +1,3 @@
+# Architecture
+
+This document outlines the architecture of the CRoM-EfficientLLM project.
docs/versioning.md ADDED
@@ -0,0 +1,54 @@
+# Versioning & PyPI Guidance
+
+This document defines package naming, SemVer rules, and a future path to publish to PyPI.
+
+## 1) Package name
+- Distribution name (PyPI): `crom-efficientllm` (lowercase, hyphen-separated)
+- Import name (module): `crom_efficientllm` (PEP 8 underscore)
+
+> **Tip**: Keep both names consistent to avoid confusion in docs.
+
+### Check name availability on PyPI
+- Visit: https://pypi.org/project/crom-efficientllm/ (404 → available)
+- If taken, consider: `crom-efficient-llm`, `crom-llm-efficient`, `crom-ctx-pack`
+- Reserve on TestPyPI first: use `test.pypi.org` to validate metadata & upload
+
+## 2) Semantic Versioning (SemVer)
+We follow **MAJOR.MINOR.PATCH**.
+
+- **MAJOR**: Backward-incompatible API changes
+  - e.g., rename function signatures (`budget_pack`), move/rename modules, change return schemas
+- **MINOR**: Backward-compatible features
+  - new functions/flags (e.g., `pack_summary`, CLI subcommands), performance improvements
+- **PATCH**: Backward-compatible bug fixes
+  - logic corrections, docs/CI fixes, dependency pin updates without API changes
+
+### Pre-releases
+Use suffixes: `-a.1`, `-b.1`, `-rc.1` (alpha/beta/release-candidate)
+- Example: `0.3.0-rc.1`
+
+### Deprecation Policy
+- Mark deprecated APIs in `CHANGELOG.md` and docstrings
+- Provide at least **one MINOR release** with warnings before removal
+
+### Public API Surface
+We commit compatibility for:
+- `crom_efficientllm.budget_packer.packer`: `Chunk`, `budget_pack`, `pack_summary`
+- `crom_efficientllm.rerank_engine.rerank`: `hybrid_rerank`
+- `crom_efficientllm.drift_estimator.estimator`: `DriftEstimator`, `DriftMode`
+- CLI entrypoints: `crom-demo`, `crom-bench` and their documented flags
+
+## 3) Release Flow (GitHub → PyPI later)
+- Tag: `vX.Y.Z` → GitHub Actions builds & creates a Release (artifacts attached)
+- Keep `CHANGELOG.md` updated per release
+- After the API stabilizes, enable **PyPI publish** using a separate workflow with a `PYPI_API_TOKEN` secret
+
+### (Future) PyPI publishing steps
+1. Create a PyPI account & project
+2. Add `PYPI_API_TOKEN` to repo `Settings → Secrets and variables → Actions`
+3. Add a `release-pypi.yml` workflow to upload on tag
+4. Verify install: `pip install crom-efficientllm` and import `crom_efficientllm`
+
+---
+
+_Last updated: 2025-09-02_
examples/corpus/sample_docs.jsonl ADDED
@@ -0,0 +1,6 @@
+{"id": 1, "text": "AI ethics and governance frameworks for responsible AI."}
+{"id": 2, "text": "Techniques for detecting model drift in production systems."}
+{"id": 3, "text": "A recipe for sourdough bread and fermentation tips."}
+{"id": 4, "text": "Hybrid search: combining sparse and dense retrieval methods."}
+{"id": 5, "text": "Token budgets and prompt compression strategies for LLMs."}
+{"id": 6, "text": "Monitoring with Prometheus and building Grafana dashboards."}
examples/corpus/sample_queries.jsonl ADDED
@@ -0,0 +1,3 @@
+{"query": "how to detect drift in ai models"}
+{"query": "ways to reduce llm token usage"}
+{"query": "observability stack prometheus grafana"}
pyproject.toml ADDED
@@ -0,0 +1,64 @@
+[build-system]
+requires = ["setuptools>=68", "wheel"]
+build-backend = "setuptools.build_meta"
+
+[project]
+name = "crom-efficientllm"
+version = "0.2.1"
+description = "CRoM (Context Rot Mitigation)-EfficientLLM: Budget packing, hybrid rerank, and drift estimation with observability"
+readme = "README.md"
+requires-python = ">=3.9"
+license = { text = "Apache-2.0" }
+authors = [ { name = "Your Name" } ]
+dependencies = [
+  "numpy>=1.24,<3",
+  "scikit-learn>=1.3,<2",
+  "transformers>=4.41,<5",
+  "sentence-transformers>=2.2,<3",
+  "flask>=3,<4",
+  "prometheus-client>=0.20,<1"
+]
+
+[project.optional-dependencies]
+dev = [
+  "pytest>=7",
+  "ruff>=0.4",
+  "black>=24.4",
+  "pre-commit>=3.6",
+  "matplotlib>=3.8,<4"
+]
+plugins = [
+  "flashrank>=0.2; python_version>='3.9'",
+  "llmlingua>=0.2; python_version>='3.9'",
+  "evidently>=0.4; python_version>='3.9'"
+]
+haystack = [
+  "farm-haystack[faiss,inference]>=1.26; python_version>='3.9'"
+]
+
+[project.urls]
+Homepage = "https://github.com/Flamehaven/CRoM-EfficientLLM"
+
+[project.scripts]
+"crom-demo" = "crom_efficientllm.demo:main"
+"crom-bench" = "crom_efficientllm.cli:main"
+
+[tool.setuptools]
+package-dir = {"" = "src"}
+packages = { find = { where = ["src"] } }
+
+[tool.pytest.ini_options]
+addopts = "-q"
+
+[tool.black]
+line-length = 100
+
+[tool.ruff]
+target-version = "py39"
+
+[tool.ruff.lint]
+select = ["E","F","I","UP","B","C4","SIM","PL","PERF","RUF","ANN"]
+ignore = ["ANN101","ANN102"]
+
+[tool.ruff.lint.per-file-ignores]
+"tests/*" = ["S101","ANN","PLR2004"]
requirements.txt ADDED
@@ -0,0 +1,6 @@
+numpy>=1.24,<3
+scikit-learn>=1.3,<2
+transformers>=4.41,<5
+sentence-transformers>=2.2,<3
+flask>=3,<4
+prometheus-client>=0.20,<1
scripts/gen_release_notes.py ADDED
@@ -0,0 +1,38 @@
+#!/usr/bin/env python3
+from __future__ import annotations
+import os
+import re
+import sys
+from pathlib import Path
+
+ROOT = Path(__file__).resolve().parents[1]
+CHANGELOG = ROOT / "CHANGELOG.md"
+OUT = ROOT / "release_notes.md"
+
+def main(tag: str) -> None:
+    version = tag.lstrip("v").strip()
+    if not CHANGELOG.exists():
+        OUT.write_text(f"# Release {tag}\n\n(CHANGELOG.md not found)\n", encoding="utf-8")
+        return
+    text = CHANGELOG.read_text(encoding="utf-8")
+    pat = re.compile(rf"^##\s*\[{re.escape(version)}\][^\n]*$", re.MULTILINE)
+    m = pat.search(text)
+    if not m:
+        OUT.write_text(
+            f"# Release {tag}\n\nSection for {version} not found in CHANGELOG.\n\n" + text,
+            encoding="utf-8",
+        )
+        return
+    start = m.end()
+    m2 = re.search(r"^##\s+", text[start:], re.MULTILINE)
+    end = start + (m2.start() if m2 else len(text) - start)
+    section = text[m.start():end].strip()
+    body = f"# Release {tag}\n\n{section}\n\n— generated from [CHANGELOG.md](CHANGELOG.md)"
+    OUT.write_text(body, encoding="utf-8")
+
+if __name__ == "__main__":
+    tag = sys.argv[1] if len(sys.argv) > 1 else os.environ.get("GITHUB_REF_NAME", "")
+    if not tag:
+        print("Usage: gen_release_notes.py vX.Y.Z", file=sys.stderr)
+        sys.exit(2)
+    main(tag)
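
The heading pattern reconstructed above can be checked in isolation against the CHANGELOG format used in this repo; a minimal sketch:

```python
import re

version = "0.2.1"
# Same pattern as in scripts/gen_release_notes.py: match "## [X.Y.Z] ..." headings
pat = re.compile(rf"^##\s*\[{re.escape(version)}\][^\n]*$", re.MULTILINE)
sample = "# Changelog\n\n## [0.2.1] - 2025-09-02\n### Added\n\n## [0.2.0] - 2025-09-02\n"
print(pat.search(sample).group(0))  # -> "## [0.2.1] - 2025-09-02"
```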
scripts/release.sh ADDED
@@ -0,0 +1,39 @@
+#!/usr/bin/env bash
+set -euo pipefail
+
+TAG=${1:-}
+if [[ -z "$TAG" ]]; then
+  echo "Usage: scripts/release.sh vX.Y.Z"; exit 1
+fi
+
+# sanity checks
+if [[ -n $(git status --porcelain) ]]; then
+  echo "❌ Working tree not clean"; exit 1
+fi
+
+# ensure deps
+python -m pip install -e .[dev]
+pre-commit run --all-files
+pytest -q
+
+# generate release notes preview from CHANGELOG
+python scripts/gen_release_notes.py "$TAG"
+if [[ -f release_notes.md ]]; then
+  echo "--- release_notes.md (preview top 60 lines) ---"
+  head -n 60 release_notes.md || true
+  echo "--- end preview ---"
+else
+  echo "⚠️ release_notes.md not generated; will fall back to default notes in GH release"
+fi
+
+# tag & push
+read -p "Tag ${TAG} and push? (y/N) " yn
+if [[ "$yn" != "y" && "$yn" != "Y" ]]; then
+  echo "aborted"; exit 1
+fi
+
+git tag -a "$TAG" -m "Release $TAG"
+git push origin "$TAG"
+
+echo "✅ Pushed tag $TAG. GitHub Actions will create the Release automatically."
+echo "➡️ Watch: https://github.com/Flamehaven/CRoM-EfficientLLM/actions"
src/crom_efficientllm/__init__.py ADDED
@@ -0,0 +1,13 @@
+"""Public API for CRoM-EfficientLLM."""
+from .budget_packer.packer import Chunk, budget_pack, pack_summary
+from .rerank_engine.rerank import hybrid_rerank
+from .drift_estimator.estimator import DriftEstimator, DriftMode
+
+__all__ = [
+    "Chunk",
+    "budget_pack",
+    "pack_summary",
+    "hybrid_rerank",
+    "DriftEstimator",
+    "DriftMode",
+]
src/crom_efficientllm/budget_packer/__init__.py ADDED
@@ -0,0 +1,2 @@
+from .packer import Chunk, budget_pack, pack_summary
+__all__ = ["Chunk", "budget_pack", "pack_summary"]
src/crom_efficientllm/budget_packer/packer.py ADDED
@@ -0,0 +1,74 @@
+"""
+Budget Packer
+-------------
+Greedy packing of highest-scoring chunks under a token budget.
+- Stable ordering (score desc, tokens asc, original index asc)
+- Input validation and optional token estimation
+"""
+from __future__ import annotations
+
+from dataclasses import dataclass
+from typing import List, Sequence, Tuple, Union
+
+@dataclass(frozen=True)
+class Chunk:
+    text: str
+    score: float
+    tokens: int
+
+def _estimate_tokens(text: str) -> int:
+    """Lightweight heuristic when `tokens` absent. Avoids heavy tokenizers.
+    Why: keeps demo dependency-light and deterministic.
+    """
+    # approx: 4 chars ≈ 1 token; floor at 1
+    return max(1, len(text) // 4)
+
+def _coerce_chunk(obj: Union[Chunk, dict], idx: int) -> Chunk:
+    if isinstance(obj, Chunk):
+        return obj
+    if not isinstance(obj, dict):
+        raise TypeError(f"Chunk #{idx} must be Chunk or dict, got {type(obj)}")
+    text = str(obj.get("text", ""))
+    if not text:
+        raise ValueError(f"Chunk #{idx} has empty text")
+    score = float(obj.get("score", 0.0))
+    tokens = int(obj["tokens"]) if "tokens" in obj else _estimate_tokens(text)
+    if tokens <= 0:
+        raise ValueError(f"Chunk #{idx} has non-positive tokens: {tokens}")
+    return Chunk(text=text, score=score, tokens=tokens)
+
+def budget_pack(
+    text_chunks: Sequence[Union[Chunk, dict]],
+    budget: int = 1000,
+) -> List[Chunk]:
+    """
+    Args:
+        text_chunks: iterable of Chunk or dict with keys {text, score, tokens}
+        budget: max token budget (int > 0)
+    Returns:
+        list of selected chunks (order of selection)
+    """
+    if budget <= 0:
+        raise ValueError("budget must be > 0")
+
+    coerced: List[Chunk] = [_coerce_chunk(c, i) for i, c in enumerate(text_chunks)]
+
+    # stable sort by (-score, tokens, original_index)
+    indexed: List[Tuple[int, Chunk]] = list(enumerate(coerced))
+    indexed.sort(key=lambda it: (-it[1].score, it[1].tokens, it[0]))
+
+    selected: List[Chunk] = []
+    total = 0
+    for _, ch in indexed:
+        if total + ch.tokens <= budget:
+            selected.append(ch)
+            total += ch.tokens
+    return selected
+
+def pack_summary(selected: Sequence[Chunk]) -> dict:
+    tokens = sum(c.tokens for c in selected)
+    return {
+        "num_chunks": len(selected),
+        "tokens": tokens,
+        "avg_score": (sum(c.score for c in selected) / len(selected)) if selected else 0.0,
+    }
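
Because `tokens` is optional for dict inputs, `_coerce_chunk` falls back to the ~4-characters-per-token estimate; a small sketch of that path (texts and scores are illustrative):

```python
from crom_efficientllm import budget_pack, pack_summary

docs = [
    {"text": "Hybrid search combines sparse and dense retrieval.", "score": 0.8},  # 50 chars -> 12 tokens
    {"text": "Sourdough tips.", "score": 0.1},                                     # 15 chars -> 3 tokens
]
packed = budget_pack(docs, budget=12)  # the 0.8 chunk uses the whole budget; the 0.1 chunk is skipped
print([c.tokens for c in packed], pack_summary(packed))
```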
src/crom_efficientllm/cli.py ADDED
@@ -0,0 +1,385 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ from __future__ import annotations
2
+
3
+ import argparse
4
+ import json
5
+ import os
6
+ import time
7
+ from dataclasses import dataclass
8
+ from typing import List, Dict, Sequence
9
+
10
+ import numpy as np
11
+ from sklearn.feature_extraction.text import TfidfVectorizer
12
+ from sklearn.metrics.pairwise import cosine_similarity
13
+
14
+ from crom_efficientllm.budget_packer.packer import budget_pack, Chunk
15
+ from crom_efficientllm.rerank_engine.rerank import hybrid_rerank
16
+
17
+ try:
18
+ from sentence_transformers import SentenceTransformer
19
+ except Exception: # pragma: no cover
20
+ SentenceTransformer = None # type: ignore
21
+
22
+ # Optional plugins are imported lazily when flags are set
23
+
24
+ @dataclass
25
+ class Doc:
26
+ id: str
27
+ text: str
28
+
29
+ def load_jsonl(path: str) -> List[Dict]:
30
+ with open(path, "r", encoding="utf-8") as f:
31
+ return [json.loads(line) for line in f]
32
+
33
+ def build_corpus(path: str) -> List[Doc]:
34
+ rows = load_jsonl(path)
35
+ return [Doc(id=str(r.get("id", i)), text=str(r["text"])) for i, r in enumerate(rows)]
36
+
37
+ def sparse_retrieval(query: str, corpus: Sequence[Doc], k: int = 100) -> List[Dict]:
38
+ texts = [d.text for d in corpus]
39
+ vect = TfidfVectorizer(ngram_range=(1, 2)).fit(texts)
40
+ D = vect.transform(texts)
41
+ Q = vect.transform([query])
42
+ sims = cosine_similarity(Q, D).ravel()
43
+ order = np.argsort(-sims)[:k]
44
+ return [{"id": corpus[i].id, "text": corpus[i].text, "score_sparse": float(sims[i])} for i in order]
45
+
46
+ def dense_embed_model(name: str):
47
+ if SentenceTransformer is None:
48
+ raise RuntimeError("sentence-transformers not installed. Install with `pip install -e .`.")
49
+ return SentenceTransformer(name)
50
+
51
+ def _apply_flashrank(query: str, docs: List[Dict], model_name: str) -> List[Dict]:
52
+ try:
53
+ from crom_efficientllm.plugins.flashrank_reranker import flashrank_rerank
54
+ except Exception as e: # pragma: no cover
55
+ raise RuntimeError("FlashRank plugin not available. Install extras: pip install .[plugins]") from e
56
+ ranked = flashrank_rerank(query, docs, model_name=model_name)
57
+ # Normalize plugin score to 0..1 and put into score_final
58
+ scores = np.array([d.get("score_flashrank", 0.0) for d in ranked], dtype=np.float32)
59
+ if scores.size and float(scores.max() - scores.min()) > 1e-12:
60
+ s = (scores - scores.min()) / (scores.max() - scores.min())
61
+ else:
62
+ s = np.zeros_like(scores)
63
+ for i, d in enumerate(ranked):
64
+ d["score_final"] = float(s[i])
65
+ return ranked
66
+
67
+ def _apply_llmlingua(text: str, ratio: float) -> str:
68
+ try:
69
+ from crom_efficientllm.plugins.llmlingua_compressor import compress_prompt
70
+ except Exception as e: # pragma: no cover
71
+ raise RuntimeError("LLMLingua plugin not available. Install extras: pip install .[plugins]") from e
72
+ return compress_prompt(text, target_ratio=ratio)
73
+
74
+ def _save_evidently_report(all_embs: List[List[float]], out_html: str) -> None:
75
+ try:
76
+ from crom_efficientllm.plugins.evidently_drift import drift_report
77
+ except Exception as e: # pragma: no cover
78
+ raise RuntimeError("Evidently plugin not available. Install extras: pip install .[plugins]") from e
79
+ n = len(all_embs)
80
+ if n < 4:
81
+ return
82
+ ref = all_embs[: n // 2]
83
+ cur = all_embs[n // 2 :]
84
+ rep = drift_report(ref, cur)
85
+ rep.save_html(out_html)
86
+
87
+ def mock_llm_generate(prompt: str) -> str:
88
+ time.sleep(0.005) # simulate small latency
89
+ return "[MOCK] " + prompt[:160]
90
+
91
+ def e2e(args: argparse.Namespace) -> None:
92
+ corpus = build_corpus(args.corpus)
93
+ queries = [r["query"] for r in load_jsonl(args.queries)]
94
+ embed = dense_embed_model(args.model)
95
+ all_embs: List[List[float]] = []
96
+
97
+ t0 = time.perf_counter()
98
+ all_rows = []
99
+ for q in queries:
100
+ t_s = time.perf_counter()
101
+ cands = sparse_retrieval(q, corpus, k=args.k)
102
+ t_sparse = (time.perf_counter() - t_s) * 1000
103
+
104
+ t_r = time.perf_counter()
105
+ if args.use_flashrank:
106
+ reranked = _apply_flashrank(q, cands, args.flashrank_model)
107
+ else:
108
+ reranked = hybrid_rerank(q, cands, embed, alpha=args.alpha)
109
+ t_rerank = (time.perf_counter() - t_r) * 1000
110
+
111
+ # token heuristic + budget pack
112
+ chunks = [
113
+ Chunk(text=d["text"], score=d.get("score_final", d.get("score_sparse", 0.0)), tokens=max(1, len(d["text"]) // 4))
114
+ for d in reranked
115
+ ]
116
+ budget_tokens = int(sum(c.tokens for c in chunks) * args.budget)
117
+ t_p = time.perf_counter()
118
+ packed = budget_pack(chunks, budget=budget_tokens)
119
+ t_pack = (time.perf_counter() - t_p) * 1000
120
+
121
+ prompt = "\n\n".join(c.text for c in packed) + f"\n\nQ: {q}\nA:"
122
+ if args.use_llmlingua:
123
+ prompt = _apply_llmlingua(prompt, ratio=args.compress_ratio)
124
+
125
+ # collect embeddings for drift snapshot (mean-pooled)
126
+ with np.errstate(all="ignore"):
127
+ if len(packed) > 0:
128
+ doc_embs = embed.encode([c.text for c in packed], convert_to_numpy=True)
129
+ vec = np.mean(doc_embs, axis=0).tolist()
130
+ all_embs.append(vec)
131
+
132
+ t_l = time.perf_counter()
133
+ _ = mock_llm_generate(prompt)
134
+ t_llm = (time.perf_counter() - t_l) * 1000
135
+
136
+ total = (time.perf_counter() - t_s) * 1000
137
+ all_rows.append({
138
+ "query": q,
139
+ "sparse_ms": t_sparse,
140
+ "rerank_ms": t_rerank,
141
+ "pack_ms": t_pack,
142
+ "llm_ms": t_llm,
143
+ "total_ms": total,
144
+ "packed_tokens": sum(c.tokens for c in packed),
145
+ "orig_tokens": sum(c.tokens for c in chunks),
146
+ "save_ratio": 1 - (sum(c.tokens for c in packed) / max(1, sum(c.tokens for c in chunks))),
147
+ "used_flashrank": bool(args.use_flashrank),
148
+ "used_llmlingua": bool(args.use_llmlingua),
149
+ })
150
+
151
+ elapsed = (time.perf_counter() - t0) * 1000
152
+ os.makedirs(args.out_dir, exist_ok=True)
153
+ out_path = os.path.join(args.out_dir, "e2e_results.jsonl")
154
+ with open(out_path, "w", encoding="utf-8") as f:
155
+ for r in all_rows:
156
+ f.write(json.dumps(r, ensure_ascii=False) + "\n")
157
+ print(f"saved results -> {out_path} ({len(all_rows)} queries) ; elapsed={elapsed:.2f}ms")
158
+
159
+ if args.use_evidently and all_embs:
160
+ html_path = os.path.join(args.out_dir, "evidently_report.html")
161
+ _save_evidently_report(all_embs, html_path)
162
+ print(f"evidently report -> {html_path}")
163
+
164
+ def budget_sweep(args: argparse.Namespace) -> None:
165
+ import itertools
166
+ corpus = build_corpus(args.corpus)
167
+ queries = [r["query"] for r in load_jsonl(args.queries)][: args.max_q]
168
+ embed = dense_embed_model(args.model)
169
+
170
+ budgets = [b / 100.0 for b in range(args.b_min, args.b_max + 1, args.b_step)]
171
+ rows = []
172
+ for q, b in itertools.product(queries, budgets):
173
+ cands = sparse_retrieval(q, corpus, k=args.k)
174
+ reranked = hybrid_rerank(q, cands, embed, alpha=args.alpha)
175
+ chunks = [Chunk(text=d["text"], score=d["score_final"], tokens=max(1, len(d["text"]) // 4)) for d in reranked]
176
+ budget_tokens = int(sum(c.tokens for c in chunks) * b)
177
+ packed = budget_pack(chunks, budget=budget_tokens)
178
+ rows.append({
179
+ "query": q,
180
+ "budget": b,
181
+ "packed_tokens": sum(c.tokens for c in packed),
182
+ "orig_tokens": sum(c.tokens for c in chunks),
183
+ "save_ratio": 1 - (sum(c.tokens for c in packed) / max(1, sum(c.tokens for c in chunks))),
184
+ "avg_score": float(np.mean([c.score for c in packed])) if packed else 0.0,
185
+ })
186
+
187
+ os.makedirs(args.out_dir, exist_ok=True)
188
+ out_path = os.path.join(args.out_dir, "budget_sweep.jsonl")
189
+ with open(out_path, "w", encoding="utf-8") as f:
190
+ for r in rows:
191
+ f.write(json.dumps(r, ensure_ascii=False) + "\n")
192
+ print(f"saved results -> {out_path} ; points={len(rows)}")
193
+
194
+ if args.save_plots:
195
+ try:
196
+ import matplotlib.pyplot as plt # noqa: F401
197
+ import matplotlib.pyplot as _plt
198
+ except Exception:
199
+ print("[warn] matplotlib not installed; install dev extras: pip install -e .[dev]")
200
+ else:
201
+ # Aggregate by budget
202
+ import collections
203
+ agg = collections.defaultdict(list)
204
+ for r in rows:
205
+ agg[r["budget"]].append(r)
206
+ budgets_sorted = sorted(agg.keys())
207
+ avg_save = [float(np.mean([x["save_ratio"] for x in agg[b]])) for b in budgets_sorted]
208
+ avg_score = [float(np.mean([x["avg_score"] for x in agg[b]])) for b in budgets_sorted]
209
+
210
+ _plt.figure()
211
+ _plt.plot([b * 100 for b in budgets_sorted], [s * 100 for s in avg_save], marker="o")
212
+ _plt.xlabel("Budget (%)")
213
+ _plt.ylabel("Avg Save Ratio (%)")
214
+ _plt.title("Budget Sweep: Save Ratio vs Budget")
215
+ _plt.grid(True)
216
+ _plt.tight_layout()
217
+ _plt.savefig(os.path.join(args.out_dir, "budget_sweep.png"))
218
+
219
+ _plt.figure()
220
+ _plt.plot([s * 100 for s in avg_save], avg_score, marker="o")
221
+ _plt.xlabel("Save Ratio (%)")
222
+ _plt.ylabel("Avg Score (packed)")
223
+ _plt.title("Pareto: Quality vs Savings")
224
+ _plt.grid(True)
225
+ _plt.tight_layout()
226
+ _plt.savefig(os.path.join(args.out_dir, "budget_pareto.png"))
227
+ print("plots ->", os.path.join(args.out_dir, "budget_sweep.png"), ",", os.path.join(args.out_dir, "budget_pareto.png"))
228
+
229
+ def scaling(args: argparse.Namespace) -> None:
230
+ def make_synth(n: int, seed: int = 42):
231
+ rng = np.random.default_rng(seed)
232
+ tokens = np.clip(rng.lognormal(4.0, 0.6, n).astype(int), 5, 2000)
233
+ score = rng.normal(0, 1, n)
234
+ return [Chunk(text="x" * int(t * 4), score=float(s), tokens=int(t)) for s, t in zip(score, tokens)]
235
+
236
+ for n in [1000, 5000, 10000, 20000, 50000, 100000]:
237
+ if n > args.n_max:
238
+ break
239
+ chunks = make_synth(n)
240
+ budget = int(sum(c.tokens for c in chunks) * args.budget)
241
+ t0 = time.perf_counter()
242
+ _ = budget_pack(chunks, budget)
243
+ ms = (time.perf_counter() - t0) * 1000
244
+ print(f"n={n:6d} budget={args.budget:.0%} time={ms:8.2f} ms")
245
+
246
+ def dp_curve(args: argparse.Namespace) -> None:
247
+ def make_synth(n: int, seed: int = 123, corr: float = 0.6):
248
+ rng = np.random.default_rng(seed)
249
+ true_rel = rng.normal(0, 1, n)
250
+ noise = rng.normal(0, 1, n) * np.sqrt(1 - corr**2)
251
+ score = corr * true_rel + noise
252
+ tokens = np.clip(rng.lognormal(4.0, 0.6, n).astype(int), 5, 2000)
253
+ chunks = [Chunk(text="x" * int(t * 4), score=float(s), tokens=int(t)) for s, t in zip(score, tokens)]
254
+ return chunks, true_rel
255
+
256
+ def optimal(chunks: Sequence[Chunk], values: np.ndarray, budget: int) -> float:
257
+ B = budget
258
+ dp = np.zeros(B + 1, dtype=np.float32)
259
+ for i, ch in enumerate(chunks):
260
+ wt = ch.tokens
261
+ val = max(0.0, float(values[i]))
262
+ for b in range(B, wt - 1, -1):
263
+ dp[b] = max(dp[b], dp[b - wt] + val)
264
+ return float(dp[B])
265
+
266
+ chunks, true_rel = make_synth(args.n)
267
+ total = sum(c.tokens for c in chunks)
268
+ budgets = [int(total * b / 100.0) for b in range(args.b_min, args.b_max + 1, args.b_step)]
269
+ out_rows = []
270
+
271
+ for B in budgets:
272
+ sel = budget_pack(chunks, B)
273
+ idx_map = {id(c): i for i, c in enumerate(chunks)}
274
+ rel_bp = float(np.sum([max(0.0, true_rel[idx_map[id(c)]]) for c in sel]))
275
+ rel_opt = optimal(chunks[: args.n_opt], true_rel[: args.n_opt], min(B, sum(c.tokens for c in chunks[: args.n_opt])))
276
+ pct = rel_bp / max(rel_opt, 1e-9)
277
+ out_rows.append({"budget": B, "pct": pct, "rel_bp": rel_bp, "rel_opt": rel_opt})
278
+ print(f"budget={B:8d} rel_bp={rel_bp:8.3f} rel_opt≈{rel_opt:8.3f} pct≈{pct*100:5.1f}% (subset n={args.n_opt})")
279
+
280
+ if args.save_plots:
281
+ try:
282
+ import matplotlib.pyplot as plt # noqa: F401
283
+ import matplotlib.pyplot as _plt
284
+ except Exception:
285
+ print("[warn] matplotlib not installed; install dev extras: pip install -e .[dev]")
286
+ else:
287
+ _plt.figure()
288
+ xs = [r["budget"] * 100.0 / total for r in out_rows]
289
+ ys = [r["pct"] * 100 for r in out_rows]
290
+ _plt.plot(xs, ys, marker="o")
291
+ _plt.xlabel("Budget (%)")
292
+ _plt.ylabel("% of optimal (subset)")
293
+ _plt.title("DP Curve: Greedy vs Optimal")
294
+ _plt.grid(True)
295
+ _plt.tight_layout()
296
+ os.makedirs(args.out_dir, exist_ok=True)
297
+ _plt.savefig(os.path.join(args.out_dir, "dp_curve.png"))
298
+ print("plot ->", os.path.join(args.out_dir, "dp_curve.png"))
299
+
300
+ def compare_haystack(args: argparse.Namespace) -> None:
301
+ try:
302
+ from haystack.nodes import BM25Retriever, SentenceTransformersRetriever
303
+ from haystack.document_stores import InMemoryDocumentStore
304
+ except Exception as e: # pragma: no cover
305
+ raise RuntimeError("Install extras: pip install .[haystack]") from e
306
+
307
+ corpus = build_corpus(args.corpus)
308
+ docs = [{"content": d.text, "meta": {"id": d.id}} for d in corpus]
309
+ store = InMemoryDocumentStore(use_bm25=True)
310
+ store.write_documents(docs)
311
+
312
+ bm25 = BM25Retriever(document_store=store)
313
+ dretr = SentenceTransformersRetriever(document_store=store, model_name_or_path=args.model)
314
+
315
+ queries = [r["query"] for r in load_jsonl(args.queries)][: args.max_q]
316
+ for q in queries:
317
+ t0 = time.perf_counter()
318
+ bm = bm25.retrieve(q, top_k=args.k)
319
+ dn = dretr.retrieve(q, top_k=args.k)
320
+ ms = (time.perf_counter() - t0) * 1000
321
+ print(f"{q[:40]:40s} bm25={len(bm):3d} dense={len(dn):3d} time={ms:7.2f} ms")
322
+
323
+ def main() -> None:
+     ap = argparse.ArgumentParser(prog="crom-bench")
+     sub = ap.add_subparsers(dest="cmd", required=True)
+
+     p = sub.add_parser("e2e", help="end-to-end: retrieval → rerank → pack → mock LLM")
+     p.add_argument("--corpus", default="examples/corpus/sample_docs.jsonl")
+     p.add_argument("--queries", default="examples/corpus/sample_queries.jsonl")
+     p.add_argument("--model", default="sentence-transformers/all-MiniLM-L6-v2")
+     p.add_argument("--k", type=int, default=200)
+     p.add_argument("--alpha", type=float, default=0.5)
+     p.add_argument("--budget", type=float, default=0.3)
+     # plugins
+     p.add_argument("--use-flashrank", action="store_true")
+     p.add_argument("--flashrank-model", default="ms-marco-TinyBERT-L-2-v2")
+     p.add_argument("--use-llmlingua", action="store_true")
+     p.add_argument("--compress-ratio", type=float, default=0.6)
+     p.add_argument("--use-evidently", action="store_true")
+
+     p.add_argument("--out-dir", default="benchmarks/out")
+     p.set_defaults(func=e2e)
+
+     p2 = sub.add_parser("sweep", help="budget sweep + Pareto csv")
+     p2.add_argument("--corpus", default="examples/corpus/sample_docs.jsonl")
+     p2.add_argument("--queries", default="examples/corpus/sample_queries.jsonl")
+     p2.add_argument("--model", default="sentence-transformers/all-MiniLM-L6-v2")
+     p2.add_argument("--k", type=int, default=200)
+     p2.add_argument("--alpha", type=float, default=0.5)
+     p2.add_argument("--b-min", type=int, default=10)
+     p2.add_argument("--b-max", type=int, default=90)
+     p2.add_argument("--b-step", type=int, default=10)
+     p2.add_argument("--max-q", type=int, default=20)
+     p2.add_argument("--out-dir", default="benchmarks/out")
+     p2.add_argument("--save-plots", action="store_true")
+     p2.set_defaults(func=budget_sweep)
+
+     p3 = sub.add_parser("scale", help="scaling runtime with synthetic data")
+     p3.add_argument("--n-max", type=int, default=100000)
+     p3.add_argument("--budget", type=float, default=0.3)
+     p3.set_defaults(func=scaling)
+
+     p4 = sub.add_parser("dp-curve", help="% of optimal vs budget (synthetic)")
+     p4.add_argument("--n", type=int, default=2000)
+     p4.add_argument("--n-opt", type=int, default=200)
+     p4.add_argument("--b-min", type=int, default=10)
+     p4.add_argument("--b-max", type=int, default=90)
+     p4.add_argument("--b-step", type=int, default=10)
+     p4.add_argument("--out-dir", default="benchmarks/out")
+     p4.add_argument("--save-plots", action="store_true")
+     p4.set_defaults(func=dp_curve)
+
+     p5 = sub.add_parser("haystack-compare", help="compare BM25 vs dense retrievers (Haystack)")
+     p5.add_argument("--corpus", default="examples/corpus/sample_docs.jsonl")
+     p5.add_argument("--queries", default="examples/corpus/sample_queries.jsonl")
+     p5.add_argument("--model", default="sentence-transformers/all-MiniLM-L6-v2")
+     p5.add_argument("--k", type=int, default=50)
+     p5.add_argument("--max-q", type=int, default=10)
+     p5.set_defaults(func=compare_haystack)
+
+     args = ap.parse_args()
+     args.func(args)
+
+ if __name__ == "__main__":
+     main()
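
Quick smoke test of the subcommand wiring (a minimal sketch; the `crom_efficientllm.benchmarks.cli` import path is an assumption about where this CLI module lives):

import sys
from crom_efficientllm.benchmarks.cli import main  # assumed module path for this file

if __name__ == "__main__":
    # equivalent to: crom-bench dp-curve --n 500 --n-opt 50 --save-plots
    sys.argv = ["crom-bench", "dp-curve", "--n", "500", "--n-opt", "50", "--save-plots"]
    main()
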
src/crom_efficientllm/demo.py ADDED
@@ -0,0 +1,91 @@
+ """
+ Demo & Metrics Server for CRoM-EfficientLLM
+ ------------------------------------------
+ - `crom-demo demo`  : run sample pipeline
+ - `crom-demo serve` : start Flask + Prometheus metrics on :8000
+ """
+ from __future__ import annotations
+
+ import argparse
+ from typing import List
+
+ from flask import Flask, Response
+ from prometheus_client import Counter, Gauge, generate_latest, CONTENT_TYPE_LATEST
+
+ from crom_efficientllm.budget_packer.packer import budget_pack, pack_summary, Chunk
+ from crom_efficientllm.rerank_engine.rerank import hybrid_rerank
+ from crom_efficientllm.drift_estimator.estimator import DriftEstimator, DriftMode
+
+ # ---- Prometheus metrics ----
+ TOKENS_SAVED = Gauge("crom_tokens_saved", "Tokens saved by budget packer")
+ DRIFT_ALERTS = Counter("crom_drift_alerts_total", "Total drift alerts emitted")
+
+ class DummyEmbed:
+     """Toy embedder: fixed-width vectors of character codes (demo only)."""
+
+     DIM = 16
+
+     def encode(self, text, convert_to_numpy=False):
+         if isinstance(text, (list, tuple)):  # batch call from the rerank engine
+             return [self.encode(t) for t in text]
+         vec = [ord(c) % 7 for c in str(text)[: self.DIM]]
+         return vec + [0] * (self.DIM - len(vec))  # pad so all vectors share one dimension
+
+ def run_demo() -> None:
+     chunks: List[Chunk] = [
+         Chunk(text="AI ethics is crucial", score=0.9, tokens=50),
+         Chunk(text="Unrelated text", score=0.2, tokens=40),
+         Chunk(text="Drift detection research", score=0.8, tokens=60),
+     ]
+     packed = budget_pack(chunks, budget=80)
+     summary = pack_summary(packed)
+     print("📦 Packed:", [c.text for c in packed], summary)
+
+     docs = [{"text": "AI drift measurement"}, {"text": "Cooking recipes"}]
+     reranked = hybrid_rerank("AI ethics", docs, DummyEmbed(), alpha=0.5)
+     print("🔎 Reranked:", [d["text"] for d in reranked])
+
+     de = DriftEstimator(threshold=0.5, mode=DriftMode.L2)
+     print("⚙️ Drift state:", de.state())
+     alert1, *_ = de.update([1, 2, 3])
+     print("⚠️ Drift alert?", alert1)
+     alert2, *_ = de.update([10, 10, 10])
+     print("⚠️ Drift alert?", alert2)
+     print("⚙️ Drift state:", de.state())
+
+     # Update metrics (reuse the alerts above instead of re-running the estimator,
+     # which would append the same vectors to its history a second time)
+     TOKENS_SAVED.set(max(0, sum(c.tokens for c in chunks) - summary["tokens"]))
+     if alert1:
+         DRIFT_ALERTS.inc()
+     if alert2:
+         DRIFT_ALERTS.inc()
+
+ def create_app() -> Flask:
+     app = Flask(__name__)
+
+     @app.get("/healthz")
+     def healthz():
+         return {"status": "ok"}
+
+     @app.get("/metrics")
+     def metrics():
+         return Response(generate_latest(), mimetype=CONTENT_TYPE_LATEST)
+
+     return app
+
+ def main() -> None:
+     parser = argparse.ArgumentParser(prog="crom-demo")
+     sub = parser.add_subparsers(dest="cmd", required=True)
+     sub.add_parser("demo", help="run sample pipeline")
+
+     pserve = sub.add_parser("serve", help="start metrics server on :8000")
+     pserve.add_argument("--host", default="0.0.0.0")
+     pserve.add_argument("--port", type=int, default=8000)
+
+     args = parser.parse_args()
+
+     if args.cmd == "demo":
+         run_demo()
+         return
+
+     if args.cmd == "serve":
+         app = create_app()
+         app.run(host=args.host, port=args.port)
+         return
+
+ if __name__ == "__main__":
+     main()
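
The endpoints can be exercised without binding a port via Flask's built-in test client; a minimal sketch using only the code above:

from crom_efficientllm.demo import create_app

app = create_app()
client = app.test_client()
print(client.get("/healthz").get_json())  # {'status': 'ok'}
print(client.get("/metrics").data[:80])   # Prometheus exposition text (crom_tokens_saved, ...)
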
src/crom_efficientllm/drift_estimator/__init__.py ADDED
@@ -0,0 +1,2 @@
+ from .estimator import DriftEstimator, DriftMode
+ __all__ = ["DriftEstimator", "DriftMode"]
src/crom_efficientllm/drift_estimator/estimator.py ADDED
@@ -0,0 +1,74 @@
+ """
+ Drift Estimator
+ ---------------
+ Monitors embedding shift using L2 or cosine distance.
+ Supports EWMA smoothing and exposes state for dashboards.
+ """
+ from __future__ import annotations
+
+ from dataclasses import dataclass, field
+ from enum import Enum
+ from typing import List, Optional, Tuple
+
+ import numpy as np
+
+ class DriftMode(str, Enum):
+     L2 = "l2"
+     COSINE = "cosine"
+
+ @dataclass
+ class DriftEstimator:
+     threshold: float = 0.2
+     mode: DriftMode = DriftMode.L2
+     ewma_alpha: float = 0.3  # smoothing for stability
+
+     history: List[np.ndarray] = field(default_factory=list)
+     distances: List[float] = field(default_factory=list)
+     ewma: Optional[float] = None
+
+     def _distance(self, a: np.ndarray, b: np.ndarray) -> float:
+         a = np.asarray(a, dtype=np.float32).ravel()
+         b = np.asarray(b, dtype=np.float32).ravel()
+         if self.mode == DriftMode.L2:
+             return float(np.linalg.norm(a - b))
+         # cosine distance = 1 - cosine similarity
+         denom = (np.linalg.norm(a) * np.linalg.norm(b)) + 1e-12
+         return float(1.0 - float(np.dot(a, b)) / denom)
+
+     def update(self, embedding) -> Tuple[bool, float, float]:
+         """
+         Args:
+             embedding: vector representation of current response
+         Returns:
+             (drift_alert, distance, ewma)
+         """
+         emb = np.asarray(embedding, dtype=np.float32)
+         if emb.ndim != 1:
+             emb = emb.ravel()
+
+         if not self.history:
+             self.history.append(emb)
+             self.ewma = 0.0
+             self.distances.append(0.0)
+             return (False, 0.0, 0.0)
+
+         last = self.history[-1]
+         dist = self._distance(emb, last)
+         self.history.append(emb)
+         self.distances.append(dist)
+
+         # EWMA update
+         if self.ewma is None:
+             self.ewma = dist
+         else:
+             self.ewma = self.ewma_alpha * dist + (1 - self.ewma_alpha) * self.ewma
+
+         return (bool(self.ewma > self.threshold), float(dist), float(self.ewma))
+
+     def state(self) -> dict:
+         return {
+             "count": len(self.history),
+             "last_distance": self.distances[-1] if self.distances else 0.0,
+             "ewma": self.ewma or 0.0,
+             "mode": self.mode.value,
+             "threshold": self.threshold,
+         }
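
Because alerts key off the EWMA rather than the raw distance, a single outlier rarely trips them; the smoothed distance has to stay above `threshold`. A small sketch of that behavior:

import numpy as np
from crom_efficientllm.drift_estimator.estimator import DriftEstimator, DriftMode

de = DriftEstimator(threshold=0.15, mode=DriftMode.COSINE, ewma_alpha=0.3)
rng = np.random.default_rng(0)
base = rng.normal(size=64)
for step in range(5):
    emb = base + 0.3 * step * rng.normal(size=64)  # inject growing drift
    alert, dist, ewma = de.update(emb)
    print(f"step={step} alert={alert} dist={dist:.3f} ewma={ewma:.3f}")
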
src/crom_efficientllm/plugins/evidently_drift.py ADDED
@@ -0,0 +1,16 @@
+ from __future__ import annotations
+
+ from typing import List
+
+ try:
+     import pandas as pd
+     from evidently.metric_preset import DataDriftPreset
+     from evidently.report import Report
+ except Exception as e:  # pragma: no cover
+     raise RuntimeError("evidently not installed. Install extras: pip install .[plugins]") from e
+
+ def drift_report(ref: List[List[float]], cur: List[List[float]]):
+     ref_df = pd.DataFrame(ref)
+     cur_df = pd.DataFrame(cur)
+     rep = Report(metrics=[DataDriftPreset()])
+     rep.run(reference_data=ref_df, current_data=cur_df)
+     return rep
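
Usage sketch, assuming the plugins extra is installed (Report.save_html is Evidently's standard export):

from crom_efficientllm.plugins.evidently_drift import drift_report

ref = [[0.0, 1.0], [0.1, 0.9], [0.2, 0.8]]  # reference embeddings
cur = [[0.9, 0.1], [0.8, 0.2], [0.7, 0.3]]  # current embeddings
drift_report(ref, cur).save_html("drift_report.html")
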
src/crom_efficientllm/plugins/flashrank_reranker.py ADDED
@@ -0,0 +1,14 @@
+ from __future__ import annotations
+
+ from typing import Dict, List
+
+ try:
+     from flashrank import Ranker, RerankRequest
+ except Exception as e:  # pragma: no cover
+     raise RuntimeError("flashrank not installed. Install extras: pip install .[plugins]") from e
+
+ def flashrank_rerank(query: str, docs: List[Dict[str, str]], model_name: str = "ms-marco-TinyBERT-L-2-v2") -> List[Dict]:
+     """Rerank docs with FlashRank; returns docs sorted by score with `score_flashrank` attached."""
+     ranker = Ranker(model_name=model_name)
+     passages = [{"id": i, "text": d["text"]} for i, d in enumerate(docs)]
+     results = ranker.rerank(RerankRequest(query=query, passages=passages))
+     # results come back sorted by descending score; map them onto the original docs
+     return [docs[r["id"]] | {"score_flashrank": float(r["score"])} for r in results]
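
Usage sketch (FlashRank downloads the model on first use):

from crom_efficientllm.plugins.flashrank_reranker import flashrank_rerank

docs = [{"text": "AI ethics is crucial"}, {"text": "Cooking recipes"}]
for d in flashrank_rerank("AI ethics", docs):
    print(d["text"], round(d["score_flashrank"], 4))
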
src/crom_efficientllm/plugins/llmlingua_compressor.py ADDED
@@ -0,0 +1,11 @@
+ from __future__ import annotations
+
+ try:
+     from llmlingua import PromptCompressor
+ except Exception as e:  # pragma: no cover
+     raise RuntimeError("llmlingua not installed. Install extras: pip install .[plugins]") from e
+
+ def compress_prompt(text: str, target_ratio: float = 0.5) -> str:
+     """Compress `text`, keeping roughly `target_ratio` of its tokens."""
+     pc = PromptCompressor()
+     out = pc.compress_prompt(text, rate=target_ratio)  # older llmlingua releases use `ratio`/`target_token`
+     return out["compressed_prompt"] if isinstance(out, dict) and "compressed_prompt" in out else str(out)
src/crom_efficientllm/rerank_engine/__init__.py ADDED
@@ -0,0 +1,2 @@
+ from .rerank import hybrid_rerank
+ __all__ = ["hybrid_rerank"]
src/crom_efficientllm/rerank_engine/rerank.py ADDED
@@ -0,0 +1,92 @@
+ """
+ Hybrid Rerank Engine
+ --------------------
+ Combines sparse (TF-IDF cosine) and dense (embedding cosine) scores with
+ min-max normalization for robust fusion.
+ """
+ from __future__ import annotations
+
+ from typing import Dict, List, Sequence
+
+ import numpy as np
+ from sklearn.feature_extraction.text import TfidfVectorizer
+ from sklearn.metrics.pairwise import cosine_similarity
+
+ def _to_numpy(x):
+     arr = np.asarray(x)
+     return arr.astype(np.float32)
+
+ def _batch_encode(embed_model, texts: Sequence[str]) -> np.ndarray:
+     # Try the common sentence-transformers API: encode(list, convert_to_numpy=True)
+     if hasattr(embed_model, "encode"):
+         try:
+             return _to_numpy(embed_model.encode(list(texts), convert_to_numpy=True))
+         except TypeError:
+             # Fallback: per-text encode
+             return _to_numpy([embed_model.encode(t) for t in texts])
+     raise TypeError("embed_model must provide .encode()")
+
+ def _minmax(x: np.ndarray) -> np.ndarray:
+     if x.size == 0:
+         return x
+     mn, mx = float(np.min(x)), float(np.max(x))
+     if mx - mn <= 1e-12:
+         return np.zeros_like(x)
+     return (x - mn) / (mx - mn)
+
+ def hybrid_rerank(
+     query: str,
+     docs: List[Dict[str, str]],
+     embed_model,
+     alpha: float = 0.5,
+ ) -> List[Dict[str, object]]:
+     """
+     Args:
+         query: query string
+         docs: list of {"text": str}
+         embed_model: model with .encode() -> vector(s)
+         alpha: weight between sparse/dense in [0, 1]
+     Returns:
+         ranked list of enriched docs with scores {score_sparse, score_dense, score_final}
+     """
+     if not 0.0 <= alpha <= 1.0:
+         raise ValueError("alpha must be in [0, 1]")
+     if not docs:
+         return []
+
+     texts = [d.get("text", "") for d in docs]
+
+     # Sparse: TF-IDF cosine
+     tfidf = TfidfVectorizer(ngram_range=(1, 2), min_df=1).fit(texts)
+     Q = tfidf.transform([query])
+     D = tfidf.transform(texts)
+     sparse_scores = cosine_similarity(Q, D).ravel()
+
+     # Dense: cosine similarity between L2-normalized embeddings
+     q_emb = _to_numpy(embed_model.encode(query))
+     d_embs = _batch_encode(embed_model, texts)
+
+     def _l2norm(a):
+         n = np.linalg.norm(a, axis=-1, keepdims=True) + 1e-12
+         return a / n
+
+     qn = _l2norm(q_emb.reshape(1, -1))
+     dn = _l2norm(d_embs)
+     dense_scores = cosine_similarity(qn, dn).ravel()
+
+     # Min-max to [0, 1] before fusion to avoid scale issues
+     s_sparse = _minmax(sparse_scores)
+     s_dense = _minmax(dense_scores)
+
+     final_scores = alpha * s_sparse + (1 - alpha) * s_dense
+     order = np.argsort(-final_scores)
+
+     ranked = []
+     for i in order:
+         item = dict(docs[i])
+         item.update(
+             score_sparse=float(s_sparse[i]),
+             score_dense=float(s_dense[i]),
+             score_final=float(final_scores[i]),
+         )
+         ranked.append(item)
+     return ranked
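
Any object exposing .encode() works as the embedder; a toy sketch (TinyEmbed is illustrative only, not part of the package):

from crom_efficientllm.rerank_engine.rerank import hybrid_rerank

class TinyEmbed:
    def encode(self, text, convert_to_numpy=False):
        if isinstance(text, (list, tuple)):  # batch call from _batch_encode
            return [self.encode(t) for t in text]
        vec = [ord(c) % 7 for c in str(text)[:16]]
        return vec + [0] * (16 - len(vec))  # fixed 16-dim vectors

docs = [{"text": "AI drift measurement"}, {"text": "Cooking recipes"}]
for d in hybrid_rerank("AI drift", docs, TinyEmbed(), alpha=0.6):
    print(d["text"], round(d["score_final"], 3))
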
tests/test_drift.py ADDED
@@ -0,0 +1,8 @@
+ from crom_efficientllm.drift_estimator.estimator import DriftEstimator, DriftMode
+
+ def test_drift_triggers():
+     de = DriftEstimator(threshold=0.1, mode=DriftMode.L2)
+     alert, dist, ewma = de.update([0, 0, 0])
+     assert alert is False
+     alert, dist, ewma = de.update([1, 0, 0])
+     assert isinstance(alert, bool)
tests/test_packer.py ADDED
@@ -0,0 +1,15 @@
+ from crom_efficientllm.budget_packer.packer import budget_pack, Chunk
+
+ def test_budget_pack_respects_budget():
+     chunks = [Chunk("a", 1.0, 60), Chunk("b", 0.9, 50), Chunk("c", 0.5, 20)]
+     sel = budget_pack(chunks, budget=70)
+     assert sum(c.tokens for c in sel) <= 70
+
+ def test_budget_pack_sorting_stable():
+     chunks = [
+         {"text": "x", "score": 0.9, "tokens": 30},
+         {"text": "y", "score": 0.9, "tokens": 20},
+         {"text": "z", "score": 0.8, "tokens": 10},
+     ]
+     sel = budget_pack(chunks, budget=60)
+     assert [c.text for c in sel] == ["y", "x"]
tests/test_rerank.py ADDED
@@ -0,0 +1,11 @@
+ from crom_efficientllm.rerank_engine.rerank import hybrid_rerank
+
+ class Dummy:
+     def encode(self, text, convert_to_numpy=False):
+         if isinstance(text, (list, tuple)):  # batch call from _batch_encode
+             return [self.encode(t) for t in text]
+         vec = [ord(c) % 5 for c in str(text)[:8]]
+         return vec + [0] * (8 - len(vec))  # pad to a fixed dimension
+
+ def test_hybrid_rerank_returns_scores():
+     docs = [{"text": "alpha"}, {"text": "beta"}]
+     out = hybrid_rerank("alp", docs, Dummy(), alpha=0.5)
+     assert len(out) == 2
+     assert {"score_sparse", "score_dense", "score_final"} <= set(out[0].keys())