# CRoM-EfficientLLM Full Project Report

## 1. Project Structure (Directory Tree)

```
CRoM-EfficientLLM/
├── .github/
│   └── workflows/
│       ├── ci.yml
│       └── release.yml
├── benchmarks/
│   ├── efficiency_eval.py
│   ├── longbench_eval.py
│   └── sample_results.json
├── dashboard/
│   ├── grafana_dashboard.json
│   └── prometheus_config.yml
├── docs/
│   ├── architecture.md
│   └── versioning.md
├── examples/
│   └── corpus/
│       ├── sample_docs.jsonl
│       └── sample_queries.jsonl
├── scripts/
│   ├── gen_release_notes.py
│   └── release.sh
├── src/
│   └── crom_efficientllm/
│       ├── budget_packer/
│       │   ├── __init__.py
│       │   └── packer.py
│       ├── drift_estimator/
│       │   ├── __init__.py
│       │   └── estimator.py
│       ├── plugins/
│       │   ├── evidently_drift.py
│       │   ├── flashrank_reranker.py
│       │   └── llmlingua_compressor.py
│       ├── rerank_engine/
│       │   ├── __init__.py
│       │   └── rerank.py
│       ├── __init__.py
│       ├── budget_packer.py
│       ├── capsule_logger.py
│       ├── cli.py
│       ├── cross_encoder.py
│       ├── demo.py
│       └── server.py
├── tests/
│   ├── test_drift.py
│   ├── test_packer.py
│   └── test_rerank.py
├── .gitignore
├── CHANGELOG.md
├── crom 1.0.1수정 업데이트 상세보고서.md
├── LICENSE
├── pyproject.toml
├── README.md
├── release_notes.md
└── requirements.txt
```

## 2. Detailed File Contents

---

### **File:** `D:\\Sanctum\\CRoM-EfficientLLM\\.github\\workflows\\ci.yml`

```yaml
name: ci
on:
  push:
    branches: [ main ]
  pull_request:
jobs:
  test:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        python-version: ["3.9", "3.10", "3.11", "3.12"]
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: ${{ matrix.python-version }}
      - run: pip install -e .[dev]
      - run: pre-commit run --all-files || true
      - run: ruff --version && black --version
      - run: pytest -q
```

---

### **File:** `D:\\Sanctum\\CRoM-EfficientLLM\\.github\\workflows\\release.yml`

```yaml
name: release
on:
  push:
    tags:
      - 'v*'
jobs:
  release:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0
      - uses: actions/setup-python@v5
        with:
          python-version: '3.11'
      - run: pip install -e .[dev]
      - run: pytest -q
      - name: Build distribution
        run: |
          python -m pip install build
          python -m build
      - name: Generate release notes from CHANGELOG
        run: |
          python scripts/gen_release_notes.py "$GITHUB_REF_NAME"
      - name: Publish GitHub Release
        uses: softprops/action-gh-release@v2
        with:
          name: ${{ github.ref_name }}
          body_path: release_notes.md
          files: |
            dist/*.whl
            dist/*.tar.gz
```

---

### **File:** `D:\\Sanctum\\CRoM-EfficientLLM\\.gitignore`

```
# Python
__pycache__/
*.py[cod]
*.egg-info/
.env
.venv/
virtualenv/
.idea/
.vscode/
.ipynb_checkpoints/
dist/
build/
.coverage
.pytest_cache/

# OS
.DS_Store
Thumbs.db
```

---

### **File:** `D:\\Sanctum\\CRoM-EfficientLLM\\CHANGELOG.md`

```markdown
# Changelog

## [1.0.1] - 2025-09-06
### Added
- Implemented core modules from scratch based on design documents.
- Implemented FastAPI server with `/process` endpoint (`src/crom_efficientllm/server.py`).
- Added `enhanced_greedy_pack` with detailed statistics for budget packing (`src/crom_efficientllm/budget_packer.py`).
- Implemented `SafeCrossEncoderManager` for robust and observable Cross-Encoder handling (`src/crom_efficientllm/cross_encoder.py`).
- Added `ExplainCapsuleLogger` for structured JSONL logging of all processing events (`src/crom_efficientllm/capsule_logger.py`).

### Changed
- Major version bump to reflect the first functional implementation of core logic.

## [0.2.1] - 2025-09-02
### Added
- CLI `--save-plots` option for `sweep` and `dp-curve`; saves PNG charts to `benchmarks/out/` (or `--out-dir`).
- README Quick Examples mention of plotting flag. - This CHANGELOG. ### Changed - Dev tooling: recommend `matplotlib` via dev extra for plotting. ## [0.2.0] - 2025-09-02 ### Added - GitHub Actions CI (3.9–3.12), pre-commit(ruff/black). - `crom-bench` CLI: `e2e`, `sweep`, `scale`, `dp-curve`, `haystack-compare`. - Plugins: FlashRank/LLMLingua/Evidently (optional extras). - Example corpus & queries (JSONL). ## [0.1.0] - 2025-09-02 - Initial packaging; budget packer, hybrid rerank, drift estimator, demo & metrics. ``` --- ### **File:** `D:\\Sanctum\\CRoM-EfficientLLM\\LICENSE` ``` Apache License Version 2.0, January 2004 http://www.apache.org/licenses/ TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION 1. Definitions. "License" shall mean the terms and conditions for use, reproduction, and distribution as defined by Sections 1 through 9 of this document. "Licensor" shall mean the copyright owner or entity authorized by the copyright owner that is granting the License. "Legal Entity" shall mean the union of the acting entity and all other entities that control, are controlled by, or are under common control with that entity. For the purposes of this definition, "control" means (i) the power, direct or indirect, to cause the direction or management of such entity, whether by contract or otherwise, or (ii) ownership of fifty percent (50%) or more of the outstanding shares, or (iii) beneficial ownership of such entity. "You" (or "Your") shall mean an individual or Legal Entity exercising permissions granted by this License. "Source" form shall mean the preferred form for making modifications, including but not limited to software source code, documentation source, and configuration files. "Object" form shall mean any form resulting from mechanical transformation or translation of a Source form, including but not limited to compiled object code, generated documentation, and conversions to other media types. "Work" shall mean the work of authorship, whether in Source or Object form, made available under the License, as indicated by a copyright notice that is included in or attached to the work (an example is provided in the Appendix below). "Derivative Works" shall mean any work, whether in Source or Object form, that is based on (or derived from) the Work and for which the editorial revisions, annotations, elaborations, or other modifications represent, as a whole, an original work of authorship. For the purposes of this License, Derivative Works shall not include works that remain separable from, or merely link (or bind by name) to the interfaces of, the Work and Derivative Works thereof. "Contribution" shall mean any work of authorship, including the original version of the Work and any modifications or additions to that Work or Derivative Works thereof, that is intentionally submitted to Licensor for inclusion in the Work by the copyright owner or by an individual or Legal Entity authorized to submit on behalf of the copyright owner. For the purposes of this definition, "submitted" means any form of electronic, verbal, or written communication sent to the Licensor or its representatives, including but not limited to communication on electronic mailing lists, source code control systems, and issue tracking systems that are managed by, or on behalf of, the Licensor for the purpose of discussing and improving the Work, but excluding communication that is conspicuously marked or otherwise designated in writing by the copyright owner as "Not a Contribution." 
"Contributor" shall mean Licensor and any individual or Legal Entity on behalf of whom a Contribution has been received by Licensor and subsequently incorporated within the Work. 2. Grant of Copyright License. Subject to the terms and conditions of this License, each Contributor hereby grants to You a perpetual, worldwide, non-exclusive, no-charge, royalty-free, irrevocable copyright license to reproduce, prepare Derivative Works of, publicly display, publicly perform, sublicense, and distribute the Work and such Derivative Works in Source or Object form. 3. Grant of Patent License. Subject to the terms and conditions of this License, each Contributor hereby grants to You a perpetual, worldwide, non-exclusive, no-charge, royalty-free, irrevocable (except as stated in this section) patent license to make, have made, use, offer to sell, sell, import, and otherwise transfer the Work, where such license applies only to those patent claims licensable by such Contributor that are necessarily infringed by their Contribution(s) alone or by combination of their Contribution(s) with the Work to which such Contribution(s) was submitted. If You institute patent litigation against any entity (including a cross-claim or counterclaim in a lawsuit) alleging that the Work or a Contribution incorporated within the Work constitutes direct or contributory patent infringement, then any patent licenses granted to You under this License for that Work shall terminate as of the date such litigation is filed. 4. Redistribution. You may reproduce and distribute copies of the Work or Derivative Works thereof in any medium, with or without modifications, and in Source or Object form, provided that You meet the following conditions: (a) You must give any other recipients of the Work or Derivative Works a copy of this License; and (b) You must cause any modified files to carry prominent notices stating that You changed the files; and (c) You must retain, in the Source form of any Derivative Works that You distribute, all copyright, patent, trademark, and attribution notices from the Source form of the Work, excluding those notices that do not pertain to any part of the Derivative Works; and (d) If the Work includes a "NOTICE" text file as part of its distribution, then any Derivative Works that You distribute must include a readable copy of the attribution notices contained within such NOTICE file, excluding those notices that do not pertain to any part of the Derivative Works, in at least one of the following places: within a NOTICE text file distributed as part of the Derivative Works; within the Source form or documentation, if provided along with the Derivative Works; or, within a display generated by the Derivative Works, if and wherever such third-party notices normally appear. The contents of the NOTICE file are for informational purposes only and do not modify the License. You may add Your own attribution notices within Derivative Works that You distribute, alongside or as an addendum to the NOTICE text from the Work, provided that such additional attribution notices cannot be construed as modifying the License. You may add Your own copyright statement to Your modifications and may provide additional or different license terms and conditions for use, reproduction, or distribution of Your modifications, or for any such Derivative Works as a whole, provided Your use, reproduction, and distribution of the Work otherwise complies with the conditions stated in this License. 5. Submission of Contributions. 
Unless You explicitly state otherwise, any Contribution intentionally submitted for inclusion in the Work by You to the Licensor shall be under the terms and conditions of this License, without any additional terms or conditions. Notwithstanding the above, nothing herein shall supersede or modify the terms of any separate license agreement you may have executed with the Licensor regarding such Contributions. 6. Trademarks. This License does not grant permission to use the trade names, trademarks, service marks, or product names of the Licensor, except as required for reasonable and customary use in describing the origin of the Work and reproducing the content of the NOTICE file. 7. Disclaimer of Warranty. Unless required by applicable law or agreed to in writing, Licensor provides the Work (and each Contributor provides its Contributions) on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied, including, without limitation, any warranties or conditions of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A PARTICULAR PURPOSE. You are solely responsible for determining the appropriateness of using or redistributing the Work and assume any risks associated with Your exercise of permissions under this License. 8. Limitation of Liability. In no event and under no legal theory, whether in tort (including negligence), contract, or otherwise, unless required by applicable law (such as deliberate and grossly negligent acts) or agreed to in writing, shall any Contributor be liable to You for damages, including any direct, indirect, special, incidental, or consequential damages of any character arising as a result of this License or out of the use or inability to use the Work (including but not limited to damages for loss of goodwill, work stoppage, computer failure or malfunction, or any and all other commercial damages or losses), even if such Contributor has been advised of the possibility of such damages. 9. Accepting Warranty or Additional Liability. While redistributing the Work or Derivative Works thereof, You may choose to offer, and charge a fee for, acceptance of support, warranty, indemnity, or other liability obligations and/or rights consistent with this License. However, in accepting such obligations, You may act only on Your own behalf and on Your sole responsibility, not on behalf of any other Contributor, and only if You agree to indemnify, defend, and hold each Contributor harmless for any liability incurred by, or claims asserted against, such Contributor by reason of your accepting any such warranty or additional liability. END OF TERMS AND CONDITIONS APPENDIX: How to apply the Apache License to your work. To apply the Apache License to your work, attach the following boilerplate notice, with the fields enclosed by brackets "[]" replaced with your own identifying information. (Don't include the brackets!) The text should be enclosed in the appropriate comment syntax for the file format. We also recommend that a file or class name and description of purpose be included on the same "printed page" as the copyright notice for easier identification within third-party archives. Copyright [yyyy] [name of copyright owner] Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. 
You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0 Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License. ``` --- ### **File:** `D:\\Sanctum\\CRoM-EfficientLLM\\README.md` ```markdown --- language: en license: apache-2.0 library_name: crom-efficientllm tags: - rag - llm - retrieval - rerank - reranker - context-management - prompt-engineering - observability - python --- # CRoM-Context-Rot-Mitigation--EfficientLLM: Context Reranking and Management for Efficient LLMs
**CRoM (Context Rot Mitigation)-EfficientLLM** is a Python toolkit designed to optimize the context provided to Large Language Models (LLMs). It provides a suite of tools to intelligently select, re-rank, and manage text chunks to fit within a model's context budget while maximizing relevance and minimizing performance drift. This project is ideal for developers building RAG (Retrieval-Augmented Generation) pipelines who need to make the most of limited context windows.

## Key Features

* **Budget Packer:** Greedily packs the highest-scoring text chunks into a defined token budget using a stable sorting algorithm.
* **Hybrid Reranker:** Combines sparse (TF-IDF) and dense (Sentence-Transformers) retrieval scores for robust and high-quality reranking of documents.
* **Drift Estimator:** Monitors the semantic drift between sequential model responses using L2 or cosine distance with EWMA smoothing.
* **Observability:** Exposes Prometheus metrics for monitoring token savings and drift alerts in production.
* **Extensible Plugins:** Supports optional plugins for advanced reranking (`FlashRank`), compression (`LLMLingua`), and drift analysis (`Evidently`).
* **Comprehensive Benchmarking:** Includes a CLI for end-to-end pipeline evaluation, budget sweeps, and quality-vs-optimal analysis.

## Installation

Install the package directly from source using pip. For development, it's recommended to install in editable mode with the `[dev]` extras.

```bash
# Clone the repository
git clone https://github.com/Flamehaven/CRoM-Context-Rot-Mitigation--EfficientLLM.git
cd CRoM-Context-Rot-Mitigation--EfficientLLM

# Install in editable mode with development and plugin dependencies
pip install -e .[dev,plugins]
```

## Quickstart

### Demo

Run a simple, self-contained demonstration of the core components:

```bash
# Run the demo script
crom-demo demo
```

### CLI Benchmarking Examples

The package includes a powerful `crom-bench` CLI for evaluation.

```bash
# Default E2E (Search→Rerank→Pack→Mock LLM)
crom-bench e2e --budget 0.3

# Optional: High-precision configuration with plugins
crom-bench e2e --budget 0.3 \
  --use-flashrank --flashrank-model ms-marco-TinyBERT-L-2-v2 \
  --use-llmlingua --compress-ratio=0.6 \
  --use-evidently
```

### Plotting

If `matplotlib` is installed (`pip install -e .[dev]`), you can save benchmark plots directly:

```bash
# Save budget sweep result plots
crom-bench sweep --save-plots

# Save DP-curve plots
crom-bench dp-curve --save-plots
```

## Release & Changelog

This project follows semantic versioning. For detailed changes, see the [**CHANGELOG.md**](CHANGELOG.md). Releases are automated via GitHub Actions when a `v*` tag is pushed.

## License

This project is licensed under the Apache 2.0 License. See the [LICENSE](LICENSE) file for details.
```

---

### **File:** `D:\\Sanctum\\CRoM-EfficientLLM\\benchmarks\\efficiency_eval.py`

```python
"""
Efficiency Evaluation for CRoM-EfficientLLM
- Synthetic workload to measure token savings, selection quality, and runtime.
- No third-party deps beyond numpy/matplotlib (pandas optional for CSVs).
Usage: python benchmarks/efficiency_eval.py --budget 0.3 --n 5000 --seed 123 --plot --save """ from __future__ import annotations import argparse import math import time from dataclasses import dataclass from typing import List, Sequence, Tuple, Union import numpy as np try: import pandas as pd # optional except Exception: # pragma: no cover pd = None try: import matplotlib.pyplot as plt # optional except Exception: # pragma: no cover plt = None # --- Local packers (self-contained to avoid imports during quick eval) --- @dataclass(frozen=True) class Chunk: text: str score: float tokens: int def _estimate_tokens(text: str) -> int: return max(1, len(text) // 4) def _coerce_chunk(obj: Union[Chunk, dict], idx: int) -> Chunk: if isinstance(obj, Chunk): return obj if not isinstance(obj, dict): raise TypeError(f"Chunk #{idx} must be Chunk or dict, got {type(obj)}") text = str(obj.get("text", "")) if not text: raise ValueError(f"Chunk #{idx} has empty text") score = float(obj.get("score", 0.0)) tokens = int(obj["tokens"]) if "tokens" in obj else _estimate_tokens(text) if tokens <= 0: raise ValueError(f"Chunk #{idx} has non-positive tokens: {tokens}") return Chunk(text=text, score=score, tokens=tokens) def budget_pack(text_chunks: Sequence[Union[Chunk, dict]], budget: int = 1000) -> List[Chunk]: if budget <= 0: raise ValueError("budget must be > 0") coerced: List[Chunk] = [_coerce_chunk(c, i) for i, c in enumerate(text_chunks)] indexed = list(enumerate(coerced)) indexed.sort(key=lambda it: (-it[1].score, it[1].tokens, it[0])) selected: List[Chunk] = [] total = 0 for _, ch in indexed: if total + ch.tokens <= budget: selected.append(ch) total += ch.tokens return selected def pack_fcfs(text_chunks: Sequence[Union[Chunk, dict]], budget: int) -> List[Chunk]: sel, total = [], 0 for i, obj in enumerate(text_chunks): ch = _coerce_chunk(obj, i) if total + ch.tokens <= budget: sel.append(ch) total += ch.tokens return sel def pack_random(text_chunks: Sequence[Union[Chunk, dict]], budget: int, seed: int = 0) -> List[Chunk]: rng = np.random.default_rng(seed) indices = np.arange(len(text_chunks)) rng.shuffle(indices) sel, total = [], 0 for i in indices: ch = _coerce_chunk(text_chunks[i], i) if total + ch.tokens <= budget: sel.append(ch) total += ch.tokens return sel # --- Data generation and metrics --- def make_synthetic_chunks(n=2000, seed=42, corr=0.6): rng = np.random.default_rng(seed) true_rel = rng.normal(0, 1, size=n) noise = rng.normal(0, 1, size=n) * math.sqrt(1 - corr**2) score = corr * true_rel + noise tokens = np.clip(rng.lognormal(mean=4.0, sigma=0.6, size=n).astype(int), 5, 2000) chunks = [Chunk(text=("x"*int(t*4)), score=float(s), tokens=int(t)) for s, t in zip(score, tokens)] return chunks, true_rel def eval_once(n=5000, budget_ratio=0.3, seed=123, corr=0.6): chunks, true_rel = make_synthetic_chunks(n=n, seed=seed, corr=corr) total_tokens = sum(c.tokens for c in chunks) budget = int(total_tokens * budget_ratio) def run(name, fn): t0 = time.perf_counter() sel = fn(chunks, budget) dt = time.perf_counter() - t0 idx_map = {id(c): i for i, c in enumerate(chunks)} picked_idx = [idx_map[id(c)] for c in sel] rel_sum = float(np.sum(true_rel[picked_idx])) if picked_idx else 0.0 sel_tokens = sum(c.tokens for c in sel) return { "name": name, "time_ms": dt*1000, "selected_chunks": len(sel), "selected_tokens": sel_tokens, "tokens_budget": budget, "tokens_total_unpacked": total_tokens, "tokens_saved": total_tokens - sel_tokens, "save_ratio": (total_tokens - sel_tokens)/total_tokens, "relevance_sum": rel_sum, } 
    rows = [
        run("budget_pack", budget_pack),
        run("fcfs", pack_fcfs),
        run("random", lambda ch, b: pack_random(ch, b, seed=seed)),
    ]
    return rows

def quality_vs_optimal(n=200, budget_ratio=0.3, seed=123, corr=0.6):
    chunks, true_rel = make_synthetic_chunks(n=n, seed=seed, corr=corr)
    budget = int(sum(c.tokens for c in chunks) * budget_ratio)
    values = np.maximum(true_rel, 0.0)

    def optimal(chunks_sub, values, budget):
        items = chunks_sub
        vals = list(values)
        B = budget
        dp = [0.0] * (B + 1)
        keep = [[False] * (B + 1) for _ in range(len(items))]
        for i, it in enumerate(items):
            wt = it.tokens
            val = vals[i]
            for b in range(B, wt - 1, -1):
                alt = dp[b - wt] + val
                if alt > dp[b]:
                    dp[b] = alt
                    keep[i][b] = True
        b = B
        picked_idx = []
        for i in range(len(items) - 1, -1, -1):
            if keep[i][b]:
                picked_idx.append(i)
                b -= items[i].tokens
        picked_idx.reverse()
        rel_sum = float(np.sum([values[i] for i in picked_idx])) if picked_idx else 0.0
        total_tokens = sum(items[i].tokens for i in picked_idx)
        return picked_idx, rel_sum, total_tokens

    opt_idx, opt_rel, opt_tokens = optimal(chunks, values, budget)

    # selections
    idx_map = {id(c): i for i, c in enumerate(chunks)}

    def rel_of(selection):
        pid = [idx_map[id(c)] for c in selection]
        return float(np.sum(values[pid])) if pid else 0.0

    sel_bp = budget_pack(chunks, budget)
    sel_fc = pack_fcfs(chunks, budget)
    sel_rd = pack_random(chunks, budget, seed=seed)

    rows = [
        {"name": "optimal_true_rel", "relevance_sum": opt_rel, "selected_tokens": opt_tokens, "selected_chunks": len(opt_idx)},
        {"name": "budget_pack_small", "relevance_sum": rel_of(sel_bp), "selected_tokens": sum(c.tokens for c in sel_bp), "selected_chunks": len(sel_bp)},
        {"name": "fcfs_small", "relevance_sum": rel_of(sel_fc), "selected_tokens": sum(c.tokens for c in sel_fc), "selected_chunks": len(sel_fc)},
        {"name": "random_small", "relevance_sum": rel_of(sel_rd), "selected_tokens": sum(c.tokens for c in sel_rd), "selected_chunks": len(sel_rd)},
    ]
    return rows

def main():
    ap = argparse.ArgumentParser()
    ap.add_argument("--n", type=int, default=5000)
    ap.add_argument("--budget", type=float, default=0.3)
    ap.add_argument("--seed", type=int, default=123)
    ap.add_argument("--corr", type=float, default=0.6)
    ap.add_argument("--plot", action="store_true")
    ap.add_argument("--save", action="store_true")
    args = ap.parse_args()

    rows = eval_once(n=args.n, budget_ratio=args.budget, seed=args.seed, corr=args.corr)
    rows_q = quality_vs_optimal(n=min(200, args.n), budget_ratio=args.budget, seed=args.seed, corr=args.corr)

    print("\n=== Efficiency (n={}, budget={:.0%}) ===".format(args.n, args.budget))
    for r in rows:
        print("{name:12s} time={time_ms:7.2f}ms save_ratio={save_ratio:6.3f} tokens_saved={tokens_saved:8d} rel_sum={relevance_sum:8.3f}".format(**r))

    print("\n=== Quality vs Optimal (subset) ===")
    for r in rows_q:
        print("{name:18s} rel_sum={relevance_sum:8.3f} tokens={selected_tokens:5d} chunks={selected_chunks:4d}".format(**r))

    if pd is not None and args.save:
        pd.DataFrame(rows).to_csv("benchmarks/results_efficiency.csv", index=False)
        pd.DataFrame(rows_q).to_csv("benchmarks/results_quality.csv", index=False)
        print("Saved CSVs to benchmarks/.")

    if plt is not None and args.plot:
        # single-figure plots, no explicit colors
        x = [r["name"] for r in rows]
        y = [r["time_ms"] for r in rows]
        plt.figure()
        plt.bar(x, y)
        plt.title("Packer Runtime (ms)")
        plt.xlabel("method")
        plt.ylabel("ms")
        plt.show()

if __name__ == "__main__":
    main()
```
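To sanity-check the greedy selection order used throughout this benchmark (score descending, then fewer tokens, then original index), here is a minimal sketch against the packaged API, assuming the packaged `budget_pack` in `src/crom_efficientllm/budget_packer/packer.py` matches the self-contained copy above:

```python
from crom_efficientllm import Chunk, budget_pack

# "b" and "c" tie on score; the smaller token count wins the tie,
# so only "a" and "b" fit into the 100-token budget.
chunks = [
    Chunk(text="a", score=0.9, tokens=60),
    Chunk(text="b", score=0.5, tokens=30),
    Chunk(text="c", score=0.5, tokens=50),
]
print([c.text for c in budget_pack(chunks, budget=100)])  # -> ['a', 'b']
```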
---

### **File:** `D:\\Sanctum\\CRoM-EfficientLLM\\benchmarks\\longbench_eval.py`

```python
"""
Benchmark script: LongBench-like evaluation.
Simulates context packing efficiency.
"""
from crom_efficientllm.budget_packer.packer import budget_pack

def evaluate():
    chunks = [{"text": f"chunk {i}", "score": i % 5, "tokens": 100} for i in range(20)]
    packed = budget_pack(chunks, budget=500)
    print("Selected:", len(packed), "chunks")

if __name__ == "__main__":
    evaluate()
```

---

### **File:** `D:\\Sanctum\\CRoM-EfficientLLM\\benchmarks\\sample_results.json`

```json
{}
```

---

### **File:** `D:\\Sanctum\\CRoM-EfficientLLM\\crom 1.0.1수정 업데이트 상세보고서.md`

```markdown
# CRoM-EfficientLLM v1.0.1 Update - Detailed Report

**Document purpose:** Source material for a marketing AI drafting social media posts (LinkedIn, Twitter, Medium)
**Date:** 2025-09-06
**Author:** CLI ↯C01∞ | Σψ∴

---

## 1. Overview

- **Project:** CRoM-EfficientLLM (Context Rot Mitigation for Efficient LLMs)
- **Previous version:** 0.2.1
- **New version:** 1.0.1

**Summary:** The v1.0.1 update marks the **first functional implementation** of CRoM-EfficientLLM. Starting from a state that held only ideas and scaffolding, all of the core working logic has been implemented, turning the project into a **working prototype**. Users can now directly test and apply the core features for managing and optimizing context in RAG pipelines.

---

## 2. Background

The previous v0.2.1 was a **design-stage scaffold** that defined only the project's direction and structure (`pyproject.toml`, `README.md`, and so on). With no Python source implementing the core logic, the ideas could not be validated in practice. The goal of this update was to **implement all core features from scratch** according to that design, bringing the project to life in a usable state.

---

## 3. Detailed Changes

Four core modules were newly implemented under `src/crom_efficientllm/`.

### a. `budget_packer.py` - Intelligent Context Packing Engine
- **Function:** Assembles the context (chunks) passed to the LLM as efficiently as possible within a given token budget.
- **Details:**
  - Rather than simply truncating text, it prioritizes the most important information by **score/token ratio**.
  - After packing, it reports detailed statistics such as **compression ratio, tokens saved, and budget efficiency**, laying the groundwork for quantitative analysis of context-management strategies.

### b. `cross_encoder.py` - Robust Cross-Encoder Manager
- **Function:** Manages the Cross-Encoder model at the heart of the RAG pipeline and prevents system-wide failures when errors occur.
- **Details:**
  - Automatically detects and **gracefully falls back** on failure modes such as a missing `sentence-transformers` dependency or a model that fails to load.
  - Instead of halting, it reports clear states such as "disabled" or "error" in API responses, greatly improving the **stability and predictability** of the system.

### c. `capsule_logger.py` - Capsule Logger for Transparency
- **Function:** Records every processing step as **structured logs**, providing transparency and auditability.
- **Details:**
  - Persists every API request, processing statistic, and system state as JSONL records called **"explain capsules"**.
  - This data is essential for debugging system behavior, analyzing performance regressions, and tracing the grounds for the AI's decisions.

### d. `server.py` - API Server Integrating the Core Features
- **Function:** Ties the modules above (packing, reranking, logging) together into an accessible **FastAPI-based API server**.
- **Details:**
  - The `/process` endpoint accepts a query and context data and **orchestrates** reranking, packing, and logging as a single transaction.
  - The `/healthz` endpoint lets external monitoring systems check server health easily.

---

## 4. Versioning & Documentation

- **Version bump:** With the core features implemented, the project version was raised from `0.2.1` to **`1.0.1`** to mark this milestone.
- **Change history:** All of the implementation work above was recorded in detail in `CHANGELOG.md`, so users and contributors can track the project's evolution transparently.

---

## 5. Expected Impact & Next Steps

- **Expected impact:**
  - CRoM-EfficientLLM is no longer just an idea; it is now a **practical tool that can be applied to a real RAG system to test context-management efficiency**.
  - Developers can obtain **quantitative data** on how to use an LLM's limited context window most efficiently.
- **Next steps:**
  - Implement the `crom-demo` and `crom-bench` CLI features described in `README.md`
  - Add support for user-selectable tokenizers
  - Extend the benchmark system to compare the performance of different context-management strategies

---

**End of report.**
```

---

### **File:** `D:\\Sanctum\\CRoM-EfficientLLM\\dashboard\\grafana_dashboard.json`

```json
{}
```

---

### **File:** `D:\\Sanctum\\CRoM-EfficientLLM\\dashboard\\prometheus_config.yml`

```
```

---

### **File:** `D:\\Sanctum\\CRoM-EfficientLLM\\docs\\architecture.md`

```markdown
# Architecture

This document outlines the architecture of the CRoM-EfficientLLM project.
```

---

### **File:** `D:\\Sanctum\\CRoM-EfficientLLM\\docs\\versioning.md`

```markdown
# Versioning & PyPI Guidance

This document defines package naming, SemVer rules, and a future path to publish to PyPI.

## 1) Package name
- Distribution name (PyPI): `crom-efficientllm` (lowercase, hyphen-separated)
- Import name (module): `crom_efficientllm` (PEP 8 underscore)

> **Tip**: Keep both names consistent to avoid confusion in docs.

### Check name availability on PyPI
- Visit: https://pypi.org/project/crom-efficientllm/ (404 → available)
- If taken, consider: `crom-efficient-llm`, `crom-llm-efficient`, `crom-ctx-pack`
- Reserve on TestPyPI first: use `test.pypi.org` to validate metadata & upload

## 2) Semantic Versioning (SemVer)
We follow **MAJOR.MINOR.PATCH**.
- **MAJOR**: Backward-incompatible API changes
  - e.g., rename function signatures (`budget_pack`), move/rename modules, change return schemas
- **MINOR**: Backward-compatible features
  - new functions/flags (e.g., `pack_summary`, CLI subcommands), performance improvements
- **PATCH**: Backward-compatible bug fixes
  - logic corrections, docs/CI fixes, dependency pin updates without API changes

### Pre-releases
Use suffixes: `-a.1`, `-b.1`, `-rc.1` (alpha/beta/release-candidate)
- Example: `0.3.0-rc.1`

### Deprecation Policy
- Mark deprecated APIs in `CHANGELOG.md` and docstrings
- Provide at least **one MINOR release** with warnings before removal

### Public API Surface
We commit compatibility for:
- `crom_efficientllm.budget_packer.packer`: `Chunk`, `budget_pack`, `pack_summary`
- `crom_efficientllm.rerank_engine.rerank`: `hybrid_rerank`
- `crom_efficientllm.drift_estimator.estimator`: `DriftEstimator`, `DriftMode`
- CLI entrypoints: `crom-demo`, `crom-bench` and their documented flags

## 3) Release Flow (GitHub → PyPI later)
- Tag: `vX.Y.Z` → GitHub Actions builds & creates a Release (artifacts attached)
- Keep `CHANGELOG.md` updated per release
- After API stabilizes, enable **PyPI publish** using a separate workflow with `PYPI_API_TOKEN` secret

### (Future) PyPI publishing steps
1. Create a PyPI account & project
2. Add `PYPI_API_TOKEN` to repo `Settings → Secrets and variables → Actions`
3. Add `release-pypi.yml` workflow to upload on tag
4.
Verify install: `pip install crom-efficientllm` and import `crom_efficientllm` --- _Last updated: 2025-09-02_ ``` --- ### **File:** `D:\\Sanctum\\CRoM-EfficientLLM\\examples\\corpus\\sample_docs.jsonl` ```json {"id": 1, "text": "AI ethics and governance frameworks for responsible AI."} {"id": 2, "text": "Techniques for detecting model drift in production systems."} {"id": 3, "text": "A recipe for sourdough bread and fermentation tips."} {"id": 4, "text": "Hybrid search: combining sparse and dense retrieval methods."} {"id": 5, "text": "Token budgets and prompt compression strategies for LLMs."} {"id": 6, "text": "Monitoring with Prometheus and building Grafana dashboards."} ``` --- ### **File:** `D:\\Sanctum\\CRoM-EfficientLLM\\examples\\corpus\\sample_queries.jsonl` ```json {"query": "how to detect drift in ai models"} {"query": "ways to reduce llm token usage"} {"query": "observability stack prometheus grafana"} ``` --- ### **File:** `D:\\Sanctum\\CRoM-EfficientLLM\\pyproject.toml` ```toml [build-system] requires = ["setuptools>=68", "wheel"] build-backend = "setuptools.build_meta" [project] name = "crom-efficientllm" version = "1.0.1" description = "CRoM (Context Rot Mitigation)-EfficientLLM: Budget packing, hybrid rerank, and drift estimation with observability" readme = "README.md" requires-python = ">=3.9" license = { text = "Apache-2.0" } authors = [ { name = "Your Name" } ] dependencies = [ "numpy>=1.24,<3", "scikit-learn>=1.3,<2", "transformers>=4.41,<5", "sentence-transformers>=2.2,<3", "flask>=3,<4", "prometheus-client>=0.20,<1" ] [project.optional-dependencies] dev = [ "pytest>=7", "ruff>=0.4", "black>=24.4", "pre-commit>=3.6", "matplotlib>=3.8,<4" ] plugins = [ "flashrank>=0.2; python_version>='3.9'", "llmlingua>=0.2; python_version>='3.9'", "evidently>=0.4; python_version>='3.9'" ] haystack = [ "farm-haystack[faiss,inference]>=1.26; python_version>='3.9'" ] [project.urls] Homepage = "https://github.com/Flamehaven/CRoM-Context-Rot-Mitigation--EfficientLLM" [project.scripts] "crom-demo" = "crom_efficientllm.demo:main" "crom-bench" = "crom_efficientllm.cli:main" [tool.setuptools] package-dir = {"" = "src"} packages = { find = { where = ["src"] } } [tool.pytest.ini_options] addopts = "-q" [tool.black] line-length = 100 [tool.ruff] target-version = "py39" [tool.ruff.lint] select = ["E","F","I","UP","B","C4","SIM","PL","PERF","RUF","ANN"] ignore = ["ANN101","ANN102"] [tool.ruff.lint.per-file-ignores] "tests/*" = ["S101","ANN","PLR2004"] ``` --- ### **File:** `D:\\Sanctum\\CRoM-EfficientLLM\\release_notes.md` ```markdown # Release v0.2.1 ## [0.2.1] - 2025-09-02 ### Added - CLI `--save-plots` option for `sweep` and `dp-curve`; saves PNG charts to `benchmarks/out/` (or `--out-dir`). - README Quick Examples mention of plotting flag. - This CHANGELOG. ### Changed - Dev tooling: recommend `matplotlib` via dev extra for plotting. 
— generated from [CHANGELOG.md](CHANGELOG.md)
```

---

### **File:** `D:\\Sanctum\\CRoM-EfficientLLM\\requirements.txt`

```
numpy>=1.24,<3
scikit-learn>=1.3,<2
transformers>=4.41,<5
sentence-transformers>=2.2,<3
flask>=3,<4
prometheus-client>=0.20,<1
```

---

### **File:** `D:\\Sanctum\\CRoM-EfficientLLM\\scripts\\gen_release_notes.py`

```python
#!/usr/bin/env python3
from __future__ import annotations

import os
import re
import sys
from pathlib import Path

ROOT = Path(__file__).resolve().parents[1]
CHANGELOG = ROOT / "CHANGELOG.md"
OUT = ROOT / "release_notes.md"

def main(tag: str) -> None:
    version = tag.lstrip("v").strip()
    if not CHANGELOG.exists():
        OUT.write_text(f"# Release {tag}\n\n(CHANGELOG.md not found)", encoding="utf-8")
        return
    text = CHANGELOG.read_text(encoding="utf-8")
    # Match this version's section heading, e.g. "## [1.0.1] - 2025-09-06"
    pat = re.compile(rf"^##\s*\[?{re.escape(version)}\]?[^\n]*$", re.MULTILINE)
    m = pat.search(text)
    if not m:
        OUT.write_text(
            f"# Release {tag}\n\nSection for {version} not found in CHANGELOG.\n\n" + text,
            encoding="utf-8",
        )
        return
    start = m.end()
    m2 = re.search(r"^##\s+", text[start:], re.MULTILINE)
    end = start + (m2.start() if m2 else len(text) - start)
    section = text[m.start():end].strip()
    body = f"# Release {tag}\n\n{section}\n\n— generated from [CHANGELOG.md](CHANGELOG.md)"
    OUT.write_text(body, encoding="utf-8")

if __name__ == "__main__":
    tag = sys.argv[1] if len(sys.argv) > 1 else os.environ.get("GITHUB_REF_NAME", "")
    if not tag:
        print("Usage: gen_release_notes.py vX.Y.Z", file=sys.stderr)
        sys.exit(2)
    main(tag)
```
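The heading pattern above can be exercised in isolation; here is a small check with a hypothetical CHANGELOG snippet mirroring the format used in this repo:

```python
import re

version = "1.0.1"
pat = re.compile(rf"^##\s*\[?{re.escape(version)}\]?[^\n]*$", re.MULTILINE)

sample = "# Changelog\n\n## [1.0.1] - 2025-09-06\n### Added\n- ...\n\n## [0.2.1] - 2025-09-02\n"
match = pat.search(sample)
print(match.group(0))  # -> "## [1.0.1] - 2025-09-06"
```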
echo "➡️ Watch: https://github.com/Flamehaven/CRoM-EfficientLLM/actions" ``` --- ### **File:** `D:\\Sanctum\\CRoM-EfficientLLM\\src\\crom_efficientllm\\__init__.py` ```python """Public API for CRoM-EfficientLLM.""" from .budget_packer.packer import Chunk, budget_pack, pack_summary from .rerank_engine.rerank import hybrid_rerank from .drift_estimator.estimator import DriftEstimator, DriftMode __all__ = [ "Chunk", "budget_pack", "pack_summary", "hybrid_rerank", "DriftEstimator", "DriftMode", ] ``` --- ### **File:** `D:\\Sanctum\\CRoM-EfficientLLM\\src\\crom_efficientllm\\budget_packer.py` ```python from typing import List, Dict import logging def enhanced_greedy_pack(chunks: List[Dict], budget: int, score_key: str = "score") -> tuple[List[Dict], Dict]: """ 기존 greedy_pack 함수를 확장하여 상세 통계 반환 Returns: tuple: (packed_chunks, stats_dict) """ if not chunks: return [], { "selected_count": 0, "packed_count": 0, "selected_tokens": 0, "packed_tokens": 0, "compression_ratio": 0.0, "token_savings": 0, "efficiency": 0.0 } # 토큰 수 미리 계산 for chunk in chunks: if "token_count" not in chunk: chunk["token_count"] = max(1, len(chunk.get("text", "")) // 4) # 효율성 기준 정렬 (score/token 비율) sorted_chunks = sorted( chunks, key=lambda x: x.get(score_key, 0) / x["token_count"], reverse=True ) # 그리디 패킹 packed_chunks = [] used_tokens = 0 for chunk in sorted_chunks: if used_tokens + chunk["token_count"] <= budget: packed_chunks.append(chunk) used_tokens += chunk["token_count"] # 상세 통계 계산 total_selected_tokens = sum(chunk["token_count"] for chunk in chunks) stats = { "selected_count": len(chunks), "packed_count": len(packed_chunks), "selected_tokens": total_selected_tokens, "packed_tokens": used_tokens, "compression_ratio": len(packed_chunks) / len(chunks) if chunks else 0.0, "token_savings": total_selected_tokens - used_tokens, "efficiency": used_tokens / budget if budget > 0 else 0.0 } # 📊 로깅 추가 (기존 코드에 없던 통계 가시성) logging.info(f"Packing completed: {stats['packed_count']}/{stats['selected_count']} chunks, " f"tokens: {stats['packed_tokens']}/{stats['selected_tokens']} " f"(efficiency: {stats['efficiency']:.1%})") return packed_chunks, stats ``` --- ### **File:** `D:\\Sanctum\\CRoM-EfficientLLM\\src\\crom_efficientllm\\capsule_logger.py` ```python import json from pathlib import Path from datetime import datetime from typing import Union, Dict import logging class ExplainCapsuleLogger: """스키마 기반 설명 캡슐 저장 시스템""" def __init__(self, log_directory: str = "artifacts/logs"): self.log_dir = Path(log_directory) self.log_dir.mkdir(parents=True, exist_ok=True) # 로그 파일 경로들 self.capsules_file = self.log_dir / "explain_capsules.jsonl" self.metrics_file = self.log_dir / "processing_metrics.jsonl" self.errors_file = self.log_dir / "error_log.jsonl" logging.info(f"ExplainCapsule Logger initialized: {self.log_dir}") def create_explain_capsule(self, query: str, response_data: Dict, processing_stats: Dict, cross_encoder_status: str) -> Dict: """스키마 준수 설명 캡슐 생성""" capsule = { # 🔖 메타데이터 (필수) "timestamp": datetime.now().isoformat(), "version": "1.0", "processor": "CRoM-Enhanced", # 📝 쿼리 정보 "query": { "text": query, "length": len(query), "token_estimate": len(query) // 4 }, # 📊 처리 통계 (패치 1에서 확장된 정보) "processing_stats": { **processing_stats, "cross_encoder_status": cross_encoder_status }, # 🔧 시스템 상태 "system_state": { "cross_encoder_available": cross_encoder_status not in ["disabled", "unavailable"] }, # 📦 원본 및 결과 청크 "chunks": { "packed": response_data.get("chunks", []) } } return capsule def log_capsule(self, capsule: Dict): """설명 캡슐을 .jsonl 파일에 
기록""" try: with open(self.capsules_file, "a", encoding="utf-8") as f: f.write(json.dumps(capsule, ensure_ascii=False) + "\n") except Exception as e: logging.error(f"Failed to log explain capsule: {e}") def log_error(self, error_details: Dict): """오류 정보를 .jsonl 파일에 기록""" try: error_details["timestamp"] = datetime.now().isoformat() with open(self.errors_file, "a", encoding="utf-8") as f: f.write(json.dumps(error_details, ensure_ascii=False) + "\n") except Exception as e: logging.error(f"Failed to log error: {e}") ``` --- ### **File:** `D:\\Sanctum\\CRoM-EfficientLLM\\src\\crom_efficientllm\\cli.py` ```python from __future__ import annotations import argparse import json import os import time from dataclasses import dataclass from typing import List, Dict, Sequence import numpy as np from sklearn.feature_extraction.text import TfidfVectorizer from sklearn.metrics.pairwise import cosine_similarity from crom_efficientllm.budget_packer.packer import budget_pack, Chunk from crom_efficientllm.rerank_engine.rerank import hybrid_rerank try: from sentence_transformers import SentenceTransformer except Exception: # pragma: no cover SentenceTransformer = None # type: ignore # Optional plugins are imported lazily when flags are set @dataclass class Doc: id: str text: str def load_jsonl(path: str) -> List[Dict]: with open(path, "r", encoding="utf-8") as f: return [json.loads(line) for line in f] def build_corpus(path: str) -> List[Doc]: rows = load_jsonl(path) return [Doc(id=str(r.get("id", i)), text=str(r["text"])) for i, r in enumerate(rows)] def sparse_retrieval(query: str, corpus: Sequence[Doc], k: int = 100) -> List[Dict]: texts = [d.text for d in corpus] vect = TfidfVectorizer(ngram_range=(1, 2)).fit(texts) D = vect.transform(texts) Q = vect.transform([query]) sims = cosine_similarity(Q, D).ravel() order = np.argsort(-sims)[:k] return [{"id": corpus[i].id, "text": corpus[i].text, "score_sparse": float(sims[i])} for i in order] def dense_embed_model(name: str): if SentenceTransformer is None: raise RuntimeError("sentence-transformers not installed. Install with `pip install -e .`.") return SentenceTransformer(name) def _apply_flashrank(query: str, docs: List[Dict], model_name: str) -> List[Dict]: try: from crom_efficientllm.plugins.flashrank_reranker import flashrank_rerank except Exception as e: # pragma: no cover raise RuntimeError("FlashRank plugin not available. Install extras: pip install .[plugins]") from e ranked = flashrank_rerank(query, docs, model_name=model_name) # Normalize plugin score to 0..1 and put into score_final scores = np.array([d.get("score_flashrank", 0.0) for d in ranked], dtype=np.float32) if scores.size and float(scores.max() - scores.min()) > 1e-12: s = (scores - scores.min()) / (scores.max() - scores.min()) else: s = np.zeros_like(scores) for i, d in enumerate(ranked): d["score_final"] = float(s[i]) return ranked def _apply_llmlingua(text: str, ratio: float) -> str: try: from crom_efficientllm.plugins.llmlingua_compressor import compress_prompt except Exception as e: # pragma: no cover raise RuntimeError("LLMLingua plugin not available. Install extras: pip install .[plugins]") from e return compress_prompt(text, target_ratio=ratio) def _save_evidently_report(all_embs: List[List[float]], out_html: str) -> None: try: from crom_efficientllm.plugins.evidently_drift import drift_report except Exception as e: # pragma: no cover raise RuntimeError("Evidently plugin not available. 
Install extras: pip install .[plugins]") from e n = len(all_embs) if n < 4: return ref = all_embs[: n // 2] cur = all_embs[n // 2 :] rep = drift_report(ref, cur) rep.save_html(out_html) def mock_llm_generate(prompt: str) -> str: time.sleep(0.005) # simulate small latency return "[MOCK] " + prompt[:160] def e2e(args: argparse.Namespace) -> None: corpus = build_corpus(args.corpus) queries = [r["query"] for r in load_jsonl(args.queries)] embed = dense_embed_model(args.model) all_embs: List[List[float]] = [] t0 = time.perf_counter() all_rows = [] for q in queries: t_s = time.perf_counter() cands = sparse_retrieval(q, corpus, k=args.k) t_sparse = (time.perf_counter() - t_s) * 1000 t_r = time.perf_counter() if args.use_flashrank: reranked = _apply_flashrank(q, cands, args.flashrank_model) else: reranked = hybrid_rerank(q, cands, embed, alpha=args.alpha) t_rerank = (time.perf_counter() - t_r) * 1000 # token heuristic + budget pack chunks = [ Chunk(text=d["text"], score=d.get("score_final", d.get("score_sparse", 0.0)), tokens=max(1, len(d["text"]) // 4)) for d in reranked ] budget_tokens = int(sum(c.tokens for c in chunks) * args.budget) t_p = time.perf_counter() packed = budget_pack(chunks, budget=budget_tokens) t_pack = (time.perf_counter() - t_p) * 1000 prompt = "\n\n".join(c.text for c in packed) + f"\n\nQ: {q}\nA:" if args.use_llmlingua: prompt = _apply_llmlingua(prompt, ratio=args.compress_ratio) # collect embeddings for drift snapshot (mean-pooled) with np.errstate(all="ignore"): if len(packed) > 0: doc_embs = embed.encode([c.text for c in packed], convert_to_numpy=True) vec = np.mean(doc_embs, axis=0).tolist() all_embs.append(vec) t_l = time.perf_counter() _ = mock_llm_generate(prompt) t_llm = (time.perf_counter() - t_l) * 1000 total = (time.perf_counter() - t_s) * 1000 all_rows.append({ "query": q, "sparse_ms": t_sparse, "rerank_ms": t_rerank, "pack_ms": t_pack, "llm_ms": t_llm, "total_ms": total, "packed_tokens": sum(c.tokens for c in packed), "orig_tokens": sum(c.tokens for c in chunks), "save_ratio": 1 - (sum(c.tokens for c in packed) / max(1, sum(c.tokens for c in chunks))), "used_flashrank": bool(args.use_flashrank), "used_llmlingua": bool(args.use_llmlingua), }) elapsed = (time.perf_counter() - t0) * 1000 os.makedirs(args.out_dir, exist_ok=True) out_path = os.path.join(args.out_dir, "e2e_results.jsonl") with open(out_path, "w", encoding="utf-8") as f: for r in all_rows: f.write(json.dumps(r, ensure_ascii=False) + "\n") print(f"saved results -> {out_path} ({len(all_rows)} queries) ; elapsed={elapsed:.2f}ms") if args.use_evidently and all_embs: html_path = os.path.join(args.out_dir, "evidently_report.html") _save_evidently_report(all_embs, html_path) print(f"evidently report -> {html_path}") def budget_sweep(args: argparse.Namespace) -> None: import itertools corpus = build_corpus(args.corpus) queries = [r["query"] for r in load_jsonl(args.queries)][: args.max_q] embed = dense_embed_model(args.model) budgets = [b / 100.0 for b in range(args.b_min, args.b_max + 1, args.b_step)] rows = [] for q, b in itertools.product(queries, budgets): cands = sparse_retrieval(q, corpus, k=args.k) reranked = hybrid_rerank(q, cands, embed, alpha=args.alpha) chunks = [Chunk(text=d["text"], score=d["score_final"], tokens=max(1, len(d["text"]) // 4)) for d in reranked] budget_tokens = int(sum(c.tokens for c in chunks) * b) packed = budget_pack(chunks, budget=budget_tokens) rows.append({ "query": q, "budget": b, "packed_tokens": sum(c.tokens for c in packed), "orig_tokens": sum(c.tokens for c in chunks), 
"save_ratio": 1 - (sum(c.tokens for c in packed) / max(1, sum(c.tokens for c in chunks))), "avg_score": float(np.mean([c.score for c in packed])) if packed else 0.0, }) os.makedirs(args.out_dir, exist_ok=True) out_path = os.path.join(args.out_dir, "budget_sweep.jsonl") with open(out_path, "w", encoding="utf-8") as f: for r in rows: f.write(json.dumps(r, ensure_ascii=False) + "\n") print(f"saved results -> {out_path} ; points={len(rows)}") if args.save_plots: try: import matplotlib.pyplot as plt # noqa: F401 import matplotlib.pyplot as _plt except Exception: print("[warn] matplotlib not installed; install dev extras: pip install -e .[dev]") else: # Aggregate by budget import collections agg = collections.defaultdict(list) for r in rows: agg[r["budget"]].append(r) budgets_sorted = sorted(agg.keys()) avg_save = [float(np.mean([x["save_ratio"] for x in agg[b]])) for b in budgets_sorted] avg_score = [float(np.mean([x["avg_score"] for x in agg[b]])) for b in budgets_sorted] _plt.figure() _plt.plot([b * 100 for b in budgets_sorted], [s * 100 for s in avg_save], marker="o") _plt.xlabel("Budget (%)") _plt.ylabel("Avg Save Ratio (%)") _plt.title("Budget Sweep: Save Ratio vs Budget") _plt.grid(True) _plt.tight_layout() _plt.savefig(os.path.join(args.out_dir, "budget_sweep.png")), _plt.figure() _plt.plot([s * 100 for s in avg_save], avg_score, marker="o") _plt.xlabel("Save Ratio (%)") _plt.ylabel("Avg Score (packed)") _plt.title("Pareto: Quality vs Savings") _plt.grid(True) _plt.tight_layout() _plt.savefig(os.path.join(args.out_dir, "budget_pareto.png")), print("plots ->", os.path.join(args.out_dir, "budget_sweep.png"), ",", os.path.join(args.out_dir, "budget_pareto.png")) def scaling(args: argparse.Namespace) -> None: def make_synth(n: int, seed: int = 42): rng = np.random.default_rng(seed) tokens = np.clip(rng.lognormal(4.0, 0.6, n).astype(int), 5, 2000) score = rng.normal(0, 1, n) return [Chunk(text="x" * int(t * 4), score=float(s), tokens=int(t)) for s, t in zip(score, tokens)] for n in [1000, 5000, 10000, 20000, 50000, 100000]: if n > args.n_max: break chunks = make_synth(n) budget = int(sum(c.tokens for c in chunks) * args.budget) t0 = time.perf_counter() _ = budget_pack(chunks, budget) ms = (time.perf_counter() - t0) * 1000 print(f"n={n:6d} budget={args.budget:.0%} time={ms:8.2f} ms") def dp_curve(args: argparse.Namespace) -> None: def make_synth(n: int, seed: int = 123, corr: float = 0.6): rng = np.random.default_rng(seed) true_rel = rng.normal(0, 1, n) noise = rng.normal(0, 1, n) * np.sqrt(1 - corr**2) score = corr * true_rel + noise tokens = np.clip(rng.lognormal(4.0, 0.6, n).astype(int), 5, 2000) chunks = [Chunk(text="x" * int(t * 4), score=float(s), tokens=int(t)) for s, t in zip(score, tokens)] return chunks, true_rel def optimal(chunks: Sequence[Chunk], values: np.ndarray, budget: int) -> float: B = budget dp = np.zeros(B + 1, dtype=np.float32) for i, ch in enumerate(chunks): wt = ch.tokens val = max(0.0, float(values[i])) for b in range(B, wt - 1, -1): dp[b] = max(dp[b], dp[b - wt] + val) return float(dp[B]) chunks, true_rel = make_synth(args.n) total = sum(c.tokens for c in chunks) budgets = [int(total * b / 100.0) for b in range(args.b_min, args.b_max + 1, args.b_step)] out_rows = [] for B in budgets: sel = budget_pack(chunks, B) idx_map = {id(c): i for i, c in enumerate(chunks)} rel_bp = float(np.sum([max(0.0, true_rel[idx_map[id(c)]]) for c in sel])) rel_opt = optimal(chunks[: args.n_opt], true_rel[: args.n_opt], min(B, sum(c.tokens for c in chunks[: args.n_opt]))) pct = rel_bp / 
def dp_curve(args: argparse.Namespace) -> None:
    def make_synth(n: int, seed: int = 123, corr: float = 0.6):
        rng = np.random.default_rng(seed)
        true_rel = rng.normal(0, 1, n)
        noise = rng.normal(0, 1, n) * np.sqrt(1 - corr**2)
        score = corr * true_rel + noise
        tokens = np.clip(rng.lognormal(4.0, 0.6, n).astype(int), 5, 2000)
        chunks = [Chunk(text="x" * int(t * 4), score=float(s), tokens=int(t)) for s, t in zip(score, tokens)]
        return chunks, true_rel

    def optimal(chunks: Sequence[Chunk], values: np.ndarray, budget: int) -> float:
        B = budget
        dp = np.zeros(B + 1, dtype=np.float32)
        for i, ch in enumerate(chunks):
            wt = ch.tokens
            val = max(0.0, float(values[i]))
            for b in range(B, wt - 1, -1):
                dp[b] = max(dp[b], dp[b - wt] + val)
        return float(dp[B])

    chunks, true_rel = make_synth(args.n)
    total = sum(c.tokens for c in chunks)
    budgets = [int(total * b / 100.0) for b in range(args.b_min, args.b_max + 1, args.b_step)]
    out_rows = []
    for B in budgets:
        sel = budget_pack(chunks, B)
        idx_map = {id(c): i for i, c in enumerate(chunks)}
        rel_bp = float(np.sum([max(0.0, true_rel[idx_map[id(c)]]) for c in sel]))
        rel_opt = optimal(chunks[: args.n_opt], true_rel[: args.n_opt], min(B, sum(c.tokens for c in chunks[: args.n_opt])))
        pct = rel_bp / max(rel_opt, 1e-9)
        out_rows.append({"budget": B, "pct": pct, "rel_bp": rel_bp, "rel_opt": rel_opt})
        print(f"budget={B:8d} rel_bp={rel_bp:8.3f} rel_opt≈{rel_opt:8.3f} pct≈{pct*100:5.1f}% (subset n={args.n_opt})")

    if args.save_plots:
        try:
            import matplotlib.pyplot as plt  # noqa: F401
            import matplotlib.pyplot as _plt
        except Exception:
            print("[warn] matplotlib not installed; install dev extras: pip install -e .[dev]")
        else:
            _plt.figure()
            xs = [r["budget"] * 100.0 / total for r in out_rows]
            ys = [r["pct"] * 100 for r in out_rows]
            _plt.plot(xs, ys, marker="o")
            _plt.xlabel("Budget (%)")
            _plt.ylabel("% of optimal (subset)")
            _plt.title("DP Curve: Greedy vs Optimal")
            _plt.grid(True)
            _plt.tight_layout()
            os.makedirs(args.out_dir, exist_ok=True)
            _plt.savefig(os.path.join(args.out_dir, "dp_curve.png"))
            print("plot ->", os.path.join(args.out_dir, "dp_curve.png"))

def compare_haystack(args: argparse.Namespace) -> None:
    try:
        from haystack.nodes import BM25Retriever, SentenceTransformersRetriever
        from haystack.document_stores import InMemoryDocumentStore
    except Exception as e:  # pragma: no cover
        raise RuntimeError("Install extras: pip install .[haystack]") from e
    corpus = build_corpus(args.corpus)
    docs = [{"content": d.text, "meta": {"id": d.id}} for d in corpus]
    store = InMemoryDocumentStore(use_bm25=True)
    store.write_documents(docs)
    bm25 = BM25Retriever(document_store=store)
    dretr = SentenceTransformersRetriever(document_store=store, model_name_or_path=args.model)
    queries = [r["query"] for r in load_jsonl(args.queries)][: args.max_q]
    for q in queries:
        t0 = time.perf_counter()
        bm = bm25.retrieve(q, top_k=args.k)
        dn = dretr.retrieve(q, top_k=args.k)
        ms = (time.perf_counter() - t0) * 1000
        print(f"{q[:40]:40s} bm25={len(bm):3d} dense={len(dn):3d} time={ms:7.2f} ms")

def main() -> None:
    ap = argparse.ArgumentParser(prog="crom-bench")
    sub = ap.add_subparsers(dest="cmd", required=True)

    p = sub.add_parser("e2e", help="end-to-end: retrieval → rerank → pack → mock LLM")
    p.add_argument("--corpus", default="examples/corpus/sample_docs.jsonl")
    p.add_argument("--queries", default="examples/corpus/sample_queries.jsonl")
    p.add_argument("--model", default="sentence-transformers/all-MiniLM-L6-v2")
    p.add_argument("--k", type=int, default=200)
    p.add_argument("--alpha", type=float, default=0.5)
    p.add_argument("--budget", type=float, default=0.3)
    # plugins
    p.add_argument("--use-flashrank", action="store_true")
    p.add_argument("--flashrank-model", default="ms-marco-TinyBERT-L-2-v2")
    p.add_argument("--use-llmlingua", action="store_true")
    p.add_argument("--compress-ratio", type=float, default=0.6)
    p.add_argument("--use-evidently", action="store_true")
    p.add_argument("--out-dir", default="benchmarks/out")
    p.set_defaults(func=e2e)

    p2 = sub.add_parser("sweep", help="budget sweep + Pareto csv")
    p2.add_argument("--corpus", default="examples/corpus/sample_docs.jsonl")
    p2.add_argument("--queries", default="examples/corpus/sample_queries.jsonl")
    p2.add_argument("--model", default="sentence-transformers/all-MiniLM-L6-v2")
    p2.add_argument("--k", type=int, default=200)
    p2.add_argument("--alpha", type=float, default=0.5)
    p2.add_argument("--b-min", type=int, default=10)
    p2.add_argument("--b-max", type=int, default=90)
    p2.add_argument("--b-step", type=int, default=10)
    p2.add_argument("--max-q", type=int, default=20)
    p2.add_argument("--out-dir", default="benchmarks/out")
    p2.add_argument("--save-plots", action="store_true")
    p2.set_defaults(func=budget_sweep)

    p3 = sub.add_parser("scale", help="scaling runtime with synthetic data")
p3.add_argument("--n-max", type=int, default=100000) p3.add_argument("--budget", type=float, default=0.3) p3.set_defaults(func=scaling) p4 = sub.add_parser("dp-curve", help="% of optimal vs budget (synthetic)") p4.add_argument("--n", type=int, default=2000) p4.add_argument("--n-opt", type=int, default=200) p4.add_argument("--b-min", type=int, default=10) p4.add_argument("--b-max", type=int, default=90) p4.add_argument("--b-step", type=int, default=10) p4.add_argument("--out-dir", default="benchmarks/out") p4.add_argument("--save-plots", action="store_true") p4.set_defaults(func=dp_curve) p5 = sub.add_parser("haystack-compare", help="compare BM25 vs dense retrievers (Haystack)") p5.add_argument("--corpus", default="examples/corpus/sample_docs.jsonl") p5.add_argument("--queries", default="examples/corpus/sample_queries.jsonl") p5.add_argument("--model", default="sentence-transformers/all-MiniLM-L6-v2") p5.add_argument("--k", type=int, default=50) p5.add_argument("--max-q", type=int, default=10) p5.set_defaults(func=compare_haystack) args = ap.parse_args() args.func(args) if __name__ == "__main__": main() ``` --- ### **File:** `D:\\Sanctum\\CRoM-EfficientLLM\\src\\crom_efficientllm\\cross_encoder.py` ```python from typing import List, Optional import logging class SafeCrossEncoderManager: """Cross-Encoder 상태를 명시적으로 관리하는 클래스""" def __init__(self, model_name: Optional[str] = None, device: str = "cpu"): self.model_name = model_name self.device = device self.model = None self.status = "unknown" self.last_error = None self._initialize() def _initialize(self): """Cross-Encoder 초기화 with 상세 상태 추적""" if not self.model_name: self.status = "disabled" logging.info("Cross-Encoder: DISABLED (no model specified)") return try: # sentence-transformers 임포트 체크 from sentence_transformers import CrossEncoder # 모델 로딩 시도 self.model = CrossEncoder(self.model_name, device=self.device) self.status = f"active ({self.model_name})" # 🆕 성공 시 상세 로깅 logging.info(f"Cross-Encoder: ACTIVE") logging.info(f" └─ Model: {self.model_name}") logging.info(f" └─ Device: {self.device}") except ImportError as e: self.status = "unavailable (sentence-transformers not installed)" self.last_error = str(e) # 🆕 의존성 누락 시 명확한 안내 logging.warning("Cross-Encoder: UNAVAILABLE") logging.warning(" └─ Reason: sentence-transformers not installed") logging.warning(" └─ Install: pip install sentence-transformers") except Exception as e: self.status = f"error ({type(e).__name__})" self.last_error = str(e) # 🆕 기타 오류 시 상세 로깅 logging.error(f"Cross-Encoder: ERROR") logging.error(f" └─ Model: {self.model_name}") logging.error(f" └─ Error: {str(e)}") def get_status_for_response(self) -> str: """API 응답용 상태 문자열""" return self.status def rerank(self, query: str, documents: List[str]) -> List[float]: """안전한 리랭킹 with 상태 로깅""" if self.model is None: # 🆕 비활성화 상태 명시적 로깅 logging.debug(f"Cross-Encoder rerank skipped: {self.status}") return [0.5] * len(documents) # 중립 점수 try: pairs = [(query, doc) for doc in documents] scores = self.model.predict(pairs) # 🆕 성공적 리랭킹 로깅 logging.debug(f"Cross-Encoder reranked {len(documents)} documents") return scores.tolist() if hasattr(scores, 'tolist') else list(scores) except Exception as e: # 🆕 런타임 오류 시 상세 로깅 logging.error(f"Cross-Encoder rerank failed: {str(e)}") logging.error(f" └─ Fallback: returning neutral scores") return [0.5] * len(documents) ``` --- ### **File:** `D:\\Sanctum\\CRoM-EfficientLLM\\src\\crom_efficientllm\\demo.py` ```python """ Demo & Metrics Server for CRoM-EfficientLLM ------------------------------------------ - 
### **File:** `D:\\Sanctum\\CRoM-EfficientLLM\\src\\crom_efficientllm\\demo.py`

```python
"""
Demo & Metrics Server for CRoM-EfficientLLM
-------------------------------------------
- `crom-demo demo`  : run sample pipeline
- `crom-demo serve` : start Flask + Prometheus metrics on :8000
"""
from __future__ import annotations

import argparse
from typing import List

from flask import Flask, Response
from prometheus_client import Counter, Gauge, generate_latest, CONTENT_TYPE_LATEST

from crom_efficientllm.budget_packer.packer import budget_pack, pack_summary, Chunk
from crom_efficientllm.rerank_engine.rerank import hybrid_rerank
from crom_efficientllm.drift_estimator.estimator import DriftEstimator, DriftMode

# ---- Prometheus metrics ----
TOKENS_SAVED = Gauge("crom_tokens_saved", "Tokens saved by budget packer")
DRIFT_ALERTS = Counter("crom_drift_alerts_total", "Total drift alerts emitted")


class DummyEmbed:
    def encode(self, text_or_list, convert_to_numpy=False):
        if isinstance(text_or_list, list):
            return [self.encode(t) for t in text_or_list]
        vec = [ord(c) % 7 for c in str(text_or_list)[:16]]
        while len(vec) < 16:
            vec.append(0)
        return vec


def run_demo() -> None:
    chunks: List[Chunk] = [
        Chunk(text="AI ethics is crucial", score=0.9, tokens=50),
        Chunk(text="Unrelated text", score=0.2, tokens=40),
        Chunk(text="Drift detection research", score=0.8, tokens=60),
    ]
    packed = budget_pack(chunks, budget=80)
    summary = pack_summary(packed)
    print("Packed:", [c.text for c in packed], summary)

    docs = [{"text": "AI drift measurement"}, {"text": "Cooking recipes"}]
    reranked = hybrid_rerank("AI ethics", docs, DummyEmbed(), alpha=0.5)
    print("Reranked:", [d["text"] for d in reranked])

    de = DriftEstimator(threshold=0.5, mode=DriftMode.L2)
    print("Drift state:", de.state())
    alert1, *_ = de.update([1, 2, 3])
    alert2, *_ = de.update([10, 10, 10])
    print("Drift alert?", alert1)
    print("Drift alert?", alert2)
    print("Drift state:", de.state())

    # Update metrics; reuse the alerts captured above instead of calling
    # update() again, which would mutate the estimator state a second time.
    TOKENS_SAVED.set(max(0, sum(c.tokens for c in chunks) - summary["tokens"]))
    if alert1:
        DRIFT_ALERTS.inc()
    if alert2:
        DRIFT_ALERTS.inc()


def create_app() -> Flask:
    app = Flask(__name__)

    @app.get("/healthz")
    def healthz():
        return {"status": "ok"}

    @app.get("/metrics")
    def metrics():
        return Response(generate_latest(), mimetype=CONTENT_TYPE_LATEST)

    return app


def main() -> None:
    parser = argparse.ArgumentParser(prog="crom-demo")
    sub = parser.add_subparsers(dest="cmd", required=True)
    sub.add_parser("demo", help="run sample pipeline")
    pserve = sub.add_parser("serve", help="start metrics server on :8000")
    pserve.add_argument("--host", default="0.0.0.0")
    pserve.add_argument("--port", type=int, default=8000)
    args = parser.parse_args()
    if args.cmd == "demo":
        run_demo()
        return
    if args.cmd == "serve":
        app = create_app()
        app.run(host=args.host, port=args.port)
        return


if __name__ == "__main__":
    main()
```
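With `crom-demo serve` running, the endpoints can be smoke-tested from a few lines of client code (a sketch; it assumes the server is reachable on localhost:8000 and that `requests` is installed):

```python
import requests

# Liveness probe returns a small JSON payload.
print(requests.get("http://localhost:8000/healthz", timeout=5).json())  # {'status': 'ok'}

# Prometheus exposition format is plain text, one sample per line.
metrics = requests.get("http://localhost:8000/metrics", timeout=5)
for line in metrics.text.splitlines():
    if line.startswith("crom_"):
        print(line)  # gauge/counter samples such as crom_tokens_saved
```

---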
### **File:** `D:\\Sanctum\\CRoM-EfficientLLM\\src\\crom_efficientllm\\server.py`

```python
from fastapi import FastAPI, HTTPException
import time
from typing import List, Dict
import logging

# Internal module imports
from .budget_packer import enhanced_greedy_pack
from .cross_encoder import SafeCrossEncoderManager
from .capsule_logger import ExplainCapsuleLogger

# --- FastAPI app and core component initialization ---
app = FastAPI(
    title="CRoM-EfficientLLM Server",
    description="Context Reranking and Management for Efficient LLMs",
    version="1.0.1"
)

logging.basicConfig(level=logging.INFO)

# Component instantiation
# TODO: load the model name and other settings from a config file (config.yaml)
# Namespaced HF id so sentence-transformers can resolve the checkpoint
ce_manager = SafeCrossEncoderManager(model_name="cross-encoder/ms-marco-TinyBERT-L-2-v2")
capsule_logger = ExplainCapsuleLogger(log_directory="artifacts/logs")

# --- Response schema and helper functions ---
class ProcessResponseV2:
    """Response-schema helper for the extended /process endpoint."""

    @staticmethod
    def create_response(query: str, packed_chunks: List[Dict], processing_stats: Dict,
                        cross_encoder_status: str, processing_time: float) -> Dict:
        """Build the enriched response."""
        response = {
            "success": True,
            "query": query,
            "chunks": packed_chunks,
            "stats": processing_stats,  # packing statistics
            "meta": {
                "cross_encoder_status": cross_encoder_status,
                "processing_time_ms": processing_time * 1000,
                "timestamp": time.time()
            }
        }
        return response

# --- API endpoint definitions ---
@app.post("/process", summary="Rerank and pack text chunks")
def process_chunks(query: str, chunks: List[Dict], budget: int = 4096):
    """
    Reranks the given chunks against the query and packs them within the token budget.
    """
    start_time = time.time()
    try:
        # 1. Rerank with the Cross-Encoder (when active)
        doc_texts = [chunk.get("text", "") for chunk in chunks]
        scores = ce_manager.rerank(query, doc_texts)
        for chunk, score in zip(chunks, scores):
            chunk["score"] = score

        # 2. Pack within the budget
        packed_chunks, stats = enhanced_greedy_pack(chunks, budget=budget, score_key="score")

        # 3. Build the final response
        processing_time = time.time() - start_time
        response_data = ProcessResponseV2.create_response(
            query=query,
            packed_chunks=packed_chunks,
            processing_stats=stats,
            cross_encoder_status=ce_manager.get_status_for_response(),
            processing_time=processing_time
        )

        # 4. Log an explain capsule
        capsule = capsule_logger.create_explain_capsule(
            query=query,
            response_data=response_data,
            processing_stats=stats,
            cross_encoder_status=ce_manager.get_status_for_response()
        )
        capsule_logger.log_capsule(capsule)

        return response_data

    except Exception as e:
        logging.error(f"Error during /process: {e}", exc_info=True)
        # Error logging
        capsule_logger.log_error({
            "endpoint": "/process",
            "error": str(e),
            "query": query,
        })
        raise HTTPException(status_code=500, detail=f"Internal Server Error: {e}")

@app.get("/healthz", summary="Health check")
def health_check():
    """Checks server health."""
    return {"status": "ok", "cross_encoder": ce_manager.get_status_for_response()}

@app.get("/metrics", summary="Get Prometheus metrics")
def get_metrics():
    """Exposes Prometheus metrics."""
    # TODO: implement real metrics with prometheus-client
    return {"message": "Metrics endpoint is active. Implement with prometheus-client."}
```
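Note how the endpoint signature maps onto HTTP: FastAPI reads the scalar parameters `query` and `budget` from the query string and the `List[Dict]` parameter `chunks` from the JSON body. A minimal client sketch under those assumptions (server on localhost:8000, `requests` installed):

```python
import requests

chunks = [
    {"text": "AI ethics is crucial", "tokens": 50},
    {"text": "Unrelated text", "tokens": 40},
    {"text": "Drift detection research", "tokens": 60},
]

resp = requests.post(
    "http://localhost:8000/process",
    params={"query": "AI ethics", "budget": 80},  # scalar params -> query string
    json=chunks,                                   # List[Dict] param -> JSON body
    timeout=10,
)
resp.raise_for_status()
data = resp.json()
print(data["meta"]["cross_encoder_status"])
# Assuming packed chunks round-trip as JSON objects with a "text" field:
print(data["stats"], [c["text"] for c in data["chunks"]])
```

---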
### **File:** `D:\\Sanctum\\CRoM-EfficientLLM\\tests\\test_drift.py`

```python
from crom_efficientllm.drift_estimator.estimator import DriftEstimator, DriftMode


def test_drift_triggers():
    de = DriftEstimator(threshold=0.1, mode=DriftMode.L2)
    alert, dist, ewma = de.update([0, 0, 0])
    assert alert is False
    alert, dist, ewma = de.update([1, 0, 0])
    assert isinstance(alert, bool)
```

---

### **File:** `D:\\Sanctum\\CRoM-EfficientLLM\\tests\\test_packer.py`

```python
from crom_efficientllm.budget_packer.packer import budget_pack, Chunk


def test_budget_pack_respects_budget():
    chunks = [Chunk("a", 1.0, 60), Chunk("b", 0.9, 50), Chunk("c", 0.5, 20)]
    sel = budget_pack(chunks, budget=70)
    assert sum(c.tokens for c in sel) <= 70


def test_budget_pack_sorting_stable():
    chunks = [
        {"text": "x", "score": 0.9, "tokens": 30},
        {"text": "y", "score": 0.9, "tokens": 20},
        {"text": "z", "score": 0.8, "tokens": 10},
    ]
    sel = budget_pack(chunks, budget=60)
    assert [c.text for c in sel] == ["y", "x", "z"]
```

---

### **File:** `D:\\Sanctum\\CRoM-EfficientLLM\\tests\\test_rerank.py`

```python
from crom_efficientllm.rerank_engine.rerank import hybrid_rerank


class Dummy:
    def encode(self, text_or_list, convert_to_numpy=False):
        if isinstance(text_or_list, list):
            return [self.encode(t) for t in text_or_list]
        vec = [ord(c) % 5 for c in str(text_or_list)[:8]]
        while len(vec) < 8:
            vec.append(0)
        return vec


def test_hybrid_rerank_returns_scores():
    docs = [{"text": "alpha"}, {"text": "beta"}]
    out = hybrid_rerank("alp", docs, Dummy(), alpha=0.5)
    assert len(out) == 2
    assert {"score_sparse", "score_dense", "score_final"} <= set(out[0].keys())
```

---

### **File:** `D:\\Sanctum\\CRoM-EfficientLLM\\src\\crom_efficientllm\\budget_packer\\__init__.py`

```python
from .packer import Chunk, budget_pack, pack_summary

__all__ = ["Chunk", "budget_pack", "pack_summary"]
```

---
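Since the package re-exports the packer API, callers can mix plain dicts (with or without a `tokens` field, which `_estimate_tokens` fills in) and `Chunk` instances. A short sketch with illustrative inputs:

```python
from crom_efficientllm.budget_packer import Chunk, budget_pack, pack_summary

docs = [
    {"text": "AI ethics is crucial", "score": 0.9},         # tokens estimated (~len/4)
    {"text": "Unrelated text", "score": 0.2, "tokens": 40},  # explicit token count
    Chunk(text="Drift detection research", score=0.8, tokens=60),
]

selected = budget_pack(docs, budget=80)
print([c.text for c in selected])
print(pack_summary(selected))  # {"num_chunks": ..., "tokens": ..., "avg_score": ...}
```

---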
""" # approx: 4 chars ≈ 1 token; floor at 1 return max(1, len(text) // 4) def _coerce_chunk(obj: Union[Chunk, dict], idx: int) -> Chunk: if isinstance(obj, Chunk): return obj if not isinstance(obj, dict): raise TypeError(f"Chunk #{idx} must be Chunk or dict, got {type(obj)}") text = str(obj.get("text", "")) if not text: raise ValueError(f"Chunk #{idx} has empty text") score = float(obj.get("score", 0.0)) tokens = int(obj["tokens"]) if "tokens" in obj else _estimate_tokens(text) if tokens <= 0: raise ValueError(f"Chunk #{idx} has non-positive tokens: {tokens}") return Chunk(text=text, score=score, tokens=tokens) def budget_pack( text_chunks: Sequence[Union[Chunk, dict]], budget: int = 1000, ) -> List[Chunk]: """ Args: text_chunks: iterable of Chunk or dict with keys {text, score, tokens} budget: max token budget (int > 0) Returns: list of selected chunks (order of selection) """ if budget <= 0: raise ValueError("budget must be > 0") coerced: List[Chunk] = [_coerce_chunk(c, i) for i, c in enumerate(text_chunks)] # stable sort by (-score, tokens, original_index) indexed: List[Tuple[int, Chunk]] = list(enumerate(coerced)) indexed.sort(key=lambda it: (-it[1].score, it[1].tokens, it[0])) selected: List[Chunk] = [] total = 0 for _, ch in indexed: if total + ch.tokens <= budget: selected.append(ch) total += ch.tokens return selected def pack_summary(selected: Sequence[Chunk]) -> dict: tokens = sum(c.tokens for c in selected) return { "num_chunks": len(selected), "tokens": tokens, "avg_score": (sum(c.score for c in selected) / len(selected)) if selected else 0.0, } ``` --- ### **File:** `D:\\Sanctum\\CRoM-EfficientLLM\\src\\crom_efficientllm\\drift_estimator\\__init__.py` ```python from .estimator import DriftEstimator, DriftMode __all__ = ["DriftEstimator", "DriftMode"] ``` --- ### **File:** `D:\\Sanctum\\CRoM-EfficientLLM\\src\\crom_efficientllm\\drift_estimator\\estimator.py` ```python """ Drift Estimator --------------- Monitors embedding shift using L2 or cosine distance. Supports EWMA smoothing and exposes state for dashboards. 
""" from __future__ import annotations from dataclasses import dataclass, field from enum import Enum from typing import List, Optional, Tuple import numpy as np class DriftMode(str, Enum): L2 = "l2" COSINE = "cosine" @dataclass class DriftEstimator: threshold: float = 0.2 mode: DriftMode = DriftMode.L2 ewma_alpha: float = 0.3 # smoothing for stability history: List[np.ndarray] = field(default_factory=list) distances: List[float] = field(default_factory=list) ewma: Optional[float] = None def _distance(self, a: np.ndarray, b: np.ndarray) -> float: a = np.asarray(a, dtype=np.float32).ravel() b = np.asarray(b, dtype=np.float32).ravel() if self.mode == DriftMode.L2: return float(np.linalg.norm(a - b)) # cosine distance = 1 - cosine similarity denom = (np.linalg.norm(a) * np.linalg.norm(b)) + 1e-12 return float(1.0 - float(np.dot(a, b)) / denom) def update(self, embedding) -> Tuple[bool, float, float]: """ Args: embedding: vector representation of current response Returns: (drift_alert, distance, ewma) """ emb = np.asarray(embedding, dtype=np.float32) if emb.ndim != 1: emb = emb.ravel() if not self.history: self.history.append(emb) self.ewma = 0.0 self.distances.append(0.0) return (False, 0.0, 0.0) last = self.history[-1] dist = self._distance(emb, last) self.history.append(emb) self.distances.append(dist) # EWMA update if self.ewma is None: self.ewma = dist else: self.ewma = self.ewma_alpha * dist + (1 - self.ewma_alpha) * self.ewma return (bool(self.ewma > self.threshold), float(dist), float(self.ewma)) def state(self) -> dict: return { "count": len(self.history), "last_distance": self.distances[-1] if self.distances else 0.0, "ewma": self.ewma or 0.0, "mode": self.mode.value, "threshold": self.threshold, } ``` --- ### **File:** `D:\\Sanctum\\CRoM-EfficientLLM\\src\\crom_efficientllm\\plugins\\evidently_drift.py` ```python from __future__ import annotations from typing import List try: from evidently.metric_preset import DataDriftPreset from evidently.report import Report import pandas as pd except Exception as e: # pragma: no cover raise RuntimeError("evidently not installed. Install extras: pip install .[plugins]") from e def drift_report(ref: List[List[float]], cur: List[List[float]]): ref_df = pd.DataFrame(ref) cur_df = pd.DataFrame(cur) rep = Report(metrics=[DataDriftPreset()]) rep.run(reference_data=ref_df, current_data=cur_df) return rep ``` --- ### **File:** `D:\\Sanctum\\CRoM-EfficientLLM\\src\\crom_efficientllm\\plugins\\flashrank_reranker.py` ```python from __future__ import annotations from typing import List, Dict try: from flashrank import Reranker except Exception as e: # pragma: no cover raise RuntimeError("flashrank not installed. Install extras: pip install .[plugins]") from e def flashrank_rerank(query: str, docs: List[Dict[str, str]], model_name: str = "ms-marco-TinyBERT-L-2-v2") -> List[Dict]: rr = Reranker(model_name) pairs = [(query, d["text"]) for d in docs] scores = rr.rerank(pairs) order = sorted(range(len(docs)), key=lambda i: -scores[i]) return [docs[i] | {"score_flashrank": float(scores[i])} for i in order] ``` --- ### **File:** `D:\\Sanctum\\CRoM-EfficientLLM\\src\\crom_efficientllm\\plugins\\llmlingua_compressor.py` ```python from __future__ import annotations try: from llmlingua import PromptCompressor except Exception as e: # pragma: no cover raise RuntimeError("llmlingua not installed. 
### **File:** `D:\\Sanctum\\CRoM-EfficientLLM\\src\\crom_efficientllm\\plugins\\evidently_drift.py`

```python
from __future__ import annotations

from typing import List

try:
    from evidently.metric_preset import DataDriftPreset
    from evidently.report import Report
    import pandas as pd
except Exception as e:  # pragma: no cover
    raise RuntimeError("evidently not installed. Install extras: pip install .[plugins]") from e


def drift_report(ref: List[List[float]], cur: List[List[float]]):
    ref_df = pd.DataFrame(ref)
    cur_df = pd.DataFrame(cur)
    rep = Report(metrics=[DataDriftPreset()])
    rep.run(reference_data=ref_df, current_data=cur_df)
    return rep
```

---

### **File:** `D:\\Sanctum\\CRoM-EfficientLLM\\src\\crom_efficientllm\\plugins\\flashrank_reranker.py`

```python
from __future__ import annotations

from typing import List, Dict

try:
    from flashrank import Reranker
except Exception as e:  # pragma: no cover
    raise RuntimeError("flashrank not installed. Install extras: pip install .[plugins]") from e


def flashrank_rerank(query: str, docs: List[Dict[str, str]], model_name: str = "ms-marco-TinyBERT-L-2-v2") -> List[Dict]:
    rr = Reranker(model_name)
    pairs = [(query, d["text"]) for d in docs]
    scores = rr.rerank(pairs)
    order = sorted(range(len(docs)), key=lambda i: -scores[i])
    return [docs[i] | {"score_flashrank": float(scores[i])} for i in order]
```

---

### **File:** `D:\\Sanctum\\CRoM-EfficientLLM\\src\\crom_efficientllm\\plugins\\llmlingua_compressor.py`

```python
from __future__ import annotations

try:
    from llmlingua import PromptCompressor
except Exception as e:  # pragma: no cover
    raise RuntimeError("llmlingua not installed. Install extras: pip install .[plugins]") from e


def compress_prompt(text: str, target_ratio: float = 0.5) -> str:
    pc = PromptCompressor()
    out = pc.compress(text, target_ratio=target_ratio)
    return out["compressed_prompt"] if isinstance(out, dict) and "compressed_prompt" in out else str(out)
```

---

### **File:** `D:\\Sanctum\\CRoM-EfficientLLM\\src\\crom_efficientllm\\rerank_engine\\__init__.py`

```python
from .rerank import hybrid_rerank

__all__ = ["hybrid_rerank"]
```

---

### **File:** `D:\\Sanctum\\CRoM-EfficientLLM\\src\\crom_efficientllm\\rerank_engine\\rerank.py`

```python
"""
Hybrid Rerank Engine
--------------------
Combines sparse (TF-IDF cosine) and dense (embedding cosine) scores
with min-max normalization for robust fusion.
"""
from __future__ import annotations

from typing import Dict, List, Sequence

import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity


def _to_numpy(x):
    arr = np.asarray(x)
    return arr.astype(np.float32)


def _batch_encode(embed_model, texts: Sequence[str]) -> np.ndarray:
    # Try common API of sentence-transformers: encode(list, convert_to_numpy=True)
    if hasattr(embed_model, "encode"):
        try:
            return _to_numpy(embed_model.encode(list(texts), convert_to_numpy=True))
        except TypeError:
            # Fallback: per-text encode
            return _to_numpy([embed_model.encode(t) for t in texts])
    raise TypeError("embed_model must provide .encode()")


def _minmax(x: np.ndarray) -> np.ndarray:
    if x.size == 0:
        return x
    mn, mx = float(np.min(x)), float(np.max(x))
    if mx - mn <= 1e-12:
        return np.zeros_like(x)
    return (x - mn) / (mx - mn)


def hybrid_rerank(
    query: str,
    docs: List[Dict[str, str]],
    embed_model,
    alpha: float = 0.5,
) -> List[Dict[str, object]]:
    """
    Args:
        query: query string
        docs: list of {"text": str}
        embed_model: model with .encode() -> vector(s)
        alpha: weight between sparse/dense in [0, 1]

    Returns:
        ranked list of enriched docs with scores {score_sparse, score_dense, score_final}
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    if not docs:
        return []

    texts = [d.get("text", "") for d in docs]

    # Sparse: TF-IDF cosine
    tfidf = TfidfVectorizer(ngram_range=(1, 2), min_df=1).fit(texts)
    Q = tfidf.transform([query])
    D = tfidf.transform(texts)
    sparse_scores = cosine_similarity(Q, D).ravel()

    # Dense: cosine similarity between L2-normalized embeddings
    q_emb = _to_numpy(embed_model.encode(query))
    d_embs = _batch_encode(embed_model, texts)

    # L2 normalize
    def _l2norm(a):
        n = np.linalg.norm(a, axis=-1, keepdims=True) + 1e-12
        return a / n

    qn = _l2norm(q_emb.reshape(1, -1))
    dn = _l2norm(d_embs)
    dense_scores = cosine_similarity(qn, dn).ravel()

    # Min-max to [0, 1] before fusion to avoid scale issues
    s_sparse = _minmax(sparse_scores)
    s_dense = _minmax(dense_scores)
    final_scores = alpha * s_sparse + (1 - alpha) * s_dense

    order = np.argsort(-final_scores)
    ranked = []
    for i in order:
        item = dict(docs[i])
        item.update(
            score_sparse=float(s_sparse[i]),
            score_dense=float(s_dense[i]),
            score_final=float(final_scores[i]),
        )
        ranked.append(item)
    return ranked
```
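To inspect the fusion directly, the per-document scores can be printed; with `alpha=0.5` the min-max-scaled sparse and dense signals contribute equally. A sketch using a `DummyEmbed` stand-in like the one in demo.py; any object with `.encode()` works:

```python
from crom_efficientllm.rerank_engine.rerank import hybrid_rerank

class DummyEmbed:
    def encode(self, text_or_list, convert_to_numpy=False):
        if isinstance(text_or_list, list):
            return [self.encode(t) for t in text_or_list]
        vec = [ord(c) % 7 for c in str(text_or_list)[:16]]
        return vec + [0] * (16 - len(vec))

docs = [{"text": "AI drift measurement"}, {"text": "Cooking recipes"}]
for d in hybrid_rerank("AI drift", docs, DummyEmbed(), alpha=0.5):
    print(f'{d["text"]:24s} sparse={d["score_sparse"]:.2f} '
          f'dense={d["score_dense"]:.2f} final={d["score_final"]:.2f}')
# alpha=1.0 ranks purely by TF-IDF overlap, alpha=0.0 purely by embedding
# similarity; min-max scaling puts both signals on [0, 1] before the weighted sum.
```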