Flamehaven
/

CRoM-Context-Rot-Mitigation-EfficientLLM

+# CRoM-EfficientLLM 전체 프로젝트 보고서
+## 1. 프로젝트 전체 구조 (Directory Tree)
+```
+CRoM-EfficientLLM/
+├── .github/
+│   └── workflows/
+│       ├── ci.yml
+│       └── release.yml
+├── benchmarks/
+│   ├── efficiency_eval.py
+│   ├── longbench_eval.py
+│   └── sample_results.json
+├── dashboard/
+│   ├── grafana_dashboard.json
+│   └── prometheus_config.yml
+├── docs/
+│   ├── architecture.md
+│   └── versioning.md
+├── examples/
+│   └── corpus/
+│       ├── sample_docs.jsonl
+│       └── sample_queries.jsonl
+├── scripts/
+│   ├── gen_release_notes.py
+│   └── release.sh
+├── src/
+│   └── crom_efficientllm/
+│       ├── budget_packer/
+│       │   ├── __init__.py
+│       │   └── packer.py
+│       ├── drift_estimator/
+│       │   ├── __init__.py
+│       │   └── estimator.py
+│       ├── plugins/
+│       │   ├── evidently_drift.py
+│       │   ├── flashrank_reranker.py
+│       │   └── llmlingua_compressor.py
+│       ├── rerank_engine/
+│       │   ├── __init__.py
+│       │   └── rerank.py
+│       ├── __init__.py
+│       ├── budget_packer.py
+│       ├── capsule_logger.py
+│       ├── cli.py
+│       ├── cross_encoder.py
+│       ├── demo.py
+│       └── server.py
+├── tests/
+│   ├── test_drift.py
+│   ├── test_packer.py
+│   └── test_rerank.py
+├── .gitignore
+├── CHANGELOG.md
+├── crom 1.0.1수정 업데이트 상세보고서.md
+├── LICENSE
+├── pyproject.toml
+├── README.md
+├── release_notes.md
+└── requirements.txt
+```
+## 2. 파일별 상세 내용
+---
+### **File:** `D:\\Sanctum\\CRoM-EfficientLLM\\.github\\workflows\\ci.yml`
+```yaml
+name: ci
+on:
+  push:
+    branches: [ main ]
+  pull_request:
+jobs:
+  test:
+    runs-on: ubuntu-latest
+    strategy:
+      matrix:
+        python-version: ["3.9", "3.10", "3.11", "3.12"]
+    steps:
+      - uses: actions/checkout@v4
+      - uses: actions/setup-python@v5
+        with:
+          python-version: ${{ matrix.python-version }}
+      - run: pip install -e .[dev]
+      - run: pre-commit run --all-files || true
+      - run: ruff --version && black --version
+      - run: pytest -q
+```
+---
+### **File:** `D:\\Sanctum\\CRoM-EfficientLLM\\.github\\workflows\\release.yml`
+```yaml
+name: release
+on:
+  push:
+    tags:
+      - 'v*'
+jobs:
+  release:
+    runs-on: ubuntu-latest
+    steps:
+      - uses: actions/checkout@v4
+        with:
+          fetch-depth: 0
+      - uses: actions/setup-python@v5
+        with:
+          python-version: '3.11'
+      - run: pip install -e .[dev]
+      - run: pytest -q
+      - name: Build distribution
+        run: |
+          python -m pip install build
+          python -m build
+      - name: Generate release notes from CHANGELOG
+        run: |
+          python scripts/gen_release_notes.py "$GITHUB_REF_NAME"
+      - name: Publish GitHub Release
+        uses: softprops/action-gh-release@v2
+        with:
+          name: ${{ github.ref_name }}
+          body_path: release_notes.md
+          files: |
+            dist/*.whl
+            dist/*.tar.gz
+```
+---
+### **File:** `D:\\Sanctum\\CRoM-EfficientLLM\\.gitignore`
+```
+# Python
+__pycache__/
+*.py[cod]
+*.egg-info/
+.env
+.venv/
+virtualenv/
+.idea/
+.vscode/
+.ipynb_checkpoints/
+.dist/
+.build/
+.coverage
+.pytest_cache/
+# OS
+.DS_Store
+Thumbs.db
+```
+---
+### **File:** `D:\\Sanctum\\CRoM-EfficientLLM\\CHANGELOG.md`
+```markdown
+# Changelog
+## [1.0.1] - 2025-09-06
+### Added
+- Implemented core modules from scratch based on design documents.
+- Implemented FastAPI server with `/process` endpoint (`src/crom_efficientllm/server.py`).
+- Added `enhanced_greedy_pack` with detailed statistics for budget packing (`src/crom_efficientllm/budget_packer.py`).
+- Implemented `SafeCrossEncoderManager` for robust and observable Cross-Encoder handling (`src/crom_efficientllm/cross_encoder.py`).
+- Added `ExplainCapsuleLogger` for structured JSONL logging of all processing events (`src/crom_efficientllm/capsule_logger.py`).
+### Changed
+- Major version bump to reflect the first functional implementation of core logic.
+## [0.2.1] - 2025-09-02
+### Added
+- CLI `--save-plots` option for `sweep` and `dp-curve`; saves PNG charts to `benchmarks/out/` (or `--out-dir`).
+- README Quick Examples mention of plotting flag.
+- This CHANGELOG.
+### Changed
+- Dev tooling: recommend `matplotlib` via dev extra for plotting.
+## [0.2.0] - 2025-09-02
+### Added
+- GitHub Actions CI (3.9–3.12), pre-commit(ruff/black).
+- `crom-bench` CLI: `e2e`, `sweep`, `scale`, `dp-curve`, `haystack-compare`.
+- Plugins: FlashRank/LLMLingua/Evidently (optional extras).
+- Example corpus & queries (JSONL).
+## [0.1.0] - 2025-09-02
+- Initial packaging; budget packer, hybrid rerank, drift estimator, demo & metrics.
+```
+---
+### **File:** `D:\\Sanctum\\CRoM-EfficientLLM\\LICENSE`
+```
+                                 Apache License
+                           Version 2.0, January 2004
+                        http://www.apache.org/licenses/
+   TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
+   1. Definitions.
+      "License" shall mean the terms and conditions for use, reproduction,
+      and distribution as defined by Sections 1 through 9 of this document.
+      "Licensor" shall mean the copyright owner or entity authorized by
+      the copyright owner that is granting the License.
+      "Legal Entity" shall mean the union of the acting entity and all
+      other entities that control, are controlled by, or are under common
+      control with that entity. For the purposes of this definition,
+      "control" means (i) the power, direct or indirect, to cause the
+      direction or management of such entity, whether by contract or
+      otherwise, or (ii) ownership of fifty percent (50%) or more of the
+      outstanding shares, or (iii) beneficial ownership of such entity.
+      "You" (or "Your") shall mean an individual or Legal Entity
+      exercising permissions granted by this License.
+      "Source" form shall mean the preferred form for making modifications,
+      including but not limited to software source code, documentation
+      source, and configuration files.
+      "Object" form shall mean any form resulting from mechanical
+      transformation or translation of a Source form, including but
+      not limited to compiled object code, generated documentation,
+      and conversions to other media types.
+      "Work" shall mean the work of authorship, whether in Source or
+      Object form, made available under the License, as indicated by a
+      copyright notice that is included in or attached to the work
+      (an example is provided in the Appendix below).
+      "Derivative Works" shall mean any work, whether in Source or Object
+      form, that is based on (or derived from) the Work and for which the
+      editorial revisions, annotations, elaborations, or other modifications
+      represent, as a whole, an original work of authorship. For the purposes
+      of this License, Derivative Works shall not include works that remain
+      separable from, or merely link (or bind by name) to the interfaces of,
+      the Work and Derivative Works thereof.
+      "Contribution" shall mean any work of authorship, including
+      the original version of the Work and any modifications or additions
+      to that Work or Derivative Works thereof, that is intentionally
+      submitted to Licensor for inclusion in the Work by the copyright owner
+      or by an individual or Legal Entity authorized to submit on behalf of
+      the copyright owner. For the purposes of this definition, "submitted"
+      means any form of electronic, verbal, or written communication sent
+      to the Licensor or its representatives, including but not limited to
+      communication on electronic mailing lists, source code control systems,
+      and issue tracking systems that are managed by, or on behalf of, the
+      Licensor for the purpose of discussing and improving the Work, but
+      excluding communication that is conspicuously marked or otherwise
+      designated in writing by the copyright owner as "Not a Contribution."
+      "Contributor" shall mean Licensor and any individual or Legal Entity
+      on behalf of whom a Contribution has been received by Licensor and
+      subsequently incorporated within the Work.
+   2. Grant of Copyright License. Subject to the terms and conditions of
+      this License, each Contributor hereby grants to You a perpetual,
+      worldwide, non-exclusive, no-charge, royalty-free, irrevocable
+      copyright license to reproduce, prepare Derivative Works of,
+      publicly display, publicly perform, sublicense, and distribute the
+      Work and such Derivative Works in Source or Object form.
+   3. Grant of Patent License. Subject to the terms and conditions of
+      this License, each Contributor hereby grants to You a perpetual,
+      worldwide, non-exclusive, no-charge, royalty-free, irrevocable
+      (except as stated in this section) patent license to make, have made,
+      use, offer to sell, sell, import, and otherwise transfer the Work,
+      where such license applies only to those patent claims licensable
+      by such Contributor that are necessarily infringed by their
+      Contribution(s) alone or by combination of their Contribution(s)
+      with the Work to which such Contribution(s) was submitted. If You
+      institute patent litigation against any entity (including a
+      cross-claim or counterclaim in a lawsuit) alleging that the Work
+      or a Contribution incorporated within the Work constitutes direct
+      or contributory patent infringement, then any patent licenses
+      granted to You under this License for that Work shall terminate
+      as of the date such litigation is filed.
+   4. Redistribution. You may reproduce and distribute copies of the
+      Work or Derivative Works thereof in any medium, with or without
+      modifications, and in Source or Object form, provided that You
+      meet the following conditions:
+      (a) You must give any other recipients of the Work or
+          Derivative Works a copy of this License; and
+      (b) You must cause any modified files to carry prominent notices
+          stating that You changed the files; and
+      (c) You must retain, in the Source form of any Derivative Works
+          that You distribute, all copyright, patent, trademark, and
+          attribution notices from the Source form of the Work,
+          excluding those notices that do not pertain to any part of
+          the Derivative Works; and
+      (d) If the Work includes a "NOTICE" text file as part of its
+          distribution, then any Derivative Works that You distribute must
+          include a readable copy of the attribution notices contained
+          within such NOTICE file, excluding those notices that do not
+          pertain to any part of the Derivative Works, in at least one
+          of the following places: within a NOTICE text file distributed
+          as part of the Derivative Works; within the Source form or
+          documentation, if provided along with the Derivative Works; or,
+          within a display generated by the Derivative Works, if and
+          wherever such third-party notices normally appear. The contents
+          of the NOTICE file are for informational purposes only and
+          do not modify the License. You may add Your own attribution
+          notices within Derivative Works that You distribute, alongside
+          or as an addendum to the NOTICE text from the Work, provided
+          that such additional attribution notices cannot be construed
+          as modifying the License.
+      You may add Your own copyright statement to Your modifications and
+      may provide additional or different license terms and conditions
+      for use, reproduction, or distribution of Your modifications, or
+      for any such Derivative Works as a whole, provided Your use,
+      reproduction, and distribution of the Work otherwise complies with
+      the conditions stated in this License.
+   5. Submission of Contributions. Unless You explicitly state otherwise,
+      any Contribution intentionally submitted for inclusion in the Work
+      by You to the Licensor shall be under the terms and conditions of
+      this License, without any additional terms or conditions.
+      Notwithstanding the above, nothing herein shall supersede or modify
+      the terms of any separate license agreement you may have executed
+      with the Licensor regarding such Contributions.
+   6. Trademarks. This License does not grant permission to use the trade
+      names, trademarks, service marks, or product names of the Licensor,
+      except as required for reasonable and customary use in describing the
+      origin of the Work and reproducing the content of the NOTICE file.
+   7. Disclaimer of Warranty. Unless required by applicable law or
+      agreed to in writing, Licensor provides the Work (and each
+      Contributor provides its Contributions) on an "AS IS" BASIS,
+      WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
+      implied, including, without limitation, any warranties or conditions
+      of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
+      PARTICULAR PURPOSE. You are solely responsible for determining the
+      appropriateness of using or redistributing the Work and assume any
+      risks associated with Your exercise of permissions under this License.
+   8. Limitation of Liability. In no event and under no legal theory,
+      whether in tort (including negligence), contract, or otherwise,
+      unless required by applicable law (such as deliberate and grossly
+      negligent acts) or agreed to in writing, shall any Contributor be
+      liable to You for damages, including any direct, indirect, special,
+      incidental, or consequential damages of any character arising as a
+      result of this License or out of the use or inability to use the
+      Work (including but not limited to damages for loss of goodwill,
+      work stoppage, computer failure or malfunction, or any and all
+      other commercial damages or losses), even if such Contributor
+      has been advised of the possibility of such damages.
+   9. Accepting Warranty or Additional Liability. While redistributing
+      the Work or Derivative Works thereof, You may choose to offer,
+      and charge a fee for, acceptance of support, warranty, indemnity,
+      or other liability obligations and/or rights consistent with this
+      License. However, in accepting such obligations, You may act only
+      on Your own behalf and on Your sole responsibility, not on behalf
+      of any other Contributor, and only if You agree to indemnify,
+      defend, and hold each Contributor harmless for any liability
+      incurred by, or claims asserted against, such Contributor by reason
+      of your accepting any such warranty or additional liability.
+   END OF TERMS AND CONDITIONS
+   APPENDIX: How to apply the Apache License to your work.
+      To apply the Apache License to your work, attach the following
+      boilerplate notice, with the fields enclosed by brackets "[]"
+      replaced with your own identifying information. (Don't include
+      the brackets!)  The text should be enclosed in the appropriate
+      comment syntax for the file format. We also recommend that a
+      file or class name and description of purpose be included on the
+      same "printed page" as the copyright notice for easier
+      identification within third-party archives.
+   Copyright [yyyy] [name of copyright owner]
+   Licensed under the Apache License, Version 2.0 (the "License");
+   you may not use this file except in compliance with the License.
+   You may obtain a copy of the License at
+       http://www.apache.org/licenses/LICENSE-2.0
+   Unless required by applicable law or agreed to in writing, software
+   distributed under the License is distributed on an "AS IS" BASIS,
+   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+   See the License for the specific language governing permissions and
+   limitations under the License.
+```
+---
+### **File:** `D:\\Sanctum\\CRoM-EfficientLLM\\README.md`
+```markdown
+---
+language: en
+license: apache-2.0
+library_name: crom-efficientllm
+tags:
+- rag
+- llm
+- retrieval
+- rerank
+- reranker
+- context-management
+- prompt-engineering
+- observability
+- python
+---
+# CRoM-Context-Rot-Mitigation--EfficientLLM: Context Reranking and Management for Efficient LLMs
+<p align="left">
+  <a href="https://github.com/Flamehaven/CRoM-Context-Rot-Mitigation--EfficientLLM/actions">
+    <img alt="CI" src="https://img.shields.io/github/actions/workflow/status/Flamehaven/CRoM-Context-Rot-Mitigation--EfficientLLM/ci.yml?branch=main" />
+  </a>
+  <a href="#-benchmarks">
+    <img alt="Bench" src="https://img.shields.io/badge/benchmarks-ready-success" />
+  </a>
+  <a href="LICENSE">
+    <img alt="License" src="https://img.shields.io/badge/license-Apache%202.0-blue" />
+  </a>
+  <a href="https://github.com/Flamehaven/CRoM-Context-Rot-Mitigation--EfficientLLM/releases">
+    <img alt="Release" src="https://img.shields.io/github/v/release/Flamehaven/CRoM-Context-Rot-Mitigation--EfficientLLM?display_name=tag" />
+  </a>
+  <a href="CHANGELOG.md">
+    <img alt="Versioning" src="https://img.shields.io/badge/semver-0.2.x-lightgrey" />
+  </a>
+  <a href="https://github.com/Flamehaven/CRoM-Context-Rot-Mitigation--EfficientLLM/releases/latest">
+    <img alt="Wheel" src="https://img.shields.io/badge/wheel-available-success" />
+  </a>
+</p>
+**CRoM (Context Rot Mitigation)-EfficientLLM** is a Python toolkit designed to optimize the context provided to Large Language Models (LLMs). It provides a suite of tools to intelligently select, re-rank, and manage text chunks to fit within a model\'s context budget while maximizing relevance and minimizing performance drift.
+This project is ideal for developers building RAG (Retrieval-Augmented Generation) pipelines who need to make the most of limited context windows.
+## Key Features
+*   **Budget Packer:** Greedily packs the highest-scoring text chunks into a defined token budget using a stable sorting algorithm.
+*   **Hybrid Reranker:** Combines sparse (TF-IDF) and dense (Sentence-Transformers) retrieval scores for robust and high-quality reranking of documents.
+*   **Drift Estimator:** Monitors the semantic drift between sequential model responses using L2 or cosine distance with EWMA smoothing.
+*   **Observability:** Exposes Prometheus metrics for monitoring token savings and drift alerts in production.
+*   **Extensible Plugins:** Supports optional plugins for advanced reranking (`FlashRank`), compression (`LLMLingua`), and drift analysis (`Evidently`).
+*   **Comprehensive Benchmarking:** Includes a CLI for end-to-end pipeline evaluation, budget sweeps, and quality-vs-optimal analysis.
+## Installation
+Install the package directly from source using pip. For development, it\'s recommended to install in editable mode with the `[dev]` extras.
+```bash
+# Clone the repository
+git clone https://github.com/Flamehaven/CRoM-Context-Rot-Mitigation--EfficientLLM.git
+cd CRoM-Context-Rot-Mitigation--EfficientLLM
+# Install in editable mode with development and plugin dependencies
+pip install -e .[dev,plugins]
+```
+## Quickstart
+### Demo
+Run a simple, self-contained demonstration of the core components:
+```bash
+# Run the demo script
+crom-demo demo
+```
+### CLI Benchmarking Examples
+The package includes a powerful `crom-bench` CLI for evaluation.
+```bash
+# Default E2E (Search→Rerank→Pack→Mock LLM)
+crom-bench e2e --budget 0.3
+# Optional: High-precision configuration with plugins
+crom-bench e2e --budget 0.3 \
+  --use-flashrank --flashrank-model ms-marco-TinyBERT-L-2-v2 \
+  --use-llmlingua --compress-ratio=0.6 \
+  --use-evidently
+```
+### Plotting
+If `matplotlib` is installed (`pip install -e .[dev]`), you can save benchmark plots directly:
+```bash
+# Save budget sweep result plots
+crom-bench sweep --save-plots
+# Save DP-curve plots
+crom-bench dp-curve --save-plots
+```
+## Release & Changelog
+This project follows semantic versioning. For detailed changes, see the [**CHANGELOG.md**](CHANGELOG.md).
+Releases are automated via GitHub Actions when a `v*` tag is pushed.
+## License
+This project is licensed under the Apache 2.0 License. See the [LICENSE](LICENSE) file for details.
+```
+---
+### **File:** `D:\\Sanctum\\CRoM-EfficientLLM\\benchmarks\\efficiency_eval.py`
+```python
+"""
+Efficiency Evaluation for CRoM-EfficientLLM
+- Synthetic workload to measure token savings, selection quality, and runtime.
+- No third-party deps beyond numpy/matplotlib (pandas optional for CSVs).
+Usage:
+  python benchmarks/efficiency_eval.py --budget 0.3 --n 5000 --seed 123 --plot --save
+"""
+from __future__ import annotations
+import argparse
+import math
+import time
+from dataclasses import dataclass
+from typing import List, Sequence, Tuple, Union
+import numpy as np
+try:
+    import pandas as pd  # optional
+except Exception:  # pragma: no cover
+    pd = None
+try:
+    import matplotlib.pyplot as plt  # optional
+except Exception:  # pragma: no cover
+    plt = None
+# --- Local packers (self-contained to avoid imports during quick eval) ---
+@dataclass(frozen=True)
+class Chunk:
+    text: str
+    score: float
+    tokens: int
+def _estimate_tokens(text: str) -> int:
+    return max(1, len(text) // 4)
+def _coerce_chunk(obj: Union[Chunk, dict], idx: int) -> Chunk:
+    if isinstance(obj, Chunk):
+        return obj
+    if not isinstance(obj, dict):
+        raise TypeError(f"Chunk #{idx} must be Chunk or dict, got {type(obj)}")
+    text = str(obj.get("text", ""))
+    if not text:
+        raise ValueError(f"Chunk #{idx} has empty text")
+    score = float(obj.get("score", 0.0))
+    tokens = int(obj["tokens"]) if "tokens" in obj else _estimate_tokens(text)
+    if tokens <= 0:
+        raise ValueError(f"Chunk #{idx} has non-positive tokens: {tokens}")
+    return Chunk(text=text, score=score, tokens=tokens)
+def budget_pack(text_chunks: Sequence[Union[Chunk, dict]], budget: int = 1000) -> List[Chunk]:
+    if budget <= 0:
+        raise ValueError("budget must be > 0")
+    coerced: List[Chunk] = [_coerce_chunk(c, i) for i, c in enumerate(text_chunks)]
+    indexed = list(enumerate(coerced))
+    indexed.sort(key=lambda it: (-it[1].score, it[1].tokens, it[0]))
+    selected: List[Chunk] = []
+    total = 0
+    for _, ch in indexed:
+        if total + ch.tokens <= budget:
+            selected.append(ch)
+            total += ch.tokens
+    return selected
+def pack_fcfs(text_chunks: Sequence[Union[Chunk, dict]], budget: int) -> List[Chunk]:
+    sel, total = [], 0
+    for i, obj in enumerate(text_chunks):
+        ch = _coerce_chunk(obj, i)
+        if total + ch.tokens <= budget:
+            sel.append(ch)
+            total += ch.tokens
+    return sel
+def pack_random(text_chunks: Sequence[Union[Chunk, dict]], budget: int, seed: int = 0) -> List[Chunk]:
+    rng = np.random.default_rng(seed)
+    indices = np.arange(len(text_chunks))
+    rng.shuffle(indices)
+    sel, total = [], 0
+    for i in indices:
+        ch = _coerce_chunk(text_chunks[i], i)
+        if total + ch.tokens <= budget:
+            sel.append(ch)
+            total += ch.tokens
+    return sel
+# --- Data generation and metrics ---
+def make_synthetic_chunks(n=2000, seed=42, corr=0.6):
+    rng = np.random.default_rng(seed)
+    true_rel = rng.normal(0, 1, size=n)
+    noise = rng.normal(0, 1, size=n) * math.sqrt(1 - corr**2)
+    score = corr * true_rel + noise
+    tokens = np.clip(rng.lognormal(mean=4.0, sigma=0.6, size=n).astype(int), 5, 2000)
+    chunks = [Chunk(text=("x"*int(t*4)), score=float(s), tokens=int(t)) for s, t in zip(score, tokens)]
+    return chunks, true_rel
+def eval_once(n=5000, budget_ratio=0.3, seed=123, corr=0.6):
+    chunks, true_rel = make_synthetic_chunks(n=n, seed=seed, corr=corr)
+    total_tokens = sum(c.tokens for c in chunks)
+    budget = int(total_tokens * budget_ratio)
+    def run(name, fn):
+        t0 = time.perf_counter()
+        sel = fn(chunks, budget)
+        dt = time.perf_counter() - t0
+        idx_map = {id(c): i for i, c in enumerate(chunks)}
+        picked_idx = [idx_map[id(c)] for c in sel]
+        rel_sum = float(np.sum(true_rel[picked_idx])) if picked_idx else 0.0
+        sel_tokens = sum(c.tokens for c in sel)
+        return {
+            "name": name,
+            "time_ms": dt*1000,
+            "selected_chunks": len(sel),
+            "selected_tokens": sel_tokens,
+            "tokens_budget": budget,
+            "tokens_total_unpacked": total_tokens,
+            "tokens_saved": total_tokens - sel_tokens,
+            "save_ratio": (total_tokens - sel_tokens)/total_tokens,
+            "relevance_sum": rel_sum,
+        }
+    rows = [
+        run("budget_pack", budget_pack),
+        run("fcfs", pack_fcfs),
+        run("random", lambda ch, b: pack_random(ch, b, seed=seed)),
+    ]
+    return rows
+def quality_vs_optimal(n=200, budget_ratio=0.3, seed=123, corr=0.6):
+    chunks, true_rel = make_synthetic_chunks(n=n, seed=seed, corr=corr)
+    budget = int(sum(c.tokens for c in chunks) * budget_ratio)
+    values = np.maximum(true_rel, 0.0)
+    def optimal(chunks_sub, values, budget):
+        items = chunks_sub
+        vals = list(values)
+        B = budget
+        dp = [0.0]*(B+1)
+        keep = [[False]*(B+1) for _ in range(len(items))]
+        for i, it in enumerate(items):
+            wt = it.tokens
+            val = vals[i]
+            for b in range(B, wt-1, -1):
+                alt = dp[b - wt] + val
+                if alt > dp[b]:
+                    dp[b] = alt
+                    keep[i][b] = True
+        b = B
+        picked_idx = []
+        for i in range(len(items)-1, -1, -1):
+            if keep[i][b]:
+                picked_idx.append(i)
+                b -= items[i].tokens
+        picked_idx.reverse()
+        rel_sum = float(np.sum([values[i] for i in picked_idx])) if picked_idx else 0.0
+        total_tokens = sum(items[i].tokens for i in picked_idx)
+        return picked_idx, rel_sum, total_tokens
+    opt_idx, opt_rel, opt_tokens = optimal(chunks, values, budget)
+    # selections
+    idx_map = {id(c): i for i, c in enumerate(chunks)}
+    def rel_of(selection):
+        pid = [idx_map[id(c)] for c in selection]
+        return float(np.sum(values[pid])) if pid else 0.0
+    sel_bp = budget_pack(chunks, budget)
+    sel_fc = pack_fcfs(chunks, budget)
+    sel_rd = pack_random(chunks, budget, seed=seed)
+    rows = [
+        {"name":"optimal_true_rel", "relevance_sum": opt_rel, "selected_tokens": opt_tokens, "selected_chunks": len(opt_idx)},
+        {"name":"budget_pack_small", "relevance_sum": rel_of(sel_bp), "selected_tokens": sum(c.tokens for c in sel_bp), "selected_chunks": len(sel_bp)},
+        {"name":"fcfs_small", "relevance_sum": rel_of(sel_fc), "selected_tokens": sum(c.tokens for c in sel_fc), "selected_chunks": len(sel_fc)},
+        {"name":"random_small", "relevance_sum": rel_of(sel_rd), "selected_tokens": sum(c.tokens for c in sel_rd), "selected_chunks": len(sel_rd)},
+    ]
+    return rows
+def main():
+    ap = argparse.ArgumentParser()
+    ap.add_argument("--n", type=int, default=5000)
+    ap.add_argument("--budget", type=float, default=0.3)
+    ap.add_argument("--seed", type=int, default=123)
+    ap.add_argument("--corr", type=float, default=0.6)
+    ap.add_argument("--plot", action="store_true")
+    ap.add_argument("--save", action="store_true")
+    args = ap.parse_args()
+    rows = eval_once(n=args.n, budget_ratio=args.budget, seed=args.seed, corr=args.corr)
+    rows_q = quality_vs_optimal(n=min(200, args.n), budget_ratio=args.budget, seed=args.seed, corr=args.corr)
+    print("\n=== Efficiency (n={}, budget={{:.0%}}) ===".format(args.n, args.budget))
+    for r in rows:
+        print("{name:12s} time={{time_ms:7.2f}}ms  save_ratio={{save_ratio:6.3f}}  tokens_saved={{tokens_saved:8d}}  rel_sum={{relevance_sum:8.3f}}".format(**r))
+    print("\n=== Quality vs Optimal (subset) ===")
+    for r in rows_q:
+        print("{name:18s} rel_sum={{relevance_sum:8.3f}}  tokens={{selected_tokens:5d}} chunks={{selected_chunks:4d}}".format(**r))
+    if pd is not None and args.save:
+        pd.DataFrame(rows).to_csv("benchmarks/results_efficiency.csv", index=False)
+        pd.DataFrame(rows_q).to_csv("benchmarks/results_quality.csv", index=False)
+        print("Saved CSVs to benchmarks حضرتك.")
+    if plt is not None and args.plot:
+        # single-figure plots, no explicit colors
+        x = [r["name"] for r in rows]
+        y = [r["time_ms"] for r in rows]
+        import matplotlib.pyplot as plt
+        plt.figure()
+        plt.bar(x, y)
+        plt.title("Packer Runtime (ms)")
+        plt.xlabel("method")
+        plt.ylabel("ms")
+        plt.show()
+if __name__ == "__main__":
+    main()
+```
+---
+### **File:** `D:\\Sanctum\\CRoM-EfficientLLM\\benchmarks\\longbench_eval.py`
+```python
+"""
+Benchmark script: LongBench-like evaluation.
+Simulates context packing efficiency.
+"""
+from crom_efficientllm.budget_packer.packer import budget_pack
+def evaluate():
+    chunks = [{"text": f"chunk {i}", "score": i % 5, "tokens": 100} for i in range(20)]
+    packed = budget_pack(chunks, budget=500)
+    print("Selected:", len(packed), "chunks")
+if __name__ == "__main__":
+    evaluate()
+```
+---
+### **File:** `D:\\Sanctum\\CRoM-EfficientLLM\\benchmarks\\sample_results.json`
+```json
+{}
+```
+---
+### **File:** `D:\\Sanctum\\CRoM-EfficientLLM\\crom 1.0.1수정 업데이트 상세보고서.md`
+```markdown
+# CRoM-EfficientLLM v1.0.1 업데이트 상세 보고서
+**문서 목적:** 소셜 미디어 (LinkedIn, Twitter, Medium) 포스팅을 위한 마케팅 AI의 정보 소스 제공
+**작성일:** 2025-09-06
+**작성자:** CLI ↯C01∞ | Σψ∴
+---
+## 1. 개요 (Overview)
+- **프로젝트명:** CRoM-EfficientLLM (Context Rot Mitigation for Efficient LLMs)
+- **이전 버전:** 0.2.1
+- **신규 버전:** 1.0.1
+**핵심 요약:**
+이번 v1.0.1 업데이트는 CRoM-EfficientLLM 프로젝트의 **첫 번째 기능 구현(First Functional Implementation)**을 의미합니다. 기존의 아이디어와 뼈대만 있던 상태에서, 실제 동작하는 핵심 로직을 모두 구현하여 **작동 가능한 프로토타입(Working Prototype)**으로 전환했습니다. 이제 사용자들은 RAG 파이프라인의 컨텍스트를 효율적으로 관리하고 최적화하는 핵심 기능들을 직접 테스트하고 활용할 수 있습니다.
+---
+## 2. 배경 (Background)
+기존 v0.2.1은 `pyproject.toml`, `README.md` 등 프로젝트의 방향성과 구조만 정의된 **설계 단계의 스캐폴드(Scaffold)**였습니다. 실제 핵심 로직을 담고 있는 Python 소스 코드가 부재하여 아이디어를 실제로 검증할 수 없었습니다.
+이번 업데이트의 목표는 이 설계도에 따라, **처음부터(from scratch) 핵심 기능들을 모두 구현**하여 프로젝트에 생명을 불어넣고, 실제 사용 가능한 상태로 만드는 것이었습니다.
+---
+## 3. 상세 변경 내역 (Detailed Changes)
+이번 업데이트를 통해 4개의 핵심 모듈이 `src/crom_efficientllm/` 디렉토리 내에 새롭게 구현되었습니다.
+### 가. `budget_packer.py` - 지능형 컨텍스트 패킹 엔진
+- **기능:** LLM에 전달할 컨텍스트(청크)를 주어진 토큰 예산 내에서 가장 효율적으로 구성합니다.
+- **세부 사항:**
+    - 단순히 텍스트를 자르는 것이 아니라, **점수/토큰 비율**을 기준으로 가장 중요한 정보를 우선적으로 선택합니다.
+    - 패킹 후 **압축률, 절약된 토큰 수, 예산 효율성** 등 상세한 통계를 제공하여, 컨텍스트 관리 전략의 효과를 정량적으로 분석할 수 있는 기반을 마련했습니다.
+### 나. `cross_encoder.py` - 안정성 강화 Cross-Encoder 관리자
+- **기능:** RAG 파이프라인의 핵심인 Cross-Encoder 모델을 안정적으로 관리하고 오류 발생 시 시스템 전체의 다운을 방지합니다.
+- **세부 사항:**
+    - `sentence-transformers` 라이브러리가 없거나 모델 로딩에 실패하는 등 다양한 **오류 상황을 자동으로 감지하고 우아하게 처리(Graceful Fallback)**합니다.
+    - 시스템이 멈추는 대신, "비활성화", "오류" 등의 명확한 상태를 API 응답에 포함시켜 **시스템의 안정성과 예측 가능성**을 크게 높였습니다.
+### 다. `capsule_logger.py` - 투명성 확보를 위한 캡슐 로거
+- **기능:** 시스템의 모든 처리 과정을 **구조화된 로그(Structured Log)**로 기록하여 투명성과 감사 가능성을 제공합니다.
+- **세부 사항:**
+    - 모든 API 요청, 처리 통계, 시스템 상태를 **"설명 캡슐(Explain Capsule)"**이라는 JSONL 형식으로 영구 저장합니다.
+    - 이는 추후 시스템의 동작을 디버깅하거나, 성능 저하의 원인을 분석하고, AI의 판단 근거를 추적하는 데 필수적인 데이터가 됩니다.
+### 라. `server.py` - 핵심 기능 통합 API 서버
+- **기능:** 위에서 설명한 모든 모듈(패킹, 리랭킹, 로깅)을 하나로 묶어, 사용자가 쉽게 접근할 수 있는 **FastAPI 기반의 API 서버**를 제공합니다.
+- **세부 사항:**
+    - `/process` 엔드포인트를 통해 쿼리와 컨텍스트 데이터를 받아, 리랭킹부터 패킹, 로깅까지의 전 과정을 **하나의 트랜잭션으로 처리(Orchestration)**합니다.
+    - `/healthz` 엔드포인트를 통해 외부 모니터링 시스템이 서버의 상태를 쉽게 확인할 수 있도록 구현했습니다.
+---
+## 4. 버전 관리 및 문서화 (Versioning & Documentation)
+- **버전 업데이트:** 핵심 기능이 구현됨에 따라, 프로젝트의 버전을 `0.2.1`에서 **`1.0.1`**로 상향 조정하여 중요한 진전을 명시했습니다.
+- **변경 이력 관리:** `CHANGELOG.md` 파일에 상기된 모든 구현 내역을 상세히 기록하여, 사용자와 기여자가 프로젝트의 발전 과정을 쉽게 추적할 수 있도록 투명성을 확보했습니다.
+---
+## 5. 기대 효과 및 다음 단계 (Expected Impact & Next Steps)
+- **기대 효과:**
+    - CRoM-EfficientLLM은 더 이상 아이디어가 아닌, **실제 RAG 시스템에 적용하여 컨텍스트 관리 효율성을 테스트할 수 있는 실용적인 도구**로 발전했습니다.
+    - 개발자들은 LLM의 제한된 컨텍스트 창을 어떻게 하면 가장 효율적으로 사용할 수 있는지에 대한 **정량적인 데이터**를 얻을 수 있게 되었습니다.
+- **다음 단계:**
+    - `README.md`에 명시된 `crom-demo` 및 `crom-bench` CLI 기능 구현
+    - 사용자가 원하는 토크나이저(Tokenizer)를 선택할 수 있는 기능 추가
+    - 다양한 컨텍스트 관리 전략의 성능을 비교할 수 있는 벤치마크 시스템 고도화
+---
+**보고서 종료.**
+```
+---
+### **File:** `D:\\Sanctum\\CRoM-EfficientLLM\\dashboard\\grafana_dashboard.json`
+```json
+{}
+```
+---
+### **File:** `D:\\Sanctum\\CRoM-EfficientLLM\\dashboard\\prometheus_config.yml`
+```
+```
+---
+### **File:** `D:\\Sanctum\\CRoM-EfficientLLM\\docs\\architecture.md`
+```markdown
+# Architecture
+This document outlines the architecture of the CRoM-EfficientLLM project.
+```
+---
+### **File:** `D:\\Sanctum\\CRoM-EfficientLLM\\docs\\versioning.md`
+```markdown
+# Versioning & PyPI Guidance
+This document defines package naming, SemVer rules, and a future path to publish to PyPI.
+## 1) Package name
+- Distribution name (PyPI): `crom-efficientllm` (lowercase, hyphen-separated)
+- Import name (module): `crom_efficientllm` (PEP 8 underscore)
+> **Tip**: Keep both names consistent to avoid confusion in docs.
+### Check name availability on PyPI
+- Visit: https://pypi.org/project/crom-efficientllm/ (404 → available)
+- If taken, consider: `crom-efficient-llm`, `crom-llm-efficient`, `crom-ctx-pack`
+- Reserve on TestPyPI first: use `test.pypi.org` to validate metadata & upload
+## 2) Semantic Versioning (SemVer)
+We follow **MAJOR.MINOR.PATCH**.
+- **MAJOR**: Backward-incompatible API changes
+  - e.g., rename function signatures (`budget_pack`), move/rename modules, change return schemas
+- **MINOR**: Backward-compatible features
+  - new functions/flags (e.g., `pack_summary`, CLI subcommands), performance improvements
+- **PATCH**: Backward-compatible bug fixes
+  - logic corrections, docs/CI fixes, dependency pin updates without API changes
+### Pre-releases
+Use suffixes: `-a.1`, `-b.1`, `-rc.1` (alpha/beta/release-candidate)
+- Example: `0.3.0-rc.1`
+### Deprecation Policy
+- Mark deprecated APIs in `CHANGELOG.md` and docstrings
+- Provide at least **one MINOR release** with warnings before removal
+### Public API Surface
+We commit compatibility for:
+- `crom_efficientllm.budget_packer.packer`: `Chunk`, `budget_pack`, `pack_summary`
+- `crom_efficientllm.rerank_engine.rerank`: `hybrid_rerank`
+- `crom_efficientllm.drift_estimator.estimator`: `DriftEstimator`, `DriftMode`
+- CLI entrypoints: `crom-demo`, `crom-bench` and their documented flags
+## 3) Release Flow (GitHub → PyPI later)
+- Tag: `vX.Y.Z` → GitHub Actions builds & creates a Release (artifacts attached)
+- Keep `CHANGELOG.md` updated per release
+- After API stabilizes, enable **PyPI publish** using a separate workflow with `PYPI_API_TOKEN` secret
+### (Future) PyPI publishing steps
+1. Create a PyPI account & project
+2. Add `PYPI_API_TOKEN` to repo `Settings → Secrets and variables → Actions`
+3. Add `release-pypi.yml` workflow to upload on tag
+4. Verify install: `pip install crom-efficientllm` and import `crom_efficientllm`
+---
+_Last updated: 2025-09-02_
+```
+---
+### **File:** `D:\\Sanctum\\CRoM-EfficientLLM\\examples\\corpus\\sample_docs.jsonl`
+```json
+{"id": 1, "text": "AI ethics and governance frameworks for responsible AI."}
+{"id": 2, "text": "Techniques for detecting model drift in production systems."}
+{"id": 3, "text": "A recipe for sourdough bread and fermentation tips."}
+{"id": 4, "text": "Hybrid search: combining sparse and dense retrieval methods."}
+{"id": 5, "text": "Token budgets and prompt compression strategies for LLMs."}
+{"id": 6, "text": "Monitoring with Prometheus and building Grafana dashboards."}
+```
+---
+### **File:** `D:\\Sanctum\\CRoM-EfficientLLM\\examples\\corpus\\sample_queries.jsonl`
+```json
+{"query": "how to detect drift in ai models"}
+{"query": "ways to reduce llm token usage"}
+{"query": "observability stack prometheus grafana"}
+```
+---
+### **File:** `D:\\Sanctum\\CRoM-EfficientLLM\\pyproject.toml`
+```toml
+[build-system]
+requires = ["setuptools>=68", "wheel"]
+build-backend = "setuptools.build_meta"
+[project]
+name = "crom-efficientllm"
+version = "1.0.1"
+description = "CRoM (Context Rot Mitigation)-EfficientLLM: Budget packing, hybrid rerank, and drift estimation with observability"
+readme = "README.md"
+requires-python = ">=3.9"
+license = { text = "Apache-2.0" }
+authors = [ { name = "Your Name" } ]
+dependencies = [
+  "numpy>=1.24,<3",
+  "scikit-learn>=1.3,<2",
+  "transformers>=4.41,<5",
+  "sentence-transformers>=2.2,<3",
+  "flask>=3,<4",
+  "prometheus-client>=0.20,<1"
+]
+[project.optional-dependencies]
+dev = [
+  "pytest>=7",
+  "ruff>=0.4",
+  "black>=24.4",
+  "pre-commit>=3.6",
+  "matplotlib>=3.8,<4"
+]
+plugins = [
+  "flashrank>=0.2; python_version>='3.9'",
+  "llmlingua>=0.2; python_version>='3.9'",
+  "evidently>=0.4; python_version>='3.9'"
+]
+haystack = [
+  "farm-haystack[faiss,inference]>=1.26; python_version>='3.9'"
+]
+[project.urls]
+Homepage = "https://github.com/Flamehaven/CRoM-Context-Rot-Mitigation--EfficientLLM"
+[project.scripts]
+"crom-demo" = "crom_efficientllm.demo:main"
+"crom-bench" = "crom_efficientllm.cli:main"
+[tool.setuptools]
+package-dir = {"" = "src"}
+packages = { find = { where = ["src"] } }
+[tool.pytest.ini_options]
+addopts = "-q"
+[tool.black]
+line-length = 100
+[tool.ruff]
+target-version = "py39"
+[tool.ruff.lint]
+select = ["E","F","I","UP","B","C4","SIM","PL","PERF","RUF","ANN"]
+ignore = ["ANN101","ANN102"]
+[tool.ruff.lint.per-file-ignores]
+"tests/*" = ["S101","ANN","PLR2004"]
+```
+---
+### **File:** `D:\\Sanctum\\CRoM-EfficientLLM\\release_notes.md`
+```markdown
+# Release v0.2.1
+## [0.2.1] - 2025-09-02
+### Added
+- CLI `--save-plots` option for `sweep` and `dp-curve`; saves PNG charts to `benchmarks/out/` (or `--out-dir`).
+- README Quick Examples mention of plotting flag.
+- This CHANGELOG.
+### Changed
+- Dev tooling: recommend `matplotlib` via dev extra for plotting.
+— generated from [CHANGELOG.md](CHANGELOG.md)
+```
+---
+### **File:** `D:\\Sanctum\\CRoM-EfficientLLM\\requirements.txt`
+```
+numpy>=1.24,<3
+scikit-learn>=1.3,<2
+transformers>=4.41,<5
+sentence-transformers>=2.2,<3
+flask>=3,<4
+prometheus-client>=0.20,<1
+```
+---
+### **File:** `D:\\Sanctum\\CRoM-EfficientLLM\\scripts\\gen_release_notes.py`
+```python
+#!/usr/bin/env python3
+from __future__ import annotations
+import os
+import re
+import sys
+from pathlib import Path
+ROOT = Path(__file__).resolve().parents[1]
+CHANGELOG = ROOT / "CHANGELOG.md"
+OUT = ROOT / "release_notes.md"
+def main(tag: str) -> None:
+    version = tag.lstrip("v").strip()
+    if not CHANGELOG.exists():
+        OUT.write_text(f"# Release {tag}\n\n(CHANGELOG.md not found)
+", encoding="utf-8")
+        return
+    text = CHANGELOG.read_text(encoding="utf-8")
+    pat = re.compile(rf"^##\s*[[^{re.escape(version)}]]?[^\n]*$", re.MULTILINE)
+    m = pat.search(text)
+    if not m:
+        OUT.write_text(
+            f"# Release {tag}\n\nSection for {version} not found in CHANGELOG.\n\n" + text,
+            encoding="utf-8",
+        )
+        return
+    start = m.end()
+    m2 = re.search(r"^##\s+", text[start:], re.MULTILINE)
+    end = start + (m2.start() if m2 else len(text) - start)
+    section = text[m.start():end].strip()
+    body = f"# Release {tag}\n\n{section}\n\n— generated from [CHANGELOG.md](CHANGELOG.md)"
+    OUT.write_text(body, encoding="utf-8")
+if __name__ == "__main__":
+    tag = sys.argv[1] if len(sys.argv) > 1 else os.environ.get("GITHUB_REF_NAME", "")
+    if not tag:
+        print("Usage: gen_release_notes.py vX.Y.Z", file=sys.stderr)
+        sys.exit(2)
+    main(tag)
+```
+---
+### **File:** `D:\\Sanctum\\CRoM-EfficientLLM\\scripts\\release.sh`
+```bash
+#!/usr/bin/env bash
+set -euo pipefail
+TAG=${1:-}
+if [[ -z "$TAG" ]]; then
+  echo "Usage: scripts/release.sh vX.Y.Z"; exit 1
+fi
+# sanity checks
+if [[ -n $(git status --porcelain) ]]; then
+  echo "❌ Working tree not clean"; exit 1
+fi
+# ensure deps
+python -m pip install -e .[dev]
+pre-commit run --all-files
+pytest -q
+# generate release notes preview from CHANGELOG
+python scripts/gen_release_notes.py "$TAG"
+if [[ -f release_notes.md ]]; then
+  echo "--- release_notes.md (preview top 60 lines) ---"
+  head -n 60 release_notes.md || true
+  echo "--- end preview ---"
+else
+  echo "⚠️ release_notes.md not generated; will fall back to default notes in GH release"
+fi
+# tag & push
+git tag -a "$TAG" -m "Release $TAG"
+git push origin "$TAG"
+echo "✅ Pushed tag $TAG. GitHub Actions will create the Release automatically."
+echo "➡️  Watch: https://github.com/Flamehaven/CRoM-EfficientLLM/actions"
+```
+---
+### **File:** `D:\\Sanctum\\CRoM-EfficientLLM\\src\\crom_efficientllm\\__init__.py`
+```python
+"""Public API for CRoM-EfficientLLM."""
+from .budget_packer.packer import Chunk, budget_pack, pack_summary
+from .rerank_engine.rerank import hybrid_rerank
+from .drift_estimator.estimator import DriftEstimator, DriftMode
+__all__ = [
+    "Chunk",
+    "budget_pack",
+    "pack_summary",
+    "hybrid_rerank",
+    "DriftEstimator",
+    "DriftMode",
+]
+```
+---
+### **File:** `D:\\Sanctum\\CRoM-EfficientLLM\\src\\crom_efficientllm\\budget_packer.py`
+```python
+from typing import List, Dict
+import logging
+def enhanced_greedy_pack(chunks: List[Dict], budget: int,
+                        score_key: str = "score") -> tuple[List[Dict], Dict]:
+    """
+    기존 greedy_pack 함수를 확장하여 상세 통계 반환
+    Returns:
+        tuple: (packed_chunks, stats_dict)
+    """
+    if not chunks:
+        return [], {
+            "selected_count": 0,
+            "packed_count": 0,
+            "selected_tokens": 0,
+            "packed_tokens": 0,
+            "compression_ratio": 0.0,
+            "token_savings": 0,
+            "efficiency": 0.0
+        }
+    # 토큰 수 미리 계산
+    for chunk in chunks:
+        if "token_count" not in chunk:
+            chunk["token_count"] = max(1, len(chunk.get("text", "")) // 4)
+    # 효율성 기준 정렬 (score/token 비율)
+    sorted_chunks = sorted(
+        chunks,
+        key=lambda x: x.get(score_key, 0) / x["token_count"],
+        reverse=True
+    )
+    # 그리디 패킹
+    packed_chunks = []
+    used_tokens = 0
+    for chunk in sorted_chunks:
+        if used_tokens + chunk["token_count"] <= budget:
+            packed_chunks.append(chunk)
+            used_tokens += chunk["token_count"]
+    # 상세 통계 계산
+    total_selected_tokens = sum(chunk["token_count"] for chunk in chunks)
+    stats = {
+        "selected_count": len(chunks),
+        "packed_count": len(packed_chunks),
+        "selected_tokens": total_selected_tokens,
+        "packed_tokens": used_tokens,
+        "compression_ratio": len(packed_chunks) / len(chunks) if chunks else 0.0,
+        "token_savings": total_selected_tokens - used_tokens,
+        "efficiency": used_tokens / budget if budget > 0 else 0.0
+    }
+    # 📊 로깅 추가 (기존 코드에 없던 통계 가시성)
+    logging.info(f"Packing completed: {stats['packed_count']}/{stats['selected_count']} chunks, "
+                f"tokens: {stats['packed_tokens']}/{stats['selected_tokens']} "
+                f"(efficiency: {stats['efficiency']:.1%})")
+    return packed_chunks, stats
+```
+---
+### **File:** `D:\\Sanctum\\CRoM-EfficientLLM\\src\\crom_efficientllm\\capsule_logger.py`
+```python
+import json
+from pathlib import Path
+from datetime import datetime
+from typing import Union, Dict
+import logging
+class ExplainCapsuleLogger:
+    """스키마 기반 설명 캡슐 저장 시스템"""
+    def __init__(self, log_directory: str = "artifacts/logs"):
+        self.log_dir = Path(log_directory)
+        self.log_dir.mkdir(parents=True, exist_ok=True)
+        # 로그 파일 경로들
+        self.capsules_file = self.log_dir / "explain_capsules.jsonl"
+        self.metrics_file = self.log_dir / "processing_metrics.jsonl"
+        self.errors_file = self.log_dir / "error_log.jsonl"
+        logging.info(f"ExplainCapsule Logger initialized: {self.log_dir}")
+    def create_explain_capsule(self, query: str, response_data: Dict,
+                              processing_stats: Dict,
+                              cross_encoder_status: str) -> Dict:
+        """스키마 준수 설명 캡슐 생성"""
+        capsule = {
+            # 🔖 메타데이터 (필수)
+            "timestamp": datetime.now().isoformat(),
+            "version": "1.0",
+            "processor": "CRoM-Enhanced",
+            # 📝 쿼리 정보
+            "query": {
+                "text": query,
+                "length": len(query),
+                "token_estimate": len(query) // 4
+            },
+            # 📊 처리 통계 (패치 1에서 확장된 정보)
+            "processing_stats": {
+                **processing_stats,
+                "cross_encoder_status": cross_encoder_status
+            },
+            # 🔧 시스템 상태
+            "system_state": {
+                "cross_encoder_available": cross_encoder_status not in ["disabled", "unavailable"]
+            },
+            # 📦 원본 및 결과 청크
+            "chunks": {
+                "packed": response_data.get("chunks", [])
+            }
+        }
+        return capsule
+    def log_capsule(self, capsule: Dict):
+        """설명 캡슐을 .jsonl 파일에 기록"""
+        try:
+            with open(self.capsules_file, "a", encoding="utf-8") as f:
+                f.write(json.dumps(capsule, ensure_ascii=False) + "\n")
+        except Exception as e:
+            logging.error(f"Failed to log explain capsule: {e}")
+    def log_error(self, error_details: Dict):
+        """오류 정보를 .jsonl 파일에 기록"""
+        try:
+            error_details["timestamp"] = datetime.now().isoformat()
+            with open(self.errors_file, "a", encoding="utf-8") as f:
+                f.write(json.dumps(error_details, ensure_ascii=False) + "\n")
+        except Exception as e:
+            logging.error(f"Failed to log error: {e}")
+```
+---
+### **File:** `D:\\Sanctum\\CRoM-EfficientLLM\\src\\crom_efficientllm\\cli.py`
+```python
+from __future__ import annotations
+import argparse
+import json
+import os
+import time
+from dataclasses import dataclass
+from typing import List, Dict, Sequence
+import numpy as np
+from sklearn.feature_extraction.text import TfidfVectorizer
+from sklearn.metrics.pairwise import cosine_similarity
+from crom_efficientllm.budget_packer.packer import budget_pack, Chunk
+from crom_efficientllm.rerank_engine.rerank import hybrid_rerank
+try:
+    from sentence_transformers import SentenceTransformer
+except Exception:  # pragma: no cover
+    SentenceTransformer = None  # type: ignore
+# Optional plugins are imported lazily when flags are set
+@dataclass
+class Doc:
+    id: str
+    text: str
+def load_jsonl(path: str) -> List[Dict]:
+    with open(path, "r", encoding="utf-8") as f:
+        return [json.loads(line) for line in f]
+def build_corpus(path: str) -> List[Doc]:
+    rows = load_jsonl(path)
+    return [Doc(id=str(r.get("id", i)), text=str(r["text"])) for i, r in enumerate(rows)]
+def sparse_retrieval(query: str, corpus: Sequence[Doc], k: int = 100) -> List[Dict]:
+    texts = [d.text for d in corpus]
+    vect = TfidfVectorizer(ngram_range=(1, 2)).fit(texts)
+    D = vect.transform(texts)
+    Q = vect.transform([query])
+    sims = cosine_similarity(Q, D).ravel()
+    order = np.argsort(-sims)[:k]
+    return [{"id": corpus[i].id, "text": corpus[i].text, "score_sparse": float(sims[i])} for i in order]
+def dense_embed_model(name: str):
+    if SentenceTransformer is None:
+        raise RuntimeError("sentence-transformers not installed. Install with `pip install -e .`.")
+    return SentenceTransformer(name)
+def _apply_flashrank(query: str, docs: List[Dict], model_name: str) -> List[Dict]:
+    try:
+        from crom_efficientllm.plugins.flashrank_reranker import flashrank_rerank
+    except Exception as e:  # pragma: no cover
+        raise RuntimeError("FlashRank plugin not available. Install extras: pip install .[plugins]") from e
+    ranked = flashrank_rerank(query, docs, model_name=model_name)
+    # Normalize plugin score to 0..1 and put into score_final
+    scores = np.array([d.get("score_flashrank", 0.0) for d in ranked], dtype=np.float32)
+    if scores.size and float(scores.max() - scores.min()) > 1e-12:
+        s = (scores - scores.min()) / (scores.max() - scores.min())
+    else:
+        s = np.zeros_like(scores)
+    for i, d in enumerate(ranked):
+        d["score_final"] = float(s[i])
+    return ranked
+def _apply_llmlingua(text: str, ratio: float) -> str:
+    try:
+        from crom_efficientllm.plugins.llmlingua_compressor import compress_prompt
+    except Exception as e:  # pragma: no cover
+        raise RuntimeError("LLMLingua plugin not available. Install extras: pip install .[plugins]") from e
+    return compress_prompt(text, target_ratio=ratio)
+def _save_evidently_report(all_embs: List[List[float]], out_html: str) -> None:
+    try:
+        from crom_efficientllm.plugins.evidently_drift import drift_report
+    except Exception as e:  # pragma: no cover
+        raise RuntimeError("Evidently plugin not available. Install extras: pip install .[plugins]") from e
+    n = len(all_embs)
+    if n < 4:
+        return
+    ref = all_embs[: n // 2]
+    cur = all_embs[n // 2 :]
+    rep = drift_report(ref, cur)
+    rep.save_html(out_html)
+def mock_llm_generate(prompt: str) -> str:
+    time.sleep(0.005)  # simulate small latency
+    return "[MOCK] " + prompt[:160]
+def e2e(args: argparse.Namespace) -> None:
+    corpus = build_corpus(args.corpus)
+    queries = [r["query"] for r in load_jsonl(args.queries)]
+    embed = dense_embed_model(args.model)
+    all_embs: List[List[float]] = []
+    t0 = time.perf_counter()
+    all_rows = []
+    for q in queries:
+        t_s = time.perf_counter()
+        cands = sparse_retrieval(q, corpus, k=args.k)
+        t_sparse = (time.perf_counter() - t_s) * 1000
+        t_r = time.perf_counter()
+        if args.use_flashrank:
+            reranked = _apply_flashrank(q, cands, args.flashrank_model)
+        else:
+            reranked = hybrid_rerank(q, cands, embed, alpha=args.alpha)
+        t_rerank = (time.perf_counter() - t_r) * 1000
+        # token heuristic + budget pack
+        chunks = [
+            Chunk(text=d["text"], score=d.get("score_final", d.get("score_sparse", 0.0)), tokens=max(1, len(d["text"]) // 4))
+            for d in reranked
+        ]
+        budget_tokens = int(sum(c.tokens for c in chunks) * args.budget)
+        t_p = time.perf_counter()
+        packed = budget_pack(chunks, budget=budget_tokens)
+        t_pack = (time.perf_counter() - t_p) * 1000
+        prompt = "\n\n".join(c.text for c in packed) + f"\n\nQ: {q}\nA:"
+        if args.use_llmlingua:
+            prompt = _apply_llmlingua(prompt, ratio=args.compress_ratio)
+        # collect embeddings for drift snapshot (mean-pooled)
+        with np.errstate(all="ignore"):
+            if len(packed) > 0:
+                doc_embs = embed.encode([c.text for c in packed], convert_to_numpy=True)
+                vec = np.mean(doc_embs, axis=0).tolist()
+                all_embs.append(vec)
+        t_l = time.perf_counter()
+        _ = mock_llm_generate(prompt)
+        t_llm = (time.perf_counter() - t_l) * 1000
+        total = (time.perf_counter() - t_s) * 1000
+        all_rows.append({
+            "query": q,
+            "sparse_ms": t_sparse,
+            "rerank_ms": t_rerank,
+            "pack_ms": t_pack,
+            "llm_ms": t_llm,
+            "total_ms": total,
+            "packed_tokens": sum(c.tokens for c in packed),
+            "orig_tokens": sum(c.tokens for c in chunks),
+            "save_ratio": 1 - (sum(c.tokens for c in packed) / max(1, sum(c.tokens for c in chunks))),
+            "used_flashrank": bool(args.use_flashrank),
+            "used_llmlingua": bool(args.use_llmlingua),
+        })
+    elapsed = (time.perf_counter() - t0) * 1000
+    os.makedirs(args.out_dir, exist_ok=True)
+    out_path = os.path.join(args.out_dir, "e2e_results.jsonl")
+    with open(out_path, "w", encoding="utf-8") as f:
+        for r in all_rows:
+            f.write(json.dumps(r, ensure_ascii=False) + "\n")
+    print(f"saved results -> {out_path} ({len(all_rows)} queries) ; elapsed={elapsed:.2f}ms")
+    if args.use_evidently and all_embs:
+        html_path = os.path.join(args.out_dir, "evidently_report.html")
+        _save_evidently_report(all_embs, html_path)
+        print(f"evidently report -> {html_path}")
+def budget_sweep(args: argparse.Namespace) -> None:
+    import itertools
+    corpus = build_corpus(args.corpus)
+    queries = [r["query"] for r in load_jsonl(args.queries)][: args.max_q]
+    embed = dense_embed_model(args.model)
+    budgets = [b / 100.0 for b in range(args.b_min, args.b_max + 1, args.b_step)]
+    rows = []
+    for q, b in itertools.product(queries, budgets):
+        cands = sparse_retrieval(q, corpus, k=args.k)
+        reranked = hybrid_rerank(q, cands, embed, alpha=args.alpha)
+        chunks = [Chunk(text=d["text"], score=d["score_final"], tokens=max(1, len(d["text"]) // 4)) for d in reranked]
+        budget_tokens = int(sum(c.tokens for c in chunks) * b)
+        packed = budget_pack(chunks, budget=budget_tokens)
+        rows.append({
+            "query": q,
+            "budget": b,
+            "packed_tokens": sum(c.tokens for c in packed),
+            "orig_tokens": sum(c.tokens for c in chunks),
+            "save_ratio": 1 - (sum(c.tokens for c in packed) / max(1, sum(c.tokens for c in chunks))),
+            "avg_score": float(np.mean([c.score for c in packed])) if packed else 0.0,
+        })
+    os.makedirs(args.out_dir, exist_ok=True)
+    out_path = os.path.join(args.out_dir, "budget_sweep.jsonl")
+    with open(out_path, "w", encoding="utf-8") as f:
+        for r in rows:
+            f.write(json.dumps(r, ensure_ascii=False) + "\n")
+    print(f"saved results -> {out_path} ; points={len(rows)}")
+    if args.save_plots:
+        try:
+            import matplotlib.pyplot as plt  # noqa: F401
+            import matplotlib.pyplot as _plt
+        except Exception:
+            print("[warn] matplotlib not installed; install dev extras: pip install -e .[dev]")
+        else:
+            # Aggregate by budget
+            import collections
+            agg = collections.defaultdict(list)
+            for r in rows:
+                agg[r["budget"]].append(r)
+            budgets_sorted = sorted(agg.keys())
+            avg_save = [float(np.mean([x["save_ratio"] for x in agg[b]])) for b in budgets_sorted]
+            avg_score = [float(np.mean([x["avg_score"] for x in agg[b]])) for b in budgets_sorted]
+            _plt.figure()
+            _plt.plot([b * 100 for b in budgets_sorted], [s * 100 for s in avg_save], marker="o")
+            _plt.xlabel("Budget (%)")
+            _plt.ylabel("Avg Save Ratio (%)")
+            _plt.title("Budget Sweep: Save Ratio vs Budget")
+            _plt.grid(True)
+            _plt.tight_layout()
+            _plt.savefig(os.path.join(args.out_dir, "budget_sweep.png")),
+            _plt.figure()
+            _plt.plot([s * 100 for s in avg_save], avg_score, marker="o")
+            _plt.xlabel("Save Ratio (%)")
+            _plt.ylabel("Avg Score (packed)")
+            _plt.title("Pareto: Quality vs Savings")
+            _plt.grid(True)
+            _plt.tight_layout()
+            _plt.savefig(os.path.join(args.out_dir, "budget_pareto.png")),
+            print("plots ->", os.path.join(args.out_dir, "budget_sweep.png"), ",", os.path.join(args.out_dir, "budget_pareto.png"))
+def scaling(args: argparse.Namespace) -> None:
+    def make_synth(n: int, seed: int = 42):
+        rng = np.random.default_rng(seed)
+        tokens = np.clip(rng.lognormal(4.0, 0.6, n).astype(int), 5, 2000)
+        score = rng.normal(0, 1, n)
+        return [Chunk(text="x" * int(t * 4), score=float(s), tokens=int(t)) for s, t in zip(score, tokens)]
+    for n in [1000, 5000, 10000, 20000, 50000, 100000]:
+        if n > args.n_max:
+            break
+        chunks = make_synth(n)
+        budget = int(sum(c.tokens for c in chunks) * args.budget)
+        t0 = time.perf_counter()
+        _ = budget_pack(chunks, budget)
+        ms = (time.perf_counter() - t0) * 1000
+        print(f"n={n:6d}  budget={args.budget:.0%}  time={ms:8.2f} ms")
+def dp_curve(args: argparse.Namespace) -> None:
+    def make_synth(n: int, seed: int = 123, corr: float = 0.6):
+        rng = np.random.default_rng(seed)
+        true_rel = rng.normal(0, 1, n)
+        noise = rng.normal(0, 1, n) * np.sqrt(1 - corr**2)
+        score = corr * true_rel + noise
+        tokens = np.clip(rng.lognormal(4.0, 0.6, n).astype(int), 5, 2000)
+        chunks = [Chunk(text="x" * int(t * 4), score=float(s), tokens=int(t)) for s, t in zip(score, tokens)]
+        return chunks, true_rel
+    def optimal(chunks: Sequence[Chunk], values: np.ndarray, budget: int) -> float:
+        B = budget
+        dp = np.zeros(B + 1, dtype=np.float32)
+        for i, ch in enumerate(chunks):
+            wt = ch.tokens
+            val = max(0.0, float(values[i]))
+            for b in range(B, wt - 1, -1):
+                dp[b] = max(dp[b], dp[b - wt] + val)
+        return float(dp[B])
+    chunks, true_rel = make_synth(args.n)
+    total = sum(c.tokens for c in chunks)
+    budgets = [int(total * b / 100.0) for b in range(args.b_min, args.b_max + 1, args.b_step)]
+    out_rows = []
+    for B in budgets:
+        sel = budget_pack(chunks, B)
+        idx_map = {id(c): i for i, c in enumerate(chunks)}
+        rel_bp = float(np.sum([max(0.0, true_rel[idx_map[id(c)]]) for c in sel]))
+        rel_opt = optimal(chunks[: args.n_opt], true_rel[: args.n_opt], min(B, sum(c.tokens for c in chunks[: args.n_opt])))
+        pct = rel_bp / max(rel_opt, 1e-9)
+        out_rows.append({"budget": B, "pct": pct, "rel_bp": rel_bp, "rel_opt": rel_opt})
+        print(f"budget={B:8d}  rel_bp={rel_bp:8.3f}  rel_opt≈{rel_opt:8.3f}  pct≈{pct*100:5.1f}% (subset n={args.n_opt})")
+    if args.save_plots:
+        try:
+            import matplotlib.pyplot as plt  # noqa: F401
+            import matplotlib.pyplot as _plt
+        except Exception:
+            print("[warn] matplotlib not installed; install dev extras: pip install -e .[dev]")
+        else:
+            _plt.figure()
+            xs = [r["budget"] * 100.0 / total for r in out_rows]
+            ys = [r["pct"] * 100 for r in out_rows]
+            _plt.plot(xs, ys, marker="o")
+            _plt.xlabel("Budget (%)")
+            _plt.ylabel("% of optimal (subset)")
+            _plt.title("DP Curve: Greedy vs Optimal")
+            _plt.grid(True)
+            _plt.tight_layout()
+            os.makedirs(args.out_dir, exist_ok=True)
+            _plt.savefig(os.path.join(args.out_dir, "dp_curve.png")),
+            print("plot ->", os.path.join(args.out_dir, "dp_curve.png")),
+def compare_haystack(args: argparse.Namespace) -> None:
+    try:
+        from haystack.nodes import BM25Retriever, SentenceTransformersRetriever
+        from haystack.document_stores import InMemoryDocumentStore
+    except Exception as e:  # pragma: no cover
+        raise RuntimeError("Install extras: pip install .[haystack]") from e
+    corpus = build_corpus(args.corpus)
+    docs = [{"content": d.text, "meta": {"id": d.id}} for d in corpus]
+    store = InMemoryDocumentStore(use_bm25=True)
+    store.write_documents(docs)
+    bm25 = BM25Retriever(document_store=store)
+    dretr = SentenceTransformersRetriever(document_store=store, model_name_or_path=args.model)
+    queries = [r["query"] for r in load_jsonl(args.queries)][: args.max_q]
+    for q in queries:
+        t0 = time.perf_counter()
+        bm = bm25.retrieve(q, top_k=args.k)
+        dn = dretr.retrieve(q, top_k=args.k)
+        ms = (time.perf_counter() - t0) * 1000
+        print(f"{q[:40]:40s}  bm25={len(bm):3d}  dense={len(dn):3d}  time={ms:7.2f} ms")
+def main() -> None:
+    ap = argparse.ArgumentParser(prog="crom-bench")
+    sub = ap.add_subparsers(dest="cmd", required=True)
+    p = sub.add_parser("e2e", help="end-to-end: retrieval → rerank → pack → mock LLM")
+    p.add_argument("--corpus", default="examples/corpus/sample_docs.jsonl")
+    p.add_argument("--queries", default="examples/corpus/sample_queries.jsonl")
+    p.add_argument("--model", default="sentence-transformers/all-MiniLM-L6-v2")
+    p.add_argument("--k", type=int, default=200)
+    p.add_argument("--alpha", type=float, default=0.5)
+    p.add_argument("--budget", type=float, default=0.3)
+    # plugins
+    p.add_argument("--use-flashrank", action="store_true")
+    p.add_argument("--flashrank-model", default="ms-marco-TinyBERT-L-2-v2")
+    p.add_argument("--use-llmlingua", action="store_true")
+    p.add_argument("--compress-ratio", type=float, default=0.6)
+    p.add_argument("--use-evidently", action="store_true")
+    p.add_argument("--out-dir", default="benchmarks/out")
+    p.set_defaults(func=e2e)
+    p2 = sub.add_parser("sweep", help="budget sweep + Pareto csv")
+    p2.add_argument("--corpus", default="examples/corpus/sample_docs.jsonl")
+    p2.add_argument("--queries", default="examples/corpus/sample_queries.jsonl")
+    p2.add_argument("--model", default="sentence-transformers/all-MiniLM-L6-v2")
+    p2.add_argument("--k", type=int, default=200)
+    p2.add_argument("--alpha", type=float, default=0.5)
+    p2.add_argument("--b-min", type=int, default=10)
+    p2.add_argument("--b-max", type=int, default=90)
+    p2.add_argument("--b-step", type=int, default=10)
+    p2.add_argument("--max-q", type=int, default=20)
+    p2.add_argument("--out-dir", default="benchmarks/out")
+    p2.add_argument("--save-plots", action="store_true")
+    p2.set_defaults(func=budget_sweep)
+    p3 = sub.add_parser("scale", help="scaling runtime with synthetic data")
+    p3.add_argument("--n-max", type=int, default=100000)
+    p3.add_argument("--budget", type=float, default=0.3)
+    p3.set_defaults(func=scaling)
+    p4 = sub.add_parser("dp-curve", help="% of optimal vs budget (synthetic)")
+    p4.add_argument("--n", type=int, default=2000)
+    p4.add_argument("--n-opt", type=int, default=200)
+    p4.add_argument("--b-min", type=int, default=10)
+    p4.add_argument("--b-max", type=int, default=90)
+    p4.add_argument("--b-step", type=int, default=10)
+    p4.add_argument("--out-dir", default="benchmarks/out")
+    p4.add_argument("--save-plots", action="store_true")
+    p4.set_defaults(func=dp_curve)
+    p5 = sub.add_parser("haystack-compare", help="compare BM25 vs dense retrievers (Haystack)")
+    p5.add_argument("--corpus", default="examples/corpus/sample_docs.jsonl")
+    p5.add_argument("--queries", default="examples/corpus/sample_queries.jsonl")
+    p5.add_argument("--model", default="sentence-transformers/all-MiniLM-L6-v2")
+    p5.add_argument("--k", type=int, default=50)
+    p5.add_argument("--max-q", type=int, default=10)
+    p5.set_defaults(func=compare_haystack)
+    args = ap.parse_args()
+    args.func(args)
+if __name__ == "__main__":
+    main()
+```
+---
+### **File:** `D:\\Sanctum\\CRoM-EfficientLLM\\src\\crom_efficientllm\\cross_encoder.py`
+```python
+from typing import List, Optional
+import logging
+class SafeCrossEncoderManager:
+    """Cross-Encoder 상태를 명시적으로 관리하는 클래스"""
+    def __init__(self, model_name: Optional[str] = None, device: str = "cpu"):
+        self.model_name = model_name
+        self.device = device
+        self.model = None
+        self.status = "unknown"
+        self.last_error = None
+        self._initialize()
+    def _initialize(self):
+        """Cross-Encoder 초기화 with 상세 상태 추적"""
+        if not self.model_name:
+            self.status = "disabled"
+            logging.info("Cross-Encoder: DISABLED (no model specified)")
+            return
+        try:
+            # sentence-transformers 임포트 체크
+            from sentence_transformers import CrossEncoder
+            # 모델 로딩 시도
+            self.model = CrossEncoder(self.model_name, device=self.device)
+            self.status = f"active ({self.model_name})"
+            # 🆕 성공 시 상세 로깅
+            logging.info(f"Cross-Encoder: ACTIVE")
+            logging.info(f"  └─ Model: {self.model_name}")
+            logging.info(f"  └─ Device: {self.device}")
+        except ImportError as e:
+            self.status = "unavailable (sentence-transformers not installed)"
+            self.last_error = str(e)
+            # 🆕 의존성 누락 시 명확한 안내
+            logging.warning("Cross-Encoder: UNAVAILABLE")
+            logging.warning("  └─ Reason: sentence-transformers not installed")
+            logging.warning("  └─ Install: pip install sentence-transformers")
+        except Exception as e:
+            self.status = f"error ({type(e).__name__})"
+            self.last_error = str(e)
+            # 🆕 기타 오류 시 상세 로깅
+            logging.error(f"Cross-Encoder: ERROR")
+            logging.error(f"  └─ Model: {self.model_name}")
+            logging.error(f"  └─ Error: {str(e)}")
+    def get_status_for_response(self) -> str:
+        """API 응답용 상태 문자열"""        return self.status
+    def rerank(self, query: str, documents: List[str]) -> List[float]:
+        """안전한 리랭킹 with 상태 로깅"""
+        if self.model is None:
+            # 🆕 비활성화 상태 명시적 로깅
+            logging.debug(f"Cross-Encoder rerank skipped: {self.status}")
+            return [0.5] * len(documents)  # 중립 점수
+        try:
+            pairs = [(query, doc) for doc in documents]
+            scores = self.model.predict(pairs)
+            # 🆕 성공적 리랭킹 로깅
+            logging.debug(f"Cross-Encoder reranked {len(documents)} documents")
+            return scores.tolist() if hasattr(scores, 'tolist') else list(scores)
+        except Exception as e:
+            # 🆕 런타임 오류 시 상세 로깅
+            logging.error(f"Cross-Encoder rerank failed: {str(e)}")
+            logging.error(f"  └─ Fallback: returning neutral scores")
+            return [0.5] * len(documents)
+```
+---
+### **File:** `D:\\Sanctum\\CRoM-EfficientLLM\\src\\crom_efficientllm\\demo.py`
+```python
+"""
+Demo & Metrics Server for CRoM-EfficientLLM
+------------------------------------------
+- `crom-demo demo`  : run sample pipeline
+- `crom-demo serve` : start Flask + Prometheus metrics on :8000
+"""
+from __future__ import annotations
+import argparse
+from typing import List
+from flask import Flask, Response
+from prometheus_client import Counter, Gauge, generate_latest, CONTENT_TYPE_LATEST
+from crom_efficientllm.budget_packer.packer import budget_pack, pack_summary, Chunk
+from crom_efficientllm.rerank_engine.rerank import hybrid_rerank
+from crom_efficientllm.drift_estimator.estimator import DriftEstimator, DriftMode
+# ---- Prometheus metrics ----
+TOKENS_SAVED = Gauge("crom_tokens_saved", "Tokens saved by budget packer")
+DRIFT_ALERTS = Counter("crom_drift_alerts_total", "Total drift alerts emitted")
+class DummyEmbed:
+    def encode(self, text_or_list, convert_to_numpy=False):
+        if isinstance(text_or_list, list):
+            return [self.encode(t) for t in text_or_list]
+        vec = [ord(c) % 7 for c in str(text_or_list)[:16]]
+        while len(vec) < 16:
+            vec.append(0)
+        return vec
+def run_demo() -> None:
+    chunks: List[Chunk] = [
+        Chunk(text="AI ethics is crucial", score=0.9, tokens=50),
+        Chunk(text="Unrelated text", score=0.2, tokens=40),
+        Chunk(text="Drift detection research", score=0.8, tokens=60),
+    ]
+    packed = budget_pack(chunks, budget=80)
+    summary = pack_summary(packed)
+    print("Packed:", [c.text for c in packed], summary)
+    docs = [{"text": "AI drift measurement"}, {"text": "Cooking recipes"}]
+    reranked = hybrid_rerank("AI ethics", docs, DummyEmbed(), alpha=0.5)
+    print("Reranked:", [d["text"] for d in reranked])
+    de = DriftEstimator(threshold=0.5, mode=DriftMode.L2)
+    print("Drift state:", de.state())
+    print("Drift alert?", de.update([1, 2, 3]))
+    print("Drift alert?", de.update([10, 10, 10]))
+    print("Drift state:", de.state())
+    # Update metrics
+    TOKENS_SAVED.set(max(0, sum(c.tokens for c in chunks) - summary["tokens"]))
+    alert1, *_ = de.update([1, 2, 3])
+    alert2, *_ = de.update([10, 10, 10])
+    if alert1:
+        DRIFT_ALERTS.inc()
+    if alert2:
+        DRIFT_ALERTS.inc()
+def create_app() -> Flask:
+    app = Flask(__name__)
+    @app.get("/healthz")
+    def healthz():
+        return {"status": "ok"}
+    @app.get("/metrics")
+    def metrics():
+        return Response(generate_latest(), mimetype=CONTENT_TYPE_LATEST)
+    return app
+def main() -> None:
+    parser = argparse.ArgumentParser(prog="crom-demo")
+    sub = parser.add_subparsers(dest="cmd", required=True)
+    sub.add_parser("demo", help="run sample pipeline")
+    pserve = sub.add_parser("serve", help="start metrics server on :8000")
+    pserve.add_argument("--host", default="0.0.0.0")
+    pserve.add_argument("--port", type=int, default=8000)
+    args = parser.parse_args()
+    if args.cmd == "demo":
+        run_demo()
+        return
+    if args.cmd == "serve":
+        app = create_app()
+        app.run(host=args.host, port=args.port)
+        return
+if __name__ == "__main__":
+    main()
+```
+---
+### **File:** `D:\\Sanctum\\CRoM-EfficientLLM\\src\\crom_efficientllm\\server.py`
+```python
+from fastapi import FastAPI, HTTPException
+import time
+from typing import List, Dict
+import logging
+# 내부 모듈 임포트
+from .budget_packer import enhanced_greedy_pack
+from .cross_encoder import SafeCrossEncoderManager
+from .capsule_logger import ExplainCapsuleLogger
+# --- FastAPI 앱 및 주요 컴포넌트 초기화 ---
+app = FastAPI(
+    title="CRoM-EfficientLLM Server",
+    description="Context Reranking and Management for Efficient LLMs",
+    version="1.0.1"
+)
+logging.basicConfig(level=logging.INFO)
+# 컴포넌트 인스턴스화
+# TODO: 설정 파일(config.yaml)에서 모델 이름 등을 로드하도록 개선 필요
+ce_manager = SafeCrossEncoderManager(model_name="ms-marco-TinyBERT-L-2-v2")
+capsule_logger = ExplainCapsuleLogger(log_directory="artifacts/logs")
+# --- 응답 스키마 및 헬퍼 함수 ---
+class ProcessResponseV2:
+    """확장된 /process 엔드포인트 응답 스키마 헬퍼"""
+    @staticmethod
+    def create_response(query: str, packed_chunks: List[Dict],
+                       processing_stats: Dict, cross_encoder_status: str,
+                       processing_time: float) -> Dict:
+        """개선된 응답 생성"""
+        response = {
+            "success": True,
+            "query": query,
+            "chunks": packed_chunks,
+            "stats": processing_stats, # packing 통계
+            "meta": {
+                "cross_encoder_status": cross_encoder_status,
+                "processing_time_ms": processing_time * 1000,
+                "timestamp": time.time()
+            }
+        }
+        return response
+# --- API 엔드포인트 정의 ---
+@app.post("/process", summary="Rerank and pack text chunks")
+def process_chunks(query: str, chunks: List[Dict], budget: int = 4096):
+    """
+    주어진 쿼리와 청크 목록을 리랭킹하고 예산에 맞게 패킹합니다.
+    """
+    start_time = time.time()
+    try:
+        # 1. Cross-Encoder로 리랭킹 (활성화 시)
+        doc_texts = [chunk.get("text", "") for chunk in chunks]
+        scores = ce_manager.rerank(query, doc_texts)
+        for chunk, score in zip(chunks, scores):
+            chunk["score"] = score
+        # 2. 예산에 맞게 패킹
+        packed_chunks, stats = enhanced_greedy_pack(chunks, budget=budget, score_key="score")
+        # 3. 최종 응답 생성
+        processing_time = time.time() - start_time
+        response_data = ProcessResponseV2.create_response(
+            query=query,
+            packed_chunks=packed_chunks,
+            processing_stats=stats,
+            cross_encoder_status=ce_manager.get_status_for_response(),
+            processing_time=processing_time
+        )
+        # 4. 설명 캡슐 로깅
+        capsule = capsule_logger.create_explain_capsule(
+            query=query,
+            response_data=response_data,
+            processing_stats=stats,
+            cross_encoder_status=ce_manager.get_status_for_response()
+        )
+        capsule_logger.log_capsule(capsule)
+        return response_data
+    except Exception as e:
+        logging.error(f"Error during /process: {e}", exc_info=True)
+        # 오류 로깅
+        capsule_logger.log_error({
+            "endpoint": "/process",
+            "error": str(e),
+            "query": query,
+        })
+        raise HTTPException(status_code=500, detail=f"Internal Server Error: {e}")
+@app.get("/healthz", summary="Health check")
+def health_check():
+    """서버의 상태를 확인합니다."""
+    return {"status": "ok", "cross_encoder": ce_manager.get_status_for_response()}
+@app.get("/metrics", summary="Get Prometheus metrics")
+def get_metrics():
+    """Prometheus 메트릭을 노출합니다."""
+    # TODO: Prometheus-client를 사용하여 실제 메트릭을 구현해야 함
+    return {"message": "Metrics endpoint is active. Implement with prometheus-client."}
+```
+---
+### **File:** `D:\\Sanctum\\CRoM-EfficientLLM\\tests\\test_drift.py`
+```python
+from crom_efficientllm.drift_estimator.estimator import DriftEstimator, DriftMode
+def test_drift_triggers():
+    de = DriftEstimator(threshold=0.1, mode=DriftMode.L2)
+    alert, dist, ewma = de.update([0, 0, 0])
+    assert alert is False
+    alert, dist, ewma = de.update([1, 0, 0])
+    assert isinstance(alert, bool)
+```
+---
+### **File:** `D:\\Sanctum\\CRoM-EfficientLLM\\tests\\test_packer.py`
+```python
+from crom_efficientllm.budget_packer.packer import budget_pack, Chunk
+def test_budget_pack_respects_budget():
+    chunks = [Chunk("a", 1.0, 60), Chunk("b", 0.9, 50), Chunk("c", 0.5, 20)]
+    sel = budget_pack(chunks, budget=70)
+    assert sum(c.tokens for c in sel) <= 70
+def test_budget_pack_sorting_stable():
+    chunks = [
+        {"text": "x", "score": 0.9, "tokens": 30},
+        {"text": "y", "score": 0.9, "tokens": 20},
+        {"text": "z", "score": 0.8, "tokens": 10},
+    ]
+    sel = budget_pack(chunks, budget=60)
+    assert [c.text for c in sel] == ["y", "x", "z"]
+```
+---
+### **File:** `D:\\Sanctum\\CRoM-EfficientLLM\\tests\\test_rerank.py`
+```python
+from crom_efficientllm.rerank_engine.rerank import hybrid_rerank
+class Dummy:
+    def encode(self, text_or_list, convert_to_numpy=False):
+        if isinstance(text_or_list, list):
+            return [self.encode(t) for t in text_or_list]
+        vec = [ord(c) % 5 for c in str(text_or_list)[:8]]
+        while len(vec) < 8:
+            vec.append(0)
+        return vec
+def test_hybrid_rerank_returns_scores():
+    docs = [{"text": "alpha"}, {"text": "beta"}]
+    out = hybrid_rerank("alp", docs, Dummy(), alpha=0.5)
+    assert len(out) == 2
+    assert {"score_sparse", "score_dense", "score_final"} <= set(out[0].keys())
+```
+---
+### **File:** `D:\\Sanctum\\CRoM-EfficientLLM\\src\\crom_efficientllm\\budget_packer\\__init__.py`
+```python
+from .packer import Chunk, budget_pack, pack_summary
+__all__ = ["Chunk", "budget_pack", "pack_summary"]
+```
+---
+### **File:** `D:\\Sanctum\\CRoM-EfficientLLM\\src\\crom_efficientllm\\budget_packer\\packer.py`
+```python
+"""
+Budget Packer
+-------------
+Greedy packing of highest-scoring chunks under a token budget.
+- Stable ordering (score desc, tokens asc, original index asc)
+- Input validation and optional token estimation
+"""
+from __future__ import annotations
+from dataclasses import dataclass
+from typing import Any, Iterable, List, Sequence, Tuple, Union, Optional
+@dataclass(frozen=True)
+class Chunk:
+    text: str
+    score: float
+    tokens: int
+def _estimate_tokens(text: str) -> int:
+    """Lightweight heuristic when `tokens` absent. Avoids heavy tokenizers.
+    Why: keeps demo dependency-light and deterministic.
+    """
+    # approx: 4 chars ≈ 1 token; floor at 1
+    return max(1, len(text) // 4)
+def _coerce_chunk(obj: Union[Chunk, dict], idx: int) -> Chunk:
+    if isinstance(obj, Chunk):
+        return obj
+    if not isinstance(obj, dict):
+        raise TypeError(f"Chunk #{idx} must be Chunk or dict, got {type(obj)}")
+    text = str(obj.get("text", ""))
+    if not text:
+        raise ValueError(f"Chunk #{idx} has empty text")
+    score = float(obj.get("score", 0.0))
+    tokens = int(obj["tokens"]) if "tokens" in obj else _estimate_tokens(text)
+    if tokens <= 0:
+        raise ValueError(f"Chunk #{idx} has non-positive tokens: {tokens}")
+    return Chunk(text=text, score=score, tokens=tokens)
+def budget_pack(
+    text_chunks: Sequence[Union[Chunk, dict]],
+    budget: int = 1000,
+) -> List[Chunk]:
+    """
+    Args:
+        text_chunks: iterable of Chunk or dict with keys {text, score, tokens}
+        budget: max token budget (int > 0)
+    Returns:
+        list of selected chunks (order of selection)
+    """
+    if budget <= 0:
+        raise ValueError("budget must be > 0")
+    coerced: List[Chunk] = [_coerce_chunk(c, i) for i, c in enumerate(text_chunks)]
+    # stable sort by (-score, tokens, original_index)
+    indexed: List[Tuple[int, Chunk]] = list(enumerate(coerced))
+    indexed.sort(key=lambda it: (-it[1].score, it[1].tokens, it[0]))
+    selected: List[Chunk] = []
+    total = 0
+    for _, ch in indexed:
+        if total + ch.tokens <= budget:
+            selected.append(ch)
+            total += ch.tokens
+    return selected
+def pack_summary(selected: Sequence[Chunk]) -> dict:
+    tokens = sum(c.tokens for c in selected)
+    return {
+        "num_chunks": len(selected),
+        "tokens": tokens,
+        "avg_score": (sum(c.score for c in selected) / len(selected)) if selected else 0.0,
+    }
+```
+---
+### **File:** `D:\\Sanctum\\CRoM-EfficientLLM\\src\\crom_efficientllm\\drift_estimator\\__init__.py`
+```python
+from .estimator import DriftEstimator, DriftMode
+__all__ = ["DriftEstimator", "DriftMode"]
+```
+---
+### **File:** `D:\\Sanctum\\CRoM-EfficientLLM\\src\\crom_efficientllm\\drift_estimator\\estimator.py`
+```python
+"""
+Drift Estimator
+---------------
+Monitors embedding shift using L2 or cosine distance.
+Supports EWMA smoothing and exposes state for dashboards.
+"""
+from __future__ import annotations
+from dataclasses import dataclass, field
+from enum import Enum
+from typing import List, Optional, Tuple
+import numpy as np
+class DriftMode(str, Enum):
+    L2 = "l2"
+    COSINE = "cosine"
+@dataclass
+class DriftEstimator:
+    threshold: float = 0.2
+    mode: DriftMode = DriftMode.L2
+    ewma_alpha: float = 0.3  # smoothing for stability
+    history: List[np.ndarray] = field(default_factory=list)
+    distances: List[float] = field(default_factory=list)
+    ewma: Optional[float] = None
+    def _distance(self, a: np.ndarray, b: np.ndarray) -> float:
+        a = np.asarray(a, dtype=np.float32).ravel()
+        b = np.asarray(b, dtype=np.float32).ravel()
+        if self.mode == DriftMode.L2:
+            return float(np.linalg.norm(a - b))
+        # cosine distance = 1 - cosine similarity
+        denom = (np.linalg.norm(a) * np.linalg.norm(b)) + 1e-12
+        return float(1.0 - float(np.dot(a, b)) / denom)
+    def update(self, embedding) -> Tuple[bool, float, float]:
+        """
+        Args:
+            embedding: vector representation of current response
+        Returns:
+            (drift_alert, distance, ewma)
+        """
+        emb = np.asarray(embedding, dtype=np.float32)
+        if emb.ndim != 1:
+            emb = emb.ravel()
+        if not self.history:
+            self.history.append(emb)
+            self.ewma = 0.0
+            self.distances.append(0.0)
+            return (False, 0.0, 0.0)
+        last = self.history[-1]
+        dist = self._distance(emb, last)
+        self.history.append(emb)
+        self.distances.append(dist)
+        # EWMA update
+        if self.ewma is None:
+            self.ewma = dist
+        else:
+            self.ewma = self.ewma_alpha * dist + (1 - self.ewma_alpha) * self.ewma
+        return (bool(self.ewma > self.threshold), float(dist), float(self.ewma))
+    def state(self) -> dict:
+        return {
+            "count": len(self.history),
+            "last_distance": self.distances[-1] if self.distances else 0.0,
+            "ewma": self.ewma or 0.0,
+            "mode": self.mode.value,
+            "threshold": self.threshold,
+        }
+```
+---
+### **File:** `D:\\Sanctum\\CRoM-EfficientLLM\\src\\crom_efficientllm\\plugins\\evidently_drift.py`
+```python
+from __future__ import annotations
+from typing import List
+try:
+    from evidently.metric_preset import DataDriftPreset
+    from evidently.report import Report
+    import pandas as pd
+except Exception as e:  # pragma: no cover
+    raise RuntimeError("evidently not installed. Install extras: pip install .[plugins]") from e
+def drift_report(ref: List[List[float]], cur: List[List[float]]):
+    ref_df = pd.DataFrame(ref)
+    cur_df = pd.DataFrame(cur)
+    rep = Report(metrics=[DataDriftPreset()])
+    rep.run(reference_data=ref_df, current_data=cur_df)
+    return rep
+```
+---
+### **File:** `D:\\Sanctum\\CRoM-EfficientLLM\\src\\crom_efficientllm\\plugins\\flashrank_reranker.py`
+```python
+from __future__ import annotations
+from typing import List, Dict
+try:
+    from flashrank import Reranker
+except Exception as e:  # pragma: no cover
+    raise RuntimeError("flashrank not installed. Install extras: pip install .[plugins]") from e
+def flashrank_rerank(query: str, docs: List[Dict[str, str]], model_name: str = "ms-marco-TinyBERT-L-2-v2") -> List[Dict]:
+    rr = Reranker(model_name)
+    pairs = [(query, d["text"]) for d in docs]
+    scores = rr.rerank(pairs)
+    order = sorted(range(len(docs)), key=lambda i: -scores[i])
+    return [docs[i] | {"score_flashrank": float(scores[i])} for i in order]
+```
+---
+### **File:** `D:\\Sanctum\\CRoM-EfficientLLM\\src\\crom_efficientllm\\plugins\\llmlingua_compressor.py`
+```python
+from __future__ import annotations
+try:
+    from llmlingua import PromptCompressor
+except Exception as e:  # pragma: no cover
+    raise RuntimeError("llmlingua not installed. Install extras: pip install .[plugins]") from e
+def compress_prompt(text: str, target_ratio: float = 0.5) -> str:
+    pc = PromptCompressor()
+    out = pc.compress(text, target_ratio=target_ratio)
+    return out["compressed_prompt"] if isinstance(out, dict) and "compressed_prompt" in out else str(out)
+```
+---
+### **File:** `D:\\Sanctum\\CRoM-EfficientLLM\\src\\crom_efficientllm\\rerank_engine\\__init__.py`
+```python
+from .rerank import hybrid_rerank
+__all__ = ["hybrid_rerank"]
+```
+---
+### **File:** `D:\\Sanctum\\CRoM-EfficientLLM\\src\\crom_efficientllm\\rerank_engine\\rerank.py`
+```python
+"""
+Hybrid Rerank Engine
+--------------------
+Combines sparse (TF-IDF cosine) and dense (embedding cosine) scores with
+min-max normalization for robust fusion.
+"""
+from __future__ import annotations
+from typing import Dict, List, Sequence
+import numpy as np
+from sklearn.feature_extraction.text import TfidfVectorizer
+from sklearn.metrics.pairwise import cosine_similarity
+def _to_numpy(x):
+    arr = np.asarray(x)
+    return arr.astype(np.float32)
+def _batch_encode(embed_model, texts: Sequence[str]) -> np.ndarray:
+    # Try common API of sentence-transformers: encode(list, convert_to_numpy=True)
+    if hasattr(embed_model, "encode"):
+        try:
+            return _to_numpy(embed_model.encode(list(texts), convert_to_numpy=True))
+        except TypeError:
+            # Fallback: per-text encode
+            return _to_numpy([embed_model.encode(t) for t in texts])
+    raise TypeError("embed_model must provide .encode()")
+def _minmax(x: np.ndarray) -> np.ndarray:
+    if x.size == 0:
+        return x
+    mn, mx = float(np.min(x)), float(np.max(x))
+    if mx - mn <= 1e-12:
+        return np.zeros_like(x)
+    return (x - mn) / (mx - mn)
+def hybrid_rerank(
+    query: str,
+    docs: List[Dict[str, str]],
+    embed_model,
+    alpha: float = 0.5,
+) -> List[Dict[str, object]]:
+    """
+    Args:
+        query: query string
+        docs: list of {"text": str}
+        embed_model: model with .encode() -> vector(s)
+        alpha: weight between sparse/dense in [0,1]
+    Returns:
+        ranked list of enriched docs with scores {score_sparse, score_dense, score_final}
+    """
+    if not 0.0 <= alpha <= 1.0:
+        raise ValueError("alpha must be in [0, 1]")
+    if not docs:
+        return []
+    texts = [d.get("text", "") for d in docs]
+    # Sparse: TF-IDF cosine
+    tfidf = TfidfVectorizer(ngram_range=(1, 2), min_df=1).fit(texts)
+    Q = tfidf.transform([query])
+    D = tfidf.transform(texts)
+    sparse_scores = cosine_similarity(Q, D).ravel()
+    # Dense: cosine(sim) between L2-normalized embeddings
+    q_emb = _to_numpy(embed_model.encode(query))
+    d_embs = _batch_encode(embed_model, texts)
+    # L2 normalize
+    def _l2norm(a):
+        n = np.linalg.norm(a, axis=-1, keepdims=True) + 1e-12
+        return a / n
+    qn = _l2norm(q_emb.reshape(1, -1))
+    dn = _l2norm(d_embs)
+    dense_scores = cosine_similarity(qn, dn).ravel()
+    # Min-max to [0,1] before fusion to avoid scale issues
+    s_sparse = _minmax(sparse_scores)
+    s_dense = _minmax(dense_scores)
+    final_scores = alpha * s_sparse + (1 - alpha) * s_dense
+    order = np.argsort(-final_scores)
+    ranked = []
+    for i in order:
+        item = dict(docs[i])
+        item.update(
+            score_sparse=float(s_sparse[i]),
+            score_dense=float(s_dense[i]),
+            score_final=float(final_scores[i]),
+        )
+        ranked.append(item)
+    return ranked
+```

release_notes.md ADDED Viewed

	@@ -0,0 +1,12 @@

+# Release v0.2.1
+## [0.2.1] - 2025-09-02
+### Added
+- CLI `--save-plots` option for `sweep` and `dp-curve`; saves PNG charts to `benchmarks/out/` (or `--out-dir`).
+- README Quick Examples mention of plotting flag.
+- This CHANGELOG.
+### Changed
+- Dev tooling: recommend `matplotlib` via dev extra for plotting.
+— generated from [CHANGELOG.md](CHANGELOG.md)