Remove AGENTS.md - cleanup for OS launch
AGENTS.md
DELETED
@@ -1,66 +0,0 @@
# AGENTS Guidelines for BitTransformerLM

## Repository Scope and Purpose
- **BitTransformerLM** models raw binary streams using reversible transformer blocks and safety telemetry. The project is the canonical implementation under WCNegentropy.
- Core capabilities include bit-native modeling, telemetry metrics (negentropy, LZ complexity, symbiosis), progressive scaling, compression, context extension, diffusion mode (linear/cosine/exp noise schedules with parity correction), dashboard control, distributed training, and quantization.
- Phase 1 optimizations provide configurable batch sizing, gradient accumulation, mixed precision, memory-mapped dataset streaming, scheduled compression ramps, selective `torch.compile`, and an EMA-smoothed safety gate with burn-in.
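The Phase 1 implementation lives in the package itself; the following is only a generic PyTorch sketch of gradient accumulation combined with BF16 mixed precision, using a placeholder model, data, and hyperparameters rather than BitTransformerLM's actual training loop.

```python
import torch
import torch.nn as nn

# Placeholder model and optimizer; the real training loop is provided by the package.
model = nn.Sequential(nn.Linear(64, 64), nn.ReLU(), nn.Linear(64, 2))
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()
accum_steps = 4  # effective batch size = micro-batch size * accum_steps

model.train()
optimizer.zero_grad()
for step in range(16):
    x = torch.randn(8, 64)            # stand-in micro-batch
    y = torch.randint(0, 2, (8,))
    with torch.autocast(device_type="cpu", dtype=torch.bfloat16):
        loss = loss_fn(model(x), y) / accum_steps  # scale loss for accumulation
    loss.backward()
    if (step + 1) % accum_steps == 0:
        optimizer.step()
        optimizer.zero_grad()
```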
## Environment Setup
- Requires **Python 3.10+**.
- Install dependencies:
  - CPU: `pip install --extra-index-url https://download.pytorch.org/whl/cpu -r requirements.txt`
  - Optional GPU: `pip install --extra-index-url https://download.pytorch.org/whl/cu118 torch==2.7.1+cu118`
- The package name is `bit-transformer`; project metadata lives in `pyproject.toml`.
## Repository Layout
- `bit_transformer/` – core package (`model`, `compression`, `telemetry`, `safety`, `dashboard_app`, `quantization`, etc.).
- `tests/` – pytest suite and the historical `TEST_RESULTS.md`.
- Scripts: `example.py`, `unified_workflow.py`, `full_bits_train.py`, `build_full_bits.py`, `mcp_server.py`, and the `wikitext_*` utilities. The legacy `progressive_scaleup.py` is retained for reference but superseded by `integration_schedule.py`.
- Docs and specs: `README.md`, `state_of_the_repo_audit.md`, and the licensing files in `LICENSE/`.
## Development Practices
- Follow snake_case for functions and CamelCase for classes.
- Keep functions under ~300 lines and minimize deeply nested control flow.
- Avoid reintroducing the deprecated dashboard `/exec` endpoint or other insecure code paths.
- Use the `/status` endpoint for model introspection; all routes return JSON and surface errors with stack traces.
- Ensure compression, decompression, and halting logic stay consistent with the current implementation.
- Use the `cpu_autocast()` helper for BF16 mixed precision on CPU instead of calling `torch.amp.autocast` directly (see the sketch after this list).
- Adaptive training now expands depth, width, or context only when validation loss plateaus, and automatically decays the base learning rate by √2 after each expansion with a 100-step warm-up.
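A minimal sketch of that preference, assuming `cpu_autocast` is importable from the top-level `bit_transformer` package; the exact import path may differ, and the model here is a stand-in.

```python
import torch
import torch.nn as nn

from bit_transformer import cpu_autocast  # assumed import path

model = nn.Linear(128, 16)   # stand-in for a BitTransformerLM instance
x = torch.randn(4, 128)

# Preferred: the project helper wraps BF16 autocast on CPU.
with cpu_autocast():
    y = model(x)

# Avoided: calling torch.amp.autocast directly, e.g.
#   with torch.amp.autocast("cpu", dtype=torch.bfloat16): ...
```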
## Workflow & Commands
- Run the example: `python example.py`.
- Adaptive scaling now lives in `integration_schedule.py`; `progressive_scaleup.py` is deprecated.
- Unified workflow (optionally with dashboard or diffusion): `python unified_workflow.py --dashboard` or `python unified_workflow.py --diffusion --diffusion-steps 8 --dataset-size 32`.
- Increase `--diffusion-steps` for higher fidelity (8–16) and add `--diffusion-curriculum` to linearly decay noise over epochs.
- Disable checkpointing or reversible blocks when speed is prioritized over memory: `python unified_workflow.py --no-checkpoint --no-reversible`.
- Enable 4-bit quantization-aware training: `python unified_workflow.py --qat`.
- Skip full attention logging during chunked attention for memory savings by constructing the model with `full_attn_logging=False`.
- Start the MCP server with `python mcp_server.py`, then launch the dashboard: `MCP_SERVER_ADDR=http://127.0.0.1:7000 python -m bit_transformer.dashboard_app`.
- The `/metrics` and `/model_config` endpoints expose telemetry streams and hyperparameters (see the sketch after this list).
- `/save_checkpoint` and `/download_checkpoint` sync weights with Hugging Face (the token defaults to `HF_TOKEN`).
- Container build: `docker build -t bittransformerlm .`, then run with ports `5000` (dashboard) and `7000` (MCP) exposed.
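As a rough illustration of polling those JSON routes with `requests`: the dashboard address matches the defaults above, but treating each route as a plain GET returning JSON is an assumption for this sketch, not a documented client API.

```python
import requests

BASE = "http://127.0.0.1:5000"  # assumed dashboard address from the defaults above

# Model introspection; all routes return JSON.
status = requests.get(f"{BASE}/status", timeout=10).json()

# Telemetry stream and current hyperparameters.
metrics = requests.get(f"{BASE}/metrics", timeout=10).json()
config = requests.get(f"{BASE}/model_config", timeout=10).json()

print(status, metrics, config)
```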
## Telemetry Metrics

| Metric | Meaning | Range |
|--------|---------|-------|
| **K** | Negentropy – deviation from random noise | 0–1 (1 = ordered) |
| **C** | LZ Complexity – compressibility proxy | 0–1 (higher = more changes) |
| **S** | Symbiosis – agreement with reference distribution | 0–1 (1 = aligned) |
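For intuition only, here are toy 0–1 scores with roughly these meanings; they are illustrative proxies invented for this sketch, not the repository's actual telemetry formulas.

```python
import math

def toy_negentropy(bits):           # K-like: 1 = fully ordered, 0 = coin-flip noise
    p = sum(bits) / len(bits)
    if p in (0.0, 1.0):
        return 1.0
    entropy = -(p * math.log2(p) + (1 - p) * math.log2(1 - p))
    return 1.0 - entropy

def toy_change_rate(bits):          # C-like: higher = more transitions to encode
    changes = sum(a != b for a, b in zip(bits, bits[1:]))
    return changes / max(len(bits) - 1, 1)

def toy_agreement(p_model, p_ref):  # S-like: 1 = distributions fully aligned
    return 1.0 - 0.5 * sum(abs(a - b) for a, b in zip(p_model, p_ref))

print(toy_negentropy([1, 1, 1, 1, 0, 1, 1, 1]))  # mostly 1s -> above 0 (fair coin gives 0.0)
print(toy_change_rate([0, 1, 0, 1, 0, 1]))       # alternating bits -> 1.0
print(toy_agreement([0.7, 0.3], [0.6, 0.4]))     # close distributions -> 0.9
```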
ACT halting exports `halt_probs` in telemetry, showing how many layers executed. For robust sampling under safety constraints, call `safe_sample_with_retry(model, bits)`, which retries with diffusion mode and exponential backoff.
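A hedged usage sketch: it assumes `safe_sample_with_retry` is exported from the top-level package and takes a model plus a bit tensor, as the call above suggests; the import path, input shape, and return value are assumptions.

```python
import torch

from bit_transformer import safe_sample_with_retry  # assumed import path

def sample_safely(model, prompt_bits):
    """Sample under safety constraints; retries with diffusion mode and backoff."""
    model.eval()
    return safe_sample_with_retry(model, prompt_bits)

# Example prompt as a bit tensor; this shape and dtype are assumptions.
prompt = torch.randint(0, 2, (1, 128))
# output = sample_safely(trained_model, prompt)  # trained_model: a BitTransformerLM instance
```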
`TelemetrySynthesizer.cluster_sequences` can be used to select representative training samples before invoking `collapse_submodel`. The distillation helper deepens the model and widens it once (`width_scale=1.5`) if floors are missed, and `save_distilled_model` emits a `metrics.json` summary beside the weights.
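A sketch of how those pieces might chain together; the names come from the paragraph above, but every import path, signature, and argument here is an assumption — consult the package for the real interfaces.

```python
# All imports and call signatures below are assumptions based on the names
# mentioned above; check bit_transformer for the real interfaces.
from bit_transformer import (
    TelemetrySynthesizer,
    collapse_submodel,
    save_distilled_model,
)

def distill(sequences):
    """sequences: iterable of bit sequences (hypothetical input format)."""
    synth = TelemetrySynthesizer()
    # Select representative training samples before distillation.
    representative = synth.cluster_sequences(sequences)
    # Collapse into a submodel; deepens and widens once if floors are missed.
    student = collapse_submodel(representative)
    # Writes the weights plus a metrics.json summary beside them.
    save_distilled_model(student, "distilled/")
    return student
```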
## Testing
- Run the unit tests after any change: `pytest -q`.
- Use `watcher.py` for auto-reload and re-running tests during local development if desired.
- During training, call `model.train()` and keep dropout probabilities around `0.1–0.2`.
- Before running tests, inference, or pushing weights, switch to `model.eval()` and set all dropout probabilities to `0` to avoid flaky results (see the sketch after this list).
- The dashboard warns if telemetry metrics drift by more than 0.2 over the last 10 steps; adjust via `ModelManager(drift_window, drift_threshold)` as needed.
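A small sketch of that train/eval hygiene using plain PyTorch module traversal; the model here is a generic stand-in rather than BitTransformerLM's actual class.

```python
import torch.nn as nn

# Stand-in model; substitute a real BitTransformerLM instance.
model = nn.Sequential(nn.Linear(32, 32), nn.Dropout(p=0.1), nn.Linear(32, 2))

# Training: enable dropout and keep probabilities around 0.1-0.2.
model.train()

# Before tests, inference, or pushing weights: eval mode plus zeroed dropout
# so repeated runs produce identical outputs.
model.eval()
for module in model.modules():
    if isinstance(module, nn.Dropout):
        module.p = 0.0
```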
## Licensing
- The project is governed by the documents in `LICENSE/` (AGPLv3, commercial terms, disclaimers, etc.). Ensure compliance before contributing or distributing.

These guidelines keep the repository consistent with the project roadmap and previous audits. Maintain security, style, and testing discipline to keep BitTransformerLM production-ready.