WCNegentropy committed on
Commit 4786c90 · verified · 1 Parent(s): 621280c

Remove AGENTS.md - cleanup for OS launch

Files changed (1)
  1. AGENTS.md +0 -66
AGENTS.md DELETED
@@ -1,66 +0,0 @@
- # AGENTS Guidelines for BitTransformerLM
-
- ## Repository Scope and Purpose
- - **BitTransformerLM** models raw binary streams using reversible transformer blocks and safety telemetry. The project is the canonical implementation under WCNegentropy.
- - Core capabilities include bit-native modeling, telemetry metrics (negentropy, LZ complexity, symbiosis), progressive scaling, compression, context extension, diffusion mode (linear/cosine/exp noise schedules with parity correction), dashboard control, distributed training, and quantization.
- - Phase 1 optimizations provide configurable batch sizing, gradient accumulation, mixed-precision training, memory-mapped dataset streaming, scheduled compression ramps, selective `torch.compile`, and an EMA-smoothed safety gate with burn-in.
-
- ## Environment Setup
- - Requires **Python 3.10+**.
- - Install dependencies:
-   - CPU: `pip install --extra-index-url https://download.pytorch.org/whl/cpu -r requirements.txt`
-   - Optional GPU: `pip install --extra-index-url https://download.pytorch.org/whl/cu118 torch==2.7.1+cu118`
- - The package name is `bit-transformer`; project metadata lives in `pyproject.toml`.
-
- ## Repository Layout
- - `bit_transformer/` – core package (`model`, `compression`, `telemetry`, `safety`, `dashboard_app`, `quantization`, etc.).
- - `tests/` – pytest suite and historical `TEST_RESULTS.md`.
- - Scripts: `example.py`, `unified_workflow.py`, `full_bits_train.py`, `build_full_bits.py`, `mcp_server.py`, `wikitext_*` utilities. The legacy `progressive_scaleup.py` is retained for reference but superseded by `integration_schedule.py`.
- - Docs and specs: `README.md`, `state_of_the_repo_audit.md`, licensing files in `LICENSE/`.
-
- ## Development Practices
- - Follow snake_case for functions and CamelCase for classes.
- - Keep functions under ~300 lines and minimize deeply nested control flow.
- - Avoid reintroducing the deprecated dashboard `/exec` endpoint or other insecure code paths.
- - Use the `/status` endpoint for model introspection; all routes return JSON and surface errors with stack traces.
- - Keep compression, decompression, and halting logic consistent with the current implementation.
- - Use the `cpu_autocast()` helper for BF16 mixed precision on CPU instead of calling `torch.amp.autocast` directly (see the sketch after this list).
- - Adaptive training now expands depth, width, or context only when validation loss plateaus, and it automatically decays the base learning rate by √2 after each expansion with a 100-step warm-up.
-
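A minimal sketch of the `cpu_autocast()` guideline above. Only the helper's name comes from this document; the import location and the assumption that it behaves as a context manager wrapping `torch.autocast("cpu", dtype=torch.bfloat16)` are guesses, and the stand-in module is purely illustrative.

```python
import torch
import torch.nn as nn
from bit_transformer import cpu_autocast  # assumed export location

model = nn.Linear(64, 2)                   # stand-in module; a BitTransformerLM forward pass works the same way
bits = torch.randint(0, 2, (1, 64)).float()

with cpu_autocast():                       # BF16 mixed precision on CPU, per the guideline above
    logits = model(bits)
```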
- ## Workflow & Commands
- - Run the example: `python example.py`.
- - Adaptive scaling now lives in `integration_schedule.py`; `progressive_scaleup.py` is deprecated.
- - Unified workflow (optionally with dashboard or diffusion): `python unified_workflow.py --dashboard` or `python unified_workflow.py --diffusion --diffusion-steps 8 --dataset-size 32`.
- - Increase `--diffusion-steps` for higher fidelity (8–16) and add `--diffusion-curriculum` to linearly decay noise over epochs.
- - Disable checkpointing or reversible blocks when speed is prioritized over memory: `python unified_workflow.py --no-checkpoint --no-reversible`.
- - Enable 4-bit quantization-aware training: `python unified_workflow.py --qat`.
- - Skip full attention logging during chunked attention for memory savings by constructing the model with `full_attn_logging=False` (see the sketch after this list).
- - Start the MCP server with `python mcp_server.py` and launch the dashboard with `MCP_SERVER_ADDR=http://127.0.0.1:7000 python -m bit_transformer.dashboard_app`.
- - `/metrics` and `/model_config` endpoints expose telemetry streams and hyperparameters.
- - `/save_checkpoint` and `/download_checkpoint` sync weights with Hugging Face (the token defaults to `HF_TOKEN`).
- - Container build: `docker build -t bittransformerlm .`; run with ports `5000` (dashboard) and `7000` (MCP) exposed.
-
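A hedged sketch of passing `full_attn_logging=False` at model construction, per the bullet above. The class name and import path are inferred from the repository layout, and every other constructor argument is an illustrative placeholder rather than a documented default.

```python
# Sketch only: hyperparameters below are placeholders, not project defaults.
from bit_transformer.model import BitTransformerLM  # assumed class name and import path

model = BitTransformerLM(
    d_model=128,
    nhead=4,
    num_layers=2,
    full_attn_logging=False,  # skip logging full attention maps while chunked attention is active
)
```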
- ## Telemetry Metrics
- | Metric | Meaning | Range |
- |--------|---------|-------|
- | **K** | Negentropy – deviation from random noise | 0–1 (1 = ordered) |
- | **C** | LZ Complexity – compressibility proxy | 0–1 (higher = more changes) |
- | **S** | Symbiosis – agreement with reference distribution | 0–1 (1 = aligned) |
-
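For intuition, here is one way a negentropy-style score could be normalized to the 0–1 range shown in the table. This is purely illustrative and not necessarily the formula the project's telemetry module uses.

```python
# Illustrative only: a Shannon-entropy-based negentropy proxy for a bit tensor,
# scaled so 1 means fully ordered and 0 means indistinguishable from noise.
import torch

def negentropy_proxy(bits: torch.Tensor) -> float:
    p1 = bits.float().mean().clamp(1e-6, 1 - 1e-6)             # P(bit == 1)
    entropy = -(p1 * p1.log2() + (1 - p1) * (1 - p1).log2())   # H in bits, max 1 for a Bernoulli stream
    return float(1.0 - entropy)                                 # 1 - H / H_max
```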
- ACT halting exports `halt_probs` in telemetry, showing how many layers executed. For robust sampling under safety constraints, call `safe_sample_with_retry(model, bits)`, which retries with diffusion mode and exponential backoff (see the sketch below).
-
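A minimal usage sketch of `safe_sample_with_retry` as described above. The import path, the shape of `bits`, and the construction of `model` are assumptions; only the call signature `safe_sample_with_retry(model, bits)` comes from this document.

```python
# Sketch only: module path and tensor shape are assumptions.
import torch
from bit_transformer.safety import safe_sample_with_retry  # assumed location

bits = torch.randint(0, 2, (1, 128))         # illustrative bit-stream prompt
out = safe_sample_with_retry(model, bits)    # `model` is a trained BitTransformerLM built elsewhere
```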
- `TelemetrySynthesizer.cluster_sequences` can be used to select representative training samples before invoking `collapse_submodel`. The distillation helper deepens the model and widens it once (`width_scale = 1.5`) if floors are missed, and `save_distilled_model` emits a `metrics.json` summary beside the weights.
-
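A sketch of the distillation flow just described. Only the names `TelemetrySynthesizer.cluster_sequences`, `collapse_submodel`, and `save_distilled_model` come from this document; the import path, argument lists, and return values are assumptions.

```python
# Sketch only: argument lists and the module path are assumptions.
from bit_transformer.distill import (
    TelemetrySynthesizer,
    collapse_submodel,
    save_distilled_model,
)

# `model` is the trained teacher; `train_bits` is the training bit corpus (both built elsewhere).
clusters = TelemetrySynthesizer().cluster_sequences(train_bits)  # pick representative samples
student = collapse_submodel(model, clusters)                     # deepens, widens once if floors are missed
save_distilled_model(student, "distilled/")                      # also writes metrics.json beside the weights
```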
- ## Testing
- - Run the unit tests after any change: `pytest -q`.
- - Use `watcher.py` for auto-reload and test re-runs during local development if desired.
- - During training, call `model.train()` and keep dropout probabilities around `0.1–0.2`.
- - Before running tests, inference, or pushing weights, switch to `model.eval()` and set all dropout probabilities to `0` to avoid flaky results (see the sketch after this list).
- - The dashboard warns if telemetry metrics drift by more than 0.2 over the last 10 steps; adjust via `ModelManager(drift_window, drift_threshold)` as needed.
-
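A minimal sketch of the eval-time hygiene from the checklist above: switch to `model.eval()` and zero every dropout probability before tests, inference, or pushing weights. It uses only standard PyTorch, no project-specific APIs.

```python
import torch.nn as nn

def prepare_for_eval(model: nn.Module) -> nn.Module:
    """Put the model in eval mode and zero all dropout probabilities."""
    model.eval()
    for module in model.modules():
        if isinstance(module, nn.Dropout):
            module.p = 0.0
    return model
```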
- ## Licensing
- - The project is governed by the documents in `LICENSE/` (AGPLv3, commercial terms, disclaimers, etc.). Ensure compliance before contributing or distributing.
-
- These guidelines keep the repository consistent with the project roadmap and previous audits. Maintain security, style, and testing discipline to keep BitTransformerLM production-ready.
-