|
# AGENTS Guidelines for BitTransformerLM |
|
|
|
## Repository Scope and Purpose |
|
- **BitTransformerLM** models raw binary streams using reversible transformer blocks and safety telemetry. The project is the canonical implementation under WCNegentropy. |
|
- Core capabilities include bit-native modeling, telemetry metrics (negentropy, LZ complexity, symbiosis), progressive scaling, compression, context extension, diffusion mode (linear/cosine/exp noise schedules with parity correction), dashboard control, distributed training, and quantization. |
|
- Phase 1 optimizations provide configurable batch sizing, gradient accumulation, mixed-precision, memory-mapped dataset streaming, scheduled compression ramps, selective `torch.compile`, and an EMA-smoothed safety gate with burn-in. |
|
|
|
## Environment Setup |
|
- Requires **Python 3.10+**. |
|
- Install dependencies: |
|
- CPU: `pip install --extra-index-url https://download.pytorch.org/whl/cpu -r requirements.txt` |
|
- Optional GPU: `pip install --extra-index-url https://download.pytorch.org/whl/cu118 torch==2.7.1+cu118` |
|
- The package name is `bit-transformer`; project metadata lives in `pyproject.toml`. |
|
|
|
## Repository Layout |
|
- `bit_transformer/` – core package (`model`, `compression`, `telemetry`, `safety`, `dashboard_app`, `quantization`, etc.). |
|
- `tests/` – pytest suite and historical `TEST_RESULTS.md`. |
|
- Scripts: `example.py`, `unified_workflow.py`, `full_bits_train.py`, `build_full_bits.py`, `mcp_server.py`, `wikitext_*` utilities. The legacy `progressive_scaleup.py` is retained for reference but superseded by `integration_schedule.py`. |
|
- Docs and specs: `README.md`, `state_of_the_repo_audit.md`, licensing files in `LICENSE/`. |
|
|
|
## Development Practices |
|
- Follow snake_case for functions and CamelCase for classes. |
|
- Keep functions under ~300 lines and minimize deeply nested control flow. |
|
- Avoid reintroducing the deprecated dashboard `/exec` endpoint or other insecure code paths. |
|
- Use the `/status` endpoint for model introspection; all routes return JSON and surface errors with stack traces. |
|
- Ensure compression, decompression, and halting logic stay consistent with the current implementation.
|
- Use the `cpu_autocast()` helper for BF16 mixed precision on CPU instead of calling `torch.amp.autocast` directly (see the sketch after this list).
|
- Adaptive training expands depth, width, or context only when validation loss plateaus; after each expansion it decays the base learning rate by a factor of √2 and applies a 100‑step warm‑up.
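
A minimal sketch of the preferred mixed-precision pattern, assuming `cpu_autocast` is exported from the top-level `bit_transformer` package, behaves as a context manager, and that the model's forward pass accepts a tensor of bits:

```python
import torch

from bit_transformer import cpu_autocast  # assumed to be exported at package level


def forward_bf16(model: torch.nn.Module, bits: torch.Tensor):
    """Run a forward pass under BF16 autocast on CPU via the project helper."""
    # Preferred over calling torch.amp.autocast("cpu", dtype=torch.bfloat16) directly.
    with cpu_autocast():
        return model(bits)
```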
|
|
|
## Workflow & Commands |
|
- Run the example: `python example.py`. |
|
- Adaptive scaling now lives in `integration_schedule.py`; `progressive_scaleup.py` is deprecated. |
|
- Unified workflow (optionally with dashboard or diffusion): `python unified_workflow.py --dashboard` or `python unified_workflow.py --diffusion --diffusion-steps 8 --dataset-size 32`. |
|
- Increase `--diffusion-steps` for higher fidelity (8–16) and add `--diffusion-curriculum` to linearly decay noise over epochs. |
|
- Disable checkpointing or reversible blocks when speed is prioritized over memory: `python unified_workflow.py --no-checkpoint --no-reversible`. |
|
- Enable 4-bit quantization-aware training: `python unified_workflow.py --qat`. |
|
- Skip full attention logging during chunked attention to save memory by constructing the model with `full_attn_logging=False` (see the sketch after this list).
|
- Start MCP server: `python mcp_server.py` and launch dashboard: `MCP_SERVER_ADDR=http://127.0.0.1:7000 python -m bit_transformer.dashboard_app`. |
|
- `/metrics` and `/model_config` endpoints expose telemetry streams and hyperparameters. |
|
- `/save_checkpoint` and `/download_checkpoint` sync weights with Hugging Face (token defaults to `HF_TOKEN`). |
|
- Container build: `docker build -t bittransformerlm .` and run with exposed ports `5000` (dashboard) and `7000` (MCP). |
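
A hedged construction sketch for the memory-saving option above; only `full_attn_logging` comes from this guide, while the import path and the other constructor arguments are illustrative assumptions:

```python
from bit_transformer import BitTransformerLM  # assumed import path

# All hyperparameters below are placeholders; only full_attn_logging=False is
# taken from the guideline above.
model = BitTransformerLM(
    d_model=128,
    nhead=4,
    num_layers=4,
    dim_feedforward=256,
    max_seq_len=512,
    full_attn_logging=False,  # skip logging full attention maps during chunked attention
)
```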
|
|
|
## Telemetry Metrics |
|
| Metric | Meaning | Range |
|--------|---------|-------|
| **K** | Negentropy – deviation from random noise | 0–1 (1 = ordered) |
| **C** | LZ Complexity – compressibility proxy | 0–1 (higher = more changes) |
| **S** | Symbiosis – agreement with reference distribution | 0–1 (1 = aligned) |
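
For intuition only, the toy functions below approximate what each score measures on a raw bit sequence. They are not the repository's implementations (the package's `telemetry` module defines the real ones) and may differ in detail:

```python
import math
from typing import Sequence


def negentropy_k(bits: Sequence[int]) -> float:
    """K: one minus the normalized Shannon entropy of the bit distribution (1 = fully ordered)."""
    p1 = sum(bits) / len(bits)
    if p1 in (0.0, 1.0):
        return 1.0
    entropy = -(p1 * math.log2(p1) + (1 - p1) * math.log2(1 - p1))  # at most 1 bit
    return 1.0 - entropy


def lz_complexity_c(bits: Sequence[int]) -> float:
    """C: bit-transition rate as a cheap proxy for LZ-style compressibility (higher = more changes)."""
    flips = sum(a != b for a, b in zip(bits, bits[1:]))
    return flips / max(len(bits) - 1, 1)


def symbiosis_s(p: Sequence[float], q: Sequence[float]) -> float:
    """S: one minus the total variation distance to a reference distribution (1 = aligned)."""
    return 1.0 - 0.5 * sum(abs(a - b) for a, b in zip(p, q))
```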
|
|
|
ACT halting exports `halt_probs` in the telemetry dict, showing how many layers actually executed. For robust sampling under safety constraints, call `safe_sample_with_retry(model, bits)`, which retries in diffusion mode with exponential backoff.
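
A usage sketch for the retry helper; the call signature comes from this section, but the import location, input shape, and return value are assumptions:

```python
import torch

from bit_transformer import safe_sample_with_retry  # assumed import location

# `model` is a previously constructed BitTransformerLM (see the construction sketch above).
model.eval()  # sample in eval mode (see Testing below)
bits = torch.randint(0, 2, (1, 128))          # batch of raw bits; shape/dtype are assumptions
sample = safe_sample_with_retry(model, bits)  # retries in diffusion mode with exponential backoff
```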
|
|
|
`TelemetrySynthesizer.cluster_sequences` can be used to select representative training samples before invoking `collapse_submodel`. The distillation helper deepens the model and widens it once (`width_scale = 1.5`) if telemetry floors are missed, and `save_distilled_model` writes a `metrics.json` summary beside the weights.
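
A rough pipeline sketch: the class and function names come from this section, but their argument lists, return values, and import paths are assumptions:

```python
from bit_transformer import (  # assumed exports
    TelemetrySynthesizer,
    collapse_submodel,
    save_distilled_model,
)

# `train_bits` is a collection of training bit sequences; all shapes are assumptions.
synthesizer = TelemetrySynthesizer()
representative = synthesizer.cluster_sequences(train_bits)  # pick representative samples

distilled = collapse_submodel(model, representative)  # deepens, and widens once if floors are missed
save_distilled_model(distilled, "distilled/")         # also emits metrics.json beside the weights
```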
|
|
|
## Testing |
|
- Run unit tests after any change: `pytest -q`. |
|
- Use `watcher.py` to auto-reload and re-run tests during local development if desired.
|
- During training, call `model.train()` and keep dropout probabilities around `0.1–0.2`. |
|
- Before running tests, inference, or pushing weights, switch to `model.eval()` and set all dropout probabilities to `0` to avoid flaky results (see the helper sketch after this list).
|
- The dashboard warns when telemetry metrics drift by more than 0.2 over the last 10 steps; adjust via `ModelManager(drift_window, drift_threshold)` as needed.
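
A minimal helper sketch for the train/eval toggling above, assuming the model uses standard `torch.nn.Dropout` modules (custom or functional dropout would need its own handling):

```python
import torch.nn as nn


def set_eval_mode(model: nn.Module) -> None:
    """Before tests, inference, or pushing weights: eval mode with dropout zeroed."""
    model.eval()
    for module in model.modules():
        if isinstance(module, nn.Dropout):
            module.p = 0.0


def set_train_mode(model: nn.Module, dropout: float = 0.1) -> None:
    """For training: train mode with dropout in the recommended 0.1-0.2 range."""
    model.train()
    for module in model.modules():
        if isinstance(module, nn.Dropout):
            module.p = dropout
```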
|
|
|
## Licensing |
|
- The project is governed by the documents in `LICENSE/` (AGPLv3, commercial terms, disclaimers, etc.). Ensure compliance before contributing or distributing.
|
|
|
These guidelines keep the repository consistent with the project roadmap and previous audits. Maintain security, style, and testing discipline to keep BitTransformerLM production-ready. |
|
|
|
|