Remove AGENTS.md - cleanup for OS launch
AGENTS.md
DELETED
@@ -1,66 +0,0 @@
# AGENTS Guidelines for BitTransformerLM

## Repository Scope and Purpose
- **BitTransformerLM** models raw binary streams using reversible transformer blocks and safety telemetry. The project is the canonical implementation under WCNegentropy.
- Core capabilities include bit-native modeling, telemetry metrics (negentropy, LZ complexity, symbiosis), progressive scaling, compression, context extension, diffusion mode (linear/cosine/exp noise schedules with parity correction), dashboard control, distributed training, and quantization.
- Phase 1 optimizations provide configurable batch sizing, gradient accumulation, mixed precision, memory-mapped dataset streaming, scheduled compression ramps, selective `torch.compile`, and an EMA-smoothed safety gate with burn-in.
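The Phase 1 implementation lives in the package itself; the following is only a generic PyTorch sketch of gradient accumulation combined with BF16 mixed precision, using a placeholder model, data, and hyperparameters rather than BitTransformerLM's actual training loop.

```python
import torch
import torch.nn as nn

# Placeholder model and optimizer; the real training loop is provided by the package.
model = nn.Sequential(nn.Linear(64, 64), nn.ReLU(), nn.Linear(64, 2))
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()
accum_steps = 4  # effective batch size = micro-batch size * accum_steps

model.train()
optimizer.zero_grad()
for step in range(16):
    x = torch.randn(8, 64)            # stand-in micro-batch
    y = torch.randint(0, 2, (8,))
    with torch.autocast(device_type="cpu", dtype=torch.bfloat16):
        loss = loss_fn(model(x), y) / accum_steps  # scale loss for accumulation
    loss.backward()
    if (step + 1) % accum_steps == 0:
        optimizer.step()
        optimizer.zero_grad()
```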
## Environment Setup
- Requires **Python 3.10+**.
- Install dependencies:
  - CPU: `pip install --extra-index-url https://download.pytorch.org/whl/cpu -r requirements.txt`
  - Optional GPU: `pip install --extra-index-url https://download.pytorch.org/whl/cu118 torch==2.7.1+cu118`
- The package name is `bit-transformer`; project metadata lives in `pyproject.toml`.
## Repository Layout
- `bit_transformer/` – core package (`model`, `compression`, `telemetry`, `safety`, `dashboard_app`, `quantization`, etc.).
- `tests/` – pytest suite and the historical `TEST_RESULTS.md`.
- Scripts: `example.py`, `unified_workflow.py`, `full_bits_train.py`, `build_full_bits.py`, `mcp_server.py`, and the `wikitext_*` utilities. The legacy `progressive_scaleup.py` is retained for reference but superseded by `integration_schedule.py`.
- Docs and specs: `README.md`, `state_of_the_repo_audit.md`, and the licensing files in `LICENSE/`.
## Development Practices
- Follow snake_case for functions and CamelCase for classes.
- Keep functions under ~300 lines and minimize deeply nested control flow.
- Avoid reintroducing the deprecated dashboard `/exec` endpoint or other insecure code paths.
- Use the `/status` endpoint for model introspection; all routes return JSON and surface errors with stack traces.
- Ensure compression, decompression, and halting logic stay consistent with the current implementation.
- Use the `cpu_autocast()` helper for BF16 mixed precision on CPU instead of calling `torch.amp.autocast` directly (see the sketch after this list).
- Adaptive training now expands depth, width, or context only when validation loss plateaus, and automatically decays the base learning rate by √2 after each expansion with a 100-step warm-up.
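A minimal sketch of that preference, assuming `cpu_autocast` is importable from the top-level `bit_transformer` package; the exact import path may differ, and the model here is a stand-in.

```python
import torch
import torch.nn as nn

from bit_transformer import cpu_autocast  # assumed import path

model = nn.Linear(128, 16)   # stand-in for a BitTransformerLM instance
x = torch.randn(4, 128)

# Preferred: the project helper wraps BF16 autocast on CPU.
with cpu_autocast():
    y = model(x)

# Avoided: calling torch.amp.autocast directly, e.g.
#   with torch.amp.autocast("cpu", dtype=torch.bfloat16): ...
```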
## Workflow & Commands
- Run the example: `python example.py`.
- Adaptive scaling now lives in `integration_schedule.py`; `progressive_scaleup.py` is deprecated.
- Unified workflow (optionally with dashboard or diffusion): `python unified_workflow.py --dashboard` or `python unified_workflow.py --diffusion --diffusion-steps 8 --dataset-size 32`.
- Increase `--diffusion-steps` for higher fidelity (8–16) and add `--diffusion-curriculum` to linearly decay noise over epochs.
- Disable checkpointing or reversible blocks when speed is prioritized over memory: `python unified_workflow.py --no-checkpoint --no-reversible`.
- Enable 4-bit quantization-aware training: `python unified_workflow.py --qat`.
- Skip full attention logging during chunked attention for memory savings by constructing the model with `full_attn_logging=False`.
- Start the MCP server with `python mcp_server.py`, then launch the dashboard: `MCP_SERVER_ADDR=http://127.0.0.1:7000 python -m bit_transformer.dashboard_app`.
- The `/metrics` and `/model_config` endpoints expose telemetry streams and hyperparameters (see the sketch after this list).
- `/save_checkpoint` and `/download_checkpoint` sync weights with Hugging Face (the token defaults to `HF_TOKEN`).
- Container build: `docker build -t bittransformerlm .`, then run with ports `5000` (dashboard) and `7000` (MCP) exposed.
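As a rough illustration of polling those JSON routes with `requests`: the dashboard address matches the defaults above, but treating each route as a plain GET returning JSON is an assumption for this sketch, not a documented client API.

```python
import requests

BASE = "http://127.0.0.1:5000"  # assumed dashboard address from the defaults above

# Model introspection; all routes return JSON.
status = requests.get(f"{BASE}/status", timeout=10).json()

# Telemetry stream and current hyperparameters.
metrics = requests.get(f"{BASE}/metrics", timeout=10).json()
config = requests.get(f"{BASE}/model_config", timeout=10).json()

print(status, metrics, config)
```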
## Telemetry Metrics

| Metric | Meaning | Range |
|--------|---------|-------|
| **K** | Negentropy – deviation from random noise | 0–1 (1 = ordered) |
| **C** | LZ Complexity – compressibility proxy | 0–1 (higher = more changes) |
| **S** | Symbiosis – agreement with reference distribution | 0–1 (1 = aligned) |
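For intuition only, here are toy 0–1 scores with roughly these meanings; they are illustrative proxies invented for this sketch, not the repository's actual telemetry formulas.

```python
import math

def toy_negentropy(bits):           # K-like: 1 = fully ordered, 0 = coin-flip noise
    p = sum(bits) / len(bits)
    if p in (0.0, 1.0):
        return 1.0
    entropy = -(p * math.log2(p) + (1 - p) * math.log2(1 - p))
    return 1.0 - entropy

def toy_change_rate(bits):          # C-like: higher = more transitions to encode
    changes = sum(a != b for a, b in zip(bits, bits[1:]))
    return changes / max(len(bits) - 1, 1)

def toy_agreement(p_model, p_ref):  # S-like: 1 = distributions fully aligned
    return 1.0 - 0.5 * sum(abs(a - b) for a, b in zip(p_model, p_ref))

print(toy_negentropy([1, 1, 1, 1, 0, 1, 1, 1]))  # mostly 1s -> above 0 (fair coin gives 0.0)
print(toy_change_rate([0, 1, 0, 1, 0, 1]))       # alternating bits -> 1.0
print(toy_agreement([0.7, 0.3], [0.6, 0.4]))     # close distributions -> 0.9
```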
ACT halting exports `halt_probs` in telemetry, showing how many layers executed. For robust sampling under safety constraints, call `safe_sample_with_retry(model, bits)`, which retries with diffusion mode and exponential backoff.
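A hedged usage sketch: it assumes `safe_sample_with_retry` is exported from the top-level package and takes a model plus a bit tensor, as the call above suggests; the import path, input shape, and return value are assumptions.

```python
import torch

from bit_transformer import safe_sample_with_retry  # assumed import path

def sample_safely(model, prompt_bits):
    """Sample under safety constraints; retries with diffusion mode and backoff."""
    model.eval()
    return safe_sample_with_retry(model, prompt_bits)

# Example prompt as a bit tensor; this shape and dtype are assumptions.
prompt = torch.randint(0, 2, (1, 128))
# output = sample_safely(trained_model, prompt)  # trained_model: a BitTransformerLM instance
```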
`TelemetrySynthesizer.cluster_sequences` can be used to select representative training samples before invoking `collapse_submodel`. The distillation helper deepens the model and widens it once (`width_scale=1.5`) if floors are missed, and `save_distilled_model` emits a `metrics.json` summary beside the weights.
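A sketch of how those pieces might chain together; the names come from the paragraph above, but every import path, signature, and argument here is an assumption — consult the package for the real interfaces.

```python
# All imports and call signatures below are assumptions based on the names
# mentioned above; check bit_transformer for the real interfaces.
from bit_transformer import (
    TelemetrySynthesizer,
    collapse_submodel,
    save_distilled_model,
)

def distill(sequences):
    """sequences: iterable of bit sequences (hypothetical input format)."""
    synth = TelemetrySynthesizer()
    # Select representative training samples before distillation.
    representative = synth.cluster_sequences(sequences)
    # Collapse into a submodel; deepens and widens once if floors are missed.
    student = collapse_submodel(representative)
    # Writes the weights plus a metrics.json summary beside them.
    save_distilled_model(student, "distilled/")
    return student
```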
## Testing
- Run the unit tests after any change: `pytest -q`.
- Use `watcher.py` for auto-reload and re-running tests during local development if desired.
- During training, call `model.train()` and keep dropout probabilities around `0.1–0.2`.
- Before running tests, inference, or pushing weights, switch to `model.eval()` and set all dropout probabilities to `0` to avoid flaky results (see the sketch after this list).
- The dashboard warns if telemetry metrics drift by more than 0.2 over the last 10 steps; adjust via `ModelManager(drift_window, drift_threshold)` as needed.
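A small sketch of that train/eval hygiene using plain PyTorch module traversal; the model here is a generic stand-in rather than BitTransformerLM's actual class.

```python
import torch.nn as nn

# Stand-in model; substitute a real BitTransformerLM instance.
model = nn.Sequential(nn.Linear(32, 32), nn.Dropout(p=0.1), nn.Linear(32, 2))

# Training: enable dropout and keep probabilities around 0.1-0.2.
model.train()

# Before tests, inference, or pushing weights: eval mode plus zeroed dropout
# so repeated runs produce identical outputs.
model.eval()
for module in model.modules():
    if isinstance(module, nn.Dropout):
        module.p = 0.0
```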
## Licensing
- The project is governed by the documents in `LICENSE/` (AGPLv3, commercial terms, disclaimers, etc.). Ensure compliance before contributing or distributing.

These guidelines keep the repository consistent with the project roadmap and previous audits. Maintain security, style, and testing discipline to keep BitTransformerLM production-ready.