AGENTS Guidelines for BitTransformerLM

Repository Scope and Purpose

  • BitTransformerLM models raw binary streams using reversible transformer blocks and safety telemetry. The project is the canonical implementation under WCNegentropy.
  • Core capabilities include bit-native modeling, telemetry metrics (negentropy, LZ complexity, symbiosis), progressive scaling, compression, context extension, diffusion mode (linear/cosine/exp noise schedules with parity correction), dashboard control, distributed training, and quantization.
  • Phase 1 optimizations provide configurable batch sizing, gradient accumulation, mixed-precision, memory-mapped dataset streaming, scheduled compression ramps, selective torch.compile, and an EMA-smoothed safety gate with burn-in.

Environment Setup

  • Requires Python 3.10+.
  • Install dependencies:
    • CPU: pip install --extra-index-url https://download.pytorch.org/whl/cpu -r requirements.txt
    • Optional GPU: pip install --extra-index-url https://download.pytorch.org/whl/cu118 torch==2.7.1+cu118
  • The package name is bit-transformer; project metadata lives in pyproject.toml.
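
A quick way to confirm the environment is set up is to import the installed package. This is a minimal sketch that assumes only the import name bit_transformer mentioned above; it makes no other claims about the package contents.

```python
# Minimal post-install smoke test; assumes only the import name
# `bit_transformer` stated above.
import importlib

pkg = importlib.import_module("bit_transformer")
print("bit_transformer imported from:", pkg.__file__)
```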

Repository Layout

  • bit_transformer/ – core package (model, compression, telemetry, safety, dashboard_app, quantization, etc.).
  • tests/ – pytest suite and historical TEST_RESULTS.md.
  • Scripts: example.py, unified_workflow.py, full_bits_train.py, build_full_bits.py, mcp_server.py, wikitext_* utilities. The legacy progressive_scaleup.py is retained for reference but superseded by integration_schedule.py.
  • Docs and specs: README.md, state_of_the_repo_audit.md, licensing files in LICENSE/.

Development Practices

  • Follow snake_case for functions and CamelCase for classes.
  • Keep functions under ~300 lines and minimize deeply nested control flow.
  • Avoid reintroducing the deprecated dashboard /exec endpoint or other insecure code paths.
  • Use the /status endpoint for model introspection; all routes return JSON and surface errors with stack traces.
  • Ensure compression, decompression, and halting logic stay consistent with the current implementation.
  • Use the cpu_autocast() helper for BF16 mixed precision on CPU instead of calling torch.amp.autocast directly (see the sketch after this list).
  • Adaptive training now expands depth, width, or context only when validation loss plateaus, and it automatically decays the base learning rate by a factor of √2 after each expansion with a 100-step warm-up.
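
A minimal sketch of the intended cpu_autocast() usage. The helper name comes from the guideline above; its exact import path and the model's forward signature are assumptions made here for illustration.

```python
import torch
from bit_transformer import cpu_autocast  # exact import path is an assumption


def forward_bf16_cpu(model: torch.nn.Module, bits: torch.Tensor):
    """Run one forward pass under the project's CPU BF16 autocast helper."""
    with cpu_autocast():  # preferred over calling torch.amp.autocast directly
        return model(bits)
```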

Workflow & Commands

  • Run the example: python example.py.
  • Adaptive scaling now lives in integration_schedule.py; progressive_scaleup.py is deprecated.
  • Unified workflow (optionally with dashboard or diffusion): python unified_workflow.py --dashboard or python unified_workflow.py --diffusion --diffusion-steps 8 --dataset-size 32.
  • Increase --diffusion-steps for higher fidelity (8–16) and add --diffusion-curriculum to linearly decay noise over epochs.
  • Disable checkpointing or reversible blocks when speed is prioritized over memory: python unified_workflow.py --no-checkpoint --no-reversible.
  • Enable 4-bit quantization-aware training: python unified_workflow.py --qat.
  • Skip full attention logging during chunked attention for memory savings by constructing the model with full_attn_logging=False (see the model-construction sketch after this list).
  • Start MCP server: python mcp_server.py and launch dashboard: MCP_SERVER_ADDR=http://127.0.0.1:7000 python -m bit_transformer.dashboard_app.
  • The /metrics and /model_config endpoints expose telemetry streams and hyperparameters (see the endpoint sketch after this list).
  • The /save_checkpoint and /download_checkpoint endpoints sync weights with the Hugging Face Hub (the token defaults to HF_TOKEN).
  • Container build: docker build -t bittransformerlm . and run with exposed ports 5000 (dashboard) and 7000 (MCP).
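
Model-construction sketch: only the full_attn_logging=False flag comes from the guidelines; the class name, import path, and the other constructor arguments are hypothetical sizes used for illustration.

```python
from bit_transformer import BitTransformerLM  # class name/import path are assumptions

model = BitTransformerLM(
    d_model=128,               # hypothetical sizing, for illustration only
    nhead=4,
    num_layers=2,
    dim_feedforward=256,
    full_attn_logging=False,   # documented flag: skip full attention logging
)
```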
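
Endpoint sketch: a small client-side example of polling the JSON routes named above. It assumes the routes accept GET requests and that the dashboard listens on port 5000, as in the Docker note; the response fields are not specified here.

```python
import requests

DASHBOARD = "http://127.0.0.1:5000"  # default dashboard port from the Docker note

status = requests.get(f"{DASHBOARD}/status", timeout=5).json()        # model introspection
metrics = requests.get(f"{DASHBOARD}/metrics", timeout=5).json()      # telemetry streams
config = requests.get(f"{DASHBOARD}/model_config", timeout=5).json()  # hyperparameters
print(status, metrics, config, sep="\n")
```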

Telemetry Metrics

| Metric | Meaning | Range |
| --- | --- | --- |
| K | Negentropy – deviation from random noise | 0–1 (1 = ordered) |
| C | LZ Complexity – compressibility proxy | 0–1 (higher = more changes) |
| S | Symbiosis – agreement with reference distribution | 0–1 (1 = aligned) |

ACT halting exports halt_probs in telemetry, showing how many layers executed. For robust sampling under safety constraints, call safe_sample_with_retry(model, bits), which retries with diffusion mode and exponential backoff.
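
A minimal wrapper around the call described above. The call safe_sample_with_retry(model, bits) is taken from the guidelines; its import path and return value are assumptions.

```python
import torch
from bit_transformer import safe_sample_with_retry  # import path is an assumption


def sample_bits(model: torch.nn.Module, bits: torch.Tensor):
    """Sample under safety constraints, retrying with diffusion mode and backoff."""
    return safe_sample_with_retry(model, bits)
```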

TelemetrySynthesizer.cluster_sequences can be used to select representative training samples before invoking collapse_submodel. If telemetry floors are missed, the distillation helper deepens the model and widens it once (width_scale = 1.5); save_distilled_model emits a metrics.json summary beside the weights.
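
A sketch of that distillation flow. The function names come from the paragraph above; their import paths, constructor arguments, exact signatures, and the shape of `sequences` are assumptions made for illustration.

```python
from bit_transformer import (  # import paths are assumptions
    TelemetrySynthesizer,
    collapse_submodel,
    save_distilled_model,
)


def distil(sequences, out_dir="distilled/"):
    """Cluster sequences, collapse the teacher into a smaller student, and save it."""
    synth = TelemetrySynthesizer()                        # constructor args unknown
    representative = synth.cluster_sequences(sequences)   # representative training samples
    student = collapse_submodel(representative)           # exact signature is an assumption
    save_distilled_model(student, out_dir)                # also emits metrics.json beside the weights
```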

Testing

  • Run unit tests after any change: pytest -q.
  • Use watcher.py for auto-reload and test re-runs during local development if desired.
  • During training, call model.train() and keep dropout probabilities around 0.1–0.2.
  • Before running tests, inference, or pushing weights, switch to model.eval() and set all dropout probabilities to 0 to avoid flaky results (see the sketch after this list).
  • The dashboard will warn if telemetry metrics drift by more than 0.2 over the last 10 steps; adjust via ModelManager(drift_window, drift_threshold) as needed.
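
A minimal sketch of the pre-evaluation hygiene described above: switch to eval mode and zero every dropout probability so runs are deterministic. This uses plain PyTorch and assumes no project-specific API.

```python
import torch.nn as nn


def prepare_for_eval(model: nn.Module) -> nn.Module:
    """Put the model in eval mode and set all dropout probabilities to 0."""
    model.eval()
    for module in model.modules():
        if isinstance(module, nn.Dropout):
            module.p = 0.0
    return model
```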

Licensing

  • The project is governed by the documents in LICENSE/ (AGPLv3, commercial terms, disclaimers, etc.). Ensure compliance before contributing or distributing.

These guidelines keep the repository consistent with the project roadmap and previous audits. Maintain security, style, and testing discipline to keep BitTransformerLM production-ready.