🚀 OS Launch: Clean documentation and refined licensing

This OS launch commit includes:

**Cleaned Documentation**
- Removed inflated claims and marketing language
- Added honest research status and limitations
- Created professional model card and validation reports
- Streamlined licensing to AGPLv3 + commercial contact

**Refined Codebase**
- Complete experimental bit-native transformer implementation
- 57 Python files with comprehensive research framework
- Safety telemetry and monitoring systems
- Distributed training and development tools

**Professional Standards**
- Empirical validation of all claims
- Clear experimental vs production distinctions
- Rigorous research methodology requirements
- Community contribution framework

Ready for serious research evaluation and academic investigation.
README.md (CHANGED)

```diff
@@ -1,10 +1,10 @@
 # BitTransformerLM
 
-**Project Status:**
-**Codebase Maturity:** 57 Python files, 10,699 lines of
-**
+**Project Status:** Experimental Research Implementation
+**Codebase Maturity:** 57 Python files, 10,699 lines of research code
+**Current Stage:** Pre-release requiring validation and baseline comparisons
 
-BitTransformerLM is the world's first **bit-native transformer language model**
+BitTransformerLM is an experimental **bit-native transformer language model** with built-in safety telemetry, exploring a novel approach to language modeling at the bit level. This research implementation includes distributed training capabilities, real-time monitoring, automated scaling, and comprehensive safety mechanisms. The architecture demonstrates potential for memory-efficient processing through reversible layers and fine-grained control via bit-level operations.
 
 ## Historical Background
 - **Early Experiments** – Initial prototypes explored mapping text to parity-protected bits and training a minimal transformer on random data.
```
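The overview in this hunk describes bit-native processing: the model consumes raw 0/1 streams rather than subword tokens, and the historical notes mention mapping text to parity-protected bits. As a rough illustration of what such an encoding can look like, the sketch below packs each UTF-8 byte as 8 data bits plus one even-parity bit; this specific layout is an assumption for illustration, not BitTransformerLM's actual codec.

```python
# Illustrative sketch only: encode text as parity-protected bit sequences.
# The 8-data-bits + 1-even-parity-bit layout is an assumption, not
# BitTransformerLM's actual encoding.
from typing import List

def text_to_parity_bits(text: str) -> List[int]:
    bits: List[int] = []
    for byte in text.encode("utf-8"):
        data = [(byte >> i) & 1 for i in range(7, -1, -1)]  # MSB-first data bits
        parity = sum(data) % 2                               # even-parity check bit
        bits.extend(data + [parity])
    return bits

def parity_bits_to_text(bits: List[int]) -> str:
    out = bytearray()
    for i in range(0, len(bits), 9):
        chunk = bits[i:i + 9]
        data, parity = chunk[:8], chunk[8]
        if sum(data) % 2 != parity:
            raise ValueError(f"parity error in chunk starting at bit {i}")
        out.append(int("".join(map(str, data)), 2))
    return out.decode("utf-8")

assert parity_bits_to_text(text_to_parity_bits("bit-native")) == "bit-native"
```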
```diff
@@ -17,9 +17,9 @@ BitTransformerLM is the world's first **bit-native transformer language model**
 - **Dashboard & MCP Server** – Built a lightweight web UI backed by a management server for real-time training, inference and model collapse. New `/metrics` and `/model_config` endpoints surface live telemetry and hyperparameters, and `/save_checkpoint` and `/download_checkpoint` enable Hugging Face weight sync. The insecure `/exec` route has been removed.
 - **Phase 1 Optimizations** – Configurable batch sizes with aligned OneCycle scheduling, gradient accumulation, mixed-precision, memory-mapped dataset streaming, scheduled compression ramps, selective ``torch.compile``, and an EMA-smoothed safety gate with burn-in to cut false positives.
 
-The codebase has undergone extensive testing, optimization, and real-world valid
+The codebase includes comprehensive testing and experimental validation, representing a complete research implementation with potential for production deployment pending rigorous evaluation against standard baselines.
 
-##
+## 🧪 Experimental Feature Matrix
 
 ### Core Architecture Innovations
 - ✅ **Bit-Native Processing**: Direct 0/1 computation without token intermediates
```
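The Phase 1 notes above mention an "EMA-smoothed safety gate with burn-in to cut false positives". Conceptually this is a threshold test applied to an exponential moving average of a telemetry metric, with the first N steps exempted so start-up noise cannot trip the gate. A minimal sketch of that idea, with hypothetical names and defaults rather than the repository's real implementation:

```python
# Minimal sketch of an EMA-smoothed safety gate with burn-in.
# Hypothetical names and defaults; not BitTransformerLM's actual gate.
class EMASafetyGate:
    def __init__(self, threshold: float, decay: float = 0.9, burn_in: int = 100):
        self.threshold = threshold   # trip when the smoothed metric falls below this (assumed direction)
        self.decay = decay           # EMA smoothing factor
        self.burn_in = burn_in       # initial steps during which alerts are suppressed
        self.step = 0
        self.ema = None

    def update(self, metric: float) -> bool:
        """Return True if generation/training should be halted at this step."""
        self.step += 1
        self.ema = metric if self.ema is None else self.decay * self.ema + (1 - self.decay) * metric
        if self.step <= self.burn_in:
            return False             # burn-in: collect statistics, never trip
        return self.ema < self.threshold

gate = EMASafetyGate(threshold=0.2, burn_in=5)
for step_metric in [0.9, 0.8, 0.1, 0.05, 0.04, 0.03, 0.02]:
    print(gate.update(step_metric))  # smoothing plus burn-in keeps transient dips from tripping the gate
```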
```diff
@@ -28,42 +28,42 @@ The codebase has undergone extensive testing, optimization, and real-world valid
 - ✅ **Progressive Scaling**: Dynamic architecture expansion based on performance metrics
 - ✅ **Diffusion Mode**: Bidirectional denoising for advanced generation capabilities
 
-###
-- ✅ **Multi-GPU FSDP**: Fully Sharded Data Parallel
-- ✅ **Pipeline Parallelism**: Distributed training
+### Distributed Training Framework
+- ✅ **Multi-GPU FSDP**: Fully Sharded Data Parallel implementation (tested up to 771M parameters)
+- ✅ **Pipeline Parallelism**: Distributed training infrastructure
 - ✅ **Mixed Precision**: FP16/BF16 optimization with CPU autocast support
 - ✅ **Gradient Checkpointing**: Memory-efficient training for large models
-- ✅ **Dynamic Quantization**: Runtime INT8 conversion + 4-bit QAT
+- ✅ **Dynamic Quantization**: Runtime INT8 conversion + experimental 4-bit QAT
 
-###
+### Experimental Safety & Monitoring
 - ✅ **Real-Time Telemetry**: Live K/C/S metric tracking with drift detection
 - ✅ **Safety Gates**: EMA-smoothed thresholds with configurable burn-in
 - ✅ **Metric Synthesis**: Clustering-based activation analysis
 - ✅ **Collapse Detection**: Automated model collapse prevention and recovery
 - ✅ **Human-in-Loop**: Safe inference with retry mechanisms
 
-###
+### Research Tools
 - ✅ **Interactive Dashboard**: Real-time training control and visualization
-- ✅ **MCP Server**: Management Control Protocol for
-- ✅ **HuggingFace Integration**:
+- ✅ **MCP Server**: Management Control Protocol for research workflows
+- ✅ **HuggingFace Integration**: Model weight sharing and checkpoint management
 - ✅ **Enhanced Checkpointing**: Multi-run management with cloud backup
-- ✅ **CLI Standardization**: Unified command-line interface across
+- ✅ **CLI Standardization**: Unified command-line interface across tools
 
-###
+### Development Infrastructure
 - ✅ **Comprehensive Testing**: 11 test modules with automated CI validation
 - ✅ **Type Safety**: Full type annotations with custom type system
 - ✅ **Error Recovery**: Robust error handling with automatic retry logic
 - ✅ **Memory Management**: Intelligent caching with automatic cleanup
-- ✅ **Documentation**:
+- ✅ **Documentation**: Research-grade docstrings and API reference
 
-###
+### Performance Optimizations
 - ✅ **Torch.Compile**: Selective compilation for performance-critical paths
 - ✅ **Chunked Attention**: Memory-efficient processing of long sequences
 - ✅ **Compression Pipeline**: Lossless bit compression with performance ramps
 - ✅ **Context Extension**: Sliding window inference for arbitrary lengths
 - ✅ **ACT Integration**: Adaptive Computation Time for dynamic depth
 
-**
+**Research Status**: BitTransformerLM provides a complete experimental framework for bit-native language modeling research, requiring baseline comparisons and rigorous evaluation for production use.
 
 ## Quick Start
 Install dependencies using the CPU wheel of PyTorch (default):
```
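Several entries in the feature matrix (chunked attention, sliding-window context extension) target the same practical problem: running a fixed-context model over arbitrarily long bit sequences. The sketch below shows the generic sliding-window pattern with overlapping chunks and averaged per-position scores; the `model` callable is a stand-in, and none of this is BitTransformerLM's actual inference API.

```python
# Generic sliding-window inference over a long bit sequence.
# `model` is a stand-in callable (window of bits -> per-position scores);
# BitTransformerLM's real interface may differ.
from typing import Callable, List, Sequence

def sliding_window_infer(
    model: Callable[[Sequence[int]], List[float]],
    bits: Sequence[int],
    window: int = 512,
    stride: int = 256,
) -> List[float]:
    scores = [0.0] * len(bits)
    counts = [0] * len(bits)
    start = 0
    while start < len(bits):
        chunk = bits[start:start + window]
        out = model(chunk)                      # one fixed-context forward pass
        for i, value in enumerate(out):
            scores[start + i] += value
            counts[start + i] += 1
        if start + window >= len(bits):
            break                               # tail of the sequence has been covered
        start += stride                         # overlap windows so positions keep left context
    return [s / max(c, 1) for s, c in zip(scores, counts)]

# Toy usage: a dummy "model" that scores each bit by the running mean of its window.
dummy = lambda chunk: [sum(chunk[: i + 1]) / (i + 1) for i in range(len(chunk))]
result = sliding_window_infer(dummy, [1, 0, 1, 1] * 300, window=128, stride=64)
```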
```diff
@@ -203,43 +203,44 @@ By default the container installs the CPU-only PyTorch wheel. Set the build
 argument `TORCH_CUDA=cu118` to preinstall the GPU version. The container sets
 `MCP_SERVER_ADDR=http://127.0.0.1:7000` and exposes the dashboard on port 5000.
 
-##
+## Research Development Roadmap
 
-### ✅ **COMPLETED -
+### ✅ **COMPLETED - Experimental Implementation**
 - **Architecture**: Bit-native transformer with reversible layers ✅
 - **Safety Systems**: K/C/S telemetry with real-time monitoring ✅
-- **Distributed Training**: FSDP
-- **
+- **Distributed Training**: FSDP implementation (tested up to 771M parameters) ✅
+- **Research Tools**: Dashboard, MCP server, HF integration ✅
 - **Testing & Validation**: Comprehensive test suite with CI ✅
-- **Documentation**:
+- **Documentation**: Research-grade API documentation ✅
 - **Performance**: Memory optimization, quantization, compression ✅
 
-### 🎯 **
-- **
-- **
-- **
-- **
+### 🎯 **VALIDATION TARGETS**
+- **Baseline Comparisons**: Rigorous evaluation against standard transformers
+- **Statistical Analysis**: Multiple runs with proper significance testing
+- **Long-Duration Training**: Training convergence studies on real datasets
+- **Scaling Studies**: Systematic evaluation of model sizes and architectures
 
-### 🚀 **
-- **Scale Validation**: Multi-billion parameter experiments
+### 🚀 **FUTURE RESEARCH DIRECTIONS**
+- **Scale Validation**: Multi-billion parameter experiments with proper baselines
 - **Hardware Optimization**: Custom CUDA kernels and neuromorphic support
-- **Application
-- **
+- **Application Studies**: Real-world deployment case studies with evaluation
+- **Academic Validation**: Peer review and publication processes
 
-**Current Status**:
+**Current Status**: Complete experimental framework requiring rigorous validation against established baselines before production deployment.
 
 ## Licensing
 
-
-
-
-* `COMMERCIAL_LICENSE.txt`: Terms for commercial use of the software.
-* `DISCLAIMER.txt`: Important legal disclaimers.
-* `ALIGNMENT_AND_TRANSPARENCY.txt`: Our commitment to alignment and transparency.
-* `TRADEMARK_POLICY.txt`: Guidelines for using the project's trademarks.
-* `CONTRIBUTOR_LICENSE_AGREEMENT.txt`: The agreement for all contributors to sign.
-
+BitTransformerLM is available under a dual licensing scheme:
+
+* **Open Source License:** AGPLv3 (see `LICENSE/LICENSE.txt`)
+* **Commercial License:** Available by contacting **[email protected]**
+
+Additional licensing documents in the `LICENSE/` directory:
+
+* `COMMERCIAL_LICENSE.txt`: Information about commercial licensing options
+* `DISCLAIMER.txt`: Important legal disclaimers and limitations
+* `TRADEMARK_POLICY.txt`: Guidelines for using project trademarks
+* `CONTRIBUTOR_LICENSE_AGREEMENT.txt`: Terms for contributors
+
+For commercial use cases that require different licensing terms than AGPLv3, please contact **[email protected]** to discuss commercial licensing options.
```
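The container notes in the last hunk give the management server address (`MCP_SERVER_ADDR=http://127.0.0.1:7000`), and the dashboard section documents `/metrics` and `/model_config` endpoints, so live telemetry can be polled over plain HTTP. A hedged sketch using `requests`; the JSON response schema is an assumption, since the README text shown here does not specify it.

```python
# Polling the management server's telemetry endpoints over HTTP.
# Endpoint paths and the server address come from the README shown above;
# the structure of the JSON responses is an assumption for illustration.
import os
import requests

MCP_SERVER_ADDR = os.environ.get("MCP_SERVER_ADDR", "http://127.0.0.1:7000")

def fetch_json(path: str) -> dict:
    response = requests.get(f"{MCP_SERVER_ADDR}{path}", timeout=5)
    response.raise_for_status()
    return response.json()

if __name__ == "__main__":
    metrics = fetch_json("/metrics")        # live K/C/S telemetry (schema assumed)
    config = fetch_json("/model_config")    # current hyperparameters (schema assumed)
    print("telemetry keys:", sorted(metrics))
    print("config keys:", sorted(config))
```

Polling these endpoints from a notebook or CI job is one lightweight way to watch the K/C/S telemetry that the safety gates act on.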