π OS Launch: Clean documentation and refined licensing
Browse filesThis OS launch commit includes:
β
**Cleaned Documentation**
- Removed inflated claims and marketing language
- Added honest research status and limitations
- Created professional model card and validation reports
- Streamlined licensing to AGPLv3 + commercial contact
β
**Refined Codebase**
- Complete experimental bit-native transformer implementation
- 57 Python files with comprehensive research framework
- Safety telemetry and monitoring systems
- Distributed training and development tools
β
**Professional Standards**
- Empirical validation of all claims
- Clear experimental vs production distinctions
- Rigorous research methodology requirements
- Community contribution framework
Ready for serious research evaluation and academic investigation.
README.md
CHANGED
@@ -1,10 +1,10 @@
|
|
1 |
# BitTransformerLM
|
2 |
|
3 |
-
**Project Status:**
|
4 |
-
**Codebase Maturity:** 57 Python files, 10,699 lines of
|
5 |
-
**
|
6 |
|
7 |
-
BitTransformerLM is
|
8 |
|
9 |
## Historical Background
|
10 |
- **Early Experiments** β Initial prototypes explored mapping text to parity-protected bits and training a minimal transformer on random data.
|
@@ -17,9 +17,9 @@ BitTransformerLM is the world's first **bit-native transformer language model**
|
|
17 |
- **Dashboard & MCP Server** β Built a lightweight web UI backed by a management server for realβtime training, inference and model collapse. New `/metrics` and `/model_config` endpoints surface live telemetry and hyperparameters, and `/save_checkpoint` and `/download_checkpoint` enable Hugging Face weight sync. The insecure `/exec` route has been removed.
|
18 |
- **Phase 1 Optimizations** β Configurable batch sizes with aligned OneCycle scheduling, gradient accumulation, mixedβprecision, memoryβmapped dataset streaming, scheduled compression ramps, selective ``torch.compile``, and an EMAβsmoothed safety gate with burnβin to cut false positives.
|
19 |
|
20 |
-
The codebase
|
21 |
|
22 |
-
##
|
23 |
|
24 |
### Core Architecture Innovations
|
25 |
- β
**Bit-Native Processing**: Direct 0/1 computation without token intermediates
|
@@ -28,42 +28,42 @@ The codebase has undergone extensive testing, optimization, and real-world valid
|
|
28 |
- β
**Progressive Scaling**: Dynamic architecture expansion based on performance metrics
|
29 |
- β
**Diffusion Mode**: Bidirectional denoising for advanced generation capabilities
|
30 |
|
31 |
-
###
|
32 |
-
- β
**Multi-GPU FSDP**: Fully Sharded Data Parallel
|
33 |
-
- β
**Pipeline Parallelism**: Distributed training
|
34 |
- β
**Mixed Precision**: FP16/BF16 optimization with CPU autocast support
|
35 |
- β
**Gradient Checkpointing**: Memory-efficient training for large models
|
36 |
-
- β
**Dynamic Quantization**: Runtime INT8 conversion + 4-bit QAT
|
37 |
|
38 |
-
###
|
39 |
- β
**Real-Time Telemetry**: Live K/C/S metric tracking with drift detection
|
40 |
- β
**Safety Gates**: EMA-smoothed thresholds with configurable burn-in
|
41 |
- β
**Metric Synthesis**: Clustering-based activation analysis
|
42 |
- β
**Collapse Detection**: Automated model collapse prevention and recovery
|
43 |
- β
**Human-in-Loop**: Safe inference with retry mechanisms
|
44 |
|
45 |
-
###
|
46 |
- β
**Interactive Dashboard**: Real-time training control and visualization
|
47 |
-
- β
**MCP Server**: Management Control Protocol for
|
48 |
-
- β
**HuggingFace Integration**:
|
49 |
- β
**Enhanced Checkpointing**: Multi-run management with cloud backup
|
50 |
-
- β
**CLI Standardization**: Unified command-line interface across
|
51 |
|
52 |
-
###
|
53 |
- β
**Comprehensive Testing**: 11 test modules with automated CI validation
|
54 |
- β
**Type Safety**: Full type annotations with custom type system
|
55 |
- β
**Error Recovery**: Robust error handling with automatic retry logic
|
56 |
- β
**Memory Management**: Intelligent caching with automatic cleanup
|
57 |
-
- β
**Documentation**:
|
58 |
|
59 |
-
###
|
60 |
- β
**Torch.Compile**: Selective compilation for performance-critical paths
|
61 |
- β
**Chunked Attention**: Memory-efficient processing of long sequences
|
62 |
- β
**Compression Pipeline**: Lossless bit compression with performance ramps
|
63 |
- β
**Context Extension**: Sliding window inference for arbitrary lengths
|
64 |
- β
**ACT Integration**: Adaptive Computation Time for dynamic depth
|
65 |
|
66 |
-
**
|
67 |
|
68 |
## Quick Start
|
69 |
Install dependencies using the CPU wheel of PyTorch (default):
|
@@ -203,43 +203,44 @@ By default the container installs the CPU-only PyTorch wheel. Set the build
|
|
203 |
argument `TORCH_CUDA=cu118` to preinstall the GPU version. The container sets
|
204 |
`MCP_SERVER_ADDR=http://127.0.0.1:7000` and exposes the dashboard on port 5000.
|
205 |
|
206 |
-
##
|
207 |
|
208 |
-
### β
**COMPLETED -
|
209 |
- **Architecture**: Bit-native transformer with reversible layers β
|
210 |
- **Safety Systems**: K/C/S telemetry with real-time monitoring β
|
211 |
-
- **Distributed Training**: FSDP
|
212 |
-
- **
|
213 |
- **Testing & Validation**: Comprehensive test suite with CI β
|
214 |
-
- **Documentation**:
|
215 |
- **Performance**: Memory optimization, quantization, compression β
|
216 |
|
217 |
-
### π― **
|
218 |
-
- **
|
219 |
-
- **
|
220 |
-
- **
|
221 |
-
- **
|
222 |
|
223 |
-
### π **
|
224 |
-
- **Scale Validation**: Multi-billion parameter experiments
|
225 |
- **Hardware Optimization**: Custom CUDA kernels and neuromorphic support
|
226 |
-
- **Application
|
227 |
-
- **
|
228 |
|
229 |
-
**Current Status**:
|
230 |
|
231 |
## Licensing
|
232 |
|
233 |
-
|
234 |
|
235 |
-
|
|
|
236 |
|
237 |
-
|
238 |
-
* `COMMERCIAL_LICENSE.txt`: Terms for commercial use of the software.
|
239 |
-
* `DISCLAIMER.txt`: Important legal disclaimers.
|
240 |
-
* `ALIGNMENT_AND_TRANSPARENCY.txt`: Our commitment to alignment and transparency.
|
241 |
-
* `TRADEMARK_POLICY.txt`: Guidelines for using the project's trademarks.
|
242 |
-
* `CONTRIBUTOR_LICENSE_AGREEMENT.txt`: The agreement for all contributors to sign.
|
243 |
|
244 |
-
|
|
|
|
|
|
|
|
|
|
|
245 |
|
|
|
1 |
# BitTransformerLM
|
2 |
|
3 |
+
**Project Status:** Experimental Research Implementation
|
4 |
+
**Codebase Maturity:** 57 Python files, 10,699 lines of research code
|
5 |
+
**Current Stage:** Pre-release requiring validation and baseline comparisons
|
6 |
|
7 |
+
BitTransformerLM is an experimental **bit-native transformer language model** with built-in safety telemetry, exploring a novel approach to language modeling at the bit level. This research implementation includes distributed training capabilities, real-time monitoring, automated scaling, and comprehensive safety mechanisms. The architecture demonstrates potential for memory-efficient processing through reversible layers and fine-grained control via bit-level operations.
|
8 |
|
9 |
## Historical Background
|
10 |
- **Early Experiments** β Initial prototypes explored mapping text to parity-protected bits and training a minimal transformer on random data.
|
|
|
17 |
- **Dashboard & MCP Server** β Built a lightweight web UI backed by a management server for realβtime training, inference and model collapse. New `/metrics` and `/model_config` endpoints surface live telemetry and hyperparameters, and `/save_checkpoint` and `/download_checkpoint` enable Hugging Face weight sync. The insecure `/exec` route has been removed.
|
18 |
- **Phase 1 Optimizations** β Configurable batch sizes with aligned OneCycle scheduling, gradient accumulation, mixedβprecision, memoryβmapped dataset streaming, scheduled compression ramps, selective ``torch.compile``, and an EMAβsmoothed safety gate with burnβin to cut false positives.
|
19 |
|
20 |
+
The codebase includes comprehensive testing and experimental validation, representing a complete research implementation with potential for production deployment pending rigorous evaluation against standard baselines.
|
21 |
|
22 |
+
## π§ͺ Experimental Feature Matrix
|
23 |
|
24 |
### Core Architecture Innovations
|
25 |
- β
**Bit-Native Processing**: Direct 0/1 computation without token intermediates
|
|
|
28 |
- β
**Progressive Scaling**: Dynamic architecture expansion based on performance metrics
|
29 |
- β
**Diffusion Mode**: Bidirectional denoising for advanced generation capabilities
|
30 |
|
31 |
+
### Distributed Training Framework
|
32 |
+
- β
**Multi-GPU FSDP**: Fully Sharded Data Parallel implementation (tested up to 771M parameters)
|
33 |
+
- β
**Pipeline Parallelism**: Distributed training infrastructure
|
34 |
- β
**Mixed Precision**: FP16/BF16 optimization with CPU autocast support
|
35 |
- β
**Gradient Checkpointing**: Memory-efficient training for large models
|
36 |
+
- β
**Dynamic Quantization**: Runtime INT8 conversion + experimental 4-bit QAT
|
37 |
|
38 |
+
### Experimental Safety & Monitoring
|
39 |
- β
**Real-Time Telemetry**: Live K/C/S metric tracking with drift detection
|
40 |
- β
**Safety Gates**: EMA-smoothed thresholds with configurable burn-in
|
41 |
- β
**Metric Synthesis**: Clustering-based activation analysis
|
42 |
- β
**Collapse Detection**: Automated model collapse prevention and recovery
|
43 |
- β
**Human-in-Loop**: Safe inference with retry mechanisms
|
44 |
|
45 |
+
### Research Tools
|
46 |
- β
**Interactive Dashboard**: Real-time training control and visualization
|
47 |
+
- β
**MCP Server**: Management Control Protocol for research workflows
|
48 |
+
- β
**HuggingFace Integration**: Model weight sharing and checkpoint management
|
49 |
- β
**Enhanced Checkpointing**: Multi-run management with cloud backup
|
50 |
+
- β
**CLI Standardization**: Unified command-line interface across tools
|
51 |
|
52 |
+
### Development Infrastructure
|
53 |
- β
**Comprehensive Testing**: 11 test modules with automated CI validation
|
54 |
- β
**Type Safety**: Full type annotations with custom type system
|
55 |
- β
**Error Recovery**: Robust error handling with automatic retry logic
|
56 |
- β
**Memory Management**: Intelligent caching with automatic cleanup
|
57 |
+
- β
**Documentation**: Research-grade docstrings and API reference
|
58 |
|
59 |
+
### Performance Optimizations
|
60 |
- β
**Torch.Compile**: Selective compilation for performance-critical paths
|
61 |
- β
**Chunked Attention**: Memory-efficient processing of long sequences
|
62 |
- β
**Compression Pipeline**: Lossless bit compression with performance ramps
|
63 |
- β
**Context Extension**: Sliding window inference for arbitrary lengths
|
64 |
- β
**ACT Integration**: Adaptive Computation Time for dynamic depth
|
65 |
|
66 |
+
**Research Status**: BitTransformerLM provides a complete experimental framework for bit-native language modeling research, requiring baseline comparisons and rigorous evaluation for production use.
|
67 |
|
68 |
## Quick Start
|
69 |
Install dependencies using the CPU wheel of PyTorch (default):
|
|
|
203 |
argument `TORCH_CUDA=cu118` to preinstall the GPU version. The container sets
|
204 |
`MCP_SERVER_ADDR=http://127.0.0.1:7000` and exposes the dashboard on port 5000.
|
205 |
|
206 |
+
## Research Development Roadmap
|
207 |
|
208 |
+
### β
**COMPLETED - Experimental Implementation**
|
209 |
- **Architecture**: Bit-native transformer with reversible layers β
|
210 |
- **Safety Systems**: K/C/S telemetry with real-time monitoring β
|
211 |
+
- **Distributed Training**: FSDP implementation (tested up to 771M parameters) β
|
212 |
+
- **Research Tools**: Dashboard, MCP server, HF integration β
|
213 |
- **Testing & Validation**: Comprehensive test suite with CI β
|
214 |
+
- **Documentation**: Research-grade API documentation β
|
215 |
- **Performance**: Memory optimization, quantization, compression β
|
216 |
|
217 |
+
### π― **VALIDATION TARGETS**
|
218 |
+
- **Baseline Comparisons**: Rigorous evaluation against standard transformers
|
219 |
+
- **Statistical Analysis**: Multiple runs with proper significance testing
|
220 |
+
- **Long-Duration Training**: Training convergence studies on real datasets
|
221 |
+
- **Scaling Studies**: Systematic evaluation of model sizes and architectures
|
222 |
|
223 |
+
### π **FUTURE RESEARCH DIRECTIONS**
|
224 |
+
- **Scale Validation**: Multi-billion parameter experiments with proper baselines
|
225 |
- **Hardware Optimization**: Custom CUDA kernels and neuromorphic support
|
226 |
+
- **Application Studies**: Real-world deployment case studies with evaluation
|
227 |
+
- **Academic Validation**: Peer review and publication processes
|
228 |
|
229 |
+
**Current Status**: Complete experimental framework requiring rigorous validation against established baselines before production deployment.
|
230 |
|
231 |
## Licensing
|
232 |
|
233 |
+
BitTransformerLM is available under a dual licensing scheme:
|
234 |
|
235 |
+
* **Open Source License:** AGPLv3 (see `LICENSE/LICENSE.txt`)
|
236 |
+
* **Commercial License:** Available by contacting **[email protected]**
|
237 |
|
238 |
+
Additional licensing documents in the `LICENSE/` directory:
|
|
|
|
|
|
|
|
|
|
|
239 |
|
240 |
+
* `COMMERCIAL_LICENSE.txt`: Information about commercial licensing options
|
241 |
+
* `DISCLAIMER.txt`: Important legal disclaimers and limitations
|
242 |
+
* `TRADEMARK_POLICY.txt`: Guidelines for using project trademarks
|
243 |
+
* `CONTRIBUTOR_LICENSE_AGREEMENT.txt`: Terms for contributors
|
244 |
+
|
245 |
+
For commercial use cases that require different licensing terms than AGPLv3, please contact **[email protected]** to discuss commercial licensing options.
|
246 |
|