Model Card for TAMELM-AFMoER (1B) — Blackhole Rope Expansion

Time-Aware Model of Emergence / Adaptive Fuzzy Model of Expert Routers

With Blackhole Rope Dynamics


Research Vision

This model builds on the original 421M TAMELM-AFMoER by introducing the Blackhole Rope (BHR) mechanism—a dynamic field-based routing system designed to stabilize, amplify, and concentrate information flow across multiple temporal scales.

While the original AFMoER established efficiency in routing-based intelligence, the BHR variant explores how structured gravitational-like attractors can further enhance reasoning depth without exponential increases in computation or parameters.


What is the Blackhole Rope?

The Blackhole Rope is a symplectic, multiscale vortex mechanism inside AFMoER that:

  • Anchors routing decisions: Information tokens fall into a “gravitational well” that pulls semantically coherent content into alignment.
  • Stabilizes multiscale clocks: Prevents runaway dynamics between fast, mid, and slow timescales by acting as a “tether” between them.
  • Amplifies discrepancy gradients: Uses controlled energy amplification (θ, α, β parameters) to magnify meaningful discrepancies, making weak reasoning signals more detectable.
  • Preserves boundedness: Even under strong amplification, the rope ties dynamics back to stable attractors, avoiding mode collapse or instability.

Metaphorically:
If AFMoER routes are like neuronal pathways, the Blackhole Rope is the myelinated tether that keeps them from dispersing into noise—while also letting them “fall deeper” into coherent reasoning attractors.
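The parameters named above (θ, α, β, energy amplification) suggest an update of roughly the following shape. This is a minimal PyTorch sketch under assumed functional forms: `blackhole_rope_step`, the tanh squashing, and the linear tether are illustrative, not the released implementation, and only the energy-amplification value (1e4) is taken from this card; the θ, α, β defaults are placeholders.

```python
import torch

def blackhole_rope_step(h, attractor, theta=1.0, alpha=0.5, beta=0.1,
                        energy_amp=1e4):
    """Hypothetical BHR update: amplify discrepancy signals while tethering
    the state back to a stable attractor so dynamics stay bounded.

    theta/alpha/beta defaults are placeholders; energy_amp matches the card.
    """
    # Discrepancy between the hidden state and the attractor (the "well").
    discrepancy = h - attractor
    # Controlled energy amplification: magnify weak signals, then squash
    # through tanh so the update stays bounded even at amp = 1e4.
    amplified = torch.tanh(theta * energy_amp * discrepancy)
    # Rope tether: a restoring pull toward the attractor (boundedness).
    pull = -beta * discrepancy
    return h + alpha * amplified + pull

h = torch.randn(4, 512)           # batch of hidden states (embed dim 512)
attractor = torch.zeros(4, 512)   # stable attractor ("gravitational well")
print(blackhole_rope_step(h, attractor).shape)  # torch.Size([4, 512])
```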


Model Details

  • Parameters: ~1B (2.5× scale-up from base 421M)
  • Architecture: TAMELM with Adaptive Vortex + Blackhole Rope (AFMoER-BHR)
  • Layers: 26
  • Embed dim: 512
  • Phase dim: 64
  • Experts: 16 (sparse routing, expert dim 128)
  • Scales: 3 (fast, mid, slow; dt = 0.1 / 0.02 / 0.0005)
  • Energy amplification: 1e4
  • Routing entropy regularization: λ = 0.01
  • Discrepancy & quantum terms: λ_discrepancy = 0.3, λ_quantum = 0.001
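For reference, the settings above fit into a single configuration object. A sketch follows; the field names are assumptions, but the values are the ones listed on this card.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class AFMoERBHRConfig:
    """Hypothetical config mirroring the hyperparameters listed above."""
    n_layers: int = 26
    embed_dim: int = 512
    phase_dim: int = 64
    n_experts: int = 16
    expert_dim: int = 128
    n_scales: int = 3                               # fast, mid, slow
    dts: List[float] = field(default_factory=lambda: [0.1, 0.02, 0.0005])
    energy_amp: float = 1e4
    lambda_entropy: float = 0.01                    # routing entropy reg.
    lambda_discrepancy: float = 0.3
    lambda_quantum: float = 0.001
```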

Training Regime:

  • Datasets:
    • O1-OPEN/OpenO1-SFT (~500k tokens)
    • WeMake/Intelligent-Content-Understanding (~500k tokens)
  • Batch sizes: 8, 16, 32
  • Sequence lengths: 512 and 1024
  • Optimizer: AdamW (lr = 5e-4)
  • Device: CPU-only (FP32)
  • Total tokens trained: ~1M
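A runnable skeleton of this regime (AdamW at lr = 5e-4, CPU-only FP32, batch size 16 at sequence length 1024); the model and objective are placeholders so the sketch executes as written, not the actual training script.

```python
import torch

device = torch.device("cpu")                    # CPU-only training, FP32
model = torch.nn.Linear(512, 512).to(device)    # placeholder for the model
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-4)

batch_size, seq_len = 16, 1024                  # one of the listed settings
for step in range(3):
    x = torch.randn(batch_size, seq_len, 512, device=device)
    loss = model(x).pow(2).mean()               # placeholder objective
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    print(f"step {step}: loss {loss.item():.4f}")
```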

Key Innovations vs. Base TAMELM

  1. Blackhole Rope Stabilization

    • Adds controlled attractors to prevent chaotic drift across temporal scales.
    • Increases reasoning persistence by keeping token trajectories bound.
  2. Adaptive Vortex Dynamics

    • Multi-phase oscillators (fast/mid/slow) simulate different “thinking speeds.”
    • Rope stabilizes resonance between them.
  3. Energy Amplification Without Instability

    • By tying amplification to rope-bound attractors, the model can magnify weak discrepancies without divergence.
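A sketch of innovations 1 and 2, assuming simple phase oscillators: the three dt values come from this card, while the mean-phase "rope" tether and its coefficient are illustrative assumptions.

```python
import torch

def multiscale_clock_step(phases, dts=(0.1, 0.02, 0.0005), omega=1.0,
                          tether=0.05):
    """Advance fast/mid/slow phase oscillators one step, with a 'rope'
    coupling that pulls each clock toward the shared mean phase."""
    mean_phase = phases.mean(dim=0, keepdim=True)        # shared reference
    new = []
    for i, dt in enumerate(dts):
        drift = omega * dt                               # each clock's speed
        pull = tether * (mean_phase - phases[i:i + 1])   # rope keeps coherence
        new.append(phases[i:i + 1] + drift + pull)
    return torch.cat(new, dim=0)

phases = torch.zeros(3, 64)   # three scales, phase dim 64
for _ in range(100):
    phases = multiscale_clock_step(phases)
print(phases.shape)           # torch.Size([3, 64])
```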

Expert Dynamics

TAMELM-AFMoER (1B) employs 16 experts under sparse routing. A typical forward pass engages 4 of the 16 experts per step, giving the model partial but diverse expert exposure on each pass.

  • Early specialization: The first 5-8 experts learn quickly, handling common reasoning and language tasks with efficiency.
  • Adaptive load balancing: As training progresses and the model begins to plateau, the remaining experts pick up the slack, activating more frequently to refine complex or underrepresented patterns.
  • Emergent coordination: This staged progression allows the system to avoid overfitting early while ensuring the broader expert pool contributes meaningfully to long-term generalization.

The result is a model where expert specialization unfolds in phases, guided by both the Blackhole Rope stabilization and routing entropy regularization.
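A sketch of this routing pattern (4 of 16 experts engaged per step, entropy regularization at λ = 0.01); the router's exact form in TAMELM-AFMoER is an assumption here.

```python
import torch
import torch.nn.functional as F

def route_top4(router_logits, lam_entropy=0.01):
    """Sparse routing over 16 experts: keep the top 4 per token, plus an
    entropy bonus that keeps routing soft enough for late-specializing
    experts to pick up load as training plateaus."""
    probs = F.softmax(router_logits, dim=-1)            # (tokens, 16)
    top_p, top_idx = probs.topk(4, dim=-1)              # engage 4 experts
    top_p = top_p / top_p.sum(dim=-1, keepdim=True)     # renormalize weights
    # Entropy regularizer: minimizing aux_loss maximizes routing entropy,
    # spreading load across the expert pool.
    entropy = -(probs * probs.clamp_min(1e-9).log()).sum(dim=-1).mean()
    aux_loss = -lam_entropy * entropy
    return top_idx, top_p, aux_loss

logits = torch.randn(8, 16)             # 8 tokens, 16 experts
idx, w, aux = route_top4(logits)
print(idx.shape, w.shape, aux.item())   # torch.Size([8, 4]) torch.Size([8, 4])
```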


Training Efficiency Achievements

  • Tokens: ~1M (vs. billions typical for 1B models)
  • Training Time: Surprisingly, despite being larger than the 421M base, this 1B model trained significantly faster.
    • At sequence length 1024 with batch size 16, per-step time dropped to 3–7 seconds in FP32, versus ~22 seconds per step for the smaller model.
  • Loss Profile: Training loss currently sits at 3.8; the model is still in the pretraining phase, but training dynamics are already stable.
  • Sample Efficiency: Maintains a ~1000× reduction in the tokens required for reasoning emergence.

Next Steps in Efficient AI

This model sets the stage for:

  1. Exploring rope tension tuning (varying α, β, θ) to balance exploration vs. stability.
  2. Combining BHR with discrepancy calculus for hybrid emergence frameworks.
  3. Investigating sub-100M parameter BHR variants for mobile/edge deployment.
  4. Experimenting with quantum discrepancy extensions in rope-stabilized spaces.

Environmental & Accessibility Impact

  • CPU-only training: Accessible to researchers without GPU clusters.
  • Low energy footprint: Maintains sustainable training practices even at 1B scale.
  • Democratization: Expands the AFMoER vision to more powerful variants while preserving accessibility.

Citation

If you use this model, please cite:
Colca Jr., R. S. (2025). TAMELM-AFMoER with Blackhole Rope: Efficient Cognitive Emergence via Symplectic Routing.

The mathematics behind the model can be found at: https://www.researchgate.net/publication/395539824_Negative-Space_Mathematics_A_New_Approach_for_Geometric_Computation_in_the_All-Negative_Orthant

I am the creator of the math and the models.
