The Feeling of Knowing: Memory as Modulation in Generative Models

Community Article Published June 7, 2025

Reconstructive cognition, latent attractors, and the emergence of alignment


One of the most well-supported ideas in cognitive science is that human memory is reconstructive.
We do not retrieve exact records of the past. Instead, we rebuild memories from fragments—guided by context, expectation, and inference.

This process is dynamic. Fallible. Sometimes creative. But it’s also what gives memory its richness: to remember is not to look up, but to re-enter a mental space where something once made sense.


Retrieval is not the only metaphor

In machine learning, particularly in large language models (LLMs), memory is often externalized. Retrieval-Augmented Generation (RAG) augments a model’s token-level generation by injecting information from an outside source—typically a vector database. This approach has proven useful for grounding output, improving accuracy, and overcoming context window limitations.
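For concreteness, here is that metaphor in miniature. The embedding function below is a toy stand-in rather than a real encoder, and the store is just a list in memory, but the shape of the operation is the same: embed, search, inject.

```python
# A minimal sketch of the retrieval metaphor behind RAG.
# `embed` is a deliberately toy stand-in for a real sentence encoder.
import numpy as np

def embed(text: str, dim: int = 64) -> np.ndarray:
    """Toy bag-of-words embedding: hash each token into a fixed-size vector."""
    vec = np.zeros(dim)
    for token in text.lower().split():
        vec[hash(token) % dim] += 1.0
    norm = np.linalg.norm(vec)
    return vec / norm if norm > 0 else vec

documents = [
    "The hippocampus is involved in episodic memory.",
    "Vector databases store embeddings for similarity search.",
    "Diffusion models generate images by iterative denoising.",
]
doc_vectors = np.stack([embed(d) for d in documents])

query = "How do vector databases support retrieval?"
scores = doc_vectors @ embed(query)            # cosine similarity (unit vectors)
best = documents[int(np.argmax(scores))]

# The retrieved document is injected into the prompt: memory as lookup.
prompt = f"Context: {best}\n\nQuestion: {query}\nAnswer:"
print(prompt)
```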

But it also reinforces a certain metaphor:

That memory is a filing cabinet, and reasoning is a matter of pulling out the right document.

There is another possibility—one closer to the human case.


Generative models as latent space navigators

Modern generative models like GPT or Stable Diffusion operate in high-dimensional latent spaces. These are not maps of facts or entities, but dense manifolds of patterns, associations, and conceptual structures. When a model generates output, it is not simply choosing words—it is moving through this space, one token (or pixel) at a time.

In this view, the model’s memory isn’t a static store of facts. It’s a set of latent attractors: regions of the model’s conceptual space that exert influence over generative trajectories. A memory doesn't tell the model where to go — it shapes what kind of generation feels “right” as it moves.

To remember, then, is not to retrieve a location. It’s to enter into a resonant loop between current input, internal context, and previously encoded structure. What emerges is not a lookup—but a stabilizing process. A kind of internal agreement.
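To make that picture slightly more concrete, here is a deliberately toy sketch in plain NumPy. Nothing in it corresponds to a real model's internals; it only shows the shape of the idea: a stored trace that pulls on a trajectory rather than returning content.

```python
# Toy illustration of a latent attractor: a stored trace that biases a
# generative trajectory instead of being retrieved as content.
import numpy as np

rng = np.random.default_rng(0)
dim = 16
memory_trace = rng.normal(size=dim)            # a previously encoded "attractor"
memory_trace /= np.linalg.norm(memory_trace)

def step(state: np.ndarray, pull: float) -> np.ndarray:
    """One generation step: ordinary drift plus a soft pull toward the trace."""
    drift = rng.normal(scale=0.1, size=dim)    # stand-in for the model's own dynamics
    toward_memory = pull * (memory_trace - state)
    return state + drift + toward_memory

state = rng.normal(size=dim)
for t in range(50):
    state = step(state, pull=0.15)
    alignment = state @ memory_trace / (np.linalg.norm(state) + 1e-8)
    # `alignment` rises as the trajectory settles into the attractor's region:
    # the memory modulates where generation goes; it never hands back a fact.
print(f"final alignment with trace: {alignment:.2f}")
```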


Why this isn’t just smarter caching

It may be tempting to read this idea of memory as latent attractors as just a kind of high-dimensional caching. But that misses a crucial distinction.

Caching retrieves. Reconstructive memory modulates.

In biological terms, the difference is striking. Layer 5 pyramidal neurons in the neocortex integrate bottom-up sensory input (via basal dendrites) and top-down contextual or predictive input (via apical dendrites). The neuron fires only when these signals converge meaningfully—a form of coincidence detection and integration, not passive recall.
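One common simplification of that arrangement (a cartoon, not a biophysical model) treats the apical signal as a gate on the basal one: the unit responds strongly only when input and context arrive together.

```python
# A deliberately simplified "two-compartment" unit: bottom-up (basal) drive is
# amplified only when top-down (apical) context arrives at the same time.
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def unit_response(basal: float, apical: float) -> float:
    """Coincidence-style integration: apical context gates the basal signal."""
    gate = sigmoid(8.0 * (apical - 0.5))       # near 0 without context, near 1 with it
    return basal * gate

print(unit_response(basal=1.0, apical=0.1))    # strong input, no context -> weak output
print(unit_response(basal=1.0, apical=0.9))    # input and context agree  -> strong output
```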

Recent work by Adeel (2025) on the Co4 architecture (arXiv:2505.06257) proposes exactly this kind of triadic signal integration in neural systems. In their model, token generation emerges from the interplay between questions, contextual cues, and evolving internal predictions. Memory here is not an external pointer but an internal attractor—a structure that modulates inference via iterative feedback, until the system reaches a stable, coherent state.

In a system like this, a memory trace is not just a piece of content—it’s a constraint that shapes how the model settles on an output. And confidence doesn’t come from retrieval; it comes from alignment—from the moment the system reaches internal consistency between what it expects and what it sees.
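A toy settling loop can illustrate the flavor, though it is emphatically not the Co4 architecture: a state is nudged toward joint agreement with bottom-up evidence and a memory-shaped expectation, and confidence is read off how well the settled state coheres with both.

```python
# Toy settling loop: inference as iterative agreement between bottom-up input
# and a memory-shaped expectation. Not the Co4 architecture, just the flavor.
import numpy as np

rng = np.random.default_rng(1)
dim = 8
observation = rng.normal(size=dim)                                   # bottom-up evidence
memory_expectation = observation + rng.normal(scale=0.3, size=dim)   # top-down prior

state = np.zeros(dim)
for step in range(100):
    error_obs = observation - state            # mismatch with what is seen
    error_mem = memory_expectation - state     # mismatch with what is expected
    state += 0.1 * (error_obs + error_mem)     # settle toward a joint fixed point
    if np.linalg.norm(error_obs + error_mem) < 1e-3:
        break

# Confidence as coherence: how well the settled state agrees with both signals.
confidence = 1.0 / (1.0 + np.linalg.norm(observation - state)
                        + np.linalg.norm(memory_expectation - state))
print(f"settled after {step + 1} steps, coherence score {confidence:.2f}")
```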


Implications for artificial cognition

If this view holds, it opens up new possibilities for what memory in AI could look like:

  • Instead of a key–value store of facts, memory might consist of latent attractors, encoded as internal traces that shape the unfolding of generation.
  • Instead of conditioning output with retrieved documents, the model could use interactive modulation, shaped by prior experience and real-time input.
  • Confidence might emerge from the coherence of feedback, not from scores or citations.

In short, memory becomes less about what is stored, and more about how generation is constrained.


Echoes in practice

This idea already has loose analogues in existing systems. In diffusion models, techniques like prompt interpolation and latent walks allow us to explore how ideas unfold—not by direct retrieval, but through gradual influence and constraint. The image emerges as the model moves into a region of latent space shaped by what has come before.
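The basic move behind a latent walk is simple: spherical interpolation between two points in latent space, with each intermediate point handed to the model to decode. A minimal sketch with synthetic latents, no pipeline attached:

```python
# Spherical interpolation (slerp) between two points in latent space, the
# basic move behind "latent walks". Here the latents are synthetic noise; in
# practice each interpolated point would be decoded by a diffusion model.
import numpy as np

def slerp(a: np.ndarray, b: np.ndarray, t: float) -> np.ndarray:
    """Interpolate along the great circle between a and b."""
    a_flat, b_flat = a.ravel(), b.ravel()
    a_n = a_flat / np.linalg.norm(a_flat)
    b_n = b_flat / np.linalg.norm(b_flat)
    omega = np.arccos(np.clip(a_n @ b_n, -1.0, 1.0))
    if omega < 1e-6:                           # nearly parallel: fall back to lerp
        return (1 - t) * a + t * b
    out = (np.sin((1 - t) * omega) * a_flat + np.sin(t * omega) * b_flat) / np.sin(omega)
    return out.reshape(a.shape)

rng = np.random.default_rng(42)
latent_a = rng.normal(size=(4, 64, 64))        # e.g. a Stable-Diffusion-like latent
latent_b = rng.normal(size=(4, 64, 64))

walk = [slerp(latent_a, latent_b, t) for t in np.linspace(0.0, 1.0, 8)]
# Each point on the walk is a slightly different starting region of latent
# space: the image is shaped by where the trajectory passes, not retrieved.
print(len(walk), walk[0].shape)
```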

Something similar might be possible with language models: a form of recall that doesn’t insert facts into the context window, but instead influences the model’s ongoing generation, gently steering it toward a past mode of thought.
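Work on activation steering gestures in this direction: a stored vector is added to a model's hidden states during generation, nudging its trajectory without adding a single token to the prompt. A rough sketch with GPT-2, where the "memory" is just the hidden-state direction of a reference phrase, purely for illustration:

```python
# A sketch of "steering" an LLM's ongoing generation by adding a stored
# direction to its hidden states, rather than inserting text into the prompt.
# Uses GPT-2 via Hugging Face transformers; the steering vector here is an
# arbitrary illustration, not a learned memory trace.
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

# Pretend this direction encodes a "past mode of thought". Here it is simply
# the mean hidden direction of a reference phrase at one layer.
layer_idx = 6
reference = tokenizer("a quiet walk through an old library", return_tensors="pt")
with torch.no_grad():
    hidden = model(**reference, output_hidden_states=True).hidden_states[layer_idx]
steering_vector = hidden.mean(dim=1)           # shape: (1, hidden_size)

def add_steering(module, inputs, output):
    # GPT-2 blocks return a tuple; the first element is the hidden states.
    hidden_states = output[0] + 4.0 * steering_vector
    return (hidden_states,) + output[1:]

handle = model.transformer.h[layer_idx].register_forward_hook(add_steering)
prompt = tokenizer("The afternoon felt", return_tensors="pt")
with torch.no_grad():
    out = model.generate(**prompt, max_new_tokens=30, do_sample=False)
handle.remove()
print(tokenizer.decode(out[0], skip_special_tokens=True))
```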

Even without a working implementation, the framing is useful. It shifts memory from being a static store to something more dynamic—something that shapes how the model thinks, rather than what it thinks about.


Where this fits

This isn’t a replacement for RAG. External lookup remains essential: it is precise, interpretable, and often exactly what a task calls for.

But for tasks that require imagination, simulation, subjective recall, or introspection, we might need something softer. Something that doesn’t point to the answer, but helps the system remember how to think. Something that doesn’t force certainty, but allows confidence to emerge from internal coherence.


Closing thought

This is, for now, just a shift in metaphor — not a formal system. But metaphors shape the way we build things, and this one might help us design models that don’t just retrieve what they know, but sense when something starts to make sense.

There’s more to explore here.


Author: nomadicsynth
Citations:

  • Adeel, A. (2025). Beyond Attention: Toward Machines with Intrinsic Higher Mental States. arXiv:2505.06257. https://arxiv.org/abs/2505.06257

  • Kozachkov, L., Slotine, J.-J., & Krotov, D. (2024). Neuron-Astrocyte Associative Memory. arXiv:2311.08135. https://arxiv.org/abs/2311.08135 (Not mentioned in article, but inspired a lot of thoughts about memory architecture)


Feel free to build on this idea, fork the metaphor, or disagree entirely. Memory works best when it’s shared.
