TemporalSelfAttention - A Time-Biased Attention Module
Give Transformers a sense of time - not by scaling, but by structure.
Why?
Standard attention treats all tokens equally in time.
This works for syntax, but breaks for:
- Temporal event ordering
- Causal reasoning
- Timeline consistency
- Long-range narrative coherence
💡 Insight: Standard Transformers only simulate time through token position. We inject it structurally with a tiny inductive bias.
Core Equation
The time-aware attention score is computed as:

$$ e_{ij} = \frac{q_i^\top k_j}{\sqrt{d_k}} - \gamma \, f(\Delta t_{ij}), \qquad \Delta t_{ij} = t_i - t_j $$
Notation
| Symbol | Description |
|---|---|
| $e_{ij}$ | Attention score between the query at position $i$ and the key at position $j$ |
| $q_i$ | Query vector for position $i$ |
| $k_j$ | Key vector for position $j$ |
| $d_k$ | Dimension of the key vectors |
| $\gamma$ | Learnable time bias strength |
| $f(\cdot)$ | Time difference function (selected by `bias_type`) |
| $\Delta t_{ij}$ | Relative time difference $t_i - t_j$ |
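To make the equation concrete, here is a minimal single-head sketch in PyTorch. The function name `time_biased_attention` and the exact penalty shapes used for the `linear` and `gaussian` bias types are illustrative assumptions, not the module's actual implementation.

```python
import torch

def time_biased_attention(q, k, v, timestamps, gamma=1.0, bias_type="linear"):
    """Single-head attention with an additive time penalty on the scores.

    q, k, v: (B, T, d_k)    timestamps: (B, T)
    """
    d_k = q.size(-1)
    content = q @ k.transpose(-2, -1) / d_k ** 0.5             # q_i . k_j / sqrt(d_k)
    dt = timestamps.unsqueeze(-1) - timestamps.unsqueeze(-2)   # Δt_ij = t_i - t_j, shape (B, T, T)
    if bias_type == "linear":
        penalty = dt.abs()                    # grows linearly with |Δt|
    else:                                     # assumed "gaussian": smooth, saturating penalty
        penalty = 1.0 - torch.exp(-dt.pow(2))
    scores = content - gamma * penalty        # time-biased score e_ij
    weights = torch.softmax(scores, dim=-1)
    return weights @ v, weights


# Example: 2 sequences of 8 tokens, 16-dim heads, sorted timestamps.
q = k = v = torch.randn(2, 8, 16)
t = torch.sort(torch.rand(2, 8), dim=-1).values
out, w = time_biased_attention(q, k, v, t)
```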
How To Use
```python
import torch
from temporal_attention import TemporalSelfAttention

model = TemporalSelfAttention(
    embed_dim=64,
    num_heads=1,
    bias_type="linear",  # or "gaussian"
    gamma=1.0,
    causal=False,
)

# x: (B, T, D), timestamps: (B, T)
x = torch.randn(2, 16, 64)
timestamps = torch.arange(16, dtype=torch.float32).unsqueeze(0).expand(2, -1)
output, weights = model(x, timestamps)
```
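Because the bias depends on the timestamps themselves rather than on token positions, irregularly spaced events are handled naturally. The sketch below reuses only the constructor arguments and forward signature shown above; the example timestamps and the interpretation of `causal` are assumptions for illustration.

```python
import torch
from temporal_attention import TemporalSelfAttention

# Irregularly spaced events: the bias sees the actual time gaps, so a 1-second
# gap and a 1-hour gap between adjacent tokens are weighted differently.
model = TemporalSelfAttention(
    embed_dim=64,
    num_heads=1,
    bias_type="gaussian",
    gamma=1.0,
    causal=True,  # assumed to mask attention to later positions
)
x = torch.randn(1, 4, 64)
timestamps = torch.tensor([[0.0, 1.0, 2.0, 3602.0]])  # e.g. seconds since the first event
output, weights = model(x, timestamps)
```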