license: apache-2.0
metrics:
- mse
- mae
- mase
- wql
- crps
pipeline_tag: time-series-forecasting
datasets:
- thuml/UTSD
- Salesforce/lotsa_data
- autogluon/chronos_datasets
tags:
- time-series
- forecasting
- foundation-models
- pretrained-models
- time-series-foundation-models
Sundial
Sundial is a family of generative time series foundation models. The models make zero-shot predictions for both point and probabilistic forecasting.
The base version has 128M parameters and is pre-trained on 1 trillion time points. For more information, please see this paper and GitHub.
Figure 1. Overall architecture of Sundial. The input time series is divided into patch tokens, which are embedded from the original continuous values. The patch embeddings are fed into a decoder-only Transformer, modified for stability and speed, that learns token representations via causal self-attention. The model is optimized with our TimeFlow Loss, a parameterized loss function that models the per-token probability distribution conditioned on the learned representations and generates multiple plausible predictions under the flow-matching framework.
Quickstart
pip install transformers==4.40.1 # Use this version and Python 3.10 for stable compatibility
import torch
from transformers import AutoModelForCausalLM
# load the pretrained model
model = AutoModelForCausalLM.from_pretrained('thuml/sundial-base-128m', trust_remote_code=True)
# prepare input
batch_size, lookback_length = 1, 2880
seqs = torch.randn(batch_size, lookback_length)
# generate forecast
prediction_length = 96
num_samples = 20
output = model.generate(seqs, max_new_tokens=prediction_length, num_samples=num_samples)
print(output.shape) # generate 20 probable predictions
A notebook example is also provided here. Try it out!
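Since `generate` returns several sampled forecasts rather than a single trajectory, a common next step is to reduce them to a point forecast and quantile bands. The following is a minimal sketch, not part of the Sundial API: the helper `summarize_samples` is hypothetical, and it assumes the samples are laid out as (batch, num_samples, horizon), which should be checked against `output.shape` before use. A synthetic array stands in for the model output so the snippet runs without downloading the checkpoint.

```python
import numpy as np


def summarize_samples(samples, quantile_levels=(0.1, 0.5, 0.9)):
    """Reduce sampled forecasts to a mean forecast and quantile bands.

    Assumes `samples` has shape (batch, num_samples, horizon). This layout
    is an assumption about the model output, not something confirmed here.
    """
    samples = np.asarray(samples, dtype=np.float64)
    if samples.ndim != 3:
        raise ValueError(f"expected 3 dimensions, got shape {samples.shape}")
    # Average over the sample axis for a point forecast: (batch, horizon)
    mean = samples.mean(axis=1)
    # Empirical quantiles over the sample axis: (num_levels, batch, horizon)
    quantiles = np.quantile(samples, quantile_levels, axis=1)
    return mean, quantiles


# Synthetic stand-in for the model output: 1 series, 20 samples, 96 steps
rng = np.random.default_rng(0)
fake_output = rng.standard_normal((1, 20, 96))

mean, quantiles = summarize_samples(fake_output)
print(mean.shape, quantiles.shape)
```

With the real model, the same helper could be called as `summarize_samples(output.detach().cpu().numpy())`, provided the output matches the assumed layout.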
Evaluation
We evaluate performance on the following benchmarks:
We evaluate inference speed with the following time series foundation models:
We are actively working on it and would be glad to hear suggestions and noteworthy cases :)
Specification
- Architecture: Causal Transformer (Decoder-only)
- Pre-training Scale: 1032B time points
- Context Length: up to 2880
- One-Step Forecast Length: 720
- Patch Length: 16
- Parameter Count: 128M
- Number of Layers: 12
- Speedup with KV Cache & FlashAttention
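To relate the patch length to the context length listed above, the sketch below splits a series into consecutive patches and counts the resulting tokens. It is illustrative only: the helper `split_into_patches` is hypothetical, the patches are assumed to be non-overlapping, and how the actual model handles lengths that are not a multiple of 16 is not covered here (the sketch simply rejects them).

```python
def split_into_patches(series, patch_length=16):
    """Split a 1-D sequence into consecutive, non-overlapping patches.

    Non-overlapping patches are an assumption made for this illustration.
    """
    if len(series) % patch_length != 0:
        raise ValueError("series length must be a multiple of patch_length")
    return [
        series[start:start + patch_length]
        for start in range(0, len(series), patch_length)
    ]


context_length = 2880  # maximum context length from the specification
patches = split_into_patches(list(range(context_length)))

print(len(patches))     # 180 patch tokens for a full-length context
print(len(patches[0]))  # 16 time points per patch
```

Under the same assumption, the one-step forecast length of 720 corresponds to 45 patches of 16 points.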
Acknowledgments
This work was supported by the National Natural Science Foundation of China (62022050 and U2342217), the BNRist Innovation Fund (BNR2024RC01010), and the National Engineering Research Center for Big Data Software.
The model is mostly built from publicly available time series datasets on the Internet, which come from different research teams and providers. We sincerely thank all the individuals and organizations who contributed the data. Without their generous sharing, this model would not exist.
Citation
@article{liu2025sundial,
title={Sundial: A Family of Highly Capable Time Series Foundation Models},
author={Liu, Yong and Qin, Guo and Shi, Zhiyuan and Chen, Zhi and Yang, Caiyin and Huang, Xiangdong and Wang, Jianmin and Long, Mingsheng},
journal={arXiv preprint arXiv:2502.00816},
year={2025}
}
License
This model is licensed under the Apache-2.0 License.