Yong99 committed · verified
Commit 8f6e3aa · 1 Parent(s): 665f60b

Update README.md

Files changed (1):
  1. README.md +5 -7
README.md CHANGED
@@ -21,16 +21,13 @@ tags:

# Sundial

- Sundial is a familiy of **generative** time series foundation models.
+ Sundial is a family of **generative** time series foundation models. The model can make zero-shot predictions for both **point** and **probabilistic** forecasting.

- The model can make zero-shot predictions for both **point** and **probabilistic** forecasting.
+ The base version is pre-trained on **1 trillion** time points with **128M** parameters. For more information, please see this [paper](https://arxiv.org/pdf/2502.00816) and [GitHub](https://github.com/thuml/Sundial).

- The base version is pre-trained on **1 trillion** time points with **128M** parameters,
-
- For more information, please see this [paper](https://arxiv.org/pdf/2502.00816) and [GitHub](https://github.com/thuml/Sundial).
-
- ![image/png](https://cdn-uploads.huggingface.co/production/uploads/64fbe24a2d20ced4e91de38a/dqZeI_loCzIfRrDTiUp1r.png)
+ ![image/png](https://cdn-uploads.huggingface.co/production/uploads/64fbe24a2d20ced4e91de38a/B5w-TNPnTBpChexIhsVOp.png)
+
+ Figure 1. Overall architecture of Sundial. The input time series is divided into patch tokens, which are embedded from the original continuous values. The patch embeddings are fed into a decoder-only Transformer, a stable and accelerated version that learns token representations via causal self-attention. The model is optimized using our TimeFlow Loss, a parameterized loss function that models the per-token probability distribution conditioned on the learned representations, and generates multiple plausible predictions under the flow-matching framework.

# Evaluation

@@ -83,6 +80,7 @@ A notebook example is also provided [here](https://github.com/thuml/Sundial/blob
* Patch Length: 16
* Parameter Count: 128M
* Number of Layers: 12
+ * Speedup with KV Cache & FlashAttention

## Acknowledgments
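The updated description advertises zero-shot forecasting, and the second hunk's context line points to a notebook example in the Sundial GitHub repository. As a rough, non-authoritative sketch of what that looks like: the snippet below assumes the base checkpoint is published as `thuml/sundial-base-128m` and loads through the standard Transformers `trust_remote_code` path, and it assumes `generate` accepts a forecast horizon and a sample count under the names shown; defer to the linked notebook for the exact call.

```python
# Hedged sketch of zero-shot forecasting with the base (128M) checkpoint.
# The repo id and the generate() keyword names are assumptions; the notebook
# linked from the README is the authoritative reference.
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "thuml/sundial-base-128m",  # assumed Hub id of the base version
    trust_remote_code=True,     # the model ships custom modeling code
)

context = torch.randn(1, 2880)  # (batch, lookback length) of raw float values
forecast = model.generate(
    context,
    max_new_tokens=96,          # forecast horizon in time points (assumed kwarg)
    num_samples=20,             # number of generated trajectories (assumed kwarg)
)
```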
 
 
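On the **point** versus **probabilistic** distinction drawn in the new description: with a generative model, both are usually read off the same set of sampled trajectories, either as a central summary or as quantile bands. A minimal sketch, assuming the samples are arranged as `(batch, num_samples, horizon)` (an assumed layout, not a documented guarantee):

```python
import torch

# Stand-in for generated sample paths: (batch, num_samples, horizon).
forecast = torch.randn(1, 20, 96)

point = forecast.median(dim=1).values         # point forecast: per-step median over samples
lower = forecast.quantile(0.10, dim=1)        # probabilistic forecast: 10% band
upper = forecast.quantile(0.90, dim=1)        # probabilistic forecast: 90% band
print(point.shape, lower.shape, upper.shape)  # each is torch.Size([1, 96])
```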
 
 
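The Figure 1 caption describes generation under the flow-matching framework: conditioned on a learned token representation, noise is transported into a plausible future patch, and repeating the integration yields multiple trajectories. The sketch below only illustrates that generic mechanism; the velocity network, hidden sizes, and the Euler integrator are placeholders, not the TimeFlow Loss implementation from the paper.

```python
# Generic conditional flow-matching sampling (illustrative placeholder, not Sundial's code).
import torch
import torch.nn as nn

PATCH_LEN, HIDDEN = 16, 64  # patch length matches the README config; HIDDEN is arbitrary

class VelocityField(nn.Module):
    """v_theta(x_t, t, h): predicts the instantaneous velocity of the sample x_t."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(PATCH_LEN + 1 + HIDDEN, 128),
            nn.SiLU(),
            nn.Linear(128, PATCH_LEN),
        )

    def forward(self, x_t, t, h):
        t = t.expand(x_t.shape[0], 1)  # broadcast the scalar time to the batch
        return self.net(torch.cat([x_t, t, h], dim=-1))

@torch.no_grad()
def sample_patch(v_theta, h, num_samples=20, steps=16):
    """Euler-integrate dx/dt = v_theta(x, t, h) from noise (t=0) to a prediction (t=1)."""
    h = h.repeat(num_samples, 1)             # share the conditioning across samples
    x = torch.randn(num_samples, PATCH_LEN)  # start every trajectory from Gaussian noise
    for i in range(steps):
        t = torch.full((1, 1), i / steps)
        x = x + v_theta(x, t, h) / steps     # one Euler step of size 1/steps
    return x                                 # (num_samples, PATCH_LEN) plausible future patches

h = torch.randn(1, HIDDEN)  # stand-in for one token representation from the Transformer
patches = sample_patch(VelocityField(), h)
print(patches.shape)  # torch.Size([20, 16])
```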