Yong99 committed · verified
Commit 8f6e3aa · 1 Parent(s): 665f60b

Update README.md

Files changed (1):
  1. README.md +5 -7
README.md CHANGED
@@ -21,16 +21,13 @@ tags:

# Sundial

- Sundial is a familiy of **generative** time series foundation models.
+ Sundial is a family of **generative** time series foundation models. The model can make zero-shot predictions for both **point** and **probabilistic** forecasting.

- The model can make zero-shot predictions for both **point** and **probabilistic** forecasting.
+ The base version is pre-trained on **1 trillion** time points with **128M** parameters. For more information, please see this [paper](https://arxiv.org/pdf/2502.00816) and [GitHub](https://github.com/thuml/Sundial).

- The base version is pre-trained on **1 trillion** time points with **128M** parameters,
-
- For more information, please see this [paper](https://arxiv.org/pdf/2502.00816) and [GitHub](https://github.com/thuml/Sundial).
-
- ![image/png](https://cdn-uploads.huggingface.co/production/uploads/64fbe24a2d20ced4e91de38a/dqZeI_loCzIfRrDTiUp1r.png)
+ ![image/png](https://cdn-uploads.huggingface.co/production/uploads/64fbe24a2d20ced4e91de38a/B5w-TNPnTBpChexIhsVOp.png)
+
+ Figure 1. Overall architecture of Sundial. The input time series is divided into patch tokens, which are embedded from the original continuous values. The patch embeddings are fed into a decoder-only Transformer, a stable and accelerated version that learns token representations via causal self-attention. The model is optimized using our TimeFlow Loss, a parameterized loss function that models the per-token probability distribution conditioned on the learned representations, and generates multiple plausible predictions under the flow-matching framework.

# Evaluation

@@ -83,6 +80,7 @@ A notebook example is also provided [here](https://github.com/thuml/Sundial/blob
* Patch Length: 16
* Parameter Count: 128M
* Number of Layers: 12
+ * Speedup with KV Cache & FlashAttention

## Acknowledgments
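The updated description advertises zero-shot forecasting, and the second hunk's context line points to a notebook example in the Sundial GitHub repository. As a rough, non-authoritative sketch of what that looks like: the snippet below assumes the base checkpoint is published as `thuml/sundial-base-128m` and loads through the standard Transformers `trust_remote_code` path, and it assumes `generate` accepts a forecast horizon and a sample count under the names shown; defer to the linked notebook for the exact call.

```python
# Hedged sketch of zero-shot forecasting with the base (128M) checkpoint.
# The repo id and the generate() keyword names are assumptions; the notebook
# linked from the README is the authoritative reference.
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "thuml/sundial-base-128m",  # assumed Hub id of the base version
    trust_remote_code=True,     # the model ships custom modeling code
)

context = torch.randn(1, 2880)  # (batch, lookback length) of raw float values
forecast = model.generate(
    context,
    max_new_tokens=96,          # forecast horizon in time points (assumed kwarg)
    num_samples=20,             # number of generated trajectories (assumed kwarg)
)
```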
 
 
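On the **point** versus **probabilistic** distinction drawn in the new description: with a generative model, both are usually read off the same set of sampled trajectories, either as a central summary or as quantile bands. A minimal sketch, assuming the samples are arranged as `(batch, num_samples, horizon)` (an assumed layout, not a documented guarantee):

```python
import torch

# Stand-in for generated sample paths: (batch, num_samples, horizon).
forecast = torch.randn(1, 20, 96)

point = forecast.median(dim=1).values         # point forecast: per-step median over samples
lower = forecast.quantile(0.10, dim=1)        # probabilistic forecast: 10% band
upper = forecast.quantile(0.90, dim=1)        # probabilistic forecast: 90% band
print(point.shape, lower.shape, upper.shape)  # each is torch.Size([1, 96])
```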
 
 
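The Figure 1 caption describes generation under the flow-matching framework: conditioned on a learned token representation, noise is transported into a plausible future patch, and repeating the integration yields multiple trajectories. The sketch below only illustrates that generic mechanism; the velocity network, hidden sizes, and the Euler integrator are placeholders, not the TimeFlow Loss implementation from the paper.

```python
# Generic conditional flow-matching sampling (illustrative placeholder, not Sundial's code).
import torch
import torch.nn as nn

PATCH_LEN, HIDDEN = 16, 64  # patch length matches the README config; HIDDEN is arbitrary

class VelocityField(nn.Module):
    """v_theta(x_t, t, h): predicts the instantaneous velocity of the sample x_t."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(PATCH_LEN + 1 + HIDDEN, 128),
            nn.SiLU(),
            nn.Linear(128, PATCH_LEN),
        )

    def forward(self, x_t, t, h):
        t = t.expand(x_t.shape[0], 1)  # broadcast the scalar time to the batch
        return self.net(torch.cat([x_t, t, h], dim=-1))

@torch.no_grad()
def sample_patch(v_theta, h, num_samples=20, steps=16):
    """Euler-integrate dx/dt = v_theta(x, t, h) from noise (t=0) to a prediction (t=1)."""
    h = h.repeat(num_samples, 1)             # share the conditioning across samples
    x = torch.randn(num_samples, PATCH_LEN)  # start every trajectory from Gaussian noise
    for i in range(steps):
        t = torch.full((1, 1), i / steps)
        x = x + v_theta(x, t, h) / steps     # one Euler step of size 1/steps
    return x                                 # (num_samples, PATCH_LEN) plausible future patches

h = torch.randn(1, HIDDEN)  # stand-in for one token representation from the Transformer
patches = sample_patch(VelocityField(), h)
print(patches.shape)  # torch.Size([20, 16])
```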