TimesFM
Overview
TimesFM (Time Series Foundation Model) is a pretrained time-series foundation model proposed in A decoder-only foundation model for time-series forecasting by Abhimanyu Das, Weihao Kong, Rajat Sen, and Yichen Zhou. It is a decoder-only model that takes non-overlapping patches of time-series data as input and predicts future values one output patch at a time, in an autoregressive fashion.
The abstract from the paper is the following:
Motivated by recent advances in large language models for Natural Language Processing (NLP), we design a time-series foundation model for forecasting whose out-of-the-box zero-shot performance on a variety of public datasets comes close to the accuracy of state-of-the-art supervised forecasting models for each individual dataset. Our model is based on pretraining a patched-decoder style attention model on a large time-series corpus, and can work well across different forecasting history lengths, prediction lengths and temporal granularities.
This model was contributed by kashif. The original code can be found here.
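To make the patching scheme concrete, the sketch below (plain PyTorch, not the model's internal code) shows how a 512-point context maps onto 16 non-overlapping input patches of length 32; the numbers are simply the config defaults.
import torch

patch_length = 32      # length of one input patch (config default)
horizon_length = 128   # length of one output patch (config default)
context = torch.sin(torch.linspace(0, 20, 512))  # dummy context of 512 points

# Non-overlapping patches: (num_patches, patch_length)
patches = context.reshape(-1, patch_length)
print(patches.shape)  # torch.Size([16, 32])

# Longer horizons are forecast autoregressively: each decode step emits one
# output patch of `horizon_length` points, which is appended to the context.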
To use the model:
import numpy as np
import torch
from transformers import TimesFmModelForPrediction

model = TimesFmModelForPrediction.from_pretrained(
    "google/timesfm-2.0-500m-pytorch",
    torch_dtype=torch.bfloat16,
    attn_implementation="sdpa",
    device_map="cuda" if torch.cuda.is_available() else None,
)

# Create dummy inputs
forecast_input = [
    np.sin(np.linspace(0, 20, 100)),
    np.sin(np.linspace(0, 20, 200)),
    np.sin(np.linspace(0, 20, 400)),
]
frequency_input = [0, 1, 2]

# Convert inputs to a sequence of tensors
forecast_input_tensor = [
    torch.tensor(ts, dtype=torch.bfloat16).to("cuda" if torch.cuda.is_available() else "cpu")
    for ts in forecast_input
]
frequency_input_tensor = torch.tensor(frequency_input, dtype=torch.long).to(
    "cuda" if torch.cuda.is_available() else "cpu"
)

# Get predictions from the pre-trained model
with torch.no_grad():
    outputs = model(past_values=forecast_input_tensor, freq=frequency_input_tensor, return_dict=True)
    point_forecast_conv = outputs.mean_predictions.float().cpu().numpy()
    quantile_forecast_conv = outputs.full_predictions.float().cpu().numpy()
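The full_predictions output stacks the point forecast together with the configured quantiles (by default [0.1, ..., 0.9]). Below is a hedged sketch for pulling out individual quantiles; it assumes the last dimension is ordered as [mean, q0.1, ..., q0.9], so check model.config.quantiles for your checkpoint.
# Assumption: full_predictions has shape (batch, horizon, 1 + len(quantiles)),
# with the mean first, followed by the quantiles in config order.
quantiles = model.config.quantiles                                    # [0.1, 0.2, ..., 0.9]
median_forecast = quantile_forecast_conv[..., 1 + quantiles.index(0.5)]
lower_forecast = quantile_forecast_conv[..., 1 + quantiles.index(0.1)]
upper_forecast = quantile_forecast_conv[..., 1 + quantiles.index(0.9)]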
TimesFmConfig
class transformers.TimesFmConfig
< source >( patch_length: int = 32 context_length: int = 512 horizon_length: int = 128 freq_size: int = 3 num_hidden_layers: int = 50 hidden_size: int = 1280 intermediate_size: int = 1280 head_dim: int = 80 num_attention_heads: int = 16 tolerance: float = 1e-06 rms_norm_eps: float = 1e-06 quantiles: typing.List[float] = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9] pad_val: float = 1123581321.0 attention_dropout: float = 0.0 use_positional_embedding: bool = False initializer_range: float = 0.02 min_timescale: int = 1 max_timescale: int = 10000 **kwargs )
Parameters
- patch_length (int, optional, defaults to 32) — The length of one patch in the input sequence.
- context_length (int, optional, defaults to 512) — The length of the input context.
- horizon_length (int, optional, defaults to 128) — The length of the prediction horizon.
- freq_size (int, optional, defaults to 3) — The number of frequency embeddings.
- num_hidden_layers (int, optional, defaults to 50) — Number of Transformer layers.
- hidden_size (int, optional, defaults to 1280) — Size of the hidden layers in the feed-forward networks.
- intermediate_size (int, optional, defaults to 1280) — Dimension of the MLP representations.
- head_dim (int, optional, defaults to 80) — Size of the key, query, value projections per attention head. The inner_dim of the projection layer will be defined as num_attention_heads * head_dim.
- num_attention_heads (int, optional, defaults to 16) — Number of attention heads for each attention layer in the Transformer encoder.
- tolerance (float, optional, defaults to 1e-06) — The tolerance for the quantile loss.
- rms_norm_eps (float, optional, defaults to 1e-06) — The epsilon used by the RMS normalization layers.
- quantiles (List[float], optional, defaults to [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]) — The quantiles to predict.
- pad_val (float, optional, defaults to 1123581321.0) — The value used to pad the predictions.
- attention_dropout (float, optional, defaults to 0.0) — The dropout probability for the attention scores.
- use_positional_embedding (bool, optional, defaults to False) — Whether to add positional embeddings.
- initializer_range (float, optional, defaults to 0.02) — The standard deviation of the truncated_normal_initializer for initializing all weight matrices.
- min_timescale (int, optional, defaults to 1) — The start of the geometric positional index. Determines the periodicity of the added signal.
- max_timescale (int, optional, defaults to 10000) — The end of the geometric positional index. Determines the frequency of the added signal.
This is the configuration class to store the configuration of a TimesFmModelForPrediction. It is used to instantiate a TimesFM model according to the specified arguments, defining the model architecture. Instantiating a configuration with the defaults will yield a configuration similar to that of the TimesFM google/timesfm-2.0-500m-pytorch architecture.
Configuration objects inherit from PretrainedConfig and can be used to control the model outputs. Read the documentation from PretrainedConfig for more information.
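As with other Transformers models, a randomly initialized model can be built directly from a configuration. A minimal sketch follows; the smaller values are illustrative only and keep hidden_size equal to num_attention_heads * head_dim, as the defaults do.
from transformers import TimesFmConfig, TimesFmModelForPrediction

# Defaults correspond to the google/timesfm-2.0-500m-pytorch architecture
config = TimesFmConfig()

# A deliberately small configuration with random weights, e.g. for quick tests
small_config = TimesFmConfig(
    num_hidden_layers=2,
    hidden_size=128,
    intermediate_size=128,
    num_attention_heads=8,
    head_dim=16,
)
model = TimesFmModelForPrediction(small_config)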
TimesFmModel
class transformers.TimesFmModel
< source >( config: TimesFmConfig )
Parameters
- config (TimesFmConfig) — Model configuration class with all the parameters of the model. Initializing with a config file does not load the weights associated with the model, only the configuration. Check out the from_pretrained() method to load the model weights.
The bare TimesFM Model outputting raw hidden-states without any specific head on top. This model inherits from PreTrainedModel. Check the superclass documentation for the generic methods the library implements for all its models (such as downloading or saving, resizing the input embeddings, pruning heads, etc.).
This model is also a PyTorch torch.nn.Module subclass. Use it as a regular PyTorch Module and refer to the PyTorch documentation for all matters related to general usage and behavior.
Patched time-series decoder without any specific output layer.
forward
< source >( past_values: Tensor past_values_padding: LongTensor freq: Tensor output_attentions: bool = False output_hidden_states: bool = False )
Parameters
- past_values — List of time series forecast contexts. Each context time series can be a torch tensor of potentially different context length.
- past_values_padding (torch.LongTensor of shape (batch_size, sequence_length)) — The padding indicator of the time series.
- freq — Frequency of each context time series in the inputs: 0 for high frequency (default), 1 for medium, and 2 for low.
- output_attentions (bool, optional) — Whether or not to return the attentions tensors of all attention layers. See attentions under returned tensors for more detail.
- output_hidden_states (bool, optional) — Whether or not to return the hidden states of all layers. See hidden_states under returned tensors for more detail.
The TimesFmModel forward method overrides the __call__ special method.
Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the pre- and post-processing steps while the latter silently ignores them.
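Below is a hedged sketch of calling the bare decoder directly. The shapes are assumptions (past_values and past_values_padding of shape (batch_size, context_length) with the context length a multiple of patch_length, and freq of shape (batch_size, 1)); most use cases are better served by TimesFmModelForPrediction, which handles this preprocessing.
import torch
from transformers import TimesFmConfig, TimesFmModel

config = TimesFmConfig()
model = TimesFmModel(config)  # randomly initialized, for illustration only

batch_size = 2
past_values = torch.randn(batch_size, config.context_length)          # 512 points = 16 patches of 32
past_values_padding = torch.zeros(batch_size, config.context_length)  # 0 = observed value (assumed convention)
freq = torch.zeros(batch_size, 1, dtype=torch.long)                   # 0 = high frequency

with torch.no_grad():
    outputs = model(past_values=past_values, past_values_padding=past_values_padding, freq=freq)
print(outputs.last_hidden_state.shape)  # expected: (batch_size, num_patches, hidden_size)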
TimesFmModelForPrediction
class transformers.TimesFmModelForPrediction
< source >( config: TimesFmConfig )
TimesFM model for quantile and mean prediction.
forward
< source >( past_values: typing.Sequence[torch.Tensor] freq: typing.Optional[typing.Sequence[typing.Union[torch.Tensor, int]]] = None window_size: typing.Optional[int] = None future_values: typing.Optional[torch.Tensor] = None forecast_context_len: typing.Optional[int] = None return_forecast_on_context: bool = False truncate_negative: bool = False output_attentions: typing.Optional[bool] = None output_hidden_states: typing.Optional[bool] = None ) → transformers.models.timesfm.modeling_timesfm.TimesFmOutputForPrediction or tuple(torch.FloatTensor)
Parameters
- past_values — List of time series forecast contexts. Each context time series can be a torch tensor of potentially different context length.
- freq — Frequency of each context time series in the inputs: 0 for high frequency (default), 1 for medium, and 2 for low.
- window_size (int, optional) — Window size for trend + residual decomposition. If None, no decomposition is done.
- future_values (torch.Tensor, optional) — Optional future time series values used for loss computation.
- forecast_context_len (int, optional) — Optional maximum context length.
- return_forecast_on_context (bool, optional) — If True, also returns the forecast on the context when available, i.e. after the first input patch.
- truncate_negative (bool, optional) — Truncate to only non-negative values if any of the contexts have non-negative values, otherwise do nothing.
- output_attentions (bool, optional) — Whether or not to return the attentions tensors of all attention layers. See attentions under returned tensors for more detail.
- output_hidden_states (bool, optional) — Whether or not to return the hidden states of all layers. See hidden_states under returned tensors for more detail.
Returns
transformers.models.timesfm.modeling_timesfm.TimesFmOutputForPrediction or tuple(torch.FloatTensor)
A transformers.models.timesfm.modeling_timesfm.TimesFmOutputForPrediction or a tuple of torch.FloatTensor (if return_dict=False is passed or when config.return_dict=False) comprising various elements depending on the configuration (TimesFmConfig) and inputs.
- mean_predictions (torch.Tensor of shape (batch_size, sequence_length)) — The mean predictions of the time series.
- full_predictions (torch.Tensor of shape (batch_size, sequence_length)) — The full predictions of the time series, including the mean and the quantiles.
- loss (torch.Tensor of shape (1,), optional, returned when future_values is provided) — The loss of the TimesFM model.
The TimesFmModelForPrediction forward method overrides the __call__ special method.
Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the pre- and post-processing steps while the latter silently ignores them.
Example:
>>> import torch
>>> from transformers import TimesFmModelForPrediction

>>> model = TimesFmModelForPrediction.from_pretrained("google/timesfm-2.0-500m-pytorch")
>>> forecast_input = [torch.linspace(0, 20, 100).sin(), torch.linspace(0, 20, 200).sin(), torch.linspace(0, 20, 400).sin()]
>>> frequency_input = torch.tensor([0, 1, 2], dtype=torch.long)

>>> # Generate forecasts
>>> with torch.no_grad():
...     outputs = model(past_values=forecast_input, freq=frequency_input, return_dict=True)

>>> point_forecast_conv = outputs.mean_predictions
>>> quantile_forecast_conv = outputs.full_predictions
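When future_values is supplied, the forward pass also returns a loss that can be used for fine-tuning. The sketch below is hedged: it assumes targets of shape (batch_size, config.horizon_length), and the data and training setup are illustrative, not a reference recipe.
>>> import torch
>>> from transformers import TimesFmModelForPrediction

>>> model = TimesFmModelForPrediction.from_pretrained("google/timesfm-2.0-500m-pytorch")

>>> # Contexts as a list of 1D tensors, targets stacked into (batch_size, horizon_length)
>>> past_values = [torch.linspace(0, 20, 512).sin(), torch.linspace(0, 30, 512).sin()]
>>> future_values = torch.stack(
...     [torch.linspace(20, 25, model.config.horizon_length).sin(),
...      torch.linspace(30, 37, model.config.horizon_length).sin()]
... )

>>> outputs = model(past_values=past_values, freq=[0, 0], future_values=future_values, return_dict=True)
>>> loss = outputs.loss  # returned because future_values was provided
>>> loss.backward()      # a standard PyTorch optimizer step would follow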