ITFormer-0.5B: Bridging Time Series Signals and Natural Language for Multi-Modal QA
Model Overview
ITFormer-0.5B is a state-of-the-art multi-modal framework that bridges time-series data and natural language for dynamic question answering (QA). Built on the Instruct Time Transformer (ITFormer) architecture, the model is designed to handle complex temporal-textual QA across multiple tasks. It is trained on the EngineMT-QA dataset, which focuses on real-world aircraft engine operational and maintenance data.
ITFormer-0.5B integrates time-series data (such as engine sensor readings) with natural language queries, enabling intelligent, real-time analysis across four task types: understanding, perception, reasoning, and decision-making.
Key Features
- Multi-modal Fusion: Combines time-series sensor data with textual input for cross-modal reasoning.
- Efficient and Scalable: Achieves high performance while adding fewer than 1% additional trainable parameters on top of the frozen LLM backbone.
- State-of-the-art Performance: ITFormer-0.5B outperforms existing models in temporal-textual QA tasks, showing improvements in accuracy, BLEU, and F1 scores across all tasks.
- Domain-Specific Application: Trained on the EngineMT-QA dataset, which consists of questions related to aircraft engine performance, faults, and maintenance.
Model Components
ITFormer-0.5B integrates several key components to enhance multi-modal reasoning:
- Time Token Position Encoding (TPE): Encodes temporal, channel, and segment-level position information for better time-series representation.
- Learnable Instruct Tokens (LIT): Helps align the temporal features with task-specific queries, enabling efficient multi-modal interaction.
- Instruct Time Attention (ITA): A mechanism that dynamically aligns and fuses temporal features with textual queries.
- Time Token as Language (TAL): Represents temporal features as language-compatible tokens, allowing for smooth integration with LLMs.
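To make the interplay of these components concrete, the following is a minimal, illustrative sketch of an Instruct-Time-Attention-style fusion block in PyTorch: a small set of learnable instruct tokens attends over the encoded time-series features and produces language-compatible time tokens. The class name, dimensions, and structure are assumptions for illustration only and do not reproduce the released ITFormer implementation.

# Illustrative sketch only: simplified cross-attention fusion in the spirit of
# Instruct Time Attention (ITA). Names and dimensions are assumptions, not the
# released ITFormer code.
import torch
import torch.nn as nn

class InstructTimeAttentionSketch(nn.Module):
    def __init__(self, d_model=512, n_heads=8, n_instruct_tokens=16):
        super().__init__()
        # Learnable Instruct Tokens (LIT): task-level queries that pull
        # information out of the temporal feature sequence.
        self.instruct_tokens = nn.Parameter(torch.randn(n_instruct_tokens, d_model))
        self.cross_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm = nn.LayerNorm(d_model)

    def forward(self, time_features):
        # time_features: (batch, seq_len, d_model) from the time-series encoder,
        # already carrying temporal/channel/segment position information (TPE).
        batch = time_features.size(0)
        queries = self.instruct_tokens.unsqueeze(0).expand(batch, -1, -1)
        fused, _ = self.cross_attn(queries, time_features, time_features)
        # The fused tokens act as language-compatible "time tokens" (TAL) that
        # can be prepended to the LLM's text embeddings.
        return self.norm(fused + queries)

# Example: a batch of 4 signals, 256 time steps, projected to d_model=512
fused_tokens = InstructTimeAttentionSketch()(torch.randn(4, 256, 512))
print(fused_tokens.shape)  # torch.Size([4, 16, 512])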
Model Architecture
ITFormer-0.5B integrates a time-series encoder with a frozen large language model (LLM). The encoder extracts semantic features from the time-series data, and the LLM processes the corresponding textual query. The fused representation is then passed through the LLM’s decoder to generate the final answer.
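At a high level, this flow can be pictured as prepending the language-aligned time tokens to the embedded text query and letting the frozen LLM decode the answer. The sketch below illustrates that idea with the Hugging Face API; the stand-in backbone (Qwen/Qwen2.5-0.5B), the token shapes, and the use of inputs_embeds are assumptions for illustration, not the project's actual interface.

# Illustrative flow only: prepend fused time tokens to the embedded text query
# and run a frozen decoder-only LLM. Backbone name and shapes are placeholders.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

llm_name = "Qwen/Qwen2.5-0.5B"           # any ~0.5B decoder-only LLM as a stand-in
tokenizer = AutoTokenizer.from_pretrained(llm_name)
llm = AutoModelForCausalLM.from_pretrained(llm_name)
llm.requires_grad_(False)                 # the LLM stays frozen; only adapters train

# Fused time tokens from the encoder and attention module (see the sketch above)
hidden = llm.get_input_embeddings().embedding_dim
time_tokens = torch.randn(1, 16, hidden)

query = "What is the health status of the HPT?"
text_ids = tokenizer(query, return_tensors="pt").input_ids
text_embeds = llm.get_input_embeddings()(text_ids)

# Concatenate time tokens with the embedded query and run the frozen decoder
inputs_embeds = torch.cat([time_tokens, text_embeds], dim=1)
outputs = llm(inputs_embeds=inputs_embeds)
print(outputs.logits.shape)  # (1, 16 + number of text tokens, vocab_size)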
Tasks
ITFormer-0.5B is capable of answering questions across four primary tasks:
- Understanding: Interprets sensor data to understand engine status and behavior.
- Perception: Detects and diagnoses faults in the engine components based on time-series data.
- Reasoning: Makes predictions about future engine health and identifies degradation trends.
- Decision-Making: Suggests actionable maintenance strategies based on predicted trends and failure probabilities.
Dataset: EngineMT-QA
ITFormer-0.5B is trained on the EngineMT-QA dataset, a large-scale, multi-task dataset specifically created for time-series question answering. The dataset contains over 110k question-answer pairs based on real-world engine operational and maintenance scenarios.
Example Tasks from EngineMT-QA:
- Understanding: What does the increase in temperature at the LPT outlet indicate in the provided engine signal?
- Perception: What is the health status of the High-Pressure Turbine (HPT) in the given engine signal?
- Reasoning: Given the engine signal across multiple cycles, what is the predicted probability of failure?
- Decision-Making: Based on the engine signal data, what immediate actions should be taken to address observed issues?
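Conceptually, each sample pairs a multi-channel engine signal with a task-specific question and its answer. The record below is purely illustrative; the field names and file format are assumptions, not the actual EngineMT-QA schema.

# Illustrative only: one way a temporal-textual QA pair could be organized.
example_record = {
    "signal": "engine_signal_0001.npy",   # multi-channel sensor readings over operating cycles (assumed format)
    "task": "perception",
    "question": "What is the health status of the High-Pressure Turbine (HPT) in the given engine signal?",
    "answer": "<free-text description of the HPT's condition>",
}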
Model Performance
ITFormer-0.5B Achieves State-of-the-Art Results
ITFormer-0.5B outperforms existing models across all QA tasks, achieving:
- Understanding: ROUGE-L of 81.22, BLEU of 69.23
- Decision-Making: ROUGE-L of 75.42, BLEU of 54.50
- Perception: F1 score of 79.26
- Reasoning: Accuracy of 73.81
These results highlight the model's capability to handle complex temporal-textual reasoning and its robustness across different task types.
Usage
Installation
To use the ITFormer-0.5B model, install the transformers library and the huggingface_hub client:
pip install transformers
pip install huggingface_hub
Loading the Model
You can load the model using the transformers library:
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the model and tokenizer from the Hugging Face Hub (replace the repository
# name with the actual model path; a custom architecture may also require
# trust_remote_code=True)
model_name = "your_username/ITFormer-0.5B"
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Example text query
query = "What is the status of the High-Pressure Turbine (HPT)?"

# Tokenize the query and generate an answer
inputs = tokenizer(query, return_tensors="pt")
outputs = model.generate(inputs["input_ids"])

# Decode the generated tokens into text
decoded_output = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(decoded_output)
Example Question-Answering
To perform multi-modal question answering, ITFormer-0.5B takes both a time-series signal and a textual query. The text side is handled exactly as in the loading example; how the preprocessed signal is passed to the model depends on the released implementation, so the signal-related lines below are placeholders. For instance:
# Example question about an engine signal
time_series_data = "path_to_time_series_data"  # Example: "engine_signal_data.csv"
query = "What is the condition of the engine?"

# Preprocess the time-series file into the tensor format expected by the
# time-series encoder (follow the model's own preprocessing code), then
# tokenize the textual query as before
inputs = tokenizer(query, return_tensors="pt")

# Generate an answer; for true multi-modal inference, the preprocessed signal
# must be supplied alongside the text according to the model's interface
outputs = model.generate(inputs["input_ids"])

# Decode and print the result
decoded_output = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(decoded_output)
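The exact preprocessing pipeline ships with the model's own code. As a rough illustration only, under assumed conventions (a CSV with one column per sensor channel, fixed-length windows, per-channel normalization), preparing a signal tensor might look like this:

# Rough illustration of preparing a sensor signal for the time-series encoder.
# Column layout, window length, and normalization are assumptions; follow the
# preprocessing shipped with the model for real use.
import numpy as np
import pandas as pd
import torch

def load_engine_signal(csv_path, window_len=256):
    df = pd.read_csv(csv_path)              # rows = time steps, columns = sensor channels
    signal = df.to_numpy(dtype=np.float32)
    # Per-channel standardization
    signal = (signal - signal.mean(axis=0)) / (signal.std(axis=0) + 1e-8)
    # Keep the most recent window and add a batch dimension: (1, window_len, n_channels)
    signal = signal[-window_len:]
    return torch.from_numpy(signal).unsqueeze(0)

# signal_tensor = load_engine_signal("engine_signal_data.csv")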
Model Deployment
ITFormer-0.5B is optimized for efficient deployment, requiring minimal computational overhead due to its small number of trainable parameters. You can fine-tune or directly use the model for specific temporal-textual tasks, such as fault detection or predictive maintenance.
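Because the LLM backbone stays frozen, fine-tuning touches only the small set of ITFormer-specific parameters. The sketch below illustrates that pattern; the submodule name fusion is hypothetical and should be replaced with the actual adapter modules from the released code.

# Minimal sketch of parameter-efficient fine-tuning: freeze the LLM backbone and
# optimize only the adapter/fusion parameters.
import torch

def trainable_parameters(model):
    # Freeze everything, then re-enable only the adapter/fusion parameters
    # ("fusion" is a hypothetical attribute name used for illustration)
    for p in model.parameters():
        p.requires_grad = False
    for p in model.fusion.parameters():
        p.requires_grad = True
    return [p for p in model.parameters() if p.requires_grad]

# Optimize only the trainable (<1% of total) parameters
# optimizer = torch.optim.AdamW(trainable_parameters(model), lr=1e-4)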
Citation
If you use ITFormer-0.5B in your research, please cite the following:
@article{ITFormer,
  title={ITFormer: Bridging Time Series and Natural Language for Multi-Modal QA with Large-Scale Multitask Dataset},
  author={Yilin Wang and Peixuan Lei and Jie Song and Haoyu Zhe and Tao Chen and Yuxuan Zhang and Lei Jia and Yuanxiang Li and Zhongyu Wei},
  journal={ICML 2025},
  year={2025},
  url={https://huggingface.co/papers/2506.20093}
}
License
This model is released under the MIT License.
Acknowledgements
We would like to acknowledge the development of the EngineMT-QA dataset and the contributions of the researchers involved in time-series and multi-modal AI. Special thanks to Hugging Face for providing a platform for model sharing and research collaboration.