ChronoGPT-Instruct

ChronoGPT-Instruct is a family of chronologically consistent, instruction-following large language models (LLMs) that eliminate lookahead bias by training exclusively on time-stamped data available before a fixed knowledge-cutoff date Ο„.
Each ChronoGPT-Instruct-Ο„ extends the ChronoGPT-Ο„ base models through supervised instruction fine-tuning while strictly maintaining temporal separation from all post-Ο„ information.

These models provide the research community with a transparent, replicable benchmark for testing lookahead-bias-free prediction in economics, finance, and other time-sensitive domains.
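In a backtesting workflow, eliminating lookahead bias means using, for each forecast date, only a model whose cutoff Ο„ strictly precedes that date. A minimal sketch of this vintage selection; the list of cutoff dates below is illustrative, not the released set of checkpoints:

```python
from datetime import date

# Sketch: pick the most recent ChronoGPT-Instruct vintage whose knowledge
# cutoff tau strictly precedes the forecast date, so no post-date text can
# leak into the prediction. The vintage dates here are illustrative.
vintages = [date(2014, 12, 31), date(2015, 12, 31), date(2016, 12, 31)]

def pick_vintage(forecast_date: date) -> date:
    """Return the latest cutoff date strictly before the forecast date."""
    eligible = [tau for tau in vintages if tau < forecast_date]
    if not eligible:
        raise ValueError("no model vintage predates the forecast date")
    return max(eligible)

print(pick_vintage(date(2016, 6, 30)))  # -> 2015-12-31
```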


πŸ” Model Overview

| Property | Description |
|---|---|
| Architecture | Transformer decoder |
| Parameters | β‰ˆ 1.55 B |
| Layers | 52 |
| Embedding dim | 1,536 |
| Context length | 1,792 tokens |
| Tokenizer | GPT2Tokenizer (Hugging Face) |
| Training stage | Pretraining + instruction fine-tuning (SFT) |
| License | MIT |
| Languages | English |

🧠 Training & Data

Chronological Consistency

Each model’s corpus satisfies chronological consistency in both the pretraining and instruction-fine-tuning phases: texts dated after the model’s knowledge-cutoff date Ο„ are excluded, ensuring zero overlap with evaluation data. A GPT-4.1 classifier screens every instruction–response pair.

Instruction-Finetuning Corpus

| Stage | Source | # Examples | Avg. Length |
|---|---|---|---|
| 1 | LLMs-from-Scratch | 1,097 | 102 |
| 2 | GPT-3 Self-Instruct | 67,136 | 183 |
| 3 | AllenAI Tulu-3 Mixture | 356,886 | 2,513 |

Only English, non-code entries with pre-2000 content (classifier label = 0 and confidence = 10) are retained.
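The retention rule above can be sketched as a simple filter. The record field names (`language`, `is_code`, `label`, `confidence`) are hypothetical, not the released dataset's actual schema:

```python
# Sketch of the SFT retention rule; field names are hypothetical placeholders,
# not the actual schema of the released ChronoInstruct-SFT dataset.

def keep(example: dict) -> bool:
    """Retain English, non-code entries the classifier judges pre-cutoff."""
    return (
        example["language"] == "en"
        and not example["is_code"]
        and example["label"] == 0        # classifier label: content predates cutoff
        and example["confidence"] == 10  # maximum classifier confidence
    )

records = [
    {"language": "en", "is_code": False, "label": 0, "confidence": 10},
    {"language": "en", "is_code": False, "label": 1, "confidence": 10},
    {"language": "de", "is_code": False, "label": 0, "confidence": 10},
]
kept = [r for r in records if keep(r)]
print(len(kept))  # -> 1
```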

We release the SFT dataset at https://huggingface.co/datasets/manelalab/ChronoInstruct-SFT.


πŸš€ Usage Examples

You can try ChronoGPT-Instruct directly in your browser via Google Colab:

Open in Colab
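For local use, a minimal sketch of building an instruction prompt and generating with the checkpoint. The Alpaca-style prompt template is an assumption (the card does not document the exact SFT template), and the commented generation snippet uses the repo id from this card's collection:

```python
# Sketch of an Alpaca-style instruction prompt. This template is an
# assumption; the card does not document the exact format used in SFT.

def build_prompt(instruction: str, input_text: str = "") -> str:
    """Format an instruction (and optional input) into a single prompt string."""
    header = (
        "Below is an instruction that describes a task. "
        "Write a response that appropriately completes the request."
    )
    parts = [header, f"### Instruction:\n{instruction}"]
    if input_text:
        parts.append(f"### Input:\n{input_text}")
    parts.append("### Response:\n")
    return "\n\n".join(parts)

prompt = build_prompt("List three drivers of U.S. inflation as of 2016.")
print(prompt)

# To generate with the released checkpoint (downloads the weights):
#   from transformers import AutoModelForCausalLM, AutoTokenizer
#   repo_id = "manelalab/chrono-gpt-instruct-v1-20161231"
#   tok = AutoTokenizer.from_pretrained(repo_id)
#   model = AutoModelForCausalLM.from_pretrained(repo_id)
#   out = model.generate(**tok(prompt, return_tensors="pt"), max_new_tokens=128)
#   print(tok.decode(out[0], skip_special_tokens=True))
```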


πŸ‘©β€πŸ’» Citation

@article{He_Lv_Manela_Wu_chronogpt_2025,
  title={Chronologically Consistent Generative AI},
  author={He, Songrun and Lv, Linying and Manela, Asaf and Wu, Jimmy},
  journal={Working Paper},
  year={2025}
}