---
library_name: pytorch
license: mit
language:
- en
tags:
- chronologically consistent
- modded-nanogpt
- hellaswag
pipeline_tag: text-generation
inference: false
---
# ChronoGPT

## Model Description

ChronoGPT is a series of high-performance, chronologically consistent large language models (LLMs) designed to eliminate lookahead bias and training leakage while maintaining good language understanding in time-sensitive applications. Each model is pretrained on diverse, high-quality, open-source, and timestamped text to maintain chronological consistency.

Despite having the same parameter count as the GPT-2 124M model, all models in the series achieve HellaSwag benchmark scores that surpass it. This approach preserves the integrity of historical analysis and enables more reliable economic and financial modeling.
- Developed by: Songrun He, Linying Lv, Asaf Manela, Jimmy Wu
- Model type: Transformer-based autoregressive decoder (Modified modded-NanoGPT architecture)
- Language(s) (NLP): English
- License: MIT License
## Model Sources
- Paper: "Chronologically Consistent Large Language Models" (He, Lv, Manela, Wu, 2025)
## How to Get Started with the Model

Install the required dependencies:

```
pip install -r requirements.txt
```

Here is example code for using the model:
```python
from modeling_chronogpt import ChronoGPT
import tiktoken
import torch

device = 'cuda:0'
max_length = 1792

# ChronoGPT uses the GPT-2 BPE tokenizer.
tokenizer = tiktoken.get_encoding("gpt2")
model = ChronoGPT.from_pretrained("manelalab/chrono-gpt-v1-19991231", trust_remote_code=True).to(device)

text = "Obviously, the time continuum has been disrupted, creating a new temporal event sequence resulting in this alternate reality. -- Dr. Brown, Back to the Future Part II"

# Encode, truncate to the model's context length, and add a batch dimension.
inputs = torch.tensor(tokenizer.encode(text))[:max_length].reshape(1, -1).to(device)

# The forward pass returns next-token logits and hidden-state embeddings.
logits, emb = model(inputs)
```
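As a minimal follow-on sketch (assuming `logits` has shape `(batch, seq_len, vocab_size)` over the GPT-2 vocabulary), a greedy next-token prediction can be read off the last position:

```python
# Greedy next-token prediction from the last position (shape assumption noted above).
next_token_id = logits[0, -1].argmax(dim=-1).item()
print(tokenizer.decode([next_token_id]))
```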
## Training Details

### Training Data

- Pretraining corpus: Our initial model, chrono-gpt-v1-19991231, is pretrained on 21 billion tokens of pre-2000, diverse, high-quality, and open-source text data, ensuring no leakage of text published after that date (a sketch of this cutoff-based filtering follows the list).
- Incremental updates: Yearly updates from 2000 to 2024 with an additional 65 billion tokens of timestamped text.
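The chronological guarantee amounts to filtering every training document by its timestamp against a model vintage's cutoff date. The sketch below only illustrates the idea; the `documents` structure and field names are hypothetical, not the actual data pipeline.

```python
from datetime import date

# Hypothetical corpus: each document carries a publication timestamp.
documents = [
    {"text": "An article published before 2000.", "published": date(1998, 5, 14)},
    {"text": "An article published after 2000.", "published": date(2003, 9, 2)},
]

def corpus_for_vintage(docs, cutoff):
    """Keep only documents published on or before the vintage's cutoff date."""
    return [d["text"] for d in docs if d["published"] <= cutoff]

# chrono-gpt-v1-19991231 sees only text available by 1999-12-31.
pretraining_corpus = corpus_for_vintage(documents, date(1999, 12, 31))
```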
### Training Procedure

- Architecture: modded-nanogpt-based model with the Muon optimizer, skip connections, rotary embeddings, and FlexAttention (a minimal rotary-embedding sketch follows this list).
- Objective: Autoregressive text generation.
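As an illustration of one listed component, below is a minimal rotary position embedding (RoPE) sketch in PyTorch. It follows the standard formulation and is not claimed to match the modded-nanogpt implementation detail for detail.

```python
import torch

def apply_rotary_embedding(x, base=10000.0):
    """Standard RoPE sketch for x of shape (batch, heads, seq_len, head_dim).

    Rotates pairs of channels by position-dependent angles so that attention
    scores depend on relative positions.
    """
    _, _, seq_len, dim = x.shape
    half = dim // 2
    inv_freq = base ** (-torch.arange(half, dtype=x.dtype, device=x.device) / half)
    angles = torch.arange(seq_len, dtype=x.dtype, device=x.device)[:, None] * inv_freq[None, :]
    cos, sin = angles.cos(), angles.sin()  # each of shape (seq_len, half)
    x1, x2 = x[..., :half], x[..., half:]
    return torch.cat([x1 * cos - x2 * sin, x1 * sin + x2 * cos], dim=-1)
```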
## Evaluation

### Testing Data, Factors & Metrics

- Language understanding: Evaluated on the HellaSwag benchmark (a scoring sketch follows this list).
- Financial forecasting: Evaluated using a return prediction task based on Dow Jones Newswire data.
- Comparison models: ChronoGPT was benchmarked against BERT, FinBERT, StoriesLM-v1-1963, and Llama 3.1.
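For reference, HellaSwag-style evaluation scores each candidate ending by its log-likelihood under the model and picks the highest-scoring one. The sketch below assumes the forward signature from the usage example above (`logits, emb = model(inputs)`) and uses total log-probability; the exact scoring procedure (e.g., length normalization) may differ.

```python
import torch
import torch.nn.functional as F

def score_ending(model, tokenizer, context, ending, device='cuda:0'):
    """Sum of log-probabilities of the ending tokens given the context.

    Assumes model(ids) returns (logits, emb) with logits of shape
    (1, seq_len, vocab_size); illustrative, not the exact evaluation code.
    """
    ctx_ids = tokenizer.encode(context)
    end_ids = tokenizer.encode(" " + ending)
    ids = torch.tensor(ctx_ids + end_ids, device=device).unsqueeze(0)
    with torch.no_grad():
        logits, _ = model(ids)
    log_probs = F.log_softmax(logits[0, :-1], dim=-1)  # predictions for tokens 1..N-1
    targets = ids[0, 1:]
    start = len(ctx_ids) - 1  # index of the prediction for the first ending token
    ending_scores = log_probs[start:].gather(1, targets[start:, None])
    return ending_scores.sum().item()

# The predicted ending is the candidate with the highest score among the options.
```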
### Results

- HellaSwag score: chrono-gpt-v1-19991231 and chrono-gpt-v1-20241231 achieve HellaSwag scores of 0.295 and 0.324, respectively, outperforming GPT-2 (0.294).
- Stock return predictions: Over the sample period from 2008-01 to 2023-07, chrono-gpt-v1-realtime achieves a long-short portfolio Sharpe ratio of 4.50, outperforming BERT, FinBERT, and StoriesLM-v1-1963, and comparable to Llama 3.1 8B (4.90); a generic Sharpe-ratio sketch follows this list.
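For clarity on the reported metric: the annualized Sharpe ratio of a self-financing long-short portfolio is its mean return divided by its return volatility, scaled by the square root of the number of periods per year. A generic sketch, not the paper's exact procedure:

```python
import numpy as np

def annualized_sharpe(period_returns, periods_per_year=252):
    """Annualized Sharpe ratio of a self-financing long-short return series."""
    r = np.asarray(period_returns, dtype=float)
    return np.sqrt(periods_per_year) * r.mean() / r.std(ddof=1)
```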
## Citation

```bibtex
@article{He2025ChronoBERT,
  title={Chronologically Consistent Large Language Models},
  author={He, Songrun and Lv, Linying and Manela, Asaf and Wu, Jimmy},
  journal={Working Paper},
  year={2025}
}
```
## Model Card Authors
- Songrun He (Washington University in St. Louis, [email protected])
- Linying Lv (Washington University in St. Louis, [email protected])
- Asaf Manela (Washington University in St. Louis, [email protected])
- Jimmy Wu (Washington University in St. Louis, [email protected])