---
library_name: pytorch
license: mit
language:
- en
tags:
- chronologically consistent
- modded-nanogpt
- hellaswag
pipeline_tag: text-generation
inference: false
---
# ChronoGPT
## ChronoGPT Highlights
ChronoGPT is a series of **high-performance, chronologically consistent large language models (LLMs)** designed to eliminate lookahead bias and training leakage while maintaining strong language understanding in time-sensitive applications. The models are pretrained on **diverse, high-quality, open-source, and timestamped text** to maintain chronological consistency.
All models in the series achieve **HellaSwag benchmark scores that surpass those of the GPT-2 124M model.** This approach preserves the integrity of historical analysis and enables more reliable economic and financial modeling.
- **Developed by:** Songrun He, Linying Lv, Asaf Manela, Jimmy Wu
- **Model type:** Transformer-based autoregressive decoder (Modified modded-NanoGPT architecture)
- **Language(s) (NLP):** English
- **License:** MIT License
## Model Overview
**ChronoGPT** has the following features:
- Type: Causal Language Models
- Training Stage: Pretraining
- Number of Parameters: ~1,552 Million
- Encoder & Decoder Partitioning: 26 encoder and 26 decoder layers
- Tokenizer: GPT2Tokenizer from HuggingFace
- Context Length: 1,792 tokens (see the tokenization sketch below)
- Embedding Dimension: 1,536
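
Inputs are tokenized with the GPT-2 BPE vocabulary, and anything beyond the 1,792-token context window has to be truncated or chunked. Below is a minimal sketch using `tiktoken`'s GPT-2 encoding, as in the snippets further down; the `chunk_for_chronogpt` helper is illustrative and not part of the released code:
```python
import tiktoken

CONTEXT_LENGTH = 1792  # ChronoGPT context window (see the list above)

tokenizer = tiktoken.get_encoding("gpt2")  # GPT-2 BPE vocabulary


def chunk_for_chronogpt(text: str, context_length: int = CONTEXT_LENGTH):
    """Split a long document into context-sized lists of token IDs.

    Illustrative helper only; not part of the released ChronoGPT code.
    """
    ids = tokenizer.encode(text)
    return [ids[i:i + context_length] for i in range(0, len(ids), context_length)]


chunks = chunk_for_chronogpt("Hello, I am a language model, " * 500)
print(f"{len(chunks)} chunk(s); first chunk has {len(chunks[0])} tokens")
```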
## 🚀 Quickstart
You can try ChronoGPT directly in your browser via Google Colab:
<p align="left">
<a href="https://colab.research.google.com/github/LinyingLyu/ChronoGPT/blob/main/ChronoGPT_tutorial.ipynb" target="_blank">
<img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open in Colab"/>
</a>
</p>
Or run it locally with:
```bash
pip install -r requirements.txt
```
### Text Generation
The following code snippet illustrates how to use the model to generate text from a given prompt.
```python
import torch
import torch.nn.functional as F
import tiktoken
from huggingface_hub import HfApi, login
from ChronoGPT_inference import *

# ----------------------------- Setup -----------------------------
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
cache_dir = 'cache'  # Update this path as needed
tokenizer = tiktoken.get_encoding("gpt2")

max_length = 50
num_return_sequences = 5
seed = 123

# -------------------------- Load Model --------------------------
model = ChronoGPT.from_pretrained(
    "manelalab/chrono-gpt-v1-20241231",
    trust_remote_code=True,
    cache_dir=cache_dir,
).to(device)

# ------------------------ Prepare Input -------------------------
prompt = "Hello, I am a language model,"
tokens = tokenizer.encode(prompt)
tokens = torch.tensor(tokens, dtype=torch.long).unsqueeze(0)
tokens = tokens.repeat(num_return_sequences, 1).to(device)

# -------------------- Sampling Initialization -------------------
xgen = tokens.clone()
sample_rng = torch.Generator(device=device)
sample_rng.manual_seed(seed)

# ------------------------- Text Generation -----------------------
while xgen.size(1) < max_length:
    with torch.no_grad():
        with torch.autocast(device_type=device.type, dtype=torch.bfloat16):
            logits, _ = model(xgen)
        logits = logits[:, -1, :]            # last-token logits
        probs = F.softmax(logits, dim=-1)
        topk_probs, topk_indices = torch.topk(probs, 50, dim=-1)  # top-k (k=50) filtering
        sampled_idx = torch.multinomial(topk_probs, 1, generator=sample_rng)
        next_token = torch.gather(topk_indices, -1, sampled_idx)
        xgen = torch.cat([xgen, next_token], dim=1)

# ------------------------- Decode Output -------------------------
for i in range(num_return_sequences):
    decoded_tokens = xgen[i, :max_length].tolist()
    decoded_text = tokenizer.decode(decoded_tokens)
    print(f"Sample {i}:\n{decoded_text}\n")
```
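The loop above uses top-k sampling (k = 50) under bfloat16 autocast, so each run draws different continuations from the seeded generator. If you prefer deterministic output, a greedy variant simply takes the argmax at each step. A minimal sketch, reusing the `model`, `tokenizer`, and `device` defined above (the `greedy_generate` helper is illustrative, not part of the released code):
```python
def greedy_generate(prompt: str, max_new_tokens: int = 30) -> str:
    """Greedy (argmax) decoding: deterministic, no temperature or top-k.

    Illustrative helper only; assumes `model`, `tokenizer`, and `device`
    from the snippet above are already in scope.
    """
    ids = torch.tensor(tokenizer.encode(prompt), dtype=torch.long).unsqueeze(0).to(device)
    for _ in range(max_new_tokens):
        with torch.no_grad():
            logits, _ = model(ids)                      # (1, seq_len, vocab_size)
        next_token = logits[:, -1, :].argmax(dim=-1, keepdim=True)
        ids = torch.cat([ids, next_token], dim=1)
    return tokenizer.decode(ids[0].tolist())


print(greedy_generate("Hello, I am a language model,"))
```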
### Extract Embeddings
The following code snippet illustrates how to use the model to extract embeddings from all layers for a given input.
```python
import torch
import torch.nn.functional as F
import tiktoken
from huggingface_hub import HfApi, login
from ChronoGPT_inference import *

# ----------------------------- Setup -----------------------------
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
cache_dir = 'cache'  # Update this path as needed
tokenizer = tiktoken.get_encoding("gpt2")
max_length = 1792  # model context length

# -------------------------- Load Model --------------------------
model = ChronoGPT.from_pretrained(
    "manelalab/chrono-gpt-v1-20241231",
    trust_remote_code=True,
    cache_dir=cache_dir,
).to(device)

# ----------------------- Embedding Generation ---------------------
text = "Obviously, the time continuum has been disrupted, creating a new temporal event sequence resulting in this alternate reality."
inputs = torch.tensor(tokenizer.encode(text))[:max_length].reshape(1, -1).to(device)
with torch.no_grad():
    logits, emb = model(inputs)
print('Dimension of embeddings:', emb[0].shape)
```
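The second return value holds one embedding tensor per layer (the snippet prints `emb[0].shape` for the first of them). If you need a single fixed-size vector per document, e.g. for downstream regressions, one common choice is to mean-pool the last layer over the token dimension. A hedged sketch, assuming each `emb[i]` is shaped `(batch, seq_len, hidden_dim)`:
```python
# Mean-pool the last layer's token embeddings into one vector per input.
# Assumes each emb[i] has shape (batch, seq_len, hidden_dim); adjust the
# indexing if the released ChronoGPT_inference code returns a different layout.
last_layer = emb[-1]                    # (batch, seq_len, hidden_dim)
doc_embedding = last_layer.mean(dim=1)  # (batch, hidden_dim)
print('Pooled document embedding:', doc_embedding.shape)
```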
## Citation
```
@article{He2025ChronoBERT,
title={Chronologically Consistent Large Language Models},
author={He, Songrun and Lv, Linying and Manela, Asaf and Wu, Jimmy},
journal={Working Paper},
year={2025}
}
```
### Model Card Authors
- Songrun He (Washington University in St. Louis, [email protected])
- Linying Lv (Washington University in St. Louis, [email protected])
- Asaf Manela (Washington University in St. Louis, [email protected])
- Jimmy Wu (Washington University in St. Louis, [email protected])