# GPT-Neo 1.3B - LUMI Conversational
GPT-Neo 1.3B trained on Europe's LUMI supercomputer using AMD MI250X GPUs.
## Model Description
This model is a fine-tuned version of EleutherAI/gpt-neo-1.3B trained for conversational AI tasks.
### Key Features
- High-quality conversational AI (training loss converged to 1.27)
- AMD GPU optimized: trained with ROCm 6.2.4
- Raw PyTorch implementation: no HuggingFace Trainer dependency
- 21,665 conversation examples from the OpenAssistant dataset
## Training Details
### Infrastructure
- GPUs: 8x AMD Instinct MI250X (64GB each)
- Framework: PyTorch with a hand-written distributed training loop
- Communication: PyTorch's NCCL backend (RCCL on ROCm builds) over the Slingshot-11 interconnect (see the setup sketch below)
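Below is a minimal sketch of how such a process group can be initialized. The function name and environment handling are illustrative assumptions, not the author's actual launch code:

```python
import os
import torch
import torch.distributed as dist

def setup_distributed():
    # Hypothetical setup: one process per GPU, launched e.g. via torchrun or srun.
    # PyTorch's "nccl" backend transparently maps to RCCL on ROCm builds,
    # so the same code runs on AMD MI250X GPUs.
    dist.init_process_group(backend="nccl")
    local_rank = int(os.environ["LOCAL_RANK"])  # set by the launcher
    torch.cuda.set_device(local_rank)           # the cuda API is reused on HIP
    return local_rank
```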
### Training Configuration
- Base Model: EleutherAI/gpt-neo-1.3B
- Dataset: OpenAssistant Conversations (21,665 examples)
- Batch Size: 64 global (8 per GPU)
- Steps: 338 (training stopped early once the loss converged)
- Learning Rate: 5e-6 with cosine annealing
- Precision: BF16 mixed precision (see the training-step sketch below)
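A minimal sketch of the raw-PyTorch training step implied by this configuration (AdamW at 5e-6, cosine annealing to step 338, BF16 autocast under DDP). `setup_distributed` comes from the sketch above; `train_loader` is an assumed DataLoader over the tokenized conversations, not the author's actual code:

```python
import torch
from torch.nn.parallel import DistributedDataParallel as DDP
from torch.optim.lr_scheduler import CosineAnnealingLR
from transformers import GPTNeoForCausalLM

TOTAL_STEPS = 338

local_rank = setup_distributed()  # from the infrastructure sketch above
model = GPTNeoForCausalLM.from_pretrained("EleutherAI/gpt-neo-1.3B").to(local_rank)
model = DDP(model, device_ids=[local_rank])

optimizer = torch.optim.AdamW(model.parameters(), lr=5e-6)
scheduler = CosineAnnealingLR(optimizer, T_max=TOTAL_STEPS)

# train_loader is assumed to yield dicts with input_ids, attention_mask, labels;
# 8 examples per GPU across 8 GPUs gives the global batch size of 64.
for step, batch in enumerate(train_loader):
    batch = {k: v.to(local_rank) for k, v in batch.items()}
    # MI250X supports bfloat16 natively, so no GradScaler is needed
    with torch.autocast(device_type="cuda", dtype=torch.bfloat16):
        loss = model(**batch).loss
    loss.backward()
    optimizer.step()
    scheduler.step()
    optimizer.zero_grad()
    if step + 1 >= TOTAL_STEPS:
        break
```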
### Performance
- Final Loss: 1.27 (excellent convergence)
- Training Time: 29 hours on 8 GPUs
- Memory Usage: ~12GB per GPU
- Inference Speed: 22.1 tokens/second (see the measurement sketch below)
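For context, here is a rough way to reproduce a tokens/second figure like the one above. This is an illustrative benchmark, not the author's script; it reuses the model and tokenizer loaded as in the Quick Start section below, and exact numbers will vary with decoding settings and hardware:

```python
import time
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"
model = model.to(device)

prompt = "Human: Explain gradient descent.\nAssistant:"
inputs = tokenizer(prompt, return_tensors="pt").to(device)

if device == "cuda":
    torch.cuda.synchronize()  # make sure timing brackets the actual GPU work
start = time.time()
outputs = model.generate(**inputs, max_new_tokens=128, do_sample=False)
if device == "cuda":
    torch.cuda.synchronize()
elapsed = time.time() - start

new_tokens = outputs.shape[1] - inputs["input_ids"].shape[1]
print(f"{new_tokens / elapsed:.1f} tokens/second")
```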
## Usage
### Quick Start
```python
from transformers import GPTNeoForCausalLM, GPT2Tokenizer

# Load model and tokenizer
model = GPTNeoForCausalLM.from_pretrained("raimondskrauklis/gpt-neo-1.3b-lumi-conversational")
tokenizer = GPT2Tokenizer.from_pretrained("raimondskrauklis/gpt-neo-1.3b-lumi-conversational")

# Generate a response in the Human/Assistant format the model was trained on
prompt = "Human: What is machine learning?\nAssistant:"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_length=150, temperature=0.7, do_sample=True)
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)
```
### Conversation Format
The model was trained on conversations with this format:
```
Human: [question or statement]
Assistant: [response]
```
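A small helper that wraps this format and trims the output where the model starts inventing the next turn can make interactive use easier. This is an illustrative convention built on the Quick Start code above, not part of the model itself:

```python
def chat(question, max_new_tokens=150):
    # Build the prompt in the Human/Assistant format used during training
    prompt = f"Human: {question}\nAssistant:"
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    outputs = model.generate(
        **inputs,
        max_new_tokens=max_new_tokens,
        temperature=0.7,
        do_sample=True,
        pad_token_id=tokenizer.eos_token_id,  # GPT-2 tokenizer defines no pad token
    )
    text = tokenizer.decode(outputs[0], skip_special_tokens=True)
    reply = text[len(prompt):]
    # Cut off at the point where the model hallucinates the next "Human:" turn
    return reply.split("Human:")[0].strip()

print(chat("What is machine learning?"))
```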
## Model Performance
Based on validation testing, the model demonstrates:
- General Knowledge: accurate explanations with historical context
- Programming: generates working code with explanations
- Technical Discussion: coherent responses about complex topics
- Problem Solving: structured, logical approaches
- Consistency: reliable performance across different domains
## Technical Specifications
| Metric | Value |
|---|---|
| Parameters | 1.3B |
| Training Steps | 338 |
| Final Loss | 1.27 |
| Training Time | 29 hours |
| GPUs Used | 8x AMD MI250X |
| Memory per GPU | ~12GB |
| Inference Speed | 22.1 tokens/sec |
## Acknowledgments
- LUMI: European pre-exascale supercomputer infrastructure
- EleutherAI: Base GPT-Neo model
- OpenAssistant: Conversation dataset
## Citation
```bibtex
@misc{gptneo-lumi-conversational,
  title={GPT-Neo 1.3B Fine-tuned on LUMI Supercomputer},
  author={Raimonds Krauklis},
  year={2025},
  howpublished={Hugging Face Model Hub},
  url={https://huggingface.co/raimondskrauklis/gpt-neo-1.3b-lumi-conversational}
}
```
## License
This model is released under the Apache 2.0 license, same as the base GPT-Neo model.