SmolLM-135M Fine-tuned on Dostoyevsky
This model is a fine-tuned version of HuggingFaceTB/SmolLM-135M on a curated dataset of Fyodor Dostoyevsky's major works. The model has been trained to generate text in the distinctive style of the Russian literary master.
Model Details
Model Description
- Developed by: satyapratheek
- Model type: Causal Language Model
- Language(s): English
- License: MIT
- Finetuned from model: HuggingFaceTB/SmolLM-135M
Dataset
The model was trained on a custom dataset consisting of four major works by Fyodor Dostoyevsky:
- Crime and Punishment (Project Gutenberg #2554)
- The Brothers Karamazov (Project Gutenberg #28054)
- The Idiot
- Notes from the Underground (Project Gutenberg #600)
Dataset Statistics:
- Total chunks: 6,217 text segments
- Average chunk length: 512 tokens
- All texts are public domain English translations
Training Details
Training Data
The dataset was preprocessed using the following pipeline:
- Raw texts cleaned with `gutenberg-cleaner` to remove headers/footers
- Text normalization with `ftfy` and `unidecode`
- Chunking into 512-token segments
- Filtering for substantial paragraphs (>200 characters)
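A minimal sketch of this pipeline, assuming the `gutenberg_cleaner`, `ftfy`, and `unidecode` packages and the base model's tokenizer for chunking; the specific helper (`simple_cleaner`) and the paragraph filter shown here are illustrative rather than the exact preprocessing script used.

```python
from gutenberg_cleaner import simple_cleaner  # assumed helper for stripping Gutenberg boilerplate
from ftfy import fix_text
from unidecode import unidecode
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("HuggingFaceTB/SmolLM-135M")

def preprocess(raw_text: str, chunk_size: int = 512, min_chars: int = 200) -> list[str]:
    """Clean one Project Gutenberg text and split it into ~512-token chunks."""
    text = simple_cleaner(raw_text)            # drop Gutenberg headers/footers
    text = unidecode(fix_text(text))           # repair mojibake, transliterate to plain ASCII
    # Keep only substantial paragraphs (>200 characters)
    paragraphs = [p.strip() for p in text.split("\n\n") if len(p.strip()) > min_chars]
    # Re-tokenize the cleaned text and cut it into fixed-length segments
    ids = tokenizer("\n\n".join(paragraphs), add_special_tokens=False)["input_ids"]
    return [tokenizer.decode(ids[i : i + chunk_size]) for i in range(0, len(ids), chunk_size)]
```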
Training Procedure
Training Hardware:
- Device: Apple MacBook Air M1 (8GB unified memory)
- Compute: Apple Metal Performance Shaders (MPS)
- Memory Usage: Peak ~6GB unified memory
Training Hyperparameters:
- Training regime: LoRA (Low-Rank Adaptation)
- LoRA rank: 8
- LoRA alpha: 16
- LoRA dropout: 0.05
- Epochs: 3
- Batch size: 2 (per device)
- Gradient accumulation steps: 4
- Effective batch size: 8
- Learning rate: 2e-4
- Optimizer: AdamW
- Learning rate scheduler: Linear decay
- Max sequence length: 512 tokens
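The hyperparameters above correspond to a standard PEFT + `Trainer` setup. The sketch below is a reconstruction rather than the original training script: the LoRA target modules are not stated in the card (`q_proj`/`v_proj` is a common default and is consistent with the reported 460,800 trainable parameters), and the dataset and data-collator wiring is omitted.

```python
import torch
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM, TrainingArguments

# Prefer Apple MPS when available (training ran on an M1 MacBook Air)
device = "mps" if torch.backends.mps.is_available() else "cpu"
base = AutoModelForCausalLM.from_pretrained("HuggingFaceTB/SmolLM-135M").to(device)

lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],  # assumption: not stated in the card
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, lora_config)
model.print_trainable_parameters()  # expected to report roughly 460,800 trainable parameters

training_args = TrainingArguments(
    output_dir="smollm-dostoyevsky",
    num_train_epochs=3,
    per_device_train_batch_size=2,
    gradient_accumulation_steps=4,  # effective batch size 2 x 4 = 8
    learning_rate=2e-4,
    lr_scheduler_type="linear",
    optim="adamw_torch",
)
# Trainer(model=model, args=training_args, train_dataset=..., data_collator=...) completes the loop.
```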
Training Results:
- Total training time: 4 hours, 56 minutes, 35 seconds
- Training steps: 2,334
- Final training loss: 3.254
- Training samples per second: 1.048
- Trainable parameters: 460,800 (LoRA adapters only)
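For reference, the step count follows directly from the dataset size, effective batch size, and epoch count:

```python
import math

steps_per_epoch = math.ceil(6217 / 8)  # 6,217 chunks / effective batch size 8 = 778
total_steps = steps_per_epoch * 3      # 3 epochs -> 2,334 steps, matching the figure above
```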
Framework Versions
- Transformers: 4.53.0
- PyTorch: Latest with MPS support
- PEFT: Latest
- Datasets: Latest
Usage
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the model and tokenizer
model = AutoModelForCausalLM.from_pretrained("satyapratheek/smollm-dostoyevsky")
tokenizer = AutoTokenizer.from_pretrained("satyapratheek/smollm-dostoyevsky")

# Generate text
prompt = "The man walked through the streets of St. Petersburg, contemplating"
inputs = tokenizer(prompt, return_tensors="pt")

with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=150,
        temperature=0.8,
        do_sample=True,
        pad_token_id=tokenizer.eos_token_id,
    )

generated_text = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(generated_text)
```
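Because only the LoRA adapters were trained, the repository may publish adapter weights rather than a merged checkpoint. Under that assumption, the adapter can instead be attached to the base model with PEFT and optionally merged for inference:

```python
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base = AutoModelForCausalLM.from_pretrained("HuggingFaceTB/SmolLM-135M")
model = PeftModel.from_pretrained(base, "satyapratheek/smollm-dostoyevsky")
tokenizer = AutoTokenizer.from_pretrained("HuggingFaceTB/SmolLM-135M")

# Optionally fold the adapter into the base weights for slightly faster inference
model = model.merge_and_unload()
```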
Model Performance
The model demonstrates strong adaptation to Dostoyevsky's writing style, including:
- Philosophical depth: Captures the existential and psychological themes
- Character introspection: Generates internal monologues characteristic of Dostoyevsky's protagonists
- Russian cultural context: Maintains appropriate historical and cultural references
- Narrative complexity: Preserves the multi-layered storytelling approach
Limitations and Biases
- Time period bias: Reflects 19th-century perspectives and social norms
- Translation artifacts: Trained on English translations, so it may not capture nuances of the original Russian
- Dataset scope: Limited to four major works and may not represent the full evolution of Dostoyevsky's style
- Model size: At 135M parameters, the model has limited capacity compared to larger language models
Ethical Considerations
This model is trained exclusively on public domain texts and is intended for:
- Educational purposes
- Creative writing assistance
- Literary style analysis
- Research into author-specific language patterns
Users should be aware that the model may generate content reflecting historical perspectives that do not align with contemporary values.
Citation
```bibtex
@misc{smollm-dostoyevsky-2025,
  author    = {satyapratheek},
  title     = {SmolLM-135M Fine-tuned on Dostoyevsky},
  year      = {2025},
  publisher = {Hugging Face},
  url       = {https://huggingface.co/satyapratheek/smollm-dostoyevsky}
}
```
Acknowledgments
- Base model: HuggingFaceTB/SmolLM-135M
- Dataset source: Project Gutenberg
- Training framework: Hugging Face Transformers with PEFT
- Hardware: Apple M1 MacBook Air (8GB)
This model was trained as part of a fine-tuning experiment to explore author-style adaptation using efficient training methods on consumer hardware.