SmolLM-135M Fine-tuned on Dostoyevsky

This model is a fine-tuned version of HuggingFaceTB/SmolLM-135M on a curated dataset of Fyodor Dostoyevsky's major works. The model has been trained to generate text in the distinctive style of the Russian literary master.

Model Details

Model Description

  • Developed by: satyapratheek
  • Model type: Causal Language Model
  • Language(s): English
  • License: MIT
  • Finetuned from model: HuggingFaceTB/SmolLM-135M

Dataset

The model was trained on a custom dataset consisting of four major works by Fyodor Dostoyevsky:

  • Crime and Punishment (Project Gutenberg #2554)
  • The Brothers Karamazov (Project Gutenberg #28054)
  • The Idiot
  • Notes from the Underground (Project Gutenberg #600)

Dataset Statistics:

  • Total chunks: 6,217 text segments
  • Average chunk length: 512 tokens
  • All texts are public domain English translations

Training Details

Training Data

The dataset was preprocessed using the following pipeline (a code sketch follows the list):

  1. Raw texts cleaned with gutenberg-cleaner to remove headers/footers
  2. Text normalization with ftfy and unidecode
  3. Chunking into 512-token segments
  4. Filtering for substantial paragraphs (>200 characters)
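The exact preprocessing script is not published with this card; the sketch below illustrates the four steps above under stated assumptions (the gutenberg_cleaner package's simple_cleaner for header/footer removal and the base model's tokenizer for chunking).

import ftfy
from unidecode import unidecode
from gutenberg_cleaner import simple_cleaner  # assumed helper for Gutenberg header/footer removal
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("HuggingFaceTB/SmolLM-135M")

def preprocess(raw_text, chunk_size=512, min_chars=200):
    # 1. Strip Project Gutenberg headers/footers
    text = simple_cleaner(raw_text)
    # 2. Normalize encoding artifacts and transliterate to plain ASCII
    text = unidecode(ftfy.fix_text(text))
    # 3. Chunk into 512-token segments
    ids = tokenizer(text)["input_ids"]
    chunks = [tokenizer.decode(ids[i:i + chunk_size]) for i in range(0, len(ids), chunk_size)]
    # 4. Keep only substantial segments (>200 characters)
    return [c for c in chunks if len(c.strip()) > min_chars]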

Training Procedure

Training Hardware:

  • Device: Apple MacBook Air M1 (8GB unified memory)
  • Compute: Apple Metal Performance Shaders (MPS)
  • Memory Usage: Peak ~6GB unified memory
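Training ran on the MPS backend; device selection typically looks like the following (an illustration, not the original training script).

import torch

# Use Apple's Metal Performance Shaders backend when available, otherwise fall back to CPU
device = torch.device("mps" if torch.backends.mps.is_available() else "cpu")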

Training Hyperparameters (a configuration sketch follows this list):

  • Training regime: LoRA (Low-Rank Adaptation)
  • LoRA rank: 8
  • LoRA alpha: 16
  • LoRA dropout: 0.05
  • Epochs: 3
  • Batch size: 2 (per device)
  • Gradient accumulation steps: 4
  • Effective batch size: 8
  • Learning rate: 2e-4
  • Optimizer: AdamW
  • Learning rate scheduler: Linear decay
  • Max sequence length: 512 tokens
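The training script itself is not included here; the configuration below is a minimal sketch that reproduces the hyperparameters listed above with peft and transformers. The target_modules value is an assumption, as the card does not state which projections were adapted.

from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM, TrainingArguments

model = AutoModelForCausalLM.from_pretrained("HuggingFaceTB/SmolLM-135M")

lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],  # assumed; not stated in the card
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()

training_args = TrainingArguments(
    output_dir="smollm-dostoyevsky",
    num_train_epochs=3,
    per_device_train_batch_size=2,
    gradient_accumulation_steps=4,  # effective batch size 8
    learning_rate=2e-4,
    lr_scheduler_type="linear",
    optim="adamw_torch",
)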

Training Results:

  • Total training time: 4 hours, 56 minutes, 35 seconds
  • Training steps: 2,334
  • Final training loss: 3.254
  • Training samples per second: 1.048
  • Trainable parameters: 460,800 (LoRA adapters only)
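The 460,800 trainable-parameter figure is consistent with rank-8 adapters on the attention query and value projections of SmolLM-135M (hidden size 576, 3 key/value heads of head dimension 64, 30 layers). This back-of-the-envelope check is an inference from the published numbers, not part of the training logs.

# LoRA adds r * (d_in + d_out) parameters per adapted projection
hidden = 576        # SmolLM-135M hidden size
kv_dim = 3 * 64     # 3 key/value heads x 64 head dimension
layers = 30
r = 8
per_layer = r * (hidden + hidden) + r * (hidden + kv_dim)  # q_proj + v_proj
print(per_layer * layers)  # 460800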

Framework Versions

  • Transformers: 4.53.0
  • PyTorch: Latest with MPS support
  • PEFT: Latest
  • Datasets: Latest

Usage

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the model and tokenizer
model = AutoModelForCausalLM.from_pretrained("satyapratheek/smollm-dostoyevsky")
tokenizer = AutoTokenizer.from_pretrained("satyapratheek/smollm-dostoyevsky")

# Generate text
prompt = "The man walked through the streets of St. Petersburg, contemplating"
inputs = tokenizer(prompt, return_tensors="pt")

with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=150,
        temperature=0.8,
        do_sample=True,
        pad_token_id=tokenizer.eos_token_id
    )

generated_text = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(generated_text)
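Because only LoRA adapters were trained, the repository may ship the adapter rather than merged weights. In that case the adapter can be loaded on top of the base model with peft; the snippet below is a sketch assuming the standard PEFT adapter layout.

from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base = AutoModelForCausalLM.from_pretrained("HuggingFaceTB/SmolLM-135M")
model = PeftModel.from_pretrained(base, "satyapratheek/smollm-dostoyevsky")
tokenizer = AutoTokenizer.from_pretrained("HuggingFaceTB/SmolLM-135M")

# Optionally fold the adapter into the base weights for faster inference
model = model.merge_and_unload()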

Model Performance

The model demonstrates strong adaptation to Dostoyevsky's writing style, including:

  • Philosophical depth: Captures the existential and psychological themes
  • Character introspection: Generates internal monologues characteristic of Dostoyevsky's protagonists
  • Russian cultural context: Maintains appropriate historical and cultural references
  • Narrative complexity: Preserves the multi-layered storytelling approach

Limitations and Biases

  • Time period bias: Reflects 19th-century perspectives and social norms
  • Translation artifacts: Trained on English translations, the model may not capture the nuances of the original Russian prose
  • Dataset scope: Limited to four major works, may not represent Dostoyevsky's complete style evolution
  • Model size: At 135M parameters, the model has limited capacity compared to larger language models

Ethical Considerations

This model is trained exclusively on public domain texts and is intended for:

  • Educational purposes
  • Creative writing assistance
  • Literary style analysis
  • Research into author-specific language patterns

Users should be aware that the model may generate content reflecting historical perspectives that may not align with contemporary values.

Citation

@misc{smollm-dostoyevsky-2025,
  author = {satyapratheek},
  title = {SmolLM-135M Fine-tuned on Dostoyevsky},
  year = {2025},
  publisher = {Hugging Face},
  url = {https://huggingface.co/satyapratheek/smollm-dostoyevsky}
}

Acknowledgments

  • Base model: HuggingFaceTB/SmolLM-135M
  • Dataset source: Project Gutenberg
  • Training framework: Hugging Face Transformers with PEFT
  • Hardware: Apple M1 MacBook Air (8GB)

This model was trained as part of a fine-tuning experiment to explore author-style adaptation using efficient training methods on consumer hardware.
