SmolLM-135M Fine-tuned on Dostoyevsky

This model is a fine-tuned version of HuggingFaceTB/SmolLM-135M on a curated dataset of Fyodor Dostoyevsky's major works. The model has been trained to generate text in the distinctive style of the Russian literary master.

Model Details

Model Description

  • Developed by: satyapratheek
  • Model type: Causal Language Model
  • Language(s): English
  • License: MIT
  • Finetuned from model: HuggingFaceTB/SmolLM-135M

Dataset

The model was trained on a custom dataset consisting of four major works by Fyodor Dostoyevsky:

  • Crime and Punishment (Project Gutenberg #2554)
  • The Brothers Karamazov (Project Gutenberg #28054)
  • The Idiot
  • Notes from the Underground (Project Gutenberg #600)

Dataset Statistics:

  • Total chunks: 6,217 text segments
  • Average chunk length: 512 tokens
  • All texts are public domain English translations

Training Details

Training Data

The dataset was preprocessed using the following pipeline (a code sketch follows the list):

  1. Raw texts cleaned with gutenberg-cleaner to remove headers/footers
  2. Text normalization with ftfy and unidecode
  3. Chunking into 512-token segments
  4. Filtering for substantial paragraphs (>200 characters)
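The exact preprocessing script is not published with this card; the sketch below illustrates the four steps above under stated assumptions (the gutenberg_cleaner package's simple_cleaner for header/footer removal and the base model's tokenizer for chunking).

import ftfy
from unidecode import unidecode
from gutenberg_cleaner import simple_cleaner  # assumed helper for Gutenberg header/footer removal
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("HuggingFaceTB/SmolLM-135M")

def preprocess(raw_text, chunk_size=512, min_chars=200):
    # 1. Strip Project Gutenberg headers/footers
    text = simple_cleaner(raw_text)
    # 2. Normalize encoding artifacts and transliterate to plain ASCII
    text = unidecode(ftfy.fix_text(text))
    # 3. Chunk into 512-token segments
    ids = tokenizer(text)["input_ids"]
    chunks = [tokenizer.decode(ids[i:i + chunk_size]) for i in range(0, len(ids), chunk_size)]
    # 4. Keep only substantial segments (>200 characters)
    return [c for c in chunks if len(c.strip()) > min_chars]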

Training Procedure

Training Hardware:

  • Device: Apple MacBook Air M1 (8GB unified memory)
  • Compute: Apple Metal Performance Shaders (MPS)
  • Memory Usage: Peak ~6GB unified memory
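Training ran on the MPS backend; device selection typically looks like the following (an illustration, not the original training script).

import torch

# Use Apple's Metal Performance Shaders backend when available, otherwise fall back to CPU
device = torch.device("mps" if torch.backends.mps.is_available() else "cpu")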

Training Hyperparameters (a configuration sketch follows this list):

  • Training regime: LoRA (Low-Rank Adaptation)
  • LoRA rank: 8
  • LoRA alpha: 16
  • LoRA dropout: 0.05
  • Epochs: 3
  • Batch size: 2 (per device)
  • Gradient accumulation steps: 4
  • Effective batch size: 8
  • Learning rate: 2e-4
  • Optimizer: AdamW
  • Learning rate scheduler: Linear decay
  • Max sequence length: 512 tokens
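The training script itself is not included here; the configuration below is a minimal sketch that reproduces the hyperparameters listed above with peft and transformers. The target_modules value is an assumption, as the card does not state which projections were adapted.

from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM, TrainingArguments

model = AutoModelForCausalLM.from_pretrained("HuggingFaceTB/SmolLM-135M")

lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],  # assumed; not stated in the card
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()

training_args = TrainingArguments(
    output_dir="smollm-dostoyevsky",
    num_train_epochs=3,
    per_device_train_batch_size=2,
    gradient_accumulation_steps=4,  # effective batch size 8
    learning_rate=2e-4,
    lr_scheduler_type="linear",
    optim="adamw_torch",
)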

Training Results:

  • Total training time: 4 hours, 56 minutes, 35 seconds
  • Training steps: 2,334
  • Final training loss: 3.254
  • Training samples per second: 1.048
  • Trainable parameters: 460,800 (LoRA adapters only)
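The 460,800 trainable-parameter figure is consistent with rank-8 adapters on the attention query and value projections of SmolLM-135M (hidden size 576, 3 key/value heads of head dimension 64, 30 layers). This back-of-the-envelope check is an inference from the published numbers, not part of the training logs.

# LoRA adds r * (d_in + d_out) parameters per adapted projection
hidden = 576        # SmolLM-135M hidden size
kv_dim = 3 * 64     # 3 key/value heads x 64 head dimension
layers = 30
r = 8
per_layer = r * (hidden + hidden) + r * (hidden + kv_dim)  # q_proj + v_proj
print(per_layer * layers)  # 460800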

Framework Versions

  • Transformers: 4.53.0
  • PyTorch: Latest with MPS support
  • PEFT: Latest
  • Datasets: Latest

Usage

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the model and tokenizer
model = AutoModelForCausalLM.from_pretrained("satyapratheek/smollm-dostoyevsky")
tokenizer = AutoTokenizer.from_pretrained("satyapratheek/smollm-dostoyevsky")

# Generate text
prompt = "The man walked through the streets of St. Petersburg, contemplating"
inputs = tokenizer(prompt, return_tensors="pt")

with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=150,
        temperature=0.8,
        do_sample=True,
        pad_token_id=tokenizer.eos_token_id
    )

generated_text = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(generated_text)
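Because only LoRA adapters were trained, the repository may ship the adapter rather than merged weights. In that case the adapter can be loaded on top of the base model with peft; the snippet below is a sketch assuming the standard PEFT adapter layout.

from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base = AutoModelForCausalLM.from_pretrained("HuggingFaceTB/SmolLM-135M")
model = PeftModel.from_pretrained(base, "satyapratheek/smollm-dostoyevsky")
tokenizer = AutoTokenizer.from_pretrained("HuggingFaceTB/SmolLM-135M")

# Optionally fold the adapter into the base weights for faster inference
model = model.merge_and_unload()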

Model Performance

The model demonstrates strong adaptation to Dostoyevsky's writing style, including:

  • Philosophical depth: Captures the existential and psychological themes
  • Character introspection: Generates internal monologues characteristic of Dostoyevsky's protagonists
  • Russian cultural context: Maintains appropriate historical and cultural references
  • Narrative complexity: Preserves the multi-layered storytelling approach

Limitations and Biases

  • Time period bias: Reflects 19th-century perspectives and social norms
  • Translation artifacts: Trained on English translations, the model may not capture the nuances of the original Russian prose
  • Dataset scope: Limited to four major works, may not represent Dostoyevsky's complete style evolution
  • Model size: At 135M parameters, the model has limited capacity compared to larger language models

Ethical Considerations

This model is trained exclusively on public domain texts and is intended for:

  • Educational purposes
  • Creative writing assistance
  • Literary style analysis
  • Research into author-specific language patterns

Users should be aware that the model may generate content reflecting historical perspectives that may not align with contemporary values.

Citation

@misc{smollm-dostoyevsky-2025,
  author = {satyapratheek},
  title = {SmolLM-135M Fine-tuned on Dostoyevsky},
  year = {2025},
  publisher = {Hugging Face},
  url = {https://huggingface.co/satyapratheek/smollm-dostoyevsky}
}

Acknowledgments

  • Base model: HuggingFaceTB/SmolLM-135M
  • Dataset source: Project Gutenberg
  • Training framework: Hugging Face Transformers with PEFT
  • Hardware: Apple M1 MacBook Air (8GB)

This model was trained as part of a fine-tuning experiment to explore author-style adaptation using efficient training methods on consumer hardware.
