Model Card for Model ID

This is a LoRA-based fine-tuning of the DistilBERT base model for movie genre classification based on the description of the movie.

Model Description

It was trained on the dataset from: https://www.kaggle.com/datasets/hijest/genre-classification-dataset-imdb. The dataset was modified by reducing the number of genre labels (from 27 to 8) and balancing the number of samples per genre (min 1000 and max 3000 samples per genre) to simplify the task and improve performance. The final dataset contains 16737 training examples and 4185 validation examples. To get good performance while classifying across 27 genres requires substantial computational resources and time. However, since this is a personal project trained on a local machine, the focus was on experimentation rather than exhaustive coverage.

  • Developed by: Dilyara Arynova
  • Finetuned from model: DistilBERT base model (uncased)

How to Get Started with the Model

Use the code below to get started with the model.

from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_name = "dilyaraarynova/distilbert-base-uncased-movie-genre-classification" tokenizer = AutoTokenizer.from_pretrained(model_name) model = AutoModelForSequenceClassification.from_pretrained(model_name)

text = "A young wizard starts his journey at a magical school." inputs = tokenizer(text, return_tensors="pt", truncation=True, padding=True) outputs = model(**inputs) predicted_label = outputs.logits.argmax().item()

Training Details

Training Data

The model was trained on a modified version of the IMDB Movie Genre Classification Dataset (https://www.kaggle.com/datasets/hijest/genre-classification-dataset-imdb. ) , which includes movie plot descriptions and associated genre labels. The dataset was:

  • Filtered to retain only 8 broad genre categories to reduce class imbalance and complexity.
  • Cleaned to ensure reasonably balanced samples per class.
  • Preprocessed using Hugging Face Datasets and tokenized with the distilbert-base-uncased tokenizer.

Training Procedure

The model was fine-tuned using parameter-efficient fine-tuning (PEFT) with LoRA (Low-Rank Adaptation), applied to a pre-trained distilbert-base-uncased model.

Training Hyperparameters

Base model: distilbert-base-uncased

PEFT method: LoRA with r=8, alpha=64, dropout=0.05

Optimizer: AdamW (via Hugging Face Trainer)

Learning rate: 2e-5

Batch size: 8

Epochs: 10

Weight decay: 0.01

Warmup ratio: 0.1

Evaluation strategy: Epoch

Load best model at end: True

Speeds, Sizes, Times

Training hardware: Local machine (laptop)

Training time: 7754.7237 ('train_samples_per_second': 4.913)

Trainable parameters: 648,219 || all parameters of the base model: 67,622,454 || trainable%: 0.9586

Model Accuracy Comparison

Model Accuracy
Base (Untrained) 0.1173
Trained (LoRA) 0.6767
Improvement +0.5594

Example Predictions

Untrained Base Model Predictions:

  • Input: L.R. Brane loves his life - his car, his apartment, his job, but especially his girlfriend, Vespa. One day while showering, Vespa runs out of shampoo. L.R. runs across the street to a convenience store to buy some more, a quick trip of no more than a few minutes. When he returns, Vespa is gone and every trace of her existence has been wiped out. L.R.'s life becomes a tortured existence as one strange event after another occurs to confirm in his mind that a conspiracy is working against his finding Vespa.
    • Prediction: music
  • Input: Spain, March 1964: Quico is a very naughty child of three belonging to a wealthy middle-class family. Since Cristina's birth, he feels he has lost the privileged position of 'prince' of the house for his eight months old sister. So, with his brother Juan, who is eight years old and is quite disobedient, spend their time committing prank after prank, causing the resulting anger of his mother, the nanny and the old housemaid. The rest of the family members are two much older brothers, his resigned mother and a retrograde father of authoritarian ideas. But many years have passed, and the civil war that won the despot Don Pablo is simply for their children Dad's war.
    • Prediction: music
  • Input: One year in the life of Albin and his family of shepherds in the North of Transylvania. In direct cinema style, this documentary follows their day to day routines, and their struggle to adapt to a new world where traditions are gradually replaced by modern values. Since joining the EU, Romania has been facing, like several other Eastern European countries, the pressure of modern values, introducing in farmer's lives the cruel notion of competition, the temptation of migrating to the higher salaries abroad, and the marginalization of locally produced food against industrial products.
    • Prediction: music
  • Input: Two strangers fall in love after meeting on a train, but life pulls them apart before they can be together.
    • Prediction: music
  • Input: A group of coworkers accidentally locks themselves in their office overnight and chaos ensues.
    • Prediction: music
  • Input: A family moves into an old house, only to discover it's haunted by the spirits of its former residents.
    • Prediction: music

Trained LoRA Model Predictions:

  • Input: L.R. Brane loves his life - his car, his apartment, his job, but especially his girlfriend, Vespa. One day while showering, Vespa runs out of shampoo. L.R. runs across the street to a convenience store to buy some more, a quick trip of no more than a few minutes. When he returns, Vespa is gone and every trace of her existence has been wiped out. L.R.'s life becomes a tortured existence as one strange event after another occurs to confirm in his mind that a conspiracy is working against his finding Vespa.
    • Prediction: horror
  • Input: Spain, March 1964: Quico is a very naughty child of three belonging to a wealthy middle-class family. Since Cristina's birth, he feels he has lost the privileged position of 'prince' of the house for his eight months old sister. So, with his brother Juan, who is eight years old and is quite disobedient, spend their time committing prank after prank, causing the resulting anger of his mother, the nanny and the old housemaid. The rest of the family members are two much older brothers, his resigned mother and a retrograde father of authoritarian ideas. But many years have passed, and the civil war that won the despot Don Pablo is simply for their children Dad's war.
    • Prediction: comedy
  • Input: One year in the life of Albin and his family of shepherds in the North of Transylvania. In direct cinema style, this documentary follows their day to day routines, and their struggle to adapt to a new world where traditions are gradually replaced by modern values. Since joining the EU, Romania has been facing, like several other Eastern European countries, the pressure of modern values, introducing in farmer's lives the cruel notion of competition, the temptation of migrating to the higher salaries abroad, and the marginalization of locally produced food against industrial products.
    • Prediction: documentary
  • Input: Two strangers fall in love after meeting on a train, but life pulls them apart before they can be together.
    • Prediction: romance
  • Input: A group of coworkers accidentally locks themselves in their office overnight and chaos ensues.
    • Prediction: action
  • Input: A family moves into an old house, only to discover it's haunted by the spirits of its former residents.
    • Prediction: horror

Framework versions

  • PEFT 0.15.1
Downloads last month
17
Inference Providers NEW
This model isn't deployed by any Inference Provider. ๐Ÿ™‹ Ask for provider support

Model tree for dilyaraarynova/distilbert-base-uncased-movie-genre-classification

Adapter
(298)
this model