Bigram Character-Level Language Model: Makemore (Part 1)

This repository explores the training, sampling, and evaluation of a bigram character-level language model. Model quality is evaluated with the Negative Log Likelihood (NLL) loss.
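As a rough illustration of how the NLL evaluation works, here is a minimal sketch. It assumes a row-normalized bigram probability matrix `P` and a character-to-index mapping `stoi` with `'.'` marking the start and end of a word (as in the Makemore video); these names are illustrative assumptions, not taken from this repository's files.

```python
import torch

# Hypothetical setup: P is a (V, V) row-normalized matrix of bigram
# probabilities, stoi maps characters ('.', 'a'..'z') to indices.
def nll_loss(words, P, stoi):
    log_likelihood = 0.0
    n = 0
    for w in words:
        chars = ['.'] + list(w) + ['.']        # '.' marks word start/end
        for ch1, ch2 in zip(chars, chars[1:]):
            prob = P[stoi[ch1], stoi[ch2]]     # probability assigned to this bigram
            log_likelihood += torch.log(prob)
            n += 1
    return -log_likelihood / n                 # average NLL; lower is better

# Example: nll_loss(['emma', 'olivia'], P, stoi)
```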

Overview

The model was trained in two distinct ways, both yielding identical results:

  1. Frequency-Based Approach: Directly counting and normalizing bigram frequencies.
  2. Gradient-Based Optimization: Learning a weight matrix (interpreted as log-counts) with gradient descent, guided by minimizing the NLL loss.

Both methods converge to the same bigram probabilities, demonstrating their equivalence; a sketch of the two routes follows below.
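To make the equivalence concrete, here is a hedged sketch of both training routes in PyTorch. The dataset, variable names, smoothing constant, and learning rate are illustrative assumptions following the Makemore Part 1 recipe, not code taken from this repository.

```python
import torch
import torch.nn.functional as F

# Hypothetical dataset: lowercase names, with '.' as the start/end token.
words = ['emma', 'olivia', 'ava']
chars = sorted(set(''.join(words)))
stoi = {s: i + 1 for i, s in enumerate(chars)}
stoi['.'] = 0
V = len(stoi)

# Build (previous char, next char) index pairs for every bigram.
xs, ys = [], []
for w in words:
    cs = ['.'] + list(w) + ['.']
    for c1, c2 in zip(cs, cs[1:]):
        xs.append(stoi[c1])
        ys.append(stoi[c2])
xs, ys = torch.tensor(xs), torch.tensor(ys)

# 1. Frequency-based: count bigrams, then normalize each row into probabilities.
N = torch.zeros((V, V))
for i, j in zip(xs.tolist(), ys.tolist()):
    N[i, j] += 1
P_counts = (N + 1) / (N + 1).sum(dim=1, keepdim=True)   # +1 smoothing avoids zeros

# 2. Gradient-based: a weight matrix trained by minimizing the NLL loss.
g = torch.Generator().manual_seed(2147483647)
W = torch.randn((V, V), generator=g, requires_grad=True)
for _ in range(200):
    logits = F.one_hot(xs, num_classes=V).float() @ W              # log-counts
    probs = logits.exp() / logits.exp().sum(dim=1, keepdim=True)   # softmax
    loss = -probs[torch.arange(len(ys)), ys].log().mean()          # NLL
    W.grad = None
    loss.backward()
    W.data += -50 * W.grad

# After training, W.exp() normalized row-wise approaches P_counts.
```

In the gradient route, exp(W) plays the role of the counts matrix and the row-wise normalization is exactly a softmax, which is why both routes end up with matching probability tables.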

Documentation

For a better reading experience and detailed notes, visit my Road to GPT Documentation Site.

Acknowledgments

Notes and implementations inspired by the Makemore - Part 1 video by Andrej Karpathy.

For more of my projects, visit my Portfolio Site.
