Lean AI - A New Architecture for Efficient, Explainable AI

License: MIT

Introduction

Lean AI is not just another large language model. It is a new AI architecture designed from the ground up to challenge the "bigger is better" paradigm. Born from a journey of rapid prototyping and rigorous debugging, Lean AI embodies a philosophy of efficiency, interpretability, and logical reasoning over brute-force statistical pattern matching.

This project demonstrates a complete, working prototype of an AI that can:

  1. Learn Common Sense: Be pre-trained on a large, public knowledge graph to gain a foundational understanding of the world.
  2. Specialize its Knowledge: Ingest new, specialized text documents and fine-tune its brain to become an expert in new topics without forgetting its prior knowledge.
  3. Reason and Explain: Answer direct and inferential questions, providing the logical path it took to reach a conclusion.

The core of Lean AI is the Relational Graph Processor (RGP), a model inspired by knowledge graph embeddings that treats relationships as translations in a high-dimensional vector space.

The Lean AI Philosophy

Our design was guided by a few core principles:

  • Structure Before Scale: Instead of forcing the model to infer structure from massive, unstructured data, we give it structured data so it can learn the rules of logic and reason efficiently.
  • Parameter Frugality: Every parameter must justify its existence. We prioritize simple, elegant models over complex, opaque ones.
  • Interpretability by Design: The architecture is inherently auditable. We can trace why the AI made a specific decision.
  • Efficiency is Key: The model is designed to be small, fast, and capable of running on common hardware.

Architecture: The RGP v1.0 Core

After experimenting with more complex architectures (like Mixture-of-Experts and deep kernels), we discovered that a radically simpler design was far more effective and robust. The final prototype uses an architecture inspired by the TransE model.

  • Core Idea: Relationships are treated as simple vector additions in an embedding space: head_vector + relation_vector ≈ tail_vector (sketched in code after this list).
  • Embeddings: The AI's "brain" consists of two main embedding tables: one for all entities (concepts) and one for all relations.
  • Training: We use a MarginRankingLoss function with Negative Sampling. This crucial technique teaches the model not only what is true, e.g. (dog, IsA, animal), but also what is false, e.g. (dog, IsA, car), forcing it to build a highly organized and logical internal "map" of the world.
  • Fine-Tuning: The model supports lifelong learning via a two-stage "Anchor" fine-tuning process. This allows it to integrate new knowledge without suffering from catastrophic forgetting.
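
To make the bullets above concrete, here is a minimal, self-contained sketch of the scoring idea and the margin ranking loss against a randomly corrupted negative triple. All sizes are toy values for illustration, not the released model's:

import torch
import torch.nn as nn

# Toy sizes for illustration only; the released model is much larger.
num_entities, num_relations, dim = 100, 8, 50
e_emb = nn.Embedding(num_entities, dim)
r_emb = nn.Embedding(num_relations, dim)

def score(h, r, t):
    """L2 distance between (head + relation) and tail; lower = more plausible."""
    return torch.norm(e_emb(h) + r_emb(r) - e_emb(t), p=2, dim=1)

# One positive triple and one corrupted ("negative") triple.
h, r, t = torch.LongTensor([0]), torch.LongTensor([1]), torch.LongTensor([2])
t_neg = torch.randint(0, num_entities, (1,))

# MarginRankingLoss with target -1 pushes the positive distance to be at
# least `margin` smaller than the negative distance.
loss_fn = nn.MarginRankingLoss(margin=1.0)
loss = loss_fn(score(h, r, t), score(h, r, t_neg), torch.tensor([-1.0]))
print(f"margin ranking loss: {loss.item():.4f}")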

Training Data: The "Primer" Brain

The foundational model provided here was "primed" on a curated subset of the ConceptNet 5.7 knowledge graph.

  • Total Facts Processed: ~300,000 unique, high-quality assertions.
  • Final Vocabulary Size: 216,508 entities and 8 core relations (isa, hasproperty, usedfor, etc.).
  • Training Method: The model was trained for 200 epochs on a GPU using weighted negative sampling to produce a robust common-sense foundation; a toy sketch of such a training loop follows this list.
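
For concreteness, here is a compressed, self-contained sketch of what one such training loop can look like. The data is toy-sized, and the "weighted" sampling of the actual run is not reproduced here:

import torch
import torch.nn as nn
import torch.optim as optim

# Toy setup; the real run used ~300k ConceptNet triples.
num_entities, num_relations, dim = 100, 8, 50
e_emb = nn.Embedding(num_entities, dim)
r_emb = nn.Embedding(num_relations, dim)
triples = torch.LongTensor([[0, 1, 2], [3, 0, 4], [5, 1, 6]])  # (h, r, t) rows

loss_fn = nn.MarginRankingLoss(margin=1.0)
optimizer = optim.AdamW(list(e_emb.parameters()) + list(r_emb.parameters()), lr=1e-3)

def dist(h, r, t):
    return torch.norm(e_emb(h) + r_emb(r) - e_emb(t), p=2, dim=1)

for epoch in range(200):
    h, r, t = triples[:, 0], triples[:, 1], triples[:, 2]
    t_neg = torch.randint(0, num_entities, t.shape)  # corrupt tails at random
    optimizer.zero_grad()
    target = -torch.ones(len(triples))  # positive distance should be smaller
    loss = loss_fn(dist(h, r, t), dist(h, r, t_neg), target)
    loss.backward()
    optimizer.step()
    # Keep entity embeddings on the unit sphere, as in the fine-tuning script.
    with torch.no_grad():
        e_emb.weight.div_(torch.norm(e_emb.weight, p=2, dim=1, keepdim=True))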

How to Use the Pre-Trained Model

This section explains how to load our pre-trained "Primer" brain and use it for an example fine-tuning task.

1. Prerequisites

First, you need to create a Python environment and install the necessary libraries.

pip install torch
pip install spacy
python -m spacy download en_core_web_sm
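
A quick sanity check that the environment is ready (a minimal sketch, not part of the project files):

# Quick environment sanity check; versions printed will vary.
import torch
import spacy

print("torch", torch.__version__, "| CUDA available:", torch.cuda.is_available())
nlp = spacy.load("en_core_web_sm")  # raises OSError if the model wasn't downloaded
print("spaCy pipeline loaded:", nlp.pipe_names)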

You will also need to download the following files from this Hugging Face repository and place them in the same directory as your script:

  • primer_model_full_final.pth (The main model weights)
  • primer_entities_full_final.json (The entity-to-index mapping)
  • primer_relations_full_final.json (The relation-to-index mapping)
  • run_scholar.py (The script below)
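
If you prefer to fetch the model files programmatically, here is a minimal sketch using the huggingface_hub client. This assumes an extra pip install huggingface_hub, and REPO_ID is a placeholder you must replace with this repository's actual id:

# A minimal sketch, assuming `pip install huggingface_hub` has been run.
from huggingface_hub import hf_hub_download

REPO_ID = "<user>/<repo>"  # placeholder: replace with this repository's id
for filename in [
    "primer_model_full_final.pth",
    "primer_entities_full_final.json",
    "primer_relations_full_final.json",
]:
    hf_hub_download(repo_id=REPO_ID, filename=filename, local_dir=".")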

2. The run_scholar.py Script

Create a file named run_scholar.py and paste the following code into it. This is the clean, standalone script for loading our pre-trained model and fine-tuning it with new knowledge.

# =======================================================
# Lean AI - Project "Scholar"
# Standalone Fine-Tuning and Verification Script
# =======================================================
import json

import spacy
import torch
import torch.nn as nn
import torch.optim as optim

print("Lean AI - Project \"Scholar\" | Loading and Fine-Tuning a Pre-Trained Brain")

def clean_text(text):
    """A simple utility to clean text for processing."""
    return text.lower().strip().replace('?', '')

class RGP(nn.Module):
    """The Relational Graph Processor v1.0 architecture."""
    def __init__(self, num_e, num_r, dim):
        super().__init__()
        self.e_emb = nn.Embedding(num_e, dim)
        self.r_emb = nn.Embedding(num_r, dim)
        # No custom weight initialization here; pre-trained weights will be loaded.
    def forward(self, h, r):
        return self.e_emb(h) + self.r_emb(r)

def ingest(text, nlp):
    """Extracts simple (h,r,t) triples from text."""
    doc = nlp(text)
    triples, last_subj = [], None
    for sent in doc.sents:
        verbs = [tok for tok in sent if tok.pos_ in ('VERB', 'AUX')]
        for verb in verbs:
            relation = {'be':'isa', 'have':'hasproperty', 'use':'usedfor', 'cause':'causes'}.get(verb.lemma_)
            if not relation: continue
            subjects = [c for c in verb.children if c.dep_ == 'nsubj']
            objects = [c for c in verb.children if c.dep_ in ('dobj', 'attr', 'acomp')]
            if subjects and objects:
                subj = subjects[0]
                if subj.pos_=='PRON' and last_subj: subj = last_subj
                elif subj.pos_!='PRON': last_subj = subj
                h,t = clean_text(subj.text), clean_text(objects[0].text)
                triples.append((h,relation,t))
    return list(set(triples))

def fine_tune_with_anchors(model, bridging_facts, specialist_facts, e2i, r2i):
    """Performs the two-stage anchoring and fine-tuning process."""
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    print(f"\nFine-tuning on device: {device}")
    model.to(device)
    # Fine-tuning pulls (head + relation) toward the tail embedding with an L1 loss.
    loss_fn = nn.L1Loss()

    if bridging_facts:
        print(f"\n--- Stage 1: Anchoring {len(bridging_facts)} facts ---")
        # bridging_facts arrive as pre-indexed (h, r, t) triples.
        h, r, t = [torch.LongTensor([fact[j] for fact in bridging_facts]).to(device) for j in range(3)]
        optimizer = optim.AdamW(model.parameters(), lr=1e-3)
        model.train()
        for _ in range(50):
            optimizer.zero_grad()
            loss = loss_fn(model(h, r), model.e_emb(t))
            loss.backward()
            optimizer.step()
            # Re-normalize entity embeddings to the unit sphere after each step.
            with torch.no_grad():
                model.e_emb.weight.div_(torch.norm(model.e_emb.weight, p=2, dim=1, keepdim=True))

    if specialist_facts:
        print(f"\n--- Stage 2: Specialist Tuning {len(specialist_facts)} facts ---")
        # specialist_facts arrive as (head, relation, tail) strings; map them to indices.
        h = torch.LongTensor([e2i[f[0]] for f in specialist_facts]).to(device)
        r = torch.LongTensor([r2i[f[1]] for f in specialist_facts]).to(device)
        t = torch.LongTensor([e2i[f[2]] for f in specialist_facts]).to(device)
        optimizer = optim.AdamW(model.parameters(), lr=1e-5)
        model.train()
        for _ in range(100):
            optimizer.zero_grad()
            loss = loss_fn(model(h, r), model.e_emb(t))
            loss.backward()
            optimizer.step()
            with torch.no_grad():
                model.e_emb.weight.div_(torch.norm(model.e_emb.weight, p=2, dim=1, keepdim=True))

    print("--- Fine-Tuning Complete ---")
    model.to('cpu')
    return model

if __name__ == '__main__':
    # Step 1: Load the pre-trained Primer Brain
    print("--- Loading Pre-Trained 'Primer' Brain ---")
    try:
        with open('primer_entities_full_final.json', 'r') as f:
            e2i = json.load(f)
        with open('primer_relations_full_final.json', 'r') as f:
            r2i = json.load(f)
        # map_location ensures GPU-trained weights also load on CPU-only machines.
        state_dict = torch.load('primer_model_full_final.pth', map_location='cpu')
        PRIMER_DIM = state_dict['e_emb.weight'].shape[1]

        model = RGP(len(e2i), len(r2i), PRIMER_DIM)
        model.load_state_dict(state_dict)
        print("Primer Brain loaded successfully.")
    except FileNotFoundError as e:
        print(f"\nCRITICAL ERROR: Could not find primer model file: {e}")
        raise SystemExit(1)
        
    # Step 2: Ingest new text
    nlp = spacy.load('en_core_web_sm')
    specialist_text = "A plant uses sunlight. Sunlight causes photosynthesis. Photosynthesis is a process."
    specialist_facts = ingest(specialist_text, nlp)
    print(f"\n--- Ingested New Specialist Facts ---\n  - {specialist_facts}")

    # Step 3: Dynamically expand vocabulary and model for new concepts
    new_words = sorted(w for w in {h for h, r, t in specialist_facts} | {t for h, r, t in specialist_facts} if w not in e2i)
    if new_words:
        print(f"\nNew concepts found: {new_words}")
        old_e_num, old_r_num = model.e_emb.num_embeddings, model.r_emb.num_embeddings
        old_e_emb, old_r_emb = model.e_emb.weight.data, model.r_emb.weight.data
        for offset, word in enumerate(new_words):
            e2i[word] = old_e_num + offset
        # Rebuild the model with a larger entity table, then copy the old weights back in.
        model = RGP(len(e2i), len(r2i), PRIMER_DIM)
        model.e_emb.weight.data[:old_e_num] = old_e_emb
        model.r_emb.weight.data[:old_r_num] = old_r_emb
        print(f"Model vocabulary expanded to {len(e2i)} entities.")

    # Step 4: Define bridging facts and fine-tune
    bridging_facts = [('plant','isa','organism'), ('sunlight','isa','light')]
    bridging_facts_idx = [(e2i[h],r2i[r],e2i[t]) for h,r,t in bridging_facts if h in e2i and r in r2i and t in e2i]
    fine_tuned_model = fine_tune_with_anchors(model, bridging_facts_idx, specialist_facts, e2i, r2i)

    # Step 5: Verify the final model
    print("\n--- Final Verification ---")
    fine_tuned_model.eval()
    def test_query(head, relation):
        final_entities = sorted(e2i, key=e2i.get)  # index -> entity name
        try:
            h, r = torch.LongTensor([e2i[head]]), torch.LongTensor([r2i[relation]])
            with torch.no_grad():
                pred_emb = fine_tuned_model(h, r)
            # Rank all entities by distance to the predicted embedding; closest wins.
            distances = torch.norm(fine_tuned_model.e_emb.weight - pred_emb, dim=1)
            _, top3_indices = torch.topk(distances, 3, largest=False)
            return [final_entities[i] for i in top3_indices]
        except KeyError as e:
            return [f"Concept not in vocab: {e}"]

    print(f"\nQ (Old Knowledge): 'dog' + 'isa' -> {test_query('dog', 'isa')}")
    print(f"Q (Anchored Knowledge): 'plant' + 'isa' -> {test_query('plant', 'isa')}")
    print(f"Q (Specialist Knowledge): 'plant' + 'usedfor' -> {test_query('plant', 'usedfor')}")

3. How to Run

  1. Place all four files (run_scholar.py, primer_model_full_final.pth, and the two .json files) in the same directory.
  2. Open a terminal in that directory.
  3. Execute the script: python run_scholar.py

You will see the output as the model loads its common-sense brain, learns about photosynthesis, and then demonstrates that it has successfully integrated the new knowledge without forgetting the old.

The Future: RGP v2.0 and Beyond

This prototype successfully demonstrates the potential of the Lean AI philosophy. However, we have identified a key limitation: the simple head + relation ≈ tail model can struggle to learn highly novel facts that contradict the "gravitational pull" of its existing knowledge.

The next phase of development, Project "Cognito," will focus on designing RGP v2.0. This next-generation core will incorporate an attention mechanism, allowing the model to learn relationships in context, making it far more powerful and adaptable.
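
As a purely hypothetical illustration of this direction (nothing here is the actual RGP v2.0 design, and all names and sizes are invented for the sketch), a relation-conditioned attention core might look like:

import torch
import torch.nn as nn

# Hypothetical sketch only: a (head + relation) query attends over context
# entity embeddings, so the predicted tail depends on surrounding facts,
# not just the (head, relation) pair in isolation.
class AttentiveRGP(nn.Module):
    def __init__(self, num_e, num_r, dim):
        super().__init__()
        self.e_emb = nn.Embedding(num_e, dim)
        self.r_emb = nn.Embedding(num_r, dim)
        self.attn = nn.MultiheadAttention(dim, num_heads=2, batch_first=True)

    def forward(self, h, r, context):
        query = (self.e_emb(h) + self.r_emb(r)).unsqueeze(1)  # (B, 1, dim)
        ctx = self.e_emb(context)                             # (B, N, dim)
        out, _ = self.attn(query, ctx, ctx)
        return out.squeeze(1)                                 # predicted tail embedding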

Thank you for exploring Lean AI. We believe this is a crucial step towards building more efficient, trustworthy, and truly intelligent systems.
