---
library_name: transformers
tags:
- gemma
- causal-lm
- peft
- lora
- fine-tuning
- paul-graham
- text-generation
datasets:
- arthurmello/paul-graham-essays
base_model:
- google/gemma-3-27b-it
---

# Model Card for `gemma-3b-it-paul-graham-lora`

This is a fine-tuned version of the [`google/gemma-3b-it`](https://huggingface.co/google/gemma-3b-it) model using [LoRA (Low-Rank Adaptation)](https://arxiv.org/abs/2106.09685), trained to generate essays in the style of Paul Graham.

## Model Details

### Model Description

This model adapts `google/gemma-3b-it` with LoRA fine-tuning on a collection of Paul Graham's essays. It is intended for creative writing and for emulating Paul Graham's tone and argumentative structure.

- **Developed by:** Arthur Mello
- **Model type:** Causal Language Model (decoder-only)
- **Language(s) (NLP):** English
- **Finetuned from model:** `google/gemma-3b-it`

## Uses

### Direct Use

The model can be used to generate essays or long-form answers in the style of Paul Graham, for educational, literary, or entertainment purposes.

### Out-of-Scope Use

- Factual question answering (it may hallucinate)
- Use in high-risk domains (e.g., legal, medical)
- Emulating real individuals without clear consent

## Bias, Risks, and Limitations

The model inherits biases from the original Gemma model and from the Paul Graham dataset. Since the dataset reflects a single author's perspective, generated outputs may lean toward specific ideological or philosophical stances.

### Recommendations

Users should review outputs carefully and avoid using the model as a source of truth or factual information.

## How to Get Started with the Model

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "arthurmello/gemma-3b-it-paul-graham-lora",
    device_map="auto",  # place the model on the available GPU
)
tokenizer = AutoTokenizer.from_pretrained("arthurmello/gemma-3b-it-paul-graham-lora")

# Gemma chat format: user turn followed by an open model turn
prompt = "<start_of_turn>user\nWrite an essay on the future of AI<end_of_turn>\n<start_of_turn>model\n"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

outputs = model.generate(**inputs, max_new_tokens=300)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

## Training Details

### Training Data

The model was fine-tuned on a cleaned dataset of Paul Graham's essays, formatted into a chat-like structure using a custom `chat_template`.

**Dataset:** [`arthurmello/paul-graham-essays`](https://huggingface.co/datasets/arthurmello/paul-graham-essays)

### Training Procedure

#### Training Hyperparameters

- **LoRA rank:** 16
- **LoRA alpha:** 64
- **LoRA dropout:** 0.05
- **Batch size:** 1 (gradient accumulation = 4)
- **Max sequence length:** 1500 tokens
- **Learning rate:** 1e-4
- **Precision:** bf16
- **Epochs:** 10
- **Scheduler:** cosine
- **Weight decay:** 0.1

## Evaluation

### Metrics

- **Training loss:** decreased from 2.57 to 1.80
- **Validation loss:** plateaued around 2.5–2.6, indicating stable training with limited overfitting

## Environmental Impact

Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute).

- **Hardware Type:** L4 GPU
- **Hours used:** ~0.5 (about 30 minutes)
- **Cloud Provider:** Google Colab Pro
- **Compute Region:** Europe

## Technical Specifications

### Model Architecture and Objective

Decoder-only transformer model with LoRA adapters applied to the key projection layers, attention outputs, and feedforward projections.
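The training script itself is not included in this card. As a rough illustration of how the hyperparameters listed under Training Hyperparameters map onto a PEFT adapter configuration, here is a minimal sketch; the `target_modules` list and the bf16 base-model load are assumptions (the card only states that attention and feedforward projections were adapted), not published settings.

```python
import torch
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

# Reported hyperparameters: rank 16, alpha 64, dropout 0.05.
# target_modules is an assumption: the card only says attention and
# feedforward projections were adapted, so a typical Gemma-style choice is shown.
lora_config = LoraConfig(
    r=16,
    lora_alpha=64,
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",  # attention projections
        "gate_proj", "up_proj", "down_proj",     # feedforward projections
    ],
)

base_model = AutoModelForCausalLM.from_pretrained(
    "google/gemma-3b-it",        # base model as referenced in this card
    torch_dtype=torch.bfloat16,  # matches the reported bf16 precision
)
peft_model = get_peft_model(base_model, lora_config)
peft_model.print_trainable_parameters()  # only the adapter weights are trainable
```

With rank 16 and alpha 64, the effective LoRA scaling factor is alpha/r = 4; only the adapter weights are updated during training, which is what keeps the run feasible on a single L4 GPU.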
### Compute Infrastructure

#### Hardware

- 1× NVIDIA L4 (22.5 GB)

#### Software

- Python 3.11
- PyTorch
- Hugging Face Transformers, PEFT, TRL

## Model Card Contact

For questions or feedback, contact [@arthurmello](https://huggingface.co/arthurmello) on Hugging Face or reach out on [LinkedIn](https://www.linkedin.com/in/melloarthur/).