---
library_name: transformers
tags:
- gemma
- causal-lm
- peft
- lora
- fine-tuning
- paul-graham
- text-generation
datasets:
- arthurmello/paul-graham-essays
base_model:
- google/gemma-3-27b-it
---

# Model Card for `gemma-3b-it-paul-graham-lora`

This is a fine-tuned version of the [`google/gemma-3b-it`](https://huggingface.co/google/gemma-3b-it) model using [LoRA (Low-Rank Adaptation)](https://arxiv.org/abs/2106.09685), trained to generate essays in the style of Paul Graham.

## Model Details

### Model Description

This model adapts `google/gemma-3b-it` with LoRA fine-tuning on a collection of Paul Graham's essays. It is intended for creative writing and for emulating Paul Graham's tone and argumentative structure.

- **Developed by:** Arthur Mello
- **Model type:** Causal Language Model (decoder-only)
- **Language(s) (NLP):** English
- **Finetuned from model:** `google/gemma-3b-it`

## Uses

### Direct Use

The model can be used to generate essays or long-form answers in the style of Paul Graham, for educational, literary, or entertainment purposes.

### Out-of-Scope Use

- Factual question answering (it may hallucinate)
- Use in high-risk domains (e.g., legal, medical)
- Emulating real individuals without clear consent

## Bias, Risks, and Limitations

The model inherits biases from the original Gemma model and from the Paul Graham dataset. Since the dataset reflects a single author's perspective, generated outputs may lean toward specific ideological or philosophical stances.

### Recommendations

Users should review outputs carefully and avoid using the model as a source of truth or factual information.

## How to Get Started with the Model

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "arthurmello/gemma-3b-it-paul-graham-lora",
    device_map="auto",  # place the model on the available GPU
)
tokenizer = AutoTokenizer.from_pretrained("arthurmello/gemma-3b-it-paul-graham-lora")

# Gemma chat format: user turn followed by an open model turn
prompt = "<start_of_turn>user\nWrite an essay on the future of AI<end_of_turn>\n<start_of_turn>model\n"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

outputs = model.generate(**inputs, max_new_tokens=300)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

## Training Details

### Training Data

The model was fine-tuned on a cleaned dataset of Paul Graham's essays, formatted into a chat-like structure using a custom `chat_template`.

**Dataset:** [`arthurmello/paul-graham-essays`](https://huggingface.co/datasets/arthurmello/paul-graham-essays)

### Training Procedure

#### Training Hyperparameters

- **LoRA rank:** 16
- **LoRA alpha:** 64
- **LoRA dropout:** 0.05
- **Batch size:** 1 (gradient accumulation = 4)
- **Max sequence length:** 1500 tokens
- **Learning rate:** 1e-4
- **Precision:** bf16
- **Epochs:** 10
- **Scheduler:** cosine
- **Weight decay:** 0.1

## Evaluation

### Metrics

- **Training loss:** decreased from 2.57 to 1.80
- **Validation loss:** plateaued around 2.5–2.6, indicating stable training with limited overfitting

## Environmental Impact

Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute).

- **Hardware Type:** L4 GPU
- **Hours used:** ~0.5 (about 30 minutes)
- **Cloud Provider:** Google Colab Pro
- **Compute Region:** Europe

## Technical Specifications

### Model Architecture and Objective

Decoder-only transformer model with LoRA adapters applied to the key projection layers, attention outputs, and feedforward projections.
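The training script itself is not included in this card. As a rough illustration of how the hyperparameters listed under Training Hyperparameters map onto a PEFT adapter configuration, here is a minimal sketch; the `target_modules` list and the bf16 base-model load are assumptions (the card only states that attention and feedforward projections were adapted), not published settings.

```python
import torch
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

# Reported hyperparameters: rank 16, alpha 64, dropout 0.05.
# target_modules is an assumption: the card only says attention and
# feedforward projections were adapted, so a typical Gemma-style choice is shown.
lora_config = LoraConfig(
    r=16,
    lora_alpha=64,
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",  # attention projections
        "gate_proj", "up_proj", "down_proj",     # feedforward projections
    ],
)

base_model = AutoModelForCausalLM.from_pretrained(
    "google/gemma-3b-it",        # base model as referenced in this card
    torch_dtype=torch.bfloat16,  # matches the reported bf16 precision
)
peft_model = get_peft_model(base_model, lora_config)
peft_model.print_trainable_parameters()  # only the adapter weights are trainable
```

With rank 16 and alpha 64, the effective LoRA scaling factor is alpha/r = 4; only the adapter weights are updated during training, which is what keeps the run feasible on a single L4 GPU.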
### Compute Infrastructure

#### Hardware

- 1× NVIDIA L4 (22.5 GB)

#### Software

- Python 3.11
- PyTorch
- Hugging Face Transformers, PEFT, TRL

## Model Card Contact

For questions or feedback, contact [@arthurmello](https://huggingface.co/arthurmello) on Hugging Face or reach out on [LinkedIn](https://www.linkedin.com/in/melloarthur/).