---
license: gpl-3.0
language:
- en
base_model:
- openai-community/gpt2
datasets:
- gofilipa/heritage_gender
pipeline_tag: text-generation
---

GPT-2 fine-tuned on commentaries from The Heritage Foundation on the topic of "[gender](https://www.heritage.org/gender?f%5B0%5D=content_type%3Acommentary)."

Training setup:

```python
import torch
from transformers import (
    pipeline,
    AutoModelForCausalLM,
    AutoTokenizer,
)
from datasets import load_dataset
from trl import SFTTrainer, SFTConfig

# Helper to monitor MPS memory usage
def print_mps_memory():
    if torch.backends.mps.is_available():
        print(f"MPS allocated: {torch.mps.current_allocated_memory() / 1024**3:.2f} GB")
        print(f"MPS cached: {torch.mps.driver_allocated_memory() / 1024**3:.2f} GB")

# Call this periodically during training
print_mps_memory()

# Check whether MPS (Apple Silicon GPU) is available
if torch.backends.mps.is_available():
    device = torch.device("mps")
    print("MPS device found.")
else:
    device = torch.device("cpu")
    print("MPS device not found, using CPU.")

tokenizer = AutoTokenizer.from_pretrained("openai-community/gpt2")
tokenizer.pad_token = tokenizer.eos_token
tokenizer.padding_side = "right"

model = AutoModelForCausalLM.from_pretrained("openai-community/gpt2")
model = model.to(device)  # move model to MPS (or CPU)

ds = load_dataset("gofilipa/heritage_foundation-gender")

# Limit dataset size for testing: use only the first 2000 samples
train_dataset = ds["train"].select(range(min(2000, len(ds["train"]))))

# Clear cached MPS memory before training
if torch.backends.mps.is_available():
    torch.mps.empty_cache()

# Training parameters tuned for low memory usage
training_params = SFTConfig(
    output_dir="../checkpoints",
    per_device_train_batch_size=1,   # keep at 1
    per_device_eval_batch_size=1,
    gradient_accumulation_steps=2,   # reduced from 4 to 2
    num_train_epochs=3,              # slowly increased from 1 to 3 as memory allowed
    learning_rate=2e-4,
    weight_decay=0.001,
    dataset_text_field="text",
    report_to="none",
    bf16=False,
    fp16=False,
    dataloader_pin_memory=False,
    remove_unused_columns=False,
    max_seq_length=512,              # limit sequence length
    gradient_checkpointing=True,     # trade compute for memory
)

trainer = SFTTrainer(
    model = model,
    train_dataset = train_dataset,
    processing_class = tokenizer,
    args = training_params
)

trainer.train()
```
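
After training, text can be sampled from the fine-tuned model with the `pipeline` helper already imported in the script above. A minimal sketch, reusing the in-memory `model` and `tokenizer` objects; the prompt and sampling settings are only illustrative, and the `device` argument assumes the same Apple Silicon setup:

```python
# Build a text-generation pipeline from the fine-tuned model and tokenizer
# defined in the training script above.
generator = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    device="mps" if torch.backends.mps.is_available() else -1,
)

# Illustrative prompt; adjust generation settings as needed.
output = generator(
    "The debate over gender",
    max_new_tokens=50,
    do_sample=True,
    top_p=0.95,
)
print(output[0]["generated_text"])
```

The fine-tuned weights can also be reloaded later from a checkpoint saved under `output_dir` with `AutoModelForCausalLM.from_pretrained`.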