---
library_name: transformers
pipeline_tag: text-classification
license: mit
tags:
- prompting
- zero-shot
- few-shot
- football
- sentiment
- adaptive-retrieval
model_name: Football Sentiment Prompting (0/1/5-shot)
language:
- en
datasets:
- james-kramer/football_news
inference: false
---

# Method Card: Football Sentiment Prompting (0/1/5-shot)

## TL;DR
We compare zero-shot, adaptive one-shot, and adaptive 5-shot prompting for binary sentiment classification of football news. We use the same train/val/test splits as fine-tuning, report metrics and confusion matrices, and discuss the quality, latency, and cost tradeoffs.

## Data
- Dataset: `james-kramer/football_news` (Hugging Face)
- Task: Binary sentiment (0=negative, 1=positive)
- Splits: Stratified 80/10/10
- Cleaning: strip text; drop empty/NA

## Models / APIs
- **LLM used:** gpt-4o-mini (OpenAI API, September 2025 snapshot)
- **Similarity backend:** sklearn TF-IDF + cosine similarity

## Prompting Strategy
- Zero-shot: instruction + output schema (answer with one word, "positive" or "negative", corresponding to labels 1 and 0).
- Adaptive one-shot: retrieve the most similar train example and include it as the exemplar.
- Adaptive 5-shot: retrieve the top-5 most similar train examples and include them as exemplars.

## Prompt Templates
In the templates below, `text` is the target sentence, and `ex_text` and `ex_label` are a retrieved exemplar and its label.

**Zero-shot**

You are a concise sentiment classifier. Decide if the following football-related sentence is positive or negative. Only answer with a single word: "positive" or "negative".
Sentence: "text",
Answer:

**Adaptive One-shot**

You are a concise sentiment classifier for football news. Decide if each sentence is positive or negative. Only answer with one word.
Example: [],
Sentence: "ex_text",
Label: "ex_label",
Now classify the target sentence.
Sentence: "text",
Answer:

**Adaptive K-shot (e.g., K=5)**

You are a concise sentiment classifier for football news. Decide if the sentence is positive or negative. Only answer with one word.
examples: [],
Sentence: "text",
Answer:

## Evaluation Protocol
- Metrics: accuracy, precision, recall, F1; confusion matrix
- Latency: average wall-clock time per example
- Seed: 42
- Reproducibility: prompts, selection, and eval code in this repo

## Results (Val/Test)
- Val:
  - Zero-shot: acc 0.8, f1 0.75, cm [[5, 0], [2, 3]], ~0.416s/ex
  - One-shot: acc 0.5, f1 0.286, cm [[4, 1], [4, 1]], ~0.304s/ex
  - 5-shot: acc 0.8, f1 0.75, cm [[5, 0], [2, 3]], ~0.451s/ex
- Test:
  - Zero-shot: acc 0.7, f1 0.727, cm [[3, 2], [1, 4]], ~0.282s/ex
  - One-shot: acc 0.7, f1 0.727, cm [[3, 2], [1, 4]], ~0.354s/ex
  - 5-shot: acc 0.7, f1 0.571, cm [[5, 0], [3, 2]], ~0.449s/ex

## Tradeoffs
- Quality: on val, zero-shot and 5-shot tie (acc 0.8, F1 0.75) and both beat one-shot (acc 0.5, F1 0.286). On test, all three reach acc 0.7, but 5-shot has a lower F1 (0.571 vs 0.727). With 10 examples each in val and test, a single prediction moves accuracy by 0.1, so these gaps are not reliable.
- Latency: increases with K on test (~0.28s/ex zero-shot, ~0.35s/ex one-shot, ~0.45s/ex 5-shot). Val is noisier: one-shot (~0.30s/ex) was faster than zero-shot (~0.42s/ex).
- Cost: scales roughly linearly with prompt length (token count). For this dataset (~20 examples), 5-shot prompts were ~3× the token usage of zero-shot.

## Limits & Risks
- No leakage: exemplars are retrieved from **train** only.
- Bias: sports phrasing may sway sentiment; the small dataset makes results unstable.

## Reproducibility
- Code: `prompts/`, `selection.py`, `evaluate_prompting.py`
- Seed: 42
- Python ≥ 3.10

## Usage Disclosure
This card and pipeline were organized with GenAI assistance; experiments and results were implemented and verified by the author.
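## Sketch: Adaptive Exemplar Retrieval

The adaptive one-shot and 5-shot strategies described under Prompting Strategy can be sketched roughly as follows. This is a minimal illustration, not the code in `selection.py`: the function names `retrieve_exemplars` and `build_prompt`, the exemplar line format, and the label-to-word mapping are assumptions made for the example. It uses scikit-learn's `TfidfVectorizer` and `cosine_similarity`, matching the similarity backend named in this card.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Instruction text taken from the K-shot template in this card.
INSTRUCTION = (
    "You are a concise sentiment classifier for football news. "
    "Decide if the sentence is positive or negative. "
    "Only answer with one word."
)
# Assumed mapping, based on the Data section (0=negative, 1=positive).
LABEL_WORDS = {0: "negative", 1: "positive"}


def retrieve_exemplars(query, train_texts, train_labels, k=5):
    """Return up to k (text, label) pairs from train most similar to `query`."""
    vectorizer = TfidfVectorizer()
    # Fit on the training split only, so no val/test text enters the index.
    train_matrix = vectorizer.fit_transform(train_texts)
    query_vec = vectorizer.transform([query])
    scores = cosine_similarity(query_vec, train_matrix)[0]
    # Rank train indices by descending similarity (ties keep original order).
    ranked = sorted(range(len(train_texts)), key=lambda i: scores[i], reverse=True)
    return [(train_texts[i], train_labels[i]) for i in ranked[:k]]


def build_prompt(query, exemplars):
    """Assemble a K-shot prompt; an empty exemplar list gives a zero-shot prompt."""
    lines = [INSTRUCTION]
    for ex_text, ex_label in exemplars:  # hypothetical exemplar layout
        lines.append(f'Sentence: "{ex_text}"')
        lines.append(f"Label: {LABEL_WORDS[ex_label]}")
    lines.append(f'Sentence: "{query}"')
    lines.append("Answer:")
    return "\n".join(lines)


if __name__ == "__main__":
    # Toy data for illustration only; not drawn from the real dataset.
    texts = [
        "The striker scored a brilliant hat-trick",
        "The club suffered a humiliating defeat",
    ]
    labels = [1, 0]
    target = "The striker scored twice"
    print(build_prompt(target, retrieve_exemplars(target, texts, labels, k=1)))
```

For brevity the sketch refits the vectorizer on every call; when classifying many sentences it would be cheaper to fit once on the train split and reuse the resulting matrix.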
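## Sketch: Metrics from a Confusion Matrix

The accuracy and F1 values in Results can be rechecked directly from the reported confusion matrices. The helper below is a small sketch with a hypothetical name, `metrics_from_cm`; it assumes the scikit-learn layout `[[TN, FP], [FN, TP]]`, which this card does not state explicitly. Accuracy and F1 come out the same if the matrix is transposed; only precision and recall would swap.

```python
def metrics_from_cm(cm):
    """Compute accuracy, precision, recall, and F1 for the positive class.

    Assumes cm = [[TN, FP], [FN, TP]] (rows = true label, columns = predicted).
    """
    (tn, fp), (fn, tp) = cm
    total = tn + fp + fn + tp
    accuracy = (tp + tn) / total
    # Guard against division by zero when a class is never predicted or present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1_denominator = 2 * tp + fp + fn
    f1 = 2 * tp / f1_denominator if f1_denominator else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    # Val zero-shot matrix from the Results section.
    val_zero_shot = metrics_from_cm([[5, 0], [2, 3]])
    print(val_zero_shot["accuracy"], val_zero_shot["f1"])  # 0.8 0.75
```

Running the same helper on the test 5-shot matrix `[[5, 0], [3, 2]]` gives accuracy 0.7 and F1 of 4/7, consistent with the reported values.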