---
title: Submission Template
emoji: 🔥
colorFrom: yellow
colorTo: green
sdk: docker
pinned: false
---

# Climate Disinformation Classification using XGBoost over TF-IDF vectorized input, optimized using RandomizedSearchCV

## Model Description

This model is an XGBoost classifier trained on TF-IDF vectorized text for the Frugal AI Challenge 2024, specifically for the text classification task of identifying climate disinformation. The model serves as a performance floor.

### Intended Use

- **Primary intended uses**: Comparison for climate disinformation classification models
- **Primary intended users**: Researchers and developers participating in the Frugal AI Challenge
- **Out-of-scope use cases**: Not intended for production use or real-world classification tasks

## Training Data

The model uses the QuotaClimat/frugalaichallenge-text-train dataset:

- Size: ~6000 examples
- Split: 80% train, 20% test
- 8 labels: 7 categories of climate disinformation claims plus a "no relevant claim" class

### Labels

0. No relevant claim detected
1. Global warming is not happening
2. Not caused by humans
3. Not bad or beneficial
4. Solutions harmful/unnecessary
5. Science is unreliable
6. Proponents are biased
7. Fossil fuels are needed

## Performance

### Metrics

- **Accuracy**: 0.9815384615384616
- **Environmental Impact**:
  - Emissions tracked in gCO2eq: 0.19426531051455168
  - Energy consumption tracked in Wh: 0.5262726046395284

### Model Architecture

The model vectorizes the input text with TF-IDF and assigns one of the 8 possible labels with an XGBoost classifier whose hyperparameters were selected with RandomizedSearchCV.

## Environmental Impact

Environmental impact is tracked using CodeCarbon, measuring:

- Carbon emissions during inference
- Energy consumption during inference

This tracking helps establish a baseline for the environmental impact of model deployment and inference.

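
To make the two reported figures easier to compare with other submissions, the sketch below converts measurements into the units used in this card and derives the carbon intensity they imply. It assumes the tracker reports emissions in kg CO2eq and energy in kWh (CodeCarbon's usual units) and that both figures come from the same inference run; the function names are hypothetical helpers, not part of CodeCarbon.

```python
def to_reported_units(emissions_kg, energy_kwh):
    """Convert kg CO2eq and kWh into the gCO2eq and Wh units used in this card."""
    return {
        "emissions_gco2eq": emissions_kg * 1000.0,
        "energy_wh": energy_kwh * 1000.0,
    }


def implied_carbon_intensity(emissions_gco2eq, energy_wh):
    """Return the grams of CO2eq per kWh implied by one pair of measurements."""
    if energy_wh <= 0:
        raise ValueError("energy_wh must be positive")
    # g / Wh, scaled by 1000 to express the result per kWh
    return emissions_gco2eq / energy_wh * 1000.0


# Values copied from the Metrics section of this card.
REPORTED_EMISSIONS_G = 0.19426531051455168
REPORTED_ENERGY_WH = 0.5262726046395284

intensity = implied_carbon_intensity(REPORTED_EMISSIONS_G, REPORTED_ENERGY_WH)
print(f"Implied carbon intensity: {intensity:.0f} gCO2eq/kWh")
```

With the figures above this comes to about 369 gCO2eq per kWh, which reflects the electricity mix assumed for the measurement location rather than a property of the model itself.
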
## Limitations

- Text classification using XGBoost
- Input text is vectorized with TF-IDF
- XGBoost hyperparameters are searched with RandomizedSearchCV
- Serves as a baseline reference
- Not suitable for real-world applications

## Ethical Considerations

- Dataset contains sensitive topics related to climate disinformation
- Model is a baseline and should not be used for actual classification
- Environmental impact is tracked to promote awareness of AI's carbon footprint