frugal-ai-challenge

Thomas Boulier

docs: add evaluation instructions in README.md

22f1da2 9 months ago

	---
	title: Frugal AI Challenge Submission
	emoji: 🌍
	colorFrom: blue
	colorTo: green
	sdk: docker
	pinned: false
	---


	# Models for Climate Disinformation Classification

	## Evaluate locally

	To evaluate a model locally, run:

	```bash
	python main.py --config config_evaluation_{model_name}.json
	```

	where `{model_name}` is either `distilBERT` or `embeddingML`.
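To run both evaluations in one go, the command above can be wrapped in a short script. This is a minimal sketch: it assumes `main.py` and both config files sit in the current working directory, and the helper names (`config_path`, `evaluation_command`, `run_all`) are hypothetical, not part of the repository.

```python
import subprocess
import sys

# Model names listed in this README.
MODEL_NAMES = ("distilBERT", "embeddingML")


def config_path(model_name: str) -> str:
    """Return the evaluation config file name for a given model."""
    if model_name not in MODEL_NAMES:
        raise ValueError(f"unknown model name: {model_name!r}")
    return f"config_evaluation_{model_name}.json"


def evaluation_command(model_name: str) -> list[str]:
    """Build the argument list equivalent to the shell command above."""
    return [sys.executable, "main.py", "--config", config_path(model_name)]


def run_all() -> None:
    """Evaluate every model in turn (hypothetical convenience wrapper)."""
    for name in MODEL_NAMES:
        # check=True stops at the first evaluation that exits with an error.
        subprocess.run(evaluation_command(name), check=True)
```

Calling `run_all()` from the repository root would then evaluate `distilBERT` first and `embeddingML` second.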


	## Models Description

	### DistilBERT Model

	The model uses the `distilbert-base-uncased` model from the Hugging Face Transformers library, fine-tuned on the
	training dataset (see below).

	### Embedding + ML Model

	The model uses an embedding step followed by a classic ML model. Currently, the embedding step is a TF-IDF
	vectorizer, and the ML model is a logistic regression.
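A TF-IDF vectorizer feeding a logistic regression can be sketched with scikit-learn as follows. This is an illustration under assumptions, not the repository's actual code: the use of scikit-learn, the default hyperparameters, the function name `build_embedding_ml_model`, and the toy sentences are all made up for the example.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline


def build_embedding_ml_model():
    """TF-IDF features followed by logistic regression (hypothetical setup)."""
    return make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))


# Invented toy examples, NOT taken from the challenge dataset.
# Label 0 = "No relevant claim detected", 1 = "Global warming is not happening".
toy_texts = [
    "The match ended in a draw last night",
    "The city council approved the new budget",
    "Global temperatures have not risen at all",
    "There is no warming, the planet is cooling",
]
toy_labels = [0, 0, 1, 1]

model = build_embedding_ml_model()
model.fit(toy_texts, toy_labels)

# One predicted label id per input text.
predictions = model.predict(["The planet is not warming"])
```

Wrapping both steps in one pipeline means the vectorizer is fitted only on the training texts and reused unchanged at inference time.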

	## Training Data

	The models use the [`QuotaClimat/frugalaichallenge-text-train`](https://huggingface.co/datasets/QuotaClimat/frugalaichallenge-text-train) dataset:
	- Size: ~6000 examples
	- Split: 80% train, 20% test
	- 8 labels: 7 categories of climate disinformation claims, plus one label for text with no relevant claim
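The 80/20 split can be illustrated with a seeded shuffle of example indices. This is only a sketch: the README does not say how the split is produced, so the shuffling strategy, the seed, and the function name `train_test_split_indices` are assumptions.

```python
import random


def train_test_split_indices(n_examples, test_fraction=0.2, seed=42):
    """Shuffle indices reproducibly and cut them into train and test parts."""
    indices = list(range(n_examples))
    random.Random(seed).shuffle(indices)  # local RNG, global state untouched
    n_test = round(n_examples * test_fraction)
    return indices[n_test:], indices[:n_test]


# 6000 is the approximate dataset size stated above.
train_idx, test_idx = train_test_split_indices(6000)
```

With roughly 6000 examples this gives about 4800 training examples and 1200 test examples.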

	### Labels
	0. No relevant claim detected
	1. Global warming is not happening
	2. Not caused by humans
	3. Not bad or beneficial
	4. Solutions harmful/unnecessary
	5. Science is unreliable
	6. Proponents are biased
	7. Fossil fuels are needed
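For decoding predictions, the label list above can be written as a lookup table. The names `ID2LABEL`, `LABEL2ID`, and `decode` are hypothetical, and the label strings stored in the dataset itself may be formatted differently.

```python
# Integer ids and descriptions exactly as listed in this README.
ID2LABEL = {
    0: "No relevant claim detected",
    1: "Global warming is not happening",
    2: "Not caused by humans",
    3: "Not bad or beneficial",
    4: "Solutions harmful/unnecessary",
    5: "Science is unreliable",
    6: "Proponents are biased",
    7: "Fossil fuels are needed",
}

# Reverse mapping, from description back to integer id.
LABEL2ID = {name: idx for idx, name in ID2LABEL.items()}


def decode(predictions):
    """Turn a sequence of predicted ids into readable label descriptions."""
    return [ID2LABEL[int(p)] for p in predictions]
```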

	## Performance

	### Metrics
	- Accuracy: ~12.5% (random chance with 8 classes)
	- Environmental Impact:
	  - Emissions tracked in gCO2eq
	  - Energy consumption tracked in Wh

	### Model Architecture
	The model chooses at random among the 8 possible labels, serving as the simplest possible baseline.
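The ~12.5% figure follows from guessing among 8 labels: if each label is picked with probability 1/8 regardless of the input, the expected accuracy is 1/8 = 0.125 whatever the true label distribution. A minimal simulation, assuming uniform sampling (the function names and the synthetic labels are illustrative, not the repository's code):

```python
import random

NUM_LABELS = 8


def random_baseline(texts, seed=None):
    """Ignore the input texts and draw one label id per text uniformly."""
    rng = random.Random(seed)
    return [rng.randrange(NUM_LABELS) for _ in texts]


def accuracy(y_true, y_pred):
    """Fraction of positions where prediction and ground truth agree."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


# Synthetic ground truth, used only to check the chance level empirically.
label_rng = random.Random(0)
y_true = [label_rng.randrange(NUM_LABELS) for _ in range(80_000)]
y_pred = random_baseline(y_true, seed=1)

expected_accuracy = 1 / NUM_LABELS  # 0.125
observed_accuracy = accuracy(y_true, y_pred)
```

On a sample this large, `observed_accuracy` should land close to `expected_accuracy`.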

	## Environmental Impact

	Environmental impact is tracked using CodeCarbon, measuring:
	- Carbon emissions during inference
	- Energy consumption during inference

	This tracking helps establish a baseline for the environmental impact of model deployment and inference.
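Inference could be wrapped in a CodeCarbon tracker along the following lines. This is a hedged sketch, not the repository's code: `track_inference`, `predict_fn`, and the task name `"inference"` are hypothetical, and the attribute names on the returned data object should be checked against the installed CodeCarbon version. CodeCarbon reports emissions in kg CO2eq and energy in kWh, so both values are multiplied by 1000 to get the gCO2eq and Wh units used in this README.

```python
def kg_to_g(kilograms):
    """Convert kg CO2eq to g CO2eq."""
    return kilograms * 1000.0


def kwh_to_wh(kilowatt_hours):
    """Convert kWh to Wh."""
    return kilowatt_hours * 1000.0


def track_inference(predict_fn, inputs):
    """Run predict_fn(inputs) under a CodeCarbon tracker (hypothetical helper)."""
    # Imported lazily so the converters above work without CodeCarbon installed.
    from codecarbon import EmissionsTracker

    tracker = EmissionsTracker()
    tracker.start()
    tracker.start_task("inference")
    try:
        outputs = predict_fn(inputs)
    finally:
        data = tracker.stop_task()  # emissions data for the task just ended
        tracker.stop()

    return outputs, {
        "emissions_gco2eq": kg_to_g(data.emissions),
        "energy_consumed_wh": kwh_to_wh(data.energy_consumed),
    }
```

A call such as `track_inference(model.predict, texts)` would return the predictions together with the two measurements.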

	## Limitations
	- Makes completely random predictions
	- No learning or pattern recognition
	- No consideration of input text
	- Serves only as a baseline reference
	- Not suitable for any real-world applications

	## Ethical Considerations

	- Dataset contains sensitive topics related to climate disinformation
	- Model makes random predictions and should not be used for actual classification
	- Environmental impact is tracked to promote awareness of AI's carbon footprint