Model Card for Flower Color Classification AutoML Predictor

This model predicts flower colors (Red, Yellow, Orange, Purple, Pink, White) based on physical characteristics using AutoGluon's AutoML capabilities.

Model Details

Model Description

This AutoML model classifies flowers into 6 color categories based on physical measurements including flower diameter, petal dimensions, petal count, and stem height. The model was trained using AutoGluon's TabularPredictor with automated hyperparameter tuning and model selection.

Developed by: Mary Zhang
Model type: AutoML Ensemble (WeightedEnsemble_L2)
Language(s): English
License: MIT
Finetuned from model: N/A (trained from scratch)

Model Sources

Repository: https://huggingface.co/maryzhang/24679-tabular-autolguon-predictor-flower-color
Dataset: scottymcgee/flowers

Uses

Direct Use

This model can be used to predict flower colors based on physical measurements. It's suitable for:

Educational purposes in machine learning and AutoML
Botanical classification assistance
Demonstrating tabular classification techniques

Out-of-Scope Use

Scientific flower taxonomy or species identification
Production botanical research without independent validation
Flower types that differ significantly from the training data, where performance may degrade

Bias, Risks, and Limitations

Limited scope: Trained on a small synthetic dataset (300 samples) derived from 30 original flower measurements
Synthetic data bias: Model may overfit to synthetic data patterns that don't reflect natural variation
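Generalization concerns: Perfect validation performance (100% accuracy) may reflect overfitting or leakage rather than true generalization, since the training and validation samples are both noisy bootstrap copies of the same 30 original measurements
Small feature set: Only 5 physical measurements may not capture the full complexity of flower color determination
Class imbalance: Classes are unevenly represented; in the 60-sample validation split, Pink has 18 samples while Purple and Yellow have 6 each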

Recommendations

Users should be aware that this model achieved perfect performance on synthetic validation data, which may not reflect real-world performance. For practical applications, additional validation on diverse, real flower datasets is recommended.

How to Get Started with the Model

import cloudpickle
import pandas as pd
from huggingface_hub import hf_hub_download

# Download and load the model (unpickling requires autogluon to be installed)
model_path = hf_hub_download(
    repo_id="maryzhang/24679-tabular-autolguon-predictor-flower-color",
    filename="autogluon_predictor.pkl"
)

with open(model_path, "rb") as f:
    predictor = cloudpickle.load(f)

# Prepare sample data
sample_data = pd.DataFrame({
    'flower_diameter_cm': [5.5],
    'petal_length_cm': [3.0],
    'petal_width_cm': [1.2],
    'petal_count': [15],
    'stem_height_cm': [45.0]
})

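# Make predictions (both calls return pandas objects indexed like sample_data)
prediction = predictor.predict(sample_data)
probabilities = predictor.predict_proba(sample_data)

# Use positional access so this works regardless of the input index
print(f"Predicted color: {prediction.iloc[0]}")
print(f"Probabilities: {probabilities.iloc[0].to_dict()}")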

Training Details

Training Data

The model was trained on the scottymcgee/flowers dataset:

Source: 30 original flower measurements expanded to 300 synthetic samples
Features: 5 numerical features (flower_diameter_cm, petal_length_cm, petal_width_cm, petal_count, stem_height_cm)
Target: 6 color classes (Red, Yellow, Orange, Purple, Pink, White)
Synthetic augmentation: Bootstrap sampling with controlled Gaussian noise (10% of feature standard deviation)
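
The augmentation step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the dataset author's actual code: the helper name `augment`, the label column name `color`, the seed, and the toy rows are all hypothetical, and only the standard library is used so the sketch runs without extra dependencies.

```python
import random
import statistics

FEATURES = [
    "flower_diameter_cm",
    "petal_length_cm",
    "petal_width_cm",
    "petal_count",
    "stem_height_cm",
]

def augment(rows, n_samples, noise_frac=0.10, seed=0):
    """Bootstrap-resample rows and add Gaussian noise to every feature.

    noise_frac is the noise standard deviation expressed as a fraction
    of each feature's standard deviation across the original rows.
    """
    rng = random.Random(seed)
    # Per-feature noise scale, computed once from the original rows.
    sigma = {
        f: noise_frac * statistics.stdev([r[f] for r in rows])
        for f in FEATURES
    }
    out = []
    for _ in range(n_samples):
        base = rng.choice(rows)   # sampling with replacement
        new = dict(base)          # the label is copied unchanged
        for f in FEATURES:
            # Note: petal_count becomes non-integer here; whether the real
            # pipeline rounds it is not stated in this card.
            new[f] = base[f] + rng.gauss(0.0, sigma[f])
        out.append(new)
    return out

# Toy rows for illustration only (not taken from the dataset).
original = [
    {"flower_diameter_cm": 5.5, "petal_length_cm": 3.0, "petal_width_cm": 1.2,
     "petal_count": 15, "stem_height_cm": 45.0, "color": "Red"},
    {"flower_diameter_cm": 7.0, "petal_length_cm": 4.1, "petal_width_cm": 1.8,
     "petal_count": 21, "stem_height_cm": 60.0, "color": "Yellow"},
    {"flower_diameter_cm": 4.2, "petal_length_cm": 2.4, "petal_width_cm": 0.9,
     "petal_count": 8, "stem_height_cm": 30.0, "color": "Pink"},
]
synthetic = augment(original, n_samples=300)
print(len(synthetic))
```

Because every synthetic row is a lightly perturbed copy of one original row, rows that share a source are close to duplicates of each other.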

Training Procedure

Training Hyperparameters

Framework: AutoGluon TabularPredictor
Time limit: 300 seconds (5 minutes)
Preset: best_quality
Problem type: multiclass
Evaluation metric: accuracy
Training regime: Automated hyperparameter optimization
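
The settings listed above map onto AutoGluon's `TabularPredictor` roughly as shown below. This is a sketch rather than the author's training script: the label column name `color` and the helper `train_predictor` are assumptions, and the AutoGluon import is deferred into the function so the settings can be inspected without installing the library.

```python
# Settings taken from the list above; the label column name is an assumption.
LABEL = "color"

PREDICTOR_KWARGS = {
    "label": LABEL,
    "problem_type": "multiclass",
    "eval_metric": "accuracy",
}

FIT_KWARGS = {
    "time_limit": 300,          # seconds
    "presets": "best_quality",
}

def train_predictor(train_df):
    """Fit a TabularPredictor on a pandas DataFrame that includes LABEL."""
    # Imported lazily so this module loads even without autogluon installed.
    from autogluon.tabular import TabularPredictor

    predictor = TabularPredictor(**PREDICTOR_KWARGS)
    predictor.fit(train_df, **FIT_KWARGS)
    return predictor
```

After fitting, `predictor.leaderboard()` lists the trained models and their validation scores, which is one way to confirm which ensemble was selected.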

Speeds, Sizes, Times

Training time: ~300 seconds
Best model: WeightedEnsemble_L2 (ensemble method)
Training data: 240 samples (80% of augmented data)
Validation data: 60 samples (20% of augmented data)
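
The 240/60 figures follow from an 80/20 split of the 300 augmented rows. A minimal sketch of such a split is below; the helper name `split_80_20` and the seed are hypothetical, and this card does not state whether the actual split was stratified by color or simply shuffled as here.

```python
import random

def split_80_20(rows, train_frac=0.8, seed=0):
    """Shuffle row indices and cut them into train and validation parts."""
    rng = random.Random(seed)
    idx = list(range(len(rows)))
    rng.shuffle(idx)
    cut = int(round(train_frac * len(rows)))
    train = [rows[i] for i in idx[:cut]]
    val = [rows[i] for i in idx[cut:]]
    return train, val

rows = list(range(300))  # stand-in for the 300 augmented samples
train, val = split_80_20(rows)
print(len(train), len(val))  # 240 60
```

A random split like this places perturbed copies of the same original measurement on both sides, which is worth keeping in mind when reading the validation scores.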

Evaluation

Testing Data, Factors & Metrics

Testing Data

Validation set: 60 samples from augmented data (synthetic)
Test set: 30 samples from original data (real measurements)

Metrics

Primary metric: Accuracy
Secondary metrics: Weighted F1-Score, Macro F1-Score
Classification report: Per-class precision, recall, F1-score
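
To make the difference between the listed metrics concrete, the sketch below computes accuracy, macro F1, and weighted F1 from scratch. The function name `classification_summary` and the toy labels are illustrative assumptions, not part of the original evaluation code; macro F1 averages per-class F1 equally, while weighted F1 weights each class by its support.

```python
from collections import Counter

def classification_summary(y_true, y_pred):
    """Return accuracy, macro F1, weighted F1, and per-class metrics."""
    labels = sorted(set(y_true) | set(y_pred))
    support = Counter(y_true)
    pairs = list(zip(y_true, y_pred))
    per_class = {}
    for c in labels:
        tp = sum(1 for t, p in pairs if t == c and p == c)
        fp = sum(1 for t, p in pairs if t != c and p == c)
        fn = sum(1 for t, p in pairs if t == c and p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        per_class[c] = {
            "precision": precision,
            "recall": recall,
            "f1": f1,
            "support": support[c],
        }
    n = len(y_true)
    return {
        "accuracy": sum(1 for t, p in pairs if t == p) / n,
        # Unweighted mean of per-class F1.
        "macro_f1": sum(m["f1"] for m in per_class.values()) / len(labels),
        # Per-class F1 weighted by the number of true samples in the class.
        "weighted_f1": sum(m["f1"] * m["support"] for m in per_class.values()) / n,
        "per_class": per_class,
    }

# Toy labels for illustration only.
y_true = ["Pink", "Pink", "Pink", "Red", "Red", "Yellow"]
y_pred = ["Pink", "Pink", "Red", "Red", "Red", "Pink"]
summary = classification_summary(y_true, y_pred)
print(summary["accuracy"], summary["macro_f1"], summary["weighted_f1"])
```

In this toy case the missed minority class (Yellow) pulls macro F1 well below weighted F1, which is why reporting both is informative when classes are imbalanced.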

Results

Validation Set Performance (Synthetic Data)

Accuracy: 100.00%
Weighted F1-Score: 100.00%
Macro F1-Score: 100.00%

Test Set Performance (Original Data)

Accuracy: [TO_UPDATE - run on original data]
Weighted F1-Score: [TO_UPDATE]
Macro F1-Score: [TO_UPDATE]

Per-Class Performance (Validation Set)

Class	Precision	Recall	F1-Score	Support
Orange	1.00	1.00	1.00	10
Pink	1.00	1.00	1.00	18
Purple	1.00	1.00	1.00	6
Red	1.00	1.00	1.00	10
White	1.00	1.00	1.00	10
Yellow	1.00	1.00	1.00	6
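
As a point of reference for the table above, the support column alone gives each class's share of the validation set and the accuracy of a trivial majority-class predictor. The short calculation below uses only the support values from the table; the variable names are illustrative.

```python
# Support values copied from the per-class table above.
support = {
    "Orange": 10,
    "Pink": 18,
    "Purple": 6,
    "Red": 10,
    "White": 10,
    "Yellow": 6,
}

total = sum(support.values())
shares = {color: n / total for color, n in support.items()}

# A predictor that always outputs the most frequent class.
majority_class = max(support, key=support.get)
baseline_accuracy = support[majority_class] / total

print(total, majority_class, baseline_accuracy)  # 60 Pink 0.3
```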

Summary

The model achieved perfect performance on the validation set, which suggests either that it learned the underlying patterns well or that it overfit to the synthetic data. Results on the 30 original measurements are still pending. Because the synthetic samples were generated from those same measurements, that test will give better, but still limited, insight into real-world generalization.

Environmental Impact

Training ran under AutoGluon's 5-minute (300-second) time limit, which bounds the compute used. No energy or emissions figures are reported.

Technical Specifications

Model Architecture and Objective

Base framework: AutoGluon TabularPredictor
Best model: WeightedEnsemble_L2 (ensemble of multiple base models)
Objective: Multi-class classification with 6 target classes
Feature preprocessing: Automated by AutoGluon
Model selection: Automated hyperparameter optimization

Compute Infrastructure

Software

AutoGluon: Latest version
Python: 3.x
Dependencies: pandas, scikit-learn, cloudpickle

Citation

@misc{flower_color_automl_2024,
  title={Flower Color Classification using AutoGluon},
  author={Mary Zhang},
  year={2024},
  url={https://huggingface.co/maryzhang/24679-tabular-autolguon-predictor-flower-color}
}

Dataset Citation:

@dataset{scottymcgee_flowers_2024,
  title={Tabular Flower Data},
  author={Scotty McGee},
  year={2024},
  url={https://huggingface.co/datasets/scottymcgee/flowers}
}

Model Card Authors

Mary Zhang

Model Card Contact

[email protected]

AI Usage

Claude was used for editing functions and debugging code.
