Model Card for Flower Color Classification AutoML Predictor

This model predicts flower colors (Red, Yellow, Orange, Purple, Pink, White) based on physical characteristics using AutoGluon's AutoML capabilities.

Model Details

Model Description

This AutoML model classifies flowers into 6 color categories based on physical measurements including flower diameter, petal dimensions, petal count, and stem height. The model was trained using AutoGluon's TabularPredictor with automated hyperparameter tuning and model selection.

  • Developed by: Mary Zhang
  • Model type: AutoML Ensemble (WeightedEnsemble_L2)
  • Language(s): English
  • License: MIT
  • Finetuned from model: N/A (trained from scratch)

Model Sources

Uses

Direct Use

This model can be used to predict flower colors based on physical measurements. It's suitable for:

  • Educational purposes in machine learning and AutoML
  • Botanical classification assistance
  • Demonstrating tabular classification techniques

Out-of-Scope Use

  • This model should not be used for scientific flower taxonomy or species identification
  • Not intended for production botanical research without validation
  • Performance may degrade on flower types significantly different from the training data

Bias, Risks, and Limitations

  • Limited scope: Trained on a small synthetic dataset (300 samples) derived from 30 original flower measurements
  • Synthetic data bias: Model may overfit to synthetic data patterns that don't reflect natural variation
  • Generalization concerns: Perfect validation performance (100% accuracy) suggests potential overfitting
  • Small feature set: Only 5 physical measurements may not capture full complexity of flower color determination
  • Class imbalance: Some colors may be underrepresented in the original dataset

Recommendations

Users should be aware that this model achieved perfect performance on synthetic validation data, which may not reflect real-world performance. For practical applications, additional validation on diverse, real flower datasets is recommended.

How to Get Started with the Model

import cloudpickle
import pandas as pd
from huggingface_hub import hf_hub_download

# Download and load the model
model_path = hf_hub_download(
    repo_id="your-username/flower-color-automl-predictor",
    filename="autogluon_predictor.pkl"
)

with open(model_path, "rb") as f:
    predictor = cloudpickle.load(f)

# Prepare sample data
sample_data = pd.DataFrame({
    'flower_diameter_cm': [5.5],
    'petal_length_cm': [3.0],
    'petal_width_cm': [1.2],
    'petal_count': [15],
    'stem_height_cm': [45.0]
})

# Make predictions
prediction = predictor.predict(sample_data)
probabilities = predictor.predict_proba(sample_data)

print(f"Predicted color: {prediction.iloc[0]}")  # positional access, independent of the index labels
print(f"Probabilities: {probabilities.iloc[0].to_dict()}")
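
Before calling predict, it can help to confirm that the input frame carries the five feature columns listed in this card. The helper below is a hypothetical convenience, not part of AutoGluon; the name check_features and the choice to raise ValueError are assumptions made for illustration.

```python
import pandas as pd

# Feature names as listed in this model card.
EXPECTED_FEATURES = [
    "flower_diameter_cm",
    "petal_length_cm",
    "petal_width_cm",
    "petal_count",
    "stem_height_cm",
]


def check_features(df: pd.DataFrame) -> pd.DataFrame:
    """Raise if a feature column is missing; return the columns in card order."""
    missing = [col for col in EXPECTED_FEATURES if col not in df.columns]
    if missing:
        raise ValueError(f"Missing feature columns: {missing}")
    return df[EXPECTED_FEATURES]


# Columns deliberately given in a different order to show the reordering.
sample = pd.DataFrame({
    "stem_height_cm": [45.0],
    "petal_count": [15],
    "flower_diameter_cm": [5.5],
    "petal_width_cm": [1.2],
    "petal_length_cm": [3.0],
})
checked = check_features(sample)
print(list(checked.columns))
```

The returned frame can then be passed to predictor.predict in place of the raw input.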

Training Details

Training Data

The model was trained on the scottymcgee/flowers dataset:

  • Source: 30 original flower measurements expanded to 300 synthetic samples
  • Features: 5 numerical features (flower_diameter_cm, petal_length_cm, petal_width_cm, petal_count, stem_height_cm)
  • Target: 6 color classes (Red, Yellow, Orange, Purple, Pink, White)
  • Synthetic augmentation: Bootstrap sampling with controlled Gaussian noise (10% of feature standard deviation)
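
The augmentation step above can be sketched roughly as follows. This is an illustration under stated assumptions, not the dataset author's actual script: the function name augment, the random seeds, the target column name color, and the decision to leave petal_count unrounded after adding noise are all hypothetical.

```python
import numpy as np
import pandas as pd

FEATURES = [
    "flower_diameter_cm",
    "petal_length_cm",
    "petal_width_cm",
    "petal_count",
    "stem_height_cm",
]
COLORS = ["Red", "Yellow", "Orange", "Purple", "Pink", "White"]


def augment(original: pd.DataFrame, n_samples: int = 300,
            noise_scale: float = 0.10, seed: int = 0) -> pd.DataFrame:
    """Bootstrap rows with replacement, then jitter each numeric feature."""
    rng = np.random.default_rng(seed)
    # Bootstrap: draw row positions with replacement from the original table.
    picks = rng.integers(0, len(original), size=n_samples)
    sampled = original.iloc[picks].reset_index(drop=True)
    # Gaussian noise with sigma = noise_scale * per-feature standard deviation.
    sigma = noise_scale * original[FEATURES].std().to_numpy()
    noise = rng.normal(0.0, sigma, size=(n_samples, len(FEATURES)))
    # Labels are copied unchanged; only the features are perturbed.
    # Note: petal_count becomes non-integer here (an assumption of this sketch).
    sampled[FEATURES] = sampled[FEATURES].to_numpy(dtype=float) + noise
    return sampled


# Toy stand-in for the 30 original rows (values are made up for illustration).
demo_rng = np.random.default_rng(1)
original = pd.DataFrame(
    demo_rng.uniform(1.0, 10.0, size=(30, len(FEATURES))), columns=FEATURES
)
original["color"] = demo_rng.choice(COLORS, size=30)

augmented = augment(original)
print(augmented.shape)  # (300, 6)
```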

Training Procedure

Training Hyperparameters

  • Framework: AutoGluon TabularPredictor
  • Time limit: 300 seconds (5 minutes)
  • Preset: best_quality
  • Problem type: multiclass
  • Evaluation metric: accuracy
  • Training regime: Automated hyperparameter optimization
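
As a rough sketch, the settings above map onto AutoGluon's TabularPredictor API as shown below. The label column name color and the function name train_predictor are assumptions; the card does not include the original training script.

```python
import pandas as pd

# Settings copied from the hyperparameter list in this card.
PREDICTOR_KWARGS = {"problem_type": "multiclass", "eval_metric": "accuracy"}
FIT_KWARGS = {"time_limit": 300, "presets": "best_quality"}


def train_predictor(train_df: pd.DataFrame, label: str = "color"):
    """Fit a TabularPredictor with the settings listed in this card."""
    # Imported lazily so this module loads even where AutoGluon is absent.
    from autogluon.tabular import TabularPredictor

    predictor = TabularPredictor(label=label, **PREDICTOR_KWARGS)
    # fit() returns the fitted predictor itself.
    return predictor.fit(train_data=train_df, **FIT_KWARGS)
```

After fitting, predictor.leaderboard() lists the trained models and their validation scores, which is one way to confirm which ensemble was selected.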

Speeds, Sizes, Times

  • Training time: ~300 seconds
  • Best model: WeightedEnsemble_L2 (ensemble method)
  • Training data: 240 samples (80% of augmented data)
  • Validation data: 60 samples (20% of augmented data)
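
The 240/60 split above can be reproduced along these lines. The card does not say how the split was made or whether it was stratified, so the use of DataFrame.sample, the seed, and the function name split_80_20 are assumptions.

```python
import pandas as pd


def split_80_20(df: pd.DataFrame, seed: int = 42):
    """Return (train, validation) from a random 80/20 row split.

    Assumes the frame has a unique index, since rows are dropped by label.
    """
    validation = df.sample(frac=0.20, random_state=seed)
    train = df.drop(index=validation.index)
    return train, validation


# Placeholder frame with 300 rows, standing in for the augmented data.
augmented = pd.DataFrame({"row_id": range(300)})
train_df, val_df = split_80_20(augmented)
print(len(train_df), len(val_df))  # 240 60
```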

Evaluation

Testing Data, Factors & Metrics

Testing Data

  • Validation set: 60 samples from augmented data (synthetic)
  • Test set: 30 samples from original data (real measurements)

Metrics

  • Primary metric: Accuracy
  • Secondary metrics: Weighted F1-Score, Macro F1-Score
  • Classification report: Per-class precision, recall, F1-score
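
To make the difference between the two F1 averages concrete, here is a small dependency-free sketch. The function name f1_summary and the toy labels are illustrative only. In practice, sklearn.metrics.f1_score with average="macro" or average="weighted" computes the same quantities.

```python
from collections import Counter


def f1_summary(y_true, y_pred):
    """Return accuracy, macro F1, and support-weighted F1 for two label lists."""
    labels = sorted(set(y_true) | set(y_pred))
    support = Counter(y_true)  # number of true samples per class
    pairs = list(zip(y_true, y_pred))
    per_class_f1 = {}
    for label in labels:
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never appears.
        per_class_f1[label] = 2 * tp / denom if denom else 0.0
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    # Macro: every class counts equally, regardless of size.
    macro_f1 = sum(per_class_f1.values()) / len(labels)
    # Weighted: each class counts in proportion to its support.
    weighted_f1 = sum(per_class_f1[c] * support[c] for c in labels) / len(pairs)
    return {"accuracy": accuracy, "macro_f1": macro_f1, "weighted_f1": weighted_f1}


# Toy labels (not model output): one Red flower is misclassified as Pink.
y_true = ["Red", "Red", "Pink", "Pink", "Pink", "White"]
y_pred = ["Red", "Pink", "Pink", "Pink", "Pink", "White"]
scores = f1_summary(y_true, y_pred)
print(scores)
```

With imbalanced classes the two averages diverge: an error in a small class lowers macro F1 more than weighted F1.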

Results

Validation Set Performance (Synthetic Data)

  • Accuracy: 100.00%
  • Weighted F1-Score: 100.00%
  • Macro F1-Score: 100.00%

Test Set Performance (Original Data)

  • Accuracy: [TO_UPDATE - run on original data]
  • Weighted F1-Score: [TO_UPDATE]
  • Macro F1-Score: [TO_UPDATE]

Per-Class Performance (Validation Set)

Class     Precision   Recall   F1-Score   Support
Orange    1.00        1.00     1.00       10
Pink      1.00        1.00     1.00       18
Purple    1.00        1.00     1.00       6
Red       1.00        1.00     1.00       10
White     1.00        1.00     1.00       10
Yellow    1.00        1.00     1.00       6

Summary

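The model scored 100% on every validation metric. This result should be read with caution: the 300 synthetic samples are noisy bootstrap copies of the same 30 original rows, so validation rows closely resemble training rows, and a perfect score may reflect overfitting to the synthetic data rather than learning of the underlying patterns. Performance on the original test set, once reported, will give better insight into generalization, although those 30 rows are also the source of the augmented data, so validation on independent real measurements remains advisable.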

Environmental Impact

Training ran under AutoGluon's 5-minute time limit, which kept computational resource usage low.

Technical Specifications

Model Architecture and Objective

  • Base framework: AutoGluon TabularPredictor
  • Best model: WeightedEnsemble_L2 (ensemble of multiple base models)
  • Objective: Multi-class classification with 6 target classes
  • Feature preprocessing: Automated by AutoGluon
  • Model selection: Automated hyperparameter optimization

Compute Infrastructure

Software

  • AutoGluon: Latest version
  • Python: 3.x
  • Dependencies: pandas, scikit-learn, cloudpickle

Citation

@model{flower_color_automl_2024,
  title={Flower Color Classification using AutoGluon},
  author={Mary Zhang},
  year={2024},
  url={https://huggingface.co/your-username/flower-color-automl-predictor}
}

Dataset Citation:

@dataset{scottymcgee_flowers_2024,
  title={Tabular Flower Data},
  author={Scotty McGee},
  year={2024},
  url={https://huggingface.co/datasets/scottymcgee/flowers}
}

Model Card Authors

Mary Zhang

Model Card Contact

[email protected]

AI Usage

Claude was used to edit functions and debug code.
