Model Card for Flower Color Classification AutoML Predictor
This model predicts flower colors (Red, Yellow, Orange, Purple, Pink, White) based on physical characteristics using AutoGluon's AutoML capabilities.
Model Details
Model Description
This AutoML model classifies flowers into 6 color categories based on physical measurements including flower diameter, petal dimensions, petal count, and stem height. The model was trained using AutoGluon's TabularPredictor with automated hyperparameter tuning and model selection.
- Developed by: Mary Zhang
- Model type: AutoML Ensemble (WeightedEnsemble_L2)
- Language(s): English
- License: MIT
- Finetuned from model: N/A (trained from scratch)
Model Sources
- Repository: https://huggingface.co/maryzhang/24679-tabular-autolguon-predictor-flower-color
- Dataset: scottymcgee/flowers
Uses
Direct Use
This model can be used to predict flower colors based on physical measurements. It's suitable for:
- Educational purposes in machine learning and AutoML
- Botanical classification assistance
- Demonstrating tabular classification techniques
Out-of-Scope Use
- This model should not be used for scientific flower taxonomy or species identification
- Not intended for production botanical research without validation
- Performance may degrade on flower types significantly different from the training data
Bias, Risks, and Limitations
- Limited scope: Trained on a small synthetic dataset (300 samples) derived from 30 original flower measurements
- Synthetic data bias: Model may overfit to synthetic data patterns that don't reflect natural variation
- Generalization concerns: Perfect validation performance (100% accuracy) suggests potential overfitting
- Small feature set: Only 5 physical measurements may not capture the full complexity of flower color determination
- Class imbalance: Some colors may be underrepresented in the original dataset
Recommendations
Users should be aware that this model achieved perfect performance on synthetic validation data, which may not reflect real-world performance. For practical applications, additional validation on diverse, real flower datasets is recommended.
How to Get Started with the Model
```python
import cloudpickle  # AutoGluon must also be installed to unpickle the predictor
import pandas as pd
from huggingface_hub import hf_hub_download

# Download and load the pickled AutoGluon predictor
model_path = hf_hub_download(
    repo_id="maryzhang/24679-tabular-autolguon-predictor-flower-color",
    filename="autogluon_predictor.pkl",
)
with open(model_path, "rb") as f:
    predictor = cloudpickle.load(f)

# Prepare one sample with the five training features
sample_data = pd.DataFrame({
    "flower_diameter_cm": [5.5],
    "petal_length_cm": [3.0],
    "petal_width_cm": [1.2],
    "petal_count": [15],
    "stem_height_cm": [45.0],
})

# Predict the color label (Series) and per-class probabilities (DataFrame)
prediction = predictor.predict(sample_data)
probabilities = predictor.predict_proba(sample_data)
print(f"Predicted color: {prediction.iloc[0]}")
print(f"Probabilities: {probabilities.iloc[0].to_dict()}")
```
Training Details
Training Data
The model was trained on the scottymcgee/flowers dataset:
- Source: 30 original flower measurements expanded to 300 synthetic samples
- Features: 5 numerical features (flower_diameter_cm, petal_length_cm, petal_width_cm, petal_count, stem_height_cm)
- Target: 6 color classes (Red, Yellow, Orange, Purple, Pink, White)
- Synthetic augmentation: Bootstrap sampling with controlled Gaussian noise (10% of feature standard deviation)
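A minimal sketch of the augmentation described above, using pandas and NumPy. It assumes noise is added independently to each of the five numeric features. The card does not publish the author's augmentation code, and the helper name `augment` and the seed are hypothetical. Rows are drawn with replacement from the original data, and each feature is then jittered by Gaussian noise whose standard deviation is 10% of that feature's standard deviation. The color label of each resampled row is copied unchanged.

```python
import numpy as np
import pandas as pd

FEATURES = ["flower_diameter_cm", "petal_length_cm", "petal_width_cm",
            "petal_count", "stem_height_cm"]

def augment(original: pd.DataFrame, n_samples: int = 300,
            noise_frac: float = 0.10, seed: int = 0) -> pd.DataFrame:
    """Hypothetical helper: bootstrap rows, then jitter features with Gaussian noise."""
    rng = np.random.default_rng(seed)
    # Bootstrap: draw row positions uniformly with replacement.
    idx = rng.integers(0, len(original), size=n_samples)
    synthetic = original.iloc[idx].reset_index(drop=True).copy()
    for col in FEATURES:
        # Noise std is noise_frac times this column's std in the original data.
        scale = noise_frac * original[col].std()
        synthetic[col] = synthetic[col].astype(float) + rng.normal(0.0, scale, size=n_samples)
    # Non-feature columns (e.g. the color label) are carried over unchanged.
    return synthetic
```

In this sketch, `petal_count` becomes non-integer after jittering. The card does not say whether the original pipeline rounded it.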
Training Procedure
Training Hyperparameters
- Framework: AutoGluon TabularPredictor
- Time limit: 300 seconds (5 minutes)
- Preset: best_quality
- Problem type: multiclass
- Evaluation metric: accuracy
- Training regime: Automated hyperparameter optimization
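The settings above map roughly onto the arguments of AutoGluon's `TabularPredictor` constructor and its `fit` method, as sketched below. This is not the author's training script. The label column name `color` is an assumption, and `train` is a hypothetical helper that imports AutoGluon only when it is called.

```python
import pandas as pd

# Settings from the hyperparameter list; the label name "color" is an assumption.
PREDICTOR_KWARGS = {
    "label": "color",
    "problem_type": "multiclass",
    "eval_metric": "accuracy",
}
FIT_KWARGS = {
    "time_limit": 300,           # seconds (5 minutes)
    "presets": "best_quality",
}

def train(train_data: pd.DataFrame):
    """Hypothetical helper: fit a TabularPredictor with the settings above."""
    # Imported lazily so the settings can be inspected without AutoGluon installed.
    from autogluon.tabular import TabularPredictor

    predictor = TabularPredictor(**PREDICTOR_KWARGS)
    predictor.fit(train_data, **FIT_KWARGS)
    return predictor
```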
Speeds, Sizes, Times
- Training time: ~300 seconds
- Best model: WeightedEnsemble_L2 (ensemble method)
- Training data: 240 samples (80% of augmented data)
- Validation data: 60 samples (20% of augmented data)
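The following sketch shows the 80/20 split as a simple random, unstratified hold-out with pandas. The card does not say whether the split was stratified by color. The helper name and seed are hypothetical, and the frame is assumed to have a unique index. With 300 augmented rows, `frac=0.8` yields 240 training rows and 60 validation rows.

```python
import pandas as pd

def split_80_20(augmented: pd.DataFrame, seed: int = 42):
    """Hypothetical helper: random, unstratified 80/20 train/validation split."""
    train = augmented.sample(frac=0.8, random_state=seed)  # 80% of rows, no replacement
    val = augmented.drop(train.index)                       # the remaining 20%
    return train, val

# Toy frame standing in for the 300 augmented rows.
demo = pd.DataFrame({"x": range(300)})
train_df, val_df = split_80_20(demo)
print(len(train_df), len(val_df))  # 240 60
```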
Evaluation
Testing Data, Factors & Metrics
Testing Data
- Validation set: 60 samples from augmented data (synthetic)
- Test set: 30 samples from original data (real measurements)
Metrics
- Primary metric: Accuracy
- Secondary metrics: Weighted F1-Score, Macro F1-Score
- Classification report: Per-class precision, recall, F1-score
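The summary metrics can be computed with scikit-learn, which is listed under dependencies. This is a sketch, not the author's evaluation code, and the labels below are made-up toy data. Macro F1 gives every class equal weight. Weighted F1 weights each class by its support. As a result, a poorly predicted rare color lowers macro F1 more than weighted F1.

```python
from sklearn.metrics import accuracy_score, classification_report, f1_score

def summarize(y_true, y_pred) -> dict:
    """Accuracy plus weighted and macro F1 for one set of predictions."""
    return {
        "accuracy": accuracy_score(y_true, y_pred),
        # Per-class F1 averaged with weights proportional to each class's support.
        "weighted_f1": f1_score(y_true, y_pred, average="weighted", zero_division=0),
        # Unweighted mean of per-class F1: every color counts equally.
        "macro_f1": f1_score(y_true, y_pred, average="macro", zero_division=0),
    }

# Toy labels (made up): White is never predicted, so its F1 is 0.
y_true = ["Red", "Red", "Pink", "Pink", "Pink", "White"]
y_pred = ["Red", "Pink", "Pink", "Pink", "Pink", "Pink"]
metrics = summarize(y_true, y_pred)
print(metrics)
# Per-class precision, recall, F1, and support.
print(classification_report(y_true, y_pred, zero_division=0))
```

For these toy labels, accuracy is 4/6 ≈ 0.667, weighted F1 ≈ 0.597, and macro F1 ≈ 0.472.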
Results
Validation Set Performance (Synthetic Data)
- Accuracy: 100.00%
- Weighted F1-Score: 100.00%
- Macro F1-Score: 100.00%
Test Set Performance (Original Data)
- Accuracy: [TO_UPDATE - run on original data]
- Weighted F1-Score: [TO_UPDATE]
- Macro F1-Score: [TO_UPDATE]
Per-Class Performance (Validation Set)
| Class | Precision | Recall | F1-Score | Support |
|---|---|---|---|---|
| Orange | 1.00 | 1.00 | 1.00 | 10 |
| Pink | 1.00 | 1.00 | 1.00 | 18 |
| Purple | 1.00 | 1.00 | 1.00 | 6 |
| Red | 1.00 | 1.00 | 1.00 | 10 |
| White | 1.00 | 1.00 | 1.00 | 10 |
| Yellow | 1.00 | 1.00 | 1.00 | 6 |
Summary
The model achieved perfect performance on the validation set, suggesting either excellent learning of the underlying patterns or potential overfitting to the synthetic data. Performance on the original test set will provide better insight into real-world generalization.
Environmental Impact
Training was performed with AutoGluon under a 5-minute time limit, which keeps computational resource usage low.
Technical Specifications
Model Architecture and Objective
- Base framework: AutoGluon TabularPredictor
- Best model: WeightedEnsemble_L2 (ensemble of multiple base models)
- Objective: Multi-class classification with 6 target classes
- Feature preprocessing: Automated by AutoGluon
- Model selection: Automated hyperparameter optimization
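AutoGluon's `WeightedEnsemble` models are based on greedy ensemble selection (Caruana et al.). The `_L2` suffix means the ensemble sits at stack level 2, on top of the level-1 base models. The sketch below is a simplified, self-contained NumPy illustration of that technique, not AutoGluon's implementation. At each step, it adds (with replacement) whichever base model most improves the validation accuracy of the averaged probabilities. The final weights are the selection counts divided by the number of steps. The function names and the 20-step default are hypothetical.

```python
import numpy as np

def ensemble_selection(probas, y_true, n_iter=20):
    """Greedy ensemble selection with replacement (simplified sketch).

    probas: list of (n_samples, n_classes) probability arrays, one per base model.
    y_true: integer class indices, shape (n_samples,).
    Returns one weight per base model; the weights sum to 1.
    """
    y_true = np.asarray(y_true)
    counts = np.zeros(len(probas), dtype=int)
    running_sum = np.zeros_like(probas[0], dtype=float)
    for step in range(1, n_iter + 1):
        best_model, best_acc = 0, -1.0
        for m, p in enumerate(probas):
            # Accuracy of the averaged probabilities if model m were added now.
            candidate = (running_sum + p) / step
            acc = np.mean(candidate.argmax(axis=1) == y_true)
            if acc > best_acc:  # ties keep the earlier model
                best_model, best_acc = m, acc
        counts[best_model] += 1
        running_sum += probas[best_model]
    return counts / n_iter

def weighted_predict(probas, weights):
    """Blend base-model probabilities with the given weights, then take argmax."""
    blended = sum(w * p for w, p in zip(weights, probas))
    return blended.argmax(axis=1)

# Toy example: two base models that are each right on 2 of 3 samples.
y = np.array([0, 1, 1])
model_a = np.array([[0.9, 0.1], [0.3, 0.7], [0.6, 0.4]])
model_b = np.array([[0.6, 0.4], [0.6, 0.4], [0.1, 0.9]])
weights = ensemble_selection([model_a, model_b], y, n_iter=2)
print(weights)                                        # [0.5 0.5]
print(weighted_predict([model_a, model_b], weights))  # [0 1 1]
```

In the toy example, the equal-weight blend classifies all three samples correctly, although each base model alone gets only two right.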
Compute Infrastructure
Software
- AutoGluon: latest release at the time of training
- Python: 3.x
- Dependencies: pandas, scikit-learn, cloudpickle, huggingface_hub
Citation
```bibtex
@misc{flower_color_automl_2024,
  title={Flower Color Classification using AutoGluon},
  author={Mary Zhang},
  year={2024},
  url={https://huggingface.co/maryzhang/24679-tabular-autolguon-predictor-flower-color}
}
```
Dataset Citation:
```bibtex
@dataset{scottymcgee_flowers_2024,
  title={Tabular Flower Data},
  author={Scotty McGee},
  year={2024},
  url={https://huggingface.co/datasets/scottymcgee/flowers}
}
```
Model Card Authors
Mary Zhang
Model Card Contact
AI Usage
Claude was used to edit functions and debug code.