03062025_V2_UMAP_Embedding_Classifier

This repository contains two final AutoGluon TabularPredictor models, one binary and one multi-class, trained on UMAP-reduced text embeddings produced by the Alibaba-NLP/gte-large-en-v1.5 model.

Key Details

  • UMAP for binary classification: n_components = 11 (best value found by Optuna tuning).
  • UMAP for multi-class classification: n_components = 43 (best value found by Optuna tuning).
  • Data: 112 technical questions labeled with tier classifications 0–4.
  • Performance metrics:
    • Binary: accuracy ≈ 95.65%, F1 ≈ 0.97, ROC AUC ≈ 0.91.
    • Multi-class: accuracy ≈ 56.52%, F1 ≈ 0.59, ROC AUC ≈ 0.74.

Usage

  1. Loading the Models:

    from autogluon.tabular import TabularPredictor
    binary_predictor = TabularPredictor.load("binary_final_model")
    multi_predictor = TabularPredictor.load("multiclass_final_model")
    
  2. Preprocessing: Generate embeddings for your input text with the Alibaba-NLP/gte-large-en-v1.5 model, then reduce them with the matching UMAP reducer provided in this repository: umap_reducer_binary.joblib for the binary model and umap_reducer_multi.joblib for the multi-class model.

  3. Prediction: Pass the reduced features to predict() for class labels or to predict_proba() for class probabilities.
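The three usage steps can be chained into one helper. This is a minimal sketch, not the author's pipeline, and it rests on assumptions this card does not confirm: embeddings are computed with sentence-transformers using trust_remote_code=True, the .joblib files hold fitted umap-learn reducers, and each predictor's input columns are exactly the UMAP components in training order. The helper names embed_texts, to_feature_frame and predict_texts are hypothetical. The card does not say whether embeddings were normalized before UMAP, so match the original preprocessing if it differs.

```python
"""Hedged sketch: text -> gte embedding -> UMAP -> AutoGluon prediction.

Assumptions (not confirmed by this model card):
  * embeddings come from sentence-transformers with trust_remote_code=True;
  * the saved UMAP reducers are umap-learn objects loadable with joblib;
  * each predictor's input features are exactly the UMAP components, in order.
"""
from typing import List, Tuple

import pandas as pd

EMBEDDING_MODEL = "Alibaba-NLP/gte-large-en-v1.5"


def embed_texts(texts: List[str]):
    """Encode raw text into dense vectors (a NumPy array) with gte-large-en-v1.5."""
    # Heavy dependency imported lazily so the module loads without it.
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer(EMBEDDING_MODEL, trust_remote_code=True)
    return model.encode(texts)


def to_feature_frame(reduced, feature_names: List[str]) -> pd.DataFrame:
    """Wrap UMAP output in a DataFrame whose columns match the predictor's features."""
    if reduced.shape[1] != len(feature_names):
        raise ValueError(
            f"UMAP produced {reduced.shape[1]} columns, "
            f"predictor expects {len(feature_names)}"
        )
    return pd.DataFrame(reduced, columns=feature_names)


def predict_texts(
    texts: List[str], model_dir: str, reducer_path: str
) -> Tuple[pd.Series, pd.DataFrame]:
    """Return (labels, class probabilities) for raw texts."""
    import joblib
    from autogluon.tabular import TabularPredictor

    predictor = TabularPredictor.load(model_dir)
    reducer = joblib.load(reducer_path)  # assumed: fitted umap.UMAP instance
    reduced = reducer.transform(embed_texts(texts))
    # Assumption: the predictor was trained on the UMAP columns only, in order.
    features = to_feature_frame(reduced, predictor.features())
    return predictor.predict(features), predictor.predict_proba(features)
```

For example, predict_texts(["How do I rotate an API key?"], "binary_final_model", "umap_reducer_binary.joblib") would return the predicted labels and a probability table; use multiclass_final_model with umap_reducer_multi.joblib for the tier classifier. The sketch reloads the models on every call for simplicity; cache them if you predict repeatedly.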

License

This project is licensed under the Apache-2.0 License.

Contact

For questions or collaboration, please contact LeiPricingManager.
