---
language:
- en
license: apache-2.0
library_name: autogluon
tags:
- binary-classification
- multi-class-classification
- text-classification
- embeddings
- umap
- autogluon
datasets:
- 112_Tiering_Questions_02.28.2025.json
model-index:
- name: "03062025_V2_UMAP_Embedding_Classifier (Binary)"
  results:
  - task:
      type: text-classification
      name: Binary Classification
    dataset:
      name: 112_Tiering_Questions_02.28.2025.json
      type: tabular
    metrics:
    - name: Accuracy
      type: accuracy
      value: 0.9565
    - name: F1
      type: f1
      value: 0.97
    - name: ROC AUC
      type: roc_auc
      value: 0.91
- name: "03062025_V2_UMAP_Embedding_Classifier (Multi-class)"
  results:
  - task:
      type: text-classification
      name: Multi-class Classification
    dataset:
      name: 112_Tiering_Questions_02.28.2025.json
      type: tabular
    metrics:
    - name: Accuracy
      type: accuracy
      value: 0.5652
    - name: F1
      type: f1
      value: 0.59
    - name: ROC AUC
      type: roc_auc
      value: 0.74
---

# 03062025_V2_UMAP_Embedding_Classifier

This repository contains two final AutoGluon `TabularPredictor` models (binary and multi-class) built using UMAP-reduced embeddings from the [Alibaba-NLP/gte-large-en-v1.5](https://huggingface.co/Alibaba-NLP/gte-large-en-v1.5) model.

## Key Details

- **UMAP for Binary Classification**: Best `n_components` tuned via Optuna = 11.
- **UMAP for Multi-class Classification**: Best `n_components` tuned via Optuna = 43.
- **Data**: 112 technical questions with tiering classifications (0–4).
- **Performance Metrics**:
  - **Binary**: Accuracy ≈95.65%, F1 ≈0.97, ROC AUC ≈0.91.
  - **Multi-class**: Accuracy ≈56.52%, F1 ≈0.59, ROC AUC ≈0.74.

## Usage

1. **Loading the Models**:

   ```python
   from autogluon.tabular import TabularPredictor

   # Load the two trained predictors from their saved directories
   binary_predictor = TabularPredictor.load("binary_final_model")
   multi_predictor = TabularPredictor.load("multiclass_final_model")
   ```

2. **Preprocessing**: Generate embeddings for your input text using the Alibaba-NLP/gte-large-en-v1.5 model and apply the UMAP transformation with the provided reducer files (`umap_reducer_binary.joblib` and `umap_reducer_multi.joblib`).

3. **Prediction**: Use `predict()` and `predict_proba()` to obtain predictions.

## License

This project is licensed under the Apache-2.0 License.

## Contact

For questions or collaboration, please contact LeiPricingManager.
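
## Example: Inference Sketch

The three usage steps (load, preprocess, predict) can be tied together roughly as follows. This is a minimal sketch, not the training code used for these models. The helper names `build_feature_frame` and `predict_binary` are hypothetical, and the sketch assumes the feature columns expected by the predictor are the UMAP components in order, which the repository does not document. Loading the reducer also assumes `umap-learn` is installed, and the embedding model is loaded through `sentence-transformers` with `trust_remote_code=True`.

```python
import numpy as np
import pandas as pd


def build_feature_frame(texts, encode, reducer, columns):
    """Embed texts, apply a fitted UMAP reducer, and return a feature DataFrame.

    `encode` is any callable mapping a list of strings to a 2-D array of
    embeddings; `reducer` is any object with a `transform` method.
    """
    embeddings = np.asarray(encode(texts))               # (n_texts, embedding_dim)
    reduced = np.asarray(reducer.transform(embeddings))  # (n_texts, n_components)
    if reduced.shape[1] != len(columns):
        raise ValueError(
            f"reducer produced {reduced.shape[1]} components, "
            f"but {len(columns)} column names were given"
        )
    return pd.DataFrame(reduced, columns=list(columns))


def predict_binary(texts):
    """Run the binary model on raw texts (needs the repository files on disk)."""
    # Heavy dependencies are imported here so the helper above stays lightweight.
    import joblib
    from autogluon.tabular import TabularPredictor
    from sentence_transformers import SentenceTransformer

    encoder = SentenceTransformer(
        "Alibaba-NLP/gte-large-en-v1.5", trust_remote_code=True
    )
    reducer = joblib.load("umap_reducer_binary.joblib")
    predictor = TabularPredictor.load("binary_final_model")

    # ASSUMPTION: the predictor's input features correspond one-to-one, in
    # order, to the UMAP components. Check predictor.features() against the
    # columns used at training time before relying on the output.
    features = build_feature_frame(
        texts, encoder.encode, reducer, predictor.features()
    )
    return predictor.predict(features), predictor.predict_proba(features)
```

For the multi-class model, the same sketch should apply with `umap_reducer_multi.joblib` and `multiclass_final_model` swapped in. Each predictor must be paired with its own reducer, since the two were tuned to different `n_components` values (11 and 43).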