xiyuanz committed · verified · Commit 80a938a · Parent(s): d8357f7

Update README.md

Files changed (1): README.md (+88 -6)
README.md CHANGED
@@ -1,9 +1,91 @@
  ---
- tags:
- - model_hub_mixin
- - pytorch_model_hub_mixin
  ---

- This model has been pushed to the Hub using the [PytorchModelHubMixin](https://huggingface.co/docs/huggingface_hub/package_reference/mixins#huggingface_hub.PyTorchModelHubMixin) integration:
- - Library: [More Information Needed]
- - Docs: [More Information Needed]
  ---
+ license: apache-2.0
+ pipeline_tag: tabular-classification
  ---

+ # TabPFNMix Classifier
+
+ TabPFNMix is a tabular foundation model for classification, pre-trained purely on synthetic datasets sampled from a mix of random classifiers.
+
+ ## Architecture
+
+ TabPFNMix is based on a 12-layer encoder-decoder Transformer with 37M parameters. We use a pre-training strategy that incorporates in-context learning, similar to the strategies used by TabPFN and TabForestPFN.
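To illustrate what in-context learning means here (this is a schematic of the interface only, not the actual TabPFNMix model): the pre-trained network receives the labelled training rows and the query rows together and predicts in a single forward pass, with no per-dataset weight updates. A minimal sketch in plain Python, with a nearest-centroid rule standing in for the Transformer forward pass:

```python
import math
from collections import defaultdict

def predict_in_context(X_train, y_train, X_query):
    """Schematic in-context prediction: the 'model' sees the labelled
    support rows and the query rows together and emits predictions in one
    pass. A nearest-centroid rule stands in for the Transformer."""
    # Group support rows by class and compute one centroid per class.
    by_class = defaultdict(list)
    for x, y in zip(X_train, y_train):
        by_class[y].append(x)
    centroids = {
        label: [sum(col) / len(rows) for col in zip(*rows)]
        for label, rows in by_class.items()
    }
    # Assign each query row to the class of the nearest centroid.
    def nearest(q):
        return min(centroids, key=lambda lbl: math.dist(q, centroids[lbl]))
    return [nearest(q) for q in X_query]

# Two well-separated clusters: labels are recoverable from context alone.
X_train = [[0.0, 0.0], [0.1, 0.1], [5.0, 5.0], [5.1, 4.9]]
y_train = [0, 0, 1, 1]
print(predict_in_context(X_train, y_train, [[0.05, 0.0], [5.0, 5.2]]))  # [0, 1]
```

The real model replaces the centroid rule with a Transformer whose weights were pre-trained on synthetic tasks, so the same single-pass interface generalises far beyond this toy rule.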
+
+ ## Usage
+
+ To use the TabPFNMix classifier, install AutoGluon by running:
+
+ ```sh
+ pip install autogluon
+ ```
+
+ A minimal example showing how to fine-tune and run inference with the TabPFNMix classifier:
+
+ ```python
+ import pandas as pd
+
+ from autogluon.tabular import TabularPredictor
+
+
+ if __name__ == '__main__':
+     train_data = pd.read_csv('https://autogluon.s3.amazonaws.com/datasets/Inc/train.csv')
+     subsample_size = 5000
+     if subsample_size is not None and subsample_size < len(train_data):
+         train_data = train_data.sample(n=subsample_size, random_state=0)
+     test_data = pd.read_csv('https://autogluon.s3.amazonaws.com/datasets/Inc/test.csv')
+
+     tabpfnmix_default = {
+         "model_path_classifier": "autogluon/tabpfn-mix-1.0-classifier",
+         "model_path_regressor": "autogluon/tabpfn-mix-1.0-regressor",
+         "n_ensembles": 1,
+         "max_epochs": 30,
+     }
+
+     hyperparameters = {
+         "TABPFNMIX": [
+             tabpfnmix_default,
+         ],
+     }
+
+     label = "class"
+
+     predictor = TabularPredictor(label=label)
+     predictor = predictor.fit(
+         train_data=train_data,
+         hyperparameters=hyperparameters,
+         verbosity=3,
+     )
+
+     predictor.leaderboard(test_data, display=True)
+ ```
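The `tabpfnmix_default` dictionary above is a single-member configuration. Assuming `n_ensembles` controls how many fine-tuned members are averaged (an assumption based on the name; check the AutoGluon documentation before relying on it), a larger-ensemble variant might look like:

```python
# Hypothetical variant of tabpfnmix_default from the example above.
# The meaning of each key is assumed from its name; consult the AutoGluon
# documentation before relying on this configuration.
tabpfnmix_ensemble = {
    "model_path_classifier": "autogluon/tabpfn-mix-1.0-classifier",
    "model_path_regressor": "autogluon/tabpfn-mix-1.0-regressor",
    "n_ensembles": 4,   # assumed: average the predictions of 4 members
    "max_epochs": 30,   # fine-tuning epochs, as in the example above
}

hyperparameters = {"TABPFNMIX": [tabpfnmix_ensemble]}
```

Larger ensembles typically trade extra fine-tuning time for more stable predictions.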
+
+ ## Citation
+
+ If you find TabPFNMix useful for your research, please consider citing the associated papers:
+
+ ```
+ @article{erickson2020autogluon,
+   title={AutoGluon-Tabular: Robust and Accurate AutoML for Structured Data},
+   author={Erickson, Nick and Mueller, Jonas and Shirkov, Alexander and Zhang, Hang and Larroy, Pedro and Li, Mu and Smola, Alexander},
+   journal={arXiv preprint arXiv:2003.06505},
+   year={2020}
+ }
+
+ @article{hollmann2022tabpfn,
+   title={TabPFN: A Transformer That Solves Small Tabular Classification Problems in a Second},
+   author={Hollmann, Noah and M{\"u}ller, Samuel and Eggensperger, Katharina and Hutter, Frank},
+   journal={arXiv preprint arXiv:2207.01848},
+   year={2022}
+ }
+
+ @article{breejen2024context,
+   title={Why In-Context Learning Transformers are Tabular Data Classifiers},
+   author={Breejen, Felix den and Bae, Sangmin and Cha, Stephen and Yun, Se-Young},
+   journal={arXiv preprint arXiv:2405.13396},
+   year={2024}
+ }
+ ```
+
+ ## License
+
+ This project is licensed under the Apache-2.0 License.