Frequently Asked Questions
Are my data and models secure?
Yes, your data and models are secure. AutoTrain uses the Hugging Face Hub to store your data and models. All your data and models are uploaded to your Hugging Face account as private repositories and are only accessible by you. Read more about security here.
Do you upload my data to the Hugging Face Hub?
AutoTrain will not upload your dataset to the Hub if you are using the local backend or training in the same space. AutoTrain will push your dataset to the Hub if you are using features like: DGX Cloud or using local CLI to train on Hugging Face’s infrastructure.
You can safely remove the dataset from the Hub after training is complete. If uploaded, the dataset will be stored in your Hugging Face account as a private repository and will only be accessible by you and the training process. It is not used once the training is complete.
My training space paused for no reason mid-training
AutoTrain Training Spaces will pause itself after training is done (or failed). This is done to save resources and costs. If your training failed, you can still see the space logs and find out what went wrong. Note: you won’t be able to retrive the logs if you restart the space.
Another reason for the space to pause is if the space is space’s sleep time kicking in. If you have a long running training job, you must set the sleep time to a much higher value. The space will anyways pause itself after the training is done thus saving you costs.
I get error Your installed package nvidia-ml-py is corrupted. Skip patch functions
This error can be safely ignored. It is a warning from the nvitop
library and does not affect the functionality of AutoTrain.
I get 409 conflict error when using the UI
This error occurs when you try to create a project with the same name as an existing project. To resolve this error, you can either delete the existing project or create a new project with a different name.
This error can also occur when you are trying to train a model while a model is already training in the same space or locally.
The model I want to use doesn’t show up in the model selection dropdown.
If the model you want to use is not available in the model selection dropdown,
you can add it in the environment variable AUTOTRAIN_CUSTOM_MODELS
in the space settings.
For example, if you want to add the xxx/yyy
model, go to space settings, create a variable named AUTOTRAIN_CUSTOM_MODELS
and set the value to xxx/yyy
.
You can also pass the model name as query parameter in the URL. For example, if you want to use the xxx/yyy
model,
you can use the URL https://huggingface.co/spaces/your_autotrain_space?custom_models=xxx/yyy
.
How do I use AutoTrain locally?
AutoTrain can be used locally by installing the AutoTrain Advanced pypi package. You can read more in Use AutoTrain Locally section.
Can I run AutoTrain on Colab?
To start the UI on Colab, you can simply click on the following link:
Please note, to run the app on Colab, you will need an ngrok token. You can get one by signing up for free on ngrok. This is because Colab does not allow exposing ports to the internet directly.
To use the CLI instead on Colab, you can follow the same instructions as for using AutoTrain locally.
Does AutoTrain have a docker image?
Yes, AutoTrain has a docker image. You can find the docker image on Docker Hub here.
Is windows supported?
Unfortunately, AutoTrain does not officially support Windows at the moment. You can try using WSL (Windows Subsystem for Linux) to run AutoTrain on Windows or the docker image.
“—project-name” argument can not be set as a directory
--project-name
argument should not be a path. it will be created where autotrain command is run.
This parameter must be alphanumeric and can contain hypens.
I am getting config.json not found error
This means you have trained an adapter model (peft=true) which doesnt generate config.json.
It doesnt matter though, the model can still be loaded with AutoModelForCausalLM or with Inference endpoints.
If you want to merge weights with base models, you can use autotrain tools
. Please read about it in miscelleneous section.
Does autotrain support multi-gpu training?
Yes, autotrain supports multi-gpu training. AutoTrain will determine on its own if the user is running the command on a multi-gpu setup and will use multi-gpu ddp if number of gpus is greater than 1 and less than 4 and deepspeed if number of gpus is greater than or equal to 4.
How can i use a hub dataset with multiple configs?
If your hub dataset has multiple configs, you can use train_split
parameter to specify the both the config and the split.
For example, in this dataset here,
there are multiple configs: pair
, pair-class
, pair-score
and triplet
.
If i want to use train
split of pair-class
config, i can use write pair-class:train
as train_split
in the UI or the CLI / config.
An example config is shown below:
data:
path: sentence-transformers/all-nli
train_split: pair-class:train
valid_split: pair-class:test
column_mapping:
sentence1_column: premise
sentence2_column: hypothesis
target_column: label