Audio Classification
Audio classification is the task of assigning a label or class to a given audio input.
Example applications:
- Recognizing which command a user is giving
- Identifying a speaker
- Detecting the genre of a song
For more details about the audio-classification task, check out its dedicated page! You will find examples and related materials there.
Recommended models
- speechbrain/google_speech_command_xvector: An easy-to-use model for command recognition.
- ehcalabres/wav2vec2-lg-xlsr-en-speech-emotion-recognition: An emotion recognition model.
- facebook/mms-lid-126: A language identification model.
Explore all available models and find the one that suits you best here.
Using the API
No snippet available for this task.
API specification
Request
Headers | ||
---|---|---|
authorization | string | Authentication header in the form 'Bearer hf_****', where hf_**** is a personal user access token with “Inference Providers” permission. You can generate one from your settings page. |
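
As a minimal sketch of the header described above, the helper below builds the `Authorization` header, assuming the standard HTTP Bearer scheme (the word `Bearer`, a single space, then the token). The function name `build_headers` and the token value are hypothetical and used only for illustration.

```python
def build_headers(token: str) -> dict:
    """Build the authorization header for a request.

    `token` is a personal user access token (hf_****) with the
    "Inference Providers" permission.
    """
    if not token:
        raise ValueError("an access token is required")
    # Standard Bearer scheme: scheme name, one space, then the token.
    return {"Authorization": f"Bearer {token}"}


# "hf_example" is a placeholder, not a real token.
headers = build_headers("hf_example")
```

The resulting dictionary can be passed to whichever HTTP client you use.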
Payload | ||
---|---|---|
inputs* | string | The input audio data as a base64-encoded string. If no parameters are provided, you can also provide the audio data as a raw bytes payload. |
parameters | object | Optional settings. The fields listed below are nested inside this object. |
function_to_apply | enum | Possible values: sigmoid, softmax, none. |
top_k | integer | When specified, limits the output to the top K most probable classes. |
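
The payload table can be sketched in code as follows. This is an illustration under stated assumptions, not an official client: `build_payload` is a hypothetical helper, and the sample bytes stand in for a real audio file. It base64-encodes the audio into `inputs` and adds `parameters` only when at least one option is set.

```python
import base64
import json

# Values listed for function_to_apply in the payload table.
ALLOWED_FUNCTIONS = {"sigmoid", "softmax", "none"}


def build_payload(audio_bytes, function_to_apply=None, top_k=None):
    """Return a JSON-serializable dict shaped like the payload table."""
    # `inputs` is required: the audio data as a base64-encoded string.
    payload = {"inputs": base64.b64encode(audio_bytes).decode("ascii")}

    parameters = {}
    if function_to_apply is not None:
        if function_to_apply not in ALLOWED_FUNCTIONS:
            raise ValueError(f"unsupported function_to_apply: {function_to_apply!r}")
        parameters["function_to_apply"] = function_to_apply
    if top_k is not None:
        parameters["top_k"] = top_k

    # `parameters` is optional, so it is left out when empty.
    if parameters:
        payload["parameters"] = parameters
    return payload


# Placeholder bytes; in practice, read them from an audio file.
sample_audio = b"not real audio data"
body = json.dumps(build_payload(sample_audio, function_to_apply="softmax", top_k=3))
```

When no parameters are needed, the table above notes that the raw audio bytes can be sent as the request body instead of this JSON document.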
Response
Body | ||
---|---|---|
(array) | object[] | Output is an array of objects, each with the fields below. |
label | string | The predicted class label. |
score | number | The corresponding probability. |
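
A response of this shape can be handled as in the sketch below. The helper name `top_prediction` and the sample labels and scores are hypothetical. The sketch does not assume the array arrives sorted, so it selects the highest score explicitly.

```python
import json


def top_prediction(response_body):
    """Return (label, score) for the highest-scoring class, or None if empty."""
    predictions = json.loads(response_body)
    if not predictions:
        return None
    # Each item is an object with a `label` string and a `score` number.
    best = max(predictions, key=lambda item: item["score"])
    return best["label"], best["score"]


# Made-up response body used only to exercise the helper.
sample_response = (
    '[{"label": "no", "score": 0.06},'
    ' {"label": "yes", "score": 0.91},'
    ' {"label": "stop", "score": 0.03}]'
)
best = top_prediction(sample_response)
```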