T5-Base Fine-tuned for Spotify Features Prediction
T5-Base fine-tuned to convert natural language prompts into Spotify audio feature JSON
Model Details
- Base Model: t5-base
- Model Type: Text-to-JSON generation
- Language: English
- Task: Convert natural language music preferences into Spotify audio feature JSON objects
- Fine-tuning Dataset: Custom dataset of prompts to Spotify audio features
Training Configuration
- Epochs: 7
- Learning Rate: 3e-4
- Batch Size: 8 (per device)
- Gradient Accumulation Steps: 4
- Scheduler: Cosine with warmup
- Optimizer: AdamW
- Max Length: 256 tokens
- Precision: bfloat16
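The hyperparameters above correspond to standard Hugging Face Seq2SeqTrainingArguments. The exact training script is not part of this card; the snippet below is a minimal sketch of equivalent arguments, where output_dir and warmup_ratio are assumptions and the remaining values mirror the list above:

```python
from transformers import Seq2SeqTrainingArguments

# Sketch of training arguments matching the configuration listed above.
training_args = Seq2SeqTrainingArguments(
    output_dir="t5-spotify-features",   # placeholder path
    num_train_epochs=7,
    learning_rate=3e-4,
    per_device_train_batch_size=8,
    gradient_accumulation_steps=4,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,                   # warmup amount not specified in the card
    optim="adamw_torch",                # AdamW optimizer
    bf16=True,                          # bfloat16 precision
    generation_max_length=256,          # 256-token limit also applied at tokenization
)
```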
Usage
```python
from transformers import T5ForConditionalGeneration, T5Tokenizer
import json

# Load model and tokenizer
model = T5ForConditionalGeneration.from_pretrained("afsagag/t5-spotify-features")
tokenizer = T5Tokenizer.from_pretrained("afsagag/t5-spotify-features")

# Example usage
prompt = "I want energetic dance music with high energy and danceability"
input_text = f"prompt: {prompt}"

# Tokenize and generate
input_ids = tokenizer(input_text, return_tensors="pt", max_length=256, truncation=True).input_ids
outputs = model.generate(
    input_ids,
    max_length=256,
    num_beams=4,
    early_stopping=True,
    do_sample=False,
)

# Decode result
result = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(result)

# Parse JSON output
try:
    spotify_features = json.loads(result)
    print("Generated Spotify Features:", spotify_features)
except json.JSONDecodeError:
    print("Generated text is not valid JSON")
```
Expected Output Format
The model generates JSON objects with Spotify audio features:
```json
{
  "danceability": 0.85,
  "energy": 0.90,
  "valence": 0.75,
  "acousticness": 0.15,
  "instrumentalness": 0.05,
  "speechiness": 0.08
}
```
Metrics
- Per-set Mean Absolute Error (MAE): average absolute difference between predicted and reference feature values within each generated feature set
- Per-set Root Mean Squared Error (RMSE): like MAE, but penalizes larger errors more heavily
- Per-feature Correlation: Pearson correlation between predicted and reference values for each individual audio feature (an illustrative computation is sketched below)
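The evaluation code is not included in this card. The snippet below is an illustrative sketch of how these metrics could be computed from predicted and reference feature dictionaries; the data, variable names, and use of numpy/scipy are assumptions, not the card's actual evaluation script.

```python
import numpy as np
from scipy.stats import pearsonr

# Hypothetical predicted and reference feature sets (one dict per prompt)
predictions = [
    {"danceability": 0.82, "energy": 0.91},
    {"danceability": 0.40, "energy": 0.35},
    {"danceability": 0.60, "energy": 0.55},
]
references = [
    {"danceability": 0.85, "energy": 0.90},
    {"danceability": 0.45, "energy": 0.30},
    {"danceability": 0.55, "energy": 0.60},
]
features = list(predictions[0].keys())

# Per-set MAE / RMSE: error averaged over the features of each prediction
errors = np.array([[p[f] - r[f] for f in features] for p, r in zip(predictions, references)])
mae = np.abs(errors).mean(axis=1)
rmse = np.sqrt((errors ** 2).mean(axis=1))
print("mean MAE:", mae.mean(), "mean RMSE:", rmse.mean())

# Per-feature Pearson correlation across the evaluation set
for f in features:
    corr, _ = pearsonr([p[f] for p in predictions], [r[f] for r in references])
    print(f"{f}: Pearson r = {corr:.3f}")
```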
Model Files
- config.json: Model configuration
- pytorch_model.bin: Model weights
- tokenizer.json: Tokenizer vocabulary
- tokenizer_config.json: Tokenizer configuration
- special_tokens_map.json: Special token mappings
Limitations
- The model may occasionally generate invalid JSON that requires post-processing (see the sketch after this list)
- Inputs were formatted with the "prompt: " prefix during training, so prompts should follow that format
- Performance depends on how similar a prompt is to the training data distribution
- The model may not generalize well to very abstract or unusual music descriptions
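Because of the first limitation above, a light post-processing step can help in practice. The helper below is an illustrative sketch, not part of the released model: it extracts the first {...} span from the generated text, parses it, and clamps numeric values to the 0-1 range of the features listed earlier.

```python
import json
import re

def parse_features(generated_text):
    """Best-effort parsing of generated Spotify-feature JSON (illustrative helper)."""
    match = re.search(r"\{.*\}", generated_text, re.DOTALL)
    if match is None:
        return None
    try:
        features = json.loads(match.group(0))
    except json.JSONDecodeError:
        return None
    # Keep numeric values only and clamp them to the [0, 1] feature range
    return {
        k: min(max(float(v), 0.0), 1.0)
        for k, v in features.items()
        if isinstance(v, (int, float))
    }

print(parse_features('noise {"danceability": 1.2, "energy": 0.9} trailing text'))
```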
Training Data
The model was trained on a custom dataset pairing natural language music descriptions with corresponding Spotify audio feature values.
Ethical Considerations
This model generates music preference predictions and should not be used as the sole basis for music recommendation systems without human oversight.