Space: jacob-c/largermodel_lyrics_generation

root committed on Mar 24

Commit a459327

1 Parent(s): d7fb7e8

ss

Files changed (8)

.gitattributes +0 -3
.gitignore +1 -0
DEPLOYMENT.md +42 -0
README.md +40 -7
app.py +186 -0
example.py +38 -0
requirements.txt +12 -0
utils.py +42 -0

.gitattributes CHANGED

@@ -16,16 +16,13 @@
 *.onnx filter=lfs diff=lfs merge=lfs -text
 *.ot filter=lfs diff=lfs merge=lfs -text
 *.parquet filter=lfs diff=lfs merge=lfs -text
-*.pb filter=lfs diff=lfs merge=lfs -text
 *.pickle filter=lfs diff=lfs merge=lfs -text
 *.pkl filter=lfs diff=lfs merge=lfs -text
 *.pt filter=lfs diff=lfs merge=lfs -text
 *.pth filter=lfs diff=lfs merge=lfs -text
 *.rar filter=lfs diff=lfs merge=lfs -text
 *.safetensors filter=lfs diff=lfs merge=lfs -text
-saved_model/**/* filter=lfs diff=lfs merge=lfs -text
 *.tar.* filter=lfs diff=lfs merge=lfs -text
-*.tar filter=lfs diff=lfs merge=lfs -text
 *.tflite filter=lfs diff=lfs merge=lfs -text
 *.tgz filter=lfs diff=lfs merge=lfs -text
 *.wasm filter=lfs diff=lfs merge=lfs -text


.gitignore ADDED

@@ -0,0 +1 @@
+# Byte-compiled / optimized / DLL files

DEPLOYMENT.md ADDED

@@ -0,0 +1,42 @@

+# Deploying to Hugging Face Spaces
+This guide explains how to deploy the Music Genre Classifier & Lyrics Generator to Hugging Face Spaces.
+## Prerequisites
+1. A Hugging Face account
+2. Access to the Llama 3.1 8B Instruct model (requires acceptance of the model license)
+3. A Hugging Face API token
+## Deployment Steps
+### 1. Create a New Space
+1. Go to the Hugging Face website and log in
+2. Navigate to "Spaces" in the top navigation
+3. Click "Create new Space"
+4. Choose "Gradio" as the SDK
+5. Give your Space a name and description
+6. Select "T4 GPU" as the hardware
+### 2. Set up Environment Variables
+Set up your Hugging Face access token as an environment variable:
+1. Go to your profile settings in Hugging Face
+2. Navigate to "Access Tokens" and create a new token with "write" access
+3. In your Space settings, under "Repository secrets", add a new secret:
+   - Name: `HF_TOKEN`
+   - Value: Your Hugging Face access token
+### 3. Upload the Files
+Upload all the files from this repository to your Space.
+### 4. Wait for Deployment
+Hugging Face will automatically build and deploy your Space. This may take a few minutes, especially since it needs to download the models.
+### 5. Access Your Application
+Once deployed, you can access your application at your Hugging Face Space URL.
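
Note on step 2: app.py in this commit only checks whether `HF_TOKEN` is present in the environment before calling `login`. A minimal sketch of a slightly stricter check follows. The helper name `read_hf_token` is hypothetical and not part of the commit; it treats a blank or whitespace-only secret as missing.

```python
import os
from typing import Mapping, Optional


def read_hf_token(env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Return the HF_TOKEN secret with whitespace stripped, or None if unset or blank.

    Hypothetical helper for illustration only; the commit itself just tests
    `"HF_TOKEN" in os.environ`.
    """
    token = env.get("HF_TOKEN", "").strip()
    return token or None


if __name__ == "__main__":
    if read_hf_token() is None:
        # Gated models such as Llama 3.1 may fail to download without a token.
        print("HF_TOKEN is not set or is blank.")
    else:
        print("HF_TOKEN found.")
```

Passing the environment in as a mapping keeps the check easy to exercise without touching the real process environment.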

README.md CHANGED

@@ -1,14 +1,47 @@
 ---
-title: Largermodel Lyrics Generation
-emoji: 👁
-colorFrom: pink
-colorTo: green
+title: Music Genre Classifier & Lyrics Generator
+emoji: 🎵
+colorFrom: indigo
+colorTo: purple
 sdk: gradio
-sdk_version: 5.22.0
+sdk_version: 4.12.0
 app_file: app.py
 pinned: false
 license: mit
-short_description: lyrics generation with larger model
+short_description: AI-powered music genre detection and genre-specific lyrics generation
 ---
-Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
+# Music Genre Classifier & Lyrics Generator
+This Hugging Face Space application provides two AI-powered features:
+1. **Music Genre Classification**: Upload a music file and get an analysis of its genre using the [dima806/music_genres_classification](https://huggingface.co/dima806/music_genres_classification) model.
+2. **Lyrics Generation**: Based on the detected genre, the app generates original lyrics using [Llama-3.1-8B-Instruct](https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct) that match both the style of the genre and the approximate length of the song.
+## Features
+- Upload any music file for instant genre classification
+- Receive genre predictions with confidence scores
+- Get AI-generated lyrics tailored to the detected music genre
+- Lyrics length is automatically adjusted based on the song duration
+- Simple and intuitive user interface
+## Usage
+1. Visit the live application on Hugging Face Spaces
+2. Upload your music file using the provided interface
+3. Click "Analyze & Generate" to process the audio
+4. View the detected genre and generated lyrics in the output panels
+## Technical Details
+- Extracts MFCC features from the audio for genre classification
+- Leverages 4-bit quantization for efficient LLM inference on T4 GPU
+- Uses a genre-specific prompt template to generate lyrics
+- Automatically scales lyrics length based on audio duration
+## Links
+- [Music Genre Classification Model](https://huggingface.co/dima806/music_genres_classification)
+- [Llama 3.1 8B Instruct Model](https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct)
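
Note on the MFCC step in Technical Details: `extract_mfcc_features` in utils.py reduces the coefficient-by-frame matrix to one mean value per coefficient. The sketch below shows the same reduction without NumPy, on made-up numbers. The function name `mean_over_frames` is an assumption for illustration and does not appear in the commit.

```python
from typing import List, Sequence


def mean_over_frames(coeffs: Sequence[Sequence[float]]) -> List[float]:
    """Average each coefficient row across its time frames.

    `coeffs` is laid out as (n_coefficients, n_frames), the shape that
    librosa.feature.mfcc returns for a mono signal.
    """
    return [sum(row) / len(row) for row in coeffs]


# Made-up values: 2 coefficients, 3 frames each.
toy = [[1.0, 2.0, 3.0], [4.0, 6.0, 8.0]]
print(mean_over_frames(toy))
```

The result has one value per coefficient regardless of clip length, so any information about how the coefficients change over time is discarded.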

app.py ADDED

@@ -0,0 +1,186 @@

+import os
+import io
+import gradio as gr
+import torch
+import numpy as np
+from transformers import (
+    AutoModelForSequenceClassification,
+    AutoTokenizer,
+    pipeline,
+    AutoModelForCausalLM,
+    BitsAndBytesConfig
+)
+from huggingface_hub import login
+from utils import (
+    load_audio,
+    extract_audio_duration,
+    extract_mfcc_features,
+    calculate_lyrics_length,
+    format_genre_results,
+    ensure_cuda_availability
+)
+# Login to Hugging Face Hub if token is provided
+if "HF_TOKEN" in os.environ:
+    login(token=os.environ["HF_TOKEN"])
+# Constants
+GENRE_MODEL_NAME = "dima806/music_genres_classification"
+LLM_MODEL_NAME = "meta-llama/Llama-3.1-8B-Instruct"
+SAMPLE_RATE = 22050  # Standard sample rate for audio processing
+# Check CUDA availability (for informational purposes)
+CUDA_AVAILABLE = ensure_cuda_availability()
+# Load genre classification model
+genre_tokenizer = AutoTokenizer.from_pretrained(GENRE_MODEL_NAME)
+genre_model = AutoModelForSequenceClassification.from_pretrained(GENRE_MODEL_NAME)
+# Load LLM with appropriate quantization for T4 GPU
+bnb_config = BitsAndBytesConfig(
+    load_in_4bit=True,
+    bnb_4bit_quant_type="nf4",
+    bnb_4bit_compute_dtype=torch.float16,
+)
+llm_tokenizer = AutoTokenizer.from_pretrained(LLM_MODEL_NAME)
+llm_model = AutoModelForCausalLM.from_pretrained(
+    LLM_MODEL_NAME,
+    device_map="auto",
+    quantization_config=bnb_config,
+    torch_dtype=torch.float16,
+)
+# Create LLM pipeline
+llm_pipeline = pipeline(
+    "text-generation",
+    model=llm_model,
+    tokenizer=llm_tokenizer,
+    max_new_tokens=512,
+)
+def extract_audio_features(audio_file):
+    """Extract audio features from an audio file."""
+    # Load the audio file using utility function
+    y, sr = load_audio(audio_file, SAMPLE_RATE)
+    # Get audio duration in seconds
+    duration = extract_audio_duration(y, sr)
+    # Extract MFCCs for genre classification
+    mfccs_mean = extract_mfcc_features(y, sr, n_mfcc=20)
+    return {
+        "features": mfccs_mean,
+        "duration": duration
+    }
+def classify_genre(audio_features):
+    """Classify the genre of the audio using the loaded model."""
+    inputs = genre_tokenizer(str(audio_features), return_tensors="pt", truncation=True, max_length=512)
+    with torch.no_grad():
+        outputs = genre_model(**inputs)
+        predictions = outputs.logits.softmax(dim=-1)
+    # Get the top 3 genres
+    values, indices = torch.topk(predictions, 3)
+    # Map indices to genre labels
+    genre_labels = genre_model.config.id2label
+    top_genres = []
+    for value, index in zip(values[0], indices[0]):
+        genre = genre_labels[index.item()]
+        confidence = value.item()
+        top_genres.append((genre, confidence))
+    return top_genres
+def generate_lyrics(genre, duration):
+    """Generate lyrics based on the genre and with appropriate length."""
+    # Calculate appropriate lyrics length based on audio duration
+    lines_count = calculate_lyrics_length(duration)
+    # Create prompt for the LLM
+    prompt = f"""
+You are a talented songwriter who specializes in {genre} music.
+Write original {genre} song lyrics for a song that is {duration:.1f} seconds long.
+The lyrics should:
+- Perfectly capture the essence and style of {genre} music
+- Be approximately {lines_count} lines long
+- Have a coherent theme and flow
+- Include a chorus and verses if appropriate for the genre
+- Be completely original
+Your lyrics:
+"""
+    # Generate lyrics using the LLM
+    response = llm_pipeline(
+        prompt,
+        do_sample=True,
+        temperature=0.7,
+        top_p=0.9,
+        repetition_penalty=1.1,
+        return_full_text=False
+    )
+    # Extract and clean generated lyrics
+    lyrics = response[0]["generated_text"].strip()
+    return lyrics
+def process_audio(audio_file):
+    """Main function to process audio file, classify genre, and generate lyrics."""
+    if audio_file is None:
+        return "Please upload an audio file.", None
+    try:
+        # Extract audio features
+        audio_data = extract_audio_features(audio_file)
+        # Classify genre
+        top_genres = classify_genre(audio_data["features"])
+        # Format genre results using utility function
+        genre_results = format_genre_results(top_genres)
+        # Generate lyrics based on top genre
+        primary_genre, _ = top_genres[0]
+        lyrics = generate_lyrics(primary_genre, audio_data["duration"])
+        return genre_results, lyrics
+    except Exception as e:
+        return f"Error processing audio: {str(e)}", None
+# Create Gradio interface
+with gr.Blocks(title="Music Genre Classifier & Lyrics Generator") as demo:
+    gr.Markdown("# Music Genre Classifier & Lyrics Generator")
+    gr.Markdown("Upload a music file to classify its genre and generate matching lyrics.")
+    with gr.Row():
+        with gr.Column():
+            audio_input = gr.Audio(label="Upload Music", type="filepath")
+            submit_btn = gr.Button("Analyze & Generate")
+        with gr.Column():
+            genre_output = gr.Textbox(label="Detected Genres", lines=5)
+            lyrics_output = gr.Textbox(label="Generated Lyrics", lines=15)
+    submit_btn.click(
+        fn=process_audio,
+        inputs=[audio_input],
+        outputs=[genre_output, lyrics_output]
+    )
+    gr.Markdown("### How it works")
+    gr.Markdown("""
+    1. Upload an audio file of your choice
+    2. The system will classify the genre using the dima806/music_genres_classification model
+    3. Based on the detected genre, it will generate appropriate lyrics using Llama-3.1-8B-Instruct
+    4. The lyrics length is automatically adjusted based on your audio duration
+    """)
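+# Launch the app only when run as a script, so importing process_audio (as example.py does) does not start the server
+if __name__ == "__main__":
+    demo.launch()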
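
Note on `classify_genre`: the function applies a softmax to the logits, takes the top 3 entries, and maps their indices through `id2label`. The sketch below reproduces that ranking step in plain Python so it can be checked without loading a model. The logits and genre labels are invented for illustration, and `top_k_labels` is a hypothetical name. The sketch also clamps `k` to the number of labels, which the committed code does not do.

```python
import math
from typing import Dict, List, Sequence, Tuple


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a flat list of logits."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_labels(
    logits: Sequence[float], id2label: Dict[int, str], k: int = 3
) -> List[Tuple[str, float]]:
    """Return the k most probable (label, probability) pairs, best first."""
    probs = softmax(logits)
    k = min(k, len(probs))  # avoid asking for more entries than there are labels
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    return [(id2label[i], probs[i]) for i in ranked]


# Invented logits and labels, not taken from the real model.
labels = {0: "rock", 1: "jazz", 2: "pop", 3: "blues"}
for genre, confidence in top_k_labels([2.0, 0.5, 1.0, -1.0], labels):
    print(f"- {genre}: {confidence * 100:.2f}%")
```

The returned pairs have the same shape as the `top_genres` list that `format_genre_results` consumes.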

example.py ADDED

@@ -0,0 +1,38 @@

+import os
+import sys
+from app import process_audio
+def main():
+    """
+    Example function to demonstrate the application with a sample audio file.
+    Usage:
+    python example.py <path_to_audio_file>
+    """
+    if len(sys.argv) != 2:
+        print("Usage: python example.py <path_to_audio_file>")
+        return
+    audio_file = sys.argv[1]
+    if not os.path.exists(audio_file):
+        print(f"Error: File {audio_file} does not exist.")
+        return
+    print(f"Processing audio file: {audio_file}")
+    # Call the main processing function
+    genre_results, lyrics = process_audio(audio_file)
+    # Print results
+    print("\n" + "="*50)
+    print("GENRE CLASSIFICATION RESULTS:")
+    print("="*50)
+    print(genre_results)
+    print("\n" + "="*50)
+    print("GENERATED LYRICS:")
+    print("="*50)
+    print(lyrics)
+if __name__ == "__main__":
+    main()

requirements.txt ADDED

@@ -0,0 +1,12 @@

+gradio>=4.12.0
+transformers>=4.36.2
+torch>=2.1.2
+torchaudio>=2.1.2
+numpy>=1.26.2
+accelerate>=0.25.0
+librosa>=0.10.1
+huggingface-hub>=0.20.3
+bitsandbytes>=0.41.1
+sentencepiece>=0.1.99
+safetensors>=0.4.1
+scipy>=1.12.0

utils.py ADDED

@@ -0,0 +1,42 @@

+import torch
+import numpy as np
+import librosa
+def load_audio(audio_file, sr=22050):
+    """Load an audio file and convert to mono if needed."""
+    y, sr = librosa.load(audio_file, sr=sr, mono=True)
+    return y, sr
+def extract_audio_duration(y, sr):
+    """Get the duration of audio in seconds."""
+    return len(y) / sr
+def extract_mfcc_features(y, sr, n_mfcc=20):
+    """Extract MFCC features from audio."""
+    mfccs = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)
+    mfccs_mean = np.mean(mfccs.T, axis=0)
+    return mfccs_mean
+def calculate_lyrics_length(duration):
+    """Calculate appropriate lyrics length based on audio duration."""
+    # Average song is 3.5 minutes with 20-30 lines,
+    # i.e. roughly 6-9 lines per minute; 8 is used here
+    return max(10, int(duration / 60 * 8))
+def format_genre_results(top_genres):
+    """Format genre classification results for display."""
+    result = "Top Detected Genres:\n"
+    for genre, confidence in top_genres:
+        result += f"- {genre}: {confidence*100:.2f}%\n"
+    return result
+def ensure_cuda_availability():
+    """Check and report CUDA availability for informational purposes."""
+    cuda_available = torch.cuda.is_available()
+    if cuda_available:
+        device_count = torch.cuda.device_count()
+        device_name = torch.cuda.get_device_name(0) if device_count > 0 else "Unknown"
+        print(f"CUDA is available with {device_count} device(s). Using: {device_name}")
+    else:
+        print("CUDA is not available. Using CPU for inference.")
+    return cuda_available
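
Note on `calculate_lyrics_length`: the formula is 8 lines per minute with a floor of 10 lines. The function below is copied from utils.py so the example runs on its own; the durations in the loop are arbitrary sample values chosen for illustration.

```python
def calculate_lyrics_length(duration: float) -> int:
    """Lines of lyrics for a clip of `duration` seconds (copied from utils.py)."""
    return max(10, int(duration / 60 * 8))


# Arbitrary sample durations in seconds.
for seconds in (30, 82, 90, 210, 600):
    print(f"{seconds:>4} s -> {calculate_lyrics_length(seconds)} lines")
```

Because of the floor, every clip shorter than 82.5 seconds gets exactly 10 lines. At the other end, long clips request many lines, but app.py caps generation at `max_new_tokens=512`, so the requested count may not fit in the output.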