root committed on
Commit
a459327
·
1 Parent(s): d7fb7e8
Files changed (8)
  1. .gitattributes +0 -3
  2. .gitignore +1 -0
  3. DEPLOYMENT.md +42 -0
  4. README.md +40 -7
  5. app.py +186 -0
  6. example.py +38 -0
  7. requirements.txt +12 -0
  8. utils.py +42 -0
.gitattributes CHANGED
@@ -16,16 +16,13 @@
  *.onnx filter=lfs diff=lfs merge=lfs -text
  *.ot filter=lfs diff=lfs merge=lfs -text
  *.parquet filter=lfs diff=lfs merge=lfs -text
- *.pb filter=lfs diff=lfs merge=lfs -text
  *.pickle filter=lfs diff=lfs merge=lfs -text
  *.pkl filter=lfs diff=lfs merge=lfs -text
  *.pt filter=lfs diff=lfs merge=lfs -text
  *.pth filter=lfs diff=lfs merge=lfs -text
  *.rar filter=lfs diff=lfs merge=lfs -text
  *.safetensors filter=lfs diff=lfs merge=lfs -text
- saved_model/**/* filter=lfs diff=lfs merge=lfs -text
  *.tar.* filter=lfs diff=lfs merge=lfs -text
- *.tar filter=lfs diff=lfs merge=lfs -text
  *.tflite filter=lfs diff=lfs merge=lfs -text
  *.tgz filter=lfs diff=lfs merge=lfs -text
  *.wasm filter=lfs diff=lfs merge=lfs -text
.gitignore ADDED
@@ -0,0 +1 @@
+ # Byte-compiled / optimized / DLL files
DEPLOYMENT.md ADDED
@@ -0,0 +1,42 @@
+ # Deploying to Hugging Face Spaces
+
+ This guide explains how to deploy the Music Genre Classifier & Lyrics Generator to Hugging Face Spaces.
+
+ ## Prerequisites
+
+ 1. A Hugging Face account
+ 2. Access to the Llama 3.1 8B Instruct model (requires acceptance of the model license)
+ 3. A Hugging Face API token
+
+ ## Deployment Steps
+
+ ### 1. Create a New Space
+
+ 1. Go to the Hugging Face website and log in
+ 2. Navigate to "Spaces" in the top navigation
+ 3. Click "Create new Space"
+ 4. Choose "Gradio" as the SDK
+ 5. Give your Space a name and description
+ 6. Select "T4 GPU" as the hardware
+
+ ### 2. Set Up Environment Variables
+
+ Set up your Hugging Face access token as an environment variable:
+
+ 1. Go to your profile settings in Hugging Face
+ 2. Navigate to "Access Tokens" and create a new token with "write" access
+ 3. In your Space settings, under "Repository secrets", add a new secret:
+    - Name: `HF_TOKEN`
+    - Value: Your Hugging Face access token
+
+ ### 3. Upload the Files
+
+ Upload all the files from this repository to your Space.
+
+ ### 4. Wait for Deployment
+
+ Hugging Face will automatically build and deploy your Space. This may take a few minutes, especially since it needs to download the models.
+
+ ### 5. Access Your Application
+
+ Once deployed, you can access your application at your Hugging Face Space URL.
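
Note on step 2: Space secrets are exposed to the running app as environment variables, which is how `app.py` picks up `HF_TOKEN`. A minimal sketch of checking for the secret at startup, assuming a hypothetical helper named `read_hf_token` that is not part of this commit:

```python
import os


def read_hf_token(env=None):
    """Return the HF_TOKEN secret, or None if it is missing or blank.

    `read_hf_token` is a hypothetical helper used only for illustration.
    """
    env = os.environ if env is None else env
    token = env.get("HF_TOKEN", "").strip()
    return token or None


if __name__ == "__main__":
    if read_hf_token() is None:
        # Gated models such as Llama 3.1 may fail to download without a token.
        print("HF_TOKEN is not set.")
    else:
        print("HF_TOKEN found.")
```

Passing a plain dictionary as `env` makes the check easy to exercise without touching the real environment.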
README.md CHANGED
@@ -1,14 +1,47 @@
  ---
- title: Largermodel Lyrics Generation
- emoji: 👁
- colorFrom: pink
- colorTo: green
+ title: Music Genre Classifier & Lyrics Generator
+ emoji: 🎵
+ colorFrom: indigo
+ colorTo: purple
  sdk: gradio
- sdk_version: 5.22.0
+ sdk_version: 4.12.0
  app_file: app.py
  pinned: false
  license: mit
- short_description: lyrics generation with larger model
+ short_description: AI-powered music genre detection and genre-specific lyrics generation
  ---
 
- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
+ # Music Genre Classifier & Lyrics Generator
+
+ This Hugging Face Space application provides two AI-powered features:
+
+ 1. **Music Genre Classification**: Upload a music file and get an analysis of its genre using the [dima806/music_genres_classification](https://huggingface.co/dima806/music_genres_classification) model.
+
+ 2. **Lyrics Generation**: Based on the detected genre, the app generates original lyrics using [Llama-3.1-8B-Instruct](https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct) that match both the style of the genre and the approximate length of the song.
+
+ ## Features
+
+ - Upload any music file for instant genre classification
+ - Receive genre predictions with confidence scores
+ - Get AI-generated lyrics tailored to the detected music genre
+ - Lyrics length is automatically adjusted based on the song duration
+ - Simple and intuitive user interface
+
+ ## Usage
+
+ 1. Visit the live application on Hugging Face Spaces
+ 2. Upload your music file using the provided interface
+ 3. Click "Analyze & Generate" to process the audio
+ 4. View the detected genre and generated lyrics in the output panels
+
+ ## Technical Details
+
+ - Uses MFCC feature extraction from audio for genre classification
+ - Leverages 4-bit quantization for efficient LLM inference on T4 GPU
+ - Implements a specialized prompt engineering approach to generate genre-specific lyrics
+ - Automatically scales lyrics length based on audio duration
+
+ ## Links
+
+ - [Music Genre Classification Model](https://huggingface.co/dima806/music_genres_classification)
+ - [Llama 3.1 8B Instruct Model](https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct)
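
Note on the 4-bit quantization bullet: the benefit can be made concrete with rough arithmetic. The sketch below estimates memory for the weights alone, assuming about 8 billion parameters and a 16 GB T4. It ignores quantization constants, activations, and the KV cache, so real usage will be higher.

```python
def weight_memory_gb(num_params, bits_per_param):
    """Approximate memory for model weights alone, in gigabytes (10**9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


PARAMS = 8e9        # assumption: roughly 8 billion parameters
T4_MEMORY_GB = 16   # assumption: 16 GB of GPU memory on a T4

fp16_gb = weight_memory_gb(PARAMS, 16)
nf4_gb = weight_memory_gb(PARAMS, 4)

print(f"fp16 weights:  ~{fp16_gb:.0f} GB")
print(f"4-bit weights: ~{nf4_gb:.0f} GB")
print(f"4-bit weights leave headroom on a T4: {nf4_gb < T4_MEMORY_GB}")
```

Under these assumptions the half-precision weights alone would fill the card, while the 4-bit weights take about a quarter of it.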
app.py ADDED
@@ -0,0 +1,186 @@
+ import os
+ import io
+ import gradio as gr
+ import torch
+ import numpy as np
+ from transformers import (
+     AutoModelForSequenceClassification,
+     AutoTokenizer,
+     pipeline,
+     AutoModelForCausalLM,
+     BitsAndBytesConfig
+ )
+ from huggingface_hub import login
+ from utils import (
+     load_audio,
+     extract_audio_duration,
+     extract_mfcc_features,
+     calculate_lyrics_length,
+     format_genre_results,
+     ensure_cuda_availability
+ )
+
+ # Login to Hugging Face Hub if token is provided
+ if "HF_TOKEN" in os.environ:
+     login(token=os.environ["HF_TOKEN"])
+
+ # Constants
+ GENRE_MODEL_NAME = "dima806/music_genres_classification"
+ LLM_MODEL_NAME = "meta-llama/Llama-3.1-8B-Instruct"
+ SAMPLE_RATE = 22050  # Standard sample rate for audio processing
+
+ # Check CUDA availability (for informational purposes)
+ CUDA_AVAILABLE = ensure_cuda_availability()
+
+ # Load genre classification model
+ genre_tokenizer = AutoTokenizer.from_pretrained(GENRE_MODEL_NAME)
+ genre_model = AutoModelForSequenceClassification.from_pretrained(GENRE_MODEL_NAME)
+
+ # Load LLM with appropriate quantization for T4 GPU
+ bnb_config = BitsAndBytesConfig(
+     load_in_4bit=True,
+     bnb_4bit_quant_type="nf4",
+     bnb_4bit_compute_dtype=torch.float16,
+ )
+
+ llm_tokenizer = AutoTokenizer.from_pretrained(LLM_MODEL_NAME)
+ llm_model = AutoModelForCausalLM.from_pretrained(
+     LLM_MODEL_NAME,
+     device_map="auto",
+     quantization_config=bnb_config,
+     torch_dtype=torch.float16,
+ )
+
+ # Create LLM pipeline
+ llm_pipeline = pipeline(
+     "text-generation",
+     model=llm_model,
+     tokenizer=llm_tokenizer,
+     max_new_tokens=512,
+ )
+
+ def extract_audio_features(audio_file):
+     """Extract audio features from an audio file."""
+     # Load the audio file using utility function
+     y, sr = load_audio(audio_file, SAMPLE_RATE)
+
+     # Get audio duration in seconds
+     duration = extract_audio_duration(y, sr)
+
+     # Extract MFCCs for genre classification
+     mfccs_mean = extract_mfcc_features(y, sr, n_mfcc=20)
+
+     return {
+         "features": mfccs_mean,
+         "duration": duration
+     }
+
+ def classify_genre(audio_features):
+     """Classify the genre of the audio using the loaded model."""
+     inputs = genre_tokenizer(str(audio_features), return_tensors="pt", truncation=True, max_length=512)
+
+     with torch.no_grad():
+         outputs = genre_model(**inputs)
+         predictions = outputs.logits.softmax(dim=-1)
+
+     # Get the top 3 genres
+     values, indices = torch.topk(predictions, 3)
+
+     # Map indices to genre labels
+     genre_labels = genre_model.config.id2label
+
+     top_genres = []
+     for i, (value, index) in enumerate(zip(values[0], indices[0])):
+         genre = genre_labels[index.item()]
+         confidence = value.item()
+         top_genres.append((genre, confidence))
+
+     return top_genres
+
+ def generate_lyrics(genre, duration):
+     """Generate lyrics based on the genre and with appropriate length."""
+     # Calculate appropriate lyrics length based on audio duration
+     lines_count = calculate_lyrics_length(duration)
+
+     # Create prompt for the LLM
+     prompt = f"""
+     You are a talented songwriter who specializes in {genre} music.
+     Write original {genre} song lyrics for a song that is {duration:.1f} seconds long.
+     The lyrics should:
+     - Perfectly capture the essence and style of {genre} music
+     - Be approximately {lines_count} lines long
+     - Have a coherent theme and flow
+     - Include a chorus and verses if appropriate for the genre
+     - Be completely original
+
+     Your lyrics:
+     """
+
+     # Generate lyrics using the LLM
+     response = llm_pipeline(
+         prompt,
+         do_sample=True,
+         temperature=0.7,
+         top_p=0.9,
+         repetition_penalty=1.1,
+         return_full_text=False
+     )
+
+     # Extract and clean generated lyrics
+     lyrics = response[0]["generated_text"].strip()
+     return lyrics
+
+ def process_audio(audio_file):
+     """Main function to process audio file, classify genre, and generate lyrics."""
+     if audio_file is None:
+         return "Please upload an audio file.", None
+
+     try:
+         # Extract audio features
+         audio_data = extract_audio_features(audio_file)
+
+         # Classify genre
+         top_genres = classify_genre(audio_data["features"])
+
+         # Format genre results using utility function
+         genre_results = format_genre_results(top_genres)
+
+         # Generate lyrics based on top genre
+         primary_genre, _ = top_genres[0]
+         lyrics = generate_lyrics(primary_genre, audio_data["duration"])
+
+         return genre_results, lyrics
+
+     except Exception as e:
+         return f"Error processing audio: {str(e)}", None
+
+ # Create Gradio interface
+ with gr.Blocks(title="Music Genre Classifier & Lyrics Generator") as demo:
+     gr.Markdown("# Music Genre Classifier & Lyrics Generator")
+     gr.Markdown("Upload a music file to classify its genre and generate matching lyrics.")
+
+     with gr.Row():
+         with gr.Column():
+             audio_input = gr.Audio(label="Upload Music", type="filepath")
+             submit_btn = gr.Button("Analyze & Generate")
+
+         with gr.Column():
+             genre_output = gr.Textbox(label="Detected Genres", lines=5)
+             lyrics_output = gr.Textbox(label="Generated Lyrics", lines=15)
+
+     submit_btn.click(
+         fn=process_audio,
+         inputs=[audio_input],
+         outputs=[genre_output, lyrics_output]
+     )
+
+     gr.Markdown("### How it works")
+     gr.Markdown("""
+     1. Upload an audio file of your choice
+     2. The system will classify the genre using the dima806/music_genres_classification model
+     3. Based on the detected genre, it will generate appropriate lyrics using Llama-3.1-8B-Instruct
+     4. The lyrics length is automatically adjusted based on your audio duration
+     """)
+
+ # Launch the app
+ demo.launch()
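
Note on `classify_genre`: the function turns logits into probabilities with a softmax and keeps the three largest. A dependency-free sketch of just that step follows. The logits and label map are made up for illustration; in `app.py` the labels come from `genre_model.config.id2label`, and `softmax` and `top_k_genres` are hypothetical helper names.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_genres(logits, id2label, k=3):
    """Return the k most probable (label, probability) pairs, best first."""
    probs = softmax(logits)
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return [(id2label[i], probs[i]) for i in ranked[:k]]


# Hypothetical values for illustration only.
example_logits = [2.0, 0.5, 1.0, -1.0]
example_labels = {0: "rock", 1: "jazz", 2: "pop", 3: "classical"}

for label, prob in top_k_genres(example_logits, example_labels):
    print(f"{label}: {prob:.2%}")
```

Subtracting the largest logit before exponentiating does not change the result but avoids overflow for large inputs.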
example.py ADDED
@@ -0,0 +1,38 @@
+ import os
+ import sys
+ from app import process_audio
+
+ def main():
+     """
+     Example function to demonstrate the application with a sample audio file.
+
+     Usage:
+         python example.py <path_to_audio_file>
+     """
+     if len(sys.argv) != 2:
+         print("Usage: python example.py <path_to_audio_file>")
+         return
+
+     audio_file = sys.argv[1]
+     if not os.path.exists(audio_file):
+         print(f"Error: File {audio_file} does not exist.")
+         return
+
+     print(f"Processing audio file: {audio_file}")
+
+     # Call the main processing function
+     genre_results, lyrics = process_audio(audio_file)
+
+     # Print results
+     print("\n" + "="*50)
+     print("GENRE CLASSIFICATION RESULTS:")
+     print("="*50)
+     print(genre_results)
+
+     print("\n" + "="*50)
+     print("GENERATED LYRICS:")
+     print("="*50)
+     print(lyrics)
+
+ if __name__ == "__main__":
+     main()
requirements.txt ADDED
@@ -0,0 +1,12 @@
+ gradio>=4.12.0
+ transformers>=4.36.2
+ torch>=2.1.2
+ torchaudio>=2.1.2
+ numpy>=1.26.2
+ accelerate>=0.25.0
+ librosa>=0.10.1
+ huggingface-hub>=0.20.3
+ bitsandbytes>=0.41.1
+ sentencepiece>=0.1.99
+ safetensors>=0.4.1
+ scipy>=1.12.0
utils.py ADDED
@@ -0,0 +1,42 @@
+ import torch
+ import numpy as np
+ import librosa
+
+ def load_audio(audio_file, sr=22050):
+     """Load an audio file and convert to mono if needed."""
+     y, sr = librosa.load(audio_file, sr=sr, mono=True)
+     return y, sr
+
+ def extract_audio_duration(y, sr):
+     """Get the duration of audio in seconds."""
+     return len(y) / sr
+
+ def extract_mfcc_features(y, sr, n_mfcc=20):
+     """Extract MFCC features from audio."""
+     mfccs = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)
+     mfccs_mean = np.mean(mfccs.T, axis=0)
+     return mfccs_mean
+
+ def calculate_lyrics_length(duration):
+     """Calculate appropriate lyrics length based on audio duration."""
+     # Average song is 3.5 minutes with 20-30 lines
+     # So roughly 7-8 lines per minute
+     return max(10, int(duration / 60 * 8))
+
+ def format_genre_results(top_genres):
+     """Format genre classification results for display."""
+     result = "Top Detected Genres:\n"
+     for genre, confidence in top_genres:
+         result += f"- {genre}: {confidence*100:.2f}%\n"
+     return result
+
+ def ensure_cuda_availability():
+     """Check and report CUDA availability for informational purposes."""
+     cuda_available = torch.cuda.is_available()
+     if cuda_available:
+         device_count = torch.cuda.device_count()
+         device_name = torch.cuda.get_device_name(0) if device_count > 0 else "Unknown"
+         print(f"CUDA is available with {device_count} device(s). Using: {device_name}")
+     else:
+         print("CUDA is not available. Using CPU for inference.")
+     return cuda_available
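
Note on `calculate_lyrics_length`: the heuristic is easiest to check by evaluating it for a few durations. The sketch below copies the formula from the diff above; the sample durations are arbitrary.

```python
def calculate_lyrics_length(duration):
    """Same formula as utils.py: about 8 lines per minute, never fewer than 10."""
    return max(10, int(duration / 60 * 8))


for seconds in (30, 90, 210, 300):
    print(f"{seconds:>3} s -> {calculate_lyrics_length(seconds)} lines")
```

Because of the 10-line floor, every clip shorter than 82.5 seconds maps to the same 10 lines; the duration only starts to matter beyond that point.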