Tonic committed on
Commit f2c336a · verified · 1 Parent(s): 7e452be

Update main.py

Files changed (1)
  1. main.py +76 -108
main.py CHANGED
@@ -34,28 +34,23 @@ On 🌐Github: [Tonic-AI](https://github.com/tonic-ai) & contribute to🌟 [Mult
34
  useage_instructions = """
35
  ## Overview
36
  JASCO is a powerful text-to-music generation system that allows you to create music using text descriptions and various controls including chords, drums, and melody. This guide explains how to use each feature of the interface.
37
-
38
  ## Model Selection
39
  Four different models are available:
40
  1. `facebook/jasco-chords-drums-400M` - Basic model with chord and drum support (400M parameters)
41
  2. `facebook/jasco-chords-drums-1B` - Enhanced model with chord and drum support (1B parameters)
42
  3. `facebook/jasco-chords-drums-melody-400M` - Model with melody support (400M parameters)
43
  4. `facebook/jasco-chords-drums-melody-1B` - Full-featured model with melody support (1B parameters)
44
-
45
  ## Input Controls
46
-
47
  ### 1. Text Description
48
  - Enter a descriptive text about the music you want to generate
49
  - Examples:
50
  - "80s pop with groovy synth bass and electric piano"
51
  - "Strings, woodwind, orchestral, symphony"
52
  - "Jazz quartet with walking bass and smooth piano"
53
-
54
  ### 2. Chord Progression
55
  Format: `(Chord, Time), (Chord, Time), ...`
56
  - Time is in seconds (0-10 seconds range)
57
  - Example: `(C, 0.0), (D, 2.0), (F, 4.0), (Ab, 6.0), (Bb, 7.0), (C, 8.0)`
58
-
59
  Supported chord types:
60
  ```python
61
  Basic Chords: C, D, E, F, G, A, B
@@ -66,116 +61,88 @@ Minor Seventh: Cm7, Dm7, Em7, Fm7, Gm7, Am7, Bm7
66
  Flat Chords: Ab, Bb (and their variations)
67
  Special: N (No chord/silence)
68
  ```
69
-
70
  ### 3. Drums Input
71
  Two options for adding drums:
72
  1. File Upload:
73
  - Select "file" in Drums Input Source
74
  - Upload a WAV file containing drum patterns
75
  - Recommended length: 2-4 bars
76
-
77
  2. Microphone Recording:
78
  - Select "mic" in Drums Input Source
79
  - Record drum patterns using your microphone
80
  - Keep recordings short and rhythmic
81
-
82
  ### 4. Melody Input
83
  - Upload a melody salience matrix as a PyTorch tensor
84
  - Format: Shape [n_melody_bins, T]
85
  - File should be saved using `torch.save()`
86
-
87
  ### 5. Generation Parameters
88
-
89
  #### Classifier Free Guidance (CFG) Controls:
90
  - CFG ALL: Controls overall adherence to input conditions (default: 1.25)
91
  - Range: 1.0-3.0
92
  - Higher values = stronger conditioning
93
-
94
  - CFG TEXT: Controls text conditioning strength (default: 2.5)
95
  - Range: 1.0-4.0
96
  - Higher values = closer match to text description
97
-
98
  #### ODE Parameters:
99
  - ODE Solver: Choose between 'euler' and 'dopri5'
100
  - euler: Faster, less accurate
101
  - dopri5: Slower, more accurate
102
-
103
  - ODE Tolerance: Numerical precision (default: 1e-4)
104
  - Lower values = higher precision, slower generation
105
-
106
  - Euler Steps: Number of steps for euler solver (default: 10)
107
  - Higher values = more accurate, slower generation
108
-
109
  ## Generation Process
110
-
111
  1. Select a model based on your needs:
112
  - Use 400M models for faster generation
113
  - Use 1B models for higher quality
114
  - Choose melody-enabled models if using melody input
115
-
116
  2. Enter your text description
117
-
118
  3. Input chord progression:
119
  ```
120
  Example:
121
  (C, 0.0), (Am, 2.5), (F, 5.0), (G, 7.5)
122
  ```
123
-
124
  4. (Optional) Add drums via file upload or microphone
125
-
126
  5. (Optional) Upload melody matrix
127
-
128
  6. Adjust generation parameters or use defaults
129
-
130
  7. Click "Make Musix"
131
-
132
  ## Output
133
  - The system generates two variations of your music
134
  - Each generation is 10 seconds long
135
  - Output is provided as WAV files
136
  - You can download or play directly in the interface
137
-
138
  ## Tips for Best Results
139
-
140
  1. Text Descriptions:
141
  - Be specific about instruments
142
  - Include genre information
143
  - Mention desired mood or style
144
-
145
  2. Chord Progressions:
146
  - Use common progressions for better results
147
  - Space chords evenly
148
  - Include resolution points
149
-
150
  3. Drums:
151
  - Use clean, clear drum patterns
152
  - Avoid complex patterns for better results
153
  - Keep volume levels consistent
154
-
155
  4. Memory Management:
156
  - The interface caches models after first use
157
  - Switch models only when necessary
158
  - Clear browser cache if experiencing issues
159
-
160
  ## Example Usage
161
-
162
  ```python
163
  # Example 1: Pop Music
164
  Text: "Upbeat pop song with electric piano and synthesizer"
165
  Chords: (C, 0.0), (Am, 2.5), (F, 5.0), (G, 7.5)
166
  Model: facebook/jasco-chords-drums-400M
167
-
168
  # Example 2: Orchestral
169
  Text: "Epic orchestral piece with strong strings and brass"
170
  Chords: (Cm, 0.0), (G, 3.0), (Bb, 6.0), (Cm, 8.0)
171
  Model: facebook/jasco-chords-drums-melody-1B
172
-
173
  # Example 3: Jazz
174
  Text: "Smooth jazz quartet with walking bass and piano"
175
  Chords: (Dmaj7, 0.0), (Em7, 2.5), (A7, 5.0), (Dmaj7, 7.5)
176
  Model: facebook/jasco-chords-drums-1B
177
  ```
178
-
179
  ## Error Handling
180
  - If generation fails, try:
181
  1. Simplifying chord progressions
@@ -183,7 +150,6 @@ Model: facebook/jasco-chords-drums-1B
183
  3. Using simpler text descriptions
184
  4. Checking input format compliance
185
  5. Refreshing the page
186
-
187
  ## Performance Considerations
188
  - First generation may be slower due to model loading
189
  - Subsequent generations with same model are faster
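
Reviewer note: the usage instructions in this hunk give the chord format as `(Chord, Time), (Chord, Time), ...` with times in the 0-10 second range. A minimal sketch of how such a string could be parsed and range-checked before it reaches the model follows. `parse_chords` and `CHORD_PAIR` are hypothetical names used only for illustration; they are not part of `main.py` or the JASCO API, and the real app may parse chords differently.

```python
import re

# Hypothetical helper for illustration only; not taken from main.py.
# Matches "(Chord, Time)" pairs such as "(C, 0.0)" or "(Dmaj7, 7.5)".
CHORD_PAIR = re.compile(r"\(\s*([A-Za-z0-9#]+)\s*,\s*(-?[0-9]*\.?[0-9]+)\s*\)")


def parse_chords(chords_sym, max_seconds=10.0):
    """Parse "(C, 0.0), (Am, 2.5)" into [("C", 0.0), ("Am", 2.5)]."""
    pairs = [(name, float(t)) for name, t in CHORD_PAIR.findall(chords_sym)]
    if not pairs:
        raise ValueError("no (Chord, Time) pairs found in input")
    for name, t in pairs:
        # The instructions state that times must fall in the 0-10 second range.
        if t < 0.0 or t > max_seconds:
            raise ValueError(f"time {t} for chord {name!r} is outside 0-{max_seconds}s")
    return pairs


if __name__ == "__main__":
    print(parse_chords("(C, 0.0), (Am, 2.5), (F, 5.0), (G, 7.5)"))
```

Checking the range up front would let the interface reject a malformed progression with a clear message instead of failing during generation.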
@@ -545,77 +511,77 @@ def predict_full(model, text, chords_sym, melody_file,
545
  ode_rtol, ode_atol,
546
  ode_solver, ode_steps,
547
  progress=gr.Progress()):
548
- """Generate music using JASCO (Joint Audio-Symbolic Conditioning) model.
549
-
550
- This function generates two variations of music based on text descriptions, chord progressions,
551
- and optional melody and drum inputs. It uses the JASCO model to create high-quality music samples
552
- with both global (text) and local (chords, drums, melody) controls.
553
-
554
- Args:
555
- model (str): The JASCO model to use. Options:
556
- - 'facebook/jasco-chords-drums-400M': Basic model with chord and drum support (400M parameters)
557
- - 'facebook/jasco-chords-drums-1B': Enhanced model with chord and drum support (1B parameters)
558
- - 'facebook/jasco-chords-drums-melody-400M': Model with melody support (400M parameters)
559
- - 'facebook/jasco-chords-drums-melody-1B': Full-featured model with melody support (1B parameters)
560
- text (str): Text description of the desired music. Examples:
561
- - "80s pop with groovy synth bass and electric piano"
562
- - "Strings, woodwind, orchestral, symphony"
563
- - "Jazz quartet with walking bass and smooth piano"
564
- chords_sym (str): Chord progression in format "(Chord, Time), (Chord, Time), ...". Time is in seconds (0-10).
565
- Example: "(C, 0.0), (D, 2.0), (F, 4.0), (Ab, 6.0), (Bb, 7.0), (C, 8.0)"
566
- melody_file (File): Optional. PyTorch tensor file containing melody salience matrix.
567
- Shape should be [n_melody_bins, T].
568
- drums_file (Audio): Optional. WAV file containing drum patterns (2-4 bars recommended).
569
- drums_mic (Audio): Optional. Microphone recording of drum patterns.
570
- drum_input_src (str): Source of drum input. Either "file" or "mic".
571
- cfg_coef_all (float): Classifier Free Guidance coefficient for overall conditioning.
572
- Controls adherence to all input conditions. Range: 1.0-3.0. Default: 1.25.
573
- cfg_coef_txt (float): Classifier Free Guidance coefficient for text conditioning.
574
- Controls strength of text description matching. Range: 1.0-4.0. Default: 2.5.
575
- ode_rtol (float): Relative tolerance for ODE solver. Default: 1e-4.
576
- ode_atol (float): Absolute tolerance for ODE solver. Default: 1e-4.
577
- ode_solver (str): ODE solver to use. Options:
578
- - 'euler': Faster, less accurate
579
- - 'dopri5': Slower, more accurate
580
- ode_steps (int): Number of steps for euler solver. Default: 10.
581
- progress (gr.Progress): Gradio progress bar for tracking generation progress.
582
-
583
- Returns:
584
- tuple: Two WAV file paths containing the generated music variations.
585
-
586
- Raises:
587
- gr.Error: If there are issues with:
588
- - Model loading
589
- - Invalid melody matrix shape
590
- - Generation process
591
- - User interruption
592
-
593
- Notes:
594
- - First generation may be slower due to model loading
595
- - Subsequent generations with same model are faster
596
- - Higher parameter models (1B) require more memory
597
- - Melody-enabled models may be slower
598
- - The function generates two variations of the music
599
- - Each generation is 10 seconds long
600
- - Output is provided as WAV files
601
-
602
- Example:
603
- wavs = predict_full(
604
- model='facebook/jasco-chords-drums-melody-400M',
605
- text="80s pop with groovy synth bass and electric piano",
606
- chords_sym="(C, 0.0), (Am, 2.5), (F, 5.0), (G, 7.5)",
607
- melody_file=None,
608
- drums_file=None,
609
- drums_mic=None,
610
- drum_input_src="file",
611
- cfg_coef_all=1.25,
612
- cfg_coef_txt=2.5,
613
- ode_rtol=1e-4,
614
- ode_atol=1e-4,
615
- ode_solver='euler',
616
- ode_steps=10
617
- )
618
- """
619
  global INTERRUPTING
620
  INTERRUPTING = False
621
  progress(0, desc="Loading model...")
@@ -629,7 +595,7 @@ def predict_full(model, text, chords_sym, melody_file,
629
  progress((min(max_generated, to_generate), to_generate))
630
  if INTERRUPTING:
631
  raise gr.Error("Interrupted.")
632
-
633
  MODEL.set_custom_progress_callback(_progress)
634
 
635
  drums = drums_mic if drum_input_src == "mic" else drums_file
@@ -645,7 +611,8 @@ def predict_full(model, text, chords_sym, melody_file,
645
  ode_rtol=ode_rtol,
646
  ode_atol=ode_atol,
647
  euler=ode_solver == 'euler',
648
- euler_steps=ode_steps)
 
649
 
650
  return wavs
651
 
@@ -751,4 +718,5 @@ if __name__ == "__main__":
751
  except Exception as e:
752
  print(f"Error launching demo: {str(e)}")
753
  finally:
754
- cleanup_cache()
 
 
34
  useage_instructions = """
35
  ## Overview
36
  JASCO is a powerful text-to-music generation system that allows you to create music using text descriptions and various controls including chords, drums, and melody. This guide explains how to use each feature of the interface.
 
37
  ## Model Selection
38
  Four different models are available:
39
  1. `facebook/jasco-chords-drums-400M` - Basic model with chord and drum support (400M parameters)
40
  2. `facebook/jasco-chords-drums-1B` - Enhanced model with chord and drum support (1B parameters)
41
  3. `facebook/jasco-chords-drums-melody-400M` - Model with melody support (400M parameters)
42
  4. `facebook/jasco-chords-drums-melody-1B` - Full-featured model with melody support (1B parameters)
 
43
  ## Input Controls
 
44
  ### 1. Text Description
45
  - Enter a descriptive text about the music you want to generate
46
  - Examples:
47
  - "80s pop with groovy synth bass and electric piano"
48
  - "Strings, woodwind, orchestral, symphony"
49
  - "Jazz quartet with walking bass and smooth piano"
 
50
  ### 2. Chord Progression
51
  Format: `(Chord, Time), (Chord, Time), ...`
52
  - Time is in seconds (0-10 seconds range)
53
  - Example: `(C, 0.0), (D, 2.0), (F, 4.0), (Ab, 6.0), (Bb, 7.0), (C, 8.0)`
 
54
  Supported chord types:
55
  ```python
56
  Basic Chords: C, D, E, F, G, A, B
 
61
  Flat Chords: Ab, Bb (and their variations)
62
  Special: N (No chord/silence)
63
  ```
 
64
  ### 3. Drums Input
65
  Two options for adding drums:
66
  1. File Upload:
67
  - Select "file" in Drums Input Source
68
  - Upload a WAV file containing drum patterns
69
  - Recommended length: 2-4 bars
 
70
  2. Microphone Recording:
71
  - Select "mic" in Drums Input Source
72
  - Record drum patterns using your microphone
73
  - Keep recordings short and rhythmic
 
74
  ### 4. Melody Input
75
  - Upload a melody salience matrix as a PyTorch tensor
76
  - Format: Shape [n_melody_bins, T]
77
  - File should be saved using `torch.save()`
 
78
  ### 5. Generation Parameters
 
79
  #### Classifier Free Guidance (CFG) Controls:
80
  - CFG ALL: Controls overall adherence to input conditions (default: 1.25)
81
  - Range: 1.0-3.0
82
  - Higher values = stronger conditioning
 
83
  - CFG TEXT: Controls text conditioning strength (default: 2.5)
84
  - Range: 1.0-4.0
85
  - Higher values = closer match to text description
 
86
  #### ODE Parameters:
87
  - ODE Solver: Choose between 'euler' and 'dopri5'
88
  - euler: Faster, less accurate
89
  - dopri5: Slower, more accurate
 
90
  - ODE Tolerance: Numerical precision (default: 1e-4)
91
  - Lower values = higher precision, slower generation
 
92
  - Euler Steps: Number of steps for euler solver (default: 10)
93
  - Higher values = more accurate, slower generation
 
94
  ## Generation Process
 
95
  1. Select a model based on your needs:
96
  - Use 400M models for faster generation
97
  - Use 1B models for higher quality
98
  - Choose melody-enabled models if using melody input
 
99
  2. Enter your text description
 
100
  3. Input chord progression:
101
  ```
102
  Example:
103
  (C, 0.0), (Am, 2.5), (F, 5.0), (G, 7.5)
104
  ```
 
105
  4. (Optional) Add drums via file upload or microphone
 
106
  5. (Optional) Upload melody matrix
 
107
  6. Adjust generation parameters or use defaults
 
108
  7. Click "Make Musix"
 
109
  ## Output
110
  - The system generates two variations of your music
111
  - Each generation is 10 seconds long
112
  - Output is provided as WAV files
113
  - You can download or play directly in the interface
 
114
  ## Tips for Best Results
 
115
  1. Text Descriptions:
116
  - Be specific about instruments
117
  - Include genre information
118
  - Mention desired mood or style
 
119
  2. Chord Progressions:
120
  - Use common progressions for better results
121
  - Space chords evenly
122
  - Include resolution points
 
123
  3. Drums:
124
  - Use clean, clear drum patterns
125
  - Avoid complex patterns for better results
126
  - Keep volume levels consistent
 
127
  4. Memory Management:
128
  - The interface caches models after first use
129
  - Switch models only when necessary
130
  - Clear browser cache if experiencing issues
 
131
  ## Example Usage
 
132
  ```python
133
  # Example 1: Pop Music
134
  Text: "Upbeat pop song with electric piano and synthesizer"
135
  Chords: (C, 0.0), (Am, 2.5), (F, 5.0), (G, 7.5)
136
  Model: facebook/jasco-chords-drums-400M
 
137
  # Example 2: Orchestral
138
  Text: "Epic orchestral piece with strong strings and brass"
139
  Chords: (Cm, 0.0), (G, 3.0), (Bb, 6.0), (Cm, 8.0)
140
  Model: facebook/jasco-chords-drums-melody-1B
 
141
  # Example 3: Jazz
142
  Text: "Smooth jazz quartet with walking bass and piano"
143
  Chords: (Dmaj7, 0.0), (Em7, 2.5), (A7, 5.0), (Dmaj7, 7.5)
144
  Model: facebook/jasco-chords-drums-1B
145
  ```
 
146
  ## Error Handling
147
  - If generation fails, try:
148
  1. Simplifying chord progressions
 
150
  3. Using simpler text descriptions
151
  4. Checking input format compliance
152
  5. Refreshing the page
 
153
  ## Performance Considerations
154
  - First generation may be slower due to model loading
155
  - Subsequent generations with same model are faster
 
511
  ode_rtol, ode_atol,
512
  ode_solver, ode_steps,
513
  progress=gr.Progress()):
514
+ """Generate music using JASCO (Joint Audio-Symbolic Conditioning) model.
515
+
516
+ This function generates two variations of music based on text descriptions, chord progressions,
517
+ and optional melody and drum inputs. It uses the JASCO model to create high-quality music samples
518
+ with both global (text) and local (chords, drums, melody) controls.
519
+
520
+ Args:
521
+ model (str): The JASCO model to use. Options:
522
+ - 'facebook/jasco-chords-drums-400M': Basic model with chord and drum support (400M parameters)
523
+ - 'facebook/jasco-chords-drums-1B': Enhanced model with chord and drum support (1B parameters)
524
+ - 'facebook/jasco-chords-drums-melody-400M': Model with melody support (400M parameters)
525
+ - 'facebook/jasco-chords-drums-melody-1B': Full-featured model with melody support (1B parameters)
526
+ text (str): Text description of the desired music. Examples:
527
+ - "80s pop with groovy synth bass and electric piano"
528
+ - "Strings, woodwind, orchestral, symphony"
529
+ - "Jazz quartet with walking bass and smooth piano"
530
+ chords_sym (str): Chord progression in format "(Chord, Time), (Chord, Time), ...". Time is in seconds (0-10).
531
+ Example: "(C, 0.0), (D, 2.0), (F, 4.0), (Ab, 6.0), (Bb, 7.0), (C, 8.0)"
532
+ melody_file (File): Optional. PyTorch tensor file containing melody salience matrix.
533
+ Shape should be [n_melody_bins, T].
534
+ drums_file (Audio): Optional. WAV file containing drum patterns (2-4 bars recommended).
535
+ drums_mic (Audio): Optional. Microphone recording of drum patterns.
536
+ drum_input_src (str): Source of drum input. Either "file" or "mic".
537
+ cfg_coef_all (float): Classifier Free Guidance coefficient for overall conditioning.
538
+ Controls adherence to all input conditions. Range: 1.0-3.0. Default: 1.25.
539
+ cfg_coef_txt (float): Classifier Free Guidance coefficient for text conditioning.
540
+ Controls strength of text description matching. Range: 1.0-4.0. Default: 2.5.
541
+ ode_rtol (float): Relative tolerance for ODE solver. Default: 1e-4.
542
+ ode_atol (float): Absolute tolerance for ODE solver. Default: 1e-4.
543
+ ode_solver (str): ODE solver to use. Options:
544
+ - 'euler': Faster, less accurate
545
+ - 'dopri5': Slower, more accurate
546
+ ode_steps (int): Number of steps for euler solver. Default: 10.
547
+ progress (gr.Progress): Gradio progress bar for tracking generation progress.
548
+
549
+ Returns:
550
+ tuple: Two WAV file paths containing the generated music variations.
551
+
552
+ Raises:
553
+ gr.Error: If there are issues with:
554
+ - Model loading
555
+ - Invalid melody matrix shape
556
+ - Generation process
557
+ - User interruption
558
+
559
+ Notes:
560
+ - First generation may be slower due to model loading
561
+ - Subsequent generations with same model are faster
562
+ - Higher parameter models (1B) require more memory
563
+ - Melody-enabled models may be slower
564
+ - The function generates two variations of the music
565
+ - Each generation is 10 seconds long
566
+ - Output is provided as WAV files
567
+
568
+ Example:
569
+ wavs = predict_full(
570
+ model='facebook/jasco-chords-drums-melody-400M',
571
+ text="80s pop with groovy synth bass and electric piano",
572
+ chords_sym="(C, 0.0), (Am, 2.5), (F, 5.0), (G, 7.5)",
573
+ melody_file=None,
574
+ drums_file=None,
575
+ drums_mic=None,
576
+ drum_input_src="file",
577
+ cfg_coef_all=1.25,
578
+ cfg_coef_txt=2.5,
579
+ ode_rtol=1e-4,
580
+ ode_atol=1e-4,
581
+ ode_solver='euler',
582
+ ode_steps=10
583
+ )
584
+ """
585
  global INTERRUPTING
586
  INTERRUPTING = False
587
  progress(0, desc="Loading model...")
 
595
  progress((min(max_generated, to_generate), to_generate))
596
  if INTERRUPTING:
597
  raise gr.Error("Interrupted.")
598
+
599
  MODEL.set_custom_progress_callback(_progress)
600
 
601
  drums = drums_mic if drum_input_src == "mic" else drums_file
 
611
  ode_rtol=ode_rtol,
612
  ode_atol=ode_atol,
613
  euler=ode_solver == 'euler',
614
+ euler_steps=ode_steps
615
+ )
616
 
617
  return wavs
618
 
 
718
  except Exception as e:
719
  print(f"Error launching demo: {str(e)}")
720
  finally:
721
+ cleanup_cache()
722
+
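
Reviewer note: the docstring says that `ode_steps` sets the number of steps for the Euler solver, and the usage instructions add that higher values are more accurate but slower. The toy example below illustrates that trade-off with a fixed-step explicit Euler integrator on dy/dt = -y, whose exact solution at t = 1 is e^-1. It is a generic sketch and not JASCO's solver; `euler_integrate` and `decay` are hypothetical names.

```python
import math


def euler_integrate(f, y0, t0, t1, steps):
    """Fixed-step explicit Euler: y_{k+1} = y_k + h * f(t_k, y_k)."""
    h = (t1 - t0) / steps
    t, y = t0, y0
    for _ in range(steps):
        y += h * f(t, y)
        t += h
    return y


def decay(t, y):
    # Toy ODE dy/dt = -y, chosen because its exact solution is known.
    return -y


exact = math.exp(-1.0)
# Absolute error at t = 1 for two step counts; more steps cost more
# function evaluations but land closer to the exact value.
errors = {n: abs(euler_integrate(decay, 1.0, 0.0, 1.0, n) - exact) for n in (10, 100)}

if __name__ == "__main__":
    for n, err in errors.items():
        print(f"steps={n:4d}  abs error={err:.6f}")
```

Each Euler step costs one evaluation of the right-hand side, so in a generative model the step count scales the number of network forward passes. That is why the default of 10 favors speed over accuracy.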