Spaces: Tonic/audiocraft

Tonic committed 12 days ago

Commit f2c336a · verified · 1 Parent(s): 7e452be

Update main.py

Files changed (1)

main.py +76 -108

@@ -34,28 +34,23 @@ On 🌐Github: [Tonic-AI](https://github.com/tonic-ai) & contribute to🌟 [Mult
 useage_instructions = """
 ## Overview
 JASCO is a powerful text-to-music generation system that allows you to create music using text descriptions and various controls including chords, drums, and melody. This guide explains how to use each feature of the interface.
 ## Model Selection
 Four different models are available:
 1. `facebook/jasco-chords-drums-400M` - Basic model with chord and drum support (400M parameters)
 2. `facebook/jasco-chords-drums-1B` - Enhanced model with chord and drum support (1B parameters)
 3. `facebook/jasco-chords-drums-melody-400M` - Model with melody support (400M parameters)
 4. `facebook/jasco-chords-drums-melody-1B` - Full-featured model with melody support (1B parameters)
 ## Input Controls
 ### 1. Text Description
 - Enter a descriptive text about the music you want to generate
 - Examples:
   - "80s pop with groovy synth bass and electric piano"
   - "Strings, woodwind, orchestral, symphony"
   - "Jazz quartet with walking bass and smooth piano"
 ### 2. Chord Progression
 Format: `(Chord, Time), (Chord, Time), ...`
 - Time is in seconds (0-10 seconds range)
 - Example: `(C, 0.0), (D, 2.0), (F, 4.0), (Ab, 6.0), (Bb, 7.0), (C, 8.0)`
 Supported chord types:
 ```python
 Basic Chords: C, D, E, F, G, A, B
@@ -66,116 +61,88 @@ Minor Seventh: Cm7, Dm7, Em7, Fm7, Gm7, Am7, Bm7
 Flat Chords: Ab, Bb (and their variations)
 Special: N (No chord/silence)
 ```
 ### 3. Drums Input
 Two options for adding drums:
 1. File Upload:
    - Select "file" in Drums Input Source
    - Upload a WAV file containing drum patterns
    - Recommended length: 2-4 bars
 2. Microphone Recording:
    - Select "mic" in Drums Input Source
    - Record drum patterns using your microphone
    - Keep recordings short and rhythmic
 ### 4. Melody Input
 - Upload a melody salience matrix as a PyTorch tensor
 - Format: Shape [n_melody_bins, T]
 - File should be saved using `torch.save()`
 ### 5. Generation Parameters
 #### Classifier Free Guidance (CFG) Controls:
 - CFG ALL: Controls overall adherence to input conditions (default: 1.25)
   - Range: 1.0-3.0
   - Higher values = stronger conditioning
 - CFG TEXT: Controls text conditioning strength (default: 2.5)
   - Range: 1.0-4.0
   - Higher values = closer match to text description
 #### ODE Parameters:
 - ODE Solver: Choose between 'euler' and 'dopri5'
   - euler: Faster, less accurate
   - dopri5: Slower, more accurate
 - ODE Tolerance: Numerical precision (default: 1e-4)
   - Lower values = higher precision, slower generation
 - Euler Steps: Number of steps for euler solver (default: 10)
   - Higher values = more accurate, slower generation
 ## Generation Process
 1. Select a model based on your needs:
    - Use 400M models for faster generation
    - Use 1B models for higher quality
    - Choose melody-enabled models if using melody input
 2. Enter your text description
 3. Input chord progression:
 ```
 Example:
 (C, 0.0), (Am, 2.5), (F, 5.0), (G, 7.5)
 ```
 4. (Optional) Add drums via file upload or microphone
 5. (Optional) Upload melody matrix
 6. Adjust generation parameters or use defaults
 7. Click "Make Musix"
 ## Output
 - The system generates two variations of your music
 - Each generation is 10 seconds long
 - Output is provided as WAV files
 - You can download or play directly in the interface
 ## Tips for Best Results
 1. Text Descriptions:
    - Be specific about instruments
    - Include genre information
    - Mention desired mood or style
 2. Chord Progressions:
    - Use common progressions for better results
    - Space chords evenly
    - Include resolution points
 3. Drums:
    - Use clean, clear drum patterns
    - Avoid complex patterns for better results
    - Keep volume levels consistent
 4. Memory Management:
    - The interface caches models after first use
    - Switch models only when necessary
    - Clear browser cache if experiencing issues
 ## Example Usage
 ```python
 # Example 1: Pop Music
 Text: "Upbeat pop song with electric piano and synthesizer"
 Chords: (C, 0.0), (Am, 2.5), (F, 5.0), (G, 7.5)
 Model: facebook/jasco-chords-drums-400M
 # Example 2: Orchestral
 Text: "Epic orchestral piece with strong strings and brass"
 Chords: (Cm, 0.0), (G, 3.0), (Bb, 6.0), (Cm, 8.0)
 Model: facebook/jasco-chords-drums-melody-1B
 # Example 3: Jazz
 Text: "Smooth jazz quartet with walking bass and piano"
 Chords: (Dmaj7, 0.0), (Em7, 2.5), (A7, 5.0), (Dmaj7, 7.5)
 Model: facebook/jasco-chords-drums-1B
 ```
 ## Error Handling
 - If generation fails, try:
   1. Simplifying chord progressions
@@ -183,7 +150,6 @@ Model: facebook/jasco-chords-drums-1B
   3. Using simpler text descriptions
   4. Checking input format compliance
   5. Refreshing the page
 ## Performance Considerations
 - First generation may be slower due to model loading
 - Subsequent generations with same model are faster
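
The usage text above gives the chord progression format as `(Chord, Time), (Chord, Time), ...` with times in the 0-10 second range. Input in that format could be validated before generation along the following lines. This is a minimal sketch, not the parser used by `main.py` or by audiocraft: `parse_chords` and `MAX_SECONDS` are hypothetical names, and the ascending-order check is an assumption rather than a documented requirement.

```python
import re

# Hypothetical limit, taken from the "0-10 seconds range" in the usage text.
MAX_SECONDS = 10.0

# Matches one "(Chord, Time)" pair, e.g. "(Ab, 6.0)".
_PAIR = re.compile(r"\(\s*([A-Za-z0-9#]+)\s*,\s*([0-9]*\.?[0-9]+)\s*\)")


def parse_chords(chords_sym: str) -> list[tuple[str, float]]:
    """Parse "(C, 0.0), (Am, 2.5)" into [("C", 0.0), ("Am", 2.5)]."""
    pairs = [(name, float(t)) for name, t in _PAIR.findall(chords_sym)]
    if not pairs:
        raise ValueError("no (Chord, Time) pairs found")
    # Anything left after removing the pairs and separators is malformed input.
    leftover = _PAIR.sub("", chords_sym).replace(",", "").strip()
    if leftover:
        raise ValueError(f"unexpected text in chord string: {leftover!r}")
    times = [t for _, t in pairs]
    if any(t > MAX_SECONDS for t in times):
        raise ValueError(f"chord times must be within 0-{MAX_SECONDS:g} seconds")
    # Assumption: chords are expected in ascending time order.
    if times != sorted(times):
        raise ValueError("chord times must be in ascending order")
    return pairs


print(parse_chords("(C, 0.0), (Am, 2.5), (F, 5.0), (G, 7.5)"))
```

Rejecting leftover text, rather than silently ignoring it, keeps a mistyped pair such as `(C 0.0)` from being dropped without notice.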
@@ -545,77 +511,77 @@ def predict_full(model, text, chords_sym, melody_file,
                 ode_rtol, ode_atol,
                 ode_solver, ode_steps,
                 progress=gr.Progress()):
-               """Generate music using JASCO (Joint Audio-Symbolic Conditioning) model.
-                This function generates two variations of music based on text descriptions, chord progressions,
-                and optional melody and drum inputs. It uses the JASCO model to create high-quality music samples
-                with both global (text) and local (chords, drums, melody) controls.
-                Args:
-                    model (str): The JASCO model to use. Options:
-                        - 'facebook/jasco-chords-drums-400M': Basic model with chord and drum support (400M parameters)
-                        - 'facebook/jasco-chords-drums-1B': Enhanced model with chord and drum support (1B parameters)
-                        - 'facebook/jasco-chords-drums-melody-400M': Model with melody support (400M parameters)
-                        - 'facebook/jasco-chords-drums-melody-1B': Full-featured model with melody support (1B parameters)
-                    text (str): Text description of the desired music. Examples:
-                        - "80s pop with groovy synth bass and electric piano"
-                        - "Strings, woodwind, orchestral, symphony"
-                        - "Jazz quartet with walking bass and smooth piano"
-                    chords_sym (str): Chord progression in format "(Chord, Time), (Chord, Time), ...". Time is in seconds (0-10).
-                        Example: "(C, 0.0), (D, 2.0), (F, 4.0), (Ab, 6.0), (Bb, 7.0), (C, 8.0)"
-                    melody_file (File): Optional. PyTorch tensor file containing melody salience matrix.
-                        Shape should be [n_melody_bins, T].
-                    drums_file (Audio): Optional. WAV file containing drum patterns (2-4 bars recommended).
-                    drums_mic (Audio): Optional. Microphone recording of drum patterns.
-                    drum_input_src (str): Source of drum input. Either "file" or "mic".
-                    cfg_coef_all (float): Classifier Free Guidance coefficient for overall conditioning.
-                        Controls adherence to all input conditions. Range: 1.0-3.0. Default: 1.25.
-                    cfg_coef_txt (float): Classifier Free Guidance coefficient for text conditioning.
-                        Controls strength of text description matching. Range: 1.0-4.0. Default: 2.5.
-                    ode_rtol (float): Relative tolerance for ODE solver. Default: 1e-4.
-                    ode_atol (float): Absolute tolerance for ODE solver. Default: 1e-4.
-                    ode_solver (str): ODE solver to use. Options:
-                        - 'euler': Faster, less accurate
-                        - 'dopri5': Slower, more accurate
-                    ode_steps (int): Number of steps for euler solver. Default: 10.
-                    progress (gr.Progress): Gradio progress bar for tracking generation progress.
-                Returns:
-                    tuple: Two WAV file paths containing the generated music variations.
-                Raises:
-                    gr.Error: If there are issues with:
-                        - Model loading
-                        - Invalid melody matrix shape
-                        - Generation process
-                        - User interruption
-                Notes:
-                    - First generation may be slower due to model loading
-                    - Subsequent generations with same model are faster
-                    - Higher parameter models (1B) require more memory
-                    - Melody-enabled models may be slower
-                    - The function generates two variations of the music
-                    - Each generation is 10 seconds long
-                    - Output is provided as WAV files
-                Example:
-                    wavs = predict_full(
-                        model='facebook/jasco-chords-drums-melody-400M',
-                        text="80s pop with groovy synth bass and electric piano",
-                        chords_sym="(C, 0.0), (Am, 2.5), (F, 5.0), (G, 7.5)",
-                        melody_file=None,
-                        drums_file=None,
-                        drums_mic=None,
-                        drum_input_src="file",
-                        cfg_coef_all=1.25,
-                        cfg_coef_txt=2.5,
-                        ode_rtol=1e-4,
-                        ode_atol=1e-4,
-                        ode_solver='euler',
-                        ode_steps=10
-                    )
-                    """
     global INTERRUPTING
     INTERRUPTING = False
     progress(0, desc="Loading model...")
@@ -629,7 +595,7 @@ def predict_full(model, text, chords_sym, melody_file,
         progress((min(max_generated, to_generate), to_generate))
         if INTERRUPTING:
             raise gr.Error("Interrupted.")
     MODEL.set_custom_progress_callback(_progress)
     drums = drums_mic if drum_input_src == "mic" else drums_file
@@ -645,7 +611,8 @@ def predict_full(model, text, chords_sym, melody_file,
         ode_rtol=ode_rtol,
         ode_atol=ode_atol,
         euler=ode_solver == 'euler',
-        euler_steps=ode_steps)
     return wavs
@@ -751,4 +718,5 @@ if __name__ == "__main__":
     except Exception as e:
         print(f"Error launching demo: {str(e)}")
     finally:
-        cleanup_cache()
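
Most of the removed lines above are the `predict_full` docstring, which, as extracted here, is indented deeper than the function body that follows it. Assuming the whitespace in this diff is faithful to the file, Python rejects that layout at compile time, which is presumably what the re-indentation in this commit addresses. A minimal sketch, with a hypothetical function `f` standing in for `predict_full`:

```python
# Hypothetical stand-in for predict_full: docstring indented deeper than the body.
OLD_LAYOUT = (
    "def f():\n"
    '               """Doc."""\n'
    "    return 1\n"
)

# Same function with the docstring aligned to the body.
NEW_LAYOUT = (
    "def f():\n"
    '    """Doc."""\n'
    "    return 1\n"
)


def compiles(source: str) -> bool:
    """Return True if the source compiles, False on a syntax or indentation error."""
    try:
        compile(source, "<sketch>", "exec")
        return True
    except SyntaxError:  # IndentationError is a subclass of SyntaxError
        return False


print(compiles(OLD_LAYOUT))
print(compiles(NEW_LAYOUT))
```

The docstring is the first statement of the function body, so its indentation sets the block level. A later line at a shallower but non-zero indent then matches no enclosing block.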

                 ode_rtol, ode_atol,
                 ode_solver, ode_steps,
                 progress=gr.Progress()):
+    """Generate music using JASCO (Joint Audio-Symbolic Conditioning) model.
+    This function generates two variations of music based on text descriptions, chord progressions,
+    and optional melody and drum inputs. It uses the JASCO model to create high-quality music samples
+    with both global (text) and local (chords, drums, melody) controls.
+    Args:
+        model (str): The JASCO model to use. Options:
+            - 'facebook/jasco-chords-drums-400M': Basic model with chord and drum support (400M parameters)
+            - 'facebook/jasco-chords-drums-1B': Enhanced model with chord and drum support (1B parameters)
+            - 'facebook/jasco-chords-drums-melody-400M': Model with melody support (400M parameters)
+            - 'facebook/jasco-chords-drums-melody-1B': Full-featured model with melody support (1B parameters)
+        text (str): Text description of the desired music. Examples:
+            - "80s pop with groovy synth bass and electric piano"
+            - "Strings, woodwind, orchestral, symphony"
+            - "Jazz quartet with walking bass and smooth piano"
+        chords_sym (str): Chord progression in format "(Chord, Time), (Chord, Time), ...". Time is in seconds (0-10).
+            Example: "(C, 0.0), (D, 2.0), (F, 4.0), (Ab, 6.0), (Bb, 7.0), (C, 8.0)"
+        melody_file (File): Optional. PyTorch tensor file containing melody salience matrix.
+            Shape should be [n_melody_bins, T].
+        drums_file (Audio): Optional. WAV file containing drum patterns (2-4 bars recommended).
+        drums_mic (Audio): Optional. Microphone recording of drum patterns.
+        drum_input_src (str): Source of drum input. Either "file" or "mic".
+        cfg_coef_all (float): Classifier Free Guidance coefficient for overall conditioning.
+            Controls adherence to all input conditions. Range: 1.0-3.0. Default: 1.25.
+        cfg_coef_txt (float): Classifier Free Guidance coefficient for text conditioning.
+            Controls strength of text description matching. Range: 1.0-4.0. Default: 2.5.
+        ode_rtol (float): Relative tolerance for ODE solver. Default: 1e-4.
+        ode_atol (float): Absolute tolerance for ODE solver. Default: 1e-4.
+        ode_solver (str): ODE solver to use. Options:
+            - 'euler': Faster, less accurate
+            - 'dopri5': Slower, more accurate
+        ode_steps (int): Number of steps for euler solver. Default: 10.
+        progress (gr.Progress): Gradio progress bar for tracking generation progress.
+    Returns:
+        tuple: Two WAV file paths containing the generated music variations.
+    Raises:
+        gr.Error: If there are issues with:
+            - Model loading
+            - Invalid melody matrix shape
+            - Generation process
+            - User interruption
+    Notes:
+        - First generation may be slower due to model loading
+        - Subsequent generations with same model are faster
+        - Higher parameter models (1B) require more memory
+        - Melody-enabled models may be slower
+        - The function generates two variations of the music
+        - Each generation is 10 seconds long
+        - Output is provided as WAV files
+    Example:
+        wavs = predict_full(
+            model='facebook/jasco-chords-drums-melody-400M',
+            text="80s pop with groovy synth bass and electric piano",
+            chords_sym="(C, 0.0), (Am, 2.5), (F, 5.0), (G, 7.5)",
+            melody_file=None,
+            drums_file=None,
+            drums_mic=None,
+            drum_input_src="file",
+            cfg_coef_all=1.25,
+            cfg_coef_txt=2.5,
+            ode_rtol=1e-4,
+            ode_atol=1e-4,
+            ode_solver='euler',
+            ode_steps=10
+        )
+    """
     global INTERRUPTING
     INTERRUPTING = False
     progress(0, desc="Loading model...")
         progress((min(max_generated, to_generate), to_generate))
         if INTERRUPTING:
             raise gr.Error("Interrupted.")
     MODEL.set_custom_progress_callback(_progress)
     drums = drums_mic if drum_input_src == "mic" else drums_file
         ode_rtol=ode_rtol,
         ode_atol=ode_atol,
         euler=ode_solver == 'euler',
+        euler_steps=ode_steps
+    )
     return wavs
     except Exception as e:
         print(f"Error launching demo: {str(e)}")
     finally:
+        cleanup_cache()
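
The `euler_steps=ode_steps` argument in the hunk above corresponds to the "Euler Steps" setting, which the usage text describes as more accurate but slower at higher values. That trade-off can be illustrated on a toy problem unrelated to JASCO's actual model: fixed-step Euler integration of dy/dt = -y from t = 0 to t = 1, whose exact solution is exp(-1). `euler_integrate` is a hypothetical helper, not part of audiocraft.

```python
import math


def euler_integrate(steps: int) -> float:
    """Integrate dy/dt = -y from t=0 to t=1 with y(0)=1 using fixed-step Euler."""
    y = 1.0
    dt = 1.0 / steps
    for _ in range(steps):
        y += dt * (-y)  # one explicit Euler update
    return y


exact = math.exp(-1.0)
for steps in (10, 100, 1000):
    error = abs(euler_integrate(steps) - exact)
    print(f"steps={steps:5d}  abs error={error:.6f}")
```

Each tenfold increase in steps cuts the error by roughly a factor of ten while requiring ten times as many updates, which is the accuracy-versus-speed trade-off the setting exposes.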