Real-Time AI Sound Generation on Arm: A Personal Tool for Creative Freedom

Community Article · Published June 3, 2025

By Michael Gamble, Partner & Ecosystem Lead, Arm

As a software engineer and music producer, I’m always exploring how technology can expand creative expression. That curiosity recently led me to build a personal sound generation app that runs directly on-device—powered by an Arm-based CPU and open-source generative AI models. It’s fast, private, and enables me to generate studio-ready sounds from a simple prompt, all within seconds.

This project brings together the best of several worlds:

  • The Stable Audio Open model from Stability AI, sourced from Hugging Face (see the loading sketch after this list)
  • Execution powered by PyTorch and TorchAudio
  • A fast, efficient pipeline that runs natively on Arm-based CPUs
  • A seamless creative handoff to Ableton Live
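
To make this stack concrete, here is a minimal sketch of how the model can be pulled from Hugging Face with the stable-audio-tools library. Treat it as illustrative rather than the app's exact loading code; the model ID and float32 cast follow the Stable Audio Open examples.

import torch
from stable_audio_tools import get_pretrained_model

# Download Stable Audio Open from Hugging Face and read its config
model, model_config = get_pretrained_model("stabilityai/stable-audio-open-1.0")
sample_rate = model_config["sample_rate"]   # 44.1 kHz for Stable Audio Open
sample_size = model_config["sample_size"]

# Keep everything in float32 on the Arm CPU
model = model.to("cpu").to(torch.float32)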

A New Kind of Creative Companion

When I’m deep in a music project using Ableton Live, I don’t want to interrupt my workflow to dig through libraries or browse sound packs. I wanted a tool that could meet me where I am—right in the flow.

Now, I can simply describe the sound I’m imagining (“analog bassline,” “cinematic riser,” “lofi snare”), and within seconds, the generated .wav file appears in my Ableton browser. From there, I can tweak it, loop it, or turn it into an instrument.

Every sound is unique. No one else will generate exactly what I do. That sense of personal ownership fuels my creativity.

Powered by Arm: On-Device, On-Demand

This sound generator runs entirely on-device using Arm-based CPU technology: no GPU, no cloud inference, no waiting on a network round trip. Thanks to Arm's efficiency and performance per watt, the app stays responsive even during multi-step diffusion runs.

The generation engine is built on the Stable Audio Open model, with inference handled by PyTorch and TorchAudio running directly on the CPU.

Sample Code: Optimized CPU Generation

To maximize performance on Arm CPUs, I enabled full thread utilization:

import os
import torch

# Use all available Arm CPU threads
torch.set_num_threads(os.cpu_count())

To maintain low memory usage across generations:

import gc

# Clear memory periodically (every third generation)
if gen_count % 3 == 0:
    gc.collect()
    print(f"Memory cleared at generation {gen_count}")

Core generation loop, tuned for speed and efficiency:

from stable_audio_tools.inference.generation import generate_diffusion_cond

output = generate_diffusion_cond(
    model,
    steps=7,                  # Reduced step count for faster inference
    cfg_scale=1,
    conditioning=conditioning,
    sample_size=sample_size,
    sigma_min=0.3,
    sigma_max=500,
    sampler_type="dpmpp-3m-sde",
    device=device
)
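
The conditioning argument above carries the text prompt and the length of audio to generate. A minimal sketch, following the prompt format used in the stable-audio-tools examples (the prompt text and duration here are only illustrative):

# One text prompt plus the time window to render
conditioning = [{
    "prompt": "analog bassline, 100 BPM",
    "seconds_start": 0,
    "seconds_total": 30
}]

The reduced step count and the dpmpp-3m-sde sampler are the main levers that keep a full generation in the seconds range on the CPU.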

Device Flexibility: CPU, Metal, CUDA

Although optimized for CPU, the program can also run on Metal (Apple Silicon) or CUDA if needed:

device = "mps"    # Apple Silicon
# device = "cuda" # NVIDIA
# device = "cpu"  # Arm CPU (default)
model = model.to(device).to(torch.float32)
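
If you would rather not hard-code the target, a small helper can pick the best available backend at runtime. This is a hypothetical convenience function, not part of the original tool:

import torch

def pick_device() -> str:
    """Prefer CUDA, then Apple's Metal backend, otherwise fall back to the CPU."""
    if torch.cuda.is_available():
        return "cuda"
    if torch.backends.mps.is_available():
        return "mps"
    return "cpu"

device = pick_device()
model = model.to(device).to(torch.float32)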

Seamless Workflow with Ableton Live

The tool outputs .wav files directly to a project folder monitored by Ableton Live. Here's a sample CLI interaction:

Enter a prompt for generating audio:
Ambient texture
Enter a tempo for the audio:
100
Generated audio saved to: Ambient texture.wav

The generated file immediately shows up in Live's browser, ready to be arranged, modulated, and transformed.
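
Under the hood, the raw tensor from the diffusion run still has to become a file Live can see. A minimal sketch of that step, following the post-processing shown in the stable-audio-tools examples (the output path is only an example):

import torch
import torchaudio
from einops import rearrange

# Fold the batch dimension into the time axis: (batch, channels, samples) -> (channels, samples)
audio = rearrange(output, "b d n -> d (b n)")

# Peak-normalize, clamp, and convert to 16-bit PCM
audio = audio.to(torch.float32).div(torch.max(torch.abs(audio))).clamp(-1, 1).mul(32767).to(torch.int16).cpu()

# Write the .wav into the folder Ableton Live is watching
torchaudio.save("ableton_project/Ambient texture.wav", audio, sample_rate)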

Why This Matters

This project is a personal prototype—but it’s also a window into the future of content creation. With efficient, on-device AI inference on Arm CPUs, artists and developers can:

  • Stay in creative flow without waiting on cloud resources
  • Ensure data privacy and full ownership of outputs
  • Extend AI tools into edge devices, DAWs, and new creative interfaces

This is what happens when open-source innovation meets efficient compute: real-time generative power, accessible to every creator.


Explore the ecosystem that made this possible: Stable Audio Open on Hugging Face, PyTorch and TorchAudio, and Arm-based CPUs.
