FLUX.1 [schnell] -- Flumina Server App (FP8 Version)
This repository contains an implementation of the FLUX.1 [schnell] FP8 version, which uses float8 (FP8) numerics instead of bfloat16. When deployed with Fireworks AI’s Flumina Server App toolkit, this change yields a 2x inference performance improvement.
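As a rough, weights-only illustration of what the switch from bfloat16 (2 bytes per value) to FP8 (1 byte per value) means, the estimate below assumes the roughly 12 billion parameters that Black Forest Labs states for FLUX.1 [schnell]. It ignores activations, the text encoders, and the VAE, and it only shows that weight storage and the bytes read per pass over the weights are halved; it is not a derivation of the 2x speedup itself.

```latex
\begin{aligned}
M_{\text{bf16}} &\approx 12\times10^{9}\ \text{params}\times 2\ \text{bytes} = 24\ \text{GB}\\
M_{\text{fp8}}  &\approx 12\times10^{9}\ \text{params}\times 1\ \text{byte}  = 12\ \text{GB}
\end{aligned}
```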
Getting Started -- Serverless deployment on Fireworks
This FP8 Server App is deployed to Fireworks as-is as a "serverless" deployment, so you can call it directly without provisioning your own capacity.
Get an API key from Fireworks and export it as an environment variable:
export API_KEY=YOUR_API_KEY_HERE
Text-to-Image Example Call
curl -X POST 'https://api.fireworks.ai/inference/v1/workflows/accounts/fireworks/models/flux-1-schnell-fp8/text_to_image' \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -H "Accept: image/jpeg" \
  -d '{
    "prompt": "Woman laying in the grass",
    "aspect_ratio": "16:9",
    "guidance_scale": 3.5,
    "num_inference_steps": 4,
    "seed": 0
  }' \
  --output output.jpg
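The call above can be wrapped in a small reusable script so the prompt, seed, and output file can vary between runs. This is a minimal sketch under stated assumptions: it assumes bash and curl, reuses exactly the endpoint and JSON fields from the example, and the helper names `json_escape`, `flux_payload`, and `flux_generate` are hypothetical, not part of any Fireworks tooling. Passing `--fail` makes curl exit with a non-zero status on HTTP errors instead of silently saving an error response as the image file.

```shell
#!/usr/bin/env bash
# Hypothetical helpers wrapping the text_to_image call from the example above.
# Assumes bash, curl, and API_KEY exported as in the setup step.

ENDPOINT='https://api.fireworks.ai/inference/v1/workflows/accounts/fireworks/models/flux-1-schnell-fp8/text_to_image'

# Escape backslashes and double quotes so the prompt is valid inside a JSON string.
# Control characters (e.g. newlines) are NOT escaped by this sketch.
json_escape() {
  local s=$1
  s=${s//\\/\\\\}
  s=${s//\"/\\\"}
  printf '%s' "$s"
}

# Build the request body; prompt and seed vary, the other fields mirror the example.
flux_payload() {
  local prompt=$1 seed=${2:-0}
  printf '{"prompt": "%s", "aspect_ratio": "16:9", "guidance_scale": 3.5, "num_inference_steps": 4, "seed": %d}' \
    "$(json_escape "$prompt")" "$seed"
}

# POST the request and write the returned JPEG to the given path (default output.jpg).
# Aborts with an error message if API_KEY is unset or empty.
flux_generate() {
  local prompt=$1 out=${2:-output.jpg} seed=${3:-0}
  : "${API_KEY:?API_KEY is not set}"
  curl --fail --silent --show-error -X POST "$ENDPOINT" \
    -H "Authorization: Bearer $API_KEY" \
    -H "Content-Type: application/json" \
    -H "Accept: image/jpeg" \
    -d "$(flux_payload "$prompt" "$seed")" \
    --output "$out"
}
```

After sourcing the script, a call such as `flux_generate "A lighthouse at dusk" lighthouse.jpg 42` sends the same request as the curl example with a different prompt, seed, and output path.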
What is Flumina?
Flumina is Fireworks.ai’s platform for hosting Server Apps, letting users deploy deep learning inference to production in minutes.
What does Flumina offer for FLUX models?
Flumina provides the following advantages for FLUX models:
- Clear, precise definitions of server-side workloads: you can see exactly what runs on the server by reading the Server App implementation in this repository.
- An extensibility interface for dynamically loading and dispatching add-ons on the server. For FLUX, this includes:
  - ControlNet (Union) adapters
  - LoRA adapters
- Off-the-shelf support for on-demand capacity scaling with Server Apps on Fireworks.
- Customizable deployment logic: modify the Server App and redeploy it easily.
- Support for FP8 numerics, enabling faster, more efficient inference.
Deploying FLUX.1 [schnell] FP8 Version to Fireworks On-Demand
Deploying Custom FLUX.1 [schnell] FP8 Apps to Fireworks On-Demand
Coming soon!