This is the PyWorker implementation for running ACE Step v1 3.5B text-to-music workflows in ComfyUI. It provides a unified interface for executing complete ComfyUI audio-generation workflows through a proxy-based architecture and returning generated audio assets.

Each request has a static cost of 100. ComfyUI does not support concurrent workloads, and there is no provision to run multiple ComfyUI instances per worker node, so each worker processes one request at a time.

Requirements

This worker requires the following components:

ComfyUI (https://github.com/comfyanonymous/ComfyUI)
ComfyUI API Wrapper (https://github.com/ai-dock/comfyui-api-wrapper)
ACE Step v1 3.5B model and required custom nodes

A Docker image is provided with the ACE Step model pre-installed, but any image may be used if the above requirements are met.

Endpoint

The worker exposes a single synchronous endpoint:

/generate/sync: Processes a complete ComfyUI workflow JSON and generates audio output

Request Format

The ACE Step worker supports only custom workflow mode: the request must carry a complete workflow graph in workflow_json. Modifier-based workflows are not supported.

{
  "input": {
    "request_id": "uuid-string",
    "workflow_json": {
      // Complete ComfyUI ACE Step workflow JSON
    },
    "s3": { },
    "webhook": { }
  }
}
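
As a rough illustration of the request shape above, the following Python sketch assembles the body and sends it to /generate/sync. The helper names (build_payload, post_sync), the base_url argument, the use of HTTP POST with a JSON content type, and the timeout value are all assumptions made for this example; this document does not specify them, and any authentication your deployment requires is not shown.

```python
import json
import uuid
import urllib.request


def build_payload(workflow_json, request_id=None, s3=None, webhook=None):
    """Assemble a request body in the documented {"input": {...}} shape."""
    if not isinstance(workflow_json, dict) or not workflow_json:
        raise ValueError("workflow_json must be a non-empty dict")
    body = {
        # request_id is optional; a UUID is generated here for convenience.
        "request_id": request_id or str(uuid.uuid4()),
        "workflow_json": workflow_json,
    }
    # Optional sections are included only when the caller supplies them.
    if s3 is not None:
        body["s3"] = s3
    if webhook is not None:
        body["webhook"] = webhook
    return {"input": body}


def post_sync(base_url, payload, timeout=600):
    """Send the payload to /generate/sync and return the decoded JSON reply.

    Assumption: the endpoint accepts an HTTP POST with a JSON body.
    """
    request = urllib.request.Request(
        base_url.rstrip("/") + "/generate/sync",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Because the endpoint is synchronous, the call blocks until generation finishes, so a generous client timeout is advisable for long audio clips.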

Request Fields

Required Fields

input: Container for all request parameters
input.workflow_json: Complete ComfyUI workflow graph for ACE Step audio generation

Optional Fields

input.request_id: Client-defined request identifier
input.s3: S3-compatible storage configuration
input.webhook: Webhook configuration for completion notifications

The special string "__RANDOM_INT__" may be used in the workflow JSON and will be replaced with a random integer before submission to ComfyUI.
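
The substitution can be pictured with a short Python sketch. This is not the worker's actual code: the function name replace_random_int, the integer range, and the choice to draw a fresh value for each occurrence are assumptions made for illustration only.

```python
import random

PLACEHOLDER = "__RANDOM_INT__"


def replace_random_int(node, rng=random):
    """Return a copy of a workflow structure with placeholders replaced.

    Assumptions: each occurrence gets an independent value, and the range
    is 0 to 2**32 - 1. The worker's real range is not documented here.
    """
    if isinstance(node, dict):
        return {key: replace_random_int(value, rng) for key, value in node.items()}
    if isinstance(node, list):
        return [replace_random_int(value, rng) for value in node]
    if node == PLACEHOLDER:
        return rng.randint(0, 2**32 - 1)
    # Numbers, other strings, booleans and None pass through unchanged.
    return node
```

A typical use is a sampler node's seed input: setting it to "__RANDOM_INT__" lets the same workflow JSON produce a different result on each submission.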

S3 Configuration

Generated audio assets can be automatically uploaded to S3-compatible storage. Configuration can be supplied per request or via environment variables. Request-level values take precedence.

Via Request JSON

"s3": {
  "access_key_id": "your-s3-access-key",
  "secret_access_key": "your-s3-secret-access-key",
  "endpoint_url": "https://s3.amazonaws.com",
  "bucket_name": "your-bucket",
  "region": "us-east-1"
}

Via Environment Variables

S3_ACCESS_KEY_ID=your-key
S3_SECRET_ACCESS_KEY=your-secret
S3_BUCKET_NAME=your-bucket
S3_ENDPOINT_URL=https://s3.amazonaws.com
S3_REGION=us-east-1
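
The precedence rule can be sketched in Python as follows. The function name resolve_s3_config is hypothetical, and merging field by field (rather than treating the request's s3 object as all-or-nothing) is an assumption; the worker may resolve the two sources differently.

```python
import os

# Request JSON field -> environment variable, as listed above.
S3_FIELDS = {
    "access_key_id": "S3_ACCESS_KEY_ID",
    "secret_access_key": "S3_SECRET_ACCESS_KEY",
    "endpoint_url": "S3_ENDPOINT_URL",
    "bucket_name": "S3_BUCKET_NAME",
    "region": "S3_REGION",
}


def resolve_s3_config(request_s3=None, env=None):
    """Merge S3 settings, preferring request-level values over the environment."""
    env = os.environ if env is None else env
    request_s3 = request_s3 or {}
    resolved = {}
    for field, env_name in S3_FIELDS.items():
        # Assumption: an empty or missing request value falls back to the env var.
        value = request_s3.get(field) or env.get(env_name)
        if value:
            resolved[field] = value
    return resolved
```

With this approach a deployment can keep credentials in the environment while individual requests override only the bucket or region.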

Webhook Configuration

Webhooks are triggered on request completion or failure.

Via Request JSON

"webhook": {
  "url": "https://your-webhook-url",
  "extra_params": {
    "custom_field": "value"
  }
}

Via Environment Variables

WEBHOOK_URL=https://your-webhook-url
WEBHOOK_TIMEOUT=30

Example Request

ACE Step Text-to-Music Workflow

{
  "input": {
    "workflow_json": {
      "14": {
        "inputs": {
          "tags": "funk, pop, upbeat, 105 BPM",
          "lyrics": "Turn it up and let it flow",
          "lyrics_strength": 0.99,
          "clip": ["40", 1]
        },
        "class_type": "TextEncodeAceStepAudio"
      },
      "17": {
        "inputs": {
          "seconds": 180,
          "batch_size": 1
        },
        "class_type": "EmptyAceStepLatentAudio"
      },
      "40": {
        "inputs": {
          "ckpt_name": "ace_step_v1_3.5b.safetensors"
        },
        "class_type": "CheckpointLoaderSimple"
      }
    }
  }
}
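
When the same workflow is reused with different parameters, it can be convenient to patch node inputs by class_type instead of hard-coding node IDs such as "14" or "17". The helper below is a hypothetical client-side convenience written for this example; it is not part of the worker.

```python
import copy


def set_inputs(workflow, class_type, **overrides):
    """Return a copy of the workflow with inputs updated on matching nodes.

    Every node whose class_type matches is updated; the original workflow
    dict is left untouched.
    """
    updated = copy.deepcopy(workflow)
    matched = 0
    for node in updated.values():
        if node.get("class_type") == class_type:
            node.setdefault("inputs", {}).update(overrides)
            matched += 1
    if matched == 0:
        raise KeyError(f"no node with class_type {class_type!r}")
    return updated
```

For instance, set_inputs(workflow, "EmptyAceStepLatentAudio", seconds=60) would shorten the clip in the example above from 180 to 60 seconds.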

Response Format

A successful response includes execution metadata, ComfyUI output details, and generated audio assets.

Response Fields

id: Unique request identifier
status: One of completed, failed, processing, generating, or queued
message: Human-readable status message
comfyui_response: Raw response from ComfyUI, including execution status and progress
output: Array of generated outputs
timings: Timing information for the request

Output Object

Each entry in output includes:

filename: Generated file name, including its extension (e.g., .mp3)
local_path: File path on the worker
url: Pre-signed download URL (if S3 is configured)
type: Output type (output)
subfolder: Output directory (e.g., audio)
node_id: ComfyUI node that produced the output
output_type: Output category (e.g., audio)
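
A client might consume these fields along the following lines. This is a minimal sketch: the function name collect_outputs is hypothetical, and preferring url over local_path when both are present is a choice made for this example.

```python
def collect_outputs(response):
    """Return (filename, location) pairs from a completed response.

    location is the pre-signed url when one is present, otherwise the
    local_path on the worker. Raises RuntimeError for any other status.
    """
    status = response.get("status")
    if status != "completed":
        raise RuntimeError(
            f"request {response.get('id')} not completed: "
            f"{status} ({response.get('message')})"
        )
    results = []
    for item in response.get("output") or []:
        location = item.get("url") or item.get("local_path")
        results.append((item.get("filename"), location))
    return results
```

Note that local_path refers to the worker's filesystem, so it is only useful to a remote client when S3 upload is not configured and the file is retrieved by some other means.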

Notes and Limitations

Only full ComfyUI workflow JSONs are supported
Concurrent requests are not supported per worker
The ACE Step model must be installed before processing requests
Audio generation duration and runtime depend on workflow configuration