Audio Upscaling API

Queue GPU-powered audio upscaling and restoration for an audio file, stem, or existing restored version.

POST /upscale-audio

Queue audio upscaling and restoration for a track, stem, or previous restoration.

Use this after the source audio is complete. If you imported from a link, first poll GET /status/audio/{audio_id} until is_complete is true. Then pass that audio_id here. You may also pass stem_id to restore one stem, or source_upscaled_id to run another restoration pass on an existing restored version.
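A minimal sketch of the polling step described above, assuming the status payload is a JSON object with a boolean is_complete field, as stated for GET /status/audio/{audio_id}. The helper name wait_until_complete, the poll interval, and the timeout are illustrative choices and not part of the API; fetch_status is any callable you supply that performs the GET and returns the decoded JSON.

```python
import time
from typing import Any, Callable, Mapping


def wait_until_complete(
    fetch_status: Callable[[], Mapping[str, Any]],
    poll_interval: float = 2.0,  # assumed interval, not documented by the API
    timeout: float = 600.0,      # assumed upper bound, not documented by the API
    sleep: Callable[[float], None] = time.sleep,
) -> Mapping[str, Any]:
    """Call fetch_status() until the payload reports is_complete == True."""
    deadline = time.monotonic() + timeout
    while True:
        status = fetch_status()
        if status.get("is_complete") is True:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError("source audio did not complete in time")
        sleep(poll_interval)
```

With the requests library, fetch_status could wrap a GET on /status/audio/{audio_id} and return response.json(). Once the helper returns, the same audio_id can be passed to POST /upscale-audio.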

The Audio Upscaling API runs GPU-powered enhancement on imported files, stems, mastered artifacts, or previous restorations so the result can be downloaded, mastered, or used for stem splitting.

The endpoint returns immediately with id, the restored artifact ID:

text

POST /upscale-audio -> id
GET /status/upscaled/{id}
GET /download/upscaled/{id}
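As a small sketch of the flow above, the returned id can be turned into the two follow-up URLs. The helper name upscaled_urls is hypothetical, and the host is the one used in the Python example on this page; the response format of the status and download endpoints is not described in this section, so the sketch only builds the URLs.

```python
from urllib.parse import quote

# Host taken from the Python example on this page.
BASE_URL = "https://api.neuralanalog.com"


def upscaled_urls(job_id: str, base_url: str = BASE_URL) -> dict[str, str]:
    """Build the status and download URLs for a restored artifact ID."""
    safe_id = quote(job_id, safe="")  # percent-encode anything unsafe in a path segment
    return {
        "status": f"{base_url}/status/upscaled/{safe_id}",
        "download": f"{base_url}/download/upscaled/{safe_id}",
    }
```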

Choose preset, model_name, stereo_mode, and related restoration fields to control the repair style. The user's plan must allow the selected preset.

Run GPU audio upscaling and restoration on imported files, stems, or generated artifacts using Neural Analog presets.

Use POST /upscale-audio to create an upscaled artifact for mastering, downloading, or further stem processing.

Parameters

x-api-key

optional, header, string | null

No description provided.

Request Body

audio_id

required, string

Source audio asset ID to restore.

Example: "6c62f8e7-02a3-48c0-a5b5-5de87ed9c31a"

preset

optional, string

Restoration preset to apply before optional mastering. The preset selects the backend and determines which advanced parameters are used; parameters that do not apply to the selected preset are ignored. Use universal_enhancer for general music cleanup; constant_bpm to remove small tempo drift and align the first beat; denoise, dereverb, decrowd, or phantom_center for targeted repair; and voice presets such as novasr, lavasr, and reuse for speech.

Allowed values: "universal_enhancer", "constant_bpm", "novasr", "flashsr", "dacvae", "declip", "dialogue_isolate", "denoise", "dereverb", "decrowd", "phantom_center", "audiosr", "apollo_voice", "lavasr", "reuse", "acestep_15_xl", "stable_audio_3", "universr", "aero"

Default: "universal_enhancer"

stereo_mode

optional, string

How stereo material is processed. single_pass keeps the stereo file together, mid_sides processes center and side content separately, left_right processes each channel independently, and mono folds the signal to mono. Applies to stereo-capable restoration backends such as universal_enhancer, apollo_voice, universr, reuse, flashsr, audiosr, aero, and acestep_15_xl. Presets such as declip and dialogue_isolate ignore this field.

Allowed values: "single_pass", "mid_sides", "left_right", "mono"

Default: "single_pass"
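To make the stereo modes concrete, here is a sketch of the standard mid/side transform that mid_sides presumably builds on, plus a mono fold. The 1/2 scaling is a common convention and an assumption here; this page does not document how the service scales or recombines the channels, and the function names are illustrative.

```python
def to_mid_side(left: list[float], right: list[float]) -> tuple[list[float], list[float]]:
    """Encode left/right samples as mid (center) and side (difference) signals."""
    mid = [(x_l + x_r) / 2 for x_l, x_r in zip(left, right)]
    side = [(x_l - x_r) / 2 for x_l, x_r in zip(left, right)]
    return mid, side


def to_left_right(mid: list[float], side: list[float]) -> tuple[list[float], list[float]]:
    """Invert to_mid_side: recover left/right from mid/side."""
    left = [m + s for m, s in zip(mid, side)]
    right = [m - s for m, s in zip(mid, side)]
    return left, right


def fold_to_mono(left: list[float], right: list[float]) -> list[float]:
    """Average both channels into one, which equals the mid signal above."""
    return [(x_l + x_r) / 2 for x_l, x_r in zip(left, right)]
```

Under this convention, content that is identical in both channels ends up entirely in mid, and anything that differs between channels ends up in side, which is why the two can be processed separately and recombined.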

frequency_cutoff

optional, integer

Upper frequency boundary, in hertz, for bandwidth extension models. Only used by the audiosr and universr presets; other restoration presets ignore this field. AudioSR accepts 3000, 4000, 5000, 8000, 10000, 13000, or 16000. UniverSR accepts 4000, 6000, 8000, or 12000.

Default: 13000

model_name

optional, string

Underlying restoration model variant. Only selects variants for the denoise, aero, universr, acestep_15_xl, and stable_audio_3 presets; most other restoration presets choose their model from the preset and ignore this field. Music variants are tuned for full mixes, voice variants for speech bandwidth, UniverSR variants for broad audio/vocal super-resolution, and ACE-Step/Stable Audio variants for prompt-guided remastering.

Allowed values: "denoise", "denoise_debleed", "music_musedb", "voice_4_16", "voice_8_16", "voice_8_24", "voice_12_48", "universr-audio", "universr-audio-finetune-v1", "universr-vocal", "acestep-v15-xl-turbo", "acestep-v15-xl-sft", "stable-audio-3-medium"

Default: "music_musedb"

reconstruction_method

optional, string

Only used by the audiosr and universr presets. For AudioSR, multiband_ensemble low-passes the original audio at frequency_cutoff minus 1000 Hz, high-passes the AudioSR output at the same crossover, then sums both bands; original_signal instead uses frequency_cutoff as a hard boundary in the final spectrum, taking original source bins below the cutoff and generated bins at or above it. For UniverSR, original preserves the legacy reconstruction path, while original_signal keeps the bandwidth-limited input for model conditioning but takes the final low-frequency bins from the original 48 kHz source signal.

Allowed values: "multiband_ensemble", "original", "original_signal"

Default: "original"
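The two AudioSR reconstruction rules can be illustrated on a single frame of frequency bins. This is a conceptual sketch, not the service's implementation: it assumes bins laid out as in a real FFT of length n_fft (bin k sits at k * sample_rate / n_fft hertz), and both function names are hypothetical.

```python
def multiband_crossover_hz(frequency_cutoff_hz: int) -> int:
    """Crossover used by multiband_ensemble: frequency_cutoff minus 1000 Hz."""
    return frequency_cutoff_hz - 1000


def splice_spectrum(original_bins, generated_bins, sample_rate_hz, n_fft, cutoff_hz):
    """original_signal rule: original bins below the cutoff, generated bins at or above it."""
    if len(original_bins) != len(generated_bins):
        raise ValueError("both spectra must have the same number of bins")
    spliced = []
    for k, (orig, gen) in enumerate(zip(original_bins, generated_bins)):
        bin_hz = k * sample_rate_hz / n_fft  # center frequency of bin k
        spliced.append(orig if bin_hz < cutoff_hz else gen)
    return spliced
```

The hard splice keeps the source untouched below the cutoff, whereas the multiband variant crosses over 1000 Hz lower using filters, so the two bands are blended by filter slopes rather than switched at a single bin.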

strength

optional, number

Processing intensity, from subtle cleanup to aggressive restoration. Higher values preserve less of the degraded source. Only used by prompt-guided restoration presets such as acestep_15_xl and stable_audio_3; other restoration presets ignore this field.

Default: 0.95

prompt

optional, string

Short text prompt that steers the desired sound. Only used by prompt-guided restoration presets such as acestep_15_xl and stable_audio_3; other restoration presets ignore this field.

Default: "high quality remaster, studio recording, official release, CD quality."

Example: "clean studio master, full bandwidth, natural transients"

prompt_strength

optional, number

Stable Audio 3 classifier-free guidance scale. Higher values make the text prompt influence generation more strongly relative to the reference audio. Only used by the stable_audio_3 preset; other restoration presets ignore this field.

Default: 1

inpaint_regions

optional, array<object> | null

Only used by the stable_audio_3 preset. Optional source regions to regenerate with Stable Audio 3 inpainting while preserving the rest of the input audio. Omit to run ordinary audio-to-audio remix.

Example: [{"end":8,"start":4}]
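A small sketch for building inpaint_regions from (start, end) pairs in the shape shown in the example. The helper name make_regions is hypothetical, and the checks (non-negative start, end after start) are assumptions about what a sensible region looks like; this page does not state the unit or the server-side validation rules.

```python
def make_regions(spans):
    """Turn (start, end) pairs into the list-of-objects shape used by inpaint_regions."""
    regions = []
    for start, end in spans:
        # Assumed sanity checks; the API's own validation is not documented here.
        if start < 0 or end <= start:
            raise ValueError(f"invalid region: start={start}, end={end}")
        regions.append({"start": start, "end": end})
    return regions
```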

stem_id

optional, string | null

Optional source stem ID. Omit to restore the full audio file.

Example: "abf8a992-1c4e-4935-93f0-197116e77e49"

source_upscaled_id

optional, string | null

Existing restored version to use as the source for another pass.

Example: "d66cf940-bf26-45bb-80f7-332f26b6859a"

source_mastered_id

optional, string | null

Existing mastered artifact to use as the source for restoration.

Example: "f5db8e4b-2e74-4198-a8de-0c3a398620e9"

source_temporary_mix_key

optional, string | null

Short-lived Current Main Mix or Current All Stems Mix R2 source key.

selection

optional, object | null

Optional source region to restore. When provided, processing runs only on this time range.

Example: {"end":42,"start":12.5}

bit_depth

optional, integer

Output WAV bit depth for the restored audio.

Allowed values: 16, 24

Default: 24

hq_streaming_format

optional, string

No description provided.

Allowed values: "aac", "mp3", "flac"

Default: "aac"
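Because fields that do not apply to the selected preset are ignored, it can help to flag them client-side before sending a request. The sketch below encodes the preset lists from the field descriptions above; the name ignored_fields is hypothetical, and the lists for strength and prompt are described with "such as", so they may be incomplete. stereo_mode is left out for the same reason.

```python
# Presets that use each field, as listed in the field descriptions on this page.
FIELD_PRESETS = {
    "frequency_cutoff": {"audiosr", "universr"},
    "reconstruction_method": {"audiosr", "universr"},
    "strength": {"acestep_15_xl", "stable_audio_3"},  # "such as": possibly incomplete
    "prompt": {"acestep_15_xl", "stable_audio_3"},    # "such as": possibly incomplete
    "prompt_strength": {"stable_audio_3"},
    "inpaint_regions": {"stable_audio_3"},
    "model_name": {"denoise", "aero", "universr", "acestep_15_xl", "stable_audio_3"},
}


def ignored_fields(payload: dict) -> list[str]:
    """List payload fields that the selected preset is documented to ignore."""
    preset = payload.get("preset", "universal_enhancer")  # documented default preset
    return sorted(
        field
        for field, presets in FIELD_PRESETS.items()
        if field in payload and preset not in presets
    )
```

A non-empty result does not mean the request would fail; it only indicates settings that are documented to have no effect with that preset.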

Restoration and Audio Upscaling Models

Use these values in the preset field for POST /upscale-audio.

universal_enhancer: MP3 Music Restoration (Apollo, 2025)

Upscales low-quality MP3s back to high quality. Trained on pairs of high-quality music and their degraded MP3 versions. Restores missing high frequencies and removes compression artifacts. Output sample rate matches the prepared input rate (44.1 kHz or 48 kHz). Recommended for: online source imports, songs downloaded from the internet, compressed sound, tracks missing content above 16 kHz. If it does not regenerate audio above 16 kHz effectively, try AudioSR instead.

stable_audio_3: Neural Remix - Stable Audio 3 (Stable Audio 3 Medium, 2026)

Recreates the track with Stability AI's Stable Audio 3 Medium audio-to-audio editing model. This prompt-guided remix mode uses the source audio as a reference while the text prompt steers style, tone, and instrumentation. Recommended for: creative remixes, inpainting missing musical ideas, alternate takes, and low-end or texture repair that benefits from generative reconstruction. Try a short section first for predictable results.

acestep_15_xl: Neural Remix - ACEStep 1.5 XL (ACE-Step 1.5 XL, 2026)

Recreates the track with the ACE-Step XL 2026 music generation model in remix mode. This is similar to SUNO's 'cover' mode: it uses sounds similar to the reference, but introduces new notes if you lower the reference strength. Recommended for: bass tracks, inpainting missing notes in stems, low-end muddiness. Blending the output with the original audio can give interesting results.

audiosr: AudioSR Upscaler (Audio SR, 2022)

Regenerates high frequencies above the selected cutoff while keeping lower frequencies from the original audio. Recommended for: low-quality recordings, missing high frequencies, hissing noise around 10 kHz (hi-hats, voice, SUNO hiss). Low cutoffs (e.g. 3 kHz) change the audio the most, while high cutoffs mostly preserve what is already there. Not recommended for: bass tracks, muddy low end. Use Neural Remix instead.

universr: UniverSR Upscaler (UniverSR, 2026)

Upscales music, voice, and sound effects. UniverSR is a 2026 model developed by the University of Seoul that performs audio super-resolution directly in the complex STFT domain using flow matching. Very similar to AudioSR, but with more coherent high end. Recommended for: low-quality recordings, missing high frequencies, hissing noise around 10 kHz (hi-hats, voice, SUNO hiss). Low cutoffs (e.g. 4 kHz) change the audio the most, while high cutoffs preserve what is already there. Not recommended for: muddy bass or muddy low end. Use Neural Remix instead.

novasr: NovaSR Speech Upscaler (NovaSR, 2026)

Upscaling model trained on English speech. Best for restoring podcasts, voice-overs, or isolated vocal tracks.

flashsr: FlashSR Upscaler (FlashSR, Jan 2025)

Fast single-step super-resolution model for bandwidth extension and detail recovery. Recommended for: Western music, single-instrument upscaling, preparing 44.1 kHz stems before Atmos export. Not recommended for: bass.

constant_bpm: Remove speed variation (make BPM constant) (Beat This!, 2024)

Fixes wonky tempo in music generated by SUNO when the speed keeps changing throughout the song. Recommended for: SUNO songs that should stay at a constant BPM and align cleanly to a beat grid. It can also help tighten tempo variations in live performances.

dacvae: Neural Reconstruction (DACVAE, 2024)

Uses the DACVAE neural codec to regenerate all frequencies. Helps remove out-of-distribution frequencies. Recommended for: odd-sounding hi-hats in electronic music.

declip: Remove Clipping

Fixes harsh crackling when volume is too high. Algorithms find optimal settings to remove all crackling, while still maintaining loudness.

dialogue_isolate: Remove Room Echo

Removes short reverb from voice recordings made in a room with a laptop microphone. Good for podcasts, voice-overs, and dialogue that have a subtle room echo. Less effective on music or singing.

denoise: Remove Noise

Reduces hiss, hum, background noise, and optional source bleed while keeping the main vocals/instruments. Uses the same model family as stem splitting, but returns a single cleaned track only.

dereverb: Remove Long Reverb

Removes long reverb tails, delay, and echo to make the sound drier. Great for singing, music, and live recordings with hall echo. Less effective on subtle echo in clean recordings. Uses the same model family as stem splitting, but returns a single dry track only.

decrowd: Remove Crowd Noise

Removes audience noise from live recordings while preserving the performance. Uses the same model family as stem splitting, but returns a single cleaned track only.

phantom_center: Keep Only Center Mono

Extracts the "phantom center", the content that should be mono in a track. Use this for: bass, kick drums, podcast voice. Removes phaser, chorus, or flanger from instrument stems. Good for mixing.

reuse: RE-USE Speech Enhancer (NVIDIA RE-USE, 2026)

Improves clarity, upscales, and removes reverb, noise, and audio glitches in multilingual speech. Recommended for: getting dry vocals, clearer podcasts, noisy voice notes, etc. Not recommended for: full music mixes or instrumental upscaling. Use UniverSR, AudioSR, or a music restoration model instead.

apollo_voice: Singing Upscaler (Apollo Voice, 2025)

Specialized voice upscaling. Restores missing high frequencies and removes compression artifacts in lower frequencies. Output sample rate is 44.1kHz for lower-rate inputs and 48kHz for 48kHz+ inputs.

Example

Python

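import os

import requests

# Queue a restoration job; the API key is read from the environment.
response = requests.post(
    "https://api.neuralanalog.com/upscale-audio",
    headers={"X-API-Key": os.environ["NEURALANALOG_API_KEY"]},
    json={
        "preset": "universal_enhancer",
        "stereo_mode": "mid_sides",
        "audio_id": "00000000-0000-0000-0000-000000000000",
    },
    timeout=30,  # avoid hanging forever if the connection stalls
)
response.raise_for_status()  # raise on 4xx/5xx instead of printing an error body
print(response.json())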

Success Response

200 Successful Response

id

required, string

ID of the queued restored audio version.

Example: "d66cf940-bf26-45bb-80f7-332f26b6859a"

status

required, string

Queueing status for the restoration job.

Example: "processing"

message

required, string

Human-readable queueing result.

Example: "Audio restoration queued"
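A minimal sketch of reading the success response into a typed object, based on the three required fields listed above. The names UpscaleJob and parse_upscale_response are illustrative, and the set of possible status values beyond the "processing" example is not documented here.

```python
from dataclasses import dataclass
from typing import Any, Mapping


@dataclass(frozen=True)
class UpscaleJob:
    id: str       # ID of the queued restored audio version
    status: str   # queueing status, e.g. "processing"
    message: str  # human-readable queueing result


def parse_upscale_response(body: Mapping[str, Any]) -> UpscaleJob:
    """Validate that the required fields are present and wrap them in UpscaleJob."""
    missing = [key for key in ("id", "status", "message") if key not in body]
    if missing:
        raise ValueError(f"response is missing required fields: {missing}")
    return UpscaleJob(
        id=str(body["id"]),
        status=str(body["status"]),
        message=str(body["message"]),
    )
```

The id from this object is the value to use with GET /status/upscaled/{id} and GET /download/upscaled/{id}.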