Neural Analog API endpoint

Audio Upscaling API

Queue GPU-powered audio upscaling and restoration for an audio file, stem, or existing restored version.

POST /upscale-audio

Use this after the source audio is complete. If you imported from a link, first poll GET /status/audio/{audio_id} until is_complete is true. Then pass that audio_id here. You may also pass stem_id to restore one stem, or source_upscaled_id to run another restoration pass on an existing restored version.
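A minimal sketch of the polling step described above. It assumes the status endpoint returns a JSON object containing is_complete and lives on the same host as the example at the end of this page. The helper names, polling interval, and timeout are illustrative choices, not part of the API. The HTTP call is passed in as a callable so the loop can be exercised without network access.

```python
import time
from typing import Callable, Mapping

API_BASE = "https://api.neuralanalog.com"  # base URL taken from the example on this page


def source_status_url(audio_id: str) -> str:
    """Build the GET /status/audio/{audio_id} URL."""
    return f"{API_BASE}/status/audio/{audio_id}"


def wait_for_source(
    fetch_status: Callable[[str], Mapping],
    audio_id: str,
    interval_s: float = 2.0,   # hypothetical polling interval
    timeout_s: float = 300.0,  # hypothetical overall timeout
    sleep: Callable[[float], None] = time.sleep,
) -> Mapping:
    """Poll the source status until the payload reports is_complete == True."""
    waited = 0.0
    while True:
        status = fetch_status(source_status_url(audio_id))
        if status.get("is_complete") is True:
            return status
        if waited >= timeout_s:
            raise TimeoutError(f"audio {audio_id} not complete after {timeout_s}s")
        sleep(interval_s)
        waited += interval_s
```

With the requests library, fetch_status could be a small function that calls requests.get(url, headers={"X-API-Key": key}, timeout=30) and returns the decoded JSON body.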

The Audio Upscaling API runs GPU-powered enhancement on imported files, stems, mastered artifacts, or previous restorations so the result can be downloaded, mastered, or used for stem splitting.

The endpoint returns immediately with id, the restored artifact ID:

text
POST /upscale-audio -> id
GET /status/upscaled/{id}
GET /download/upscaled/{id}
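The three-step flow above can be sketched as a small helper that turns the POST response into the two follow-up URLs. This assumes the status and download routes are served from the same host as the example at the end of this page. The fields of the status response are not documented in this section, so deciding when the job is finished is left to the caller. The function name is hypothetical.

```python
from typing import Mapping

API_BASE = "https://api.neuralanalog.com"  # base URL taken from the example on this page


def follow_up_urls(post_response: Mapping) -> dict:
    """Map a POST /upscale-audio response to its status and download URLs."""
    upscaled_id = post_response["id"]  # restored artifact ID returned by the POST
    return {
        "status": f"{API_BASE}/status/upscaled/{upscaled_id}",
        "download": f"{API_BASE}/download/upscaled/{upscaled_id}",
    }
```

A caller would poll the status URL until the job reports completion, then fetch the download URL.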

Choose preset, model_name, stereo_mode, and related restoration fields to control the repair style. The user's plan must allow the selected preset.

Parameters

x-api-key
optional, header, string | null
API key sent as a request header, as in the Python example on this page.

Request Body

audio_id
required, string

Source audio asset ID to restore.

Example: "6c62f8e7-02a3-48c0-a5b5-5de87ed9c31a"

preset
optional, string

Restoration preset to apply before optional mastering. The preset selects the backend and determines which advanced parameters are used; parameters that do not apply to the selected preset are ignored. Use universal_enhancer for general music cleanup, denoise/denoise_debleed/dereverb/decrowd/phantom_center for targeted repair, and voice presets such as novasr, lavasr, and reuse for speech.

Allowed values: "universal_enhancer", "novasr", "flashsr", "dacvae", "declip", "dialogue_isolate", "denoise", "denoise_debleed", "dereverb", "decrowd", "phantom_center", "audiosr", "apollo_voice", "lavasr", "reuse", "acestep_15_xl", "stable_audio_3", "universr", "aero"

Default: "universal_enhancer"

stereo_mode
optional, string

How stereo material is processed. single_pass keeps the stereo file together, mid_sides processes center and side content separately, left_right processes each channel independently, and mono folds the signal down to mono. Applies to stereo-capable restoration backends such as universal_enhancer, apollo_voice, universr, reuse, flashsr, audiosr, aero, and acestep_15_xl. Presets such as declip and dialogue_isolate ignore this field.

Allowed values: "single_pass", "mid_sides", "left_right", "mono"

Default: "single_pass"
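To illustrate what separating center and side content means, here is the textbook mid/side transform in plain Python. This is the conventional definition only. How the restoration backends actually implement mid_sides is not documented here, and the function names are hypothetical.

```python
from typing import List, Tuple


def to_mid_side(left: List[float], right: List[float]) -> Tuple[List[float], List[float]]:
    """Encode left/right samples as mid (center) and side (difference) signals."""
    mid = [(l + r) / 2 for l, r in zip(left, right)]
    side = [(l - r) / 2 for l, r in zip(left, right)]
    return mid, side


def to_left_right(mid: List[float], side: List[float]) -> Tuple[List[float], List[float]]:
    """Decode mid/side samples back to left/right."""
    left = [m + s for m, s in zip(mid, side)]
    right = [m - s for m, s in zip(mid, side)]
    return left, right
```

Content that is identical in both channels ends up entirely in mid, while content that differs between channels ends up in side, which is why the two can be processed separately and recombined.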

frequency_cutoff
optional, integer

Upper frequency boundary, in hertz, for bandwidth extension models. Values are clamped to the range 0 to 22000. Only used by the audiosr preset; other restoration presets ignore this field.

Default: 13000

multiband_ensemble
optional, boolean

Runs compatible restoration models in separate frequency bands and blends the results for cleaner high-frequency recovery. Only used by the audiosr preset; other restoration presets ignore this field.

Default: true
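A minimal sketch of a request body for the audiosr preset, using the two audiosr-only fields and their documented defaults. The helper name is hypothetical, and the client-side clamp simply mirrors the documented 0 to 22000 range so the value sent matches what the service would use.

```python
def audiosr_payload(
    audio_id: str,
    frequency_cutoff: int = 13000,
    multiband_ensemble: bool = True,
) -> dict:
    """Build a POST /upscale-audio body for the audiosr preset."""
    # Mirror the documented clamping range of 0..22000 Hz on the client side.
    cutoff = max(0, min(22000, int(frequency_cutoff)))
    return {
        "audio_id": audio_id,
        "preset": "audiosr",
        "frequency_cutoff": cutoff,
        "multiband_ensemble": multiband_ensemble,
    }
```

The resulting dictionary can be passed as the json argument of the request shown in the example at the end of this page.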

model_name
optional, string

Underlying restoration model variant. Music variants are tuned for full mixes, voice variants for speech bandwidth, UniverSR variants for broad audio or vocal super-resolution, and ACE-Step and Stable Audio variants for prompt-guided remastering. Only selects variants for aero, universr, acestep_15_xl, and stable_audio_3; most other restoration presets choose their model from the preset and ignore this field.

Allowed values: "music_musedb", "voice_4_16", "voice_8_16", "voice_8_24", "voice_12_48", "universr-audio", "universr-vocal", "acestep-v15-xl-turbo", "acestep-v15-xl-sft", "stable-audio-3-medium"

Default: "music_musedb"

reconstruction_method
optional, string

Only used by the universr preset. original preserves the legacy UniverSR reconstruction path and takes the final low-frequency bins from the bandwidth-limited model input. original_signal keeps that bandwidth-limited input for model conditioning but takes the final low-frequency bins from the original 48 kHz source signal.

Allowed values: "original", "original_signal"

Default: "original_signal"

strength
optional, number

Processing intensity, from subtle cleanup to aggressive restoration. Higher values preserve less of the degraded source. Only used by prompt-guided restoration presets such as acestep_15_xl and stable_audio_3; other restoration presets ignore this field.

Default: 0.95

prompt
optional, string

Short text prompt that steers the desired sound. Only used by prompt-guided restoration presets such as acestep_15_xl and stable_audio_3; other restoration presets ignore this field.

Default: "high quality remaster, studio recording, official release, CD quality."

Example: "clean studio master, full bandwidth, natural transients"

inpaint_regions
optional, array<object> | null

Only used by the stable_audio_3 preset. Optional source regions to regenerate with Stable Audio 3 inpainting while preserving the rest of the input audio. Omit to run ordinary audio-to-audio remix.

Example: [{"end":8,"start":4}]
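Because fields that do not apply to the selected preset are silently ignored, a client-side check can help catch mistakes before a job is queued. The sketch below encodes the applicability notes from the field descriptions above. The table and function names are hypothetical. The strength and prompt entries list only the two presets named as examples, and stereo_mode is left out because its list of supporting presets is not exhaustive.

```python
# Which presets use each preset-specific field, per the descriptions above.
FIELD_PRESETS = {
    "frequency_cutoff": {"audiosr"},
    "multiband_ensemble": {"audiosr"},
    "model_name": {"aero", "universr", "acestep_15_xl", "stable_audio_3"},
    "reconstruction_method": {"universr"},
    "strength": {"acestep_15_xl", "stable_audio_3"},  # "such as" in the docs
    "prompt": {"acestep_15_xl", "stable_audio_3"},    # "such as" in the docs
    "inpaint_regions": {"stable_audio_3"},
}


def ignored_fields(payload: dict) -> list:
    """Return the fields in payload that the chosen preset would ignore."""
    preset = payload.get("preset", "universal_enhancer")  # documented default
    return sorted(
        field
        for field, presets in FIELD_PRESETS.items()
        if field in payload and preset not in presets
    )
```

For example, a body with preset audiosr and a prompt reports prompt as ignored, while a stable_audio_3 body with prompt and inpaint_regions reports nothing.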

stem_id
optional, string | null

Optional source stem ID. Omit to restore the full audio file.

Example: "abf8a992-1c4e-4935-93f0-197116e77e49"

source_upscaled_id
optional, string | null

Existing restored version to use as the source for another pass.

Example: "d66cf940-bf26-45bb-80f7-332f26b6859a"

source_mastered_id
optional, string | null

Existing mastered artifact to use as the source for restoration.

Example: "f5db8e4b-2e74-4198-a8de-0c3a398620e9"
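A small sketch of how the source-selection fields might be assembled. The helper name is hypothetical. It only drops unset fields rather than sending null, which is an illustrative choice. Whether stem_id, source_upscaled_id, and source_mastered_id may be combined in a single request is not stated here, so the helper does not enforce any rule about that.

```python
from typing import Optional


def source_fields(
    audio_id: str,
    stem_id: Optional[str] = None,
    source_upscaled_id: Optional[str] = None,
    source_mastered_id: Optional[str] = None,
) -> dict:
    """Collect the source-selection fields, leaving out any that are unset."""
    fields = {
        "audio_id": audio_id,  # always required
        "stem_id": stem_id,
        "source_upscaled_id": source_upscaled_id,
        "source_mastered_id": source_mastered_id,
    }
    return {name: value for name, value in fields.items() if value is not None}
```

To run a second restoration pass, the id returned by an earlier POST /upscale-audio call would be passed as source_upscaled_id alongside the original audio_id.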

selection
optional, object | null

Optional source region to restore. When provided, processing runs only on this time range.

Example: {"end":42,"start":12.5}
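A minimal sketch for building the selection object with basic sanity checks. The start and end keys come from the example above. The helper name and the validation rules (non-negative start, end after start) are assumptions, and the unit of the values is not stated in this section.

```python
def make_selection(start: float, end: float) -> dict:
    """Build a selection object covering the range from start to end."""
    if start < 0:
        raise ValueError("start must not be negative")
    if end <= start:
        raise ValueError("end must be greater than start")
    return {"start": start, "end": end}
```

The result can be placed under the selection key of the request body.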

bit_depth
optional, integer

Output WAV bit depth for the restored audio.

Allowed values: 16, 24

Default: 24

hq_streaming_format
optional, string

No description provided.

Allowed values: "aac", "mp3", "flac"

Default: "aac"
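The two output fields accept small fixed sets of values, so they are easy to validate before sending a request. This is a sketch with hypothetical names. The allowed values and defaults are the ones listed for bit_depth and hq_streaming_format above.

```python
ALLOWED_BIT_DEPTHS = (16, 24)
ALLOWED_STREAMING_FORMATS = ("aac", "mp3", "flac")


def output_options(bit_depth: int = 24, hq_streaming_format: str = "aac") -> dict:
    """Validate and return the output-related request fields."""
    if bit_depth not in ALLOWED_BIT_DEPTHS:
        raise ValueError(f"bit_depth must be one of {ALLOWED_BIT_DEPTHS}")
    if hq_streaming_format not in ALLOWED_STREAMING_FORMATS:
        raise ValueError(f"hq_streaming_format must be one of {ALLOWED_STREAMING_FORMATS}")
    return {"bit_depth": bit_depth, "hq_streaming_format": hq_streaming_format}
```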

Restoration and Audio Upscaling Models

Use these values in the preset field for POST /upscale-audio.

universal_enhancer: MP3 Music Restoration (Apollo, 2025)

Upscales low-quality MP3s back to high quality. Trained on pairs of high-quality music and their degraded MP3 versions. Restores missing high frequencies and removes compression artifacts. Output sample rate matches the prepared input rate (44.1 kHz or 48 kHz). Recommended for: online source imports, songs downloaded from the internet, compressed sound, and tracks missing content above 16 kHz. If it does not regenerate content above 16 kHz effectively, try AudioSR instead.

stable_audio_3: Neural Remix - Stable Audio 3 (Stable Audio 3 Medium, 2026)

Recreates the track with Stability AI's Stable Audio 3 Medium audio-to-audio editing model. This prompt-guided remix mode uses the source audio as a reference while the text prompt steers style, tone, and instrumentation. Recommended for: creative remixes, inpainting missing musical ideas, alternate takes, and low-end or texture repair that benefits from generative reconstruction. Try a short section first for predictable results.

acestep_15_xl: Neural Remix - ACEStep 1.5 XL (ACE-Step 1.5 XL, 2026)

Recreates the track with the ACE-Step 1.5 XL (2026) music generation model in remix mode, similar to Suno's "cover" mode. It uses sounds similar to the reference, but introduces new notes if you lower the reference strength. Recommended for: bass tracks, inpainting missing notes in stems, and low-end muddiness. Can give interesting results when blended with the original audio.

audiosr: AudioSR Upscaler (Audio SR, 2022)

Regenerates high frequencies above the selected cutoff while keeping lower frequencies from the original audio. Output sample rate: 48 kHz. Recommended for: low-quality recordings, missing high frequencies, and hissing noise around 12-13 kHz (hi-hats, sibilance, brilliance). Low cutoffs (e.g. 3 kHz) change the audio the most, while high cutoffs mostly preserve what is already there. Not recommended for: bass tracks or a muddy low end; use Neural Remix instead.

universr: UniverSR Upscaler (UniverSR, 2026)

Upscales music, voice, and sound effects to 48 kHz. UniverSR is a 2026 model developed by the University of Seoul that performs audio super-resolution directly in the complex STFT domain using flow matching. Very similar to AudioSR, but with a more coherent high end. Output sample rate: 48 kHz. Recommended for: low-quality recordings, missing high frequencies, and hissing noise around 12-13 kHz (hi-hats, sibilance, brilliance). Low thresholds (e.g. 4 kHz) change the audio the most, while high thresholds preserve what is already there. Not recommended for: muddy bass or a muddy low end; use Neural Remix instead.

novasr: NovaSR Speech Upscaler (NovaSR, 2026)

Upscaling model trained on English speech. Best for restoring podcasts, voice-overs, or isolated vocal tracks. Output sample rate: 48kHz.

flashsr: FlashSR Upscaler (FlashSR, Jan 2025)

Fast single-step super-resolution model for bandwidth extension and detail recovery. Output sample rate: 48 kHz. Recommended for: Western music, single-instrument upscaling, and upscaling 44.1 kHz stems to 48 kHz (before Atmos export). Not recommended for: bass.

dacvae: Neural Reconstruction (DACVAE, 2024)

Uses the DACVAE neural codec to regenerate all frequencies. Helps remove out-of-distribution frequencies. Output sample rate: 48 kHz. Recommended for: odd-sounding hi-hats in electronic music.

declip: Remove Clipping

Fixes harsh crackling when volume is too high. Algorithms find optimal settings to remove all crackling, while still maintaining loudness.

dialogue_isolate: Remove Room Echo

Removes short reverb from voice recordings made in a room with a laptop microphone. Good for podcasts, voice-overs, and dialogue with a subtle room echo. Less effective on music or singing.

denoise: Remove Noise

Reduces hiss, hum, and background noise while keeping the main vocals/instruments. Uses the same model family as stem splitting, but returns a single cleaned track only.

denoise_debleed: Denoise and debleed

Reduces background noise and source bleed while keeping the main instrumental content. Uses the same model family as stem splitting, but returns a single cleaned track only.

dereverb: Remove Long Reverb

Removes long reverb tails, delay, and echo to make the sound drier. Well suited to singing, music, and live recordings with hall echo. Less effective on subtle echo in clean recordings. Uses the same model family as stem splitting, but returns a single dry track only.

decrowd: Remove Crowd Noise

Removes audience noise from live recordings while preserving the performance. Uses the same model family as stem splitting, but returns a single cleaned track only.

phantom_center: Keep Only Center Mono

Extracts the "phantom center", the content that should be mono in a track. Use this for: bass, kick drums, podcast voice. Removes phaser, chorus, or flanger from instrument stems. Good for mixing.

reuse: RE-USE Speech Enhancer (NVIDIA RE-USE, 2026)

Improves clarity, upscales, and removes reverb, noise, and audio glitches in multilingual speech. Output sample rate: 48 kHz. Recommended for: dry vocals, clearer podcasts, and noisy voice notes. Not recommended for: full music mixes or instrumental upscaling; use UniverSR, AudioSR, or a music restoration model instead.

apollo_voice: Singing Upscaler (Apollo Voice, 2025)

Specialized voice upscaling. Restores missing high frequencies and removes compression artifacts in lower frequencies. Output sample rate: 48kHz.

lavasr: LavaSR Speech Upscaler (LavaSR v2, 2026)

Fast, high-quality speech upscaling. An evolution of NovaSR that uses the Vocos architecture for efficiency. Output sample rate: 48 kHz.

aero: AERO Upscaler (AERO, 2022)

Spectral super-resolution model with selectable voice and music variants. Output sample rate depends on selected variant: Music (MUSDB)=44.1kHz, Voice 4-16=16kHz, Voice 8-16=16kHz, Voice 8-24=24kHz, Voice 12-48=48kHz. Not recommended for: Bass.
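Since the AERO output sample rate depends on the variant, a lookup table can help when planning downstream processing. The rates below are the ones listed above. Pairing each display name with a model_name value (for example Music (MUSDB) with music_musedb) is an assumption based on the similar naming, and the function name is hypothetical.

```python
# Output sample rate in Hz per AERO variant; the model_name keys are an
# assumed mapping from the display names used in the description above.
AERO_OUTPUT_RATE_HZ = {
    "music_musedb": 44100,
    "voice_4_16": 16000,
    "voice_8_16": 16000,
    "voice_8_24": 24000,
    "voice_12_48": 48000,
}


def aero_output_rate(model_name: str = "music_musedb") -> int:
    """Return the expected output sample rate for an AERO variant."""
    if model_name not in AERO_OUTPUT_RATE_HZ:
        raise ValueError(f"not an AERO variant: {model_name}")
    return AERO_OUTPUT_RATE_HZ[model_name]
```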

Example

Python
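import os

import requests

# Queue a restoration job; the API key is read from the environment.
response = requests.post(
    "https://api.neuralanalog.com/upscale-audio",
    headers={"X-API-Key": os.environ["NEURALANALOG_API_KEY"]},
    json={
        "audio_id": "00000000-0000-0000-0000-000000000000",
        "preset": "universal_enhancer",
        "stereo_mode": "mid_sides",
    },
    timeout=30,
)
response.raise_for_status()  # surface HTTP errors before reading the body
print(response.json())  # contains id, status, and message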

Success Response

200 Successful Response
id
required, string

ID of the queued restored audio version.

Example: "d66cf940-bf26-45bb-80f7-332f26b6859a"

status
required, string

Queueing status for the restoration job.

Example: "processing"

message
required, string

Human-readable queueing result.

Example: "Audio restoration queued"