Neural Analog API endpoint
Audio Upscaling API
Queue GPU-powered audio upscaling and restoration for an audio file, stem, or existing restored version.
POST /upscale-audio
Queue audio upscaling and restoration for a track, stem, or previous restoration.
Use this after the source audio is complete. If you imported from a
link, first poll GET /status/audio/{audio_id} until is_complete is true.
Then pass that audio_id here. You may also pass stem_id to restore one
stem, or source_upscaled_id to run another restoration pass on an existing
restored version.
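The polling step described above can be sketched as follows. This is a minimal illustration, not an official client: the base URL and the X-API-Key header are taken from the request example in this document, while the helper names, the 5-second interval, and the 10-minute deadline are assumptions chosen for the example. It uses only the Python standard library.

```python
import json
import time
import urllib.request
from typing import Any, Callable, Dict

BASE_URL = "https://api.neuralanalog.com"  # from the request example in this document


def fetch_audio_status(audio_id: str, api_key: str) -> Dict[str, Any]:
    """GET /status/audio/{audio_id} and return the decoded JSON body."""
    request = urllib.request.Request(
        f"{BASE_URL}/status/audio/{audio_id}",
        headers={"X-API-Key": api_key},
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


def wait_until_complete(
    fetch_status: Callable[[], Dict[str, Any]],
    interval_s: float = 5.0,   # assumed polling interval
    timeout_s: float = 600.0,  # assumed overall deadline
) -> Dict[str, Any]:
    """Call fetch_status repeatedly until the body reports is_complete == true."""
    deadline = time.monotonic() + timeout_s
    while True:
        status = fetch_status()
        if status.get("is_complete") is True:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError("source audio did not complete in time")
        time.sleep(interval_s)
```

A caller would typically wire the two together with something like wait_until_complete(lambda: fetch_audio_status(audio_id, api_key)) and then pass the same audio_id to POST /upscale-audio.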
The Audio Upscaling API runs GPU-powered enhancement on imported files, stems, mastered artifacts, or previous restorations so the result can be downloaded, mastered, or used for stem splitting.
The endpoint returns immediately with id, the restored artifact ID:
POST /upscale-audio -> id
GET /status/upscaled/{id}
GET /download/upscaled/{id}
Choose preset, model_name, stereo_mode, and related restoration
fields to control the repair style. The user's plan must allow the selected
preset.
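The queue, status, and download sequence listed above can be sketched in Python as below. This is a hedged illustration: the base URL and header come from the request example in this document, the function names are hypothetical, and because this page does not describe the body returned by the status or download endpoints, the sketch only builds their URLs rather than interpreting their responses.

```python
import json
import urllib.request
from typing import Any, Dict

BASE_URL = "https://api.neuralanalog.com"  # from the request example in this document


def queue_upscale(payload: Dict[str, Any], api_key: str) -> str:
    """POST /upscale-audio and return the restored artifact ID from the response."""
    request = urllib.request.Request(
        f"{BASE_URL}/upscale-audio",
        data=json.dumps(payload).encode("utf-8"),
        headers={"X-API-Key": api_key, "Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)["id"]


def upscaled_urls(upscaled_id: str) -> Dict[str, str]:
    """Build the follow-up URLs for a restored artifact ID."""
    return {
        "status": f"{BASE_URL}/status/upscaled/{upscaled_id}",
        "download": f"{BASE_URL}/download/upscaled/{upscaled_id}",
    }
```

Keeping URL construction separate from the network call makes it easy to log or store the follow-up URLs as soon as the job is queued.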
Run GPU audio upscaling and restoration on imported files, stems, or generated artifacts using Neural Analog presets.
Use POST /upscale-audio to create an upscaled artifact for mastering, downloading, or further stem processing.
Parameters
x-api-key
Request Body
audio_id
Source audio asset ID to restore.
Example: "6c62f8e7-02a3-48c0-a5b5-5de87ed9c31a"
preset
Restoration preset to apply before optional mastering. The preset selects the backend and determines which advanced parameters are used; parameters that do not apply to the selected preset are ignored. Use universal_enhancer for general music cleanup; denoise, denoise_debleed, dereverb, decrowd, or phantom_center for targeted repair; and voice presets such as novasr, lavasr, and reuse for speech.
Default: "universal_enhancer"
stereo_mode
How stereo material is processed. single_pass keeps the stereo file together, mid_sides processes center and side content separately, left_right processes channels independently, and mono folds to mono. Applies to stereo-capable restoration backends such as universal_enhancer, apollo_voice, universr, reuse, flashsr, audiosr, aero, and acestep_15_xl. Presets such as declip and dialogue_isolate ignore this field.
Default: "single_pass"
frequency_cutoff
Upper frequency boundary in hertz for bandwidth extension models. Values are clamped between 0 and 22000. Only used by the audiosr preset; other restoration presets ignore this field.
Default: 13000
multiband_ensemble
Run compatible restoration models in frequency bands and blend the result for cleaner high-frequency recovery. Only used by the audiosr preset; other restoration presets ignore this field.
Default: true
model_name
Underlying restoration model variant. Music variants are tuned for full mixes, voice variants for speech bandwidth, UniverSR variants for broad audio and vocal super-resolution, and ACE-Step and Stable Audio variants for prompt-guided remastering. Only selects variants for aero, universr, acestep_15_xl, and stable_audio_3; most restoration presets choose their model from the preset and ignore this field.
Default: "music_musedb"
reconstruction_method
Only used by the universr preset. original preserves the legacy UniverSR reconstruction path and takes the final low-frequency bins from the bandwidth-limited model input. original_signal keeps that bandwidth-limited input for model conditioning but takes the final low-frequency bins from the original 48 kHz source signal.
Default: "original_signal"
strength
Processing intensity from subtle cleanup to aggressive restoration. Higher values preserve less of the degraded source. Only used by prompt-guided restoration presets such as acestep_15_xl and stable_audio_3; other restoration presets ignore this field.
Default: 0.95
prompt
Short text prompt used by prompt-guided restoration models to steer the desired sound. Only used by prompt-guided restoration presets such as acestep_15_xl and stable_audio_3; other restoration presets ignore this field.
Default: "high quality remaster, studio recording, official release, CD quality."
Example: "clean studio master, full bandwidth, natural transients"
inpaint_regions
Only used by the stable_audio_3 preset. Optional source regions to regenerate with Stable Audio 3 inpainting while preserving the rest of the input audio. Omit to run an ordinary audio-to-audio remix.
Example: [{"end":8,"start":4}]
stem_id
Optional source stem ID. Omit to restore the full audio file.
Example: "abf8a992-1c4e-4935-93f0-197116e77e49"
source_upscaled_id
Existing restored version to use as the source for another pass.
Example: "d66cf940-bf26-45bb-80f7-332f26b6859a"
source_mastered_id
Existing mastered artifact to use as the source for restoration.
Example: "f5db8e4b-2e74-4198-a8de-0c3a398620e9"
selection
Optional source region to restore. When provided, processing runs only on this time range.
Example: {"end":42,"start":12.5}
bit_depth
Output WAV bit depth for the restored audio.
Default: 24
hq_streaming_format
No description provided.
Default: "aac"
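As a rough illustration of how the request body fields above fit together, the following sketch assembles a JSON body on the client side. The field names, defaults, stereo modes, and the 0 to 22000 Hz clamp come from the parameter descriptions; the function names, the choice to mirror the clamp locally, and the start < end check on selection are assumptions made for this example and are not documented API behavior.

```python
from typing import Any, Dict, Optional, Tuple

# Stereo modes named in the stereo_mode description.
STEREO_MODES = {"single_pass", "mid_sides", "left_right", "mono"}


def clamp_cutoff(hz: float) -> float:
    """Mirror the documented clamp of frequency_cutoff to the 0-22000 Hz range."""
    return max(0, min(22000, hz))


def build_upscale_body(
    audio_id: str,
    preset: str = "universal_enhancer",
    stereo_mode: str = "single_pass",
    frequency_cutoff: Optional[float] = None,
    selection: Optional[Tuple[float, float]] = None,
) -> Dict[str, Any]:
    """Assemble a request body for POST /upscale-audio (hypothetical helper)."""
    if stereo_mode not in STEREO_MODES:
        raise ValueError(f"unknown stereo_mode: {stereo_mode}")
    body: Dict[str, Any] = {
        "audio_id": audio_id,
        "preset": preset,
        "stereo_mode": stereo_mode,
    }
    if frequency_cutoff is not None:
        # Only the audiosr preset reads this field; other presets ignore it.
        body["frequency_cutoff"] = clamp_cutoff(frequency_cutoff)
    if selection is not None:
        start, end = selection
        if not 0 <= start < end:  # assumed sanity check, not a documented rule
            raise ValueError("selection must satisfy 0 <= start < end")
        body["selection"] = {"start": start, "end": end}
    return body
```

For instance, build_upscale_body(audio_id, preset="audiosr", frequency_cutoff=13000, selection=(12.5, 42)) would restrict an AudioSR pass to the 12.5 s to 42 s range of the source.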
Restoration and Audio Upscaling Models
Use these values in the preset field for POST /upscale-audio.
universal_enhancer
MP3 Music Restoration (Apollo, 2025)
Upscales low-quality MP3s back to high quality. Trained on pairs of high-quality music and their degraded MP3 versions. Restores missing high frequencies and removes compression artifacts. Output sample rate matches the prepared input rate (44.1 kHz or 48 kHz). Recommended for: online source imports, songs downloaded from the internet, compressed sound, tracks missing content above 16 kHz. If it does not regenerate audio above 16 kHz effectively, try AudioSR instead.
stable_audio_3
Neural Remix - Stable Audio 3 (Stable Audio 3 Medium, 2026)
Recreates the track with Stability AI's Stable Audio 3 Medium audio-to-audio editing model. This prompt-guided remix mode uses the source audio as a reference while the text prompt steers style, tone, and instrumentation. Recommended for: creative remixes, inpainting missing musical ideas, alternate takes, and low-end or texture repair that benefits from generative reconstruction. Try a short section first for predictable results.
acestep_15_xl
Neural Remix - ACEStep 1.5 XL (ACE-Step 1.5 XL, 2026)
Recreates the track with the ACE-Step 1.5 XL (2026) music generation model in remix mode. This is similar to Suno's 'cover' mode: it uses sounds similar to the reference but introduces new notes if you lower the reference strength. Recommended for: bass tracks, inpainting missing notes in stems, low-end muddiness. Results can be interesting when blended with the original audio.
audiosr
AudioSR Upscaler (Audio SR, 2022)
Regenerates high frequencies above the selected cutoff while keeping lower frequencies from the original audio. Output sample rate: 48 kHz. Recommended for: low-quality recordings, missing high frequencies, hissing noise around 12-13 kHz (hi-hats, sibilance, brilliance). Low cutoffs (e.g. 3 kHz) change the audio the most, while high cutoffs mostly preserve what is already there. Not recommended for: bass tracks, muddy low end. Use Neural Remix instead.
universr
UniverSR Upscaler (UniverSR, 2026)
Upscales music, voice, and sound effects to 48 kHz. UniverSR is a 2026 model developed by the University of Seoul that performs audio super-resolution directly in the complex STFT domain using flow matching. Very similar to AudioSR, but with a more coherent high end. Output sample rate: 48 kHz. Recommended for: low-quality recordings, missing high frequencies, hissing noise around 12-13 kHz (hi-hats, sibilance, brilliance). Low thresholds (e.g. 4 kHz) change the audio the most, while high thresholds preserve what is already there. Not recommended for: muddy bass or muddy low end. Use Neural Remix instead.
novasr
NovaSR Speech Upscaler (NovaSR, 2026)
Upscaling model trained on English speech. Best for restoring podcasts, voice-overs, or isolated vocal tracks. Output sample rate: 48 kHz.
flashsr
FlashSR Upscaler (FlashSR, Jan 2025)
Fast single-step super-resolution model for bandwidth extension and detail recovery. Output sample rate: 48 kHz. Recommended for: Western music, single-instrument upscaling, upscaling 44.1 kHz stems to 48 kHz (before Atmos export). Not recommended for: bass.
dacvae
Neural Reconstruction (DACVAE, 2024)
Uses the DACVAE neural codec to regenerate all frequencies. Helps remove out-of-distribution frequencies. Output sample rate: 48 kHz. Recommended for: odd-sounding hi-hats in electronic music.
declip
Remove Clipping
Fixes the harsh crackling that occurs when the volume is too high. The algorithm searches for settings that remove the crackling while maintaining loudness.
dialogue_isolate
Remove Room Echo
Removes short reverb from voice recordings made in a room with a laptop microphone. Good for podcasts, voice-overs, and dialogue with a subtle room echo. Less effective on music or singing.
denoise
Remove Noise
Reduces hiss, hum, and background noise while keeping the main vocals and instruments. Uses the same model family as stem splitting, but returns a single cleaned track only.
denoise_debleed
Denoise and Debleed
Reduces background noise and source bleed while keeping the main instrumental content. Uses the same model family as stem splitting, but returns a single cleaned track only.
dereverb
Remove Long Reverb
Removes long reverb tails, delay, and echo to make the sound drier. Works well for singing, music, and live recordings with hall echo. Less effective on subtle echo in clean recordings. Uses the same model family as stem splitting, but returns a single dry track only.
decrowd
Remove Crowd Noise
Removes audience noise from live recordings while preserving the performance. Uses the same model family as stem splitting, but returns a single cleaned track only.
phantom_center
Keep Only Center Mono
Extracts the "phantom center", the content that should be mono in a track. Use this for: bass, kick drums, podcast voice. Removes phaser, chorus, or flanger from instrument stems. Good for mixing.
reuse
RE-USE Speech Enhancer (NVIDIA RE-USE, 2026)
Improves clarity, upscales, and removes reverb, noise, and audio glitches in multilingual speech. Output sample rate: 48 kHz. Recommended for: getting dry vocals, clearer podcasts, noisy voice notes, etc. Not recommended for: full music mixes or instrumental upscaling. Use UniverSR, AudioSR, or a music restoration model instead.
apollo_voice
Singing Upscaler (Apollo Voice, 2025)
Specialized voice upscaling. Restores missing high frequencies and removes compression artifacts in lower frequencies. Output sample rate: 48 kHz.
lavasr
LavaSR Speech Upscaler (LavaSR v2, 2026)
Fast, high-quality speech upscaling. An evolution of NovaSR that uses the Vocos architecture for efficiency. Output sample rate: 48 kHz.
aero
AERO Upscaler (AERO, 2022)
Spectral super-resolution model with selectable voice and music variants. Output sample rate depends on the selected variant: Music (MUSDB) = 44.1 kHz, Voice 4-16 = 16 kHz, Voice 8-16 = 16 kHz, Voice 8-24 = 24 kHz, Voice 12-48 = 48 kHz. Not recommended for: bass.
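Several preset descriptions above state a fixed output sample rate, which matters when planning later steps such as mastering or export. The lookup below is a small sketch that collects only the rates stated in this list; the names FIXED_OUTPUT_RATE_HZ and documented_output_rate are hypothetical, and presets without a single stated rate (for example universal_enhancer, which follows the input rate, and aero, which depends on the variant) are deliberately left out.

```python
from typing import Optional

# Fixed output sample rates in Hz, as stated in the preset descriptions.
# Presets with no single stated rate are intentionally absent.
FIXED_OUTPUT_RATE_HZ = {
    "audiosr": 48000,
    "universr": 48000,
    "novasr": 48000,
    "flashsr": 48000,
    "dacvae": 48000,
    "reuse": 48000,
    "apollo_voice": 48000,
    "lavasr": 48000,
}


def documented_output_rate(preset: str) -> Optional[int]:
    """Return the stated fixed output rate for a preset, or None if none is stated."""
    return FIXED_OUTPUT_RATE_HZ.get(preset)
```

A None result means the rate has to be determined another way, such as from the source file or the selected model_name variant.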
Example
import os

import requests

# Queue a restoration job; the API key is read from the environment.
response = requests.post(
    "https://api.neuralanalog.com/upscale-audio",
    headers={"X-API-Key": os.environ["NEURALANALOG_API_KEY"]},
    json={
        "preset": "universal_enhancer",
        "stereo_mode": "mid_sides",
        "audio_id": "00000000-0000-0000-0000-000000000000",
    },
    timeout=30,
)
print(response.json())
Success Response
id
ID of the queued restored audio version.
Example: "d66cf940-bf26-45bb-80f7-332f26b6859a"
status
Queueing status for the restoration job.
Example: "processing"
message
Human-readable queueing result.
Example: "Audio restoration queued"
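One way to handle the success response on the client side is to parse it into a small typed record. The sketch below assumes the body is a JSON object with the three fields listed above; the class and function names are hypothetical, and treating status and message as optional with empty-string fallbacks is an assumption rather than documented behavior.

```python
from dataclasses import dataclass
from typing import Any, Mapping


@dataclass(frozen=True)
class UpscaleQueued:
    """Typed view of the POST /upscale-audio success response."""
    id: str
    status: str
    message: str


def parse_upscale_response(body: Mapping[str, Any]) -> UpscaleQueued:
    """Validate that the body carries an id and copy the documented fields."""
    if "id" not in body:
        raise ValueError("response has no 'id' field")
    return UpscaleQueued(
        id=str(body["id"]),
        status=str(body.get("status", "")),    # assumed optional
        message=str(body.get("message", "")),  # assumed optional
    )
```

The resulting id is the value to use with GET /status/upscaled/{id} and GET /download/upscaled/{id}.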