That depends on the operation, the selected model, and the length of your audio input. Processing usually takes about 30 seconds. Large models on long audio can take up to 5 minutes. You can leave the page and come back later.