
Wobble artifact produced by a text-to-speech software.

For everything after the recording stage: hardware/software and how you use it.

Postby wetduck » Fri Sep 18, 2020 8:52 am

Hello to everyone!

I hope you're having a great day. The other day my boss handed us a new piece of software to learn. It uses AI to convert text into speech, and the results are deceptively good. The only problem is that some parts have a "wobble" in them, probably because the AI has to change the tone/intonation of the words.

I've tried using Audition's "Auto Heal", but instead of just removing the wobble it silences that part of the audio. It doesn't fix it at all. I'm also not sure what the correct term for this problem is.

I was wondering if anyone has a solution for this problem. This is what I see in the waveform: [image]. The word being said is "world", but the "r" in the word vibrates.

Here is the link to the audio: shorturl.at/fqLQX (Google drive link)




Thank you!
wetduck
Posts: 2
Joined: Fri Sep 18, 2020 8:35 am

Re: Wobble artifact produced by a text-to-speech software.

Postby Tomás Mulcahy » Fri Sep 18, 2020 9:59 am

The word "world" sounds fine. A much bigger problem is that the whole delivery is not convincing at all. Lacking emotion and spoken too quickly. It's an impressive algorithm for sure, but it needs work. Maybe slowing down the overall speed might make the r sound more agreeable for you?
Tomás Mulcahy
Frequent Poster
Posts: 1802
Joined: Wed Apr 25, 2001 12:00 am
Location: Cork, Ireland.

Re: Wobble artifact produced by a text-to-speech software.

Postby Hugh Robjohns » Fri Sep 18, 2020 10:19 am

The 'wobbles' or vibrations, as you call them, sound like editing artefacts to me: points where the different sound elements are being stitched together.
Hugh Robjohns
Moderator
Posts: 28582
Joined: Fri Jul 25, 2003 12:00 am
Location: Worcestershire, UK
Technical Editor, Sound On Sound

Re: Wobble artifact produced by a text-to-speech software.

Postby BJG145 » Fri Sep 18, 2020 10:47 am

The whole thing sounds wobbly to me; "world", "Frodo", "quiet Hobbit". I don't think that's something you can fix in the mix; either something went wrong during the creation/edit or the algorithm needs work. Maybe you could try getting the system to repeat this section a couple of times and see if the glitches are identical. (In ye olden days you sometimes had to feed in different words to get the best result; eg using "whirled" instead of "world".)
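
One rough way to check whether repeated renders of the same passage come out identical is to compare the exported files directly. The sketch below assumes the text-to-speech tool exports uncompressed PCM WAV files; the function names are hypothetical and only Python's standard `wave` module is used. If two renders differ, the glitches are not fixed in place and re-rendering may be worth a try; if they match exactly, the wobble is baked into how the system speaks that text.

```python
import wave


def read_frames(path):
    """Return (params, raw_frames) for a PCM WAV file."""
    with wave.open(path, "rb") as wav:
        params = wav.getparams()
        frames = wav.readframes(wav.getnframes())
    return params, frames


def renders_identical(path_a, path_b):
    """True if both files share channel count, sample width, sample rate
    and contain exactly the same audio data."""
    params_a, frames_a = read_frames(path_a)
    params_b, frames_b = read_frames(path_b)
    # params[:3] is (nchannels, sampwidth, framerate)
    return params_a[:3] == params_b[:3] and frames_a == frames_b
```

This is a byte-for-byte comparison, so it only answers "same or not"; it does not locate or measure the wobble itself.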
BJG145
Jedi Poster
Posts: 4584
Joined: Sat Aug 06, 2005 12:00 am

Re: Wobble artifact produced by a text-to-speech software.

Postby wetduck » Tue Sep 22, 2020 2:48 am

Hello everyone!

Thank you for taking time out of your day to check my post. My initial thought was that these wobbles can't be fixed once the audio has been exported from the text-to-speech software (because there are a lot of them).

Surprisingly, a coworker found out that typing the problematic words in ALL CAPS fixes them for some reason.
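
For anyone with a long script and many problem words, the ALL CAPS workaround could be applied automatically before pasting the text into the tool. This is only a minimal sketch: the function name and the word list are made up for illustration, and it assumes the software accepts plain text input. Whether capitals help will depend on the particular text-to-speech engine.

```python
import re


def capitalize_problem_words(text, problem_words):
    """Rewrite each listed word in ALL CAPS (whole words only, any case)."""
    if not problem_words:
        return text
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(w) for w in problem_words) + r")\b",
        re.IGNORECASE,
    )
    # Replace each match with its upper-case form, leaving the rest untouched
    return pattern.sub(lambda m: m.group(0).upper(), text)


# Hypothetical input; the word list would come from listening to a test render
print(capitalize_problem_words("Hello world, a quiet Hobbit.", ["world", "hobbit"]))
# prints: Hello WORLD, a quiet HOBBIT.
```

The `\b` word boundaries keep longer words such as "worldly" from being changed by accident.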

Thank you all and have a good day!
wetduck
Posts: 2
Joined: Fri Sep 18, 2020 8:35 am