How would you rank these vocal synthesis engines? Vocaloid 5, Piapro, Alter/Ego, Synthesizer V, and UTAU.
OK, note that I have never actually used these engines and haven't even seen the full UI for them, so I can't rank them as well as someone who has. I'll be going by what I know about each of them.
Given Synthesizer V's AI self-tuning feature, even a beginner can make a well-tuned cover (I checked). Of course, this doesn't mean it's impossible to make something that sounds clunky: I've seen a few such covers using GENBU (who is described as “may appear a bit unpolished on the outside”). Overall, though, the voicebanks appear extensive and allow for a clear voice, and the scripting option seems very interesting. Even without the AI tuning, making the voice sound realistic seems very achievable: I've seen enough hand-tuned covers that sound smooth as silk. The downsides are that the voices so far all seem kind of same-y to me, and that there aren't many voicebanks yet.
Yeah, we've all seen clunky UTAU covers, and from what I've seen a lot of them are like that, but I believe that being able to make your own voicebank, for FREE, is a big deal. The nearly endless range of voicebanks means that you'll most likely be able to find one that fits your every need. The UI itself feels very basic and flat, which makes sense since it's a very old program, but
Ah, the one I'm most familiar with. Given that Vocaloid 5 and Piapro share voicebanks and are only kind of different engines (which I have not used), I lumped them together. Note that the amount of Vocaloid content I've consumed will probably affect the review and my perception of it. I think Vocaloid is better at expressing emotion, but not by a huge margin: I've yet to hear a Vocaloid singing with a tone of voice that says “I will strangle you”, but I have heard Vocaloids with a slight warble that makes them sound like they're holding back tears. Though I'm tentative, as I don't know much about the difference between Piapro Studio and the VOCALOID editor, I'd rate the VOCALOID editor a bit higher, as it has more voicebanks from what I can tell.
Alter/Ego is the only one on this list that I had not heard of before. From what I've seen from some quick searching, the UI looks annoying to work with, and it operates on a text-to-speech system, which I think would get annoying real quick, given that the results feel a bit randomized. I feel like making a well-tuned cover with this would be the most difficult.