Which things do influence UTAU voices?
Topic Started: Oct 5 2011, 12:04 PM (647 Views)
Berin
Um... So I wondered which factors influence how your UTAU's voice will sound. What do you have to do to get a soft voice? What about a strong voice? I heard that not only the resampler you use and the voice you record with have a lot of influence, but also, for example, the length of your recordings.

The only voice I've made an UTAU of is my normal voice. The recordings are between 0.6 and 1.5 seconds long; I recorded with a headset with a flexible mic, and the distance between my mouth and the mic wasn't very big. When I recorded, I was pretty relaxed (it varied quite a bit in the CV voicebank). I oto'd my CV voicebank like this:
Spoiler: click to toggle
and the resampler I use for her is TIPS. The result is that Beryll heliodora's voice sounds pretty soft, maybe a little bit tired (?) or sad. You can hear the result in her newest song (CV-VC): http://soundcloud.com/beri-chan/mintoko-tamashiine-and-beryll

What would be important if I wanted, for example, to make a strong voice now? How do your UTAUs sound, and why do they sound like that? I'd really like to know for future appends!
Edited by Berin, Oct 5 2011, 12:11 PM.
 
Aleksandr
Kanaya West
Hmm... my Aleks 1.0 and 2.0 banks are a good example of differences in sound.

Aleks 1.0 sounded very robotic and kind of "weak"; she also had no accent (or I guess an American accent, the kind I'm used to hearing, so it sounds unaccented to me?). It was also kind of breathy. But it had hardly any background fuzz and had strong consonants. (I recorded by just saying everything, very relaxed, but was very careful to keep my sounds at the same volume and length.) The mic I used was older, but it was very clear.

Aleks 2.0 sounds more human. I mean, it's still robotic sometimes, but it's much better. She has a HUUUGE accent (not just because of her incorrect r's); it's sort of a southern sound... yeah, I am a southerner >w<; I don't try very hard for a perfect Japanese accent or anything; in fact, I prefer banks that have their own sound to them. But uhm, anyways, she also has quite a bit of background fuzz (especially on -i sounds), and some of her consonants are not too strong. What I did differently: first off, I had a better mic, but I wasn't recording at home and the owner hadn't configured their computer for the mic yet, so everything was recorded WAY too quietly. I had to raise the volume on all of the samples, which also raised the background fuzz. Secondly, I still said my notes, in the same tone and with the same emphasis, but I held out my vowels for a second. Aleks 1.0's sounds were between 0.35 and 0.55 seconds long, while 2.0's were 0.85 to 1.5 seconds long. It made a difference. Also, Aleks 2.0's sounds are less breathy in the long run =w=
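
A quick illustration of why turning up a too-quiet recording also turns up the fuzz: plain gain multiplies the voice and the background noise by the same factor, so the ratio between them stays the same. This is only a rough sketch with made-up numbers (a synthetic 220 Hz tone standing in for the voice, random values standing in for the fuzz, and a gain of 10); none of it comes from the actual Aleks recordings.

```python
import math
import random


def rms(samples):
    """Root-mean-square level of a list of float samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def snr_db(voice, noise):
    """Signal-to-noise ratio in decibels, from the two RMS levels."""
    return 20 * math.log10(rms(voice) / rms(noise))


def apply_gain(samples, gain):
    """Plain volume boost: multiply every sample by the same factor."""
    return [s * gain for s in samples]


# Hypothetical stand-ins, not real recordings:
RATE = 44100
rng = random.Random(0)  # fixed seed so the numbers are repeatable
voice = [0.02 * math.sin(2 * math.pi * 220 * n / RATE) for n in range(RATE // 10)]
fuzz = [rng.uniform(-0.002, 0.002) for _ in range(RATE // 10)]

gain = 10.0  # assumed boost, chosen only for illustration
louder_voice = apply_gain(voice, gain)
louder_fuzz = apply_gain(fuzz, gain)

print(f"fuzz level before: {rms(fuzz):.5f}, after: {rms(louder_fuzz):.5f}")
print(f"SNR before: {snr_db(voice, fuzz):.2f} dB, "
      f"after: {snr_db(louder_voice, louder_fuzz):.2f} dB")
```

The fuzz level goes up by the gain factor while the signal-to-noise ratio does not improve, which is why recording at a healthy level in the first place tends to work better than boosting afterwards.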

hope this helps~ I won't get into my otoing mostly because I always do it wrong orz;
I will push my ships on you, always.


Aleks' World of Fun // Ask!Zhenya
 
Berin
So although you recorded it more or less the same way, the 2.0 sounds better and more human-like, just because you used a different mic and the recordings were longer? I always thought longer recordings would increase breathiness, but it seems that's not the case... thank you!
 
annamaeblythe
Banned
fff-
Longer samples make your voice sound like a cheese-cloth in UTAU.
Here's (CV) Anaka with samples of about 3-5 seconds per syllable: http://www.youtube.com/watch?v=4kuuRPiyFps
Here's (VCV) Anaka with samples that are about .5 seconds per syllable: http://www.youtube.com/watch?v=FihLWpmpNGs
(different mic, slightly different style.)

Anaka has about 30 different banks.
The first ten or so were all recorded on the same mic, but are kind of like night and day: http://www.youtube.com/watch?v=PqrMzVqrC6Q

but, that aside,
Almost everything has an influence on your bank. I can't think of anything that doesn't :P It's just that some things are more influential than others.
Once you get into VCV, sample length isn't what kills a bank, it's the fact that very rarely can someone hold out a note for five seconds five times in a row.
Pitch won't matter much, unless you change it a lot during samples.
And stuff like that.
 
Berin
30 different banks...?! You... work really hard on your voicebank *.* 5-second samples really are too long; I would never make samples longer than 2 seconds...
Well, if really EVERYTHING has an influence... then maybe we could collect some things that are important for making specific voice types, for example a very soft or strong voice. That would be a question for the people who have already made a few appends for their UTAUs... Oh, or how you avoid (or make) a mechanical/robotic voice! Or what would be the optimal conditions for making a voicebank?
Edited by Berin, Oct 5 2011, 04:15 PM.
 
Aleksandr
Kanaya West
Hm, I think the mic I used affected my voicebanks the most.

The first mic was rather old, but it was very, very clear and my environment was very quiet, so there was little to no background fuzz. I also did noise removal on all of my notes with Audacity, which makes the voice a bit clearer as well (but noise removal is kind of a double-edged sword...). Also, it was a freestanding desk mic, not a headset.

The second was a brand-new Sennheiser HD headset. Sennheiser is very good for cheap headsets, but they can be a bit troublesome to use just because it IS a headset. I was at my bf's house (it's his mic .w. ...) and there was a bit going on, like people going up and down stairs, etc. So the environment was a bit noisier, and, like I said, the computer was not configured for the mic, so everything was recorded very quietly. The background noise on the headset was higher than that of my old mic, but the voice quality was much, much better. I am prolly going to redo everything again (ACT 2.5?) so it's louder and has better r sounds =w=

But the breathiness on both Act 1 and Act 2 is because I SAID the sounds. I have a naturally very breathy voice, and it especially comes out on sounds like k and sh, so it was emphasized more when I said them quickly rather than trying to hold out the sounds. c:

But yes, EVERYTHING affects the UTAU, EXCEPT what program you use to record with. That one has been disproved.
 
annamaeblythe
Banned
Actually, SOMEONE claimed to have "disproved" that the recording program has an effect, but I can test that right now and show that her claim was total BS.
Because it is total BS.

Anyway, here's the main thing:
If you want a bank to have a soft tone, you need to speak softly. However, if your microphone picks up static easily, then it won't be soft, it will be breathy. The solution is to get close to your mic, as if you're smooching on it all intimate-like, so that your intonation is soft but still loud enough that static isn't picked up.

And if you want a hard voice, you just say it loudly, but not loudly enough for clipping.

In both cases, shorter samples are better o 3o
Like, NEVER anywhere near a second. Around .5 seconds.
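
If you want to check a bank against those two rules of thumb (short samples, loud but not clipping), something like the sketch below could work. It only uses Python's standard `wave` and `array` modules and assumes 16-bit PCM WAV files. The function names `inspect_wav` and `report`, the 1.0-second length limit, and the 0.99 "probably clipping" threshold are all assumptions made up for this example, not anything built into UTAU.

```python
import array
import sys
import wave


def inspect_wav(path):
    """Return (duration in seconds, peak as a fraction of full scale)
    for a 16-bit PCM WAV file."""
    with wave.open(path, "rb") as wav:
        if wav.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM")
        frames = wav.getnframes()
        rate = wav.getframerate()
        raw = wav.readframes(frames)
    samples = array.array("h")  # signed 16-bit integers
    samples.frombytes(raw)
    if sys.byteorder == "big":
        samples.byteswap()  # WAV sample data is little-endian
    peak = max((abs(s) for s in samples), default=0) / 32768
    return frames / rate, peak


def report(path, max_seconds=1.0, clip_level=0.99):
    """Return a list of warnings; both thresholds are assumed values."""
    duration, peak = inspect_wav(path)
    warnings = []
    if duration > max_seconds:
        warnings.append(f"{path}: {duration:.2f} s is longer than {max_seconds} s")
    if peak >= clip_level:
        warnings.append(f"{path}: peak {peak:.2f} of full scale, may be clipping")
    return warnings


if __name__ == "__main__":
    # Usage: python check_samples.py ka.wav ki.wav ...
    for wav_path in sys.argv[1:]:
        for line in report(wav_path):
            print(line)
```

Note that the duration here is the whole file, including any silence before and after the syllable, so trimmed samples give a more meaningful number.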
 
khisui
When I made my UTAU, I wanted her voice to sound tender, soft, and not too pitchy. When I recorded the samples, I didn't use a loud voice. My voice was actually quieter than when I talk normally, and I held my voice back until there was a breathy sound in my singing. Plus, the samples were 4 seconds long, and that affects the UTAU's sound.
 
MillyAqualine
Personally, I would say:

-oto
-kind of resampler
-for some, the VCV/CV difference (but not for all, though)
-flags (like the configuration settings in Vocaloid: if you play around a bit with some flags, they can modify the voice... and I'm not talking only about genderbending)
-octave (the UTAU can sound different if you put it an octave higher or lower than its recorded voice)

There are as many Utauloids worldwide as many HL mods X'DDD
http://alexonsager.net/pokemon/