How A Text-to-Speech Synthesizer Created One Of The World’s Biggest Pop Stars
Hatsune Miku is a pop star. She’s been featured in several chart-topping songs and albums. She toured with Lady Gaga. She has sold out stadium-sized concert venues, including two consecutive shows in New York City on the same night a few weeks ago.
She is also not human.
Hatsune Miku is a vocaloid computer program. Her voice is generated with text-to-speech. And when thousands flock to see her concerts, they watch a computer-generated hologram dance on stage while singing the lyrics.
Just like a real pop star, everyone knows what she looks like. And that’s because the look is always the same. Miku wears a schoolgirl uniform, complete with knee-high black socks and a pair of long, turquoise pigtails.
But unlike a real pop star, anyone with the software can create songs for her. To date she has over 100,000 songs to her name. Anyone who installed the program on their computer has complete control over what Miku sings and how she does it. They can then share the music they created online or sell it.
So how did all of this happen? How does it work? Let’s take a look at how text-to-speech created this worldwide phenomenon.
As with any other text-to-speech engine, the voice has to come from a voice actor or actress. Japanese voice actress Saki Fujita is the source of Miku’s voice. Fujita had voice acted in several anime series and video games before. She was chosen because of her “soft, beautiful voice”.
Recordings were made of Fujita’s voice at controlled pitches and tones. Each of these samples contained one Japanese phoneme, which is a single speech sound.
The computer program gives the user all of these phonemes. The user can then put them together to create words, phrases and lyrics. Users can edit the pitch of her voice with a built-in synthesizer engine, which can be controlled from their keyboards.
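To make the idea concrete, here is a minimal sketch in Python of stringing stored sound samples together and changing their pitch. It is only an illustration and not how Crypton's software or the vocaloid engine actually works. Everything in it is an assumption made for the example: the names VOICE_BANK, shift_pitch and sing are hypothetical, the "recordings" are plain sine tones standing in for real voice samples, and the pitch change is a naive resampling trick that also shortens or lengthens the note.

```python
import math

SAMPLE_RATE = 8000  # samples per second (assumed value for this sketch)


def make_placeholder_sample(freq_hz, duration_s=0.1):
    """Stand-in for one recorded speech sound: a plain sine tone."""
    n = round(SAMPLE_RATE * duration_s)
    return [math.sin(2 * math.pi * freq_hz * i / SAMPLE_RATE) for i in range(n)]


# Hypothetical voice bank: one stored sample per speech sound.
VOICE_BANK = {
    "mi": make_placeholder_sample(220.0),
    "ku": make_placeholder_sample(220.0),
}


def shift_pitch(sample, semitones):
    """Naive pitch shift: read the sample faster or slower, with
    linear interpolation between neighbouring values."""
    ratio = 2 ** (semitones / 12)  # 12 semitones up doubles the frequency
    out_len = int(len(sample) / ratio)
    out = []
    for i in range(out_len):
        pos = i * ratio
        lo = int(pos)
        hi = min(lo + 1, len(sample) - 1)
        frac = pos - lo
        out.append(sample[lo] * (1 - frac) + sample[hi] * frac)
    return out


def sing(notes):
    """notes is a list of (speech sound, semitone offset) pairs.
    Returns one waveform with the shifted samples joined end to end."""
    audio = []
    for sound, semitones in notes:
        audio.extend(shift_pitch(VOICE_BANK[sound], semitones))
    return audio


# "mi" at the recorded pitch, then "ku" one octave higher.
melody = sing([("mi", 0), ("ku", 12)])
print(len(melody), "samples")
```

A real singing synthesizer also smooths the joins between sounds and keeps note length independent of pitch, which this sketch leaves out.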
On August 31, 2007, Hatsune Miku was born. Crypton released the software for sale. Anyone who purchased and downloaded the application could create songs featuring Miku’s voice. Customers were given full control of what she could sing, and how she could sing it.
Since then, several add-ons have been released. The latest version, Hatsune Miku V3, released in fall 2013, added new tones of Miku’s voice and an English vocal library.
Hatsune Miku wasn’t necessarily intended to become a pop star. Her appearance was created to give the vocaloid voice an image.
At the initial release, Crypton targeted professional musicians. Other vocaloids were already on the market. To make its product stand out, Crypton gave Miku’s voice a character. The company wanted to put an image to the voice so customers had something to visualize when thinking about it.
The box art was the final product. Crypton also provided some basic information about Miku. She was 5’2”, weighed 93 lbs, and was 16 years old.
Crypton, however, did not provide any information on her personality. Nor did it foresee the cultural impact she would have.
Hatsune Miku was an instant success. By July 2008, 40,000 units had been sold. More than just professional producers were purchasing the software. Miku was becoming an icon.
Just as customers were able to control what Miku sang, they were also able to create her personality, which breathed life into the character. The rise in Miku’s popularity led Crypton to merchandise her. She became much more than the face of a vocaloid program. She became a star.
Professional artists and casual music fans were creating music with Miku’s voice at a rapid rate. In August 2010, it was reported that 22,000 original songs had been written for Miku.
In 2011, Hatsune Miku came to America. Crypton began marketing heavily to the West to broaden its audience. The song “World is Mine” by the Japanese pop band Supercell, which features Miku’s vocals, was released in the US and ranked 7th on the iTunes list of top world singles in its first week.
Since her first concert in August 2009, Miku has toured across the world and has performed in front of hundreds of thousands of people. A live band of real backing musicians accompanies the Miku hologram on stage.
Seven years after her first concert, the forever 16-year-old pop star is still selling out shows. Two shows were played on Saturday, May 28 at the Hammerstein Ballroom in New York City. Thousands of New Yorkers attended. Both shows sold out.
Fans of Japanese pop (J-pop) and anime, along with cosplayers, made up much of the crowd. Fans dressed up as Miku and were ecstatic as the hologram danced onstage and sang some of the more popular user-generated songs.
As another example of the unbelievable heights this vocaloid has reached, Miku performed live on the Late Show with David Letterman in 2014.
So How Did This Text-to-Speech Program Turn Into A Worldwide Phenomenon?
Without a doubt, Hatsune Miku is a global pop star. She has a persona. People can visualize her and hear her voice in their heads. So how was a text-to-speech application able to achieve something most people only dream of?
Although Crypton intended to market its program to professional producers, it was the casual music lovers who made it popular. Much like how a text-to-speech program can give someone the ability to talk, Hatsune Miku gave users the ability to sing.
With her high-quality voice and a synthesizer engine that made it easy to manipulate her tone and pitch, everyone was given the tools to be a music producer and a pop star. Most of these people could only dream of that before. Now they had the tools to do it.
Users posted their songs on YouTube, where they were shared vigorously by thousands of people. Original Hatsune Miku songs were being produced on a daily basis.
Professional musicians also took advantage of the easy-to-use program. Artists including Supercell and Pharrell Williams have featured Miku’s voice in their work, further advancing her popularity.
When you break it all down, Crypton’s text-to-speech program succeeded because it gave users the ability to sing like a pop star. Crypton essentially released a text-to-speech application with a twist. And that twist captured the imagination of thousands and led to the birth of one of the biggest pop stars in the world.
What do you think?
Why do you think this text-to-speech program was able to create a pop star? How else do you think speech technology can impact the world in the future? Let us know in the comments!