Most woke users on X got no chill. When OpenAI released o1, one of them queried if they would be launching voice features soon. “How about a couple of weeks of gratitude for magic intelligence in the sky, and then you can have more toys soon?” replied Sam Altman, with a tinge of sarcasm.
A couple of weeks later, Kyutai, a French non-profit AI research laboratory, has come to the rescue with Moshi, a real-time native multimodal foundational AI model capable of conversing with humans in real time, much like what OpenAI’s advanced model was intended to do.
“Are you slowly losing faith in the objective reality and existence of Advanced Voice Mode? Talk to Moshi instead,” posted former OpenAI co-founder Andrej Karpathy on X.
We release two Moshi models, adapted from our demo by replacing Moshi’s voice with artificially generated ones, one male and one female. We are looking forward to hearing what the community will build with it, and we thank …