Microsoft is preparing to let Teams clone your voice, so you can sound like yourself in a meeting even when you're speaking another language. At Microsoft Ignite 2024, the company announced Interpreter in Teams, a new tool that provides real-time, speech-to-speech translation directly within Teams. Starting in the first half of 2025, users will be able to use Interpreter to have their voice replicated in up to nine languages: English, French, German, Italian, Japanese, Korean, Portuguese, Mandarin Chinese, and Spanish.
Microsoft has confirmed that the tool will be available only to users with a Microsoft 365 subscription. The company also said that Interpreter does not store any biometric data, does not add emotional tones beyond what is naturally present in the speaker's voice, and can be disabled in Teams' settings.
A Microsoft spokesperson said, “We designed Interpreter to accurately convey the intent of the speaker without assumptions or irrelevant content. Users can allow voice simulation while in a meeting or from the settings section named Voice Simulation Consent.”
Other tech firms are working on similar voice-replication technology. On the heels of Microsoft's news, Meta announced it is trialing a translation tool that automatically translates spoken audio in Instagram Reels, while ElevenLabs has showcased a multilingual speech-generation tool that it says can produce AI voices that speak fluently in multiple languages, regardless of the language they were originally trained on.
Though cost-effective, AI-driven translations still lack the nuance and contextual understanding of a human interpreter when handling idioms, cultural references, and other complex phrasing. Even so, these shortcomings are not expected to stop the natural language processing and translation technology market from growing at a compound annual growth rate (CAGR) of 20.2% to reach $35.1 billion by 2026, according to Markets and Markets.
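For readers who want to sanity-check figures like this, a CAGR projection is a one-line compound-interest calculation. The sketch below uses a hypothetical base value and time horizon, since the report's base year and starting figure aren't stated here:

```python
def project_market_size(base_value: float, cagr: float, years: int) -> float:
    """Project a value forward by `years` years at a compound annual growth rate."""
    return base_value * (1 + cagr) ** years

# Illustration with an assumed base of 100 (arbitrary units):
# five years of 20.2% compound growth multiplies the start value by about 2.5x.
print(round(project_market_size(100.0, 0.202, 5), 1))  # -> 250.9
```

Dividing the projected 2026 figure by the same factor would recover the implied base-year market size, if the report's time horizon were known.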
The emergence of AI voice synthesis, however, also poses security risks.
Deepfakes have flooded social media, making it harder to separate fact from fiction. Deepfake videos of President Joe Biden, Taylor Swift, and Vice President Kamala Harris went viral this year, collectively attracting millions of views and shares. The technology has also been misused in deepfake scams, in which someone appears on video impersonating a person the victim knows. The FTC reported that losses from identity-related scams exceeded $1 billion in 2023.
In fact, a few weeks ago, cybercriminals used voice-cloning tools to stage a highly convincing, but fake, video meeting with executive management; the impersonation was so realistic that it cost the targeted firm $25 million. It is this same risk that has led some companies, including OpenAI, to be cautious: OpenAI announced its Voice Engine voice-cloning technology but declined to make it broadly available, citing fears over how people might misuse it.
Microsoft’s Interpreter in Teams, even limited to its nine supported languages, carries the same risk: bad actors could abuse the feature by feeding it deceptive soundbites, such as a fake request for sensitive information, and having them delivered in the target’s own language.
We expect to get a clearer picture of the safeguards Microsoft will build around Interpreter in the coming months.