OpenAI has released a new automatic speech recognition (ASR) system called Whisper as open-source software. According to the company, Whisper enables robust transcription in multiple languages, as well as translation from those languages into English.
Countless companies have developed speech recognition systems, which sit at the core of software and services from tech behemoths like Google, Amazon, and Meta. What makes Whisper different, according to OpenAI, is that it was trained on 680,000 hours of multilingual and "multitask" data gathered from the web, which led to improved recognition of distinctive dialects, background noise, and technical jargon.
Whisper has flaws, particularly in the area of text prediction. Because the system was trained on a large amount of "noisy" data, OpenAI cautions, Whisper may include words in its transcriptions that were never actually spoken, possibly because it is simultaneously trying to predict the next word in the audio and to transcribe the recording itself. Whisper also does not perform equally well across languages, showing a higher error rate for speakers of languages that are underrepresented in the training data.
"The primary intended users of [the Whisper] models are AI researchers studying robustness, generalization, capabilities, biases and constraints of the current model. However, Whisper is also potentially quite useful as an automatic speech recognition solution for developers, especially for English speech recognition," OpenAI wrote in the GitHub repo for Whisper, from where several versions of the system can be downloaded. "[The models] show strong ASR results in ~10 languages. They may exhibit additional capabilities … if fine-tuned on certain tasks like voice activity detection, speaker classification or speaker diarization but have not been robustly evaluated in these areas."
Unfortunately, that last bit is nothing new in the world of speech recognition. Even the best systems have biases: a 2020 Stanford study found that systems from Amazon, Apple, Google, IBM, and Microsoft made far fewer mistakes with white users than with Black users, whose transcriptions had an average word error rate of roughly 35%. Despite this, OpenAI expects Whisper's transcription capabilities to be used to improve existing accessibility tools.
Whisper's debut does not necessarily signal what OpenAI has in store for the future. While focusing more on commercial projects such as DALL-E 2 and GPT-3, the company is also pursuing several purely theoretical research directions, such as artificial intelligence systems that learn by watching videos.
About OpenAI
OpenAI is a research organization, founded as a non-profit, that aims to develop and guide artificial intelligence (AI) in ways that benefit all humanity. Elon Musk and Sam Altman founded it in 2015, and it is headquartered in San Francisco, California. OpenAI was created in part because of its founders' existential worries about the potential for catastrophe from careless use or abuse of general-purpose AI. The organization maintains a long-term focus on fundamental advances in AI capabilities. Its founders and other investors launched it with $1 billion in pledged funding. Elon Musk left the organization in February 2018, citing potential conflicts with his work at Tesla, the electric vehicle maker named after Nikola Tesla.
The company's stated goal of developing safe artificial general intelligence for the benefit of humanity is reflected in its intention to collaborate openly with other institutions and researchers. Its research and patents are meant to be accessible to the public, except in cases where releasing them could negatively affect safety.